WO2022012554A1 - Multi-channel audio signal encoding method and apparatus - Google Patents

Publication number
WO2022012554A1
Authority
WO
WIPO (PCT)
Prior art keywords
energy
amplitude
channels
channel
audio signals
Prior art date
Application number
PCT/CN2021/106102
Other languages
French (fr)
Chinese (zh)
Inventor
王智
丁建策
王宾
李海婷
王喆
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Priority to JP2023502892A (published as JP2023533367A)
Priority to EP21842335.8A (published as EP4174853A4)
Priority to BR112023000835A (published as BR112023000835A2)
Publication of WO2022012554A1
Priority to US18/154,451 (published as US20230154472A1)

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00: Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/008: Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00: Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/002: Dynamic bit allocation
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/21: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters, the extracted parameters being power information

Definitions

  • The present application relates to audio encoding and decoding technologies, and in particular to a multi-channel audio signal encoding method and apparatus.
  • Audio coding is one of the key technologies of multimedia technology. Audio coding compresses the amount of data by removing redundant information in the original audio signal to facilitate storage or transmission.
  • Multi-channel audio coding is the coding of more than two channels, and the common ones are 5.1 channels, 7.1 channels, 7.1.4 channels, 22.2 channels, etc.
  • Multi-channel encoding performs multi-channel signal screening, channel pairing, stereo processing, multi-channel side information generation, quantization, entropy coding, and bitstream multiplexing on multiple original audio signals to form a serial bit stream (the encoded bitstream), which facilitates transmission over a channel or storage in digital media.
  • In multi-channel encoding, the energy of all channels is usually equalized by averaging, which affects the quality of the encoded audio signal.
  • This energy equalization may leave channel frames with large energy/amplitude with too few coded bits, while channel frames with small energy receive redundant coded bits, which wastes resources.
  • When the total number of available bits is tight, the quality of channel frames with large energy/amplitude degrades significantly.
  • The present application provides a multi-channel audio signal encoding method and apparatus that help improve the quality of the encoded audio signal.
  • In a first aspect, an embodiment of the present application provides a multi-channel audio signal encoding method. The method may include: acquiring audio signals of P channels of a current frame of the multi-channel audio signal, where P is a positive integer greater than 1, the audio signals of the P channels include audio signals of K channel pairs, and K is a positive integer.
  • The number of bits for each of the K channel pairs is determined according to the respective energy/amplitude of the audio signals of the P channels and the number of available bits.
  • The audio signals of the P channels are encoded according to the number of bits for each of the K channel pairs to obtain an encoded bitstream.
  • The energy/amplitude of the audio signal of one of the P channels includes at least one of the following: the energy/amplitude of the audio signal in the time domain, its energy/amplitude after time-frequency transformation, its energy/amplitude after time-frequency transformation and whitening, its energy/amplitude after energy/amplitude equalization, or its energy/amplitude after stereo processing.
  • In this implementation, the number of bits for each of the K channel pairs is determined from at least one of these energy/amplitude measures, so that bits are allocated reasonably among the channel pairs in multi-channel signal encoding and the quality of the audio signal reconstructed at the decoding end is ensured.
  • The K channel pairs include a current channel pair, and the method may further include: performing energy/amplitude equalization on the audio signals of the two channels of the current channel pair to obtain the energy/amplitude of those audio signals after energy/amplitude equalization.
  • Where the K channel pairs include a current channel pair, encoding the audio signals of the P channels according to the number of bits for each of the K channel pairs may include: determining the number of bits for each of the two channels in the current channel pair according to the number of bits of the current channel pair and the respective stereo-processed energy/amplitude of the audio signals of the two channels, and then encoding the audio signals of the two channels according to the number of bits for each of the two channels.
  • In this implementation, the bits within each channel pair can be allocated based on the number of bits for each of the K channel pairs, so that bits are allocated reasonably to each channel in multi-channel signal encoding.
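
The split of a channel pair's bits between its two channels can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the function name `split_pair_bits`, the proportional rule, and the even split for a silent pair are assumptions; the source only states that the split depends on the pair's number of bits and the stereo-processed energy/amplitude of the two channels.

```python
def split_pair_bits(pair_bits, e_post_first, e_post_second):
    """Split the bits of one channel pair between its two channels.

    Assumption: bits are shared in proportion to the stereo-processed
    energy/amplitude of each channel (hypothetical rule for illustration).
    """
    total = e_post_first + e_post_second
    if total <= 0:
        # Assumption: a silent pair is split evenly.
        first_bits = pair_bits // 2
    else:
        first_bits = int(pair_bits * e_post_first / total)
    # The second channel receives whatever remains, so no bit is lost.
    return first_bits, pair_bits - first_bits


first, second = split_pair_bits(800, 3.0, 1.0)
print(first, second)
```

Giving the remainder to the second channel keeps the two per-channel budgets summing exactly to the pair's budget.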
  • Determining the number of bits for each of the K channel pairs according to the respective energy/amplitude of the audio signals of the P channels and the number of available bits may include the following steps.
  • The energy/amplitude sum of the current frame is determined according to the respective energy/amplitude of the audio signals of the P channels.
  • The bit coefficient of each of the K channel pairs is determined according to the energy/amplitude of the audio signals of that channel pair and the energy/amplitude sum of the current frame.
  • The number of bits for each of the K channel pairs is determined according to the bit coefficient of that channel pair and the number of available bits.
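
The three steps above (frame sum, bit coefficients, bits per pair) can be sketched as follows. This is an illustrative sketch under stated assumptions: the function name `allocate_pair_bits`, the definition of the bit coefficient as the pair's share of the frame sum, the equal split for a silent frame, and the handling of rounding leftovers are all hypothetical and not taken from the source.

```python
def allocate_pair_bits(pair_energies, available_bits):
    """Allocate bits to channel pairs from per-channel energy/amplitude.

    pair_energies: list of (e_first, e_second) tuples, one per channel pair.
    Returns (bit_coefficients, bits_per_pair).
    """
    # Step 1: energy/amplitude sum of the current frame.
    frame_sum = sum(e1 + e2 for e1, e2 in pair_energies)

    # Step 2: bit coefficient of each pair (assumed: its share of the sum).
    if frame_sum <= 0:
        # Assumption: a silent frame gets equal coefficients.
        coefs = [1.0 / len(pair_energies)] * len(pair_energies)
    else:
        coefs = [(e1 + e2) / frame_sum for e1, e2 in pair_energies]

    # Step 3: number of bits of each pair from coefficient and budget.
    bits = [int(c * available_bits) for c in coefs]

    # Assumption: bits lost to rounding go to the pair with the largest share.
    leftover = available_bits - sum(bits)
    bits[coefs.index(max(coefs))] += leftover
    return coefs, bits


coefs, bits = allocate_pair_bits([(3.0, 3.0), (1.0, 1.0)], 1000)
print(coefs, bits)
```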
  • Determining the energy/amplitude sum of the current frame according to the respective energy/amplitude of the audio signals of the P channels may include: determining the energy/amplitude sum of the current frame according to the respective stereo-processed energy/amplitude of the audio signals of the P channels.
  • In this implementation, energy/amplitude equalization is performed on the two channels within a single channel pair, so that channel pairs that differ greatly in energy/amplitude still differ greatly after energy/amplitude equalization. When bits are then allocated based on the energy/amplitude after energy/amplitude equalization, more bits can be allocated to the channel pairs with larger energy/amplitude, so that their coded bits meet their coding requirements and the quality of the audio signal reconstructed at the decoding end improves.
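
As a rough illustration of this point, the sketch below compares averaging over all channels with averaging inside each channel pair. The function names `equalize_globally` and `equalize_per_pair`, and the use of a plain arithmetic mean as the equalization rule, are assumptions made for illustration; the source does not specify how the equalized value is computed.

```python
def equalize_globally(amplitudes):
    """Assumed baseline: every channel is set to the mean of all channels."""
    mean = sum(amplitudes) / len(amplitudes)
    return [mean] * len(amplitudes)


def equalize_per_pair(amplitudes, pairs):
    """Assumed pair-wise rule: both channels of a pair get the pair's mean.

    pairs: list of (index_first, index_second) tuples into `amplitudes`.
    Channels that belong to no pair are left unchanged.
    """
    out = list(amplitudes)
    for first, second in pairs:
        mean = (amplitudes[first] + amplitudes[second]) / 2
        out[first] = out[second] = mean
    return out


amplitudes = [8.0, 4.0, 2.0, 0.0]
pairs = [(0, 1), (2, 3)]
print(equalize_globally(amplitudes))
print(equalize_per_pair(amplitudes, pairs))
```

With pair-wise equalization the loud pair stays louder than the quiet pair, so an allocation driven by the equalized values still favours the loud pair; global averaging erases that difference.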
  • Determining the energy/amplitude sum of the current frame according to the respective stereo-processed energy/amplitude of the audio signals of the P channels may include calculating the energy/amplitude sum sum_E_post of the current frame according to the formula $\mathrm{sum\_E}_{post} = \sum_{ch} E_{post}(ch)$, where the sum runs over the P channels.
  • Here ch represents the channel index, E_post(ch) represents the stereo-processed energy/amplitude of the audio signal of the channel whose channel index is ch, sampleCoef_post(ch, i) represents the i-th coefficient of the current frame of the ch-th channel after stereo processing, and N represents the number of coefficients of the current frame, where N is a positive integer greater than 1.
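
A small sketch of how sum_E_post might be computed from the coefficients follows. The per-channel definition is an assumption: here the energy is taken as the sum of squared coefficients and the amplitude as its square root, which is a common convention but is not confirmed by the text above. The names `channel_energy` and `frame_energy_sum` are hypothetical.

```python
import math


def channel_energy(coefs, use_amplitude=False):
    """Energy (sum of squares) or amplitude (its square root) of one channel.

    Assumption: this definition of E_post(ch) is illustrative only.
    """
    energy = sum(c * c for c in coefs)
    return math.sqrt(energy) if use_amplitude else energy


def frame_energy_sum(sample_coef_post, use_amplitude=False):
    """sum_E_post: sum of E_post(ch) over all channels of the current frame.

    sample_coef_post[ch][i] is the i-th stereo-processed coefficient of
    channel ch.
    """
    return sum(channel_energy(ch, use_amplitude) for ch in sample_coef_post)


frame = [[3.0, 4.0], [0.0, 2.0]]
print(frame_energy_sum(frame))
print(frame_energy_sum(frame, use_amplitude=True))
```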
  • Determining the energy/amplitude sum of the current frame according to the respective energy/amplitude of the audio signals of the P channels may include: determining the energy/amplitude sum of the current frame according to the respective energy/amplitude of the audio signals of the P channels before energy/amplitude equalization. The energy/amplitude of the audio signal of one of the P channels before energy/amplitude equalization includes the energy/amplitude of that audio signal in the time domain, its energy/amplitude after time-frequency transformation, or its energy/amplitude after time-frequency transformation and whitening.
  • In this implementation, the energy/amplitude sum of the current frame is determined from the energy/amplitude of the audio signals of the P channels before equalization, and bits are allocated based on that sum. Allocating bits based on the energy/amplitude before energy/amplitude equalization distributes the bits reasonably among the channels in multi-channel signal encoding, which addresses the shortage of coded bits for channels with large energy/amplitude and ensures the quality of the audio signal reconstructed at the decoding end.
  • Compared with allocating bits based on the energy/amplitude after energy/amplitude equalization, this approach decouples bit allocation from energy/amplitude equalization: the bit allocation process is not affected by the energy/amplitude equalization process, and more coded bits go to channel signals with large energy/amplitude.
  • Determining the energy/amplitude sum of the current frame according to the respective energy/amplitude of the audio signals of the P channels before energy/amplitude equalization may include calculating the energy/amplitude sum sum_E_pre of the current frame according to the formula $\mathrm{sum\_E}_{pre} = \sum_{ch} E_{pre}(ch)$, where ch represents the channel index and E_pre(ch) represents the energy/amplitude of the audio signal of the channel whose channel index is ch before energy/amplitude equalization.
  • Determining the energy/amplitude sum of the current frame according to the respective energy/amplitude of the audio signals of the P channels may alternatively include: determining the energy/amplitude sum of the current frame according to the respective energy/amplitude of the audio signals of the P channels before energy/amplitude equalization and the respective weighting coefficients of the P channels, where each weighting coefficient is less than or equal to 1.
  • In this implementation, the number of bits of each channel in multi-channel signal encoding can be adjusted through the weighting coefficients, so that bits are allocated reasonably to each channel.
  • Determining the energy/amplitude sum according to the energy/amplitude of the audio signals of the P channels before equalization and the respective weighting coefficients of the P channels may include calculating sum_E_pre according to the formula $\mathrm{sum\_E}_{pre} = \sum_{ch} \alpha(ch) \cdot E_{pre}(ch)$, where ch represents the channel index, E_pre(ch) is the energy/amplitude of the audio signal of the ch-th channel before energy/amplitude equalization, and α(ch) is the weighting coefficient of the ch-th channel.
  • The weighting coefficients of the two channels of one channel pair are the same, and their magnitude is inversely proportional to the normalized correlation value between the two channels of that channel pair.
  • In this implementation, the number of bits of each channel in multi-channel signal encoding is adjusted by the weighting coefficients. Because the weighting coefficient of a channel pair is inversely proportional to the normalized correlation value between its two channels, the number of bits of channel pairs with low correlation can be increased, which improves the encoding effect and ensures the quality of the audio signal reconstructed at the decoding end.
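
The weighted sum can be sketched as below. The function name `weighted_frame_sum` and the validation behaviour are assumptions for illustration. The weighting coefficients are taken as given inputs, because the text does not state the exact mapping from normalized correlation to coefficient; the sketch only enforces the two stated constraints (each coefficient is at most 1, and both channels of a pair share one coefficient).

```python
def weighted_frame_sum(e_pre, alpha, pairs):
    """Weighted energy/amplitude sum of the current frame.

    e_pre[ch]: energy/amplitude of channel ch before equalization.
    alpha[ch]: weighting coefficient of channel ch (must be <= 1).
    pairs: list of (index_first, index_second) channel pairs.
    """
    if len(e_pre) != len(alpha):
        raise ValueError("one weighting coefficient per channel is required")
    if any(a > 1 for a in alpha):
        raise ValueError("weighting coefficients must be <= 1")
    for first, second in pairs:
        if alpha[first] != alpha[second]:
            raise ValueError("channels of one pair must share a coefficient")
    return sum(a * e for a, e in zip(alpha, e_pre))


# Hypothetical frame: pair (0, 1) is highly correlated, so it gets the
# smaller coefficient; pair (2, 3) has low correlation and keeps 1.0.
total = weighted_frame_sum([4.0, 2.0, 1.0, 1.0], [0.5, 0.5, 1.0, 1.0],
                           [(0, 1), (2, 3)])
print(total)
```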
  • Where the audio signals of the P channels also include audio signals of Q channels, determining the number of bits for each of the K channel pairs may include: determining the number of bits for each of the K channel pairs and the number of bits for each of the Q channels according to the respective energy/amplitude of the audio signals of the P channels and the number of available bits.
  • Encoding the audio signals of the P channels according to the number of bits for each of the K channel pairs may include: encoding the audio signals of the K channel pairs according to the number of bits for each of the K channel pairs, and encoding the audio signals of the Q channels according to the number of bits for each of the Q channels.
  • One of the Q channels may be a monophonic channel, or may be a channel obtained by downmixing.
  • Determining the number of bits for each of the K channel pairs and the number of bits for each of the Q channels may include the following steps.
  • The energy/amplitude sum of the current frame is determined according to the respective energy/amplitude of the audio signals of the P channels.
  • The bit coefficient of each of the K channel pairs is determined according to the energy/amplitude of the audio signals of that channel pair and the energy/amplitude sum of the current frame.
  • The bit coefficient of each of the Q channels is determined according to the energy/amplitude of the audio signal of that channel and the energy/amplitude sum of the current frame.
  • The number of bits for each of the K channel pairs is determined according to the bit coefficient of that channel pair and the number of available bits.
  • The number of bits for each of the Q channels is determined according to the bit coefficient of that channel and the number of available bits.
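
The allocation over K channel pairs plus Q further channels can be sketched as follows. This is a hedged illustration: the name `allocate_pairs_and_singles`, the assumption that the frame consists of exactly the 2K paired channels and the Q single channels, and the proportional bit coefficient are hypothetical choices, not details confirmed by the source.

```python
def allocate_pairs_and_singles(pair_energies, single_energies, available_bits):
    """Allocate bits to K channel pairs and Q single channels.

    pair_energies: list of (e_first, e_second), one tuple per channel pair.
    single_energies: list of energies, one per single (unpaired) channel.
    Returns (bits_per_pair, bits_per_single).
    """
    # Energy/amplitude sum of the current frame over all channels.
    frame_sum = (sum(e1 + e2 for e1, e2 in pair_energies)
                 + sum(single_energies))
    if frame_sum <= 0:
        raise ValueError("frame energy must be positive for this sketch")

    # Bit coefficient = share of the frame sum (assumed definition);
    # flooring means a few bits of the budget may stay unassigned.
    pair_bits = [int(available_bits * (e1 + e2) / frame_sum)
                 for e1, e2 in pair_energies]
    single_bits = [int(available_bits * e / frame_sum)
                   for e in single_energies]
    return pair_bits, single_bits


pair_bits, single_bits = allocate_pairs_and_singles(
    [(3.0, 1.0), (1.0, 1.0)], [2.0], 800)
print(pair_bits, single_bits)
```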
  • Encoding the audio signals of the P channels according to the number of bits for each of the K channel pairs may include: encoding the energy/amplitude-equalized audio signals of the P channels according to the number of bits for each of the K channel pairs.
  • The energy/amplitude-equalized audio signals of the P channels are obtained by performing energy/amplitude equalization on the audio signals of the P channels.
  • The encoding may include stereo processing, entropy encoding, and the like, which can improve encoding efficiency and encoding effect.
  • An embodiment of the present application provides a multi-channel audio signal encoding apparatus, which may be an audio encoder, or a chip or system-on-chip of an audio encoding device.
  • The multi-channel audio signal encoding apparatus can implement the functions performed in the first aspect or in each possible design of the first aspect, and the functions can be implemented by hardware executing corresponding software.
  • The hardware or software includes one or more modules corresponding to the above functions.
  • The multi-channel audio signal encoding apparatus may include an acquisition module configured to acquire the audio signals of P channels of a current frame of the multi-channel audio signal and the respective energy/amplitude of the audio signals of the P channels, where P is a positive integer greater than 1, the audio signals of the P channels include audio signals of K channel pairs, and K is a positive integer.
  • A bit allocation module is configured to determine the number of bits for each of the K channel pairs according to the respective energy/amplitude of the audio signals of the P channels and the number of available bits.
  • An encoding module is configured to encode the audio signals of the P channels according to the number of bits for each of the K channel pairs to obtain an encoded bitstream.
  • The energy/amplitude of the audio signal of one of the P channels includes at least one of the following: the energy/amplitude of the audio signal in the time domain, its energy/amplitude after time-frequency transformation, its energy/amplitude after time-frequency transformation and whitening, its energy/amplitude after energy/amplitude equalization, or its energy/amplitude after stereo processing.
  • The K channel pairs include a current channel pair, and the encoding module is configured to: determine the number of bits for each of the two channels in the current channel pair according to the number of bits of the current channel pair and the respective stereo-processed energy/amplitude of the audio signals of the two channels, and encode the audio signals of the two channels according to the number of bits for each of the two channels.
  • The bit allocation module is configured to: determine the energy/amplitude sum of the current frame according to the respective energy/amplitude of the audio signals of the P channels; determine the bit coefficient of each of the K channel pairs according to the energy/amplitude of the audio signals of that channel pair and the energy/amplitude sum of the current frame; and determine the number of bits for each of the K channel pairs according to the bit coefficient of that channel pair and the number of available bits.
  • The bit allocation module may be configured to determine the energy/amplitude sum of the current frame according to the respective stereo-processed energy/amplitude of the audio signals of the P channels, by calculating the energy/amplitude sum sum_E_post of the current frame according to the formula $\mathrm{sum\_E}_{post} = \sum_{ch} E_{post}(ch)$, where the sum runs over the P channels.
  • Here ch represents the channel index, E_post(ch) represents the stereo-processed energy/amplitude of the audio signal of the channel whose channel index is ch, sampleCoef_post(ch, i) represents the i-th coefficient of the current frame of the ch-th channel after stereo processing, and N represents the number of coefficients in the current frame, where N is a positive integer greater than 1.
  • The bit allocation module may be configured to determine the energy/amplitude sum of the current frame according to the respective energy/amplitude of the audio signals of the P channels before energy/amplitude equalization. The energy/amplitude of the audio signal of one of the P channels before energy/amplitude equalization includes the energy/amplitude of that audio signal in the time domain, its energy/amplitude after time-frequency transformation, or its energy/amplitude after time-frequency transformation and whitening.
  • The bit allocation module may be configured to calculate the energy/amplitude sum sum_E_pre of the current frame according to the formula $\mathrm{sum\_E}_{pre} = \sum_{ch} E_{pre}(ch)$, where ch represents the channel index and E_pre(ch) represents the energy/amplitude of the audio signal of the channel whose channel index is ch before energy/amplitude equalization.
  • The bit allocation module may be configured to determine the energy/amplitude sum of the current frame according to the respective energy/amplitude of the audio signals of the P channels before energy/amplitude equalization and the respective weighting coefficients of the P channels, where each weighting coefficient is less than or equal to 1.
  • In that case the bit allocation module may be configured to calculate sum_E_pre according to the formula $\mathrm{sum\_E}_{pre} = \sum_{ch} \alpha(ch) \cdot E_{pre}(ch)$, where ch represents the channel index, E_pre(ch) is the energy/amplitude of the audio signal of the ch-th channel before energy/amplitude equalization, and α(ch) is the weighting coefficient of the ch-th channel.
  • The weighting coefficients of the two channels of one channel pair are the same, and their magnitude is inversely proportional to the normalized correlation value between the two channels of that channel pair.
  • The bit allocation module is configured to determine the number of bits for each of the K channel pairs and the number of bits for each of the Q channels according to the respective energy/amplitude of the audio signals of the P channels and the number of available bits.
  • The encoding module is configured to encode the audio signals of the K channel pairs according to the number of bits for each of the K channel pairs, and to encode the audio signals of the Q channels according to the number of bits for each of the Q channels.
  • The bit allocation module is configured to: determine the energy/amplitude sum of the current frame according to the respective energy/amplitude of the audio signals of the P channels; determine the bit coefficient of each of the K channel pairs according to the energy/amplitude of the audio signals of that channel pair and the energy/amplitude sum of the current frame; determine the bit coefficient of each of the Q channels according to the energy/amplitude of the audio signal of that channel and the energy/amplitude sum of the current frame; determine the number of bits for each of the K channel pairs according to the bit coefficient of that channel pair and the number of available bits; and determine the number of bits for each of the Q channels according to the bit coefficient of that channel and the number of available bits.
  • The encoding module is configured to encode the energy/amplitude-equalized audio signals of the P channels according to the number of bits for each of the K channel pairs.
  • The apparatus may further include an energy/amplitude equalization module, which is configured to obtain the energy/amplitude-equalized audio signals of the P channels from the audio signals of the P channels.
  • In a third aspect, an embodiment of the present application provides a multi-channel audio signal encoding method. The method may include: acquiring audio signals of P channels of a current frame of the multi-channel audio signal, where P is a positive integer greater than 1, the audio signals of the P channels include audio signals of K channel pairs, and K is a positive integer.
  • Energy/amplitude equalization is performed on the audio signals of the two channels of a current channel pair in the K channel pairs according to the respective energy/amplitude of those audio signals, to obtain the respective energy/amplitude of the audio signals of the two channels after energy/amplitude equalization.
  • The number of bits for each of the two channels of the current channel pair is determined according to the respective energy/amplitude of the audio signals of the two channels after energy/amplitude equalization and the number of available bits.
  • The audio signals of the two channels are encoded according to the number of bits for each of the two channels of the current channel pair, to obtain an encoded bitstream.
  • In this implementation, energy/amplitude equalization is performed on the two channels within a single channel pair, so that channel pairs that differ greatly in energy/amplitude still differ greatly after energy/amplitude equalization. When bits are then allocated based on the energy/amplitude after energy/amplitude equalization, more bits can be allocated to the channel pairs with larger energy/amplitude, so that their coded bits meet their coding requirements and the quality of the audio signal reconstructed at the decoding end improves.
  • Determining the number of bits for each of the two channels of the current channel pair may include: determining the energy/amplitude sum of the current frame according to the respective energy/amplitude of the audio signals of the P channels after energy/amplitude equalization; and determining the number of bits for each of the two channels of the current channel pair according to the energy/amplitude sum of the current frame, the respective energy/amplitude of the audio signals of the two channels after energy/amplitude equalization, and the number of available bits.
  • Where the audio signals of the P channels also include audio signals of Q channels, determining the number of bits for each of the two channels of the current channel pair according to the respective energy/amplitude of the audio signals of the two channels after energy/amplitude equalization and the number of available bits may include the following steps.
  • The energy/amplitude sum of the current frame is determined according to the energy/amplitude after energy/amplitude equalization of the audio signals of the two channels of each of the K channel pairs and the energy/amplitude after energy/amplitude equalization of the audio signals of the Q channels.
  • The number of bits for each of the two channels of the current channel pair is determined according to the energy/amplitude sum of the current frame, the respective energy/amplitude of the audio signals of the two channels of the current channel pair, and the number of available bits.
  • The number of bits for each of the Q channels is determined according to the energy/amplitude sum of the current frame, the energy/amplitude after energy/amplitude equalization of the audio signals of the Q channels, and the number of available bits.
  • Encoding the audio signals of the two channels according to the number of bits for each of the two channels of the current channel pair to obtain the encoded bitstream may include: encoding the audio signals of the K channel pairs according to the number of bits for each of the K channel pairs, and encoding the audio signals of the Q channels according to the number of bits for each of the Q channels, to obtain the encoded bitstream.
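
The per-channel allocation of this aspect can be sketched as follows. The function name `allocate_channel_bits` and the proportional rule are assumptions for illustration; the inputs are taken to be energy/amplitude values that have already been equalized, since the source does not spell out the equalization rule itself.

```python
def allocate_channel_bits(pair_eq_energies, single_eq_energies, available_bits):
    """Allocate bits directly to each channel from equalized energy/amplitude.

    pair_eq_energies: list of (e_first, e_second) equalized values per pair.
    single_eq_energies: equalized values of the Q single channels.
    Returns (bits_per_pair_channel, bits_per_single_channel).
    """
    # Energy/amplitude sum of the current frame over paired and single channels.
    frame_sum = (sum(e1 + e2 for e1, e2 in pair_eq_energies)
                 + sum(single_eq_energies))
    if frame_sum <= 0:
        raise ValueError("frame energy must be positive for this sketch")

    def share(energy):
        # Assumed rule: each channel's bits follow its share of the frame sum.
        return int(available_bits * energy / frame_sum)

    pair_bits = [(share(e1), share(e2)) for e1, e2 in pair_eq_energies]
    single_bits = [share(e) for e in single_eq_energies]
    return pair_bits, single_bits


pair_bits, single_bits = allocate_channel_bits(
    [(6.0, 6.0), (1.0, 1.0)], [2.0], 1600)
print(pair_bits, single_bits)
```

In this example the louder pair receives six times the bits of the quieter pair, which reflects the stated goal of giving more bits to channel pairs with larger energy/amplitude.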
  • An embodiment of the present application provides a multi-channel audio signal encoding apparatus, which may be an audio encoder, or a chip or system-on-chip of an audio encoding device.
  • The multi-channel audio signal encoding apparatus can implement the functions performed in the third aspect or in each possible design of the third aspect, and the functions can be implemented by hardware executing corresponding software.
  • The hardware or software includes one or more modules corresponding to the above functions.
  • The multi-channel audio signal encoding apparatus may include an acquisition module configured to acquire the audio signals of P channels of a current frame of the multi-channel audio signal, where P is a positive integer greater than 1, the audio signals of the P channels include audio signals of K channel pairs, and K is a positive integer.
  • An energy/amplitude equalization module is configured to perform energy/amplitude equalization on the audio signals of the two channels of a current channel pair in the K channel pairs according to the respective energy/amplitude of those audio signals, to obtain the respective energy/amplitude of the audio signals of the two channels after energy/amplitude equalization.
  • A bit allocation module is configured to determine the number of bits for each of the two channels of the current channel pair according to the respective energy/amplitude of the audio signals of the two channels after energy/amplitude equalization and the number of available bits.
  • An encoding module is configured to encode the audio signals of the two channels according to the number of bits for each of the two channels of the current channel pair, to obtain an encoded bitstream.
  • The bit allocation module is configured to: determine the energy/amplitude sum of the current frame according to the respective energy/amplitude of the audio signals of the P channels after energy/amplitude equalization; and determine the number of bits for each of the two channels of the current channel pair according to the energy/amplitude sum of the current frame, the respective energy/amplitude of the audio signals of the two channels after energy/amplitude equalization, and the number of available bits.
  • The bit allocation module is configured to: determine the energy/amplitude sum of the current frame according to the energy/amplitude after energy/amplitude equalization of the audio signals of the two channels of each of the K channel pairs and the energy/amplitude after energy/amplitude equalization of the audio signals of the Q channels; determine the number of bits for each of the two channels of the current channel pair according to the energy/amplitude sum of the current frame, the respective energy/amplitude of the audio signals of the two channels of the current channel pair, and the number of available bits; and determine the number of bits for each of the Q channels according to the energy/amplitude sum of the current frame, the respective energy/amplitude of the audio signals of the Q channels after energy/amplitude equalization, and the number of available bits.
  • The encoding module is configured to encode the audio signals of the K channel pairs according to the number of bits for each of the K channel pairs, and to encode the audio signals of the Q channels according to the number of bits for each of the Q channels, to obtain the encoded bitstream.
  • an embodiment of the present application provides an audio signal encoding apparatus, comprising a non-volatile memory and a processor coupled to each other, where the processor calls program code stored in the memory to perform the method of any one of the first aspect above, or the method of any one of the third aspect above.
  • an embodiment of the present application provides an audio signal encoding device, including an encoder, where the encoder is configured to perform the method of any one of the first aspect above, or the method of any one of the third aspect above.
  • an embodiment of the present application provides a computer-readable storage medium, including a computer program, where the computer program, when executed on a computer, causes the computer to perform the method of any one of the first aspect above, or the method of any one of the third aspect above.
  • an embodiment of the present application provides a computer-readable storage medium, including an encoded code stream obtained by the method of any one of the first aspect above, or by the method of any one of the third aspect above.
  • the present application provides a computer program product, where the computer program product includes a computer program that, when executed by a computer, performs the method of any one of the first aspect above, or the method of any one of the third aspect above.
  • the present application provides a chip, including a processor and a memory, where the memory is used for storing a computer program, and the processor is used for calling and running the computer program stored in the memory, so as to perform the method of any one of the first aspect above, or the method of any one of the third aspect above.
  • In the multi-channel audio signal encoding method and apparatus of the embodiments of the present application, the audio signals of P channels of the current frame of the multi-channel audio signal are acquired, where the audio signals of the P channels include audio signals of K channel pairs; the respective bit numbers of the K channel pairs are determined according to the respective energy/amplitude of the audio signals of the P channels and the number of available bits; and the audio signals of the P channels are encoded according to the respective bit numbers of the K channel pairs to obtain the encoded code stream.
  • the energy/amplitude of the audio signal of one of the P channels includes at least one of: the energy/amplitude of the audio signal of the one channel in the time domain, the energy/amplitude after time-frequency transformation, the energy/amplitude after time-frequency transformation and whitening, the energy/amplitude after energy/amplitude equalization, or the energy/amplitude after stereo processing.
  • Bit allocation for the channel pairs is performed using at least one of these energy/amplitude measures, such as the energy/amplitude after time-frequency transformation, the energy/amplitude after time-frequency transformation and whitening, or the energy/amplitude after energy/amplitude equalization, and the respective bit numbers of the K channel pairs are determined, so that the bits of each channel pair are allocated reasonably in multi-channel signal encoding and the quality of the reconstructed audio signal at the decoding end is ensured.
  • the method of the embodiments of the present application can solve the problem of insufficient coding bits for channel pairs with large energy/amplitude, so as to ensure the quality of the reconstructed audio signal at the decoding end.
  • FIG. 1 is a schematic diagram of an example of an audio encoding and decoding system in an embodiment of the application
  • FIG. 2 is a flowchart of a method for encoding a multi-channel audio signal according to an embodiment of the present application
  • FIG. 3 is a flowchart of a multi-channel audio signal encoding method according to an embodiment of the application.
  • FIG. 4 is a flowchart of a method for allocating bits of a channel pair according to an embodiment of the present application
  • FIG. 5 is a schematic diagram of a processing process of an encoding end according to an embodiment of the present application.
  • FIG. 6 is a schematic diagram of a processing process of a channel coding unit according to an embodiment of the present application.
  • FIG. 7 is a schematic diagram of a processing process of a channel coding unit according to an embodiment of the present application.
  • FIG. 8 is a flowchart of another multi-channel audio signal encoding method according to an embodiment of the application.
  • FIG. 9 is a schematic structural diagram of an audio signal encoding apparatus according to an embodiment of the application.
  • FIG. 10 is a schematic structural diagram of an audio signal encoding device according to an embodiment of the present application.
  • "At least one (item)" refers to one or more, and "a plurality" refers to two or more.
  • "And/or" is used to describe the relationship between related objects, indicating that three kinds of relationships are possible. For example, "A and/or B" can mean: only A, only B, or both A and B, where A and B can be singular or plural.
  • the character "/" generally indicates that the associated objects are in an "or" relationship.
  • "At least one of the following item(s)" or similar expressions refer to any combination of these items, including any combination of single item(s) or plural item(s).
  • "At least one of a, b, or c" can mean: a, b, c, "a and b", "a and c", "b and c", or "a and b and c", where a, b, and c can each be single or multiple, or some of them can be single and some multiple.
  • FIG. 1 exemplarily shows a schematic block diagram of an audio encoding and decoding system 10 to which the embodiments of the present application are applied.
  • audio encoding and decoding system 10 may include source device 12 and destination device 14, source device 12 producing encoded audio data, and thus source device 12 may be referred to as an audio encoding device.
  • Destination device 14 may decode the encoded audio data produced by source device 12, and thus destination device 14 may be referred to as an audio decoding device.
  • Various implementations of source device 12, destination device 14, or both may include one or more processors and a memory coupled to the one or more processors.
  • Source device 12 and destination device 14 may include a variety of devices, including desktop computers, mobile computing devices, notebook (e.g., laptop) computers, tablet computers, set-top boxes, so-called "smart" phones and other telephone handsets, TVs, speakers, digital media players, video game consoles, in-vehicle computers, any wearable devices, virtual reality (VR) devices, servers providing VR services, augmented reality (AR) devices, servers providing AR services, wireless communication devices, or the like.
  • Although FIG. 1 depicts source device 12 and destination device 14 as separate devices, device embodiments may also include both devices or the functionality of both, i.e., source device 12 or corresponding functionality and destination device 14 or corresponding functionality.
  • In such embodiments, source device 12 or corresponding functionality and destination device 14 or corresponding functionality may be implemented using the same hardware and/or software, or using separate hardware and/or software, or any combination thereof.
  • Source device 12 and destination device 14 may be communicatively connected via link 13 through which destination device 14 may receive encoded audio data from source device 12 .
  • Link 13 may include one or more media or devices capable of moving encoded audio data from source device 12 to destination device 14 .
  • link 13 may include one or more communication media that enable source device 12 to transmit encoded audio data directly to destination device 14 in real-time.
  • source device 12 may modulate the encoded audio data according to a communication standard, such as a wireless communication protocol, and may transmit the modulated audio data to destination device 14 .
  • the one or more communication media may include wireless and/or wired communication media, such as radio frequency (RF) spectrum or one or more physical transmission lines.
  • the one or more communication media may form part of a packet-based network, such as a local area network, a wide area network, or a global network (eg, the Internet).
  • the one or more communication media may include routers, switches, base stations, or other devices that facilitate communication from source device 12 to destination device 14.
  • Source device 12 includes encoder 20 , and optionally, source device 12 may also include audio source 16 , pre-processor 18 , and communication interface 22 .
  • the encoder 20 , the audio source 16 , the preprocessor 18 , and the communication interface 22 may be hardware components in the source device 12 or software programs in the source device 12 . They are described as follows:
  • Audio source 16, which may include or be any type of sound capture device, for example one that captures real-world sound, and/or any type of audio generation device. Audio source 16 may be a microphone for capturing sound or a memory for storing audio data, and may also include any class of (internal or external) interface that stores previously captured or generated audio data and/or acquires or receives audio data. When the audio source 16 is a microphone, it may be, for example, a local microphone or a microphone integrated in the source device; when the audio source 16 is a memory, it may be, for example, a local memory or a memory integrated in the source device.
  • the interface may be, for example, an external interface that receives audio data from an external audio source, such as an external sound capture device, such as a microphone, an external memory, or an external audio generation device.
  • the interface may be any class of interface according to any proprietary or standardized interface protocol, eg wired or wireless interfaces, optical interfaces.
  • the audio data transmitted from the audio source 16 to the preprocessor 18 may also be referred to as original audio data 17 .
  • the preprocessor 18 is used for receiving the original audio data 17 and performing preprocessing on the original audio data 17 to obtain the preprocessed audio 19 or the preprocessed audio data 19 .
  • the preprocessing performed by the preprocessor 18 may include filtering, or denoising, or the like.
  • the encoder 20 (or called the audio encoder 20) is used to receive the pre-processed audio data 19 and to execute the various embodiments described later, so as to implement the audio signal encoding method described in this application.
  • a communication interface 22, which can be used to receive encoded audio data 21 and to transmit the encoded audio data 21 via link 13 to destination device 14 or any other device (e.g., a memory) for storage or direct reconstruction, where the other device can be any device for decoding or storage.
  • the communication interface 22 may, for example, be used to encapsulate the encoded audio data 21 into a suitable format, eg, data packets, for transmission over the link 13 .
  • the destination device 14 includes a decoder 30 , and optionally, the destination device 14 may also include a communication interface 28 , an audio post-processor 32 and a speaker device 34 . They are described as follows:
  • a communication interface 28 may be used to receive encoded audio data 21 from source device 12 or any other source, such as a storage device, such as an encoded audio data storage device.
  • the communication interface 28 may be used to transmit or receive encoded audio data 21 via the link 13 between the source device 12 and the destination device 14, such as a direct wired or wireless connection, or via any kind of network.
  • Classes of networks are, for example, wired or wireless networks or any combination thereof, or any classes of private and public networks, or any combination thereof.
  • the communication interface 28 may, for example, be used to decapsulate data packets transmitted by the communication interface 22 to obtain encoded audio data 21 .
  • Both the communication interface 28 and the communication interface 22 may be configured as a one-way communication interface or a two-way communication interface, and may be used, for example, to send and receive messages to establish a connection, and to acknowledge and exchange any other information related to the communication link and/or to data transmission, such as the transmission of encoded audio data.
  • Decoder 30 (or referred to as audio decoder 30), for receiving encoded audio data 21 and providing decoded audio data 31 or decoded audio 31.
  • the post-processing performed by the audio post-processor 32 may include, for example, rendering, or any other processing, and may also be used to transmit the post-processed audio data 33 to the speaker device 34 .
  • a loudspeaker device 34 for receiving post-processed audio data 33 to play audio to eg a user or viewer.
  • the speaker device 34 may be or include any type of speaker for presenting the reconstructed sound.
  • Source device 12 and destination device 14 may include any of a variety of devices, including any class of handheld or stationary devices, such as notebook or laptop computers, mobile phones, smartphones, tablet computers, cameras, desktop computers, set-top boxes, televisions, in-vehicle equipment, stereos, digital media players, audio game consoles, audio streaming devices (such as content service servers or content distribution servers), broadcast receiver equipment, broadcast transmitter equipment, smart glasses, smart watches, etc., and may use no operating system or any kind of operating system.
  • Both encoder 20 and decoder 30 may be implemented as any of a variety of suitable circuits, e.g., one or more microprocessors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), discrete logic, hardware, or any combination thereof.
  • If the techniques are implemented partially in software, an apparatus may store instructions for the software in a suitable non-transitory computer-readable storage medium and may execute the instructions in hardware using one or more processors to perform the techniques of this disclosure. Any of the foregoing (including hardware, software, a combination of hardware and software, etc.) may be considered one or more processors.
  • the audio encoding and decoding system 10 shown in FIG. 1 is merely an example, and the techniques of this application may be applicable to audio encoding setups (eg, audio encoding or decoding).
  • data may be retrieved from local storage, streamed over a network, and the like.
  • An audio encoding device may encode and store data to memory, and/or an audio decoding device may retrieve and decode data from memory.
  • encoding and decoding is performed by devices that do not communicate with each other but merely encode data to and/or retrieve data from memory and decode data.
  • the above-mentioned encoder may be a multi-channel encoder, for example, a stereo encoder, a 5.1 channel encoder, or a 7.1 channel encoder, or the like.
  • the above audio data may also be referred to as an audio signal.
  • the audio signal in the embodiment of the present application refers to an input signal in an audio coding device, and the audio signal may include multiple frames.
  • the current frame may specifically refer to a certain frame in the audio signal.
  • In the embodiments of the present application, the encoding and decoding of the audio signal of the current frame is used as an example; the previous frame or the next frame of the current frame in the audio signal can be encoded and decoded in the same way as the current frame, so the encoding and decoding of those frames is not described one by one.
  • the audio signal in this embodiment of the present application may be a multi-channel signal, that is, an audio signal including P channels. The embodiments of the present application are used to implement multi-channel audio signal encoding.
  • "energy/amplitude" in the embodiments of the present application means energy or amplitude. In the actual processing of a frame, if energy is processed initially, energy is processed in all subsequent steps; if amplitude is processed initially, amplitude is processed in all subsequent steps.
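  • As an illustration of this convention, the following minimal sketch computes one measure per channel for a frame and uses the same measure for every channel. The definitions (sum of squared samples for energy, sum of absolute samples for amplitude), the name `channel_measure`, and the sample values are assumptions for illustration only; the text does not fix how energy or amplitude is computed.

```python
from typing import Dict, Sequence


def channel_measure(samples: Sequence[float], use_energy: bool = True) -> float:
    """Return the energy (sum of squares) or amplitude (sum of magnitudes) of one channel.

    Both definitions are illustrative assumptions, not taken from the text.
    """
    if use_energy:
        return sum(x * x for x in samples)
    return sum(abs(x) for x in samples)


# Hypothetical two-channel frame; the same flag is used for every channel,
# so energy and amplitude are never mixed within one frame.
frame: Dict[str, Sequence[float]] = {"L": [0.5, -0.5, 1.0], "R": [0.25, 0.5, -0.25]}
use_energy = True
measures = {name: channel_measure(s, use_energy) for name, s in frame.items()}
```

  • Choosing the flag once per frame and passing it everywhere is one simple way to keep later steps, such as bit allocation, consistent with the measure chosen first.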
  • the above encoder may execute the multi-channel audio signal encoding method of the embodiments of the present application, so as to reasonably allocate the number of bits of each channel in the multi-channel signal encoding, so as to ensure the quality of the reconstructed audio signal at the decoding end and improve the encoding quality.
  • the specific implementation can refer to the specific explanations of the following embodiments.
  • FIG. 2 is a flowchart of a multi-channel audio signal encoding method according to an embodiment of the present application.
  • the execution body of the embodiment of the present application may be the above encoder.
  • the method in this embodiment may include:
  • Step 101: Acquire the audio signals of P channels of the current frame of the multi-channel audio signal, where P is a positive integer greater than 1, and the audio signals of the P channels include audio signals of K channel pairs.
  • the audio signal of one channel pair includes audio signals of two channels.
  • One channel pair in this embodiment of the present application may be any one of the K channel pairs. The audio signals of two coupled channels form the audio signal of one channel pair.
  • In some embodiments, P = 2K.
  • For example, a 5.1-channel signal includes a left (L) channel, a right (R) channel, a center (C) channel, a low frequency effects (LFE) channel, a left surround (LS) channel, and a right surround (RS) channel.
  • The L channel signal and the R channel signal are paired to form the first channel pair, and the middle channel M1 signal and the side channel S1 signal are obtained after stereo processing; the LS channel signal and the RS channel signal are paired to form the second channel pair, and the middle channel M2 signal and the side channel S2 signal are obtained after stereo processing.
  • the audio signals of the above-mentioned P channels include the audio signal of the first channel pair, the audio signal of the second channel pair, and the LFE channel signal and the C channel signal that have not undergone stereo processing.
  • the audio signal of the first channel pair includes the middle channel M1 signal and the side channel S1 signal, and the audio signal of the second channel pair includes the middle channel M2 signal and the side channel S2 signal.
  • the middle channels M1 and M2 and the side channels S1 and S2 may be considered as the channels obtained by the downmix processing, that is, the downmix channels.
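  • The 5.1 pairing described above can be sketched as follows. The text only states that stereo processing of a pair yields a middle and a side channel; the specific downmix used here, M = (A + B) / 2 and S = (A - B) / 2, and the function names are assumptions for illustration.

```python
from typing import Dict, List, Tuple


def mid_side(a: List[float], b: List[float]) -> Tuple[List[float], List[float]]:
    """Downmix two paired channels into a middle and a side channel (assumed formula)."""
    mid = [(x + y) / 2 for x, y in zip(a, b)]
    side = [(x - y) / 2 for x, y in zip(a, b)]
    return mid, side


def pair_5_1(frame: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Pair L/R and LS/RS; C and LFE stay unpaired and skip stereo processing."""
    m1, s1 = mid_side(frame["L"], frame["R"])
    m2, s2 = mid_side(frame["LS"], frame["RS"])
    return {"M1": m1, "S1": s1, "M2": m2, "S2": s2,
            "C": frame["C"], "LFE": frame["LFE"]}


# Hypothetical two-sample frame of a 5.1 signal.
frame = {"L": [1.0, 0.0], "R": [0.0, 1.0], "LS": [2.0, 2.0],
         "RS": [2.0, 0.0], "C": [0.5, 0.5], "LFE": [0.25, 0.25]}
channels = pair_5_1(frame)
```

  • Here K = 2 channel pairs are formed and two channels remain unpaired, so the downmixed signal still carries six channels.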
  • the P channels do not include the LFE channel.
  • the LFE channel may be allocated a fixed number of bits regardless of whether the LFE channel's energy/amplitude value is high or low.
  • the fixed number may be a preset value; that is, regardless of how many channels the multi-channel signal includes and regardless of the encoding bit rate of the multi-channel signal, the fixed number is unchanged, for example fixed at 80, 100, or 120.
  • the fixed number can also be determined according to at least one of the number of channels included in the multi-channel signal and the encoding bit rate of the multi-channel signal.
  • For example, the higher the bit rate, the larger the fixed number. When the multi-channel signal is a 5.1-channel signal, that is, includes 6 channels, if the encoding bit rate is 192 kbps, the fixed number can be 80, that is, the number of bits allocated for the LFE channel is 80 bits; if the encoding bit rate is 256 kbps, the fixed number can be 120, that is, the number of bits allocated for the LFE channel is 120 bits.
  • As another example, at an encoding bit rate of 192 kbps, for a multi-channel signal with a different channel configuration, the fixed number may be 60, that is, the number of bits allocated for the LFE channel is 60 bits.
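  • A fixed LFE allocation of this kind could be expressed as a small lookup, sketched below. The table holds only the two 5.1-channel values given in the text (80 bits at 192 kbps, 120 bits at 256 kbps); the function name, the fallback value of 100, and the idea of subtracting the fixed number from a hypothetical frame budget are assumptions for illustration.

```python
from typing import Dict, Tuple

# (number of channels, encoding bit rate in bps) -> fixed LFE bits per frame.
# Only the two 5.1-channel entries come from the text.
LFE_FIXED_BITS: Dict[Tuple[int, int], int] = {
    (6, 192_000): 80,
    (6, 256_000): 120,
}


def lfe_fixed_bits(num_channels: int, bitrate_bps: int, default: int = 100) -> int:
    """Return the fixed LFE bit count, or an assumed preset when no entry exists."""
    return LFE_FIXED_BITS.get((num_channels, bitrate_bps), default)


# Hypothetical frame budget: the LFE bits are set aside before the
# remaining bits are shared among the other channels.
total_frame_bits = 3840
available_bits = total_frame_bits - lfe_fixed_bits(6, 192_000)
```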
  • Step 102: Determine the respective bit numbers of the K channel pairs according to the respective energy/amplitude of the audio signals of the P channels and the number of available bits.
  • the energy/amplitude of the audio signal of one of the P channels includes at least one of: the energy/amplitude of the audio signal of the one channel in the time domain, the energy/amplitude of the audio signal of the one channel after time-frequency transformation, the energy/amplitude of the audio signal of the one channel after time-frequency transformation and whitening, the energy/amplitude of the audio signal of the one channel after energy/amplitude equalization, or the energy/amplitude of the audio signal of the one channel after stereo processing.
  • the energy/amplitude in the time domain, the energy/amplitude after time-frequency transformation, and the energy/amplitude after time-frequency transformation and whitening are the energy/amplitude before energy/amplitude equalization. In other words, in the bit allocation process, any one or more of the above energy/amplitude can be selected for bit allocation.
  • the available bits do not include the fixed number of bits.
  • the time-frequency transformed and whitened energy/amplitude of the audio signal of one channel refers to the energy/amplitude after time-frequency transformation and whitening of the audio signal of the one channel, where the whitening is used to make the frequency domain coefficients of the audio signal of the one channel flatter, so as to facilitate subsequent coding.
  • a bit allocation is performed according to the respective energy/amplitude of the audio signals of the P channels and the number of available bits.
  • One bit allocation here refers to bit allocation to channel pairs, that is, to allocate corresponding bit numbers to different channel pairs.
  • the respective bit numbers of the K channel pairs are determined according to the respective energy/amplitude of the audio signals of the P channels and the number of available bits, and the number of bits is also referred to as the number of initially allocated bits.
  • a channel pair can be used as a basic unit, and a bit allocation is performed on a basic unit according to the ratio of the energy/amplitude of a basic unit to the energy/amplitude of all basic units (K basic units).
  • the energy/amplitude of any one basic unit can be determined according to the energy/amplitude of the audio signals of the two channels in the basic unit.
  • the energy/amplitude of a base unit may be the sum of the energy/amplitude of the audio signals of the two channels within the base unit.
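  • Written as a formula, the proportional allocation among the K channel pairs reads as follows. The symbols E_{k,1}, E_{k,2}, E_k, bitsAvail, and Bits_k are illustrative names introduced here, and the rounding of the result to an integer is left open, since the text does not specify it.

```latex
E_k = E_{k,1} + E_{k,2}, \qquad
\mathrm{Bits}_k = \mathrm{bitsAvail} \cdot \frac{E_k}{\sum_{i=1}^{K} E_i},
\qquad k = 1, \dots, K
```

  • For instance, with K = 2, E_1 = 8, E_2 = 2, and bitsAvail = 1000, the first pair would receive 1000 × 8/10 = 800 bits and the second pair 1000 × 2/10 = 200 bits.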
  • When the audio signals of the P channels also include the audio signals of Q unpaired channels, the respective bit numbers of the K channel pairs and the respective bit numbers of the Q channels are determined.
  • a channel pair can be used as a basic unit, and an unpaired single channel can also be used as a basic unit.
  • According to the ratio of the energy/amplitude of one basic unit to the energy/amplitude of all basic units (K+Q basic units), one bit allocation is performed on the basic unit.
  • When the basic unit is a channel pair, the energy/amplitude of the basic unit may be determined according to the energy/amplitude of the audio signals of the two channels in the basic unit.
  • When the basic unit is a single channel, the energy/amplitude of the basic unit may be determined according to the energy/amplitude of the audio signal of the channel.
  • In this way, bit allocation can be performed among the basic units (K+Q basic units) to obtain the number of bits of each basic unit, that is, the number of bits for each of the K channel pairs and the number of bits for each of the Q channels.
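  • The allocation among K+Q basic units can be sketched as below, assuming the energy of a channel pair is the sum of its two channel energies. The function name, the rounding rule (round down, then give the remainder to the strongest unit), and the example values are assumptions for illustration rather than the method fixed by the text.

```python
from typing import Dict, List


def allocate_unit_bits(unit_energies: Dict[str, List[float]],
                       available_bits: int) -> Dict[str, int]:
    """Split available_bits across basic units in proportion to unit energy."""
    # Energy of a unit: sum over its channels (two for a pair, one for a single channel).
    unit_total = {name: sum(chs) for name, chs in unit_energies.items()}
    frame_total = sum(unit_total.values())
    if frame_total <= 0:
        raise ValueError("frame energy must be positive")
    bits = {name: int(available_bits * e / frame_total)
            for name, e in unit_total.items()}
    # Rounding down may leave a few bits unassigned; give them to the
    # strongest unit (an assumption, the text does not cover rounding).
    strongest = max(unit_total, key=unit_total.get)
    bits[strongest] += available_bits - sum(bits.values())
    return bits


# Two channel pairs and one unpaired channel (K = 2, Q = 1), hypothetical energies.
units = {"pair1": [6.0, 2.0], "pair2": [1.0, 1.0], "C": [2.0]}
initial_bits = allocate_unit_bits(units, 1200)
```

  • Treating pairs and single channels uniformly as units keeps the allocation to a single pass over K+Q entries.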
  • one of the Q channels may be a monophonic channel, or may also be a channel obtained through downmix processing, that is, a downmix channel.
  • For determining the respective bit numbers of the K channel pairs, one implementable manner is to determine them according to the respective energy/amplitude of the audio signals of the K channel pairs in the time domain, or the energy/amplitude after time-frequency transformation, or the energy/amplitude after time-frequency transformation and whitening, and the number of available bits.
  • energy/amplitude equalization may be performed on the audio signals of the K channel pairs before bit allocation.
  • the manner of performing energy/amplitude equalization on the audio signals of the K channel pairs may be performing energy/amplitude equalization across the audio signals of all of the plurality of channel pairs, or across the plurality of channel pairs and one or more unpaired channels.
  • the manner of performing energy/amplitude equalization on the audio signals of the K channel pairs may also be performing energy/amplitude equalization on the audio signals of the two channels in a single channel pair.
  • Another implementable manner is to determine the respective bit numbers of the K channel pairs according to either the energy/amplitude after energy/amplitude equalization or the energy/amplitude after stereo processing of the audio signals of the K channel pairs, and the number of available bits.
  • energy/amplitude equalization may be performed on the audio signals of the K channel pairs before bit allocation.
  • the manner of performing energy/amplitude equalization on the audio signals of the K channel pairs may be performing energy/amplitude equalization on the audio signals of two channels in a single channel pair.
  • the energy/amplitude after energy/amplitude equalization or the energy/amplitude after stereo processing of the audio signals of the K channel pairs is obtained after energy/amplitude equalization of the audio signals of the two channels in a single channel pair.
  • For determining the respective bit numbers of the Q channels, one implementable manner is to determine them according to the respective energy/amplitude of the audio signals of the Q channels in the time domain, or the energy/amplitude after time-frequency transformation, or the energy/amplitude after time-frequency transformation and whitening, and the number of available bits.
  • Another implementable manner is to determine the respective bit numbers of the Q channels according to either the energy/amplitude after energy/amplitude equalization or the energy/amplitude after stereo processing of the audio signals of the Q channels, and the number of available bits.
  • the energy/amplitude after energy/amplitude equalization or the energy/amplitude after stereo processing of the audio signals of the Q channels is equal to the energy/amplitude before energy/amplitude equalization or the energy/amplitude before stereo processing .
  • Beyond a certain number of bits, the encoding quality of a single channel is not improved further, so a threshold can be preset and taken into account during bit allocation to the channels, such that regardless of the energy/amplitude of a single channel, the number of bits allocated to it does not exceed the threshold. More bits can then be allocated to other channels to improve their encoding quality, while the encoding quality of the single channel is not reduced, so the encoding quality of the whole signal is improved.
  • the determining the respective bit numbers of the K channel pairs may further include the following steps:
  • If the M-th channel is the first channel of the P channels whose number of initially allocated bits is greater than a threshold, the redundant bits are allocated to the P-1 channels of the P channels other than the M-th channel, to obtain the updated bit numbers of the P-1 channels, where the updated bit number of the M-th channel is the threshold. If the M-th channel is not the first channel whose number of initially allocated bits is greater than the threshold, the redundant bits are allocated to the channels of the P channels other than the M-th channel and the channels already determined to have a number of initially allocated bits greater than the threshold, to obtain the updated bit numbers of those other channels.
  • For example, if the channel already determined to have a number of initially allocated bits greater than the threshold is the N-th channel, the other channels are the channels of the P channels other than the M-th channel and the N-th channel.
  • The threshold, denoted frmBitMax, can be calculated from the saturated encoding bit rate of a single channel, the frame length, and the encoding sampling rate, where rateMax represents the saturated encoding bit rate of a single channel, frameLen represents the frame length, and fs represents the encoding sampling rate.
  • rateMax can be 256000bps, 240000bps, 224000bps, 192000bps, etc.
  • the value of rateMax can be selected according to the coding efficiency of the encoder, or can be set according to experience, which is not limited here.
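  • The formula for frmBitMax is not reproduced in the text above. One plausible reading, assumed here from the three quantities it names, is frmBitMax = rateMax × frameLen / fs, with frameLen measured in samples, so that frameLen / fs is the frame duration in seconds. The sketch below follows that assumption; the 960-sample frame and the 48000 Hz sampling rate are hypothetical example values.

```python
def frame_bit_max(rate_max_bps: int, frame_len_samples: int, fs_hz: int) -> int:
    """Per-channel bit ceiling for one frame.

    Assumed reading: bits per second times frame duration (frameLen / fs),
    rounded down to a whole number of bits.
    """
    return rate_max_bps * frame_len_samples // fs_hz


# rateMax = 256000 bps is one of the values listed in the text;
# the frame length and sampling rate are illustrative assumptions.
frm_bit_max = frame_bit_max(256_000, 960, 48_000)
```

  • Under these example values the frame lasts 20 ms, so a channel saturating at 256000 bps would be capped at 5120 bits per frame.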
  • For example, the L channel and the R channel are paired and downmixed to obtain the M1 channel and the S1 channel, and the LS channel and the RS channel are paired and downmixed to obtain the M2 channel and the S2 channel.
  • Bits(M1), Bits(S1), Bits(M2), and Bits(S2) represent the initial allocation bit numbers of the M1, S1, M2, and S2 channels, respectively, and the initial allocation bit numbers of the channels that do not participate in pairing are Bits(C) and Bits(LFE).
  • Step 4: Assign diffBits to the channels with allocFlag[j] ≠ 1, as follows:
  • Bits(j) = Bits(j) + diffBits × Bits(j) / sumBits
  • In some embodiments, after step 4 is performed, further steps may also be performed.
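  • The redistribution in step 4 can be sketched as below. Steps 1 to 3 are not reproduced in the text above, so the meanings used here are assumptions: allocFlag[j] = 1 marks a channel already capped at the threshold, diffBits is the excess of a channel's bits over the threshold, and sumBits is the sum of Bits(j) over the channels with allocFlag[j] ≠ 1. The function name and example values are also hypothetical.

```python
from typing import Dict


def cap_and_redistribute(bits: Dict[str, int], frm_bit_max: int) -> Dict[str, int]:
    """Cap each channel at frm_bit_max and spread the excess over uncapped channels."""
    bits = dict(bits)                      # work on a copy of the initial allocation
    alloc_flag = {ch: 0 for ch in bits}    # 1 = channel already capped (assumed meaning)
    for ch in list(bits):
        if bits[ch] <= frm_bit_max:
            continue
        diff_bits = bits[ch] - frm_bit_max  # redundant bits of this channel
        bits[ch] = frm_bit_max
        alloc_flag[ch] = 1
        others = [j for j in bits if alloc_flag[j] != 1]
        sum_bits = sum(bits[j] for j in others)
        if sum_bits == 0:
            continue
        for j in others:
            # Bits(j) = Bits(j) + diffBits * Bits(j) / sumBits, rounded down,
            # so a few bits can remain unassigned.
            bits[j] += diff_bits * bits[j] // sum_bits
    # Simplification: a channel visited earlier is not re-checked if the
    # redistribution later pushes it above the threshold.
    return bits


initial = {"M1": 6000, "S1": 1000, "M2": 2000, "S2": 1000}
updated = cap_and_redistribute(initial, 5120)
```

  • In this example M1 exceeds the threshold by 880 bits, which are shared among S1, M2, and S2 in proportion to their current bit numbers.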
  • Step 103: Encode the audio signals of the P channels according to the respective bit numbers of the K channel pairs to obtain an encoded code stream.
  • the number of bits may be the number of initially allocated bits or the number of updated bits.
  • Encoding the audio signals of the P channels may include performing quantization, entropy encoding, and code stream multiplexing on the audio signals of the P channels to obtain an encoded code stream.
  • the audio signals of the P channels are quantized, entropy encoded, and stream multiplexed to obtain an encoded stream.
  • In this embodiment, the audio signals of the P channels of the current frame of the multi-channel audio signal are acquired, where the audio signals of the P channels include the audio signals of the K channel pairs. The respective bit numbers of the K channel pairs are determined according to the respective energy/amplitude of the audio signals of the P channels and the number of available bits, and the audio signals of the P channels are encoded according to the respective bit numbers of the K channel pairs to obtain the encoded code stream.
  • the energy/amplitude of the audio signal of one of the P channels includes at least one of the energy/amplitude of the audio signal of that channel in the time domain, the energy/amplitude after time-frequency transformation, the energy/amplitude after time-frequency transformation and whitening, the energy/amplitude after energy/amplitude equalization, or the energy/amplitude after stereo processing.
  • Bit allocation for the channel pairs is performed based on, for example, the energy/amplitude after time-frequency transformation, the energy/amplitude after time-frequency transformation and whitening, or the energy/amplitude after energy/amplitude equalization, which determines the respective bit numbers of the K channel pairs. In this way the bits of each channel pair in multi-channel signal encoding are reasonably allocated, which ensures the quality of the reconstructed audio signal at the decoding end.
  • the method of the embodiments of the present application can solve the problem of insufficient coding bits for channel pairs with large energy/amplitude, so as to ensure the quality of the reconstructed audio signal at the decoding end.
  • FIG. 3 is a flowchart of a multi-channel audio signal encoding method according to an embodiment of the present application.
  • the execution body of the embodiment of the present application may be the above encoder.
  • the method of the present embodiment may include:
  • Step 201: Acquire audio signals of P channels of a current frame of a multi-channel audio signal, where P is a positive integer greater than 1, and the audio signals of the P channels include audio signals of K channel pairs.
  • For an explanation of step 201, refer to step 101 of the embodiment shown in FIG. 2; details are not repeated here.
  • Step 202: Determine the respective bit numbers of the K channel pairs according to the respective energy/amplitude of the audio signals of the P channels and the number of available bits.
  • A primary bit allocation is performed according to the respective energy/amplitude of the audio signals of the P channels and the number of available bits.
  • In some embodiments, the method of the embodiment of the present application determines the respective bit numbers of the K channel pairs according to the respective energy/amplitude of the audio signals of the P channels and the number of available bits.
  • In other embodiments, the method of the embodiment of the present application determines both the respective bit numbers of the K channel pairs and the respective bit numbers of the Q channels according to the respective energy/amplitude of the audio signals of the P channels and the number of available bits.
  • For an explanation of determining the respective bit numbers of the K channel pairs and of the Q channels in step 202, refer to step 102 of the embodiment shown in FIG. 2; details are not repeated here.
  • Step 203: Determine the respective bit numbers of the two channels in the current channel pair according to the number of bits of the current channel pair among the K channel pairs and the respective stereo-processed energy/amplitude of the audio signals of the two channels in the current channel pair.
  • The secondary bit allocation allocates the number of bits of the current channel pair between its two channels. That is, for each basic unit corresponding to a channel pair, the bits are allocated within the basic unit according to the ratio of the respective energy/amplitude of the audio signals of the two channels in the basic unit.
  • the current channel pair may be any one of the K channel pairs.
  • step 203 can be used to allocate bits within each channel pair to obtain the respective bit numbers of the two channels in the channel pair.
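  • The secondary bit allocation of step 203 can be sketched as a proportional split of a pair's bits between its two channels. This is a minimal illustration under assumptions: the function name, the truncation to an integer, and the even split when both energies are zero are not taken from this application.

```python
def split_pair_bits(pair_bits, e_post_first, e_post_second):
    """Split the bits of one channel pair between its two channels.

    e_post_first / e_post_second are the stereo-processed energy/amplitude
    values of the two channels (for example the M and S channels).
    """
    total = e_post_first + e_post_second
    if total == 0:
        first = pair_bits // 2  # assumed fallback for a silent pair
    else:
        first = int(pair_bits * e_post_first / total)
    # Giving the remainder to the second channel keeps the pair total exact.
    return first, pair_bits - first


print(split_pair_bits(1000, 3.0, 1.0))  # (750, 250)
```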
  • Step 204: Encode the audio signals of the two channels according to the respective bit numbers of the two channels in the current channel pair to obtain an encoded code stream.
  • Encoding the audio signals of the two channels in the current channel pair may include performing quantization, entropy encoding, and code stream multiplexing on the audio signals of the two channels in the current channel pair to obtain an encoded code stream.
  • In some embodiments, the audio signals of the P channels are respectively quantized, entropy encoded, and code stream multiplexed to obtain an encoded code stream.
  • In other embodiments, the audio signals of the K channel pairs are quantized, entropy encoded, and code stream multiplexed according to the respective bit numbers of the K channel pairs, and the audio signals of the Q channels are quantized, entropy encoded, and code stream multiplexed according to the respective bit numbers of the Q channels, to obtain an encoded code stream.
  • In this embodiment, the audio signals of the P channels of the current frame of the multi-channel audio signal are acquired, where the audio signals of the P channels include the audio signals of the K channel pairs. The respective bit numbers of the K channel pairs are determined according to the respective energy/amplitude of the audio signals of the P channels and the number of available bits. The respective bit numbers of the two channels in the current channel pair are then determined according to the number of bits of the current channel pair among the K channel pairs and the respective stereo-processed energy/amplitude of the audio signals of the two channels in the current channel pair, and the audio signals of the two channels are encoded according to those bit numbers to obtain an encoded code stream.
  • Bit allocation for the channel pairs is performed based on, for example, the energy/amplitude after time-frequency transformation, the energy/amplitude after time-frequency transformation and whitening, or the energy/amplitude after energy/amplitude equalization, which determines the respective bit numbers of the K channel pairs; bit allocation within each channel pair is then performed based on the bit number of that pair.
  • In this way the number of bits of each channel in multi-channel signal encoding can be reasonably allocated to ensure the quality of the reconstructed audio signal at the decoding end.
  • the method of the embodiment of the present application can solve the problem of insufficient coding bits for channel pair signals with large energy/amplitude, so as to ensure the quality of the reconstructed audio signal at the decoding end.
  • FIG. 4 is a flowchart of a method for allocating bits of a channel pair according to an embodiment of the present application.
  • the execution body of the embodiment of the present application may be the foregoing encoder, and this embodiment is a specific implementation of step 102 of the embodiment shown in FIG. 2 above. As shown in FIG. 4, the method of this embodiment may include:
  • Step 1021: Determine the energy/amplitude sum of the current frame according to the respective energy/amplitude of the audio signals of the P channels.
  • the respective energy/amplitude of the audio signals of the P channels includes at least one of the respective energy/amplitude of the audio signals of the P channels in the time domain, the energy/amplitude after time-frequency transformation, the energy/amplitude after time-frequency transformation and whitening, the energy/amplitude after energy/amplitude equalization, or the energy/amplitude after stereo processing.
  • Manner 1: Determine the energy/amplitude sum of the current frame according to the respective stereo-processed energy/amplitude of the audio signals of the P channels.
  • the energy/amplitude sum of the current frame may be the stereo-processed energy/amplitude sum sum_E_post.
  • the stereo-processed energy/amplitude sum sum_E_post can be determined according to the following formulas (1) and (2).
  • ch represents the channel index
  • E_post(ch) represents the energy/amplitude of the audio signal of the channel whose channel index is ch after stereo processing
  • sampleCoef_post(ch, i) represents the i-th coefficient of the current frame of the channel whose channel index is ch after stereo processing
  • N represents the number of coefficients of the current frame, and N takes a positive integer greater than 1.
  • the channel whose channel index is ch may be any one of the above P channels.
  • the energy/amplitude sum of the current frame can be determined by Manner 1 above, and the above-mentioned primary bit allocation can then be completed by the following steps 1022 and 1023.
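  • Formulas (1) and (2) are not reproduced in this text. As a minimal sketch of the idea, the code below assumes that the energy of a channel is the sum of its squared coefficients over the N coefficients of the frame and that the frame sum adds these values over all channels; an amplitude measure would differ (for example by a square root). The function names and the sample data are hypothetical.

```python
def channel_energy(coefs):
    """Energy of one channel for the current frame (assumed: sum of squares)."""
    return sum(c * c for c in coefs)


def frame_energy_sum(channels):
    """Sum of the per-channel energies over all channels of the frame."""
    return sum(channel_energy(coefs) for coefs in channels.values())


# Hypothetical stereo-processed coefficients, two coefficients per channel.
sample_coef_post = {"M1": [1.0, 2.0], "S1": [0.5, 0.5]}
sum_e_post = frame_energy_sum(sample_coef_post)
print(sum_e_post)  # 5.5
```

  • The same two functions apply to Manner 2 if the coefficients passed in are those before energy/amplitude equalization.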
  • Manner 2: Determine the energy/amplitude sum of the current frame according to the respective energy/amplitude of the audio signals of the P channels before energy/amplitude equalization.
  • the energy/amplitude sum may be the energy/amplitude sum sum_E_pre before energy/amplitude equalization.
  • the energy/amplitude sum sum_E_pre before energy/amplitude equalization may be determined according to the following formulas (3) and (4).
  • E_pre(ch) represents the energy/amplitude of the audio signal of the channel whose channel index is ch before energy/amplitude equalization
  • sampleCoef(ch, i) represents the i-th coefficient of the current frame of the channel whose channel index is ch before energy/amplitude equalization
  • N represents the number of coefficients of the current frame, and N takes a positive integer greater than 1.
  • the energy/amplitude sum of the current frame can be determined through Manner 2 above, and the above-mentioned primary bit allocation can then be completed through the following steps 1022 and 1023.
  • Manner 3: Determine the energy/amplitude sum of the current frame according to the respective energy/amplitude of the audio signals of the P channels before equalization and the respective weighting coefficients of the P channels.
  • the weighting coefficient of any one of the P channels is less than or equal to 1.
  • the energy/amplitude sum may be the energy/amplitude sum sum_E_pre before energy/amplitude equalization.
  • the energy/amplitude sum sum_E_pre before energy/amplitude equalization is determined according to the following formula (5).
  • α(ch) is the weighting coefficient of the channel whose channel index is ch. The weighting coefficients of the two channels of a channel pair are the same, and they are inversely related to the normalized correlation value between the two channels of the pair.
  • α(ch) is 1 when the channel with the channel index ch does not participate in pairing.
  • Take the channel whose channel index is ch1 (hereinafter referred to as ch1), the channel whose channel index is ch2 (hereinafter referred to as ch2), the channel whose channel index is ch3 (hereinafter referred to as ch3), and the channel whose channel index is ch4 (hereinafter referred to as ch4) as examples, where ch1 and ch2 form one pair and ch3 and ch4 form another pair. α(ch1) and α(ch2) are equal and both are less than 1; α(ch3) and α(ch4) are equal and both are less than 1.
  • α(ch1) and α(ch2) can be determined according to the normalized correlation value Corr_norm(ch1, ch2) of ch1 and ch2.
  • α(ch3) and α(ch4) may be determined according to the normalized correlation value Corr_norm(ch3, ch4) of ch3 and ch4.
  • If Corr_norm(ch3, ch4) is larger than Corr_norm(ch1, ch2), the values of α(ch3) and α(ch4) are smaller than the values of α(ch1) and α(ch2). That is, α(ch1) and α(ch2) are inversely related to the normalized correlation value Corr_norm(ch1, ch2) of ch1 and ch2.
  • α(ch1) and α(ch2) can be calculated by the following formula (6).
  • $\alpha(ch1, ch2) = C + (1 - C) \times (1 - \mathrm{Corr\_norm}(ch1, ch2)) / (1 - \mathrm{threshold})$ (6)
  • C is a constant, C ∈ [0,1]; threshold is the normalized pairing threshold of ch1 and ch2, threshold ∈ [0,1]; Corr_norm(ch1,ch2) is the normalized correlation value of ch1 and ch2; and coeff(ch1,ch2) ∈ [0,1]. In some embodiments, C may take 0.707.
  • the threshold can be 0.2, 0.25, 0.28, and so on.
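  • Formula (6) can be evaluated directly. The sketch below uses C = 0.707 and a threshold of 0.25 from the example values above; the function name is hypothetical. It shows that the weight falls from 1 toward C as the normalized correlation rises from the threshold toward 1.

```python
def pair_weight(corr_norm, threshold=0.25, c=0.707):
    """Weighting coefficient shared by the two channels of a pair, per formula (6)."""
    return c + (1.0 - c) * (1.0 - corr_norm) / (1.0 - threshold)


for corr in (0.25, 0.5, 0.9, 1.0):
    print(corr, round(pair_weight(corr), 4))
```

  • A pair whose correlation equals the pairing threshold gets a weight of about 1, the same as an unpaired channel, while a fully correlated pair gets the weight C.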
  • the normalized correlation value of two channels can be calculated by the following formula (7), taking ch1 and ch2 as examples.
  • Corr_norm(ch1, ch2) is the normalized correlation value of ch1 and ch2
  • spec_ch1(i) is the i-th time domain or frequency domain coefficient of ch1
  • spec_ch2(i) is the i-th time domain or frequency domain coefficient of ch2
  • N is the number of coefficients of the current frame.
  • For example, the L channel and the R channel form the first channel pair with normalized correlation value Corr_norm(L,R), and the LS channel and the RS channel form the second channel pair with normalized correlation value Corr_norm(LS,RS).
  • the correlation values of the two channels of other channel pairs can also be calculated by formula (7), and the weighting coefficients of the channels of those channel pairs can also be calculated by formula (6).
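  • Formula (7) is not reproduced in this text. The sketch below assumes the common definition of a normalized cross-correlation: the absolute inner product of the two coefficient vectors divided by the product of their norms. Both the definition and the use of the absolute value are assumptions, and the function name is hypothetical.

```python
import math


def corr_norm(spec_ch1, spec_ch2):
    """Normalized correlation of two coefficient vectors (assumed definition)."""
    num = sum(a * b for a, b in zip(spec_ch1, spec_ch2))
    den = math.sqrt(sum(a * a for a in spec_ch1) * sum(b * b for b in spec_ch2))
    # A silent channel has no defined correlation; 0.0 is an assumed fallback.
    return abs(num) / den if den > 0 else 0.0


print(corr_norm([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))  # 1.0 (scaled copies)
print(corr_norm([1.0, 0.0], [0.0, 1.0]))            # 0.0 (orthogonal)
```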
  • After stereo processing, the degree to which the energy/amplitude sum of two channels is reduced is related to the similarity of the audio signals of the two channels: the higher the correlation of the audio signals of the two channels, the more the energy/amplitude sum of the two channels is reduced after stereo processing.
  • Therefore, weighting coefficients are introduced in the primary bit allocation.
  • the weighting coefficients of two channels with high correlation are smaller than the weighting coefficients of two channels with low correlation.
  • the weighting coefficients of the unpaired channels are greater than the weighting coefficients of the paired channels.
  • the weighting coefficients of the two channels of the same pair are the same. That is, the energy/amplitude sum can be determined through Manner 3 above, and the above-mentioned primary bit allocation can then be completed through the following steps 1022 and 1023.
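  • Formula (5) is not reproduced in this text. One reading consistent with the description of Manner 3 is a weighted sum of the pre-equalization energy/amplitude values, with weight 1 for unpaired channels. The sketch below follows that reading; the function name, the dictionary layout, and the sample numbers are assumptions.

```python
def weighted_energy_sum(e_pre, weights):
    """Weighted energy/amplitude sum of the frame (assumed form of Manner 3).

    e_pre   : channel name -> energy/amplitude before equalization
    weights : channel name -> weighting coefficient of a paired channel;
              channels missing from this mapping are unpaired and weigh 1.
    """
    return sum(weights.get(ch, 1.0) * energy for ch, energy in e_pre.items())


e_pre = {"L": 4.0, "R": 4.0, "C": 2.0}
weights = {"L": 0.5, "R": 0.5}  # both channels of a pair share one weight
print(weighted_energy_sum(e_pre, weights))  # 6.0
```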
  • Step 1022: Determine the respective bit coefficients of the K channel pairs according to the respective energy/amplitude of the audio signals of the K channel pairs and the energy/amplitude sum of the current frame.
  • In some embodiments, the respective bit coefficients of the K channel pairs are determined according to the respective energy/amplitude of the audio signals of the K channel pairs and the energy/amplitude sum determined in step 1021 above.
  • In other embodiments, the respective bit coefficients of the K channel pairs are determined according to the respective energy/amplitude of the audio signals of the K channel pairs and the energy/amplitude sum determined in step 1021 above, and the respective bit coefficients of the Q channels are determined according to the respective energy/amplitude of the Q channels and that energy/amplitude sum.
  • the respective bit coefficients of the K channel pairs may be the ratios of the respective energy/amplitude of the K channel pairs to the energy/amplitude sum determined in the foregoing step 1021.
  • the energy/amplitude of a channel pair may be the sum of the energy/amplitude of the two channels in the channel pair.
  • the respective bit coefficients of the Q unpaired channels are the ratios of the respective energy/amplitude of the Q channels to the energy/amplitude sum determined in step 1021 above.
  • Step 1023: Determine the respective bit numbers of the K channel pairs according to the respective bit coefficients of the K channel pairs and the number of available bits.
  • In some embodiments, the respective bit numbers of the K channel pairs are determined according to the respective bit coefficients of the K channel pairs and the number of available bits.
  • In other embodiments, the respective bit numbers of the K channel pairs are determined according to the respective bit coefficients of the K channel pairs and the number of available bits, and the respective bit numbers of the Q channels are determined according to the respective bit coefficients of the Q channels and the number of available bits.
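  • Steps 1021 to 1023 can be sketched together as follows. This is a minimal illustration under assumptions: each bit coefficient is taken as the unit's share of the frame energy/amplitude sum, the bit number is that share of the available bits truncated to an integer, and silent frames are split evenly. The function name, the tuple keys, and the sample values are hypothetical.

```python
def primary_allocation(energies, pairs, b_avail):
    """Allocate b_avail bits to channel pairs and unpaired channels.

    energies : channel name -> energy/amplitude used for the allocation
    pairs    : list of (channel_a, channel_b) tuples
    Returns a mapping from a unit (a pair tuple or a 1-tuple) to its bits.
    """
    total = sum(energies.values())  # step 1021: energy/amplitude sum of the frame
    paired = {ch for pair in pairs for ch in pair}
    # The energy/amplitude of a pair is the sum over its two channels.
    units = {pair: energies[pair[0]] + energies[pair[1]] for pair in pairs}
    units.update({(ch,): e for ch, e in energies.items() if ch not in paired})
    if total == 0:
        return {unit: b_avail // len(units) for unit in units}
    # Steps 1022 and 1023: bit coefficient = share of the sum; bits = share of b_avail.
    return {unit: int(b_avail * e / total) for unit, e in units.items()}


energies = {"L": 4.0, "R": 4.0, "LS": 1.0, "RS": 1.0, "C": 5.0, "LFE": 1.0}
alloc = primary_allocation(energies, [("L", "R"), ("LS", "RS")], 1600)
print(alloc)
```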
  • In this embodiment, the audio signals of the P channels of the current frame of the multi-channel audio signal are acquired, where the audio signals of the P channels include the audio signals of the K channel pairs. The energy/amplitude sum of the current frame is determined according to the respective energy/amplitude of the audio signals of the P channels; the respective bit coefficients of the K channel pairs are determined according to the respective energy/amplitude of the audio signals of the K channel pairs and the energy/amplitude sum of the current frame; the respective bit numbers of the K channel pairs are determined according to the respective bit coefficients of the K channel pairs and the number of available bits; and the audio signals of the P channels are encoded according to the respective bit numbers of the K channel pairs to obtain an encoded code stream.
  • The energy/amplitude sum of the current frame is determined from at least one of the energy/amplitude in the time domain, the energy/amplitude after time-frequency transformation, the energy/amplitude after time-frequency transformation and whitening, the energy/amplitude after energy/amplitude equalization, or the energy/amplitude after stereo processing of the audio signals of the P channels. Bit allocation for the channel pairs is then performed based on the ratio of the energy/amplitude of the audio signals of each channel pair to the energy/amplitude sum, which determines the number of bits of each of the K channel pairs, so that the number of bits of each channel pair in multi-channel signal encoding is reasonably allocated and the quality of the reconstructed audio signal at the decoding end is ensured.
  • the method of the embodiments of the present application can solve the problem of insufficient coding bits for channel pairs with large energy/amplitude, so as to ensure the quality of the reconstructed audio signal at the decoding end.
  • the following embodiments take a 5.1-channel signal as an example to schematically illustrate the multi-channel audio signal encoding method according to the embodiment of the present application.
  • FIG. 5 is a schematic diagram of a processing process of an encoding end according to an embodiment of the present application.
  • the encoding end may include a multi-channel encoding processing unit 401, a channel encoding unit 402, and a code stream multiplexing interface 403.
  • the encoding end may be an encoder as described above.
  • the multi-channel encoding processing unit 401 is used to perform multi-channel signal filtering, group pairing, stereo processing and multi-channel side information generation on the input signal.
  • the input signal is a 5.1 (L channel, R channel, C channel, LFE channel, LS channel, RS channel) signal.
  • the multi-channel encoding processing unit 401 pairs the L channel signal and the R channel signal to form a first channel pair, and obtains the middle channel M1 channel signal and the side channel S1 channel signal through stereo processing.
  • the LS channel signal and the RS channel signal are paired to form a second channel pair, and the middle channel M2 channel signal and the side channel S2 channel signal are obtained through stereo processing.
  • the multi-channel energy/amplitude equalization increases the benefit of stereo processing: the energy/amplitude is concentrated in the middle channel, which helps the channel encoding unit improve coding efficiency.
  • In this embodiment, energy/amplitude equalization is performed between the two channels of each channel pair. It is assumed that the energy/amplitude of the current frame of each input channel before energy/amplitude equalization is energy_L, energy_R, energy_C, energy_LS, and energy_RS, respectively.
  • energy_L is the energy/amplitude of the L channel signal before energy/amplitude equalization
  • energy_R is the energy/amplitude of the R channel signal before energy/amplitude equalization
  • energy_C is the energy/amplitude of the C channel signal before energy/amplitude equalization
  • energy_LS is the energy/amplitude of the LS channel signal before energy/amplitude equalization
  • energy_RS is the energy/amplitude of the RS channel signal before energy/amplitude equalization.
  • the energy/amplitude of both the L channel and the R channel of the first channel pair after energy/amplitude equalization is energy_avg_LR, which may be calculated using the following formula (8): $\mathrm{energy\_avg\_LR} = \mathrm{avg}(\mathrm{energy\_L}, \mathrm{energy\_R})$ (8)
  • the energy/amplitude of both the LS channel and the RS channel of the second channel pair after energy/amplitude equalization is energy_avg_LSRS, which may be calculated using the following formula (9): $\mathrm{energy\_avg\_LSRS} = \mathrm{avg}(\mathrm{energy\_LS}, \mathrm{energy\_RS})$ (9)
  • the avg(a1, a2) function returns the mean of its two input parameters a1 and a2.
  • in formula (8), a1 takes energy_L and a2 takes energy_R
  • in formula (9), a1 takes energy_LS and a2 takes energy_RS.
  • the energy/amplitude energy(ch) (including energy_L, energy_R, energy_C, energy_LS, energy_RS) of each channel before energy/amplitude equalization is calculated as follows:
  • sampleCoef(ch, i) represents the i-th coefficient of the current frame of the channel whose channel index is ch
  • N represents the number of coefficients of the current frame
  • different values of ch correspond to the above L channel, R channel, C channel, LFE channel, LS channel, and RS channel.
  • energy_L is equal to E_pre(L)
  • energy_R is equal to E_pre(R)
  • energy_LS is equal to E_pre(LS)
  • energy_RS is equal to E_pre(RS)
  • energy_C is equal to E_pre(C).
  • the multi-channel encoding processing unit 401 outputs the stereo-processed M1 channel signal, S1 channel signal, M2 channel signal, and S2 channel signal, the LFE channel signal and the C channel signal that are not subjected to stereo processing, and the multi-channel side information.
  • the channel encoding unit 402 encodes the stereo-processed M1 channel signal, S1 channel signal, M2 channel signal, and S2 channel signal, the LFE channel signal and the C channel signal that are not subjected to stereo processing, and the multi-channel side information, and outputs the encoded channels E1-E6.
  • Channel encoding unit 402 may include a plurality of channel processing boxes that allocate more bits to channels with greater energy/amplitude than channels with less energy/amplitude. After the channel coding unit 402 performs quantization and entropy coding to remove redundancy at the coding end, the coded channels E1-E6 are sent to the code stream multiplexing interface 403.
  • the code stream multiplexing interface 403 multiplexes the six encoded channels E1-E6 to form a serial bit stream (bitStream), so as to facilitate the multi-channel audio signal to be transmitted in the channel or stored in the digital medium.
  • FIG. 6 is a schematic diagram of a processing process of a channel encoding unit according to an embodiment of the present application.
  • the channel encoding unit 402 may include a bit allocation unit 4021 and a quantization entropy encoding unit 4023.
  • This embodiment is an example of Manner 1 above.
  • the bit allocation unit 4021 is used to perform the primary bit allocation and the secondary bit allocation in the above-mentioned embodiment, so as to obtain the number of bits of each channel.
  • the bit allocation unit 4021 determines the stereo-processed energy/amplitude sum sum_E_post according to the above formulas (1) and (2). Then, the bit coefficients of each channel pair and the bit coefficients of the unpaired channels are determined by the following formulas (11) to (14). In this embodiment, the bit coefficient of the first channel pair is represented by Ratio(L,R), the bit coefficient of the second channel pair is represented by Ratio(LS,RS), the bit coefficient of the unpaired C channel is represented by Ratio(C), and the bit coefficient of the unpaired LFE channel is represented by Ratio(LFE).
  • the bit allocation unit 4021 calculates the number of bits of each channel based on Ratio(L,R), Ratio(LS,RS), Ratio(C), Ratio(LFE), the number of available bits bAvail, the channel pair indices pairIdx1 and pairIdx2, and the stereo-processed energy/amplitude E_post(ch) of each channel.
  • the channel pair indices pairIdx1 and pairIdx2 may be output by the multi-channel encoding processing unit 401; pairIdx1 indicates that the L channel and the R channel are paired, and pairIdx2 indicates that the LS channel and the RS channel are paired.
  • the number of bits of each channel can be determined by the following formulas (15) to (22).
  • Bits(M1, S1) represents the number of bits of the first channel pair
  • Bits(M2, S2) represents the number of bits of the second channel pair.
  • Bit allocation between the two channels within each channel pair, and bit allocation for the channels not participating in pairing, are as follows.
  • the bit allocation between the two channels of each channel pair is as follows:
  • Bits(M1) represents the number of bits of the M1 channel
  • Bits(S1) represents the number of bits of the S1 channel
  • Bits(M2) represents the number of bits of the M2 channel
  • Bits(S2) represents the number of bits of the S2 channel.
  • the bit allocation for the channels not participating in pairing is as follows:
  • Bits(C) represents the number of bits of the C channel
  • Bits(LFE) represents the number of bits of the LFE channel.
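  • The formulas referenced above are not reproduced in this text. As a rough guide only, the relationships described in steps 1022, 1023, and 203 suggest expressions of the following shape for the first channel pair and the C channel in Manner 1. The exact expressions, any rounding, and any bits reserved for side information are assumptions, and these are not the numbered formulas of this application:

```latex
\begin{align*}
\mathrm{Ratio}(L,R) &= \frac{E_{post}(M1) + E_{post}(S1)}{\mathrm{sum\_E}_{post}}, &
\mathrm{Ratio}(C) &= \frac{E_{post}(C)}{\mathrm{sum\_E}_{post}} \\
\mathrm{Bits}(M1,S1) &= \mathrm{bAvail} \times \mathrm{Ratio}(L,R), &
\mathrm{Bits}(C) &= \mathrm{bAvail} \times \mathrm{Ratio}(C) \\
\mathrm{Bits}(M1) &= \mathrm{Bits}(M1,S1) \times \frac{E_{post}(M1)}{E_{post}(M1) + E_{post}(S1)}, &
\mathrm{Bits}(S1) &= \mathrm{Bits}(M1,S1) - \mathrm{Bits}(M1)
\end{align*}
```

  • The second channel pair (M2, S2) and the LFE channel would follow the same pattern with Ratio(LS,RS) and Ratio(LFE).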
  • the quantization entropy coding unit 4023 quantizes and entropy-encodes the stereo-processed M1 channel signal, S1 channel signal, M2 channel signal, and S2 channel signal, the C channel signal, the LFE channel signal, and the multi-channel side information according to the number of bits of each channel, to obtain the encoded channel E1-E6 signals.
  • In this embodiment, energy/amplitude equalization is performed on the two channels of each channel pair, with the channel pair as the granularity. Since the energy/amplitude ratio between the channel pairs before stereo processing is different, the energy/amplitude ratio between the channel pairs after stereo processing is also different. Bit allocation between the channel pairs is therefore performed according to the energy/amplitude ratio of each channel pair after stereo processing, and bit allocation within each channel pair is performed last. This reasonably allocates the number of bits of each channel in multi-channel signal encoding and ensures the quality of the reconstructed audio signal at the decoding end.
  • the method of the embodiment of the present application can solve the problem of insufficient coding bits for channel pair signals with large energy/amplitude, so as to ensure the quality of the reconstructed audio signal at the decoding end.
  • the embodiment of the present application further provides another energy/amplitude equalization manner.
  • the above-mentioned 5.1-channel signal is taken as an example for further illustration.
  • The energy/amplitude of each channel after equalization is energy_avg.
  • energy_avg can be determined by the following formula (23).
  • the Avg(a1, a2, ..., an) function returns the mean of its n input parameters a1, a2, ..., an.
  • FIG. 7 is a schematic diagram of a processing process of a channel encoding unit according to an embodiment of the present application.
  • the channel encoding unit 402 may include a bit allocation unit 4021, a quantization entropy encoding unit 4023, and a bit calculation unit 4022.
  • This embodiment is an example of Manner 2 above.
  • the bit allocation unit 4021 is configured to perform the primary bit allocation and the secondary bit allocation in the above-mentioned embodiment, so as to obtain the number of bits of each channel.
  • the bit calculation unit 4022 determines the energy/amplitude sum sum_E_pre before energy/amplitude equalization according to the above formulas (3) and (4). Then, the bit coefficients of each channel pair and the bit coefficients of the unpaired channels are determined by the following formulas (24) to (27).
  • the bit coefficient of the first channel pair is represented by Ratio(L,R)
  • the bit coefficient of the second channel pair is represented by Ratio(LS,RS)
  • the bit coefficient of the unpaired C channel is represented by Ratio(C)
  • the bit coefficients of the unpaired LFE channels are represented by Ratio(LFE).
  • the bit allocation unit 4021 calculates the number of bits of each channel based on Ratio(L,R), Ratio(LS,RS), Ratio(C), Ratio(LFE), the number of available bits bAvail, the channel pair indices pairIdx1 and pairIdx2, and the stereo-processed energy/amplitude E_post(ch) of each channel.
  • the channel pair indices pairIdx1 and pairIdx2 may be output by the multi-channel encoding processing unit 401; pairIdx1 indicates that the L channel and the R channel are paired, and pairIdx2 indicates that the LS channel and the RS channel are paired.
  • the number of bits of each channel can be determined by the above formulae (15) to (22).
  • the quantization entropy coding unit 4023 quantizes and entropy-encodes the stereo-processed M1 channel signal, S1 channel signal, M2 channel signal, and S2 channel signal, the C channel signal, the LFE channel signal, and the multi-channel side information according to the number of bits of each channel, to obtain the encoded channel E1-E6 signals.
  • In this embodiment, stereo processing is performed after energy/amplitude equalization has been performed on all channels.
  • Because the energy/amplitude ratios of the channel pairs after stereo processing are then similar, in this embodiment of the present application bit allocation between channel pairs is performed according to the energy/amplitude ratios of the channel pairs before equalization, and bit allocation within each channel pair is then performed according to the energy/amplitude after stereo processing.
  • In other words, the energy/amplitude before equalization guides the bit allocation between channel pairs. Since the energy/amplitude ratios of the channel pairs before stereo processing are different, performing the bit allocation between channel pairs accordingly can reasonably allocate the number of bits of each channel in multi-channel signal encoding, so as to ensure the quality of the reconstructed audio signal at the decoding end.
  • the method of the embodiment of the present application can solve the problem of insufficient coding bits for channel pair signals with large energy/amplitude, so as to ensure the quality of the reconstructed audio signal at the decoding end.
  • the channel encoding unit 402 may include a bit allocation unit 4021, a quantization entropy encoding unit 4023, and a bit calculation unit 4022, and may also be used to implement the functions of each step in the third mode.
  • the bit allocation unit 4021 is configured to perform the primary bit allocation and the secondary bit allocation in the above-mentioned embodiment, so as to obtain the number of bits of each channel.
  • The bit allocation unit 4021 determines the energy/amplitude sum sum_E_pre before energy/amplitude equalization according to the above formulas (5) to (7). Then, the bit coefficients of each channel pair and the bit coefficients of the unpaired channels are determined by the following formulas (28) to (31). In this embodiment, the bit coefficient of the first channel pair is represented by Ratio(L,R), the bit coefficient of the second channel pair is represented by Ratio(LS,RS), the bit coefficient of the unpaired C channel is represented by Ratio(C), and the bit coefficient of the unpaired LFE channel is represented by Ratio(LFE).
  • ⁇ (L) represents the weighting coefficient of the L channel
  • ⁇ (R) represents the weighting coefficient of the R channel
  • ⁇ (LS) represents the weighting coefficient of the LS channel
  • ⁇ (RS) represents the weighting coefficient of the RS channel
  • ⁇ (C) represents the weighting coefficient of the C channel
  • ⁇ (LFE) represents the weighting coefficient of the LFE channel.
  • the number of bits of each channel can be determined by the above equations (15) to (22).
  • The quantization entropy coding unit quantizes and entropy encodes the stereo-processed M1 channel signal, S1 channel signal, M2 channel signal, S2 channel signal, C channel signal, and LFE channel signal, together with the multi-channel side information, according to the number of bits of each channel, to obtain the encoded channel E1-E6 signals.
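The bit-coefficient step of this mode can be illustrated with a minimal Python sketch. Formulas (28) to (31) are not reproduced in this text, so the form used here is an assumption: each bit coefficient is the weighted pre-equalization energy/amplitude of a channel pair (or of an unpaired channel) divided by the weighted energy/amplitude sum sum_E_pre. The function name `bit_coefficients`, the dictionary layout, and all numeric values are hypothetical.

```python
def bit_coefficients(e_pre, weight, pairs, unpaired):
    """Return one bit coefficient per channel pair and per unpaired channel.

    e_pre:  energy/amplitude of each channel before equalization
    weight: weighting coefficient of each channel (assumed <= 1)
    Assumed form: weighted energy of the group / weighted energy sum.
    """
    sum_e_pre = sum(weight[ch] * e_pre[ch] for ch in e_pre)
    ratios = {}
    for a, b in pairs:
        ratios[(a, b)] = (weight[a] * e_pre[a] + weight[b] * e_pre[b]) / sum_e_pre
    for ch in unpaired:
        ratios[ch] = weight[ch] * e_pre[ch] / sum_e_pre
    return ratios


# Illustrative 5.1 frame; all numbers are made up.
e_pre = {"L": 40.0, "R": 40.0, "LS": 5.0, "RS": 5.0, "C": 8.0, "LFE": 2.0}
weight = {"L": 0.5, "R": 0.5, "LS": 1.0, "RS": 1.0, "C": 1.0, "LFE": 1.0}
ratios = bit_coefficients(e_pre, weight, [("L", "R"), ("LS", "RS")], ["C", "LFE"])
```

Under this assumed form the coefficients sum to 1, so multiplying each by the number of available bits distributes the whole budget.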
  • FIG. 8 is a flowchart of another multi-channel audio signal encoding method according to an embodiment of the present application.
  • the execution body of the embodiment of the present application may be the foregoing encoder.
  • the method in this embodiment may include:
  • Step 501 Acquire audio signals of P channels of the current frame of the multi-channel audio signal, where P is a positive integer greater than 1, and the audio signals of the P channels include audio signals of K channel pairs.
  • the audio signal of one channel pair includes audio signals of two channels.
  • One channel pair in this embodiment of the present application may be any one of the K channel pairs. The audio signals of two coupled channels form the audio signal of one channel pair.
  • P = 2K.
  • step 501 may refer to step 101 of the embodiment shown in FIG. 2 , and details are not repeated here.
  • Step 502: According to the respective energy/amplitude of the audio signals of the two channels of the current channel pair in the K channel pairs, perform energy/amplitude equalization on the audio signals of the two channels of the current channel pair, and obtain the respective energy/amplitude-equalized energy/amplitude of the audio signals of the two channels of the current channel pair.
  • The embodiments of the present application perform energy/amplitude equalization per channel pair, that is, each channel pair performs energy/amplitude equalization within the channel pair.
  • Energy/amplitude equalization can be performed within a channel pair in the manner of step 502 above, so as to obtain the respective energy/amplitude-equalized energy/amplitude of the two channels of the current channel pair.
  • the above formula (8) may be used to determine the energy/amplitude after energy/amplitude equalization of the two channels of the current channel pair. That is, L and R in formula (8) are replaced by the two channels of the current channel pair.
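Step 502 can be sketched in Python as follows. Formula (8) is not reproduced in this text, so the sketch assumes one common choice: the energy/amplitude of a channel is the root of the sum of its squared coefficients, and both channels of the pair are scaled to the pair's average. The names `energy` and `equalize_pair` are hypothetical.

```python
import math


def energy(x):
    """Amplitude-like energy of one channel: root of the sum of squares (assumed)."""
    return math.sqrt(sum(v * v for v in x))


def equalize_pair(ch_a, ch_b):
    """Scale both channels of a pair toward the pair's average energy/amplitude.

    Returns the two scaled signals and the target (average) energy/amplitude.
    """
    e_a, e_b = energy(ch_a), energy(ch_b)
    e_avg = 0.5 * (e_a + e_b)
    # A silent channel is left untouched to avoid dividing by zero.
    g_a = e_avg / e_a if e_a > 0 else 1.0
    g_b = e_avg / e_b if e_b > 0 else 1.0
    return [g_a * v for v in ch_a], [g_b * v for v in ch_b], e_avg


left = [3.0, 4.0]    # energy 5
right = [0.6, 0.8]   # energy 1
left_eq, right_eq, e_eq = equalize_pair(left, right)
```

With this choice the energy/amplitude sum of the pair is the same before and after equalization (5 + 1 before, 3 + 3 after), while the two channels end up with equal energy/amplitude.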
  • Step 503: Determine the respective numbers of bits of the two channels of the current channel pair according to the respective energy/amplitude-equalized energy/amplitude of the audio signals of the two channels of the current channel pair and the number of available bits.
  • the current channel pair may be any one of the K channel pairs.
  • The method of the embodiment of the present application may determine the energy/amplitude sum of the current frame according to the energy/amplitude-equalized energy/amplitude of the audio signals of the two channels of each of the K channel pairs. The respective numbers of bits of the two channels of the current channel pair are then determined according to the energy/amplitude sum of the current frame, the energy/amplitude-equalized energy/amplitude of the audio signals of the two channels of the current channel pair, and the number of available bits.
  • The respective numbers of bits of the two channels of the current channel pair are determined according to the ratio of the energy/amplitude-equalized energy/amplitude of the audio signals of the two channels of the current channel pair to the energy/amplitude sum, and the number of available bits.
  • The method of the embodiment of the present application may determine the energy/amplitude sum of the current frame according to the energy/amplitude-equalized energy/amplitude of the audio signals of the two channels of each of the K channel pairs and the energy/amplitude-equalized energy/amplitude of the audio signals of the Q channels.
  • The respective numbers of bits of the two channels of the current channel pair are determined according to the energy/amplitude sum, the respective energy/amplitude of the audio signals of the two channels of the current channel pair, and the number of available bits. The respective numbers of bits of the Q channels are determined according to the energy/amplitude sum, the energy/amplitude-equalized energy/amplitude of the audio signals of the Q channels, and the number of available bits.
  • The respective numbers of bits of the Q channels are determined according to the ratio of the energy/amplitude-equalized energy/amplitude of the audio signals of the Q channels to the energy/amplitude sum, and the number of available bits.
  • the energy/amplitude after energy/amplitude equalization of the audio signals of the Q channels may be equal to the respective energy/amplitude before energy/amplitude equalization, and approximately equal to the respective energy/amplitude after stereo processing.
  • The energy/amplitude-equalized energy/amplitude of the audio signals of the two channels of each of the K channel pairs may be approximately equal to the stereo-processed energy/amplitude of the audio signals of those two channels.
  • The above formula (1) can be used to determine the energy/amplitude sum, that is, the energy/amplitude after stereo processing in formula (1) is replaced by the energy/amplitude-equalized energy/amplitude of each channel in this embodiment.
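The ratio-based allocation of step 503 can be sketched as below. This is an illustration under assumptions, not the formulas of the application: each channel receives the share of the available bits given by its equalized energy/amplitude divided by the energy/amplitude sum, and the handling of rounding leftovers is an arbitrary choice of this sketch. The function name `allocate_bits` and the numbers are hypothetical.

```python
def allocate_bits(e_eq, available_bits):
    """Split available_bits across channels in proportion to e_eq[ch] / sum.

    Assumes the energy/amplitude sum is non-zero. Flooring can leave a few
    bits over; this sketch hands them to the channel with the largest
    energy/amplitude.
    """
    sum_e = sum(e_eq.values())
    bits = {ch: int(available_bits * e / sum_e) for ch, e in e_eq.items()}
    leftover = available_bits - sum(bits.values())
    bits[max(e_eq, key=e_eq.get)] += leftover
    return bits


# Equalized energies: the two channels of each pair share one value,
# the unpaired C and LFE channels keep their own energy/amplitude.
e_eq = {"L": 30.0, "R": 30.0, "LS": 10.0, "RS": 10.0, "C": 15.0, "LFE": 5.0}
bits = allocate_bits(e_eq, 1000)
```

Because the paired channels carry equal equalized values, the two channels of a pair receive the same number of bits in this sketch, and the pair as a whole is weighted by its total energy/amplitude.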
  • Step 504 Encode the audio signals of the two channels according to the respective bit numbers of the two channels of the current channel pair to obtain an encoded code stream.
  • Respectively encoding the audio signals of the two channels in the current channel pair may include quantization, entropy encoding, and code stream multiplexing respectively on the audio signals of the two channels in the current channel pair to obtain an encoded code stream.
  • the audio signals of the P channels are respectively quantized, entropy encoded, and stream multiplexed to obtain an encoded stream.
  • The audio signals of the K channel pairs are quantized, entropy encoded, and code stream multiplexed according to the respective numbers of bits of the K channel pairs, and the audio signals of the Q channels are quantized, entropy encoded, and code stream multiplexed according to the respective numbers of bits of the Q channels, to obtain an encoded code stream.
  • In this embodiment, the audio signals of P channels of the current frame of the multi-channel audio signal are acquired, where the audio signals of the P channels include audio signals of K channel pairs. According to the respective energy/amplitude of the audio signals of the two channels of the current channel pair in the K channel pairs, energy/amplitude equalization is performed on the audio signals of the two channels of the current channel pair to obtain the respective energy/amplitude-equalized energy/amplitude of the two channels. The respective numbers of bits of the two channels of the current channel pair are determined according to that equalized energy/amplitude and the number of available bits, and the audio signals of the two channels are encoded according to their respective numbers of bits to obtain an encoded code stream.
  • Bit allocation is performed based on the energy/amplitude after energy/amplitude equalization, so that the bits of each channel are allocated reasonably in multi-channel signal encoding and the quality of the audio signal reconstructed at the decoding end is ensured.
  • The method of the embodiment of the present application can solve the problem of insufficient coding bits for channel pair signals with large energy/amplitude, so as to ensure the quality of the audio signal reconstructed at the decoding end.
  • The embodiment shown in FIG. 8 is explained by taking the embodiments shown in FIG. 5 and FIG. 6 as an example.
  • The multi-channel encoding processing unit 401 in the embodiment shown in FIG. 5 may perform steps 501 and 502 in the embodiment shown in FIG. 8, and the channel encoding unit 402 may perform step 503 in the embodiment shown in FIG. 8.
  • the difference from the embodiments shown in FIG. 5 and FIG. 6 is that the bit allocation unit 4021 can determine the number of bits of each channel in the following manner.
  • The bit allocation unit 4021 in this embodiment of the present application may perform bit allocation according to the respective energy/amplitude-equalized energy/amplitude of the P channels. Specifically, the following formulas (32) to (37) can be used.
  • the multi-channel encoding processing unit 401 needs to adopt the energy/amplitude equalization method of the channel pair, that is, the energy/amplitude equalization within the channel pair.
  • sum_E_post can be determined by using the above formula (1).
  • The energy/amplitude sum of the L channel and the R channel before energy/amplitude equalization is E(L, R). After energy/amplitude equalization, the energy/amplitude sum of the L channel and the R channel is unchanged and is still E(L, R).
  • After stereo processing, the energy/amplitude sum of the L channel and the R channel becomes E_post(M1, S1). Because stereo processing slightly reduces the redundancy between the L channel and the R channel, E_post(M1, S1) ≈ E(L, R).
  • Through the processing of the multi-channel encoding processing unit 401 and the bit allocation unit 4021 in this embodiment, the bits Bits(M1)+Bits(S1) allocated according to E(L, R) can be much larger than Bits(M2)+Bits(S2), which achieves the purpose of allocating bits between channel pairs according to energy/amplitude.
  • Bit allocation is performed based on the energy/amplitude after energy/amplitude equalization, so that the number of bits of each channel is allocated reasonably in multi-channel signal encoding and the quality of the audio signal reconstructed at the decoding end is ensured.
  • The method of the embodiment of the present application can solve the problem of insufficient coding bits for channel pair signals with large energy/amplitude, so as to ensure the quality of the audio signal reconstructed at the decoding end.
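A small worked example may make the inter-pair effect concrete. The numbers are made up, and two assumptions are used because formulas (32) to (37) are not reproduced in this text: equalization sets both channels of a pair to the pair's average energy/amplitude, and bits are allocated in proportion to the equalized energy/amplitude.

```latex
\begin{aligned}
&E(L) = 90,\quad E(R) = 70,\quad E(LS) = 10,\quad E(RS) = 30,\quad \text{available bits} = 1000 \\
&E_{eq}(L) = E_{eq}(R) = \tfrac{90 + 70}{2} = 80, \qquad
 E_{eq}(LS) = E_{eq}(RS) = \tfrac{10 + 30}{2} = 20 \\
&\mathrm{sum\_E} = 80 + 80 + 20 + 20 = 200 \\
&\mathrm{Bits}(M1) + \mathrm{Bits}(S1) = 1000 \cdot \tfrac{160}{200} = 800, \qquad
 \mathrm{Bits}(M2) + \mathrm{Bits}(S2) = 1000 \cdot \tfrac{40}{200} = 200
\end{aligned}
```

The pair with the larger energy/amplitude sum, E(L, R) = 160, receives four times the bits of the pair with E(LS, RS) = 40, because equalization inside a pair does not change the pair's sum.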
  • an embodiment of the present application further provides an audio signal encoding apparatus, which can be applied to an audio encoder.
  • FIG. 9 is a schematic structural diagram of an audio signal encoding apparatus according to an embodiment of the present application.
  • the audio signal encoding apparatus 700 includes an acquisition module 701 , a bit allocation module 702 , and an encoding module 703 .
  • The acquisition module 701 is used to acquire the audio signals of P channels of the current frame of the multi-channel audio signal and the respective energy/amplitude of the audio signals of the P channels, where P is a positive integer greater than 1, the audio signals of the P channels include audio signals of K channel pairs, and K is a positive integer.
  • the bit allocation module 702 is configured to determine the respective bit numbers of the K channel pairs according to the respective energy/amplitude of the audio signals of the P channels and the number of available bits.
  • the encoding module 703 is configured to encode the audio signals of the P channels according to the respective bit numbers of the K channels to obtain an encoded code stream.
  • The energy/amplitude of the audio signal of one channel in the P channels includes at least one of the following: the energy/amplitude of the audio signal of the one channel in the time domain, the energy/amplitude of the audio signal of the one channel after time-frequency transformation, the energy/amplitude of the audio signal of the one channel after time-frequency transformation and whitening, the energy/amplitude of the audio signal of the one channel after energy/amplitude equalization, or the energy/amplitude of the audio signal of the one channel after stereo processing.
  • the encoding module 703 is configured to determine, according to the number of bits of the current channel pair in the K channel pairs and the respective stereo-processed energy/amplitude of the audio signals of the two channels in the current channel pair, The respective bit numbers of the two channels in the current channel pair; the audio signals of the two channels are encoded according to the respective bit numbers of the two channels in the current channel pair.
  • the bit allocation module 702 is configured to: determine the energy/amplitude sum of the current frame according to the respective energy/amplitude of the audio signals of the P channels. According to the sum of the energy/amplitude of the audio signals of the K channel pairs and the energy/amplitude sum of the current frame, the respective bit coefficients of the K channel pairs are determined. The respective bit numbers of the K channel pairs are determined according to the respective bit coefficients of the K channel pairs and the available number of bits.
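The two-stage allocation described for the bit allocation module 702 and the encoding module 703 can be sketched as follows: bits are first split between channel pairs by bit coefficient, and the bits of one pair are then split between its two channels by stereo-processed energy/amplitude. The proportional split inside the pair is an assumption of this sketch; the function names and numbers are hypothetical.

```python
def pair_bits(pair_energy, sum_e, available_bits):
    """Inter-pair step: bit coefficient = pair energy sum / frame energy sum."""
    return {p: int(available_bits * e / sum_e) for p, e in pair_energy.items()}


def split_pair(bits_for_pair, e_post_a, e_post_b):
    """Intra-pair step: split by stereo-processed energy/amplitude (assumed proportional)."""
    bits_a = int(bits_for_pair * e_post_a / (e_post_a + e_post_b))
    return bits_a, bits_for_pair - bits_a


# Energy/amplitude sums of the two channel pairs; values are made up.
pair_energy = {("L", "R"): 160.0, ("LS", "RS"): 40.0}
per_pair = pair_bits(pair_energy, sum(pair_energy.values()), 1000)

# Stereo-processed energies of M1 and S1 for the first pair (made up).
bits_m1, bits_s1 = split_pair(per_pair[("L", "R")], 120.0, 40.0)
```

Separating the two steps lets the inter-pair decision use one energy/amplitude measure while the intra-pair decision uses the stereo-processed one.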
  • the bit allocation module 702 is configured to: determine the energy/amplitude sum of the current frame according to the respective stereo-processed energy/amplitude of the audio signals of the P channels.
  • bit allocation module 702 is used to:
  • ch represents the channel index.
  • E_post(ch) represents the energy/amplitude of the audio signal of the channel whose channel index is ch after stereo processing.
  • sampleCoef_post(ch, i) represents the ith coefficient of the current frame of the stereo-processed channel ch.
  • N represents the number of coefficients of the current frame, and N is a positive integer greater than 1.
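The quantities defined above can be computed as in the following sketch. The formula itself is not reproduced in this text, so the per-channel form is an assumption: E_post(ch) is taken as the root of the sum of the N squared coefficients sampleCoef_post(ch, i), and sum_E_post is the sum of E_post(ch) over all channels. The function names are hypothetical.

```python
import math


def channel_energy(sample_coef):
    """E_post(ch): root of the sum of squared coefficients (assumed form)."""
    return math.sqrt(sum(c * c for c in sample_coef))


def frame_energy_sum(sample_coef_post):
    """sum_E_post: sum of E_post(ch) over all channel indices ch."""
    return sum(channel_energy(coefs) for coefs in sample_coef_post)


# Two channels with N = 2 coefficients each; values are made up.
sample_coef_post = [[3.0, 4.0], [6.0, 8.0]]
sum_e_post = frame_energy_sum(sample_coef_post)
```

An energy-style variant would drop the square root; the rest of the allocation is unchanged as long as the same measure is used for every channel.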
  • the bit allocation module 702 is configured to: determine the energy/amplitude sum of the current frame according to the energy/amplitude before equalization of the respective energy/amplitude of the audio signals of the P channels.
  • The bit allocation module 702 is used to calculate the energy/amplitude sum sum_E_pre of the current frame according to a formula in which ch represents the channel index and E_pre(ch) represents the energy/amplitude of the audio signal of the channel whose channel index is ch before energy/amplitude equalization.
  • The bit allocation module 702 is configured to determine the energy/amplitude sum of the current frame according to the respective energy/amplitude of the audio signals of the P channels before energy/amplitude equalization and the respective weighting coefficients of the P channels, where each weighting coefficient is less than or equal to 1.
  • bit allocation module 702 is used to:
  • ⁇ (ch) is the weighting coefficient of the ch channel. The weighting coefficients of the two channels of a channel pair are the same, and they are inversely proportional to the normalized correlation value between the two channels.
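One way to derive a weighting coefficient with the stated properties (shared by both channels of a pair, at most 1, and decreasing as the normalized correlation grows) is sketched below. The text does not give the mapping, so `pair_weight`, its `floor` parameter, and the use of the absolute normalized cross-correlation are all hypothetical choices for illustration.

```python
import math


def normalized_correlation(x, y):
    """Absolute normalized cross-correlation of two equal-length channels, in [0, 1]."""
    num = abs(sum(a * b for a, b in zip(x, y)))
    den = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return num / den if den > 0 else 0.0


def pair_weight(x, y, floor=0.5):
    """Hypothetical mapping: weight = floor / correlation, capped at 1.

    The weight shrinks as the channels become more correlated and never
    exceeds 1; both channels of the pair share the returned value.
    """
    corr = normalized_correlation(x, y)
    return 1.0 if corr <= floor else floor / corr


identical = pair_weight([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])  # fully correlated pair
orthogonal = pair_weight([1.0, 0.0], [0.0, 1.0])           # uncorrelated pair
```

The intuition is that a highly correlated pair is cheaper to code after stereo processing, so its contribution to the energy/amplitude sum is scaled down.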
  • the bit allocation module 702 is configured to: determine the respective bit numbers of the K channel pairs and the respective bit numbers of the Q channels according to the respective energy/amplitude of the audio signals of the P channels and the number of available bits.
  • The encoding module 703 is configured to encode the audio signals of the K channel pairs according to the respective numbers of bits of the K channel pairs, and to encode the audio signals of the Q channels according to the respective numbers of bits of the Q channels.
  • the bit allocation module 702 is configured to: determine the energy/amplitude sum of the current frame according to the respective energy/amplitude of the audio signals of the P channels. According to the sum of the respective energy/amplitude of the audio signals of the K channel pairs and the energy/amplitude sum of the current frame, the respective bit coefficients of the K channel pairs are determined. The respective bit coefficients of the Q channels are determined according to the sum of the energy/amplitude of the audio signals of the Q channels and the energy/amplitude of the current frame. The respective bit numbers of the K channel pairs are determined according to the respective bit coefficients of the K channel pairs and the available number of bits. The number of bits of each of the Q channels is determined according to the respective bit coefficients and the number of available bits of the Q channels.
  • the apparatus may further include: an energy/amplitude equalization module 704 .
  • the energy/amplitude equalization module 704 is configured to obtain the energy/amplitude equalized audio signals of the P channels according to the audio signals of the P channels.
  • the energy/amplitude of the aforementioned audio signal of one channel after energy/amplitude equalization is obtained from the energy/amplitude equalized audio signal of the one channel.
  • the encoding module 703 is configured to encode the energy/amplitude equalized audio signals of the P channels according to the respective bit numbers of the K channels.
  • the acquisition module 701, the bit allocation module 702, and the encoding module 703 can be applied to the audio signal encoding process at the encoding end.
  • An embodiment of the present application further provides another audio signal encoding apparatus.
  • The audio signal encoding apparatus may adopt the schematic structural diagram shown in FIG. 9, and the audio signal encoding apparatus of this embodiment is used to execute the method of the embodiment shown in FIG. 8.
  • In this embodiment, the functions of the modules shown in FIG. 9 are different.
  • The obtaining module 701 is configured to obtain the audio signals of P channels of the current frame of the multi-channel audio signal, where P is a positive integer greater than 1, the audio signals of the P channels include audio signals of K channel pairs, and K is a positive integer.
  • The energy/amplitude equalization module 704 is configured to perform energy/amplitude equalization on the audio signals of the two channels of the current channel pair according to the respective energy/amplitude of the audio signals of the two channels of the current channel pair in the K channel pairs, to obtain the respective energy/amplitude-equalized energy/amplitude of the audio signals of the two channels of the current channel pair.
  • The bit allocation module 702 is configured to determine the respective numbers of bits of the two channels of the current channel pair according to the respective energy/amplitude-equalized energy/amplitude of the audio signals of the two channels of the current channel pair and the number of available bits.
  • the encoding module 703 is configured to encode the audio signals of the two channels according to the respective bit numbers of the two channels of the current channel pair to obtain an encoded code stream.
  • The bit allocation module 702 is configured to determine the energy/amplitude sum of the current frame according to the energy/amplitude-equalized energy/amplitude of the audio signals of the P channels, and to determine the respective numbers of bits of the two channels of the current channel pair according to the energy/amplitude sum of the current frame, the respective energy/amplitude-equalized energy/amplitude of the audio signals of the two channels of the current channel pair, and the number of available bits.
  • The bit allocation module 702 is configured to determine the energy/amplitude sum of the current frame according to the energy/amplitude-equalized energy/amplitude of the audio signals of the two channels of each of the K channel pairs and the energy/amplitude-equalized energy/amplitude of the audio signals of the Q channels.
  • the respective bit numbers of the two channels of the current channel pair are determined according to the energy/amplitude sum of the current frame, the respective energy/amplitude of the audio signals of the two channels of the current channel pair, and the number of available bits.
  • the respective bit numbers of the Q channels are determined according to the energy/amplitude sum of the current frame, the energy/amplitude equalized energy/amplitude of the audio signals of the Q channels, and the number of available bits.
  • The encoding module 703 is configured to encode the audio signals of the K channel pairs according to the respective numbers of bits of the K channel pairs, and to encode the audio signals of the Q channels according to the respective numbers of bits of the Q channels, to obtain the encoded code stream.
  • the acquisition module 701 , the bit allocation module 702 , the energy/amplitude equalization module 704 , and the encoding module 703 can be applied to the audio signal encoding process at the encoding end.
  • an embodiment of the present application provides an audio signal encoder.
  • The audio signal encoder is used to encode an audio signal and includes the audio signal encoding apparatus described in one or more of the above embodiments, where the audio signal encoding apparatus is used to encode and generate the corresponding code stream.
  • an embodiment of the present application provides a device for encoding an audio signal, for example, an audio signal encoding device, as shown in FIG. 10 , the audio signal encoding device 800 includes:
  • a processor 801, a memory 802, and a communication interface 803 (wherein the number of processors 801 in the audio signal encoding device 800 may be one or more, and one processor is taken as an example in FIG. 10).
  • the processor 801 , the memory 802 , and the communication interface 803 may be connected by a bus or in other ways, wherein the connection by a bus is taken as an example in FIG. 10 .
  • Memory 802 may include read-only memory and random access memory, and provides instructions and data to processor 801 .
  • a portion of memory 802 may also include non-volatile random access memory (NVRAM).
  • the memory 802 stores an operating system and operation instructions, executable modules or data structures, or a subset thereof, or an extended set thereof, wherein the operation instructions may include various operation instructions for implementing various operations.
  • the operating system may include various system programs for implementing various basic services and handling hardware-based tasks.
  • The processor 801 controls the operation of the audio encoding device, and the processor 801 may also be referred to as a central processing unit (CPU).
  • various components of the audio coding device are coupled together through a bus system, where the bus system may include a power bus, a control bus, a status signal bus, and the like in addition to a data bus.
  • the various buses are referred to as bus systems in the figures.
  • the methods disclosed in the above embodiments of the present application may be applied to the processor 801 or implemented by the processor 801 .
  • the processor 801 may be an integrated circuit chip with signal processing capability. In the implementation process, each step of the above-mentioned method may be completed by an integrated logic circuit of hardware in the processor 801 or an instruction in the form of software.
  • The above-mentioned processor 801 may be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • a general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
  • the steps of the method disclosed in conjunction with the embodiments of the present application may be directly embodied as executed by a hardware decoding processor, or executed by a combination of hardware and software modules in the decoding processor.
  • the software modules may be located in random access memory, flash memory, read-only memory, programmable read-only memory or electrically erasable programmable memory, registers and other storage media mature in the art.
  • the storage medium is located in the memory 802, and the processor 801 reads the information in the memory 802, and completes the steps of the above method in combination with its hardware.
  • the communication interface 803 can be used to receive or transmit digital or character information, for example, it can be an input/output interface, a pin or a circuit, and the like. For example, the above-mentioned encoded code stream is sent through the communication interface 803 .
  • An embodiment of the present application provides an audio encoding device, including a non-volatile memory and a processor coupled to each other, where the processor calls program code stored in the memory to execute part or all of the steps of the multi-channel audio signal encoding method described in one or more of the above embodiments.
  • An embodiment of the present application provides a computer-readable storage medium that stores program code, where the program code includes instructions for executing part or all of the steps of the multi-channel audio signal encoding method described in one or more of the above embodiments.
  • An embodiment of the present application provides a computer program product. When the computer program product is run on a computer, the computer is caused to execute part or all of the steps of the multi-channel audio signal encoding method described in one or more of the above embodiments.
  • the processor mentioned in the above embodiments may be an integrated circuit chip, which has signal processing capability.
  • each step of the above method embodiments may be completed by a hardware integrated logic circuit in a processor or an instruction in the form of software.
  • The processor may be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • a general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
  • the steps of the methods disclosed in the embodiments of the present application may be directly embodied as executed by a hardware coding processor, or executed by a combination of hardware and software modules in the coding processor.
  • the software modules may be located in random access memory, flash memory, read-only memory, programmable read-only memory or electrically erasable programmable memory, registers and other storage media mature in the art.
  • the storage medium is located in the memory, and the processor reads the information in the memory, and completes the steps of the above method in combination with its hardware.
  • the memory mentioned in the above embodiments may be volatile memory or non-volatile memory, or may include both volatile and non-volatile memory.
  • The non-volatile memory may be read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), or flash memory.
  • Volatile memory may be random access memory (RAM), which acts as an external cache. Forms of RAM include dynamic random access memory (DRAM), synchronous DRAM (SDRAM), double data rate synchronous dynamic random access memory (DDR SDRAM), enhanced synchronous dynamic random access memory (ESDRAM), synchronous link dynamic random access memory (SLDRAM), and direct rambus RAM (DR RAM).
  • the disclosed system, apparatus and method may be implemented in other manners.
  • the apparatus embodiments described above are only illustrative.
  • the division of the units is only a logical function division. In actual implementation, there may be other division methods.
  • Multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the shown or discussed mutual coupling or direct coupling or communication connection may be through some interfaces, indirect coupling or communication connection of devices or units, and may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution in this embodiment.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.
  • the functions, if implemented in the form of software functional units and sold or used as independent products, may be stored in a computer-readable storage medium.
  • The technical solution of the present application in essence, or the part that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods described in the various embodiments of the present application.
  • The aforementioned storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, or an optical disk.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Mathematical Physics (AREA)
  • Stereophonic System (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

A multi-channel audio signal encoding method and apparatus (700). The method may comprise: obtaining audio signals of P channels of the current frame of a multi-channel audio signal, the audio signals of the P channels comprising audio signals of K channel pairs (steps 101, 201, 501); determining the respective number of bits of the K channel pairs according to the respective energy/amplitude of the audio signals of the P channels and the number of available bits (steps 102, 202); and encoding the audio signals of the P channels according to the respective number of bits of the K channel pairs to obtain an encoded code stream (step 103), so as to improve the coding quality.

Description

Multi-channel audio signal encoding method and apparatus
This application claims priority to Chinese Patent Application No. 202010699775.8, filed with the China Patent Office on July 17, 2020 and entitled "Multi-channel audio signal encoding method and apparatus", which is incorporated herein by reference in its entirety.
Technical Field
The present application relates to audio encoding and decoding technologies, and in particular, to a multi-channel audio signal encoding method and apparatus.
Background
With the continuous development of multimedia technology, audio has been widely used in multimedia communication, consumer electronics, virtual reality, human-computer interaction, and other fields. Audio coding is one of the key technologies of multimedia. It compresses the amount of data by removing redundant information from the original audio signal, which facilitates storage and transmission.
Multi-channel audio coding is the coding of more than two channels; common formats include 5.1, 7.1, 7.1.4, and 22.2 channels. Multiple original audio signals undergo multi-channel signal screening, pairing, stereo processing, multi-channel side information generation, quantization, entropy coding, and code stream multiplexing to form a serial bit stream (the encoded code stream) that is convenient to transmit over a channel or store on a digital medium. Because the energy difference between the channels can be large, energy equalization needs to be performed on the channels before stereo processing, so as to increase the gain from stereo processing and thereby improve coding efficiency.
For energy equalization, the energies of all channels are usually averaged. This approach degrades the quality of the encoded audio signal. For example, when the energy difference between channels is large, this energy equalization method leaves channel frames with large energy/amplitude with too few coding bits, which degrades quality, while channel frames with small energy receive redundant coding bits, which wastes resources. At low bit rates, the total number of available bits is tight, so the quality of channel frames with large energy/amplitude degrades noticeably.
Summary
The present application provides a multi-channel audio signal encoding method and apparatus that help improve the quality of the encoded audio signal.
In a first aspect, an embodiment of the present application provides a multi-channel audio signal encoding method. The method may include: obtaining audio signals of P channels of a current frame of a multi-channel audio signal, where P is a positive integer greater than 1, the audio signals of the P channels include audio signals of K channel pairs, and K is a positive integer; obtaining the respective energy/amplitude of the audio signals of the P channels; determining the respective numbers of bits of the K channel pairs according to the respective energy/amplitude of the audio signals of the P channels and the number of available bits; and encoding the audio signals of the P channels according to the respective numbers of bits of the K channel pairs to obtain an encoded code stream.
The energy/amplitude of the audio signal of one of the P channels includes at least one of the following: the energy/amplitude of the audio signal of the channel in the time domain, the energy/amplitude of the audio signal of the channel after time-frequency transformation, the energy/amplitude of the audio signal of the channel after time-frequency transformation and whitening, the energy/amplitude of the audio signal of the channel after energy/amplitude equalization, or the energy/amplitude of the audio signal of the channel after stereo processing.
In this implementation, bits are allocated to channel pairs according to at least one of the following for the audio signal of each of the P channels: the energy/amplitude in the time domain, the energy/amplitude after time-frequency transformation and whitening, the energy/amplitude after energy/amplitude equalization, or the energy/amplitude after stereo processing. The respective numbers of bits of the K channel pairs are thereby determined, so that bits are allocated reasonably among the channel pairs in multi-channel signal encoding and the quality of the audio signal reconstructed at the decoding end is ensured.
In a possible design, the K channel pairs include a current channel pair, and the method may further include: performing energy/amplitude equalization on the audio signals of the two channels of the current channel pair among the K channel pairs, to obtain the respective energy/amplitude of the audio signals of the two channels of the current channel pair after energy/amplitude equalization.
In this implementation, energy/amplitude equalization is performed on the audio signals of the two channels within a single channel pair. As a result, channel pairs whose energies/amplitudes differ greatly from one another still retain a large energy/amplitude difference after equalization. When bits are then allocated based on the equalized energy/amplitude, more bits can be allocated to the channel pairs with larger energy/amplitude, so that their coding bits meet their coding requirements and the quality of the audio signal reconstructed at the decoding end improves.
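The pair-internal equalization described above can be illustrated with a minimal sketch. This passage does not specify the equalization rule, so the code assumes one plausible choice: both channels of a pair are scaled to the mean of their two amplitudes. The function names `amplitude` and `equalize_pair` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def amplitude(x: Sequence[float]) -> float:
    """Amplitude of one channel frame, taken here as the root of the sum of squares."""
    return math.sqrt(sum(v * v for v in x))


def equalize_pair(
    left: Sequence[float], right: Sequence[float]
) -> Tuple[List[float], List[float]]:
    """Scale both channels of ONE pair to their mean amplitude (assumed rule)."""
    a_left, a_right = amplitude(left), amplitude(right)
    target = 0.5 * (a_left + a_right)

    def scale(x: Sequence[float], a: float) -> List[float]:
        gain = target / a if a > 0.0 else 1.0  # leave a silent channel untouched
        return [gain * v for v in x]

    return scale(left, a_left), scale(right, a_right)
```

Because the target is computed per pair rather than across all channels, a loud pair stays loud relative to a quiet pair after equalization, which is the property the paragraph above relies on.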
In a possible design, the K channel pairs include a current channel pair, and encoding the audio signals of the P channels according to the respective numbers of bits of the K channel pairs may include: determining the respective numbers of bits of the two channels in the current channel pair according to the number of bits of the current channel pair and the respective stereo-processed energy/amplitude of the audio signals of the two channels in the current channel pair; and encoding the audio signals of the two channels according to the respective numbers of bits of the two channels in the current channel pair.
In this implementation, after the respective numbers of bits of the K channel pairs are obtained, bits can be allocated within each channel pair based on the number of bits of that pair, so that bits are allocated reasonably among the individual channels in multi-channel signal encoding and the quality of the audio signal reconstructed at the decoding end is ensured.
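As an illustration of allocation within a pair, the sketch below splits a pair's bit budget between its two channels in proportion to their stereo-processed energy/amplitude. The proportional rule, the even split for a silent pair, and the name `split_pair_bits` are assumptions made for this example; the text only states that the split depends on the pair's bits and the two stereo-processed energies/amplitudes.

```python
from typing import Tuple


def split_pair_bits(pair_bits: int, e_post_a: float, e_post_b: float) -> Tuple[int, int]:
    """Split pair_bits between two channels in proportion to their energy/amplitude."""
    total = e_post_a + e_post_b
    if total <= 0.0:
        bits_a = pair_bits // 2  # silent pair: even split (assumption)
    else:
        bits_a = int(pair_bits * e_post_a / total)
    # The second channel receives the remainder, so no bit of the pair is lost.
    return bits_a, pair_bits - bits_a
```

Giving the remainder to the second channel keeps the two per-channel budgets summing exactly to the pair budget.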
In a possible design, determining the respective numbers of bits of the K channel pairs according to the respective energy/amplitude of the audio signals of the P channels and the number of available bits may include: determining the energy/amplitude sum of the current frame according to the respective energy/amplitude of the audio signals of the P channels; determining the respective bit coefficients of the K channel pairs according to the respective energy/amplitude of the audio signals of the K channel pairs and the energy/amplitude sum of the current frame; and determining the respective numbers of bits of the K channel pairs according to the respective bit coefficients of the K channel pairs and the number of available bits.
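The three steps above (frame sum, bit coefficient per pair, bits per pair) can be sketched as follows. The sketch assumes that a pair's energy/amplitude is the sum of its two channels' values and that the bit coefficient is the pair's share of the frame sum; the function name `allocate_pair_bits` and the silent-frame fallback are hypothetical.

```python
from typing import List, Sequence, Tuple


def allocate_pair_bits(
    channel_energy: Sequence[float],
    pairs: Sequence[Tuple[int, int]],
    available_bits: int,
) -> List[int]:
    """Allocate available_bits to channel pairs in proportion to pair energy."""
    frame_sum = sum(channel_energy)  # energy/amplitude sum of the current frame
    if frame_sum <= 0.0:
        # Silent frame: fall back to an even split (assumption, not from the text).
        return [available_bits // len(pairs)] * len(pairs)
    bits = []
    for a, b in pairs:
        pair_energy = channel_energy[a] + channel_energy[b]
        coef = pair_energy / frame_sum           # bit coefficient of this pair
        bits.append(int(coef * available_bits))  # rounded down; leftovers are not redistributed
    return bits
```

For example, with channel energies `[6.0, 6.0, 2.0, 2.0]`, pairs `(0, 1)` and `(2, 3)`, and 1000 available bits, the louder pair has a bit coefficient of 0.75 and receives 750 bits.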
In a possible design, determining the energy/amplitude sum of the current frame according to the respective energy/amplitude of the audio signals of the P channels may include: determining the energy/amplitude sum of the current frame according to the respective stereo-processed energy/amplitude of the audio signals of the P channels.
In this implementation, energy/amplitude equalization may be performed on the two channels within a single channel pair. As a result, channel pairs whose energies/amplitudes differ greatly from one another still retain a large energy/amplitude difference after equalization. When bits are then allocated based on the equalized energy/amplitude, more bits can be allocated to the channel pairs with larger energy/amplitude, so that their coding bits meet their coding requirements and the quality of the audio signal reconstructed at the decoding end improves.
In a possible design, determining the energy/amplitude sum of the current frame according to the respective stereo-processed energy/amplitude of the audio signals of the P channels may include: calculating the energy/amplitude sum sum_E_post of the current frame according to the formula
Figure PCTCN2021106102-appb-000001
where
Figure PCTCN2021106102-appb-000002
and where ch denotes the channel index, E_post(ch) denotes the stereo-processed energy/amplitude of the audio signal of the channel whose index is ch, sampleCoef_post(ch, i) denotes the i-th coefficient of the current frame of the ch-th channel after stereo processing, N denotes the number of coefficients of the current frame, and N is a positive integer greater than 1.
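The two formulas are present only as images and are not reproduced in this text, so the following is a sketch under stated assumptions rather than the patent's exact expressions. It assumes that the per-channel energy is the sum of the squared coefficients of the frame, that the amplitude is the square root of that energy, and that the frame value is the plain sum over the P channels. The names `channel_energy` and `frame_energy_sum` are hypothetical.

```python
import math
from typing import Sequence


def channel_energy(coefs: Sequence[float], as_amplitude: bool = False) -> float:
    """Assumed E_post(ch): sum of squared coefficients, or its square root."""
    energy = sum(c * c for c in coefs)
    return math.sqrt(energy) if as_amplitude else energy


def frame_energy_sum(
    frame: Sequence[Sequence[float]], as_amplitude: bool = False
) -> float:
    """Assumed sum_E_post: per-channel values summed over all P channels."""
    return sum(channel_energy(ch, as_amplitude) for ch in frame)
```

Here `frame` holds one coefficient sequence of length N per channel, taken after stereo processing.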
In a possible design, determining the energy/amplitude sum of the current frame according to the respective energy/amplitude of the audio signals of the P channels may include: determining the energy/amplitude sum of the current frame according to the respective energy/amplitude of the audio signals of the P channels before energy/amplitude equalization. The energy/amplitude of the audio signal of one of the P channels before energy/amplitude equalization includes the energy/amplitude of the audio signal of the channel in the time domain, or the energy/amplitude of the audio signal of the channel after time-frequency transformation, or the energy/amplitude of the audio signal of the channel after time-frequency transformation and whitening.
In this implementation, the energy/amplitude sum of the current frame is determined from the respective pre-equalization energy/amplitude of the audio signals of the P channels of the current frame, and bits are allocated based on that sum. In other words, bit allocation uses the energy/amplitude before energy/amplitude equalization. This allocates bits reasonably among the channels in multi-channel signal encoding and ensures the quality of the audio signal reconstructed at the decoding end. It also addresses the problem of insufficient coding bits for channel-pair signals with large energy/amplitude.
Compared with bit allocation based on the energy/amplitude after equalization, bit allocation based on the energy/amplitude before equalization both allocates bits reasonably among the channels in multi-channel signal encoding and decouples bit allocation from energy/amplitude equalization. That is, bit allocation is not affected by energy/amplitude equalization. For example, even if equalization averages the energy/amplitude of all channels, allocating bits based on the pre-equalization energy/amplitude still allocates bits reasonably, so that channel signals with large energy/amplitude receive more coding bits and the quality of the audio signal reconstructed at the decoding end is ensured.
In a possible design, determining the energy/amplitude sum of the current frame according to the respective energy/amplitude of the audio signals of the P channels before energy/amplitude equalization may include: calculating the energy/amplitude sum sum_E_pre of the current frame according to the formula
Figure PCTCN2021106102-appb-000003
where ch denotes the channel index, and E_pre(ch) denotes the energy/amplitude of the audio signal of the channel whose index is ch before energy/amplitude equalization.
In a possible design, determining the energy/amplitude sum of the current frame according to the respective energy/amplitude of the audio signals of the P channels may include: determining the energy/amplitude sum of the current frame according to the respective energy/amplitude of the audio signals of the P channels before energy/amplitude equalization and the respective weighting coefficients of the P channels, where each weighting coefficient is less than or equal to 1.
In this implementation, the weighting coefficients can be used to adjust the number of bits of each channel in multi-channel signal encoding, so that bits are allocated reasonably among the channels.
In a possible design, determining the energy/amplitude sum according to the respective energy/amplitude of the audio signals of the P channels before energy/amplitude equalization and the respective weighting coefficients of the P channels may include: calculating the energy/amplitude sum sum_E_pre of the current frame according to the formula
Figure PCTCN2021106102-appb-000004
where ch denotes the channel index, E_pre(ch) is the energy/amplitude of the audio signal of the ch-th channel before energy/amplitude equalization, and α(ch) is the weighting coefficient of the ch-th channel. The two channels of a channel pair have the same weighting coefficient, and the magnitude of that weighting coefficient is inversely proportional to the normalized correlation value between the two channels of the channel pair.
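The weighted sum can be sketched as below. The formula image is not reproduced in this text, so the code assumes that the frame sum is the sum of α(ch) × E_pre(ch) over all channels. The mapping from normalized correlation to weight in `pair_weight`, including its constant `c`, is a hypothetical choice that satisfies the stated constraints (inversely proportional to the correlation, never above 1); the patent's actual mapping is not given here.

```python
import math
from typing import Sequence


def normalized_correlation(x: Sequence[float], y: Sequence[float]) -> float:
    """Normalized cross-correlation between the two channels of a pair."""
    num = sum(a * b for a, b in zip(x, y))
    den = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return num / den if den > 0.0 else 0.0


def pair_weight(corr: float, c: float = 0.25) -> float:
    """Hypothetical weight: c / |corr|, capped at 1 (shared by both channels)."""
    return min(1.0, c / max(abs(corr), c))


def weighted_frame_sum(e_pre: Sequence[float], alpha: Sequence[float]) -> float:
    """Assumed sum_E_pre: sum over channels of alpha(ch) * E_pre(ch)."""
    return sum(a * e for a, e in zip(alpha, e_pre))
```

With this mapping, a highly correlated pair (correlation near 1) has its energy scaled down in the frame sum, while a weakly correlated pair keeps a weight of 1.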
In this implementation, the weighting coefficients adjust the number of bits of each channel in multi-channel signal encoding. Because the weighting coefficient of the two channels of a channel pair is inversely proportional to the normalized correlation value between those two channels, the weighting coefficients can increase the number of bits of channel pairs with low correlation, which improves the coding result and ensures the quality of the audio signal reconstructed at the decoding end.
In a possible design, the audio signals of the P channels further include audio signals of Q unpaired channels, where P = 2×K + Q and Q is a positive integer. Determining the respective numbers of bits of the K channel pairs according to the respective energy/amplitude of the audio signals of the P channels and the number of available bits may include: determining the respective numbers of bits of the K channel pairs and the respective numbers of bits of the Q channels according to the respective energy/amplitude of the audio signals of the P channels and the number of available bits. Encoding the audio signals of the P channels according to the respective numbers of bits of the K channel pairs may include: encoding the audio signals of the K channel pairs according to the respective numbers of bits of the K channel pairs, and encoding the audio signals of the Q channels according to the respective numbers of bits of the Q channels. One of the Q channels may be a mono channel, or may be a channel obtained by downmixing.
In a possible design, determining the respective numbers of bits of the K channel pairs and the respective numbers of bits of the Q channels according to the respective energy/amplitude of the audio signals of the P channels and the number of available bits may include: determining the energy/amplitude sum of the current frame according to the respective energy/amplitude of the audio signals of the P channels; determining the respective bit coefficients of the K channel pairs according to the respective energy/amplitude of the audio signals of the K channel pairs and the energy/amplitude sum of the current frame; determining the respective bit coefficients of the Q channels according to the respective energy/amplitude of the audio signals of the Q channels and the energy/amplitude sum of the current frame; determining the respective numbers of bits of the K channel pairs according to the respective bit coefficients of the K channel pairs and the number of available bits; and determining the respective numbers of bits of the Q channels according to the respective bit coefficients of the Q channels and the number of available bits.
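The case with Q unpaired channels can be sketched in the same way as the pair-only case. As before, the proportional rule and the definition of a pair's energy/amplitude as the sum of its two channels' values are assumptions, and `allocate_bits` is a hypothetical name.

```python
from typing import List, Sequence, Tuple


def allocate_bits(
    channel_energy: Sequence[float],
    pairs: Sequence[Tuple[int, int]],
    singles: Sequence[int],
    available_bits: int,
) -> Tuple[List[int], List[int]]:
    """Allocate bits to K channel pairs and Q unpaired channels (P = 2*K + Q)."""
    if len(channel_energy) != 2 * len(pairs) + len(singles):
        raise ValueError("expected P = 2*K + Q channels")
    frame_sum = sum(channel_energy)  # energy/amplitude sum of the current frame
    if frame_sum <= 0.0:
        raise ValueError("frame energy/amplitude sum must be positive")
    # Bit coefficient = share of the frame sum; bits are rounded down.
    pair_bits = [
        int((channel_energy[a] + channel_energy[b]) / frame_sum * available_bits)
        for a, b in pairs
    ]
    single_bits = [
        int(channel_energy[c] / frame_sum * available_bits) for c in singles
    ]
    return pair_bits, single_bits
```

For five channels with energies `[4.0, 4.0, 2.0, 2.0, 4.0]`, two pairs, one unpaired channel, and 1600 available bits, the pairs receive 800 and 400 bits and the unpaired channel receives 400 bits.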
In a possible design, encoding the audio signals of the P channels according to the respective numbers of bits of the K channel pairs may include: encoding the energy/amplitude-equalized audio signals of the P channels according to the respective numbers of bits of the K channel pairs.
In this implementation, the energy/amplitude-equalized audio signals of the P channels may be encoded. These signals may be obtained by performing energy/amplitude equalization on the audio signals of the P channels. The encoding may include stereo processing, entropy coding, and the like, which can improve coding efficiency and coding quality.
In a second aspect, an embodiment of the present application provides a multi-channel audio signal encoding apparatus. The apparatus may be an audio encoder, a chip or a system-on-a-chip of an audio encoding device, or a functional module in an audio encoder that implements the method of the first aspect or of any possible design of the first aspect. The apparatus can implement the functions performed in the first aspect or in each possible design of the first aspect, and the functions may be implemented by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the above functions. For example, in a possible design, the multi-channel audio signal encoding apparatus may include: an acquisition module, configured to obtain audio signals of P channels of a current frame of a multi-channel audio signal and the respective energy/amplitude of the audio signals of the P channels, where P is a positive integer greater than 1, the audio signals of the P channels include audio signals of K channel pairs, and K is a positive integer; a bit allocation module, configured to determine the respective numbers of bits of the K channel pairs according to the respective energy/amplitude of the audio signals of the P channels and the number of available bits; and an encoding module, configured to encode the audio signals of the P channels according to the respective numbers of bits of the K channel pairs to obtain an encoded code stream.
The energy/amplitude of the audio signal of one of the P channels includes at least one of the following: the energy/amplitude of the audio signal of the channel in the time domain, the energy/amplitude of the audio signal of the channel after time-frequency transformation, the energy/amplitude of the audio signal of the channel after time-frequency transformation and whitening, the energy/amplitude of the audio signal of the channel after energy/amplitude equalization, or the energy/amplitude of the audio signal of the channel after stereo processing.
In a possible design, the K channel pairs include a current channel pair, and the encoding module is configured to: determine the respective numbers of bits of the two channels in the current channel pair according to the number of bits of the current channel pair and the respective stereo-processed energy/amplitude of the audio signals of the two channels in the current channel pair; and encode the audio signals of the two channels according to the respective numbers of bits of the two channels in the current channel pair.
In a possible design, the bit allocation module is configured to: determine the energy/amplitude sum of the current frame according to the respective energy/amplitude of the audio signals of the P channels; determine the respective bit coefficients of the K channel pairs according to the respective energy/amplitude of the audio signals of the K channel pairs and the energy/amplitude sum of the current frame; and determine the respective numbers of bits of the K channel pairs according to the respective bit coefficients of the K channel pairs and the number of available bits.
In a possible design, the bit allocation module is configured to: determine the energy/amplitude sum of the current frame according to the respective stereo-processed energy/amplitude of the audio signals of the P channels.
In a possible design, the bit allocation module is configured to: calculate the energy/amplitude sum sum_E_post of the current frame according to the formula
Figure PCTCN2021106102-appb-000005
where
Figure PCTCN2021106102-appb-000006
and where ch denotes the channel index, E_post(ch) denotes the stereo-processed energy/amplitude of the audio signal of the channel whose index is ch, sampleCoef_post(ch, i) denotes the i-th coefficient of the current frame of the ch-th channel after stereo processing, N denotes the number of coefficients in the current frame, and N is a positive integer greater than 1.
In a possible design, the bit allocation module is configured to: determine the energy/amplitude sum of the current frame according to the respective energy/amplitude of the audio signals of the P channels before energy/amplitude equalization. The energy/amplitude of the audio signal of one of the P channels before energy/amplitude equalization includes the energy/amplitude of the audio signal of the channel in the time domain, or the energy/amplitude of the audio signal of the channel after time-frequency transformation, or the energy/amplitude of the audio signal of the channel after time-frequency transformation and whitening.
In a possible design, the bit allocation module is configured to: calculate the energy/amplitude sum sum_E_pre of the current frame according to the formula
Figure PCTCN2021106102-appb-000007
where ch denotes the channel index, and E_pre(ch) denotes the energy/amplitude of the audio signal of the channel whose index is ch before energy/amplitude equalization.
In a possible design, the bit allocation module is configured to: determine the energy/amplitude sum of the current frame according to the respective energy/amplitude of the audio signals of the P channels before energy/amplitude equalization and the respective weighting coefficients of the P channels, where each weighting coefficient is less than or equal to 1.
In a possible design, the bit allocation module is configured to: calculate the energy/amplitude sum sum_E_pre of the current frame according to the formula
Figure PCTCN2021106102-appb-000008
where ch denotes the channel index, E_pre(ch) is the energy/amplitude of the audio signal of the ch-th channel before energy/amplitude equalization, and α(ch) is the weighting coefficient of the ch-th channel. The two channels of a channel pair have the same weighting coefficient, and the magnitude of that weighting coefficient is inversely proportional to the normalized correlation value between the two channels of the channel pair.
一种可能的设计中,该P个声道的音频信号还包括未组对的Q个声道的音频信号,P=2×K+Q,K为正整数,Q为正整数。该比特分配模块用于:根据该P个声道的音频信号各自的能量/幅度,和可用比特数,确定该K个声道对各自的比特数以及该Q个声道各 自的比特数。该编码模块用于根据该K个声道对各自的比特数分别对该K个声道对的音频信号进行编码,根据该Q个声道各自的比特数分别对该Q个声道的音频信号进行编码。In a possible design, the audio signals of the P channels also include audio signals of the Q channels that are not paired, where P=2×K+Q, where K is a positive integer, and Q is a positive integer. The bit allocation module is configured to: determine the respective bit numbers of the K channel pairs and the respective bit numbers of the Q channels according to the respective energy/amplitude of the audio signals of the P channels and the number of available bits. The encoding module is used to encode the audio signals of the K channel pairs according to the respective bit numbers of the K channel pairs, and respectively encode the audio signals of the Q channels according to the respective bit numbers of the Q channels to encode.
In a possible design, the bit allocation module is configured to: determine the energy/amplitude sum of the current frame according to the respective energy/amplitude of the audio signals of the P channels; determine the respective bit coefficients of the K channel pairs according to the respective energy/amplitude of the audio signals of the K channel pairs and the energy/amplitude sum of the current frame; determine the respective bit coefficients of the Q channels according to the respective energy/amplitude of the audio signals of the Q channels and the energy/amplitude sum of the current frame; determine the respective bit numbers of the K channel pairs according to the respective bit coefficients of the K channel pairs and the number of available bits; and determine the respective bit numbers of the Q channels according to the respective bit coefficients of the Q channels and the number of available bits.
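The steps above can be illustrated with a minimal sketch. It assumes, for illustration only, that a bit coefficient is the share of the frame's energy/amplitude sum held by a channel pair (the sum of its two channels) or by an unpaired channel, and that a bit number is that coefficient multiplied by the number of available bits and rounded down. The passage does not state these formulas, and the name `allocate_bits` and the example values are hypothetical.

```python
from typing import List, Sequence, Tuple


def allocate_bits(
    energy: Sequence[float],
    pairs: Sequence[Tuple[int, int]],
    singles: Sequence[int],
    available_bits: int,
) -> Tuple[List[int], List[int]]:
    """Return (bits per channel pair, bits per unpaired channel)."""
    frame_sum = sum(energy)  # energy/amplitude sum of the current frame
    if frame_sum <= 0:
        raise ValueError("frame energy/amplitude sum must be positive")
    # Assumed bit coefficient: share of the frame sum held by each unit.
    pair_coeffs = [(energy[a] + energy[b]) / frame_sum for a, b in pairs]
    single_coeffs = [energy[ch] / frame_sum for ch in singles]
    # Assumed bit number: coefficient times available bits, rounded down.
    pair_bits = [int(c * available_bits) for c in pair_coeffs]
    single_bits = [int(c * available_bits) for c in single_coeffs]
    return pair_bits, single_bits


# Hypothetical frame with P = 5 channels: K = 2 pairs and Q = 1 unpaired channel.
energy = [8.0, 4.0, 2.0, 1.0, 1.0]
pair_bits, single_bits = allocate_bits(
    energy, pairs=[(0, 1), (2, 3)], singles=[4], available_bits=1600
)
print(pair_bits, single_bits)
```

Under these assumptions the pair with the larger energy receives proportionally more bits. Rounding down can leave a few bits unassigned, which a practical allocator would need to redistribute.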
In a possible design, the encoding module is configured to encode the energy/amplitude-equalized audio signals of the P channels according to the respective bit numbers of the K channel pairs.
In an implementation, the apparatus may further include an energy/amplitude equalization module, configured to obtain the energy/amplitude-equalized audio signals of the P channels according to the audio signals of the P channels.
According to a third aspect, an embodiment of this application provides a multi-channel audio signal encoding method. The method may include: obtaining audio signals of P channels of a current frame of a multi-channel audio signal, where P is a positive integer greater than 1, the audio signals of the P channels include audio signals of K channel pairs, and K is a positive integer; performing energy/amplitude equalization on the audio signals of the two channels of a current channel pair among the K channel pairs according to the respective energy/amplitude of the audio signals of those two channels, to obtain the respective energy/amplitude-equalized energy/amplitude of the audio signals of the two channels of the current channel pair; determining the respective bit numbers of the two channels of the current channel pair according to the respective energy/amplitude-equalized energy/amplitude of the audio signals of the two channels of the current channel pair and the number of available bits; and encoding the audio signals of the two channels according to the respective bit numbers of the two channels of the current channel pair, to obtain an encoded bitstream.
In this implementation, energy/amplitude equalization is performed on the two channels within a single channel pair. As a result, channel pairs whose energy/amplitude differs greatly still retain a large energy/amplitude difference after equalization. When bits are allocated based on the equalized energy/amplitude, more bits can therefore be allocated to a channel pair with larger energy/amplitude, which ensures that the coding bits of that channel pair meet its coding requirement and improves the quality of the audio signal reconstructed at the decoding end.
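The effect described above can be shown with a small numeric sketch. It assumes, purely for illustration, that equalization sets both channels of a pair to the mean of their two energies; the passage does not fix the equalization rule, and the name `equalize_within_pairs` and the values are hypothetical.

```python
from typing import List, Sequence, Tuple


def equalize_within_pairs(
    energy: Sequence[float], pairs: Sequence[Tuple[int, int]]
) -> List[float]:
    """Equalize energy inside each pair only (assumed rule: pair mean)."""
    equalized = list(energy)
    for a, b in pairs:
        mean = (energy[a] + energy[b]) / 2.0
        equalized[a] = mean
        equalized[b] = mean
    return equalized


# Hypothetical frame: a loud pair (channels 0, 1) and a quiet pair (2, 3).
energy = [100.0, 60.0, 4.0, 2.0]
pairs = [(0, 1), (2, 3)]
equalized = equalize_within_pairs(energy, pairs)
print(equalized)

# For contrast: equalizing across all channels would erase the difference
# between the two pairs, so an energy-based allocation could no longer
# favor the loud pair.
global_mean = sum(energy) / len(energy)
print([global_mean] * len(energy))
```

With per-pair equalization the loud pair keeps a much larger equalized energy than the quiet pair, so a subsequent energy-based bit allocation can still assign it more bits.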
In a possible design, P = 2×K, where K is a positive integer. Determining the respective bit numbers of the two channels of the current channel pair according to the respective energy/amplitude-equalized energy/amplitude of the audio signals of the two channels of the current channel pair and the number of available bits may include: determining the energy/amplitude sum of the current frame according to the respective energy/amplitude-equalized energy/amplitude of the audio signals of the P channels; and determining the respective bit numbers of the two channels of the current channel pair according to the energy/amplitude sum of the current frame, the respective energy/amplitude-equalized energy/amplitude of the audio signals of the two channels of the current channel pair, and the number of available bits.
In a possible design, the audio signals of the P channels further include audio signals of Q unpaired channels, where P = 2×K+Q, and K and Q are positive integers. Determining the respective bit numbers of the two channels of the current channel pair according to the respective energy/amplitude-equalized energy/amplitude of the audio signals of the two channels of the current channel pair and the number of available bits may include: determining the energy/amplitude sum of the current frame according to the energy/amplitude-equalized energy/amplitude of the audio signals of the two channels of each of the K channel pairs and the energy/amplitude-equalized energy/amplitude of the audio signals of the Q channels; determining the respective bit numbers of the two channels of the current channel pair according to the energy/amplitude sum of the current frame, the respective energy/amplitude of the audio signals of the two channels of the current channel pair, and the number of available bits; and determining the respective bit numbers of the Q channels according to the energy/amplitude sum of the current frame, the respective energy/amplitude-equalized energy/amplitude of the audio signals of the Q channels, and the number of available bits. Encoding the audio signals of the two channels according to the respective bit numbers of the two channels of the current channel pair to obtain the encoded bitstream may include: encoding the audio signals of the K channel pairs according to the respective bit numbers of the K channel pairs, and encoding the audio signals of the Q channels according to the respective bit numbers of the Q channels, to obtain the encoded bitstream.
According to a fourth aspect, an embodiment of this application provides a multi-channel audio signal encoding apparatus. The apparatus may be an audio encoder, a chip or system-on-chip of an audio encoding device, or a functional module in an audio encoder that implements the method of the third aspect or of any possible design of the third aspect. The apparatus can implement the functions performed in the third aspect or in each possible design of the third aspect, and the functions may be implemented by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the foregoing functions. For example, in a possible design, the multi-channel audio signal encoding apparatus may include: an obtaining module, configured to obtain audio signals of P channels of a current frame of a multi-channel audio signal, where P is a positive integer greater than 1, the audio signals of the P channels include audio signals of K channel pairs, and K is a positive integer; an energy/amplitude equalization module, configured to perform energy/amplitude equalization on the audio signals of the two channels of a current channel pair among the K channel pairs according to the respective energy/amplitude of the audio signals of those two channels, to obtain the respective energy/amplitude-equalized energy/amplitude of the audio signals of the two channels of the current channel pair; a bit allocation module, configured to determine the respective bit numbers of the two channels of the current channel pair according to the respective energy/amplitude-equalized energy/amplitude of the audio signals of the two channels of the current channel pair and the number of available bits; and an encoding module, configured to encode the audio signals of the two channels according to the respective bit numbers of the two channels of the current channel pair, to obtain an encoded bitstream.
In a possible design, P = 2×K, where K is a positive integer. The bit allocation module is configured to: determine the energy/amplitude sum of the current frame according to the respective energy/amplitude-equalized energy/amplitude of the audio signals of the P channels; and determine the respective bit numbers of the two channels of the current channel pair according to the energy/amplitude sum of the current frame, the respective energy/amplitude-equalized energy/amplitude of the audio signals of the two channels of the current channel pair, and the number of available bits.
In a possible design, the audio signals of the P channels further include audio signals of Q unpaired channels, where P = 2×K+Q, and K and Q are positive integers. The bit allocation module is configured to: determine the energy/amplitude sum of the current frame according to the energy/amplitude-equalized energy/amplitude of the audio signals of the two channels of each of the K channel pairs and the energy/amplitude-equalized energy/amplitude of the audio signals of the Q channels; determine the respective bit numbers of the two channels of the current channel pair according to the energy/amplitude sum of the current frame, the respective energy/amplitude of the audio signals of the two channels of the current channel pair, and the number of available bits; and determine the respective bit numbers of the Q channels according to the energy/amplitude sum of the current frame, the respective energy/amplitude-equalized energy/amplitude of the audio signals of the Q channels, and the number of available bits. The encoding module is configured to encode the audio signals of the K channel pairs according to the respective bit numbers of the K channel pairs, and to encode the audio signals of the Q channels according to the respective bit numbers of the Q channels, to obtain the encoded bitstream.
According to a fifth aspect, an embodiment of this application provides an audio signal encoding apparatus, including a non-volatile memory and a processor that are coupled to each other, where the processor invokes program code stored in the memory to perform the method according to any one of the designs of the first aspect or the method according to any one of the designs of the third aspect.
According to a sixth aspect, an embodiment of this application provides an audio signal encoding device, including an encoder, where the encoder is configured to perform the method according to any one of the designs of the first aspect or the method according to any one of the designs of the third aspect.
According to a seventh aspect, an embodiment of this application provides a computer-readable storage medium, including a computer program, where, when the computer program is executed on a computer, the computer is caused to perform the method according to any one of the designs of the first aspect or the method according to any one of the designs of the third aspect.
According to an eighth aspect, an embodiment of this application provides a computer-readable storage medium, including an encoded bitstream obtained by the method according to any one of the designs of the first aspect or by the method according to any one of the designs of the third aspect.
According to a ninth aspect, this application provides a computer program product, including a computer program, where, when the computer program is executed by a computer, the method according to any one of the designs of the first aspect or the method according to any one of the designs of the third aspect is performed.
According to a tenth aspect, this application provides a chip, including a processor and a memory, where the memory is configured to store a computer program, and the processor is configured to invoke and run the computer program stored in the memory to perform the method according to any one of the designs of the first aspect or the method according to any one of the designs of the third aspect.
In the multi-channel audio signal encoding method and apparatus of the embodiments of this application, audio signals of P channels of a current frame of a multi-channel audio signal are obtained, where the audio signals of the P channels include audio signals of K channel pairs; the respective bit numbers of the K channel pairs are determined according to the respective energy/amplitude of the audio signals of the P channels and the number of available bits; and the audio signals of the P channels are encoded according to the respective bit numbers of the K channel pairs, to obtain an encoded bitstream. The energy/amplitude of the audio signal of one of the P channels includes at least one of: the energy/amplitude of that audio signal in the time domain, its energy/amplitude after time-frequency transformation, its energy/amplitude after time-frequency transformation and whitening, its energy/amplitude after energy/amplitude equalization, or its energy/amplitude after stereo processing. Bits are allocated to the channel pairs according to at least one of these energy/amplitude values of the audio signals of the P channels, and the respective bit numbers of the K channel pairs are determined accordingly, so that the bit number of each channel pair in multi-channel signal encoding is allocated reasonably and the quality of the audio signal reconstructed at the decoding end is ensured. For example, when the energy/amplitude differs greatly between channel pairs, the method of the embodiments of this application can resolve the problem of insufficient coding bits for a channel pair with large energy/amplitude, thereby ensuring the quality of the audio signal reconstructed at the decoding end.
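As an illustration of the first of the listed quantities, the following sketch computes a per-channel energy and amplitude for one frame in the time domain. The definitions used here (energy as the sum of squared samples, amplitude as the square root of that energy) are assumptions made for the example; the passage does not define them, and the function names are hypothetical.

```python
import math
from typing import Sequence


def frame_energy(samples: Sequence[float]) -> float:
    """Assumed definition: sum of squared time-domain samples."""
    return sum(s * s for s in samples)


def frame_amplitude(samples: Sequence[float]) -> float:
    """Assumed definition: square root of the frame energy."""
    return math.sqrt(frame_energy(samples))


# Hypothetical two-sample frame of one channel.
channel_frame = [3.0, 4.0]
print(frame_energy(channel_frame), frame_amplitude(channel_frame))
```

The same helpers could be applied to transform-domain coefficients instead of time-domain samples, which is how the other listed variants would differ in this sketch.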
Description of Drawings
FIG. 1 is a schematic diagram of an example of an audio encoding and decoding system according to an embodiment of this application;
FIG. 2 is a flowchart of a multi-channel audio signal encoding method according to an embodiment of this application;
FIG. 3 is a flowchart of a multi-channel audio signal encoding method according to an embodiment of this application;
FIG. 4 is a flowchart of a bit allocation method for channel pairs according to an embodiment of this application;
FIG. 5 is a schematic diagram of a processing procedure at an encoding end according to an embodiment of this application;
FIG. 6 is a schematic diagram of a processing procedure of a channel encoding unit according to an embodiment of this application;
FIG. 7 is a schematic diagram of a processing procedure of a channel encoding unit according to an embodiment of this application;
FIG. 8 is a flowchart of another multi-channel audio signal encoding method according to an embodiment of this application;
FIG. 9 is a schematic structural diagram of an audio signal encoding apparatus according to an embodiment of this application;
FIG. 10 is a schematic structural diagram of an audio signal encoding device according to an embodiment of this application.
Detailed Description
The terms "first", "second", and the like in the embodiments of this application are used only to distinguish between descriptions, and shall not be understood as indicating or implying relative importance or order. In addition, the terms "include" and "have" and any variants thereof are intended to cover a non-exclusive inclusion, for example, the inclusion of a series of steps or units. A method, system, product, or device is not necessarily limited to the steps or units expressly listed, but may include other steps or units that are not expressly listed or that are inherent to such a process, method, product, or device.
It should be understood that, in this application, "at least one (item)" means one or more, and "a plurality of" means two or more. The term "and/or" describes an association between associated objects and indicates that three relationships may exist. For example, "A and/or B" may indicate three cases: only A exists, only B exists, or both A and B exist, where A and B may each be singular or plural. The character "/" generally indicates an "or" relationship between the associated objects. "At least one of the following items" or a similar expression means any combination of these items, including a single item or any combination of a plurality of items. For example, at least one of a, b, or c may indicate: a, b, c, "a and b", "a and c", "b and c", or "a and b and c", where each of a, b, and c may be single or multiple, or some may be single and others multiple.
The following describes the system architecture to which the embodiments of this application are applied. Referring to FIG. 1, FIG. 1 shows an example schematic block diagram of an audio encoding and decoding system 10 to which the embodiments of this application are applied. As shown in FIG. 1, the audio encoding and decoding system 10 may include a source device 12 and a destination device 14. The source device 12 generates encoded audio data, and may therefore be referred to as an audio encoding apparatus. The destination device 14 may decode the encoded audio data generated by the source device 12, and may therefore be referred to as an audio decoding apparatus. Various implementations of the source device 12, the destination device 14, or both may include one or more processors and a memory coupled to the one or more processors. The memory may include, but is not limited to, RAM, ROM, EEPROM, flash memory, or any other medium that can be used to store desired program code in the form of instructions or data structures accessible by a computer, as described herein. The source device 12 and the destination device 14 may include a variety of apparatuses, including desktop computers, mobile computing apparatuses, notebook (for example, laptop) computers, tablet computers, set-top boxes, telephone handsets such as so-called "smart" phones, televisions, speakers, digital media players, video game consoles, in-vehicle computers, any wearable devices, virtual reality (VR) devices, servers providing VR services, augmented reality (AR) devices, servers providing AR services, wireless communication devices, or the like.
Although FIG. 1 depicts the source device 12 and the destination device 14 as separate devices, a device embodiment may also include both the source device 12 and the destination device 14, or the functionality of both, that is, the source device 12 or corresponding functionality and the destination device 14 or corresponding functionality. In such embodiments, the source device 12 or corresponding functionality and the destination device 14 or corresponding functionality may be implemented using the same hardware and/or software, separate hardware and/or software, or any combination thereof.
The source device 12 and the destination device 14 may be communicatively connected through a link 13, and the destination device 14 may receive the encoded audio data from the source device 12 via the link 13. The link 13 may include one or more media or apparatuses capable of moving the encoded audio data from the source device 12 to the destination device 14. In one example, the link 13 may include one or more communication media that enable the source device 12 to transmit the encoded audio data directly to the destination device 14 in real time. In this example, the source device 12 may modulate the encoded audio data according to a communication standard (for example, a wireless communication protocol) and may transmit the modulated audio data to the destination device 14. The one or more communication media may include wireless and/or wired communication media, such as a radio frequency (RF) spectrum or one or more physical transmission lines. The one or more communication media may form part of a packet-based network, such as a local area network, a wide area network, or a global network (for example, the Internet). The one or more communication media may include routers, switches, base stations, or other devices that facilitate communication from the source device 12 to the destination device 14.
The source device 12 includes an encoder 20. Optionally, the source device 12 may further include an audio source 16, a preprocessor 18, and a communication interface 22. In a specific implementation, the encoder 20, the audio source 16, the preprocessor 18, and the communication interface 22 may be hardware components in the source device 12, or may be software programs in the source device 12. They are described as follows:
The audio source 16 may include or be any type of sound capture device, for example one used to capture real-world sound, and/or any type of audio generation device. The audio source 16 may be a microphone for capturing sound or a memory for storing audio data, and may further include any type of (internal or external) interface for storing previously captured or generated audio data and/or for obtaining or receiving audio data. When the audio source 16 is a microphone, it may be, for example, a local microphone or a microphone integrated in the source device. When the audio source 16 is a memory, it may be, for example, a local memory or a memory integrated in the source device. When the audio source 16 includes an interface, the interface may be, for example, an external interface that receives audio data from an external audio source, such as an external sound capture device (for example, a microphone), an external memory, or an external audio generation device. The interface may be any type of interface according to any proprietary or standardized interface protocol, for example a wired or wireless interface or an optical interface.
In the embodiments of this application, the audio data transmitted from the audio source 16 to the preprocessor 18 may also be referred to as raw audio data 17.
The preprocessor 18 is configured to receive the raw audio data 17 and preprocess it to obtain preprocessed audio 19 or preprocessed audio data 19. For example, the preprocessing performed by the preprocessor 18 may include filtering or denoising.
The encoder 20 (also referred to as the audio encoder 20) is configured to receive the preprocessed audio data 19 and to carry out the embodiments described below, so as to apply the audio signal encoding method described in this application on the encoding side.
The communication interface 22 may be configured to receive encoded audio data 21 and to transmit the encoded audio data 21 through the link 13 to the destination device 14 or to any other device (for example, a memory) for storage or direct reconstruction, where the other device may be any device used for decoding or storage. The communication interface 22 may, for example, be configured to encapsulate the encoded audio data 21 into a suitable format, such as data packets, for transmission over the link 13.
The destination device 14 includes a decoder 30. Optionally, the destination device 14 may further include a communication interface 28, an audio post-processor 32, and a speaker device 34. They are described as follows:
The communication interface 28 may be configured to receive the encoded audio data 21 from the source device 12 or from any other source, for example a storage device such as an encoded audio data storage device. The communication interface 28 may be configured to transmit or receive the encoded audio data 21 over the link 13 between the source device 12 and the destination device 14 or over any type of network. The link 13 is, for example, a direct wired or wireless connection, and the network is, for example, a wired or wireless network or any combination thereof, or any type of private or public network or any combination thereof. The communication interface 28 may, for example, be configured to decapsulate the data packets transmitted by the communication interface 22 to obtain the encoded audio data 21.
Both the communication interface 28 and the communication interface 22 may be configured as unidirectional or bidirectional communication interfaces, and may be used, for example, to send and receive messages to establish a connection, and to acknowledge and exchange any other information related to the communication link and/or to data transmission such as the transmission of encoded audio data.
The decoder 30 (also referred to as the audio decoder 30) is configured to receive the encoded audio data 21 and provide decoded audio data 31 or decoded audio 31.
The audio post-processor 32 is configured to post-process the decoded audio data 31 (also referred to as reconstructed audio data) to obtain post-processed audio data 33. The post-processing performed by the audio post-processor 32 may include, for example, rendering or any other processing. The audio post-processor 32 may also be configured to transmit the post-processed audio data 33 to the speaker device 34.
扬声设备34,用于接收经后处理的音频数据33以向例如用户或观看者播放音频。扬声设备34可以为或可以包括任何类别的用于呈现经重构的声音的扬声器。A loudspeaker device 34 for receiving post-processed audio data 33 to play audio to eg a user or viewer. The speaker device 34 may be or include any type of speaker for presenting the reconstructed sound.
Although FIG. 1 depicts the source device 12 and the destination device 14 as separate devices, a device embodiment may also include both devices or the functionality of both, that is, the source device 12 or corresponding functionality together with the destination device 14 or corresponding functionality. In such embodiments, the source device 12 or corresponding functionality and the destination device 14 or corresponding functionality may be implemented using the same hardware and/or software, separate hardware and/or software, or any combination thereof.
As will be apparent to a person skilled in the art from the description, the existence and (exact) division of the functionality of the different units, or of the source device 12 and/or the destination device 14 shown in FIG. 1, may vary with the actual device and application. The source device 12 and the destination device 14 may each be any of a variety of devices, including any kind of handheld or stationary device, for example a notebook or laptop computer, mobile phone, smartphone, tablet computer, video camera, desktop computer, set-top box, television, camera, in-vehicle device, stereo system, digital media player, audio game console, audio streaming device (such as a content service server or content distribution server), broadcast receiver device, broadcast transmitter device, smart glasses, or smart watch, and may use no operating system or any kind of operating system.
The encoder 20 and the decoder 30 may each be implemented as any of a variety of suitable circuits, for example one or more microprocessors, digital signal processors (DSP), application-specific integrated circuits (ASIC), field-programmable gate arrays (FPGA), discrete logic, hardware, or any combination thereof. If the techniques are implemented partly in software, a device may store the software instructions in a suitable non-transitory computer-readable storage medium and may execute the instructions in hardware using one or more processors to perform the techniques of this disclosure. Any of the foregoing (including hardware, software, a combination of hardware and software, and so on) may be regarded as one or more processors.
In some cases, the audio encoding and decoding system 10 shown in FIG. 1 is merely an example, and the techniques of this application may apply to audio coding setups (for example, audio encoding only or audio decoding only) that do not necessarily involve any data communication between the encoding device and the decoding device. In other examples, data may be retrieved from local memory, streamed over a network, and so on. An audio encoding device may encode data and store it in memory, and/or an audio decoding device may retrieve data from memory and decode it. In some examples, encoding and decoding are performed by devices that do not communicate with each other but only encode data to memory and/or retrieve data from memory and decode it.
The above encoder may be a multi-channel encoder, for example a stereo encoder, a 5.1-channel encoder, or a 7.1-channel encoder.
The above audio data may also be called an audio signal. In the embodiments of this application, the audio signal is the input signal of the audio encoding device and may include multiple frames; the current frame refers to one particular frame of the audio signal. The embodiments describe the encoding and decoding of the current frame as an example. The frame before or after the current frame can be encoded and decoded in the same way as the current frame, so those processes are not described one by one. In addition, the audio signal in the embodiments may be a multi-channel signal, that is, an audio signal including P channels. The embodiments of this application implement encoding of multi-channel audio signals.
It should be noted that "energy/amplitude" in the embodiments of this application means energy or amplitude. In actual processing of a frame, if energy is processed at the start, energy is processed in all subsequent steps; if amplitude is processed at the start, amplitude is processed in all subsequent steps.
The above encoder may execute the multi-channel audio signal encoding method of the embodiments of this application, so that the bits for each channel in multi-channel signal encoding are allocated reasonably, the quality of the audio signal reconstructed at the decoding end is ensured, and the encoding quality is improved. For specific implementations, see the detailed explanation of the following embodiments.
FIG. 2 is a flowchart of a multi-channel audio signal encoding method according to an embodiment of this application. The method may be executed by the above encoder. As shown in FIG. 2, the method of this embodiment may include the following steps.
Step 101: Acquire the audio signals of P channels of the current frame of a multi-channel audio signal, where P is a positive integer greater than 1 and the audio signals of the P channels include the audio signals of K channel pairs.
The audio signal of one channel pair includes the audio signals of two channels. A channel pair in the embodiments of this application may be any one of the K channel pairs. The audio signals of two channels that are coupled together form the audio signal of one channel pair.
In some embodiments, P = 2K. After screening, pairing, stereo processing, and multi-channel side information generation are applied to the multi-channel signal, the audio signals of the P channels, that is, of the K channel pairs, are obtained.
In some embodiments, the audio signals of the P channels further include the audio signals of Q unpaired channels, where P = 2×K+Q, K is a positive integer, and Q is a positive integer.
After screening, pairing, stereo processing, and multi-channel side information generation are applied to the multi-channel signal, the audio signals of K channel pairs and the audio signals of Q channels that have not undergone stereo processing are obtained. Take a 5.1-channel signal as an example. The 5.1 channels are the left (L) channel, right (R) channel, center (C) channel, low frequency effects (LFE) channel, left surround (LS) channel, and right surround (RS) channel. The L channel signal and the R channel signal are paired to form a first channel pair, and stereo processing yields a mid channel signal M1 and a side channel signal S1. The LS channel signal and the RS channel signal are paired to form a second channel pair, and stereo processing yields a mid channel signal M2 and a side channel signal S2. The LFE channel signal and the C channel signal are unpaired audio signals. That is, P = 6, K = 2, and Q = 2. The audio signals of the P channels therefore include the audio signal of the first channel pair (the M1 and S1 channel signals), the audio signal of the second channel pair (the M2 and S2 channel signals), and the LFE and C channel signals, which have not undergone stereo processing. The mid channels M1 and M2 and the side channels S1 and S2 may be regarded as channels obtained by downmix processing, that is, downmix channels.
In some embodiments, the P channels do not include the LFE channel. In these embodiments, a fixed number of bits may be allocated to the LFE channel regardless of whether its energy/amplitude is high or low. For example, the fixed number may be a preset value: it stays the same no matter how many channels the multi-channel signal includes and no matter what the encoding bit rate is, for example fixed at 80, 100, or 120. Alternatively, the fixed number may be determined according to at least one of the number of channels in the multi-channel signal and the encoding bit rate of the multi-channel signal. In general, the more channels there are, the smaller the fixed number, and the higher the encoding bit rate, the larger the fixed number. For example, when the multi-channel signal is a 5.1-channel signal (6 channels), the fixed number may be 80 at an encoding bit rate of 192 kbps, so that 80 bits are allocated to the LFE channel, and 120 at an encoding bit rate of 256 kbps, so that 120 bits are allocated to the LFE channel. As another example, at an encoding bit rate of 192 kbps, when the multi-channel signal is a 7.1-channel signal (8 channels), the fixed number may be 60, so that 60 bits are allocated to the LFE channel.
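As an illustration only, the three example configurations above can be held in a small lookup table. The table name, function name, and the choice to reject unlisted configurations are assumptions made for this sketch; the text does not say how other configurations are handled.

```python
# Hypothetical lookup: (number of channels, encoding bit rate in bps) -> fixed LFE bits.
# Only the three example configurations given in the text are listed.
LFE_FIXED_BITS = {
    (6, 192_000): 80,   # 5.1-channel signal at 192 kbps
    (6, 256_000): 120,  # 5.1-channel signal at 256 kbps
    (8, 192_000): 60,   # 7.1-channel signal at 192 kbps
}


def lfe_fixed_bits(num_channels: int, bitrate_bps: int) -> int:
    """Return the fixed LFE bit count for a configuration listed in the text."""
    key = (num_channels, bitrate_bps)
    if key not in LFE_FIXED_BITS:
        # Assumption: unlisted configurations are rejected rather than interpolated.
        raise ValueError(f"no example value for configuration {key}")
    return LFE_FIXED_BITS[key]
```

The values follow the stated trend: more channels at the same bit rate give a smaller fixed number, and a higher bit rate at the same channel count gives a larger one.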
Step 102: Determine the number of bits for each of the K channel pairs according to the energy/amplitude of the audio signal of each of the P channels and the number of available bits.
The energy/amplitude of the audio signal of one of the P channels includes at least one of the following: the energy/amplitude of that audio signal in the time domain, its energy/amplitude after time-frequency transform, its energy/amplitude after time-frequency transform and whitening, its energy/amplitude after energy/amplitude equalization, or its energy/amplitude after stereo processing. The time-domain energy/amplitude, the energy/amplitude after time-frequency transform, and the energy/amplitude after time-frequency transform and whitening are all energy/amplitude values before energy/amplitude equalization. In other words, any one or more of the above energy/amplitude values may be used for bit allocation.
When the P channels do not include the LFE channel, the number of available bits does not include the fixed number of bits.
The energy/amplitude of the audio signal of a channel after time-frequency transform and whitening is the energy/amplitude obtained after time-frequency transform and whitening are applied to that audio signal. The whitening flattens the frequency-domain coefficients of the audio signal of the channel to facilitate subsequent encoding.
A primary bit allocation is performed according to the energy/amplitude of the audio signal of each of the P channels and the number of available bits. The primary bit allocation here is a bit allocation among channel pairs, that is, a corresponding number of bits is allocated to each channel pair.
For P = 2K, the number of bits for each of the K channel pairs is determined according to the energy/amplitude of the audio signal of each of the P channels and the number of available bits; this number is also called the number of initially allocated bits. A channel pair serves as one basic unit, and the primary bit allocation gives each basic unit bits according to the proportion of its energy/amplitude in the energy/amplitude of all K basic units. The energy/amplitude of a basic unit may be determined from the energy/amplitude of the audio signals of the two channels in that unit, for example as their sum. The primary bit allocation thus distributes bits among the basic units and yields the number of initially allocated bits of each basic unit.
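The proportional rule for P = 2K can be written out as a short formula. This is a sketch under the sum-of-two-channels example given in the text; the symbols E_{k,1}, E_{k,2} (energy/amplitude of the two channels of pair k) and B_avail (number of available bits) are labels introduced here, and the text does not specify rounding.

```latex
E_k = E_{k,1} + E_{k,2}, \qquad
\mathrm{Bits}_k = B_{\mathrm{avail}} \cdot \frac{E_k}{\sum_{m=1}^{K} E_m},
\qquad k = 1, \dots, K .
```

For instance, with K = 2, E_1 = 3, E_2 = 1 and B_avail = 4000, the first pair receives 3000 bits and the second 1000 bits.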
For P = 2×K+Q, the number of bits for each of the K channel pairs and for each of the Q channels is determined according to the energy/amplitude of the audio signal of each of the P channels and the number of available bits. A channel pair serves as one basic unit, and each unpaired channel serves as one basic unit. The primary bit allocation gives each basic unit bits according to the proportion of its energy/amplitude in the energy/amplitude of all K+Q basic units. For a basic unit corresponding to a channel pair, the energy/amplitude of the unit may be determined from the energy/amplitude of the audio signals of its two channels. For a basic unit corresponding to an unpaired channel, the energy/amplitude of the unit may be determined from the energy/amplitude of the audio signal of that channel. The primary bit allocation thus distributes bits among the K+Q basic units and yields the number of bits of each basic unit, in other words the number of bits of each of the K channel pairs and of each of the Q channels. Any one of the Q channels may be a mono channel or a channel obtained by downmix processing, that is, a downmix channel.
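The primary bit allocation over K+Q basic units can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name, the rounding-down rule, the equal split for an all-silent frame, and the numeric energies are all assumptions, and the unit energy of a pair uses the sum-of-two-channels example from the text.

```python
def allocate_bits(unit_energies, available_bits):
    """Split available_bits across basic units in proportion to energy/amplitude."""
    total = sum(unit_energies.values())
    if total <= 0:
        # Assumption: equal split when every unit is silent (not covered by the text).
        share = available_bits // len(unit_energies)
        return {name: share for name in unit_energies}
    # Assumption: round down; the text does not specify rounding or remainder handling.
    return {name: int(available_bits * e / total) for name, e in unit_energies.items()}


# Hypothetical per-channel energies for a 5.1 frame (K = 2 pairs, Q = 2 unpaired channels).
channel_energy = {"M1": 40.0, "S1": 10.0, "M2": 20.0, "S2": 5.0, "C": 20.0, "LFE": 5.0}

# A pair's unit energy is taken as the sum of its two channels.
units = {
    "pair1": channel_energy["M1"] + channel_energy["S1"],
    "pair2": channel_energy["M2"] + channel_energy["S2"],
    "C": channel_energy["C"],
    "LFE": channel_energy["LFE"],
}

bits = allocate_bits(units, 4000)
```

With these made-up numbers the unit energies are 50, 25, 20, and 5, so the 4000 available bits are split 2000, 1000, 800, and 200. Setting Q = 0 (only pair units in the dictionary) gives the P = 2K case.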
Whether P = 2K or P = 2×K+Q, in one implementation the number of bits for each of the K channel pairs may be determined according to the number of available bits and any one of the following for each of the K channel pairs: the energy/amplitude in the time domain, the energy/amplitude after time-frequency transform, or the energy/amplitude after time-frequency transform and whitening. In this implementation, to improve coding efficiency and coding quality, energy/amplitude equalization may be performed on the audio signals of the K channel pairs before bit allocation. The equalization may be performed across the audio signals of all channels in multiple channel pairs, or in multiple channel pairs and one or more unpaired channels. Alternatively, the equalization may be performed on the audio signals of the two channels within a single channel pair.
In another implementation, the number of bits may be determined according to the number of available bits and either the energy/amplitude after energy/amplitude equalization or the energy/amplitude after stereo processing of the audio signal of each of the K channel pairs. In this implementation, to improve coding efficiency and coding quality, energy/amplitude equalization may be performed on the audio signals of the K channel pairs before bit allocation, by equalizing the audio signals of the two channels within a single channel pair. The energy/amplitude after energy/amplitude equalization and the energy/amplitude after stereo processing of each of the K channel pairs are both obtained after this equalization within a single channel pair.
Similarly, when P = 2×K+Q, in one implementation the number of bits for each of the Q channels may be determined according to the number of available bits and any one of the following for the audio signal of each of the Q channels: the energy/amplitude in the time domain, the energy/amplitude after time-frequency transform, or the energy/amplitude after time-frequency transform and whitening. In another implementation, it may be determined according to the number of available bits and either the energy/amplitude after energy/amplitude equalization or the energy/amplitude after stereo processing of the audio signal of each of the Q channels. For each of the Q channels, the energy/amplitude after energy/amplitude equalization equals the energy/amplitude before equalization, and the energy/amplitude after stereo processing equals the energy/amplitude before stereo processing.
In some embodiments, it is taken into account that allocating more than a certain number of bits to a single channel does not improve the encoding quality of that channel. A threshold can therefore be preset and applied during bit allocation, so that the number of bits allocated to a single channel does not exceed the threshold however large the energy/amplitude of that channel is. More bits can then be allocated to the other channels, which improves their encoding quality without reducing the encoding quality of the single channel, and so improves the encoding quality of the whole signal.
Accordingly, in some embodiments, determining the number of bits for each of the K channel pairs may further include the following steps:
Determine an M-th channel, among the P channels, whose number of initially allocated bits is greater than the threshold, where M is greater than or equal to 0 and less than P;
Obtain the number of redundant bits of the M-th channel, where the number of redundant bits of the M-th channel = the number of initially allocated bits of the M-th channel - the threshold;
If the M-th channel is the first channel among the P channels determined to have a number of initially allocated bits greater than the threshold, allocate the redundant bits to the P-1 channels other than the M-th channel, to obtain the updated bit numbers of those P-1 channels; the updated bit number of the M-th channel is the threshold. If the M-th channel is not the first such channel, allocate the redundant bits to the channels among the P channels other than the M-th channel and the channels already determined to have a number of initially allocated bits greater than the threshold, to obtain the updated bit numbers of those other channels. For example, if the channel already determined to have a number of initially allocated bits greater than the threshold is the N-th channel, the other channels are the P-2 channels other than the M-th channel and the N-th channel. It should be noted that if the LFE channel is allocated a fixed number of bits, the P channels do not include the LFE channel.
Assume the bit-number threshold of a single channel is frmBitMax. frmBitMax can be calculated from the saturated encoding bit rate of a single channel, the frame length, and the encoding sampling rate according to the following formula:
frmBitMax = rateMax × frameLen / fs,
where rateMax is the saturated encoding bit rate of a single channel, frameLen is the frame length, and fs is the encoding sampling rate. rateMax may typically be 256000 bps, 240000 bps, 224000 bps, 192000 bps, and so on. The value of rateMax may be chosen according to the coding efficiency of the encoder or set empirically, and is not limited here.
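A quick numeric check of the threshold formula, as a sketch. It assumes frameLen is expressed in samples (so that frameLen / fs is the frame duration in seconds) and rounds down to whole bits; the 960-sample frame at 48 kHz is a hypothetical configuration, not one stated in the text.

```python
def frm_bit_max(rate_max_bps: int, frame_len_samples: int, fs_hz: int) -> int:
    """frmBitMax = rateMax * frameLen / fs, rounded down to whole bits (assumption)."""
    return rate_max_bps * frame_len_samples // fs_hz


# Hypothetical configuration: 20 ms frames (960 samples) at a 48 kHz sampling rate.
threshold = frm_bit_max(256_000, 960, 48_000)
```

Under these assumptions a saturated rate of 256000 bps gives a per-channel cap of 5120 bits per frame.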
Take a 5.1-channel signal as an example. The L and R channels are paired and downmixed to obtain the M1 and S1 channels, and the LS and RS channels are paired and downmixed to obtain the M2 and S2 channels. Bits(M1), Bits(S1), Bits(M2), and Bits(S2) denote the numbers of initially allocated bits of the M1, S1, M2, and S2 channels, and Bits(C) and Bits(LFE) denote the numbers of initially allocated bits of the unpaired channels. If the LFE channel is allocated a fixed number of bits, the number of available bits = Bits(M1)+Bits(S1)+Bits(M2)+Bits(S2)+Bits(C). If the LFE channel is not allocated a fixed number of bits, the number of available bits = Bits(M1)+Bits(S1)+Bits(M2)+Bits(S2)+Bits(C)+Bits(LFE).
The following description takes the case where the LFE channel is allocated a fixed number of bits as an example:
The number of available bits is denoted totalBits and the threshold is denoted frmBitMax. Set allocFlag[5] = {0,0,0,0,0}. It is assumed here that the channels of the 5.1 signal have been ordered with indices M1 = 0, S1 = 1, C = 2, M2 = 3, S2 = 4. Perform the following procedure:
Step 1. If Bits(i) <= frmBitMax, go to step 5; in addition, if Bits(i) = frmBitMax, set allocFlag[i] = 1. Here 0 <= i < 5.
Step 2. If Bits(i) > frmBitMax, set allocFlag[i] = 1, compute diffBits = Bits(i) - frmBitMax, and then perform steps 3 to 5.
Step 3. Compute sumBits = ΣBits(j), 0 <= j < 5, where Bits(j) is not added to sumBits when allocFlag[j] = 1.
Step 4. Distribute diffBits among the channels with allocFlag[j] ≠ 1, as follows:
Bits(j) = Bits(j) + diffBits × Bits(j) / sumBits
Step 5. If i = 4, end the procedure; if i < 4, increment i and go to step 1.
In one implementation, the following step may also be performed after step 4:
Determine whether Bits(j) is greater than or equal to frmBitMax; if so, set allocFlag[j] to 1.
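The capping procedure above can be sketched in a few lines. This is an illustrative reading, not a definitive implementation: the function name is made up, the loop is read as visiting every channel index once, the capped channel is set to the threshold (as stated for the M-th channel's updated bit number), integer division is an assumed rounding rule, and the optional check after step 4 is included.

```python
def cap_and_redistribute(initial_bits, frm_bit_max):
    """Cap each channel at frm_bit_max and hand the excess to unflagged channels."""
    bits = list(initial_bits)
    alloc_flag = [0] * len(bits)
    for i in range(len(bits)):                  # step 5: visit i = 0 .. len - 1 once
        if bits[i] <= frm_bit_max:              # step 1
            if bits[i] == frm_bit_max:
                alloc_flag[i] = 1
            continue
        alloc_flag[i] = 1                       # step 2
        diff_bits = bits[i] - frm_bit_max
        bits[i] = frm_bit_max                   # updated bit number is the threshold
        # step 3: sum over channels that are still allowed to receive bits
        sum_bits = sum(b for b, f in zip(bits, alloc_flag) if f != 1)
        if sum_bits == 0:
            continue                            # assumption: no eligible channel, drop excess
        for j in range(len(bits)):              # step 4: proportional redistribution
            if alloc_flag[j] != 1:
                bits[j] += diff_bits * bits[j] // sum_bits  # assumption: round down
                if bits[j] >= frm_bit_max:      # optional check after step 4
                    alloc_flag[j] = 1
    return bits


# Hypothetical initial allocation in the order M1, S1, C, M2, S2.
updated = cap_and_redistribute([6000, 1000, 1000, 1000, 1000], 5120)
```

In this made-up case M1 exceeds the cap by 880 bits, which are split equally among the four remaining channels because their initial allocations are equal, and the total of 10000 bits is preserved. Note that the procedure is a single pass: a channel with index below i that is pushed past the threshold by a later redistribution is only flagged, not capped again.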
The following is another example, in which the LFE channel is not allocated a fixed number of bits and is therefore included in the procedure:
The number of available bits is denoted totalBits and the threshold is denoted frmBitMax. Set allocFlag[6] = {0,0,0,0,0,0}. It is assumed here that the channels of the 5.1 signal have been ordered with indices M1 = 0, S1 = 1, C = 2, M2 = 3, S2 = 4, LFE = 5.
Step 1. If Bits(i) <= frmBitMax, go to step 5; in addition, if Bits(i) = frmBitMax, set allocFlag[i] = 1. Here 0 <= i < 6.
Step 2. If Bits(i) > frmBitMax, set allocFlag[i] = 1, compute diffBits = Bits(i) - frmBitMax, and then perform steps 3 to 5.
Step 3. Compute sumBits = ΣBits(j), 0 <= j < 6, where Bits(j) is not added to sumBits when allocFlag[j] = 1.
Step 4. Distribute diffBits among the channels with allocFlag[j] ≠ 1, as follows:
Bits(j) = Bits(j) + diffBits × Bits(j) / sumBits
Step 5. If i = 5, end the procedure; if i < 5, increment i and go to step 1.
In one implementation, the following step may also be performed after step 4:
Determine whether Bits(j) is greater than or equal to frmBitMax; if so, set allocFlag[j] to 1.
Step 103: Encode the audio signals of the P channels according to the number of bits of each of the K channel pairs, to obtain an encoded bitstream.
The number of bits may be the number of initially allocated bits or the updated bit number.
Encoding the audio signals of the P channels may include quantization, entropy coding, and bitstream multiplexing of the audio signals of the P channels, to obtain the encoded bitstream.
For P = 2K, quantization, entropy coding, and bitstream multiplexing are performed on the audio signals of the P channels according to the number of bits of each of the K channel pairs, to obtain the encoded bitstream.
For P = 2×K+Q, quantization, entropy coding, and bitstream multiplexing are performed on the audio signals of the P channels according to the number of bits of each of the K channel pairs and of each of the Q channels, to obtain the encoded bitstream.
In this embodiment, the audio signals of P channels of the current frame of a multi-channel audio signal are acquired, where the audio signals of the P channels include the audio signals of K channel pairs. The number of bits of each of the K channel pairs is determined according to the energy/amplitude of the audio signal of each of the P channels and the number of available bits, and the audio signals of the P channels are encoded according to those bit numbers to obtain an encoded bitstream. The energy/amplitude of the audio signal of one of the P channels includes at least one of its energy/amplitude in the time domain, after time-frequency transform, after time-frequency transform and whitening, after energy/amplitude equalization, or after stereo processing. Performing the bit allocation among channel pairs on the basis of at least one of these energy/amplitude values allocates the bits of each channel pair in multi-channel signal encoding reasonably and thereby ensures the quality of the audio signal reconstructed at the decoding end. For example, when the energy/amplitude differs greatly between channel pairs, the method of this embodiment can resolve the shortage of coding bits for the channel pairs with large energy/amplitude, and so ensure the quality of the audio signal reconstructed at the decoding end.
FIG. 3 is a flowchart of a multi-channel audio signal encoding method according to an embodiment of this application. The method may be executed by the encoder described above. As shown in FIG. 3, the method of this embodiment may include:
Step 201: Obtain audio signals of P channels of a current frame of a multi-channel audio signal, where P is a positive integer greater than 1, and the audio signals of the P channels include audio signals of K channel pairs.
For details of step 201, refer to step 101 of the embodiment shown in FIG. 2; they are not repeated here.
Step 202: Determine the respective bit numbers of the K channel pairs according to the respective energy/amplitude of the audio signals of the P channels and the number of available bits.
Primary bit allocation is performed according to the respective energy/amplitude of the audio signals of the P channels and the number of available bits.
For P = 2×K, in the primary bit allocation, the method of this embodiment may determine the respective bit numbers of the K channel pairs according to the respective energy/amplitude of the audio signals of the P channels and the number of available bits.
For P = 2×K + Q, in the primary bit allocation, the method of this embodiment may determine the respective bit numbers of the K channel pairs and the respective bit numbers of the Q channels according to the respective energy/amplitude of the audio signals of the P channels and the number of available bits.
Whether P = 2×K or P = 2×K + Q, for how the respective bit numbers of the K channel pairs and of the Q channels are determined in step 202, refer to step 102 of the embodiment shown in FIG. 2; details are not repeated here.
Step 203: Determine the respective bit numbers of the two channels in the current channel pair according to the bit number of the current channel pair among the K channel pairs and the respective stereo-processed energy/amplitude of the audio signals of the two channels in the current channel pair.
Taking the current channel pair among the K channel pairs as an example, secondary bit allocation is performed within the current channel pair according to the bit number of the current channel pair and the respective stereo-processed energy/amplitude of the audio signals of its two channels. Secondary bit allocation distributes the bit number of a channel pair between its two channels: for the basic unit formed by a pair of channels, bits are allocated within the basic unit in proportion to the respective energy/amplitude of the audio signals of the two channels in that unit. The current channel pair may be any one of the K channel pairs.
Whether P = 2×K or P = 2×K + Q, bits can be allocated within a channel pair in the manner of step 203 to obtain the respective bit numbers of the two channels in the pair.
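The secondary bit allocation of step 203 can be sketched as follows. This is a minimal illustration, not the encoder's actual routine: the function name `split_pair_bits`, the integer truncation, and the even split for a silent pair are assumptions that the text does not specify.

```python
def split_pair_bits(pair_bits, e_post_first, e_post_second):
    """Split a channel pair's bit budget between its two channels.

    pair_bits      -- bit number assigned to the pair by the primary allocation
    e_post_first   -- stereo-processed energy/amplitude of the first channel
    e_post_second  -- stereo-processed energy/amplitude of the second channel
    """
    total = e_post_first + e_post_second
    if total <= 0:
        # Assumption: a silent pair is split evenly.
        first = pair_bits // 2
    else:
        # Bits in proportion to the channel's share of the pair energy.
        first = int(pair_bits * e_post_first / total)
    # The remainder goes to the second channel, so no bits are lost.
    second = pair_bits - first
    return first, second


print(split_pair_bits(1000, 3.0, 1.0))
```

Giving the remainder to the second channel keeps the two per-channel budgets summing exactly to the pair budget, whatever rounding is used for the first channel.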
Step 204: Encode the audio signals of the two channels in the current channel pair according to their respective bit numbers to obtain an encoded bitstream.
Encoding the audio signals of the two channels in the current channel pair separately may include performing quantization, entropy coding, and bitstream multiplexing on each of them to obtain the encoded bitstream.
For P = 2×K, the audio signals of the P channels are quantized, entropy coded, and multiplexed according to the respective bit numbers of the K channel pairs to obtain the encoded bitstream.
For P = 2×K + Q, the audio signals of the K channel pairs are quantized, entropy coded, and multiplexed according to the respective bit numbers of the K channel pairs, and the audio signals of the Q channels are quantized, entropy coded, and multiplexed according to the respective bit numbers of the Q channels, to obtain the encoded bitstream.
In this embodiment, audio signals of P channels of a current frame of a multi-channel audio signal are obtained, where the audio signals of the P channels include audio signals of K channel pairs; the respective bit numbers of the K channel pairs are determined according to the respective energy/amplitude of the audio signals of the P channels and the number of available bits; the respective bit numbers of the two channels in the current channel pair are determined according to the bit number of the current channel pair and the respective stereo-processed energy/amplitude of the audio signals of its two channels; and the audio signals of the two channels are encoded according to their respective bit numbers to obtain an encoded bitstream. Bit allocation for the channel pairs is performed according to at least one of the time-domain energy/amplitude, the energy/amplitude after time-frequency transformation, the energy/amplitude after time-frequency transformation and whitening, the energy/amplitude after energy/amplitude equalization, or the energy/amplitude after stereo processing of the audio signals of the P channels, which determines the respective bit numbers of the K channel pairs; bit allocation within each channel pair is then performed based on those bit numbers. Bits are thereby allocated reasonably to each channel in multi-channel signal encoding, and the quality of the audio signal reconstructed at the decoding end is ensured. For example, when the energy/amplitude differs greatly between channel pairs, the method of this embodiment can resolve the shortage of coding bits for the signals of channel pairs with large energy/amplitude, thereby ensuring the quality of the audio signal reconstructed at the decoding end.
FIG. 4 is a flowchart of a bit allocation method for channel pairs according to an embodiment of this application. The method may be executed by the encoder described above, and this embodiment is one specific implementation of step 102 of the embodiment shown in FIG. 2. As shown in FIG. 4, the method of this embodiment may include:
Step 1021: Determine the energy/amplitude sum of the current frame according to the respective energy/amplitude of the audio signals of the P channels.
As described above, the respective energy/amplitude of the audio signals of the P channels includes at least one of: their respective energy/amplitude in the time domain, after time-frequency transformation, after time-frequency transformation and whitening, after energy/amplitude equalization, or after stereo processing.
The following explains how the energy/amplitude sum of the current frame is determined for the different energy/amplitude types.
Manner 1: Determine the energy/amplitude sum of the current frame according to the respective stereo-processed energy/amplitude of the audio signals of the P channels. The energy/amplitude sum of the current frame may be the stereo-processed energy/amplitude sum sum_E_post.
For example, the stereo-processed energy/amplitude sum sum_E_post may be determined according to formulas (1) and (2) below.
Figure PCTCN2021106102-appb-000009
Figure PCTCN2021106102-appb-000010
Here, ch is the channel index, E_post(ch) is the stereo-processed energy/amplitude of the audio signal of the channel with index ch, sampleCoef_post(ch, i) is the i-th coefficient of the current frame of channel ch after stereo processing, and N is the number of coefficients of the current frame, where N is a positive integer greater than 1. The channel with index ch may be any one of the P channels.
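The two formula images are not rendered in this text. Based only on the symbol definitions above, a plausible reading is sketched below; it assumes the per-channel energy is the sum of squared coefficients (an amplitude variant would take the square root of the inner sum), and the exact form and numbering in the original figures may differ.

```latex
% Hedged reconstruction from the symbol definitions; not copied from the figures.
\begin{align}
\mathrm{sum\_E}_{post} &= \sum_{ch=1}^{P} E_{post}(ch) \\
E_{post}(ch) &= \sum_{i=1}^{N} \mathrm{sampleCoef}_{post}(ch, i)^{2}
\end{align}
```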
That is, the energy/amplitude sum of the current frame can be determined in Manner 1, and the primary bit allocation is then completed through steps 1022 and 1023 below.
Manner 2: Determine the energy/amplitude sum of the current frame according to the respective energy/amplitude of the audio signals of the P channels before energy/amplitude equalization. The energy/amplitude sum may be the pre-equalization energy/amplitude sum sum_E_pre.
For example, the pre-equalization energy/amplitude sum sum_E_pre may be determined according to formulas (3) and (4) below.
Figure PCTCN2021106102-appb-000011
Figure PCTCN2021106102-appb-000012
Here, E_pre(ch) is the energy/amplitude of the audio signal of the channel with index ch before energy/amplitude equalization, sampleCoef(ch, i) is the i-th coefficient of the current frame of channel ch before energy/amplitude equalization, and N is the number of coefficients of the current frame, where N is a positive integer greater than 1.
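Manner 2 can be sketched in a few lines. The sketch assumes the energy of a channel is the sum of its squared coefficients and the amplitude is the square root of that sum; the formula images are not rendered here, so both forms, along with the names `channel_energy` and `frame_energy_sum`, are illustrative assumptions.

```python
import math


def channel_energy(coefs, as_amplitude=False):
    """E_pre(ch) for one channel, from its N coefficients sampleCoef(ch, i)."""
    sum_sq = sum(c * c for c in coefs)
    # Assumption: amplitude is taken as the root of the summed squares.
    return math.sqrt(sum_sq) if as_amplitude else sum_sq


def frame_energy_sum(frame, as_amplitude=False):
    """sum_E_pre: add up E_pre(ch) over all P channels of the frame."""
    return sum(channel_energy(c, as_amplitude) for c in frame.values())


frame = {"L": [3.0, 4.0], "R": [1.0, 0.0]}
print(frame_energy_sum(frame))        # energy variant
print(frame_energy_sum(frame, True))  # amplitude variant
```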
That is, the energy/amplitude sum of the current frame can be determined in Manner 2, and the primary bit allocation is then completed through steps 1022 and 1023 below.
Manner 3: Determine the energy/amplitude sum of the current frame according to the respective energy/amplitude of the audio signals of the P channels before energy/amplitude equalization and the respective weighting coefficients of the P channels. The weighting coefficient of any one of the P channels is less than or equal to 1. The energy/amplitude sum may be the pre-equalization energy/amplitude sum sum_E_pre.
For example, the pre-equalization energy/amplitude sum sum_E_pre is determined according to formula (5) below.
Figure PCTCN2021106102-appb-000013
Here, α(ch) is the weighting coefficient of the channel with index ch. The two channels of a channel pair have the same weighting coefficient, and that coefficient is inversely related to the normalized correlation value between the two channels of the pair.
In one implementation, α(ch) is 1 when the channel with index ch is not paired. For paired channels, take the channels with indices ch1, ch2, ch3, and ch4 (hereinafter ch1, ch2, ch3, and ch4) as an example, where ch1 is paired with ch2 and ch3 is paired with ch4. Then α(ch1) and α(ch2) are equal and both less than 1, and α(ch3) and α(ch4) are equal and both less than 1. α(ch1) and α(ch2) may be determined from the normalized correlation value Corr_norm(ch1, ch2) of ch1 and ch2, and α(ch3) and α(ch4) from the normalized correlation value Corr_norm(ch3, ch4). If Corr_norm(ch3, ch4) is larger than Corr_norm(ch1, ch2), then α(ch3) and α(ch4) are smaller than α(ch1) and α(ch2). In other words, α(ch1) and α(ch2) are inversely related to Corr_norm(ch1, ch2).
For example, when ch1 and ch2 are paired, α(ch1) and α(ch2) may be calculated by formula (6):
α(ch1, ch2) = C + (1 - C) × (1 - Corr_norm(ch1, ch2)) / (1 - threshold)    (6)
Here, C is a constant with C ∈ [0, 1], threshold is the normalized pairing threshold of ch1 and ch2 with threshold ∈ [0, 1], Corr_norm(ch1, ch2) is the normalized correlation value of ch1 and ch2, and coeff(ch1, ch2) ∈ [0, 1]. In some embodiments, C may be 0.707, and threshold may be 0.2, 0.25, 0.28, and so on.
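Formula (6) translates directly into code. The sketch below uses C = 0.707 and threshold = 0.25 from the example values in the text as defaults; the function name `pair_weight` is hypothetical. It does not clamp its result: it assumes the caller passes a Corr_norm at or above the pairing threshold, which is what keeps the weight at or below 1.

```python
def pair_weight(corr_norm, c=0.707, threshold=0.25):
    """Weighting coefficient shared by both channels of a pair, formula (6).

    corr_norm -- normalized correlation of the two paired channels,
                 assumed to satisfy threshold <= corr_norm <= 1
    """
    return c + (1.0 - c) * (1.0 - corr_norm) / (1.0 - threshold)


# A fully correlated pair gets the smallest weight, C itself.
print(pair_weight(1.0))
# A pair that only just meets the pairing threshold gets weight 1.
print(pair_weight(0.25))
```

Between these two endpoints the weight falls linearly as the correlation rises, so more strongly correlated pairs contribute less to the weighted sum.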
The correlation value of two channels may be calculated by formula (7) below, taking ch1 and ch2 as an example.
Figure PCTCN2021106102-appb-000014
Here, Corr_norm(ch1, ch2) is the normalized correlation value of ch1 and ch2, spec_ch1(i) is a time-domain or frequency-domain coefficient of ch1, spec_ch2(i) is a time-domain or frequency-domain coefficient of ch2, and N is the number of coefficients of the current frame.
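The image for formula (7) is not rendered in this text, so the sketch below uses a common definition of normalized cross-correlation as a stand-in: the absolute inner product of the two coefficient vectors divided by the product of their Euclidean norms. This choice, and the name `corr_norm`, are assumptions and may differ from the original formula.

```python
import math


def corr_norm(spec_ch1, spec_ch2):
    """Normalized correlation of two equally long coefficient sequences.

    Assumed form: |sum(x*y)| / sqrt(sum(x*x) * sum(y*y)), which lies in [0, 1].
    """
    num = abs(sum(a * b for a, b in zip(spec_ch1, spec_ch2)))
    den = math.sqrt(sum(a * a for a in spec_ch1) * sum(b * b for b in spec_ch2))
    # A silent channel has no defined correlation; report 0 in that case.
    return num / den if den > 0 else 0.0


print(corr_norm([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))  # scaled copies
print(corr_norm([1.0, 0.0], [0.0, 1.0]))            # orthogonal signals
```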
For example, the L channel and the R channel form the first channel pair with normalized correlation value Corr_norm(L, R), and the LS channel and the RS channel form the second channel pair with normalized correlation value Corr_norm(LS, RS).
The correlation value of the two channels of any other channel pair can also be calculated by formula (7), and the weighting coefficient of the channels of that pair can also be calculated by formula (6).
Stereo processing reduces the energy/amplitude sum of the two channels that take part in it, and the size of the reduction depends on how similar the audio signals of the two channels are: the higher the correlation between the two audio signals, the more the energy/amplitude sum of the two channels is reduced by stereo processing.
Therefore, when the primary bit allocation uses the energy/amplitude before stereo processing, a weighting coefficient is introduced in the primary bit allocation. The weighting coefficient of two highly correlated channels is smaller than that of two weakly correlated channels. The weighting coefficient of an unpaired channel is greater than that of a paired channel. The two channels of the same pair have the same weighting coefficient. That is, the energy/amplitude sum can be determined in Manner 3, and the primary bit allocation is then completed through steps 1022 and 1023 below.
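Manner 3 can be sketched as a weighted sum over channels. The image for formula (5) is not rendered in this text; the sketch assumes it has the form of the sum over ch of α(ch) × E_pre(ch), which matches the description of α(ch) as a per-channel weighting coefficient. The name `weighted_energy_sum` and the dictionary layout are illustrative.

```python
def weighted_energy_sum(e_pre, weights):
    """Weighted pre-equalization energy/amplitude sum of the current frame.

    e_pre   -- dict: channel name -> E_pre(ch)
    weights -- dict: channel name -> alpha(ch) for paired channels only;
               unpaired channels are absent and default to a weight of 1
    """
    return sum(weights.get(ch, 1.0) * e for ch, e in e_pre.items())


e_pre = {"L": 4.0, "R": 4.0, "C": 2.0}
weights = {"L": 0.5, "R": 0.5}  # both channels of a pair share one weight
print(weighted_energy_sum(e_pre, weights))
```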
Step 1022: Determine the respective bit coefficients of the K channel pairs according to the respective energy/amplitude of the audio signals of the K channel pairs and the energy/amplitude sum of the current frame.
After the energy/amplitude sum is determined in Manner 1, Manner 2, or Manner 3, for P = 2×K, the respective bit coefficients of the K channel pairs may be determined according to the respective energy/amplitude of the audio signals of the K channel pairs and the energy/amplitude sum determined in step 1021.
After the energy/amplitude sum is determined in Manner 1, Manner 2, or Manner 3, for P = 2×K + Q, the respective bit coefficients of the K channel pairs may be determined according to the respective energy/amplitude of the audio signals of the K channel pairs and the energy/amplitude sum determined in step 1021, and the respective bit coefficients of the Q channels may be determined according to the respective energy/amplitude of the Q channels and the energy/amplitude sum determined in step 1021.
The bit coefficient of each of the K channel pairs may be the ratio of that pair's energy/amplitude to the energy/amplitude sum determined in step 1021. The energy/amplitude of a channel pair may be the sum of the energy/amplitude of its two channels. The bit coefficient of each of the Q unpaired channels is the ratio of that channel's energy/amplitude to the energy/amplitude sum determined in step 1021.
Step 1023: Determine the respective bit numbers of the K channel pairs according to the respective bit coefficients of the K channel pairs and the number of available bits.
For P = 2×K, the respective bit numbers of the K channel pairs may be determined according to the respective bit coefficients of the K channel pairs and the number of available bits.
For P = 2×K + Q, the respective bit numbers of the K channel pairs may be determined according to the respective bit coefficients of the K channel pairs and the number of available bits, and the respective bit numbers of the Q channels may be determined according to the respective bit coefficients of the Q channels and the number of available bits.
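Steps 1022 and 1023 can be sketched as two small functions. The function names, the dictionary layout, and the truncation to whole bits are assumptions; the text does not say how fractional bits or leftover bits are handled.

```python
def bit_coefficients(unit_energy, energy_sum):
    """Step 1022: share of the frame's energy/amplitude sum per unit.

    unit_energy -- dict: unit name -> energy/amplitude, where a unit is either
                   a channel pair (sum of its two channels) or an unpaired channel
    energy_sum  -- energy/amplitude sum of the current frame from step 1021
    """
    return {name: e / energy_sum for name, e in unit_energy.items()}


def bits_from_coefficients(coefs, bits_avail):
    """Step 1023: scale each bit coefficient by the number of available bits."""
    # Assumption: truncate to whole bits; leftover bits are not redistributed.
    return {name: int(r * bits_avail) for name, r in coefs.items()}


units = {"pair1": 4.0, "pair2": 2.0, "C": 2.0}
coefs = bit_coefficients(units, energy_sum=8.0)
print(bits_from_coefficients(coefs, bits_avail=1000))
```

A production allocator would likely also enforce per-unit minimums and hand out any bits lost to truncation, but those policies are outside what the text describes.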
In this embodiment, audio signals of P channels of a current frame of a multi-channel audio signal are obtained, where the audio signals of the P channels include audio signals of K channel pairs; the energy/amplitude sum of the current frame is determined according to the respective energy/amplitude of the audio signals of the P channels; the respective bit coefficients of the K channel pairs are determined according to the respective energy/amplitude of the audio signals of the K channel pairs and the energy/amplitude sum of the current frame; the respective bit numbers of the K channel pairs are determined according to their respective bit coefficients and the number of available bits; and the audio signals of the P channels are encoded according to the respective bit numbers of the K channel pairs to obtain an encoded bitstream. The energy/amplitude sum of the current frame is determined from at least one of the time-domain energy/amplitude, the energy/amplitude after time-frequency transformation, the energy/amplitude after time-frequency transformation and whitening, the energy/amplitude after energy/amplitude equalization, or the energy/amplitude after stereo processing of the audio signals of the P channels. Bit allocation for the channel pairs is then performed based on the share of each channel pair's energy/amplitude in that sum, which determines the respective bit numbers of the K channel pairs. Bits are thereby allocated reasonably among the channel pairs in multi-channel signal encoding, and the quality of the audio signal reconstructed at the decoding end is ensured. For example, when the energy/amplitude differs greatly between channel pairs, the method of this embodiment can resolve the shortage of coding bits for channel pairs with large energy/amplitude, thereby ensuring the quality of the audio signal reconstructed at the decoding end.
The following embodiment uses a 5.1-channel signal as an example to illustrate the multi-channel audio signal encoding method of the embodiments of this application.
FIG. 5 is a schematic diagram of the processing at the encoding end according to an embodiment of this application. As shown in FIG. 5, the encoding end may include a multi-channel encoding processing unit 401, a channel encoding unit 402, and a bitstream multiplexing interface 403. The encoding end may be the encoder described above.
The multi-channel encoding processing unit 401 performs multi-channel signal screening, pairing, stereo processing, and multi-channel side information generation on the input signal. In this embodiment, the input signal is a 5.1 signal (L channel, R channel, C channel, LFE channel, LS channel, and RS channel).
In one example, the multi-channel encoding processing unit 401 pairs the L channel signal with the R channel signal to form the first channel pair and obtains the mid channel signal M1 and the side channel signal S1 through stereo processing; it pairs the LS channel signal with the RS channel signal to form the second channel pair and obtains the mid channel signal M2 and the side channel signal S2 through stereo processing.
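The text does not specify which stereo processing produces the mid and side signals. As a purely illustrative assumption, the sketch below uses the simplest orthonormal sum/difference transform, M = (X1 + X2)/√2 and S = (X1 − X2)/√2; the encoder described here may use a different transform.

```python
import math


def mid_side(first, second):
    """Assumed mid/side transform of two equally long coefficient sequences."""
    k = 1.0 / math.sqrt(2.0)  # orthonormal scaling keeps total energy unchanged
    mid = [k * (a + b) for a, b in zip(first, second)]
    side = [k * (a - b) for a, b in zip(first, second)]
    return mid, side


# Two identical channels: all of the energy ends up in the mid channel.
m1, s1 = mid_side([1.0, 2.0], [1.0, 2.0])
print(m1, s1)
```

With highly correlated inputs the side channel carries little energy, which is why the later allocation within a pair can give it far fewer bits than the mid channel.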
Because the energy/amplitude differs considerably between the channels of a multi-channel signal, energy/amplitude equalization is applied to the channels before stereo processing to increase the gain of stereo processing, that is, to concentrate the energy/amplitude in the mid channel so that the channel encoding unit can encode more efficiently. This embodiment equalizes the two channels of each pair to obtain energy/amplitude equalization between channels. Assume that the energy/amplitude of the current frame of each input channel before energy/amplitude equalization is energy_L, energy_R, energy_C, energy_LS, and energy_RS, where energy_L, energy_R, energy_C, energy_LS, and energy_RS are the energy/amplitude of the L, R, C, LS, and RS channel signals, respectively, before energy/amplitude equalization.
After energy/amplitude equalization, the L channel and the R channel of the first channel pair both have the energy/amplitude energy_avg_LR, which may be calculated by formula (8):
energy_avg_LR = avg(energy_L, energy_R)    (8)
After energy/amplitude equalization, the LS channel and the RS channel of the second channel pair both have the energy/amplitude energy_avg_LSRS, which may be calculated by formula (9):
energy_avg_LSRS = avg(energy_LS, energy_RS)    (9)
Here, avg(a1, a2) returns the mean of its two input parameters a1 and a2. In formula (8), a1 is energy_L and a2 is energy_R; in formula (9), a1 is energy_LS and a2 is energy_RS.
The energy/amplitude energy(ch) of each channel before energy/amplitude equalization (including energy_L, energy_R, energy_C, energy_LS, and energy_RS) is calculated as follows:
Figure PCTCN2021106102-appb-000015
Here, sampleCoef(ch, i) is the i-th coefficient of the current frame of the channel with index ch, and N is the number of coefficients of the current frame. Different values of ch correspond to the L, R, C, LFE, LS, and RS channels.
In this embodiment, energy_L equals E_pre(L), energy_R equals E_pre(R), energy_LS equals E_pre(LS), energy_RS equals E_pre(RS), and energy_C equals E_pre(C). E_post(L) = E_post(R) = energy_avg_LR, and E_post(LS) = E_post(RS) = energy_avg_LSRS.
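Formulas (8) and (9) amount to replacing the energy of both channels of a pair with their mean while leaving unpaired channels untouched. A minimal sketch, with the function name `equalize_pairs` and the dictionary layout as assumptions:

```python
def equalize_pairs(energy, pairs):
    """Return per-channel energy/amplitude after pairwise equalization.

    energy -- dict: channel name -> energy/amplitude before equalization
    pairs  -- list of (channel, channel) tuples that were paired
    """
    equalized = dict(energy)  # unpaired channels keep their value
    for a, b in pairs:
        avg = (energy[a] + energy[b]) / 2.0  # avg(a1, a2) of formulas (8)/(9)
        equalized[a] = equalized[b] = avg
    return equalized


energy = {"L": 8.0, "R": 2.0, "C": 3.0, "LS": 1.0, "RS": 3.0}
print(equalize_pairs(energy, [("L", "R"), ("LS", "RS")]))
```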
The multi-channel encoding processing unit 401 outputs the stereo-processed M1, S1, M2, and S2 channel signals, the LFE and C channel signals that did not undergo stereo processing, and the multi-channel side information.
The channel encoding unit 402 encodes the stereo-processed M1, S1, M2, and S2 channel signals, the LFE and C channel signals that did not undergo stereo processing, and the multi-channel side information, and outputs the encoded channels E1 to E6. The channel encoding unit 402 may include multiple channel processing boxes, which allocate more bits to channels with larger energy/amplitude than to channels with smaller energy/amplitude. After performing quantization and entropy coding to remove redundancy at the encoding end, the channel encoding unit 402 sends the encoded channels E1 to E6 to the bitstream multiplexing interface 403.
The bitstream multiplexing interface 403 multiplexes the six encoded channels E1 to E6 into a serial bitstream (bitStream) so that the multi-channel audio signal can be transmitted over a channel or stored on a digital medium.
FIG. 6 is a schematic diagram of the processing of the channel encoding unit according to an embodiment of this application. As shown in FIG. 6, the channel encoding unit 402 may include a bit allocation unit 4021 and a quantization and entropy encoding unit 4023. This embodiment illustrates Manner 1 above.
The bit allocation unit 4021 performs the primary bit allocation and the secondary bit allocation of the foregoing embodiments to obtain the bit number of each channel.
For example, the bit allocation unit 4021 determines the stereo-processed energy/amplitude sum sum_E_post by formulas (1) and (2) above, and then determines the bit coefficient of each channel pair and of each unpaired channel by formulas (11) to (14) below. In this embodiment, the bit coefficient of the first channel pair is denoted Ratio(L, R), the bit coefficient of the second channel pair is denoted Ratio(LS, RS), the bit coefficient of the unpaired C channel is denoted Ratio(C), and the bit coefficient of the unpaired LFE channel is denoted Ratio(LFE).
Ratio(L,R) = (E_post(M1) + E_post(S1)) / sum_E_post        (11)
Ratio(LS,RS) = (E_post(M2) + E_post(S2)) / sum_E_post        (12)
Ratio(C) = E_post(C) / sum_E_post        (13)
Ratio(LFE) = E_post(LFE) / sum_E_post        (14)
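The bit coefficients of formulas (11) to (14) can be illustrated with a minimal Python sketch. The function name `bit_coefficients`, the dictionary layout, and the sample energies are hypothetical and chosen only for illustration. The sketch also assumes that sum_E_post is the sum of the post-stereo energies/amplitudes of all six channels, because formulas (1) and (2) are defined elsewhere in this application.

```python
def bit_coefficients(e_post, pairs):
    """Bit coefficient of each channel pair and each unpaired channel.

    e_post: dict mapping channel name -> energy/amplitude after stereo processing.
    pairs:  list of (channel_a, channel_b) tuples naming the paired channels.
    """
    # Assumption: sum_E_post is the plain sum over all channels of the frame.
    total = sum(e_post.values())
    if total <= 0:
        raise ValueError("total energy/amplitude must be positive")
    paired = {ch for pair in pairs for ch in pair}
    # Formulas (11) and (12): one coefficient per channel pair.
    ratios = {pair: (e_post[pair[0]] + e_post[pair[1]]) / total for pair in pairs}
    # Formulas (13) and (14): one coefficient per unpaired channel.
    for ch, energy in e_post.items():
        if ch not in paired:
            ratios[ch] = energy / total
    return ratios


# Illustrative values only; they are not taken from the application.
e_post = {"M1": 40.0, "S1": 10.0, "M2": 20.0, "S2": 10.0, "C": 15.0, "LFE": 5.0}
pairs = [("M1", "S1"), ("M2", "S2")]
ratios = bit_coefficients(e_post, pairs)
```

Under that assumption the coefficients sum to 1, so the later multiplication by bAvail distributes the whole bit budget.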
The bit allocation unit 4021 calculates the number of bits of each channel based on Ratio(L,R), Ratio(LS,RS), Ratio(C), Ratio(LFE), the number of available bits bAvail, the channel pair indexes pairIdx1 and pairIdx2, and the stereo-processed energy/amplitude E_post(ch) of each channel. The channel pair indexes pairIdx1 and pairIdx2 may be output by the multi-channel encoding processing unit 401. The channel pair index pairIdx1 indicates that the L channel and the R channel are paired, and the channel pair index pairIdx2 indicates that the LS channel and the RS channel are paired.
For example, the number of bits of each channel may be determined using formulas (15) to (22) below.
Bit allocation between channel pairs:
Bits(M1,S1) = bAvail × Ratio(L,R)        (15)
Bits(M2,S2) = bAvail × Ratio(LS,RS)        (16)
where Bits(M1,S1) is the number of bits of the first channel pair and Bits(M2,S2) is the number of bits of the second channel pair.
Bit allocation between the channels within a channel pair, and bit allocation for the unpaired channels:
The bits of each channel pair are divided between its two channels as follows:
Bits(M1) = Bits(M1,S1) × E_post(M1) / (E_post(M1) + E_post(S1))        (17)
Bits(S1) = Bits(M1,S1) × E_post(S1) / (E_post(M1) + E_post(S1))        (18)
Bits(M2) = Bits(M2,S2) × E_post(M2) / (E_post(M2) + E_post(S2))        (19)
Bits(S2) = Bits(M2,S2) × E_post(S2) / (E_post(M2) + E_post(S2))        (20)
where Bits(M1) is the number of bits of the M1 channel, Bits(S1) is the number of bits of the S1 channel, Bits(M2) is the number of bits of the M2 channel, and Bits(S2) is the number of bits of the S2 channel.
The bits of the unpaired channels are allocated as follows:
Bits(C) = bAvail × Ratio(C)        (21)
Bits(LFE) = bAvail × Ratio(LFE)        (22)
where Bits(C) is the number of bits of the C channel and Bits(LFE) is the number of bits of the LFE channel.
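The two-stage allocation of formulas (15) to (22) can be sketched as follows. This is a minimal illustration, not the encoder's implementation: the function name `allocate_bits`, the data layout, and the sample values are hypothetical, the results are left as real numbers because the text does not specify how they are rounded to integers, and the equal split used when a pair has zero energy is an assumption added only to avoid a division by zero.

```python
def allocate_bits(ratios, e_post, pairs, b_avail):
    """Per-channel bit counts from bit coefficients, following (15) to (22).

    ratios:  dict keyed by pair tuple or unpaired channel name -> bit coefficient.
    e_post:  dict mapping channel name -> energy/amplitude after stereo processing.
    pairs:   list of (channel_a, channel_b) tuples.
    b_avail: number of available bits (bAvail).
    """
    bits = {}
    for pair in pairs:
        a, b = pair
        pair_bits = b_avail * ratios[pair]            # formulas (15), (16)
        pair_energy = e_post[a] + e_post[b]
        if pair_energy > 0:
            bits[a] = pair_bits * e_post[a] / pair_energy   # formulas (17), (19)
            bits[b] = pair_bits * e_post[b] / pair_energy   # formulas (18), (20)
        else:
            # Assumption, not from the source: split evenly for a silent pair.
            bits[a] = bits[b] = pair_bits / 2
    for key, ratio in ratios.items():
        if key not in pairs:                          # unpaired channel
            bits[key] = b_avail * ratio               # formulas (21), (22)
    return bits


# Illustrative values only.
pairs = [("M1", "S1"), ("M2", "S2")]
e_post = {"M1": 40.0, "S1": 10.0, "M2": 20.0, "S2": 10.0, "C": 15.0, "LFE": 5.0}
ratios = {("M1", "S1"): 0.5, ("M2", "S2"): 0.3, "C": 0.15, "LFE": 0.05}
bits = allocate_bits(ratios, e_post, pairs, b_avail=1000)
```

Because the function takes the bit coefficients as input, the same second stage can be reused whichever set of formulas produced the coefficients.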
The quantization and entropy encoding unit 4023 quantizes and entropy-encodes the stereo-processed M1, S1, M2, and S2 channel signals, the C channel signal, the LFE channel signal, and the multi-channel side information according to the number of bits of each channel, to obtain the encoded channel signals E1-E6.
In this embodiment, energy/amplitude equalization is performed on the two channels of each channel pair, with the channel pair as the granularity. Because the energy/amplitude ratios between channel pairs differ before stereo processing, they also differ after stereo processing. Bits are then allocated between channel pairs according to the energy/amplitude ratio of each channel pair after stereo processing, and finally bits are allocated within each channel pair. In this way, the number of bits of each channel in multi-channel signal encoding is allocated reasonably, which ensures the quality of the audio signal reconstructed at the decoder side. For example, when the energy/amplitude difference between channel pairs is large, the method of the embodiments of this application avoids leaving the channel pair with the larger energy/amplitude short of encoding bits.
As an alternative to the specific implementation of energy/amplitude equalization by the multi-channel encoding processing unit 401 in the embodiment shown in FIG. 5, the embodiments of this application further provide another energy/amplitude equalization manner, which is described below using the foregoing 5.1-channel signal as an example.
After energy/amplitude equalization, the energy/amplitude of every channel is energy_avg, which may be determined using formula (23) below.
energy_avg = avg(energy_L, energy_R, energy_C, energy_LS, energy_RS)        (23)
where the function avg(a1, a2, ..., an) returns the mean of its n input parameters a1, a2, ..., an.
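Formula (23) amounts to a plain arithmetic mean over the five channels it lists. A minimal sketch follows, in which the function name `energy_avg`, the argument layout, and the sample energies are hypothetical. The LFE channel is left out of the mean only because formula (23) does not list it.

```python
def energy_avg(energies, channels=("L", "R", "C", "LS", "RS")):
    """Mean energy/amplitude over the channels listed in formula (23)."""
    return sum(energies[ch] for ch in channels) / len(channels)


# Illustrative values only.
energies = {"L": 80.0, "R": 60.0, "C": 40.0, "LS": 8.0, "RS": 6.0, "LFE": 6.0}
target = energy_avg(energies)  # (80 + 60 + 40 + 8 + 6) / 5
```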
FIG. 7 is a schematic diagram of the processing performed by the channel encoding unit according to an embodiment of this application. As shown in FIG. 7, the channel encoding unit 402 may include a bit allocation unit 4021, a quantization and entropy encoding unit 4023, and a bit calculation unit 4022. This embodiment is an example of manner 2 above.
The bit allocation unit 4021 is configured to perform the primary bit allocation and the secondary bit allocation described in the foregoing embodiments, to obtain the number of bits of each channel.
For example, the bit calculation unit 4022 determines the energy/amplitude sum before energy/amplitude equalization, sum_E_pre, using formulas (3) and (4) above. It then determines the bit coefficient of each channel pair and of each unpaired channel using formulas (24) to (27) below. In this embodiment, the bit coefficient of the first channel pair is denoted Ratio(L,R), the bit coefficient of the second channel pair is denoted Ratio(LS,RS), the bit coefficient of the unpaired C channel is denoted Ratio(C), and the bit coefficient of the unpaired LFE channel is denoted Ratio(LFE).
Ratio(L,R) = (E_pre(L) + E_pre(R)) / sum_E_pre        (24)
Ratio(LS,RS) = (E_pre(LS) + E_pre(RS)) / sum_E_pre        (25)
Ratio(C) = E_pre(C) / sum_E_pre        (26)
Ratio(LFE) = E_pre(LFE) / sum_E_pre        (27)
The bit allocation unit 4021 calculates the number of bits of each channel based on Ratio(L,R), Ratio(LS,RS), Ratio(C), Ratio(LFE), the number of available bits bAvail, the channel pair indexes pairIdx1 and pairIdx2, and the stereo-processed energy/amplitude E_post(ch) of each channel. The channel pair indexes pairIdx1 and pairIdx2 may be output by the multi-channel encoding processing unit 401. The channel pair index pairIdx1 indicates that the L channel and the R channel are paired, and the channel pair index pairIdx2 indicates that the LS channel and the RS channel are paired.
For example, based on the bit coefficients determined using formulas (24) to (27), the number of bits of each channel may then be determined using formulas (15) to (22) above.
The quantization and entropy encoding unit 4023 quantizes and entropy-encodes the stereo-processed M1, S1, M2, and S2 channel signals, the C channel signal, the LFE channel signal, and the multi-channel side information according to the number of bits of each channel, to obtain the encoded channel signals E1-E6.
In this embodiment, stereo processing is performed after energy/amplitude equalization has been applied to all channels. Although the energy/amplitude ratios of the channels are therefore similar after stereo processing, this embodiment allocates bits between channel pairs according to the energy/amplitude ratio of each channel pair before stereo processing, and then allocates bits within each channel pair according to the energy/amplitude after stereo processing. Because the energy/amplitude ratios of the channel pairs differ before stereo processing, using them to guide the bit allocation between channel pairs allocates the number of bits of each channel in multi-channel signal encoding reasonably, which ensures the quality of the audio signal reconstructed at the decoder side. For example, when the energy/amplitude difference between channel pairs is large, the method of the embodiments of this application avoids leaving the channel pair with the larger energy/amplitude short of encoding bits.
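The reasoning above can be checked numerically with a small sketch. All names and values here are hypothetical. The sketch assumes that, after global equalization, every equalized channel has the same energy/amplitude (the mean of formula (23)), that stereo processing leaves these values approximately unchanged, and that the energy/amplitude sum covers all six channels.

```python
def pair_share(energy, pair):
    """Share of the frame's total energy/amplitude held by one channel pair."""
    return (energy[pair[0]] + energy[pair[1]]) / sum(energy.values())


# Illustrative pre-equalization energies: the front pair is far louder.
e_pre = {"L": 80.0, "R": 60.0, "LS": 8.0, "RS": 6.0, "C": 40.0, "LFE": 6.0}

# Assumed result of global equalization: the five channels of formula (23)
# all take their mean, and the LFE channel is left as it is.
equalized = ("L", "R", "C", "LS", "RS")
mean = sum(e_pre[ch] for ch in equalized) / len(equalized)
e_eq = {ch: (mean if ch in equalized else e) for ch, e in e_pre.items()}

share_front_pre = pair_share(e_pre, ("L", "R"))    # formula (24)
share_rear_pre = pair_share(e_pre, ("LS", "RS"))   # formula (25)
share_front_eq = pair_share(e_eq, ("L", "R"))
share_rear_eq = pair_share(e_eq, ("LS", "RS"))
```

With the equalized energies both pairs receive the same share, so a post-equalization ratio would no longer reflect that the front pair carries ten times the energy of the rear pair. The pre-equalization ratios of formulas (24) and (25) preserve that difference.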
In some embodiments, the channel encoding unit 402, which may include the bit allocation unit 4021, the quantization and entropy encoding unit 4023, and the bit calculation unit 4022, may also be used to implement the functions of the steps of manner 3 above.
The bit allocation unit 4021 is configured to perform the primary bit allocation and the secondary bit allocation described in the foregoing embodiments, to obtain the number of bits of each channel.
For example, the bit allocation unit 4021 determines the energy/amplitude sum before energy/amplitude equalization, sum_E_pre, using formulas (5) to (7) above. It then determines the bit coefficient of each channel pair and of each unpaired channel using formulas (28) to (31) below. In this embodiment, the bit coefficient of the first channel pair is denoted Ratio(L,R), the bit coefficient of the second channel pair is denoted Ratio(LS,RS), the bit coefficient of the unpaired C channel is denoted Ratio(C), and the bit coefficient of the unpaired LFE channel is denoted Ratio(LFE).
Ratio(L,R) = (α(L) × E_pre(L) + α(R) × E_pre(R)) / sum_E_pre        (28)
Ratio(LS,RS) = (α(LS) × E_pre(LS) + α(RS) × E_pre(RS)) / sum_E_pre        (29)
Ratio(C) = α(C) × E_pre(C) / sum_E_pre        (30)
Ratio(LFE) = α(LFE) × E_pre(LFE) / sum_E_pre        (31)
where α(L), α(R), α(LS), α(RS), α(C), and α(LFE) are the weighting coefficients of the L, R, LS, RS, C, and LFE channels, respectively.
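A minimal sketch of the weighted bit coefficients of formulas (28) to (31) follows. The function name, the data layout, and the sample weights are hypothetical. The sketch assumes that sum_E_pre in this manner is the weighted sum of the pre-equalization energies/amplitudes, so that the coefficients sum to 1; the actual definition is given by formulas (5) to (7) elsewhere in this application and may differ.

```python
def weighted_bit_coefficients(e_pre, alpha, pairs):
    """Weighted bit coefficients per channel pair and per unpaired channel.

    e_pre: dict mapping channel name -> energy/amplitude before equalization.
    alpha: dict mapping channel name -> weighting coefficient.
    pairs: list of (channel_a, channel_b) tuples.
    """
    weighted = {ch: alpha[ch] * e for ch, e in e_pre.items()}
    # Assumption: sum_E_pre is the sum of the weighted energies/amplitudes.
    total = sum(weighted.values())
    if total <= 0:
        raise ValueError("weighted energy/amplitude sum must be positive")
    paired = {ch for pair in pairs for ch in pair}
    # Formulas (28) and (29).
    ratios = {pair: (weighted[pair[0]] + weighted[pair[1]]) / total for pair in pairs}
    # Formulas (30) and (31).
    for ch, value in weighted.items():
        if ch not in paired:
            ratios[ch] = value / total
    return ratios


# Illustrative values only: the LFE channel is weighted down by one half.
e_pre = {"L": 80.0, "R": 60.0, "LS": 8.0, "RS": 6.0, "C": 40.0, "LFE": 12.0}
alpha = {"L": 1.0, "R": 1.0, "LS": 1.0, "RS": 1.0, "C": 1.0, "LFE": 0.5}
ratios = weighted_bit_coefficients(e_pre, alpha, [("L", "R"), ("LS", "RS")])
```

Lowering a channel's weight shrinks its share of bAvail and hands the difference to the other channels in proportion to their weighted energies.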
For example, based on the bit coefficients determined using formulas (28) to (31), the number of bits of each channel may then be determined using formulas (15) to (22) above.
The quantization and entropy encoding unit 4023 quantizes and entropy-encodes the stereo-processed M1, S1, M2, and S2 channel signals, the C channel signal, the LFE channel signal, and the multi-channel side information according to the number of bits of each channel, to obtain the encoded channel signals E1-E6.
In this embodiment, the bit allocation is adjusted through the weighting coefficients, so that the number of bits of each channel in multi-channel signal encoding is allocated reasonably, which ensures the quality of the audio signal reconstructed at the decoder side.
FIG. 8 is a flowchart of another multi-channel audio signal encoding method according to an embodiment of this application. This embodiment may be executed by the foregoing encoder. As shown in FIG. 8, the method of this embodiment may include the following steps.
Step 501: Obtain audio signals of P channels of a current frame of a multi-channel audio signal, where P is a positive integer greater than 1, and the audio signals of the P channels include audio signals of K channel pairs.
The audio signal of one channel pair includes the audio signals of two channels.
A channel pair in this embodiment may be any one of the K channel pairs. The audio signals of two coupled (paired) channels constitute the audio signal of one channel pair.
In some embodiments, P=2K. After screening, pairing, stereo processing, and multi-channel side information generation are performed on the multi-channel signal, the audio signals of the P channels, that is, the audio signals of the K channel pairs, can be obtained.
In some embodiments, the audio signals of the P channels further include audio signals of Q unpaired channels, where P=2×K+Q, K is a positive integer, and Q is a positive integer.
For a detailed explanation of step 501, refer to step 101 of the embodiment shown in FIG. 2. Details are not repeated here.
Step 502: Perform energy/amplitude equalization on the audio signals of the two channels of a current channel pair among the K channel pairs according to the respective energy/amplitude of the audio signals of the two channels, to obtain the respective equalized energy/amplitude of the audio signals of the two channels of the current channel pair.
In this embodiment, energy/amplitude equalization is performed per channel pair, that is, each channel pair is equalized internally and independently of the other pairs. Taking the current channel pair among the K channel pairs as an example, the audio signals of its two channels are equalized according to their respective energy/amplitude, and the equalized energy/amplitude of the two channels is obtained.
Regardless of whether P=2K or P=2×K+Q, energy/amplitude equalization may be performed within a channel pair in the manner of step 502, to obtain the respective equalized energy/amplitude of the two channels of the current channel pair.
For example, the equalized energy/amplitude of the two channels of the current channel pair may be determined using formula (8) above, with L and R in formula (8) replaced by the two channels of the current channel pair.
Step 503: Determine the respective number of bits of the two channels of the current channel pair according to the respective equalized energy/amplitude of the audio signals of the two channels and the number of available bits.
Taking the current channel pair among the K channel pairs as an example, the respective number of bits of its two channels is determined according to the respective equalized energy/amplitude of the two channels and the number of available bits. The current channel pair may be any one of the K channel pairs.
For P=2×K, the method of this embodiment may determine the energy/amplitude sum of the current frame according to the equalized energy/amplitude of the audio signals of the two channels of each of the K channel pairs. The respective number of bits of the two channels of the current channel pair is then determined according to the energy/amplitude sum of the current frame, the respective equalized energy/amplitude of the audio signals of the two channels, and the number of available bits.
For example, the respective number of bits of the two channels of the current channel pair is determined according to the proportion of the respective equalized energy/amplitude of the audio signals of the two channels in the energy/amplitude sum, and the number of available bits.
For P=2×K+Q, the method of this embodiment may determine the energy/amplitude sum of the current frame according to the equalized energy/amplitude of the audio signals of the two channels of each of the K channel pairs and the equalized energy/amplitude of the audio signals of the Q channels. The respective number of bits of the two channels of the current channel pair is determined according to the energy/amplitude sum, the respective energy/amplitude of the audio signals of the two channels, and the number of available bits. The respective number of bits of the Q channels is determined according to the energy/amplitude sum, the respective equalized energy/amplitude of the audio signals of the Q channels, and the number of available bits.
For example, the respective number of bits of the two channels of the current channel pair is determined according to the proportion of the respective energy/amplitude of the audio signals of the two channels in the energy/amplitude sum, and the number of available bits. The respective number of bits of the Q channels is determined according to the proportion of the respective equalized energy/amplitude of the audio signals of the Q channels in the energy/amplitude sum, and the number of available bits.
The equalized energy/amplitude of the audio signal of each of the Q channels may be equal to its energy/amplitude before equalization, and approximately equal to its energy/amplitude after stereo processing. The equalized energy/amplitude of the audio signals of the two channels of each of the K channel pairs may be approximately equal to the energy/amplitude of those audio signals after stereo processing.
For example, the energy/amplitude sum may be determined using formula (1) above, with the stereo-processed energy/amplitude in formula (1) replaced by the equalized energy/amplitude of each channel in this embodiment.
Step 504: Encode the audio signals of the two channels of the current channel pair according to the respective number of bits of the two channels, to obtain an encoded bitstream.
Encoding the audio signals of the two channels of the current channel pair may include performing quantization, entropy encoding, and bitstream multiplexing on the audio signals of the two channels, to obtain the encoded bitstream.
For P=2K, quantization, entropy encoding, and bitstream multiplexing are performed on the audio signals of the P channels according to the respective number of bits of the K channel pairs, to obtain the encoded bitstream.
For P=2×K+Q, quantization, entropy encoding, and bitstream multiplexing are performed on the audio signals of the K channel pairs according to the respective number of bits of the K channel pairs, and on the audio signals of the Q channels according to the respective number of bits of the Q channels, to obtain the encoded bitstream.
In this embodiment, audio signals of P channels of a current frame of a multi-channel audio signal are obtained, where the audio signals of the P channels include audio signals of K channel pairs. Energy/amplitude equalization is performed on the audio signals of the two channels of a current channel pair among the K channel pairs according to their respective energy/amplitude, to obtain the equalized energy/amplitude of the two channels. The respective number of bits of the two channels is determined according to their equalized energy/amplitude and the number of available bits, and the audio signals of the two channels are encoded according to their respective number of bits to obtain an encoded bitstream. Because equalization is performed within each channel pair and bits are allocated based on the equalized energy/amplitude, the number of bits of each channel in multi-channel signal encoding is allocated reasonably, which ensures the quality of the audio signal reconstructed at the decoder side. For example, when the energy/amplitude difference between channel pairs is large, the method of the embodiments of this application avoids leaving the channel pair with the larger energy/amplitude short of encoding bits.
The embodiment shown in FIG. 8 is explained below using the embodiments shown in FIG. 5 and FIG. 6 as an example.
The multi-channel encoding processing unit 401 of the embodiment shown in FIG. 5 may perform step 501 and step 502 of the embodiment shown in FIG. 8, and the channel encoding unit 402 may perform step 503 of the embodiment shown in FIG. 8. When the channel encoding unit 402 performs step 503, the difference from the embodiments shown in FIG. 5 and FIG. 6 is that the bit allocation unit 4021 may determine the number of bits of each channel in the following manner.
The bit allocation unit 4021 of this embodiment may allocate bits according to the equalized energy/amplitude of each of the P channels. Specifically, the number of bits may be determined using formulas (32) to (37) below.
Bits(M1) = bAvail × E_post(M1) / sum_E_post        (32)
Bits(S1) = bAvail × E_post(S1) / sum_E_post        (33)
Bits(M2) = bAvail × E_post(M2) / sum_E_post        (34)
Bits(S2) = bAvail × E_post(S2) / sum_E_post        (35)
Bits(C) = bAvail × E_post(C) / sum_E_post        (36)
Bits(LFE) = bAvail × E_post(LFE) / sum_E_post        (37)
When formulas (32) to (37) are used for bit allocation, the multi-channel encoding processing unit 401 needs to use the per-pair energy/amplitude equalization manner, that is, energy/amplitude equalization within each channel pair. Here, sum_E_post may be determined using formula (1) above.
Let E(L,R) denote the energy/amplitude sum of the L channel and the R channel before energy/amplitude equalization. After energy/amplitude equalization, this sum is unchanged and is still E(L,R). After stereo processing of the L channel and the R channel, the energy/amplitude sum becomes E_post(M1,S1). Because stereo processing only slightly reduces the redundancy between the L channel and the R channel, E_post(M1,S1) ≈ E(L,R). Therefore, when E(L,R) is much greater than (>>) the energy/amplitude sum E(LS,RS) of the LS channel and the RS channel, the processing by the multi-channel encoding processing unit 401 and by the bit allocation unit 4021 of this embodiment makes Bits(M1)+Bits(S1), the bits allocated to the pair with energy/amplitude sum E(L,R), much greater than Bits(M2)+Bits(S2). Bits are thereby allocated between channel pairs according to energy/amplitude, as the following derivation shows.
Bits(M1) + Bits(S1) = bAvail × E_post(M1) / sum_E_post + bAvail × E_post(S1) / sum_E_post
= bAvail × E_post(M1,S1) / sum_E_post
>> bAvail × E_post(M2,S2) / sum_E_post
= Bits(M2) + Bits(S2)
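The derivation can be checked with a short numerical sketch of the single-stage allocation of formulas (32) to (37). The function name `allocate_bits_flat` and the sample values are hypothetical, sum_E_post is assumed to be the sum over all six channels, and rounding to integer bit counts is not modeled.

```python
def allocate_bits_flat(e_post, b_avail):
    """Single-stage allocation: each channel's bits follow its energy share."""
    total = sum(e_post.values())  # assumed definition of sum_E_post
    if total <= 0:
        raise ValueError("total energy/amplitude must be positive")
    return {ch: b_avail * energy / total for ch, energy in e_post.items()}


# Illustrative values only: E_post(M1,S1) = 800 and E_post(M2,S2) = 100.
e_post = {"M1": 600.0, "S1": 200.0, "M2": 60.0, "S2": 40.0, "C": 80.0, "LFE": 20.0}
bits = allocate_bits_flat(e_post, b_avail=2000)

front_pair_bits = bits["M1"] + bits["S1"]
rear_pair_bits = bits["M2"] + bits["S2"]
```

In this example the front pair holds eight times the energy of the rear pair and also receives eight times the bits, which is the pair-level proportionality stated by the derivation, even though no explicit pair-level step is performed.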
本实施例,通过声道对内的能量/幅度均衡,基于能量/幅度均衡后的能量/幅度进行比特分配,从而实现合理分配多声道信号编码中各个声道的比特数,以保证解码端重建音频信号的质量。例如,对于声道对间能量/幅度差异较大的情况,通过本申请实施例的方法,可以解决能量/幅度大的声道对信号的编码比特不足的问题,以保证解码端重建音频信号的质量。In this embodiment, through the energy/amplitude equalization in the channel pair, bit allocation is performed based on the energy/amplitude after energy/amplitude equalization, so as to realize the reasonable distribution of the number of bits of each channel in the multi-channel signal encoding, so as to ensure the decoding end Reconstruct the quality of the audio signal. For example, in the case of a large difference in energy/amplitude between channel pairs, the method of the embodiment of the present application can solve the problem of insufficient coding bits of the channel pair signal with large energy/amplitude, so as to ensure the reconstruction of the audio signal at the decoding end. quality.
Based on the same inventive concept as the foregoing method, an embodiment of this application further provides an audio signal encoding apparatus, which can be applied to an audio encoder.
FIG. 9 is a schematic structural diagram of an audio signal encoding apparatus according to an embodiment of this application. As shown in FIG. 9, the audio signal encoding apparatus 700 includes an acquisition module 701, a bit allocation module 702, and an encoding module 703.
The acquisition module 701 is configured to obtain the audio signals of P channels of the current frame of a multi-channel audio signal and the respective energy/amplitude of the audio signals of the P channels, where P is a positive integer greater than 1, the audio signals of the P channels include the audio signals of K channel pairs, and K is a positive integer.
The bit allocation module 702 is configured to determine the respective numbers of bits of the K channel pairs according to the respective energy/amplitude of the audio signals of the P channels and the number of available bits.
The encoding module 703 is configured to encode the audio signals of the P channels according to the respective numbers of bits of the K channel pairs, to obtain an encoded bitstream.
The energy/amplitude of the audio signal of one of the P channels includes at least one of the following: the energy/amplitude of the audio signal of that channel in the time domain, the energy/amplitude of the audio signal of that channel after time-frequency transform, the energy/amplitude of the audio signal of that channel after time-frequency transform and whitening, the energy/amplitude of the audio signal of that channel after energy/amplitude equalization, or the energy/amplitude of the audio signal of that channel after stereo processing.
In some embodiments, the encoding module 703 is configured to: determine the respective numbers of bits of the two channels in a current channel pair of the K channel pairs according to the number of bits of the current channel pair and the respective stereo-processed energy/amplitude of the audio signals of the two channels in the current channel pair; and encode the audio signals of the two channels according to the respective numbers of bits of the two channels in the current channel pair.
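The split inside one channel pair can be sketched as follows. This is a minimal Python illustration that assumes the pair's bits are divided in proportion to the stereo-processed energy/amplitude of its two channels; the name `split_pair_bits`, the rounding rule, and the even split for a silent pair are assumptions of the sketch, not details stated in the application.

```python
def split_pair_bits(pair_bits, e_post_a, e_post_b):
    """Divide the bits of one channel pair between its two channels in
    proportion to their stereo-processed energy/amplitude (hypothetical)."""
    total = e_post_a + e_post_b
    if total <= 0:
        # Silent pair: split evenly (assumption of this sketch).
        bits_a = pair_bits // 2
    else:
        # Round the first share down; the second channel gets the remainder,
        # so the two shares always add up to pair_bits.
        bits_a = int(pair_bits * e_post_a / total)
    return bits_a, pair_bits - bits_a


bits_m1, bits_s1 = split_pair_bits(1200, e_post_a=90.0, e_post_b=30.0)
print(bits_m1, bits_s1)
```

Giving the remainder to the second channel keeps the pair's total exact, which avoids losing bits to rounding.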
In some embodiments, the bit allocation module 702 is configured to: determine the energy/amplitude sum of the current frame according to the respective energy/amplitude of the audio signals of the P channels; determine the respective bit coefficients of the K channel pairs according to the respective energy/amplitude of the audio signals of the K channel pairs and the energy/amplitude sum of the current frame; and determine the respective numbers of bits of the K channel pairs according to the respective bit coefficients of the K channel pairs and the number of available bits.
In some embodiments, the bit allocation module 702 is configured to determine the energy/amplitude sum of the current frame according to the respective stereo-processed energy/amplitude of the audio signals of the P channels.
In some embodiments, the bit allocation module 702 is configured to calculate the energy/amplitude sum sum_E_post of the current frame according to the formula
Figure PCTCN2021106102-appb-000016
where
Figure PCTCN2021106102-appb-000017
and where ch denotes the channel index, E_post(ch) denotes the stereo-processed energy/amplitude of the audio signal of the channel whose index is ch, sampleCoef_post(ch, i) denotes the i-th coefficient of the current frame of channel ch after stereo processing, N denotes the number of coefficients of the current frame, and N is a positive integer greater than 1.
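The two formulas referenced above are available only as figures, so their exact form cannot be reproduced here. The following Python sketch shows one plausible reading under a stated assumption: the per-channel measure is the square root of the sum of squared coefficients, and the frame sum adds this measure over all channels. The function names and sample coefficients are hypothetical.

```python
import math


def channel_energy(sample_coef_post):
    """Energy/amplitude measure of one channel's frame.
    ASSUMPTION: square root of the sum of squared coefficients; the
    application's own formula is given only as a figure."""
    return math.sqrt(sum(c * c for c in sample_coef_post))


def frame_energy_sum(frame):
    """Sum the per-channel measure over all channels of the frame."""
    return sum(channel_energy(coefs) for coefs in frame.values())


# Illustrative stereo-processed coefficients for two channels (N = 2).
frame = {"M1": [3.0, 4.0], "S1": [6.0, 8.0]}
sum_e_post = frame_energy_sum(frame)
print(sum_e_post)
```

If the intended measure is energy rather than amplitude, the square root would be dropped; the proportional allocation logic that consumes the sum is unchanged either way.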
In some embodiments, the bit allocation module 702 is configured to determine the energy/amplitude sum of the current frame according to the respective energy/amplitude, before energy/amplitude equalization, of the audio signals of the P channels.
In some embodiments, the bit allocation module 702 is configured to calculate the energy/amplitude sum sum_E_pre of the current frame according to the formula
Figure PCTCN2021106102-appb-000018
where ch denotes the channel index, and E_pre(ch) denotes the energy/amplitude, before energy/amplitude equalization, of the audio signal of the channel whose index is ch.
In some embodiments, the bit allocation module 702 is configured to determine the energy/amplitude sum of the current frame according to the respective energy/amplitude, before energy/amplitude equalization, of the audio signals of the P channels and the respective weighting coefficients of the P channels, where each weighting coefficient is less than or equal to 1.
In some embodiments, the bit allocation module 702 is configured to calculate the energy/amplitude sum sum_E_pre of the current frame according to the formula
Figure PCTCN2021106102-appb-000019
where α(ch) is the weighting coefficient of channel ch, the two channels of a channel pair have the same weighting coefficient, and the weighting coefficient of the two channels of a channel pair is inversely proportional to the normalized correlation value between the two channels.
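The weighted sum can be sketched as follows. Because the formula itself is available only as a figure, this Python example assumes the sum is Σ α(ch) × E_pre(ch). The mapping from normalized correlation to weight, α = 1 / (1 + corr), is a hypothetical choice that merely satisfies the stated constraints (same weight for both channels of a pair, at most 1, decreasing as correlation grows); it is not the application's formula.

```python
def pair_weight(norm_corr):
    """Hypothetical weight for both channels of a pair: decreases as the
    normalized correlation grows and never exceeds 1."""
    if not 0.0 <= norm_corr <= 1.0:
        raise ValueError("normalized correlation must lie in [0, 1]")
    return 1.0 / (1.0 + norm_corr)


def weighted_energy_sum(e_pre, pairs, norm_corrs):
    """Assumed form of the weighted frame sum: sum of alpha(ch) * E_pre(ch)."""
    # Channels that belong to no pair keep weight 1 (assumption).
    alpha = {ch: 1.0 for ch in e_pre}
    for (ch_a, ch_b), corr in zip(pairs, norm_corrs):
        alpha[ch_a] = alpha[ch_b] = pair_weight(corr)
    return sum(alpha[ch] * e for ch, e in e_pre.items())


e_pre = {"L": 8.0, "R": 8.0, "LS": 2.0, "RS": 2.0}
pairs = [("L", "R"), ("LS", "RS")]
sum_e_pre = weighted_energy_sum(e_pre, pairs, norm_corrs=[1.0, 0.0])
print(sum_e_pre)
```

The intuition is that a highly correlated pair loses more energy to redundancy removal in stereo processing, so its pre-equalization energy is discounted when the frame sum is formed.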
In some embodiments, the audio signals of the P channels further include the audio signals of Q unpaired channels, where P = 2×K + Q, K is a positive integer, and Q is a positive integer. The bit allocation module 702 is configured to determine the respective numbers of bits of the K channel pairs and the respective numbers of bits of the Q channels according to the respective energy/amplitude of the audio signals of the P channels and the number of available bits. The encoding module 703 is configured to encode the audio signals of the K channel pairs according to the respective numbers of bits of the K channel pairs, and to encode the audio signals of the Q channels according to the respective numbers of bits of the Q channels.
In some embodiments, the bit allocation module 702 is configured to: determine the energy/amplitude sum of the current frame according to the respective energy/amplitude of the audio signals of the P channels; determine the respective bit coefficients of the K channel pairs according to the respective energy/amplitude of the audio signals of the K channel pairs and the energy/amplitude sum of the current frame; determine the respective bit coefficients of the Q channels according to the respective energy/amplitude of the audio signals of the Q channels and the energy/amplitude sum of the current frame; determine the respective numbers of bits of the K channel pairs according to the respective bit coefficients of the K channel pairs and the number of available bits; and determine the respective numbers of bits of the Q channels according to the respective bit coefficients of the Q channels and the number of available bits.
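The steps above (frame sum, bit coefficients, bit numbers) can be sketched for a frame with K channel pairs and Q unpaired channels. This minimal Python example assumes that a bit coefficient is the ratio of a pair's or channel's energy/amplitude to the frame sum and that the bit number is the coefficient multiplied by the number of available bits, rounded down. The function name, dictionary layout, and sample values are hypothetical.

```python
def allocate_frame_bits(pair_energy, mono_energy, b_avail):
    """Allocate bits to K channel pairs and Q unpaired channels.

    pair_energy: energy/amplitude of each channel pair (both channels summed)
    mono_energy: energy/amplitude of each unpaired channel
    """
    frame_sum = sum(pair_energy.values()) + sum(mono_energy.values())
    if frame_sum <= 0:
        raise ValueError("frame energy/amplitude sum must be positive")

    # Bit coefficient = share of the frame sum (assumed definition).
    pair_coef = {name: e / frame_sum for name, e in pair_energy.items()}
    mono_coef = {name: e / frame_sum for name, e in mono_energy.items()}

    # Bit number = coefficient x available bits, rounded down (assumption).
    pair_bits = {name: int(c * b_avail) for name, c in pair_coef.items()}
    mono_bits = {name: int(c * b_avail) for name, c in mono_coef.items()}
    return pair_bits, mono_bits


# Example layout: K = 2 pairs (L/R, LS/RS) and Q = 1 unpaired channel (C).
pair_bits, mono_bits = allocate_frame_bits(
    pair_energy={"L/R": 120.0, "LS/RS": 8.0},
    mono_energy={"C": 32.0},
    b_avail=1600,
)
print(pair_bits, mono_bits)
```

A production encoder would also need to redistribute bits lost to rounding and enforce per-channel minimums; both are outside the scope of this sketch.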
In some embodiments, the apparatus may further include an energy/amplitude equalization module 704. The energy/amplitude equalization module 704 is configured to obtain energy/amplitude-equalized audio signals of the P channels according to the audio signals of the P channels. The aforementioned energy/amplitude of the audio signal of one channel after energy/amplitude equalization is obtained from the energy/amplitude-equalized audio signal of that channel.
The encoding module 703 is configured to encode the energy/amplitude-equalized audio signals of the P channels according to the respective numbers of bits of the K channel pairs.
It should be noted that the acquisition module 701, the bit allocation module 702, and the encoding module 703 can be applied to the audio signal encoding process at the encoding end.
It should also be noted that, for the specific implementation of the acquisition module 701, the bit allocation module 702, and the encoding module 703, reference may be made to the detailed description of the foregoing method embodiments; for brevity, details are not repeated here.
An embodiment of this application further provides another audio signal encoding apparatus. This apparatus may adopt the structure shown in FIG. 9 and is configured to perform the method of the embodiment shown in FIG. 8.
In some embodiments, the functions of the modules differ from those of the modules in the embodiment shown in FIG. 9. In this embodiment, the acquisition module 701 is configured to obtain the audio signals of P channels of the current frame of a multi-channel audio signal, where P is a positive integer greater than 1, the audio signals of the P channels include the audio signals of K channel pairs, and K is a positive integer.
The energy/amplitude equalization module 704 is configured to perform energy/amplitude equalization on the audio signals of the two channels of a current channel pair of the K channel pairs according to the respective energy/amplitude of the audio signals of the two channels, to obtain the respective energy/amplitude of the audio signals of the two channels of the current channel pair after energy/amplitude equalization.
The bit allocation module 702 is configured to determine the respective numbers of bits of the two channels of the current channel pair according to the respective energy/amplitude, after energy/amplitude equalization, of the audio signals of the two channels of the current channel pair and the number of available bits.
The encoding module 703 is configured to encode the audio signals of the two channels according to the respective numbers of bits of the two channels of the current channel pair, to obtain an encoded bitstream.
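The equalization performed by module 704 is not spelled out in this passage. The following Python sketch shows one way to equalize a channel pair that keeps the pair's energy/amplitude sum unchanged: both channels are scaled to the average of their two amplitudes. The amplitude measure (square root of the sum of squares), the averaging target, and all names are assumptions made for illustration.

```python
import math


def amplitude(samples):
    """Amplitude measure of one channel (assumed: sqrt of sum of squares)."""
    return math.sqrt(sum(v * v for v in samples))


def equalize_pair(left, right):
    """Scale both channels of a pair to their average amplitude, so the
    pair's amplitude sum is preserved (hypothetical equalization rule)."""
    a_left, a_right = amplitude(left), amplitude(right)
    target = 0.5 * (a_left + a_right)

    def scale(samples, a):
        if a == 0.0:
            # A silent channel cannot be scaled up; leave it untouched.
            return list(samples)
        gain = target / a
        return [gain * v for v in samples]

    return scale(left, a_left), scale(right, a_right)


left, right = [3.0, 4.0], [0.0, 1.0]      # amplitudes 5 and 1
eq_left, eq_right = equalize_pair(left, right)
print(amplitude(eq_left), amplitude(eq_right))
```

After equalization both channels have amplitude 3 in this example, and the pair's sum stays at 6. A decoder would need the applied gains to undo the scaling, which this sketch does not cover.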
In some embodiments, the bit allocation module 702 is configured to: determine the energy/amplitude sum of the current frame according to the respective energy/amplitude, after energy/amplitude equalization, of the audio signals of the P channels; and determine the respective numbers of bits of the two channels of the current channel pair according to the energy/amplitude sum of the current frame, the respective energy/amplitude, after energy/amplitude equalization, of the audio signals of the two channels of the current channel pair, and the number of available bits.
In some embodiments, the audio signals of the P channels further include the audio signals of Q unpaired channels, where P = 2×K + Q, K is a positive integer, and Q is a positive integer.
The bit allocation module 702 is configured to: determine the energy/amplitude sum of the current frame according to the energy/amplitude, after energy/amplitude equalization, of the audio signals of the two channels of each of the K channel pairs and the energy/amplitude, after energy/amplitude equalization, of the audio signals of the Q channels; determine the respective numbers of bits of the two channels of the current channel pair according to the energy/amplitude sum of the current frame, the respective energy/amplitude of the audio signals of the two channels of the current channel pair, and the number of available bits; and determine the respective numbers of bits of the Q channels according to the energy/amplitude sum of the current frame, the respective energy/amplitude, after energy/amplitude equalization, of the audio signals of the Q channels, and the number of available bits.
The encoding module 703 is configured to encode the audio signals of the K channel pairs according to the respective numbers of bits of the K channel pairs, and to encode the audio signals of the Q channels according to the respective numbers of bits of the Q channels, to obtain an encoded bitstream.
It should be noted that the acquisition module 701, the bit allocation module 702, the energy/amplitude equalization module 704, and the encoding module 703 can be applied to the audio signal encoding process at the encoding end.
It should also be noted that, for the specific implementation of the acquisition module 701, the bit allocation module 702, the energy/amplitude equalization module 704, and the encoding module 703, reference may be made to the detailed description of the method embodiment shown in FIG. 8; for brevity, details are not repeated here.
Based on the same inventive concept as the foregoing method, an embodiment of this application provides an audio signal encoder. The audio signal encoder is configured to encode an audio signal and includes the audio signal encoding apparatus described in one or more of the foregoing embodiments, where the audio signal encoding apparatus is configured to perform encoding to generate a corresponding bitstream.
Based on the same inventive concept as the foregoing method, an embodiment of this application provides a device for encoding an audio signal, for example, an audio signal encoding device. As shown in FIG. 10, the audio signal encoding device 800 includes:
a processor 801, a memory 802, and a communication interface 803 (the audio signal encoding device 800 may contain one or more processors 801; FIG. 10 uses one processor as an example). In some embodiments of this application, the processor 801, the memory 802, and the communication interface 803 may be connected by a bus or in another manner; FIG. 10 uses a bus connection as an example.
The memory 802 may include a read-only memory and a random access memory, and provides instructions and data to the processor 801. A part of the memory 802 may further include a non-volatile random access memory (NVRAM). The memory 802 stores an operating system and operation instructions, executable modules, or data structures, or a subset or an extended set thereof. The operation instructions may include various instructions for implementing various operations. The operating system may include various system programs for implementing basic services and handling hardware-based tasks.
The processor 801 controls the operation of the audio encoding device and may also be referred to as a central processing unit (CPU). In a specific application, the components of the audio encoding device are coupled together through a bus system, which may include a power bus, a control bus, a status signal bus, and the like in addition to a data bus. For clarity, however, the various buses are all referred to as the bus system in the figure.
The methods disclosed in the foregoing embodiments of this application may be applied to the processor 801 or implemented by the processor 801. The processor 801 may be an integrated circuit chip with signal processing capability. During implementation, the steps of the foregoing methods may be completed by an integrated logic circuit of hardware in the processor 801 or by instructions in the form of software. The processor 801 may be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and may implement or execute the methods, steps, and logical block diagrams disclosed in the embodiments of this application. The general-purpose processor may be a microprocessor or any conventional processor. The steps of the methods disclosed with reference to the embodiments of this application may be performed directly by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium that is mature in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory 802; the processor 801 reads the information in the memory 802 and completes the steps of the foregoing methods in combination with its hardware.
The communication interface 803 may be used to receive or send digital or character information, and may be, for example, an input/output interface, a pin, or a circuit. For example, the foregoing encoded bitstream is sent through the communication interface 803.
Based on the same inventive concept as the foregoing method, an embodiment of this application provides an audio encoding device, including a non-volatile memory and a processor that are coupled to each other, where the processor calls program code stored in the memory to perform some or all of the steps of the multi-channel audio signal encoding method described in one or more of the foregoing embodiments.
Based on the same inventive concept as the foregoing method, an embodiment of this application provides a computer-readable storage medium that stores program code, where the program code includes instructions for performing some or all of the steps of the multi-channel audio signal encoding method described in one or more of the foregoing embodiments.
Based on the same inventive concept as the foregoing method, an embodiment of this application provides a computer program product that, when run on a computer, causes the computer to perform some or all of the steps of the multi-channel audio signal encoding method described in one or more of the foregoing embodiments.
The processor mentioned in the foregoing embodiments may be an integrated circuit chip with signal processing capability. During implementation, the steps of the foregoing method embodiments may be completed by an integrated logic circuit of hardware in the processor or by instructions in the form of software. The processor may be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The general-purpose processor may be a microprocessor or any conventional processor. The steps of the methods disclosed in the embodiments of this application may be performed directly by a hardware encoding processor, or by a combination of hardware and software modules in an encoding processor. The software module may be located in a storage medium that is mature in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory; the processor reads the information in the memory and completes the steps of the foregoing methods in combination with its hardware.
The memory mentioned in the foregoing embodiments may be a volatile memory or a non-volatile memory, or may include both. The non-volatile memory may be a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), or a flash memory. The volatile memory may be a random access memory (RAM), which is used as an external cache. By way of example and not limitation, many forms of RAM are available, such as static random access memory (SRAM), dynamic random access memory (DRAM), synchronous dynamic random access memory (SDRAM), double data rate synchronous dynamic random access memory (DDR SDRAM), enhanced synchronous dynamic random access memory (ESDRAM), synchlink dynamic random access memory (SLDRAM), and direct rambus random access memory (DR RAM). It should be noted that the memory of the systems and methods described herein is intended to include, but is not limited to, these and any other suitable types of memory.
A person of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described with reference to the embodiments disclosed herein can be implemented in electronic hardware, or in a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the specific application and the design constraints of the technical solution. A skilled person may use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of this application.
A person skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the systems, apparatuses, and units described above may be found in the corresponding processes of the foregoing method embodiments and are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative. The division into units is only a division by logical function; in an actual implementation there may be other divisions. For example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
In addition, the functional units in the embodiments of this application may be integrated into one processing unit, may each exist physically on their own, or two or more units may be integrated into one unit.
If the functions are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of this application. The storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The foregoing is merely specific implementations of this application, but the protection scope of this application is not limited thereto. Any variation or replacement that a person skilled in the art can readily conceive of within the technical scope disclosed in this application shall fall within the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.

Claims (34)

  1. A multi-channel audio signal encoding method, comprising:
    obtaining audio signals of P channels of a current frame of a multi-channel audio signal, wherein P is a positive integer greater than 1, the audio signals of the P channels comprise audio signals of K channel pairs, and K is a positive integer;
    obtaining respective energy/amplitude of the audio signals of the P channels;
    determining respective numbers of bits of the K channel pairs according to the respective energy/amplitude of the audio signals of the P channels and a number of available bits; and
    encoding the audio signals of the P channels according to the respective numbers of bits of the K channel pairs, to obtain an encoded bitstream;
    wherein the energy/amplitude of the audio signal of one of the P channels comprises at least one of: the energy/amplitude of the audio signal of the one channel in the time domain, the energy/amplitude of the audio signal of the one channel after time-frequency transform, the energy/amplitude of the audio signal of the one channel after time-frequency transform and whitening, the energy/amplitude of the audio signal of the one channel after energy/amplitude equalization, or the energy/amplitude of the audio signal of the one channel after stereo processing.
  2. The method according to claim 1, wherein the K channel pairs comprise a current channel pair, and the encoding the audio signals of the P channels based on the respective numbers of bits of the K channel pairs comprises: encoding audio signals of the current channel pair based on the number of bits of the current channel pair;
    the encoding audio signals of the current channel pair based on the number of bits of the current channel pair comprises:
    determining respective numbers of bits of the two channels in the current channel pair based on the number of bits of the current channel pair and the respective stereo-processed energy/amplitude of the audio signals of the two channels in the current channel pair; and
    encoding the audio signals of the two channels separately based on the respective numbers of bits of the two channels in the current channel pair.
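As a rough illustration of claim 2, the sketch below splits the bit budget of one channel pair between its two channels in proportion to their stereo-processed energy/amplitude. The proportional rule, the function name `split_pair_bits`, and the even split for a silent pair are assumptions made for this example; the claim only states that the split is determined from these quantities.

```python
def split_pair_bits(pair_bits, e_post_1, e_post_2):
    """Split a channel pair's bits between its two channels.

    pair_bits: number of bits assigned to the channel pair.
    e_post_1, e_post_2: stereo-processed energy/amplitude of each channel.
    """
    total = e_post_1 + e_post_2
    if total <= 0:
        # Assumption: a silent pair is split evenly.
        bits_1 = pair_bits // 2
    else:
        # Assumption: the share is proportional to energy/amplitude, floored.
        bits_1 = int(pair_bits * e_post_1 / total)
    # The second channel takes the remainder, so no bit is lost to rounding.
    bits_2 = pair_bits - bits_1
    return bits_1, bits_2
```

Giving the remainder to the second channel keeps the two per-channel budgets summing exactly to the pair budget.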
  3. The method according to claim 1 or 2, wherein the determining respective numbers of bits of the K channel pairs based on the respective energy/amplitude of the audio signals of the P channels and a number of available bits comprises:
    determining an energy/amplitude sum of the current frame based on the respective energy/amplitude of the audio signals of the P channels;
    determining respective bit coefficients of the K channel pairs based on the respective energy/amplitude of the audio signals of the K channel pairs and the energy/amplitude sum of the current frame; and
    determining the respective numbers of bits of the K channel pairs based on the respective bit coefficients of the K channel pairs and the number of available bits.
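The three steps of claim 3 can be sketched as follows. This is a minimal example under stated assumptions: the bit coefficient of a pair is taken to be the pair's share of the frame's energy/amplitude sum, the number of bits is the floored product of that coefficient and the budget, and a silent frame falls back to an even split. The name `allocate_pair_bits` and its parameters are hypothetical.

```python
def allocate_pair_bits(channel_energy, pairs, available_bits):
    """Distribute available_bits across channel pairs.

    channel_energy: energy/amplitude of each of the P channels.
    pairs: list of (ch1, ch2) channel-index tuples, one per channel pair.
    available_bits: bit budget of the current frame.
    """
    # Step 1: energy/amplitude sum of the current frame.
    frame_sum = sum(channel_energy)
    # Step 2: bit coefficient of each pair (assumed: share of the frame sum).
    if frame_sum <= 0:
        coefs = [1.0 / len(pairs)] * len(pairs)  # assumption: even split
    else:
        coefs = [(channel_energy[a] + channel_energy[b]) / frame_sum
                 for a, b in pairs]
    # Step 3: number of bits of each pair (assumed: floored product).
    return [int(c * available_bits) for c in coefs]
```

Flooring guarantees that the allocated total never exceeds the budget; how leftover bits are handled is not covered by this sketch.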
  4. The method according to claim 3, wherein the determining an energy/amplitude sum of the current frame based on the respective energy/amplitude of the audio signals of the P channels comprises:
    determining the energy/amplitude sum of the current frame based on the respective stereo-processed energy/amplitude of the audio signals of the P channels.
  5. The method according to claim 4, wherein the determining the energy/amplitude sum of the current frame based on the respective stereo-processed energy/amplitude of the audio signals of the P channels comprises:
    calculating the energy/amplitude sum sum_E_post of the current frame according to the formula
    Figure PCTCN2021106102-appb-100001
    wherein
    Figure PCTCN2021106102-appb-100002
    wherein ch denotes a channel index, E_post(ch) denotes the stereo-processed energy/amplitude of the audio signal of the channel whose channel index is ch, sampleCoef_post(ch, i) denotes the i-th coefficient of the current frame of the ch-th channel after stereo processing, N denotes the number of coefficients of the current frame, and N is a positive integer greater than 1.
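The two formulas of claim 5 survive here only as image references, so their exact form cannot be read from this text. The sketch below is one plausible reading consistent with the symbol definitions: sum_E_post is the sum of E_post(ch) over the P channels, and E_post(ch) is assumed to be the square root of the sum of squared coefficients sampleCoef_post(ch, i) for i = 1..N. The square-root form is an assumption, not a statement of the published formula.

```python
import math


def e_post(coefs):
    """Assumed form of E_post(ch): root of the sum of squared coefficients."""
    return math.sqrt(sum(c * c for c in coefs))


def sum_e_post(sample_coef_post):
    """sum_E_post: sum of E_post(ch) over all channels.

    sample_coef_post: one list of N stereo-processed coefficients per channel.
    """
    return sum(e_post(ch_coefs) for ch_coefs in sample_coef_post)
```

If the published formula defines E_post(ch) as an energy (no square root), only `e_post` would change; the outer sum over channels stays the same.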
  6. The method according to claim 3, wherein the determining an energy/amplitude sum of the current frame based on the respective energy/amplitude of the audio signals of the P channels comprises:
    determining the energy/amplitude sum of the current frame based on the respective energy/amplitude, before energy/amplitude equalization, of the audio signals of the P channels, wherein the energy/amplitude, before energy/amplitude equalization, of the audio signal of one of the P channels comprises the energy/amplitude of the audio signal of the one channel in the time domain, the energy/amplitude of the audio signal of the one channel after time-frequency transform, or the energy/amplitude of the audio signal of the one channel after time-frequency transform and whitening.
  7. The method according to claim 6, wherein the determining the energy/amplitude sum of the current frame based on the respective energy/amplitude, before energy/amplitude equalization, of the audio signals of the P channels comprises:
    calculating the energy/amplitude sum sum_E_pre of the current frame according to the formula
    Figure PCTCN2021106102-appb-100003
    wherein ch denotes a channel index, and E_pre(ch) denotes the energy/amplitude, before energy/amplitude equalization, of the audio signal of the channel whose channel index is ch.
  8. The method according to claim 3, wherein the determining an energy/amplitude sum of the current frame based on the respective energy/amplitude of the audio signals of the P channels comprises:
    determining the energy/amplitude sum of the current frame based on the respective energy/amplitude, before energy/amplitude equalization, of the audio signals of the P channels and respective weighting coefficients of the P channels, wherein each weighting coefficient is less than or equal to 1.
  9. The method according to claim 8, wherein the determining the energy/amplitude sum of the current frame based on the respective energy/amplitude, before energy/amplitude equalization, of the audio signals of the P channels and respective weighting coefficients of the P channels comprises:
    calculating the energy/amplitude sum sum_E_pre of the current frame according to the formula
    Figure PCTCN2021106102-appb-100004
    wherein ch denotes a channel index, E_pre(ch) is the energy/amplitude, before energy/amplitude equalization, of the audio signal of the ch-th channel, α(ch) is the weighting coefficient of the ch-th channel, the two channels of a channel pair have the same weighting coefficient, and the magnitude of the weighting coefficient of the two channels of the channel pair is inversely proportional to a normalized correlation value between the two channels of the channel pair.
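The weighted sum of claims 8 and 9 can be illustrated with the sketch below. The formula itself is only an image reference in this text, so the sum of α(ch)·E_pre(ch) over channels is an assumed reading. The mapping from correlation to weight (`k / corr`, capped at 1), the constant `k`, the weight of 1 for unpaired channels, and all function names are hypothetical choices that merely satisfy the stated constraints: both channels of a pair share one weight, the weight does not exceed 1, and it shrinks as the normalized correlation grows.

```python
import math


def normalized_correlation(x, y):
    """Absolute normalized cross-correlation of two coefficient lists."""
    num = abs(sum(a * b for a, b in zip(x, y)))
    den = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return num / den if den > 0 else 0.0


def pair_weight(corr, k=0.5):
    """Hypothetical weight: inversely proportional to corr, capped at 1."""
    return 1.0 if corr <= k else k / corr


def weighted_frame_sum(e_pre, pairs, pair_corr, k=0.5):
    """Assumed reading of sum_E_pre: sum of alpha(ch) * E_pre(ch)."""
    alpha = [1.0] * len(e_pre)  # assumption: unpaired channels keep weight 1
    for (a, b), corr in zip(pairs, pair_corr):
        alpha[a] = alpha[b] = pair_weight(corr, k)  # same weight for both
    return sum(w * e for w, e in zip(alpha, e_pre))
```

With this weighting, a highly correlated pair contributes less to the frame sum than two uncorrelated channels of the same energy/amplitude.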
  10. The method according to any one of claims 1 to 9, wherein the audio signals of the P channels further comprise audio signals of Q unpaired channels, P=2×K+Q, and Q is a positive integer;
    the determining respective numbers of bits of the K channel pairs based on the respective energy/amplitude of the audio signals of the P channels and a number of available bits comprises:
    determining the respective numbers of bits of the K channel pairs and respective numbers of bits of the Q channels based on the respective energy/amplitude of the audio signals of the P channels and the number of available bits; and
    the encoding the audio signals of the P channels based on the respective numbers of bits of the K channel pairs comprises:
    encoding the audio signals of the K channel pairs separately based on the respective numbers of bits of the K channel pairs, and encoding the audio signals of the Q channels separately based on the respective numbers of bits of the Q channels.
  11. The method according to claim 10, wherein the determining the respective numbers of bits of the K channel pairs and respective numbers of bits of the Q channels based on the respective energy/amplitude of the audio signals of the P channels and the number of available bits comprises:
    determining an energy/amplitude sum of the current frame based on the respective energy/amplitude of the audio signals of the P channels;
    determining respective bit coefficients of the K channel pairs based on the respective energy/amplitude of the audio signals of the K channel pairs and the energy/amplitude sum of the current frame;
    determining respective bit coefficients of the Q channels based on the respective energy/amplitude of the audio signals of the Q channels and the energy/amplitude sum of the current frame;
    determining the respective numbers of bits of the K channel pairs based on the respective bit coefficients of the K channel pairs and the number of available bits; and
    determining the respective numbers of bits of the Q channels based on the respective bit coefficients of the Q channels and the number of available bits.
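A minimal sketch of the allocation in claims 10 and 11, where the frame contains K channel pairs plus Q unpaired channels (P = 2×K + Q). As in the earlier sketches, the bit coefficient is assumed to be a share of the frame's energy/amplitude sum and the result is floored; `allocate_bits` and its parameter names are hypothetical.

```python
def allocate_bits(channel_energy, pairs, unpaired, available_bits):
    """Allocate bits to K channel pairs and Q unpaired channels.

    channel_energy: energy/amplitude of each of the P channels.
    pairs: list of (ch1, ch2) index tuples (K entries).
    unpaired: list of channel indices that are not paired (Q entries).
    """
    frame_sum = sum(channel_energy)  # energy/amplitude sum of the frame
    if frame_sum <= 0:
        raise ValueError("energy/amplitude sum must be positive")
    # Bit coefficient times budget for each pair (assumed proportional rule).
    pair_bits = [
        int(available_bits * (channel_energy[a] + channel_energy[b]) / frame_sum)
        for a, b in pairs
    ]
    # Same rule applied to each unpaired channel.
    unpaired_bits = [
        int(available_bits * channel_energy[c] / frame_sum) for c in unpaired
    ]
    return pair_bits, unpaired_bits
```

Because pairs and unpaired channels share one frame sum, their coefficients add up to 1 and the whole budget is divided consistently between them.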
  12. The method according to any one of claims 1 to 11, wherein the encoding the audio signals of the P channels based on the respective numbers of bits of the K channel pairs comprises:
    encoding energy/amplitude-equalized audio signals of the P channels based on the respective numbers of bits of the K channel pairs.
  13. A multi-channel audio signal encoding apparatus, wherein the apparatus comprises:
    an obtaining module, configured to obtain audio signals of P channels of a current frame of a multi-channel audio signal and respective energy/amplitude of the audio signals of the P channels, wherein P is a positive integer greater than 1, the audio signals of the P channels comprise audio signals of K channel pairs, and K is a positive integer;
    a bit allocation module, configured to determine respective numbers of bits of the K channel pairs based on the respective energy/amplitude of the audio signals of the P channels and a number of available bits; and
    an encoding module, configured to encode the audio signals of the P channels based on the respective numbers of bits of the K channel pairs, to obtain an encoded bitstream;
    wherein the energy/amplitude of the audio signal of one of the P channels comprises at least one of: the energy/amplitude of the audio signal of the one channel in the time domain, the energy/amplitude of the audio signal of the one channel after time-frequency transform, the energy/amplitude of the audio signal of the one channel after time-frequency transform and whitening, the energy/amplitude of the audio signal of the one channel after energy/amplitude equalization, or the energy/amplitude of the audio signal of the one channel after stereo processing.
  14. The apparatus according to claim 13, wherein the K channel pairs comprise a current channel pair, and the encoding module is configured to: determine respective numbers of bits of the two channels in the current channel pair based on the number of bits of the current channel pair and the respective stereo-processed energy/amplitude of the audio signals of the two channels in the current channel pair; and encode the audio signals of the two channels separately based on the respective numbers of bits of the two channels in the current channel pair.
  15. The apparatus according to claim 14, wherein the bit allocation module is configured to:
    determine an energy/amplitude sum of the current frame based on the respective energy/amplitude of the audio signals of the P channels;
    determine respective bit coefficients of the K channel pairs based on the respective energy/amplitude of the audio signals of the K channel pairs and the energy/amplitude sum of the current frame; and
    determine the respective numbers of bits of the K channel pairs based on the respective bit coefficients of the K channel pairs and the number of available bits.
  16. The apparatus according to claim 15, wherein the bit allocation module is configured to: determine the energy/amplitude sum of the current frame based on the respective stereo-processed energy/amplitude of the audio signals of the P channels.
  17. The apparatus according to claim 16, wherein the bit allocation module is configured to:
    calculate the energy/amplitude sum sum_E_post of the current frame according to the formula
    Figure PCTCN2021106102-appb-100005
    wherein
    Figure PCTCN2021106102-appb-100006
    wherein ch denotes a channel index, E_post(ch) denotes the stereo-processed energy/amplitude of the audio signal of the channel whose channel index is ch, sampleCoef_post(ch, i) denotes the i-th coefficient of the current frame of the ch-th channel after stereo processing, N denotes the number of coefficients in the current frame, and N is a positive integer greater than 1.
  18. The apparatus according to claim 15, wherein the bit allocation module is configured to: determine the energy/amplitude sum of the current frame based on the respective energy/amplitude, before energy/amplitude equalization, of the audio signals of the P channels, wherein the energy/amplitude, before energy/amplitude equalization, of the audio signal of one of the P channels comprises the energy/amplitude of the audio signal of the one channel in the time domain, the energy/amplitude of the audio signal of the one channel after time-frequency transform, or the energy/amplitude of the audio signal of the one channel after time-frequency transform and whitening.
  19. The apparatus according to claim 18, wherein the bit allocation module is configured to:
    calculate the energy/amplitude sum sum_E_pre of the current frame according to the formula
    Figure PCTCN2021106102-appb-100007
    wherein ch denotes a channel index, and E_pre(ch) denotes the energy/amplitude, before energy/amplitude equalization, of the audio signal of the channel whose channel index is ch.
  20. The apparatus according to claim 15, wherein the bit allocation module is configured to: determine the energy/amplitude sum of the current frame based on the respective energy/amplitude, before energy/amplitude equalization, of the audio signals of the P channels and respective weighting coefficients of the P channels, wherein each weighting coefficient is less than or equal to 1.
  21. The apparatus according to claim 20, wherein the bit allocation module is configured to:
    calculate the energy/amplitude sum sum_E_pre of the current frame according to the formula
    Figure PCTCN2021106102-appb-100008
    wherein ch denotes a channel index, E_pre(ch) is the energy/amplitude, before energy/amplitude equalization, of the audio signal of the ch-th channel, α(ch) is the weighting coefficient of the ch-th channel, the two channels of a channel pair have the same weighting coefficient, and the magnitude of the weighting coefficient of the two channels of the channel pair is inversely proportional to a normalized correlation value between the two channels of the channel pair.
  22. The apparatus according to any one of claims 13 to 21, wherein the audio signals of the P channels further comprise audio signals of Q unpaired channels, P=2×K+Q, and Q is a positive integer; the bit allocation module is configured to: determine the respective numbers of bits of the K channel pairs and respective numbers of bits of the Q channels based on the respective energy/amplitude of the audio signals of the P channels and the number of available bits; and the encoding module is configured to encode the audio signals of the K channel pairs separately based on the respective numbers of bits of the K channel pairs, and encode the audio signals of the Q channels separately based on the respective numbers of bits of the Q channels.
  23. The apparatus according to claim 22, wherein the bit allocation module is configured to: determine an energy/amplitude sum of the current frame based on the respective energy/amplitude of the audio signals of the P channels; determine respective bit coefficients of the K channel pairs based on the respective energy/amplitude of the audio signals of the K channel pairs and the energy/amplitude sum of the current frame; determine respective bit coefficients of the Q channels based on the respective energy/amplitude of the audio signals of the Q channels and the energy/amplitude sum of the current frame; determine the respective numbers of bits of the K channel pairs based on the respective bit coefficients of the K channel pairs and the number of available bits; and determine the respective numbers of bits of the Q channels based on the respective bit coefficients of the Q channels and the number of available bits.
  24. The apparatus according to any one of claims 13 to 23, wherein
    the encoding module is configured to encode energy/amplitude-equalized audio signals of the P channels based on the respective numbers of bits of the K channel pairs.
  25. A multi-channel audio signal encoding method, comprising:
    obtaining audio signals of P channels of a current frame of a multi-channel audio signal, wherein P is a positive integer greater than 1, the audio signals of the P channels comprise audio signals of K channel pairs, and K is a positive integer;
    performing energy/amplitude equalization on the audio signals of the two channels of a current channel pair in the K channel pairs based on the respective energy/amplitude of the audio signals of the two channels of the current channel pair, to obtain respective energy/amplitude, after energy/amplitude equalization, of the audio signals of the two channels of the current channel pair;
    determining respective numbers of bits of the two channels of the current channel pair based on the respective energy/amplitude, after energy/amplitude equalization, of the audio signals of the two channels of the current channel pair and a number of available bits; and
    encoding the audio signals of the two channels separately based on the respective numbers of bits of the two channels of the current channel pair, to obtain an encoded bitstream.
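The flow of claim 25 (equalize a channel pair, then derive per-channel bits from the equalized energy/amplitude) can be sketched as below. The claim does not fix the equalization rule in this text; scaling both channels to the mean of their amplitudes, measuring amplitude as the root of the sum of squares, and allocating bits proportionally are all assumptions, and the function names are hypothetical.

```python
import math


def amplitude(coefs):
    """Amplitude of one channel (assumed: root of the sum of squares)."""
    return math.sqrt(sum(c * c for c in coefs))


def equalize_pair(left, right):
    """Scale both channels of a pair to their mean amplitude (assumed rule)."""
    e_left, e_right = amplitude(left), amplitude(right)
    target = (e_left + e_right) / 2.0
    # A silent channel is left unscaled to avoid dividing by zero.
    eq_left = [c * target / e_left for c in left] if e_left > 0 else list(left)
    eq_right = [c * target / e_right for c in right] if e_right > 0 else list(right)
    return eq_left, eq_right


def channel_bits(eq_amplitudes, available_bits):
    """Share the budget in proportion to equalized amplitude (assumed rule)."""
    total = sum(eq_amplitudes)
    if total <= 0:
        raise ValueError("energy/amplitude sum must be positive")
    return [int(available_bits * e / total) for e in eq_amplitudes]
```

Under this particular equalization rule the two channels of a pair end up with equal amplitude and therefore equal bit shares; a codec that quantizes the scaling factor would typically leave a small residual difference.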
  26. The method according to claim 25, wherein P=2×K, K is a positive integer, and the determining respective numbers of bits of the two channels of the current channel pair based on the respective energy/amplitude, after energy/amplitude equalization, of the audio signals of the two channels of the current channel pair and a number of available bits comprises:
    determining an energy/amplitude sum of the current frame based on the respective energy/amplitude, after energy/amplitude equalization, of the audio signals of the P channels; and
    determining the respective numbers of bits of the two channels of the current channel pair based on the energy/amplitude sum of the current frame, the respective energy/amplitude, after energy/amplitude equalization, of the audio signals of the two channels of the current channel pair, and the number of available bits.
  27. The method according to claim 25 or 26, wherein the audio signals of the P channels further comprise audio signals of Q unpaired channels, P=2×K+Q, K is a positive integer, and Q is a positive integer;
    the determining respective numbers of bits of the two channels of the current channel pair based on the respective energy/amplitude, after energy/amplitude equalization, of the audio signals of the two channels of the current channel pair and the number of available bits comprises:
    determining an energy/amplitude sum of the current frame based on the energy/amplitude, after energy/amplitude equalization, of the audio signals of the two channels of each of the K channel pairs and the energy/amplitude, after energy/amplitude equalization, of the audio signals of the Q channels;
    determining the respective numbers of bits of the two channels of the current channel pair based on the energy/amplitude sum of the current frame, the respective energy/amplitude of the audio signals of the two channels of the current channel pair, and the number of available bits; and
    determining respective numbers of bits of the Q channels based on the energy/amplitude sum of the current frame, the respective energy/amplitude, after energy/amplitude equalization, of the audio signals of the Q channels, and the number of available bits; and
    the encoding the audio signals of the two channels separately based on the respective numbers of bits of the two channels of the current channel pair, to obtain an encoded bitstream comprises:
    encoding the audio signals of the K channel pairs separately based on the respective numbers of bits of the K channel pairs, and encoding the audio signals of the Q channels separately based on the respective numbers of bits of the Q channels, to obtain the encoded bitstream.
  28. An audio signal encoding apparatus, comprising:
    an obtaining module, configured to obtain audio signals of P channels of a current frame of a multi-channel audio signal, wherein P is a positive integer greater than 1, the audio signals of the P channels comprise audio signals of K channel pairs, and K is a positive integer;
    an energy/amplitude equalization module, configured to perform energy/amplitude equalization on the audio signals of the two channels of a current channel pair in the K channel pairs based on the respective energy/amplitude of the audio signals of the two channels of the current channel pair, to obtain respective energy/amplitude, after energy/amplitude equalization, of the audio signals of the two channels of the current channel pair;
    a bit allocation module, configured to determine respective numbers of bits of the two channels of the current channel pair based on the respective energy/amplitude, after energy/amplitude equalization, of the audio signals of the two channels of the current channel pair and a number of available bits; and
    an encoding module, configured to encode the audio signals of the two channels separately based on the respective numbers of bits of the two channels of the current channel pair, to obtain an encoded bitstream.
  29. The apparatus according to claim 28, wherein P=2×K, K is a positive integer, and the bit allocation module is configured to:
    determine an energy/amplitude sum of the current frame based on the respective energy/amplitude, after energy/amplitude equalization, of the audio signals of the P channels; and
    determine the respective numbers of bits of the two channels of the current channel pair based on the energy/amplitude sum of the current frame, the respective energy/amplitude, after energy/amplitude equalization, of the audio signals of the two channels of the current channel pair, and the number of available bits.
  30. The apparatus according to claim 28 or 29, wherein the audio signals of the P channels further comprise audio signals of Q unpaired channels, P=2×K+Q, K is a positive integer, and Q is a positive integer;
    the bit allocation module is configured to:
    determine an energy/amplitude sum of the current frame based on the energy/amplitude, after energy/amplitude equalization, of the audio signals of the two channels of each of the K channel pairs and the energy/amplitude, after energy/amplitude equalization, of the audio signals of the Q channels;
    determine the respective numbers of bits of the two channels of the current channel pair based on the energy/amplitude sum of the current frame, the respective energy/amplitude of the audio signals of the two channels of the current channel pair, and the number of available bits; and
    determine respective numbers of bits of the Q channels based on the energy/amplitude sum of the current frame, the respective energy/amplitude, after energy/amplitude equalization, of the audio signals of the Q channels, and the number of available bits; and
    the encoding module is configured to:
    encode the audio signals of the K channel pairs separately based on the respective numbers of bits of the K channel pairs, and encode the audio signals of the Q channels separately based on the respective numbers of bits of the Q channels, to obtain the encoded bitstream.
  31. An audio signal encoding apparatus, comprising: a non-volatile memory and a processor that are coupled to each other, wherein the processor invokes program code stored in the memory to perform the method according to any one of claims 1 to 12, or to perform the method according to any one of claims 25 to 27.
  32. An audio signal encoding device, comprising: an encoder, wherein the encoder is configured to perform the method according to any one of claims 1 to 12, or to perform the method according to any one of claims 25 to 27.
  33. A computer-readable storage medium, comprising a computer program, wherein when the computer program is executed on a computer, the computer is enabled to perform the method according to any one of claims 1 to 12, or the method according to any one of claims 25 to 27.
  34. A computer-readable storage medium, comprising an encoded bitstream obtained by the method according to any one of claims 1 to 12, or an encoded bitstream obtained by the method according to any one of claims 25 to 27.
PCT/CN2021/106102 2020-07-17 2021-07-13 Multi-channel audio signal encoding method and apparatus WO2022012554A1 (en)

Priority Applications (4)

Application Number Priority Date Filing Date Title
JP2023502892A JP2023533367A (en) 2020-07-17 2021-07-13 Multi-channel audio signal encoding method and apparatus
EP21842335.8A EP4174853A4 (en) 2020-07-17 2021-07-13 Multi-channel audio signal encoding method and apparatus
BR112023000835A BR112023000835A2 (en) 2020-07-17 2021-07-13 MULTI-CHANNEL AUDIO SIGNAL CODING METHOD AND DEVICE
US18/154,451 US20230154472A1 (en) 2020-07-17 2023-01-13 Multi-channel audio signal encoding method and apparatus

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010699775.8 2020-07-17
CN202010699775.8A CN113948097A (en) 2020-07-17 2020-07-17 Multi-channel audio signal coding method and device

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US18/154,451 Continuation US20230154472A1 (en) 2020-07-17 2023-01-13 Multi-channel audio signal encoding method and apparatus

Publications (1)

Publication Number Publication Date
WO2022012554A1 (en) 2022-01-20

Family

ID=79326894

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/106102 WO2022012554A1 (en) 2020-07-17 2021-07-13 Multi-channel audio signal encoding method and apparatus

Country Status (6)

Country Link
US (1) US20230154472A1 (en)
EP (1) EP4174853A4 (en)
JP (1) JP2023533367A (en)
CN (1) CN113948097A (en)
BR (1) BR112023000835A2 (en)
WO (1) WO2022012554A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101276587A (en) * 2007-03-27 2008-10-01 北京天籁传音数字技术有限公司 Audio encoding apparatus and method thereof, audio decoding device and method thereof
US20150189457A1 (en) * 2013-12-30 2015-07-02 Aliphcom Interactive positioning of perceived audio sources in a transformed reproduced sound field including modified reproductions of multiple sound fields
CN105264595A (en) * 2013-06-05 2016-01-20 汤姆逊许可公司 Method for encoding audio signals, apparatus for encoding audio signals, method for decoding audio signals and apparatus for decoding audio signals
CN108206022A (en) * 2016-12-16 2018-06-26 南京青衿信息科技有限公司 Utilize the codec and its decoding method of AES/EBU transmission three-dimensional acoustical signals
CN109074810A (en) * 2016-02-17 2018-12-21 弗劳恩霍夫应用研究促进协会 Device and method for the stereo filling in multi-channel encoder

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPWO2007116809A1 (en) * 2006-03-31 2009-08-20 パナソニック株式会社 Stereo speech coding apparatus, stereo speech decoding apparatus, and methods thereof
TWI505262B (en) * 2012-05-15 2015-10-21 Dolby Int Ab Efficient encoding and decoding of multi-channel audio signal with multiple substreams
US20150025894A1 (en) * 2013-07-16 2015-01-22 Electronics And Telecommunications Research Institute Method for encoding and decoding of multi channel audio signal, encoder and decoder
JP7384893B2 (en) * 2018-07-04 2023-11-21 フラウンホーファー-ゲゼルシャフト・ツール・フェルデルング・デル・アンゲヴァンテン・フォルシュング・アインゲトラーゲネル・フェライン Multi-signal encoders, multi-signal decoders, and related methods using signal whitening or signal post-processing

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP4174853A4

Also Published As

Publication number Publication date
JP2023533367A (en) 2023-08-02
US20230154472A1 (en) 2023-05-18
CN113948097A (en) 2022-01-18
EP4174853A1 (en) 2023-05-03
BR112023000835A2 (en) 2023-03-21
EP4174853A4 (en) 2023-11-22

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application. Ref document number: 21842335; Country of ref document: EP; Kind code of ref document: A1
ENP Entry into the national phase. Ref document number: 2023502892; Country of ref document: JP; Kind code of ref document: A
REG Reference to national code. Ref country code: BR; Ref legal event code: B01A; Ref document number: 112023000835; Country of ref document: BR
ENP Entry into the national phase. Ref document number: 2021842335; Country of ref document: EP; Effective date: 20230127
NENP Non-entry into the national phase. Ref country code: DE
ENP Entry into the national phase. Ref document number: 112023000835; Country of ref document: BR; Kind code of ref document: A2; Effective date: 20230116