WO2022012675A1 - 多声道音频信号的编码方法和装置 (Encoding method and apparatus for a multi-channel audio signal) - Google Patents

多声道音频信号的编码方法和装置 (Encoding method and apparatus for a multi-channel audio signal)

Info

Publication number
WO2022012675A1
WO2022012675A1 (PCT application PCT/CN2021/106826)
Authority
WO
WIPO (PCT)
Prior art keywords
channel
mode
energy
pair
channel signals
Prior art date
Application number
PCT/CN2021/106826
Other languages
English (en)
French (fr)
Inventor
王智
丁建策
王宾
王喆
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Priority to EP21841790.5A (EP4174852A4)
Priority to BR112023000667A (BR112023000667A2)
Priority to KR1020237004414A (KR20230035383A)
Priority to JP2023503019A (JP2023534049A)
Priority to AU2021310236A (AU2021310236A1)
Publication of WO2022012675A1
Priority to US18/154,486 (US20230186924A1)

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 19/00 - Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L 19/008 - Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 19/00 - Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L 19/002 - Dynamic bit allocation
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 19/00 - Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L 19/04 - Speech or audio signals analysis-synthesis techniques for redundancy reduction using predictive techniques
    • G10L 19/16 - Vocoder architecture
    • G10L 19/18 - Vocoders using multiple modes
    • G10L 19/22 - Mode decision, i.e. based on audio signal content versus external parameters
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L 25/03 - Speech or voice analysis techniques characterised by the type of extracted parameters
    • G10L 25/21 - Speech or voice analysis techniques, the extracted parameters being power information
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L 25/03 - Speech or voice analysis techniques characterised by the type of extracted parameters
    • G10L 25/06 - Speech or voice analysis techniques, the extracted parameters being correlation coefficients

Definitions

  • the present application relates to audio processing technology, and in particular, to a method and apparatus for encoding multi-channel audio signals.
  • Encoding and decoding of multi-channel audio is a technique for encoding or decoding audio that contains more than two channels.
  • Common multi-channel audios include 5.1-channel audio, 7.1-channel audio, 7.1.4-channel audio, and 22.2-channel audio.
  • MPS (MPEG Surround)
  • the present application provides a multi-channel audio signal encoding method and apparatus, so as to make the encoding method of audio frames more diverse and more efficient.
  • the present application provides a method for encoding a multi-channel audio signal, including: acquiring a first audio frame to be encoded, where the first audio frame includes at least five channel signals; and pairing the at least five channel signals to obtain a first channel pair set, where the first channel pair set includes at least one channel pair and one channel pair includes two of the at least five channel signals.
  • the at least five channel signals are encoded according to the target pairing mode, where the target pairing mode is the first pairing mode or the second pairing mode.
  • the first audio frame in this embodiment may be any frame of the multi-channel audio to be encoded, and it includes five or more channel signals. Jointly coding two channel signals with high correlation reduces redundancy and improves coding efficiency; therefore, in this embodiment the pairing is determined according to the correlation values between channel signals. To find, as far as possible, the pairing with the highest total correlation, the pairwise correlation values among the at least five channel signals of the first audio frame can be calculated to obtain the correlation value set of the first audio frame.
  • the first pairing method includes: with the goal of maximizing the sum of correlation values, selecting channel pairs from the channel pairs corresponding to the at least five channel signals and adding them to the first channel pair set.
  • the sum of the first correlation values is the sum of the correlation values of all channel pairs in the first channel pair set corresponding to the first pair mode.
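The first pairing method above amounts to searching over all sets of disjoint channel pairs for the one whose correlation values sum to the maximum. A minimal Python sketch of such an exhaustive search follows; it is illustrative only and not taken from the patent, and assumes `corr` maps ordered channel-index pairs `(a, b)` with `a < b` to correlation values:

```python
def best_pairing(corr):
    """First pairing method: exhaustively search disjoint channel pairings
    for the one whose correlation values sum to the maximum."""
    channels = sorted({c for pair in corr for c in pair})

    def search(remaining):
        # Returns (best_sum, best_pairs) over all sets of disjoint pairs
        # drawn from `remaining`; leaving channels unpaired is allowed
        # (needed when the channel count is odd, e.g. five channels).
        if len(remaining) < 2:
            return 0.0, []
        first, rest = remaining[0], remaining[1:]
        # Option 1: leave `first` unpaired.
        best_sum, best_pairs = search(rest)
        # Option 2: pair `first` with each other remaining channel.
        for i, other in enumerate(rest):
            sub_sum, sub_pairs = search(rest[:i] + rest[i + 1:])
            total = corr[(first, other)] + sub_sum
            if total > best_sum:
                best_sum, best_pairs = total, [(first, other)] + sub_pairs
        return best_sum, best_pairs

    return search(channels)
```

The brute force is exponential in the channel count, which is acceptable for the small channel counts (5 to 24) named in this document.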
  • the second pairing method includes: first adding the channel pair with the largest correlation value among the channel pairs corresponding to the at least five channel signals to the second channel pair set; then, among the remaining channel pairs other than the associated channel pairs, adding the channel pair with the largest correlation value to the second channel pair set, where an associated channel pair is a channel pair that includes any one of the channel signals already contained in a channel pair added to the second channel pair set.
  • the sum of the second correlation values is the sum of the correlation values of all channel pairs in the second channel pair set corresponding to the second pair mode.
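The second (greedy) pairing method and the subsequent mode decision can be sketched as follows; this is illustrative Python, not from the patent, and `corr` is again assumed to map ordered channel-index pairs to correlation values:

```python
def greedy_pairing(corr):
    """Second pairing method: repeatedly take the channel pair with the
    largest correlation value whose channels are not yet used."""
    pairs, used, total = [], set(), 0.0
    for (a, b), value in sorted(corr.items(), key=lambda kv: -kv[1]):
        if a not in used and b not in used:
            pairs.append((a, b))
            used.update((a, b))
            total += value
    return total, pairs

def choose_mode(sum_first, sum_second):
    """Target pairing mode: the first mode only when its correlation sum
    is strictly larger; otherwise the second mode."""
    return "first" if sum_first > sum_second else "second"
```

The greedy pass is cheaper than the exhaustive search but can miss the maximum-sum pairing, which is why the two sums are compared before the target pairing mode is chosen.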
  • in this way, the two pairing methods are combined: according to the sums of correlation values corresponding to the two pairing methods, it is determined whether to adopt the pairing method of the prior art or the pairing method that aims at maximizing the sum of correlation values, making the encoding of audio frames more diverse and more efficient.
  • the method of determining the target pairing mode of the at least five channel signals according to the sum of the first correlation values and the sum of the second correlation values includes: when the sum of the first correlation values is greater than the sum of the second correlation values, determining that the target pairing mode is the first pairing mode; when the sum of the first correlation values is less than or equal to the sum of the second correlation values, determining that the target pairing mode is the second pairing mode.
  • the target pairing mode is determined according to the sums of correlation values, so that the sum of the correlation values of all channel pairs included in the target channel pair set is as large as possible, which reduces, as much as possible, the redundancy between the paired channel signals.
  • before encoding the at least five channel signals according to the target pairing method, the method further includes: acquiring the fluctuation interval values of the at least five channel signals; when the target pairing method is the first pairing method, determining the energy equalization mode according to the fluctuation interval values of the at least five channel signals; when the target pairing method is the second pairing method, determining the energy equalization mode according to the fluctuation interval values of the at least five channel signals and determining the target pairing mode of the at least five channel signals again; and performing energy equalization processing on the at least five channel signals according to the energy equalization mode to obtain at least five equalized channel signals. Correspondingly, encoding the at least five channel signals according to the target pairing method includes: encoding the at least five equalized channel signals according to the target pairing method.
  • the aforementioned energy equalization may also be amplitude equalization: the object of energy equalization processing is energy, while the object of amplitude equalization processing is amplitude.
  • the first energy equalization mode is the Pair energy equalization mode.
  • in this mode, for any channel pair, only the two channel signals in the channel pair are used to obtain the two equalized channel signals corresponding to that channel pair.
  • here, "only using" means that when obtaining the equalized channel signals, energy equalization is performed, in units of channel pairs, only according to the two channel signals included in the channel pair, and the two equalized channel signals obtained are related only to those two channel signals; channel signals outside the channel pair are not required to participate in the energy equalization. "Only using" is not intended to limit the information content involved in the energy equalization process.
  • the second energy equalization mode is the overall energy equalization mode, which uses the two channel signals in a channel pair and at least one channel signal outside that channel pair to obtain the two equalized channel signals corresponding to the channel pair. It should be noted that the present application may also adopt other energy equalization modes, which are not specifically limited.
  • when the target pairing method is the first pairing method, the energy equalization mode may be further determined according to the fluctuation interval values of the at least five channel signals. When the target pairing method is the second pairing method, the energy equalization mode may be further determined according to the fluctuation interval values of the at least five channel signals, and the target pairing mode of the at least five channel signals may be determined again. This makes the encoding of audio frames more diverse and more efficient.
  • determining the energy equalization mode according to the fluctuation interval values of the at least five channel signals includes: when the fluctuation interval value meets a preset condition, determining that the energy equalization mode is the first energy equalization mode; or, when the fluctuation interval value does not meet the preset condition, determining that the energy equalization mode is the second energy equalization mode.
  • determining the energy equalization mode according to the fluctuation interval values of the at least five channel signals and determining the target pairing mode of the at least five channel signals again includes: when the fluctuation interval value of the at least five channel signals meets the preset condition, determining that the target pairing mode is the first pairing mode and the energy equalization mode is the first energy equalization mode; or, when the fluctuation interval value does not meet the preset condition, determining that the target pairing mode is the second pairing mode and the energy equalization mode is the second energy equalization mode.
  • before determining the energy equalization mode according to the fluctuation interval values of the at least five channel signals, the method further includes: judging whether the encoding bit rate corresponding to the first audio frame is greater than a bit rate threshold. Optionally, in one embodiment, the bit rate threshold can be set to 28 kbps / (number of valid channel signals / frame rate); 28 kbps can also be another empirical value, such as 30 kbps or 26 kbps.
  • a valid channel signal refers to a channel signal other than the LFE channel; for example, the channel signals of 5.1-channel audio other than LFE are C, L, R, LS, and RS, and the channel signals of 7.1-channel audio other than LFE are C, L, R, LS, RS, LB, and RB. When the encoding bit rate is greater than the bit rate threshold, the energy equalization mode is determined to be the second energy equalization mode; when the encoding bit rate is less than or equal to the bit rate threshold, the energy equalization mode is determined according to the fluctuation interval value.
  • the frame rate refers to the number of frames processed per unit time; for example, when the sampling rate is 48000 Hz and the number of samples corresponding to one audio frame is 960, the frame rate is 50 frames per second.
  • taking the encoding bit rate into account as an additional factor can improve coding efficiency.
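The frame-rate and bit-rate-threshold arithmetic above can be sketched as follows. This is an illustrative Python rendering of the formulas as stated in this embodiment (function names and defaults are assumptions, not patent text):

```python
def frame_rate(sampling_rate_hz=48000, samples_per_frame=960):
    """Frames processed per second: e.g. 48000 Hz / 960 samples = 50."""
    return sampling_rate_hz / samples_per_frame

def bitrate_threshold(num_valid_channels, base_bps=28000,
                      sampling_rate_hz=48000, samples_per_frame=960):
    """Bit rate threshold from this embodiment:
    base / (number of valid channel signals / frame rate)."""
    fr = frame_rate(sampling_rate_hz, samples_per_frame)
    return base_bps / (num_valid_channels / fr)

def use_overall_equalization(coding_bitrate_bps, num_valid_channels):
    """Second (overall) energy equalization when the coding rate exceeds
    the threshold; otherwise fall back to the fluctuation-interval test."""
    return coding_bitrate_bps > bitrate_threshold(num_valid_channels)
```

For 5.1-channel audio the five valid channels (C, L, R, LS, RS) and a frame rate of 50 give a threshold of 28000 / (5 / 50) = 280 kbps under these example values.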
  • the fluctuation interval value includes the energy flatness of the first audio frame, and the fluctuation interval value meeting the preset condition means that the energy flatness is less than a first threshold (for example, the first threshold may be 0.483); or, the fluctuation interval value includes the amplitude flatness of the first audio frame, and the fluctuation interval value meeting the preset condition means that the amplitude flatness is less than a second threshold (for example, the second threshold may be 0.695); or, the fluctuation interval value includes the energy deviation degree of the first audio frame, and the fluctuation interval value meeting the preset condition means that the energy deviation degree is not within a first preset range (for example, the first preset range may be 0.04 to 25); or, the fluctuation interval value includes the amplitude deviation degree of the first audio frame, and the fluctuation interval value meeting the preset condition means that the amplitude deviation degree is not within a second preset range (for example, the second preset range may be 0.2 to 5).
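The preset-condition tests above, with the example thresholds from this embodiment (0.483, 0.695, 0.04 to 25, 0.2 to 5), can be sketched in Python as follows; the metric names and function names are illustrative assumptions, not patent terminology:

```python
def fluctuation_meets_condition(metric, value):
    """Preset-condition test for one fluctuation-interval metric,
    using the example thresholds from this embodiment."""
    if metric == "energy_flatness":
        return value < 0.483        # first threshold (example)
    if metric == "amplitude_flatness":
        return value < 0.695        # second threshold (example)
    if metric == "energy_deviation":
        return not (0.04 <= value <= 25)   # outside first preset range
    if metric == "amplitude_deviation":
        return not (0.2 <= value <= 5)     # outside second preset range
    raise ValueError(f"unknown metric: {metric}")

def energy_equalization_mode(metric, value):
    """First (pair) equalization when the condition holds, else
    second (overall) equalization."""
    return "pair" if fluctuation_meets_condition(metric, value) else "overall"
```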
  • the energy equalization mode is determined with reference to characteristics of multiple dimensions of the channel signal, which can improve the accuracy of energy equalization.
  • pairing the at least five channel signals according to the first pairing manner to obtain the first channel pair set includes: with the goal of maximizing the sum of correlation values, selecting channel pairs from the channel pairs corresponding to the at least five channel signals and adding them to the first channel pair set.
  • pairing the at least five channel signals according to the second pairing manner to obtain the second channel pair set includes: first adding the channel pair with the largest correlation value among the channel pairs corresponding to the at least five channel signals to the second channel pair set; then, among the remaining channel pairs other than the associated channel pairs, adding the channel pair with the largest correlation value to the second channel pair set, where an associated channel pair includes any one of the channel signals contained in a channel pair already added to the second channel pair set.
  • when the energy equalization mode is the first energy equalization mode, performing energy equalization processing on the at least five channel signals according to the energy equalization mode to obtain the at least five equalized channel signals includes: for the current channel pair in the target channel pair set corresponding to the target pairing manner, calculating the average of the energy or amplitude values of the two channel signals included in the current channel pair, and performing energy equalization processing on the two channel signals according to the average value to obtain the two corresponding equalized channel signals.
  • when the energy equalization mode is the second energy equalization mode, performing energy equalization processing on the at least five channel signals according to the energy equalization mode to obtain the at least five equalized channel signals includes: calculating the average of the energy or amplitude values of the at least five channel signals, and performing energy equalization processing on the at least five channel signals according to the average value to obtain the at least five equalized channel signals.
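The two equalization variants above, averaging over one channel pair versus over all channels, can be sketched in Python using RMS amplitude as the per-channel amplitude value. This is an illustrative simplification (it assumes non-silent channels and uses invented function names), not the patent's implementation:

```python
import math

def rms(signal):
    """Root-mean-square amplitude of one channel signal."""
    return math.sqrt(sum(s * s for s in signal) / len(signal))

def equalize_pair(sig_a, sig_b):
    """First (pair) energy equalization: scale the two signals of one
    channel pair toward the average of their amplitude values, using
    only those two signals."""
    avg = (rms(sig_a) + rms(sig_b)) / 2.0
    def scale(sig):
        g = avg / rms(sig)          # assumes a non-silent channel
        return [s * g for s in sig]
    return scale(sig_a), scale(sig_b)

def equalize_overall(signals):
    """Second (overall) energy equalization: scale every channel toward
    the average amplitude of all channels."""
    avg = sum(rms(sig) for sig in signals) / len(signals)
    return [[s * (avg / rms(sig)) for s in sig] for sig in signals]
```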
  • the present application provides an encoding device, comprising: an acquisition module configured to acquire a first audio frame to be encoded, where the first audio frame includes at least five channel signals; pair the at least five channel signals to obtain a first channel pair set, where the first channel pair set includes at least one channel pair and one channel pair includes two of the at least five channel signals; and obtain the sum of the first correlation values of the first channel pair set, where one channel pair has one correlation value and the correlation value is used to represent the correlation between the two channel signals of that channel pair.
  • the at least five channel signals are encoded according to the target pairing mode, where the target pairing mode is the first pairing mode or the second pairing mode.
  • the determining module is specifically configured to, when the sum of the first correlation values is greater than the sum of the second correlation values, determine that the target pairing mode is the first pairing mode; and, when the sum of the first correlation values is less than or equal to the sum of the second correlation values, determine that the target pairing mode is the second pairing mode.
  • the determining module is further configured to acquire the fluctuation interval values of the at least five channel signals; when the target pairing method is the first pairing method, determine the energy equalization mode according to the fluctuation interval values of the at least five channel signals; and when the target pairing method is the second pairing method, determine the energy equalization mode according to the fluctuation interval values of the at least five channel signals and determine the target pairing mode of the at least five channel signals again. Correspondingly, the encoding module is further configured to perform energy equalization processing on the at least five channel signals according to the energy equalization mode to obtain at least five equalized channel signals, and to encode the at least five equalized channel signals according to the target pairing mode.
  • the determining module is specifically configured to determine that the energy equalization mode is the first energy equalization mode when the fluctuation interval value meets a preset condition, and to determine that the energy equalization mode is the second energy equalization mode when the fluctuation interval value does not meet the preset condition.
  • the determining module is specifically configured to, when the fluctuation interval value meets the preset condition, determine that the target pairing mode is the first pairing mode and the energy equalization mode is the first energy equalization mode; or, when the fluctuation interval value does not meet the preset condition, determine that the target pairing mode is the second pairing mode and the energy equalization mode is the second energy equalization mode.
  • the determining module is further configured to determine whether the encoding bit rate corresponding to the first audio frame is greater than a bit rate threshold; when the encoding bit rate is greater than the bit rate threshold, the energy equalization mode is determined to be the second energy equalization mode; the energy equalization mode is determined according to the fluctuation interval value only when the encoding bit rate is less than or equal to the bit rate threshold.
  • the fluctuation interval value includes the energy flatness of the first audio frame, and the fluctuation interval value meeting the preset condition means that the energy flatness is less than a first threshold; or, the fluctuation interval value includes the amplitude flatness of the first audio frame, and meeting the preset condition means that the amplitude flatness is less than a second threshold; or, the fluctuation interval value includes the energy deviation degree of the first audio frame, and meeting the preset condition means that the energy deviation degree is not within a first preset range; or, the fluctuation interval value includes the amplitude deviation degree of the first audio frame, and meeting the preset condition means that the amplitude deviation degree is not within a second preset range.
  • the obtaining module is specifically configured to, with the goal of maximizing the sum of correlation values, select channel pairs from the channel pairs corresponding to the at least five channel signals and add them to the first channel pair set.
  • the obtaining module is specifically configured to first add the channel pair with the largest correlation value among the channel pairs corresponding to the at least five channel signals to the second channel pair set, and then, among the remaining channel pairs other than the associated channel pairs, add the channel pair with the largest correlation value to the second channel pair set, where an associated channel pair includes any one of the channel signals contained in a channel pair already added to the second channel pair set.
  • when the energy equalization mode is the first energy equalization mode, the encoding module is specifically configured to, for the current channel pair in the target channel pair set corresponding to the target pairing mode, calculate the average of the energy or amplitude values of the two channel signals included in the current channel pair, and perform energy equalization processing on the two channel signals according to the average value to obtain the two corresponding equalized channel signals.
  • the encoding module is specifically configured to calculate the average of the energy or amplitude values of the at least five channel signals and perform energy equalization processing on the at least five channel signals according to the average value to obtain the at least five equalized channel signals.
  • the present application provides a device, comprising: one or more processors; a memory for storing one or more programs; when the one or more programs are executed by the one or more processors, The one or more processors are caused to implement the method of any of the above first aspects.
  • the present application provides a computer-readable storage medium, comprising a computer program, which, when executed on a computer, causes the computer to execute the method according to any one of the above-mentioned first aspects.
  • the present application provides a computer-readable storage medium, comprising an encoded code stream obtained according to the encoding method for a multi-channel audio signal according to any one of the foregoing first aspects.
  • FIG. 1 exemplarily presents a schematic block diagram of an audio decoding system 10 applied in the present application
  • FIG. 2 exemplarily presents a schematic block diagram of an audio decoding device 200 to which the present application is applied;
  • FIG. 3 is a flowchart of an exemplary embodiment of a method for encoding a multi-channel audio signal provided by the present application
  • FIG. 4 is an exemplary structural diagram of an encoding device to which the encoding method for a multi-channel audio signal provided by the present application is applied;
  • FIG. 5a is an exemplary structural diagram of a mode selection module;
  • FIG. 5b is an exemplary structural diagram of a multi-channel mode selection unit;
  • FIG. 6 is an exemplary structural diagram of a decoding device to which the multi-channel audio decoding method provided by the present application is applied;
  • FIG. 7 is a schematic structural diagram of an embodiment of an encoding device of the present application.
  • FIG. 8 is a schematic structural diagram of an embodiment of a device of the present application.
  • "At least one (item)" refers to one or more, and "a plurality" refers to two or more.
  • "And/or" describes the relationship between associated objects and indicates that three relationships may exist; for example, "A and/or B" can mean: only A exists, both A and B exist, and only B exists, where A and B can be singular or plural.
  • the character "/" generally indicates that the associated objects are in an "or" relationship.
  • "At least one item(s) of the following" or similar expressions refer to any combination of these items, including any combination of single item(s) or plural items.
  • "At least one (item) of a, b or c" can mean: a, b, c, "a and b", "a and c", "b and c", or "a and b and c", where a, b and c can be singular or plural.
  • Audio frame: audio data is streamed. In practical applications, for ease of processing, the amount of audio data within a period of time is usually taken as one frame of audio. This period is called the "sampling time", and its value is determined according to the requirements of the codec and the specific application; for example, the duration is 2.5 ms to 60 ms, where ms means milliseconds.
  • Audio signal: the information carrier of regular sound waves, with varying frequency and amplitude, carrying speech, music, and sound effects. Audio is a continuously varying analog signal that can be represented by a continuous curve called a sound wave; a digital audio signal is generated by analog-to-digital conversion or by a computer. Sound waves have three important parameters: frequency, amplitude, and phase, which determine the characteristics of the audio signal.
  • Channel signal refers to the independent audio signals that are collected or played back at different spatial positions during recording or playback. Therefore, the number of channels is the number of sound sources during sound recording or the number of speakers during playback.
  • FIG. 1 exemplarily shows a schematic block diagram of an audio decoding system 10 applied in the present application.
  • the audio coding system 10 may include a source device 12 and a destination device 14.
  • the source device 12 generates an encoded code stream, and thus, the source device 12 may be referred to as an audio encoding device.
  • the destination device 14 may decode the encoded codestream generated by the source device 12, and thus, the destination device 14 may be referred to as an audio decoding device.
  • the source device 12 includes an encoder 20 and, optionally, an audio source 16 , an audio preprocessor 18 , and a communication interface 22 .
  • Audio source 16 may include or be any type of audio capture device for capturing real-world speech, music, sound effects, and the like, and/or any type of audio generation device, such as an audio processor or device for generating speech, music, and sound effects.
  • the audio source may be any type of memory or storage that stores the above audio.
  • the audio preprocessor 18 is used to receive (raw) audio data 17 and to preprocess the audio data 17 to obtain preprocessed audio data 19 .
  • the preprocessing performed by the audio preprocessor 18 may include trimming or denoising. It is understood that the audio preprocessing unit 18 may be an optional component.
  • An encoder 20 is used to receive preprocessed audio data 19 and provide encoded audio data 21 .
  • a communication interface 22 in source device 12 may be used to receive encoded audio data 21 and send encoded audio data 21 over communication channel 13 to destination device 14 for storage or direct reconstruction.
  • the destination device 14 includes a decoder 30 and, optionally, a communication interface 28 , an audio post-processor 32 and a playback device 34 .
  • the communication interface 28 in the destination device 14 is used to receive the encoded audio data 21 directly from the source device 12 and to provide the encoded audio data 21 to the decoder 30 .
  • Communication interface 22 and communication interface 28 may be used to transmit or receive encoded audio data 21 through a direct communication link between source device 12 and destination device 14, such as a direct wired or wireless connection, or through any type of network, such as a wired network, a wireless network, or any combination thereof, any type of private or public network, or any combination thereof.
  • the communication interface 22 may be used to encapsulate the encoded audio data 21 into a suitable format such as a message, and/or to process the encoded audio data 21 using any type of transfer encoding or processing for transmission over a communication link or communication network .
  • the communication interface 28 corresponds to the communication interface 22 and may be used, for example, to receive transmission data and process the transmission data to obtain encoded audio data 21 using any type of corresponding transmission decoding or processing and/or decapsulation.
  • Both communication interface 22 and communication interface 28 can be configured as one-way communication interfaces, as indicated by the arrow for the corresponding communication channel 13 from source device 12 to destination device 14 in FIG. 1, or as two-way communication interfaces, and can be used to send and receive messages, for example, to establish a connection, and to acknowledge and exchange any other information related to the communication link and/or data transfer, such as the encoded audio data.
  • a decoder 30 is used to receive encoded audio data 21 and provide decoded audio data 31 .
  • the audio post-processor 32 is used for post-processing the decoded audio data 31 to obtain post-processed audio data 33 .
  • the post-processing performed by the audio post-processor 32 may include, for example, trimming or resampling, and the like.
  • Playback device 34 is used to receive post-processed audio data 33 to play audio to a user or listener.
  • Playback device 34 may be or include any type of player for playing reconstructed audio, eg, integrated or external speakers.
  • the speakers may include loudspeaker drivers, speaker boxes, and the like.
  • FIG. 2 exemplarily shows a schematic block diagram of an audio decoding device 200 applied in the present application.
  • the audio decoding device 200 may be an audio decoder (eg, decoder 30 of FIG. 1 ) or an audio encoder (eg, encoder 20 of FIG. 1 ).
  • the audio decoding device 200 includes: an ingress port 210 and a receiving unit (Rx) 220 for receiving data, a processor, logic unit, or central processing unit 230 for processing data, a transmitting unit (Tx) 240 and an egress port 250 for transmitting data, and a memory 260 for storing data.
  • the audio decoding device 200 may also include optical-to-electrical (OE) components and electrical-to-optical (EO) components coupled with the ingress port 210, the receiving unit 220, the transmitting unit 240, and the egress port 250 for the egress or ingress of optical or electrical signals.
  • the processor 230 is implemented by hardware and software.
  • the processor 230 may be implemented as one or more CPU chips, cores (eg, multi-core processors), FPGAs, ASICs, and DSPs.
  • the processor 230 communicates with the ingress port 210 , the receiving unit 220 , the transmitting unit 240 , the egress port 250 and the memory 260 .
  • the processor 230 includes a coding module 270 (eg, an encoding module or a decoding module).
  • the coding module 270 implements the embodiments disclosed in this application, so as to implement the encoding method for multi-channel audio signals provided in this application.
  • the coding module 270 implements, processes, or provides various encoding operations.
  • the coding module 270 is implemented as instructions stored in memory 260 and executed by processor 230.
  • Memory 260 includes one or more magnetic disks, tape drives, or solid-state drives, and may serve as an overflow data storage device for storing programs when such programs are selected for execution, and for storing instructions and data read during program execution.
  • Memory 260 may be volatile and/or non-volatile, and may be read-only memory (ROM), random access memory (RAM), ternary content-addressable memory (TCAM), and/or static random access memory (SRAM).
  • the present application provides a method for encoding a multi-channel audio signal.
  • FIG. 3 is a flowchart of an exemplary embodiment of a method for encoding a multi-channel audio signal provided by the present application.
  • the process 300 may be performed by the source device 12 or the audio coding device 200 in the audio coding system 10 .
  • Process 300 is described as a series of steps or operations, and it should be understood that process 300 may be performed in various orders and/or concurrently, and is not limited to the order of execution shown in FIG. 3 .
  • the method includes:
  • Step 301 Obtain a first audio frame to be encoded.
  • the first audio frame in this embodiment may be any frame of multi-channel audio to be encoded, and the first audio frame includes five or more channel signals.
  • a 5.1 channel includes a center channel (C), a front left channel (left, L), a front right channel (right, R), a left surround channel (left surround, LS), a right surround channel (right surround, RS), and the 0.1-channel low frequency effects (LFE) channel, for a total of six channel signals.
  • the 7.1 channel includes C, L, R, LS, RS, a left back channel (LB), a right back channel (RB), and LFE, a total of eight channel signals, where LFE is the audio channel covering 3-120 Hz, which is usually sent to a speaker specially designed for low tones.
  • Step 302 Perform group pairing on at least five channel signals according to the first group pairing manner to obtain a first channel pair set.
  • the first channel pair set includes at least one channel pair, and each channel pair includes two of the at least five channel signals.
  • Step 303 Obtain the sum of the first correlation values of the first channel pair set.
  • a channel pair has a correlation value that represents the correlation between the two channel signals of a channel pair.
  • the pairing is determined according to the correlation value between the two channel signals.
  • a correlation value set of the first audio frame may be obtained by first calculating the correlation values between at least five channel signals in the first audio frame. For example, five channel signals may form 10 channel pairs in total, and correspondingly, the correlation value set may include 10 correlation values.
  • the correlation values can be normalized so that the correlation values of all channel pairs are limited to a specific range, which allows a unified judgment standard, such as a group pair threshold, to be set for the correlation values. The group pair threshold can be set to a value greater than or equal to 0.2 and less than or equal to 1, for example 0.3. In this way, whenever the normalized correlation value of two channel signals is less than the group pair threshold, the correlation between the two channel signals is considered to be relatively poor, and no group pair encoding is required.
  • the correlation value between two channel signals can be calculated using the following formula:
  • corr(ch1, ch2) = |Σ_{i=0}^{N-1} spec_ch1(i)·spec_ch2(i)| / √( Σ_{i=0}^{N-1} spec_ch1(i)² · Σ_{i=0}^{N-1} spec_ch2(i)² )
  • where corr(ch1, ch2) represents the normalized correlation value between the channel signal ch1 and the channel signal ch2, spec_ch1(i) represents the frequency domain coefficient of the i-th frequency point of the channel signal ch1, spec_ch2(i) represents the frequency domain coefficient of the i-th frequency point of the channel signal ch2, and N represents the total number of frequency points of an audio frame.
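The normalized correlation described above can be sketched in Python. Since the formula image is not reproduced in the text, the sketch assumes the standard normalized cross-correlation of frequency-domain coefficients, which matches the surrounding definitions (result in [0, 1], with a silent channel treated as uncorrelated):

```python
import math

def normalized_correlation(spec_ch1, spec_ch2):
    """Normalized correlation between the frequency-domain coefficients
    of two channel signals (assumed standard form, value in [0, 1])."""
    cross = sum(a * b for a, b in zip(spec_ch1, spec_ch2))
    e1 = sum(a * a for a in spec_ch1)
    e2 = sum(b * b for b in spec_ch2)
    if e1 == 0 or e2 == 0:
        return 0.0  # a silent channel is treated as uncorrelated
    return abs(cross) / math.sqrt(e1 * e2)

# Identical spectra correlate perfectly; orthogonal spectra give 0.
print(normalized_correlation([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # 1.0
print(normalized_correlation([1.0, 0.0], [0.0, 1.0]))            # 0.0
```

Normalizing by the geometric mean of the two channel energies is what allows a single group pair threshold (for example, 0.3) to be applied to every channel pair.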
  • the first pairing method includes: selecting channel pairs from the channel pairs corresponding to the at least five channel signals and adding them to the first channel pair set, with the goal of maximizing the sum of the correlation values.
  • the sum of the first correlation values is the sum of the correlation values of all channel pairs in the first channel pair set obtained by performing group pairing on at least five channel signals according to the first group pairing method.
  • the first pairing mode in this embodiment may include the following two implementation modes:
  • the M correlation values must be greater than or equal to the group pair threshold, because a correlation value smaller than the group pair threshold indicates that the correlation between the two channel signals in the corresponding channel pair is low, so that no group pair coding is necessary. To improve coding efficiency, it is not necessary to select all the correlation values greater than or equal to the group pair threshold, so an upper limit N is set for M, that is, at most N correlation values are selected.
  • N can be an integer greater than or equal to 2, and the maximum value of N cannot exceed the number of all channel pairs corresponding to all channel signals of the first audio frame.
  • the larger the value of N, the larger the amount of computation involved; if N is too small, candidate channel pair sets may be missed, thereby reducing the coding efficiency.
  • each channel pair set includes at least one of the M channel pairs corresponding to the M correlation values, and when a channel pair set includes two or more channel pairs, those channel pairs do not contain the same channel signal.
  • for example, suppose the 3 channel pairs corresponding to the largest correlation values selected from the correlation value set are (L, R), (R, C), and (LS, RS), where the correlation value of (LS, RS) is less than the group pair threshold and is therefore excluded; the remaining two channel pairs (L, R) and (R, C) then yield two channel pair sets, one of which includes (L, R) and the other of which includes (R, C).
  • the method for acquiring the M channel pair sets in this embodiment may include: adding the first channel pair to the first channel pair set, where the M channel pair sets include the first channel pair set.
  • if, among the multiple channel pairs, the channel pairs other than those containing an associated channel include a channel pair whose correlation value is greater than the group pair threshold, select the channel pair with the largest correlation value from those other channel pairs and add it to the first channel pair set.
  • step b may be iteratively executed.
  • the correlation value smaller than the pair pair threshold may be deleted from the correlation value set, so that the number of channel pairs can be reduced, thereby reducing the number of iterations.
  • the correlation value set includes the correlation values of multiple channel pairs of the at least five channel signals of the first audio frame; by combining the multiple channel pairs according to the rule that multiple channel pairs in the same channel pair set cannot contain the same channel signal, multiple channel pair sets corresponding to the at least five channel signals can be obtained.
  • the following formula can be used to calculate the number of all channel pair sets:
  • Pair_num represents the number of all channel pair sets
  • CH represents the number of channel signals participating in multi-channel processing in the first audio frame, which is the number obtained after filtering by the multi-channel mask.
  • multiple channel pair sets can also be obtained from the channel pairs other than the uncorrelated channel pairs among the multiple channel pairs, where an uncorrelated channel pair is one whose correlation value is smaller than the group pair threshold. In this way, the number of channel pairs participating in the calculation can be reduced when obtaining the channel pair sets, thereby reducing the number of channel pair sets, and the calculation amount of the sums of correlation values can also be reduced in subsequent steps.
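As an illustration of the first group pairing method, the following Python sketch exhaustively enumerates sets of disjoint channel pairs (no channel signal appears twice within a set) and keeps the set with the largest sum of correlation values. The channel names and correlation values in the usage example are hypothetical, and the group pair threshold of 0.3 follows the example given above:

```python
from itertools import combinations

def best_pair_set(corr, channels, pair_threshold=0.3):
    """First group pairing method: enumerate all sets of disjoint channel
    pairs and return the set maximizing the sum of correlation values.
    `corr` maps frozenset({ch_a, ch_b}) -> normalized correlation value;
    pairs below the group pair threshold are discarded up front."""
    pairs = [frozenset(p) for p in combinations(channels, 2)
             if corr.get(frozenset(p), 0.0) >= pair_threshold]
    best, best_sum = [], 0.0

    def extend(chosen, used, total, start):
        nonlocal best, best_sum
        if total > best_sum:
            best, best_sum = list(chosen), total
        for i in range(start, len(pairs)):
            p = pairs[i]
            if not (p & used):  # the same channel cannot appear twice in a set
                chosen.append(p)
                extend(chosen, used | p, total + corr[p], i + 1)
                chosen.pop()

    extend([], frozenset(), 0.0, 0)
    return best, best_sum

# Hypothetical correlation values for the 5 screened channels.
corr = {frozenset({"L", "R"}): 0.9,
        frozenset({"R", "C"}): 0.8,
        frozenset({"LS", "RS"}): 0.7}
pair_set, total = best_pair_set(corr, ["L", "R", "C", "LS", "RS"])
```

Here the best set is {(L, R), (LS, RS)} with sum 1.6, since (R, C) shares channel R with (L, R) and cannot be added alongside it.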
  • Step 304 Perform group pairing on at least five channel signals according to the second group pairing manner to obtain a second channel pair set.
  • Step 305 Obtain the sum of the second correlation values of the second channel pair set.
  • the second group pairing method includes: first adding the channel pair with the largest correlation value among the channel pairs corresponding to the at least five channel signals into the second channel pair set; then, among the channel pairs corresponding to the at least five channel signals other than those containing an associated channel, adding the channel pair with the largest correlation value to the second channel pair set, where an associated channel is any channel signal included in a channel pair that has already been added to the second channel pair set.
  • the sum of the second correlation values is the sum of the correlation values of all channel pairs in the second channel pair set obtained by performing group pairing on at least five channel signals according to the second group pairing method.
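The second group pairing method is a greedy procedure, which can be sketched as follows. The correlation values in the usage example are hypothetical, and the threshold of 0.3 again follows the example above:

```python
def greedy_pair_set(corr, pair_threshold=0.3):
    """Second group pairing method: repeatedly take the remaining channel
    pair with the largest correlation value, then drop every pair that
    shares a channel with an already chosen pair (an 'associated channel').
    `corr` maps frozenset({ch_a, ch_b}) -> normalized correlation value."""
    remaining = {p: v for p, v in corr.items() if v >= pair_threshold}
    chosen, used = [], set()
    while remaining:
        p = max(remaining, key=remaining.get)
        chosen.append(p)
        used |= set(p)
        remaining = {q: v for q, v in remaining.items() if not (set(q) & used)}
    return chosen, sum(corr[p] for p in chosen)

# Hypothetical correlation values for the 5 screened channels.
corr = {frozenset({"L", "R"}): 0.9,
        frozenset({"R", "C"}): 0.8,
        frozenset({"LS", "RS"}): 0.7}
pair_set, total = greedy_pair_set(corr)
```

The greedy procedure first picks (L, R), which eliminates (R, C), and then picks (LS, RS); unlike the exhaustive first method, it never revisits an earlier choice, which is why its correlation sum can be lower.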
  • Step 306 Determine the target group pair mode of the at least five channel signals according to the sum of the first correlation value and the sum of the second correlation value.
  • when the first correlation value sum is greater than the second correlation value sum, the target group pairing mode is determined to be the first group pairing mode; when the first correlation value sum is equal to the second correlation value sum, the target group pairing mode is determined to be the second group pairing mode.
  • Step 307 Obtain fluctuation interval values of at least five channel signals.
  • the fluctuation interval value is used to represent the difference in energy or amplitude between at least five channel signals.
  • Step 308 When the target group pairing mode is the first group pairing mode, determine the energy equalization mode according to the fluctuation interval values of the at least five channel signals.
  • the energy equalization mode includes a first energy equalization mode and a second energy equalization mode, wherein the first energy equalization mode uses the two channel signals in one channel pair to obtain the two equalized channel signals corresponding to that channel pair.
  • the second energy equalization mode uses the two channel signals in one channel pair and at least one channel signal outside that channel pair to obtain the two equalized channel signals corresponding to the channel pair.
  • Determining the energy equalization mode according to the fluctuation interval values of the at least five channel signals may include: when the fluctuation interval value meets the preset condition, determining that the energy equalization mode is the first energy equalization mode; when the fluctuation interval value does not meet the preset condition, determining that the energy equalization mode is the second energy equalization mode.
  • the above-mentioned fluctuation interval value may include the energy flatness of the first audio frame, in which case the fluctuation interval value meeting the preset condition means that the energy flatness is smaller than the first threshold; or the fluctuation interval value includes the amplitude flatness of the first audio frame, in which case meeting the preset condition means that the amplitude flatness is less than the second threshold; or the fluctuation interval value includes the energy deviation degree of the first audio frame, in which case meeting the preset condition means that the energy deviation degree is not within the first preset range; or the fluctuation interval value includes the amplitude deviation degree of the first audio frame, in which case meeting the preset condition means that the amplitude deviation degree is not within the second preset range.
  • the energy flatness represents the fluctuation of the frame energies after the frequency-domain coefficient energies of the current frame of the multiple channels screened by the multi-channel screening unit are normalized, and can be measured by a flatness calculation formula.
  • when the frame energies of all channels of the current frame are equal, the energy flatness of the current frame is 1; when the energy of a certain channel of the current frame is 0, the energy flatness of the current frame is 0.
  • the value range of energy flatness is [0,1]. The greater the fluctuation of the energy between channels, the smaller the value of its energy flatness.
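The flatness calculation formula itself is not reproduced in this text. A common definition that satisfies all of the stated properties (value 1 when all channel energies are equal, value 0 when any channel energy is 0, range [0,1], decreasing as inter-channel fluctuation grows) is the geometric mean divided by the arithmetic mean, which is assumed in the sketch below:

```python
import math

def energy_flatness(energies):
    """Flatness of the per-channel frame energies: geometric mean divided
    by arithmetic mean (an assumed definition matching the stated
    properties; the patent's exact formula is not reproduced here)."""
    n = len(energies)
    arith = sum(energies) / n
    if arith == 0:
        return 0.0
    if any(e <= 0 for e in energies):
        return 0.0  # a silent channel forces the geometric mean to 0
    geo = math.exp(sum(math.log(e) for e in energies) / n)
    return geo / arith

print(energy_flatness([2.0, 2.0, 2.0, 2.0, 2.0]))  # ≈ 1.0, equal energies
print(energy_flatness([2.0, 2.0, 2.0, 2.0, 0.0]))  # 0.0, one silent channel
```

The resulting efm value can then be compared against the first threshold (for example, 0.511 for the 5.1 channel format) to choose the energy equalization mode.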
  • a uniform first threshold may be set for all channel formats (eg, 5.1, 7.1, 9.1, 11.1), for example, it may be 0.483, 0.492, or 0.504, and so on.
  • different first thresholds are set for different channel formats. For example, the first threshold value of the 5.1 channel format is 0.511, the first threshold value of the 7.1 channel format is 0.563, the first threshold value of the 9.1 channel format is 0.608, and the first threshold value of the 11.1 channel format is 0.654.
  • the amplitude flatness represents the fluctuation of the frame amplitude after the frequency domain coefficient amplitudes of the current frames of the multiple channels screened by the multi-channel screening unit are normalized, and can be measured by the flatness calculation formula.
  • when the frame amplitudes of all channels are the same, the amplitude flatness is 1; when the frame amplitude of one of the channels is 0, the amplitude flatness is 0. Therefore, the range of amplitude flatness is [0,1].
  • a uniform second threshold may be set for all channel formats (eg, 5.1, 7.1, 9.1, 11.1), for example, may be 0.695, 0.701, or 0.710, and so on.
  • different second thresholds may be given for different channel formats; for example, the second threshold of the 5.1 channel format may be 0.715, the second threshold of the 7.1 channel format may be 0.753, the second threshold of the 9.1 channel format may be 0.784, and the second threshold of the 11.1 channel format may be 0.809.
  • an energy equalization mode may be determined by using the above-mentioned various pieces of information representing fluctuation interval values of at least five channel signals, which include energy flatness, amplitude flatness, energy deviation, or amplitude deviation.
  • when the energy flatness of the first audio frame is less than the first threshold, the energy equalization mode is determined to be the first energy equalization mode; when the energy flatness of the first audio frame is greater than or equal to the first threshold, the energy equalization mode is determined to be the second energy equalization mode.
  • when the amplitude flatness of the first audio frame is less than the second threshold, the energy equalization mode is determined to be the first energy equalization mode; when the amplitude flatness of the first audio frame is greater than or equal to the second threshold, the energy equalization mode is determined to be the second energy equalization mode.
  • the energy equalization mode may also be determined according to the encoding bit rate corresponding to the first audio frame, that is, according to whether the encoding bit rate is greater than a bit rate threshold: when the bit rate is greater than the bit rate threshold, the energy equalization mode is determined to be the second energy equalization mode; when the bit rate is less than or equal to the bit rate threshold, the energy equalization mode is determined according to the fluctuation interval values of the at least five channel signals.
  • Step 309 When the target group pairing mode is the second group pairing mode, determine the energy equalization mode according to the fluctuation interval value of the at least five channel signals and determine the target group pairing mode of the at least five channel signals again.
  • when the fluctuation interval value meets the preset condition, the target group pairing mode is re-determined to be the first group pairing mode, and the energy equalization mode is the first energy equalization mode; when the fluctuation interval value does not meet the preset condition, the target group pairing mode remains the second group pairing mode, and the energy equalization mode is the second energy equalization mode.
  • For the fluctuation interval value and the manner in which the fluctuation interval value meets the preset conditions, reference may be made to step 308, which will not be repeated here.
  • Step 310 Perform energy equalization processing on the at least five channel signals according to the energy equalization mode to obtain at least five equalized channel signals.
  • when the energy equalization mode is the first energy equalization mode, an average of the energy or amplitude values of the two channel signals included in the current channel pair may be calculated, and energy equalization processing is performed on the two channel signals according to the average value to obtain the two corresponding equalized channel signals.
  • when the energy equalization mode is the second energy equalization mode, the average of the energy or amplitude values of the at least five channel signals may be calculated, and energy equalization processing is performed on the at least five channel signals according to the average value to obtain the at least five equalized channel signals.
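The two energy equalization modes can be sketched as follows. The text only says that equalization is performed "according to the average value"; the sketch assumes one plausible realisation, namely scaling each channel's frequency-domain coefficients so that its energy matches the average:

```python
import math

def equalize(spec, target_energy):
    """Scale a channel's frequency-domain coefficients so that its energy
    equals target_energy (an assumed realisation of 'energy equalization
    processing according to the average value')."""
    e = sum(x * x for x in spec)
    if e == 0:
        return list(spec)
    g = math.sqrt(target_energy / e)
    return [x * g for x in spec]

def pair_mode_equalize(spec_a, spec_b):
    """First (Pair) energy equalization mode: the average is taken over
    the two channel signals of one channel pair only."""
    avg = (sum(x * x for x in spec_a) + sum(x * x for x in spec_b)) / 2
    return equalize(spec_a, avg), equalize(spec_b, avg)

def overall_mode_equalize(channels):
    """Second (overall) energy equalization mode: the average is taken
    over all channel signals."""
    avg = sum(sum(x * x for x in ch) for ch in channels) / len(channels)
    return [equalize(ch, avg) for ch in channels]
```

After pair-mode equalization, the two channels of a pair carry equal energy, which is what lets the subsequent stereo processing of the pair work on balanced inputs.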
  • Step 311 Encode at least five equalized channel signals according to the channel pair set corresponding to the target group pair mode.
  • in some embodiments, the object of encoding may be the at least five channel signals rather than the equalized channel signals.
  • In this embodiment, the two group pairing methods are fused, and whether to adopt the group pairing method of the prior art or the group pairing method aiming at maximizing the sum of correlation values is determined according to the sums of correlation values corresponding to the two methods.
  • In addition, the fluctuation interval values of the channel signals determine the energy equalization mode, so that energy equalization better matches the fluctuation of the channels, which makes the encoding of audio frames more diverse and efficient.
  • FIG. 4 is an exemplary structural diagram of an encoding apparatus to which the encoding method for a multi-channel audio signal provided by the present application is applied.
  • the encoding apparatus may be the encoder 20 of the source device 12 in the audio coding system 10, or may be the coding module 270 in the audio decoding device 200 .
  • the encoding device may include a mode selection module, a multi-channel fusion processing module, a channel encoding module and a code stream multiplexing interface, wherein,
  • the input of the mode selection module includes the six channel signals (L, R, C, LS, RS, LFE) of the 5.1 channels and the multi-channel processing indicator (MultiProcFlag), and the output includes the filtered five channel signals (L, R, C, LS, RS) and mode selection side information.
  • the mode selection side information includes the energy equalization mode (Pair energy equalization mode or overall energy equalization mode), the group pair mode (MCT group pairing or MCAC group pairing), and the correlation value side information (global correlation value side information or MCT correlation value side information) corresponding to the group pair mode.
  • the multi-channel fusion processing module includes a multi-channel coding tool (MCT) unit and a multi-channel adaptive coupling (MCAC) unit.
  • according to the mode selection side information, the energy equalization mode is determined, as well as which of the two units performs energy equalization and stereo processing on the five channel signals (L, R, C, LS, RS); the output includes the processed channel signals (P1-P4, C) and multi-channel side information including the channel pair set.
  • the channel encoding module uses a mono encoding unit (or mono box, mono tool) to encode the processed channel signals (P1-P4, C) output by the multi-channel fusion processing module and outputs the corresponding encoded channel signals (E1-E5).
  • during encoding, a channel signal with higher energy (or higher amplitude) is allocated more bits, and a channel signal with lower energy (or lower amplitude) is allocated fewer bits.
  • the channel encoding module may also use a stereo encoding unit, such as a parametric stereo encoder or a lossy stereo encoder, to encode the processed channel signal output by the multi-channel processing module.
  • the unpaired channel signal (eg C) can be directly input to the channel encoding module to obtain the encoded channel signal E5.
  • the code stream multiplexing interface generates an encoded multi-channel signal, which includes the encoded channel signals (E1-E5) output by the channel encoding module and side information (including the mode selection side information and the multi-channel side information).
  • the code stream multiplexing interface can process the encoded multi-channel signal into a serial signal or a serial bit stream.
  • Figure 5a is an exemplary structural diagram of the mode selection module.
  • the mode selection module includes: a multi-channel screening unit, a global correlation value statistics unit, an MCT correlation value statistics unit and a multi-channel mode selection unit.
  • the multi-channel screening unit filters out the five-channel signals participating in the multi-channel processing from the six-channel signals (L, R, C, LS, RS, LFE) according to the multi-channel processing indicator (MultiProcFlag), namely L, R, C, LS, RS.
  • the global correlation value statistics unit first calculates the channel signals involved in multi-channel processing, that is, the normalized correlation value between any two channel signals in L, R, C, LS, and RS.
  • the present application can use the following formula to calculate the correlation value between two channel signals (for example, the channel signal ch1 and the channel signal ch2):
  • corr(ch1, ch2) = |Σ_{i=0}^{N-1} spec_ch1(i)·spec_ch2(i)| / √( Σ_{i=0}^{N-1} spec_ch1(i)² · Σ_{i=0}^{N-1} spec_ch2(i)² )
  • where corr(ch1, ch2) represents the normalized correlation value between the channel signal ch1 and the channel signal ch2, spec_ch1(i) represents the frequency domain coefficient of the i-th frequency point of the channel signal ch1, spec_ch2(i) represents the frequency domain coefficient of the i-th frequency point of the channel signal ch2, and N represents the total number of frequency points of an audio frame.
  • then, from all the channel pair sets, the channel pair set whose sum of correlation values (the sum of the correlation values of the channel pairs in the set) is the largest is selected as the target channel pair set.
  • the MCT correlation value statistics unit first calculates the five channel signals involved in multi-channel processing, that is, the normalized correlation value between any two channel signals in L, R, C, LS, and RS. Similarly, the above formula can be used to calculate the correlation value between two channel signals (eg, the channel signal ch1 and the channel signal ch2).
  • in the first iteration, the channel pair corresponding to the highest correlation value (for example, L and R) is selected and added to the target channel pair set; in the second iteration, the correlation values of the channel pairs containing L and/or R are deleted, and from the remaining correlation values the channel pair corresponding to the maximum correlation value (for example, LS and RS) is selected and added to the target channel pair set; and so on, until the correlation value set is empty.
  • to reduce the amount of computation, the global correlation value statistics unit and the MCT correlation value statistics unit can filter the correlation values according to the set group pair threshold, that is, correlation values greater than or equal to the group pair threshold are retained, while correlation values less than the group pair threshold are removed or set to 0.
  • Fig. 5b is an exemplary structural diagram of the multi-channel mode selection unit. As shown in Fig. 5b, the multi-channel mode selection unit includes a module selection unit and an energy equalization selection unit.
  • the module selection unit also determines the target group pair mode according to the fluctuation interval values of the multiple channel signals provided by the energy equalization selection unit; for example, when the energy flatness of the five channel signals (L, R, C, LS, RS) is less than the first threshold, the target group pair mode is MCAC group pairing, and when the energy flatness of the five channel signals (L, R, C, LS, RS) is greater than or equal to the first threshold, the target group pair mode is MCT group pairing.
  • alternatively, the target group pairing mode and the energy equalization mode of the five channel signals can be determined at one time according to the fluctuation interval values of the multiple channel signals provided by the energy equalization selection unit. For example, when the energy flatness of the five channel signals (L, R, C, LS, RS) is less than the first threshold, the target group pair mode is MCAC group pairing and the energy equalization mode is the first energy equalization mode; when the energy flatness of the five channel signals (L, R, C, LS, RS) is greater than or equal to the first threshold, the target group pair mode is MCT group pairing and the energy equalization mode is the second energy equalization mode.
  • the energy equalization selection unit first calculates the energy or amplitude value of each channel signal; the application can use the following formula to calculate the energy or amplitude value of the channel signal (ch):
  • energy(ch) = Σ_{i=0}^{N-1} spec_coeff(ch, i)²
  • where energy(ch) represents the energy or amplitude value of the channel signal ch, spec_coeff(ch, i) represents the frequency domain coefficient of the i-th frequency point of the channel signal ch, and N represents the total number of frequency points of an audio frame.
  • the present application can use the following formula to calculate the normalized energy or amplitude value of the channel signal (ch):
  • energy_uniform(ch) represents the normalized energy or amplitude value of the channel signal ch
  • the fluctuation interval value of the five channel signals is calculated.
  • the fluctuation interval value can refer to the energy flatness.
  • the application can use the following formula to calculate the energy flatness of the five channel signals:
  • efm represents the energy flatness of the five channel signals
  • the channel indices of L, R, C, LS, and RS refer to Table 1.
  • the fluctuation interval value can also refer to the degree of energy deviation.
  • based on the energy_uniform(ch) obtained by the above calculation, the application can use the following formula to calculate the average energy or amplitude value of the five channel signals:
  • avg_energy_uniform = (1/5) · Σ_{ch=0}^{4} energy_uniform(ch)
  • where avg_energy_uniform represents the average energy or amplitude value of the five channel signals, and the channel indices of L, R, C, LS, and RS are shown in Table 1.
  • the energy deviation of the channel signal (ch) is then calculated using the following formula:
  • deviation(ch) = energy_uniform(ch) / avg_energy_uniform
  • where deviation(ch) represents the energy deviation of the channel signal ch.
  • the largest of the energy deviation degrees of L, R, C, LS, and RS is determined as the energy deviation degree (deviation) of the five channel signals.
  • the fluctuation interval value may also refer to an amplitude value or an amplitude deviation degree, the principle of which is similar to the above-mentioned energy-related value, and will not be repeated here.
  • the energy equalization mode of the present application includes two implementation modes. The Pair energy equalization mode, for each channel pair in the target channel pair set corresponding to the group pair mode determined by the module selection unit, uses the two channel signals in the channel pair to obtain the two equalized channel signals corresponding to that channel pair.
  • the overall energy equalization mode uses the two channel signals in one channel pair and at least one channel signal outside that channel pair to obtain the two equalized channel signals corresponding to the channel pair.
  • for a channel signal that is not paired, the corresponding equalized channel signal is the channel signal itself.
  • the energy balance selection unit determines the energy balance mode according to the fluctuation interval value, including the following two judgment methods:
  • when efm is less than the first threshold, the energy equalization mode is the Pair energy equalization mode; when efm is greater than or equal to the first threshold, the energy equalization mode is the overall energy equalization mode.
  • when the deviation is within the value range [threshold, 1/threshold], the energy equalization mode is the overall energy equalization mode; when the deviation is not within the value range [threshold, 1/threshold], the energy equalization mode is the Pair energy equalization mode.
  • the value range of threshold can be (0,1).
  • the deviation may represent the ratio of the frequency domain amplitude of each channel of the current frame to the average value of the frequency domain amplitude of each channel of the current frame, that is, the amplitude deviation degree.
  • Taking threshold = 0.2 as an example, two cases arise. First, the frequency-domain amplitude of the current channel is less than or equal to the average of the frequency-domain amplitudes of the channels of the current frame, and the ratio "frequency-domain amplitude of the current channel / average of the frequency-domain amplitudes of the channels of the current frame" is between (0.2, 1], that is, (threshold, 1]. Second, the frequency-domain amplitude of the current channel is greater than the average of the frequency-domain amplitudes of the channels of the current frame, and the ratio is between (1, 5), that is, (1, 1/threshold). Combining the two cases, when the ratio of the frequency-domain amplitude of the current channel to the average of the frequency-domain amplitudes of the channels of the current frame is within (0.2, 5), that is, within (threshold, 1/threshold), the condition is satisfied; (threshold, 1/threshold) is the above-mentioned second preset range.
  • the value of threshold can be between (0, 1), and the value of threshold The smaller the value, the greater the fluctuation of the frequency domain amplitude of the current channel relative to the average value of the frequency domain amplitudes of each channel of the current frame. The smaller the fluctuation of the average value of the amplitude in the frequency domain of each channel is, the value of the threshold can be 0.2, 0.15, 0.125, 0.11, or 0.1 and so on.
  • the deviation may also represent the ratio of the frequency-domain energy of each channel to the average frequency-domain energy over all channels, that is, the energy deviation degree.
  • taking threshold = 0.04 as an example, two cases arise. Case 1: the frequency-domain energy of the current channel is less than or equal to the average frequency-domain energy over all channels of the current frame, and the ratio "frequency-domain energy of the current channel / average frequency-domain energy of all channels of the current frame" lies in (0.04, 1]. Case 2: the frequency-domain energy of the current channel is greater than the average, and the ratio lies in (1, 25). Combining the two cases, when the ratio of the current channel's frequency-domain energy to the frame's average energy lies in (0.04, 25), that is, (threshold, 1/threshold), it falls within the above-mentioned first preset range.
  • the value of threshold may lie in (0, 1). The smaller the value of threshold, the greater the fluctuation of the current channel's frequency-domain energy relative to the average energy of all channels of the current frame that is still treated as within range; the larger the value of threshold, the smaller that tolerated fluctuation. The value of threshold may be, for example, 0.04, 0.0225, 0.015625, 0.0121 or 0.01.
  • since there is a square relationship between amplitude and energy, the amplitude deviation degree and the energy deviation degree also have a square relationship; that is, the inter-channel fluctuation of frame amplitude corresponding to the square of the amplitude deviation degree is approximately equal to the inter-channel fluctuation of frame energy corresponding to the energy deviation degree.
  • the above-mentioned first preset range can also be extended to (0, 1/threshold); the interval for Pair energy balance is then [1/threshold, +∞), indicating that Pair energy balance is performed only when the frequency-domain energy of the current channel is greater than the average frequency-domain energy of all channels of the current frame and "frequency-domain energy of the current channel / average frequency-domain energy of all channels of the current frame" falls in [1/threshold, +∞).
  • the above-mentioned second preset range can also be extended to (0, 1/threshold); the interval for Pair amplitude equalization is then [1/threshold, +∞), indicating that Pair amplitude equalization is performed only when the frequency-domain amplitude of the current channel is greater than the average frequency-domain amplitude of all channels of the current frame and "frequency-domain amplitude of the current channel / average frequency-domain amplitude of all channels of the current frame" falls in [1/threshold, +∞).
  • the energy equalization selection unit may calculate the normalized energy or amplitude values from all five channel signals and then obtain the energy flatness or energy deviation; it may calculate the normalized energy or amplitude values only from the successfully paired channel signals and then obtain the energy flatness or energy deviation; or it may calculate the normalized energy or amplitude values from some of the five channel signals and then obtain the energy flatness or energy deviation. This application does not specifically limit this.
  • the multi-channel fusion processing module includes an MCT unit and an MCAC unit, wherein,
  • the MCT unit first performs energy equalization on the five channel signals (L, R, C, LS, RS) using the overall energy equalization mode to obtain Le, Re, Ce, LSe and RSe, then obtains the target channel pair set according to the MCT correlation value side information, and performs stereo processing, through the stereo box, on the two equalized channel signals of each channel pair in the target channel pair set (for example, (Le, Re) or (LSe, RSe)).
  • the MCAC unit obtains the target channel pair set (for example, (L, R) and (LS, RS)) according to the global correlation value side information, and then, if the energy equalization mode is the Pair energy equalization mode, performs energy equalization on the two channel signals of each channel pair in the target channel pair set (for example, (L, R) and (LS, RS)) to obtain (Le, Re) and (LSe, RSe), and then performs stereo processing on the two equalized channel signals (e.g. (Le, Re) or (LSe, RSe)) through the stereo box.
  • the stereo processing unit may employ prediction-based or Karhunen-Loeve Transform (KLT)-based processing; that is, the two input channel signals are rotated (e.g. via a 2x2 rotation matrix) to maximize energy compaction, concentrating the signal energy in one channel.
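A 2x2 KLT rotation of this kind can be sketched as below: the rotation angle is estimated from the inter-channel (co)variances so that most of the pair's energy lands in the first output channel. This is a textbook sketch of the technique named above, not the patent's stereo box; real codecs additionally quantize and transmit the angle.

```python
import math

def klt_rotate_pair(ch1, ch2):
    """Rotate two channel signals by the KLT angle so that energy is
    compacted into the first output channel (a 2x2 rotation).

    Returns the two rotated signals and the rotation angle alpha."""
    # Covariance-like statistics of the two input signals.
    c11 = sum(x * x for x in ch1)
    c22 = sum(x * x for x in ch2)
    c12 = sum(x * y for x, y in zip(ch1, ch2))
    # KLT rotation angle diagonalizing the 2x2 covariance matrix.
    alpha = 0.5 * math.atan2(2.0 * c12, c11 - c22)
    ca, sa = math.cos(alpha), math.sin(alpha)
    p1 = [ca * x + sa * y for x, y in zip(ch1, ch2)]   # dominant channel
    p2 = [-sa * x + ca * y for x, y in zip(ch1, ch2)]  # residual channel
    return p1, p2, alpha
```

For two identical channels the angle is 45 degrees and the residual channel is (numerically) zero, illustrating the energy compaction.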
  • after the stereo processing unit processes the two input channel signals, it outputs the processed channel signals (P1-P4) corresponding to the two channel signals and the multi-channel side information.
  • the multi-channel side information includes the sum of the correlation values and the target channel pair set.
  • FIG. 6 is an exemplary structural diagram of a decoding apparatus to which the multi-channel audio decoding method provided by the present application is applied.
  • the decoding apparatus may be the decoder 30 of the destination device 14 in the audio coding system 10, or may be the decoding module 270 in the audio coding device 200.
  • the decoding device can include a code stream demultiplexing interface, a channel decoding module and a multi-channel processing module, wherein,
  • the code stream demultiplexing interface receives the encoded multi-channel signal (e.g. a serial bitstream) from the encoding device, and obtains the encoded channel signals (E) and the multi-channel parameters (SIDE_PAIR) after demultiplexing.
  • the channel decoding module uses a monaural decoding unit (or a monaural box, a monaural tool) to decode the coded channel signal output by the code stream demultiplexing interface and output the decoded channel signal (D).
  • the multi-channel processing module includes a plurality of stereo processing units.
  • the stereo processing unit can adopt prediction-based or KLT-based processing; that is, the two input channel signals are inversely rotated (for example, via a 2×2 rotation matrix), so that the signals are transformed back to the original signal directions.
  • based on the multi-channel parameters, the decoder can identify which two decoded channel signals output by the channel decoding module form a pair; the paired decoded channel signals are input into a stereo processing unit, which processes the two input decoded channel signals and outputs the channel signals (CH) corresponding to them.
  • stereo processing unit 1 processes D1 and D2 according to SIDE_PAIR1 to obtain CH1 and CH2
  • stereo processing unit 2 processes D3 and D4 according to SIDE_PAIR2 to obtain CH3 and CH4, ...
  • the unpaired channel signal (eg CHj) does not need to be processed by the stereo processing unit in the multi-channel processing module, and can be directly output after decoding.
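The decoder-side routing described above (paired channels through an inverse 2x2 rotation, unpaired channels passed through) might be sketched as follows. The data layout is hypothetical: here each SIDE_PAIR entry is assumed to carry the channel-name pair and the encoder's rotation angle.

```python
import math

def multichannel_decode(decoded, pairs):
    """decoded: dict name -> decoded signal (list of samples).
    pairs: list of ((nameA, nameB), alpha) entries, where alpha is the
    rotation angle written by the encoder (illustrative side info).
    Paired signals are inversely rotated; unpaired signals pass through."""
    out = dict(decoded)
    for (a, b), alpha in pairs:
        ca, sa = math.cos(alpha), math.sin(alpha)
        # Inverse of the encoder's 2x2 rotation (its transpose).
        out[a] = [ca * x - sa * y for x, y in zip(decoded[a], decoded[b])]
        out[b] = [sa * x + ca * y for x, y in zip(decoded[a], decoded[b])]
    return out
```
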
  • FIG. 7 is a schematic structural diagram of an embodiment of an encoding apparatus of the present application. As shown in FIG. 7, the apparatus may be applied to the source device 12 or the audio coding device 200 in the above-mentioned embodiments.
  • the encoding apparatus in this embodiment may include: an acquisition module 601 , an encoding module 602 and a determination module 603 . in,
  • the obtaining module 601 is configured to: obtain a first audio frame to be encoded, where the first audio frame includes at least five channel signals; pair the at least five channel signals according to a first pairing method to obtain a first channel pair set, where the first channel pair set includes at least one channel pair and one channel pair includes two of the at least five channel signals; obtain the sum of first correlation values of the first channel pair set, where one channel pair has one correlation value and the correlation value is used to represent the correlation between the two channel signals of the channel pair; pair the at least five channel signals according to a second pairing method to obtain a second channel pair set; and obtain the sum of second correlation values of the second channel pair set. The determining module 603 is configured to determine a target pairing method for the at least five channel signals according to the sum of the first correlation values and the sum of the second correlation values. The encoding module 602 is configured to encode the at least five channel signals according to the target pairing method, where the target pairing method is the first pairing method or the second pairing method.
  • the determining module 603 is specifically configured to: when the sum of the first correlation values is greater than the sum of the second correlation values, determine that the target pairing method is the first pairing method; and when the sum of the first correlation values is equal to the sum of the second correlation values, determine that the target pairing method is the second pairing method.
  • the determining module 603 is further configured to: obtain the fluctuation interval value of the at least five channel signals; when the target pairing method is the first pairing method, determine an energy equalization mode according to the fluctuation interval value of the at least five channel signals; and when the target pairing method is the second pairing method, determine the energy equalization mode according to the fluctuation interval value of the at least five channel signals and determine the target pairing method of the at least five channel signals again. Correspondingly, the encoding module 602 is further configured to perform energy equalization processing on the at least five channel signals respectively according to the energy equalization mode to obtain at least five equalized channel signals, and to encode the at least five equalized channel signals according to the target pairing method. The energy equalization mode is a first energy equalization mode or a second energy equalization mode.
  • the determining module 603 is specifically configured to determine that the energy balancing mode is the first energy balancing mode when the fluctuation interval value meets a preset condition; or, when the When the fluctuation interval value does not meet the preset condition, it is determined that the energy balance mode is the second energy balance mode.
  • the determining module 603 is specifically configured to determine that the target group pairing method is the first group pairing method when the fluctuation interval value meets a preset condition, and the energy balance The mode is the first energy balance mode; or, when the fluctuation interval value does not meet the preset condition, it is determined that the target group pairing mode is the second group pairing mode, and the energy balance mode is the first Two energy balance mode.
  • the determining module 603 is further configured to judge whether the encoding bit rate corresponding to the first audio frame is greater than a bit rate threshold; when the encoding bit rate is greater than the bit rate threshold, determine that the energy equalization mode is the second energy equalization mode; the energy equalization mode is determined according to the fluctuation interval value only when the encoding bit rate is less than or equal to the bit rate threshold.
  • the fluctuation interval value includes the energy flatness of the first audio frame, and the fluctuation interval value meeting the preset condition means that the energy flatness is less than a first threshold; or the fluctuation interval value includes the amplitude flatness of the first audio frame, and meeting the preset condition means that the amplitude flatness is less than a second threshold; or the fluctuation interval value includes the energy deviation degree of the first audio frame, and meeting the preset condition means that the energy deviation degree is not within a first preset range; or the fluctuation interval value includes the amplitude deviation degree of the first audio frame, and meeting the preset condition means that the amplitude deviation degree is not within a second preset range.
  • the obtaining module 601 is specifically configured to select channel pairs from the channel pairs corresponding to the at least five channel signals and add them to the first channel pair set, with the goal of obtaining the maximum sum of correlation values.
  • the obtaining module 601 is specifically configured to: first add, to the second channel pair set, the channel pair with the largest correlation value among the channel pairs corresponding to the at least five channel signals; and then add, to the second channel pair set, the channel pair with the largest correlation value among the channel pairs other than associated channel pairs, where an associated channel pair includes any one of the channel signals included in a channel pair that has already been added to the second channel pair set.
  • when the energy equalization mode is the first energy equalization mode, the encoding module 602 is specifically configured to: for the current channel pair in the target channel pair set corresponding to the target pairing method, calculate the average of the energies or amplitude values of the two channel signals included in the current channel pair, and perform energy equalization processing on the two channel signals according to the average to obtain the corresponding two equalized channel signals.
  • when the energy equalization mode is the second energy equalization mode, the encoding module 602 is specifically configured to calculate the average of the energies or amplitude values of the at least five channel signals, and perform energy equalization processing on the at least five channel signals respectively according to the average to obtain the at least five equalized channel signals.
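The two equalization modes above can be sketched with energy-based averaging. One plausible reading, assumed here, is that "equalization according to the average" means scaling each signal so its energy equals the average energy; the function names and the zero-energy guard are illustrative.

```python
def pair_equalize(ch1, ch2):
    """Pair energy equalization: scale each signal of one channel pair using
    only the average energy of the two signals in that pair."""
    e1 = sum(x * x for x in ch1)
    e2 = sum(x * x for x in ch2)
    avg = (e1 + e2) / 2.0

    def scale(ch, e):
        g = (avg / e) ** 0.5 if e > 0 else 1.0  # gain toward the pair average
        return [g * x for x in ch]

    return scale(ch1, e1), scale(ch2, e2)

def overall_equalize(channels):
    """Overall energy equalization: scale every channel signal toward the
    average energy over all (here: at least five) channel signals."""
    energies = [sum(x * x for x in ch) for ch in channels]
    avg = sum(energies) / len(energies)
    return [[(avg / e) ** 0.5 * x for x in ch] if e > 0 else ch
            for ch, e in zip(channels, energies)]
```
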
  • the apparatus of this embodiment can be used to execute the technical solution of the method embodiment shown in FIG. 3 , and its implementation principle and technical effect are similar, and details are not repeated here.
  • FIG. 8 is a schematic structural diagram of an embodiment of a device of the present application.
  • the device may be the encoding device in the foregoing embodiment.
  • the device in this embodiment may include: a processor 701 and a memory 702.
  • the memory 702 is used to store one or more programs; when the one or more programs are executed by the processor 701, the processor 701 implements the technical solution of the method embodiment shown in FIG. 3.
  • each step of the above method embodiments may be completed by a hardware integrated logic circuit in a processor or an instruction in the form of software.
  • the processor may be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • a general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
  • the steps of the method disclosed in the present application can be directly embodied as executed by a hardware encoding processor, or executed by a combination of hardware and software modules in the encoding processor.
  • the software modules may be located in random access memory, flash memory, read-only memory, programmable read-only memory or electrically erasable programmable memory, registers and other storage media mature in the art.
  • the storage medium is located in the memory, and the processor reads the information in the memory, and completes the steps of the above method in combination with its hardware.
  • the memory mentioned in the above embodiments may be volatile memory or non-volatile memory, or may include both volatile and non-volatile memory.
  • the non-volatile memory may be read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (electrically EPROM, EEPROM) or flash memory.
  • volatile memory may be random access memory (RAM), which acts as an external cache. By way of example rather than limitation, many forms of RAM are available, such as static random access memory (SRAM), dynamic random access memory (DRAM), synchronous dynamic random access memory (SDRAM), double data rate synchronous dynamic random access memory (DDR SDRAM), enhanced synchronous dynamic random access memory (ESDRAM), synchronous link dynamic random access memory (SLDRAM) and direct rambus RAM (DR RAM).
  • the disclosed system, apparatus and method may be implemented in other manners.
  • the apparatus embodiments described above are only illustrative.
  • the division of the units is only a logical function division; in actual implementation, there may be other division methods. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the shown or discussed mutual coupling or direct coupling or communication connection may be through some interfaces, indirect coupling or communication connection of devices or units, and may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution in this embodiment.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.
  • the functions, if implemented in the form of software functional units and sold or used as independent products, may be stored in a computer-readable storage medium.
  • the technical solutions of the present application essentially, or the part contributing to the prior art, or part of the technical solutions, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods described in the various embodiments of the present application.
  • the aforementioned storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disc.


Abstract

An encoding method (300) and apparatus for a multi-channel audio signal. The encoding method (300) for a multi-channel audio signal includes: obtaining a first audio frame to be encoded (301); pairing at least five channel signals according to a first pairing method to obtain a first channel pair set (302); obtaining a sum of first correlation values of the first channel pair set, one channel pair having one correlation value (303); pairing the at least five channel signals according to a second pairing method to obtain a second channel pair set (304); obtaining a sum of second correlation values of the second channel pair set (305); determining a target pairing method for the at least five channel signals according to the sum of the first correlation values and the sum of the second correlation values (306); and encoding the at least five channel signals according to the channel pair set corresponding to the target pairing method, the target pairing method being the first pairing method or the second pairing method (311). The encoding method (300) and apparatus make methods for encoding audio frames more diverse and more efficient.

Description

Encoding method and apparatus for multi-channel audio signals
This application claims priority to Chinese Patent Application No. 202010728902.2, filed with the China National Intellectual Property Administration on July 17, 2020 and entitled "Encoding Method and Apparatus for Multi-Channel Audio Signals", which is incorporated herein by reference in its entirety.
Technical Field
This application relates to audio processing technologies, and in particular, to an encoding method and apparatus for multi-channel audio signals.
Background
Multi-channel audio coding is the technology of encoding or decoding audio that contains more than two channels. Common multi-channel audio formats include 5.1-channel, 7.1-channel, 7.1.4-channel and 22.2-channel audio.
The MPEG Surround (MPS) standard specifies joint coding of four channels, but coding and decoding methods applicable to the various multi-channel audio signals mentioned above are still needed.
Summary
This application provides an encoding method and apparatus for multi-channel audio signals, so that methods for encoding audio frames are more diverse and more efficient.
According to a first aspect, this application provides an encoding method for a multi-channel audio signal, including: obtaining a first audio frame to be encoded, where the first audio frame includes at least five channel signals; pairing the at least five channel signals according to a first pairing method to obtain a first channel pair set, where the first channel pair set includes at least one channel pair, and one channel pair includes two of the at least five channel signals; obtaining a sum of first correlation values of the first channel pair set, where the one channel pair has one correlation value, and the correlation value is used to represent the correlation between the two channel signals of the one channel pair; pairing the at least five channel signals according to a second pairing method to obtain a second channel pair set; obtaining a sum of second correlation values of the second channel pair set; determining a target pairing method for the at least five channel signals according to the sum of the first correlation values and the sum of the second correlation values; and encoding the at least five channel signals according to the target pairing method, where the target pairing method is the first pairing method or the second pairing method.
The first audio frame in this embodiment may be any frame in the multi-channel audio to be encoded, and includes five or more channel signals. Encoding two channel signals with higher correlation together reduces redundancy and improves coding efficiency; therefore, in this embodiment, pairing is determined according to the correlation value between two channel signals. To find the pairing method with the highest correlation as far as possible, the correlation value between every two of the at least five channel signals in the first audio frame may be calculated to obtain a correlation value set of the first audio frame. The first pairing method includes: selecting channel pairs from the channel pairs corresponding to the at least five channel signals and adding them to the first channel pair set, with the goal of obtaining the maximum sum of correlation values. The sum of the first correlation values is the sum of the correlation values of all channel pairs in the first channel pair set corresponding to the first pairing method. The second pairing method includes: first adding, to the second channel pair set, the channel pair with the largest correlation value among the channel pairs corresponding to the at least five channel signals; and then adding, to the second channel pair set, the channel pair with the largest correlation value among the remaining channel pairs other than associated channel pairs, where an associated channel pair includes any one of the channel signals included in a channel pair that has already been added to the second channel pair set. The sum of the second correlation values is the sum of the correlation values of all channel pairs in the second channel pair set corresponding to the second pairing method.
In this embodiment the two pairing methods are combined: whether to use the existing pairing method or the pairing method aimed at maximizing the sum of correlation values is determined according to the sums of correlation values corresponding to the pairing methods, so that methods for encoding audio frames are more diverse and more efficient.
In a possible implementation, determining the target pairing method for the at least five channel signals according to the sum of the first correlation values and the sum of the second correlation values includes: when the sum of the first correlation values is greater than the sum of the second correlation values, determining that the target pairing method is the first pairing method; and when the sum of the first correlation values is equal to the sum of the second correlation values, determining that the target pairing method is the second pairing method.
Initially determining the target pairing method according to the sums of correlation values can make the sum of correlation values of all channel pairs in the target channel pair set as large as possible, increase the number of paired channel pairs as much as possible, and reduce redundancy between channel signals.
In a possible implementation, before encoding the at least five channel signals according to the target pairing method, the method further includes: obtaining the fluctuation interval value of the at least five channel signals; when the target pairing method is the first pairing method, determining an energy equalization mode according to the fluctuation interval value of the at least five channel signals; when the target pairing method is the second pairing method, determining the energy equalization mode according to the fluctuation interval value of the at least five channel signals and determining the target pairing method of the at least five channel signals again; and performing energy equalization processing on the at least five channel signals respectively according to the energy equalization mode to obtain at least five equalized channel signals. Correspondingly, encoding the at least five channel signals according to the target pairing method includes: encoding the at least five equalized channel signals according to the target pairing method.
In the embodiments of this application, the aforementioned energy equalization may also be amplitude equalization; energy equalization operates on energy, and amplitude equalization operates on amplitude. There is a square relationship between the energy of a channel signal and its amplitude, that is, energy = amplitude² = amplitude × amplitude.
The first energy equalization mode is the Pair energy equalization mode. In this mode, for any channel pair, only the two channel signals in the channel pair are used to obtain the two equalized channel signals corresponding to the channel pair. It should be noted that "only ... are used" means that, taking a channel pair as the unit, energy equalization is performed based only on the two channel signals contained in the channel pair, and the two equalized channel signals obtained are related only to those two channel signals; no channel signal outside the channel pair participates in the energy equalization. However, "only ... are used" does not limit the information involved in the energy equalization processing; for example, characteristic parameters and coding parameters of the channel signals may be referred to during energy equalization, which is not specifically limited. The second energy equalization mode is the overall energy equalization mode, which uses the two channel signals in a channel pair and at least one channel signal outside the channel pair to obtain the two equalized channel signals corresponding to the channel pair. It should be noted that other energy equalization modes may also be used in this application, which is not specifically limited.
When the first pairing method is initially determined to be used, the energy equalization mode may be further determined according to the fluctuation interval value of the at least five channel signals. When the second pairing method is initially determined to be used, the energy equalization mode may be further determined according to the fluctuation interval value of the at least five channel signals and the target pairing method of the at least five channel signals may be determined again. This allows the pairing method to be determined from multiple dimensions and makes the energy equalization better match the characteristics of the channel signals, so that methods for encoding audio frames are more diverse and more efficient.
In a possible implementation, determining the energy equalization mode according to the fluctuation interval value of the at least five channel signals includes: when the fluctuation interval value meets a preset condition, determining that the energy equalization mode is the first energy equalization mode; or, when the fluctuation interval value does not meet the preset condition, determining that the energy equalization mode is the second energy equalization mode.
In a possible implementation, determining the energy equalization mode according to the fluctuation interval value of the at least five channel signals and determining the target pairing method of the at least five channel signals again includes: when the fluctuation interval value meets a preset condition, determining that the target pairing method is the first pairing method and the energy equalization mode is the first energy equalization mode; or, when the fluctuation interval value does not meet the preset condition, determining that the target pairing method is the second pairing method and the energy equalization mode is the second energy equalization mode.
In a possible implementation, before determining the energy equalization mode according to the fluctuation interval value of the at least five channel signals, the method further includes: judging whether the encoding bit rate corresponding to the first audio frame is greater than a bit rate threshold. Optionally, in one implementation, the bit rate threshold may be set to 28 kbps / (number of effective channel signals / frame rate), where 28 kbps may also be another empirical value, such as 30 kbps or 26 kbps. An effective channel signal is a channel signal other than the LFE; for example, the effective channel signals of 5.1-channel audio other than the LFE are C, L, R, LS and RS, and those of 7.1-channel audio other than the LFE are C, L, R, LS, RS, LB and RB. When the encoding bit rate is greater than the bit rate threshold, the energy equalization mode is determined to be the second energy equalization mode; only when the encoding bit rate is less than or equal to the bit rate threshold is the energy equalization mode determined according to the fluctuation interval value. The frame rate is the number of frames processed per unit time and is calculated as frame rate = sampling rate / number of samples corresponding to one audio frame. For example, if the sampling rate is 48000 Hz and one audio frame corresponds to 960 samples, the frame rate is 48000 / 960 = 50 (frames/s).
Taking the encoding bit rate into account when determining the energy equalization mode can improve coding efficiency.
In a possible implementation, the fluctuation interval value includes the energy flatness of the first audio frame, and the fluctuation interval value meeting the preset condition means that the energy flatness is less than a first threshold, which may be, for example, 0.483; or the fluctuation interval value includes the amplitude flatness of the first audio frame, and meeting the preset condition means that the amplitude flatness is less than a second threshold, which may be, for example, 0.695; or the fluctuation interval value includes the energy deviation degree of the first audio frame, and meeting the preset condition means that the energy deviation degree is not within a first preset range, which may be, for example, 0.04 to 25; or the fluctuation interval value includes the amplitude deviation degree of the first audio frame, and meeting the preset condition means that the amplitude deviation degree is not within a second preset range, which may be, for example, 0.2 to 5.
Determining the energy equalization mode with reference to characteristics of the channel signals in multiple dimensions can improve the accuracy of the energy equalization.
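The bit rate threshold and frame rate formulas given above can be written out directly. This is a direct transcription of the arithmetic in the text; the function names and the base value of 28 kbps (one of the example values mentioned) are the only choices made here.

```python
def frame_rate(sample_rate, samples_per_frame):
    """Frame rate = sampling rate / number of samples in one audio frame."""
    return sample_rate / samples_per_frame

def bitrate_threshold(num_effective_channels, sample_rate, samples_per_frame,
                      base_bps=28000):
    """Bit rate threshold from the text:
    28 kbps / (number of effective channel signals / frame rate).
    base_bps may also be another empirical value such as 30000 or 26000."""
    fr = frame_rate(sample_rate, samples_per_frame)
    return base_bps / (num_effective_channels / fr)
```

For 5.1-channel audio at 48 kHz with 960-sample frames (five effective channels, frame rate 50 frames/s), this gives a threshold of 280 kbps.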
In a possible implementation, pairing the at least five channel signals according to the first pairing method to obtain the first channel pair set includes: selecting channel pairs from the channel pairs corresponding to the at least five channel signals and adding them to the first channel pair set, with the goal of obtaining the maximum sum of correlation values.
In a possible implementation, pairing the at least five channel signals according to the second pairing method to obtain the second channel pair set includes: first adding, to the second channel pair set, the channel pair with the largest correlation value among the channel pairs corresponding to the at least five channel signals; and then adding, to the second channel pair set, the channel pair with the largest correlation value among the channel pairs other than associated channel pairs, where an associated channel pair includes any one of the channel signals included in a channel pair that has already been added to the second channel pair set.
In a possible implementation, when the energy equalization mode is the first energy equalization mode, performing energy equalization processing on the at least five channel signals respectively according to the energy equalization mode to obtain at least five equalized channel signals includes: for the current channel pair in the target channel pair set corresponding to the pairing method, calculating the average of the energies or amplitude values of the two channel signals contained in the current channel pair, and performing energy equalization processing on the two channel signals according to the average to obtain the corresponding two equalized channel signals.
In a possible implementation, when the energy equalization mode is the second energy equalization mode, performing energy equalization processing on the at least five channel signals respectively according to the energy equalization mode to obtain at least five equalized channel signals includes: calculating the average of the energies or amplitude values of the at least five channel signals, and performing energy equalization processing on the at least five channel signals respectively according to the average to obtain the at least five equalized channel signals.
According to a second aspect, this application provides an encoding apparatus, including: an obtaining module, configured to obtain a first audio frame to be encoded, where the first audio frame includes at least five channel signals; pair the at least five channel signals according to a first pairing method to obtain a first channel pair set, where the first channel pair set includes at least one channel pair and one channel pair includes two of the at least five channel signals; obtain the sum of first correlation values of the first channel pair set, where the one channel pair has one correlation value and the correlation value is used to represent the correlation between the two channel signals of the one channel pair; pair the at least five channel signals according to a second pairing method to obtain a second channel pair set; and obtain the sum of second correlation values of the second channel pair set; a determining module, configured to determine a target pairing method for the at least five channel signals according to the sum of the first correlation values and the sum of the second correlation values; and an encoding module, configured to encode the at least five channel signals according to the target pairing method, where the target pairing method is the first pairing method or the second pairing method.
In a possible implementation, the determining module is specifically configured to: when the sum of the first correlation values is greater than the sum of the second correlation values, determine that the target pairing method is the first pairing method; and when the sum of the first correlation values is equal to the sum of the second correlation values, determine that the target pairing method is the second pairing method.
In a possible implementation, the determining module is further configured to: obtain the fluctuation interval value of the at least five channel signals; when the target pairing method is the first pairing method, determine an energy equalization mode according to the fluctuation interval value of the at least five channel signals; and when the target pairing method is the second pairing method, determine the energy equalization mode according to the fluctuation interval value of the at least five channel signals and determine the target pairing method of the at least five channel signals again. Correspondingly, the encoding module is further configured to perform energy equalization processing on the at least five channel signals respectively according to the energy equalization mode to obtain at least five equalized channel signals, and to encode the at least five equalized channel signals according to the target pairing method.
In a possible implementation, the determining module is specifically configured to: when the fluctuation interval value meets a preset condition, determine that the energy equalization mode is the first energy equalization mode; or, when the fluctuation interval value does not meet the preset condition, determine that the energy equalization mode is the second energy equalization mode.
In a possible implementation, the determining module is specifically configured to: when the fluctuation interval value meets a preset condition, determine that the target pairing method is the first pairing method and the energy equalization mode is the first energy equalization mode; or, when the fluctuation interval value does not meet the preset condition, determine that the target pairing method is the second pairing method and the energy equalization mode is the second energy equalization mode.
In a possible implementation, the determining module is further configured to judge whether the encoding bit rate corresponding to the first audio frame is greater than a bit rate threshold; when the encoding bit rate is greater than the bit rate threshold, determine that the energy equalization mode is the second energy equalization mode; and only when the encoding bit rate is less than or equal to the bit rate threshold, determine the energy equalization mode according to the fluctuation interval value.
In a possible implementation, the fluctuation interval value includes the energy flatness of the first audio frame, and the fluctuation interval value meeting the preset condition means that the energy flatness is less than a first threshold; or the fluctuation interval value includes the amplitude flatness of the first audio frame, and meeting the preset condition means that the amplitude flatness is less than a second threshold; or the fluctuation interval value includes the energy deviation degree of the first audio frame, and meeting the preset condition means that the energy deviation degree is not within a first preset range; or the fluctuation interval value includes the amplitude deviation degree of the first audio frame, and meeting the preset condition means that the amplitude deviation degree is not within a second preset range.
In a possible implementation, the obtaining module is specifically configured to select channel pairs from the channel pairs corresponding to the at least five channel signals and add them to the first channel pair set, with the goal of obtaining the maximum sum of correlation values.
In a possible implementation, the obtaining module is specifically configured to: first add, to the second channel pair set, the channel pair with the largest correlation value among the channel pairs corresponding to the at least five channel signals; and then add, to the second channel pair set, the channel pair with the largest correlation value among the channel pairs other than associated channel pairs, where an associated channel pair includes any one of the channel signals included in a channel pair that has already been added to the second channel pair set.
In a possible implementation, when the energy equalization mode is the first energy equalization mode, the encoding module is specifically configured to: for the current channel pair in the target channel pair set corresponding to the pairing method, calculate the average of the energies or amplitude values of the two channel signals contained in the current channel pair, and perform energy equalization processing on the two channel signals according to the average to obtain the corresponding two equalized channel signals.
In a possible implementation, when the energy equalization mode is the second energy equalization mode, the encoding module is specifically configured to calculate the average of the energies or amplitude values of the at least five channel signals, and perform energy equalization processing on the at least five channel signals respectively according to the average to obtain the at least five equalized channel signals.
According to a third aspect, this application provides a device, including: one or more processors; and a memory, configured to store one or more programs; when the one or more programs are executed by the one or more processors, the one or more processors are enabled to implement the method according to any one of the implementations of the first aspect.
According to a fourth aspect, this application provides a computer-readable storage medium, including a computer program; when executed on a computer, the computer program causes the computer to perform the method according to any one of the implementations of the first aspect.
According to a fifth aspect, this application provides a computer-readable storage medium, including an encoded bitstream obtained according to the encoding method for a multi-channel audio signal according to any one of the implementations of the first aspect.
Brief Description of Drawings
FIG. 1 is a schematic block diagram of an audio coding system 10 to which this application is applied;
FIG. 2 is a schematic block diagram of an audio coding device 200 to which this application is applied;
FIG. 3 is a flowchart of an exemplary embodiment of the encoding method for a multi-channel audio signal provided by this application;
FIG. 4 is an exemplary structural diagram of an encoding apparatus to which the encoding method for a multi-channel audio signal provided by this application is applied;
FIG. 5a is an exemplary structural diagram of a mode selection module;
FIG. 5b is an exemplary structural diagram of a multi-channel mode selection unit;
FIG. 6 is an exemplary structural diagram of a decoding apparatus to which the multi-channel audio decoding method provided by this application is applied;
FIG. 7 is a schematic structural diagram of an embodiment of an encoding apparatus of this application;
FIG. 8 is a schematic structural diagram of an embodiment of a device of this application.
Detailed Description of Embodiments
To make the objectives, technical solutions and advantages of this application clearer, the technical solutions in this application are described below clearly and completely with reference to the accompanying drawings in this application. Evidently, the described embodiments are some rather than all of the embodiments of this application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of this application without creative efforts shall fall within the protection scope of this application.
The terms "first", "second" and so on in the specification, claims and accompanying drawings of this application are merely used to distinguish descriptions and cannot be understood as indicating or implying relative importance, nor as indicating or implying order. In addition, the terms "include" and "have" and any variants thereof are intended to cover non-exclusive inclusion, for example, the inclusion of a series of steps or units. A method, system, product or device is not necessarily limited to the steps or units expressly listed, but may include other steps or units that are not expressly listed or that are inherent to the process, method, product or device.
It should be understood that in this application, "at least one (item)" means one or more, and "multiple" means two or more. "And/or" describes an association relationship between associated objects and indicates that three relationships may exist; for example, "A and/or B" may indicate three cases: only A exists, only B exists, and both A and B exist, where A and B may be singular or plural. The character "/" generally indicates an "or" relationship between the associated objects. "At least one of the following items" or a similar expression means any combination of these items, including a single item or any combination of multiple items. For example, at least one of a, b or c may indicate: a, b, c, "a and b", "a and c", "b and c", or "a and b and c", where a, b and c may each be singular or plural.
Explanations of terms involved in this application:
Audio frame: audio data is streamed. In practical applications, to facilitate audio processing and transmission, the amount of audio data within a certain duration is usually taken as one frame of audio. This duration is called the "sampling time", and its value may be determined according to the requirements of the codec and the specific application, for example, 2.5 ms to 60 ms, where ms means millisecond.
Audio signal: an audio signal is an information carrier of the frequency and amplitude variations of regular sound waves carrying speech, music and sound effects. Audio is a continuously varying analog signal that can be represented by a continuous curve called a sound wave. The digital signal generated from audio through analog-to-digital conversion or by a computer is an audio signal. A sound wave has three important parameters, frequency, amplitude and phase, which determine the characteristics of the audio signal.
Channel signal: mutually independent audio signals that are captured or played back at different spatial positions when sound is recorded or played. The number of channels is therefore the number of sound sources during recording or the number of loudspeakers during playback.
The following describes the system architecture to which this application is applied.
FIG. 1 is a schematic block diagram of an audio coding system 10 to which this application is applied. As shown in FIG. 1, the audio coding system 10 may include a source device 12 and a destination device 14. The source device 12 generates an encoded bitstream and may therefore be referred to as an audio encoding apparatus. The destination device 14 may decode the encoded bitstream generated by the source device 12 and may therefore be referred to as an audio decoding apparatus.
The source device 12 includes an encoder 20 and, optionally, may include an audio source 16, an audio preprocessor 18 and a communication interface 22.
The audio source 16 may include or be any type of audio capture device for capturing real-world speech, music, sound effects and the like, and/or any type of audio generation device, for example an audio processor or device for generating speech, music and sound effects. The audio source may be any type of memory or storage that stores the above audio.
The audio preprocessor 18 is configured to receive (raw) audio data 17 and preprocess the audio data 17 to obtain preprocessed audio data 19. For example, the preprocessing performed by the audio preprocessor 18 may include trimming or denoising. It can be understood that the audio preprocessing unit 18 may be an optional component.
The encoder 20 is configured to receive the preprocessed audio data 19 and provide encoded audio data 21.
The communication interface 22 in the source device 12 may be configured to receive the encoded audio data 21 and send it to the destination device 14 over a communication channel 13 for storage or direct reconstruction.
The destination device 14 includes a decoder 30 and, optionally, may include a communication interface 28, an audio postprocessor 32 and a playback device 34.
The communication interface 28 in the destination device 14 is configured to receive the encoded audio data 21 directly from the source device 12 and provide the encoded audio data 21 to the decoder 30.
The communication interface 22 and the communication interface 28 may be configured to send or receive the encoded audio data 21 over a direct communication link between the source device 12 and the destination device 14, such as a direct wired or wireless connection, or over any type of network, such as a wired network, a wireless network or any combination thereof, any type of private or public network, or any combination thereof.
For example, the communication interface 22 may be configured to encapsulate the encoded audio data 21 into a suitable format such as packets, and/or to process the encoded audio data 21 using any type of transmission encoding or processing for transmission over a communication link or communication network.
The communication interface 28 corresponds to the communication interface 22 and may, for example, be configured to receive the transmitted data and process it using any type of corresponding transmission decoding or processing and/or decapsulation to obtain the encoded audio data 21.
Both the communication interface 22 and the communication interface 28 may be configured as unidirectional communication interfaces, as indicated by the arrow of the corresponding communication channel 13 pointing from the source device 12 to the destination device 14 in FIG. 1, or as bidirectional communication interfaces, and may be used to send and receive messages and the like to establish a connection, and to acknowledge and exchange any other information related to the communication link and/or to data transmission such as the transmission of encoded audio data.
The decoder 30 is configured to receive the encoded audio data 21 and provide decoded audio data 31.
The audio postprocessor 32 is configured to postprocess the decoded audio data 31 to obtain postprocessed audio data 33. The postprocessing performed by the audio postprocessor 32 may include, for example, trimming or resampling.
The playback device 34 is configured to receive the postprocessed audio data 33 to play the audio to a user or listener. The playback device 34 may be or include any type of player for playing the reconstructed audio, for example an integrated or external loudspeaker. For example, the loudspeaker may include a horn, a sound system, and the like.
FIG. 2 is a schematic block diagram of an audio coding device 200 to which this application is applied. In an embodiment, the audio coding device 200 may be an audio decoder (for example the decoder 30 of FIG. 1) or an audio encoder (for example the encoder 20 of FIG. 1).
The audio coding device 200 includes: an ingress port 210 and a receiving unit (Rx) 220 for receiving data; a processor, logic unit or central processing unit 230 for processing data; a transmitting unit (Tx) 240 and an egress port 250 for transmitting data; and a memory 260 for storing data. The audio coding device 200 may also include optical-to-electrical conversion components and electrical-to-optical (EO) components coupled to the ingress port 210, the receiving unit 220, the transmitting unit 240 and the egress port 250 for the egress or ingress of optical or electrical signals.
The processor 230 is implemented by hardware and software. The processor 230 may be implemented as one or more CPU chips, cores (for example, a multi-core processor), FPGAs, ASICs and DSPs. The processor 230 communicates with the ingress port 210, the receiving unit 220, the transmitting unit 240, the egress port 250 and the memory 260. The processor 230 includes a coding module 270 (for example, an encoding module or a decoding module). The coding module 270 implements the embodiments disclosed in this application, to implement the encoding method for a multi-channel audio signal provided by this application. For example, the coding module 270 implements, processes or provides various encoding operations. Therefore, the coding module 270 provides a substantial improvement to the functions of the audio coding device 200 and affects the switching of the audio coding device 200 to different states. Alternatively, the coding module 270 is implemented as instructions stored in the memory 260 and executed by the processor 230.
The memory 260 includes one or more magnetic disks, tape drives and solid-state drives, may be used as an overflow data storage device for storing programs when such programs are selectively executed, and stores instructions and data read during program execution. The memory 260 may be volatile and/or non-volatile, and may be read-only memory (ROM), random access memory (RAM), ternary content-addressable memory (TCAM) and/or static random access memory (SRAM).
基于上述实施例的描述,本申请提供了一种多声道音频信号的编码方法。
图3是本申请提供的多声道音频信号的编码方法的一个示例性的实施例的流程图。该过程300可由音频译码系统10中的源设备12或音频译码设备200执行。过程300描述为一系列的步骤或操作,应当理解的是,过程300可以以各种顺序执行和/或同时发生,不限于图3所示的执行顺序。如图3所示,该方法包括:
步骤301、获取待编码的第一音频帧。
本实施例的第一音频帧可以是待编码的多声道音频中的任意一个帧,该第一音频帧包括了五个或五个以上的声道信号。例如,5.1声道包括中央声道(C)、前置左声道(left,L)、前置右声道(right,R)、后置左环绕声道(left surround,LS)、后置右环绕声道(right surround,RS)以及0.1声道低频效果(low frequency effects,LFE)共六个声道信号。7.1声道包括C、L、R、LS、RS、LB、RB和LFE共八个声道信号,其中,LFE是频率范围为3Hz-120Hz的音频声道,该声道通常发送到专门为低音调而设计的扬声器。
步骤302、根据第一组对方式对至少五个声道信号进行组对以获得第一声道对集合。
第一声道对集合包括至少一个声道对,该一个声道对包括至少五个声道信号中的两个声道信号。
步骤303、获取第一声道对集合的第一相关值之和。
一个声道对具有一个相关值,该相关值用于表示一个声道对的两个声道信号之间的相关性。
将相关性越高的两个声道信号放在一起编码可以减少冗余,提升编码效率,因此本实施例在组对时,是依据两个声道信号之间的相关值来确定的。为了尽可能找寻相关性最高的组对方式,可以先计算第一音频帧中的至少五个声道信号中两两之间的相关值,得到第一音频帧的相关值集合。例如五个声道信号一共可以组成10个声道对,相对应的,相关值集合中可以包括10个相关值。
可选的,可以对相关值做归一化处理,这样所有声道对的相关值都限定在一特定范围内,以便于设置相关值的统一判断标准,例如组对阈值,该组对阈值可以设置为大于或等于0.2、且小于或等于1的值,例如可以是0.3,这样只要两个声道信号的归一化相关值小于组对阈值,就认为该两个声道信号的相关性较差,不需要组对编码。
在一种可能的实现方式中,可以采用以下公式计算两个声道信号(例如ch1和ch2)之间的相关值:
$$\mathrm{corr}(ch1,ch2)=\frac{\left|\sum_{i=0}^{N-1}\mathrm{spec\_ch1}(i)\cdot \mathrm{spec\_ch2}(i)\right|}{\sqrt{\sum_{i=0}^{N-1}\mathrm{spec\_ch1}(i)^{2}\cdot \sum_{i=0}^{N-1}\mathrm{spec\_ch2}(i)^{2}}}$$
其中,corr(ch1,ch2)表示声道信号ch1和声道信号ch2之间归一化的相关值,spec_ch1(i)表示声道信号ch1的第i个频点的频域系数,spec_ch2(i)是声道信号ch2的第i个频点的频域系数,N表示一个音频帧的总频点数。
需要说明的是,还可以采用其他的算法或公式计算两个声道信号之间的相关值,本申请对此不做具体限定。
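上述归一化相关值的计算过程可以用如下草图示意(仅为便于理解的示例性实现,假设输入为实数频域系数序列,函数名与变量名均为示例性假设,并非标准规定):

```python
# 示意性草图:按上文公式计算两个声道频域系数之间的归一化相关值。
def corr(spec_ch1, spec_ch2):
    num = abs(sum(a * b for a, b in zip(spec_ch1, spec_ch2)))
    den = (sum(a * a for a in spec_ch1) * sum(b * b for b in spec_ch2)) ** 0.5
    # 任一声道能量为0时,视为不相关
    return num / den if den > 0 else 0.0
```

由柯西-施瓦茨不等式可知,该相关值天然被限定在[0,1]之间,便于与统一的组对阈值进行比较。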
第一组对方式包括:以获取最大相关值之和为目的,从至少五个声道信号对应的声道对中选取声道对加入第一声道对集合。第一相关值之和是根据第一组对方式对至少五个声道信号进行组对得到的第一声道对集合中的所有声道对的相关值之和。本实施例第一组对方式可以包括以下两种实现方式:
(1)从相关值集合中选取最大的M个相关值,该M个相关值必须是大于或等于组对阈值的,这是因为小于组对阈值的相关值,表示其所对应的声道对中的两个声道信号之间的相关性较低,没有组对编码的必要。而为了提高编码效率,无需把所有大于或等于组对阈值的相关值全都选出来,因此设定了一个M的上限N,即最多选取N个相关值即可。
N可以选取大于或等于2的整数,N的最大值也不能超过第一音频帧的所有声道信号对应的所有声道对的个数。N的值越大,伴随的计算量会增加,而N的值越小,可能会出现声道对集合丢失的情况,从而降低编码效率。
可选的,可以将N设置为最大声道对数加一,即
$$N=\left\lfloor \frac{CH}{2} \right\rfloor + 1$$
CH表示第一音频帧包含的参与多声道处理的声道信号的个数。例如,5.1声道参与多声道处理的声道信号为五个,则N=3;7.1声道参与多声道处理的声道信号为七个,则N=4。
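上述N的取值("最大声道对数加一")可以用如下草图示意(函数名为示例性假设):

```python
# 示意:相关值选取上限 N = 最大声道对数(CH 整除 2)+ 1
def select_limit(ch_count):
    return ch_count // 2 + 1
```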
然后根据M个相关值获取M个声道对集合,每个声道对集合至少包括M个相关值对应的M个声道对的其中之一,且当声道对集合包括两个以上声道对时,两个以上声道对不包含相同的声道信号。例如,5.1声道,根据相关值集合选出来的最大相关值对应的3个声道对是(L,R)、(R,C)和(LS,RS),其中(LS,RS)的相关值小于组对阈值,因此排除,那么剩余的两个声道对(L,R)和(R,C)可以得到两个声道对集合,这两个声道对集合的其中一个包括(L,R),另一个包括(R,C)。
以M个大于或等于组对阈值的相关值对应的M个声道对中的任意一个(例如第一声道对)为例,本实施例获取M个声道对集合的方法可以包括:将第一声道对加入第一声道对集合(M个声道对集合包括该第一声道对集合);当多个声道对中除关联声道对外的其他声道对中包括相关值大于组对阈值的声道对时,从其他声道对中选取相关值最大的一个声道对加入第一声道对集合,其中,关联声道对包括已加入第一声道对集合的声道对所包括的声道信号中的任意一个。
上述过程除将第一声道对加入第一声道对集合的步骤外,均为迭代处理步骤。即
a、判断多个声道对中除关联声道对外的其他声道对中是否包括相关值大于组对阈值的声道对。
b、若包括相关值大于组对阈值的声道对,则从其他声道对中选取相关值最大的一个声道对加入第一声道对集合。
此时只要其他声道对中包括相关值大于组对阈值的声道对,就可以迭代执行上述步骤b。
可选的,为了减少计算量,可以从相关值集合中将小于组对阈值的相关值删除,这样可以减少声道对的个数,进而减少迭代的次数。
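上述从种子声道对出发、按步骤a/b迭代扩充声道对集合的过程,可以用如下草图示意(corr_map等数据结构与函数名均为示例性假设;第二组对方式可以视为以全局相关值最大的声道对作为种子执行同样的迭代):

```python
# 示意性草图:从种子声道对出发,迭代加入与已选声道不冲突、
# 且相关值大于组对阈值的相关值最大的声道对。
def build_pair_set(seed, corr_map, threshold):
    pair_set = [seed]
    used = set(seed)
    while True:
        # 排除"关联声道对"(与已入选声道对共享声道信号的声道对)
        candidates = {p: v for p, v in corr_map.items()
                      if v > threshold and not (set(p) & used)}
        if not candidates:
            break
        best = max(candidates, key=candidates.get)
        pair_set.append(best)
        used.update(best)
    return pair_set
```

对M个种子声道对分别执行上述过程,即可得到M个声道对集合,再从中选取相关值之和最大者。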
(2)根据多个声道对获取至少五个声道信号对应的所有声道对集合,根据相关值集合获取所有声道对集合中任意一个声道对集合包含的所有声道对的相关值之和,将所有声道对集合中对应于最大的相关值之和的声道对集合确定为目标声道对集合。
相关值集合包括了第一音频帧的至少五个声道信号的多个声道对的相关值,将该多个声道对进行有规则的组合(即同一声道对集合中的多个声道对之间不能包含相同的声道信号),可以得到该至少五个声道信号对应的多个声道对集合。
在一种可能的实现方式中,当声道信号的个数为奇数时,可以采用以下公式计算所有声道对集合的个数:
$$\mathrm{Pair\_num}=\frac{CH!}{2^{\frac{CH-1}{2}}\cdot \left(\frac{CH-1}{2}\right)!}$$
在一种可能的实现方式中,当声道信号的个数为偶数时,可以采用以下公式计算所有声道对集合的个数:
$$\mathrm{Pair\_num}=\frac{CH!}{2^{\frac{CH}{2}}\cdot \left(\frac{CH}{2}\right)!}$$
其中,Pair_num表示所有声道对集合的个数,CH表示第一音频帧里参与多声道处理的声道信号的个数,是经过多声道掩码筛选后的结果。
可选的,为了减少计算量,得到相关值集合之后,可以根据多个声道对中除非相关声道对外的其他声道对获取多个声道对集合,该非相关声道对的相关值小于组对阈值,这样在获取声道对集合时可以减少参与计算的声道对的个数,进而减少声道对集合的个数,在后续步骤也可以减少相关值之和的计算量。
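上述声道对集合个数的计算可以用如下草图示意(假设每个声道对集合为"尽可能多组对"的组合,奇数个声道时恰有一个声道不组对;函数名仅为示例性假设):

```python
from math import factorial

# 示意:CH 个声道信号对应的所有声道对集合的个数。
# 奇、偶两种情形可统一为 k = CH // 2(即最大声道对数)。
def pair_set_count(ch):
    k = ch // 2
    return factorial(ch) // (2 ** k * factorial(k))
```

其结果可与上文奇数、偶数两个公式逐一对应验证。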
步骤304、根据第二组对方式对至少五个声道信号进行组对以获得第二声道对集合。
步骤305、获取第二声道对集合的第二相关值之和。
第二组对方式包括:先将至少五个声道信号对应的声道对中相关值最大的声道对加入第二声道对集合;然后将至少五个声道信号对应的声道对中除关联声道对外的其他声道对中相关值最大的声道对加入第二声道对集合,关联声道对包括已加入第二声道对集合的声道对所包括的声道信号中的任意一个。第二相关值之和是根据第二组对方式对至少五个声道信号进行组对得到的第二声道对集合中的所有声道对的相关值之和。
每次选取声道对时,只选取当前最大相关值对应的声道对加入第二声道对集合。
步骤306、根据第一相关值之和和第二相关值之和确定至少五个声道信号的目标组对方式。
当第一相关值之和大于第二相关值之和时,确定目标组对方式为第一组对方式;当第一相关值之和等于第二相关值之和时,确定目标组对方式为第二组对方式。
步骤307、获取至少五个声道信号的波动区间值。
波动区间值用于表示至少五个声道信号之间的能量或幅度的差异大小。
步骤308、当目标组对方式为第一组对方式时,根据至少五个声道信号的波动区间值确定能量均衡模式。
能量均衡模式包括第一能量均衡模式和第二能量均衡模式,其中,第一能量均衡模式使用一个声道对中两个声道信号获取一个声道对对应的两个均衡声道信号。第二能量均衡模式使用一个声道对中两个声道信号以及一个声道对外至少一个声道信号来获取一个声道对对应的两个均衡声道信号。
根据至少五个声道信号的波动区间值确定能量均衡模式可以包括:当波动区间值符合预设条件时,确定能量均衡模式为第一能量均衡模式;当波动区间值不符合预设条件时,确定能量均衡模式为第二能量均衡模式。
上述波动区间值包括第一音频帧的能量平整度,波动区间值符合预设条件是指能量平整度小于第一阈值;或者,波动区间值包括第一音频帧的幅度平整度,波动区间值符合预设条件是指幅度平整度小于第二阈值;或者,波动区间值包括第一音频帧的能量偏离度,波动区间值符合预设条件是指能量偏离度不在第一预设范围内;或者,波动区间值包括第一音频帧的幅度偏离度,波动区间值符合预设条件是指幅度偏离度不在第二预设范围内。
在本发明实施例中,能量平整度表示的是经过多声道筛选单元筛选后的多个声道的当前帧频域系数能量归一化后的帧能量的波动性,可以通过平整度计算公式来衡量。当当前帧的所有的声道的能量相同时,当前帧的能量平整度为1;当当前帧的某个声道的能量为0时,当前帧的能量平整度为0,因此声道间的能量平整度的取值范围是[0,1]。声道间的能量的波动性越大,其能量平整度的值越小。在一种实施方式中,可以针对所有的声道格式(比如5.1、7.1、9.1、11.1)设置一个统一的第一阈值,例如可以为0.483,0.492,或0.504等等。在另一种实施方式中,针对不同的声道格式设置不同的第一阈值。比如,5.1声道格式的第一阈值为0.511,7.1声道格式的第一阈值为0.563,9.1声道格式的第一阈值为0.608,11.1声道格式的第一阈值为0.654。
幅度平整度表示的是经过多声道筛选单元筛选后的多个声道的当前帧频域系数幅度归一化后的帧幅度的波动性,可以通过平整度计算公式来衡量。当所有的声道的帧幅度相同时,其平整度为1;当其中某个声道的帧幅度为0时,其平整度为0。因此幅度平整度的范围在[0,1]之间。声道间的幅度的波动性越大,其平整度的值越小。在一种实施方式中,可以针对所有的声道格式(比如5.1、7.1、9.1、11.1)设置一个统一的第二阈值,例如可以为0.695,0.701,或0.710等等。在另一种实施方式中,可以针对不同的声道格式给出不同的第二阈值,例如,5.1声道格式的第二阈值可以为0.715,7.1声道格式的第二阈值可以为0.753,9.1声道格式的第二阈值可以为0.784,11.1声道格式的第二阈值可以为0.809。
由于幅度和能量之间存在平方关系,因此幅度平整度和能量平整度也存在平方的关系,即幅度平整度的平方对应的声道间的帧幅度的波动性近似等同于能量平整度对应的声道间帧能量的波动性。
本实施例可以通过至少五个声道信号的上述多种表示波动区间值的信息确定能量均衡模式,其包括能量平整度、幅度平整度、能量偏离度或者幅度偏离度。
(1)计算至少五个声道信号的能量值,根据至少五个声道信号的能量值获取第一音频帧的能量平整度,当第一音频帧的能量平整度小于第一阈值时,确定能量均衡模式为第一能量均衡模式;当第一音频帧的能量平整度大于或等于第一阈值时,确定能量均衡模式为第二能量均衡模式。
(2)计算至少五个声道信号的幅度值,根据至少五个声道信号的幅度值获取第一音频帧的幅度平整度,当第一音频帧的幅度平整度小于第二阈值时,确定能量均衡模式为第一能量均衡模式;当第一音频帧的幅度平整度大于或等于第二阈值时,确定能量均衡模式为第二能量均衡模式。
(3)计算至少五个声道信号的能量值,根据至少五个声道信号的能量值获取第一音频帧的能量偏离度,当第一音频帧的能量偏离度不在第一预设范围内时,确定能量均衡模式为第一能量均衡模式;当第一音频帧的能量偏离度在第一预设范围内时,确定能量均衡模式为第二能量均衡模式。
(4)计算至少五个声道信号的幅度值,根据至少五个声道信号的幅度值获取第一音频帧的幅度偏离度,当第一音频帧的幅度偏离度不在第二预设范围内时,确定能量均衡模式为第一能量均衡模式;当第一音频帧的幅度偏离度在第二预设范围内时,确定能量均衡模式为第二能量均衡模式。
需要说明的是,本申请还可以采用其他的能量均衡模式,对此不做具体限定。
在一种可能的实现方式中,根据至少五个声道信号的波动区间值确定能量均衡模式之前,还可以先根据与第一音频帧对应的编码码率确定能量均衡模式,即判断该编码码率是否大于码率阈值,当该编码码率大于码率阈值时,确定能量均衡模式为第二能量均衡模式;当该编码码率小于或等于码率阈值时,根据至少五个声道信号的波动区间值确定能量均衡模式。
步骤309、当目标组对方式为第二组对方式时,根据至少五个声道信号的波动区间值确定能量均衡模式以及再次确定至少五个声道信号的目标组对方式。
当波动区间值符合预设条件时,确定目标组对方式为第一组对方式,能量均衡模式为第一能量均衡模式;当波动区间值不符合预设条件时,确定目标组对方式为第二组对方式,能量均衡模式为第二能量均衡模式。
波动区间值及波动区间值符合预设条件可参见步骤308,此处不再赘述。
步骤310、根据能量均衡模式分别对至少五个声道信号进行能量均衡处理以得到至少五个均衡声道信号。
当能量均衡模式为第一能量均衡模式时,可以针对组对方式对应的目标声道对集合中的当前声道对,计算当前声道对包含的两个声道信号的能量或幅度值的平均值,根据平均值分别对两个声道信号进行能量均衡处理以得到对应的两个均衡声道信号。
这样当至少五个声道信号的波动区间值较大时,可以只在相关的两个声道信号之间进行能量均衡,使得立体声处理时对于比特的分配更符合声道信号的波动区间值,避免在低码率的编码环境中出现如下问题:能量大的声道对因比特不足,其编码噪声可能远大于能量小的声道对的编码噪声,而能量小的声道对的比特又存在冗余。
当能量均衡模式为第二能量均衡模式时,可以计算至少五个声道信号的能量或幅度值的平均值,根据平均值分别对至少五个声道信号进行能量均衡处理得到至少五个均衡声道信号。
步骤311、根据目标组对方式对应的声道对集合对至少五个均衡声道信号进行编码。
可选的,如果上述步骤中没有对至少五个声道信号进行能量均衡处理,则编码的对象是该至少五个声道信号,而非均衡声道信号。
本实施例将两种组对方式进行融合,根据组对方式对应的相关值之和确定采用现有技术的组对方式,还是采用以相关值之和最大为目的的组对方式,并且结合了声道信号的波动区间值确定能量均衡模式,使得能量均衡更符合声道的波动区间值,可以使得音频帧的编码方法更多样和更高效。
以下通过两个具体的实施例对图3所示方法实施例中如何确定组对方式和能量均衡模式的过程进行描述。以5.1声道为例,该5.1声道包括中央声道(C)、前置左声道(left,L)、前置右声道(right,R)、后置左环绕声道(left surround,LS)、后置右环绕声道(right surround,RS)以及0.1声道低频效果(low frequency effects,LFE),按照例如表1所示,给上述六个声道信号设置声道索引。
表1
声道索引 声道信号
0 L
1 R
2 LS
3 RS
4 C
5 LFE
图4是本申请提供的多声道音频信号的编码方法所应用的编码装置的一个示例性的结构图,该编码装置可以是音频译码系统10中的源设备12的编码器20,也可以是音频译码设备200中的译码模块270。该编码装置可以包括模式选择模块、多声道融合处理模块、声道编码模块和码流复用接口,其中,
模式选择模块的输入包括5.1声道的六个声道信号(L、R、C、LS、RS、LFE),以及多声道处理指示符(MultiProcFlag),输出包括筛选后的五个声道信号(L、R、C、LS、RS),以及模式选择边信息,该模式选择边信息包括能量均衡模式(Pair能量均衡模式或者整体能量均衡模式)和组对方式(MCT组对或者MCAC组对),以及与组对方式对应的相关值边信息(全局相关值边信息或者MCT相关值边信息)。
多声道融合处理模块包括多声道编码工具(multi-channel coding tool,MCT)单元和多声道自适应组对(multi-channel adaptive coupling,MCAC)单元,根据模式选择边信息可以确定能量均衡模式,以及由这两个模块中的哪一个模块对五个声道信号(L、R、C、LS、RS)进行能量均衡处理和立体声处理,输出包括处理声道信号(P1-P4、C),以及多声道边信息,该多声道边信息包括声道对集合。
声道编码模块使用单声道编码单元(或者单声道声道盒、单声道工具)对多声道融合处理模块输出的处理声道信号(P1-P4、C)进行编码,输出对应的编码声道信号(E1-E5)。单声道编码单元对声道信号编码过程中,对具有较高能量(或较大振幅)的声道信号分配较多的比特数,对具有较低能量(或较小振幅)的声道信号分配较少的比特数。可选的,声道编码模块也可以采用立体声编码单元,例如参数立体声编码器或损耗立体声编码器对多声道处理模块输出的处理声道信号进行编码。
需要说明的是,未组对的声道信号(例如C)可以直接输入声道编码模块得到编码声道信号E5。
码流复用接口产生编码多声道信号,该编码多声道信号包括声道编码模块输出的编码声道信号(E1-E5)和边信息(包括模式选择边信息和多声道边信息)。可选的,码流复用接口可以将编码多声道信号处理成串行信号或串行比特流。
图5a是模式选择模块的一个示例性的结构图,如图5a所示,模式选择模块包括:多声道筛选单元、全局相关值统计单元、MCT相关值统计单元和多声道模式选择单元。
多声道筛选单元根据多声道处理指示符(MultiProcFlag)从六个声道信号(L、R、C、LS、RS、LFE)中筛选出参与多声道处理的五个声道信号,即L、R、C、LS、RS。
全局相关值统计单元先计算参与多声道处理的声道信号,即L、R、C、LS、RS中任意两个声道信号之间归一化的相关值。本申请可以采用以下公式计算两个声道信号(例如声道信号ch1和声道信号ch2)之间的相关值:
$$\mathrm{corr}(ch1,ch2)=\frac{\left|\sum_{i=0}^{N-1}\mathrm{spec\_ch1}(i)\cdot \mathrm{spec\_ch2}(i)\right|}{\sqrt{\sum_{i=0}^{N-1}\mathrm{spec\_ch1}(i)^{2}\cdot \sum_{i=0}^{N-1}\mathrm{spec\_ch2}(i)^{2}}}$$
其中,corr(ch1,ch2)表示声道信号ch1和声道信号ch2之间归一化的相关值,spec_ch1(i)表示声道信号ch1的第i个频点的频域系数,spec_ch2(i)表示声道信号ch2的第i个频点的频域系数,N表示一个音频帧的总频点数。然后根据任意两个声道信号之间归一化的相关值确定参与多声道处理的声道信号对应的所有声道对集合中,相关值之和(即声道对集合中包含的所有声道对的相关值之和)最大者、以及该最大者对应的声道对集合(视为目标声道对集合)。最后输出全局相关值边信息,该全局相关值边信息包括最大相关值之和corr_sum_max和目标声道对集合。假设目标声道对集合包括(L,R)和(LS,RS),最大相关值之和corr_sum_max=corr(L,R)+corr(LS,RS)。
MCT相关值统计单元先计算参与多声道处理的五个声道信号,即L、R、C、LS、RS中任意两个声道信号之间归一化的相关值。同样可以采用上述公式计算两个声道信号(例如声道信号ch1和声道信号ch2)之间的相关值。然后在第一迭代处理中选择最高相关值对应的声道对(例如L和R)加入目标声道对集合,在第二迭代处理中删除包含L和/或R的声道对的相关值,在剩余的相关值中选择最大相关值对应的声道对(例如LS和RS)加入目标声道对集合,以此类推,直到相关值清空。最后输出MCT相关值边信息,该MCT相关值边信息包括目标声道对集合和目标声道对集合对应的相关值之和corr_sum_curr。假设目标声道对集合包括(L,R)和(LS,RS),相关值之和corr_sum_curr=corr(L,R)+corr(LS,RS)。
需要说明的是,全局相关值统计单元和MCT相关值统计单元在得到任意两个声道信号之间归一化的相关值之后,可以根据设定的组对阈值对相关值进行筛选,即,大于或等于组对阈值的相关值保留,而小于组对阈值的相关值删除,或者将其值设置为0。这样可以减少计算量。
图5b是多声道模式选择单元的一个示例性的结构图,如图5b所示,多声道模式选择单元包括模块选择单元和能量均衡选择单元。
模块选择单元根据全局相关值边信息和MCT相关值边信息确定组对方式:当corr_sum_max>corr_sum_curr时,组对方式为全局相关值统计单元采用的多声道自适应组对(multi-channel adaptive coupling,MCAC);当corr_sum_max=corr_sum_curr时,组对方式为MCT相关值统计单元采用的MCT组对。
进一步的,对于组对方式为MCT组对的情况,模块选择单元还根据能量均衡选择单元提供的多个声道信号的波动区间值确定目标组对方式,例如当五个声道信号(L、R、C、LS、RS)的能量平整度小于第一阈值时,目标组对方式为MCAC组对;当五个声道信号(L、R、C、LS、RS)的能量平整度大于或等于第一阈值时,目标组对方式为MCT组对。
需要说明的是,初次确定目标组对方式为MCT组对时,可以根据能量均衡选择单元提供的多个声道信号的波动区间值一次性确定出五个声道信号的能量均衡模式和最终的目标组对方式。例如,当五个声道信号(L、R、C、LS、RS)的能量平整度小于第一阈值时,目标组对方式为MCAC组对,能量均衡模式为第一能量均衡模式;当五个声道信号(L、R、C、LS、RS)的能量平整度大于或等于第一阈值时,组对方式为MCT组对,能量均衡模式为第二能量均衡模式。
能量均衡选择单元先计算各个声道信号的能量或幅度值,本申请可以采用以下公式计算声道信号(ch)的能量或幅度值:
$$\mathrm{energy}(ch)=\sum_{i=0}^{N-1}\mathrm{spec\_coeff}(ch,i)^{2}$$
其中,energy(ch)表示声道信号ch的能量或幅度值,spec_coeff(ch,i)表示声道信号ch的第i个频点的频域系数,N表示一个音频帧的总频点数。
然后计算各个声道信号的归一化的能量或幅度值,本申请可以采用以下公式计算声道信号(ch)的归一化的能量或幅度值:
$$\mathrm{energy\_uniform}(ch)=\frac{\mathrm{energy}(ch)}{\mathrm{energy\_max}}$$
其中,energy_uniform(ch)表示声道信号ch的归一化的能量或幅度值,energy_max表示五个声道信号的能量或幅度值(即energy(L)、energy(R)、energy(C)、energy(LS)和energy(RS))中的最大者。若energy_max=0,则energy_uniform(ch)均为0。
接下来计算五个声道信号的波动区间值,可选的,波动区间值可以是指能量平整度,本申请可以采用以下公式计算五个声道信号的能量平整度:
$$\mathrm{efm}=\frac{\left(\prod_{ch=0}^{4}\mathrm{energy\_uniform}(ch)\right)^{1/5}}{\frac{1}{5}\sum_{ch=0}^{4}\mathrm{energy\_uniform}(ch)}$$
其中,efm表示五个声道信号的能量平整度,L、R、C、LS、RS的声道索引参见表1。
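上述能量平整度(即归一化能量的几何平均与算术平均之比)可以用如下草图示意(示例性实现,输入假设为各声道已归一化的能量值):

```python
# 示意性草图:计算多个声道归一化能量的平整度,取值范围为 [0, 1]。
def energy_flatness(energies):
    n = len(energies)
    arith = sum(energies) / n        # 算术平均
    if arith == 0:
        return 0.0
    geo = 1.0
    for e in energies:
        geo *= e
    geo = geo ** (1.0 / n)           # 几何平均
    return geo / arith
```

各声道能量全部相同时结果为1,任一声道能量为0时结果为0,与上文所述性质一致。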
可选的,波动区间值也可以是指能量偏离度,基于上述计算得到的归一化的能量或幅度值energy_uniform(ch),本申请可以采用以下公式计算五个声道信号的平均能量或幅度值:
$$\mathrm{avg\_energy\_uniform}=\frac{1}{5}\sum_{ch=0}^{4}\mathrm{energy\_uniform}(ch)$$
其中,avg_energy_uniform表示五个声道信号的平均能量或幅度值,L、R、C、LS、RS的声道索引参见表1。
采用以下公式计算声道信号(ch)的能量偏离度:
$$\mathrm{deviation}(ch)=\frac{\mathrm{energy\_uniform}(ch)}{\mathrm{avg\_energy\_uniform}}$$
其中,deviation(ch)表示声道信号ch的能量偏离度。将L、R、C、LS、RS的能量偏离度中的最大者确定为五个声道信号的能量偏离度deviation。
可选的,波动区间值还可以是指幅度值或幅度偏离度,其原理和上述能量相关的值类似,此处不再赘述。
如上所述,本申请的能量均衡模式包括两种实现方式,其中,Pair能量均衡模式是针对模块选择单元确定的组对方式对应的目标声道对集合中的各个声道对,使用一个声道对中两个声道信号获取该一个声道对对应的两个均衡声道信号。整体能量均衡模式是使用一个声道对中两个声道信号以及一个声道对外至少一个声道信号来获取该一个声道对对应的两个均衡声道信号。而对于没有组对的声道信号,其对应的均衡声道信号即为该声道信号本身。
能量均衡选择单元根据波动区间值确定能量均衡模式,包括以下两种判断方式:
(1)当efm小于第一阈值时,能量均衡模式为Pair能量均衡模式;当efm大于或等于第一阈值时,能量均衡模式为整体能量均衡模式。
(2)当deviation在值区间[threshold,1/threshold]之内时,能量均衡模式为整体能量均衡模式;当deviation不在值区间[threshold,1/threshold]之内时,能量均衡模式为Pair能量均衡模式。threshold的取值范围可以是(0,1)。
deviation可以表示当前帧的各声道的频域幅度相对于当前帧的各声道的频域幅度的平均值的比值,即幅度偏离度。当当前帧的当前声道的频域幅度和当前帧的各声道的频域幅度的平均值之间的比例关系小于5(对应于threshold=0.2)时,可以分为两种情况:一、当前声道的频域幅度小于或等于当前帧的各声道的频域幅度的平均值,满足条件的“当前声道的频域幅度/当前帧的各声道的频域幅度的平均值”在(0.2,1]之间,也就是在(threshold,1]之间;二,当前声道的频域幅度大于当前帧的各声道的频域幅度的平均值,满足条件的“当前声道的频域幅度/当前帧的各声道的频域幅度的平均值”在(1,5)之间;综合以上两种情况,当当前声道的频域幅度和当前帧的各声道的频域幅度的平均值的比例关系小于5时,满足条件的“当前声道的频域幅度/当前帧的各声道的频域幅度的平均值”的范围在(0.2,5)之间,也就是在(threshold,1/threshold)之间,(threshold,1/threshold)即为上述的第二预设范围。其中,threshold的取值可以在(0,1)之间,threshold的值越小,表示当前声道的频域幅度相对于当前帧的各声道的频域幅度的平均值的波动越大,threshold的值越大,表示当前声道的频域幅度相对于当前帧的各声道的频域幅度的平均值的波动越小。其中,threshold的取值可以是0.2,0.15,0.125,0.11,或0.1等等。
deviation也可以表示的是各声道的频域能量相对于各声道的频域能量的平均值的比值,即能量偏离度。当当前帧的当前声道的频域能量和各声道的频域能量的平均值的比例关系小于25(threshold=0.04)时,可以分为两种情况:一、当前声道的频域能量小于或等于当前帧的各声道的频域能量的平均值,满足条件的“当前声道的频域能量/当前帧的各声道的频域能量的平均值”在(0.04,1]之间,也就是(threshold,1];二、当前声道的频域能量大于当前帧的各声道的频域能量的平均值,满足条件的“当前声道的频域能量/当前帧的各声道的频域能量的平均值”在(1,25)之间;综合以上两种情况,当当前声道的频域能量和当前帧的各声道的频域能量的平均值的比例关系小于25时,满足条件的“当前声道的频域能量/当前帧的各声道的频域能量的平均值”的范围在(0.04,25)之间,也就是在(threshold,1/threshold)之间,(threshold,1/threshold)即为上述的第一预设范围。其中,threshold的取值可以在(0,1)之间,threshold的值越小,表示当前声道的频域能量相对于当前帧的各声道的频域能量的平均值的波动越大,threshold的值越大,表示当前声道的频域能量相对于当前帧的各声道的频域能量的平均值的波动越小。threshold的取值可以是0.04,0.0225,0.015625,0.0121,或0.01等等。
由于幅度和能量之间存在平方关系,因此幅度偏离度和能量偏离度也存在平方的关系,即幅度偏离度的平方对应的声道间的帧幅度的波动性近似等同于能量偏离度对应的声道间帧能量的波动性。
在另一种实施方式中,上述的第一预设范围也可以扩展成(0,1/threshold),此时Pair能量均衡的区间范围为[1/threshold,+∞),此时表明当当前声道的频域能量大于当前帧的各声道的频域能量的平均值,并且“当前声道的频域能量/当前帧的各声道的频域能量的平均值”大于1/threshold时,才进行Pair能量均衡。
在另一种实施方式中,上述的第二预设范围也可以扩展成(0,1/threshold),此时Pair幅度均衡的区间范围为[1/threshold,+∞),此时表明当当前声道的频域幅度大于当前帧的各声道的频域幅度的平均值,并且“当前声道的频域幅度/当前帧的各声道的频域幅度的平均值”大于1/threshold时,才进行Pair幅度均衡。
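基于偏离度的能量均衡模式判断可以用如下草图示意(重构实现,含假设:逐声道计算"归一化能量/平均归一化能量",只要有任一声道落在(threshold, 1/threshold)之外即认为波动较大、选用Pair能量均衡模式;具体判断粒度与阈值以标准文本为准):

```python
# 示意性草图:逐声道能量偏离度计算与能量均衡模式选择。
def channel_deviations(energies):
    m = max(energies)
    uniform = [e / m for e in energies] if m > 0 else [0.0] * len(energies)
    avg = sum(uniform) / len(uniform)
    if avg == 0:
        return [0.0] * len(energies)
    return [u / avg for u in uniform]

def select_energy_mode(energies, threshold=0.04):
    devs = channel_deviations(energies)
    in_range = all(threshold < d < 1.0 / threshold for d in devs)
    return "整体能量均衡模式" if in_range else "Pair能量均衡模式"
```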
需要说明的是,能量均衡选择单元可以根据五个声道信号计算归一化的能量或幅度值,进而得到能量平整度或能量偏离度,也可以只根据组对成功的声道信号计算归一化的能量或幅度值,进而得到能量平整度或能量偏离度,还可以根据五个声道信号中的部分声道信号计算归一化的能量或幅度值,进而得到能量平整度或能量偏离度。本申请对此不做具体限定。
多声道融合处理模块包括MCT单元和MCAC单元,其中,
MCT单元先采用整体能量均衡模式对五个声道信号(L、R、C、LS、RS)进行能量均衡处理得到Le、Re、Ce、LSe和RSe,然后根据MCT相关值边信息获取目标声道对集合,通过立体声盒对目标声道对集合中声道对的两个均衡声道信号(例如,(Le,Re)或者(LSe,RSe))进行立体声处理。
MCAC单元根据全局相关值边信息获取目标声道对集合(例如,(L,R)和(LS,RS)),再根据能量均衡模式,如果是Pair能量均衡模式,则对目标声道对集合中的声道对的两个声道信号(例如,(L,R)和(LS,RS))进行能量均衡处理得到(Le,Re)和(LSe,RSe),再通过立体声盒对均衡声道信号进行立体声处理;如果是整体能量均衡模式,则对五个声道信号进行能量均衡处理得到Le、Re、Ce、LSe、RSe,再根据目标声道对集合,通过立体声盒对声道对中的两个均衡声道信号(例如(Le,Re)或者(LSe,RSe))进行立体声处理。
立体声处理单元可以采用基于预测的或者基于Karhunen-Loeve变换(Karhunen-Loeve Transform,KLT)的处理,即输入的两个声道信号被旋转(例如经由2×2旋转矩阵)以最大化能量压缩,从而将信号能量集中于一个声道内。
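上述通过2×2旋转将信号能量集中于一个声道的处理,可以用如下草图示意(采用常见的协方差对角化方式求旋转角;变量与函数名为示例性假设,实际实现以标准为准):

```python
import math

# 示意性草图:基于KLT思想的2×2旋转,使旋转后两声道去相关、
# 且能量集中到第一个输出声道。
def klt_rotate(x, y):
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    sxy = sum(a * b for a, b in zip(x, y))
    # 使旋转后协方差矩阵对角化的旋转角
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    c, s = math.cos(theta), math.sin(theta)
    m = [c * a + s * b for a, b in zip(x, y)]   # 主声道(能量较大)
    r = [-s * a + c * b for a, b in zip(x, y)]  # 次声道(能量较小)
    return m, r, theta
```

旋转后两个输出声道的互相关近似为0,大部分能量集中在主声道中,便于后续按能量分配比特。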
立体声处理单元对输入的两个声道信号处理后,输出该两个声道信号对应的处理声道信号(P1-P4)以及多声道边信息,多声道边信息包括相关值之和和目标声道对集合。
图6是本申请提供的多声道音频的解码方法所应用的解码装置的一个示例性的结构图,该解码装置可以是音频译码系统10中的目的设备14的解码器30,也可以是音频译码设备200中的译码模块270。该解码装置可以包括码流解复用接口、声道解码模块和多 声道处理模块,其中,
码流解复用接口接收来自编码装置的编码多声道信号(例如串行比特流bitstream),解复用后得到编码声道信号(E)和多声道参数(SIDE_PAIR)。例如,E1、E2、E3、E4、…、Ei-1、Ei,以及SIDE_PAIR1,SIDE_PAIR2,…,SIDE_PAIRm。
声道解码模块使用单声道解码单元(或者单声道声道盒、单声道工具)对码流解复用接口输出的编码声道信号进行解码,输出解码声道信号(D)。例如,E1、E2、E3、E4、…、Ei-1、Ei分别通过一个单声道解码单元进行解码,得到D1、D2、D3、D4、…、Di-1、Di。
多声道处理模块包括多个立体声处理单元,立体声处理单元可以采用基于预测的或者基于KLT的处理,即输入的两个声道信号被反旋转(例如经由2×2旋转矩阵),从而将信号变换到原始信号方向。
声道解码模块输出的解码声道信号藉由多声道参数可以识别哪两个解码声道信号组对,将组对的解码声道信号输入立体声处理单元,立体声处理单元对输入的两个解码声道信号处理后,输出该两个解码声道信号对应的声道信号(CH)。例如,立体声处理单元1根据SIDE_PAIR1对D1和D2处理,得到CH1和CH2,立体声处理单元2根据SIDE_PAIR2对D3和D4处理,得到CH3和CH4,…,立体声处理单元m根据SIDE_PAIRm对Di-1和Di处理,得到CHi-1和CHi。
需要说明的是,针对未组对的声道信号(例如CHj)不需要经过多声道处理模块中的立体声处理单元处理,可以解码后直接输出。
图7为本申请编码装置实施例的结构示意图,如图7所示,该装置可以应用于上述实施例中的源设备12或音频译码设备200。本实施例的编码装置可以包括:获取模块601、编码模块602和确定模块603。其中,
获取模块601,用于获取待编码的第一音频帧,所述第一音频帧包括至少五个声道信号;根据第一组对方式对所述至少五个声道信号进行组对以获得第一声道对集合,所述第一声道对集合包括至少一个声道对,一个声道对包括所述至少五个声道信号中的两个声道信号;获取所述第一声道对集合的第一相关值之和,所述一个声道对具有一个相关值,所述相关值用于表示所述一个声道对的两个声道信号之间的相关性;根据第二组对方式对所述至少五个声道信号进行组对以获得第二声道对集合;获取所述第二声道对集合的第二相关值之和;确定模块603,用于根据所述第一相关值之和和所述第二相关值之和确定所述至少五个声道信号的目标组对方式;编码模块602,用于根据所述目标组对方式对所述至少五个声道信号进行编码,所述目标组对方式为所述第一组对方式或者所述第二组对方式。
在一种可能的实现方式中,所述确定模块603,具体用于当所述第一相关值之和大于所述第二相关值之和时,确定所述目标组对方式为所述第一组对方式;当所述第一相关值之和等于所述第二相关值之和时,确定所述目标组对方式为所述第二组对方式。
在一种可能的实现方式中,所述确定模块603,还用于获取所述至少五个声道信号的波动区间值;当所述目标组对方式为所述第一组对方式时,根据所述至少五个声道信号的波动区间值确定能量均衡模式;当所述目标组对方式为所述第二组对方式时,根据所述至少五个声道信号的波动区间值确定能量均衡模式以及再次确定所述至少五个声道信号的目标组对方式;相应的,所述编码模块602,还用于根据所述能量均衡模式分别对所述至少五个声道信号进行能量均衡处理以得到至少五个均衡声道信号;根据所述目标组对方式对所述至少五个均衡声道信号进行编码;所述能量均衡模式为第一能量均衡模式或者第二能量均衡模式。
在一种可能的实现方式中,所述确定模块603,具体用于当所述波动区间值符合预设条件时,确定所述能量均衡模式为所述第一能量均衡模式;或者,当所述波动区间值不符合预设条件时,确定所述能量均衡模式为所述第二能量均衡模式。
在一种可能的实现方式中,所述确定模块603,具体用于当所述波动区间值符合预设条件时,确定所述目标组对方式为所述第一组对方式,所述能量均衡模式为所述第一能量均衡模式;或者,当所述波动区间值不符合预设条件时,确定所述目标组对方式为所述第二组对方式,所述能量均衡模式为所述第二能量均衡模式。
在一种可能的实现方式中,所述确定模块603,还用于判断与所述第一音频帧对应的编码码率是否大于码率阈值;当所述编码码率大于所述码率阈值时,确定所述能量均衡模式为所述第二能量均衡模式;当所述编码码率小于或等于所述码率阈值时,才根据所述波动区间值确定所述能量均衡模式。
在一种可能的实现方式中,所述波动区间值包括所述第一音频帧的能量平整度;所述波动区间值符合预设条件是指所述能量平整度小于第一阈值;或者,所述波动区间值包括所述第一音频帧的幅度平整度;所述波动区间值符合预设条件是指所述幅度平整度小于第二阈值;或者,所述波动区间值包括所述第一音频帧的能量偏离度;所述波动区间值符合预设条件是指所述能量偏离度不在第一预设范围内;或者,所述波动区间值包括所述第一音频帧的幅度偏离度;所述波动区间值符合预设条件是指所述幅度偏离度不在第二预设范围内。
在一种可能的实现方式中,所述获取模块601,具体用于以获取最大相关值之和为目的,从所述至少五个声道信号对应的声道对中选取声道对加入所述第一声道对集合。
在一种可能的实现方式中,所述获取模块601,具体用于先将所述至少五个声道信号对应的声道对中相关值最大的声道对加入所述第二声道对集合;然后将所述至少五个声道信号对应的声道对中除关联声道对外的其他声道对中相关值最大的声道对加入所述第二声道对集合,所述关联声道对包括已加入所述第一声道对集合的声道对所包括的声道信号中的任意一个。
在一种可能的实现方式中,当所述能量均衡模式为所述第一能量均衡模式时,所述编码模块602,具体用于针对所述组对方式对应的目标声道对集合中的当前声道对,计算所述当前声道对包含的两个声道信号的能量或幅度值的平均值,根据所述平均值分别对所述两个声道信号进行能量均衡处理以得到对应的两个均衡声道信号。
在一种可能的实现方式中,当所述能量均衡模式为所述第二能量均衡模式时,所述编码模块602,具体用于计算所述至少五个声道信号的能量或幅度值的平均值,根据所述平均值分别对所述至少五个声道信号进行能量均衡处理得到所述至少五个均衡声道信号。
本实施例的装置,可以用于执行图3所示方法实施例的技术方案,其实现原理和技术效果类似,此处不再赘述。
图8为本申请设备实施例的结构示意图,如图8所示,该设备可以是上述实施例中的编码设备。本实施例的设备可以包括:处理器701和存储器702。存储器702用于存储一个或多个程序;当所述一个或多个程序被所述处理器701执行时,使得所述处理器701实现如图3所示方法实施例的技术方案。
在实现过程中,上述方法实施例的各步骤可以通过处理器中的硬件的集成逻辑电路或者软件形式的指令完成。处理器可以是通用处理器、数字信号处理器(digital signal processor,DSP)、特定应用集成电路(application-specific integrated circuit,ASIC)、现场可编程门阵列(field programmable gate array,FPGA)或其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件。通用处理器可以是微处理器或者该处理器也可以是任何常规的处理器等。本申请公开的方法的步骤可以直接体现为硬件编码处理器执行完成,或者用编码处理器中的硬件及软件模块组合执行完成。软件模块可以位于随机存储器,闪存、只读存储器,可编程只读存储器或者电可擦写可编程存储器、寄存器等本领域成熟的存储介质中。该存储介质位于存储器,处理器读取存储器中的信息,结合其硬件完成上述方法的步骤。
上述各实施例中提及的存储器可以是易失性存储器或非易失性存储器,或可包括易失性和非易失性存储器两者。其中,非易失性存储器可以是只读存储器(read-only memory,ROM)、可编程只读存储器(programmable ROM,PROM)、可擦除可编程只读存储器(erasable PROM,EPROM)、电可擦除可编程只读存储器(electrically EPROM,EEPROM)或闪存。易失性存储器可以是随机存取存储器(random access memory,RAM),其用作外部高速缓存。通过示例性但不是限制性说明,许多形式的RAM可用,例如静态随机存取存储器(static RAM,SRAM)、动态随机存取存储器(dynamic RAM,DRAM)、同步动态随机存取存储器(synchronous DRAM,SDRAM)、双倍数据速率同步动态随机存取存储器(double data rate SDRAM,DDR SDRAM)、增强型同步动态随机存取存储器(enhanced SDRAM,ESDRAM)、同步连接动态随机存取存储器(synchlink DRAM,SLDRAM)和直接内存总线随机存取存储器(direct rambus RAM,DR RAM)。应注意,本文描述的系统和方法的存储器旨在包括但不限于这些和任意其它适合类型的存储器。
本领域普通技术人员可以意识到,结合本文中所公开的实施例描述的各示例的单元及算法步骤,能够以电子硬件、或者计算机软件和电子硬件的结合来实现。这些功能究竟以硬件还是软件方式来执行,取决于技术方案的特定应用和设计约束条件。专业技术人员可以对每个特定的应用来使用不同方法来实现所描述的功能,但是这种实现不应认为超出本申请的范围。
所属领域的技术人员可以清楚地了解到,为描述的方便和简洁,上述描述的系统、装置和单元的具体工作过程,可以参考前述方法实施例中的对应过程,在此不再赘述。
在本申请所提供的几个实施例中,应该理解到,所揭露的系统、装置和方法,可以通过其它的方式实现。例如,以上所描述的装置实施例仅仅是示意性的,例如,所述单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个单元或组件可以结合或者可以集成到另一个系统,或一些特征可以忽略,或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口,装置或单元的间接耦合或通信连接,可以是电性,机械或其它的形式。
所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。
另外,在本申请各个实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。
所述功能如果以软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在一个计算机可读取存储介质中。基于这样的理解,本申请的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质中,包括若干指令用以使得一台计算机设备(个人计算机,服务器,或者网络设备等)执行本申请各个实施例所述方法的全部或部分步骤。而前述的存储介质包括:U盘、移动硬盘、只读存储器(read-only memory,ROM)、随机存取存储器(random access memory,RAM)、磁碟或者光盘等各种可以存储程序代码的介质。
以上所述,仅为本申请的具体实施方式,但本申请的保护范围并不局限于此,任何熟悉本技术领域的技术人员在本申请揭露的技术范围内,可轻易想到变化或替换,都应涵盖在本申请的保护范围之内。因此,本申请的保护范围应以所述权利要求的保护范围为准。

Claims (26)

  1. 一种多声道音频信号的编码方法,其特征在于,包括:
    获取待编码的第一音频帧,所述第一音频帧包括至少五个声道信号;
    根据第一组对方式对所述至少五个声道信号进行组对以获得第一声道对集合,所述第一声道对集合包括至少一个声道对,一个声道对包括所述至少五个声道信号中的两个声道信号;
    获取所述第一声道对集合的第一相关值之和,所述一个声道对具有一个相关值,所述相关值用于表示所述一个声道对的两个声道信号之间的相关性;
    根据第二组对方式对所述至少五个声道信号进行组对以获得第二声道对集合;
    获取所述第二声道对集合的第二相关值之和;
    根据所述第一相关值之和和所述第二相关值之和确定所述至少五个声道信号的目标组对方式;
    根据所述目标组对方式对所述至少五个声道信号进行编码,所述目标组对方式为所述第一组对方式或者所述第二组对方式。
  2. 根据权利要求1所述的方法,其特征在于,所述根据所述第一相关值之和和所述第二相关值之和确定所述至少五个声道信号的目标组对方式,包括:
    当所述第一相关值之和大于所述第二相关值之和时,确定所述目标组对方式为所述第一组对方式;
    当所述第一相关值之和等于所述第二相关值之和时,确定所述目标组对方式为所述第二组对方式。
  3. 根据权利要求1或2所述的方法,其特征在于,所述根据所述目标组对方式对所述至少五个声道信号进行编码之前,还包括:
    获取所述至少五个声道信号的波动区间值;
    当所述目标组对方式为所述第一组对方式时,根据所述至少五个声道信号的波动区间值确定能量均衡模式;
    当所述目标组对方式为所述第二组对方式时,根据所述至少五个声道信号的波动区间值确定能量均衡模式以及再次确定所述至少五个声道信号的目标组对方式;
    根据所述能量均衡模式分别对所述至少五个声道信号进行能量均衡处理以得到至少五个均衡声道信号;
    相应的,所述根据所述目标组对方式对所述至少五个声道信号进行编码,包括:
    根据所述目标组对方式对所述至少五个均衡声道信号进行编码。
  4. 根据权利要求3所述的方法,其特征在于,所述根据所述至少五个声道信号的波动区间值确定能量均衡模式,包括:
    当所述波动区间值符合预设条件时,确定所述能量均衡模式为所述第一能量均衡模式;或者,
    当所述波动区间值不符合预设条件时,确定所述能量均衡模式为所述第二能量均衡模式。
  5. 根据权利要求3或4所述的方法,其特征在于,所述根据所述至少五个声道信号的波动区间值确定能量均衡模式以及再次确定所述至少五个声道信号的目标组对方式,包括:
    当所述波动区间值符合预设条件时,确定所述目标组对方式为所述第一组对方式,所述能量均衡模式为所述第一能量均衡模式;或者,
    当所述波动区间值不符合预设条件时,确定所述目标组对方式为所述第二组对方式,所述能量均衡模式为所述第二能量均衡模式。
  6. 根据权利要求3-5中任一项所述的方法,其特征在于,所述根据所述至少五个声道信号的波动区间值确定能量均衡模式之前,还包括:
    判断与所述第一音频帧对应的编码码率是否大于码率阈值;
    当所述编码码率大于所述码率阈值时,确定所述能量均衡模式为所述第二能量均衡模式;
    当所述编码码率小于或等于所述码率阈值时,才根据所述波动区间值确定所述能量均衡模式。
  7. 根据权利要求4-6中任一项所述的方法,其特征在于,所述波动区间值包括所述第一音频帧的能量平整度;所述波动区间值符合预设条件是指所述能量平整度小于第一阈值;或者,
    所述波动区间值包括所述第一音频帧的幅度平整度;所述波动区间值符合预设条件是指所述幅度平整度小于第二阈值;或者,
    所述波动区间值包括所述第一音频帧的能量偏离度;所述波动区间值符合预设条件是指所述能量偏离度不在第一预设范围内;或者,
    所述波动区间值包括所述第一音频帧的幅度偏离度;所述波动区间值符合预设条件是指所述幅度偏离度不在第二预设范围内。
  8. 根据权利要求1-7中任一项所述的方法,其特征在于,所述根据第一组对方式对所述至少五个声道信号进行组对以获得第一声道对集合,包括:
    以获取最大相关值之和为目的,从所述至少五个声道信号对应的声道对中选取声道对加入所述第一声道对集合。
  9. 根据权利要求1-8中任一项所述的方法,其特征在于,所述根据第二组对方式对所述至少五个声道信号进行组对以获得第二声道对集合包括:
    先将所述至少五个声道信号对应的声道对中相关值最大的声道对加入所述第二声道对集合;
    然后将所述至少五个声道信号对应的声道对中除关联声道对外的其他声道对中相关值最大的声道对加入所述第二声道对集合,所述关联声道对包括已加入所述第一声道对集合的声道对所包括的声道信号中的任意一个。
  10. 根据权利要求3-7中任一项所述的方法,其特征在于,当所述能量均衡模式为所述第一能量均衡模式时,所述根据所述能量均衡模式分别对所述至少五个声道信号进行能量均衡处理以得到至少五个均衡声道信号,包括:
    针对所述组对方式对应的目标声道对集合中的当前声道对,计算所述当前声道对包含的两个声道信号的能量或幅度值的平均值,根据所述平均值分别对所述两个声道信号进行能量均衡处理以得到对应的两个均衡声道信号。
  11. 根据权利要求3-7中任一项所述的方法,其特征在于,当所述能量均衡模式为所述第二能量均衡模式时,所述根据所述能量均衡模式分别对所述至少五个声道信号进行能量均衡处理以得到至少五个均衡声道信号,包括:
    计算所述至少五个声道信号的能量或幅度值的平均值,根据所述平均值分别对所述至少五个声道信号进行能量均衡处理得到所述至少五个均衡声道信号。
  12. 一种编码装置,其特征在于,包括:
    获取模块,用于获取待编码的第一音频帧,所述第一音频帧包括至少五个声道信号;根据第一组对方式对所述至少五个声道信号进行组对以获得第一声道对集合,所述第一声道对集合包括至少一个声道对,一个声道对包括所述至少五个声道信号中的两个声道信号;获取所述第一声道对集合的第一相关值之和,所述一个声道对具有一个相关值,所述相关值用于表示所述一个声道对的两个声道信号之间的相关性;根据第二组对方式对所述至少五个声道信号进行组对以获得第二声道对集合;获取所述第二声道对集合的第二相关值之和;
    确定模块,用于根据所述第一相关值之和和所述第二相关值之和确定所述至少五个声道信号的目标组对方式;
    编码模块,用于根据所述目标组对方式对所述至少五个声道信号进行编码,所述目标组对方式为所述第一组对方式或者所述第二组对方式。
  13. 根据权利要求12所述的装置,其特征在于,所述确定模块,具体用于当所述第一相关值之和大于所述第二相关值之和时,确定所述目标组对方式为所述第一组对方式;当所述第一相关值之和等于所述第二相关值之和时,确定所述目标组对方式为所述第二组对方式。
  14. 根据权利要求12或13所述的装置,其特征在于,所述确定模块,还用于获取所述至少五个声道信号的波动区间值;当所述目标组对方式为所述第一组对方式时,根据所述至少五个声道信号的波动区间值确定能量均衡模式;当所述目标组对方式为所述第二组对方式时,根据所述至少五个声道信号的波动区间值确定能量均衡模式以及再次确定所述至少五个声道信号的目标组对方式;
    相应的,所述编码模块,还用于根据所述能量均衡模式分别对所述至少五个声道信号进行能量均衡处理以得到至少五个均衡声道信号;根据所述目标组对方式对所述至少五个均衡声道信号进行编码。
  15. 根据权利要求14所述的装置,其特征在于,所述确定模块,具体用于当所述波动区间值符合预设条件时,确定所述能量均衡模式为所述第一能量均衡模式;或者,当所述波动区间值不符合预设条件时,确定所述能量均衡模式为所述第二能量均衡模式。
  16. 根据权利要求14或15所述的装置,其特征在于,所述确定模块,具体用于当所述波动区间值符合预设条件时,确定所述目标组对方式为所述第一组对方式,所述能量均衡模式为所述第一能量均衡模式;或者,当所述波动区间值不符合预设条件时,确定所述目标组对方式为所述第二组对方式,所述能量均衡模式为所述第二能量均衡模式。
  17. 根据权利要求14-16中任一项所述的装置,其特征在于,所述确定模块,还用于判断与所述第一音频帧对应的编码码率是否大于码率阈值;当所述编码码率大于所述码率阈值时,确定所述能量均衡模式为所述第二能量均衡模式;当所述编码码率小于或等于所述码率阈值时,才根据所述波动区间值确定所述能量均衡模式。
  18. 根据权利要求15-17中任一项所述的装置,其特征在于,所述波动区间值包括所述第一音频帧的能量平整度;所述波动区间值符合预设条件是指所述能量平整度小于第一阈值;或者,
    所述波动区间值包括所述第一音频帧的幅度平整度;所述波动区间值符合预设条件是指所述幅度平整度小于第二阈值;或者,
    所述波动区间值包括所述第一音频帧的能量偏离度;所述波动区间值符合预设条件是指所述能量偏离度不在第一预设范围内;或者,
    所述波动区间值包括所述第一音频帧的幅度偏离度;所述波动区间值符合预设条件是指所述幅度偏离度不在第二预设范围内。
  19. 根据权利要求12-18中任一项所述的装置,其特征在于,所述获取模块,具体用于以获取最大相关值之和为目的,从所述至少五个声道信号对应的声道对中选取声道对加入所述第一声道对集合。
  20. 根据权利要求12-19中任一项所述的装置,其特征在于,所述获取模块,具体用于先将所述至少五个声道信号对应的声道对中相关值最大的声道对加入所述第二声道对集合;然后将所述至少五个声道信号对应的声道对中除关联声道对外的其他声道对中相关值最大的声道对加入所述第二声道对集合,所述关联声道对包括已加入所述第一声道对集合的声道对所包括的声道信号中的任意一个。
  21. 根据权利要求14-18中任一项所述的装置,其特征在于,当所述能量均衡模式为所述第一能量均衡模式时,所述编码模块,具体用于针对所述组对方式对应的目标声道对集合中的当前声道对,计算所述当前声道对包含的两个声道信号的能量或幅度值的平均值,根据所述平均值分别对所述两个声道信号进行能量均衡处理以得到对应的两个均衡声道信号。
  22. 根据权利要求14-18中任一项所述的装置,其特征在于,当所述能量均衡模式为所述第二能量均衡模式时,所述编码模块,具体用于计算所述至少五个声道信号的能量或幅度值的平均值,根据所述平均值分别对所述至少五个声道信号进行能量均衡处理得到所述至少五个均衡声道信号。
  23. 一种设备,其特征在于,包括:
    一个或多个处理器;
    存储器,用于存储一个或多个程序;
    当所述一个或多个程序被所述一个或多个处理器执行,使得所述一个或多个处理器实现如权利要求1-11中任一项所述的方法。
  24. 一种计算机可读存储介质,其特征在于,包括计算机程序,所述计算机程序在计算机上被执行时,使得所述计算机执行权利要求1-11中任一项所述的方法。
  25. 一种计算机可读存储介质,其特征在于,包括根据如权利要求1-11中任一项所述的多声道音频信号的编码方法获得的编码码流。
  26. 一种计算机程序,其特征在于,所述计算机程序在计算机上被执行时,使得所述计算机执行权利要求1-11中任一项所述的方法。
PCT/CN2021/106826 2020-07-17 2021-07-16 多声道音频信号的编码方法和装置 WO2022012675A1 (zh)

Priority Applications (6)

Application Number Priority Date Filing Date Title
EP21841790.5A EP4174852A4 (en) 2020-07-17 2021-07-16 CODING METHOD AND DEVICE FOR A MULTI-CHANNEL AUDIO SIGNAL
BR112023000667A BR112023000667A2 (pt) 2020-07-17 2021-07-16 Método e aparelho de codificação de sinais de áudio de canais múltiplos, dispositivo e meio de armazenamento legível por computador
KR1020237004414A KR20230035383A (ko) 2020-07-17 2021-07-16 멀티 채널 오디오 신호 코딩 방법 및 장치
JP2023503019A JP2023534049A (ja) 2020-07-17 2021-07-16 マルチチャネル音声信号コーディング方法及び装置
AU2021310236A AU2021310236A1 (en) 2020-07-17 2021-07-16 Multi-channel audio signal coding method and apparatus
US18/154,486 US20230186924A1 (en) 2020-07-17 2023-01-13 Multi-Channel Audio Signal Coding Method and Apparatus

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010728902.2A CN114023338A (zh) 2020-07-17 2020-07-17 多声道音频信号的编码方法和装置
CN202010728902.2 2020-07-17

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US18/154,486 Continuation US20230186924A1 (en) 2020-07-17 2023-01-13 Multi-Channel Audio Signal Coding Method and Apparatus

Publications (1)

Publication Number Publication Date
WO2022012675A1 true WO2022012675A1 (zh) 2022-01-20

Family

ID=79554491

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/106826 WO2022012675A1 (zh) 2020-07-17 2021-07-16 多声道音频信号的编码方法和装置

Country Status (8)

Country Link
US (1) US20230186924A1 (zh)
EP (1) EP4174852A4 (zh)
JP (1) JP2023534049A (zh)
KR (1) KR20230035383A (zh)
CN (1) CN114023338A (zh)
AU (1) AU2021310236A1 (zh)
BR (1) BR112023000667A2 (zh)
WO (1) WO2022012675A1 (zh)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1424713A (zh) * 2003-01-14 2003-06-18 北京阜国数字技术有限公司 高频耦合的伪小波5声道音频编/解码方法
US20040230423A1 (en) * 2003-05-16 2004-11-18 Divio, Inc. Multiple channel mode decisions and encoding
WO2008108077A1 (ja) * 2007-03-02 2008-09-12 Panasonic Corporation 符号化装置および符号化方法
CN101765880A (zh) * 2007-07-27 2010-06-30 松下电器产业株式会社 语音编码装置和语音编码方法
CN104240712A (zh) * 2014-09-30 2014-12-24 武汉大学深圳研究院 一种三维音频多声道分组聚类编码方法及系统
US20160078877A1 (en) * 2013-04-26 2016-03-17 Nokia Technologies Oy Audio signal encoder
CN106710600A (zh) * 2016-12-16 2017-05-24 广州广晟数码技术有限公司 多声道音频信号的去相关编码方法和装置
CN109389987A (zh) * 2017-08-10 2019-02-26 华为技术有限公司 音频编解码模式确定方法和相关产品

Family Cites Families (2)

Publication number Priority date Publication date Assignee Title
EP3208800A1 (en) * 2016-02-17 2017-08-23 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Apparatus and method for stereo filing in multichannel coding
RU2769788C1 (ru) * 2018-07-04 2022-04-06 Фраунхофер-Гезелльшафт Цур Фердерунг Дер Ангевандтен Форшунг Е.Ф. Кодер, многосигнальный декодер и соответствующие способы с использованием отбеливания сигналов или постобработки сигналов

Non-Patent Citations (1)

Title
See also references of EP4174852A4

Also Published As

Publication number Publication date
KR20230035383A (ko) 2023-03-13
US20230186924A1 (en) 2023-06-15
EP4174852A4 (en) 2024-01-03
BR112023000667A2 (pt) 2023-01-31
CN114023338A (zh) 2022-02-08
AU2021310236A1 (en) 2023-02-16
JP2023534049A (ja) 2023-08-07
EP4174852A1 (en) 2023-05-03

Similar Documents

Publication Publication Date Title
CN105432097B (zh) 伴有内容分析和加权的具有立体声房间脉冲响应的滤波
US9478225B2 (en) Systems, methods, apparatus, and computer-readable media for three-dimensional audio coding using basis function coefficients
US9609452B2 (en) Obtaining sparseness information for higher order ambisonic audio renderers
ES2635327T3 (es) Compresión de las representaciones descompuestas de un campo sonoro
US9009057B2 (en) Audio encoding and decoding to generate binaural virtual spatial signals
US20150341736A1 (en) Obtaining symmetry information for higher order ambisonic audio renderers
TW201642248A (zh) 編碼或解碼一多聲道訊號之裝置與方法
KR20230011480A (ko) 오디오 신호들의 파라메트릭 재구성
US9930465B2 (en) Parametric mixing of audio signals
US20170110140A1 (en) Coding higher-order ambisonic coefficients during multiple transitions
WO2022012675A1 (zh) 多声道音频信号的编码方法和装置
WO2022247651A1 (zh) 多声道音频信号的编码方法和装置
EP3149972B1 (en) Obtaining symmetry information for higher order ambisonic audio renderers
WO2022012553A1 (zh) 多声道音频信号的编解码方法和装置
WO2020008112A1 (en) Energy-ratio signalling and synthesis
US20210297777A1 (en) Optimized Audio Forwarding
JP7453997B2 (ja) DirACベースの空間オーディオ符号化のためのパケット損失隠蔽
TW202242852A (zh) 適應性增益控制
CN115497485A (zh) 三维音频信号编码方法、装置、编码器和系统
EP4085453A1 (en) Spatial audio parameter encoding and associated decoding
TWI773267B (zh) 一種線性預測編碼參數的編碼方法和編碼裝置
WO2023173941A1 (zh) 一种多声道信号的编解码方法和编解码设备以及终端设备
JP2024063226A (ja) DirACベースの空間オーディオ符号化のためのパケット損失隠蔽
RU2020130054A (ru) Представление пространственного звука посредством звукового сигнала и ассоциированных с ним метаданных
CN113948097A (zh) 多声道音频信号编码方法和装置

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 21841790; Country of ref document: EP; Kind code of ref document: A1)
ENP Entry into the national phase (Ref document number: 2023503019; Country of ref document: JP; Kind code of ref document: A)
REG Reference to national code (Ref country code: BR; Ref legal event code: B01A; Ref document number: 112023000667; Country of ref document: BR)
ENP Entry into the national phase (Ref document number: 112023000667; Country of ref document: BR; Kind code of ref document: A2; Effective date: 20230113)
ENP Entry into the national phase (Ref document number: 20237004414; Country of ref document: KR; Kind code of ref document: A)
ENP Entry into the national phase (Ref document number: 2021841790; Country of ref document: EP; Effective date: 20230125)
ENP Entry into the national phase (Ref document number: 2021310236; Country of ref document: AU; Date of ref document: 20210716; Kind code of ref document: A)
NENP Non-entry into the national phase (Ref country code: DE)