WO2023173941A1 - Encoding and decoding method for multi-channel signals, encoding and decoding device, and terminal device - Google Patents

Encoding and decoding method for multi-channel signals, encoding and decoding device, and terminal device

Info

Publication number
WO2023173941A1
Authority
WO
WIPO (PCT)
Prior art keywords
channel
mute
signal
enable flag
flag
Prior art date
Application number
PCT/CN2023/073845
Other languages
English (en)
French (fr)
Inventor
王智
王喆
李海婷
Original Assignee
华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from CN202210699863.7A (published as CN116798438A)
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Publication of WO2023173941A1

Links

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 - Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/008 - Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
    • G10L19/012 - Comfort noise or silence coding

Definitions

  • the present application relates to the field of audio coding and decoding, and in particular, to a coding and decoding method for multi-channel signals, a coding and decoding device, and a terminal device.
  • the compression of audio data is an indispensable link in media applications such as media communications and media broadcasting.
  • the compression of audio data can be achieved through multi-channel coding.
  • Multi-channel coding can be encoding a sound bed signal with multiple channels.
  • Multi-channel coding can also be coding multiple object audio signals.
  • Multi-channel encoding can also be encoding a mixed signal that contains both the acoustic bed signal and the object audio signal.
  • Sound bed signals, object signals, or mixed signals containing sound bed signals and object audio signals can all be input into the audio channel as multi-channel signals.
  • the characteristics of the input signals of different channels in a multi-channel signal are not exactly the same, and the characteristics of the same channel also change constantly over time.
  • the above-mentioned multi-channel signals are processed using a fixed coding scheme, for example, a unified bit allocation scheme is used for processing, and the multi-channel signals are quantized and coded according to the result of bit allocation.
  • although a unified bit allocation scheme has the advantage of being simple and easy to operate, it suffers from low coding efficiency and wasted coding bit resources.
  • Embodiments of the present application provide a multi-channel signal encoding and decoding method, encoding and decoding equipment, and terminal equipment to improve encoding efficiency and encoding bit resource utilization.
  • embodiments of the present application provide a method for encoding multi-channel signals, including:
  • obtaining mute mark information of the multi-channel signal, where the mute mark information includes: a mute enable flag and/or a mute flag; performing multi-channel encoding processing on the multi-channel signal to obtain the transmission channel signal of each transmission channel;
  • a code stream is generated according to the transmission channel signal of each transmission channel and the silence mark information, and the code stream includes: the silence mark information and the multi-channel quantization encoding result of the transmission channel signal of each transmission channel.
  • in the above solution, the mute mark information of the multi-channel signal is obtained, where the mute mark information includes: a mute enable flag and/or a mute flag; multi-channel encoding is performed on the multi-channel signal to obtain the transmission channel signal of each transmission channel; and a code stream is generated according to the transmission channel signal of each transmission channel and the mute mark information, where the code stream includes: the mute mark information and the multi-channel quantization encoding result of the transmission channel signal of each transmission channel.
  • the encoding end encodes the transmission channel signals of each transmission channel according to the mute mark information to generate the code stream, taking into account the mute status of the multi-channel signal, thus improving coding efficiency and coding bit resource utilization.
  • the multi-channel signal includes: acoustic bed signal and/or object signal;
  • the mute mark information includes: the mute enable flag; the mute enable flag includes: a global mute enable flag, or a partial mute enable flag, where,
  • the global mute enable flag is a mute enable flag acting on the multi-channel signal.
  • the partial mute enable flag is a mute enable flag that acts on some channels in the multi-channel signal.
  • when the mute enable flag is the partial mute enable flag, the partial mute enable flag is an object mute enable flag acting on the object signal, or an acoustic bed mute enable flag acting on the acoustic bed signal, or a mute enable flag acting on the channel signals other than the low frequency effects (LFE) channel signal in the multi-channel signal, or a mute enable flag acting on the channel signals participating in group pairing in the multi-channel signal.
  • the mute indication for the acoustic bed signal and/or the object signal can be carried out through the global mute enable flag or the partial mute enable flag, so that subsequent encoding processing, such as bit allocation, can be performed based on the global mute enable flag or the partial mute enable flag, which can improve coding efficiency.
  • the multi-channel signal includes: an acoustic bed signal and an object signal;
  • the mute mark information includes: the mute enable flag; the mute enable flag includes: an acoustic bed mute enable flag, and an object mute enable flag,
  • the mute enable flag occupies a first bit and a second bit; the first bit is used to carry the value of the acoustic bed mute enable flag, and the second bit is used to carry the value of the object mute enable flag.
  • the mute enable flag can use different bits as its specific implementation. For example, the first bit and the second bit are predefined; through the above different bits, it can be indicated that the mute enable flags are the sound bed mute enable flag and the object mute enable flag.
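  • As a hedged illustration of the two-bit layout just described, a minimal C sketch follows. The bit order and the helper names (pack_mute_enable, unpack_mute_enable) are assumptions made for illustration; the text only states that a first bit carries the acoustic bed mute enable flag and a second bit carries the object mute enable flag.

```c
#include <stdint.h>

/* Pack the two enable flags into the two predefined bits.
 * Assumption: bit 0 = acoustic bed mute enable flag (bedMuteEna),
 *             bit 1 = object mute enable flag (objMuteEna). */
static uint8_t pack_mute_enable(int bedMuteEna, int objMuteEna)
{
    return (uint8_t)((bedMuteEna & 0x1) | ((objMuteEna & 0x1) << 1));
}

/* Recover both flags from the packed two-bit field. */
static void unpack_mute_enable(uint8_t field, int *bedMuteEna, int *objMuteEna)
{
    *bedMuteEna = field & 0x1;
    *objMuteEna = (field >> 1) & 0x1;
}
```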
  • the mute mark information includes: the mute enable flag
  • the mute enable flag is used to indicate whether the mute mark detection function is turned on; or,
  • the mute enable flag is used to indicate whether the mute flag of each channel of the multi-channel signal needs to be sent.
  • the mute enable flag is used to indicate whether each channel of the multi-channel signal is a non-mute channel.
  • the mute enable flag is used to indicate whether the mute detection function is enabled. For example, when the mute enable flag is a first value (for example, 1), it indicates that the mute detection function is turned on and the mute flag of each channel of the multi-channel signal is further detected. When the mute enable flag is the second value (for example, 0), it means that the mute detection function is turned off.
  • the mute enable flag can also be used to indicate whether each channel of the multi-channel signal is a non-mute channel. For example, when the mute enable flag is a first value (for example, 1), it indicates that the mute flag of each channel needs to be further detected. When the mute enable flag is the second value (for example, 0), it indicates that each channel of the multi-channel signal is a non-mute channel.
  • obtaining the mute mark information of the multi-channel signal includes:
  • Silence mark detection is performed on each channel of the multi-channel signal to obtain the silence mark information.
  • control signaling can be input into the encoding device, and the silence mark information can be determined based on the control signaling.
  • the silence mark information can be controlled by external input. Alternatively, the encoding device can include encoding parameters (also called encoder parameters) that are used to determine the silence mark information; for example, the silence mark information can be preset based on encoder parameters such as encoding rate and encoding bandwidth. Alternatively, the silence mark information may also be determined based on the silence detection results of each channel. In the embodiment of the present application, there is no limitation on how the mute mark information is obtained.
  • the mute flag information includes: the mute enable flag and the mute flag;
  • the mute mark detection on each channel of the multi-channel signal to obtain the mute mark information includes:
  • mute mark detection is performed on each channel of the multi-channel signal to obtain the mute flag of each channel; and the mute enable flag is determined according to the mute flag of each channel.
  • the encoding end can first detect the mute flag of each channel, and the mute flag of each channel is used to indicate whether each channel is a mute frame. After the mute flag of each channel is determined, the mute enable flag is determined based on the mute flag of each channel. Based on the above method, the mute enable flag can be generated, so that the mute mark information can be generated.
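  • A minimal sketch of one plausible derivation rule, assuming the encoder enables mute signaling as soon as at least one channel is detected as mute; the text does not fix the exact rule, so the function below is illustrative only.

```c
/* Derive the mute enable flag (HasSilFlag) from the per-channel mute flags.
 * muteFlag[ch] == 1 marks channel ch as a mute frame in the current frame. */
static int derive_mute_enable(const int *muteFlag, int numChannels)
{
    for (int ch = 0; ch < numChannels; ch++) {
        if (muteFlag[ch] == 1) {
            return 1;  /* first value: per-channel mute flags will be sent */
        }
    }
    return 0;          /* second value: every channel is a non-mute channel */
}
```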
  • the mute mark information includes: the mute flag; or, the mute mark information includes: the mute enable flag and the mute flag;
  • the mute flag is used to indicate whether each channel on which the mute enable flag acts is a mute channel, and the mute channel is a channel that does not require encoding or a channel that needs to be encoded with a low number of bits.
  • when the value of the mute flag is a first value (for example, 1), it indicates that the channel on which the mute enable flag acts is a mute channel; when the value of the mute flag is a second value (for example, 0), it indicates that the channel is a non-mute channel.
  • when the value of the mute flag is the first value (for example, 1), the channel is not encoded or is encoded with a lower number of bits.
  • in a possible implementation, before obtaining the silence mark information of the multi-channel signal, the method further includes:
  • the multi-channel signal is pre-processed to obtain a pre-processed multi-channel signal.
  • the pre-processing includes at least one of the following: transient detection, window type decision, time-frequency transform, frequency-domain noise shaping, time-domain noise shaping, and frequency band extension coding;
  • the acquisition of mute mark information of multi-channel signals includes:
  • Silence mark detection is performed on the preprocessed multi-channel signal to obtain the silence mark information.
  • the method further includes:
  • the multi-channel signal is pre-processed to obtain a pre-processed multi-channel signal.
  • the pre-processing includes at least one of the following: transient detection, window type decision, time-frequency transform, frequency-domain noise shaping, time-domain noise shaping, and frequency band extension coding;
  • the silence mark information is modified according to the preprocessed multi-channel signal.
  • the silence mark information can also be corrected based on the preprocessing results. For example, after frequency domain noise shaping, the energy of a certain channel of the multi-channel signal changes; the silence mark detection result of that channel can then be adjusted, thereby correcting the silence mark information.
  • generating the code stream according to the transmission channel signal of each transmission channel and the mute mark information includes:
  • adjusting the initial multi-channel processing method according to the mute mark information, and encoding the multi-channel signal according to the adjusted multi-channel processing method to obtain the code stream.
  • the encoding end can adjust the initial multi-channel processing method according to the silence mark information, and then encode the multi-channel signal according to the adjusted multi-channel processing method, thereby improving coding efficiency. For example, during the screening process of multi-channel signals, channels whose mute flag is 1 do not participate in group-pair screening, as in the sketch below.
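  • The screening step can be sketched as follows; the function and array names are hypothetical, and excluding the LFE channel mirrors the group-pair description elsewhere in this application.

```c
/* Collect the channels eligible for group-pair screening: channels whose
 * mute flag is 1 are skipped, as are LFE channels. Returns the number of
 * candidate channels written into `candidates`. */
static int collect_pair_candidates(const int *muteFlag, const int *isLfe,
                                   int numChannels, int *candidates)
{
    int n = 0;
    for (int ch = 0; ch < numChannels; ch++) {
        if (muteFlag[ch] == 1) continue;  /* mute channel: skip pairing */
        if (isLfe[ch]) continue;          /* LFE channel: skip pairing  */
        candidates[n++] = ch;
    }
    return n;
}
```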
  • generating a code stream based on the transmission channel signals of each transmission channel and the silence mark information includes:
  • bit allocation is performed for each transmission channel according to the mute mark information, the number of available bits, and the multi-channel side information, to obtain the bit allocation result of each transmission channel;
  • the transmission channel signals of each transmission channel are encoded according to the bit allocation result of each transmission channel to obtain the code stream.
  • the encoding end performs bit allocation based on the silence mark information, the number of available bits, and multi-channel side information; it performs encoding based on the bit allocation results of each transmission channel to obtain the encoded code stream.
  • the specific content of the bit allocation strategy is not limited.
  • the encoding of the transmission channel signal may be multi-channel quantization encoding.
  • the specific implementation of the multi-channel quantization encoding in the embodiment of the present application may be to transform the downmixed signal through a neural network to obtain latent features, then quantize the latent features and perform interval encoding.
  • Alternatively, the specific implementation of multi-channel quantization coding may be to perform quantization coding on the downmixed signal based on vector quantization.
  • the bit allocation for each transmission channel according to the mute mark information, the number of available bits and the multi-channel side information includes:
  • bit allocation is performed for each transmission channel according to the bit allocation strategy corresponding to the silence mark information.
  • the bit allocation based on the mute mark information may be based on the total available bits and the signal characteristics of each transmission channel, combined with the bit allocation strategy to perform the initial bit allocation. Then, the bit allocation result is adjusted according to the mute mark information. Through the adjustment of the bit allocation, the transmission efficiency of the multi-channel signal can be improved.
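  • A minimal sketch of such an adjustment, assuming an energy-proportional initial allocation; the application does not limit the concrete bit allocation strategy, so every detail below is illustrative.

```c
/* Allocate totalBits across transmission channels. Mute channels receive
 * zero bits (a preset low number would also match the text); the remaining
 * bits are split among non-mute channels in proportion to their energy. */
static void allocate_bits(const int *muteFlag, const float *energy,
                          int numChannels, int totalBits, int *bits)
{
    float sum = 0.0f;
    for (int ch = 0; ch < numChannels; ch++) {
        if (!muteFlag[ch]) sum += energy[ch];
    }
    for (int ch = 0; ch < numChannels; ch++) {
        if (muteFlag[ch]) {
            bits[ch] = 0;                                /* mute channel    */
        } else if (sum > 0.0f) {
            bits[ch] = (int)(totalBits * energy[ch] / sum);
        } else {
            bits[ch] = totalBits / numChannels;          /* degenerate case */
        }
    }
}
```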
  • the multi-channel side information includes: a channel bit allocation ratio field,
  • the channel bit allocation ratio field is used to indicate the bit allocation ratio among the non-low-frequency-effects (non-LFE) channels in the multi-channel signal.
  • the channel bit allocation ratio field can indicate the bit allocation ratio of all channels in the multi-channel signal except the LFE channel, thereby determining the number of bits for each non-LFE channel.
  • the silent mark detection on each channel of the multi-channel signal includes:
  • the silence flag of each channel of the current frame is determined.
  • the silence detection parameters of each channel of the current frame are compared with the silence detection threshold respectively.
  • if the silence detection parameter of the first channel of the current frame is less than the silence detection threshold, the first channel of the current frame is a mute frame; that is, the first channel at the current moment is a mute channel, and the mute flag muteFlag[1] of the first channel of the current frame is the first value (for example, 1).
  • if the silence detection parameter of the first channel of the current frame is greater than or equal to the silence detection threshold, the first channel of the current frame is a non-silent frame; that is, the first channel of the current frame is a non-mute channel, and the mute flag muteFlag[1] of the first channel of the current frame is the second value (for example, 0).
  • performing multi-channel coding processing on the multi-channel signal to obtain the transmission channel signal of each transmission channel includes:
  • the group-paired multi-channel signal is downmixed according to the multi-channel side information to obtain the transmission channel signal of each transmission channel.
  • the encoding device filters the multi-channel signals, for example, filters out the multi-channel signals that do not participate in the multi-channel pairing, and obtains the filtered multi-channel signals.
  • the filtered multi-channel signal may be a multi-channel signal participating in the group pair, for example, the filtered channel does not include the LFE channel.
  • a downmixing process is performed. The specific downmixing process will not be described in detail.
  • the transmission channel signal of each transmission channel can be obtained.
  • the transmission channel may be a channel obtained after multi-channel group pairing and downmixing.
  • the multi-channel side information includes at least one of the following: inter-channel amplitude difference parameter quantization codebook index, number of channel group pairs, and channel pair index;
  • the inter-channel amplitude difference parameter quantization codebook index is a codebook index used for quantization of the inter-channel amplitude difference (ILD) parameter of each channel in each channel pair of the multi-channel signal,
  • the number of channel group pairs is used to represent the number of channel group pairs of the current frame of the multi-channel signal
  • the channel pair index is used to represent the index of the channel pair.
  • the number of bits occupied by the inter-channel amplitude difference parameter quantization codebook index is not limited in the embodiment of the present application.
  • the inter-channel amplitude difference parameter quantization codebook index occupies 5 bits.
  • the inter-channel amplitude difference parameter quantization codebook index can be expressed as mcIld[ch1] and mcIld[ch2], each of which occupies 5 bits.
  • the number of bits occupied by the number of channel group pairs is not limited.
  • the number of channel group pairs occupies 4 bits
  • the number of channel group pairs is expressed as pairCnt, which occupies 4 bits and is used to represent the number of channel group pairs in the current frame.
  • the number of bits occupied by the channel pair index is not limited.
  • the channel pair index is expressed as channelPairIndex.
  • the number of bits of channelPairIndex is related to the total number of channels; it is used to represent the index of the channel pair.
  • according to the channel pair index, the index values of the two channels in the current channel pair can be parsed, namely ch1 and ch2.
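  • Putting the field widths above together, a hedged parsing sketch follows. The local bit reader, the field order inside the stream, and the smallest-sufficient-width rule for channelPairIndex are assumptions; only the 5-bit mcIld and 4-bit pairCnt widths come from the text.

```c
#include <stdint.h>

/* Minimal MSB-first bit reader (a local helper, not an API of this codec). */
typedef struct { const uint8_t *buf; int pos; } BitReader;

static unsigned read_bits(BitReader *br, int n)
{
    unsigned v = 0;
    while (n-- > 0) {
        v = (v << 1) | ((br->buf[br->pos >> 3] >> (7 - (br->pos & 7))) & 1u);
        br->pos++;
    }
    return v;
}

/* channelPairIndex width depends on the total channel count; assume it is
 * the smallest width able to index every possible channel pair. */
static int bits_for_pair_index(int totalChannels)
{
    int nPairs = totalChannels * (totalChannels - 1) / 2;
    int b = 0;
    while ((1 << b) < nPairs) b++;
    return b;
}

static void parse_pair_side_info(BitReader *br, int totalChannels)
{
    unsigned pairCnt = read_bits(br, 4);           /* number of channel pairs */
    for (unsigned p = 0; p < pairCnt; p++) {
        unsigned channelPairIndex =
            read_bits(br, bits_for_pair_index(totalChannels));
        unsigned mcIld_ch1 = read_bits(br, 5);     /* ILD codebook index, ch1 */
        unsigned mcIld_ch2 = read_bits(br, 5);     /* ILD codebook index, ch2 */
        (void)channelPairIndex; (void)mcIld_ch1; (void)mcIld_ch2;
    }
}
```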
  • embodiments of the present application provide a method for decoding multi-channel signals, including:
  • parsing silence mark information from the code stream of the encoding device, and determining the encoded information of each transmission channel according to the silence mark information, where the silence mark information includes: a mute enable flag and/or a mute flag; decoding the encoded information of each transmission channel to obtain the decoded signal of each transmission channel; and performing multi-channel decoding processing on the decoded signals of each transmission channel to obtain a multi-channel decoded output signal.
  • the decoding end can obtain the silence mark information from the code stream of the encoding end, thereby facilitating the decoding end to perform decoding processing in a manner consistent with that of the encoding end.
  • parsing the silence mark information from the code stream of the encoding device includes:
  • the acoustic bed mute enable flag and/or the object mute enable flag are parsed from the code stream; and the mute flags of some channels are parsed from the code stream according to the acoustic bed mute enable flag and/or the object mute enable flag.
  • the decoding end parses the silence mark information from the code stream of the encoding device.
  • the silence mark information obtained by the decoding end corresponds to the encoding side.
  • the mute flag is used to indicate whether each channel is a mute channel.
  • the mute channel is a channel that does not need to be encoded or a channel that needs to be encoded according to low bits.
  • the decoding end can parse the mute flag of each channel from the code stream. In one implementation, the mute enable flag can also be used to indicate whether each channel is a non-mute channel.
  • the mute enable flag when the mute enable flag is a first value (for example, 1), it indicates that the mute flag of each channel needs to be further detected. When the mute enable flag is the second value (for example, 0), it means that each channel is a non-mute channel.
  • the decoder parses the mute enable flag from the code stream; if the mute enable flag is the first value, the mute flag of each channel is further parsed from the code stream.
  • the mute enable flag includes: a sound bed mute enable flag, and/or an object mute enable flag
  • the decoder parses the sound bed mute enable flag and/or the object mute enable flag from the code stream, as well as the mute flags of each channel.
  • Alternatively, the decoder parses the sound bed mute enable flag and/or the object mute enable flag from the code stream, and then parses the mute flags of some channels from the code stream according to the parsed enable flags.
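  • A sketch of this parsing order, reusing the BitReader helper from the earlier side-information sketch; encoding HasSilFlag and each mute flag as one bit is an assumption.

```c
/* Parse the mute mark information: read HasSilFlag first, and read the
 * per-channel mute flags only when it equals the first value (1). */
static void parse_mute_info(BitReader *br, int numChannels,
                            int *hasSilFlag, int *muteFlag)
{
    *hasSilFlag = (int)read_bits(br, 1);
    for (int ch = 0; ch < numChannels; ch++) {
        /* when HasSilFlag is 0, every channel is treated as non-mute */
        muteFlag[ch] = *hasSilFlag ? (int)read_bits(br, 1) : 0;
    }
}
```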
  • decoding the encoded information of each transmission channel includes:
  • the encoded information of each transmission channel is decoded according to the number of encoding bits of each transmission channel.
  • the code stream can also include multi-channel side information.
  • the decoding end can allocate bits to each transmission channel based on the multi-channel side information and the mute flag information to obtain the number of encoding bits for each transmission channel.
  • the number of encoding bits obtained by the decoding end is the same as the number of encoding bits determined at the encoding end; the encoded information of each transmission channel is then decoded according to the number of encoding bits of each transmission channel, thereby realizing decoding of the transmission channel signal of each transmission channel.
  • the method further includes:
  • Post-processing is performed on the multi-channel decoding output signal, and the post-processing includes at least one of the following: frequency band extension decoding, inverse time domain noise shaping, inverse frequency domain noise shaping, and inverse time-frequency transformation.
  • the above-mentioned post-processing of the multi-channel decoded output signal is the inverse of the pre-processing at the encoding end, and the specific processing method is not limited here.
  • the multi-channel side information includes at least one of the following: inter-channel amplitude difference parameter quantization codebook index, number of channel group pairs, and channel pair index;
  • the inter-channel amplitude difference parameter quantization codebook index is a codebook index used for quantization of the inter-channel amplitude difference (ILD) parameter of each channel,
  • the number of channel group pairs is used to represent the number of channel group pairs of the current frame of the multi-channel signal
  • the channel pair index is used to represent the index of the channel pair.
  • embodiments of the present application provide an encoding device.
  • the encoding device includes:
  • a mute mark detection module used to obtain mute mark information of multi-channel signals, where the mute mark information includes: a mute enable flag, and/or a mute flag;
  • a multi-channel encoding module used to perform multi-channel encoding processing on the multi-channel signal to obtain the transmission channel signal of each transmission channel;
  • a code stream generation module configured to generate a code stream according to the transmission channel signal of each transmission channel and the silence mark information.
  • the code stream includes: the silence mark information and the multi-channel quantization encoding result of the transmission channel signal of each transmission channel.
  • embodiments of the present application provide a decoding device, where the decoding device includes:
  • An analysis module configured to parse the silence mark information from the code stream of the encoding device, and determine the encoded information of each transmission channel based on the silence mark information.
  • the silence mark information includes: a mute enable flag, and/or a mute flag;
  • An inverse quantization module used to decode the encoded information of each transmission channel to obtain the decoded signal of each transmission channel
  • a multi-channel decoding module is used to perform multi-channel decoding processing on the decoded signals of each transmission channel to obtain a multi-channel decoded output signal.
  • embodiments of the present application provide a computer-readable storage medium.
  • the computer-readable storage medium stores instructions that, when run on a computer, cause the computer to execute the method described in the above-mentioned first aspect or second aspect.
  • embodiments of the present application provide a computer program product containing instructions that, when run on a computer, cause the computer to execute the method described in the first aspect or the second aspect.
  • embodiments of the present application provide a communication device.
  • the communication device may include entities such as terminal equipment or chips.
  • the communication device includes: a processor and a memory; the memory is used to store instructions; and the processor is used to execute the instructions in the memory, causing the communication device to perform the method described in any one of the foregoing first aspect or second aspect.
  • embodiments of the present application provide a computer-readable storage medium that stores a code stream generated by the method of the first aspect.
  • the present application provides a chip system, which includes a processor and is used to support the encoding and decoding device in implementing the functions involved in the above aspects, for example, sending or processing the data and/or information involved in the above methods.
  • the chip system further includes a memory, and the memory is used to store necessary program instructions and data for the encoding and decoding device.
  • the chip system may be composed of chips, or may include chips and other discrete devices.
  • Figure 1 is a schematic structural diagram of a multi-channel signal processing system provided by an embodiment of the present application.
  • Figure 2a is a schematic diagram of the audio encoder and audio decoder provided by the embodiment of the present application applied to terminal equipment;
  • Figure 2b is a schematic diagram of the audio encoder provided by the embodiment of the present application being applied to wireless equipment or core network equipment;
  • Figure 2c is a schematic diagram of the audio decoder provided by the embodiment of the present application applied to wireless equipment or core network equipment;
  • Figure 3a is a schematic diagram of the multi-channel encoder and multi-channel decoder provided by the embodiment of the present application applied to terminal equipment;
  • Figure 3b is a schematic diagram of the multi-channel encoder provided by the embodiment of the present application applied to wireless equipment or core network equipment;
  • Figure 3c is a schematic diagram of the multi-channel decoder provided by the embodiment of the present application applied to wireless equipment or core network equipment;
  • Figure 4 is a schematic diagram of a multi-channel signal encoding method provided by an embodiment of the present application.
  • Figure 5 is a schematic diagram of a multi-channel signal decoding method provided by an embodiment of the present application.
  • Figure 6 is a schematic diagram of a multi-channel signal encoding process provided by an embodiment of the present application.
  • Figure 7 is a schematic diagram of a multi-channel signal encoding process provided by an embodiment of the present application.
  • Figure 8 is a schematic diagram of a multi-channel signal decoding process provided by an embodiment of the present application.
  • Figure 9 is a schematic diagram of a multi-channel signal decoding process provided by an embodiment of the present application.
  • Figure 10 is a schematic structural diagram of an encoding device provided by an embodiment of the present application.
  • Figure 11 is a schematic structural diagram of a decoding device provided by an embodiment of the present application.
  • Figure 12 is a schematic structural diagram of another encoding device provided by an embodiment of the present application.
  • Figure 13 is a schematic structural diagram of another decoding device provided by an embodiment of the present application.
  • Embodiments of the present application provide a multi-channel signal coding and decoding method, terminal equipment, and network equipment to improve coding efficiency and coding bit resource utilization.
  • Sound is a continuous wave produced by the vibration of an object.
  • the object that vibrates and emits sound waves is called a sound source.
  • sound waves propagate through a medium (such as air, solid or liquid), the hearing organs of humans or animals can perceive the sound.
  • Characteristics of sound waves include pitch, intensity, and timbre.
  • Pitch indicates how high or low a sound is.
  • Sound intensity indicates the loudness of the sound. Sound intensity may also be called loudness or volume. The unit of sound intensity is the decibel (dB). Timbre is also called tone quality.
  • the frequency of sound waves determines the pitch. The higher the frequency, the higher the pitch.
  • the number of times an object vibrates in one second is called frequency, and the unit of frequency is Hertz (Hz).
  • the frequency of sound that the human ear can recognize is between 20Hz and 20,000Hz.
  • the amplitude of the sound wave determines the intensity of the sound. The greater the amplitude, the greater the sound intensity. The closer to the sound source, the greater the sound intensity.
  • the shape of the sound wave determines the timbre.
  • the waveforms of sound waves include square waves, sawtooth waves, sine waves and pulse waves.
  • sounds can be divided into regular sounds and irregular sounds.
  • Irregular sound refers to the sound produced by the sound source vibrating irregularly. Irregular sounds are, for example, noises that affect people's work, study, rest, etc.
  • Regular sound refers to the sound produced by the sound source vibrating regularly. Regular sounds include speech and musical tones.
  • regular sound is an analog signal that changes continuously in the time-frequency domain. This analog signal may be called an audio signal (acoustic signal). An audio signal is an information carrier that carries speech, music, and sound effects.
  • Sound can also be divided into mono and stereo.
  • Mono has one sound channel, with a microphone picking up the sound and a speaker playing it back.
  • Stereo has multiple sound channels, and different sound channels transmit different sound waveforms.
  • the sound channel may also be referred to as a channel or channel for short.
  • a multi-channel signal may include the signal of each sound channel, and each sound channel may also be referred to as each channel.
  • in the embodiments of the present application, "sound channel" and "channel" have the same meaning.
  • the transmission channel signal is a signal obtained after multi-channel encoding.
  • the transmission channel refers to a channel after multi-channel encoding.
  • the multi-channel encoding can include channel group pairing and downmix processing, so the transmission channel can also be called a channel-pair downmixed channel.
  • for the multi-channel encoding process, please refer to the description of the multi-channel encoding process in subsequent embodiments.
  • Multi-channel encoding may be encoding a sound bed signal with multiple channels, such as 5.1 channel, 5.1.4 channel, 7.1 channel, 7.1.4 channel, 22.2 channel, etc.
  • Multi-channel encoding can also be encoding multiple object audio signals.
  • Multi-channel encoding may also be encoding a mixed signal containing both the acoustic bed signal and the object audio signal.
  • 5.1 channels include the center channel (C), front left channel (L), front right channel (R), rear left surround channel (LS), rear right surround channel (RS), and the 0.1 low-frequency effects (LFE) channel.
  • the 5.1.4 channel is based on the 5.1 channel and adds the following channels: left high channel, right high channel, left high surround channel, and right high surround channel.
  • 7.1 channels include the center channel (C), front left channel (L), front right channel (R), rear left surround channel (LS), rear right surround channel (RS), left back channel (LB), right back channel (RB), and the 0.1 LFE channel.
  • the 7.1.4 channel adds four height channels to the 7.1 channel.
  • 22.2-channel is a multi-channel format, including a total of 22 channels in three layers and 2 LFE channels.
  • the mixed signal of the acoustic bed signal and the object signal is a signal combination in three-dimensional sound, which jointly completes the audio recording, transmission and playback requirements of complex scenes such as film production, sports competitions, and concerts.
  • the sound content of the field in sports game broadcasts is usually represented by acoustic bed signals, and the comments of different commentators are usually represented by multiple object audios.
  • the characteristics of the input signals between different channels are not exactly the same, and at different times, the characteristics of the input signal of the same channel also change constantly.
  • the current multi-channel signal adopts a fixed coding scheme, which does not consider the differences in input signal characteristics at different times and/or between different channels. For example, a unified bit allocation scheme is used for processing, and the multi-channel signal is quantized and encoded according to the result of the bit allocation.
  • the multi-channel audio signal to be encoded contains a 5.1.4-channel sound bed signal and 4 object signals.
  • channels 0-9 belong to the acoustic bed signal
  • channels 10-13 belong to the object signal.
  • channels 6-9 and channels 11, 12, and 13 are silent channels (less information can be perceived by hearing), and other channels contain main audio information, that is, non-silent channels.
  • at another moment, the mute channels change to channels 10, 12, and 13, and the other channels contain the main audio information.
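  • Written as per-channel mute-flag vectors (1 = mute, 0 = non-mute), the two moments of this example look as follows; this merely restates the example above in C array notation.

```c
/* Channels 0-9: acoustic bed signal; channels 10-13: object signals. */
int muteFlagMomentA[14] = {0,0,0,0,0,0, 1,1,1,1,   /* bed: ch 6-9 mute      */
                           0, 1,1,1};              /* obj: ch 11-13 mute    */
int muteFlagMomentB[14] = {0,0,0,0,0,0, 0,0,0,0,   /* bed: all non-mute     */
                           1, 0,1,1};              /* obj: ch 10,12,13 mute */
```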
  • Embodiments of the present application provide an audio processing technology, and in particular provide an audio coding technology for multi-channel signals to improve the traditional audio coding system.
  • Multi-channel signals refer to audio signals including multiple channels; for example, a multi-channel signal can be a stereo signal.
  • Audio processing includes audio encoding and audio decoding. Audio encoding is performed on the source side and involves encoding (e.g., compressing) raw audio to reduce the amount of data required to represent that audio so that it can be stored and/or transmitted more efficiently. Audio decoding is performed on the destination side and involves inverse processing relative to the encoder to reconstruct the original audio. The encoding part and the decoding part are also collectively called encoding.
  • the audio processing system 100 may include: a multi-channel signal encoding device 101 and a multi-channel signal decoding device 102.
  • the multi-channel signal encoding device 101 can also be called an audio encoding device and can be used to generate a code stream. Then the audio code stream can be transmitted to the multi-channel signal decoding device 102 through the audio transmission channel.
  • the multi-channel signal decoding device 102 can also be called an audio decoding device; it can receive the code stream, perform the audio decoding function of the multi-channel signal decoding device 102, and finally obtain a reconstructed signal.
  • the multi-channel signal encoding device can be applied to various terminal devices that require audio communication, as well as wireless devices and core network devices that require transcoding.
  • for example, the multi-channel signal encoding device can be the audio encoder of the above-mentioned terminal device, wireless device, or core network device.
  • the multi-channel signal decoding device can be applied to various terminal devices that require audio communication, as well as wireless devices and core network devices that require transcoding.
  • for example, the multi-channel signal decoding device can be the audio decoder of the above-mentioned terminal device, wireless device, or core network device.
  • audio encoders can be used in media gateways of wireless access networks or core networks, transcoding devices, media resource servers, mobile terminals, fixed network terminals, etc. Audio encoders can also be applied to audio encoding in virtual reality (VR) streaming services.
  • the end-to-end encoding and decoding process of an audio signal includes: audio signal A passes through the acquisition module and then undergoes a preprocessing operation (audio preprocessing).
  • the preprocessing operation includes filtering out the low-frequency part of the signal, using 20 Hz or 50 Hz as the cut-off point, and extracting the orientation information from the signal; the signal is then encoded (audio encoding), packaged (file/segment encapsulation), and sent (delivery) to the decoding end.
  • the decoding end first unpacks (file/segment decapsulation), then decodes (audio decoding), and performs binaural rendering (audio rendering) on the decoded signal.
  • the rendered signal is mapped to the listener's headphones, which can be independent headphones or headphones on a glasses device.
  • each terminal device can include: audio encoder, channel encoder, audio decoder, and channel decoder.
  • the channel encoder is used for channel encoding the audio signal
  • the channel decoder is used for channel decoding the audio signal.
  • the first terminal device 20 may include: a first audio encoder 201, a first channel encoder 202, a first audio decoder 203, and a first channel decoder 204.
  • the second terminal device 21 may include: a second audio decoder 211, a second channel decoder 212, a second audio encoder 213, and a second channel encoder 214.
  • the first terminal device 20 is connected to a wireless or wired first network communication device 22; the first network communication device 22 and a wireless or wired second network communication device 23 are connected through a digital channel; and the second terminal device 21 is connected to the wireless or wired second network communication device 23.
  • the network communication devices 22 and 23 may generally refer to signal transmission devices, such as communication base stations and data exchange devices.
  • the terminal device as the sending end first collects the audio, performs audio coding on the collected audio signal, and then performs channel coding, and then transmits it in the digital channel through the wireless network or core network.
  • the terminal device as the receiving end performs channel decoding according to the received signal to obtain the code stream, and then recovers the audio signal through audio decoding, and the terminal device at the receiving end performs audio playback.
  • the wireless device or the core network device 25 includes: a channel decoder 251, other audio decoders 252, the audio encoder 253 provided in the embodiment of the present application, and the channel encoder 254, where the other audio decoders 252 refer to audio decoders other than the audio decoder provided in the embodiments of the present application.
  • the channel decoder 251 is first used to perform channel decoding on the signal entering the device, then the other audio decoder 252 is used to perform audio decoding, and then the audio encoder 253 provided in the embodiment of the present application is used to perform audio encoding; finally, the channel encoder 254 performs channel encoding on the audio signal, and the signal is transmitted after channel encoding is completed.
  • the other audio decoder 252 performs audio decoding on the code stream decoded by the channel decoder 251.
  • the wireless device or core network device 25 includes: a channel decoder 251, an audio decoder 255 provided in the embodiment of the present application, other audio encoders 256, and a channel encoder 254, where the other audio encoders 256 refer to audio encoders other than the audio encoder provided in the embodiments of the present application.
  • the channel decoder 251 is first used to perform channel decoding on the signal entering the device, then the audio decoder 255 is used to decode the received audio code stream, then the other audio encoder 256 is used to perform audio encoding, and finally the channel encoder 254 performs channel encoding on the audio signal.
  • wireless equipment refers to radio frequency-related equipment in communication
  • core network equipment refers to core network-related equipment in communication
  • the multi-channel signal encoding device can be applied to various terminal devices that require audio communication, as well as wireless devices and core network devices that require transcoding; for example, the multi-channel signal encoding device can be a multi-channel encoder of the above-mentioned terminal device, wireless device, or core network device. Similarly, the multi-channel signal decoding device can be applied to various terminal devices that require audio communication, as well as wireless devices and core network devices that require transcoding; for example, the multi-channel signal decoding device can be a multi-channel decoder of the above-mentioned terminal device, wireless device, or core network device.
  • each terminal device may include: a multi-channel encoder, a channel encoder, a multi-channel decoder, and a channel decoder.
  • the multi-channel encoder can perform the audio encoding method provided by the embodiment of the present application, and the multi-channel decoder can perform the audio decoding method provided by the embodiment of the present application.
  • the channel encoder is used for channel encoding the multi-channel signal
  • the channel decoder is used for channel decoding the multi-channel signal.
  • the first terminal device 30 may include: a first multi-channel encoder 301, a first channel encoder 302, a first multi-channel decoder 303, and a first channel decoder 304.
  • the second terminal device 31 may include: a second multi-channel decoder 311, a second channel decoder 312, a second multi-channel encoder 313, and a second channel encoder 314.
  • the first terminal device 30 is connected to a wireless or wired first network communication device 32.
  • the first network communication device 32 and a wireless or wired second network communication device 33 are connected through a digital channel.
  • the second terminal device 31 is connected to the wireless or wired second network communication device 33.
  • the above-mentioned wireless or wired network communication equipment may generally refer to signal transmission equipment, such as communication base stations, data exchange equipment, etc.
  • the terminal device as the sending end performs multi-channel coding on the collected multi-channel signals, and then transmits them in digital channels through the wireless network or core network after channel coding.
  • the terminal device as the receiving end performs channel decoding based on the received signal to obtain the multi-channel signal encoding code stream, and then recovers the multi-channel signal through multi-channel decoding, which is played back by the terminal device as the receiving end.
  • FIG. 3b it is a schematic diagram of the multi-channel encoder provided by the embodiment of the present application applied to wireless equipment or core network equipment.
  • the wireless device or core network device 35 includes: a channel decoder 351, other audio decoders 352, a multi-channel encoder 353, and a channel encoder 354, which are similar to those in the aforementioned Figure 2b and will not be described again here.
  • FIG. 3c it is a schematic diagram of the multi-channel decoder provided by the embodiment of the present application applied to a wireless device or a core network device.
  • the wireless device or core network device 35 includes: a channel decoder 351, a multi-channel decoder 355, other audio encoders 356, and a channel encoder 354, which are similar to those in the aforementioned Figure 2c and will not be described again here.
  • the audio encoding processing can be a part of the multi-channel encoder, and the audio decoding processing can be a part of the multi-channel decoder.
  • instead of directly performing multi-channel encoding on the collected multi-channel signal, the multi-channel signal may first be processed to obtain an audio signal, and the obtained audio signal is then encoded according to the method provided by the embodiments of the present application; the decoding end decodes the code stream to obtain the audio signal and recovers the multi-channel signal after upmixing. Therefore, the embodiments of the present application can also be applied to multi-channel encoders and multi-channel decoders in terminal devices, wireless devices, and core network devices. In wireless or core network devices, if transcoding needs to be implemented, corresponding multi-channel encoding processing is required.
  • a multi-channel signal encoding method provided by an embodiment of the present application is introduced.
  • This method can be executed by a terminal device.
  • the terminal device can be a multi-channel signal encoding device (hereinafter referred to as the encoding end, the encoder, or the encoding device; for example, the encoding end may be an artificial intelligence (AI) encoder).
  • the multi-channel signal may include multiple channels, such as a first channel and a second channel, or the multiple channels may include a first channel, a second channel, a third channel, etc.
  • the encoding process performed by the encoding device (or encoding end) in the embodiment of the present application is explained:
  • the mute mark information includes: a mute enable flag and/or a mute flag.
  • the mute mark information of the multi-channel signal can be obtained.
  • the mute mark information may indicate the mute status of channels in the multi-channel signal. For example, mute mark detection is performed on multi-channel signals to detect whether the multi-channel signals support mute marks.
  • the encoding end can generate mute mark information based on the multi-channel signals.
  • the silence mark information can be used to guide subsequent encoding processing, such as bit allocation and other processing.
  • the silence mark information can also be written into the code stream by the encoding end and transmitted to the decoding end to ensure consistent encoding and decoding processing.
  • the mute mark information is used to indicate the mute mark of the multi-channel signal.
  • the mute mark information has a variety of implementation methods.
  • the mute mark information may include a mute enable flag and/or a mute flag.
  • the mute enable flag is used to indicate whether silence detection is turned on, and the mute flag is used to indicate whether each channel of the multi-channel signal is a silent frame.
  • the multi-channel signal includes an acoustic bed signal and/or an object signal.
  • the current encoding scheme ignores the differences in input signal characteristics at different times and between different channels; a unified coding scheme is used for processing, which results in low coding efficiency.
  • the mute enable flag provided in the embodiment of the present application can provide mute instructions for acoustic bed signals and/or object signals.
  • the mute mark information includes: mute enable flag; the mute enable flag includes: global mute enable flag, or partial mute enable flag, where,
  • the global mute enable flag is a mute enable flag that acts on multi-channel signals.
  • the partial mute enable flag is a mute enable flag that acts on some channels in a multi-channel signal.
  • the mute enable flag is recorded as HasSilFlag, and the mute enable flag can be a global mute enable flag or a partial mute enable flag.
  • the above-mentioned global mute enable flag or partial mute enable flag can be used to indicate mute for the acoustic bed signal and/or object signal, so that subsequent encoding processing, such as bits, can be performed based on the global mute enable flag or partial mute enable flag. Allocation can improve coding efficiency.
  • when the mute enable flag is a partial mute enable flag, the partial mute enable flag is an object mute enable flag that acts on the object signal, or a sound bed mute enable flag that acts on the sound bed signal, or a mute enable flag that acts on the channel signals other than the low frequency effects (LFE) channel signal in the multi-channel signal, or a mute enable flag that acts on the channel signals participating in group pairing in the multi-channel signal.
  • the global mute enable flag acts on all channels, and the partial mute enable flag acts on some channels.
  • the object mute enable flag is applied to the channel corresponding to the object signal in the multi-channel signal
  • the sound bed mute enable flag is applied to the channel corresponding to the sound bed signal in the multi-channel signal.
  • the object mute enable flag that only acts on the object signal in the multi-channel signal is denoted as objMuteEna.
  • the sound bed mute enable flag that only acts on the sound bed signal in a multi-channel signal is recorded as bedMuteEna.
  • the global mute enable flag is the mute enable flag that acts on the multi-channel signal: when the multi-channel signal only contains the sound bed signal, the global mute enable flag is the mute enable flag acting on the sound bed signal; when the multi-channel signal only contains the object signal, the global mute enable flag is the mute enable flag acting on the object signal; when the multi-channel signal contains both the sound bed signal and the object signal, the global mute enable flag is the mute enable flag acting on the sound bed signal and the object signal.
  • the partial mute enable flag is a mute enable flag that acts on some channels in the multi-channel signal, and some channels are preset.
  • for example, the partial mute enable flag is an object mute enable flag acting on the object signal, or the partial mute enable flag is a sound bed mute enable flag acting on the sound bed signal, or the partial mute enable flag is a mute enable flag acting on the channel signals other than the LFE channel signal in the multi-channel signal.
  • Alternatively, the partial mute enable flag is a mute enable flag that acts on the channel signals participating in group pairing in the multi-channel signal. In the embodiments of the present application, the specific method of group-pair processing of multi-channel signals is not limited.
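  • The scope variants above can be summarized in a small helper; the enum names and the bed-channels-first ordering (as in the 0-9/10-13 example later in this description) are illustrative assumptions, and the group-pair variant is omitted because it depends on the pairing result.

```c
enum MuteEnableScope { SCOPE_GLOBAL, SCOPE_BED, SCOPE_OBJECT, SCOPE_NON_LFE };

/* Returns 1 when the (global or partial) mute enable flag acts on channel
 * ch, under the assumed layout: bed channels first, object channels after. */
static int flag_applies_to(enum MuteEnableScope scope, int ch,
                           int numBedChannels, int numChannels,
                           const int *isLfe)
{
    switch (scope) {
    case SCOPE_GLOBAL:  return 1;                         /* all channels  */
    case SCOPE_BED:     return ch < numBedChannels;       /* bed signal    */
    case SCOPE_OBJECT:  return ch >= numBedChannels &&
                               ch < numChannels;          /* object signal */
    case SCOPE_NON_LFE: return !isLfe[ch];                /* non-LFE only  */
    }
    return 0;
}
```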
  • the multi-channel signal includes: an acoustic bed signal, and an object signal;
  • the mute mark information includes: mute enable flag; the mute enable flag includes: acoustic bed mute enable flag, and object mute enable flag,
  • the mute enable flag occupies the first bit and the second bit.
  • the first bit is used to carry the value of the acoustic bed mute enable flag
  • the second bit is used to carry the value of the object mute enable flag.
  • the mute enable flag can use different bits as its specific implementation.
  • for example, the first bit and the second bit are predefined; the first bit is used to carry the value of the acoustic bed mute enable flag, and the second bit is used to carry the value of the object mute enable flag.
  • through the above different bits, it can be indicated that the mute enable flags are the acoustic bed mute enable flag and the object mute enable flag.
  • in a possible implementation, step 401 of obtaining the mute mark information of the multi-channel signal includes:
  • A3. Perform silence mark detection on each channel of the multi-channel signal to obtain the silence mark information.
  • control signaling can be input into the encoding device, and the silence mark information can be determined according to the control signaling.
  • the silence mark information can be controlled by external input. Alternatively, the encoding device can include encoding parameters (also called encoder parameters) that are used to determine the silence mark information; for example, the silence mark information can be preset based on encoder parameters such as encoding rate and encoding bandwidth. Alternatively, the silence mark information may also be determined based on the silence detection results of each channel. In the embodiment of the present application, there is no limitation on how the mute mark information is obtained.
  • the mute flag information includes: a mute enable flag
  • the mute enable flag is used to indicate whether the mute mark detection function is turned on;
  • the mute enable flag is used to indicate whether the mute flag of each channel of the multi-channel signal needs to be sent.
  • the mute enable flag is used to indicate whether each channel of the multi-channel signal is a non-mute channel.
  • the mute enable flag is used to indicate whether mute detection is turned on. For example, when the mute enable flag is a first value (for example, 1), it indicates that the mute detection function is turned on and the mute flag of each channel is further detected. When the mute enable flag is the second value (for example, 0), it means that the mute detection function is turned off.
  • the mute enable flag can be used to indicate whether each channel is an unmuted channel. For example, when the mute enable flag is a first value (for example, 1), it indicates that the mute flag of each channel needs to be further detected. When the mute enable flag is the second value (for example, 0), it indicates that each channel is a non-mute channel.
  • the mute flag information includes: a mute enable flag and a mute flag
  • Step A3 performs silence mark detection on each channel of the multi-channel signal to obtain silence mark information, including:
  • A31 Perform mute mark detection on each channel of the multi-channel signal to obtain the mute mark of each channel;
  • A32 Determine the mute enable flag according to the mute flag of each channel.
  • the encoding end can first detect the mute flag of each channel, and the mute flag of each channel is used to indicate whether each channel is a mute frame.
  • the channel numbers of the acoustic bed signal are from 0 to 9, and the channel numbers of the object signal are from 10 to 13.
  • the mute mark information includes: a mute flag; or, the mute mark information includes: a mute enable flag and a mute flag;
  • the mute flag is used to indicate whether each channel to which the mute enable flag acts is a mute channel.
  • a mute channel is a channel that does not require encoding or a channel that requires low-bit encoding.
• a silent channel is a channel whose signal energy, decibel value, or loudness is lower than the hearing threshold; it is a channel that does not need to be encoded or that only needs to be encoded according to lower bits.
• when the value of the mute flag is the first value (for example, 1), it indicates that the channel is a mute channel; when the value of the mute flag is the second value (for example, 0), it indicates that the channel is a non-mute channel.
• when the value of the mute flag is the first value (for example, 1), the channel is not encoded or is encoded according to lower bits.
  • step A3 performs silence mark detection on each channel of the multi-channel signal, including:
  • B1. Determine the signal energy of each channel of the current frame according to the input signal of each channel of the current frame of the multi-channel signal.
  • the signal energy of each channel of the current frame is determined.
  • the value of the frame length is not limited.
  • B2. Determine the silence detection parameters of each channel of the current frame according to the signal energy of each channel of the current frame.
  • the silence detection parameters of each channel of the current frame are used to characterize the energy value, power value, decibel value or loudness value of each channel signal of the current frame.
• if the silence detection parameter of the first channel of the current frame is less than the silence detection threshold, the first channel of the current frame is a mute frame, that is, the first channel at the current moment is a mute channel, and the mute flag muteFlag[1] of the first channel of the current frame is the first value (for example, 1).
• if the silence detection parameter of the first channel of the current frame is greater than or equal to the silence detection threshold, the first channel of the current frame is a non-silent frame, that is, the first channel of the current frame is a non-silent channel, and the mute flag muteFlag[1] of the first channel of the current frame is the second value (for example, 0).
  • the encoding device can perform multi-channel encoding processing on the multi-channel signal.
• there are many multi-channel encoding processes; for details, see the examples in the subsequent embodiments. Through the above encoding process, the transmission channel signal of each transmission channel can be obtained.
• a specific implementation of multi-channel quantization coding can be to transform the grouped and downmixed signals through a neural network to obtain latent features, quantize the latent features, and perform interval coding.
• alternatively, multi-channel quantization coding may perform quantization coding on the downmixed signal based on vector quantization. The embodiments of the present application do not limit this.
  • step 402 performs multi-channel coding processing on the multi-channel signal to obtain the transmission channel signal of each transmission channel, including:
  • the encoding device completes the screening of multi-channel signals, and the filtered signals are the multi-channel signals participating in the pairing.
  • the filtered channels do not include the LFE channel, and there is no limit to the specific screening method.
  • the encoding device filters multi-channel signals, and the filtered multi-channel signals can be multi-channel signals that participate in pairing. After completing the screening of multi-channel signals, the multi-channel signals can also be grouped. For example, channel ch1 and channel ch2 form a channel pair, and a multi-channel pair signal is obtained.
• the specific method of group pair processing is not limited in the embodiments of the present application.
• the multi-channel side information includes at least one of the following: inter-channel amplitude difference parameter quantization codebook index, number of channel group pairs, and channel pair index. Among them, the inter-channel amplitude difference parameter quantization codebook index is used to indicate the codebook index for quantizing the inter-channel amplitude difference (Interaural Level Difference, ILD) parameter of each channel of the multi-channel signal.
  • the multi-channel side information can be used to downmix the multi-channel group pair signal.
  • the specific down-mixing process will not be explained in detail.
  • Multi-channel pairing and downmixing can obtain the transmission channel signals of each transmission channel after multi-channel pairing and downmixing.
  • the transmission channel may specifically refer to the channel after multi-channel pairing and downmixing.
  • the encoding method of the multi-channel signal performed by the encoding end also includes:
• the pre-processing includes at least one of the following: transient detection, window type judgment, time-frequency transformation, frequency domain noise shaping, time domain noise shaping, and band extension coding;
  • step 401 obtains the mute mark information of the multi-channel signal, including:
  • the input signal for mute mark detection may be an original input multi-channel signal or a pre-processed multi-channel signal. Preprocessing may include but is not limited to: transient detection, window type judgment, time-frequency transformation, frequency domain noise shaping, time domain noise shaping, frequency band extension coding and other processing.
  • the multi-channel signal may be a time domain signal or a frequency domain signal.
  • the encoding method for multi-channel signals performed by the encoding end also includes:
• the pre-processing includes at least one of the following: transient detection, window type judgment, time-frequency transformation, frequency domain noise shaping, time domain noise shaping, and band extension coding;
  • the encoding end can preprocess multi-channel signals. Preprocessing may include but is not limited to: transient detection, window type judgment, time-frequency transformation, frequency domain noise shaping, time domain noise shaping, frequency band extension coding and other processing. Multi-channel signals can be time domain signals or frequency domain signals.
  • the silence mark information in step 401 can also be modified according to the preprocessed multi-channel signal. For example, after frequency domain noise shaping, the signal energy of a certain channel of the multi-channel signal changes. The mute mark detection results for this channel can be adjusted.
  • the code stream includes: the silence mark information and the multi-channel quantization encoding result of the transmission channel signal of each transmission channel.
• the encoding end generates a code stream, and the code stream includes the silence mark information, so that the decoding end can obtain the silence mark information and decode the code stream based on it, which facilitates the decoding end adopting the same decoding processing, such as bit allocation, as the encoding end.
  • step 403 generates a code stream based on the transmission channel signal and silence mark information of each transmission channel, including:
• the encoding end can adjust the initial multi-channel processing method based on the mute mark information, and then encode the multi-channel signal based on the adjusted multi-channel processing method, which can improve coding efficiency. For example, during the screening process of multi-channel signals, channels with a mute flag of 1 do not participate in group pair screening.
  • step 403 generates a code stream based on the transmission channel signal and silence mark information of each transmission channel, including:
• G1. According to the mute mark information, the number of available bits, and the multi-channel side information, perform bit allocation for each transmission channel to obtain the bit allocation results of each transmission channel;
• G2. Encode the transmission channel signal of each transmission channel according to the bit allocation result of each channel to obtain a code stream.
  • the encoding end can use the silence mark information for bit allocation of the transmission channel.
• the initial bit allocation is made for each transmission channel based on the number of available bits and the multi-channel side information, and the allocation is then adjusted based on the silence mark information to obtain the bit allocation result of each transmission channel; the transmission channel signal is encoded according to the bit allocation result of each transmission channel to obtain a code stream, which can be called an encoded code stream, or a code stream of a multi-channel signal.
  • step G1 allocates bits to each transmission channel according to the silence mark information, the number of available bits and the multi-channel side information, including:
  • the encoding end can allocate bits to each transmission channel based on the silence mark information.
  • the mute enable flag can be used to select different bit allocation strategies.
• the specific content of the bit allocation strategy is not limited. An example is as follows: assume that the mute enable flag includes the sound bed mute enable flag bedMuteEna and the object mute enable flag objMuteEna. Bit allocation is performed based on the mute mark information: an initial bit allocation may be made based on the total available bits and the signal characteristics of each transmission channel, and the bit allocation result is then adjusted according to the mute mark information. Through this adjustment of the bit allocation, the transmission efficiency of the multi-channel signal can be improved.
• if the object mute enable flag objMuteEna is 1, the bits initially allocated to channels in the object signal whose mute flag is 1 are reallocated to the sound bed signal or to other object channels. If the sound bed mute enable flag bedMuteEna and the object mute enable flag are both 1, the bits initially allocated to object channels whose mute flag is 1 can be reallocated to other object channels, and the bits initially allocated to sound bed channels whose mute flag is 1 can be reallocated to other sound bed channels, as sketched below.
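• A minimal sketch of this reallocation rule for the case objMuteEna = 1, using hypothetical names for the allocator state (the actual allocator of the embodiments is not specified here):

```c
/* Hypothetical sketch: bits initially given to object channels whose
 * mute flag is 1 are moved to the sound bed pool. All names are
 * illustrative only. */
void reallocate_silent_object_bits(int objMuteEna,
                                   int nObjChannels,
                                   const int muteFlag[],    /* per object channel */
                                   int bitsPerObjChannel[], /* initial allocation */
                                   int *bedBits) {
    if (!objMuteEna) {
        return; /* object mute handling disabled: keep the initial allocation */
    }
    for (int i = 0; i < nObjChannels; i++) {
        if (muteFlag[i] == 1) {
            *bedBits += bitsPerObjChannel[i]; /* hand the bits to the sound bed */
            bitsPerObjChannel[i] = 0;         /* the silent channel gets no bits */
        }
    }
}
```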
  • multi-channel side information includes: channel bit allocation ratio
  • the channel bit allocation ratio is used to indicate the bit allocation ratio between non-low frequency effect LFE channels in a multi-channel signal.
• the low frequency effects LFE channel is an audio channel carrying bass content in the range of 3 Hz to 120 Hz; this channel can be sent to speakers specially designed for bass sounds.
• the channel bit allocation ratio is used to indicate the bit allocation ratio of the non-LFE channels. For example, the channel bit allocation ratio occupies 6 bits. In the embodiment of the present application, the number of bits occupied by the channel bit allocation ratio is not limited.
• the channel bit allocation ratio can be carried by the channel bit allocation ratio field in the multi-channel side information, expressed as chBitRatios, which occupies 6 bits and is used to indicate the bit allocation ratio of all channels in the multi-channel signal except the LFE channel.
  • the channel bit allocation ratio field can indicate the bit allocation ratio of each transmission channel, thereby determining the number of bits obtained by each transmission channel. Without limitation, the number of bits can be further converted into a number of bytes.
  • the multi-channel side information includes at least one of the following: inter-channel amplitude difference parameter quantization codebook index, number of channel group pairs, and channel pair index;
  • the inter-channel amplitude difference parameter quantization codebook index is used to indicate the codebook index of the inter-channel amplitude difference (Interaural Level Difference, ILD) parameter quantization of each channel in each channel;
  • the number of channel group pairs used to represent the number of channel group pairs in the current frame of the multi-channel signal
  • Channel pair index used to represent the index of the channel pair.
  • the number of bits occupied by the inter-channel amplitude difference parameter quantization codebook index is not limited in the embodiment of the present application.
  • the inter-channel amplitude difference parameter quantization codebook index occupies 5 bits.
  • the inter-channel amplitude difference parameter quantization codebook index can be expressed as mcIld[ch1], mcIld[ch2], which occupies 5 bits.
  • the number of bits occupied by the number of channel group pairs is not limited.
  • the number of channel group pairs occupies 4 bits
  • the number of channel group pairs is expressed as pairCnt, which occupies 4 bits and is used to represent the number of channel group pairs in the current frame.
  • the number of bits occupied by the channel pair index is not limited.
  • the channel pair index is expressed as channelPairIndex.
  • the number of channelPairIndex bits is related to the total number of channels. It is used to represent the index of the channel pair.
• the index values of the two channels in the current channel pair can be parsed, namely ch1 and ch2.
  • the encoding method for multi-channel signals performed by the encoding device also includes:
  • the encoding end after the encoding end obtains the transmission channel signal and the silence mark information of each transmission channel, it can generate a code stream, which carries the silence mark information, and the encoding end can send the code stream to the decoding end.
  • mute mark detection is performed on multi-channel signals to obtain mute mark information.
• the mute mark information includes: mute enable flag and/or mute flag; multi-channel encoding processing is performed on the multi-channel signal to obtain the transmission channel signals of each transmission channel; a code stream is generated according to the transmission channel signal of each transmission channel and the silence mark information, where the code stream includes the silence mark information and the multi-channel quantization encoding results of the transmission channel signals of each transmission channel. Performing subsequent encoding processing based on the silence mark information can improve encoding efficiency.
  • Embodiments of the present application also provide a method for decoding multi-channel signals, which method can be executed by a terminal device.
• the terminal device can be a multi-channel signal decoding device (hereinafter referred to as a decoding terminal or decoder; for example, the decoding end can be an AI decoder).
  • the method performed on the decoding end in the embodiment of the present application mainly includes:
  • the mute mark information includes: a mute enable flag and/or a mute flag.
  • the decoding end adopts the opposite processing method to that of the encoding end.
• the code stream is received from the encoding device. Since the code stream carries silence mark information, the coding information of each transmission channel is determined based on the silence mark information.
• the silence mark information includes: mute enable flag, and/or mute flag.
  • step 501 parses the silence mark information from the code stream of the encoding device, including:
  • the decoding end parses the silence mark information from the code stream of the encoding device. Depending on the specific content of the silence mark information generated by the encoding device, the silence mark information obtained by the decoding end corresponds to the encoding side.
  • the mute flag is used to indicate whether each channel is a mute channel.
  • the mute channel is a channel that does not need to be encoded or a channel that needs to be encoded according to low bits.
• the decoder can parse the mute flag of each channel from the code stream. In one way, the mute enable flag can also be used to indicate whether each channel is a non-mute channel.
• when the mute enable flag is a first value (for example, 1), it indicates that the mute flag of each channel needs to be further detected. When the mute enable flag is the second value (for example, 0), it means that each channel is a non-mute channel.
• the decoder parses the mute enable flag from the code stream. If the mute enable flag is the first value, the mute flag is further parsed from the code stream.
  • the mute enable flag includes: a sound bed mute enable flag, and/or an object mute enable flag
  • the decoder parses the sound bed mute enable flag and/or object mute enable flag from the code stream, and mute flags for each channel.
• the decoder parses the sound bed mute enable flag and/or the object mute enable flag from the code stream, and parses the mute flags of some channels from the code stream. There is no limit on which specific part of the channels the mute flags are obtained for.
• after the decoding end obtains the encoding information of each transmission channel from the code stream, it can decode the encoding information of each transmission channel.
• the decoding and inverse quantization process is the inverse of the quantization encoding process at the encoding end, so that the decoded signal of each transmission channel can be obtained.
  • step 502 decodes the encoded information of each transmission channel, including:
  • I3. Decode the coded information of each transmission channel according to the number of coded bits of each channel.
  • the code stream can also include multi-channel side information.
  • the decoding end can allocate bits to each transmission channel based on the multi-channel side information and the mute flag information to obtain the number of encoding bits for each channel.
• the number of encoding bits obtained by the decoding end is the same as the number of encoding bits determined at the encoding end; the encoding information of each transmission channel is then decoded according to the number of encoding bits of each transmission channel, thereby realizing the decoding of the transmission channel signal of each transmission channel.
  • the multi-channel side information includes: channel bit allocation ratio field,
• the channel bit allocation ratio field is used to indicate the bit allocation ratio of the non-low frequency effects (Low Frequency Effects, LFE) channels in each channel.
• the low frequency effects LFE channel is an audio channel carrying bass content in the range of 3 Hz to 120 Hz, which can be sent to speakers specially designed for bass sounds.
  • the channel bit allocation ratio field occupies 6 bits. In the embodiment of the present application, the number of bits occupied by the channel bit allocation ratio field is not limited.
  • the channel bit allocation ratio field is expressed as chBitRatios, which occupies 6 bits and is used to indicate the bit allocation ratio of non-LFE channels in each channel.
  • the channel bit allocation ratio field can indicate the bit allocation ratio of each channel, thereby determining the number of bits obtained by each channel. Without limitation, the number of bits can be further converted into a number of bytes.
  • the multi-channel side information includes at least one of the following: inter-channel amplitude difference parameter quantization codebook index, number of channel group pairs, and channel pair index;
  • the inter-channel amplitude difference parameter quantization codebook index is used to indicate the codebook index of the inter-channel amplitude difference ILD parameter quantization of each channel;
  • the number of channel group pairs used to represent the number of channel group pairs in the current frame of the multi-channel signal
  • Channel pair index used to represent the index of the channel pair.
  • the number of bits occupied by the inter-channel amplitude difference parameter quantization codebook index is not limited in the embodiment of the present application.
  • the inter-channel amplitude difference parameter quantization codebook index occupies 5 bits.
  • the inter-channel amplitude difference parameter quantization codebook index can be expressed as mcIld[ch1], mcIld[ch2], which occupies 5 bits.
  • the number of bits occupied by the number of channel group pairs is not limited.
  • the number of channel group pairs occupies 4 bits
  • the number of channel group pairs is expressed as pairCnt, which occupies 4 bits and is used to represent the number of channel group pairs in the current frame.
  • the number of bits occupied by the channel pair index is not limited.
  • the channel pair index is expressed as channelPairIndex.
  • the number of channelPairIndex bits is related to the total number of channels. It is used to represent the index of the channel pair.
• the index values of the two channels in the current channel pair can be parsed, namely ch1 and ch2.
  • step I2 allocates bits to each transmission channel based on multi-channel side information and mute flag information, including:
• I21. Determine the first remaining number of bits based on the number of available bits and the number of safety bits.
• the number of safety bits is expressed as safeBits; one safety byte corresponds to 8 bits.
  • the first remaining number of bits can be obtained by subtracting the number of safe bits from the number of available bits.
  • I22 Allocate the first remaining number of bits to each channel according to the channel bit allocation ratio field in the multi-channel side information.
  • the channel bit allocation ratio field is used to indicate the bit allocation ratio of each channel;
  • the second number of remaining bits is allocated to each channel according to the channel bit allocation ratio field
  • the second number of remaining bits can be obtained by subtracting the number of bits allocated to each channel from the first number of remaining bits.
• the third remaining number of bits is allocated to the channel that received the largest number of bits when the first remaining number of bits was allocated;
  • the third remaining bit number can be obtained by subtracting the number of bits allocated to each channel from the second remaining bit number.
  • the first channel can be any one of the various channels.
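• The allocation steps above (first, second, and third remaining numbers of bits) can be sketched as follows, assuming the channel bit allocation ratios sum to a fixed scale ratioScale; all names are illustrative:

```c
/* Illustrative sketch of the decoder-side bit allocation described in
 * steps I21/I22 and the remainder handling above. */
void allocate_channel_bits(int availableBits, int safeBits,
                           int numCh, const int ratios[], int ratioScale,
                           int bits[]) {
    int first = availableBits - safeBits;     /* first remaining number of bits */
    int used = 0, maxCh = 0;
    for (int i = 0; i < numCh; i++) {         /* allocate per the ratio field */
        bits[i] = first * ratios[i] / ratioScale;
        used += bits[i];
        if (bits[i] > bits[maxCh]) maxCh = i; /* largest first-pass share */
    }
    int second = first - used;                /* second remaining number of bits */
    used = 0;
    for (int i = 0; i < numCh; i++) {         /* allocate the second remainder */
        int extra = second * ratios[i] / ratioScale;
        bits[i] += extra;
        used += extra;
    }
    int third = second - used;                /* third remaining number of bits */
    bits[maxCh] += third;                     /* to the largest first-pass share */
}
```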
• after decoding, the decoding end obtains the decoded signal of each transmission channel, and then further processes the decoded signal of each transmission channel to obtain a decoded output signal.
  • the decoding method of the multi-channel signal performed by the decoding end also includes:
• post-processing of the multi-channel decoded output signal includes at least one of the following: frequency band extension decoding, inverse time domain noise shaping, inverse frequency domain noise shaping, and inverse time-frequency transformation.
• the above post-processing of the output signal is the inverse of the pre-processing at the encoding end, and the specific processing method is not limited.
  • the decoding end can obtain the silence mark information from the code stream of the encoding end, so that the decoding end can perform decoding processing in a manner consistent with that of the encoding end, such as bit allocation.
• Products applying the multi-channel audio encoder include mobile phone terminals, chips, and wireless network devices.
  • Embodiment 1 The encoding end of Embodiment 1 is shown in Figure 6 and includes a silence mark detection unit, a multi-channel encoding processing unit, a multi-channel quantization encoding unit, and a code stream multiplexing interface.
  • the silent mark detection unit is mainly used to detect the silent mark information based on the input signal and determine the silent mark information.
  • the mute flag information may include a mute enable flag and/or a mute flag.
  • the mute enable flag is denoted as HasSilFlag.
  • the mute enable flag may be a global mute enable flag or a partial mute enable flag.
  • an object mute enable flag that only acts on an object signal in a multi-channel signal is denoted as objMuteEna.
• the sound bed mute enable flag that only acts on the sound bed signal in the multi-channel signal is recorded as bedMuteEna.
  • the global mute enable flag is a mute enable flag that acts on the multi-channel signal.
  • the global mute enable flag is a mute enable flag that acts on the sound bed signal.
  • the global mute enable flag is the mute enable flag acting on the object signal
• the global mute enable flag is a mute enable flag that acts on both the acoustic bed signal and the object signal.
  • the partial mute enable flag is a mute enable flag that acts on some channels in the multi-channel signal, and some channels are preset.
• the partial mute enable flag is an object mute enable flag that acts on the object signal, or the partial mute enable flag is a sound bed mute enable flag that acts on the sound bed signal, or the partial mute enable flag is a mute enable flag that acts on the channel signals in the multi-channel signal other than the LFE channel signal.
  • the partial mute enable flag is a mute enable flag that acts on the channel signals participating in the group pair in the multi-channel signal. In the embodiments of the present application, the specific method of group pair processing of multi-channel signals is not limited.
  • the mute enable flag is used to indicate whether mute detection is enabled. For example, when the mute enable flag is a first value (for example, 1), it means that the mute detection function is turned on and the mute flag of each channel is further detected. When the mute enable flag is the second value (for example, 0), it means that the mute detection function is turned off.
  • the mute enable flag can also be used to indicate whether further transmission of the mute flag for each channel is required. For example, when the mute enable flag is a first value (for example, 1), it indicates that the mute flag of each channel needs to be further transmitted. When the mute enable flag is the second value (for example, 0), it indicates that there is no need to further transmit the mute flag of each channel.
  • the mute enable flag can also be used to indicate whether each channel is a non-mute channel. For example, when the mute enable flag is a first value (for example, 1), it indicates that the mute flag of each channel needs to be further detected. When the mute enable flag is the second value (for example, 0), it indicates that each channel is a non-mute channel.
  • the global mute enable flag acts on all channels, and the partial mute enable flag acts on some channels.
  • the object mute enable flag is applied to the channel corresponding to the object signal in the multi-channel signal
  • the sound bed mute enable flag is applied to the channel corresponding to the sound bed signal in the multi-channel signal.
  • the mute enable flag can be controlled by external input, can be preset based on encoder parameters such as encoding rate and encoding bandwidth, and can also be determined based on the mute detection results of each channel.
  • the mute flag of each channel is used to indicate whether each channel is a mute frame.
  • the channel numbers of the acoustic bed signal are from 0 to 9, and the channel numbers of the object signal are from 10 to 13.
• a silent channel is a channel whose signal energy, decibel value, or loudness is lower than the hearing threshold; it is a channel that does not need to be encoded or a channel that only needs to be encoded according to lower bits.
• when the value of the mute flag is the first value (for example, 1), it indicates that the channel is a mute channel; when the value of the mute flag is the second value (for example, 0), it indicates that the channel is a non-mute channel.
• when the value of the mute flag is the first value (for example, 1), the channel is not encoded or is encoded according to lower bits.
  • the input signal for mute mark detection can be the original input signal or a preprocessed signal. Preprocessing may include but is not limited to: transient detection, window type judgment, time-frequency transformation, frequency domain noise shaping, time domain noise shaping, frequency band extension coding and other processing.
  • the input signal can be a time domain signal or a frequency domain signal. Taking the input signal as the time domain signal of each channel in a multi-channel signal as an example, a method of detecting the mute flag of each channel can be:
  • the energy of each channel signal of the current frame is determined.
• the energy energy(ch) of the ch-th channel of the current frame is the sum of the squares of that channel's input samples over the current frame, i.e. energy(ch) = sum over i of orig_ch(i)^2, where i runs over the samples of the frame;
• orig_ch is the input signal of the ch-th channel of the current frame;
• energy(ch) is the energy of the ch-th channel of the current frame.
  • the silence detection parameters of each channel of the current frame are determined.
  • the silence detection parameters of each channel of the current frame are used to characterize the energy value, power value, decibel value or loudness value of each channel signal of the current frame.
  • the silence detection parameter of each channel of the current frame may be the value in the log domain of the energy of each channel signal of the current frame, such as log2(energy(ch)) or log10(energy(ch)). According to the energy of each channel signal of the current frame, calculate the silence detection parameters of each channel of the current frame.
  • the silence detection parameters of each channel of the current frame meet the following conditions:
• energyDB[ch] = 10 * log10(energy[ch] / Bit_Depth / Bit_Depth);
  • energyDB[ch] is the silence detection parameter of the ch-th channel of the current frame
  • energy(ch) is the energy of the ch-th channel of the current frame
• Bit_Depth is the full-scale value corresponding to the sampling bit width.
• for example, the sampling bit depth is 16 bits.
  • the silence flag of each channel of the current frame is determined.
• compare the silence detection parameter of each channel of the current frame with the silence detection threshold: if the silence detection parameter of the ch-th channel of the current frame is less than the silence detection threshold, the ch-th channel of the current frame is a silent frame, that is, the ch-th channel is silent at the current moment,
• and the mute flag silFlag[i] of the ch-th channel of the current frame is the first value (for example, 1).
• otherwise, the ch-th channel of the current frame is a non-silent frame, that is, the ch-th channel at the current moment is a non-silent channel, and the mute flag silFlag[i] of the ch-th channel of the current frame is the second value (for example, 0).
  • the pseudo code for determining the silence flag of the ch channel of the current frame is as follows:
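• A possible form of this pseudo code, reconstructed from the preceding definitions (frame energy, the log-domain parameter energyDB, and a silence detection threshold here named silThreshold), is:

```c
#include <math.h>

/* Reconstructed sketch of the silence-flag decision for channel ch of the
 * current frame. frameLen, Bit_Depth and silThreshold are assumed inputs. */
int detect_sil_flag(const float *orig_ch, int frameLen,
                    double Bit_Depth, double silThreshold) {
    double energy = 0.0;
    for (int i = 0; i < frameLen; i++) {
        energy += (double)orig_ch[i] * (double)orig_ch[i]; /* frame energy */
    }
    if (energy <= 0.0) {
        return 1; /* an all-zero frame is treated as silent */
    }
    /* silence detection parameter in the log domain (see the formula above) */
    double energyDB = 10.0 * log10(energy / Bit_Depth / Bit_Depth);
    return (energyDB < silThreshold) ? 1 : 0;              /* 1 = silent frame */
}
```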
  • the mute mark information may include the mute enable flag and/or the mute flag.
  • the different mute mark information is composed as follows:
  • the silence flag information is the silence flag silFlag[i] of each channel. Determine the mute flag silFlag[i] of each channel, write the mute flag silFlag[i] of each channel into the code stream, and transmit it to the decoding end.
  • the silence flag information includes the silence enable flag HasSilFlag and the silence flag silFlag[i].
  • the silence enable flag HasSilFlag indicates whether the current frame turns on the silence detection function, and can also be used to indicate whether the current frame transmits the silence detection results of each channel.
  • the mute flag information includes the sound bed mute enable flag bedMuteEna, the object mute enable flag objMuteEna, and the mute flag silFlag[i] of each channel.
  • the acoustic bed mute enable flag bedMuteEna can be used to indicate whether the mute detection function of the corresponding channel of the acoustic bed signal is turned on in the current frame.
  • the object mute enable flag objMuteEna can be used to indicate whether the mute detection function of the corresponding channel of the object signal is turned on in the current frame. For example:
• when the sound bed mute enable flag bedMuteEna is 0 and the object mute enable flag objMuteEna is 1, the mute flag values of the channels corresponding to the sound bed signal are all set to 0 (non-mute channels), and the mute flag value of each channel corresponding to the object signal is its mute detection result.
• when the sound bed mute enable flag bedMuteEna is 1 and the object mute enable flag objMuteEna is 0, the mute flag values of the channels corresponding to the object signal are all set to 0 (non-mute channels), and the mute flag value of each channel corresponding to the sound bed signal is its mute detection result.
• when the mute mark information includes the sound bed mute enable flag bedMuteEna, the object mute enable flag objMuteEna, and the mute flag, the mute flag of each channel can be transmitted.
  • the mute flag information includes the sound bed mute enable flag bedMuteEna, the object mute enable flag objMuteEna, and the mute flag silFlag[i] of some channels.
• in method four, only the mute flags of some channels are transmitted. For example, when the sound bed mute enable flag bedMuteEna is 0 and the object mute enable flag objMuteEna is 1, only the mute flags of the channels corresponding to the object signal are transmitted, and the mute flags of the channels corresponding to the sound bed signal are not transmitted; when the sound bed mute enable flag bedMuteEna is 1 and the object mute enable flag objMuteEna is 0, only the mute flags of the channels corresponding to the sound bed signal are transmitted; when the sound bed mute enable flag bedMuteEna is 0 and the object mute enable flag objMuteEna is 0, there is no need to transmit the mute flag of any channel; when the sound bed mute enable flag bedMuteEna is 1 and the object mute enable flag objMuteEna is 1, the mute flag of each channel is transmitted (see the sketch below).
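• The sketch below illustrates method four on the encoding side, assuming the sound bed channels precede the object channels and a hypothetical bit-stream writer write_bit():

```c
extern void write_bit(int bit); /* assumed bit-stream writer, illustrative */

/* Sketch of method four: which silFlag values are written to the code
 * stream depends on bedMuteEna/objMuteEna. The channel layout (bed
 * channels first, then object channels) is an assumption. */
void write_mute_flags(int bedMuteEna, int objMuteEna,
                      int nBedCh, int nObjCh, const int silFlag[]) {
    if (bedMuteEna == 1) {
        for (int i = 0; i < nBedCh; i++) {
            write_bit(silFlag[i]);            /* sound bed channel flags */
        }
    }
    if (objMuteEna == 1) {
        for (int i = nBedCh; i < nBedCh + nObjCh; i++) {
            write_bit(silFlag[i]);            /* object channel flags */
        }
    }
    /* when both enable flags are 0, no mute flags are transmitted */
}
```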
  • the sound bed mute enable flag bedMuteEna and the object mute enable flag objMuteEna can also be represented by a 2-bit mute enable flag HasSilFlag.
• this is not limited in the embodiments of this application.
• In method six, the mute flag of each channel is determined first, and the mute enable flag is then determined based on the mute flags of each channel.
• the mute enable flag may be a global mute enable flag. If the mute flags of each channel are all 0, the global mute enable flag is set to 0; only the global mute enable flag needs to be written into the code stream and transmitted to the decoding side, and there is no need to transmit the mute flag of each channel. If at least one channel's mute flag is 1, the global mute enable flag is set to 1, and the global mute enable flag together with the mute flag of each channel is written into the code stream and transmitted to the decoding side.
  • the mute enable flag may be the bed mute enable flag bedMuteEna and the object mute enable flag objMuteEna.
• for the sound bed mute enable flag bedMuteEna, if the mute flags of each channel corresponding to the sound bed signal are all 0, the sound bed mute enable flag is set to 0, and only the sound bed mute enable flag needs to be written into the code stream and transmitted to the decoding side; there is no need to transmit the mute flags of the channels corresponding to the sound bed signal. If at least one of the mute flags of the channels corresponding to the sound bed signal is 1, the sound bed mute enable flag is set to 1, as in the sketch below.
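• A minimal sketch of deriving the sound bed mute enable flag from the per-channel mute flags; the object mute enable flag objMuteEna can be derived the same way over the object channels:

```c
/* Method six for the sound bed: the enable flag is the logical OR of the
 * per-channel mute flags. Names are illustrative. */
int derive_bed_mute_enable(int nBedCh, const int silFlag[]) {
    for (int i = 0; i < nBedCh; i++) {
        if (silFlag[i] == 1) {
            return 1;   /* at least one sound bed channel is silent */
        }
    }
    return 0;           /* all sound bed channels are non-silent */
}
```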
  • the multi-channel coding processing unit completes the screening, grouping, downmixing processing and multi-channel side information generation of multi-channel signals, and obtains each transmission channel signal after multi-channel pairing and downmixing.
  • pre-processing may also be included between the silence mark detection process and the multi-channel encoding process, which is used to pre-process the input signal to obtain the pre-processed input as the input of the multi-channel encoding process.
  • Preprocessing may include but is not limited to: transient detection, window type judgment, time-frequency transformation, frequency domain noise shaping, time domain noise shaping, frequency band extension coding and other processes, which are not limited in the embodiments of this application.
  • multi-channel signals are screened based on multi-channel input signals or pre-processed multi-channel signals to obtain filtered multi-channel signals.
  • Perform downmix processing (such as mid-side information (MIDSIDE, MS) processing) on the multi-channel group pair signal to obtain the downmixed signal of the multi-channel group pair to be encoded.
  • the silence mark information can be corrected.
  • the energy of a certain transmission channel signal changes, and the silence detection result of this channel can be adjusted.
  • Multi-channel side information includes but is not limited to: group pair number, group pair channel index list, group pair channel interaural intensity difference ILD coefficient list, group pair channel ILD big and small endian list.
• the initial multi-channel processing method can be adjusted according to the silence mark information. For example, during the screening of multi-channel signals, channels with a mute flag of 1 do not participate in group pair screening.
  • the multi-channel quantization encoding unit performs quantization encoding on the downmixed transmission channel signals of the multi-channel group pair.
  • Multi-channel quantization coding includes bit allocation processing and encoding.
• a specific implementation of multi-channel quantization coding can be to transform the grouped and downmixed signals through a neural network to obtain latent features, quantize the latent features, and perform interval coding.
• alternatively, multi-channel quantization coding may perform quantization coding on the downmixed signal based on vector quantization. The embodiments of the present application do not limit this.
  • bit allocation can be performed based on silence mark information. For example, different bit allocation strategies are selected based on the mute enable flag.
  • the mute enable flag includes the acoustic bed mute enable flag bedMuteEna and the object mute enable flag objMuteEna.
  • Bit allocation is performed based on the mute flag information. The initial bit allocation may be based on the total available bits and the signal characteristics of each channel. The bit allocation result is then adjusted based on the mute mark information. For example, if the object mute enable flag objMuteEna is 1, the bits initially allocated to the channel whose mute flag is 1 in the object signal are allocated to the sound bed signal or other object channels.
• the bits initially allocated to channels in the object signal whose mute flag is 1 can be reallocated to other object channels, and the bits initially allocated to channels in the sound bed signal whose mute flag is 1 can be reallocated to other sound bed channels.
  • the code stream multiplexing interface multiplexes the encoded audio channels to form a serial bit stream bitStream to facilitate transmission in the channel or storage in digital media.
  • the decoding end of this embodiment is shown in Figure 8 and includes a code stream demultiplexing unit, a channel decoding inverse quantization unit, a multi-channel decoding processing unit, and a multi-channel post-processing unit.
  • the code stream demultiplexing unit analyzes the silence flag information from the received code stream and determines the coding information of each channel.
  • the silence mark information is parsed from the received code stream.
  • the parsing process is the reverse process of the encoding end writing the silence mark information into the code stream.
• the decoding end first parses the sound bed mute enable flag bedMuteEna and the object mute enable flag objMuteEna from the code stream; then, according to the sound bed mute enable flag bedMuteEna and the object mute enable flag objMuteEna, it parses the mute flags of the corresponding channels from the code stream. For example: when the sound bed mute enable flag bedMuteEna is 0 and the object mute enable flag objMuteEna is 1, the mute flags of the channels corresponding to the object signal are parsed from the code stream; when the sound bed mute enable flag bedMuteEna is 1 and the object mute enable flag objMuteEna is 0, the mute flags of the channels corresponding to the sound bed signal are parsed from the code stream; when the sound bed mute enable flag bedMuteEna is 0 and the object mute enable flag objMuteEna is 0, there is no need to parse any mute flag from the code stream; when the sound bed mute enable flag bedMuteEna is 1 and the object mute enable flag objMuteEna is 1, the mute flag of each channel is parsed from the code stream, and the number of parsed channels is the sum of the number of channels corresponding to the sound bed signal and the number of channels corresponding to the object signal.
  • the specific syntax for the decoder to parse the silence mark information from the code stream is as follows:
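• A hedged sketch of this parsing order, using a hypothetical bit-stream reader read_bit() and assuming the sound bed channels precede the object channels:

```c
extern int read_bit(void); /* assumed bit-stream reader, illustrative */

/* Sketch of the parsing order described above: the two enable flags are
 * read first, then only the mute flags they enable are read. */
void parse_mute_info(int nBedCh, int nObjCh, int silFlag[]) {
    int bedMuteEna = read_bit();
    int objMuteEna = read_bit();
    if (bedMuteEna == 1) {
        for (int i = 0; i < nBedCh; i++) {
            silFlag[i] = read_bit();          /* sound bed channel flags */
        }
    }
    if (objMuteEna == 1) {
        for (int i = nBedCh; i < nBedCh + nObjCh; i++) {
            silFlag[i] = read_bit();          /* object channel flags */
        }
    }
    /* when both enable flags are 0, no mute flags are present in the stream */
}
```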
  • Bit allocation is performed based on multi-channel side information to determine the number of encoding bits for each channel.
  • the decoding side also needs to perform bit allocation based on the silence flag information to determine the number of encoding bits for each channel.
  • the coding information of each channel is determined from the received code stream.
  • the decoding unit performs inverse encoding and inverse quantization on each encoded channel to obtain a downmixed decoded signal of the multi-channel group pair.
  • Inverse encoding and inverse quantization are the inverse processes of multi-channel quantization encoding at the encoding end.
• the multi-channel decoding processing unit performs multi-channel decoding processing on the downmixed decoded signal of the multi-channel group pair to obtain a multi-channel output signal.
  • the multi-channel decoding process is the reverse process of the multi-channel encoding process.
  • Multi-channel side information is used to reconstruct the multi-channel output signal based on the downmixed decoded signal of the multi-channel group pair.
  • the multi-channel decoding process at the decoder will also include corresponding post-processing, such as: band extension decoding, inverse time domain noise shaping, inverse frequency domain noise shaping, inverse time-frequency transformation, etc. to obtain the final output signal.
  • detecting silence mark information on multi-channel input signals, determining the silence mark information, and performing subsequent coding processing, such as bit allocation, based on the silence mark information can improve coding efficiency.
  • This embodiment of the present application proposes a method for generating a mute identification bit stream based on input signal characteristics.
  • the encoding end detects the silence mark information of the multi-channel input signal to determine the silence mark information; transmits the silence mark information to the decoding end; allocates bits according to the silence mark information to encode the multi-channel signal.
  • the decoding end parses the silence mark information from the code stream; allocates bits according to the silence mark information and decodes the multi-channel signal.
• a silence flag bit is calculated for each input signal and is used to guide bit allocation for encoding and decoding. Determine whether the input signal is a silent frame; if it is a silent frame, the channel is not encoded or is encoded with a small number of bits. Calculate the decibel value or loudness value of the signal at the input end and compare it with the set hearing threshold: if the value is lower than the hearing threshold, the mute flag is set to 1, otherwise the mute flag is set to 0.
• when the mute flag is 1, the channel is not encoded or is encoded according to lower bits; the pre-quantization data of a channel whose mute flag is 1 can be cleared to 0; the mute flag is transmitted to the decoder as side information to guide the decoding end's bit demultiplexing and decoding.
• a 10-bit mute flag is transmitted in the multi-channel side information, 1 bit for each channel, in an order consistent with the order of the input channels; other modules on the encoding end can modify the mute flag, for example changing a mute flag from 1 to 0, before it is transmitted in the code stream.
• the embodiments of the present application have the following advantages: the mute mark information of the multi-channel input signal is detected and determined, and subsequent encoding processing, such as bit allocation, is performed based on the mute mark information.
• a mute channel may not be encoded or may be encoded according to relatively low bits, which saves encoding bits and improves encoding efficiency.
  • the hybrid coding improvement scheme is described as follows:
  • a mixed-mode codec supports codecs for both acoustic bed and object signals.
  • the specific implementation plan is divided into three parts:
• Hybrid coding bit pre-allocation: according to the multi-channel side information bedBitsRatio, the pre-allocated byte count bedAvailbleBytes of the sound bed signal and the pre-allocated byte count objAvailbleBytes of the object signal are obtained.
• Hybrid coding bit allocation: divided into four steps, in processing order: silent frame bit allocation, non-silent frame bit allocation adaptation, non-silent frame bit allocation, and non-silent frame bit allocation adaptation restoration.
• Silent frame bit allocation: if there is a silent frame, allocate bits to the silent frame channels according to the side information silence flag silFlag[i] and the mix allocation strategy mixAllocStrategy, and update the pre-allocated byte count bedAvailbleBytes of the sound bed signal and the pre-allocated byte count objAvailbleBytes of the object signal.
• Non-silent frame bit allocation adaptation: sequential mapping of channel parameters to facilitate the non-silent frame bit allocation processing.
• Non-silent frame bit allocation: allocate bits according to the updated pre-allocated byte count bedAvailbleBytes of the sound bed signal, the updated pre-allocated byte count objAvailbleBytes of the object signal, and the channel bit allocation scaling factor chBitRatios.
• Non-silent frame bit allocation adaptation restoration: sequential inverse mapping of the channel parameters, used to facilitate the subsequent interval decoding, inverse quantization, and neural network inverse transformation steps.
• Mixed coding upmixing: perform M/S upmixing based on the two paired channels ch1 and ch2 indicated by the channel pair index channelPairIndex to obtain the upmixed channel signals.
  • the multi-channel stereo side information syntax is shown in Table 1 below, which is the DecodeMcSideBits() syntax.
  • bedBitsRatio occupies 4 bits and represents the scale factor index of the acoustic bed signal in the total number of bits. The value is 0-15.
  • the corresponding floating point ratio is as follows:
  • mixAllocStrategy occupies 2 bits and represents the distribution strategy of the mixed signal of the acoustic bed signal and the object signal.
  • the hybrid allocation strategy may be predetermined, or the hybrid allocation strategy may be predefined according to coding parameters, which include: coding rate and signal characteristic parameters. Encoding parameters are predetermined. The value range and meaning of the allocation strategy are as follows:
• HasSilFlag occupies 1 bit; 0 means silent frame processing is turned off or there is no silent frame; 1 means silent frame processing is turned on and silent frames are present.
  • silFlag[i] occupies 1 bit and represents the silent frame mark of the corresponding channel. 0 represents a non-silent frame and 1 represents a silent frame.
• soundBedType occupies 1 bit and indicates the type of sound bed: 0 means there are only object signals or none (only objs); 1 means there is a sound bed signal or HOA signal (mc or hoa).
• codingProfile occupies 3 bits: 0 indicates a mono, stereo, or sound bed signal (mono/stereo/mc); 1 indicates a mixed sound bed and object signal (channel + obj mix); 2 indicates an HOA signal.
  • pairCnt occupies 4 bits and is used to represent the number of channel pair pairs in the current frame.
  • the number of channelPairIndex bits is related to the total number of channels, see Note 1 in the above table. Used to represent the index of a channel pair.
  • the index values of the two channels in the current channel pair can be parsed, namely ch1 and ch2.
• mcIld[ch1], mcIld[ch2] occupy 4 bits, indicating the inter-channel amplitude difference parameter of each channel in the current channel pair, which is used to restore the amplitude of the decoded spectrum.
  • scaleFlag[ch1], scaleFlag[ch2] occupy 1 bit, indicating the scaling flag parameter of each channel in the current channel pair, indicating whether the amplitude of the current channel is reduced or enlarged.
  • chBitRatios occupies 4 bits and represents the bit allocation ratio of each channel.
  • the decoding process is as follows. First, hybrid coding bits are pre-allocated.
• the function of the hybrid coding bit pre-allocation module is to calculate the remaining available bits after removing other side information, obtaining the pre-allocated byte count of the sound bed and the pre-allocated byte count of the object for use by subsequent modules.
  • the number of available bytes remaining in the current frame after deducting other side information is recorded as availableBytes, where the number of bytes pre-allocated by the sound bed is bedAvailbleBytes, and the number of bytes pre-allocated by the object is objAvailbleBytes.
  • the scale factor index parameter of the acoustic bed signal accounting for the total number of bits is bedBitsRatio.
  • the floating point scale factor corresponding to bedBitsRatio is bedBitsRatioFloat.
  • the corresponding relationship between bedBitsRatio and bedBitsRatioFloat is shown in the bedBitsRatio part of the aforementioned semantics.
  • the formula for calculating the number of pre-allocated bytes of the sound bed bedAvailbleBytes and the number of pre-allocated bytes of objects objAvailbleBytes based on the number of available bytes availableBytes and the floating-point scale factor bedBitsRatioFloat of the total number of bits of the sound bed signal is as follows:
• bedAvailbleBytes = floor(availableBytes * bedBitsRatioFloat);
• objAvailbleBytes = availableBytes - bedAvailbleBytes.
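• The two pre-allocation formulas above translate directly into code; the following sketch assumes bedBitsRatioFloat has already been looked up from bedBitsRatio (the lookup table itself is not reproduced here):

```c
#include <math.h>

/* Direct transcription of the pre-allocation formulas above. */
void preallocate_bytes(int availableBytes, double bedBitsRatioFloat,
                       int *bedAvailbleBytes, int *objAvailbleBytes) {
    *bedAvailbleBytes = (int)floor(availableBytes * bedBitsRatioFloat);
    *objAvailbleBytes = availableBytes - *bedAvailbleBytes;
}
```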
  • the hybrid coding bit allocation process is as follows.
• the hybrid coding bit allocation is completed based on the bit allocation parameters in the bit stream, the number of available bytes, and other parameters.
• the available bits are allocated to each downmix channel in the hybrid coding multi-channel stereo, enabling the subsequent interval decoding, inverse quantization, and neural network inverse transformation steps.
  • the hybrid coding bit allocation consists of the following parts:
• Bit allocation for silent frame channels: the function of this module is to complete the bit allocation of silent frames of the mixed signal based on the allocation strategy parameter mixAllocStrategy of the mixed signal of the sound bed signal and the object signal decoded from the bit stream, together with the mute enable flag HasSilFlag and the silence flag silFlag decoded from the bit stream.
  • Step 1 Hybrid coding silent frame bit allocation processing.
• the hybrid coding silent frame bit allocation processing sub-module uses the silence-frame-related flag parameters decoded from the bit stream, HasSilFlag and silFlag, to complete the bit allocation of hybrid coding silent frames. There are the following cases and corresponding treatments:
  • Case 1 When HasSilFlag is parsed to 0, it means that the silent frame processing mode is not enabled in the current frame or there is no silent frame in the current frame, and the hybrid coding silent frame bit allocation processing sub-module does not perform other operations.
  • Case 2 When HasSilFlag is parsed to 1, it means that silent frame processing is enabled in the current frame and there is a silent frame. At this time, the silFlag[i] of all channels are traversed.
• when silFlag[i] is 1, the channel byte count channelBytes[i] is set to the minimum number of safety bytes safetyBytes. The value of the minimum number of safety bytes safetyBytes is related to the requirements of the quantization and interval coding modules on the number of input bytes; for example, it can be set to 10 bytes here.
• for each such silent channel of the sound bed signal, the pre-allocated byte count is updated as bedAvailbleBytes = bedAvailbleBytes - safetyBytes.
  • Step 2 Silence frame remaining bit allocation strategy.
• the function of the silent frame remaining bit allocation strategy sub-module is to determine, when there is a silent frame, how to allocate the remaining bits freed by silent frames to the sound bed signal or the object signal, according to the allocation strategy parameter mixAllocStrategy of the mixed signal of the sound bed signal and the object signal decoded from the bit stream; the specific allocation strategy is determined by the value of mixAllocStrategy. For details on the meaning of the values of mixAllocStrategy, see the mixAllocStrategy section.
  • the embodiment of this application supports two different strategies for allocating remaining bits of silence frames. First do the precomputation:
  • the average number of bytes allocated to the object channel objAvgBytes is calculated.
  • the calculation formula is as follows:
• objAvgBytes[i] = floor(objAvailbleBytes / objNum);
• bedAvailbleBytes = bedAvailbleBytes + objSilLeftBytes;
• objAvailbleBytes = objAvailbleBytes - objSilLeftBytes.
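• A hedged sketch of the silent frame allocation and the remaining-bit strategy follows. How objSilLeftBytes is accumulated and which numeric value of mixAllocStrategy selects which option are assumptions for illustration only:

```c
/* Illustrative sketch: silent channels are pinned to safetyBytes, and the
 * bytes freed by silent object channels may be moved to the sound bed
 * pool depending on mixAllocStrategy. All specifics marked below are
 * assumptions, not the normative behavior. */
void allocate_silent_frames(int numCh, int nBedCh,
                            const int silFlag[], int channelBytes[],
                            int safetyBytes, int objAvgBytes,
                            int mixAllocStrategy,
                            int *bedAvailbleBytes, int *objAvailbleBytes) {
    int objSilLeftBytes = 0;
    for (int i = 0; i < numCh; i++) {
        if (silFlag[i] != 1) continue;
        channelBytes[i] = safetyBytes;          /* silent channel minimum */
        if (i < nBedCh) {
            *bedAvailbleBytes -= safetyBytes;   /* deducted from the bed pool */
        } else {
            *objAvailbleBytes -= safetyBytes;   /* deducted from the object pool */
            /* assumed: bytes freed relative to the average object share */
            objSilLeftBytes += objAvgBytes - safetyBytes;
        }
    }
    if (mixAllocStrategy == 0) {                /* strategy value assumed */
        *bedAvailbleBytes += objSilLeftBytes;   /* move freed bytes to the bed */
        *objAvailbleBytes -= objSilLeftBytes;
    }
    /* otherwise the freed bytes remain with the object signal */
}
```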
• Non-silent frame bit allocation pre-adaptation: map the input parameters of the non-silent frame channel bit allocation into a continuous arrangement of channels (the existence of silent frame channels causes the non-silent frame channels to be physically arranged discontinuously), which facilitates the subsequent module's bit allocation processing for non-silent frame channels.
  • Bit allocation for non-silent frame channels adopts the bit allocation general module. Its function is to allocate the available bits to the sound bed based on the updated pre-allocated number of bytes bedAvailbleBytes and the channel bit allocation ratio. Sound Bed Object Each downmix channel in a multichannel stereo.
  • The number of available bytes input to the module is recorded as availableBytes.
  • In the first step, bits are allocated to each channel according to chBitRatios.
  • The number of bytes of each channel can be expressed as:
  • channelBytes[i] = availableBytes * chBitRatios[i] / (1 << 4),
  • where (1 << 4) represents the maximum value range of the channel bit allocation ratio chBitRatios.
  • Step 3: If there are still bits left after the end of step 2, the remaining bits are allocated to the channel that received the most bytes in step 1.
  • Step 4: If the number of bytes allocated to some channels exceeds the upper limit for a single channel, the excess is allocated to the remaining channels.
  • The bit allocation process for the non-silent frame channels of the object signal also uses the general bit allocation module. Its function is to allocate the available bits to each downmix channel in the acoustic bed-object multi-channel stereo based on the updated number of available bytes of the object, objAvailbleBytes, and the channel bit allocation ratio.
  • The bit allocation process for the object's non-silent frame channels is therefore performed in the same way as the bit allocation process for the non-silent frame channels of the acoustic bed signal.
  • Non-silent frame channel adaptation restoration: the byte number parameters output by the non-silent frame channel bit allocation are inversely mapped back into the physical arrangement according to the aforementioned rules (the presence of silent frame channels causes the non-silent frame channels to be physically arranged discontiguously), which facilitates the subsequent interval decoding, inverse quantization, and neural network inverse transform steps.
  • The modified discrete cosine transform (MDCT) spectrum of the upmixed channels needs to be processed by inverse interaural level difference (ILD) processing, where:
  • factor is the amplitude adjustment factor corresponding to the ILD parameter of the i-th channel,
  • (1 << 4) is the maximum quantization value range of mcIld, and
  • mdctSpectrum[i] represents the MDCT coefficient vector of the i-th channel.
  • When the multi-channel signal is a mixed signal including the acoustic bed signal and the object signal, and the multi-channel signal contains a silent frame, different allocation strategies for the mixed signal are available.
  • The allocation strategy mixAllocStrategy distributes the bits saved by silent frames to other non-silent frames to improve coding efficiency.
  • The improvements of the embodiments of this application are as follows: determine the pre-allocated number of bits of the acoustic bed, bedAvailbleBytes, and the total pre-allocated number of bits of the object, objAvailbleBytes; determine whether the acoustic bed and the object include silent frames; if a silent frame exists, allocate bits to the silent frame channels based on the side information silFlag[i] and mixAllocStrategy, and update bedAvailbleBytes and objAvailbleBytes.
  • The embodiment of the present application proposes a method of signaling the bit allocation mode in the bit stream in the acoustic bed-object mixing mode: parse the allocation strategy mixAllocStrategy of the mixed signal (including the acoustic bed signal and the object signal) from the bit stream, and allocate bits to the silent frame channels according to that allocation strategy.
  • Whether the remaining bits freed by a silent frame are allocated to the acoustic bed signal or to the object signal is determined according to the parsed allocation strategy parameter mixAllocStrategy.
  • mixAllocStrategy occupies 2 bits and indicates the allocation strategy of the mixed signal including the acoustic bed signal and the object signal.
  • Its value range and meaning are as follows:
  • When the extra bits generated by the mute mechanism belong to the acoustic bed signal, the extra bits are allocated to other acoustic bed channels.
  • When the extra bits belong to the object signal, the extra bits are allocated to other object channels.
  • These are the specific remaining bit allocation methods corresponding to the two different silent frame remaining bit allocation strategies.
  • When the multi-channel signal is a mixed signal including the acoustic bed signal and the object signal, if the object signal is treated as an acoustic bed signal and bits are allocated together according to a unified bit allocation strategy, the acoustic bed signal and the object signal affect each other and their quality degrades together.
  • The embodiment of this application proposes a method of signaling the bit allocation in the bit stream in the acoustic bed-object mixing mode. Specifically:
  • The bit allocation proportion factor is obtained by decoding the bit stream. The bit allocation proportion factor represents the relationship between the number of coding bits of the acoustic bed signal and/or the object channel signal and the total number of available bits.
  • According to the bit allocation proportion factor, the pre-allocated number of bits of the acoustic bed signal, bedAvailbleBytes, and the pre-allocated number of bits of the object signal, objAvailbleBytes, are determined.
  • The bit allocation proportion factor may be the proportion of the number of coding bits of the acoustic bed signal to the total number of available bits (bedBitsRatioFloat in the embodiment), or the proportion of the number of coding bits of the object signal to the total number of available bits. In this embodiment, the bit allocation proportion factor is the proportion of the number of coding bits of the acoustic bed signal to the total number of available bits.
  • The specific method of determining the bit allocation proportion factor is: parse the bit allocation proportion factor index (such as bedBitsRatio in the embodiment) from the bit stream, and determine the bit allocation proportion factor (such as bedBitsRatioFloat in the embodiment) according to that index.
  • The bit allocation proportion factor index may be a coding index obtained by uniform quantization coding of the bit allocation proportion factor, or a coding index obtained by non-uniform quantization coding of the factor.
  • The bit allocation proportion factor index and the bit allocation proportion factor may have a linear relationship or a non-linear relationship.
  • The numbers of pre-allocated bytes of the acoustic bed, bedAvailbleBytes, and of the object, objAvailbleBytes, are calculated from the number of available bytes availableBytes and the floating-point proportion factor bedBitsRatioFloat (the share of the acoustic bed in the total number of bits).
  • The formulas are as follows:
  • bedAvailbleBytes = floor(availableBytes * bedBitsRatioFloat);
  • objAvailbleBytes = availableBytes - bedAvailbleBytes.
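A short C sketch of the split above; the function wrapper and variable types are assumptions for illustration. For example, with availableBytes = 1000 and bedBitsRatioFloat = 0.75, the bed receives 750 bytes and the objects the remaining 250.

#include <math.h>

static void splitBudget(int availableBytes, float bedBitsRatioFloat,
                        int *bedAvailbleBytes, int *objAvailbleBytes)
{
    *bedAvailbleBytes = (int)floorf(availableBytes * bedBitsRatioFloat);
    *objAvailbleBytes = availableBytes - *bedAvailbleBytes;  /* remainder to objects */
}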
  • The silence flag information (including HasSilFlag and silFlag[i]) is parsed from the bit stream; bit allocation is performed based on the pre-allocated number of bits of the acoustic bed signal, bedAvailbleBytes, the pre-allocated number of bits of the object signal, objAvailbleBytes, and the silence flag information; and the number of bits allocated to each channel is determined.
  • The steps of hybrid coding bit allocation are: determine whether a silent frame exists based on the silence flag information; if a silent frame exists, allocate bits to the silent frame channels based on the side information silFlag[i] (and mixAllocStrategy), and update bedAvailbleBytes and objAvailbleBytes; then allocate bits to the non-silent frame channels according to the non-silent frame bit allocation principle (comprising the three steps of non-silent frame bit allocation adaptation, non-silent frame bit allocation, and non-silent frame bit allocation adaptation restoration).
  • The encoding end determines the bit allocation proportion factor.
  • The bit allocation proportion factor index and the bit allocation proportion factor may have a linear relationship or a non-linear relationship.
  • The proportion factor may be predefined according to the coding parameters.
  • Coding parameters include the coding rate and signal characteristic parameters, and may be predetermined.
  • The coding parameters may also be adaptively determined based on the characteristics of each frame signal, such as the type of the signal.
  • The encoding end determines the hybrid allocation strategy and carries the hybrid allocation strategy in the bit stream.
  • The encoding end sends the bit stream to the decoding end.
  • The allocation strategy of the acoustic bed-object mixed signal can also include other modes, such as:
  • Mode 1: The object mute enable flag is 1, and the excess bits caused by mute channels in the object signal are allocated to other non-mute channels among the object channels;
  • Mode 2: The object mute enable flag is 1, and the excess bits caused by mute channels in the object signal are allocated to the channels of the acoustic bed signal;
  • Mode 3: The acoustic bed mute enable flag is 1, and the excess bits caused by mute channels in the acoustic bed signal are allocated to other non-mute channels among the acoustic bed channels;
  • Mode 4: The acoustic bed mute enable flag is 1, and the excess bits caused by mute channels in the acoustic bed signal are allocated to the channels of the object signal;
  • Mode 5: The acoustic bed mute enable flag and the object mute enable flag are both 1, and the excess bits caused by mute channels in the object signal are allocated to other non-mute channels among the object channels;
  • Mode 6: The acoustic bed mute enable flag and the object mute enable flag are both 1, and the excess bits caused by mute channels in the object signal are allocated to other non-mute channels among the acoustic bed channels.
  • The mixed signal coding improvement scheme is as follows:
  • The mixed-signal encoding mode in the AVS3P3 standard supports the encoding and decoding of acoustic bed signals and object signals.
  • This proposal proposes an efficient coding method for mixed signals, which improves the coding quality of mixed signals through reasonable bit allocation between silent frames and non-silent frames in the acoustic bed signals and object signals.
  • The bit allocation strategy for mixed signals is implemented at the encoding end; the decoding end does not distinguish between acoustic beds and objects in the bit allocation process.
  • Specific implementation points include:
  • The mute enable flag is denoted HasSilFlag, and the mute flag of the i-th channel is denoted silFlag[i].
  • The mute enable flag is the mute enable flag applied to the channel signals of the multi-channel signal other than the LFE channel signal.
  • HasSilFlag indicates whether there are silent frames in the channels other than the LFE channel. For each channel other than the LFE channel, the corresponding silFlag indicates whether that channel is a silent frame.
  • The scope of chBitRatios[i] is changed from non-LFE channels to non-LFE, non-silent channels, and the width of chBitRatios[i] is changed from 4 bits to 6 bits;
  • The ILD side information is changed from a 4-bit inter-channel amplitude difference parameter plus a 1-bit scaling flag parameter to a 5-bit scaling factor codebook index.
  • The multi-channel stereo decoding syntax is shown in Table 2 below, the Avs3McDec() syntax.
  • The multi-channel stereo side information syntax is shown in Table 3, the DecodeMcSideBits() syntax.
  • coupleChNum is the number of channels in the multi-channel signal excluding the LFE channel.
  • HasSilFlag occupies 1 bit and indicates whether there is a silent frame among the channels of the current frame of the audio signal; 0 indicates that there is no silent frame, and 1 indicates that there is a silent frame.
  • silFlag[i] occupies 1 bit; 0 indicates that the i-th channel is a non-silent frame, and 1 indicates that the i-th channel is a silent frame.
  • mcIld[ch1] and mcIld[ch2] occupy 5 bits each.
  • They are the codebook indices of the quantized inter-channel amplitude difference (ILD) parameter of each channel in the current channel pair, used to restore the amplitude of the decoded spectrum.
  • pairCnt occupies 4 bits and represents the number of channel pairs in the current frame.
  • The channel pair index is expressed as channelPairIndex.
  • The number of bits of channelPairIndex is related to the total number of channels (see Note 1 in the above table); it represents the index of a channel pair.
  • From channelPairIndex, the index values of the two channels in the current channel pair can be parsed, namely ch1 and ch2.
  • chBitRatios occupies 6 bits and indicates the bit allocation ratio of each channel.
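The C sketch below shows one possible way of parsing the side-information fields just listed. Bitstream and GetBits() are hypothetical helpers (not from the standard), the derivation of ch1 and ch2 from channelPairIndex is omitted, and the field order follows the prose description rather than the normative Table 3.

typedef struct Bitstream Bitstream;          /* hypothetical bit-reader type */
int GetBits(Bitstream *bs, int n);           /* hypothetical: read n bits */

static void parseMcSideBits(Bitstream *bs, int coupleChNum, int pairIndexBits,
                            int *silFlag, int *mcIld, int *chBitRatios)
{
    int hasSilFlag = GetBits(bs, 1);         /* 1 bit: silent frame present? */
    if (hasSilFlag == 1) {
        for (int i = 0; i < coupleChNum; i++) {
            silFlag[i] = GetBits(bs, 1);     /* 1 bit per non-LFE channel */
        }
    }
    int pairCnt = GetBits(bs, 4);            /* number of channel pairs */
    for (int p = 0; p < pairCnt; p++) {
        int channelPairIndex = GetBits(bs, pairIndexBits); /* width per Note 1 */
        int ch1 = 0, ch2 = 0;   /* derived from channelPairIndex; derivation omitted */
        (void)channelPairIndex;
        mcIld[ch1] = GetBits(bs, 5);         /* 5-bit ILD codebook index */
        mcIld[ch2] = GetBits(bs, 5);
    }
    for (int i = 0; i < coupleChNum; i++) {
        chBitRatios[i] = GetBits(bs, 6);     /* 6-bit bit allocation ratio */
    }
}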
  • The decoding process is as follows:
  • The mixed signal bit allocation, based on the mute channel flags and bit allocation ratio parameters decoded from the bit stream, allocates the available bits remaining after removing the other side information to each downmix channel in the multi-channel stereo, enabling the subsequent interval decoding, inverse quantization, and neural network inverse transform steps.
  • Mute channels may exist in the multi-channel stereo mode.
  • The mute channels do not need to participate in the bit allocation process of the multi-channel stereo mode.
  • A fixed number of bytes can be allocated to them in advance, namely 8 bytes. If mute channels exist, their pre-allocated bytes are deducted from the number of available bytes availableBytes, and the bytes remaining after the deduction are allocated to the channels other than the mute channels.
  • In the first step, the number of safe bytes, safeBits, is pre-allocated to each channel; the number of safe bytes is 8.
  • The safe bytes are deducted from the number of available bytes availableBytes, and the remaining availableBytes after the deduction are used for allocation in the subsequent steps.
  • Bits are then allocated to each channel according to chBitRatios.
  • The number of bytes of each channel can be expressed as:
  • channelBytes[i] = availableBytes * chBitRatios[i] / (1 << 6),
  • where (1 << 6) represents the maximum value range of the channel bit allocation ratio chBitRatios.
  • Step 4: If there are still bits remaining after the previous step, the remaining bits are allocated to the channel that received the most bytes in the ratio-based allocation.
  • Step 5: If the number of bytes allocated to some channels exceeds the upper limit for a single channel, the excess is allocated to the remaining channels.
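The following C sketch ties these steps together. The leftover bookkeeping and the cap handling are simplified assumptions, and chNum, muteFlag and singleChMaxBytes are illustrative names introduced here.

#define SAFE_BYTES 8

static void allocBytes(int chNum, int availableBytes, const int *muteFlag,
                       const int *chBitRatios, int singleChMaxBytes,
                       int *channelBytes)
{
    int pool = availableBytes - chNum * SAFE_BYTES;   /* step 1: reserve safe bytes */
    int used = 0, maxIdx = 0;
    for (int i = 0; i < chNum; i++) {
        channelBytes[i] = SAFE_BYTES;
    }
    for (int i = 0; i < chNum; i++) {                 /* steps 2-3: ratio-based split */
        int extra = (muteFlag[i] == 1) ? 0 : pool * chBitRatios[i] / (1 << 6);
        channelBytes[i] += extra;
        used += extra;
        if (channelBytes[i] > channelBytes[maxIdx]) {
            maxIdx = i;
        }
    }
    channelBytes[maxIdx] += pool - used;              /* step 4: leftover to largest */
    if (channelBytes[maxIdx] > singleChMaxBytes) {    /* step 5: redistribute excess */
        /* move channelBytes[maxIdx] - singleChMaxBytes to the remaining channels */
    }
}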
  • M/S upmixing is performed on the two paired channels ch1 and ch2 indicated by the channel pair index channelPairIndex.
  • The upmixing method is the same as the M/S upmixing of the two-channel stereo mode.
  • After M/S upmixing, inverse ILD processing must be performed on the MDCT spectrum of the upmixed channels to restore the amplitude difference of the channels.
  • The pseudo code of the inverse ILD processing is as follows:
  • factor = mcIldCodebook[mcIld[i]];
  • mdctSpectrum[i] = factor * mdctSpectrum[i];
  • where factor is the amplitude adjustment factor corresponding to the ILD parameter of the i-th channel,
  • mcIldCodebook is the quantization codebook of the ILD parameter, as shown in Table 4 below,
  • mcIld[i] represents the codebook index corresponding to the ILD parameter of the i-th channel, and
  • mdctSpectrum[i] represents the MDCT coefficient vector of the i-th channel.
  • Table 4 is the mcIld code table:
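Since Table 4 itself is not reproduced in this text, the C sketch below only illustrates how the inverse ILD step consumes it; the codebook values shown are placeholders, not the normative mcIld code table.

static const float mcIldCodebook[32] = {
    /* 32 entries for the 5-bit index; the values here are placeholders */
    0.25f, 0.5f, 1.0f, 2.0f /* , ... */
};

static void inverseIld(float *mdctSpectrum_i, int specLen, int mcIld_i)
{
    float factor = mcIldCodebook[mcIld_i];   /* amplitude adjustment factor */
    for (int k = 0; k < specLen; k++) {
        mdctSpectrum_i[k] *= factor;         /* restore the channel amplitude */
    }
}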
  • An encoding device 1000 may include: a mute mark information acquisition module 1001, a multi-channel encoding module 1002, and a bit stream generation module 1003, wherein:
  • the mute mark information acquisition module is used to obtain the mute mark information of the multi-channel signal, where the mute mark information includes a mute enable flag and/or a mute flag;
  • the multi-channel encoding module is used to perform multi-channel encoding processing on the multi-channel signal to obtain the transmission channel signal of each transmission channel;
  • the bit stream generation module is configured to generate a bit stream according to the transmission channel signal of each transmission channel and the mute mark information,
  • where the bit stream includes the mute mark information and the multi-channel encoding result of the transmission channel signals.
  • A decoding device 1100 provided by an embodiment of the present application may include: a parsing module 1101 and a processing module 1102, wherein:
  • the parsing module is configured to parse the mute mark information from the bit stream of the encoding device and determine the coding information of each transmission channel based on the mute mark information,
  • where the mute mark information includes a mute enable flag and/or a mute flag;
  • the processing module is configured to decode the coded information of each transmission channel to obtain the decoded signal of each transmission channel;
  • the processing module is also used to perform multi-channel decoding processing on the decoded signals of each transmission channel to obtain a multi-channel decoded output signal.
  • An embodiment of the present application also provides a computer storage medium, wherein the computer storage medium stores a program, and the program executes some or all of the steps described in the above method embodiments.
  • An encoding device 1200 includes: a receiver 1201, a transmitter 1202, a processor 1203, and a memory 1204 (the number of processors 1203 in the encoding device 1200 may be one or more; one processor is taken as an example in Figure 12).
  • the receiver 1201, the transmitter 1202, the processor 1203 and the memory 1204 may be connected through a bus or other means, wherein the connection through the bus is taken as an example in Figure 12.
  • The memory 1204 may include a read-only memory and a random access memory, and provides instructions and data to the processor 1203.
  • A portion of the memory 1204 may also include a non-volatile random access memory (NVRAM).
  • The memory 1204 stores an operating system and operating instructions, executable modules or data structures, or a subset thereof, or an extended set thereof, where the operating instructions may include various operating instructions for implementing various operations.
  • The operating system may include various system programs that are used to implement various basic services and handle hardware-based tasks.
  • The processor 1203 controls the operation of the encoding device; the processor 1203 may also be called a central processing unit (CPU).
  • The various components of the encoding device are coupled together through a bus system, where, in addition to the data bus, the bus system may also include a power bus, a control bus, a status signal bus, and the like; for clarity of description, the various buses are labeled as the bus system in the figure.
  • the methods disclosed in the above embodiments of the present application can be applied to the processor 1203 or implemented by the processor 1203.
  • The processor 1203 may be an integrated circuit chip with signal processing capabilities. During implementation, each step of the above method can be completed by an integrated logic circuit of hardware in the processor 1203 or by instructions in the form of software.
  • The above-mentioned processor 1203 can be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • A general-purpose processor may be a microprocessor, or the processor may be any conventional processor, etc.
  • the steps of the method disclosed in conjunction with the embodiments of the present application can be directly implemented by a hardware decoding processor, or executed by a combination of hardware and software modules in the decoding processor.
  • the software module can be located in random access memory, flash memory, read-only memory, programmable read-only memory or electrically erasable programmable memory, registers and other mature storage media in this field.
  • the storage medium is located in the memory 1204.
  • the processor 1203 reads the information in the memory 1204 and completes the steps of the above method in combination with its hardware.
  • the receiver 1201 can be used to receive input numeric or character information, and generate signal input related to the relevant settings and function control of the encoding device.
  • The transmitter 1202 may include a display device such as a display screen, and the transmitter 1202 can be used to output numeric or character information through an external interface.
  • the processor 1203 is used to execute the method executed by the encoding device as shown in FIG. 4, FIG. 6, and FIG. 7 in the aforementioned embodiment.
  • the decoding device 1300 includes:
  • a receiver 1301, a transmitter 1302, a processor 1303, and a memory 1304 (the number of processors 1303 in the decoding device 1300 may be one or more; one processor is taken as an example in Figure 13).
  • The receiver 1301, the transmitter 1302, the processor 1303, and the memory 1304 may be connected through a bus or other means; in Figure 13, connection via a bus is taken as an example.
  • The memory 1304 may include a read-only memory and a random access memory, and provides instructions and data to the processor 1303. A portion of the memory 1304 may also include an NVRAM.
  • the memory 1304 stores operating systems and operating instructions, executable modules or data structures, or a subset thereof, or an extended set thereof, where the operating instructions may include various operating instructions for implementing various operations.
  • the operating system may include various system programs that are used to implement various basic services and handle hardware-based tasks.
  • the processor 1303 controls the operation of the decoding device, and the processor 1303 may also be called a CPU.
  • The various components of the decoding device are coupled together through a bus system, where, in addition to the data bus, the bus system may also include a power bus, a control bus, a status signal bus, and the like; for clarity of description, the various buses are labeled as the bus system in the figure.
  • the methods disclosed in the above embodiments of the present application can be applied to the processor 1303 or implemented by the processor 1303.
  • The processor 1303 may be an integrated circuit chip with signal processing capabilities. During implementation, each step of the above method can be completed by an integrated logic circuit of hardware in the processor 1303 or by instructions in the form of software.
  • the above-mentioned processor 1303 may be a general-purpose processor, DSP, ASIC, FPGA or other programmable logic device, discrete gate or transistor logic device, or discrete hardware component.
  • a general-purpose processor may be a microprocessor or the processor may be any conventional processor, etc.
  • the steps of the method disclosed in conjunction with the embodiments of the present application can be directly implemented by a hardware decoding processor, or executed by a combination of hardware and software modules in the decoding processor.
  • the software module can be located in random access memory, flash memory, read-only memory, programmable read-only memory or electrically erasable programmable memory, registers and other mature storage media in this field.
  • the storage medium is located in the memory 1304.
  • the processor 1303 reads the information in the memory 1304 and completes the steps of the above method in combination with its hardware.
  • the processor 1303 is used to execute the method executed by the decoding device as shown in FIG. 5, FIG. 8, and FIG. 9 in the aforementioned embodiment.
  • When the encoding device or the decoding device is a chip in a terminal, the chip includes a processing unit and a communication unit.
  • The processing unit may be, for example, a processor, and the communication unit may be, for example, an input/output interface, a pin, or a circuit.
  • The processing unit can execute the computer-executable instructions stored in the storage unit, so that the chip in the terminal executes any one of the audio encoding methods of the first aspect or any one of the audio decoding methods of the second aspect.
  • Optionally, the storage unit is a storage unit within the chip, such as a register or a cache.
  • The storage unit may also be a storage unit in the terminal located outside the chip, such as a read-only memory (ROM) or another type of static storage device that can store static information and instructions, or a random access memory (RAM), etc.
  • the processor mentioned in any of the above places may be a general central processing unit, a microprocessor, an ASIC, or one or more integrated circuits used to control program execution of the method of the first aspect or the second aspect.
  • The device embodiments described above are only illustrative.
  • The units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units.
  • They can be located in one place or distributed across multiple network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • The connection relationship between modules indicates that there is a communication connection between them, which can be implemented as one or more communication buses or signal lines.
  • The present application can be implemented by software plus the necessary general-purpose hardware; of course, it can also be implemented by dedicated hardware, including application-specific integrated circuits, dedicated CPUs, dedicated memories, dedicated components, and the like. In general, all functions performed by computer programs can easily be implemented with corresponding hardware, and the specific hardware structures used to implement the same function can be diverse, such as analog circuits, digital circuits, or dedicated circuits. However, for this application, a software program implementation is the better implementation in most cases. Based on this understanding, the part of the technical solution of the present application that is essential, or that contributes to the existing technology, can be embodied in the form of a software product.
  • The computer software product is stored in a readable storage medium, such as a computer floppy disk, a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disc, and includes several instructions to cause a computer device (which can be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments of this application.
  • the computer program product includes one or more computer instructions.
  • the computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable device.
  • The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from a website, computer, server, or data center to another website, computer, server, or data center by wired means (such as coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless means (such as infrared, radio, or microwave).
  • The computer-readable storage medium may be any available medium that a computer can access, or a data storage device such as a server or a data center integrating one or more available media.
  • The available media may be magnetic media (e.g., a floppy disk, hard disk, or magnetic tape), optical media (e.g., a DVD), or semiconductor media (e.g., a solid state disk (SSD)), etc.


Abstract

Embodiments of this application disclose an encoding method for a multi-channel signal, a coding and decoding device, and a terminal device. A coding and decoding method for a multi-channel signal includes: obtaining mute mark information of the multi-channel signal, the mute mark information including a mute enable flag and/or a mute flag; performing multi-channel encoding processing on the multi-channel signal to obtain a transmission channel signal of each transmission channel; and generating a bit stream according to the transmission channel signals of the transmission channels and the mute mark information, the bit stream including the mute mark information and the multi-channel encoding result of the transmission channel signals. In the embodiments, the transmission channel signals of the transmission channels are encoded according to the mute mark information to generate the bit stream, which takes the mute status of the multi-channel signal into account and therefore improves coding efficiency and the utilization of coding bit resources.

Description

A coding and decoding method for multi-channel signals, a coding and decoding device, and a terminal device
This application claims priority to the Chinese patent application No. 202210254868.9, filed with the China National Intellectual Property Administration on March 14, 2022 and entitled "Coding and decoding method for multi-channel signals, terminal device, and network device", which is incorporated herein by reference in its entirety.
This application claims priority to the Chinese patent application No. 202210699863.7, filed with the China National Intellectual Property Administration on June 20, 2022 and entitled "Coding and decoding method for multi-channel signals, coding and decoding device, and terminal device", which is incorporated herein by reference in its entirety.
TECHNICAL FIELD
This application relates to the field of audio coding and decoding, and in particular, to a coding and decoding method for multi-channel signals, a coding and decoding device, and a terminal device.
BACKGROUND
The compression of audio data is an indispensable link in media applications such as media communications and media broadcasting. Compression of audio data can be achieved through multi-channel coding. Multi-channel coding may encode an acoustic bed signal with multiple channels, may encode multiple object audio signals, or may encode a mixed signal containing both an acoustic bed signal and object audio signals.
An acoustic bed signal, object signals, or a mixed signal containing both can all be input into the audio channels as a multi-channel signal. The characteristics of the channels of a multi-channel signal cannot be exactly the same, and those characteristics also change constantly over time.
At present, such multi-channel signals are processed with a fixed coding scheme, for example a unified bit allocation scheme, and the multi-channel signal is quantized and encoded according to the bit allocation result. Although the unified bit allocation scheme is simple and easy to operate, it suffers from low coding efficiency and waste of coding bit resources.
SUMMARY
Embodiments of this application provide a coding and decoding method for multi-channel signals, a coding and decoding device, and a terminal device, to improve coding efficiency and the utilization of coding bit resources.
To solve the above technical problem, the embodiments of this application provide the following technical solutions:
According to a first aspect, an embodiment of this application provides an encoding method for a multi-channel signal, including: obtaining mute mark information of the multi-channel signal, where the mute mark information includes a mute enable flag and/or a mute flag; performing multi-channel encoding processing on the multi-channel signal to obtain a transmission channel signal of each transmission channel; and generating a bit stream according to the transmission channel signal of each transmission channel and the mute mark information, where the bit stream includes the mute mark information and the multi-channel quantization encoding result of the transmission channel signals.
In the above solution, the transmission channel signals of the transmission channels are encoded according to the mute mark information to generate the bit stream, which takes the mute status of the multi-channel signal into account and therefore improves coding efficiency and the utilization of coding bit resources.
In a possible implementation, the multi-channel signal includes an acoustic bed signal and/or object signals; the mute mark information includes the mute enable flag, and the mute enable flag includes a global mute enable flag or a partial mute enable flag, where the global mute enable flag is a mute enable flag acting on the whole multi-channel signal, and the partial mute enable flag is a mute enable flag acting on some channels of the multi-channel signal.
In a possible implementation, when the mute enable flag is the partial mute enable flag, the partial mute enable flag is an object mute enable flag acting on the object signals; or an acoustic bed mute enable flag acting on the acoustic bed signal; or a mute enable flag acting on the channel signals of the multi-channel signal other than the low frequency effects (LFE) channel signal; or a mute enable flag acting on the channel signals of the multi-channel signal that participate in pairing.
In the above solution, the global or partial mute enable flag can indicate muting for the acoustic bed signal and/or the object signals, so that subsequent encoding processing based on the flag, such as bit allocation, can improve coding efficiency.
In a possible implementation, the multi-channel signal includes an acoustic bed signal and object signals; the mute mark information includes the mute enable flag, and the mute enable flag includes an acoustic bed mute enable flag and an object mute enable flag. The mute enable flag occupies a first bit and a second bit, where the first bit carries the value of the acoustic bed mute enable flag and the second bit carries the value of the object mute enable flag.
In the above solution, different bits can indicate the specific form of the mute enable flag: with the predefined first and second bits, the mute enable flag can indicate both the acoustic bed mute enable flag and the object mute enable flag.
In a possible implementation, the mute mark information includes the mute enable flag, and the mute enable flag is used to indicate whether the mute mark detection function is enabled; or whether the mute flags of the channels of the multi-channel signal need to be sent; or whether all channels of the multi-channel signal are non-mute channels.
In the above solution, when the mute enable flag is a first value (for example 1), the mute detection function is enabled and the mute flags of the channels of the multi-channel signal are further detected; when it is a second value (for example 0), the mute detection function is disabled. Alternatively, the mute enable flag can indicate whether all channels of the multi-channel signal are non-mute channels: when it is the first value (for example 1), the mute flags of the channels need to be further detected; when it is the second value (for example 0), all channels of the multi-channel signal are non-mute channels.
In a possible implementation, obtaining the mute mark information of the multi-channel signal includes: obtaining the mute mark information according to control signaling input to the encoding device; or obtaining the mute mark information according to the coding parameters of the encoding device; or performing mute mark detection on the channels of the multi-channel signal to obtain the mute mark information.
In the above solution, control signaling can be input into the encoding device and the mute mark information determined from it, i.e., the mute mark information can be controlled by external input. Alternatively, the encoding device includes coding parameters (also called encoder parameters) that can be used to determine the mute mark information, preset according to encoder parameters such as the coding rate and coding bandwidth. Alternatively, the mute mark information can be determined from the mute detection results of the channels. The embodiments of this application do not limit the way the mute mark information is obtained.
In a possible implementation, the mute mark information includes the mute enable flag and the mute flag, and performing mute mark detection on the channels of the multi-channel signal to obtain the mute mark information includes: performing mute mark detection on the channels of the multi-channel signal to obtain the mute flag of each channel; and determining the mute enable flag according to the mute flags of the channels.
In the above solution, the encoding end can first detect the mute flag of each channel, which indicates whether that channel is a silent frame, and then determine the mute enable flag from the mute flags of the channels, so that the mute mark information can be generated.
In a possible implementation, the mute mark information includes the mute flag, or includes the mute enable flag and the mute flag; the mute flag indicates whether each channel acted on by the mute enable flag is a mute channel, where a mute channel is a channel that does not need to be encoded or that needs to be encoded with a low number of bits.
In the above solution, when the value of the mute flag is a first value (for example 1), the channel acted on by the mute enable flag is a mute channel; when it is a second value (for example 0), the channel is a non-mute channel. When the mute flag is the first value (for example 1), the channel is not encoded or is encoded with fewer bits.
In a possible implementation, before obtaining the mute mark information of the multi-channel signal, the method further includes: preprocessing the multi-channel signal to obtain a preprocessed multi-channel signal, where the preprocessing includes at least one of transient detection, window type decision, time-frequency transform, frequency-domain noise shaping, time-domain noise shaping, and bandwidth extension encoding; and obtaining the mute mark information of the multi-channel signal then includes performing mute mark detection on the preprocessed multi-channel signal to obtain the mute mark information. In the above solution, such preprocessing can improve the coding efficiency of the multi-channel signal.
In a possible implementation, the method further includes: preprocessing the multi-channel signal as above to obtain the preprocessed multi-channel signal; and correcting the mute mark information according to the preprocessed multi-channel signal. In the above solution, after preprocessing, the mute mark information can be corrected according to the preprocessing result; for example, after frequency-domain noise shaping the energy of a channel of the multi-channel signal may change, and the mute mark detection result of that channel can be adjusted, thereby correcting the mute mark information.
In a possible implementation, generating the bit stream according to the transmission channel signals of the transmission channels and the mute mark information includes: adjusting the initial multi-channel processing mode according to the mute mark information to obtain an adjusted multi-channel processing mode; and encoding the multi-channel signal according to the adjusted multi-channel processing mode to obtain the bit stream.
In the above solution, the encoding end can adjust the initial multi-channel processing mode based on the mute mark information and then encode the multi-channel signal according to the adjusted mode, which can improve coding efficiency. For example, in the screening of the multi-channel signal, channels whose mute flag is 1 do not participate in pair screening.
In a possible implementation, generating the bit stream includes: performing bit allocation for the transmission channels according to the mute mark information, the number of available bits, and the multi-channel side information, to obtain the bit allocation result of each transmission channel; and encoding the transmission channel signal of each transmission channel according to the bit allocation result of each channel, to obtain the bit stream.
In the above solution, the encoding end performs bit allocation according to the mute mark information, the number of available bits, and the multi-channel side information, and encodes according to the bit allocation result of each transmission channel to obtain the encoded bit stream. The specific content of the bit allocation strategy is not limited. For example, the encoding of the transmission channel signals may be multi-channel quantization encoding; its specific implementation in the embodiments of this application may pass the paired and downmixed signals through a neural network to obtain latent features, quantize the latent features, and perform interval coding, or it may quantize and encode the paired and downmixed signals based on vector quantization.
In a possible implementation, performing bit allocation for the transmission channels according to the mute mark information, the number of available bits, and the multi-channel side information includes: performing bit allocation for the transmission channels according to the number of available bits and the multi-channel side information, following the bit allocation strategy corresponding to the mute mark information.
In the above solution, bit allocation based on the mute mark information may first perform an initial bit allocation according to the total available bits and the signal characteristics of the transmission channels in combination with the bit allocation strategy, and then adjust the bit allocation result according to the mute mark information; this adjustment can improve the transmission efficiency of the multi-channel signal.
In a possible implementation, the multi-channel side information includes a channel bit allocation ratio field, where the channel bit allocation ratio field is used to indicate the bit allocation ratios among the non-low-frequency-effects (LFE) channels of the multi-channel signal.
In the above solution, through the channel bit allocation ratio field, the bit allocation ratios of all channels of the multi-channel signal other than the LFE channel can be indicated, so that the number of bits of each non-LFE channel can be determined.
In a possible implementation, performing mute mark detection on the channels of the multi-channel signal includes: determining the signal energy of each channel of the current frame according to the input signal of each channel of the current frame of the multi-channel signal; determining the mute detection parameter of each channel of the current frame according to the signal energy of each channel of the current frame; and determining the mute flag of each channel of the current frame according to the mute detection parameters of the channels and a preset mute detection threshold.
In the above solution, the mute detection parameter of each channel of the current frame is compared with the mute detection threshold. Taking the detection of the mute flag of the first channel of the current frame as an example: if its mute detection parameter is smaller than the mute detection threshold, the first channel of the current frame is a silent frame, i.e., the first channel is a mute channel at the current moment, and its mute flag muteFlag[1] is a first value (for example 1); if the mute detection parameter is greater than or equal to the threshold, the first channel is a non-silent frame, i.e., a non-mute channel, and muteFlag[1] is a second value (for example 0).
In a possible implementation, performing multi-channel encoding processing on the multi-channel signal to obtain the transmission channel signals of the transmission channels includes: screening the multi-channel signal to obtain a screened multi-channel signal; pairing the screened multi-channel signal to obtain a multi-channel paired signal and the multi-channel side information; and downmixing the multi-channel paired signal according to the multi-channel side information to obtain the transmission channel signal of each transmission channel.
In the above solution, the encoding device screens the multi-channel signal, for example screening out the channels that do not participate in pairing; the screened multi-channel signal may be the channels participating in pairing, for example the screened channels exclude the LFE channel. After the screening, the channels can be paired, for example ch1 and ch2 form a channel pair, yielding the multi-channel paired signal. After the paired signal is generated, downmix processing is performed (the specific downmix process is not described in detail), yielding the transmission channel signal of each transmission channel; in the embodiments of this application a transmission channel may be a channel after multi-channel pairing and downmixing.
In a possible implementation, the multi-channel side information includes at least one of the following: an inter-channel amplitude difference parameter quantization codebook index, used to indicate the codebook index of the quantized inter-channel amplitude difference (ILD) parameter of each channel of the multi-channel signal; a channel pair count, used to represent the number of channel pairs of the current frame of the multi-channel signal; and a channel pair index, used to represent the index of a channel pair.
In the above solution, the embodiments of this application do not limit the numbers of bits occupied by these fields. For example, the inter-channel amplitude difference parameter quantization codebook index, denoted mcIld[ch1] and mcIld[ch2], occupies 5 bits per channel of the current channel pair and is used to restore the amplitude of the decoded spectrum; the channel pair count, denoted pairCnt, occupies 4 bits and represents the number of channel pairs of the current frame; the channel pair index, denoted channelPairIndex, has a bit width related to the total number of channels and can be parsed to obtain the indices of the two channels of the current channel pair, namely ch1 and ch2.
According to a second aspect, an embodiment of this application provides a decoding method for a multi-channel signal, including: parsing the mute mark information from the bit stream of the encoding device, and determining the coding information of each transmission channel according to the mute mark information, where the mute mark information includes a mute enable flag and/or a mute flag; decoding the coding information of each transmission channel to obtain the decoded signal of each transmission channel; and performing multi-channel decoding processing on the decoded signals of the transmission channels to obtain a multi-channel decoded output signal.
In the above solution, the decoding end can obtain the mute mark information from the bit stream of the encoding end, so that the decoding end can perform decoding processing in a manner consistent with the encoding end.
In a possible implementation, parsing the mute mark information from the bit stream of the encoding device includes: parsing the mute flag of each channel from the bit stream; or parsing the mute enable flag from the bit stream and, if the mute enable flag is a first value, parsing the mute flags from the bit stream; or parsing the acoustic bed mute enable flag and/or the object mute enable flag, and the mute flag of each channel, from the bit stream; or parsing the acoustic bed mute enable flag and/or the object mute enable flag from the bit stream and, according to the acoustic bed mute enable flag and/or the object mute enable flag, parsing the mute flags of some of the channels from the bit stream.
In the above solution, the decoding end parses the mute mark information from the bit stream of the encoding device; according to the specific content of the mute mark information generated by the encoding device, the mute mark information obtained at the decoding end corresponds to that at the encoding side. Specifically, in one manner, the mute flag indicates whether each channel is a mute channel (a channel that does not need to be encoded or needs low-bit encoding), and the decoding end can parse the mute flag of each channel from the bit stream. In another manner, the mute enable flag can also indicate whether all channels are non-mute channels: when it is the first value (for example 1), the mute flags of the channels need to be further parsed; when it is the second value (for example 0), all channels are non-mute channels; the decoding end parses the mute enable flag from the bit stream and, if it is the first value, parses the mute flags. In another manner, the mute enable flag includes the acoustic bed mute enable flag and/or the object mute enable flag, and the decoding end parses these flags together with the mute flags of the channels from the bit stream. In yet another manner, the decoding end parses the acoustic bed mute enable flag and/or the object mute enable flag from the bit stream and, according to them, parses the mute flags of only some of the channels.
In a possible implementation, decoding the coding information of each transmission channel includes: parsing the multi-channel side information from the bit stream; performing bit allocation for the transmission channels according to the multi-channel side information and the mute flag information, to obtain the number of coding bits of each channel; and decoding the coding information of each transmission channel according to the number of coding bits of each channel.
In the above solution, the bit stream may also include the multi-channel side information, and the decoding end can perform bit allocation for the transmission channels according to the multi-channel side information and the mute flag information to obtain the number of coding bits of each transmission channel; the number of coding bits obtained at the decoding end is the same as that preset at the encoding end, and the coding information of each transmission channel is then decoded according to its number of coding bits, thereby realizing the decoding of the transmission channel signals.
In a possible implementation, after performing multi-channel decoding processing on the decoded signals of the transmission channels to obtain the multi-channel decoded output signal, the method further includes: post-processing the multi-channel decoded output signal, where the post-processing includes at least one of bandwidth extension decoding, inverse time-domain noise shaping, inverse frequency-domain noise shaping, and inverse time-frequency transform.
In the above solution, this post-processing of the multi-channel decoded output signal is the inverse of the preprocessing at the encoding end, and the specific processing manner is not limited.
In a possible implementation, the multi-channel side information includes at least one of the following: an inter-channel amplitude difference parameter quantization codebook index, used to indicate the codebook index of the quantized inter-channel amplitude difference (ILD) parameter of each channel; a channel pair count, used to represent the number of channel pairs of the current frame of the multi-channel signal; and a channel pair index, used to represent the index of a channel pair.
According to a third aspect, an embodiment of this application provides an encoding device, including: a mute mark detection module, configured to obtain the mute mark information of a multi-channel signal, where the mute mark information includes a mute enable flag and/or a mute flag; a multi-channel encoding module, configured to perform multi-channel encoding processing on the multi-channel signal to obtain the transmission channel signal of each transmission channel; and a bit stream generation module, configured to generate a bit stream according to the transmission channel signals of the transmission channels and the mute mark information, where the bit stream includes the mute mark information and the multi-channel quantization encoding result of the transmission channel signals.
According to a fourth aspect, an embodiment of this application provides a decoding device, including: a parsing module, configured to parse the mute mark information from the bit stream of the encoding device and determine the coding information of each transmission channel according to the mute mark information, where the mute mark information includes a mute enable flag and/or a mute flag; an inverse quantization module, configured to decode the coding information of each transmission channel to obtain the decoded signal of each transmission channel; and a multi-channel decoding module, configured to perform multi-channel decoding processing on the decoded signals of the transmission channels to obtain a multi-channel decoded output signal.
According to a fifth aspect, an embodiment of this application provides a computer-readable storage medium storing instructions that, when run on a computer, cause the computer to execute the method of the first or second aspect.
According to a sixth aspect, an embodiment of this application provides a computer program product containing instructions that, when run on a computer, cause the computer to execute the method of the first or second aspect.
According to a seventh aspect, an embodiment of this application provides a communication apparatus, which may include an entity such as a terminal device or a chip; the communication apparatus includes a processor and a memory, where the memory is configured to store instructions and the processor is configured to execute the instructions in the memory, so that the communication apparatus executes the method of any one of the first or second aspects.
According to an eighth aspect, an embodiment of this application provides a computer-readable storage medium storing the bit stream generated by the method of the first aspect.
According to a ninth aspect, this application provides a chip system, including a processor configured to support the coding and decoding device in implementing the functions involved in the above aspects, for example sending or processing the data and/or information involved in the above methods. In a possible design, the chip system further includes a memory configured to store the program instructions and data necessary for the coding and decoding device. The chip system may consist of a chip, or may include a chip and other discrete devices.
BRIEF DESCRIPTION OF DRAWINGS
Figure 1 is a schematic diagram of the composition of a multi-channel signal processing system according to an embodiment of this application;
Figure 2a is a schematic diagram of an audio encoder and an audio decoder applied to a terminal device according to an embodiment of this application;
Figure 2b is a schematic diagram of an audio encoder applied to a wireless device or a core network device according to an embodiment of this application;
Figure 2c is a schematic diagram of an audio decoder applied to a wireless device or a core network device according to an embodiment of this application;
Figure 3a is a schematic diagram of a multi-channel encoder and a multi-channel decoder applied to a terminal device according to an embodiment of this application;
Figure 3b is a schematic diagram of a multi-channel encoder applied to a wireless device or a core network device according to an embodiment of this application;
Figure 3c is a schematic diagram of a multi-channel decoder applied to a wireless device or a core network device according to an embodiment of this application;
Figure 4 is a schematic diagram of an encoding method for a multi-channel signal according to an embodiment of this application;
Figure 5 is a schematic diagram of a decoding method for a multi-channel signal according to an embodiment of this application;
Figure 6 is a schematic diagram of an encoding procedure for a multi-channel signal according to an embodiment of this application;
Figure 7 is a schematic diagram of an encoding procedure for a multi-channel signal according to an embodiment of this application;
Figure 8 is a schematic diagram of a decoding procedure for a multi-channel signal according to an embodiment of this application;
Figure 9 is a schematic diagram of a decoding procedure for a multi-channel signal according to an embodiment of this application;
Figure 10 is a schematic diagram of the composition of an encoding device according to an embodiment of this application;
Figure 11 is a schematic diagram of the composition of a decoding device according to an embodiment of this application;
Figure 12 is a schematic diagram of the composition of another encoding device according to an embodiment of this application;
Figure 13 is a schematic diagram of the composition of another decoding device according to an embodiment of this application.
DETAILED DESCRIPTION
Embodiments of this application provide a coding and decoding method for multi-channel signals, a terminal device, and a network device, used to improve coding efficiency and the utilization of coding bit resources.
The embodiments of this application are described below with reference to the accompanying drawings.
The terms "first", "second", and the like in the specification, claims, and drawings of this application are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that the terms used in this way are interchangeable where appropriate; this is merely the manner adopted in the embodiments of this application to distinguish objects of the same attribute in the description. Moreover, the terms "include" and "have" and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, system, product, or device including a series of units is not necessarily limited to those units, but may include other units not clearly listed or inherent to the process, method, product, or device.
Sound is a continuous wave produced by the vibration of an object. An object that vibrates and emits sound waves is called a sound source. While sound waves propagate through a medium (such as air, a solid, or a liquid), the auditory organs of humans or animals can perceive the sound.
The characteristics of sound waves include pitch, intensity, and timbre. Pitch indicates how high or low a sound is. Intensity indicates the loudness of a sound; intensity can also be called loudness or volume, and its unit is the decibel (dB). Timbre is also called tone quality.
The frequency of a sound wave determines pitch: the higher the frequency, the higher the pitch. The number of vibrations of an object within one second is called frequency, and its unit is hertz (Hz). The frequencies of sound recognizable by the human ear are between 20 Hz and 20000 Hz.
The amplitude of a sound wave determines intensity: the larger the amplitude, the greater the intensity; the closer to the sound source, the greater the intensity.
The waveform of a sound wave determines timbre. Waveforms include square waves, sawtooth waves, sine waves, pulse waves, and the like.
According to the characteristics of sound waves, sounds can be divided into regular and irregular sounds. An irregular sound is emitted by a source vibrating irregularly, for example noise that disturbs people's work, study, or rest. A regular sound is emitted by a source vibrating regularly and includes speech and music. When represented electrically, a regular sound is an analog signal varying continuously in the time-frequency domain; such an analog signal can be called an audio signal (acoustic signal). An audio signal is an information carrier conveying speech, music, and sound effects.
Since human hearing can discern the spatial distribution of sound sources, a listener hearing sound in space can perceive not only its pitch, intensity, and timbre but also its direction.
Sound can also be divided into mono and stereo. Mono has one sound channel, picked up with one microphone and played back with one loudspeaker. Stereo has multiple sound channels, with different channels transmitting different sound waveforms. A sound channel may simply be called a channel; for example, a multi-channel signal may include the signals of the individual channels, and in the subsequent embodiments of this application the two Chinese terms for channel have the same meaning. After the multi-channel signal undergoes multi-channel encoding, the transmission channel signal of each transmission channel is obtained; a transmission channel refers to a channel after multi-channel encoding. Further, the multi-channel encoding may include channel pairing and downmix processing, so a transmission channel may also be called a paired and downmixed channel. See the description of the multi-channel encoding process in the subsequent embodiments.
The embodiments of this application apply to the field of audio coding and decoding, in particular multi-channel coding. Multi-channel coding may encode an acoustic bed signal with multiple channels, such as 5.1, 5.1.4, 7.1, 7.1.4, or 22.2 channels; it may encode multiple object audio signals; or it may encode a mixed signal containing an acoustic bed signal and/or object audio signals.
The 5.1 channels include the center channel (C), the front left channel (L), the front right channel (R), the rear left surround channel (LS), the rear right surround channel (RS), and the 0.1 (LFE) channel.
The 5.1.4 channels add the following channels to 5.1: a left height channel, a right height channel, a left height surround channel, and a right height surround channel.
The 7.1 channels include the center channel (C), the front left channel (L), the front right channel (R), the rear left surround channel (LS), the rear right surround channel (RS), the left back channel (LB), the right back channel (RB), and the 0.1 LFE channel.
The 7.1.4 channels add 4 height channels to 7.1.
The 22.2 channels are a multi-channel format comprising three layers with 22 channels in total plus 2 LFE channels.
A mixed signal of an acoustic bed signal and object signals is one signal combination in three-dimensional audio, jointly fulfilling the audio recording, transmission, and playback needs of complex scenes such as film production, sports events, and concerts. For example, in a sports broadcast the sound of the venue is usually represented by an acoustic bed signal, while the commentary of different commentators is usually represented by multiple audio objects. Whether for an acoustic bed signal, object signals, or a mixed signal containing both, at the same moment the characteristics of the input signals of different channels are not exactly the same, and between different moments the characteristics of the input signal of the same channel keep changing.
Current multi-channel coding adopts a fixed coding scheme without considering the differences of input signal characteristics across moments and/or channels, for example a unified bit allocation scheme, with the multi-channel signal quantized and encoded according to the bit allocation result.
Using the same bit allocation scheme cannot adapt to the variation of input signal characteristics across channels at different moments, so coding efficiency is low. For example, suppose the multi-channel audio signal to be encoded contains a 5.1.4-channel acoustic bed signal and 4 object signals: among the 14 channels to be encoded, channels 0-9 belong to the acoustic bed signal and channels 10-13 to the object signals. At one moment, channels 6-9 and channels 11, 12, 13 are mute channels (carrying little auditorily perceivable information), while the other channels contain the main audio information, i.e., are non-mute channels. At another moment, the mute channels become channels 10, 12, 13, and the other channels contain the main audio information.
If the same bit allocation scheme is used at different moments, some channels containing the main audio information may not have enough bits for encoding, while some mute channels are allocated too many coding bits, wasting coding bit resources.
The embodiments of this application provide an audio processing technology, in particular an audio encoding technology oriented to multi-channel signals, to improve traditional audio coding systems. A multi-channel signal refers to an audio signal including multiple channels; for example, a multi-channel signal may be a stereo signal. Audio processing comprises audio encoding and audio decoding. Audio encoding is performed at the source side and includes encoding (for example, compressing) the original audio to reduce the amount of data required to represent it, for more efficient storage and/or transmission. Audio decoding is performed at the destination side and includes inverse processing relative to the encoder to reconstruct the original audio. The encoding part and the decoding part are also jointly referred to as codec. The implementation of the embodiments of this application is described in detail below with reference to the drawings.
The technical solutions of the embodiments of this application can be applied to various audio processing systems. Figure 1 is a schematic diagram of the composition of an audio processing system according to an embodiment of this application. The audio processing system 100 may include a multi-channel signal encoding apparatus 101 and a multi-channel signal decoding apparatus 102. The multi-channel signal encoding apparatus 101, which may also be called an audio encoding apparatus, can generate a bit stream; the audio-encoded bit stream can then be transmitted to the multi-channel signal decoding apparatus 102 through an audio transmission channel. The multi-channel signal decoding apparatus 102, which may also be called an audio decoding apparatus, can receive the bit stream, execute the audio decoding function of the apparatus 102, and finally obtain the reconstructed signal.
In the embodiments of this application, the multi-channel signal encoding apparatus can be applied to various terminal devices requiring audio communication and to wireless devices and core network devices requiring transcoding; for example, the multi-channel signal encoding apparatus can be the audio encoder of such a terminal device, wireless device, or core network device. Likewise, the multi-channel signal decoding apparatus can be the audio decoder of such a terminal device, wireless device, or core network device. For example, audio encoders include media gateways of radio access networks and core networks, transcoding devices, media resource servers, mobile terminals, fixed-network terminals, and the like; an audio encoder can also be an audio encoder applied to a virtual reality (VR) streaming service.
Taking the audio encoding and audio decoding modules applicable to a virtual reality streaming (VR streaming) service as an example, the end-to-end audio codec flow includes: an audio signal A passes through an acquisition module and then undergoes a preprocessing operation (audio preprocessing), which includes filtering out the low-frequency part of the signal, with 20 Hz or 50 Hz as the cut-off point, and extracting directional information from the signal; the signal is then encoded (audio encoding) and packed (file/segment encapsulation) and delivered to the decoding end; the decoding end first unpacks (file/segment decapsulation), then decodes (audio decoding), performs binaural rendering (audio rendering) on the decoded signal, and maps the rendered signal to the listener's headphones, which may be standalone headphones or headphones on an eyeglasses-type device.
Figure 2a is a schematic diagram of an audio encoder and an audio decoder applied to terminal devices according to an embodiment of this application. Each terminal device may include an audio encoder, a channel encoder, an audio decoder, and a channel decoder. Specifically, the channel encoder performs channel encoding on the audio signal, and the channel decoder performs channel decoding on the audio signal. For example, a first terminal device 20 may include a first audio encoder 201, a first channel encoder 202, a first audio decoder 203, and a first channel decoder 204; a second terminal device 21 may include a second audio decoder 211, a second channel decoder 212, a second audio encoder 213, and a second channel encoder 214. The first terminal device 20 is connected to a wireless or wired first network communication device 22, the first network communication device 22 is connected to a wireless or wired second network communication device 23 through a digital channel, and the second terminal device 21 is connected to the second network communication device 23. The wireless or wired network communication devices may generally refer to signal transmission devices, such as communication base stations and data switching devices.
In audio communication, the terminal device acting as the sending end first performs audio acquisition, performs audio encoding on the acquired audio signal, then performs channel encoding, and transmits the result in a digital channel via the wireless network or core network. The terminal device acting as the receiving end performs channel decoding on the received signal to obtain the bit stream, recovers the audio signal through audio decoding, and plays it back.
Figure 2b is a schematic diagram of an audio encoder applied to a wireless device or core network device according to an embodiment of this application. The wireless device or core network device 25 includes a channel decoder 251, another audio decoder 252, the audio encoder 253 provided by the embodiments of this application, and a channel encoder 254, where the other audio decoder 252 refers to an audio decoder other than the present audio decoder. Within the device 25, the incoming signal is first channel-decoded by the channel decoder 251, then audio-decoded by the other audio decoder 252, then audio-encoded by the audio encoder 253 provided by the embodiments of this application, and finally channel-encoded by the channel encoder 254 before being transmitted onward. The other audio decoder 252 performs audio decoding on the bit stream decoded by the channel decoder 251.
Figure 2c is a schematic diagram of an audio decoder applied to a wireless device or core network device according to an embodiment of this application. The wireless device or core network device 25 here includes a channel decoder 251, the audio decoder 255 provided by the embodiments of this application, another audio encoder 256, and a channel encoder 254, where the other audio encoder 256 refers to an audio encoder other than the present audio encoder. Within the device 25, the incoming signal is first channel-decoded by the channel decoder 251, then the received audio-encoded bit stream is decoded by the audio decoder 255, then audio-encoded by the other audio encoder 256, and finally channel-encoded by the channel encoder 254 before being transmitted onward. In a wireless device or core network device, if transcoding is required, the corresponding audio codec processing is needed. A wireless device refers to radio-frequency-related devices in communication, and a core network device refers to core-network-related devices in communication.
In some embodiments of this application, the multi-channel signal encoding apparatus can be applied to various terminal devices requiring audio communication and to wireless devices and core network devices requiring transcoding; for example, it can be the multi-channel encoder of such a device. Likewise, the multi-channel signal decoding apparatus can be the multi-channel decoder of such a terminal device, wireless device, or core network device.
Figure 3a is a schematic diagram of a multi-channel encoder and a multi-channel decoder applied to terminal devices according to an embodiment of this application. Each terminal device may include a multi-channel encoder, a channel encoder, a multi-channel decoder, and a channel decoder. The multi-channel encoder can execute the audio encoding method provided by the embodiments of this application, and the multi-channel decoder can execute the audio decoding method provided by the embodiments of this application. Specifically, the channel encoder performs channel encoding on the multi-channel signal, and the channel decoder performs channel decoding on the multi-channel signal. For example, a first terminal device 30 may include a first multi-channel encoder 301, a first channel encoder 302, a first multi-channel decoder 303, and a first channel decoder 304; a second terminal device 31 may include a second multi-channel decoder 311, a second channel decoder 312, a second multi-channel encoder 313, and a second channel encoder 314. The first terminal device 30 is connected to a wireless or wired first network communication device 32, which is connected through a digital channel to a wireless or wired second network communication device 33, and the second terminal device 31 is connected to the second network communication device 33. The wireless or wired network communication devices may generally refer to signal transmission devices such as communication base stations and data switching devices. In audio communication, the terminal device acting as the sending end performs multi-channel encoding on the acquired multi-channel signal, then channel encoding, and transmits the result in a digital channel via the wireless network or core network. The terminal device acting as the receiving end performs channel decoding on the received signal to obtain the multi-channel encoded bit stream, recovers the multi-channel signal through multi-channel decoding, and plays it back.
Figure 3b is a schematic diagram of a multi-channel encoder applied to a wireless device or core network device according to an embodiment of this application, where the wireless device or core network device 35 includes a channel decoder 351, another audio decoder 352, a multi-channel encoder 353, and a channel encoder 354, similar to Figure 2b above, which is not repeated here.
Figure 3c is a schematic diagram of a multi-channel decoder applied to a wireless device or core network device according to an embodiment of this application, where the wireless device or core network device 35 includes a channel decoder 351, a multi-channel decoder 355, another audio encoder 356, and a channel encoder 354, similar to Figure 2c above, which is not repeated here.
Audio encoding processing can be part of the multi-channel encoder, and audio decoding processing part of the multi-channel decoder. For example, performing multi-channel encoding on the acquired multi-channel signal may consist of processing the acquired signal to obtain an audio signal and then encoding the obtained audio signal according to the methods provided by the embodiments of this application; the decoding end decodes the multi-channel encoded bit stream to obtain the audio signal and recovers the multi-channel signal after upmix processing. Therefore, the embodiments of this application can also be applied to the multi-channel encoders and multi-channel decoders in terminal devices, wireless devices, and core network devices. In wireless or core network devices, if transcoding is required, the corresponding multi-channel codec processing is needed.
An encoding method for a multi-channel signal provided by an embodiment of this application is introduced first. The method can be executed by a terminal device; for example, the terminal device may be a multi-channel signal encoding apparatus (hereinafter referred to as the encoding end, encoder, or encoding device; for example, the encoding end may be an artificial intelligence (AI) encoder). In the embodiments of this application, the multi-channel signal may include multiple channels, for example a first channel and a second channel, or the multiple channels may include a first, a second, and a third channel, etc. As shown in Figure 4, the encoding procedure executed by the encoding device (or encoding end) is described as follows:
401. Obtain the mute mark information of the multi-channel signal, where the mute mark information includes a mute enable flag and/or a mute flag.
After the multi-channel signal is input to the encoding end, the mute mark information of that multi-channel signal can be obtained. The mute mark information can indicate the mute status of the channels of the multi-channel signal. For example, mute mark detection is performed on the multi-channel signal to detect whether the multi-channel signal supports mute marking, and the encoding end can generate the mute mark information from the multi-channel signal. The mute mark information can guide subsequent encoding processing, such as bit allocation. The mute mark information can also be written into the bit stream by the encoding end and transmitted to the decoding end to guarantee consistent codec processing.
In the embodiments of this application, the mute mark information indicates the mute marking of the multi-channel signal and can be realized in multiple ways; for example, the mute mark information may contain a mute enable flag and/or mute flags, where the mute enable flag indicates whether mute detection is enabled and the mute flag indicates whether each channel of the multi-channel signal is a silent frame.
In some embodiments of this application, the multi-channel signal contains an acoustic bed signal and/or object signals. Current coding schemes do not consider the differences of input signal characteristics across moments and/or channels and process everything with a unified scheme, with low coding efficiency. The mute enable flag provided by the embodiments of this application can indicate muting for the acoustic bed signal and/or the object signals. Specifically, the mute mark information includes the mute enable flag, and the mute enable flag includes a global mute enable flag or a partial mute enable flag, where the global mute enable flag is a mute enable flag acting on the whole multi-channel signal, and the partial mute enable flag is a mute enable flag acting on some channels of the multi-channel signal.
The mute enable flag is denoted HasSilFlag and may be a global mute enable flag or a partial mute enable flag. Through the global or partial mute enable flag, muting can be indicated for the acoustic bed signal and/or object signals, and subsequent encoding processing based on these flags, such as bit allocation, can improve coding efficiency.
In some specific implementations, when the mute enable flag is the partial mute enable flag, the partial mute enable flag is an object mute enable flag acting on the object signals; or an acoustic bed mute enable flag acting on the acoustic bed signal; or a mute enable flag acting on the channels of the multi-channel signal other than the low frequency effects (LFE) channel; or a mute enable flag acting on the channel signals of the multi-channel signal participating in pairing.
For example, the global mute enable flag acts on all channels and the partial mute enable flag on some channels: the object mute enable flag applies to the channels corresponding to the object signals of the multi-channel signal and is denoted objMuteEna; the acoustic bed mute enable flag applies to the channels corresponding to the acoustic bed signal and is denoted bedMuteEna.
For example, the global mute enable flag is the mute enable flag acting on the multi-channel signal: when the multi-channel signal contains only the acoustic bed signal, the global mute enable flag acts on the acoustic bed signal; when it contains only object signals, the global flag acts on the object signals; when it contains both the acoustic bed signal and object signals, the global flag acts on both.
The partial mute enable flag acts on preset subsets of channels, as enumerated above; the specific manner of pairing the channels of the multi-channel signal is not limited in the embodiments of this application.
In some embodiments of this application, the multi-channel signal includes an acoustic bed signal and object signals; the mute mark information includes the mute enable flag, which includes an acoustic bed mute enable flag and an object mute enable flag; the mute enable flag occupies a first bit and a second bit, the first bit carrying the value of the acoustic bed mute enable flag and the second bit carrying the value of the object mute enable flag. Through these predefined different bits, the mute enable flag can indicate both the acoustic bed mute enable flag and the object mute enable flag.
In some embodiments of this application, step 401, obtaining the mute mark information of the multi-channel signal, includes:
A1. Obtaining the mute mark information according to control signaling input to the encoding device; or
A2. Obtaining the mute mark information according to the coding parameters of the encoding device; or
A3. Performing mute mark detection on the channels of the multi-channel signal to obtain the mute mark information.
Control signaling can be input to the encoding device and the mute mark information determined from it, i.e., under external input control; or the encoding device's coding parameters (also called encoder parameters) can be used to determine the mute mark information, preset according to encoder parameters such as the coding rate and coding bandwidth; or the mute mark information can be determined from the mute detection results of the channels. The implementation of the mute mark information is not limited in the embodiments of this application.
In some embodiments, the mute mark information includes the mute enable flag; the mute enable flag indicates whether the mute mark detection function is enabled, or whether the mute flags of the channels of the multi-channel signal need to be sent, or whether all channels of the multi-channel signal are non-mute channels.
For example, when the mute enable flag is a first value (for example 1), the mute detection function is enabled and the mute flags of the channels are further detected; when it is a second value (for example 0), the mute detection function is disabled. Alternatively, when the mute enable flag is the first value (for example 1), the mute flags of the channels need to be further detected; when it is the second value (for example 0), all channels are non-mute channels.
In some embodiments, the mute mark information includes the mute enable flag and the mute flag, and step A3, performing mute mark detection on the channels of the multi-channel signal to obtain the mute mark information, includes:
A31. Performing mute mark detection on the channels of the multi-channel signal to obtain the mute flag of each channel;
A32. Determining the mute enable flag according to the mute flags of the channels.
The encoding end can first detect the mute flag of each channel, which indicates whether the channel is a silent frame. The mute flag of each channel is denoted muteflag[ch], where ch is the channel number, ch = 0...N-1, N being the total number of channels of the input signal to be encoded, with M acoustic bed channels and P object channels, and N = M + P. For example, the signal to be encoded is a mixed signal containing an acoustic bed signal and object signals, where the acoustic bed signal is a 5.1.4-channel signal with M = 10 channels and there are 4 object signals with P = 4 channels, so the total channel count is 14. The acoustic bed channels are numbered 0 to 9 and the object channels 10 to 13. The mute flags muteflag[ch], ch = 0...13, correspond to the individual channels and indicate whether each channel is a mute channel. After the mute flags of the channels are determined, the mute enable flag is determined from them.
In some embodiments, the mute mark information includes the mute flag; or the mute mark information includes the mute enable flag and the mute flag. The mute flag indicates whether each channel acted on by the mute enable flag is a mute channel, a mute channel being a channel that does not need to be encoded or needs low-bit encoding.
For example, with the acoustic bed channels numbered 0 to 9 and the object channels 10 to 13, the mute flags muteflag[ch], ch = 0...13, indicate whether each channel acted on by the mute enable flag is a mute channel. A mute channel is a channel whose signal energy, decibel level, or loudness is below the auditory threshold; it does not need to be encoded, or only needs to be encoded with fewer bits. When the value of the mute flag is a first value (for example 1), the channel is a mute channel and is not encoded or is encoded with fewer bits; when it is a second value (for example 0), the channel is a non-mute channel.
In some embodiments of this application, step A3, performing mute mark detection on the channels of the multi-channel signal, includes:
B1. Determine the signal energy of each channel of the current frame according to the input signal of each channel of the current frame of the multi-channel signal. The value of the frame length is not limited in the embodiments of this application.
B2. Determine the mute detection parameter of each channel of the current frame according to the signal energy of each channel of the current frame. The mute detection parameter characterizes the energy value, power value, decibel value, or loudness value of the channel signal of the current frame.
B3. Determine the mute flag of each channel of the current frame according to the mute detection parameters of the channels and a preset mute detection threshold.
The mute detection parameter of each channel of the current frame is compared with the mute detection threshold. Taking the detection of the mute flag of the first channel of the current frame as an example: if the mute detection parameter of the first channel of the current frame is smaller than the mute detection threshold, the first channel of the current frame is a silent frame, i.e., the first channel is a mute channel at the current moment, and its mute flag muteFlag[1] is a first value (for example 1); if the parameter is greater than or equal to the threshold, the first channel is a non-silent frame, i.e., a non-mute channel, and muteFlag[1] is a second value (for example 0).
402. Perform multi-channel encoding processing on the multi-channel signal to obtain the transmission channel signal of each transmission channel.
In the embodiments of this application, the encoding device can perform multi-channel encoding processing on the multi-channel signal in a variety of ways, detailed in the examples of subsequent embodiments; through this encoding process, the transmission channel signal of each transmission channel is obtained.
A specific implementation of multi-channel quantization encoding may pass the paired and downmixed signals through a neural network to obtain latent features, quantize the latent features, and perform interval coding; or it may quantize and encode the paired and downmixed signals based on vector quantization. The embodiments of this application do not limit this.
In some embodiments of this application, step 402, performing multi-channel encoding processing on the multi-channel signal to obtain the transmission channel signals, includes:
C1. Screen the multi-channel signal to obtain a screened multi-channel signal. For example, the encoding device completes the screening of the multi-channel signal; the screened signal is the channels that participate in pairing, for example the screened channels exclude the LFE channel. The specific screening manner is not limited.
C2. Pair the screened multi-channel signal to obtain a multi-channel paired signal and the multi-channel side information. For example, the encoding device screens the multi-channel signal and pairs the screened channels, for example channels ch1 and ch2 form a channel pair, yielding the multi-channel paired signal. The specific pairing method is not limited in this application. The multi-channel side information includes at least one of: the inter-channel amplitude difference parameter quantization codebook index, indicating the codebook index of the quantized inter-channel amplitude difference (interaural level difference, ILD) parameter of each channel of the multi-channel signal; the channel pair count, representing the number of channel pairs of the current frame of the multi-channel signal; and the channel pair index, representing the index of a channel pair.
C3. Downmix the multi-channel paired signal according to the multi-channel side information to obtain the transmission channel signal of each transmission channel. After the paired signal and the side information are generated, the side information can be used to downmix the paired signal (the specific downmix process is not described in detail); through the pairing and downmixing above, the transmission channel signals of the paired and downmixed transmission channels are obtained, where a transmission channel specifically refers to a channel after multi-channel pairing and downmixing.
In some embodiments of this application, before step 401, obtaining the mute mark information of the multi-channel signal, the encoding method executed by the encoding end further includes:
D1. Preprocess the multi-channel signal to obtain a preprocessed multi-channel signal, the preprocessing including at least one of: transient detection, window type decision, time-frequency transform, frequency-domain noise shaping, time-domain noise shaping, and bandwidth extension encoding.
In the scenario where step D1 is executed, step 401, obtaining the mute mark information of the multi-channel signal, includes: performing mute mark detection on the preprocessed multi-channel signal to obtain the mute mark information.
The input signal of mute flag detection can be the originally input multi-channel signal or the preprocessed multi-channel signal; the multi-channel signal can be a time-domain or a frequency-domain signal. The preprocessing above can improve the coding efficiency of the multi-channel signal.
In some embodiments of this application, the encoding method executed by the encoding end further includes:
E1. Preprocess the multi-channel signal to obtain the preprocessed multi-channel signal, the preprocessing including at least one of the operations listed above.
E2. Correct the mute mark information according to the preprocessed multi-channel signal.
After preprocessing, the mute mark information of step 401 can be corrected according to the preprocessed multi-channel signal; for example, after frequency-domain noise shaping, the signal energy of a channel of the multi-channel signal may change, and the mute mark detection result of that channel can be adjusted.
403. Generate the bit stream according to the transmission channel signals of the transmission channels and the mute mark information, where the bit stream includes the mute mark information and the multi-channel quantization encoding result of the transmission channel signals.
The encoding end generates the bit stream, which includes the mute mark information, so that the decoding end can obtain that information and decode the bit stream based on it; this allows the decoding end to perform decoding processing, for example bit allocation, consistently with the encoding end.
In some embodiments of this application, step 403 includes:
F1. Adjust the initial multi-channel processing mode according to the mute mark information to obtain an adjusted multi-channel processing mode.
F2. Encode the multi-channel signal according to the adjusted multi-channel processing mode to obtain the bit stream.
The encoding end can adjust the initial multi-channel processing mode based on the mute mark information and then encode the multi-channel signal according to the adjusted mode, which can improve coding efficiency; for example, in the screening of the multi-channel signal, channels whose mute flag is 1 do not participate in pair screening.
In some embodiments of this application, step 403, generating the bit stream according to the transmission channel signals and the mute mark information, includes:
G1. Perform bit allocation for the transmission channels according to the mute mark information, the number of available bits, and the multi-channel side information, obtaining the bit allocation result of each transmission channel.
G2. Encode the transmission channel signal of each transmission channel according to the bit allocation results of the channels to obtain the bit stream, which can be called the encoded bit stream, or the bit stream of the multi-channel signal.
Further, in some embodiments of this application, step G1 includes:
G11. According to the number of available bits and the multi-channel side information, perform bit allocation for the transmission channels following the bit allocation strategy corresponding to the mute mark information.
The encoding end can perform bit allocation for the transmission channels based on the mute mark information, and the mute enable flag can be used to select different bit allocation strategies, whose specific content is not limited. As an example: suppose the mute enable flag includes the acoustic bed mute enable flag bedMuteEna and the object mute enable flag objMuteEna. Bit allocation based on the mute mark information may first perform an initial bit allocation according to the total available bits and the signal characteristics of the transmission channels, and then adjust the bit allocation result according to the mute mark information; this adjustment can improve the transmission efficiency of the multi-channel signal. For example, if objMuteEna is 1, the bits initially allocated to the object channels whose muteflag is 1 are allocated to the acoustic bed signal or to other object channels. If bedMuteEna and objMuteEna are both 1, the bits initially allocated to object channels with muteflag 1 can be re-allocated to other object channels, and the bits initially allocated to acoustic bed channels with muteflag 1 re-allocated to other acoustic bed channels. A sketch of this adjustment follows.
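The C sketch below illustrates the objMuteEna = 1 adjustment described above. initialBits[] and the choice of routing all freed bits to the acoustic bed are illustrative assumptions made here; the embodiment equally allows routing them to other object channels.

/* Reclaim the bits first assigned to mute object channels. */
static int reclaimMuteObjectBits(int bedChNum, int objChNum,
                                 const int *muteflag, int *initialBits)
{
    int freed = 0;
    for (int ch = bedChNum; ch < bedChNum + objChNum; ch++) { /* object channels */
        if (muteflag[ch] == 1) {
            freed += initialBits[ch];
            initialBits[ch] = 0;    /* a mute channel keeps no coding bits here */
        }
    }
    return freed;   /* caller adds this to the acoustic bed budget, for example */
}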
Further, in some embodiments of this application, the multi-channel side information includes a channel bit allocation ratio, which indicates the bit allocation ratios among the non-low-frequency-effects (LFE) channels of the multi-channel signal.
The low frequency effects (LFE) channel is an audio channel for bass sound ranging from 3-120 Hz, which can be sent to a loudspeaker designed specifically for low tones; the channel bit allocation ratio indicates the bit allocation ratios of the non-LFE channels. For example, the channel bit allocation ratio occupies 6 bits; the embodiments of this application do not limit the number of bits it occupies.
For example, the channel bit allocation ratio can be the channel bit allocation ratio field of the multi-channel side information, denoted chBitRatios, occupying 6 bits and used to indicate the bit allocation ratios of all channels of the multi-channel signal other than the LFE channel. Through this field, the bit allocation ratio of each transmission channel can be indicated, and hence the number of bits obtained by each transmission channel determined; without limitation, the number of bits can further be converted into a number of bytes.
In some embodiments of this application, the multi-channel side information includes at least one of the following: the inter-channel amplitude difference parameter quantization codebook index, used to indicate the codebook index of the quantized inter-channel amplitude difference (interaural level difference, ILD) parameter of each channel; the channel pair count, used to represent the number of channel pairs of the current frame of the multi-channel signal; and the channel pair index, used to represent the index of a channel pair.
The embodiments of this application do not limit the numbers of bits occupied by these fields. For example, the inter-channel amplitude difference parameter quantization codebook index can be denoted mcIld[ch1] and mcIld[ch2], occupying 5 bits per channel of the current channel pair and used to restore the amplitude of the decoded spectrum. The channel pair count, denoted pairCnt, occupies 4 bits and represents the number of channel pairs of the current frame. The channel pair index, denoted channelPairIndex, has a bit width related to the total number of channels and can be parsed to obtain the indices of the two channels of the current channel pair, namely ch1 and ch2.
In some embodiments of this application, in addition to the preceding steps, the encoding method executed by the encoding device further includes: sending the bit stream to the decoding device.
In the embodiments of this application, after obtaining the transmission channel signals of the transmission channels and the mute mark information, the encoding end can generate the bit stream carrying the mute mark information and send it to the decoding end.
As illustrated by the preceding embodiments, mute mark detection is performed on the multi-channel signal to obtain the mute mark information, which includes a mute enable flag and/or a mute flag; multi-channel encoding processing is performed on the multi-channel signal to obtain the transmission channel signals of the transmission channels; and the bit stream is generated according to the transmission channel signals and the mute mark information, the bit stream including the mute mark information and the multi-channel quantization encoding result of the transmission channel signals. Performing subsequent encoding processing based on the mute mark information can improve coding efficiency.
An embodiment of this application also provides a decoding method for a multi-channel signal, which can be executed by a terminal device; for example, the terminal device may be a multi-channel signal decoding apparatus (hereinafter referred to as the decoding end or decoder; for example, the decoding end may be an AI decoder). As shown in Figure 5, the method executed by the decoding end in the embodiments of this application mainly includes:
501. Parse the mute mark information from the bit stream of the encoding device, and determine the coding information of each transmission channel according to the mute mark information, where the mute mark information includes a mute enable flag and/or a mute flag.
The decoding end processes inversely to the encoding end: it first receives the bit stream from the encoding device, and since the bit stream carries the mute mark information, it determines the coding information of each transmission channel according to that information, where the mute mark information includes a mute enable flag and/or a mute flag. For the description of the mute enable flag and the mute flag, see the preceding encoding-end embodiments, which are not repeated here.
In some embodiments of this application, step 501, parsing the mute mark information from the bit stream of the encoding device, includes:
H1. Parse the mute flag of each channel from the bit stream; or
H2. Parse the mute enable flag from the bit stream and, if the mute enable flag is a first value, parse the mute flags from the bit stream; or
H3. Parse the acoustic bed mute enable flag and/or the object mute enable flag, and the mute flag of each channel, from the bit stream; or
H4. Parse the acoustic bed mute enable flag and/or the object mute enable flag from the bit stream and, according to the acoustic bed mute enable flag and/or the object mute enable flag, parse the mute flags of some of the channels from the bit stream.
The decoding end parses the mute mark information from the bit stream of the encoding device; according to the specific content of the mute mark information generated by the encoding device, the information obtained at the decoding end corresponds to that at the encoding side, in the manners described for the encoding end. Which particular subset of channel mute flags is obtained in H4 is not limited.
502. Decode the coding information of each transmission channel to obtain the decoded signal of each transmission channel.
After obtaining the coding information of each transmission channel from the bit stream, the decoding end can decode the coding information of each transmission channel; this decoding and inverse quantization process is the inverse of the quantization encoding at the encoding end, so the decoded signal of each transmission channel can be obtained.
In some embodiments of this application, step 502, decoding the coding information of each transmission channel, includes:
I1. Parse the multi-channel side information from the bit stream.
I2. Perform bit allocation for the transmission channels according to the multi-channel side information and the mute flag information, to obtain the number of coding bits of each channel.
I3. Decode the coding information of each transmission channel according to the number of coding bits of each channel.
The bit stream may also include the multi-channel side information; the decoding end can perform bit allocation for the transmission channels according to the side information and the mute flag information to obtain the number of coding bits of each channel, identical to that preset at the encoding end, and then decode the coding information of each transmission channel according to its number of coding bits, thereby realizing the decoding of the transmission channel signals.
Further, in some embodiments of this application, the multi-channel side information includes a channel bit allocation ratio field, which is used to indicate the bit allocation ratios of the non-low-frequency-effects (LFE) channels among the channels.
The LFE channel is an audio channel for bass sound ranging from 3-120 Hz, which can be sent to a loudspeaker designed specifically for low tones. For example, the channel bit allocation ratio field occupies 6 bits; the number of bits it occupies is not limited in the embodiments of this application.
For example, the channel bit allocation ratio field is denoted chBitRatios, occupies 6 bits, and is used to indicate the bit allocation ratios of the non-LFE channels among the channels. Through this field, the bit allocation ratio of each channel can be indicated, and hence the number of bits obtained by each channel determined; without limitation, the number of bits can further be converted into a number of bytes.
In some embodiments of this application, the multi-channel side information includes at least one of the following: the inter-channel amplitude difference parameter quantization codebook index, used to indicate the codebook index of the quantized ILD parameter of each channel; the channel pair count, used to represent the number of channel pairs of the current frame of the multi-channel signal; and the channel pair index, used to represent the index of a channel pair.
The numbers of bits occupied by these fields are not limited in the embodiments of this application. For example, the inter-channel amplitude difference parameter quantization codebook index, denoted mcIld[ch1] and mcIld[ch2], occupies 5 bits per channel of the current channel pair and is used to restore the amplitude of the decoded spectrum; the channel pair count, denoted pairCnt, occupies 4 bits and represents the number of channel pairs of the current frame; the channel pair index, denoted channelPairIndex, has a bit width related to the total number of channels and can be parsed to obtain the indices of the two channels of the current channel pair, namely ch1 and ch2.
In some embodiments of this application, step I2, performing bit allocation for the transmission channels according to the multi-channel side information and the mute flag information, includes:
I21. Determine the first remaining number of bits according to the number of available bits and the number of safe bits. The value of the number of safe bits is not limited; for example, the number of safe bytes is denoted safeBits and equals 8 bits, and the first remaining number of bits is obtained by subtracting the safe bits from the available bits.
I22. Allocate the first remaining bits to the channels according to the channel bit allocation ratio field of the multi-channel side information, where the channel bit allocation ratio field indicates the bit allocation ratio of each channel.
I23. When a second remaining number of bits exists after the first remaining bits are allocated to the channels, allocate the second remaining bits to the channels according to the channel bit allocation ratio field. The second remaining number of bits is obtained by subtracting the bits allocated to the channels from the first remaining number of bits.
I24. When a third remaining number of bits exists after the second remaining bits are allocated to the channels, allocate the third remaining bits to the channel that received the most bits when the first remaining bits were allocated. The third remaining number of bits is obtained by subtracting the bits allocated to the channels from the second remaining number of bits.
I25. When the number of bits allocated to a first channel among the channels exceeds the upper limit of bits for a single channel, allocate the excess bits to the channels other than the first channel. The value of the upper limit for a single channel is not limited, and the first channel can be any of the channels. A sketch of these steps follows.
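A hedged C sketch of steps I21-I25; the bookkeeping of the second and third remaining bits follows the wording above, while the final cap redistribution is left schematic.

static void decoderBitAlloc(int chNum, int availableBits, int safeBits,
                            const int *chBitRatios, int *bits)
{
    int remaining1 = availableBits - chNum * safeBits;        /* I21 */
    int used = 0, maxIdx = 0;
    for (int i = 0; i < chNum; i++) {                         /* I22 */
        bits[i] = safeBits + remaining1 * chBitRatios[i] / (1 << 6);
        used += bits[i] - safeBits;
        if (bits[i] > bits[maxIdx]) {
            maxIdx = i;
        }
    }
    int remaining2 = remaining1 - used;                       /* I23: split again by ratio */
    used = 0;
    for (int i = 0; i < chNum; i++) {
        int extra = remaining2 * chBitRatios[i] / (1 << 6);
        bits[i] += extra;
        used += extra;
    }
    bits[maxIdx] += remaining2 - used;                        /* I24: rest to the largest */
    /* I25: any bits above the single-channel upper limit are moved to the
       other channels (redistribution loop omitted for brevity) */
}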
503. Perform multi-channel decoding processing on the decoded signals of the transmission channels to obtain the multi-channel decoded output signal.
After obtaining the decoded signal of each transmission channel through decoding, the decoding end further performs decoding processing on the decoded signals of the transmission channels to obtain the decoded output signal.
In some embodiments of this application, after step 503, the decoding method executed by the decoding end further includes:
J1. Post-process the multi-channel decoded output signal, the post-processing including at least one of: bandwidth extension decoding, inverse time-domain noise shaping, inverse frequency-domain noise shaping, and inverse time-frequency transform.
This post-processing of the output signal is the inverse of the preprocessing at the encoding end; the specific processing manner is not limited.
As illustrated by the preceding examples, in the embodiments of this application the decoding end can obtain the mute mark information from the bit stream of the encoding end, which allows the decoding end to perform decoding processing, for example bit allocation, consistently with the encoding end.
To facilitate a better understanding and implementation of the above solutions of the embodiments of this application, corresponding application scenarios are described below as examples.
Multi-channel audio encoder; products include mobile phone terminals, chips, and wireless networks.
Embodiment one. The encoding end, as shown in Figure 6, includes a mute mark detection unit, a multi-channel encoding processing unit, a multi-channel quantization encoding unit, and a bit stream multiplexing interface.
The mute mark detection unit is mainly used to perform mute mark information detection on the input signal and determine the mute mark information. The mute mark information can contain a mute enable flag and/or mute flags.
The mute enable flag is denoted HasSilFlag and may be a global mute enable flag or a partial mute enable flag; for example, an object mute enable flag acting only on the object signals of the multi-channel signal, denoted objMuteEna, or, as another example, an acoustic bed mute enable flag acting only on the acoustic bed signal of the multi-channel signal, denoted bedMuteEna.
The global mute enable flag is the mute enable flag acting on the whole multi-channel signal: when the multi-channel signal contains only the acoustic bed signal, it acts on the acoustic bed signal; when it contains only object signals, it acts on the object signals; when it contains both the acoustic bed signal and object signals, it acts on both.
The partial mute enable flag is a mute enable flag acting on some channels of the multi-channel signal, the subset being preset; for example, the object mute enable flag acting on the object signals, or the acoustic bed mute enable flag acting on the acoustic bed signal, or a mute enable flag acting on the channel signals other than the LFE channel signal, or a mute enable flag acting on the channel signals participating in pairing. The specific pairing manner is not limited in the embodiments of this application.
The mute enable flag indicates whether mute detection is enabled. For example, when it is a first value (for example 1), the mute detection function is enabled and the mute flags of the channels are further detected; when it is a second value (for example 0), the mute detection function is disabled.
The mute enable flag can also indicate whether the mute flags of the channels need to be further transmitted: when it is the first value (for example 1), they need to be transmitted; when it is the second value (for example 0), they do not.
The mute enable flag can also indicate whether all channels are non-mute channels: when it is the first value (for example 1), the mute flags of the channels need to be further detected; when it is the second value (for example 0), all channels are non-mute channels.
The global mute enable flag acts on all channels, and the partial mute enable flag on some channels; for example, the object mute enable flag applies to the channels corresponding to the object signals of the multi-channel signal, and the acoustic bed mute enable flag to the channels corresponding to the acoustic bed signal.
The mute enable flag can be controlled by external input, can be preset according to encoder parameters such as the coding rate and coding bandwidth, or can be determined from the mute detection results of the channels.
The mute flag of each channel indicates whether the channel is a silent frame. The mute flag of each channel is denoted silFlag[i], where ch is the channel number, ch = 0...N-1, N being the total number of channels of the input signal to be encoded, with M acoustic bed channels and P object channels, and N = M + P. For example, the signal to be encoded is a mixed signal containing an acoustic bed signal and object signals, where the acoustic bed signal is a 5.1.4-channel signal with M = 10 channels and there are 4 object signals with P = 4 channels, so the total channel count is 14; the acoustic bed channels are numbered 0 to 9 and the object channels 10 to 13. The mute flags silFlag[i], ch = 0...13, correspond to the individual channels and indicate whether each channel is a mute channel. A mute channel is a channel whose signal energy/decibel level/loudness is below the auditory threshold; it does not need to be encoded, or only needs to be encoded with fewer bits. When the value of the mute flag is a first value (for example 1), the channel is a mute channel and is not encoded or is encoded with fewer bits; when it is a second value (for example 0), the channel is a non-mute channel.
The input signal for mute flag detection can be the original input signal or a pre-processed signal. Pre-processing can include, without limitation: transient detection, window type decision, time-frequency transform, frequency-domain noise shaping, time-domain noise shaping, band extension encoding, etc. The input signal can be a time-domain or a frequency-domain signal. Taking the time-domain signals of the channels of the multi-channel signal as input, one method of detecting the mute flag of each channel can be:
Determine the signal energy of each channel of the current frame from the input signals of the channels of the current frame.
Assuming a frame length FRAME_LEN, the energy energy(ch) of channel ch of the current frame is (the formula is garbled in the source; reconstructed here as the sum of squared samples, consistent with the surrounding definitions):
energy(ch) = sum_{i=0..FRAME_LEN-1} orig_ch(i) * orig_ch(i);
where orig_ch is the input signal of channel ch of the current frame, and energy(ch) is the energy of channel ch of the current frame.
Determine the mute detection parameter of each channel of the current frame from the signal energies of the channels of the current frame.
The mute detection parameter of each channel of the current frame characterizes the energy value, power value, decibel value, or loudness value of the channel signal of the current frame.
For example, the mute detection parameter of each channel can be the log-domain value of the channel energy, e.g., log2(energy(ch)) or log10(energy(ch)). The mute detection parameter of each channel of the current frame is computed from the channel energies and satisfies the following condition:
energyDB[ch]=10*log10(energy[ch]/Bit_Depth/Bit_Depth);
where energyDB[ch] is the mute detection parameter of channel ch of the current frame, energy(ch) is the energy of channel ch of the current frame, and Bit_Depth is the full-scale value of the bit width; for example, for a sample bit depth of 16 bits, the full-scale value is 2^16 = 65536.
Determine the mute flag of each channel of the current frame from its mute detection parameter and the mute detection threshold.
The mute detection parameter of each channel of the current frame is compared with the mute detection threshold: if the mute detection parameter of channel ch of the current frame is less than the threshold, channel ch of the current frame is a mute frame, i.e., channel ch is currently a mute channel, and its mute flag silFlag[ch] takes the first value (e.g., 1). If the mute detection parameter of channel ch of the current frame is greater than or equal to the threshold, channel ch is a non-mute frame, i.e., currently a non-mute channel, and its mute flag silFlag[ch] takes the second value (e.g., 0).
The pseudocode for determining the mute flag of channel ch of the current frame from its mute detection parameter and the mute detection threshold is as follows:
silFlag[ch] = 0;
        if (energyDB[ch] < g_MuteThrehold)
{ silFlag[ch] = 1; }
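Putting the detection steps together, a minimal C sketch follows; the value of FRAME_LEN, the small constant guarding log10(0), and the function name are illustrative assumptions, while energyDB and g_MuteThrehold follow the text above.
    #include <math.h>

    #define FRAME_LEN 960                 /* frame length; the value is an assumption */

    /* orig[ch] is the time-domain frame of channel ch; silFlag[ch] is the result. */
    void detectMuteFlags(const float *const *orig, int numCh,
                         double g_MuteThrehold, int *silFlag)
    {
        for (int ch = 0; ch < numCh; ch++) {
            double energy = 0.0;                       /* sum of squared samples */
            for (int i = 0; i < FRAME_LEN; i++)
                energy += (double)orig[ch][i] * orig[ch][i];

            /* energyDB[ch] = 10*log10(energy/Bit_Depth/Bit_Depth), Bit_Depth = 2^16 */
            double energyDB = 10.0 * log10(energy / 65536.0 / 65536.0 + 1e-30);

            silFlag[ch] = (energyDB < g_MuteThrehold) ? 1 : 0;
        }
    }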
The mute flag information can contain a mute enable flag and/or mute flags; examples of different compositions of the mute flag information are as follows:
Mode one: the mute flag information consists of the per-channel mute flags silFlag[i]. Determine the mute flag silFlag[i] of each channel and write the per-channel mute flags silFlag[i] into the bitstream for transmission to the decoder.
Mode two: the mute flag information contains the mute enable flag HasSilFlag and the mute flags silFlag[i].
The mute enable flag HasSilFlag indicates whether mute detection is enabled for the current frame; it can also indicate whether the per-channel mute detection results are transmitted for the current frame.
Determine the mute enable flag HasSilFlag, write it into the bitstream, and transmit it to the decoder; depending on the value of the mute enable flag, decide whether to write the mute flags silFlag[i] into the bitstream.
When the mute enable flag HasSilFlag is 0, the mute flags silFlag[i] are not written into the bitstream for transmission to the decoder.
When the mute enable flag HasSilFlag is 1, the mute flags silFlag[i] are written into the bitstream and transmitted to the decoder.
Mode three: the mute flag information contains the sound-bed mute enable flag bedMuteEna, the object mute enable flag objMuteEna, and the per-channel mute flags silFlag[i].
The sound-bed mute enable flag bedMuteEna can indicate whether mute detection is enabled for the channels corresponding to the sound-bed signals in the current frame. Similarly, the object mute enable flag objMuteEna can indicate whether mute detection is enabled for the channels corresponding to the object signals in the current frame. For example:
When bedMuteEna is 0 and objMuteEna is 1, the mute flags of the sound-bed channels are all set to 0, i.e., non-mute channels, and the mute flags of the object channels take the mute detection results.
When bedMuteEna is 1 and objMuteEna is 0, the mute flags of the object channels are all set to 0, i.e., non-mute channels, and the mute flags of the sound-bed channels take the mute detection results.
When bedMuteEna is 0 and objMuteEna is 0, the mute flags of all channels are set to 0, i.e., non-mute channels.
When bedMuteEna is 1 and objMuteEna is 1, the mute flags of all channels take the mute detection results.
When the mute flag information contains bedMuteEna, objMuteEna, and the mute flags, the mute flags of all channels can be transmitted.
Mode four: the mute flag information contains the sound-bed mute enable flag bedMuteEna, the object mute enable flag objMuteEna, and the mute flags silFlag[i] of a subset of the channels.
Mode four differs from mode three in that only the mute flags of a subset of the channels are transmitted. For example, when bedMuteEna is 0 and objMuteEna is 1, only the mute flags of the object channels can be transmitted, without those of the sound-bed channels; when bedMuteEna is 1 and objMuteEna is 0, only the mute flags of the sound-bed channels can be transmitted; when bedMuteEna is 0 and objMuteEna is 0, no per-channel mute flags need be transmitted; when bedMuteEna is 1 and objMuteEna is 1, the mute flags of all channels are transmitted.
Mode five: bedMuteEna and objMuteEna can be replaced by HasSilFlag = {HasSilFlag(0), HasSilFlag(1)}, where HasSilFlag(0) and HasSilFlag(1) correspond to bedMuteEna and objMuteEna, respectively. A single 2-bit mute enable flag HasSilFlag can also represent both bedMuteEna and objMuteEna. The embodiments of the present application do not limit this.
Mode six: first determine the mute flag of each channel, then determine the mute enable flag based on the per-channel mute flags (a sketch follows).
For example, the mute enable flag can be the global mute enable flag. If the mute flags of all channels are 0, the global mute enable flag is set to 0; only the global mute enable flag needs to be written into the bitstream and sent to the decoder side, and the per-channel mute flags need not be transmitted. If at least one per-channel mute flag is 1, the global mute enable flag is set to 1; only the global mute enable flag needs to be written into the bitstream and sent to the decoder side, and the per-channel mute flags need not be transmitted.
As another example, the mute enable flag can be the sound-bed mute enable flag bedMuteEna and the object mute enable flag objMuteEna. Taking bedMuteEna as an example: if the mute flags of all sound-bed channels are 0, bedMuteEna is set to 0; only bedMuteEna needs to be written into the bitstream and sent to the decoder side, and the sound-bed channel mute flags need not be transmitted. If at least one sound-bed channel mute flag is 1, bedMuteEna is set to 1; only bedMuteEna needs to be written into the bitstream and sent to the decoder side, and the sound-bed channel mute flags need not be transmitted. The object mute enable flag objMuteEna can be handled similarly, which is not repeated here.
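A minimal C sketch of mode six, deriving the two partial enable flags from the per-channel flags; the channel layout (sound-bed channels first, then object channels, as in the 5.1.4 plus 4-object example above) is the only assumption.
    /* silFlag[0..M-1] are bed channels, silFlag[M..M+P-1] are object channels. */
    void deriveMuteEnable(const int *silFlag, int M, int P,
                          int *bedMuteEna, int *objMuteEna)
    {
        *bedMuteEna = 0;
        for (int ch = 0; ch < M; ch++)
            if (silFlag[ch]) { *bedMuteEna = 1; break; }   /* any mute bed channel */

        *objMuteEna = 0;
        for (int ch = M; ch < M + P; ch++)
            if (silFlag[ch]) { *objMuteEna = 1; break; }   /* any mute object channel */
    }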
The embodiments of the present application list only some implementations; other implementations are possible and are not limited.
The multi-channel encoding processing unit performs screening, pairing, and downmix processing of the multi-channel signal, generates the multi-channel side information, and obtains the transport channel signals after multi-channel pairing and downmixing.
Optionally, pre-processing can be included between the mute flag detection processing and the multi-channel encoding processing to pre-process the input signal and obtain a pre-processed signal as the input of the multi-channel encoding processing. Pre-processing can include, without limitation: transient detection, window type decision, time-frequency transform, frequency-domain noise shaping, time-domain noise shaping, band extension encoding, etc.; the embodiments of the present application do not limit this. As shown in Fig. 7, multi-channel signal screening is performed on the multi-channel input signal or the pre-processed multi-channel signal to obtain the screened multi-channel signal. Pairing processing is performed on the screened multi-channel signal to obtain the multi-channel paired signal. Downmix processing (e.g., mid-side (MID-SIDE, MS) processing) is performed on the multi-channel paired signal to obtain the paired and downmixed signal to be encoded.
Optionally, during pre-processing, the mute flag information can be corrected. For example, if the energy of a certain transport channel signal changes after frequency-domain noise shaping, the mute detection result of that channel can be adjusted.
The multi-channel side information includes, without limitation: the pair count, the paired-channel index list, the paired-channel inter-channel level difference (ILD) coefficient list, and the paired-channel ILD scaling direction list.
Optionally, the initial multi-channel processing manner can be adjusted according to the mute flag information. For example, during multi-channel signal screening, channels whose mute flag is 1 do not participate in pair screening.
The multi-channel quantization encoding unit quantizes and encodes the paired and downmixed transport channel signals.
Multi-channel quantization encoding includes bit allocation processing and encoding.
Optionally, bit allocation is performed according to the mute flag information, the available bit count, and the multi-channel side information; encoding is performed according to the bit allocation result of each channel to obtain the encoded bitstream.
A specific implementation of multi-channel quantization encoding can be: passing the paired and downmixed signal through a neural network transform to obtain latent features, quantizing the latent features, and performing range coding. Another specific implementation can be quantizing and encoding the paired and downmixed signal based on vector quantization. The embodiments of the present application do not limit this.
Optionally, bit allocation can be performed according to the mute flag information. For example, different bit allocation strategies are selected according to the mute enable flag.
Assuming the mute enable flag includes the sound-bed mute enable flag bedMuteEna and the object mute enable flag objMuteEna, bit allocation according to the mute flag information can be performed by first making an initial bit allocation according to the total available bits and the signal characteristics of the channels, and then adjusting the allocation result according to the mute flag information. For example, if objMuteEna is 1, the bits initially allocated to object channels whose mute flag is 1 are reallocated to the sound-bed signals or to other object channels. If both bedMuteEna and objMuteEna are 1, the bits initially allocated to object channels whose mute flag is 1 can be reallocated to other object channels, and the bits initially allocated to sound-bed channels whose mute flag is 1 can be reallocated to other sound-bed channels.
The bitstream multiplexing interface multiplexes the encoded channels into a serial bitstream bitStream for transmission over a channel or storage on a digital medium.
The decoder side of this embodiment, as shown in Fig. 8, includes a bitstream demultiplexing unit, a channel decoding and inverse quantization unit, a multi-channel decoding processing unit, and a multi-channel post-processing unit.
The bitstream demultiplexing unit parses the mute flag information from the received bitstream and determines the encoded information of each channel.
Parsing the mute flag information from the received bitstream is the inverse of the encoder writing the mute flag information into the bitstream.
For example, if the encoder uses mode one, the decoder parses the per-channel mute flags silFlag[i], i = 0…N-1, from the bitstream, where N is the channel count of the multi-channel signal to be decoded.
Or, if the encoder uses mode two, the decoder first parses the mute enable flag HasSilFlag from the bitstream; if HasSilFlag takes the first value (e.g., 1), it parses the mute flags silFlag[i], i = 0…N-1, from the bitstream, where N is the channel count of the multi-channel signal to be decoded.
Or, if the encoder uses mode three, the decoder first parses the sound-bed mute enable flag bedMuteEna and the object mute enable flag objMuteEna as well as the per-channel mute flags silFlag[i], i = 0…N-1, from the bitstream, where N is the channel count of the multi-channel signal to be decoded.
Or, if the encoder uses mode four, the decoder first parses bedMuteEna and objMuteEna from the bitstream; then, according to the parsed bedMuteEna and objMuteEna, it parses the mute flags of the corresponding channels from the bitstream. For example: when bedMuteEna is 0 and objMuteEna is 1, the mute flags of the object channels are parsed from the bitstream; when bedMuteEna is 1 and objMuteEna is 0, the mute flags of the sound-bed channels are parsed; when bedMuteEna is 0 and objMuteEna is 0, no mute flags need to be parsed from the bitstream; when bedMuteEna is 1 and objMuteEna is 1, the mute flags of all channels are parsed, the number of parsed channels being the sum of the sound-bed channel count and the object channel count. A parsing sketch of mode four follows.
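The conditional parsing of mode four can be sketched in C as follows; the single-bit reader is a hypothetical helper, not an API from the text, and the channel layout (bed channels first, then object channels) is assumed as above.
    typedef struct { const unsigned char *buf; long pos; } BitReader;

    static int readBit(BitReader *br)          /* hypothetical MSB-first bit reader */
    {
        int bit = (br->buf[br->pos >> 3] >> (7 - (br->pos & 7))) & 1;
        br->pos++;
        return bit;
    }

    /* Mode four: read bedMuteEna and objMuteEna, then only the flags they enable. */
    void parseMuteInfoMode4(BitReader *br, int M, int P, int *silFlag)
    {
        int bedMuteEna = readBit(br);
        int objMuteEna = readBit(br);

        for (int ch = 0; ch < M; ch++)         /* bed channels */
            silFlag[ch] = bedMuteEna ? readBit(br) : 0;
        for (int ch = M; ch < M + P; ch++)     /* object channels */
            silFlag[ch] = objMuteEna ? readBit(br) : 0;
    }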
Taking the above manner as an example, the specific syntax by which the decoder parses the mute flag information from the bitstream is given in the corresponding syntax table. [The syntax table is not reproduced in this text.]
Multi-channel side information is parsed from the received bitstream.
Bit allocation is performed according to the multi-channel side information to determine the number of coded bits of each channel. Optionally, if the encoder performs bit allocation according to the mute flag information, the decoder side also needs to perform bit allocation according to the mute flag information to determine the number of coded bits of each channel.
According to the number of coded bits of each channel, the encoded information of each channel is determined from the received bitstream.
The decoding unit performs inverse encoding and inverse quantization on each encoded channel to obtain the decoded paired and downmixed multi-channel signal.
Inverse encoding and inverse quantization are the inverse of the multi-channel quantization encoding on the encoder side.
The multi-channel decoding processing unit performs multi-channel decoding processing on the decoded paired and downmixed signal to obtain the multi-channel output signal.
Multi-channel decoding processing is the inverse of multi-channel encoding processing. Using the multi-channel side information, the multi-channel output signal is reconstructed from the decoded paired and downmixed signal.
As shown in Fig. 9, if the encoder includes pre-processing before the multi-channel encoding processing, the decoder includes corresponding post-processing after the multi-channel decoding processing, e.g., band extension decoding, inverse time-domain noise shaping, inverse frequency-domain noise shaping, inverse time-frequency transform, etc., to obtain the final output signal.
From the foregoing examples it can be seen that performing mute flag information detection on the multi-channel input signal, determining the mute flag information, and performing subsequent encoding processing, e.g., bit allocation, according to the mute flag information can improve encoding efficiency.
The embodiments of the present application propose a method for generating a mute flag bitstream according to the characteristics of the input signal. The encoder performs mute flag information detection on the multi-channel input signal to determine the mute flag information, transmits the mute flag information to the decoder, performs bit allocation according to the mute flag information, and encodes the multi-channel signal. The decoder parses the mute flag information from the bitstream, performs bit allocation according to the mute flag information, and decodes the multi-channel signal.
In the technical solutions of the embodiments of the present application, a mute flag is computed for each input signal path and used to guide bit allocation in encoding and decoding. The input signal is checked for being a mute frame; if it is a mute frame, the channel is not encoded or is encoded with a small number of bits. At the input, the decibel or loudness value of the signal is computed and compared with a preset hearing threshold; if it is below the hearing threshold, the mute flag is set to 1, otherwise to 0. When the mute flag is 1, the channel is not encoded or is encoded at a lower bit count; the pre-quantization data of a channel whose mute bit is 1 can be cleared to 0. The mute flags are transmitted to the decoder as side information to guide the decoder's bit demultiplexing. The transmission syntax on the encoder side is as follows: HasSilFlag denotes the mute flag enable and can be transmitted with 1 bit; when HasSilFlag = 1, the per-channel mute flags are further transmitted; when HasSilFlag = 0, the per-channel mute flags are not transmitted. For example, for 5.1.4 channels, 10 bits of mute flags are transmitted in the multi-channel side information, 1 bit per channel, in the same order as the input channels. Other encoder modules may modify a mute flag, changing it from 1 to 0, and transmit it in the bitstream.
The embodiments of the present application have the following advantages: mute flag information detection is performed on the multi-channel input signal to determine the mute flag information, and subsequent encoding processing, e.g., bit allocation, is performed according to the mute flag information; mute channels can be left unencoded or encoded at a lower bit count, saving coded bits and improving encoding efficiency.
Transmitting the mute flag information to the decoder makes it convenient for the decoder to perform decoding processing, e.g., bit allocation, in a manner consistent with the encoder.
In other embodiments of the present application, an improved mixed-coding scheme is described as follows:
A mixed-mode codec supports encoding and decoding of sound-bed signals and object signals. The specific implementation consists of three parts:
Mixed-coding bit pre-allocation: obtaining the pre-allocated bit count bedAvailbleBytes of the sound-bed signals and the pre-allocated bit count objAvailbleBytes of the object signals according to the multi-channel side information bedBitsRatio.
Mixed-coding bit allocation: four steps, in processing order: silent-frame bit allocation, non-silent-frame bit allocation adaptation, non-silent-frame bit allocation, and non-silent-frame bit allocation adaptation restoration.
Silent-frame bit allocation: if silent frames exist, allocating bits to the silent-frame channels according to the mute flags silFlag[i] in the side information and the mixed allocation strategy mixAllocStrategy, and updating the pre-allocated bit count bedAvailbleBytes of the sound-bed signals and the pre-allocated total bit count objAvailbleBytes of the object signals.
Non-silent-frame bit allocation adaptation: sequentially mapping the channel parameters, which facilitates the non-silent-frame bit allocation processing.
Non-silent-frame bit allocation: allocating bits according to the updated pre-allocated bit count bedAvailbleBytes of the sound-bed signals, the updated pre-allocated bit count objAvailbleBytes of the object signals, and the channel bit allocation ratio factor chBitRatios.
Non-silent-frame bit allocation adaptation restoration: inversely mapping the channel parameter order, which facilitates the subsequent range decoding, inverse quantization, and inverse neural network transform steps.
Mixed-coding upmix: performing M/S upmixing on the two paired channels ch1 and ch2 indicated by the channel pair index channelPairIndex to obtain the upmixed channel signals.
The multi-channel stereo side information syntax, the DecodeMcSideBits() syntax, is shown in Table 1. [Table 1 is not reproduced in this text.]
The semantics are as follows. bedBitsRatio occupies 4 bits and represents the scale factor index of the sound-bed signals' share of the total bit count, taking values 0-15; the corresponding floating-point ratios are as follows:
1:0.0625
2:0.125
3:0.1875
4:0.25
5:0.3125
6:0.375
7:0.4375
8:0.5
9:0.5625
10:0.625
11:0.6875
12:0.75
13:0.8125
14:0.875
15:0.9375。
mixAllocStrategy occupies 2 bits and represents the allocation strategy for the mixed signal of sound-bed signals and object signals. The mixed allocation strategy can be predetermined, or predefined according to encoding parameters; the encoding parameters include the encoding rate and signal characteristic parameters, and are predetermined. The value range and meanings of the allocation strategy are as follows:
0: surplus sound-bed bits produced by the mute mechanism (mute flags) go to the sound-bed signals, and surplus object bits go to the object signals; bits of mute sound-bed channels are given to non-mute sound-bed channels.
1: surplus sound-bed bits produced by the mute mechanism go to the sound-bed signals, and surplus object bits go to the sound-bed signals.
2: surplus sound-bed bits produced by the mute mechanism go to the object signals, and surplus object bits go to the object signals.
3: reserved.
HasSilFlag occupies 1 bit; 0 means silent-frame processing is disabled or there are no silent frames; 1 means silent-frame processing is enabled and silent frames exist.
silFlag[i] occupies 1 bit and is the silent-frame flag of the corresponding channel; 0 means a non-silent frame, 1 means a silent frame.
soundBedType occupies 1 bit and indicates the type of sound bed: 0 means object signals only or none (only objs); 1 means a sound-bed (multi-channel) signal or an HOA signal.
codingProfile occupies 3 bits: 0 for mono, stereo, or sound-bed (multi-channel) signals; 1 for a mixed signal of sound bed and objects (channel + obj mix); 2 for HOA.
pairCnt occupies 4 bits and represents the number of channel pairs of the current frame.
The channelPairIndex bit count depends on the total number of channels (see note 1 of the table above). It represents the index of a channel pair, from which the index values of the two channels in the current channel pair, i.e., ch1 and ch2, can be parsed.
mcIld[ch1] and mcIld[ch2] occupy 4 bits each and are the inter-channel level difference parameters of each channel in the current channel pair, used to restore the amplitude of the decoded spectrum.
scaleFlag[ch1] and scaleFlag[ch2] occupy 1 bit each and are the scaling flag parameters of each channel in the current channel pair, indicating whether the amplitude of the current channel was scaled down or up.
chBitRatios occupies 4 bits and represents the bit allocation ratio of each channel.
The decoding process is as follows. First, mixed-coding bit pre-allocation is performed.
The role of the mixed-coding bit pre-allocation module is to take the scale factor index of the sound-bed signals' share of the total bit count decoded from the bitstream and, from the remaining available bit count after removing other side information, compute the sound-bed pre-allocated byte count and the object pre-allocated byte count for use by subsequent modules.
The available byte count of the current frame after deducting other side information is denoted availableBytes; the sound-bed pre-allocated byte count is bedAvailbleBytes and the object pre-allocated byte count is objAvailbleBytes. The scale factor index of the sound-bed signals' share of the total bit count is bedBitsRatio, and the floating-point scale factor corresponding to bedBitsRatio is bedBitsRatioFloat; their correspondence is given in the bedBitsRatio semantics above.
The formulas for computing the sound-bed pre-allocated byte count bedAvailbleBytes and the object pre-allocated byte count objAvailbleBytes from the available byte count availableBytes and the floating-point scale factor bedBitsRatioFloat are as follows:
bedAvailbleBytes=floor(availableBytes*bedBitsRatioFloat);
objAvailbleBytes=availableBytes-bedAvailbleBytes.
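A one-function C sketch of this pre-allocation, assuming the linear index-to-ratio mapping bedBitsRatio/16 implied by the table above; the function name is illustrative.
    #include <math.h>

    void preallocateMixedBits(int availableBytes, int bedBitsRatio,
                              int *bedAvailbleBytes, int *objAvailbleBytes)
    {
        double bedBitsRatioFloat = bedBitsRatio / 16.0;   /* index-to-ratio mapping */
        *bedAvailbleBytes = (int)floor(availableBytes * bedBitsRatioFloat);
        *objAvailbleBytes = availableBytes - *bedAvailbleBytes;
    }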
The mixed-coding bit allocation process is as follows. Mixed-coding bit allocation uses the bit allocation parameters in the bitstream, the available byte count, and other parameters to distribute the available bits to the downmix channels of the mixed-coding multi-channel stereo, enabling the subsequent range decoding, inverse quantization, and inverse neural network transform steps. Mixed-coding bit allocation includes the following parts:
Bit allocation for silent-frame channels. The role of this processing module is to complete the bit allocation of the silent frames of the mixed signal according to the allocation strategy parameter mixAllocStrategy of the mixed signal of sound-bed and object signals decoded from the bitstream, and the silent-frame marking parameters decoded from the bitstream, namely the mute enable flag HasSilFlag and the mute flags silFlag.
Step 1: mixed-coding silent-frame bit allocation processing.
The mixed-coding silent-frame bit allocation sub-module completes the bit allocation of the mixed-coding silent frames according to the silent-frame marking parameters HasSilFlag and silFlag decoded from the bitstream. The following cases and corresponding processing exist:
Case 1: when HasSilFlag is parsed as 0, the current frame has not enabled the silent-frame processing mode or contains no silent frames; the mixed-coding silent-frame bit allocation sub-module performs no further operation.
Case 2: when HasSilFlag is parsed as 1, the current frame has enabled silent-frame processing and silent frames exist. The silFlag[i] of all channels are then traversed; when silFlag[i] is 1, the channel's byte count channelBytes[i] is set to the minimum safe byte count safetyBytes, whose value is related to the input byte requirements of the quantization and range coding modules; for example, it can be set to 10 bytes here.
Update the object pre-allocated byte count objAvailbleBytes. For each object channel whose silFlag[i] is 1, perform:
objAvailbleBytes-=safetyBytes;
Update the sound-bed pre-allocated byte count bedAvailbleBytes. For each sound-bed channel whose silFlag[i] is 1, perform:
bedAvailbleBytes-=safetyBytes.
Step 2: silent-frame remaining bit allocation strategy.
The role of the silent-frame bit allocation strategy sub-module is, when silent frames exist, to decide according to the allocation strategy parameter mixAllocStrategy of the mixed signal of sound-bed and object signals decoded from the bitstream whether the remaining bits produced by the silent frames are allocated to the sound-bed signals or the object signals; the specific allocation strategy is determined by the value of mixAllocStrategy (see the mixAllocStrategy semantics for the meaning of its values).
The embodiments of the present application support two different silent-frame remaining bit allocation strategies. First, a pre-computation is performed:
The average allocated byte count per object channel objAvgBytes is computed from the object pre-allocated byte count objAvailbleBytes and the object channel count objNum as follows:
objAvgBytes[i]=floor(objAvailbleBytes/objNum);
If bytes remain after the even split, the remaining bytes are split into units of 1 byte and distributed a second time in ascending order of object signal index, i.e., while sum(objAvgBytes[i]) < objAvailbleBytes,
objAvgBytes[0] += 1, and the same operation is done for the other object channels objAvgBytes[i], ending when sum(objAvgBytes[i]) == objAvailbleBytes.
Scheme 1: when mixAllocStrategy is 0, define the object silent-frame remaining bytes objSilLeftBytes with an initial value of 0; traverse the silFlag[i] of all object channels, and when silFlag[i] = 1, update objSilLeftBytes, i.e.,
objSilLeftBytes += objAvgBytes[i] - safetyBytes; 0 <= i < objNum;
until all object channels have been traversed.
Scheme 2: when mixAllocStrategy is 1, define the object silent-frame remaining bytes objSilLeftBytes with an initial value of 0; traverse the silFlag[i] of all object channels, and when silFlag[i] = 1, update objSilLeftBytes, i.e.,
objSilLeftBytes += objAvgBytes[i] - safetyBytes; 0 <= i < objNum;
until all object channels have been traversed.
Update the sound-bed pre-allocated byte count bedAvailbleBytes and the object pre-allocated byte count objAvailbleBytes, e.g., as follows:
bedAvailbleBytes += objSilLeftBytes;
objAvailbleBytes -= objSilLeftBytes.
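The two steps above (safety bytes for mute object channels, then leftover redistribution) can be sketched as follows. The indices here are local object indices, and the update formulas just shown move the leftover into the bed pool, which this sketch ties to mixAllocStrategy == 1; treat that strategy mapping, like the function name, as an assumption.
    void allocSilentObjectChannels(const int *silFlag, const int *objAvgBytes,
                                   int objNum, int safetyBytes, int mixAllocStrategy,
                                   int *channelBytes,
                                   int *bedAvailbleBytes, int *objAvailbleBytes)
    {
        int objSilLeftBytes = 0;
        for (int i = 0; i < objNum; i++) {
            if (silFlag[i]) {
                channelBytes[i] = safetyBytes;          /* step 1: minimum safe bytes */
                *objAvailbleBytes -= safetyBytes;
                objSilLeftBytes += objAvgBytes[i] - safetyBytes;   /* step 2: leftover */
            }
        }
        if (mixAllocStrategy == 1) {                    /* leftover moves to the bed pool */
            *bedAvailbleBytes += objSilLeftBytes;
            *objAvailbleBytes -= objSilLeftBytes;
        }
        /* mixAllocStrategy == 0: the leftover stays in the object byte pool */
    }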
Adaptation before non-silent-frame bit allocation. The input parameters of the non-silent-frame channel bit allocation are mapped so that the channels are contiguously arranged (the presence of silent-frame channels may leave the non-silent-frame channels physically scattered), facilitating the subsequent non-silent-frame channel bit allocation processing.
Bit allocation for non-silent-frame channels. Bit allocation for the sound-bed non-silent-frame channels uses the common bit allocation module, whose role is to distribute the available bits to the downmix channels of the bed-object multi-channel stereo according to the updated sound-bed pre-allocated byte count bedAvailbleBytes, the channel bit allocation ratios, and other parameters.
The input available byte count is denoted availableBytes. The multi-channel stereo mode may include an LFE channel; in general the LFE channel carries little effective spectral information and need not participate in the bit allocation process of the multi-channel stereo mode, a fixed number of bits being pre-allocated to it instead. The LFE pre-allocated bit count depends on the encoding rate. Denote the average bit rate per channel pair cpeRate, which is the total encoding rate converted to one channel pair. If cpeRate < 64 kb/s, the LFE channel is allocated 10 bytes; if cpeRate < 96 kb/s, 15 bytes; if cpeRate >= 96 kb/s, 20 bytes. If an LFE channel exists, its pre-allocated byte count is deducted from the available byte count availableBytes, and the remaining bytes are allocated to the channels other than the LFE channel.
The process of allocating the available byte count availableBytes to the remaining channels consists of four steps, as follows:
Step 1: allocate bits to the channels according to chBitRatios.
The byte count of each channel can be expressed as:
channelBytes[i]=availableBytes*chBitRatios[i]/(1<<4).
where (1<<4) is the maximum value range of the channel bit allocation ratio chBitRatios.
Step 2: if not all bytes were allocated in step 1, allocate the remaining bytes to the channels again in the proportions indicated by chBitRatios[i].
Step 3: if bits still remain after step 2, allocate the remaining bits to the channel that received the most bytes in step 1.
Step 4: if the byte count allocated to some channel exceeds the single-channel byte upper limit, allocate the excess to the remaining channels.
Bit allocation for the object non-silent-frame channels also uses the common bit allocation module, whose role is to distribute the available bits to the downmix channels of the bed-object multi-channel stereo according to the updated object available byte count objAvailbleBytes, the channel bit allocation ratios, and other parameters. The specific bit allocation processing for the object non-silent-frame channels is the same as that for the sound-bed non-silent-frame channels.
Non-silent-frame channel adaptation restoration. The byte-count parameters output by the non-silent-frame channel bit allocation processing are inversely mapped back to the physical arrangement according to the foregoing rule (the presence of silent-frame channels may leave the non-silent-frame channels physically scattered), facilitating the subsequent range decoding, inverse quantization, and inverse neural network transform steps.
Mixed-coding upmix. For the two paired channels ch1 and ch2 indicated by the channel pair index channelPairIndex, Mid/Side (M/S) upmixing is performed in the same way as in the two-channel stereo mode.
After M/S upmixing, inverse interaural level difference (Interaural Level Difference, ILD) processing must be applied to the Modified Discrete Cosine Transform (MDCT) spectrum of the upmixed channels to restore the amplitude differences between channels. The inverse ILD processing takes the following form (the formula is garbled in the source; a plausible reconstruction, consistent with the pseudocode given later for the same operation, is):
mdctSpectrum[i] = factor * mdctSpectrum[i];
where factor is the amplitude adjustment factor corresponding to the ILD parameter of the i-th channel, (1<<4) is the maximum quantization value range of mcIld, and mdctSpectrum[i] is the MDCT coefficient vector of the i-th channel.
The technical effects of the embodiments of the present application are as follows: when the multi-channel signal is a mixed signal containing sound-bed signals and object signals and the multi-channel signal contains silent frames, different allocation strategies mixAllocStrategy of the mixed signal including sound-bed and object signals are adopted to allocate the bits saved on silent frames to other, non-silent frames, improving encoding efficiency.
The improvements of the embodiments of the present application are as follows: determining the sound-bed pre-allocated bit count bedAvailbleBytes and the object pre-allocated total bit count objAvailbleBytes; determining whether the sound bed and the objects include silent frames; if silent frames exist, allocating bits to the silent-frame channels according to the side information silFlag[i] and mixAllocStrategy, and updating bedAvailbleBytes and objAvailbleBytes.
The embodiments of the present application propose a method for the bit allocation mode bitstream in the bed-object mixed mode: parsing from the bitstream the allocation strategy mixAllocStrategy of the mixed signal including sound-bed and object signals, and allocating bits to the silent-frame channels according to that allocation strategy.
Determine the sound-bed pre-allocated bit count bedAvailbleBytes and the object pre-allocated total bit count objAvailbleBytes; determine whether the sound bed and the objects include silent frames; if silent frames exist, allocate bits to the silent-frame channels according to the side information silFlag[i] and mixAllocStrategy, and update bedAvailbleBytes and objAvailbleBytes.
Parse the mute flag information (including HasSilFlag and silFlag[i]) from the bitstream; determine from the mute flag information whether silent frames exist.
Allocate bits to the silent-frame channels according to the side information silFlag[i] and mixAllocStrategy, and update the sound-bed pre-allocated bit count bedAvailbleBytes and the object pre-allocated total bit count objAvailbleBytes.
Determine, according to the obtained allocation strategy parameter mixAllocStrategy of the mixed signal including sound-bed and object signals, whether the remaining bits produced by the silent frames are allocated to the sound-bed signals or the object signals.
mixAllocStrategy occupies 2 bits and represents the allocation strategy of the mixed signal including sound-bed and object signals. The value range and meanings are as follows:
0: surplus bits produced by the mute mechanism that belong to sound-bed signals are allocated to other sound-bed signals; surplus bits that belong to object signals are allocated to other object signals.
1: surplus bits produced by the mute mechanism that belong to sound-bed signals are allocated to other sound-bed signals; surplus bits that belong to object signals are allocated to other sound-bed signals.
2: surplus bits produced by the mute mechanism that belong to sound-bed signals are allocated to other object signals; surplus bits that belong to object signals are allocated to other object signals.
3: reserved.
The two different silent-frame remaining bit allocation strategies correspond to the specific remaining bit allocation methods described above. When the multi-channel signal is a mixed signal containing sound-bed signals and object signals, treating the object signals as sound-bed signals and allocating bits jointly under a single unified strategy lets the sound-bed and object signals affect each other, degrading the quality of both.
The embodiments of the present application propose a bit allocation bitstream method in the bed-object mixed mode, specifically:
When the multi-channel signal is a mixed signal containing sound-bed signals and object signals, a bit allocation scale factor is obtained by decoding the bitstream; the bit allocation scale factor characterizes the relationship between the coded bit count of the sound-bed signals and/or the object channel signals and the total available bit count;
According to the bit allocation scale factor, the pre-allocated bit count bedAvailbleBytes of the sound-bed signals and the pre-allocated bit count objAvailbleBytes of the object signals are determined;
According to bedAvailbleBytes and objAvailbleBytes, the bit allocation count of each channel is determined;
Decoding is performed according to the per-channel bit allocation counts and the bitstream to obtain the decoded multi-channel signal.
The bit allocation scale factor is the scale factor of the sound-bed signals' coded bit count relative to the total available bit count (bedBitsRatioFloat in the embodiment), or the scale factor of the object signals' coded bit count relative to the total available bit count, or the ratio of the sound-bed coded bit count to the object coded bit count, or the ratio of the object coded bit count to the sound-bed coded bit count.
When the bit allocation scale factor is the scale factor of the sound-bed signals' coded bit count relative to the total available bit count, the specific method of determining it is: parsing the bit allocation scale factor index (e.g., bedBitsRatio in the embodiment) from the bitstream and determining the bit allocation scale factor (e.g., bedBitsRatioFloat in the embodiment) from the bit allocation scale factor index.
The bit allocation scale factor index can be the coding index obtained by uniformly quantizing and encoding the bit allocation scale factor, or the coding index obtained by non-uniformly quantizing and encoding it.
The bit allocation scale factor index and the bit allocation scale factor can be in a linear or a non-linear relationship.
The formulas for computing the sound-bed pre-allocated byte count bedAvailbleBytes and the object pre-allocated byte count objAvailbleBytes from the available byte count availableBytes and the floating-point scale factor bedBitsRatioFloat of the sound bed's share of the total bit count are as follows:
bedAvailbleBytes=floor(availableBytes*bedBitsRatioFloat);
objAvailbleBytes=availableBytes-bedAvailbleBytes.
The mute flag information (including HasSilFlag and silFlag[i]) is parsed from the bitstream, and bit allocation is performed according to bedAvailbleBytes, objAvailbleBytes, and the mute flag information to determine the bit allocation count of each channel.
The steps of mixed-coding bit allocation: determine from the mute flag information whether silent frames exist; if silent frames exist, allocate bits to the silent-frame channels according to the side information silFlag[i] (and mixAllocStrategy) and update bedAvailbleBytes and objAvailbleBytes; allocate bits to the non-silent-frame channels according to the non-silent-frame bit allocation principle (including the three steps of non-silent-frame bit allocation adaptation, non-silent-frame bit allocation, and non-silent-frame bit allocation adaptation restoration).
The encoder determines the bit allocation scale factor;
quantizes and encodes the factor to obtain the bit allocation scale factor index;
and writes the index into the bitstream.
The bit allocation scale factor index and the bit allocation scale factor can be in a linear or a non-linear relationship.
The scale factor can be predefined according to encoding parameters.
The encoding parameters include the encoding rate and signal characteristic parameters; the encoding parameters can be predetermined.
Alternatively, the encoding parameters can be adaptively determined according to the characteristics of each frame of the signal, e.g., the signal type.
The encoder determines the mixed allocation strategy, carries the mixed allocation strategy in the bitstream, and sends it to the decoder.
When the mute enable flag contains an object mute enable flag and a sound-bed mute enable flag, the allocation strategy of the bed-object mixed signal can also contain other modes, for example:
Mode 1: when the object mute enable flag is 1, the surplus bits produced by mute channels in the object signals are allocated to the other non-mute channels among the object channels;
Mode 2: when the object mute enable flag is 1, the surplus bits produced by mute channels in the object signals are allocated to the channels of the sound-bed signals;
Mode 3: when the sound-bed mute enable flag is 1, the surplus bits produced by mute channels in the sound-bed signals are allocated to the other non-mute channels among the sound-bed channels;
Mode 4: when the sound-bed mute enable flag is 1, the surplus bits produced by mute channels in the sound-bed signals are allocated to the channels of the object signals;
Mode 5: when both the sound-bed and object mute enable flags are 1, the surplus bits produced by mute channels in the object signals are allocated to the other non-mute channels among the object channels;
Mode 6: when both the sound-bed and object mute enable flags are 1, the surplus bits produced by mute channels in the object signals are allocated to the other non-mute channels among the sound-bed channels.
In other embodiments of the present application, the improved mixed-signal coding scheme is as follows:
The mixed-signal coding mode in the AVS3P3 standard supports encoding and decoding of sound-bed signals and object signals. In practical applications, sound-bed and object signals contain a large number of silent frames, and handling silent frames properly can effectively improve the coding efficiency of mixed signals. This proposal therefore gives an efficient mixed-signal coding method that improves mixed-signal coding quality through proper bit allocation between silent and non-silent frames in the sound-bed and object signals. At the same time, the bit allocation strategy of the mixed signal is implemented on the encoder side, and the decoder does not distinguish between bed and objects in the bit allocation stage. The specific implementation includes:
The mute enable flag is denoted HasSilFlag, and the mute flag of the i-th channel is denoted silFlag[i]; the mute enable flag acts on the channel signals of the multi-channel signal other than the LFE channel signal. For example, HasSilFlag indicates whether silent frames exist among the channels other than the LFE channel. For each channel other than the LFE channel, the corresponding silFlag indicates whether that channel is a silent frame.
chBitRatios[i] changes from being present only for non-LFE channels to being present only for non-LFE, non-mute channels; the bit count of chBitRatios[i] changes from 4 to 6;
The ILD side information changes from a 4-bit inter-channel level difference parameter plus a 1-bit scaling flag parameter to a 5-bit scale factor codebook index.
The multi-channel stereo decoding syntax, the Avs3McDec() syntax, is shown in Table 2. [Table 2 is not reproduced in this text.]
The multi-channel stereo side information syntax, the DecodeMcSideBits() syntax, is shown in Table 3. [Table 3 is not reproduced in this text.]
In the semantics, McBitsAllocationHasSiL() denotes the multi-channel stereo bit allocation.
coupleChNum is the number of channels of the multi-channel signal excluding the LFE channel.
HasSilFlag occupies 1 bit and indicates whether silent frames exist among the channels of the current frame of the audio signal; 0 means no silent frame, 1 means silent frames exist.
silFlag[i] occupies 1 bit; 0 means the i-th channel is a non-silent frame, 1 means the i-th channel is a silent frame.
mcIld[ch1] and mcIld[ch2] occupy 5 bits each and are the quantization codebook indices of the inter-channel level difference ILD parameter of each channel in the current channel pair, used to restore the amplitude of the decoded spectrum.
pairCnt occupies 4 bits and represents the number of channel pairs of the current frame.
The channel pair index is denoted channelPairIndex; its bit count depends on the total number of channels (see note 1 of the table above). It represents the index of a channel pair, from which the index values of the two channels in the current channel pair, i.e., ch1 and ch2, can be parsed.
chBitRatios occupies 6 bits and represents the bit allocation ratio of each channel.
The decoding process is as follows:
Mixed-signal bit allocation. According to the mute channel flags and the bit allocation ratio parameters decoded from the bitstream, mixed-signal bit allocation distributes the remaining available bits after removing other side information to the downmix channels of the multi-channel stereo, enabling the subsequent range decoding, inverse quantization, and inverse neural network transform steps.
The available byte count of the current frame after deducting other side information is denoted availableBytes.
The multi-channel stereo mode may include mute channels. Mute channels need not participate in the bit allocation process of the multi-channel stereo mode; a fixed byte count of 8 is pre-allocated to them instead. If mute channels exist, their pre-allocated byte counts are deducted from the available byte count availableBytes, and the remaining bytes are allocated to the channels other than the mute channels.
The process of allocating the available byte count availableBytes to the remaining channels consists of five steps, as follows:
Step 1: each channel is pre-allocated a safety byte count safeBits of 8. The safety bytes are deducted from the available byte count availableBytes, and the remaining availableBytes continues to the subsequent allocation steps.
Step 2: allocate bits to the channels according to chBitRatios; the byte count of each channel can be expressed as:
channelBytes[i]=availableBytes*chBitRatios[i]/(1<<6).
where (1<<6) is the maximum value range of the channel bit allocation ratio chBitRatios.
Step 3: if not all bytes were allocated in step 2, allocate the remaining bytes to the channels again in the proportions indicated by chBitRatios[i].
Step 4: if bits still remain after step 3, allocate the remaining bits to the channel that received the most bytes in step 2.
Step 5: if the byte count allocated to some channel exceeds the single-channel byte upper limit, allocate the excess to the remaining channels.
Next, the upmix process is described. For the two paired channels ch1 and ch2 indicated by the channel pair index channelPairIndex, M/S upmixing is performed in the same way as in the two-channel stereo mode. After M/S upmixing, inverse ILD processing must be applied to the MDCT spectrum of the upmixed channels to restore the amplitude differences between channels; the pseudocode of the inverse ILD processing is as follows:
factor=mcIldCodebook[mcIld[i]];
mdctSpectrum[i]=factor*mdctSpectrum[i];
where factor is the amplitude adjustment factor corresponding to the ILD parameter of the i-th channel, mcIldCodebook is the quantization codebook of the ILD parameter shown in Table 4 below, mcIld[i] is the codebook index corresponding to the ILD parameter of the i-th channel, and mdctSpectrum[i] is the MDCT coefficient vector of the i-th channel. Table 4 is the mcILD code table. [Table 4 is not reproduced in this text.]
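A C sketch of the inverse ILD step applied to one channel's MDCT coefficient vector follows; since the contents of the mcIldCodebook table are not reproduced in the source, the codebook array is left as an input, and the function name is illustrative.
    void inverseIld(float *mdctSpectrum, int specLen,
                    int mcIldIdx, const float *mcIldCodebook)
    {
        float factor = mcIldCodebook[mcIldIdx];   /* amplitude adjustment factor */
        for (int k = 0; k < specLen; k++)
            mdctSpectrum[k] *= factor;            /* element-wise scaling */
    }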
It should be noted that, for simplicity of description, the foregoing method embodiments are all expressed as a series of action combinations; however, those skilled in the art should know that the present application is not limited by the described order of actions, since according to the present application some steps may be performed in other orders or simultaneously. Furthermore, those skilled in the art should also know that the embodiments described in the specification are all preferred embodiments, and the actions and modules involved are not necessarily required by the present application.
To facilitate better implementation of the above solutions of the embodiments of the present application, related apparatuses for implementing the above solutions are provided below.
Referring to Fig. 10, an encoding device 1000 provided by an embodiment of the present application may include: a mute flag information acquisition module 1001, a multi-channel encoding module 1002, and a bitstream generation module 1003, where
the mute flag information acquisition module is configured to acquire mute flag information of a multi-channel signal, the mute flag information including: a mute enable flag and/or mute flags;
the multi-channel encoding module is configured to perform multi-channel encoding processing on the multi-channel signal to obtain transport channel signals of the transport channels;
the bitstream generation module is configured to generate a bitstream according to the transport channel signals of the transport channels and the mute flag information, the bitstream including: the mute flag information and the multi-channel encoding result of the transport channel signals.
Referring to Fig. 11, a decoding device 1100 provided by an embodiment of the present application may include: a parsing module 1101 and a processing module 1102, where
the parsing module is configured to parse the mute flag information from the bitstream of the encoding device and determine the encoded information of each transport channel according to the mute flag information, the mute flag information including: a mute enable flag and/or mute flags;
the processing module is configured to decode the encoded information of each transport channel to obtain the decoded signal of each transport channel;
the processing module is further configured to perform multi-channel decoding processing on the decoded signals of the transport channels to obtain a multi-channel decoded output signal.
It should be noted that, since the information exchange and execution processes among the modules/units of the above apparatuses are based on the same conception as the method embodiments of the present application, their technical effects are the same as those of the method embodiments of the present application; for details, refer to the descriptions in the foregoing method embodiments of the present application, which are not repeated here.
An embodiment of the present application further provides a computer storage medium, where the computer storage medium stores a program, and the program executes some or all of the steps described in the above method embodiments.
Next, another encoding device provided by an embodiment of the present application is introduced. Referring to Fig. 12, the encoding device 1200 includes:
a receiver 1201, a transmitter 1202, a processor 1203, and a memory 1204 (the number of processors 1203 in the encoding device 1200 can be one or more, with one processor taken as an example in Fig. 12). In some embodiments of the present application, the receiver 1201, the transmitter 1202, the processor 1203, and the memory 1204 can be connected by a bus or in other ways, with bus connection taken as an example in Fig. 12.
The memory 1204 can include read-only memory and random access memory, and provides instructions and data to the processor 1203. A part of the memory 1204 can also include non-volatile random access memory (NVRAM). The memory 1204 stores an operating system and operation instructions, executable modules or data structures, or subsets or extended sets thereof, where the operation instructions can include various operation instructions for implementing various operations, and the operating system can include various system programs for implementing various basic services and handling hardware-based tasks.
The processor 1203 controls the operation of the encoding device; the processor 1203 can also be called a central processing unit (CPU). In a specific application, the components of the encoding device are coupled together through a bus system, which, besides a data bus, can also include a power bus, a control bus, a status signal bus, and the like. For clarity, however, all buses are referred to as the bus system in the figure.
The methods disclosed in the above embodiments of the present application can be applied to the processor 1203 or implemented by the processor 1203. The processor 1203 can be an integrated circuit chip with signal processing capability. During implementation, the steps of the above methods can be completed by integrated logic circuits of hardware in the processor 1203 or by instructions in software form. The above processor 1203 can be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and can implement or execute the methods, steps, and logical block diagrams disclosed in the embodiments of the present application. The general-purpose processor can be a microprocessor, or the processor can be any conventional processor, and the like. The steps of the methods disclosed in the embodiments of the present application can be directly embodied as being executed by a hardware decoding processor, or executed by a combination of hardware and software modules in a decoding processor. The software module can be located in a storage medium mature in the art, such as random access memory, flash memory, read-only memory, programmable read-only memory or electrically erasable programmable memory, or a register. The storage medium is located in the memory 1204, and the processor 1203 reads the information in the memory 1204 and completes the steps of the above methods in combination with its hardware.
The receiver 1201 can be used to receive input digital or character information and generate signal inputs related to the settings and function control of the encoding device; the transmitter 1202 can include a display device such as a display screen, and the transmitter 1202 can be used to output digital or character information through an external interface.
In the embodiments of the present application, the processor 1203 is configured to execute the methods performed by the encoding device shown in the foregoing embodiments of Figs. 4, 6, and 7.
Next, another decoding device provided by an embodiment of the present application is introduced. Referring to Fig. 13, the decoding device 1300 includes:
a receiver 1301, a transmitter 1302, a processor 1303, and a memory 1304 (the number of processors 1303 in the decoding device 1300 can be one or more, with one processor taken as an example in Fig. 13). In some embodiments of the present application, the receiver 1301, the transmitter 1302, the processor 1303, and the memory 1304 can be connected by a bus or in other ways, with bus connection taken as an example in Fig. 13.
The memory 1304 can include read-only memory and random access memory, and provides instructions and data to the processor 1303. A part of the memory 1304 can also include NVRAM. The memory 1304 stores an operating system and operation instructions, executable modules or data structures, or subsets or extended sets thereof, where the operation instructions can include various operation instructions for implementing various operations, and the operating system can include various system programs for implementing various basic services and handling hardware-based tasks.
The processor 1303 controls the operation of the decoding device; the processor 1303 can also be called a CPU. In a specific application, the components of the decoding device are coupled together through a bus system, which, besides a data bus, can also include a power bus, a control bus, a status signal bus, and the like. For clarity, however, all buses are referred to as the bus system in the figure.
The methods disclosed in the above embodiments of the present application can be applied to the processor 1303 or implemented by the processor 1303. The processor 1303 can be an integrated circuit chip with signal processing capability. During implementation, the steps of the above methods can be completed by integrated logic circuits of hardware in the processor 1303 or by instructions in software form. The above processor 1303 can be a general-purpose processor, a DSP, an ASIC, an FPGA or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and can implement or execute the methods, steps, and logical block diagrams disclosed in the embodiments of the present application. The general-purpose processor can be a microprocessor, or the processor can be any conventional processor, and the like. The steps of the methods disclosed in the embodiments of the present application can be directly embodied as being executed by a hardware decoding processor, or executed by a combination of hardware and software modules in a decoding processor. The software module can be located in a storage medium mature in the art, such as random access memory, flash memory, read-only memory, programmable read-only memory or electrically erasable programmable memory, or a register. The storage medium is located in the memory 1304, and the processor 1303 reads the information in the memory 1304 and completes the steps of the above methods in combination with its hardware.
In the embodiments of the present application, the processor 1303 is configured to execute the methods performed by the decoding device shown in the foregoing embodiments of Figs. 5, 8, and 9.
In another possible design, when the encoding device or the decoding device is a chip within a terminal, the chip includes a processing unit and a communication unit; the processing unit can be, for example, a processor, and the communication unit can be, for example, an input/output interface, a pin, or a circuit. The processing unit can execute computer-executable instructions stored in a storage unit so that the chip in the terminal executes the audio encoding method of any one of the above first aspects or the audio decoding method of any one of the second aspects. Optionally, the storage unit is a storage unit within the chip, such as a register or a cache; the storage unit can also be a storage unit within the terminal located outside the chip, such as a read-only memory (ROM) or another type of static storage device capable of storing static information and instructions, or a random access memory (RAM), and the like.
The processor mentioned anywhere above can be a general-purpose central processing unit, a microprocessor, an ASIC, or one or more integrated circuits for controlling the execution of the programs of the methods of the above first or second aspect.
In addition, it should be noted that the apparatus embodiments described above are merely illustrative; the units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units, i.e., they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solutions of the embodiments. In addition, in the drawings of the apparatus embodiments provided by the present application, the connection relationships between modules indicate that they have communication connections, which can be specifically implemented as one or more communication buses or signal lines.
Through the description of the above implementations, those skilled in the art can clearly understand that the present application can be implemented by software plus necessary general-purpose hardware, and of course also by dedicated hardware including application-specific integrated circuits, dedicated CPUs, dedicated memories, dedicated components, and the like. In general, all functions completed by a computer program can be easily implemented with corresponding hardware, and the specific hardware structures used to implement the same function can also be diverse, such as analog circuits, digital circuits, or dedicated circuits. However, for the present application, software program implementation is the better implementation in most cases. Based on this understanding, the technical solutions of the present application, in essence or in the part contributing to the prior art, can be embodied in the form of a software product. The computer software product is stored in a readable storage medium, such as a computer floppy disk, USB drive, removable hard disk, ROM, RAM, magnetic disk, or optical disc, and includes several instructions to cause a computer device (which can be a personal computer, a server, a network device, etc.) to execute the methods described in the embodiments of the present application.
In the above embodiments, implementation can be in whole or in part by software, hardware, firmware, or any combination thereof. When implemented by software, implementation can be in whole or in part in the form of a computer program product.
The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions according to the embodiments of the present application are generated in whole or in part. The computer can be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions can be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions can be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wire (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wirelessly (e.g., infrared, radio, microwave, etc.). The computer-readable storage medium can be any usable medium that a computer can store, or a data storage device such as a server or data center integrating one or more usable media. The usable medium can be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., solid state disk (Solid State Disk, SSD)), and the like.

Claims (29)

  1. A method for encoding a multi-channel signal, characterized by comprising:
    acquiring mute flag information of a multi-channel signal, the mute flag information comprising: a mute enable flag, and/or mute flags;
    performing multi-channel encoding processing on the multi-channel signal to obtain transport channel signals of transport channels;
    generating a bitstream according to the transport channel signals of the transport channels and the mute flag information, the bitstream comprising: the mute flag information and a multi-channel encoding result of the transport channel signals.
  2. The method according to claim 1, characterized in that the multi-channel signal comprises: a sound-bed signal, and/or an object signal;
    the mute flag information comprises: the mute enable flag; the mute enable flag comprises: a global mute enable flag, or a partial mute enable flag, wherein
    the global mute enable flag is a mute enable flag acting on the multi-channel signal; or,
    the partial mute enable flag is a mute enable flag acting on some channels of the multi-channel signal.
  3. The method according to claim 2, characterized in that, when the mute enable flag is the partial mute enable flag,
    the partial mute enable flag is an object mute enable flag acting on the object signal, or the partial mute enable flag is a sound-bed mute enable flag acting on the sound-bed signal, or the partial mute enable flag is a mute enable flag acting on the channel signals of the multi-channel signal other than the low-frequency-effects LFE channel signal, or the partial mute enable flag is a mute enable flag acting on the channel signals of the multi-channel signal that participate in pairing.
  4. The method according to any one of claims 1 to 3, characterized in that the multi-channel signal comprises: a sound-bed signal and an object signal;
    the mute flag information comprises: the mute enable flag; the mute enable flag comprises: a sound-bed mute enable flag and an object mute enable flag,
    the mute enable flag occupying a first bit and a second bit, the first bit carrying the value of the sound-bed mute enable flag, and the second bit carrying the value of the object mute enable flag.
  5. The method according to any one of claims 1 to 4, characterized in that the mute flag information comprises: the mute enable flag;
    the mute enable flag indicates whether the mute flag detection function is enabled; or,
    the mute enable flag indicates whether the mute flags of the channels of the multi-channel signal need to be sent; or,
    the mute enable flag indicates whether all channels of the multi-channel signal are non-mute channels.
  6. The method according to any one of claims 1 to 5, characterized in that the acquiring mute flag information of a multi-channel signal comprises:
    acquiring the mute flag information according to control signaling input to the encoding device; or,
    acquiring the mute flag information according to encoding parameters of the encoding device; or,
    performing mute flag detection on the channels of the multi-channel signal to obtain the mute flag information.
  7. The method according to claim 6, characterized in that the mute flag information comprises: the mute enable flag and the mute flags;
    the performing mute flag detection on the channels of the multi-channel signal to obtain the mute flag information comprises:
    performing mute flag detection on the channels of the multi-channel signal to obtain the mute flags of the channels;
    determining the mute enable flag according to the mute flags of the channels.
  8. The method according to claim 1, characterized in that the mute flag information comprises: the mute flags; or, the mute flag information comprises: the mute enable flag and the mute flags;
    the mute flags indicate whether each channel on which the mute enable flag acts is a mute channel, the mute channel being a channel that does not need to be encoded or a channel that needs to be encoded at a low bit count.
  9. The method according to any one of claims 1 to 8, characterized in that, before the acquiring mute flag information of a multi-channel signal, the method further comprises:
    pre-processing the multi-channel signal to obtain a pre-processed multi-channel signal, the pre-processing comprising at least one of the following: transient detection, window type decision, time-frequency transform, frequency-domain noise shaping, time-domain noise shaping, band extension encoding;
    the acquiring mute flag information of a multi-channel signal comprises:
    performing the mute flag detection on the pre-processed multi-channel signal to obtain the mute flag information.
  10. The method according to any one of claims 1 to 8, characterized in that the method further comprises:
    pre-processing the multi-channel signal to obtain a pre-processed multi-channel signal, the pre-processing comprising at least one of the following: transient detection, window type decision, time-frequency transform, frequency-domain noise shaping, time-domain noise shaping, band extension encoding;
    correcting the mute flag information according to the pre-processed multi-channel signal.
  11. The method according to any one of claims 1 to 10, characterized in that the generating a bitstream according to the transport channel signals of the transport channels and the mute flag information comprises:
    adjusting an initial multi-channel processing manner according to the mute flag information to obtain an adjusted multi-channel processing manner;
    encoding the transport channel signals of the transport channels according to the adjusted multi-channel processing manner to obtain the bitstream.
  12. The method according to any one of claims 1 to 10, characterized in that the generating a bitstream according to the transport channel signals of the transport channels and the mute flag information comprises:
    performing bit allocation for the transport channels according to the mute flag information, the available bit count, and the multi-channel side information to obtain bit allocation results of the transport channels;
    encoding the transport channel signals of the transport channels according to the bit allocation results of the transport channels to obtain the bitstream.
  13. The method according to claim 12, characterized in that the performing bit allocation for the transport channels according to the mute flag information, the available bit count, and the multi-channel side information comprises:
    performing bit allocation for the transport channels according to the available bit count and the multi-channel side information, following the bit allocation strategy corresponding to the mute flag information.
  14. The method according to claim 12, characterized in that the multi-channel side information comprises: a channel bit allocation ratio,
    wherein the channel bit allocation ratio indicates the bit allocation ratio among the non-low-frequency-effects LFE channels of the multi-channel signal.
  15. The method according to claim 6 or 7, characterized in that the performing mute flag detection on the channels of the multi-channel signal comprises:
    determining the signal energy of each channel of the current frame according to the signals of the channels of the current frame of the multi-channel signal;
    determining the mute detection parameter of each channel of the current frame according to the signal energy of each channel of the current frame;
    determining the mute flag of each channel of the current frame according to the mute detection parameter of each channel of the current frame and a preset mute detection threshold.
  16. The method according to any one of claims 1 to 15, characterized in that the performing multi-channel encoding processing on the multi-channel signal to obtain the transport channel signals of the transport channels comprises:
    performing multi-channel signal screening on the multi-channel signal to obtain a screened multi-channel signal;
    performing pairing processing on the screened multi-channel signal to obtain a multi-channel paired signal;
    performing downmix processing and bit allocation processing on the multi-channel paired signal to obtain the transport channel signals of the transport channels and multi-channel side information.
  17. The method according to claim 16, characterized in that the multi-channel side information comprises at least one of the following: an inter-channel level difference parameter quantization codebook index, a channel pair count, a channel pair index, a channel bit allocation ratio;
    wherein the inter-channel level difference parameter quantization codebook index indicates the quantization codebook index of the inter-channel level difference ILD parameter of each channel of the multi-channel signal;
    the channel pair count represents the number of channel pairs of the current frame of the multi-channel signal;
    the channel pair index represents the index of a channel pair;
    the channel bit allocation ratio indicates the bit allocation ratio among the non-low-frequency-effects LFE channels of the multi-channel signal.
  18. A method for decoding a multi-channel signal, characterized by comprising:
    parsing mute flag information from a bitstream of an encoding device, and determining encoded information of each transport channel according to the mute flag information, the mute flag information comprising: a mute enable flag, and/or mute flags;
    decoding the encoded information of each transport channel to obtain a decoded signal of each transport channel;
    performing multi-channel decoding processing on the decoded signals of the transport channels to obtain a multi-channel decoded output signal.
  19. The method according to claim 18, characterized in that the parsing mute flag information from the bitstream of the encoding device comprises:
    parsing the mute flags of the channels from the bitstream; or,
    parsing the mute enable flag from the bitstream, and, if the mute enable flag takes a first value, parsing the mute flags from the bitstream; or,
    parsing a sound-bed mute enable flag and/or an object mute enable flag, and the mute flags of the channels, from the bitstream; or,
    parsing a sound-bed mute enable flag and/or an object mute enable flag from the bitstream; and parsing, according to the sound-bed mute enable flag and/or the object mute enable flag, the mute flags of a subset of the channels from the bitstream.
  20. The method according to claim 18, characterized in that the decoding the encoded information of each transport channel comprises:
    parsing multi-channel side information from the bitstream;
    performing bit allocation for the transport channels according to the multi-channel side information and the mute flag information to obtain the number of coded bits of each transport channel;
    decoding the encoded information of each transport channel according to the number of coded bits of each transport channel.
  21. The method according to claim 18, characterized in that, after the performing multi-channel decoding processing on the decoded signals of the transport channels to obtain a multi-channel decoded output signal, the method further comprises:
    post-processing the multi-channel decoded output signal, the post-processing comprising at least one of the following: band extension decoding, inverse time-domain noise shaping, inverse frequency-domain noise shaping, inverse time-frequency transform.
  22. The method according to claim 20, characterized in that the multi-channel side information comprises at least one of the following: an inter-channel level difference parameter quantization codebook index, a channel pair count, a channel pair index, a channel bit allocation ratio;
    wherein the inter-channel level difference parameter quantization codebook index indicates the quantization codebook index of the inter-channel level difference ILD parameter of each channel;
    the channel pair count represents the number of channel pairs of the current frame of the multi-channel signal;
    the channel pair index represents the index of a channel pair;
    the channel bit allocation ratio indicates the bit allocation ratio among the non-low-frequency-effects LFE channels of the multi-channel signal.
  23. An encoding device, characterized in that the encoding device comprises:
    a mute flag information acquisition module, configured to acquire mute flag information of a multi-channel signal, the mute flag information comprising: a mute enable flag, and/or mute flags;
    a multi-channel encoding module, configured to perform multi-channel encoding processing on the multi-channel signal to obtain transport channel signals of transport channels;
    a bitstream generation module, configured to generate a bitstream according to the transport channel signals of the transport channels and the mute flag information, the bitstream comprising: the mute flag information and a multi-channel encoding result of the transport channel signals.
  24. A decoding device, characterized in that the decoding device comprises:
    a parsing module, configured to parse mute flag information from a bitstream of an encoding device and determine encoded information of each transport channel according to the mute flag information, the mute flag information comprising: a mute enable flag, and/or mute flags;
    a processing module, configured to decode the encoded information of each transport channel to obtain a decoded signal of each transport channel;
    the processing module being further configured to perform multi-channel decoding processing on the decoded signals of the transport channels to obtain a multi-channel decoded output signal.
  25. A terminal device, characterized in that the terminal device comprises: a processor and a memory, the processor and the memory communicating with each other;
    the memory is configured to store instructions;
    the processor is configured to execute the instructions in the memory to perform the method according to any one of claims 1 to 17.
  26. A terminal device, characterized in that the terminal device comprises: a processor and a memory, the processor and the memory communicating with each other;
    the memory is configured to store instructions;
    the processor is configured to execute the instructions in the memory to perform the method according to any one of claims 18 to 22.
  27. A computer-readable storage medium comprising instructions that, when run on a computer, cause the computer to perform the method according to any one of claims 1 to 17 or 18 to 22.
  28. A computer program product containing instructions that, when run on a computer, causes the computer to perform the method according to any one of claims 1 to 17 or 18 to 22.
  29. A computer-readable storage medium, characterized in that it stores a bitstream generated by the method according to any one of claims 1 to 17.
PCT/CN2023/073845 2022-03-14 2023-01-30 Method for encoding and decoding multi-channel signal, encoding and decoding device, and terminal device WO2023173941A1 (zh)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CN202210254868 2022-03-14
CN202210254868.9 2022-03-14
CN202210699863.7A CN116798438A (zh) 2022-03-14 2022-06-20 Method for encoding and decoding multi-channel signal, encoding and decoding device, and terminal device
CN202210699863.7 2022-06-20

Publications (1)

Publication Number Publication Date
WO2023173941A1 (zh)

Family

ID=88022182

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/073845 WO2023173941A1 (zh) 2022-03-14 2023-01-30 一种多声道信号的编解码方法和编解码设备以及终端设备

Country Status (2)

Country Link
TW (1) TW202403728A (zh)
WO (1) WO2023173941A1 (zh)

Also Published As

Publication number Publication date
TW202403728A (zh) 2024-01-16


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23769452

Country of ref document: EP

Kind code of ref document: A1