WO2018058379A1 - Method, apparatus and system for processing a multi-channel audio signal - Google Patents

Method, apparatus and system for processing a multi-channel audio signal (一种处理多声道音频信号的方法、装置和系统)

Info

Publication number
WO2018058379A1
Authority
WO
WIPO (PCT)
Prior art keywords
frame
stereo parameter
nth frame
parameter set
nth
Prior art date
Application number
PCT/CN2016/100617
Other languages
English (en)
French (fr)
Inventor
王喆
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority to KR1020197011605A priority Critical patent/KR20190052122A/ko
Priority to EP21163871.3A priority patent/EP3910629A1/en
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority to KR1020227012057A priority patent/KR102480710B1/ko
Priority to CN202311261449.9A priority patent/CN117351965A/zh
Priority to KR1020217028255A priority patent/KR102387162B1/ko
Priority to MX2019003417A priority patent/MX2019003417A/es
Priority to CN201680010600.3A priority patent/CN108140393B/zh
Priority to CN202311262035.8A priority patent/CN117351966A/zh
Priority to BR112019005983-0A priority patent/BR112019005983B1/pt
Priority to PCT/CN2016/100617 priority patent/WO2018058379A1/zh
Priority to CN202311261321.2A priority patent/CN117476018A/zh
Priority to JP2019516957A priority patent/JP6790251B2/ja
Priority to CN202311267474.8A priority patent/CN117392988A/zh
Priority to EP16917134.5A priority patent/EP3511934B1/en
Publication of WO2018058379A1 publication Critical patent/WO2018058379A1/zh
Priority to US16/368,208 priority patent/US10593339B2/en
Priority to US16/781,421 priority patent/US10984807B2/en
Priority to US17/232,679 priority patent/US11922954B2/en

Classifications

    • G: PHYSICS
      • G10: MUSICAL INSTRUMENTS; ACOUSTICS
        • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
          • G10L19/00: Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
            • G10L19/005: Correction of errors induced by the transmission channel, if related to the coding algorithm
            • G10L19/008: Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
            • G10L19/012: Comfort noise or silence coding
            • G10L19/04: using predictive techniques
              • G10L19/16: Vocoder architecture
                • G10L19/18: Vocoders using multiple modes
                  • G10L19/24: Variable rate codecs, e.g. for generating different qualities using a scalable representation such as hierarchical encoding or layered encoding
          • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
            • G10L25/78: Detection of presence or absence of voice signals
    • H: ELECTRICITY
      • H04: ELECTRIC COMMUNICATION TECHNIQUE
        • H04S: STEREOPHONIC SYSTEMS
          • H04S3/00: Systems employing more than two channels, e.g. quadraphonic
            • H04S3/008: Systems employing more than two channels, e.g. quadraphonic, in which the audio signals are in digital form, i.e. employing more than two discrete digital channels
          • H04S2400/00: Details of stereophonic systems covered by H04S but not provided for in its groups
            • H04S2400/03: Aspects of down-mixing multi-channel audio to configurations with lower numbers of playback channels, e.g. 7.1 -> 5.1

Definitions

  • the present invention relates to the field of audio codec technology, and in particular, to a method, apparatus and system for processing a multi-channel audio signal.
  • in audio communication, the original audio signal of each frame is usually encoded at the transmitting end before transmission, so that the audio signal is compressed by the encoding.
  • the receiving end receives the signal, decodes it, and then recovers the original audio signal.
  • different types of encoding methods are adopted for different types of audio signals.
  • when the audio signal is a voice signal, a continuous coding method is generally adopted, that is, each frame of the voice signal is encoded separately.
  • a noise signal is usually encoded by a non-continuous coding method, that is, only one frame out of every several frames of the noise signal is encoded. For example, the noise signal may be encoded once every six frames: after the first noise frame is encoded, the second to seventh noise frames are not encoded and are transmitted as six No_Data frames, and the eighth noise frame is then encoded.
  • the above audio signal refers to a mono audio signal.
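The non-continuous (DTX-style) scheduling described above can be illustrated with a minimal Python sketch; the is_voice() detector, the frame labels, and the seven-frame SID period (one coded noise frame followed by six skipped frames, as in the example) are assumptions for illustration, not the patent's algorithm.

    # Minimal sketch of mono discontinuous transmission as described above.
    # The SID period mirrors the six-skipped-frames example in the text;
    # is_voice() is an assumed voice-activity detector.
    SID_PERIOD = 7

    def classify_frames(frames, is_voice):
        """Return a per-frame decision: 'SPEECH', 'SID', or 'NO_DATA'."""
        decisions = []
        since_coded_noise = SID_PERIOD  # force coding of the first noise frame
        for frame in frames:
            if is_voice(frame):
                decisions.append("SPEECH")          # voice frames are always coded
                since_coded_noise = SID_PERIOD      # next noise frame is coded again
            elif since_coded_noise >= SID_PERIOD:
                decisions.append("SID")             # low-rate noise description
                since_coded_noise = 1
            else:
                decisions.append("NO_DATA")         # frame is skipped entirely
                since_coded_noise += 1
        return decisions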
  • taking stereo communication as an example of two-channel communication, the two channels are the first channel and the second channel.
  • the transmitting end obtains, from the nth frame voice signal of the first channel and the nth frame voice signal of the second channel, the stereo parameters of one frame of downmix signal, where the downmix signal is a single-channel signal and n is a positive integer greater than zero.
  • the transmitting end mixes the nth frame voice signals of the two channels into one frame of downmix signal, encodes that frame of downmix signal, and finally sends the encoded downmix signal and the stereo parameters to the receiving end.
  • after receiving the encoded downmix signal and the stereo parameters, the receiving end decodes the downmix signal and restores it to a two-channel signal according to the stereo parameters.
  • compared with encoding each frame of the two-channel voice signal separately, this transmission mode greatly reduces the number of transmitted bits, thus achieving the purpose of compression.
  • for the noise signal in stereo communication, the same encoding method as for the voice signal is currently used; if the mono non-continuous encoding method were applied directly to stereo communication, the receiving end could not restore the noise signal, which degrades the subjective experience of the user at the receiving end.
  • the present invention provides a method, apparatus and system for processing a multi-channel audio signal, to solve the problem that the multi-channel audio communication system in the prior art cannot transmit audio signals non-continuously.
  • in a first aspect, a method for processing a multi-channel audio signal comprises: detecting, by an encoder, whether a voice signal is included in the Nth frame downmix signal, and encoding the Nth frame downmix signal when a voice signal is detected in it; when no voice signal is detected in the Nth frame downmix signal: if it is determined that the Nth frame downmix signal satisfies a preset audio frame coding condition, encoding the Nth frame downmix signal; if it is determined that the Nth frame downmix signal does not satisfy the preset audio frame coding condition, not encoding the Nth frame downmix signal; wherein the Nth frame downmix signal is obtained from the Nth frame audio signals of two channels of the multichannel based on a predetermined first algorithm, and N is a positive integer greater than zero.
  • since the encoder encodes the downmix signal only when the downmix signal includes a speech signal or the downmix signal satisfies the preset audio frame encoding condition, and otherwise does not encode it, the encoder implements non-continuous coding of the downmix signal, which improves the compression efficiency of the downmix signal.
  • optionally, the preset audio frame coding condition includes being the first frame downmix signal; that is, even when the first frame downmix signal does not include a voice signal, the first frame downmix signal satisfies the preset audio frame encoding condition and is encoded.
  • in order to achieve greater compression efficiency for the downmix signal, optionally, when the encoder detects that the Nth frame downmix signal includes a voice signal, it encodes the Nth frame downmix signal according to a preset speech frame coding rate; when the encoder detects that the Nth frame downmix signal does not include a voice signal: if it is determined that the Nth frame downmix signal satisfies the preset speech frame coding condition, the Nth frame downmix signal is encoded according to the preset speech frame coding rate; if it is determined that the Nth frame downmix signal does not satisfy the preset speech frame coding condition but satisfies a preset SID coding condition, the Nth frame downmix signal is encoded according to a preset SID coding rate; wherein the SID coding rate is smaller than the speech frame coding rate.
  • performing SID coding on the Nth frame downmix signal at the preset SID coding rate further improves the compression efficiency of the downmix signal compared with speech signal coding.
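A minimal sketch of this encoder-side rate decision follows; the predicate names and the example bit rates are assumptions for illustration only, not values taken from the patent.

    # Minimal sketch of the encoder-side decision for one downmix frame, assuming
    # hypothetical predicates (contains_voice, meets_speech_frame_condition,
    # meets_sid_condition); the names and rates are illustrative.
    SPEECH_RATE_BPS = 13200   # example speech-frame coding rate (assumed)
    SID_RATE_BPS = 2400       # example SID coding rate, lower than the speech rate

    def encode_downmix_frame(frame, contains_voice,
                             meets_speech_frame_condition, meets_sid_condition):
        if contains_voice(frame):
            return ("SPEECH", SPEECH_RATE_BPS)        # always code voice frames
        if meets_speech_frame_condition(frame):
            return ("SPEECH", SPEECH_RATE_BPS)        # e.g. hangover after voice
        if meets_sid_condition(frame):
            return ("SID", SID_RATE_BPS)              # low-rate noise description
        return ("NO_DATA", 0)                         # frame is not encoded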
  • in addition to the downmix signal, the stereo parameter set also needs to be encoded; optionally, the encoder performs discontinuous encoding on the stereo parameter set as well.
  • specifically, the encoder obtains the Nth frame stereo parameter set according to the Nth frame audio signal, and encodes the Nth frame stereo parameter set when the Nth frame downmix signal is detected to include a speech signal; when the Nth frame downmix signal is detected to include no speech signal: if it is determined that the Nth frame stereo parameter set satisfies a preset stereo parameter encoding condition, at least one stereo parameter in the Nth frame stereo parameter set is encoded; if it is determined that the Nth frame stereo parameter set does not satisfy the preset stereo parameter encoding condition, the stereo parameter set is not encoded; wherein the Nth frame stereo parameter set includes Z stereo parameters, the Z stereo parameters include the parameters used by the encoder to downmix the Nth frame audio signals based on the predetermined algorithm, and Z is a positive integer greater than zero.
  • optionally, before encoding at least one stereo parameter in the Nth frame stereo parameter set, the encoder obtains X target stereo parameters from the Z stereo parameters in the Nth frame stereo parameter set according to a preset stereo parameter dimension-reduction rule, and then encodes the X target stereo parameters, where X is a positive integer greater than zero and less than or equal to Z.
  • the preset stereo parameter dimension-reduction rule may be a preset stereo parameter type, that is, selecting the X stereo parameters matching the preset stereo parameter type from the Nth frame stereo parameter set; or a preset number of stereo parameters, that is, selecting X stereo parameters from the Nth frame stereo parameter set; or reducing the resolution in the time domain or the frequency domain of at least one stereo parameter in the Nth frame stereo parameter set, that is, determining X target stereo parameters based on the Z stereo parameters according to the reduced resolution of the at least one stereo parameter in the time domain or the frequency domain.
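The dimension-reduction step can be sketched as follows; the dict-of-arrays representation, the parameter names, and the specific rules (keep selected types, average neighbouring subbands) are assumed examples of the preset rules described above, not the patent's definitions.

    # Minimal sketch of stereo-parameter dimension reduction before encoding.
    import numpy as np

    def reduce_by_type(param_set, keep_types=("ILD", "ITD")):
        """Keep only parameters whose type matches a preset list."""
        return {name: values for name, values in param_set.items()
                if name in keep_types}

    def reduce_frequency_resolution(param_set, group=2):
        """Lower the frequency resolution by averaging neighbouring subband values."""
        reduced = {}
        for name, values in param_set.items():
            values = np.asarray(values, dtype=float)
            trimmed = values[: len(values) // group * group]
            reduced[name] = trimmed.reshape(-1, group).mean(axis=1)
        return reduced

    # Example: Z parameters per frame reduced to X target parameters.
    frame_params = {"ILD": [3.0, 2.5, 2.8, 2.6], "ITD": [12.0],
                    "IPD": [0.1, 0.2, 0.15, 0.12]}
    print(reduce_by_type(frame_params))
    print(reduce_frequency_resolution(frame_params, group=2))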
  • the compression efficiency of the multi-channel communication system can also be improved by the following methods:
  • when the encoder detects that the Nth frame audio signal includes a voice signal: it obtains the Nth frame stereo parameter set according to the Nth frame audio signal based on a first stereo parameter set generation manner, and encodes the Nth frame stereo parameter set;
  • when the encoder detects that the Nth frame audio signal does not include a voice signal: if it is determined that the Nth frame audio signal satisfies the preset speech frame coding condition, the Nth frame stereo parameter set is obtained according to the Nth frame audio signal based on the first stereo parameter set generation manner and is encoded; if it is determined that the Nth frame audio signal does not satisfy the preset speech frame coding condition, the Nth frame stereo parameter set is obtained according to the Nth frame audio signal based on a second stereo parameter set generation manner, and at least one stereo parameter in the Nth frame stereo parameter set is encoded when it is determined that the Nth frame stereo parameter set satisfies the preset stereo parameter encoding condition; if it is determined that the Nth frame stereo parameter set does not satisfy the preset stereo parameter encoding condition, the stereo parameter set is not encoded.
  • the first stereo parameter set generation manner and the second stereo parameter set generation manner satisfy at least one of the following conditions:
  • the number of stereo parameter types included in the stereo parameter set specified by the first stereo parameter set generation manner is not less than that specified by the second stereo parameter set generation manner; the number of stereo parameters included in the stereo parameter set specified by the first stereo parameter set generation manner is not less than that specified by the second stereo parameter set generation manner; the resolution in the time domain of a stereo parameter specified by the first stereo parameter set generation manner is not lower than the resolution in the time domain of the corresponding stereo parameter specified by the second stereo parameter set generation manner; and the resolution in the frequency domain of a stereo parameter specified by the first stereo parameter set generation manner is not lower than the resolution in the frequency domain of the corresponding stereo parameter specified by the second stereo parameter set generation manner.
  • optionally, when the Nth frame downmix signal includes a voice signal, the encoder encodes the Nth frame stereo parameter set according to a first coding mode; when the Nth frame downmix signal does not include a voice signal but satisfies the speech frame coding condition, at least one stereo parameter in the Nth frame stereo parameter set is encoded according to the first coding mode; when the Nth frame downmix signal does not satisfy the speech frame coding condition, at least one stereo parameter in the Nth frame stereo parameter set is encoded according to a second coding mode;
  • wherein the coding rate specified by the first coding mode is not less than the coding rate specified by the second coding mode; and/or, for any stereo parameter in the Nth frame stereo parameter set, the quantization precision specified by the first coding mode is not lower than the quantization precision specified by the second coding mode.
  • for example, when the Nth frame stereo parameter set includes IPD and ITD, the quantization precision of the IPD specified by the first coding mode is not lower than the quantization precision of the IPD specified by the second coding mode, and the quantization precision of the ITD specified by the first coding mode is not lower than the quantization precision of the ITD specified by the second coding mode.
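As a small illustration of two coding modes that differ only in quantization precision, consider the sketch below; the step sizes are assumed values, not those of the patent.

    # Sketch of two coding modes that differ in quantization precision for a
    # stereo parameter (here ILD in dB); step sizes are assumed for illustration.
    def quantize(value, step):
        return round(value / step) * step

    ILD_STEP_MODE1 = 0.5   # first coding mode: finer steps, more bits (assumed)
    ILD_STEP_MODE2 = 2.0   # second coding mode: coarser steps, fewer bits (assumed)

    ild = 3.3
    print(quantize(ild, ILD_STEP_MODE1))  # 3.5
    print(quantize(ild, ILD_STEP_MODE2))  # 4.0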
  • optionally, when at least one stereo parameter in the Nth frame stereo parameter set includes the inter-channel level difference ILD, the preset stereo parameter encoding condition includes: D_L ≥ D_0, where D_L represents the degree of deviation of the ILD from a first criterion, and the first criterion is determined according to a predetermined second algorithm from the T-frame stereo parameter sets before the Nth frame stereo parameter set, T being a positive integer greater than 0.
  • optionally, when at least one stereo parameter in the Nth frame stereo parameter set includes the inter-channel time difference ITD, the preset stereo parameter encoding condition includes: D_T ≥ D_1, where D_T represents the degree of deviation of the ITD from a second criterion, and the second criterion is determined according to a predetermined third algorithm from the T-frame stereo parameter sets before the Nth frame stereo parameter set, T being a positive integer greater than 0.
  • optionally, when at least one stereo parameter in the Nth frame stereo parameter set includes the inter-channel phase difference IPD, the preset stereo parameter encoding condition includes: D_P ≥ D_2, where D_P represents the degree of deviation of the IPD from a third criterion, and the third criterion is determined according to a predetermined fourth algorithm from the T-frame stereo parameter sets before the Nth frame stereo parameter set, T being a positive integer greater than 0.
  • the second algorithm, the third algorithm, and the fourth algorithm are preset according to actual conditions.
  • D_L, D_T, and D_P respectively satisfy preset expressions in which: ILD(m) is the level difference when the two channels respectively transmit the Nth frame audio signal in the mth subband; M is the total number of subbands occupied by the Nth frame audio signal; T is a positive integer greater than 0; ILD^[-t](m) is the level difference when the two channels respectively transmit, in the mth subband, the tth frame audio signal before the Nth frame audio signal; ITD is the time difference when the two channels respectively transmit the Nth frame audio signal; the ITD average is the average value of the ITD over the T-frame stereo parameter sets before the Nth frame; ITD^[-t] is the time difference when the two channels respectively transmit the tth frame audio signal before the Nth frame audio signal; IPD(m) is the phase difference when the two channels respectively transmit, in the mth subband, the Nth frame audio signal; the IPD average is the average value of the IPD in the mth subband over the T-frame stereo parameter sets before the Nth frame; and IPD^[-t](m) is the phase difference when the two channels respectively transmit, in the mth subband, the tth frame audio signal before the Nth frame audio signal.
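One plausible reading of these deviation measures, sketched below, compares the current frame's parameters with their average over the previous T frames; the mean-absolute-deviation form is an assumption, since the patent's exact expressions are not reproduced in this extract.

    # Assumed instantiation of D_L and D_T: mean absolute deviation of the
    # current frame's parameters from their average over the previous T frames.
    # D_P for the IPD could be formed analogously per subband.
    import numpy as np

    def deviation_ild(ild_current, ild_history):
        """ild_current: (M,) per-subband ILD of frame N.
        ild_history: (T, M) per-subband ILD of the previous T frames."""
        return float(np.mean(np.abs(ild_current - np.mean(ild_history, axis=0))))

    def deviation_itd(itd_current, itd_history):
        """itd_current: scalar ITD of frame N; itd_history: (T,) previous ITDs."""
        return abs(itd_current - float(np.mean(itd_history)))

    def should_encode(deviation, threshold):
        """Encode the parameter only when it has drifted enough from its history."""
        return deviation >= threshold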
  • in a second aspect, a method for processing a multi-channel audio signal comprises: a decoder receiving a code stream, the code stream comprising at least two frames, among which there is at least one frame of a first type and at least one frame of a second type, wherein the first type of frame includes a downmix signal and the second type of frame does not include a downmix signal.
  • for the Nth frame code stream, N being a positive integer greater than one: if the decoder determines that the Nth frame code stream is the first type of frame, the decoder decodes the Nth frame code stream to obtain the Nth frame downmix signal; if the decoder determines that the Nth frame code stream is the second type of frame, the decoder determines an m-frame downmix signal from at least one frame of downmix signal before the Nth frame downmix signal according to a preset first rule, and obtains the Nth frame downmix signal from the m-frame downmix signal based on a predetermined first algorithm, where m is a positive integer greater than zero.
  • the code stream received by the decoder thus includes frames of the first type and frames of the second type, where the first type of frame includes a downmix signal and the second type of frame does not include a downmix signal; that is, the encoder does not encode the downmix signal of every frame, thereby realizing discontinuous transmission of the downmix signal and improving the compression efficiency of the downmix signal of the multi-channel audio communication system.
  • optionally, the first frame code stream is a first type of frame; specifically, so that the downmix signal obtained by decoding the first frame code stream can be restored to the audio signals of the two channels, the first frame code stream also needs to include a stereo parameter set.
  • since the second type of frame does not include a downmix signal, the size of the first type of frame is larger than the size of the second type of frame, and the decoder can therefore determine whether the Nth frame code stream is a first type frame or a second type frame according to the size of the Nth frame code stream.
  • alternatively, an identifier bit may be encapsulated in the Nth frame code stream; the decoder obtains the identifier bit after decoding part of the Nth frame code stream; if the identifier bit indicates that the Nth frame code stream is a first type frame, the decoder decodes the Nth frame code stream to obtain the Nth frame downmix signal; if the identifier bit indicates that the Nth frame code stream is a second type frame, the decoder obtains the Nth frame downmix signal based on the predetermined first algorithm.
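A minimal sketch of both frame-type checks follows; the size threshold and the position of the identifier bit are assumptions for illustration, not a format defined by the patent.

    # Sketch of how a decoder might tell first-type from second-type frames,
    # using either the frame size or an identifier bit.
    SIZE_THRESHOLD_BYTES = 8   # assumed: second-type frames are smaller than this

    def is_first_type_by_size(frame_bytes):
        return len(frame_bytes) >= SIZE_THRESHOLD_BYTES

    def is_first_type_by_flag(frame_bytes):
        # assumed layout: the most significant bit of the first byte is the flag
        return bool(frame_bytes[0] & 0x80)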
  • optionally, the first type of frame includes a downmix signal and a stereo parameter set, and the second type of frame includes a stereo parameter set but does not include a downmix signal: if the decoder determines that the Nth frame code stream is a first type frame, it decodes the Nth frame code stream to obtain the Nth frame downmix signal and the Nth frame stereo parameter set, and restores the Nth frame downmix signal to the Nth frame audio signal based on a predetermined third algorithm according to at least one stereo parameter in the Nth frame stereo parameter set; if the decoder determines that the Nth frame code stream is a second type frame, it decodes the Nth frame code stream to obtain the Nth frame stereo parameter set, obtains the Nth frame downmix signal based on the predetermined first algorithm, and then restores the Nth frame downmix signal to the Nth frame audio signal based on the predetermined third algorithm according to at least one stereo parameter in the Nth frame stereo parameter set.
  • optionally, the first type of frame includes a downmix signal and a stereo parameter set, and the second type of frame includes neither a downmix signal nor a stereo parameter set: if the decoder determines that the Nth frame code stream is a first type frame, it decodes the Nth frame code stream to obtain the Nth frame downmix signal and the Nth frame stereo parameter set; if the decoder determines that the Nth frame code stream is a second type frame, it obtains the Nth frame downmix signal based on the predetermined first algorithm, determines a k-frame stereo parameter set from at least one frame of stereo parameter set before the Nth frame stereo parameter set according to a preset second rule, obtains the Nth frame stereo parameter set from the k-frame stereo parameter set based on a predetermined fourth algorithm, and then restores the Nth frame downmix signal to the Nth frame audio signal based on the third algorithm according to at least one stereo parameter in the Nth frame stereo parameter set, where k is a positive integer greater than zero.
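A simple way to picture the recovery of a missing stereo parameter set is to combine the k most recently received sets; averaging the last k sets, as in the sketch below, is an assumed example of the preset second rule and fourth algorithm, not the patent's definition.

    # Sketch of reconstructing a missing stereo parameter set from the k most
    # recent received sets (here: element-wise average).
    import numpy as np

    def reconstruct_parameter_set(history, k=2):
        """history: list of dicts {param_name: array-like}, oldest first."""
        recent = history[-k:]
        return {name: np.mean([h[name] for h in recent], axis=0)
                for name in recent[-1]}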
  • optionally, the first type of frame includes a downmix signal and a stereo parameter set, the third type of frame includes a stereo parameter set but does not include a downmix signal, the fourth type of frame includes neither a downmix signal nor a stereo parameter set, and the third type frame and the fourth type frame are each a case of the second type frame.
  • if the decoder determines that the Nth frame code stream is the first type of frame, it decodes the Nth frame code stream, obtains the Nth frame stereo parameter set together with the Nth frame downmix signal, and restores the Nth frame downmix signal to the Nth frame audio signal based on the third algorithm according to at least one stereo parameter in the Nth frame stereo parameter set.
  • if the decoder determines that the Nth frame code stream is the second type of frame, there are two cases: when the Nth frame code stream is a third type frame, the decoder decodes the Nth frame code stream to obtain the Nth frame stereo parameter set, obtains the Nth frame downmix signal based on the predetermined first algorithm, and restores the Nth frame downmix signal to the Nth frame audio signal based on the third algorithm according to at least one stereo parameter in the Nth frame stereo parameter set; when the Nth frame code stream is a fourth type frame, the decoder determines a k-frame stereo parameter set from at least one frame of stereo parameter set before the Nth frame stereo parameter set according to the preset second rule, obtains the Nth frame stereo parameter set from the k-frame stereo parameter set based on the predetermined fourth algorithm, where k is a positive integer greater than zero, obtains the Nth frame downmix signal based on the predetermined first algorithm, and restores the Nth frame downmix signal to the Nth frame audio signal based on the third algorithm according to at least one stereo parameter in the Nth frame stereo parameter set.
  • optionally, the fifth type of frame includes a downmix signal and a stereo parameter set, the sixth type of frame includes a downmix signal but does not include a stereo parameter set, the fifth type frame and the sixth type frame are each a case of the first type frame, and the second type of frame includes neither a downmix signal nor a stereo parameter set.
  • if the decoder determines that the Nth frame code stream is the first type of frame, there are two cases: when the Nth frame code stream is a fifth type frame, the decoder decodes the Nth frame code stream, obtains the Nth frame stereo parameter set together with the Nth frame downmix signal, and restores the Nth frame downmix signal to the Nth frame audio signal based on the third algorithm according to at least one stereo parameter in the Nth frame stereo parameter set; when the Nth frame code stream is a sixth type frame, the decoder decodes the Nth frame code stream to obtain the Nth frame downmix signal, determines a k-frame stereo parameter set from at least one frame of stereo parameter set before the Nth frame stereo parameter set according to the preset second rule, obtains the Nth frame stereo parameter set from the k-frame stereo parameter set based on the predetermined fourth algorithm, and restores the Nth frame downmix signal to the Nth frame audio signal based on the third algorithm according to at least one stereo parameter in the Nth frame stereo parameter set.
  • if the decoder determines that the Nth frame code stream is the second type of frame, it obtains the Nth frame downmix signal based on the predetermined first algorithm, determines a k-frame stereo parameter set from at least one frame of stereo parameter set before the Nth frame stereo parameter set according to the preset second rule, obtains the Nth frame stereo parameter set from the k-frame stereo parameter set based on the predetermined fourth algorithm, and restores the Nth frame downmix signal to the Nth frame audio signal based on the third algorithm according to at least one stereo parameter in the Nth frame stereo parameter set.
  • optionally, the fifth type of frame includes a downmix signal and a stereo parameter set, the sixth type of frame includes a downmix signal but does not include a stereo parameter set, and the fifth type frame and the sixth type frame are each a case of the first type frame; the third type of frame includes a stereo parameter set but does not include a downmix signal, the fourth type of frame includes neither a downmix signal nor a stereo parameter set, and the third type frame and the fourth type frame are each a case of the second type frame.
  • if the decoder determines that the Nth frame code stream is the first type of frame, there are two cases: when the Nth frame code stream is a fifth type frame, the decoder decodes the Nth frame code stream, obtains the Nth frame downmix signal and the Nth frame stereo parameter set, and restores the Nth frame downmix signal to the Nth frame audio signal based on the third algorithm according to at least one stereo parameter in the Nth frame stereo parameter set; when the Nth frame code stream is a sixth type frame, the decoder decodes the Nth frame code stream to obtain the Nth frame downmix signal, determines a k-frame stereo parameter set from at least one frame of stereo parameter set before the Nth frame stereo parameter set according to the preset second rule, obtains the Nth frame stereo parameter set from the k-frame stereo parameter set based on the predetermined fourth algorithm, and restores the Nth frame downmix signal to the Nth frame audio signal based on the third algorithm according to at least one stereo parameter in the Nth frame stereo parameter set.
  • if the decoder determines that the Nth frame code stream is the second type of frame, there are two cases: when the Nth frame code stream is a third type frame, the decoder decodes the Nth frame code stream to obtain the Nth frame stereo parameter set, obtains the Nth frame downmix signal based on the predetermined first algorithm, and restores the Nth frame downmix signal to the Nth frame audio signal based on the third algorithm according to at least one stereo parameter in the Nth frame stereo parameter set; when the Nth frame code stream is a fourth type frame, the decoder determines a k-frame stereo parameter set from at least one frame of stereo parameter set before the Nth frame stereo parameter set according to the preset second rule, obtains the Nth frame stereo parameter set from the k-frame stereo parameter set based on the predetermined fourth algorithm, where k is a positive integer greater than zero, obtains the Nth frame downmix signal based on the predetermined first algorithm, and restores the Nth frame downmix signal to the Nth frame audio signal based on the third algorithm according to at least one stereo parameter in the Nth frame stereo parameter set.
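Putting the four subtypes together, a self-contained sketch of the decoder-side dispatch might look as follows; the Frame layout and the "reuse the most recent value" fallback are assumptions for illustration, not the preset rules of the patent.

    # Self-contained sketch of the decoder dispatch over the frame subtypes
    # described above: take whatever the frame carries, fall back to history
    # for whatever is missing.
    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class Frame:
        downmix: Optional[list]        # None for third/fourth type frames
        params: Optional[dict]         # None for sixth/fourth type frames

    def decode_frame(frame, downmix_history, params_history):
        # downmix: take it from the frame if present, otherwise reuse history
        dmx = frame.downmix if frame.downmix is not None else downmix_history[-1]
        # stereo parameters: same idea
        params = frame.params if frame.params is not None else params_history[-1]
        downmix_history.append(dmx)
        params_history.append(params)
        return dmx, params             # upmixing to two channels would follow

    # usage: a fifth-type frame followed by a fourth-type frame
    dmx_hist, par_hist = [[0.0]], [{"ILD": 0.0}]
    print(decode_frame(Frame([0.1, 0.2], {"ILD": 3.0}), dmx_hist, par_hist))
    print(decode_frame(Frame(None, None), dmx_hist, par_hist))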
  • in a third aspect, an encoder is provided, including a signal detecting unit and a signal encoding unit, wherein the signal detecting unit is configured to detect whether a voice signal is included in the Nth frame downmix signal, the Nth frame downmix signal being obtained from the Nth frame audio signals of two channels of the multichannel based on a predetermined first algorithm, and N being a positive integer greater than zero; the signal encoding unit is configured to encode the Nth frame downmix signal when the signal detecting unit detects that the Nth frame downmix signal includes a voice signal, and, when the signal detecting unit detects that the Nth frame downmix signal does not include a voice signal: to encode the Nth frame downmix signal if the signal detecting unit determines that the Nth frame downmix signal satisfies the preset audio frame encoding condition, and not to encode the Nth frame downmix signal if the signal detecting unit determines that the Nth frame downmix signal does not satisfy the preset audio frame encoding condition.
  • optionally, the signal encoding unit includes a first signal encoding unit and a second signal encoding unit; when the signal detecting unit detects that the Nth frame downmix signal includes a voice signal, it notifies the first signal encoding unit to encode the Nth frame downmix signal; if the signal detecting unit determines that the Nth frame downmix signal satisfies the preset speech frame encoding condition, it notifies the first signal encoding unit to encode the Nth frame downmix signal, and specifically the first signal encoding unit encodes the Nth frame downmix signal according to the preset speech frame coding rate; if the signal detecting unit determines that the Nth frame downmix signal does not satisfy the preset speech frame coding condition but satisfies the preset silence insertion descriptor (SID) encoding condition, it notifies the second signal encoding unit to encode the Nth frame downmix signal, and specifically the second signal encoding unit encodes the Nth frame downmix signal according to the preset SID coding rate.
  • optionally, the encoder further includes a parameter generating unit, a parameter encoding unit, and a parameter detecting unit, wherein the parameter generating unit is configured to obtain the Nth frame stereo parameter set according to the Nth frame audio signal, the Nth frame stereo parameter set including Z stereo parameters, and the Z stereo parameters including the parameters used by the encoder to downmix the Nth frame audio signals based on the predetermined first algorithm.
  • the parameter encoding unit is configured to encode the Nth frame stereo parameter set when the signal detecting unit detects that the Nth frame downmix signal includes a speech signal, and, when the signal detecting unit detects that the Nth frame downmix signal does not include a speech signal: to encode at least one stereo parameter in the Nth frame stereo parameter set if the parameter detecting unit determines that the Nth frame stereo parameter set satisfies the preset stereo parameter encoding condition, and not to encode the stereo parameter set if the parameter detecting unit determines that the Nth frame stereo parameter set does not satisfy the preset stereo parameter encoding condition.
  • optionally, the parameter encoding unit is configured to obtain X target stereo parameters from the Z stereo parameters in the Nth frame stereo parameter set according to the preset stereo parameter dimension-reduction rule, and to encode the X target stereo parameters, where X is a positive integer greater than zero and less than or equal to Z.
  • optionally, the parameter generating unit includes a first parameter generating unit and a second parameter generating unit.
  • when the signal detecting unit detects that the Nth frame audio signal includes a voice signal, or detects that the Nth frame audio signal does not include a voice signal but the Nth frame audio signal satisfies the preset speech frame coding condition, it notifies the first parameter generating unit to generate the Nth frame stereo parameter set; specifically, the first parameter generating unit obtains the Nth frame stereo parameter set according to the Nth frame audio signal based on the first stereo parameter set generation manner, and the parameter encoding unit encodes the Nth frame stereo parameter set; specifically, when the parameter encoding unit includes a first parameter encoding unit and a second parameter encoding unit, the Nth frame stereo parameter set is encoded by the first parameter encoding unit; wherein the coding mode specified by the first parameter encoding unit is the first coding mode and the coding mode specified by the second parameter encoding unit is the second coding mode, the coding rate specified by the first coding mode is not less than the coding rate specified by the second coding mode, and/or, for any stereo parameter in the Nth frame stereo parameter set, the quantization precision specified by the first coding mode is not lower than the quantization precision specified by the second coding mode;
  • otherwise, the second parameter generating unit obtains the Nth frame stereo parameter set according to the Nth frame audio signal based on the second stereo parameter set generation manner, and when the parameter detecting unit determines that the Nth frame stereo parameter set satisfies the preset stereo parameter encoding condition, at least one stereo parameter in the Nth frame stereo parameter set is encoded by the parameter encoding unit, specifically by the second parameter encoding unit when the parameter encoding unit includes the first parameter encoding unit and the second parameter encoding unit;
  • if the parameter detecting unit determines that the Nth frame stereo parameter set does not satisfy the preset stereo parameter encoding condition, the stereo parameter set is not encoded.
  • the first stereo parameter set generation manner and the second stereo parameter set generation manner satisfy at least one of the following conditions:
  • the number of stereo parameter types included in the stereo parameter set specified by the first stereo parameter set generation manner is not less than that specified by the second stereo parameter set generation manner; the number of stereo parameters included in the stereo parameter set specified by the first stereo parameter set generation manner is not less than that specified by the second stereo parameter set generation manner; the resolution in the time domain of a stereo parameter specified by the first stereo parameter set generation manner is not lower than the resolution in the time domain of the corresponding stereo parameter specified by the second stereo parameter set generation manner; and the resolution in the frequency domain of a stereo parameter specified by the first stereo parameter set generation manner is not lower than the resolution in the frequency domain of the corresponding stereo parameter specified by the second stereo parameter set generation manner.
  • optionally, the parameter encoding unit includes a first parameter encoding unit and a second parameter encoding unit, wherein the first parameter encoding unit is configured to encode the Nth frame stereo parameter set according to the first coding mode when the Nth frame downmix signal includes a voice signal, and when the Nth frame downmix signal does not include a voice signal but satisfies the speech frame coding condition; the second parameter encoding unit is configured to encode at least one stereo parameter in the Nth frame stereo parameter set according to the second coding mode when the Nth frame downmix signal does not satisfy the speech frame coding condition;
  • wherein the coding rate specified by the first coding mode is not less than the coding rate specified by the second coding mode; and/or, for any stereo parameter in the Nth frame stereo parameter set, the quantization precision specified by the first coding mode is not lower than the quantization precision specified by the second coding mode.
  • optionally, when at least one stereo parameter in the Nth frame stereo parameter set includes the inter-channel level difference ILD, the preset stereo parameter encoding condition includes: D_L ≥ D_0, where D_L represents the degree of deviation of the ILD from a first criterion, and the first criterion is determined according to a predetermined second algorithm from the T-frame stereo parameter sets before the Nth frame stereo parameter set, T being a positive integer greater than 0.
  • optionally, when at least one stereo parameter in the Nth frame stereo parameter set includes the inter-channel time difference ITD, the preset stereo parameter encoding condition includes: D_T ≥ D_1, where D_T represents the degree of deviation of the ITD from a second criterion, and the second criterion is determined according to a predetermined third algorithm from the T-frame stereo parameter sets before the Nth frame stereo parameter set, T being a positive integer greater than 0.
  • optionally, when at least one stereo parameter in the Nth frame stereo parameter set includes the inter-channel phase difference IPD, the preset stereo parameter encoding condition includes: D_P ≥ D_2, where D_P represents the degree of deviation of the IPD from a third criterion, and the third criterion is determined according to a predetermined fourth algorithm from the T-frame stereo parameter sets before the Nth frame stereo parameter set, T being a positive integer greater than 0.
  • D_L, D_T, and D_P respectively satisfy preset expressions in which: ILD(m) is the level difference when the two channels respectively transmit the Nth frame audio signal in the mth subband; M is the total number of subbands occupied by the Nth frame audio signal; T is a positive integer greater than 0; ILD^[-t](m) is the level difference when the two channels respectively transmit, in the mth subband, the tth frame audio signal before the Nth frame audio signal; ITD is the time difference when the two channels respectively transmit the Nth frame audio signal; the ITD average is the average value of the ITD over the T-frame stereo parameter sets before the Nth frame; ITD^[-t] is the time difference when the two channels respectively transmit the tth frame audio signal before the Nth frame audio signal; IPD(m) is the phase difference when the two channels respectively transmit, in the mth subband, the Nth frame audio signal; the IPD average is the average value of the IPD in the mth subband over the T-frame stereo parameter sets before the Nth frame; and IPD^[-t](m) is the phase difference when the two channels respectively transmit, in the mth subband, the tth frame audio signal before the Nth frame audio signal.
  • a fourth aspect provides a decoder, including a receiving unit and a decoding unit, wherein the receiving unit is configured to receive a code stream, the code stream includes at least two frames, among which there is at least one first type frame and at least one second type frame, the first type of frame includes a downmix signal, and the second type of frame does not include a downmix signal; for the Nth frame code stream, N being a positive integer greater than 1, the decoding unit is configured to: if the Nth frame code stream is determined to be a first type frame, decode the Nth frame code stream to obtain the Nth frame downmix signal; if the Nth frame code stream is determined to be a second type frame, determine an m-frame downmix signal from at least one frame of downmix signal before the Nth frame downmix signal according to a preset first rule, and obtain the Nth frame downmix signal from the m-frame downmix signal based on a predetermined first algorithm, where m is a positive integer greater than zero.
  • the Nth frame downmix signal is obtained by the encoder by downmixing the Nth frame audio signals of two channels of the multichannel based on a predetermined second algorithm.
  • optionally, the first type of frame includes a downmix signal and a stereo parameter set, and the second type of frame includes a stereo parameter set but does not include a downmix signal:
  • the decoding unit is further configured to: if the Nth frame code stream is determined to be a first type frame, decode the Nth frame code stream and obtain the Nth frame stereo parameter set together with the Nth frame downmix signal; if the Nth frame code stream is determined to be a second type frame, decode the Nth frame code stream to obtain the Nth frame stereo parameter set, at least one stereo parameter of which is used by the decoder to restore the Nth frame downmix signal to the Nth frame audio signal based on a predetermined third algorithm;
  • the decoder further includes a signal restoring unit configured to restore the Nth frame downmix signal to the Nth frame audio signal based on the third algorithm according to at least one stereo parameter in the Nth frame stereo parameter set.
  • optionally, the first type of frame includes a downmix signal and a stereo parameter set, and the second type of frame includes neither a downmix signal nor a stereo parameter set:
  • the decoding unit is further configured to: if the Nth frame code stream is determined to be a first type frame, decode the Nth frame code stream and obtain the Nth frame stereo parameter set together with the Nth frame downmix signal; if the Nth frame code stream is determined to be a second type frame, determine a k-frame stereo parameter set from at least one frame of stereo parameter set before the Nth frame stereo parameter set according to a preset second rule, and obtain the Nth frame stereo parameter set from the k-frame stereo parameter set based on a predetermined fourth algorithm, where k is a positive integer greater than zero;
  • at least one stereo parameter in the Nth frame stereo parameter set is used by the decoder to restore the Nth frame downmix signal to the Nth frame audio signal based on the predetermined third algorithm;
  • the decoder further includes a signal restoring unit configured to restore the Nth frame downmix signal to the Nth frame audio signal based on the third algorithm according to at least one stereo parameter in the Nth frame stereo parameter set.
  • optionally, the first type of frame includes a downmix signal and a stereo parameter set, the third type of frame includes a stereo parameter set but does not include a downmix signal, the fourth type of frame includes neither a downmix signal nor a stereo parameter set, and the third type frame and the fourth type frame are each a case of the second type frame:
  • the decoding unit is further configured to: if the Nth frame code stream is determined to be a first type frame, decode the Nth frame code stream and obtain the Nth frame stereo parameter set together with the Nth frame downmix signal; if the Nth frame code stream is determined to be a second type frame: when the Nth frame code stream is a third type frame, decode the Nth frame code stream to obtain the Nth frame stereo parameter set; when the Nth frame code stream is a fourth type frame, determine a k-frame stereo parameter set from at least one frame of stereo parameter set before the Nth frame stereo parameter set according to the preset second rule, and obtain the Nth frame stereo parameter set from the k-frame stereo parameter set based on the predetermined fourth algorithm, where k is a positive integer greater than zero;
  • at least one stereo parameter in the Nth frame stereo parameter set is used by the decoder to restore the Nth frame downmix signal to the Nth frame audio signal based on the predetermined third algorithm;
  • the decoder further includes a signal restoring unit configured to restore the Nth frame downmix signal to the Nth frame audio signal based on the third algorithm according to at least one stereo parameter in the Nth frame stereo parameter set.
  • optionally, the fifth type of frame includes a downmix signal and a stereo parameter set, the sixth type of frame includes a downmix signal but does not include a stereo parameter set, the fifth type frame and the sixth type frame are each a case of the first type frame, and the second type of frame includes neither a downmix signal nor a stereo parameter set:
  • the decoding unit is further configured to: if the Nth frame code stream is determined to be a first type frame: when the Nth frame code stream is a fifth type frame, decode the Nth frame code stream and obtain the Nth frame stereo parameter set together with the Nth frame downmix signal; when the Nth frame code stream is a sixth type frame, determine a k-frame stereo parameter set from at least one frame of stereo parameter set before the Nth frame stereo parameter set according to the preset second rule, and obtain the Nth frame stereo parameter set from the k-frame stereo parameter set based on the predetermined fourth algorithm; if the Nth frame code stream is determined to be a second type frame, determine a k-frame stereo parameter set from at least one frame of stereo parameter set before the Nth frame stereo parameter set according to the preset second rule, and obtain the Nth frame stereo parameter set from the k-frame stereo parameter set based on the predetermined fourth algorithm;
  • at least one stereo parameter in the Nth frame stereo parameter set is used by the decoder to restore the Nth frame downmix signal to the Nth frame audio signal based on the predetermined third algorithm, where k is a positive integer greater than zero;
  • the decoder further includes a signal restoring unit configured to restore the Nth frame downmix signal to the Nth frame audio signal based on the third algorithm according to at least one stereo parameter in the Nth frame stereo parameter set.
  • optionally, the fifth type of frame includes a downmix signal and a stereo parameter set, the sixth type of frame includes a downmix signal but does not include a stereo parameter set, and the fifth type frame and the sixth type frame are each a case of the first type frame; the third type of frame includes a stereo parameter set but does not include a downmix signal, the fourth type of frame includes neither a downmix signal nor a stereo parameter set, and the third type frame and the fourth type frame are each a case of the second type frame:
  • the decoding unit is further configured to: if the Nth frame code stream is determined to be a first type frame: when the Nth frame code stream is a fifth type frame, decode the Nth frame code stream and obtain the Nth frame stereo parameter set together with the Nth frame downmix signal; when the Nth frame code stream is a sixth type frame, determine a k-frame stereo parameter set from at least one frame of stereo parameter set before the Nth frame stereo parameter set according to the preset second rule, and obtain the Nth frame stereo parameter set from the k-frame stereo parameter set based on the predetermined fourth algorithm.
  • the decoding unit is further configured to: if the Nth frame code stream is determined to be a second type frame: when the Nth frame code stream is a third type frame, decode the Nth frame code stream to obtain the Nth frame stereo parameter set; when the Nth frame code stream is a fourth type frame, determine a k-frame stereo parameter set from at least one frame of stereo parameter set before the Nth frame stereo parameter set according to the preset second rule, and obtain the Nth frame stereo parameter set from the k-frame stereo parameter set based on the predetermined fourth algorithm;
  • at least one stereo parameter in the Nth frame stereo parameter set is used by the decoder to restore the Nth frame downmix signal to the Nth frame audio signal based on the predetermined third algorithm, where k is a positive integer greater than zero;
  • the decoder further includes a signal restoring unit configured to restore the Nth frame downmix signal to the Nth frame audio signal based on the third algorithm according to at least one stereo parameter in the Nth frame stereo parameter set.
  • a codec system is provided, comprising the encoder of any implementation of the third aspect and the decoder of any implementation of the fourth aspect.
  • an embodiment of the present invention further provides a terminal device, where the terminal device includes a processor and a memory, the memory is used to store a software program, and the processor is configured to read the software program stored in the memory and implement the method provided by the first aspect or any implementation of the first aspect.
  • an embodiment of the present invention further provides a computer storage medium, which may be non-volatile, that is, its content is not lost after power-off.
  • the storage medium stores a software program that, when read and executed by one or more processors, implements the method provided by the first aspect or any implementation of the first aspect.
  • FIG. 1 is a schematic flow chart of a method for processing multi-channel audio signals according to an embodiment of the present invention
  • FIG. 2 is a schematic flow chart of a method for processing multi-channel audio signals according to Embodiment 2 of the present invention
  • FIG. 3a to FIG. 3d are schematic diagrams of an encoder according to an embodiment of the present invention.
  • FIG. 4 is a schematic diagram of a decoder according to an embodiment of the present invention.
  • FIG. 5 is a schematic diagram of a codec system according to an embodiment of the present invention.
  • in the embodiments of the present invention, the audio signal is encoded or decoded in units of frames; the Nth frame audio signal is the Nth audio frame.
  • when a voice signal is included in the Nth frame audio signal, the Nth audio frame is a voice frame; when no voice signal is included in the Nth frame audio signal but a background noise signal is included, the Nth audio frame is a noise frame; N is an integer greater than zero.
  • the encoder and decoder in the embodiments of the present invention can be installed on a device supporting multi-channel audio signal processing, such as a terminal (for example a mobile phone, a notebook computer, or a tablet computer) or a server, for processing multi-channel audio signals; that is, devices such as terminals and servers are provided with the function of processing multi-channel audio signals described in the embodiments of the present invention.
  • since the audio signal can be encoded by the non-continuous encoding mechanism in the multi-channel communication system, the compression efficiency of the audio signal is greatly improved.
  • N is a positive integer greater than zero; the Nth frame downmix signal is obtained by mixing the Nth frame audio signals of two channels of the multichannel.
  • when the multichannel consists of two channels, namely a first channel and a second channel, the two channels of the multichannel are the first channel and the second channel, and the Nth frame downmix signal is obtained by mixing the Nth frame audio signal of the first channel with the Nth frame audio signal of the second channel.
  • when the multichannel consists of three or more channels, the downmix signal is obtained by mixing the audio signals of a pair of channels of the multichannel; specifically, taking three channels as an example, including a first channel, a second channel, and a third channel: if, according to the set rule, only the first channel is paired with the second channel, the two channels of the multichannel are the first channel and the second channel, and the Nth frame audio signal of the first channel and the Nth frame audio signal of the second channel are downmixed to obtain the Nth frame downmix signal; if the first channel is paired with the second channel and the second channel is paired with the third channel, the two channels of the multichannel may be the first channel and the second channel, or the second channel and the third channel.
  • a method for processing a multi-channel audio signal according to Embodiment 1 of the present invention includes:
  • Step 100: The encoder generates the Nth frame stereo parameter set according to the Nth frame audio signals of the two channels of the multichannel, wherein the Nth frame stereo parameter set includes Z stereo parameters, the Z stereo parameters include the parameters used by the encoder to downmix the Nth frame audio signals based on a predetermined first algorithm, and Z is a positive integer greater than zero.
  • the predetermined first algorithm is a downmix signal generation algorithm set in the encoder in advance.
• It is assumed that the preset stereo parameter generation algorithm is as follows, and the stereo parameter obtained from the Nth frame audio signal is the Inter-channel Level Difference (ILD), where:
• L(i) is the Discrete Fourier Transform (DFT) coefficient of the Nth frame audio signal of the left channel at the i-th frequency point, and R(i) is the DFT coefficient of the Nth frame audio signal of the right channel at the i-th frequency point;
• ReL(i) is the real part of L(i), ImL(i) is the imaginary part of L(i), ReR(i) is the real part of R(i), and ImR(i) is the imaginary part of R(i);
• PL(i) is the energy spectrum of the Nth frame audio signal of the left channel at the i-th frequency point, and PR(i) is the energy spectrum of the Nth frame audio signal of the right channel at the i-th frequency point;
• EL(m) is the energy of the Nth frame audio signal of the left channel in the m-th subband, and ER(m) is the energy of the Nth frame audio signal of the right channel in the m-th subband.
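• The following Python sketch illustrates how the per-subband quantities defined above could be computed from a frame of time-domain samples; the function name compute_ild, the subband partition, and the dB-ratio form of the ILD are illustrative assumptions, not the patented expression itself.

    import numpy as np

    def compute_ild(left_frame, right_frame, subband_edges):
        """Per-subband ILD sketch (assumed dB level ratio, not the patented formula)."""
        L = np.fft.rfft(left_frame)              # L(i): DFT coefficients of the left channel
        R = np.fft.rfft(right_frame)             # R(i): DFT coefficients of the right channel
        PL = L.real ** 2 + L.imag ** 2           # PL(i): energy spectrum of the left channel
        PR = R.real ** 2 + R.imag ** 2           # PR(i): energy spectrum of the right channel
        ild = []
        for start, end in subband_edges:         # one (start, end) bin range per subband m
            EL = PL[start:end].sum()             # EL(m): left-channel energy in subband m
            ER = PR[start:end].sum()             # ER(m): right-channel energy in subband m
            ild.append(10.0 * np.log10((EL + 1e-12) / (ER + 1e-12)))
        return ild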
• The preset stereo parameter generation algorithm may also include algorithms for calculating other stereo parameters, such as the Inter-channel Time Difference (ITD), the Inter-channel Phase Difference (IPD), and the Inter-channel Coherence (IC); the encoder can therefore also obtain stereo parameters such as ITD, IPD, and IC from the audio signal based on the preset stereo parameter generation algorithm.
• The Nth frame stereo parameter set includes at least one stereo parameter. For example, if IPD, ITD, ILD, and IC are obtained from the Nth frame audio signals of the two channels based on the preset stereo parameter generation algorithm, the IPD, ITD, ILD, and IC form the Nth frame stereo parameter set.
  • Step 101 The encoder mixes the Nth frame audio signals of the two channels into the Nth frame downmix signal based on the predetermined first algorithm according to at least one stereo parameter in the Nth frame stereo parameter set.
• It is assumed that the Nth frame stereo parameter set includes ITD, ILD, IPD, and IC.
• The Nth frame downmix signal is obtained based on the predetermined first algorithm; the Nth frame downmix signal DMX(k) satisfies a predefined expression at the kth frequency point, in which DMX(k) is the Nth frame downmix signal at the kth frequency point, the amplitudes of the Nth frame audio signals of the left channel and the right channel of the Kth channel pair at the kth frequency point are used, θL(k) represents the phase angle of the Nth frame audio signal of the left channel at the kth frequency point, ILD(k) represents the ILD of the Nth frame audio signal at the kth frequency point, and IPD(k) represents the IPD of the Nth frame audio signal at the kth frequency point.
• The embodiment of the present invention is not limited to this; other algorithms may also be used to obtain the downmix signal.
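• As one hedged illustration of how a downmix of the kind described above could be formed (the exact predetermined first algorithm is not reproduced here), the right channel can be phase-aligned to the left channel with IPD(k) and the two spectra averaged:

    import numpy as np

    def downmix_frame(L, R, ipd):
        """Frequency-domain downmix sketch: DMX(k) = (L(k) + R(k)*e^{j*IPD(k)}) / 2 (an assumption)."""
        R_aligned = R * np.exp(1j * np.asarray(ipd))   # rotate the right channel by IPD(k)
        return 0.5 * (L + R_aligned)                   # DMX(k): average of the aligned channels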
• In order to enable the decoder to restore the Nth frame downmix signal, the encoder encodes the Nth frame stereo parameter set.
• For example, assume the generated Nth frame stereo parameter set includes ITD, ILD, IPD, and IC, but the encoder mixes the Nth frame audio signals of the two channels into the Nth frame downmix signal based on the predetermined first algorithm using only the ILD and IPD in the Nth frame stereo parameter set; to improve compression efficiency, the encoder can then encode only the ILD and IPD in the Nth frame stereo parameter set.
• Step 102 The encoder detects whether a voice signal is included in the Nth frame downmix signal. If yes, step 103 is performed; otherwise, step 104 is performed.
• Optionally, the encoder directly detects whether a voice signal is included in the Nth frame downmix signal through Voice Activity Detection (VAD).
• Optionally, the encoder may also detect indirectly whether a voice signal is included in the Nth frame downmix signal: the encoder directly detects, through VAD, whether a voice signal is included in the Nth frame audio signal of each channel. Specifically, when the encoder detects that the audio signal of either of the two channels includes a voice signal, it determines that the downmix signal obtained by mixing the audio signals of the two channels includes a voice signal; when the encoder determines that neither of the audio signals of the two channels includes a voice signal, it determines that the downmix signal obtained by mixing the audio signals of the two channels does not include a voice signal. It should be noted that, in this indirect detection mode, the order of step 102 relative to step 100 and step 101 is not limited, as long as step 100 precedes step 101.
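• A minimal sketch of the indirect detection described above, assuming per-channel VAD decisions are already available:

    def downmix_contains_voice(vad_left: bool, vad_right: bool) -> bool:
        """The downmix frame counts as containing voice if VAD flags either input channel."""
        return vad_left or vad_right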
  • Step 103 The encoder encodes the Nth frame downmix signal, and step 107 is performed.
  • the encoder encodes the Nth frame downmix signal to obtain an Nth frame code stream.
• The code stream includes two frame types: a first type frame and a second type frame, where the first type frame includes a downmix signal and the second type frame does not include a downmix signal; the Nth frame code stream obtained in step 103 is a first type frame.
• Specifically, the encoder encodes the Nth frame downmix signal according to a preset voice frame coding rate; preferably, the preset voice frame coding rate can be set to 13.2 kbps.
• Optionally, while encoding the Nth frame downmix signal, the encoder also encodes the Nth frame stereo parameter set.
  • Step 104 The encoder determines whether the downmix signal of the Nth frame satisfies a preset audio frame coding condition. If yes, step 105 is performed; otherwise, step 106 is performed.
• The preset audio frame coding condition is a condition pre-configured in the encoder for determining whether to encode the Nth frame downmix signal.
• It should be noted that, for the first frame downmix signal, even if it does not include a voice signal, it is deemed to satisfy the preset audio frame coding condition; that is, regardless of whether the first frame downmix signal includes a voice signal, the first frame downmix signal must be encoded.
  • Step 105 The encoder encodes the Nth frame downmix signal, and step 107 is performed.
  • the Nth frame code stream obtained through step 105 is also a first type of frame.
• Optionally, while encoding the Nth frame downmix signal, the encoder also encodes the Nth frame stereo parameter set.
• Optionally, the coding manner of the Nth frame downmix signal in step 105 may be the same as that in step 103 of this embodiment.
• Optionally, since the Nth frame downmix signal in step 105 does not include a voice signal, when the Nth frame downmix signal satisfies the preset voice frame coding condition, the encoder encodes the Nth frame downmix signal according to the preset voice frame coding rate; when the Nth frame downmix signal does not satisfy the preset voice frame coding condition but satisfies a preset SID coding condition, the encoder encodes the Nth frame downmix signal according to a preset SID coding rate, where the preset SID coding rate can be set to 2.8 kbps.
• That is, the encoder encodes the Nth frame downmix signal according to the SID coding mode, where the SID coding mode specifies the coding rate as the preset SID coding rate and also specifies the algorithm and the parameters used for coding.
• The preset voice frame coding condition may be: the distance between the Nth frame downmix signal and the Mth frame downmix signal is not greater than a preset duration, where the Mth frame downmix signal contains a voice signal and is the downmix signal containing a voice signal that is closest to the Nth frame downmix signal.
• The preset SID coding condition may be odd-frame coding: when N of the Nth frame downmix signal is an odd number, the encoder determines that the Nth frame downmix signal satisfies the preset SID coding condition.
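• The per-frame decision described in steps 102 to 106 can be summarized by the following sketch; the hangover length, the odd-frame SID condition, the handling of the first frame, and the function names are illustrative assumptions, while the 13.2 kbps and 2.8 kbps rates are the example values given above:

    SPEECH_RATE_KBPS = 13.2     # preset voice frame coding rate (example value above)
    SID_RATE_KBPS = 2.8         # preset SID coding rate (example value above)
    HANGOVER_FRAMES = 8         # assumed "preset duration", expressed in frames

    def choose_downmix_coding(n, contains_voice, last_voice_frame):
        """Return the coding rate for the Nth frame downmix signal, or None if it is not encoded."""
        if n == 1:
            return SPEECH_RATE_KBPS              # the first frame is always encoded (rate assumed)
        if contains_voice:
            return SPEECH_RATE_KBPS              # step 103: voice frame coding
        if last_voice_frame is not None and n - last_voice_frame <= HANGOVER_FRAMES:
            return SPEECH_RATE_KBPS              # preset voice frame coding condition (hangover)
        if n % 2 == 1:
            return SID_RATE_KBPS                 # example SID condition: odd-numbered frames
        return None                              # step 106: downmix signal not encoded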
  • Step 106 The encoder does not encode the Nth frame downmix signal, and step 109 is performed.
  • the Nth frame code stream obtained in step 106 is a second type of frame.
• The encoder determines that the Nth frame downmix signal does not satisfy the preset audio frame coding condition; specifically, the encoder determines that the Nth frame downmix signal satisfies neither the preset voice frame coding condition nor the preset SID coding condition.
• The encoder does not encode the Nth frame downmix signal; specifically, the Nth frame downmix signal is not included in the Nth frame code stream.
• The Nth frame stereo parameter set may be encoded, or the Nth frame stereo parameter set may not be encoded.
• In the embodiment of the present invention, the case in which the encoder encodes the Nth frame stereo parameter set while not encoding the Nth frame downmix signal is taken as an example; optionally, the encoder may also refrain from encoding the Nth frame stereo parameter set while not encoding the Nth frame downmix signal. For the specific manner in which the decoder obtains the Nth frame downmix signal and the Nth frame stereo parameter set when the encoder encodes neither the Nth frame stereo parameter set nor the Nth frame downmix signal, refer to Embodiment 2 of the present invention.
• Step 107 The encoder sends the Nth frame code stream to the decoder.
• The Nth frame code stream includes not only the Nth frame stereo parameter set but also the Nth frame downmix signal.
  • Step 108 The decoder determines that the Nth frame code stream is the first type of frame, and then decodes the Nth frame code stream to obtain the Nth frame downmix signal and the Nth frame stereo parameter set, and step 111 is performed.
• Because the second type frame does not include a downmix signal, the size of a first type frame is larger than the size of a second type frame, and the decoder can therefore determine whether the Nth frame code stream is a first type frame or a second type frame according to the size of the Nth frame code stream.
• Optionally, an identifier bit may be encapsulated in the Nth frame code stream; after partially decoding the Nth frame code stream, the decoder obtains the identifier bit and determines, according to the identifier bit, whether the Nth frame code stream is a first type frame or a second type frame. For example, if the identifier bit is 1, the Nth frame code stream is a first type frame, and if the identifier bit is 0, the Nth frame code stream is a second type frame.
• Optionally, the decoder determines the decoding mode according to the rate corresponding to the Nth frame code stream. For example, the rate of the Nth frame code stream is 17.4 kbps, of which the rate of the code stream corresponding to the downmix signal is 13.2 kbps and the rate of the code stream corresponding to the stereo parameter set is 4.2 kbps; the code stream corresponding to the downmix signal is then decoded according to the decoding method corresponding to 13.2 kbps, and the code stream corresponding to the stereo parameter set is decoded according to the decoding method corresponding to 4.2 kbps.
• Optionally, the decoder determines the coding mode of the Nth frame code stream according to a coding mode identifier bit in the Nth frame code stream, and then decodes the Nth frame code stream according to the decoding mode corresponding to that coding mode.
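• A decoder-side sketch of the two determination options described above (size comparison or identifier bit); the 60-byte threshold and the flag semantics are illustrative assumptions:

    def classify_frame(frame_bytes, flag_bit=None, size_threshold=60):
        """Tell first type frames (with downmix) from second type frames (without)."""
        if flag_bit is not None:
            return "first_type" if flag_bit == 1 else "second_type"
        # frames carrying a downmix signal are larger than frames that do not
        return "first_type" if len(frame_bytes) >= size_threshold else "second_type"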
  • Step 109 The encoder sends an Nth frame code stream to the decoder, where the Nth frame code stream includes an Nth frame stereo parameter set.
• Step 110 The decoder determines that the Nth frame code stream is a second type frame, decodes the Nth frame code stream to obtain the Nth frame stereo parameter set, determines, according to a preset first rule, m frames of downmix signal from at least one frame of downmix signal preceding the Nth frame downmix signal, and obtains the Nth frame downmix signal based on the predetermined first algorithm according to the m frames of downmix signal, where m is a positive integer greater than zero.
• For example, the average value of the (N-3)th frame, (N-2)th frame, and (N-1)th frame downmix signals is taken as the Nth frame downmix signal; or the (N-1)th frame downmix signal is directly used as the Nth frame downmix signal; or the Nth frame downmix signal is estimated according to another algorithm, for example obtained by operating on the (N-1)th frame downmix signal and a preset offset value based on a preset algorithm.
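• A sketch of the preset first rule, using the two examples just given (three-frame averaging with a fallback to copying the previous frame); the choice of m = 3 is illustrative:

    import numpy as np

    def estimate_downmix(history):
        """Estimate the Nth frame downmix signal from previously obtained downmix frames (oldest first)."""
        if len(history) >= 3:
            return np.mean(np.stack(history[-3:]), axis=0)   # average of frames N-3, N-2, N-1
        return np.asarray(history[-1]).copy()                # fall back to the (N-1)th frame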
• Step 111 The decoder restores the Nth frame downmix signal to the Nth frame audio signals of the two channels according to a predetermined second algorithm and the target stereo parameter of the Nth frame stereo parameter set.
• The target stereo parameter is at least one stereo parameter in the Nth frame stereo parameter set.
• The process in which the decoder restores the Nth frame downmix signal to the Nth frame audio signals of the two channels is the inverse of the process in which the encoder mixes the Nth frame audio signals of the two channels into the Nth frame downmix signal. Assuming that the encoder side mixed the Nth frame according to the IPD and ILD in the Nth frame stereo parameter set, the decoder restores the Nth frame downmix signal, according to the IPD and ILD in the Nth frame stereo parameter set, to the Nth frame signal of each channel in the Kth channel pair.
• It should be noted that the algorithm preset in the decoder for restoring the downmix signal may be an inverse algorithm of the algorithm used in the encoder to generate the downmix signal, or an algorithm independent of the algorithm used in the encoder to generate the downmix signal.
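• The following sketch inverts the illustrative downmix shown earlier, splitting DMX(k) into left and right channels with ILD(k) and IPD(k); it is only consistent with that assumed downmix, not with the patented second algorithm:

    import numpy as np

    def upmix_frame(dmx, ild_db, ipd):
        """Restore left/right DFT coefficients from the downmix using ILD (in dB) and IPD."""
        g = 10.0 ** (np.asarray(ild_db) / 20.0)                       # amplitude ratio |L(k)| / |R(k)|
        L = 2.0 * dmx * g / (1.0 + g)                                 # left-channel share of the downmix
        R = (2.0 * dmx / (1.0 + g)) * np.exp(-1j * np.asarray(ipd))   # undo the phase alignment
        return L, R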
  • a method for processing a multi-channel audio signal according to Embodiment 2 of the present invention includes:
  • Step 200 The encoder generates an Nth frame stereo parameter set according to the Nth frame audio signal of the two channels in the multichannel, wherein the stereo parameter set includes Z stereo parameters.
  • the Z stereo parameters include parameters used by the encoder to mix the Nth frame audio signals based on a predetermined first algorithm, and Z is a positive integer greater than zero.
  • the predetermined first algorithm is a downmix signal generation algorithm that is preset in the encoder.
• The preset stereo parameter generation algorithm determines which stereo parameters are included in the Nth frame stereo parameter set.
• It is assumed that one of the two channels is the left channel and the other is the right channel, and that the preset stereo parameter generation algorithm is as follows; the stereo parameter obtained from the Nth frame audio signal is then the ITD, where N is the frame length, l(j) represents the time-domain signal of the left channel at time j, and r(j) represents the time-domain signal of the right channel at time j.
• The preset stereo parameter generation algorithm may further include an algorithm for generating the IPD; the IPD can also be obtained according to this algorithm. Specifically, the IPD of the b-th subband satisfies a predefined expression, in which B is the total number of subbands occupied by the audio signal in the frequency domain, L(k) is the signal of the Nth frame of the left channel at the kth frequency point, and R*(k) is the conjugate of the Nth frame signal of the right channel at the kth frequency point.
• The preset stereo parameter generation algorithm may further include the algorithm for generating the ILD described in Embodiment 1 of the present invention, so that the ILD can also be obtained.
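• Hedged sketches of the two parameter types just described: the ITD as the lag maximizing a time-domain cross-correlation, and the IPD as the angle of the per-subband cross-spectrum; the search range and the subband partition are assumptions:

    import numpy as np

    def compute_itd(l, r, max_lag=40):
        """ITD sketch: lag (in samples) that maximizes the cross-correlation of l(j) and r(j)."""
        l, r = np.asarray(l, dtype=float), np.asarray(r, dtype=float)
        best_lag, best_corr = 0, -np.inf
        for lag in range(-max_lag, max_lag + 1):
            if lag >= 0:
                c = np.dot(l[lag:], r[:len(r) - lag])
            else:
                c = np.dot(l[:len(l) + lag], r[-lag:])
            if c > best_corr:
                best_corr, best_lag = c, lag
        return best_lag

    def compute_ipd(L, R, subband_edges):
        """IPD sketch: phase angle of the summed cross-spectrum L(k) * conj(R(k)) in each subband b."""
        return [float(np.angle(np.sum(L[s:e] * np.conj(R[s:e])))) for s, e in subband_edges]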
  • Step 201 The encoder mixes the Nth frame audio signals of the two channels into the Nth frame downmix signal according to a predetermined algorithm according to at least one stereo parameter in the Nth frame stereo parameter set.
• For the method of obtaining the Nth frame downmix signal, refer to Embodiment 1 of the present invention; the embodiment of the present invention does not limit the method of obtaining the Nth frame downmix signal.
• Step 202 The encoder detects whether a voice signal is included in the Nth frame downmix signal; if yes, step 203 is performed, otherwise step 204 is performed.
• For the manner in which the encoder detects whether a voice signal is included in the Nth frame downmix signal, refer to Embodiment 1 of the present invention.
• Step 203 The encoder encodes the Nth frame downmix signal according to the preset speech frame coding rate and encodes the Nth frame stereo parameter set; step 211 is performed.
• Optionally, the encoder includes two modes for encoding the stereo parameter set, a first coding mode and a second coding mode, where the coding rate specified by the first coding mode is not less than the coding rate specified by the second coding mode; and/or, for any stereo parameter in the Nth frame stereo parameter set, the quantization precision specified by the first coding mode is not lower than the quantization precision specified by the second coding mode. In step 203, the encoder encodes the Nth frame stereo parameter set according to the first coding mode.
• For example, the Nth frame stereo parameter set includes IPD and ITD; the quantization precision of the IPD specified by the first coding mode is not lower than the quantization precision of the IPD specified by the second coding mode, and the quantization precision of the ITD specified by the first coding mode is not lower than the quantization precision of the ITD specified by the second coding mode.
  • the speech frame encoding rate can be set to 13.2 kbps.
  • Step 204 The encoder determines whether the downmix signal of the Nth frame satisfies the preset voice frame coding condition. If yes, step 205 is performed, otherwise, step 206 is performed.
  • Step 205 The encoder encodes the Nth frame downmix signal according to the preset speech frame encoding rate, and encodes the Nth frame stereo parameter set, and performs step 211.
• Optionally, the encoder includes two modes for encoding the stereo parameter set, a first coding mode and a second coding mode, where the coding rate specified by the first coding mode is not less than the coding rate specified by the second coding mode; and/or, for any stereo parameter in the Nth frame stereo parameter set, the quantization precision specified by the first coding mode is not lower than the quantization precision specified by the second coding mode. In step 205, the encoder encodes the Nth frame stereo parameter set according to the first coding mode.
• Step 206 The encoder determines whether the Nth frame downmix signal satisfies the preset SID coding condition and whether the Nth frame stereo parameter set satisfies the preset stereo parameter coding condition. If both conditions are satisfied, step 207 is performed; if the Nth frame downmix signal satisfies the preset SID coding condition but the Nth frame stereo parameter set does not satisfy the preset stereo parameter coding condition, step 208 is performed; if the Nth frame downmix signal does not satisfy the preset SID coding condition but the Nth frame stereo parameter set satisfies the preset stereo parameter coding condition, step 209 is performed; if neither condition is satisfied, step 210 is performed.
• Before the encoder encodes the at least one stereo parameter in the Nth frame stereo parameter set, it determines whether each stereo parameter in the at least one stereo parameter satisfies the corresponding preset stereo parameter coding condition. Specifically:
• if the at least one stereo parameter in the Nth frame stereo parameter set includes the inter-channel level difference ILD, the preset stereo parameter coding condition includes D_L ≥ D_0, where D_L represents the degree of deviation of the ILD from a first standard, the first standard is determined according to a predetermined third algorithm from the T frames of stereo parameter sets preceding the Nth frame stereo parameter set, and T is a positive integer greater than 0;
• if the at least one stereo parameter in the Nth frame stereo parameter set includes the inter-channel time difference ITD, the preset stereo parameter coding condition includes D_T ≥ D_1, where D_T represents the degree of deviation of the ITD from a second standard, the second standard is determined according to a predetermined fourth algorithm from the T frames of stereo parameter sets preceding the Nth frame stereo parameter set, and T is a positive integer greater than 0;
• if the at least one stereo parameter in the Nth frame stereo parameter set includes the inter-channel phase difference IPD, the preset stereo parameter coding condition includes D_P ≥ D_2, where D_P represents the degree of deviation of the IPD from a third standard, the third standard is determined according to a predetermined fifth algorithm from the T frames of stereo parameter sets preceding the Nth frame stereo parameter set, and T is a positive integer greater than 0.
  • the third algorithm, the fourth algorithm, and the fifth algorithm are preset according to actual conditions.
• For example, if the at least one stereo parameter in the Nth frame stereo parameter set includes only the ITD and the preset stereo parameter coding condition includes only D_T ≥ D_1, the encoder encodes the at least one stereo parameter in the Nth frame stereo parameter set when the included ITD satisfies D_T ≥ D_1. If the at least one stereo parameter in the Nth frame stereo parameter set includes only the ITD and the IPD and the preset stereo parameter coding condition includes only D_T ≥ D_1, the at least one stereo parameter in the Nth frame stereo parameter set is encoded when the ITD included in the Nth frame stereo parameter set satisfies D_T ≥ D_1. If the preset stereo parameter coding conditions include both D_T ≥ D_1 and D_L ≥ D_0, the encoder encodes the ITD and the ILD only when the at least one stereo parameter in the Nth frame stereo parameter set includes an ITD that satisfies D_T ≥ D_1 and an ILD that satisfies D_L ≥ D_0.
• Optionally, D_L, D_T, and D_P respectively satisfy predefined expressions in which: ILD(m) is the level difference when the two channels respectively transmit the Nth frame audio signal in the m-th subband; M is the total number of subbands occupied by the Nth frame audio signal; T is a positive integer greater than 0; ILD^[-t](m) is the level difference when the two channels respectively transmit, in the m-th subband, the t-th frame audio signal preceding the Nth frame audio signal; ITD is the time difference when the two channels respectively transmit the Nth frame audio signal; the average value of the ITD over the T frames of stereo parameter sets preceding the Nth frame is used as a reference, and ITD^[-t] is the time difference when the two channels respectively transmit the t-th frame audio signal preceding the Nth frame audio signal; IPD(m) is the phase difference when the two channels respectively transmit, in the m-th subband, the part of the Nth frame audio signal in that subband; the average value of the IPD in the m-th subband over the T frames of stereo parameter sets preceding the Nth frame is likewise used as a reference; and IPD^[-t](m) is the phase difference when the two channels respectively transmit, in the m-th subband, the t-th frame audio signal preceding the Nth frame audio signal.
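• Because the exact expressions for D_L, D_T, and D_P are defined by the predetermined algorithms, the following is only an assumed stand-in: the deviation of the Nth frame parameter from the average of the previous T frames, compared against the thresholds D_0, D_1, and D_2:

    import numpy as np

    def parameter_deviation(current, history):
        """Assumed deviation measure: mean absolute difference from the T-frame average."""
        reference = np.mean(np.stack([np.asarray(h, dtype=float) for h in history]), axis=0)
        return float(np.mean(np.abs(np.asarray(current, dtype=float) - reference)))

    def stereo_parameters_need_coding(ild, itd, ipd, hist, thresholds):
        """Mirror of step 206's parameter test: every included parameter must deviate enough."""
        return (parameter_deviation(ild, hist["ild"]) >= thresholds["D0"]      # D_L >= D_0
                and parameter_deviation(itd, hist["itd"]) >= thresholds["D1"]  # D_T >= D_1
                and parameter_deviation(ipd, hist["ipd"]) >= thresholds["D2"]) # D_P >= D_2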
  • Step 207 The encoder encodes the Nth frame downmix signal according to the preset SID encoding rate, and encodes at least one stereo parameter in the Nth frame stereo parameter set, and performs step 211.
• Optionally, the encoder includes two modes for encoding the stereo parameter set, a first coding mode and a second coding mode, where the coding rate specified by the first coding mode is not less than the coding rate specified by the second coding mode; in step 207, the encoder encodes the at least one stereo parameter in the Nth frame stereo parameter set according to the second coding mode. For example, in the first coding mode the encoder encodes the Nth frame stereo parameter set at 4.2 kbps, and in the second coding mode the encoder encodes the Nth frame stereo parameter set at 1.2 kbps.
• Optionally, the encoder obtains X target stereo parameters from the Z stereo parameters in the Nth frame stereo parameter set according to a preset stereo parameter dimension reduction rule, and encodes the X target stereo parameters, where X is a positive integer greater than zero and less than or equal to Z.
• For example, the Nth frame stereo parameter set includes three types of stereo parameters, IPD, ITD, and ILD, where the ILD consists of the ILDs of 10 subbands ILD(0)...ILD(9), the IPD consists of the IPDs of 10 subbands IPD(0)...IPD(9), and the ITD consists of the ITDs of 2 time-domain subbands ITD(0) and ITD(1).
• If the preset stereo parameter dimension reduction rule is to keep only two of the parameter types in the stereo parameter set, the encoder selects any two types of stereo parameters from IPD, ITD, and ILD; if IPD and ILD are selected, the encoder encodes the IPD and the ILD.
• If the preset stereo parameter dimension reduction rule is to keep only half of the stereo parameters of each type, the encoder selects 5 from ILD(0)...ILD(9), selects 5 from IPD(0)...IPD(9), selects 1 from ITD(0) and ITD(1), and encodes the selected parameters; or the preset stereo parameter dimension reduction rule may be to select 5 each from the ILD and the IPD.
• Alternatively, the preset stereo parameter dimension reduction rule may be to merge adjacent subbands (a short sketch of this subband merging follows below). For example, adjacent subbands in ILD(0)...ILD(9) are combined: the mean of ILD(0) and ILD(1) gives the new ILD(0), the mean of ILD(2) and ILD(3) gives the new ILD(1), ..., and the mean of ILD(8) and ILD(9) gives the new ILD(4), where the subband corresponding to the new ILD(0) covers the subbands corresponding to the original ILD(0) and ILD(1), ..., and the subband corresponding to the new ILD(4) covers the subbands corresponding to the original ILD(8) and ILD(9). Similarly, adjacent subbands in IPD(0)...IPD(9) are combined to obtain the new IPD(0)...IPD(4), and ITD(0) and ITD(1) are averaged to obtain a new ITD(0), where the time-domain signal corresponding to the new ITD(0) is the same as the time-domain signals corresponding to the original ITD(0) and ITD(1). The new ILD(0)...ILD(4), the new IPD(0)...IPD(4), and the new ITD(0) are then encoded.
• If the preset stereo parameter dimension reduction rule is to reduce only the frequency-domain resolution of the ILD, the adjacent subbands in ILD(0)...ILD(9) are merged in the same way to obtain the new ILD(0)...ILD(4), and then only the new ILD(0)...ILD(4) is encoded.
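• A sketch of the adjacent-subband merging described above (e.g. ILD(0)...ILD(9) reduced to new ILD(0)...ILD(4)); an even number of subbands is assumed:

    import numpy as np

    def merge_adjacent_subbands(params):
        """Average each pair of adjacent subband values: new value m = mean of old 2m and 2m+1."""
        p = np.asarray(params, dtype=float)
        return 0.5 * (p[0::2] + p[1::2])

    # usage: merge_adjacent_subbands([1, 3, 5, 7, 9, 11, 2, 4, 6, 8]) -> array([ 2.,  6., 10.,  3.,  7.])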
  • Step 208 The encoder encodes the Nth frame downmix signal according to the preset SID encoding rate, and does not encode at least one stereo parameter in the Nth frame stereo parameter set, and performs step 211.
  • Step 209 The encoder encodes at least one stereo parameter in the Nth frame stereo parameter set, and does not encode the Nth frame downmix signal, and performs step 215.
  • Step 210 The encoder does not encode the Nth frame downmix signal and the Nth frame stereo parameter set, and step 217 is performed.
• The code stream includes four different types of frames, namely a third type frame, a fourth type frame, a fifth type frame, and a sixth type frame, where the third type frame contains a stereo parameter set and does not contain a downmix signal, the fourth type frame contains neither a downmix signal nor a stereo parameter set, the fifth type frame contains both a downmix signal and a stereo parameter set, and the sixth type frame contains a downmix signal and does not contain a stereo parameter set. The fifth type frame and the sixth type frame are each a case of a frame type that includes a downmix signal, and the third type frame and the fourth type frame are each a case of a frame type that does not include a downmix signal.
• The Nth frame code stream obtained in step 203, step 205, or step 207 is a fifth type frame; the Nth frame code stream obtained in step 208 is a sixth type frame; the Nth frame code stream obtained in step 209 is a third type frame; and the Nth frame code stream obtained in step 210 is a fourth type frame.
  • Step 211 The encoder sends an Nth frame code stream to the decoder, where the Nth frame code stream includes an Nth frame downmix signal and an Nth frame stereo parameter set.
• Step 212 The decoder receives the Nth frame code stream, determines that the Nth frame code stream is a fifth type frame, decodes the Nth frame code stream to obtain the Nth frame downmix signal and the Nth frame stereo parameter set, and performs step 218.
• For the specific implementation manner in which the decoder determines which type of frame the Nth frame code stream is, refer to Embodiment 1 of the present invention.
• Optionally, the decoder decodes the Nth frame code stream according to the rate corresponding to the Nth frame code stream. Specifically, if the encoder encoded the Nth frame downmix signal at 13.2 kbps, the decoder decodes the code stream of the Nth frame downmix signal in the Nth frame code stream according to 13.2 kbps; if the encoder encoded the Nth frame stereo parameter set at 4.2 kbps, the decoder decodes the code stream of the Nth frame stereo parameter set in the Nth frame code stream according to 4.2 kbps.
  • Step 213 The encoder sends an Nth frame code stream to the decoder, where the Nth frame code stream includes an Nth frame downmix signal.
• Step 214 The decoder determines that the Nth frame code stream is a sixth type frame, decodes the Nth frame code stream to obtain the Nth frame downmix signal, determines, according to a preset second rule, k frames of stereo parameter sets from at least one frame of stereo parameter set preceding the Nth frame stereo parameter set, obtains the Nth frame stereo parameter set based on the predetermined sixth algorithm according to the k frames of stereo parameter sets, and performs step 218.
• For example, the preset second rule specifies selecting the stereo parameter set of the decoded frame closest to the Nth frame, and the Nth frame stereo parameter P is then obtained according to the following algorithm: P represents a stereo parameter of the Nth frame; it is obtained from the corresponding stereo parameter of the closest decoded frame by adding δ, where δ represents a random number whose absolute value is small relative to that decoded stereo parameter; for example, δ may be a random number within a preset range. The manner of obtaining the stereo parameters in the Nth frame stereo parameter set is not limited to the foregoing method.
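• A hedged sketch of the estimation in step 214: the Nth frame parameter is taken as the corresponding parameter of the closest decoded frame plus a small random term; the 5% bound on the random term is an assumption, since the exact range is defined by the preset algorithm:

    import random

    def estimate_parameter(p_nearest, rel_range=0.05):
        """P = nearest decoded parameter + delta, with |delta| small relative to that parameter."""
        delta = random.uniform(-rel_range, rel_range) * abs(p_nearest)
        return p_nearest + delta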
  • Step 215 The encoder sends an Nth frame code stream to the decoder, where the Nth frame code stream includes at least one stereo parameter in the Nth frame stereo parameter set.
• Step 216 The decoder determines that the Nth frame code stream is a third type frame, decodes the Nth frame code stream to obtain the at least one stereo parameter in the Nth frame stereo parameter set, determines, according to the preset first rule, m frames of downmix signal from the at least one frame of downmix signal preceding the Nth frame downmix signal, and obtains the Nth frame downmix signal based on the predetermined second algorithm according to the m frames of downmix signal, where m is a positive integer greater than zero; step 218 is performed.
• For example, the average value of the (N-3)th frame, (N-2)th frame, and (N-1)th frame downmix signals is taken as the Nth frame downmix signal; or the (N-1)th frame downmix signal is directly used as the Nth frame downmix signal; or the Nth frame downmix signal is estimated according to another algorithm, for example obtained by operating on the (N-1)th frame downmix signal and a preset offset value based on a preset algorithm.
• Step 217 After the decoder receives the Nth frame code stream and determines that the Nth frame code stream is a fourth type frame, it determines, according to the preset second rule, k frames of stereo parameter sets from at least one frame of stereo parameter set preceding the Nth frame stereo parameter set, and obtains the Nth frame stereo parameter set based on the predetermined sixth algorithm according to the k frames of stereo parameter sets; it also determines, according to the preset first rule, m frames of downmix signal from the at least one frame of downmix signal preceding the Nth frame downmix signal, and obtains the Nth frame downmix signal based on the predetermined second algorithm according to the m frames of downmix signal, where m is a positive integer greater than zero.
• Step 218 The decoder restores the Nth frame downmix signal to the Nth frame audio signals of the two channels according to a predetermined seventh algorithm and the target stereo parameter of the Nth frame stereo parameter set.
• Optionally, when the encoder detects whether the Nth frame downmix signal includes a voice signal through the Nth frame audio signals of the two channels, a corresponding generation and coding mode for the stereo parameter set is also provided. Specifically, if the encoder detects that the Nth frame audio signal of either of the two channels includes a voice signal, it obtains the Nth frame stereo parameter set from the Nth frame audio signal based on a first stereo parameter set generation manner, and encodes the Nth frame stereo parameter set. If the encoder detects that neither of the Nth frame audio signals of the two channels includes a voice signal: if the Nth frame audio signal satisfies the preset voice frame coding condition, the encoder obtains the Nth frame stereo parameter set from the Nth frame audio signal based on the first stereo parameter set generation manner and encodes the Nth frame stereo parameter set; if it determines that the Nth frame audio signal does not satisfy the preset voice frame coding condition, the encoder obtains the Nth frame stereo parameter set from the Nth frame audio signal based on a second stereo parameter set generation manner and encodes the Nth frame stereo parameter set.
• The first stereo parameter set generation manner and the second stereo parameter set generation manner satisfy at least one of the following conditions: the number of stereo parameter types included in the stereo parameter set specified by the first stereo parameter set generation manner is not less than the number of stereo parameter types included in the stereo parameter set specified by the second stereo parameter set generation manner; the number of stereo parameters included in the stereo parameter set specified by the first stereo parameter set generation manner is not less than the number of stereo parameters included in the stereo parameter set specified by the second stereo parameter set generation manner; the resolution in the time domain of a stereo parameter specified by the first stereo parameter set generation manner is not lower than the resolution in the time domain of the corresponding stereo parameter specified by the second stereo parameter set generation manner; and the resolution in the frequency domain of a stereo parameter specified by the first stereo parameter set generation manner is not lower than the resolution in the frequency domain of the corresponding stereo parameter specified by the second stereo parameter set generation manner.
• That is, the stereo parameter set obtained by the first stereo parameter set generation manner has higher accuracy in the frequency domain or the time domain than the stereo parameter set obtained by the second stereo parameter set generation manner.
• Optionally, in a third coding manner of the embodiment of the present invention, when the encoder detects that the Nth frame downmix signal includes a voice signal, it encodes the Nth frame downmix signal according to the voice coding rate and encodes the Nth frame stereo parameter set. When the encoder detects that the Nth frame downmix signal does not include a voice signal: if the Nth frame downmix signal satisfies the preset voice frame coding condition, the Nth frame downmix signal is encoded according to the voice coding rate and the Nth frame stereo parameter set is encoded; if the Nth frame downmix signal does not satisfy the preset voice frame coding condition but satisfies the preset SID coding condition, the Nth frame downmix signal is encoded according to the SID coding rate and the at least one stereo parameter in the Nth frame stereo parameter set is encoded; if the Nth frame downmix signal satisfies neither the preset voice frame coding condition nor the preset SID coding condition, neither the Nth frame downmix signal nor the Nth frame stereo parameter set is encoded. In this coding manner, the encoder does not make a separate judgment on the stereo parameter set; the stereo parameter set is encoded only when the downmix signal is encoded.
• The code stream obtained by the third coding manner of the embodiment of the present invention includes two types of frames, a first type frame and a second type frame, where the first type frame includes a downmix signal and a stereo parameter set, and the second type frame includes neither a downmix signal nor a stereo parameter set.
• For the specific manner in which the decoder receives the code stream and restores the audio signals of the two channels, refer to Embodiment 2 and Embodiment 1 of the present invention.
• Optionally, the encoder determines whether the Nth frame stereo parameter set satisfies the preset stereo parameter coding condition; if yes, the encoder does not encode the Nth frame downmix signal but encodes the at least one stereo parameter in the Nth frame stereo parameter set; otherwise, the encoder encodes neither the Nth frame downmix signal nor the Nth frame stereo parameter set.
• The code stream obtained based on the above coding manner includes three types of frames, a first type frame, a third type frame, and a fourth type frame, where the first type frame includes a downmix signal and a stereo parameter set, the third type frame does not include a downmix signal but includes a stereo parameter set, and the fourth type frame includes neither a downmix signal nor a stereo parameter set. For the manner in which the decoder receives the code stream and restores the audio signals of the two channels, refer to Embodiment 2 and Embodiment 1 of the present invention.
• The difference between the foregoing technical solution and Embodiment 2 of the present invention is whether, when the Nth frame downmix signal satisfies neither the preset voice frame coding condition nor the preset SID coding condition, it is determined whether the Nth frame stereo parameter set satisfies the preset stereo parameter coding condition.
• Optionally, in a fourth coding manner of the embodiment of the present invention, when the encoder detects that the Nth frame downmix signal includes a voice signal, it encodes the Nth frame downmix signal according to the voice coding rate and encodes the Nth frame stereo parameter set. When the encoder detects that the Nth frame downmix signal does not include a voice signal: if the Nth frame downmix signal satisfies the preset voice frame coding condition, the Nth frame downmix signal is encoded according to the voice coding rate and the Nth frame stereo parameter set is encoded; if the Nth frame downmix signal does not satisfy the preset voice frame coding condition but satisfies the preset SID coding condition, the encoder determines whether the Nth frame stereo parameter set satisfies the preset stereo parameter coding condition and, while encoding the Nth frame downmix signal according to the SID coding rate, encodes the at least one stereo parameter in the Nth frame stereo parameter set only when the Nth frame stereo parameter set satisfies the preset stereo parameter coding condition; if the Nth frame downmix signal satisfies neither the preset voice frame coding condition nor the preset SID coding condition, neither the Nth frame downmix signal nor the Nth frame stereo parameter set is encoded.
• The code stream obtained by the fourth coding manner in the embodiment of the present invention includes three types of frames, a fifth type frame, a sixth type frame, and a second type frame, where the fifth type frame includes a downmix signal and a stereo parameter set, the sixth type frame includes a downmix signal and does not include a stereo parameter set, and the second type frame includes neither a downmix signal nor a stereo parameter set.
• The difference between the fourth coding manner of the embodiment of the present invention and Embodiment 2 of the present invention is that whether to encode the at least one stereo parameter in the Nth frame stereo parameter set is determined only when the Nth frame downmix signal does not satisfy the preset voice frame coding condition but satisfies the preset SID coding condition; if neither the preset voice frame coding condition nor the preset SID coding condition is satisfied, the Nth frame stereo parameter set is not encoded.
• For the specific manner in which the decoder obtains the Nth frame downmix signal and the Nth frame stereo parameter set, refer to Embodiment 2 and Embodiment 1 of the present invention; for the coding of the stereo parameters and the downmix signal, reference may also be made to Embodiment 2 and Embodiment 1 of the present invention.
• It should be noted that "first" and "second" in the predetermined first algorithm and the predetermined second algorithm have no special meaning and are only used to distinguish different algorithms; "third", "fourth", "fifth", "sixth", "seventh", and the like are similar, and are not repeated here.
  • an embodiment of the present invention further provides an encoder, a decoder, and a codec system.
• Because the encoder, the decoder, and the codec system correspond to the methods of the embodiments of the present invention, the implementation of the encoder, the decoder, and the codec system of the embodiments of the present invention can refer to the implementation of the methods, and repeated descriptions are omitted.
• The encoder of the embodiment of the present invention includes a signal detecting unit 300 and a signal encoding unit 310. The signal detecting unit 300 is configured to detect whether a voice signal is included in the Nth frame downmix signal, where the Nth frame downmix signal is obtained by mixing the Nth frame audio signals of two channels of the multi-channel based on a predetermined first algorithm, and N is a positive integer greater than zero. The signal encoding unit 310 is configured to encode the Nth frame downmix signal when the signal detecting unit 300 detects that the Nth frame downmix signal includes a voice signal, and, when the signal detecting unit 300 detects that the Nth frame downmix signal does not include a voice signal: to encode the Nth frame downmix signal if the signal detecting unit 300 determines that the Nth frame downmix signal satisfies the preset audio frame coding condition; and not to encode the Nth frame downmix signal if the signal detecting unit 300 determines that the Nth frame downmix signal does not satisfy the preset audio frame coding condition.
• Optionally, the signal encoding unit 310 includes a first signal encoding unit 311 and a second signal encoding unit 312. When the signal detecting unit 300 detects that the Nth frame downmix signal includes a voice signal, the signal detecting unit 300 notifies the first signal encoding unit 311 to encode the Nth frame downmix signal; when the signal detecting unit 300 determines that the Nth frame downmix signal satisfies the preset voice frame coding condition, it likewise notifies the first signal encoding unit 311 to encode the Nth frame downmix signal. The first signal encoding unit 311 is configured to encode the Nth frame downmix signal according to the preset voice frame coding rate. Otherwise, the second signal encoding unit 312 is notified to encode the Nth frame downmix signal; specifically, the second signal encoding unit 312 encodes the Nth frame downmix signal according to the preset SID coding rate, where the SID coding rate is not greater than the voice frame coding rate.
• Optionally, the encoder shown in FIG. 3a and FIG. 3b further includes a parameter generating unit 320 and a parameter encoding unit 330. The parameter generating unit 320 is configured to generate the Nth frame stereo parameter set, which includes Z stereo parameters, where the Z stereo parameters include the parameters used by the encoder when mixing the Nth frame audio signals based on the predetermined first algorithm, and Z is a positive integer greater than zero. The parameter encoding unit 330 is configured to encode the Nth frame stereo parameter set when the signal detecting unit 300 detects that the Nth frame downmix signal includes a voice signal, and, when the signal detecting unit 300 detects that the Nth frame downmix signal does not include a voice signal: to encode the at least one stereo parameter in the Nth frame stereo parameter set if the signal detecting unit 300 determines that the Nth frame stereo parameter set satisfies the preset stereo parameter coding condition; and not to encode the stereo parameter set if the signal detecting unit 300 determines that the Nth frame stereo parameter set does not satisfy the preset stereo parameter coding condition.
• Optionally, the parameter encoding unit 330 is configured to obtain X target stereo parameters from the Z stereo parameters in the Nth frame stereo parameter set according to the preset stereo parameter dimension reduction rule, and to encode the X target stereo parameters, where X is a positive integer greater than zero and less than or equal to Z. Optionally, the second parameter encoding unit 332 is configured to obtain the X target stereo parameters from the Z stereo parameters in the Nth frame stereo parameter set according to the preset stereo parameter dimension reduction rule, and to encode the X target stereo parameters.
• Optionally, the parameter generating unit 320 of the encoder shown in FIG. 3c includes a first parameter generating unit 321 and a second parameter generating unit 322. When the signal detecting unit 300 detects that the Nth frame audio signal includes a voice signal, it notifies the first parameter generating unit 321 to generate the Nth frame stereo parameter set; when the signal detecting unit 300 detects that the Nth frame audio signal does not include a voice signal and the Nth frame audio signal does not satisfy the preset voice frame coding condition, it notifies the second parameter generating unit 322 to generate the Nth frame stereo parameter set. Specifically, the first parameter generating unit 321 obtains the Nth frame stereo parameter set from the Nth frame audio signal based on the first stereo parameter set generation manner, and the second parameter generating unit 322 obtains the Nth frame stereo parameter set from the Nth frame audio signal based on the second stereo parameter set generation manner.
• The first stereo parameter set generation manner and the second stereo parameter set generation manner satisfy at least one of the following conditions: the number of stereo parameter types included in the stereo parameter set specified by the first stereo parameter set generation manner is not less than the number of stereo parameter types included in the stereo parameter set specified by the second stereo parameter set generation manner; the number of stereo parameters included in the stereo parameter set specified by the first stereo parameter set generation manner is not less than the number of stereo parameters included in the stereo parameter set specified by the second stereo parameter set generation manner; the resolution in the time domain of a stereo parameter specified by the first stereo parameter set generation manner is not lower than the resolution in the time domain of the corresponding stereo parameter specified by the second stereo parameter set generation manner; and the resolution in the frequency domain of a stereo parameter specified by the first stereo parameter set generation manner is not lower than the resolution in the frequency domain of the corresponding stereo parameter specified by the second stereo parameter set generation manner.
• The Nth frame stereo parameter set generated by the first parameter generating unit 321 and the second parameter generating unit 322 is encoded by the parameter encoding unit 330. Optionally, the parameter encoding unit 330 includes a first parameter encoding unit 331 and a second parameter encoding unit 332: the Nth frame stereo parameter set generated by the first parameter generating unit 321 is encoded by the first parameter encoding unit 331, and the Nth frame stereo parameter set generated by the second parameter generating unit 322 is encoded by the second parameter encoding unit 332. The coding mode specified by the first parameter encoding unit 331 is pre-defined as the first coding mode, and the coding mode specified by the second parameter encoding unit 332 is pre-defined as the second coding mode, where the coding rate specified by the first coding mode is not less than the coding rate specified by the second coding mode; and/or, for any stereo parameter in the Nth frame stereo parameter set, the quantization precision specified by the first coding mode is not lower than the quantization precision specified by the second coding mode.
• Optionally, the parameter encoding unit 330 includes a first parameter encoding unit 331 and a second parameter encoding unit 332. Specifically, the first parameter encoding unit 331 is configured to encode the Nth frame stereo parameter set according to the first coding mode when the Nth frame downmix signal includes a voice signal, or when the Nth frame downmix signal does not include a voice signal but satisfies the voice frame coding condition; the second parameter encoding unit 332 is configured to encode the at least one stereo parameter in the Nth frame stereo parameter set according to the second coding mode when the Nth frame downmix signal does not satisfy the voice frame coding condition. The coding rate specified by the first coding mode is not less than the coding rate specified by the second coding mode; and/or, for any stereo parameter in the Nth frame stereo parameter set, the quantization precision specified by the first coding mode is not lower than the quantization precision specified by the second coding mode.
• Optionally, if the at least one stereo parameter in the Nth frame stereo parameter set includes the inter-channel level difference ILD, the preset stereo parameter coding condition includes D_L ≥ D_0, where D_L represents the degree of deviation of the ILD from a first standard, the first standard is determined according to a predetermined second algorithm from the T frames of stereo parameter sets preceding the Nth frame stereo parameter set, and T is a positive integer greater than 0.
• If the at least one stereo parameter in the Nth frame stereo parameter set includes the inter-channel time difference ITD, the preset stereo parameter coding condition includes D_T ≥ D_1, where D_T represents the degree of deviation of the ITD from a second standard, the second standard is determined according to a predetermined third algorithm from the T frames of stereo parameter sets preceding the Nth frame stereo parameter set, and T is a positive integer greater than 0.
• If the at least one stereo parameter in the Nth frame stereo parameter set includes the inter-channel phase difference IPD, the preset stereo parameter coding condition includes D_P ≥ D_2, where D_P represents the degree of deviation of the IPD from a third standard, the third standard is determined according to a predetermined fourth algorithm from the T frames of stereo parameter sets preceding the Nth frame stereo parameter set, and T is a positive integer greater than 0.
• Optionally, D_L, D_T, and D_P respectively satisfy predefined expressions in which: ILD(m) is the level difference when the two channels respectively transmit the Nth frame audio signal in the m-th subband; M is the total number of subbands occupied by the Nth frame audio signal; T is a positive integer greater than 0; ILD^[-t](m) is the level difference when the two channels respectively transmit, in the m-th subband, the t-th frame audio signal preceding the Nth frame audio signal; ITD is the time difference when the two channels respectively transmit the Nth frame audio signal; the average value of the ITD over the T frames of stereo parameter sets preceding the Nth frame is used as a reference, and ITD^[-t] is the time difference when the two channels respectively transmit the t-th frame audio signal preceding the Nth frame audio signal; IPD(m) is the phase difference when the two channels respectively transmit, in the m-th subband, the part of the Nth frame audio signal in that subband; the average value of the IPD in the m-th subband over the T frames of stereo parameter sets preceding the Nth frame is likewise used as a reference; and IPD^[-t](m) is the phase difference when the two channels respectively transmit, in the m-th subband, the t-th frame audio signal preceding the Nth frame audio signal.
• It should be noted that the parameter detecting unit 340 shown in FIG. 3a to FIG. 3d is optional; that is, the parameter detecting unit 340 may or may not be present in the encoder. When the parameter detecting unit 340 is not present, the parameter encoding unit 330 does not need to detect the stereo parameters and directly encodes each frame of stereo parameter set generated by the parameter generating unit 320.
• The decoder of the embodiment of the present invention includes a receiving unit 400 and a decoding unit 410. The receiving unit 400 is configured to receive a code stream, where the code stream includes at least two frames, among which there is at least one first type frame and at least one second type frame; the first type frame includes a downmix signal, and the second type frame does not include a downmix signal. For the Nth frame code stream, N being a positive integer greater than one, the decoding unit 410 is configured to: if it is determined that the Nth frame code stream is a first type frame, decode the Nth frame code stream to obtain the Nth frame downmix signal; if it is determined that the Nth frame code stream is a second type frame, determine, according to the preset first rule, m frames of downmix signal from the at least one frame of downmix signal preceding the Nth frame downmix signal, and obtain the Nth frame downmix signal based on the predetermined first algorithm according to the m frames of downmix signal, where m is a positive integer greater than zero. The Nth frame downmix signal is obtained by the encoder by mixing the Nth frame audio signals of two channels of the multi-channel based on a predetermined second algorithm.
  • the decoder shown in FIG. 4 further includes a signal restoring unit 420.
• Optionally, when the first type frame includes a downmix signal and a stereo parameter set and the second type frame includes a stereo parameter set and does not include a downmix signal: the decoding unit 410, if it determines that the Nth frame code stream is a first type frame, decodes the Nth frame code stream and obtains the Nth frame stereo parameter set while obtaining the Nth frame downmix signal; if it determines that the Nth frame code stream is a second type frame, it decodes the Nth frame code stream to obtain the Nth frame stereo parameter set, where at least one stereo parameter in the Nth frame stereo parameter set is used by the decoder to restore the Nth frame downmix signal to the Nth frame audio signal based on a predetermined third algorithm; the signal restoring unit 420 is configured to restore the Nth frame downmix signal to the Nth frame audio signal based on the third algorithm according to the at least one stereo parameter in the Nth frame stereo parameter set.
  • the first type of frame includes a downmix signal and a stereo parameter set
  • the second type of frame does not include a downmix signal and does not include a stereo parameter set
  • the decoding unit 410 is further configured to: if the Nth frame code stream is determined to be the first type of frame, decode the Nth frame code stream, and obtain the Nth frame stereo parameter set while obtaining the Nth frame downmix signal; if The Nth frame code stream is a second type of frame, and according to the preset second rule, the k frame stereo parameter set is determined from the at least one frame stereo parameter set before the Nth frame stereo parameter set, and according to the k frame stereo parameter set. And obtaining, according to a predetermined fourth algorithm, a set of stereo parameters of the Nth frame, where k is a positive integer greater than zero;
  • the at least one stereo parameter in the Nth frame stereo parameter set is used by the decoder to restore the Nth frame downmix signal to the Nth frame audio signal based on a predetermined third algorithm;
  • the signal restoring unit 420 is configured to restore the Nth frame downmix signal to the Nth frame audio signal based on the third algorithm according to at least one stereo parameter in the Nth frame stereo parameter set.
  • the first type of frame includes a downmix signal and a stereo parameter set
  • the third type of frame includes a stereo parameter set and does not include a downmix signal
  • the fourth type of frame does not include a downmix signal and does not include a stereo parameter set.
  • the third type of frame and the fourth type of frame are respectively a case of the second type of frame:
  • the decoding unit 410 is further configured to: if the Nth frame code stream is determined to be the first type of frame, decode the Nth frame code stream, and obtain the Nth frame stereo parameter set while obtaining the Nth frame downmix signal; if The Nth frame code stream is the second type of frame: when the Nth frame code stream is the third type of frame, the Nth frame code stream is decoded to obtain the Nth frame stereo parameter set; when the Nth frame code stream is the fourth In the case of a type frame, determining a k-frame stereo parameter set from at least one frame stereo parameter set preceding the N-th stereo parameter set according to a preset second rule, and based on the k-frame stereo parameter set, based on a predetermined fourth algorithm, Obtaining a set of stereo parameters of the Nth frame, where k is a positive integer greater than zero;
  • the at least one stereo parameter in the Nth frame stereo parameter set is used by the decoder to restore the Nth frame downmix signal to the Nth frame audio signal based on a predetermined third algorithm;
  • the signal restoring unit 420 is configured to restore the Nth frame downmix signal to the Nth frame audio signal according to the third algorithm according to at least one stereo parameter in the Nth frame stereo parameter set.
  • the fifth type frame includes a downmix signal and a stereo parameter set
  • the sixth type frame includes a downmix signal and does not include a stereo parameter set
  • the fifth type frame and the sixth type frame are each a case of the first type frame.
  • the second type of frame does not contain a downmix signal and does not contain a stereo parameter set:
  • the decoding unit 410 is further configured to: if determining that the Nth frame code stream is the first type of frame: when the Nth frame code stream is the fifth type of frame, decoding the Nth frame code stream, and obtaining the Nth frame downmix signal At the same time, the Nth frame stereo parameter set is also obtained; when the Nth frame code stream is the sixth type frame, according to the preset second rule, the at least one frame stereo parameter set before the Nth frame stereo parameter set is determined. a k-frame stereo parameter set, and based on a k-frame stereo parameter set, obtains an Nth frame stereo parameter set based on a predetermined fourth algorithm;
  • the decoding unit 410 is further configured to: if it is determined that the Nth frame code stream is the second type of frame, according to the preset second a rule, determining a k-frame stereo parameter set from at least one frame stereo parameter set before the Nth frame stereo parameter set, and obtaining an Nth frame stereo parameter set based on the predetermined fourth algorithm according to the k-frame stereo parameter set;
  • the at least one stereo parameter of the Nth frame stereo parameter set is used by the decoder to restore the Nth frame downmix signal to the Nth frame audio signal based on a predetermined third algorithm, where k is a positive integer greater than zero;
  • the signal restoring unit 420 is configured to restore the Nth frame downmix signal to the Nth frame audio signal according to the third algorithm according to at least one stereo parameter in the Nth frame stereo parameter set.
  • the fifth type frame includes a downmix signal and a stereo parameter set
  • the sixth type frame includes a downmix signal and does not include a stereo parameter set
  • the fifth type frame and the sixth type frame are each a case of the first type frame.
  • the third type of frame includes a stereo parameter set and does not include a downmix signal
  • the fourth type of frame does not include a downmix signal and does not include a stereo parameter set
  • the third type frame and the fourth type frame are each a case of the second type frame.
  • the decoding unit 410 is further configured to: if the Nth frame code stream is determined to be the first type of frame: when the Nth frame code stream is the fifth type of frame, decode the Nth frame code stream, and obtain the Nth frame downmix signal And obtaining the Nth frame stereo parameter set; when the Nth frame code stream is the sixth type frame, determining, according to the preset second rule, from at least one frame of the stereo parameter set before the Nth frame stereo parameter set, determining k a set of stereo parameters of the frame, and obtaining a set of stereo parameters of the Nth frame based on the predetermined fourth algorithm according to the k-frame stereo parameter set;
  • the decoding unit 410 is further configured to: if the Nth frame code stream is determined to be the second type frame, when the Nth frame code stream is the third type frame, the Nth frame code stream is decoded to obtain the Nth frame stereo parameter set; When the Nth frame code stream is the fourth type of frame, the k frame stereo parameter set is determined from the at least one frame stereo parameter set before the Nth frame stereo parameter set according to the preset second rule, and according to the k frame stereo parameter. The set, based on the predetermined fourth algorithm, obtains a set of stereo parameters of the Nth frame;
  • the at least one stereo parameter of the Nth frame stereo parameter set is used by the decoder to restore the Nth frame downmix signal to the Nth frame audio signal based on a predetermined third algorithm, where k is a positive integer greater than zero;
  • the signal restoring unit 420 is configured to restore the Nth frame downmix signal to the Nth frame audio signal according to the third algorithm according to at least one stereo parameter in the Nth frame stereo parameter set.
  • the codec system of the embodiment of the present invention includes any of the encoders 500 shown in FIGS. 3a to 3b, and the decoder 510 shown in FIG. 4.
  • embodiments of the present invention can be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or a combination of software and hardware. Moreover, the invention can take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) including computer usable program code.
  • the computer program instructions can also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus, and the instruction apparatus implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
  • these computer program instructions can also be loaded onto a computer or other programmable data processing device, so that a series of operational steps are performed on the computer or other programmable device to produce computer-implemented processing, and the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.


Abstract

一种处理多声道音频信号方法、装置和系统,涉及音频编解码技术领域,用以解决现有技术中多声道音频通信系统不能非连续传输音频信号的问题。其中,编码器包括:信号检测单元和信号编码单元,信号编码单元用于在信号检测单元检测到第N帧下混信号中包含语音信号时,对第N帧下混信号编码,以及在信号检测单元检测到第N帧下混信号中不包含语音信号时:若信号检测单元确定第N帧下混信号满足预设的音频帧编码条件,则对第N帧下混信号编码;若信号检测单元确定第N帧下混信号不满足预设的音频帧编码条件,则不对第N帧下混信号编码。这种技术方案由于对下混信号的编码是非连续的,因此解决了现有技术中不能非连续传输音频信号的问题。

Description

一种处理多声道音频信号的方法、装置和系统 技术领域
本发明涉及音频编解码技术领域,特别涉及一种处理多声道音频信号的方法、装置和系统。
背景技术
在音频通信中,为了增加通信系统的容量,通常在发送端对被传输的原始的每帧音频信号先编码再进行传输,通过编码实现了对音频信号的压缩,当接收端接收到信号后,对接收到的信号解码,然后恢复出原始音频信号。其中,为了实现对音频信号的最大化压缩,针对不同类型的音频信号,采用不同类型的编码方式。现有技术中,当音频信号为语音信号时,通常采用连续编码的方式,即分别对每帧语音信号编码,当音频信号为噪声信号时,通常采用非连续编码的方式对噪声信号编码,即每隔若干帧的噪声信号对一帧噪声信号编码,例如每隔六帧对噪声信号编码,对第一帧噪声信号编码后,则不再对第二帧至第七帧噪声信号编码,然后对第八帧噪声信号编码,在该第二帧到第七帧分别为六个No_Data帧。具体的,上述音频信号指的是单声道的音频信号。
随着音频通信技术的发展,在音频通信系统中还有一种特别的通信方式:立体声通信,以立体声通信为双声道通信为例,其中双声道包括第一声道和第二声道,发送端根据第一声道的第n帧语音信号和第二声道中的第n帧语音信号,得到用于将第一声道的第n帧语音信号和第二声道中的第n帧语音信号混合为一帧下混信号的立体声参数后,其中,下混信号为单通道信号,然后,发送端将双声道中的第n帧语音信号混合为一帧下混信号,n为大于零的正整数,再对该帧下混信号编码,最后将编码后的下混信号和立体声参数发送到接收端,接收端在接收到编码后的下混信号和立体声参数后,对编码 后的下混信号解码,然后根据立体声参数将下混信号还原为双声道信号,这种传输方式与分别对双声道中的每帧语音信号都编码相比,大大降低了传输的比特数,从而达到了压缩的目的。
但是，当在立体声通信中传输的是噪声信号时，采用的还是与语音信号相同的编码方式，若直接将单声道中非连续编码的方式应用在立体声通信中，则在接收端不能将噪声信号还原，导致接收端的用户主观体验变差。
发明内容
本发明提供一种处理多声道音频信号的方法、装置和系统,用以解决现有技术中多声道音频通信系统不能非连续传输音频信号的问题。
第一方面,提供了一种处理多声道音频信号的方法,包括:编码器检测第N帧下混信号中是否包含语音信号,在检测到第N帧下混信号中包含语音信号时,对第N帧下混信号编码;在检测到第N帧下混信号中不包含语音信号时:若确定第N帧下混信号满足预设的音频帧编码条件,则对第N帧下混信号编码;若确定第N帧下混信号不满足预设的音频帧编码条件,则不对第N帧下混信号编码;其中,第N帧下混信号是由多声道中两个声道的第N帧音频信号基于预定第一算法混合后得到的,N为大于零的正整数。
由于编码器只有在下混信号中包含语音信号或者下混信号满足预设的音频帧编码条件时,才对下混信号编码,否则不对下混信号编码,从而使得编码器实现了对下混信号的非连续编码,提高了对下混信号的压缩效率。
需要说明的是,在本发明实施例中,预设的音频帧编码条件中包括第一帧下混信号,也就是说,在第一帧下混信号中不包含语音信号时,第一帧下混信号满足预设的音频帧编码条件,对第一帧下混信号编码。
在第一方面的基础上,为更大程度实现对下混信号的压缩效率,可选的,编码器在检测到第N帧下混信号中包含语音信号时,根据预设的语音帧编码速率对第N帧下混信号编码;在检测到第N帧下混信号中不包含语音信 号时:若确定第N帧下混信号满足预设的语音帧编码条件,则根据预设的语音帧编码速率对第N帧下混信号编码;若确定第N帧下混信号不满足预设的语音帧编码条件、但满足预设的SID编码条件,则根据预设的SID编码速率对第N帧下混信号编码;其中,SID编码速率小于语音帧编码速率。
应理解,在具体实现时,若确定第N帧下混信号不满足预设的语音帧编码条件、但满足预设的SID编码条件,则预设的SID编码速率对第N帧下混信号进行SID编码,与语音信号编码相比,进一步提高了下混信号的压缩效率。此外,需要说明的是,在第一方面以及上述技术方案中,为了避免解码器无法将下混信号还原,还需将立体声参数集合编码。
在第一方面的基础上,为了再进一步提高多声道通信系统的压缩效率,可选的,编码器对立体声参数集合进行非连续编码,具体的,编码器根据第N帧音频信号,得到第N帧立体声参数集合,在检测到第N帧下混信号中包含语音信号时,则对第N帧立体声参数集合编码;在检测到第N帧下混信号中不包含语音信号时:若确定第N帧立体声参数集合满足预设的立体声参数编码条件,则对第N帧立体声参数集合中的至少一个立体声参数编码;若确定第N帧立体声参数集合不满足预设的立体声参数编码条件,则不对立体声参数集合编码;其中,第N帧立体声参数集合中包括Z个立体声参数,Z个立体声参数包括编码器基于预定算法对第N帧音频信号混合时所用到的参数,Z为大于零的正整数。
在第一方面的基础上,可选的,为了更进一步提高多声道通信系统的压缩效率,编码器在对第N帧立体声参数集合中的至少一个立体声参数编码前,根据第N帧立体声参数集合中的Z个立体声参数,按照预设的立体声参数降维规则,得到X个目标立体声参数,然后再对X个目标立体声参数编码,其中,X为大于零且小于等于Z的正整数。
其中,预设的立体声参数降维规则可以为预设的立体声参数类型,即从第N帧立体声参数集合中选出符合预设的立体声参数类型的X个立体声参数,或者,预设的立体声参数降维规则为预设的立体声参数个数,即从第N 帧立体声参数集合中选出X个立体声参数,或者,预设的立体声参数降维规则为针对第N帧立体声参数集合中至少一个立体声参数降低在时域或频域的分辨率,即按照降低后的至少一个立体声参数在时域或频域的分辨率,基于Z个立体声参数确定出X个目标立体声参数。
在第一方面的基础上,可选的,还可通过下述方法,提高多声道通信系统的压缩效率:
编码器在检测到第N帧音频信号包含语音信号时:根据第N帧音频信号,基于第一立体声参数集合生成方式,得到第N帧立体声参数集合,并对第N帧立体声参数集合编码;在检测到第N帧音频信号不包含语音信号时:若确定第N帧音频信号满足预设的语音帧编码条件,则根据第N帧音频信号,基于第一立体声参数集合生成方式,得到第N帧立体声参数集合,并对第N帧立体声参数集合编码;若确定第N帧音频信号不满足预设的语音帧编码条件,则根据第N帧音频信号,基于第二立体声参数集合生成方式,得到第N帧立体声参数集合,并在确定第N帧立体声参数集合满足预设的立体声参数编码条件时,对第N帧立体声参数集合中的至少一个立体声参数编码;在确定第N帧立体声参数集合不满足预设的立体声参数编码条件时,不对立体声参数集合编码;
其中,第一立体声参数集合生成方式和第二立体声参数集合生成方式满足下列至少一个条件:
第一立体声参数集合生成方式规定的立体声参数集合中包括的立体声参数类型的个数不少于第二立体声参数集合生成方式规定的立体声参数集合中包括的立体声参数类型的个数,第一立体声参数集合生成方式规定的立体声参数集合中包括的立体声参数的个数不少于第二立体声参数集合生成方式规定的立体声参数集合中包括的立体声参数的个数,第一立体声参数集合生成方式规定的立体声参数在时域的分辨率不低于第二立体声参数集合生成方式规定的对应的立体声参数在时域的分辨率,第一立体声参数集合生成方式规定的立体声参数在频域的分辨率不低于第二立体声参数集合生成方式规定的 对应的立体声参数在频域的分辨率。
在第一方面的基础上,可选的,编码器在第N帧下混信号中包含语音信号时,根据第一编码方式对第N帧立体声参数集合编码;在第N帧下混信号满足语音帧编码条件时,根据第一编码方式对第N帧立体声参数集合中的至少一个立体声参数编码;在第N帧下混信号不满足语音帧编码条件时,根据第二编码方式对第N帧立体声参数集合中的至少一个立体声参数编码;
其中,第一编码方式规定的编码速率不小于第二编码方式规定的编码速率;和/或,针对所述第N帧立体声参数集合中的任一立体声参数,第一编码方式规定的量化精度不低于第二编码方式规定的量化精度。
例如,第N帧立体声参数集合中包括IPD和ITD,第一编码方式中规定的IPD的量化精度不低于第二编码方式中规定的IPD的量化精度,第一编码方式中规定的ITD的量化精度不低于第二编码方式中规定的ITD的量化精度。
在第一方面的基础上,可选的,通常情况下,若第N帧立体声参数集合中的至少一个立体声参数包括:声道间电平差ILD;预设立体声参数编码条件中包括:DL≥D0
其中,DL表示ILD与第一标准的偏离程度,第一标准是根据第N帧立体声参数集合之前的T帧立体声参数集合,基于预定第二算法确定的,T为大于0的正整数;
若第N帧立体声参数集合中的至少一个立体声参数包括:声道间时间差ITD;预设立体声参数编码条件中包括:DT≥D1
其中,DT表示ITD与第二标准的偏离程度,第二标准是根据第N帧立体声参数集合之前的T帧立体声参数集合,基于预定第三算法确定的,T为大于0的正整数;
若第N帧立体声参数集合中的至少一个立体声参数包括:声道间相位差IPD;预设立体声参数编码条件中包括:Dp≥D2
其中,DP表示IPD与第三标准的偏离程度,第三标准是根据第N帧立体 声参数集合之前的T帧立体声参数集合,基于预定第四算法确定的,T为大于0的正整数。
其中,第二算法、第三算法以及第四算法是根据实际情况需要预先设置的。
可选的,DL、DT、DP分别满足下列表达式:
Figure PCTCN2016100617-appb-000001
Figure PCTCN2016100617-appb-000002
Figure PCTCN2016100617-appb-000003
其中，ILD(m)为两声道分别在第m个子频带传输第N帧音频信号时的电平差值，M为传输第N帧音频信号所占用的子频带的总个数；符号（Figure PCTCN2016100617-appb-000004）为在第N帧之前的T帧立体声参数集合中在第m个子频带的ILD的平均值，T为大于0的正整数，ILD[-t](m)为两声道分别在第m个子频带传输第N帧音频信号之前的第t帧音频信号时的电平差值；ITD为两声道分别传输第N帧音频信号时的时间差值，符号（Figure PCTCN2016100617-appb-000005）为在第N帧之前的T帧立体声参数集合中的ITD的平均值，ITD[-t]为两声道分别传输第N帧音频信号之前的第t帧音频信号时的时间差值；IPD(m)为两声道分别在第m个子频带传输第N帧音频信号中的部分音频信号时的相位差值，符号（Figure PCTCN2016100617-appb-000006）为在第N帧之前的T帧立体声参数集合中在第m个子频带的IPD的平均值，IPD[-t](m)为两声道分别在第m个子频带传输第N帧音频信号之前的第t帧音频信号时的相位差值。
第二方面,提供了一种处理多声道音频信号的方法,包括:解码器接收到码流,码流包括至少两个帧,至少两个帧中存在至少一个第一类型帧和至少一个第二类型帧,第一类型帧中包含下混信号,第二类型帧中不包含下混信号;针对第N帧码流,N为大于1的正整数:解码器若确定第N帧码流为 第一类型帧,则对第N帧码流解码,得到第N帧下混信号;解码器若确定第N帧码流为第二类型帧,则根据预设第一规则,从第N帧下混信号之前的至少一帧下混信号中,确定m帧下混信号,并根据m帧下混信号,基于预定第一算法,得到第N帧下混信号,m为大于零的正整数;其中,第N帧下混信号是编码器由多声道中两个声道的第N帧音频信号基于预定第二算法混合后得到的。
由于解码器接收到的码流中包括第一类型帧和第二类型帧,其中第一类型帧中包括下混信号,第二类型帧中不包括下混信号,也就是说,在编码器并非对每帧下混信号都进行了编码,从而实现了下混信号的非连续传输,提高了多声道音频通信系统下混信号的压缩效率。
需要说明的是,在本发明实施例中,第一帧码流为第一类型帧,具体的,为了在解码第一帧码流后,将得到的下混信号还原为两声道中的音频信号,在第一帧码流中还需要包括立体声参数集合。具体的,由于第一类型帧中包含下混信号,第二类型帧中不包含下混信号,因此,第一类型帧的大小大于第二类型帧的大小,解码器可以通过根据第N帧码流的大小来判断第N帧码流为第一类型帧还是第二类型帧,此外,还可以在第N帧码流中封装标识位,解码器在对第N帧码流部分解码后得到标识位,若标识位指示第N帧码流为第一类型帧,则解码器对第N帧码流解码得到第N帧下混信号;若标识位指示第N帧码流为第二类型帧,则解码器根据预定第一算法得到第N帧下混信号。
在第二方面的基础上,为了将下混信号还原为两声道中的音频信号,保证音频信号的通信质量,可选的,第一类型帧中包含下混信号和立体声参数集合,第二类型帧中包含立体声参数集合且不包含下混信号:解码器若确定第N帧码流为第一类型帧,则对第N帧码流解码之后,在得到第N帧下混信号的同时,还得到第N帧立体声参数集合,并根据第N帧立体声参数集合中的至少一个立体声参数,基于预定第三算法,将第N帧下混信号还原为第N帧音频信号;解码器若确定第N帧码流为第二类型帧,则对第N帧码流解 码,得到第N帧立体声参数集合,以及基于预定第一算法,得到第N帧下混信号,然后解码器根据第N帧立体声参数集合中的至少一个立体声参数,基于预定第三算法,将第N帧下混信号还原为第N帧音频信号。
在第二方面的基础上,为了将下混信号还原为两声道中的音频信号,保证音频信号的通信质量,可选的,第一类型帧中包含下混信号和立体声参数集合,第二类型帧中不包含下混信号且不包含立体声参数集合;解码器若确定第N帧码流为第一类型帧,则对第N帧码流解码,在得到第N帧下混信号的同时,还得到第N帧立体声参数集合;然后,根据第N帧立体声参数集合中的至少一个立体声参数,基于第三算法,将第N帧下混信号还原为第N帧音频信号;解码器若确定第N帧码流为第二类型帧,则基于预定第一算法得到第N帧下混信号,以及根据预设第二规则,从第N帧立体声参数集合之前的至少一帧立体声参数集合中,确定k帧立体声参数集合,并根据k帧立体声参数集合,基于预定第四算法,得到第N帧立体声参数集合,然后,根据第N帧立体声参数集合中的至少一个立体声参数,基于第三算法,将第N帧下混信号还原为第N帧音频信号,k为大于零的正整数。
在第二方面的基础上,为了将下混信号还原为两声道中的音频信号,保证音频信号的通信质量,可选的,第一类型帧中包含下混信号和立体声参数集合,第三类型帧中包含立体声参数集合且不包含下混信号,第四类型帧中不包含下混信号且不包含立体声参数集合,第三类型帧和第四类型帧分别为第二类型帧的一种情况:
解码器若确定第N帧码流为第一类型帧,则对第N帧码流解码,在得到第N帧下混信号的同时,还得到第N帧立体声参数集合,并根据第N帧立体声参数集合中的至少一个立体声参数,基于第三算法,将第N帧下混信号还原为第N帧音频信号。
解码器若确定第N帧码流为第二类型帧,包括两种情况:
当第N帧码流为第三类型帧时,则对第N帧码流解码,得到第N帧立体声参数集合,以及基于预定第一算法得到第N帧下混信号,并根据第N帧立 体声参数集合中的至少一个立体声参数,基于第三算法,将第N帧下混信号还原为第N帧音频信号;
当第N帧码流为第四类型帧时,则根据预设第二规则,从第N帧立体声参数集合之前的至少一帧立体声参数集合中,确定k帧立体声参数集合,并根据k帧立体声参数集合,基于预定第四算法,得到第N帧立体声参数集合,k为大于零的正整数,以及基于预定第一算法得到第N帧下混信号,并根据第N帧立体声参数集合中的至少一个立体声参数,基于第三算法,将第N帧下混信号还原为第N帧音频信号。
在第二方面的基础上,为了将下混信号还原为两声道中的音频信号,保证音频信号的通信质量,可选的,第五类型帧中包含下混信号和立体声参数集合,第六类型帧中包含下混信号且不包含立体声参数集合,第五类型帧和第六类型帧分别为第一类型帧的一种情况,第二类型帧中不包含下混信号且不包含立体声参数集合:
解码器若确定第N帧码流为第一类型帧,包括两种情况:
当第N帧码流为第五类型帧时,则对第N帧码流解码,在得到第N帧下混信号的同时,还得到第N帧立体声参数集合,并根据第N帧立体声参数集合中的至少一个立体声参数,基于第三算法,将第N帧下混信号还原为第N帧音频信号;
当第N帧码流为第六类型帧时,则对第N帧码流解码,得到第N帧下混信号,以及根据预设第二规则,从第N帧立体声参数集合之前的至少一帧立体声参数集合中,确定k帧立体声参数集合,并根据k帧立体声参数集合,基于预定第四算法,得到第N帧立体声参数集合,并根据第N帧立体声参数集合中的至少一个立体声参数,基于第三算法,将第N帧下混信号还原为第N帧音频信号;
解码器若确定第N帧码流为第二类型帧,则基于预定第一算法得到第N帧下混信号,以及根据预设第二规则,从第N帧立体声参数集合之前的至少一帧立体声参数集合中,确定k帧立体声参数集合,并根据k帧立体声参数 集合,基于预定第四算法,得到第N帧立体声参数集合,并根据第N帧立体声参数集合中的至少一个立体声参数,基于第三算法,将第N帧下混信号还原为第N帧音频信号。
在第二方面的基础上,为了将下混信号还原为两声道中的音频信号,保证音频信号的通信质量,可选的,第五类型帧中包含下混信号和立体声参数集合,第六类型帧中包含下混信号且不包含立体声参数集合,第五类型帧和第六类型帧分别为第一类型帧的一种情况,第三类型帧中包含立体声参数集合且不包含下混信号,第四类型帧中不包含下混信号且不包含立体声参数集合,第三类型帧和第四类型帧分别为第二类型帧的一种情况:
解码器若确定第N帧码流为第一类型帧,包括两种情况:
当第N帧码流为第五类型帧时,则对第N帧码流解码之后,得到第N帧下混信号的同时,还得到第N帧立体声参数集合,并根据第N帧立体声参数集合中的至少一个立体声参数,基于第三算法,将第N帧下混信号还原为第N帧音频信号;
当第N帧码流为第六类型帧时,则对第N帧码流解码之后,得到第N帧下混信号,以及根据预设第二规则,从第N帧立体声参数集合之前的至少一帧立体声参数集合中,确定k帧立体声参数集合,并根据k帧立体声参数集合,基于预定第四算法,得到第N帧立体声参数集合,并根据第N帧立体声参数集合中的至少一个立体声参数,基于第三算法,将第N帧下混信号还原为第N帧音频信号;
解码器若确定第N帧码流为第二类型帧,包括两种情况:
当第N帧码流为第三类型帧时,则对第N帧码流解码,得到第N帧立体声参数集合,以及基于预定第一算法得到第N帧下混信号,并根据第N帧立体声参数集合中的至少一个立体声参数,基于第三算法,将第N帧下混信号还原为第N帧音频信号;
当第N帧码流为第四类型帧时,则根据预设第二规则,从第N帧立体声参数集合之前的至少一帧立体声参数集合中,确定k帧立体声参数集合,并 根据k帧立体声参数集合,基于预定第四算法,得到第N帧立体声参数集合,k为大于零的正整数,以及基于预定第一算法得到第N帧下混信号,并根据第N帧立体声参数集合中的至少一个立体声参数,基于第三算法,将第N帧下混信号还原为第N帧音频信号。
第三方面,提供了一种编码器,包括:信号检测单元和信号编码单元,其中,信号检测单元用于检测第N帧下混信号中是否包含语音信号,第N帧下混信号是由多声道中两个声道的第N帧音频信号基于预定第一算法混合后得到的,N为大于零的正整数;信号编码单元用于在信号检测单元检测到第N帧下混信号中包含语音信号时,对第N帧下混信号编码,以及在信号检测单元检测到第N帧下混信号中不包含语音信号时:若信号检测单元确定第N帧下混信号满足预设的音频帧编码条件,则对第N帧下混信号编码;若信号检测单元确定第N帧下混信号不满足预设的音频帧编码条件,则不对第N帧下混信号编码。
在第三方面的基础上,可选的,信号编码单元包括第一信号编码单元和第二信号编码单元,在信号检测单元检测到第N帧下混信号中包含语音信号时,信号检测单元通知第一信号编码单元对第N帧下混信号编码;若信号检测单元确定第N帧下混信号满足预设的语音帧编码条件,则通知第一信号编码单元对第N帧下混信号编码,具体的,第一信号编码单元根据预设的语音帧编码速率对第N帧下混信号编码;若信号检测单元确定第N帧下混信号不满足预设的语音帧编码条件、但满足预设的静音插入帧SID编码条件,则通知第二信号编码单元对第N帧下混信号编码,具体的,第二信号编码单元根据预设的SID编码速率对第N帧下混信号编码;其中,SID编码速率不大于语音帧编码速率。
在第三方面的基础上,可选的,还包括参数生成单元、参数编码单元和参数检测单元,其中,参数生成单元用于根据第N帧音频信号,得到第N帧立体声参数集合,第N帧立体声参数集合中包括Z个立体声参数,Z个立体声参数包括编码器基于预定第一算法对第N帧音频信号混合时所用到的参 数,Z为大于零的正整数;参数编码单元用于在信号检测单元检测到第N帧下混信号中包含语音信号时,则对第N帧立体声参数集合编码,以及在信号检测单元检测到第N帧下混信号中不包含语音信号时:若参数检测单元确定第N帧立体声参数集合满足预设的立体声参数编码条件,则对第N帧立体声参数集合中的至少一个立体声参数编码;若参数检测单元确定第N帧立体声参数集合不满足预设的立体声参数编码条件,则不对立体声参数集合编码。
在第三方面的基础上,可选的,参数编码单元用于根据第N帧立体声参数集合中的Z个立体声参数,按照预设的立体声参数降维规则,得到X个目标立体声参数,并对X个目标立体声参数编码,其中,X为大于零且小于等于Z的正整数。
在第三方面的基础上,可选的,参数生成单元包括第一参数生成单元和第二参数生成单元;
信号检测单元检测到第N帧音频信号包含语音信号时或者信号检测单元检测到第N帧音频信号不包含语音信号、且第N帧音频信号满足预设的语音帧编码条件,通知第一参数生成单元生成第N帧立体声参数集合,具体的,第一参数生成单元根据第N帧音频信号,基于第一立体声参数集合生成方式,得到第N帧立体声参数集合,并通过参数编码单元对第N帧立体声参数集合编码,具体的,当参数编码单元包括第一参数编码单元和第二参数编码单元时,通过第一参数编码单元对第N帧立体声参数集合编码;其中,第一参数编码单元规定的编码方式为第一编码方式,第二参数编码单元规定的编码方式为第二编码方式,具体的,第一编码方式规定的编码速率不小于第二编码方式规定的编码速率;和/或,针对第N帧立体声参数集合中的任一立体声参数,第一编码方式规定的量化精度不低于第二编码方式规定的量化精度;
以及在信号检测单元检测到第N帧音频信号不包含语音信号时:第二参数生成单元根据第N帧音频信号,基于第二立体声参数集合生成方式,得到第N帧立体声参数集合,并在参数检测单元确定第N帧立体声参数集合满足预设的立体声参数编码条件时,通过参数编码单元对第N帧立体声参数集合 中的至少一个立体声参数编码;具体的,当参数编码单元包括第一参数编码单元和第二参数编码单元时,通过第二参数编码单元对第N帧立体声参数集合中的至少一个立体声参数编码;
在参数检测单元确定第N帧立体声参数集合不满足预设的立体声参数编码条件时,不对立体声参数集合编码;
其中,第一立体声参数集合生成方式和第二立体声参数集合生成方式满足下列至少一个条件:
第一立体声参数集合生成方式规定的立体声参数集合中包括的立体声参数类型的个数不少于第二立体声参数集合生成方式规定的立体声参数集合中包括的立体声参数类型的个数,第一立体声参数集合生成方式规定的立体声参数集合中包括的立体声参数的个数不少于第二立体声参数集合生成方式规定的立体声参数集合中包括的立体声参数的个数,第一立体声参数集合生成方式规定的立体声参数在时域的分辨率不低于第二立体声参数集合生成方式规定的对应的立体声参数在时域的分辨率,第一立体声参数集合生成方式规定的立体声参数在频域的分辨率不低于第二立体声参数集合生成方式规定的对应的立体声参数在频域的分辨率。
在第三方面的基础上,可选的,参数编码单元包括第一参数编码单元和第二参数编码单元,具体的,第一参数编码单元用于在第N帧下混信号中包含语音信号以及在第N帧下混信号中不包含语音信号但满足语音帧编码条件时,根据第一编码方式对第N帧立体声参数集合编码;第二参数编码单元用于在第N帧下混信号不满足语音帧编码条件时,根据第二编码方式对第N帧立体声参数集合中的至少一个立体声参数编码;
其中,第一编码方式规定的编码速率不小于第二编码方式规定的编码速率;和/或,针对第N帧立体声参数集合中的任一立体声参数,第一编码方式规定的量化精度不低于第二编码方式规定的量化精度。
在第三方面的基础上,可选的,若第N帧立体声参数集合中的至少一个立体声参数包括:声道间电平差ILD;预设立体声参数编码条件中包括: DL≥D0
其中,DL表示ILD与第一标准的偏离程度,第一标准是根据第N帧立体声参数集合之前的T帧立体声参数集合,基于预定第二算法确定的,T为大于0的正整数;
若第N帧立体声参数集合中的至少一个立体声参数包括:声道间时间差ITD;预设立体声参数编码条件中包括:DT≥D1
其中,DT表示ITD与第二标准的偏离程度,第二标准是根据第N帧立体声参数集合之前的T帧立体声参数集合,基于预定第三算法确定的,T为大于0的正整数;
若第N帧立体声参数集合中的至少一个立体声参数包括:声道间相位差IPD;预设立体声参数编码条件中包括:Dp≥D2
其中,DP表示IPD与第三标准的偏离程度,第三标准是根据第N帧立体声参数集合之前的T帧立体声参数集合,基于预定第四算法确定的,T为大于0的正整数。
在第三方面的基础上,可选的,DL、DT、DP分别满足下列表达式:
Figure PCTCN2016100617-appb-000007
Figure PCTCN2016100617-appb-000008
Figure PCTCN2016100617-appb-000009
其中,ILD(m)为两声道分别在第m个子频带传输第N帧音频信号时的电平差值,M为传输第N帧音频信号所占用的子频带的总个数,
Figure PCTCN2016100617-appb-000010
为在第N帧之前的T帧立体声参数集合中在第m个子频带的ILD的平均值,T为大于0的正整数,ILD[-t](m)为两声道分别在第m个子频带传输第N帧音频信号之前的第t帧音频信号时的电平差值,ITD为两声道分别传输第N帧 音频信号时的时间差值,
Figure PCTCN2016100617-appb-000011
为在第N帧之前的T帧立体声参数集合中的ITD的平均值,ITD[-t]为两声道分别传输第N帧音频信号之前的第t帧音频信号时的时间差值,IPD(m)为两声道分别在第m个子频带传输第N帧音频信号中的部分音频信号时的相位差值,
Figure PCTCN2016100617-appb-000012
为在第N帧之前的T帧立体声参数集合中在第m个子频带的IPD的平均值,IPD[-t](m)为两声道分别在第m个子频带传输第N帧音频信号之前的第t帧音频信号时的相位差值。
第四方面,提供了一种解码器,包括:接收单元和解码单元,其中,接收单元用于接收到码流,码流包括至少两个帧,至少两个帧中存在至少一个第一类型帧和至少一个第二类型帧,第一类型帧中包含下混信号,第二类型帧中不包含下混信号;针对第N帧码流,N为大于1的正整数,解码单元,用于:若确定第N帧码流为第一类型帧,则对第N帧码流解码,得到第N帧下混信号;若确定第N帧码流为第二类型帧,则根据预设第一规则,从第N帧下混信号之前的至少一帧下混信号中,确定m帧下混信号,并根据m帧下混信号,基于预定第一算法,得到第N帧下混信号,m为大于零的正整数;
其中,第N帧下混信号是编码器由多声道中两个声道的第N帧音频信号基于预定第二算法混合后得到的。
在第四方面的基础上,可选的,第一类型帧中包含下混信号和立体声参数集合,第二类型帧中包含立体声参数集合且不包含下混信号:
解码单元还用于若确定第N帧码流为第一类型帧,则对第N帧码流解码,在得到第N帧下混信号的同时,还得到第N帧立体声参数集合;若确定第N帧码流为第二类型帧,则对第N帧码流解码,得到第N帧立体声参数集合,第N帧立体声参数集合中的至少一个立体声参数用于解码器基于预定第三算法将第N帧下混信号还原为第N帧音频信号;
信号还原单元,用于根据第N帧立体声参数集合中的至少一个立体声参数,基于第三算法,将第N帧下混信号还原为第N帧音频信号。
在第四方面的基础上,可选的,第一类型帧中包含下混信号和立体声参数集合,第二类型帧中不包含下混信号且不包含立体声参数集合;
解码单元还用于若确定第N帧码流为第一类型帧,则对第N帧码流解码,在得到第N帧下混信号的同时,还得到第N帧立体声参数集合;若确定第N帧码流为第二类型帧,则根据预设第二规则,从第N帧立体声参数集合之前的至少一帧立体声参数集合中,确定k帧立体声参数集合,并根据k帧立体声参数集合,基于预定第四算法,得到第N帧立体声参数集合,k为大于零的正整数;
其中,第N帧立体声参数集合中的至少一个立体声参数用于解码器基于预定第三算法将第N帧下混信号还原为第N帧音频信号;
信号还原单元,用于根据第N帧立体声参数集合中的至少一个立体声参数,基于第三算法,将第N帧下混信号还原为第N帧音频信号。
在第四方面的基础上,可选的,第一类型帧中包含下混信号和立体声参数集合,第三类型帧中包含立体声参数集合且不包含下混信号,第四类型帧中不包含下混信号且不包含立体声参数集合,第三类型帧和第四类型帧分别为第二类型帧的一种情况:
解码单元还用于若确定第N帧码流为第一类型帧,则对第N帧码流解码,在得到第N帧下混信号的同时,还得到第N帧立体声参数集合;若确定第N帧码流为第二类型帧:当第N帧码流为第三类型帧时,则对第N帧码流解码,得到第N帧立体声参数集合;当第N帧码流为第四类型帧时,则根据预设第二规则,从第N帧立体声参数集合之前的至少一帧立体声参数集合中,确定k帧立体声参数集合,并根据k帧立体声参数集合,基于预定第四算法,得到第N帧立体声参数集合,k为大于零的正整数;
其中,第N帧立体声参数集合中的至少一个立体声参数用于解码器基于预定第三算法将第N帧下混信号还原为第N帧音频信号;
信号还原单元,用于根据第N帧立体声参数集合中的至少一个立体声参数,基于第三算法,将第N帧下混信号还原为第N帧音频信号。
在第四方面的基础上,可选的,第五类型帧中包含下混信号和立体声参数集合,第六类型帧中包含下混信号且不包含立体声参数集合,第五类型帧和第六类型帧分别为第一类型帧的一种情况,第二类型帧中不包含下混信号且不包含立体声参数集合:
解码单元还用于若确定第N帧码流为第一类型帧:当第N帧码流为第五类型帧时,对第N帧码流解码,在得到第N帧下混信号的同时,还得到第N帧立体声参数集合;当第N帧码流为第六类型帧时,则根据预设第二规则,从第N帧立体声参数集合之前的至少一帧立体声参数集合中,确定k帧立体声参数集合,并根据k帧立体声参数集合,基于预定第四算法,得到第N帧立体声参数集合;若确定第N帧码流为第二类型帧,则根据预设第二规则,从第N帧立体声参数集合之前的至少一帧立体声参数集合中,确定k帧立体声参数集合,并根据k帧立体声参数集合,基于预定第四算法,得到第N帧立体声参数集合;
其中,第N帧立体声参数集合中的至少一个立体声参数用于解码器基于预定第三算法将第N帧下混信号还原为第N帧音频信号,k为大于零的正整数;
信号还原单元,用于根据第N帧立体声参数集合中的至少一个立体声参数,基于第三算法,将第N帧下混信号还原为第N帧音频信号。
在第四方面的基础上,可选的,第五类型帧中包含下混信号和立体声参数集合,第六类型帧中包含下混信号且不包含立体声参数集合,第五类型帧和第六类型帧分别为第一类型帧的一种情况,第三类型帧中包含立体声参数集合且不包含下混信号,第四类型帧中不包含下混信号且不包含立体声参数集合,第三类型帧和第四类型帧分别为第二类型帧的一种情况:
解码单元还用于若确定第N帧码流为第一类型帧:当第N帧码流为第五类型帧时,对第N帧码流解码,在得到第N帧下混信号的同时,还得到第N帧立体声参数集合;当第N帧码流为第六类型帧时,则根据预设第二规则,从第N帧立体声参数集合之前的至少一帧立体声参数集合中,确定k帧立体 声参数集合,并根据k帧立体声参数集合,基于预定第四算法,得到第N帧立体声参数集合。
解码单元还用于若确定第N帧码流为第二类型帧:当第N帧码流为第三类型帧时,对第N帧码流解码,得到第N帧立体声参数集合;当第N帧码流为第四类型帧时,则根据预设第二规则,从第N帧立体声参数集合之前的至少一帧立体声参数集合中,确定k帧立体声参数集合,并根据k帧立体声参数集合,基于预定第四算法,得到第N帧立体声参数集合;
其中,第N帧立体声参数集合中的至少一个立体声参数用于解码器基于预定第三算法将第N帧下混信号还原为第N帧音频信号,k为大于零的正整数;
解码器还包括,信号还原单元;
信号还原单元,用于根据第N帧立体声参数集合中的至少一个立体声参数,基于第三算法,将第N帧下混信号还原为第N帧音频信号。
第五方面,提供了一种编解码系统,包括第三方面提供的任一的编码器,和第四方面提供的任一的解码器。
第六方面,本发明实施例还提供一种终端设备,该终端设备包括处理器和存储器,所述存储器用于存储软件程序,所述处理器用于读取所述存储器中存储的软件程序并实现第一方面或上述第一方面的任意一种实现方式提供的方法。
第七方面,本发明实施例中还提供一种计算机存储介质,该存储介质可以是非易失性的,即断电后内容不丢失。该存储介质中存储软件程序,该软件程序在被一个或多个处理器读取并执行时可实现第一方面或上述第一方面的任意一种实现方式提供的方法。
附图说明
图1为本发明实施例一多声道音频信号处理的方法的流程示意图;
图2为本发明实施例二多声道音频信号处理的方法的流程示意图;
图3a~图3d为本发明实施例编码器的示意图;
图4为本发明实施例解码器的示意图;
图5为本发明实施例编解码系统的示意图。
具体实施方式
为了使本发明的目的、技术方案和优点更加清楚,下面将结合附图对本发明作进一步地详细描述。
应理解，在音频编解码技术中，是以帧为单位对音频信号编码或解码的，具体的，第N帧音频信号即为第N个音频帧，当在第N帧音频信号中包括语音信号时，第N个音频帧即为语音帧，当第N帧音频帧中不包含语音信号、而包括背景噪声信号时，第N个音频帧即为噪声帧，在这里，N为大于零的正整数。
此外,在单声道通信系统中,采用非连续编码方式时,每隔若干个噪声帧编码一次,得到静音插入帧(Silence Insertion Descriptor,SID)。
本发明实施例中的编码器和解码器为处理多声道音频信号的程序包，可以安装在支持多声道音频信号处理的终端（如手机、笔记本电脑、平板电脑等）、服务器等设备上，使得终端、服务器等设备具备本发明实施例处理多声道音频信号的功能。
在本发明实施例中,由于多声道通信系统中能够采用非连续编码的机制对音频信号进行编码,大大提高了对音频信号的压缩效率。
下面以第N帧下混信号为例,对本发明实施例处理多声道音频信号的方法进行详细说明,其中,N为大于零的正整数。假设第N帧下混信号是由多声道中的两声道的第N帧音频信号混合后得到的。
当多声道为两声道时，其中，两声道分别为第一声道和第二声道，则多声道中的两声道为第一声道和第二声道，第N帧下混信号是由第一声道的第N帧音频信号和第二声道的第N帧音频信号混合得到的；当多声道为三声道或三声道以上时，下混信号是由多声道中配对的两声道的音频信号混合得到的，具体的，以三声道为例，包括第一声道、第二声道和第三声道，假设根据设定的规则，只有第一声道与第二声道配对，则多声道中的两声道为第一声道和第二声道，由第一声道中的第N帧音频信号和第二声道中的第N帧音频信号下混后，得到第N帧下混信号；假设在三声道中，第一声道和第二声道配对、第二声道和第三声道配对，则多声道中的两声道可以为第一声道和第二声道，也可以为第二声道和第三声道。
如图1所示,本发明实施例一处理多声道音频信号的方法,包括:
步骤100,编码器根据多声道中两声道的第N帧音频信号,生成第N帧立体声参数集合,其中,立体声参数集合中包括Z个立体声参数。
具体的,Z个立体声参数包括编码器基于预定第一算法对第N帧音频信号混合时所用到的参数,Z为大于零的正整数。应理解,预定第一算法为预先在编码器中设置的下混信号生成算法。
需要说明的是,具体的第N帧立体声参数集合中包括哪些立体声参数,是由预设的立体声参数生成算法决定的,假设两声道中一个声道为左声道,一个为右声道,预设的立体声参数生成算法如下,则根据第N帧音频信号得到的立体声参数为声道间电平差(Inter-channel Level Difference,ILD):
Figure PCTCN2016100617-appb-000013
Figure PCTCN2016100617-appb-000014
Figure PCTCN2016100617-appb-000015
Figure PCTCN2016100617-appb-000016
Figure PCTCN2016100617-appb-000017
其中,L(i)为左声道第N帧音频信号在第i个频点的离散傅里叶变换(Discrete Fourier Transform,DFT)系数,R(i)为右声道第N帧音频信号在第i个频点的DFT系数,ReL(i)为L(i)的实部,ImL(i)为L(i)的虚部,ReR(i)为R(i)的实部,ImR(i)为R(i)的虚部,PL(i)为左声道第N帧音频信号在第i个频点的能量谱,PR(i)为右声道第N帧音频信号在第i个频点的能量谱,EL(m)为左声道第m个子频带中的第N帧音频信号的能量,ER(m)为右声道第m个子频带中的第N帧音频信号的能量,传输第N帧音频信号的子频带的总个数为M。
在上述立体声参数生成算法中，不考虑第N帧音频信号在频点i=0和（Figure PCTCN2016100617-appb-000018）所示频点时，分别为直流分量和奈奎斯特分量的情况。
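为帮助理解上述按子频带计算ILD的过程，下面给出一个示意性的 Python 草图（仅为基于上文变量定义的假设性示例，并非本发明限定的实现；其中将 ILD(m) 取为左右声道子带能量之比的对数这一常见形式，以及子带划分方式 band_edges，均为便于说明而作的假设）。

```python
import numpy as np

def compute_ild_per_subband(left, right, band_edges):
    """按子频带计算左右声道的电平差 ILD(m)。

    left, right: 两声道第N帧的时域采样(一维数组)
    band_edges:  长度为 M+1 的频点索引序列, 第m个子频带覆盖
                 [band_edges[m], band_edges[m+1]) 之间的频点
    返回: 长度为 M 的 ILD 数组(单位 dB, 此处取对数形式属假设)
    """
    L = np.fft.rfft(left)      # 左声道 DFT 系数 L(i)
    R = np.fft.rfft(right)     # 右声道 DFT 系数 R(i)

    # 能量谱 P_L(i) = Re^2 + Im^2, P_R(i) 同理
    P_L = L.real ** 2 + L.imag ** 2
    P_R = R.real ** 2 + R.imag ** 2

    eps = 1e-12                # 防止除零
    ild = np.zeros(len(band_edges) - 1)
    for m in range(len(band_edges) - 1):
        lo, hi = band_edges[m], band_edges[m + 1]
        E_L = np.sum(P_L[lo:hi])   # 左声道第m个子频带能量 E_L(m)
        E_R = np.sum(P_R[lo:hi])   # 右声道第m个子频带能量 E_R(m)
        ild[m] = 10.0 * np.log10((E_L + eps) / (E_R + eps))
    return ild
```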
当预设的立体声参数生成算法中,还包括计算其它立体声参数如声道间时间差(Inter-channel Time Difference,ITD)、声道间相位差(Inter-channel Phase Difference,IPD)、IC(Inter-channel Coherence,声道间相干性)的立体声参数的算法时,则编码器还能够根据音频信号,基于预设的立体声参数生成算法得到ITD、IPD、IC等立体声参数。
应理解,第N帧立体声参数集合中包括至少一个立体声参数,例如根据两个声道的第N帧音频信号,基于预设的立体声参数生成算法,得到IPD、ITD、ILD和IC,则由IPD、ITD、ILD和IC组成第N帧立体声参数集合。
步骤101,编码器根据第N帧立体声参数集合中的至少一个立体声参数,基于预定第一算法,将两声道的第N帧音频信号混合为第N帧下混信号。
例如，第N帧立体声参数集合中包括ITD、ILD、IPD和IC，根据ILD和IPD，基于预定第一算法，得到第N帧下混信号，具体的，第N帧下混信号DMX(k)在第k个频点满足下列表达式：
Figure PCTCN2016100617-appb-000019
其中，DMX(k)为第N帧下混信号在第k个频点的值，|L(k)|表示第K对声道中左声道的第N帧音频信号在第k个频点的幅度，|R(k)|表示第K对声道中右声道的第N帧音频信号在第k个频点的幅度，∠L(k)表示左声道中第N帧音频信号在第k个频点的相角，ILD(k)表示第N帧音频信号在第k个频点的ILD，IPD(k)表示第N帧音频信号在第k个频点的IPD。
需要说明的是，本发明实施例不限于上述得到下混信号的算法，其它得到下混信号的算法同样适用。
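作为说明，下面给出一个示意性的频域下混 Python 草图。由于上式的具体形式在原文中以公式图像给出，这里采用"将右声道按 IPD 对齐后与左声道取平均"这一常见做法作为假设性示例，并非本发明限定的下混算法。

```python
import numpy as np

def downmix_frame(L, R, ipd):
    """把两声道的频域系数混合为单声道下混信号 DMX(k)。

    L, R: 左右声道第N帧的 DFT 系数(复数数组, 长度相同)
    ipd:  每个频点的声道间相位差 IPD(k)(弧度), 这里假设已按
          子频带展开到各频点
    返回: 下混信号的 DFT 系数 DMX(k)
    """
    # 先将右声道按 IPD 旋转, 使两声道相位对齐(常见做法, 属假设)
    R_aligned = R * np.exp(1j * ipd)
    # 再取平均得到单声道下混
    return 0.5 * (L + R_aligned)
```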
在本发明实施例一中,对第N帧立体声参数集合编码,是为了使得解码器能够还原第N帧下混信号,可选的,为提高编码的压缩效率,编码器对第N帧立体声参数集合中用于得到第N帧下混信号的立体声参数编码。例如,生成的第N帧立体声参数集合中包括ITD、ILD、IPD和IC,然而,若编码器只根据第N帧立体声参数集合中的ILD和IPD,基于预定第一算法将两声道中的第N帧音频信号混合为第N帧下混信号,则为提高压缩效率,则编码器可以只对第N帧立体声参数集合中的ILD和IPD编码。
步骤102,编码器检测第N帧下混信号中是否包含语音信号,若是,则执行步骤103,否则执行步骤104。
为便于实现编码器检测第N帧下混信号中是否包含语音信号,可选的,编码器通过语音活动检测(Voice Activity Detection,VAD)直接检测第N帧下混信号中是否包含语音信号。
可选的,一种编码器检测第N帧下混信号中是否包含语音信号的间接方法,编码器通过VAD直接检测第N帧音频信号中是否包含语音信号。具体的,编码器当检测到两声道中的一个声道的音频信号包含语音信号,则确定由两声道中的音频信号混合得到的下混信号中包含语音信号,编码器当确定两声道中的音频信号都不包括语音信号时,才确定由两声道中的音频信号混合得到的下混信号中包含语音信号。需要说明的是,在这种间接检测方式下,不限定步骤102与步骤100、步骤101之间的顺序,只要步骤100在步骤101之前即可。
步骤103,编码器对第N帧下混信号编码,执行步骤107。
其中,编码器对第N帧下混信号编码得到的是第N帧码流。
由于在本发明实施例一中对下混信号是非连续编码，码流包括两种帧类型：第一类型帧和第二类型帧，其中第一类型帧中包括下混信号，第二类型帧中不包括下混信号，通过步骤103得到的第N帧码流为第一类型帧。
在步骤103中,由于第N帧下混信号中包含语音信号,可选的,编码器根据预设的语音帧编码速率对第N帧下混信号编码,较佳的,预设的语音帧编码速率可以设置为13.2kbps。
此外,可选的,编码器若对第N帧下混信号编码,则对第N帧立体声参数集合编码。
步骤104,编码器判断第N帧下混信号是否满足预设的音频帧编码条件,若是,则执行步骤105,否则,执行步骤106。
其中,预设的音频帧编码条件是预先配置在编码器中的是否对第N帧下混信号进行编码的判断条件。
需要说明的是,针对第一帧下混信号,若第一帧下混信号中不包含语音信号时,第一帧下混信号满足预设的音频帧编码条件,即无论第一帧下混信号中是否包含语音信号都要对第一帧下混信号编码。
步骤105,编码器对第N帧下混信号编码,执行步骤107。
具体的,通过步骤105得到的第N帧码流也是第一类型帧。
需要说明的是,可选的,编码器若对第N帧下混信号编码,则对第N帧立体声参数集合编码。
可选的,为了便于简化对下混信号编码的实现方式,在本发明实施例一中步骤103与步骤105对第N帧下混信号的编码方式相同。
可选的,由于步骤105中第N帧下混信号中不包含语音信号,当第N帧下混信号满足预设的语音帧编码条件时,编码器根据预设的语音帧编码速率对第N帧下混信号编码;当第N帧下混信号不满足预设的语音帧编码条件、但满足预设的SID编码条件时,编码器根据预设的SID编码速率对第N帧下混信号 编码,其中,预设的SID编码速率可以设置为2.8kbps。
需要说明的是,当第N帧下混信号不满足预设的语音帧编码条件、但满足预设的SID编码条件时,编码器根据SID编码方式,对第N帧下混信号编码,其中,SID编码方式规定了编码速率为预设的SID编码速率,以及规定了编码使用的算法以及编码使用的参数。
其中，预设的语音帧编码条件可以为：第N帧下混信号距离第M帧下混信号的时长不大于预设时长，其中第M帧下混信号包含语音信号，且是距离第N帧下混信号最近的一帧包含语音信号的下混信号。预设的SID编码条件可以为奇数帧编码，即第N帧下混信号中的N为奇数时，编码器确定第N帧下混信号满足预设的SID编码条件。
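结合上述示例条件，下面用一个示意性的 Python 草图说明该判决流程（其中的阈值、函数名与变量名均为便于说明的假设，并非本发明限定的实现）。

```python
def choose_encoding_mode(n, contains_speech, last_speech_frame, hangover_frames=8):
    """返回对第N帧下混信号采用的编码方式: 'speech' / 'sid' / 'no_data'。

    n:                 当前帧号 N
    contains_speech:   VAD 判定第N帧下混信号是否包含语音
    last_speech_frame: 最近一帧包含语音信号的帧号 M
    hangover_frames:   预设时长(以帧数计, 示例取8帧, 属假设)
    """
    if contains_speech:
        return 'speech'                      # 按语音帧编码速率编码
    # 语音帧编码条件: 距最近包含语音的一帧不超过预设时长
    if n - last_speech_frame <= hangover_frames:
        return 'speech'
    # SID 编码条件: 示例中取"奇数帧编码"
    if n % 2 == 1:
        return 'sid'                         # 按 SID 编码速率编码
    return 'no_data'                         # 不对下混信号编码
```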
步骤106,编码器不对第N帧下混信号编码,执行步骤109。
具体的,通过步骤106得到的第N帧码流为第二类型帧。
编码器确定第N帧下混信号不满足预设的音频帧编码条件,具体的,编码器确定第N帧下混信号不满足预设的语音帧编码条件,且不满足预设的SID编码条件。
在本发明实施例中,编码器不对第N帧下混信号编码,具体的,第N帧的码流中不包括第N帧下混信号。
编码器不对第N帧下混信号编码时,可以对第N帧立体声参数集合编码,也可以不对第N帧立体声参数集合编码。
在本发明实施例一中,以编码器当不对第N帧下混信号编码时,对第N帧立体声参数集合编码为例进行说明,但可选的,编码器当不对第N帧下混信号编码时,也可以不对第N帧立体声参数集合编码,具体的编码器对第N帧立体声参数和第N帧下混信号都不编码时,解码器得到第N帧下混信号和第N帧立体声参数集合的方式参考本发明实施例二。
步骤107,编码器向解码器发送第N帧码流。
其中,为了能够使解码器能够在解码得到第N帧下混信号后,将第N帧下混信号还原为两声道第N帧音频信号,第N帧码流中不仅包括第N帧立体声参 数集合还包括第N帧下混信号。
步骤108,解码器确定第N帧码流为第一类型帧,则对第N帧码流解码,得到第N帧下混信号和第N帧立体声参数集合,执行步骤111。
需要说明的是，由于第一类型帧中包含下混信号，第二类型帧中不包含下混信号，因此，第一类型帧的大小大于第二类型帧的大小，解码器可以根据第N帧码流的大小来判断第N帧码流为第一类型帧还是第二类型帧，此外，可选的，还可以在第N帧码流中封装标识位，解码器在对第N帧码流部分解码后得到标识位，根据标识位判断第N帧码流为第一类型帧还是第二类型帧，例如标识位为1指示第N帧码流为第一类型帧，标识位为0指示第N帧码流为第二类型帧。
此外,可选的,解码器根据第N帧码流对应的速率,确定解码方式,例如第N帧码流的速率为17.4kbps,其中,下混信号对应的码流的速率为13.2kbps,立体声参数集合对应的码流速率为4.2kbps,则按照与13.2kbps对应的解码方式对下混信号对应的码流解码,以及按照与4.2kbps对应的解码方式对立体声参数集合对应的码流解码。
或者,解码器根据第N帧码流中的编码方式标识位,确定第N帧码流的编码方式,然后根据与编码方式对应的解码方式,对第N帧码流解码。
步骤109,编码器向解码器发送第N帧码流,第N帧码流中包括第N帧立体声参数集合。
步骤110,解码器确定第N帧码流为第二类型帧,则对第N帧码流解码,得到第N帧立体声参数集合,以及根据预设第一规则,从第N帧下混信号之前的至少一帧下混信号中,确定m帧下混信号,并根据m帧下混信号,基于预定第一算法,得到第N帧下混信号,其中,m为大于零的正整数。
具体的,取第(N-3)帧、第(N-2)帧和第(N-1)帧下混信号的平均值,作为第N帧下混信号,或者,将第(N-1)帧下混信号直接作为第N帧下混信号,或者根据其它算法估计第N帧下混信号。
此外,还可以直接将第(N-1)帧下混信号作为第N帧下混信号;或者, 根据第(N-1)帧下混信号和一个预设的偏差值,基于预设的算法进行运算得到第N帧下混信号。
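以下为对应上述几种估计方式的一个示意性 Python 草图（仅作说明用的假设性实现，函数名与参数均为假设）。

```python
import numpy as np

def estimate_downmix(prev_downmixes, mode='average', bias=0.0):
    """根据第N帧之前已得到的下混信号估计第N帧下混信号。

    prev_downmixes: 按时间顺序排列的历史下混信号列表,
                    prev_downmixes[-1] 为第(N-1)帧
    mode:           'average' 取最近3帧平均; 'hold' 直接沿用第(N-1)帧;
                    'bias'    在第(N-1)帧基础上加一个预设偏差值
    """
    if mode == 'average':
        m = min(3, len(prev_downmixes))
        return np.mean(prev_downmixes[-m:], axis=0)
    if mode == 'hold':
        return np.array(prev_downmixes[-1])
    if mode == 'bias':
        return np.array(prev_downmixes[-1]) + bias
    raise ValueError('unknown mode')
```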
步骤111,解码器根据第N帧立体声参数集合的目标立体声参数,基于预定第二算法,将第N帧下混信号还原为两声道的第N帧音频信号。
应理解,目标立体声参数为第N帧立体声参数集合中的至少一立体声参数。
具体的,解码器将第N帧下混信号还原为两声道的第N帧音频信号的过程为编码器将两声道的第N帧音频信号混合为第N帧下混信号的逆过程,假设编码器端根据第N帧立体声参数集合中的IPD和ILD得到的第N帧下混信号,则在解码器则根据第N帧立体声参数集合中的IPD和ILD,将第N帧下混信号还原为第K对声道中各个声道的第N帧信号。此外,需要说明的是,解码器中预设的还原下混信号的算法可以为编码器中生成下混信号的算法的逆算法,也可以是独立于编码器中生成下混信号的算法的算法。
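作为说明，下面给出一个基于 ILD 与 IPD 的示意性上混（还原）Python 草图；其中增益与相位的分配方式采用参数立体声中常见的形式，属于假设性示例，并非本发明限定的还原算法。

```python
import numpy as np

def upmix_frame(dmx, ild_db, ipd):
    """根据 ILD(k)(dB) 和 IPD(k) 把下混信号 DMX(k) 还原为两声道。

    dmx:    下混信号的 DFT 系数
    ild_db: 每个频点的声道间电平差(dB)
    ipd:    每个频点的声道间相位差(弧度)
    返回: (L, R) 两声道的 DFT 系数
    """
    c = 10.0 ** (ild_db / 20.0)          # 左右声道幅度比
    g_l = c / np.sqrt(1.0 + c ** 2)      # 左声道增益
    g_r = 1.0 / np.sqrt(1.0 + c ** 2)    # 右声道增益

    # 将相位差对称分配到两声道(常见做法, 属假设)
    L = np.sqrt(2.0) * g_l * dmx * np.exp(+1j * ipd / 2.0)
    R = np.sqrt(2.0) * g_r * dmx * np.exp(-1j * ipd / 2.0)
    return L, R
```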
此外,为了提高多声道通信系统编码的压缩效率,编码器在实现对下混信号非连续编码的同时,也可实现对立体声参数集合的非连续编码,下面以第N帧下混信号为例,如图2所示,本发明实施例二多声道音频信号处理的方法,包括:
步骤200,编码器根据多声道中两声道的第N帧音频信号,生成第N帧立体声参数集合,其中,立体声参数集合中包括Z个立体声参数。
具体的,Z个立体声参数包括编码器基于预定第一算法对第N帧音频信号混合时所用到的参数,Z为大于零的正整数。应理解,预定第一算法为预先设置在编码器中的下混信号生成算法。
需要说明的是,第N帧立体声参数集合中包括哪些立体声参数,是由预设的立体声参数生成算法决定的,假设两声道中一个声道为左声道,一个为右声道,预设的立体声参数生成算法如下,则根据第N帧音频信号得到的立体声参数为ITD:
Figure PCTCN2016100617-appb-000020
Figure PCTCN2016100617-appb-000021
其中，0≤i≤Tmax，N为帧长，l(j)表示左声道在j时刻的时域信号帧，r(j)表示右声道在j时刻的时域信号帧。若满足（Figure PCTCN2016100617-appb-000022）所示条件，则ITD为（Figure PCTCN2016100617-appb-000023）对应的索引值的相反数；否则ITD为（Figure PCTCN2016100617-appb-000024）对应的索引值的相反数。在本发明实施例中，其它得到ITD的算法同样适用。
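上述公式在原文中以图像形式给出，其思想与常见的时域互相关搜索一致；下面给出一个示意性的 Python 草图（假设性示例，符号约定与归一化方式并非本发明限定）。

```python
import numpy as np

def estimate_itd(left, right, t_max):
    """通过时域互相关估计声道间时间差 ITD(以采样点数计)。

    left, right: 两声道第N帧的时域采样
    t_max:       允许的最大时延搜索范围 Tmax
    返回: ITD, 正负号约定为本示例的假设
    """
    n = len(left)
    # 右声道相对左声道延迟 i 个采样点时的互相关, 0 <= i <= Tmax
    c_pos = np.array([np.dot(left[i:n], right[0:n - i]) for i in range(t_max + 1)])
    # 左声道相对右声道延迟 i 个采样点时的互相关
    c_neg = np.array([np.dot(right[i:n], left[0:n - i]) for i in range(t_max + 1)])

    i_pos, i_neg = int(np.argmax(c_pos)), int(np.argmax(c_neg))
    # 取相关峰值较大的一侧, ITD 取对应索引值的相反数(符号约定属假设)
    if c_pos[i_pos] >= c_neg[i_neg]:
        return -i_pos
    return i_neg
```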
若预设的立体声参数生成算法中还包括如下生成IPD的算法,则按照下述算法还可得到IPD。具体的,第b个子频带的IPD满足下列表达式:
Figure PCTCN2016100617-appb-000025
其中,B为音频信号在频域所占用的子频带的总个数,L(k)为左声道中第N帧音频信号在第k个频点的信号,R*(k)为右声道第N帧音频信号在第k个频点的信号的共轭。
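按上文对 L(k) 与 R*(k) 的定义，下面给出一个示意性的 Python 草图；其中"子频带内求和后取辐角"这一形式为常见做法，具体表达式以原文公式图像为准。

```python
import numpy as np

def compute_ipd_per_subband(L, R, band_edges):
    """按子频带计算声道间相位差 IPD(b)。

    L, R:       左右声道第N帧的 DFT 系数
    band_edges: 长度为 B+1 的频点索引序列
    返回: 长度为 B 的 IPD 数组(弧度)
    """
    ipd = np.zeros(len(band_edges) - 1)
    for b in range(len(band_edges) - 1):
        lo, hi = band_edges[b], band_edges[b + 1]
        # 子频带内 L(k)·R*(k) 求和后取辐角
        cross = np.sum(L[lo:hi] * np.conj(R[lo:hi]))
        ipd[b] = np.angle(cross)
    return ipd
```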
此外,当预设的立体声参数生成算法中还包括本发明实施例一中的生成ILD的算法时,则还可以得到ILD。
步骤201,编码器根据第N帧立体声参数集合中的至少一个立体声参数,基于预定算法,将两声道的第N帧音频信号混合为第N帧下混信号。
具体的，预定第一算法可以参见本发明实施例一中得到第N帧下混信号的方法，但不限于本发明实施例一中得到第N帧下混信号的方法。
步骤202,编码器检测第N帧下混信号中是否包含语音信号,若是,则执行步骤203,否则执行步骤204。
其中,本发明实施例二中,编码器检测第N帧下混信号中是否包含语音信号的具体实现方式,可参见本发明实施例一中编码器检测第N帧下混信号中是否包含语音信号的方式。
步骤203,编码器根据预设的语音帧编码速率对第N帧下混信号编码,以 及对第N帧立体声参数集合编码,执行步骤211。
具体的,当编码器中包括两种对立体声参数集合编码的方式时,第一编码方式和第二编码方式,其中,第一编码方式规定的编码速率不小于第二编码方式规定的编码速率;和/或,针对第N帧立体声参数集合中的任一立体声参数,第一编码方式规定的量化精度不低于第二编码方式规定的量化精度,在步骤203中,编码器按照第一编码方式,对第N帧立体声参数集合编码。
例如,第N帧立体声参数集合中包括IPD和ITD,第一编码方式中规定的IPD的量化精度不低于第二编码方式中规定的IPD的量化精度,第一编码方式中规定的ITD的量化精度不低于第二编码方式中规定的ITD的量化精度。
较佳的,语音帧编码速率可以设置为13.2kbps。
步骤204，编码器判断第N帧下混信号是否满足预设的语音帧编码条件，若是，则执行步骤205，否则，执行步骤206。
步骤205,编码器根据预设的语音帧编码速率对第N帧下混信号编码,以及对第N帧立体声参数集合编码,执行步骤211。
具体的,当编码器中包括两种对立体声参数集合编码的方式时,第一编码方式和第二编码方式,其中,第一编码方式规定的编码速率不小于第二编码方式规定的编码速率;和/或,针对第N帧立体声参数集合中的任一立体声参数,第一编码方式规定的量化精度不低于第二编码方式规定的量化精度,在步骤205中,编码器按照第一编码方式,对第N帧立体声参数集合编码。
步骤206,编码器判断第N帧下混信号是否满足预设的SID编码条件,以及判断第N帧立体声参数集合是否满足预设的立体声参数编码条件,若同时满足,则执行步骤207,若第N帧下混信号满足预设的SID编码条件,第N帧立体声参数集合不满足预设的立体声参数编码条件,则执行步骤208,若第N帧下混信号不满足预设的SID编码条件,第N帧立体声参数集合满足预设的立体声参数编码条件,则执行步骤209,若同时不满足,则执行步骤210。
具体的,当编码器在对第N帧立体声参数集合中的至少一个立体声参数编码之前,判断至少一个立体声参数中的立体声参数是否满足预设对应的立 体声参数编码条件,具体的,若第N帧立体声参数集合中的至少一个立体声参数包括:声道间电平差ILD;预设立体声参数编码条件中包括:DL≥D0;其中,DL表示ILD与第一标准的偏离程度,第一标准是根据第N帧立体声参数集合之前的T帧立体声参数集合,基于预定第三算法确定的,T为大于0的正整数;
若第N帧立体声参数集合中的至少一个立体声参数包括:声道间时间差ITD;预设立体声参数编码条件中包括:DT≥D1
其中,DT表示ITD与第二标准的偏离程度,第二标准是根据第N帧立体声参数集合之前的T帧立体声参数集合,基于预定第四算法确定的,T为大于0的正整数;
若第N帧立体声参数集合中的至少一个立体声参数包括:声道间相位差IPD;预设立体声参数编码条件中包括:Dp≥D2
其中,DP表示IPD与第三标准的偏离程度,第三标准是根据第N帧立体声参数集合之前的T帧立体声参数集合,基于预定第五算法确定的,T为大于0的正整数。
其中,第三算法、第四算法以及第五算法是根据实际情况需要预先设置的。
具体的,当第N帧立体声参数集合中的至少一个立体声参数仅包括ITD时,预设的立体声参数编码条件仅包括DT≥D1,则当第N帧立体声参数集合中的至少一个立体声参数包括的ITD满足DT≥D1,则对第N帧立体声参数集合中的至少一个立体声参数编码;当第N帧立体声参数集合中的至少一个立体声参数仅包括ITD、IPD时,预设的立体声参数编码条件仅包括DT≥D1,则当第N帧立体声参数集合中的至少一个立体声参数包括的ITD满足DT≥D1,则对第N帧立体声参数集合中的至少一个立体声参数编码,但是,当第N帧立体声参数集合中的至少一个立体声参数仅包括ITD、ILD时,预设的立体声 参数编码条件包括DT≥D1和DL≥D0,则只有在第N帧立体声参数集合中的至少一个立体声参数包括的ITD满足DT≥D1、且ILD满足DL≥D0时,编码器才对ITD和ILD编码。
可选的,DL、DT、DP分别满足下列表达式:
Figure PCTCN2016100617-appb-000026
Figure PCTCN2016100617-appb-000027
Figure PCTCN2016100617-appb-000028
其中，ILD(m)为两声道分别在第m个子频带传输第N帧音频信号时的电平差值，M为传输第N帧音频信号所占用的子频带的总个数；符号（Figure PCTCN2016100617-appb-000029）为在第N帧之前的T帧立体声参数集合中在第m个子频带的ILD的平均值，T为大于0的正整数，ILD[-t](m)为两声道分别在第m个子频带传输第N帧音频信号之前的第t帧音频信号时的电平差值；ITD为两声道分别传输第N帧音频信号时的时间差值，符号（Figure PCTCN2016100617-appb-000030）为在第N帧之前的T帧立体声参数集合中的ITD的平均值，ITD[-t]为两声道分别传输第N帧音频信号之前的第t帧音频信号时的时间差值；IPD(m)为两声道分别在第m个子频带传输第N帧音频信号中的部分音频信号时的相位差值，符号（Figure PCTCN2016100617-appb-000031）为在第N帧之前的T帧立体声参数集合中在第m个子频带的IPD的平均值，IPD[-t](m)为两声道分别在第m个子频带传输第N帧音频信号之前的第t帧音频信号时的相位差值。
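下面给出一个示意性的 Python 草图，用来说明"与历史平均值的偏离程度"这一判决思路；其中把 DL、DT、DP 具体取为与 T 帧平均值之差的绝对值（按子带取均值），这只是基于上文变量定义的一种假设性形式，实际表达式以原文公式图像为准，阈值取值亦为示例。

```python
import numpy as np

def stereo_params_deviate(ild, itd, ipd, hist_ild, hist_itd, hist_ipd,
                          d0=1.0, d1=1.0, d2=0.1):
    """判断第N帧立体声参数是否满足预设的立体声参数编码条件。

    ild, ipd:  第N帧各子频带的 ILD(m)、IPD(m); itd 为第N帧的 ITD
    hist_*:    第N帧之前 T 帧的对应参数(形状为 [T, M] 或 [T])
    d0,d1,d2:  阈值 D0、D1、D2(取值为示例, 属假设)
    返回: True 表示偏离足够大, 需要对立体声参数编码
    """
    d_l = np.mean(np.abs(ild - np.mean(hist_ild, axis=0)))   # DL
    d_t = np.abs(itd - np.mean(hist_itd))                    # DT
    d_p = np.mean(np.abs(ipd - np.mean(hist_ipd, axis=0)))   # DP
    # 本示例按"各类参数同时满足各自条件"处理, 与上文示例一致
    return (d_l >= d0) and (d_t >= d1) and (d_p >= d2)
```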
步骤207,编码器根据预设的SID编码速率对第N帧下混信号编码,以及对第N帧立体声参数集合中至少一个立体声参数编码,执行步骤211。
具体的，当编码器中包括两种对立体声参数集合编码的方式（第一编码方式和第二编码方式）时，其中，第一编码方式规定的编码速率不小于第二编码方式规定的编码速率；和/或，针对第N帧立体声参数集合中任一立体声参数，第一编码方式规定的量化精度不低于第二编码方式规定的量化精度，编码器按照第二编码方式对第N帧立体声参数集合中至少一个立体声参数编码。
例如,第一编码方式中编码器按照4.2kbps对第N帧立体声参数集合编码,第二编码方式中编码器按照1.2kbps对第N帧立体声参数集合编码。
其中,为提高编码器对立体声参数集合的压缩效率,可选的,编码器根据第N帧立体声参数集合中的Z个立体声参数,按照预设的立体声参数降维规则,得到X个目标立体声参数,并对X个目标立体声参数编码,其中,X为大于零且小于等于Z的正整数。
具体的,第N帧立体声参数集合中包括IPD、ITD、ILD三种类型的立体声参数,其中,ILD由ILD(0)…ILD(9)10个子频带的ILD组成,IPD由IPD(0)…IPD(9)10个子频带的IPD组成,ITD由ITD(0),ITD(1)2个时域子带的ITD组成,假设预设的立体声参数降维规则为立体声参数集合中只包括两个类型的立体声参数,则编码器从IPD、ITD、ILD中选择任意两个类型的立体声参数,假设选择的是IPD和ILD,则编码器对IPD和ILD编码。或者,预设的立体声参数降维规则为每个类型的立体声参数只保留一半,则分别从ILD(0)…ILD(9)中选择5个、从IPD(0)…IPD(9)中选择5个,从ITD(0),ITD(1)中选择1个,将选择的参数编码;或者,预设的立体声参数降维规则为从ILD和IPD中分别选择5个,或者,预设的立体声参数降维规则为降低ILD、IPD的频域分辨率和ITD的时域分辨率,则将ILD(0)…ILD(9)中相邻子频带合并,例如求取ILD(0)、ILD(1)的均值得到新的ILD(0),求取ILD(2)、ILD(3)的均值得到新的ILD(1),…,求取ILD(8)、ILD(9)的均值得到新的ILD(4),其中新的ILD(0)对应的子频带等于原ILD(0)、ILD(1)对应的子频带,…,新的ILD(4)对应的子频带等于原ILD(8)、ILD(9)对应的子频带。同样的方法,将IPD(0)…IPD(9)中相邻子频带合并,得到新的IPD(0)…IPD(4),将ITD(0)、ITD (1)也求取均值进行合并得到新的ITD(0),其中新的ITD(0)对应的时域信号与原ITD(0)、ITD(1)对应的时域信号相同。将新的ILD(0)…ILD(4),新的IPD(0)…IPD(4)和新的ITD(0)编码。或者,预设的立体声参数降维规则为降低ILD的频域分辨率,则将ILD(0)…ILD(9)中相邻子频带合并,例如求取ILD(0)、ILD(1)的均值得到新的ILD(0),求取ILD(2)、ILD(3)的均值得到新的ILD(1),…,求取ILD(8)、ILD(9)的均值得到新的ILD(4),其中新的ILD(0)对应的子频带等于原ILD(0)、ILD(1)对应的子频带,…,新的ILD(4)对应的子频带等于原ILD(8)、ILD(9)对应的子频带。然后,将新的ILD(0)…ILD(4)编码。
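以合并相邻子频带为例，下面给出一个示意性的 Python 草图，说明"降低频域分辨率"这一降维方式（仅为假设性示例，子频带个数与合并方式以实际配置为准）。

```python
import numpy as np

def merge_adjacent_subbands(params):
    """把相邻子频带的立体声参数两两求均值, 将参数个数减半。

    params: 例如 ILD(0)...ILD(9) 组成的一维数组(长度为偶数)
    返回:   新的 ILD(0)...ILD(4), 每个值对应原来两个子频带
    """
    params = np.asarray(params, dtype=float)
    assert params.size % 2 == 0, "示例假设子频带个数为偶数"
    return params.reshape(-1, 2).mean(axis=1)

# 用法示例: 10 个子频带的 ILD 降维为 5 个目标立体声参数
# new_ild = merge_adjacent_subbands(ild10)
```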
步骤208,编码器根据预设的SID编码速率对第N帧下混信号编码,不对第N帧立体声参数集合中至少一个立体声参数编码,执行步骤211。
步骤209,编码器对第N帧立体声参数集合中的至少一个立体声参数编码,不对第N帧下混信号编码,执行步骤215。
步骤210,编码器不对第N帧下混信号和第N帧立体声参数集合编码,执行步骤217。
通过本发明实施例二编码器编码后得到的码流,码流中包括四种不同类型的帧,即第三类型帧、第四类型帧、第五类型帧和第六类型帧,其中第三类型帧中包含立体声参数集合且不包含下混信号,第四类型帧中不包含下混信号且不包含立体声参数集合,第五类型帧中包含下混信号和立体声参数集合,第六类型帧中包含下混信号且不包含立体声参数集合,其中第五类型帧和第六类型帧分别为包含下混信号类型帧的一种情况,第三类型帧和第四类型帧分别为不包含下混信号类型帧的一种情况。
具体的，步骤203、步骤205和步骤207中得到的第N帧码流为第五类型帧，步骤208中得到的第N帧码流为第六类型帧，步骤209中得到的第N帧码流为第三类型帧，步骤210中得到的第N帧码流为第四类型帧。
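上述四种帧类型与各步骤判决结果的对应关系可以用下面的示意性 Python 草图概括（函数名与输入变量均为说明用的假设）。

```python
def classify_output_frame(contains_speech, meets_speech_cond,
                          meets_sid_cond, params_deviate):
    """返回第N帧码流的帧类型编号: 5/6/3/4。

    contains_speech:   第N帧下混信号是否包含语音(步骤202)
    meets_speech_cond: 是否满足预设的语音帧编码条件(步骤204)
    meets_sid_cond:    是否满足预设的SID编码条件(步骤206)
    params_deviate:    第N帧立体声参数集合是否满足立体声参数编码条件
    """
    if contains_speech or meets_speech_cond:
        return 5                 # 下混信号 + 立体声参数集合(步骤203/205)
    if meets_sid_cond and params_deviate:
        return 5                 # SID速率编码下混信号并编码参数(步骤207)
    if meets_sid_cond:
        return 6                 # 仅编码下混信号, 不编码参数(步骤208)
    if params_deviate:
        return 3                 # 仅编码立体声参数(步骤209)
    return 4                     # 下混信号与参数均不编码(步骤210)
```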
步骤211,编码器向解码器发送第N帧码流,第N帧码流中包括第N帧下混信号和第N帧立体声参数集合。
步骤212,解码器接收第N帧码流,确定第N帧码流为第五类型帧,则对第N帧码流解码,得到第N帧下混信号和第N帧立体声参数集合,执行步骤218。
其中解码器确定第N帧码流为哪一类型帧的具体实施方式参见本发明实施例一。
具体的,解码器根据第N帧码流对应的速率,对第N帧码流解码,具体的,若编码器按照13.2kbps对第N帧下混信号编码,则解码器按照13.2kbps对第N帧码流中第N帧下混信号的码流解码,若编码器按照4.2kbps对第N帧立体声参数集合编码,则解码器按照4.2kbps对第N帧码流中第N帧立体声参数集合的码流解码。
步骤213,编码器向解码器发送第N帧码流,第N帧码流中包括第N帧下混信号。
步骤214,解码器确定第N帧码流为第六类型帧,则对第N帧码流解码,得到第N帧下混信号,并根据预设第二规则,从第N帧立体声参数集合之前的至少一帧立体声参数集合中,确定k帧立体声参数集合,并根据k帧立体声参数集合,基于预定第六算法,得到第N帧立体声参数集合,执行步骤218。
具体的，以第N帧立体声参数集合中一个立体声参数为例，预设第二规则中规定的立体声参数集合为距离P最近的一帧、且通过解码得到的立体声参数集合，根据下列算法得到第N帧立体声参数P：
Figure PCTCN2016100617-appb-000032
P表示第N帧的立体声参数,
Figure PCTCN2016100617-appb-000033
表示距离P最近的一帧、且通过解码得到的立体声参数，δ表示一个绝对值相对较小的随机数，例如δ可以是一个在
Figure PCTCN2016100617-appb-000034
Figure PCTCN2016100617-appb-000035
之间的随机数。
需要说明的是,在本发明实施例中,不限于上述方法估计第N帧立体声参数集合中的各个立体声参数。
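结合上式，以下为一个示意性的 Python 草图（随机扰动的相对幅度为假设取值，仅用于说明该估计思路）。

```python
import numpy as np

def estimate_stereo_param(prev_param, rel_range=0.1):
    """用最近一次解码得到的立体声参数加小幅随机扰动来估计第N帧参数。

    prev_param: 距第N帧最近的一帧、且通过解码得到的立体声参数
    rel_range:  扰动相对幅度, 例如 δ 取 [-0.1, 0.1]*|prev_param|(假设)
    """
    delta = np.random.uniform(-rel_range, rel_range) * abs(prev_param)
    return prev_param + delta
```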
步骤215,编码器向解码器发送第N帧码流,第N帧码流中包括第N帧立体声参数集合中的至少一个立体声参数。
步骤216,解码器确定第N帧码流为第三类型帧,则对第N帧码流解码,得到第N帧立体声参数集合中的至少一个立体声参数,以及根据预设第一规则,从第N帧下混信号之前的至少一帧下混信号中,确定m帧下混信号,并根据m帧下混信号,基于预定第二算法,得到第N帧下混信号,m为大于零的正整数,执行步骤218。
具体的,取第(N-3)帧、第(N-2)帧和第(N-1)帧下混信号的平均值,作为第N帧下混信号,或者,将第(N-1)帧下混信号直接作为第N帧下混信号,或者根据其它算法估计第N帧下混信号。
此外,还可以直接将第(N-1)帧下混信号作为第N帧下混信号;或者,根据第(N-1)帧下混信号和一个预设的偏差值,基于预设的算法进行运算得到第N帧下混信号。
步骤217,解码器接收第N帧码流后,确定第N帧码流为第四类型帧,则根据预设第二规则,从第N帧立体声参数集合之前的至少一帧立体声参数集合中,确定k帧立体声参数集合,并根据k帧立体声参数集合,基于预定第六算法,得到第N帧立体声参数集合;以及
根据预设第一规则,从第N帧下混信号之前的至少一帧下混信号中,确定m帧下混信号,并根据m帧下混信号,基于预定第二算法,得到第N帧下混信号,m为大于零的正整数。
步骤218,解码器根据第N帧立体声参数集合的目标立体声参数,基于预定第七算法,将第N帧下混信号还原为两声道的第N帧音频信号。
此外,基于本发明实施例,编码器若通过两声道中的第N帧音频信号检测第N帧下混信号中是否包含语音信号,还提供了一种对立体声参数集合的编码方式,具体的,编码器若检测到两声道中任一第N帧音频信号包含语音信号,则根据第N帧音频信号,基于第一立体声参数集合生成方式,得到第N帧立体声参数集合,并对第N帧立体声参数集合编码;
编码器在确定两声道中的第N帧音频信号中都不包含语音信号时:若第N帧音频信号满足预设的语音帧编码条件,则根据第N帧音频信号,基于第 一立体声参数集合生成方式,得到第N帧立体声参数集合,并对第N帧立体声参数集合编码;若确定第N帧音频信号不满足预设的语音帧编码条件,则根据第N帧音频信号,基于第二立体声参数集合生成方式,得到第N帧立体声参数集合,并
在确定第N帧立体声参数集合满足预设的立体声参数编码条件时,对第N帧立体声参数集合中的至少一个立体声参数编码;在确定第N帧立体声参数集合不满足预设的立体声参数编码条件时,不对立体声参数集合编码;
其中,第一立体声参数集合生成方式和所述第二立体声参数集合生成方式满足下列至少一个条件:
第一立体声参数集合生成方式规定的立体声参数集合中包括的立体声参数类型的个数不少于第二立体声参数集合生成方式规定的立体声参数集合中包括的立体声参数类型的个数,第一立体声参数集合生成方式规定的立体声参数集合中包括的立体声参数的个数不少于第二立体声参数集合生成方式规定的立体声参数集合中包括的立体声参数的个数,第一立体声参数集合生成方式规定的立体声参数在时域的分辨率不低于第二立体声参数集合生成方式规定的对应的立体声参数在时域的分辨率,第一立体声参数集合生成方式规定的立体声参数在频域的分辨率不低于第二立体声参数集合生成方式规定的对应的立体声参数在频域的分辨率。
具体的,第一立体声集合生成方式得到的立体声参数集合在频域或时域的精度较第二立体声集合生成方式得到的立体声参数集合高。
此外,本发明实施例三处理多声道音频信号的方法中,当编码器检测到第N帧下混信号中包含语音信号时,按照语音编码速率对第N帧下混信号编码,以及对第N帧立体声参数集合编码;当编码器检测到第N帧下混信号中不包含语音信号时:若第N帧下混信号满足预设的语音帧编码条件,则按照语音编码速率对第N帧下混信号编码,以及对第N帧立体声参数集合编码;若第N帧下混信号不满足预设的语音帧编码条件、但满足预设的SID编码条件,则按照SID编码速率对第N帧下混信号编码,以及对第N帧立体声参数集合中至少 一个立体声参数编码,若第N帧下混信号既不满足预设的语音帧编码条件、也不满足预设的SID编码条件时,编码器不对第N帧下混信号编码,同时也不对第N帧立体声参数集合编码。
应理解,本发明实施例三与本发明实施例一和本发明实施例二的区别在于:编码器不对立体声参数集合进行判断,对下混信号无论采用何种方式编码时,则对立体声参数集合编码。
通过本发明实施例三编码器对下混信号编码得到的码流包括两种类型的帧,第一类型帧和第二类型帧,其中第一类型帧包含下混信号且包含立体声参数集合,第二类型帧不包含下混信号且不包含立体声参数集合,具体的解码器接收到码流后,还原得到两声道的音频信号的方法参见本发明实施例二和本发明实施例一。
在本发明实施例三的基础上,可选的,在第N帧下混信号既不满足预设的语音帧编码条件、也不满足预设的SID编码条件时,编码器判断第N帧立体声参数集合是否满足预设的立体声参数编码条件,若是,编码器不对第N帧下混信号编码,但对第N帧立体声参数集合中至少一个立体声参数编码,否则编码器不对第N帧下混信号和第N帧立体声参数集合编码。
基于上述编码方法得到的码流包括三种类型帧,第一类型帧、第三类型帧和第四类型帧,其中第一类型帧中包含下混信号且包含立体声参数集合,第三类型帧中不包含下混信号但包含立体声参数集合,第四类型帧不包含下混信号且不包含立体声参数集合,具体的解码器接收到码流后,还原得到两声道的音频信号的方法参见本发明实施例二和本发明实施例一。
上述技术方案与本发明实施例二的区别在于,在第N帧下混信号既不满足预设的语音帧编码条件、也不满足预设的SID编码条件时,判断第N帧立体声参数集合是否满足预设的立体声参数编码条件。
可选的,本发明实施例四处理多声道音频信号的方法中,当编码器检测到第N帧下混信号中包含语音信号时,按照语音编码速率对第N帧下混信号编码,以及对第N帧立体声参数集合编码;当编码器检测到第N帧下混信号中不 包含语音信号时:若第N帧下混信号满足预设的语音帧编码条件,则按照语音编码速率对第N帧下混信号编码,以及对第N帧立体声参数集合编码;若第N帧下混信号不满足预设的语音帧编码条件、但满足预设的SID编码条件,编码器判断第N帧立体声参数集合是否满足预设的立体声参数编码条件,当第N帧立体声参数集合满足预设的立体声参数集合编码条件时,编码器按照SID编码速率对第N帧下混信号编码,以及对第N帧立体声参数集合中至少一个立体声参数编码,当第N帧立体声参数集合不满足预设的立体声参数集合编码条件时,编码器按照SID编码速率对第N帧下混信号编码,且不对第N帧立体声参数集合编码;若第N帧下混信号既不满足预设的语音帧编码条件、也不满足预设的SID编码条件时,编码器不对第N帧下混信号编码,同时也不对第N帧立体声参数集合编码。
通过本发明实施例四编码方式得到的码流包括三种类型帧,第五类型帧、第六类型帧和第二类型帧,其中第五类型帧中包含下混信号和立体声参数集合,第六类型帧中包含下混信号且不包含立体声参数集合,第二类型帧中不包含下混信号且不包含立体声参数集合,具体的解码器接收到码流后,还原得到两声道的音频信号的方法参见本发明实施例二和本发明实施例一。
本发明实施例四与本发明实施例二的区别在于:在第N帧下混信号不满足预设的语音帧编码条件、但满足预设的SID编码条件时,判断是否对第N帧立体声参数集合中至少一个立体声参数编码,当不满足预设的语音帧编码条件、且不满足预设的SID编码条件,则不对第N帧立体参数集合编码。
在本发明实施例三和本发明实施例四中,具体的解码器得到第N帧下混信号和第N帧立体声参数集合的方式参见本发明实施例二和本发明实施例一,以及对立体声参数和下混信号编码的具体实施方式也可参见本发明实施例二和本发明实施例一。
在本发明任一实施例中,预定第一算法、预定第二算法中的第一、第二没有特殊的含义,仅是用于区分不同的算法,第三、第四、第五、第六、第七等与此类似,在此不再一一赘述。
基于同一发明构思,本发明实施例中还提供了一种编码器、一种解码器和一种编解码系统,由于本发明实施例中的编码器、解码器和编解码系统对应的方法为本发明实施例处理多声道音频信号的方法,因此本发明实施例编码器、解码器以及编解码系统的实施可以参见该方法的实施,重复之处不再赘述。
如图3a所示,本发明实施例编码器,包括:信号检测单元300和信号编码单元310,其中,信号检测单元300用于检测第N帧下混信号中是否包含语音信号,第N帧下混信号是由多声道中两个声道的第N帧音频信号基于预定第一算法混合后得到的,N为大于零的正整数;信号编码单元310用于在信号检测单元300检测到第N帧下混信号中包含语音信号时,对第N帧下混信号编码,以及在信号检测单元300检测到第N帧下混信号中不包含语音信号时:若信号检测单元300确定第N帧下混信号满足预设的音频帧编码条件,则对第N帧下混信号编码;若信号检测单元300确定第N帧下混信号不满足预设的音频帧编码条件,则不对第N帧下混信号编码。
可选的,如图3b所示,信号编码单元310包括第一信号编码单元311和第二信号编码单元312,在信号检测单元300检测到第N帧下混信号中包含语音信号时,信号检测单元300通知第一信号编码单元311对第N帧下混信号编码;
若信号检测单元300确定第N帧下混信号满足预设的语音帧编码条件,则通知第一信号编码单元311对第N帧下混信号编码;
具体的,规定第一信号编码单元311根据预设的语音帧编码速率对第N帧下混信号编码;
若信号检测单元300确定第N帧下混信号不满足预设的语音帧编码条件、但满足预设的静音插入帧SID编码条件,则通知第二信号编码单元312对第N帧下混信号编码,具体的规定第二信号编码单元312根据预设的SID编码速率对第N帧下混信号编码;其中,SID编码速率不大于语音帧编码速率。
可选的,如图3a和如图3b所示的编码器还包括参数生成单元320、参数 编码单元330和参数检测单元340,其中,参数生成单元320用于根据第N帧音频信号,得到第N帧立体声参数集合,第N帧立体声参数集合中包括Z个立体声参数,Z个立体声参数包括编码器基于预定第一算法对第N帧音频信号混合时所用到的参数,Z为大于零的正整数;参数编码单元330用于在信号检测单元检测到第N帧下混信号中包含语音信号时,则对第N帧立体声参数集合编码,以及在信号检测单元300检测到第N帧下混信号中不包含语音信号时:若信号检测单元300确定第N帧立体声参数集合满足预设的立体声参数编码条件,则对第N帧立体声参数集合中的至少一个立体声参数编码;若信号检测单元300确定第N帧立体声参数集合不满足预设的立体声参数编码条件,则不对立体声参数集合编码。
可选的,参数编码单元330用于根据第N帧立体声参数集合中的Z个立体声参数,按照预设的立体声参数降维规则,得到X个目标立体声参数,并对X个目标立体声参数编码,其中,X为大于零且小于等于Z的正整数。
具体的,当参数编码单元330包括第一参数编码单元331和第二参数编码单元332时,第二参数编码单元332用于根据第N帧立体声参数集合中的Z个立体声参数,按照预设的立体声参数降维规则,得到X个目标立体声参数,并对X个目标立体声参数编码。
可选的,在如图3a和图3b的基础上,如图3c所示的编码器参数生成单元320包括第一参数生成单元321和第二参数生成单元322,信号检测单元300检测到第N帧音频信号包含语音信号时,或者信号检测单元300检测到第N帧音频信号不包含语音信号、且第N帧音频信号满足预设的语音帧编码条件时,通知第一参数生成单元321生成第N帧立体声参数集合;信号检测单元300检测到第N帧音频信号不包含语音信号、且第N帧音频信号不满足预设的语音帧编码条件时,通知第二参数生成单元322生成第N帧立体声参数集合,具体的,预先规定第一参数生成单元321根据第N帧音频信号,基于第一立体声参数集合生成方式,得到第N帧立体声参数集合,第二参数生成单元322根据第N帧音频信号,基于第二立体声参数集合生成方式,得到 第N帧立体声参数集合。
其中,第一立体声参数集合生成方式和第二立体声参数集合生成方式满足下列至少一个条件:
第一立体声参数集合生成方式规定的立体声参数集合中包括的立体声参数类型的个数不少于第二立体声参数集合生成方式规定的立体声参数集合中包括的立体声参数类型的个数,第一立体声参数集合生成方式规定的立体声参数集合中包括的立体声参数的个数不少于第二立体声参数集合生成方式规定的立体声参数集合中包括的立体声参数的个数,第一立体声参数集合生成方式规定的立体声参数在时域的分辨率不低于第二立体声参数集合生成方式规定的对应的立体声参数在时域的分辨率,第一立体声参数集合生成方式规定的立体声参数在频域的分辨率不低于第二立体声参数集合生成方式规定的对应的立体声参数在频域的分辨率。
第二参数生成单元322在得到第N帧立体声参数集合后,通过参数编码单元330对第N帧立体声参数集合编码,具体的,如图3d所示,当参数编码单元330包括第一参数编码单元331和第二参数编码单元332时,通过第一参数编码单元331对第一参数生成单元321生成的第N帧立体声参数集合编码;通过第二参数编码单元332对第二参数生成单元322生成的第N帧立体声参数集合编码;预先规定第一参数编码单元331的编码方式为第一编码方式,预先规定第二参数编码单元332的编码方式为第二编码方式,其中,第一参数编码单元规定的编码方式为第一编码方式,第二参数编码单元规定的编码方式为第二编码方式,具体的,第一编码方式规定的编码速率不小于第二编码方式规定的编码速率;和/或,针对第N帧立体声参数集合中的任一立体声参数,第一编码方式规定的量化精度不低于第二编码方式规定的量化精度。
在参数检测单元340确定第N帧立体声参数集合不满足预设的立体声参数编码条件时,不对立体声参数集合编码。
可选的,参数编码单元330包括第一参数编码单元331和第二参数编码 单元332,具体的,第一参数编码单元331用于在第N帧下混信号中包含语音信号以及在第N帧下混信号中不包含语音信号但满足语音帧编码条件时,根据第一编码方式对第N帧立体声参数集合编码;第二参数编码单元332用于在第N帧下混信号不满足语音帧编码条件时,根据第二编码方式对第N帧立体声参数集合中的至少一个立体声参数编码;
其中,第一编码方式规定的编码速率不小于第二编码方式规定的编码速率;和/或,针对第N帧立体声参数集合中的任一立体声参数,第一编码方式规定的量化精度不低于第二编码方式规定的量化精度。
在第三方面的基础上,可选的,若第N帧立体声参数集合中的至少一个立体声参数包括:声道间电平差ILD;预设立体声参数编码条件中包括:DL≥D0
其中,DL表示ILD与第一标准的偏离程度,第一标准是根据第N帧立体声参数集合之前的T帧立体声参数集合,基于预定第二算法确定的,T为大于0的正整数;
若第N帧立体声参数集合中的至少一个立体声参数包括:声道间时间差ITD;预设立体声参数编码条件中包括:DT≥D1
其中,DT表示ITD与第二标准的偏离程度,第二标准是根据第N帧立体声参数集合之前的T帧立体声参数集合,基于预定第三算法确定的,T为大于0的正整数;
若第N帧立体声参数集合中的至少一个立体声参数包括:声道间相位差IPD;预设立体声参数编码条件中包括:Dp≥D2
其中,DP表示IPD与第三标准的偏离程度,第三标准是根据第N帧立体声参数集合之前的T帧立体声参数集合,基于预定第四算法确定的,T为大于0的正整数。
可选的,DL、DT、DP分别满足下列表达式:
Figure PCTCN2016100617-appb-000036
Figure PCTCN2016100617-appb-000037
Figure PCTCN2016100617-appb-000038
其中,ILD(m)为两声道分别在第m个子频带传输第N帧音频信号时的电平差值,M为传输第N帧音频信号所占用的子频带的总个数,
Figure PCTCN2016100617-appb-000039
为在第N帧之前的T帧立体声参数集合中在第m个子频带的ILD的平均值,T为大于0的正整数,ILD[-t](m)为两声道分别在第m个子频带传输第N帧音频信号之前的第t帧音频信号时的电平差值,ITD为两声道分别传输第N帧音频信号时的时间差值,
Figure PCTCN2016100617-appb-000040
为在第N帧之前的T帧立体声参数集合中的ITD的平均值,ITD[-t]为两声道分别传输第N帧音频信号之前的第t帧音频信号时的时间差值,IPD(m)为两声道分别在第m个子频带传输第N帧音频信号中的部分音频信号时的相位差值,
Figure PCTCN2016100617-appb-000041
为在第N帧之前的T帧立体声参数集合中在第m个子频带的IPD的平均值,IPD[-t](m)为两声道分别在第m个子频带传输第N帧音频信号之前的第t帧音频信号时的相位差值。
需要说明的是,如图3a~图3d所示的参数检测单元340是可选的,即在编码器中可以存在参数检测单元340,也可以没有参数检测单元340。
当参数编码单元330对参数生成单元320每帧立体声参数集合都编码时,无需对立体声参数进行检测,直接编码即可。
如图4所示,本发明实施例的解码器,包括:接收单元400和解码单元410,其中,接收单元400用于接收到码流,码流包括至少两个帧,至少两个帧中存在至少一个第一类型帧和至少一个第二类型帧,第一类型帧中包含下混信号,第二类型帧中不包含下混信号;针对第N帧码流,N为大于1的正整数,解码单元410用于:若确定第N帧码流为第一类型帧,则对第N帧码 流解码,得到第N帧下混信号;若确定第N帧码流为第二类型帧,则根据预设第一规则,从第N帧下混信号之前的至少一帧下混信号中,确定m帧下混信号,并根据m帧下混信号,基于预定第一算法,得到第N帧下混信号,m为大于零的正整数;
其中,第N帧下混信号是编码器由多声道中两个声道的第N帧音频信号基于预定第二算法混合后得到的。
可选的,如图4所示的解码器还包括信号还原单元420,第一类型帧中包含下混信号和立体声参数集合,第二类型帧中包含立体声参数集合且不包含下混信号:
解码单元410若确定第N帧码流为第一类型帧,则对第N帧码流解码,在得到第N帧下混信号的同时,还得到第N帧立体声参数集合;若确定第N帧码流为第二类型帧,则对第N帧码流解码,得到第N帧立体声参数集合;其中,第N帧立体声参数集合中的至少一个立体声参数用于解码器基于预定第三算法将第N帧下混信号还原为第N帧音频信号;
信号还原单元420,用于根据第N帧立体声参数集合中的至少一个立体声参数,基于第三算法,将第N帧下混信号还原为第N帧音频信号。
可选的,第一类型帧中包含下混信号和立体声参数集合,第二类型帧中不包含下混信号且不包含立体声参数集合;
解码单元410还用于若确定第N帧码流为第一类型帧,则对第N帧码流解码,在得到第N帧下混信号的同时,还得到第N帧立体声参数集合;若确定第N帧码流为第二类型帧,则根据预设第二规则,从第N帧立体声参数集合之前的至少一帧立体声参数集合中,确定k帧立体声参数集合,并根据k帧立体声参数集合,基于预定第四算法,得到第N帧立体声参数集合,k为大于零的正整数;
其中,第N帧立体声参数集合中的至少一个立体声参数用于解码器基于预定第三算法将第N帧下混信号还原为第N帧音频信号;
信号还原单元420,用于根据第N帧立体声参数集合中的至少一个立体 声参数,基于第三算法,将第N帧下混信号还原为第N帧音频信号。
可选的,第一类型帧中包含下混信号和立体声参数集合,第三类型帧中包含立体声参数集合且不包含下混信号,第四类型帧中不包含下混信号且不包含立体声参数集合,第三类型帧和第四类型帧分别为第二类型帧的一种情况:
解码单元410还用于若确定第N帧码流为第一类型帧,则对第N帧码流解码,在得到第N帧下混信号的同时,还得到第N帧立体声参数集合;若确定第N帧码流为第二类型帧:当第N帧码流为第三类型帧时,则对第N帧码流解码,得到第N帧立体声参数集合;当第N帧码流为第四类型帧时,则根据预设第二规则,从第N帧立体声参数集合之前的至少一帧立体声参数集合中,确定k帧立体声参数集合,并根据k帧立体声参数集合,基于预定第四算法,得到第N帧立体声参数集合,k为大于零的正整数;
其中,第N帧立体声参数集合中的至少一个立体声参数用于解码器基于预定第三算法将第N帧下混信号还原为第N帧音频信号;
信号还原单元420,用于根据第N帧立体声参数集合中的至少一个立体声参数,基于第三算法,将第N帧下混信号还原为第N帧音频信号。
可选的,第五类型帧中包含下混信号和立体声参数集合,第六类型帧中包含下混信号且不包含立体声参数集合,第五类型帧和第六类型帧分别为第一类型帧的一种情况,第二类型帧中不包含下混信号且不包含立体声参数集合:
解码单元410还用于若确定第N帧码流为第一类型帧:当第N帧码流为第五类型帧时,则对第N帧码流解码,在得到第N帧下混信号的同时,还得到第N帧立体声参数集合;当第N帧码流为第六类型帧时,则根据预设第二规则,从第N帧立体声参数集合之前的至少一帧立体声参数集合中,确定k帧立体声参数集合,并根据k帧立体声参数集合,基于预定第四算法,得到第N帧立体声参数集合;
解码单元410还用于若确定第N帧码流为第二类型帧,则根据预设第二 规则,从第N帧立体声参数集合之前的至少一帧立体声参数集合中,确定k帧立体声参数集合,并根据k帧立体声参数集合,基于预定第四算法,得到第N帧立体声参数集合;
其中,第N帧立体声参数集合中的至少一个立体声参数用于解码器基于预定第三算法将第N帧下混信号还原为第N帧音频信号,k为大于零的正整数;
信号还原单元420,用于根据第N帧立体声参数集合中的至少一个立体声参数,基于第三算法,将第N帧下混信号还原为第N帧音频信号。
可选的,第五类型帧中包含下混信号和立体声参数集合,第六类型帧中包含下混信号且不包含立体声参数集合,第五类型帧和第六类型帧分别为第一类型帧的一种情况,第三类型帧中包含立体声参数集合且不包含下混信号,第四类型帧中不包含下混信号且不包含立体声参数集合,第三类型帧和第四类型帧分别为第二类型帧的一种情况:
解码单元410还用于若确定第N帧码流为第一类型帧:当第N帧码流为第五类型帧时,对第N帧码流解码,在得到第N帧下混信号的同时,还得到第N帧立体声参数集合;当第N帧码流为第六类型帧时,则根据预设第二规则,从第N帧立体声参数集合之前的至少一帧立体声参数集合中,确定k帧立体声参数集合,并根据k帧立体声参数集合,基于预定第四算法,得到第N帧立体声参数集合;
解码单元410还用于若确定第N帧码流为第二类型帧,当第N帧码流为第三类型帧时,则对第N帧码流解码,得到第N帧立体声参数集合;当第N帧码流为第四类型帧时,则根据预设第二规则,从第N帧立体声参数集合之前的至少一帧立体声参数集合中,确定k帧立体声参数集合,并根据k帧立体声参数集合,基于预定第四算法,得到第N帧立体声参数集合;
其中,第N帧立体声参数集合中的至少一个立体声参数用于解码器基于预定第三算法将第N帧下混信号还原为第N帧音频信号,k为大于零的正整数;
信号还原单元420,用于根据第N帧立体声参数集合中的至少一个立体声参数,基于第三算法,将第N帧下混信号还原为第N帧音频信号。
如图5所示,本发明实施例的编解码系统,包括如图3a~图3b所示的任一编码器500,和如图4所示的解码器510。
本领域内的技术人员应明白,本发明的实施例可提供为方法、系统、或计算机程序产品。因此,本发明可采用完全硬件实施例、完全软件实施例、或结合软件和硬件方面的实施例的形式。而且,本发明可采用在一个或多个其中包含有计算机可用程序代码的计算机可用存储介质(包括但不限于磁盘存储器、CD-ROM、光学存储器等)上实施的计算机程序产品的形式。
本发明是参照根据本发明实施例的方法、设备(系统)、和计算机程序产品的流程图和/或方框图来描述的。应理解可由计算机程序指令实现流程图和/或方框图中的每一流程和/或方框、以及流程图和/或方框图中的流程和/或方框的结合。可提供这些计算机程序指令到通用计算机、专用计算机、嵌入式处理机或其他可编程数据处理设备的处理器以产生一个机器,使得通过计算机或其他可编程数据处理设备的处理器执行的指令产生用于实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的装置。
这些计算机程序指令也可存储在能引导计算机或其他可编程数据处理设备以特定方式工作的计算机可读存储器中,使得存储在该计算机可读存储器中的指令产生包括指令装置的制造品,该指令装置实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能。
这些计算机程序指令也可装载到计算机或其他可编程数据处理设备上,使得在计算机或其他可编程设备上执行一系列操作步骤以产生计算机实现的处理,从而在计算机或其他可编程设备上执行的指令提供用于实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的步骤。
尽管已描述了本发明的优选实施例,但本领域内的技术人员一旦得知了 基本创造性概念,则可对这些实施例作出另外的变更和修改。所以,所附权利要求意欲解释为包括优选实施例以及落入本发明范围的所有变更和修改。
显然,本领域的技术人员可以对本发明进行各种改动和变型而不脱离本发明的精神和范围。这样,倘若本发明的这些修改和变型属于本发明权利要求及其等同技术的范围之内,则本发明也意图包含这些改动和变型在内。

Claims (29)

  1. 一种处理多声道音频信号的方法,其特征在于,包括:
    编码器检测第N帧下混信号中是否包含语音信号,所述第N帧下混信号是由多声道中两个声道的第N帧音频信号基于预定第一算法混合后得到的,N为大于零的正整数;
    所述编码器在检测到所述第N帧下混信号中包含语音信号时,对所述第N帧下混信号编码;
    所述编码器在检测到所述第N帧下混信号中不包含语音信号时:
    所述编码器若确定所述第N帧下混信号满足预设的音频帧编码条件,则对所述第N帧下混信号编码;若确定所述第N帧下混信号不满足预设的音频帧编码条件,则不对所述第N帧下混信号编码。
  2. 如权利要求1所述的方法,其特征在于,所述编码器在检测到所述第N帧下混信号中包含语音信号时,对所述第N帧下混信号编码,包括:
    所述编码器在检测到所述第N帧下混信号中包含语音信号时,根据预设的语音帧编码速率对所述第N帧下混信号编码;
    所述编码器若确定所述第N帧下混信号满足预设的音频帧编码条件,则对所述第N帧下混信号编码,包括:
    所述编码器若确定所述第N帧下混信号满足预设的语音帧编码条件,则根据预设的语音帧编码速率对所述第N帧下混信号编码;
    所述编码器若确定所述第N帧下混信号不满足预设的语音帧编码条件、但满足预设的静音插入帧SID编码条件,则根据预设的SID编码速率对所述第N帧下混信号编码;其中,所述SID编码速率不大于所述语音帧编码速率。
  3. 如权利要求1或2所述的方法,其特征在于,所述方法还包括:
    所述编码器根据所述第N帧音频信号,得到第N帧立体声参数集合,其中,所述第N帧立体声参数集合中包括Z个立体声参数,所述Z个立体声参 数包括所述编码器基于所述预定第一算法对所述第N帧音频信号混合时所用到的参数,Z为大于零的正整数;
    所述编码器在检测到所述第N帧下混信号中包含语音信号时,则对所述第N帧立体声参数集合编码;
    所述编码器在检测到所述第N帧下混信号中不包含语音信号时:
    所述编码器若确定所述第N帧立体声参数集合满足预设的立体声参数编码条件,则对所述第N帧立体声参数集合中的至少一个立体声参数编码;若确定所述第N帧立体声参数集合不满足预设的立体声参数编码条件,则不对所述立体声参数集合编码。
  4. 如权利要求3所述的方法,其特征在于,所述编码器对所述第N帧立体声参数集合中的至少一个立体声参数编码,包括:
    所述编码器根据所述第N帧立体声参数集合中的Z个立体声参数,按照预设的立体声参数降维规则,得到X个目标立体声参数,其中,X为大于零且小于等于Z的正整数;
    所述编码器对所述X个目标立体声参数编码。
  5. 如权利要求2所述的方法,其特征在于,还包括:
    所述编码器在检测到所述第N帧音频信号包含语音信号时:
    所述编码器根据所述第N帧音频信号,基于第一立体声参数集合生成方式,得到所述第N帧立体声参数集合,并对所述第N帧立体声参数集合编码;
    所述编码器在检测到所述第N帧音频信号不包含语音信号时:
    所述编码器若确定所述第N帧音频信号满足预设的语音帧编码条件,则根据所述第N帧音频信号,基于第一立体声参数集合生成方式,得到所述第N帧立体声参数集合,并对所述第N帧立体声参数集合编码;
    所述编码器若确定所述第N帧音频信号不满足预设的语音帧编码条件,则根据所述第N帧音频信号,基于第二立体声参数集合生成方式,得到所述第N帧立体声参数集合,并
    在确定所述第N帧立体声参数集合满足预设的立体声参数编码条件时, 对所述第N帧立体声参数集合中的至少一个立体声参数编码;在确定所述第N帧立体声参数集合不满足预设的立体声参数编码条件时,不对所述立体声参数集合编码;
    其中,所述第一立体声参数集合生成方式和所述第二立体声参数集合生成方式满足下列至少一个条件:
    所述第一立体声参数集合生成方式规定的立体声参数集合中包括的立体声参数类型的个数不少于所述第二立体声参数集合生成方式规定的立体声参数集合中包括的立体声参数类型的个数,所述第一立体声参数集合生成方式规定的立体声参数集合中包括的立体声参数的个数不少于所述第二立体声参数集合生成方式规定的立体声参数集合中包括的立体声参数的个数,所述第一立体声参数集合生成方式规定的立体声参数在时域的分辨率不低于所述第二立体声参数集合生成方式规定的对应的立体声参数在时域的分辨率,所述第一立体声参数集合生成方式规定的立体声参数在频域的分辨率不低于所述第二立体声参数集合生成方式规定的对应的立体声参数在频域的分辨率。
  6. 如权利要求3至5任一所述的方法,其特征在于,所述编码器对所述第N帧立体声参数集合编码,包括:
    所述编码器根据第一编码方式对所述第N帧立体声参数集合编码;
    所述编码器对所述第N帧立体声参数集合中的至少一个立体声参数编码,包括:
    所述编码器在所述第N帧下混信号满足所述语音帧编码条件时,根据第一编码方式对所述第N帧立体声参数集合中的至少一个立体声参数编码;
    所述编码器在所述第N帧下混信号不满足所述语音帧编码条件时,根据所述第二编码方式对所述第N帧立体声参数集合中的至少一个立体声参数编码;
    其中,所述第一编码方式规定的编码速率不小于所述第二编码方式规定的编码速率;和/或,针对所述第N帧立体声参数集合中的任一立体声参数,所述第一编码方式规定的量化精度不低于所述第二编码方式规定的量化精 度。
  7. 如权利要求3至6任一所述的方法,其特征在于,若所述第N帧立体声参数集合中的至少一个立体声参数包括:声道间电平差ILD;所述预设立体声参数编码条件中包括:DL≥D0
    其中,DL表示ILD与第一标准的偏离程度,所述第一标准是根据第N帧立体声参数集合之前的T帧立体声参数集合,基于预定第二算法确定的,T为大于0的正整数;
    若所述第N帧立体声参数集合中的至少一个立体声参数包括:声道间时间差ITD;所述预设立体声参数编码条件中包括:DT≥D1
    其中,DT表示ITD与第二标准的偏离程度,所述第二标准是根据第N帧立体声参数集合之前的T帧立体声参数集合,基于预定第三算法确定的,T为大于0的正整数;
    若所述第N帧立体声参数集合中的至少一个立体声参数包括:声道间相位差IPD;所述预设立体声参数编码条件中包括:Dp≥D2
    其中,DP表示IPD与第三标准的偏离程度,所述第三标准是根据第N帧立体声参数集合之前的T帧立体声参数集合,基于预定第四算法确定的,T为大于0的正整数。
  8. 如权利要求7所述的方法,其特征在于,DL、DT、DP分别满足下列表达式:
    Figure PCTCN2016100617-appb-100001
    Figure PCTCN2016100617-appb-100002
    Figure PCTCN2016100617-appb-100003
    其中,ILD(m)为所述两声道分别在第m个子频带传输所述第N帧音频信号时的电平差值,M为传输所述第N帧音频信号所占用的子频带的总个数,
    Figure PCTCN2016100617-appb-100004
    为在所述第N帧之前的T帧立体声参数集合中在所述第m个子频带的ILD的平均值,T为大于0的正整数,ILD[-t](m)为所述两声道分别在第m个子频带传输所述第N帧音频信号之前的第t帧音频信号时的电平差值,ITD为所述两声道分别传输所述第N帧音频信号时的时间差值,
    Figure PCTCN2016100617-appb-100005
    为在所述第N帧之前的T帧立体声参数集合中的ITD的平均值,ITD[-t]为所述两声道分别传输所述第N帧音频信号之前的第t帧音频信号时的时间差值,IPD(m)为所述两声道分别在第m个子频带传输所述第N帧音频信号中的部分音频信号时的相位差值,
    Figure PCTCN2016100617-appb-100006
    为在所述第N帧之前的T帧立体声参数集合中在第m个子频带的IPD的平均值,IPD[-t](m)为所述两声道分别在第m个子频带传输所述第N帧音频信号之前的第t帧音频信号时的相位差值。
  9. 一种处理多声道音频信号的方法,其特征在于,包括:
    解码器接收到码流,所述码流包括至少两个帧,所述至少两个帧中存在至少一个第一类型帧和至少一个第二类型帧,所述第一类型帧中包含下混信号,所述第二类型帧中不包含下混信号;
    针对第N帧码流,所述N为大于1的正整数:
    所述解码器若确定所述第N帧码流为所述第一类型帧,则对所述第N帧码流解码,得到第N帧下混信号;
    所述解码器若确定所述第N帧码流为所述第二类型帧,则根据预设第一规则,从所述第N帧下混信号之前的至少一帧下混信号中,确定m帧下混信号,并根据所述m帧下混信号,基于预定第一算法,得到所述第N帧下混信号,m为大于零的正整数;
    其中,所述第N帧下混信号是编码器由多声道中两个声道的第N帧音频信号基于预定第二算法混合后得到的。
  10. 如权利要求9所述的方法,其特征在于,所述第一类型帧中包含下混信号和立体声参数集合,所述第二类型帧中包含立体声参数集合且不包含 下混信号:
    所述解码器若确定所述第N帧码流为所述第一类型帧,则对所述第N帧码流解码之后,还包括:
    所述解码器得到第N帧立体声参数集合;
    所述解码器若确定所述第N帧码流为所述第二类型帧之后,还包括:
    所述解码器对所述第N帧码流解码,得到第N帧立体声参数集合;
    其中,所述第N帧立体声参数集合中的至少一个立体声参数用于所述解码器基于所述预定第三算法将所述第N帧下混信号还原为所述第N帧音频信号
    所述解码器根据所述第N帧立体声参数集合中的至少一个立体声参数,基于所述第三算法,将所述第N帧下混信号还原为所述第N帧音频信号。
  11. 如权利要求9所述的方法,其特征在于,所述第一类型帧中包含下混信号和立体声参数集合,所述第二类型帧中不包含下混信号且不包含立体声参数集合;
    所述解码器若确定所述第N帧码流为所述第一类型帧,则对所述第N帧码流解码之后,还包括:
    所述解码器得到第N帧立体声参数集合;
    所述解码器若确定所述第N帧码流为所述第二类型帧之后,还包括:
    所述解码器根据预设第二规则,从所述第N帧立体声参数集合之前的至少一帧立体声参数集合中,确定k帧立体声参数集合,并根据所述k帧立体声参数集合,基于预定第四算法,得到所述第N帧立体声参数集合,k为大于零的正整数;
    其中,所述第N帧立体声参数集合中的至少一个立体声参数用于所述解码器基于所述预定第三算法将所述第N帧下混信号还原为所述第N帧音频信号;
    所述解码器根据所述第N帧立体声参数集合中的至少一个立体声参数,基于所述第三算法,将所述第N帧下混信号还原为所述第N帧音频信号。
  12. 如权利要求9所述的方法,其特征在于,所述第一类型帧中包含下混信号和立体声参数集合,第三类型帧中包含立体声参数集合且不包含下混信号,第四类型帧中不包含下混信号且不包含立体声参数集合,所述第三类型帧和所述第四类型帧分别为所述第二类型帧的一种情况:
    所述解码器若确定所述第N帧码流为所述第一类型帧,则对所述第N帧码流解码之后,还包括:
    所述解码器得到第N帧立体声参数集合;
    所述解码器若确定所述第N帧码流为所述第二类型帧之后,还包括:
    当所述第N帧码流为所述第三类型帧时,所述解码器对所述第N帧码流解码,得到第N帧立体声参数集合;
    当所述第N帧码流为所述第四类型帧时,所述解码器根据预设第二规则,从所述第N帧立体声参数集合之前的至少一帧立体声参数集合中,确定k帧立体声参数集合,并根据所述k帧立体声参数集合,基于预定第四算法,得到所述第N帧立体声参数集合,k为大于零的正整数;
    其中,所述第N帧立体声参数集合中的至少一个立体声参数用于所述解码器基于所述预定第三算法将所述第N帧下混信号还原为所述第N帧音频信号;
    所述解码器根据所述第N帧立体声参数集合中的至少一个立体声参数,基于所述第三算法,将所述第N帧下混信号还原为所述第N帧音频信号。
  13. 如权利要求9所述的方法,其特征在于,第五类型帧中包含下混信号和立体声参数集合,第六类型帧中包含下混信号且不包含立体声参数集合,所述第五类型帧和所述第六类型帧分别为所述第一类型帧的一种情况,所述第二类型帧中不包含下混信号且不包含立体声参数集合:
    所述解码器若确定所述第N帧码流为所述第一类型帧之后,还包括:
    当所述第N帧码流为所述第五类型帧时,所述解码器对所述第N帧码流解码,得到第N帧立体声参数集合;
    当所述第N帧码流为所述第六类型帧时,所述解码器根据预设第二规 则,从所述第N帧立体声参数集合之前的至少一帧立体声参数集合中,确定k帧立体声参数集合,并根据所述k帧立体声参数集合,基于预定第四算法,得到所述第N帧立体声参数集合;
    所述解码器若确定所述第N帧码流为所述第二类型帧之后,还包括:
    所述解码器根据预设第二规则,从所述第N帧立体声参数集合之前的至少一帧立体声参数集合中,确定k帧立体声参数集合,并根据所述k帧立体声参数集合,基于预定第四算法,得到所述第N帧立体声参数集合,
    其中,所述第N帧立体声参数集合中的至少一个立体声参数用于所述解码器基于所述预定第三算法将所述第N帧下混信号还原为所述第N帧音频信号,所述k为大于零的正整数;
    所述解码器根据所述第N帧立体声参数集合中的至少一个立体声参数,基于所述第三算法,将所述第N帧下混信号还原为所述第N帧音频信号。
  14. 如权利要求9所述的方法,其特征在于,第五类型帧中包含下混信号和立体声参数集合,第六类型帧中包含下混信号且不包含立体声参数集合,所述第五类型帧和所述第六类型帧分别为所述第一类型帧的一种情况,第三类型帧中包含立体声参数集合且不包含下混信号,第四类型帧中不包含下混信号且不包含立体声参数集合,所述第三类型帧和所述第四类型帧分别为所述第二类型帧的一种情况:
    所述解码器若确定所述第N帧码流为所述第一类型帧之后,还包括:
    当所述第N帧码流为所述第五类型帧时,所述解码器对所述第N帧码流解码,得到第N帧立体声参数集合;
    当所述第N帧码流为所述第六类型帧时,所述解码器根据预设第二规则,从所述第N帧立体声参数集合之前的至少一帧立体声参数集合中,确定k帧立体声参数集合,并根据所述k帧立体声参数集合,基于预定第四算法,得到所述第N帧立体声参数集合;
    所述解码器若确定所述第N帧码流为所述第二类型帧之后,还包括:
    当所述第N帧码流为所述第三类型帧时,所述解码器对所述第N帧码流 解码,得到第N帧立体声参数集合;
    当所述第N帧码流为所述第四类型帧时,所述解码器根据预设第二规则,从所述第N帧立体声参数集合之前的至少一帧立体声参数集合中,确定k帧立体声参数集合,并根据所述k帧立体声参数集合,基于预定第四算法,得到所述第N帧立体声参数集合;
    其中,所述第N帧立体声参数集合中的至少一个立体声参数用于所述解码器基于所述预定第三算法将所述第N帧下混信号还原为所述第N帧音频信号,k为大于零的正整数;
    所述解码器根据所述第N帧立体声参数集合中的至少一个立体声参数,基于所述第三算法,将所述第N帧下混信号还原为所述第N帧音频信号。
  15. 一种编码器,其特征在于,包括:
    信号检测单元,用于检测第N帧下混信号中是否包含语音信号,所述第N帧下混信号是由多声道中两个声道的第N帧音频信号基于预定第一算法混合后得到的,N为大于零的正整数;
    信号编码单元,用于在所述信号检测单元检测到所述第N帧下混信号中包含语音信号时,对所述第N帧下混信号编码;
    所述信号编码单元,还用于在所述信号检测单元检测到所述第N帧下混信号中不包含语音信号时:
    若所述信号检测单元确定所述第N帧下混信号满足预设的音频帧编码条件,则对所述第N帧下混信号编码;若所述信号检测单元确定所述第N帧下混信号不满足预设的音频帧编码条件,则不对所述第N帧下混信号编码。
  16. 如权利要求15所述的编码器,其特征在于,所述信号编码单元包括第一信号编码单元和第二信号编码单元,所述第一信号编码单元,具体用于:
    在所述信号检测单元检测到所述第N帧下混信号中包含语音信号时,根据预设的语音帧编码速率对所述第N帧下混信号编码;
    若所述信号检测单元确定所述第N帧下混信号满足预设的语音帧编码条件,则根据预设的语音帧编码速率对所述第N帧下混信号编码;
    所述第二信号编码单元,具体用于:
    若所述信号检测单元确定所述第N帧下混信号不满足预设的语音帧编码条件、但满足预设的静音插入帧SID编码条件,则根据预设的SID编码速率对所述第N帧下混信号编码;其中,所述SID编码速率不大于所述语音帧编码速率。
  17. 如权利要求15或16所述的编码器,其特征在于,还包括参数生成单元、参数编码单元和参数检测单元;
    所述参数生成单元,用于根据所述第N帧音频信号,得到第N帧立体声参数集合,其中,所述第N帧立体声参数集合中包括Z个立体声参数,所述Z个立体声参数包括所述编码器基于所述预定第一算法对所述第N帧音频信号混合时所用到的参数,Z为大于零的正整数;
    所述参数编码单元,用于在所述信号检测单元检测到所述第N帧下混信号中包含语音信号时,则对所述第N帧立体声参数集合编码;
    所述参数编码单元,在所述信号检测单元检测到所述第N帧下混信号中不包含语音信号时,还用于:
    若所述参数检测单元确定所述第N帧立体声参数集合满足预设的立体声参数编码条件,则对所述第N帧立体声参数集合中的至少一个立体声参数编码;若所述参数检测单元确定所述第N帧立体声参数集合不满足预设的立体声参数编码条件,则不对所述立体声参数集合编码。
  18. 如权利要求17所述的编码器,其特征在于,所述参数编码单元对所述第N帧立体声参数集合中的至少一个立体声参数编码,具体用于:
    根据所述第N帧立体声参数集合中的Z个立体声参数,按照预设的立体声参数降维规则,得到X个目标立体声参数,并对所述X个目标立体声参数编码,其中,X为大于零且小于等于Z的正整数。
  19. 如权利要求16所述的编码器,其特征在于,所述参数生成单元包括第一参数生成单元和第二参数生成单元;
    所述第一参数生成单元,用于在所述信号检测单元检测到所述第N帧音 频信号包含语音信号时以及在所述信号检测单元检测到所述第N帧音频信号不包含语音信号、且确定所述第N帧音频信号满足预设的语音帧编码条件时:根据所述第N帧音频信号,基于第一立体声参数集合生成方式,得到所述第N帧立体声参数集合,并通过参数编码单元对所述第N帧立体声参数集合编码;
    所述第二参数生成单元,用于在所述信号检测单元检测到所述第N帧音频信号不包含语音信号、且确定所述第N帧音频信号不满足预设的语音帧编码条件时:
    根据所述第N帧音频信号,基于第二立体声参数集合生成方式,得到所述第N帧立体声参数集合,并
    在所述参数检测单元确定所述第N帧立体声参数集合满足预设的立体声参数编码条件时,对所述第N帧立体声参数集合中的至少一个立体声参数编码;在所述参数检测单元确定所述第N帧立体声参数集合不满足预设的立体声参数编码条件时,不对所述立体声参数集合编码;
    其中,所述第一立体声参数集合生成方式和所述第二立体声参数集合生成方式满足下列至少一个条件:
    所述第一立体声参数集合生成方式规定的立体声参数集合中包括的立体声参数类型的个数不少于所述第二立体声参数集合生成方式规定的立体声参数集合中包括的立体声参数类型的个数,所述第一立体声参数集合生成方式规定的立体声参数集合中包括的立体声参数的个数不少于所述第二立体声参数集合生成方式规定的立体声参数集合中包括的立体声参数的个数,所述第一立体声参数集合生成方式规定的立体声参数在时域的分辨率不低于所述第二立体声参数集合生成方式规定的对应的立体声参数在时域的分辨率,所述第一立体声参数集合生成方式规定的立体声参数在频域的分辨率不低于所述第二立体声参数集合生成方式规定的对应的立体声参数在频域的分辨率。
  20. 如权利要求17至19任一所述的编码器,其特征在于,所述参数编码单元包括第一参数编码单元和第二参数编码单元;
    所述第一参数编码单元,用于在所述信号检测单元检测到第N帧下混信号中包含语音信号以及所述第N帧下混信号满足所述语音帧编码条件时,根据第一编码方式对所述第N帧立体声参数集合编码;
    所述第二参数编码单元,具体用于:在所述第N帧下混信号不满足所述语音帧编码条件时,根据所述第二编码方式对所述第N帧立体声参数集合中的至少一个立体声参数编码;
    其中,所述第一编码方式规定的编码速率不小于所述第二编码方式规定的编码速率;和/或,针对所述第N帧立体声参数集合中的任一立体声参数,所述第一编码方式规定的量化精度不低于所述第二编码方式规定的量化精度。
  21. 如权利要求17至20任一所述的编码器,其特征在于,若所述第N帧立体声参数集合中的至少一个立体声参数包括:声道间电平差ILD;所述预设立体声参数编码条件中包括:DL≥D0
    其中,DL表示ILD与第一标准的偏离程度,所述第一标准是根据第N帧立体声参数集合之前的T帧立体声参数集合,基于预定第二算法确定的,T为大于0的正整数;
    若所述第N帧立体声参数集合中的至少一个立体声参数包括:声道间时间差ITD;所述预设立体声参数编码条件中包括:DT≥D1
    其中,DT表示ITD与第二标准的偏离程度,所述第二标准是根据第N帧立体声参数集合之前的T帧立体声参数集合,基于预定第三算法确定的,T为大于0的正整数;
    若所述第N帧立体声参数集合中的至少一个立体声参数包括:声道间相位差IPD;所述预设立体声参数编码条件中包括:Dp≥D2
    其中,DP表示IPD与第三标准的偏离程度,所述第三标准是根据第N帧立体声参数集合之前的T帧立体声参数集合,基于预定第四算法确定的,T为大于0的正整数。
  22. The encoder according to claim 21, characterized in that D_L, D_T and D_P satisfy the following expressions, respectively:
    D_L = \sum_{m=1}^{M} \left| ILD(m) - \overline{ILD}(m) \right|
    D_T = \left| ITD - \overline{ITD} \right|
    D_P = \sum_{m=1}^{M} \left| IPD(m) - \overline{IPD}(m) \right|
    wherein ILD(m) is the level difference between the two channels at the m-th sub-band when the Nth frame of audio signals is transmitted, M is the total number of sub-bands occupied for transmitting the Nth frame of audio signals, \overline{ILD}(m) = \frac{1}{T}\sum_{t=1}^{T} ILD^{[-t]}(m) is the average of the ILD at the m-th sub-band over the T frames of stereo parameter sets preceding the Nth frame, T is a positive integer greater than 0, ILD^{[-t]}(m) is the level difference between the two channels at the m-th sub-band when the t-th frame of audio signals preceding the Nth frame of audio signals is transmitted, ITD is the time difference between the two channels when the Nth frame of audio signals is transmitted, \overline{ITD} = \frac{1}{T}\sum_{t=1}^{T} ITD^{[-t]} is the average of the ITD over the T frames of stereo parameter sets preceding the Nth frame, ITD^{[-t]} is the time difference between the two channels when the t-th frame of audio signals preceding the Nth frame of audio signals is transmitted, IPD(m) is the phase difference between the two channels at the m-th sub-band when part of the Nth frame of audio signals is transmitted, \overline{IPD}(m) = \frac{1}{T}\sum_{t=1}^{T} IPD^{[-t]}(m) is the average of the IPD at the m-th sub-band over the T frames of stereo parameter sets preceding the Nth frame, and IPD^{[-t]}(m) is the phase difference between the two channels at the m-th sub-band when the t-th frame of audio signals preceding the Nth frame of audio signals is transmitted.
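The expressions of claim 22 appear only as image placeholders in this text extraction; the sums of absolute deviations written above are a reconstruction from the surrounding definitions, not the published formulas. Under that reading, the stereo parameter encoding trigger of claim 21 could be computed as below; the function names, the `history` layout and the thresholds `d0`, `d1`, `d2` are all illustrative.

```python
from typing import Sequence


def mean(values: Sequence[float]) -> float:
    return sum(values) / len(values)


def ild_deviation(ild: Sequence[float], ild_history: Sequence[Sequence[float]]) -> float:
    """D_L: sum over the M sub-bands of |ILD(m) - average ILD(m) over the previous T frames|."""
    return sum(abs(ild[m] - mean([frame[m] for frame in ild_history]))
               for m in range(len(ild)))


def itd_deviation(itd: float, itd_history: Sequence[float]) -> float:
    """D_T: |ITD - average ITD over the previous T frames|."""
    return abs(itd - mean(itd_history))


def ipd_deviation(ipd: Sequence[float], ipd_history: Sequence[Sequence[float]]) -> float:
    """D_P: same form as D_L, applied to the per-sub-band IPD values."""
    return sum(abs(ipd[m] - mean([frame[m] for frame in ipd_history]))
               for m in range(len(ipd)))


def stereo_parameters_need_encoding(ild, itd, ipd, history, d0, d1, d2) -> bool:
    """Preset stereo parameter encoding condition of claim 21: encode when any
    deviation reaches its threshold (D_0, D_1, D_2 are tuning values left open)."""
    return (ild_deviation(ild, history["ild"]) >= d0
            or itd_deviation(itd, history["itd"]) >= d1
            or ipd_deviation(ipd, history["ipd"]) >= d2)
```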
  23. A decoder, characterized in that the decoder comprises:
    a receiving unit, configured to receive a bitstream, wherein the bitstream comprises at least two frames, the at least two frames comprise at least one first-type frame and at least one second-type frame, the first-type frame contains a downmix signal, and the second-type frame contains no downmix signal; and
    a decoding unit, configured to, for an Nth frame of the bitstream, where N is a positive integer greater than 1:
    decode the Nth frame of the bitstream to obtain an Nth frame of downmix signal if it is determined that the Nth frame of the bitstream is the first-type frame; or
    if it is determined that the Nth frame of the bitstream is the second-type frame, determine, according to a preset first rule, m frames of downmix signals from at least one frame of downmix signal preceding the Nth frame of downmix signal, and obtain the Nth frame of downmix signal based on the m frames of downmix signals by using a predetermined first algorithm, where m is a positive integer greater than zero;
    wherein the Nth frame of downmix signal is obtained by the encoder by mixing, based on a predetermined second algorithm, the Nth frame of audio signals of two channels among multiple channels.
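For a second-type frame the decoder must synthesize the downmix from earlier frames. Claim 23 leaves the "preset first rule" and "predetermined first algorithm" open; a comfort-noise-style choice, used here only as an illustration, is to take the m most recent decoded frames and average them with a slight attenuation. The class name, the default history depth and the attenuation factor are assumptions.

```python
from collections import deque
from typing import Deque, List


class DownmixHistory:
    """Keeps the last few decoded downmix frames so that a frame carrying no
    downmix data (second-type frame) can be filled in from them."""

    def __init__(self, max_frames: int = 8):
        self._frames: Deque[List[float]] = deque(maxlen=max_frames)

    def push(self, frame: List[float]) -> None:
        self._frames.append(frame)

    def synthesize(self, m: int, attenuation: float = 0.9) -> List[float]:
        """Example of a 'preset first rule' plus 'predetermined first algorithm':
        average the m most recent frames sample by sample and attenuate slightly."""
        recent = list(self._frames)[-m:]
        if not recent:
            raise ValueError("no previous downmix frames available")
        length = len(recent[0])
        return [attenuation * sum(frame[i] for frame in recent) / len(recent)
                for i in range(length)]
```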
  24. The decoder according to claim 23, characterized in that the first-type frame contains a downmix signal and a stereo parameter set, and the second-type frame contains a stereo parameter set but no downmix signal;
    the decoding unit is further configured to:
    obtain an Nth frame of stereo parameter set after decoding the Nth frame of the bitstream if it is determined that the Nth frame of the bitstream is the first-type frame; or
    decode the Nth frame of the bitstream to obtain an Nth frame of stereo parameter set if it is determined that the Nth frame of the bitstream is the second-type frame;
    wherein at least one stereo parameter in the Nth frame of stereo parameter set is used by the decoder to restore the Nth frame of downmix signal to the Nth frame of audio signals based on a predetermined third algorithm;
    the decoder further comprises a signal restoration unit; and
    the signal restoration unit is configured to restore the Nth frame of downmix signal to the Nth frame of audio signals based on the third algorithm according to the at least one stereo parameter in the Nth frame of stereo parameter set.
  25. The decoder according to claim 23, characterized in that the first-type frame contains a downmix signal and a stereo parameter set, and the second-type frame contains neither a downmix signal nor a stereo parameter set;
    the decoding unit is further configured to:
    obtain an Nth frame of stereo parameter set after decoding the Nth frame of the bitstream if it is determined that the Nth frame of the bitstream is the first-type frame; or
    if it is determined that the Nth frame of the bitstream is the second-type frame, determine, according to a preset second rule, k frames of stereo parameter sets from at least one frame of stereo parameter set preceding the Nth frame of stereo parameter set, and obtain the Nth frame of stereo parameter set based on the k frames of stereo parameter sets by using a predetermined fourth algorithm, where k is a positive integer greater than zero;
    wherein at least one stereo parameter in the Nth frame of stereo parameter set is used by the decoder to restore the Nth frame of downmix signal to the Nth frame of audio signals based on the predetermined third algorithm;
    the decoder further comprises a signal restoration unit; and
    the signal restoration unit is configured to restore the Nth frame of downmix signal to the Nth frame of audio signals based on the third algorithm according to the at least one stereo parameter in the Nth frame of stereo parameter set.
  26. The decoder according to claim 23, characterized in that the first-type frame contains a downmix signal and a stereo parameter set, a third-type frame contains a stereo parameter set but no downmix signal, a fourth-type frame contains neither a downmix signal nor a stereo parameter set, and the third-type frame and the fourth-type frame are each a case of the second-type frame;
    the decoding unit is further configured to:
    obtain an Nth frame of stereo parameter set after decoding the Nth frame of the bitstream if it is determined that the Nth frame of the bitstream is the first-type frame; or
    if it is determined that the Nth frame of the bitstream is the second-type frame: when the Nth frame of the bitstream is the third-type frame, decode the Nth frame of the bitstream to obtain an Nth frame of stereo parameter set; or, when the Nth frame of the bitstream is the fourth-type frame, determine, according to a preset second rule, k frames of stereo parameter sets from at least one frame of stereo parameter set preceding the Nth frame of stereo parameter set, and obtain the Nth frame of stereo parameter set based on the k frames of stereo parameter sets by using a predetermined fourth algorithm, where k is a positive integer greater than zero;
    wherein at least one stereo parameter in the Nth frame of stereo parameter set is used by the decoder to restore the Nth frame of downmix signal to the Nth frame of audio signals based on the predetermined third algorithm;
    the decoder further comprises a signal restoration unit; and
    the signal restoration unit is configured to restore the Nth frame of downmix signal to the Nth frame of audio signals based on the third algorithm according to the at least one stereo parameter in the Nth frame of stereo parameter set.
  27. The decoder according to claim 23, characterized in that a fifth-type frame contains a downmix signal and a stereo parameter set, a sixth-type frame contains a downmix signal but no stereo parameter set, the fifth-type frame and the sixth-type frame are each a case of the first-type frame, and the second-type frame contains neither a downmix signal nor a stereo parameter set;
    the decoding unit is further configured to:
    if it is determined that the Nth frame of the bitstream is the first-type frame: when the Nth frame of the bitstream is the fifth-type frame, obtain an Nth frame of stereo parameter set after decoding the Nth frame of the bitstream; or, when the Nth frame of the bitstream is the sixth-type frame, determine, according to a preset second rule, k frames of stereo parameter sets from at least one frame of stereo parameter set preceding the Nth frame of stereo parameter set, and obtain the Nth frame of stereo parameter set based on the k frames of stereo parameter sets by using a predetermined fourth algorithm; or
    if it is determined that the Nth frame of the bitstream is the second-type frame, determine, according to the preset second rule, k frames of stereo parameter sets from at least one frame of stereo parameter set preceding the Nth frame of stereo parameter set, and obtain the Nth frame of stereo parameter set based on the k frames of stereo parameter sets by using the predetermined fourth algorithm;
    wherein at least one stereo parameter in the Nth frame of stereo parameter set is used by the decoder to restore the Nth frame of downmix signal to the Nth frame of audio signals based on the predetermined third algorithm, and k is a positive integer greater than zero;
    the decoder further comprises a signal restoration unit; and
    the signal restoration unit is configured to restore the Nth frame of downmix signal to the Nth frame of audio signals based on the third algorithm according to the at least one stereo parameter in the Nth frame of stereo parameter set.
  28. The decoder according to claim 23, characterized in that a fifth-type frame contains a downmix signal and a stereo parameter set, a sixth-type frame contains a downmix signal but no stereo parameter set, the fifth-type frame and the sixth-type frame are each a case of the first-type frame, a third-type frame contains a stereo parameter set but no downmix signal, a fourth-type frame contains neither a downmix signal nor a stereo parameter set, and the third-type frame and the fourth-type frame are each a case of the second-type frame;
    the decoding unit is further configured to:
    if it is determined that the Nth frame of the bitstream is the first-type frame: when the Nth frame of the bitstream is the fifth-type frame, obtain an Nth frame of stereo parameter set after decoding the Nth frame of the bitstream; or, when the Nth frame of the bitstream is the sixth-type frame, determine, according to a preset second rule, k frames of stereo parameter sets from at least one frame of stereo parameter set preceding the Nth frame of stereo parameter set, and obtain the Nth frame of stereo parameter set based on the k frames of stereo parameter sets by using a predetermined fourth algorithm; or
    if it is determined that the Nth frame of the bitstream is the second-type frame: when the Nth frame of the bitstream is the third-type frame, decode the Nth frame of the bitstream to obtain an Nth frame of stereo parameter set; or, when the Nth frame of the bitstream is the fourth-type frame, determine, according to the preset second rule, k frames of stereo parameter sets from at least one frame of stereo parameter set preceding the Nth frame of stereo parameter set, and obtain the Nth frame of stereo parameter set based on the k frames of stereo parameter sets by using the predetermined fourth algorithm;
    wherein at least one stereo parameter in the Nth frame of stereo parameter set is used by the decoder to restore the Nth frame of downmix signal to the Nth frame of audio signals based on the predetermined third algorithm, and k is a positive integer greater than zero;
    the decoder further comprises a signal restoration unit; and
    the signal restoration unit is configured to restore the Nth frame of downmix signal to the Nth frame of audio signals based on the third algorithm according to the at least one stereo parameter in the Nth frame of stereo parameter set.
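Claims 24 to 28 all describe the same dispatch: depending on the frame type, the downmix signal and the stereo parameter set are either read from the bitstream or regenerated from previously received frames before the upmix. The sketch below condenses that dispatch for the four concrete frame types of claim 28; the frame-type labels, the history objects, the callables and the default values of m and k are assumptions used only to make the flow concrete.

```python
from enum import Enum, auto


class FrameType(Enum):
    FIFTH = auto()   # downmix + stereo parameters          (a first-type frame)
    SIXTH = auto()   # downmix only                          (a first-type frame)
    THIRD = auto()   # stereo parameters only                (a second-type frame)
    FOURTH = auto()  # neither downmix nor stereo parameters (a second-type frame)


def decode_frame(frame, frame_type: FrameType, downmix_history, parameter_history,
                 decode_downmix, decode_parameters, upmix, m: int = 3, k: int = 3):
    """Dispatch sketched from claim 28: every callable and history object here is an
    assumed hook, not an interface defined by the claims."""
    if frame_type in (FrameType.FIFTH, FrameType.SIXTH):
        downmix = decode_downmix(frame)            # downmix present in the bitstream
    else:
        downmix = downmix_history.synthesize(m)    # second-type frame: regenerate it

    if frame_type in (FrameType.FIFTH, FrameType.THIRD):
        parameters = decode_parameters(frame)              # stereo parameters present
    else:
        parameters = parameter_history.synthesize(k)       # regenerate from the last k sets

    downmix_history.push(downmix)
    parameter_history.push(parameters)
    return upmix(downmix, parameters)                      # restore the two channels
```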
  29. An encoding and decoding system, characterized by comprising the encoder according to any one of claims 15 to 22 and the decoder according to any one of claims 23 to 28.
PCT/CN2016/100617 2016-09-28 2016-09-28 一种处理多声道音频信号的方法、装置和系统 WO2018058379A1 (zh)

Priority Applications (17)

Application Number Priority Date Filing Date Title
CN202311267474.8A CN117392988A (zh) 2016-09-28 2016-09-28 一种处理多声道音频信号的方法、装置和系统
CN202311262035.8A CN117351966A (zh) 2016-09-28 2016-09-28 一种处理多声道音频信号的方法、装置和系统
KR1020227012057A KR102480710B1 (ko) 2016-09-28 2016-09-28 다중 채널 오디오 신호 처리 방법, 장치 및 시스템
CN202311261449.9A CN117351965A (zh) 2016-09-28 2016-09-28 一种处理多声道音频信号的方法、装置和系统
KR1020217028255A KR102387162B1 (ko) 2016-09-28 2016-09-28 다중 채널 오디오 신호 처리 방법, 장치 및 시스템
MX2019003417A MX2019003417A (es) 2016-09-28 2016-09-28 Metodo, aparato y sistema de procesamiento de señales de audio de multicanal.
CN201680010600.3A CN108140393B (zh) 2016-09-28 2016-09-28 一种处理多声道音频信号的方法、装置和系统
KR1020197011605A KR20190052122A (ko) 2016-09-28 2016-09-28 다중 채널 오디오 신호 처리 방법, 장치 및 시스템
BR112019005983-0A BR112019005983B1 (pt) 2016-09-28 Método de processamento de sinal de áudio de multicanais, codificador, decodificador e sistema de codificação e decodificação
CN202311261321.2A CN117476018A (zh) 2016-09-28 2016-09-28 一种处理多声道音频信号的方法、装置和系统
PCT/CN2016/100617 WO2018058379A1 (zh) 2016-09-28 2016-09-28 一种处理多声道音频信号的方法、装置和系统
JP2019516957A JP6790251B2 (ja) 2016-09-28 2016-09-28 マルチチャネルオーディオ信号処理方法、装置、およびシステム
EP21163871.3A EP3910629A1 (en) 2016-09-28 2016-09-28 Multichannel audio signal processing method, apparatus, and system
EP16917134.5A EP3511934B1 (en) 2016-09-28 2016-09-28 Method, apparatus and system for processing multi-channel audio signal
US16/368,208 US10593339B2 (en) 2016-09-28 2019-03-28 Multichannel audio signal processing method, apparatus, and system
US16/781,421 US10984807B2 (en) 2016-09-28 2020-02-04 Multichannel audio signal processing method, apparatus, and system
US17/232,679 US11922954B2 (en) 2016-09-28 2021-04-16 Multichannel audio signal processing method, apparatus, and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2016/100617 WO2018058379A1 (zh) 2016-09-28 2016-09-28 一种处理多声道音频信号的方法、装置和系统

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US16/368,208 Continuation US10593339B2 (en) 2016-09-28 2019-03-28 Multichannel audio signal processing method, apparatus, and system

Publications (1)

Publication Number Publication Date
WO2018058379A1 true WO2018058379A1 (zh) 2018-04-05

Family

ID=61763024

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/100617 WO2018058379A1 (zh) 2016-09-28 2016-09-28 一种处理多声道音频信号的方法、装置和系统

Country Status (7)

Country Link
US (3) US10593339B2 (zh)
EP (2) EP3511934B1 (zh)
JP (1) JP6790251B2 (zh)
KR (3) KR20190052122A (zh)
CN (5) CN117351966A (zh)
MX (1) MX2019003417A (zh)
WO (1) WO2018058379A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110556119A (zh) * 2018-05-31 2019-12-10 华为技术有限公司 一种下混信号的计算方法及装置

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
MX2019003417A (es) * 2016-09-28 2019-10-07 Huawei Tech Co Ltd Metodo, aparato y sistema de procesamiento de señales de audio de multicanal.
KR20210154807A (ko) * 2019-04-18 2021-12-21 돌비 레버러토리즈 라이쎈싱 코오포레이션 다이얼로그 검출기
CN115867964A (zh) * 2020-06-11 2023-03-28 杜比实验室特许公司 用于对多声道输入信号内的空间背景噪声进行编码和/或解码的方法和设备
AU2021317755B2 (en) * 2020-07-30 2023-11-09 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Apparatus, method and computer program for encoding an audio signal or for decoding an encoded audio scene
WO2024056702A1 (en) * 2022-09-13 2024-03-21 Telefonaktiebolaget Lm Ericsson (Publ) Adaptive inter-channel time difference estimation

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101320563A (zh) * 2007-06-05 2008-12-10 华为技术有限公司 一种背景噪声编码/解码装置、方法和通信设备
CN101556799A (zh) * 2009-05-14 2009-10-14 华为技术有限公司 一种音频解码方法和音频解码器
CN101661749A (zh) * 2009-09-23 2010-03-03 清华大学 一种语音和音乐双模切换编/解码的方法
CN103188595A (zh) * 2011-12-31 2013-07-03 展讯通信(上海)有限公司 处理多声道音频信号的方法和系统
US20140330415A1 (en) * 2011-11-10 2014-11-06 Nokia Corporation Method and apparatus for detecting audio sampling rate

Family Cites Families (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH0713586B2 (ja) 1987-02-20 1995-02-15 三機工業株式会社 自動車エンジン実験用移動油水制御装置
JP2835483B2 (ja) * 1993-06-23 1998-12-14 松下電器産業株式会社 音声判別装置と音響再生装置
JP2728122B2 (ja) * 1995-05-23 1998-03-18 日本電気株式会社 無音圧縮音声符号化復号化装置
EP0977172A4 (en) * 1997-03-19 2000-12-27 Hitachi Ltd METHOD AND DEVICE FOR DETERMINING THE START AND END POINT OF A SOUND SECTION IN VIDEO
ATE388542T1 (de) * 1999-12-13 2008-03-15 Broadcom Corp Sprach-durchgangsvorrichtung mit sprachsynchronisierung in abwärtsrichtung
JP3526269B2 (ja) 2000-12-11 2004-05-10 株式会社東芝 ネットワーク間中継装置及び該中継装置における転送スケジューリング方法
US7657706B2 (en) 2003-12-18 2010-02-02 Cisco Technology, Inc. High speed memory and input/output processor subsystem for efficiently allocating and using high-speed memory and slower-speed memory
KR100888474B1 (ko) * 2005-11-21 2009-03-12 삼성전자주식회사 멀티채널 오디오 신호의 부호화/복호화 장치 및 방법
JP2008286904A (ja) * 2007-05-16 2008-11-27 Panasonic Corp オーディオ複号化装置
JP2011504250A (ja) * 2007-11-21 2011-02-03 エルジー エレクトロニクス インコーポレイティド 信号処理方法及び装置
EP2144229A1 (en) * 2008-07-11 2010-01-13 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Efficient use of phase information in audio encoding and decoding
KR101137652B1 (ko) * 2009-10-14 2012-04-23 광운대학교 산학협력단 천이 구간에 기초하여 윈도우의 오버랩 영역을 조절하는 통합 음성/오디오 부호화/복호화 장치 및 방법
US9324337B2 (en) * 2009-11-17 2016-04-26 Dolby Laboratories Licensing Corporation Method and system for dialog enhancement
JP5299327B2 (ja) * 2010-03-17 2013-09-25 ソニー株式会社 音声処理装置、音声処理方法、およびプログラム
JP5581449B2 (ja) * 2010-08-24 2014-08-27 ドルビー・インターナショナル・アーベー Fmステレオ無線受信機の断続的モノラル受信の隠蔽
US8831937B2 (en) * 2010-11-12 2014-09-09 Audience, Inc. Post-noise suppression processing to improve voice quality
WO2012066727A1 (ja) * 2010-11-17 2012-05-24 パナソニック株式会社 ステレオ信号符号化装置、ステレオ信号復号装置、ステレオ信号符号化方法及びステレオ信号復号方法
US9036526B2 (en) * 2012-11-08 2015-05-19 Qualcomm Incorporated Voice state assisted frame early termination
CN105247610B (zh) * 2013-05-31 2019-11-08 索尼公司 编码装置和方法、解码装置和方法以及记录介质
CN105304080B (zh) * 2015-09-22 2019-09-03 科大讯飞股份有限公司 语音合成装置及方法
RU2763374C2 (ru) * 2015-09-25 2021-12-28 Войсэйдж Корпорейшн Способ и система с использованием разности долговременных корреляций между левым и правым каналами для понижающего микширования во временной области стереофонического звукового сигнала в первичный и вторичный каналы
US20170134282A1 (en) 2015-11-10 2017-05-11 Ciena Corporation Per queue per service differentiation for dropping packets in weighted random early detection
MX2019003417A (es) * 2016-09-28 2019-10-07 Huawei Tech Co Ltd Metodo, aparato y sistema de procesamiento de señales de audio de multicanal.
CN109285536B (zh) * 2018-11-23 2022-05-13 出门问问创新科技有限公司 一种语音特效合成方法、装置、电子设备及存储介质

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101320563A (zh) * 2007-06-05 2008-12-10 华为技术有限公司 一种背景噪声编码/解码装置、方法和通信设备
CN101556799A (zh) * 2009-05-14 2009-10-14 华为技术有限公司 一种音频解码方法和音频解码器
CN101661749A (zh) * 2009-09-23 2010-03-03 清华大学 一种语音和音乐双模切换编/解码的方法
US20140330415A1 (en) * 2011-11-10 2014-11-06 Nokia Corporation Method and apparatus for detecting audio sampling rate
CN103188595A (zh) * 2011-12-31 2013-07-03 展讯通信(上海)有限公司 处理多声道音频信号的方法和系统

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110556119A (zh) * 2018-05-31 2019-12-10 华为技术有限公司 一种下混信号的计算方法及装置
CN110556119B (zh) * 2018-05-31 2022-02-18 华为技术有限公司 一种下混信号的计算方法及装置
US11869517B2 (en) 2018-05-31 2024-01-09 Huawei Technologies Co., Ltd. Downmixed signal calculation method and apparatus

Also Published As

Publication number Publication date
CN117351966A (zh) 2024-01-05
US20190221219A1 (en) 2019-07-18
EP3511934A1 (en) 2019-07-17
CN117476018A (zh) 2024-01-30
MX2019003417A (es) 2019-10-07
US10593339B2 (en) 2020-03-17
CN117392988A (zh) 2024-01-12
JP2019533189A (ja) 2019-11-14
US10984807B2 (en) 2021-04-20
US20200273468A1 (en) 2020-08-27
EP3511934A4 (en) 2019-08-14
EP3511934B1 (en) 2021-04-21
KR20210111898A (ko) 2021-09-13
KR20220053030A (ko) 2022-04-28
EP3910629A1 (en) 2021-11-17
CN117351965A (zh) 2024-01-05
KR102387162B1 (ko) 2022-04-14
US11922954B2 (en) 2024-03-05
US20210312932A1 (en) 2021-10-07
BR112019005983A2 (pt) 2019-10-01
CN108140393A (zh) 2018-06-08
CN108140393B (zh) 2023-10-20
JP6790251B2 (ja) 2020-11-25
KR102480710B1 (ko) 2022-12-22
KR20190052122A (ko) 2019-05-15

Similar Documents

Publication Publication Date Title
TWI752281B (zh) 用以使用量化及熵寫碼來編碼或解碼方向性音訊寫碼參數之設備及方法
WO2018058379A1 (zh) 一种处理多声道音频信号的方法、装置和系统
US9384743B2 (en) Apparatus and method for encoding/decoding multichannel signal
US9324329B2 (en) Method for parametric spatial audio coding and decoding, parametric spatial audio coder and parametric spatial audio decoder
US9275646B2 (en) Method for inter-channel difference estimation and spatial audio coding device
US20120093321A1 (en) Apparatus and method for encoding and decoding spatial parameter
EP3664083A1 (en) Signal reconstruction method and device in stereo signal encoding
WO2011153913A1 (zh) 边带残差信号生成方法及装置
WO2024052499A1 (en) Decoder and decoding method for discontinuous transmission of parametrically coded independent streams with metadata
WO2024051954A1 (en) Encoder and encoding method for discontinuous transmission of parametrically coded independent streams with metadata
TW202411984A (zh) 用於具有元資料之參數化經寫碼獨立串流之不連續傳輸的編碼器及編碼方法
BR112019005983B1 (pt) Método de processamento de sinal de áudio de multicanais, codificador, decodificador e sistema de codificação e decodificação

Legal Events

Date Code Title Description
WWE Wipo information: entry into national phase

Ref document number: 201680010600.3

Country of ref document: CN

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16917134

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2019516957

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

REG Reference to national code

Ref country code: BR

Ref legal event code: B01A

Ref document number: 112019005983

Country of ref document: BR

ENP Entry into the national phase

Ref document number: 2016917134

Country of ref document: EP

Effective date: 20190408

ENP Entry into the national phase

Ref document number: 20197011605

Country of ref document: KR

Kind code of ref document: A

REG Reference to national code

Ref country code: BR

Ref legal event code: B01E

Ref document number: 112019005983

Country of ref document: BR

Free format text: APRESENTE A NUMERACAO CORRETA DAS PAGINAS DAS REIVINDICACOES

ENP Entry into the national phase

Ref document number: 112019005983

Country of ref document: BR

Kind code of ref document: A2

Effective date: 20190326