WO2018058379A1 - Method, apparatus and system for processing a multi-channel audio signal - Google Patents
- Publication number
- WO2018058379A1 (PCT/CN2016/100617)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- frame
- stereo parameter
- nth frame
- parameter set
- nth
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/008—Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/005—Correction of errors induced by the transmission channel, if related to the coding algorithm
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/012—Comfort noise or silence coding
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S3/00—Systems employing more than two channels, e.g. quadraphonic
- H04S3/008—Systems employing more than two channels, e.g. quadraphonic in which the audio signals are in digital form, i.e. employing more than two discrete digital channels
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/04—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
- G10L19/16—Vocoder architecture
- G10L19/18—Vocoders using multiple modes
- G10L19/24—Variable rate codecs, e.g. for generating different qualities using a scalable representation such as hierarchical encoding or layered encoding
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/78—Detection of presence or absence of voice signals
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S2400/00—Details of stereophonic systems covered by H04S but not provided for in its groups
- H04S2400/03—Aspects of down-mixing multi-channel audio to configurations with lower numbers of playback channels, e.g. 7.1 -> 5.1
Definitions
- The present invention relates to the field of audio coding and decoding technology, and in particular to a method, apparatus and system for processing a multi-channel audio signal.
- At the transmitting end, each frame of the original audio signal is usually encoded before transmission, and the encoding compresses the audio signal.
- The receiving end decodes the received signal and recovers the original audio signal.
- Different encoding methods are used for different types of audio signals.
- When the audio signal is a voice signal, a continuous coding method is generally used, that is, every frame of the voice signal is encoded separately.
- A noise signal is usually encoded with a non-continuous coding method, that is, one encoded frame stands for several frames of the noise signal. For example, after the first noise frame is encoded, the second to seventh noise frames are not encoded and are sent as six No_Data frames, and the eighth noise frame is encoded again.
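The frame pattern in this example can be sketched as follows. This is a minimal illustration, assuming that "encoding" simply means marking a frame as `ENCODED` and that the gap of six No_Data frames stays fixed; the function name `noise_dtx_schedule` is hypothetical.

```python
def noise_dtx_schedule(num_frames: int, skipped: int = 6) -> list[str]:
    """Label each noise frame as "ENCODED" or "NO_DATA".

    One frame is encoded, the next `skipped` frames are sent as No_Data
    frames, and the pattern repeats (frames 1 and 8 are encoded when
    skipped=6).
    """
    period = skipped + 1
    return ["ENCODED" if i % period == 0 else "NO_DATA" for i in range(num_frames)]


schedule = noise_dtx_schedule(9)
for frame_no, label in enumerate(schedule, start=1):  # 1-based frame numbers
    print(frame_no, label)
```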
- The audio signals described above are mono audio signals.
- Stereo communication is taken as an example of two-channel communication, in which the two channels are a first channel and a second channel.
- From the nth frame voice signal of the first channel and the nth frame voice signal of the second channel, the transmitting end obtains the stereo parameters used to mix these two nth frame voice signals into one frame of downmix signal, where the downmix signal is a single-channel signal.
- The transmitting end mixes the nth frame voice signals of the two channels into one frame of downmix signal, where n is a positive integer greater than zero, encodes that downmix frame, and sends the encoded downmix signal and the stereo parameters to the receiving end. After receiving them, the receiving end decodes the encoded downmix signal and then restores the downmix signal to a two-channel signal according to the stereo parameters.
- Compared with encoding every frame of the voice signals of both channels, this transmission mode greatly reduces the number of transmitted bits and thus achieves compression.
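The downmix-and-restore round trip described above can be sketched for one frame as follows. This is a simplified sketch, not the patent's algorithm: it assumes the downmix is the sample-wise average of the two channels and that the only stereo parameter is one broadband level difference in dB; `downmix` and `upmix` are hypothetical names.

```python
import math

EPS = 1e-12  # guards against a zero-energy (silent) channel


def downmix(left: list[float], right: list[float]) -> tuple[list[float], float]:
    """Mix one frame of two channels into a mono downmix plus one ILD value in dB.

    Assumed (hypothetical) mixing rule: sample-wise average of the channels.
    """
    mono = [(l + r) / 2 for l, r in zip(left, right)]
    energy_left = sum(x * x for x in left) + EPS
    energy_right = sum(x * x for x in right) + EPS
    ild_db = 10 * math.log10(energy_left / energy_right)
    return mono, ild_db


def upmix(mono: list[float], ild_db: float) -> tuple[list[float], list[float]]:
    """Restore two channels from the downmix using the level difference only."""
    ratio = 10 ** (ild_db / 20)  # amplitude ratio |left| / |right|
    gain_left = 2 * ratio / (1 + ratio)
    gain_right = 2 / (1 + ratio)
    return [gain_left * x for x in mono], [gain_right * x for x in mono]


source = [0.1, -0.4, 0.3, 0.25]
left = [2 * x for x in source]   # source panned towards the left channel
right = list(source)
mono, ild = downmix(left, right)  # energy ratio 4, so ild is about 6.02 dB
restored_left, restored_right = upmix(mono, ild)
```

With this model a source that is only panned (same-sign gains) between the channels is restored exactly; channels that also differ in time or phase would need parameters such as the ITD and IPD.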
- In stereo communication, the noise signal is encoded in the same way as the voice signal. If the non-continuous encoding method used for mono signals were applied directly to stereo communication, the receiving end could not restore the noise signal, which would degrade the subjective experience of the user at the receiving end.
- The present invention provides a method, apparatus and system for processing a multi-channel audio signal, to solve the problem that prior-art multi-channel audio communication systems cannot transmit audio signals discontinuously.
- A method for processing a multi-channel audio signal includes: an encoder detects whether the Nth frame downmix signal contains a voice signal. If it does, the encoder encodes the Nth frame downmix signal. If it does not, the encoder encodes the Nth frame downmix signal when it satisfies a preset audio frame coding condition, and does not encode it otherwise. The Nth frame downmix signal is obtained from the Nth frame audio signals of two channels of the multi-channel signal based on a predetermined first algorithm, and N is a positive integer greater than zero.
- Because the encoder encodes the downmix signal only when it contains a voice signal or satisfies the preset audio frame coding condition, and otherwise does not encode it, the encoder achieves non-continuous coding of the downmix signal, which improves the compression efficiency of the downmix signal.
- The preset audio frame coding condition includes the downmix signal being the first frame: when the first frame downmix signal does not contain a voice signal, it still satisfies the preset audio frame coding condition, and the encoder encodes it.
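The encoder-side decision above can be sketched as a single predicate. The voice activity decision is assumed to come from an external detector, and the preset audio frame coding condition is modelled as "first frame, or every `update_interval`-th frame"; only the first-frame part comes from the text, while the periodic part and all names are hypothetical.

```python
def should_encode_downmix(n: int, has_voice: bool, update_interval: int = 8) -> bool:
    """Return True if the Nth frame downmix signal (n >= 1) is to be encoded.

    has_voice: output of a voice activity detector (assumed to be given).
    """
    if has_voice:
        return True  # frames containing voice are always encoded
    first_frame = n == 1  # condition stated in the text
    periodic = update_interval > 0 and n % update_interval == 0  # hypothetical refresh
    return first_frame or periodic


vad = [False, False, True, True, False, False, False, False, False, False]
decisions = [should_encode_downmix(n, v) for n, v in enumerate(vad, start=1)]
```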
- To further improve the compression efficiency of the downmix signal, optionally: when the encoder detects that the Nth frame downmix signal contains a voice signal, it encodes the Nth frame downmix signal at a preset speech frame coding rate. When the Nth frame downmix signal does not contain a voice signal: if it satisfies a preset speech frame coding condition, the encoder encodes it at the preset speech frame coding rate; if it does not satisfy the preset speech frame coding condition but satisfies a preset silence insertion descriptor (SID) coding condition, the encoder encodes it at a preset SID coding rate, where the SID coding rate is lower than the speech frame coding rate.
- Applying SID coding to the Nth frame downmix signal at the preset SID coding rate further improves the compression efficiency of the downmix signal compared with speech frame coding.
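The choice between the speech frame coding rate, the SID coding rate and not encoding at all can be sketched as a small state machine. The two rates, the hangover length standing in for the "speech frame coding condition" and the SID interval standing in for the "SID coding condition" are hypothetical values chosen for illustration; the text only requires the SID rate to be lower than the speech frame rate.

```python
from dataclasses import dataclass, field

SPEECH_RATE_BPS = 13200  # hypothetical preset speech frame coding rate
SID_RATE_BPS = 2400      # hypothetical preset SID coding rate (lower than speech)
NOT_ENCODED = 0


@dataclass
class DownmixRateSelector:
    hangover: int = 3      # hypothetical: inactive frames still coded as speech after voice
    sid_interval: int = 8  # hypothetical: one SID frame per 8 inactive frames
    _since_voice: int = field(default=10**9, init=False)  # frames since last voice frame
    _since_sid: int = field(default=10**9, init=False)    # frames since last SID frame

    def select(self, has_voice: bool) -> int:
        """Return the coding rate for the current downmix frame (0 = not encoded)."""
        if has_voice:
            self._since_voice = 0
            return SPEECH_RATE_BPS
        self._since_voice += 1
        if self._since_voice <= self.hangover:
            # Still coded as speech; force an SID as soon as the hangover ends.
            self._since_sid = self.sid_interval
            return SPEECH_RATE_BPS
        if self._since_sid >= self.sid_interval:
            self._since_sid = 1
            return SID_RATE_BPS
        self._since_sid += 1
        return NOT_ENCODED


selector = DownmixRateSelector()
rates = [selector.select(v) for v in [True] + [False] * 12]
```

With these settings, a voice frame followed by twelve inactive frames gives four speech-rate frames (the voice frame plus three hangover frames), one SID frame, seven frames that are not encoded, and another SID frame.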
- The stereo parameter set also needs to be encoded.
- The encoder encodes the stereo parameter set discontinuously. Specifically, the encoder obtains the Nth frame stereo parameter set from the Nth frame audio signal. When the Nth frame downmix signal is detected to contain a voice signal, the encoder encodes the Nth frame stereo parameter set. When the Nth frame downmix signal is detected not to contain a voice signal: if the Nth frame stereo parameter set satisfies a preset stereo parameter encoding condition, the encoder encodes at least one stereo parameter in it; if it does not satisfy the preset stereo parameter encoding condition, the encoder does not encode the Nth frame stereo parameter set. The Nth frame stereo parameter set includes Z stereo parameters, which include the parameters the encoder uses to downmix the Nth frame audio signals based on a predetermined algorithm, and Z is a positive integer greater than zero.
- Before encoding at least one stereo parameter in the Nth frame stereo parameter set, the encoder processes the Z stereo parameters in the set according to a preset stereo parameter dimension reduction rule to obtain X target stereo parameters, and then encodes the X target stereo parameters, where X is a positive integer greater than zero and less than or equal to Z.
- The preset stereo parameter dimension reduction rule may be a preset stereo parameter type, that is, the X stereo parameters matching the preset type are selected from the Nth frame stereo parameter set; or a preset number of stereo parameters, that is, X stereo parameters are selected from the Nth frame stereo parameter set; or a reduction of the time-domain or frequency-domain resolution of at least one stereo parameter in the Nth frame stereo parameter set, that is, the X target stereo parameters are determined from the Z stereo parameters according to the reduced resolution of that parameter.
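The three dimension reduction rules can be sketched on a toy stereo parameter set. The set layout (a mapping from parameter type to per-subband values), the flattening order used by the count rule and the averaging of adjacent subbands are assumptions made for illustration; all function names are hypothetical.

```python
StereoParams = dict[str, list[float]]  # parameter type -> per-subband values


def reduce_by_type(params: StereoParams, keep_types: set[str]) -> StereoParams:
    """Rule 1: keep only the stereo parameters whose type is in keep_types."""
    return {name: values for name, values in params.items() if name in keep_types}


def reduce_by_count(params: StereoParams, x: int) -> list[tuple[str, int, float]]:
    """Rule 2: keep X of the Z scalar parameters (here the first X, an assumed order)."""
    flat = [(name, band, value)
            for name, values in params.items()
            for band, value in enumerate(values)]
    return flat[:x]


def reduce_frequency_resolution(values: list[float], factor: int) -> list[float]:
    """Rule 3: merge every `factor` adjacent subbands into one by averaging."""
    merged = []
    for start in range(0, len(values), factor):
        group = values[start:start + factor]
        merged.append(sum(group) / len(group))
    return merged


params = {"ILD": [1.0, 3.0, 5.0, 7.0, 9.0], "ITD": [0.5], "IPD": [0.1, 0.2, 0.3, 0.4, 0.5]}
z = sum(len(v) for v in params.values())  # Z = 11 scalar parameters
only_level_and_time = reduce_by_type(params, {"ILD", "ITD"})
first_three = reduce_by_count(params, 3)
coarse_ild = reduce_frequency_resolution(params["ILD"], 2)  # [2.0, 6.0, 9.0]
```

Averaging adjacent subbands suits level differences; for phase differences a circular mean would be more appropriate, because values near +π and -π describe almost the same phase.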
- The compression efficiency of the multi-channel communication system can also be improved as follows:
- When the encoder detects that the Nth frame audio signal contains a voice signal, it obtains the Nth frame stereo parameter set from the Nth frame audio signal using a first stereo parameter set generation manner, and encodes the Nth frame stereo parameter set.
- When the Nth frame audio signal does not contain a voice signal: if the Nth frame audio signal satisfies a preset speech frame coding condition, the encoder obtains the Nth frame stereo parameter set from the Nth frame audio signal using the first stereo parameter set generation manner and encodes it; if the Nth frame audio signal does not satisfy the preset speech frame coding condition, the encoder obtains the Nth frame stereo parameter set from the Nth frame audio signal using a second stereo parameter set generation manner and, when the Nth frame stereo parameter set satisfies a preset stereo parameter encoding condition, encodes at least one stereo parameter in it; if it is not determined that the Nth frame audio …
- The first stereo parameter set generation manner and the second stereo parameter set generation manner satisfy at least one of the following conditions:
- the stereo parameter set specified by the first generation manner contains no fewer stereo parameter types than the stereo parameter set specified by the second generation manner;
- the stereo parameter set specified by the first generation manner contains no fewer stereo parameters than the stereo parameter set specified by the second generation manner;
- each stereo parameter specified by the first generation manner has a time-domain resolution no lower than that of the corresponding stereo parameter specified by the second generation manner;
- each stereo parameter specified by the first generation manner has a frequency-domain resolution no lower than that of the corresponding stereo parameter specified by the second generation manner.
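One way to picture two generation manners that differ in frequency resolution is to compute per-subband ILDs from the same power spectra with different numbers of subbands. This is a sketch under stated assumptions: equal-width subbands, power spectra that are already available, and hypothetical `FIRST_MANNER` / `SECOND_MANNER` settings; it checks only the parameter-type and frequency-resolution conditions.

```python
import math

EPS = 1e-12


def subband_ild(power_left: list[float], power_right: list[float], num_bands: int) -> list[float]:
    """Per-subband ILD in dB from per-bin power spectra of one frame.

    Bins are split into `num_bands` roughly equal-width subbands; real codecs
    typically use non-uniform bands, so this split is an assumption.
    """
    n = len(power_left)
    edges = [b * n // num_bands for b in range(num_bands + 1)]
    ilds = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        energy_left = sum(power_left[lo:hi]) + EPS
        energy_right = sum(power_right[lo:hi]) + EPS
        ilds.append(10 * math.log10(energy_left / energy_right))
    return ilds


# Hypothetical settings: the first manner has at least as many parameter types
# and at least as fine a frequency resolution as the second manner.
FIRST_MANNER = {"types": ("ILD", "ITD", "IPD"), "num_bands": 8}
SECOND_MANNER = {"types": ("ILD",), "num_bands": 2}


def first_manner_not_coarser(first: dict, second: dict) -> bool:
    """True if at least one of the type-count or frequency-resolution conditions holds."""
    return (len(first["types"]) >= len(second["types"])
            or first["num_bands"] >= second["num_bands"])


power_left = [4.0] * 16  # toy spectra: left channel has 4x the power of the right
power_right = [1.0] * 16
fine = subband_ild(power_left, power_right, FIRST_MANNER["num_bands"])
coarse = subband_ild(power_left, power_right, SECOND_MANNER["num_bands"])
```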
- When the encoder detects that the Nth frame downmix signal contains a voice signal, it encodes the Nth frame stereo parameter set according to a first coding mode. When the Nth frame downmix signal satisfies the speech frame coding condition, the encoder encodes at least one stereo parameter in the Nth frame stereo parameter set according to the first coding mode; when the Nth frame downmix signal does not satisfy the speech frame coding condition, the encoder encodes at least one stereo parameter in the Nth frame stereo parameter set according to a second coding mode.
- The coding rate specified by the first coding mode is not lower than the coding rate specified by the second coding mode; and/or, for any stereo parameter in the Nth frame stereo parameter set, the quantization precision specified by the first coding mode is not lower than that specified by the second coding mode.
- When the Nth frame stereo parameter set includes the inter-channel phase difference (IPD) and the inter-channel time difference (ITD), the quantization precision of the IPD specified by the first coding mode is not lower than that specified by the second coding mode, and the quantization precision of the ITD specified by the first coding mode is not lower than that specified by the second coding mode.
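The quantization-precision relationship can be illustrated with a uniform scalar quantizer for the IPD. The bit allocations (5 bits for the first coding mode, 3 bits for the second) and the quantizer design are hypothetical; the sketch only shows that more bits give a smaller step and therefore a smaller worst-case error, which is one way to read "quantization precision not lower". An ITD quantizer would follow the same pattern over a bounded lag range.

```python
import math

FIRST_MODE_IPD_BITS = 5   # hypothetical allocation for the first coding mode
SECOND_MODE_IPD_BITS = 3  # hypothetical allocation for the second coding mode


def quantize_ipd(ipd: float, bits: int) -> tuple[int, float]:
    """Uniform scalar quantization of a phase difference with 2**bits levels.

    Returns the transmitted index and the reconstructed phase in [-pi, pi).
    """
    levels = 2 ** bits
    step = 2 * math.pi / levels
    wrapped = (ipd + math.pi) % (2 * math.pi) - math.pi   # map into [-pi, pi)
    index = round((wrapped + math.pi) / step) % levels   # +pi wraps to -pi
    return index, -math.pi + index * step


def phase_error(a: float, b: float) -> float:
    """Smallest absolute angular distance between two phases."""
    return abs((a - b + math.pi) % (2 * math.pi) - math.pi)


for bits in (FIRST_MODE_IPD_BITS, SECOND_MODE_IPD_BITS):
    step = 2 * math.pi / 2 ** bits
    worst = max(phase_error(quantize_ipd(x / 100, bits)[1], x / 100) for x in range(-400, 401))
    print(bits, "bits: worst error", round(worst, 4), "<= step/2 =", round(step / 2, 4))
```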
- When the at least one stereo parameter in the Nth frame stereo parameter set includes the inter-channel level difference (ILD), the preset stereo parameter encoding condition includes $D_L \geq D_0$, where $D_L$ represents the degree of deviation of the ILD from a first standard. The first standard is determined from the T frames of stereo parameter sets preceding the Nth frame stereo parameter set based on a predetermined second algorithm, and T is a positive integer greater than 0.
- When the at least one stereo parameter in the Nth frame stereo parameter set includes the inter-channel time difference (ITD), the preset stereo parameter encoding condition includes $D_T \geq D_1$, where $D_T$ represents the degree of deviation of the ITD from a second standard. The second standard is determined from the T frames of stereo parameter sets preceding the Nth frame stereo parameter set based on a predetermined third algorithm.
- When the at least one stereo parameter in the Nth frame stereo parameter set includes the inter-channel phase difference (IPD), the preset stereo parameter encoding condition includes $D_P \geq D_2$, where $D_P$ represents the degree of deviation of the IPD from a third standard. The third standard is determined from the T frames of stereo parameter sets preceding the Nth frame stereo parameter set based on a predetermined fourth algorithm.
- The second, third and fourth algorithms are preset according to actual conditions.
- The expressions for $D_L$, $D_T$ and $D_P$ use the following quantities:
- $ILD(m)$ is the level difference between the two channels when they transmit the Nth frame audio signal in the mth subband, and $M$ is the total number of subbands occupied by the Nth frame audio signal.
- $T$ is a positive integer greater than 0, and $ILD^{[-t]}(m)$ is the level difference between the two channels when they transmit, in the mth subband, the tth frame audio signal before the Nth frame audio signal.
- $ITD$ is the time difference between the two channels when they transmit the Nth frame audio signal, $ITD^{[-t]}$ is the time difference when they transmit the tth frame audio signal before the Nth frame audio signal, and $\overline{ITD} = \frac{1}{T}\sum_{t=1}^{T} ITD^{[-t]}$ is the average ITD over the T frames of stereo parameter sets before the Nth frame.
- $IPD(m)$ is the phase difference between the two channels when they transmit the part of the Nth frame audio signal in the mth subband, $IPD^{[-t]}(m)$ is the phase difference when they transmit, in the mth subband, the tth frame audio signal before the Nth frame audio signal, and $\overline{IPD}(m) = \frac{1}{T}\sum_{t=1}^{T} IPD^{[-t]}(m)$ is the average IPD in the mth subband over the T frames of stereo parameter sets before the Nth frame.
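The encoding conditions can be sketched as follows. The quantities are defined in the text, but the expressions for $D_L$, $D_T$ and $D_P$ themselves do not appear in it, so this sketch assumes a sum of squared deviations from the T-frame mean over the subbands (a single term for the ITD). The thresholds standing in for $D_0$, $D_1$ and $D_2$ and all function names are hypothetical.

```python
def history_mean(history: list[list[float]]) -> list[float]:
    """Per-subband arithmetic mean over the T previous frames (history[t-1] is frame N-t)."""
    T = len(history)
    return [sum(frame[m] for frame in history) / T for m in range(len(history[0]))]


def deviation(current: list[float], history: list[list[float]]) -> float:
    """ASSUMED form of D_L / D_T / D_P: sum over subbands of the squared deviation
    from the T-frame mean (the ITD is passed as a one-element list)."""
    mean = history_mean(history)
    return sum((c - a) ** 2 for c, a in zip(current, mean))


# Hypothetical thresholds standing in for D_0 (ILD), D_1 (ITD) and D_2 (IPD).
THRESHOLDS = {"ILD": 1.0, "ITD": 1.0, "IPD": 0.25}


def parameters_to_encode(current: dict[str, list[float]],
                         history: list[dict[str, list[float]]]) -> dict[str, bool]:
    """Apply D >= threshold per parameter type; True means "encode this parameter".

    The IPD mean is a plain arithmetic mean, as in the definitions; phase
    wrapping is ignored in this sketch.
    """
    return {name: deviation(values, [frame[name] for frame in history]) >= THRESHOLDS[name]
            for name, values in current.items()}


history = [{"ILD": [1.0, 1.0], "ITD": [2.0], "IPD": [0.1, 0.1]} for _ in range(3)]  # T = 3
current = {"ILD": [3.0, 1.0], "ITD": [2.0], "IPD": [0.1, 0.2]}
flags = parameters_to_encode(current, history)  # only the ILD deviates enough
```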
- A method for processing a multi-channel audio signal includes: a decoder receives a code stream containing at least two frames, which include at least one first type frame and at least one second type frame.
- The first type frame includes a downmix signal.
- The second type frame does not include a downmix signal.
- For the Nth frame code stream, where N is a positive integer greater than one: if the decoder determines that the Nth frame code stream is a first type frame, it decodes the Nth frame code stream to obtain the Nth frame downmix signal; if the decoder determines that the Nth frame code stream is a second type frame, it determines m frames of downmix signal from the at least one frame of downmix signal preceding the Nth frame downmix signal according to a preset first rule, and obtains the Nth frame downmix signal from those m frames based on a predetermined first algorithm, where m is a positive integer greater than zero.
- Because the code stream received by the decoder contains first type frames, which include a downmix signal, and second type frames, which do not, the encoder does not encode the downmix signal of every frame. This achieves discontinuous transmission of the downmix signal and improves the compression efficiency of the downmix signal in the multi-channel audio communication system.
- The first frame code stream is a first type frame. Specifically, so that the downmix signal obtained by decoding the first frame code stream can be restored to the audio signals of the two channels, the first frame code stream also needs to include a stereo parameter set.
- Because the second type frame does not include a downmix signal, a first type frame is larger than a second type frame, so the decoder can determine from the size of the Nth frame code stream whether it is a first type frame or a second type frame.
- Alternatively, an identifier bit may be encapsulated in the Nth frame code stream, and the decoder obtains the identifier bit after decoding part of the Nth frame code stream. If the identifier bit indicates that the Nth frame code stream is a first type frame, the decoder decodes the Nth frame code stream to obtain the Nth frame downmix signal; if the identifier bit indicates that the Nth frame code stream is a second type frame, the decoder obtains the Nth frame downmix signal based on the predetermined first algorithm.
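The two ways of telling the frame types apart, and the substitution of a downmix signal for a second type frame, can be sketched as follows. The size threshold, the position of the identifier bit and the substitution rule (average the m most recent decoded downmix frames, then attenuate) are hypothetical stand-ins for the preset first rule and the predetermined first algorithm.

```python
FIRST_TYPE, SECOND_TYPE = 1, 2
SIZE_THRESHOLD_BYTES = 8  # hypothetical: first type frames are assumed to be longer


def frame_type_by_size(frame: bytes) -> int:
    """Classify a frame code stream by its length."""
    return FIRST_TYPE if len(frame) > SIZE_THRESHOLD_BYTES else SECOND_TYPE


def frame_type_by_flag(frame: bytes) -> int:
    """Classify by an identifier bit (assumed here to be the MSB of the first byte)."""
    return FIRST_TYPE if frame and frame[0] & 0x80 else SECOND_TYPE


def substitute_downmix(previous: list[list[float]], m: int = 1, gain: float = 0.5) -> list[float]:
    """Hypothetical first rule and first algorithm: average the m most recent
    decoded downmix frames sample by sample and attenuate the result."""
    recent = previous[-m:]
    return [gain * sum(samples) / len(recent) for samples in zip(*recent)]


decoded = [[1.0, 1.0], [3.0, 5.0]]  # two earlier decoded downmix frames
estimate = substitute_downmix(decoded)  # uses only the most recent frame (m = 1)
```

Repeating waveform samples is only the simplest possible substitute; DTX decoders in practice usually synthesize comfort noise whose spectral shape follows the transmitted SID information.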
- When the first type frame includes a downmix signal and a stereo parameter set, and the second type frame includes a stereo parameter set but no downmix signal: if the decoder determines that the Nth frame code stream is a first type frame, it decodes the Nth frame code stream to obtain the Nth frame downmix signal and the Nth frame stereo parameter set, and restores the Nth frame downmix signal to the Nth frame audio signal based on a predetermined third algorithm according to at least one stereo parameter in the Nth frame stereo parameter set; if the decoder determines that the Nth frame code stream is a second type frame, it decodes the Nth frame code stream to obtain the Nth frame stereo parameter set, obtains the Nth frame downmix signal based on the predetermined first algorithm, and then restores the Nth frame downmix signal to the Nth frame audio signal based on the predetermined third algorithm according to at least one stereo parameter in the Nth frame stereo parameter set.
- When the first type frame includes a downmix signal and a stereo parameter set, and the second type frame includes neither a downmix signal nor a stereo parameter set: if the decoder determines that the Nth frame code stream is a first type frame, it decodes the Nth frame code stream to obtain the Nth frame downmix signal; if the decoder determines that the Nth frame code stream is a second type frame, it obtains the Nth frame downmix signal based on the predetermined first algorithm, determines k frames of stereo parameter sets from the at least one frame of stereo parameter sets preceding the Nth frame stereo parameter set according to a preset second rule, obtains the Nth frame stereo parameter set from those k frames based on a predetermined fourth algorithm, and then restores the Nth frame downmix signal to the Nth frame audio signal based on the third algorithm according to at least one stereo parameter in the Nth frame stereo parameter set, where k is a positive integer greater than zero.
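Estimating the Nth frame stereo parameter set when the code stream carries none can be sketched as below, assuming that the preset second rule picks the k most recent sets and that the predetermined fourth algorithm is an element-wise mean; both choices and the function name are hypothetical.

```python
def estimate_stereo_params(previous_sets: list[dict[str, list[float]]],
                           k: int = 3) -> dict[str, list[float]]:
    """Estimate the Nth frame stereo parameter set from earlier sets.

    Hypothetical second rule: use the k most recent sets (fewer if fewer exist).
    Hypothetical fourth algorithm: element-wise arithmetic mean.
    """
    if not previous_sets:
        raise ValueError("at least one earlier stereo parameter set is required")
    chosen = previous_sets[-k:]
    return {name: [sum(values) / len(chosen) for values in zip(*(s[name] for s in chosen))]
            for name in chosen[-1]}


sets = [{"ILD": [1.0, 2.0]}, {"ILD": [3.0, 4.0]}, {"ILD": [5.0, 6.0]}]
estimate = estimate_stereo_params(sets, k=2)  # mean of the two most recent sets
```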
- When the first type frame includes the downmix signal and the stereo parameter set, a third type frame includes a stereo parameter set but no downmix signal, and a fourth type frame includes neither a downmix signal nor a stereo parameter set, the third type frame and the fourth type frame are each a case of the second type frame.
- If the decoder determines that the Nth frame code stream is a first type frame, it decodes the Nth frame code stream to obtain the Nth frame downmix signal together with the Nth frame stereo parameter set, and restores the Nth frame downmix signal to the Nth frame audio signal based on the third algorithm according to at least one stereo parameter in the Nth frame stereo parameter set.
- If the decoder determines that the Nth frame code stream is a second type frame, there are two cases:
- If the Nth frame code stream is a third type frame, the decoder decodes the Nth frame code stream to obtain the Nth frame stereo parameter set, obtains the Nth frame downmix signal based on the predetermined first algorithm, and restores the Nth frame downmix signal to the Nth frame audio signal based on the third algorithm according to at least one stereo parameter in the Nth frame stereo parameter set.
- If the Nth frame code stream is a fourth type frame, the decoder determines k frames of stereo parameter sets from the at least one frame of stereo parameter sets preceding the Nth frame stereo parameter set according to the preset second rule, obtains the Nth frame stereo parameter set from those k frames based on the predetermined fourth algorithm, where k is a positive integer greater than zero, obtains the Nth frame downmix signal based on the predetermined first algorithm, and restores the Nth frame downmix signal to the Nth frame audio signal based on the third algorithm according to at least one stereo parameter in the Nth frame stereo parameter set.
- When a fifth type frame includes the downmix signal and the stereo parameter set, and a sixth type frame includes a downmix signal but no stereo parameter set, the fifth type frame and the sixth type frame are each a case of the first type frame, and the second type frame includes neither a downmix signal nor a stereo parameter set.
- If the decoder determines that the Nth frame code stream is a first type frame, there are two cases:
- If the Nth frame code stream is a fifth type frame, the decoder decodes the Nth frame code stream to obtain the Nth frame downmix signal together with the Nth frame stereo parameter set, and restores the Nth frame downmix signal to the Nth frame audio signal based on the third algorithm according to at least one stereo parameter in the Nth frame stereo parameter set.
- If the Nth frame code stream is a sixth type frame, the decoder decodes the Nth frame code stream to obtain the Nth frame downmix signal, determines k frames of stereo parameter sets from the at least one frame of stereo parameter sets preceding the Nth frame stereo parameter set according to the preset second rule, obtains the Nth frame stereo parameter set from those k frames based on the predetermined fourth algorithm, and restores the Nth frame downmix signal to the Nth frame audio signal based on the third algorithm according to at least one stereo parameter in the Nth frame stereo parameter set.
- If the decoder determines that the Nth frame code stream is a second type frame, it obtains the Nth frame downmix signal based on the predetermined first algorithm, determines k frames of stereo parameter sets from the at least one frame of stereo parameter sets preceding the Nth frame stereo parameter set according to the preset second rule, obtains the Nth frame stereo parameter set from those k frames based on the predetermined fourth algorithm, and restores the Nth frame downmix signal to the Nth frame audio signal based on the third algorithm according to at least one stereo parameter in the Nth frame stereo parameter set.
- When the fifth type frame includes the downmix signal and the stereo parameter set and the sixth type frame includes a downmix signal but no stereo parameter set, the fifth and sixth type frames are each a case of the first type frame; when the third type frame includes a stereo parameter set but no downmix signal and the fourth type frame includes neither a downmix signal nor a stereo parameter set, the third and fourth type frames are each a case of the second type frame:
- If the decoder determines that the Nth frame code stream is a first type frame, there are two cases:
- If the Nth frame code stream is a fifth type frame, the decoder decodes it to obtain the Nth frame downmix signal together with the Nth frame stereo parameter set, and restores the Nth frame downmix signal to the Nth frame audio signal based on the third algorithm according to at least one stereo parameter in the Nth frame stereo parameter set.
- If the Nth frame code stream is a sixth type frame, the decoder decodes it to obtain the Nth frame downmix signal, determines k frames of stereo parameter sets from the at least one frame of stereo parameter sets preceding the Nth frame stereo parameter set according to the preset second rule, obtains the Nth frame stereo parameter set from those k frames based on the predetermined fourth algorithm, and restores the Nth frame downmix signal to the Nth frame audio signal based on the third algorithm according to at least one stereo parameter in the Nth frame stereo parameter set.
- If the decoder determines that the Nth frame code stream is a second type frame, there are two cases:
- If the Nth frame code stream is a third type frame, the decoder decodes it to obtain the Nth frame stereo parameter set, obtains the Nth frame downmix signal based on the predetermined first algorithm, and restores the Nth frame downmix signal to the Nth frame audio signal based on the third algorithm according to at least one stereo parameter in the Nth frame stereo parameter set.
- If the Nth frame code stream is a fourth type frame, the decoder determines k frames of stereo parameter sets from the at least one frame of stereo parameter sets preceding the Nth frame stereo parameter set according to the preset second rule, obtains the Nth frame stereo parameter set from those k frames based on the predetermined fourth algorithm, where k is a positive integer greater than zero, obtains the Nth frame downmix signal based on the predetermined first algorithm, and restores the Nth frame downmix signal to the Nth frame audio signal based on the third algorithm according to at least one stereo parameter in the Nth frame stereo parameter set.
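The four cases (fifth and sixth type frames as first type frames, third and fourth type frames as second type frames) can be combined into one decoder sketch. The frame representation, the attenuated repeat used for the first algorithm, the reuse of the most recent parameter set (k = 1) for the fourth algorithm and the ILD-only upmix used for the third algorithm are all hypothetical simplifications; the stereo parameter set is reduced to a single ILD value for brevity.

```python
import math
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Frame:
    downmix: Optional[list[float]] = None  # carried by fifth and sixth type frames
    ild_db: Optional[float] = None         # reduced stereo parameter set; fifth and third types


def frame_case(frame: Frame) -> str:
    if frame.downmix is not None:
        return "fifth" if frame.ild_db is not None else "sixth"  # first type frames
    return "third" if frame.ild_db is not None else "fourth"      # second type frames


@dataclass
class DecoderState:
    last_downmix: list[float] = field(default_factory=list)
    last_ild_db: float = 0.0


def decode_frame(frame: Frame, state: DecoderState) -> tuple[list[float], list[float]]:
    # Downmix: decoded if present, else hypothetical first algorithm (attenuated repeat).
    if frame.downmix is not None:
        downmix = frame.downmix
    else:
        downmix = [0.5 * x for x in state.last_downmix]
    # Stereo parameters: decoded if present, else hypothetical fourth algorithm (k = 1 reuse).
    ild_db = frame.ild_db if frame.ild_db is not None else state.last_ild_db
    state.last_downmix, state.last_ild_db = downmix, ild_db
    # Hypothetical third algorithm: ILD-only upmix of a panned source.
    ratio = 10 ** (ild_db / 20)
    return ([2 * ratio / (1 + ratio) * x for x in downmix],
            [2 / (1 + ratio) * x for x in downmix])


state = DecoderState()
left, right = decode_frame(Frame([1.5, -3.0], 20 * math.log10(2)), state)  # fifth type
left2, right2 = decode_frame(Frame(), state)                               # fourth type
```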
- an encoder, including a signal detecting unit and a signal encoding unit, where the signal detecting unit is configured to detect whether the Nth frame downmix signal includes a voice signal, the Nth frame downmix signal being obtained from the Nth frame audio signals of two of the multiple channels based on a predetermined first algorithm, and N being a positive integer greater than zero; the signal encoding unit is configured to encode the Nth frame downmix signal when the signal detecting unit detects that it includes a voice signal, and, when the signal detecting unit detects that the Nth frame downmix signal does not include a voice signal: to encode the Nth frame downmix signal if the signal detecting unit determines that it satisfies the preset audio frame encoding condition, and not to encode it if the signal detecting unit determines that it does not satisfy the preset audio frame encoding condition.
- the signal encoding unit includes a first signal encoding unit and a second signal encoding unit.
- when the signal detecting unit detects that the Nth frame downmix signal includes a voice signal, it notifies the first signal encoding unit to encode the Nth frame downmix signal; if the signal detecting unit determines that the Nth frame downmix signal satisfies the preset speech frame encoding condition, it notifies the first signal encoding unit to encode the Nth frame downmix signal, specifically, the first signal encoding unit encodes the Nth frame downmix signal according to the preset speech frame encoding rate; if the signal detecting unit determines that the Nth frame downmix signal does not satisfy the preset speech frame encoding condition but satisfies the preset silence insertion descriptor (SID) encoding condition, it notifies the second signal encoding unit to encode the Nth frame downmix signal, specifically, the second signal encoding unit encodes the Nth frame downmix signal according to the preset SID encoding rate.
- the encoder further includes a parameter generating unit, a parameter encoding unit and a parameter detecting unit, where the parameter generating unit is configured to obtain the Nth frame stereo parameter set from the Nth frame audio signal; the Nth frame stereo parameter set includes Z stereo parameters, and the Z stereo parameters include the parameters used by the encoder to downmix the Nth frame audio signals based on the predetermined first algorithm.
- the parameter encoding unit is configured to encode the Nth frame stereo parameter set when the signal detecting unit detects that the Nth frame downmix signal includes a speech signal, and, when the signal detecting unit detects that the Nth frame downmix signal does not include a speech signal: to encode at least one stereo parameter in the Nth frame stereo parameter set if the parameter detecting unit determines that the Nth frame stereo parameter set satisfies the preset stereo parameter encoding condition, and not to encode the Nth frame stereo parameter set if the parameter detecting unit determines that it does not satisfy that condition.
- the parameter encoding unit is configured to obtain X target stereo parameters from the Z stereo parameters in the Nth frame stereo parameter set according to a preset stereo parameter reduction rule, and to encode the X target stereo parameters, where X is a positive integer greater than zero and less than or equal to Z.
- the parameter generating unit includes a first parameter generating unit and a second parameter generating unit;
- when the signal detecting unit detects that the Nth frame audio signal includes a voice signal, or detects that the Nth frame audio signal does not include a voice signal but satisfies the preset speech frame encoding condition, it notifies the first parameter generating unit to generate the Nth frame stereo parameter set. Specifically, the first parameter generating unit obtains the Nth frame stereo parameter set from the Nth frame audio signal according to the first stereo parameter set generation manner, and the parameter encoding unit encodes the Nth frame stereo parameter set; specifically, when the parameter encoding unit includes a first parameter encoding unit and a second parameter encoding unit, the Nth frame stereo parameter set is encoded by the first parameter encoding unit. The encoding mode specified for the first parameter encoding unit is the first encoding mode, and the encoding mode specified for the second parameter encoding unit is the second encoding mode. Specifically, the encoding rate specified by the first encoding mode is not less than the encoding rate specified by the second encoding mode; and/or, for any stereo parameter in the Nth frame stereo parameter set, the quantization precision specified by the first encoding mode is not lower than the quantization precision specified by the second encoding mode;
- the second parameter generating unit obtains the Nth frame stereo parameter set from the Nth frame audio signal according to the second stereo parameter set generation manner; when the parameter detecting unit determines that the Nth frame stereo parameter set satisfies the preset stereo parameter encoding condition, the parameter encoding unit encodes at least one stereo parameter in the Nth frame stereo parameter set; specifically, when the parameter encoding unit includes the first parameter encoding unit and the second parameter encoding unit, this encoding is performed by the second parameter encoding unit;
- when the parameter detecting unit determines that the Nth frame stereo parameter set does not satisfy the preset stereo parameter encoding condition, the Nth frame stereo parameter set is not encoded.
- the first stereo parameter set generation manner and the second stereo parameter set generation manner satisfy at least one of the following conditions:
- the number of stereo parameter types in the stereo parameter set specified by the first stereo parameter set generation manner is not less than the number of stereo parameter types in the stereo parameter set specified by the second stereo parameter set generation manner;
- the number of stereo parameters in the stereo parameter set specified by the first stereo parameter set generation manner is not less than the number of stereo parameters in the stereo parameter set specified by the second stereo parameter set generation manner;
- the time-domain resolution of the stereo parameters specified by the first stereo parameter set generation manner is not lower than the time-domain resolution of the corresponding stereo parameters specified by the second stereo parameter set generation manner;
- the frequency-domain resolution of the stereo parameters specified by the first stereo parameter set generation manner is not lower than the frequency-domain resolution of the corresponding stereo parameters specified by the second stereo parameter set generation manner.
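The four conditions on the two generation manners can be read as a simple predicate. This is a sketch only: the `GenerationManner` fields and all example values are hypothetical and do not come from the source; the function checks that at least one of the conditions holds, which is all the text requires.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GenerationManner:
    """Hypothetical summary of a stereo parameter set generation manner."""
    num_param_types: int   # distinct parameter types, e.g. ILD, ITD, IPD, IC
    num_params: int        # total number of stereo parameters in one set
    time_resolution: int   # hypothetical: parameter updates per frame
    freq_resolution: int   # hypothetical: number of subbands


def satisfies_listed_conditions(first: GenerationManner, second: GenerationManner) -> bool:
    """True if at least one of the four listed conditions holds."""
    return (
        first.num_param_types >= second.num_param_types
        or first.num_params >= second.num_params
        or first.time_resolution >= second.time_resolution
        or first.freq_resolution >= second.freq_resolution
    )


# Hypothetical example: a detailed manner for speech frames, a sparse one otherwise.
speech_manner = GenerationManner(num_param_types=4, num_params=42, time_resolution=2, freq_resolution=20)
noise_manner = GenerationManner(num_param_types=1, num_params=5, time_resolution=1, freq_resolution=5)
print(satisfies_listed_conditions(speech_manner, noise_manner))  # True
```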
- the parameter encoding unit includes a first parameter encoding unit and a second parameter encoding unit.
- the first parameter encoding unit is configured to encode the Nth frame stereo parameter set according to the first encoding mode when the Nth frame downmix signal includes a voice signal, or when it does not include a voice signal but satisfies the speech frame encoding condition; the second parameter encoding unit is configured to encode at least one stereo parameter in the Nth frame stereo parameter set according to the second encoding mode when the Nth frame downmix signal does not satisfy the speech frame encoding condition;
- the encoding rate specified by the first encoding mode is not less than the encoding rate specified by the second encoding mode; and/or, for any stereo parameter in the Nth frame stereo parameter set, the quantization precision specified by the first encoding mode is not lower than the quantization precision specified by the second encoding mode.
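To make "quantization precision" and "encoding rate" concrete, the sketch below quantizes one ILD value with a uniform scalar quantizer at two step sizes. The quantizer, the ILD range and both step sizes are assumptions for illustration; the source does not say how the stereo parameters are quantized.

```python
import math


def quantize_uniform(value: float, step: float, lo: float, hi: float) -> tuple[int, float]:
    """Uniform scalar quantizer: return (index, reconstructed value)."""
    clipped = min(max(value, lo), hi)
    index = round((clipped - lo) / step)
    return index, lo + index * step


def bits_per_value(step: float, lo: float, hi: float) -> int:
    """Bits needed to index every quantization level in [lo, hi]."""
    levels = round((hi - lo) / step) + 1
    return math.ceil(math.log2(levels))


# Hypothetical ILD range and step sizes (not from the source).
ILD_RANGE = (-15.0, 15.0)   # dB
FIRST_MODE_STEP = 1.0       # finer quantization, more bits per value
SECOND_MODE_STEP = 3.0      # coarser quantization, fewer bits per value

ild = 4.4
_, fine = quantize_uniform(ild, FIRST_MODE_STEP, *ILD_RANGE)
_, coarse = quantize_uniform(ild, SECOND_MODE_STEP, *ILD_RANGE)
print(bits_per_value(FIRST_MODE_STEP, *ILD_RANGE), bits_per_value(SECOND_MODE_STEP, *ILD_RANGE))  # 5 4
print(abs(ild - fine), abs(ild - coarse))
```

The finer step costs one extra bit per value here but roughly a third of the reconstruction error, which is the trade-off the two encoding modes express.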
- at least one stereo parameter in the Nth frame stereo parameter set includes the inter-channel level difference (ILD), and the preset stereo parameter encoding condition includes $D_L \geq D_0$, where:
- $D_L$ represents the degree of deviation of the ILD from a first criterion;
- the first criterion is determined, based on a predetermined second algorithm, from the T frames of stereo parameter sets preceding the Nth frame stereo parameter set, where T is a positive integer greater than 0.
- at least one stereo parameter in the Nth frame stereo parameter set includes the inter-channel time difference (ITD), and the preset stereo parameter encoding condition includes $D_T \geq D_1$, where:
- $D_T$ represents the degree of deviation of the ITD from a second criterion;
- the second criterion is determined, based on a predetermined third algorithm, from the T frames of stereo parameter sets preceding the Nth frame stereo parameter set, where T is a positive integer greater than 0.
- at least one stereo parameter in the Nth frame stereo parameter set includes the inter-channel phase difference (IPD), and the preset stereo parameter encoding condition includes $D_P \geq D_2$, where:
- $D_P$ represents the degree of deviation of the IPD from a third criterion;
- the third criterion is determined, based on a predetermined fourth algorithm, from the T frames of stereo parameter sets preceding the Nth frame stereo parameter set, where T is a positive integer greater than 0.
- $D_L$, $D_T$ and $D_P$ respectively satisfy the following expressions:
- ILD(m) is the level difference between the two channels when they respectively transmit the Nth frame audio signal in the mth subband;
- M is the total number of subbands occupied by the Nth frame audio signal.
- T is a positive integer greater than 0
- $ILD^{[-t]}(m)$ is the level difference between the two channels when they respectively transmit, in the mth subband, the tth frame audio signal preceding the Nth frame audio signal; ITD is the time difference between the two channels when they respectively transmit the Nth frame audio signal.
- $\overline{ITD}$ is the average value of the ITD over the T frames of stereo parameter sets preceding the Nth frame, and $ITD^{[-t]}$ is the time difference between the two channels when they respectively transmit the tth frame audio signal preceding the Nth frame audio signal; IPD(m) is the phase difference between the two channels when they respectively transmit the part of the Nth frame audio signal that lies in the mth subband; $\overline{IPD}(m)$ is the average value of the IPD in the mth subband over the T frames of stereo parameter sets preceding the Nth frame, and $IPD^{[-t]}(m)$ is the phase difference between the two channels when they respectively transmit, in the mth subband, the tth frame audio signal preceding the Nth frame audio signal.
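The expressions for the deviation measures did not survive extraction. The sketch below assumes the simplest form consistent with the definitions: the mean absolute difference, over the M subbands, between the current value and its average over the T previous frames (for ILD or IPD), and the absolute difference from the T-frame average for ITD. It also assumes the encoding condition triggers when the deviation reaches its threshold. All of this is an illustrative reading, not the patent's confirmed formula.

```python
def mean(values):
    return sum(values) / len(values)


def subband_deviation(current, history):
    """Assumed form of D_L (or D_P): mean over the M subbands of
    |X(m) - average of X^[-t](m) over the T previous frames|.

    current: M values for frame N; history: T lists of M values (frames N-1 ... N-T).
    """
    num_subbands = len(current)
    averages = [mean([frame[m] for frame in history]) for m in range(num_subbands)]
    return mean([abs(current[m] - averages[m]) for m in range(num_subbands)])


def itd_deviation(current_itd, history_itd):
    """Assumed form of D_T: |ITD - average ITD over the T previous frames|."""
    return abs(current_itd - mean(history_itd))


def meets_encoding_condition(deviation, threshold):
    """Assumed reading of the condition: encode when the deviation reaches the threshold."""
    return deviation >= threshold


# T = 2 previous frames, M = 2 subbands (hypothetical ILD values in dB)
ild_history = [[1.0, 2.0], [3.0, 4.0]]
d_l = subband_deviation([2.0, 5.0], ild_history)  # averages [2, 3] -> (0 + 2) / 2
print(d_l, meets_encoding_condition(d_l, threshold=1.0))  # 1.0 True
```

For the IPD, a practical version would probably wrap phase differences into (-π, π] before averaging; this sketch does not.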
- a fourth aspect provides a decoder, including a receiving unit and a decoding unit. The receiving unit is configured to receive a code stream that includes at least two frames, among which there is at least one first type of frame and at least one second type of frame; a first type of frame includes a downmix signal, and a second type of frame does not include a downmix signal. For the Nth frame code stream, where N is a positive integer greater than 1, the decoding unit is configured to: if the Nth frame code stream is determined to be a first type of frame, decode the Nth frame code stream to obtain the Nth frame downmix signal; if the Nth frame code stream is determined to be a second type of frame, determine an m-frame downmix signal from at least one frame downmix signal preceding the Nth frame downmix signal according to a preset first rule, and obtain the Nth frame downmix signal from the m-frame downmix signal based on a predetermined first algorithm, where m
- the Nth frame downmix signal is obtained by the encoder by downmixing the Nth frame audio signals of two of the multiple channels based on a predetermined second algorithm.
- the first type of frame includes a downmix signal and a stereo parameter set
- the second type of frame includes a stereo parameter set and does not include a downmix signal:
- the decoding unit is further configured to: if the Nth frame code stream is determined to be the first type of frame, decode the Nth frame code stream, and obtain the Nth frame stereo parameter set while obtaining the Nth frame downmix signal;
- if the Nth frame code stream is determined to be a second type of frame, decode the Nth frame code stream to obtain the Nth frame stereo parameter set; at least one stereo parameter in the Nth frame stereo parameter set is used by the decoder to restore the Nth frame downmix signal to the Nth frame audio signal based on a predetermined third algorithm;
- a signal restoring unit, configured to restore the Nth frame downmix signal to the Nth frame audio signal based on the third algorithm, according to at least one stereo parameter in the Nth frame stereo parameter set.
- the first type of frame includes a downmix signal and a stereo parameter set
- the second type of frame does not include a downmix signal and does not include a stereo parameter set
- the decoding unit is further configured to: if the Nth frame code stream is determined to be the first type of frame, decode the Nth frame code stream, and obtain the Nth frame stereo parameter set while obtaining the Nth frame downmix signal;
- if the Nth frame code stream is determined to be a second type of frame, determine a k-frame stereo parameter set from at least one frame stereo parameter set preceding the Nth frame stereo parameter set according to a preset second rule, and obtain the Nth frame stereo parameter set from the k-frame stereo parameter set based on a predetermined fourth algorithm, where k is a positive integer greater than zero;
- the at least one stereo parameter in the Nth frame stereo parameter set is used by the decoder to restore the Nth frame downmix signal to the Nth frame audio signal based on a predetermined third algorithm;
- a signal restoring unit, configured to restore the Nth frame downmix signal to the Nth frame audio signal based on the third algorithm, according to at least one stereo parameter in the Nth frame stereo parameter set.
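The text leaves the "preset second rule" and the "predetermined fourth algorithm" unspecified. As a hypothetical illustration only, the sketch below takes the k most recent decoded stereo parameter sets as the second rule and their element-wise mean as the fourth algorithm. It assumes every stored set has the same parameter names and lengths; the class and method names are invented for this example.

```python
from collections import deque


class StereoParamHistory:
    """Keeps recently decoded stereo parameter sets (hypothetical helper)."""

    def __init__(self, max_frames: int = 8):
        self._sets = deque(maxlen=max_frames)

    def push(self, param_set: dict[str, list[float]]) -> None:
        self._sets.append(param_set)

    def select_k(self, k: int) -> list[dict[str, list[float]]]:
        """Hypothetical 'second rule': the k most recent sets (fewer if unavailable)."""
        if k < 1:
            raise ValueError("k must be a positive integer")
        return list(self._sets)[-k:]

    def estimate(self, k: int) -> dict[str, list[float]]:
        """Hypothetical 'fourth algorithm': element-wise mean of the selected sets."""
        chosen = self.select_k(k)
        if not chosen:
            raise ValueError("no previous stereo parameter set is available")
        estimate = {}
        for name, values in chosen[-1].items():
            estimate[name] = [
                sum(s[name][i] for s in chosen) / len(chosen) for i in range(len(values))
            ]
        return estimate


history = StereoParamHistory()
for ild in ([1.0, 2.0], [3.0, 4.0], [5.0, 6.0]):
    history.push({"ILD": ild})
print(history.estimate(k=2))  # {'ILD': [4.0, 5.0]}
```

With k = 1 this reduces to repeating the last received set, which is a common minimal choice for parameter concealment.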
- the first type of frame includes a downmix signal and a stereo parameter set
- the third type of frame includes a stereo parameter set and does not include a downmix signal
- the fourth type of frame includes neither a downmix signal nor a stereo parameter set;
- the third type of frame and the fourth type of frame are both cases of the second type of frame:
- the decoding unit is further configured to: if the Nth frame code stream is determined to be the first type of frame, decode the Nth frame code stream, and obtain the Nth frame stereo parameter set while obtaining the Nth frame downmix signal;
- if the Nth frame code stream is determined to be a second type of frame: when the Nth frame code stream is a third type of frame, decode it to obtain the Nth frame stereo parameter set; when the Nth frame code stream is a fourth type of frame, determine a k-frame stereo parameter set from at least one frame stereo parameter set preceding the Nth frame stereo parameter set, and obtain the Nth frame stereo parameter set from the k-frame stereo parameter set based on the predetermined fourth algorithm, where k is a positive integer greater than zero;
- the at least one stereo parameter in the Nth frame stereo parameter set is used by the decoder to restore the Nth frame downmix signal to the Nth frame audio signal based on a predetermined third algorithm;
- a signal restoring unit, configured to restore the Nth frame downmix signal to the Nth frame audio signal based on the third algorithm, according to at least one stereo parameter in the Nth frame stereo parameter set.
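The variant with first, third and fourth types of frame can be sketched as a per-frame dispatch. The container, the class names and the concealment choice (simply repeating the previous downmix and stereo parameter set, i.e. m = k = 1) are hypothetical; the restoration step based on the third algorithm is left out because the text does not specify it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DecodedPayload:
    """What the decoder found in the Nth frame code stream (hypothetical container)."""
    downmix: Optional[list[float]] = None
    params: Optional[dict[str, list[float]]] = None


def frame_type(payload: DecodedPayload) -> str:
    if payload.downmix is not None and payload.params is not None:
        return "first"
    if payload.downmix is None and payload.params is not None:
        return "third"
    if payload.downmix is None and payload.params is None:
        return "fourth"
    raise ValueError("a downmix without stereo parameters is not a frame type in this variant")


class Decoder:
    """Sketch: concealment repeats the previous frame (hypothetical m = k = 1)."""

    def __init__(self):
        self.last_downmix = None
        self.last_params = None

    def decode(self, payload: DecodedPayload):
        kind = frame_type(payload)
        if kind == "first":
            downmix = payload.downmix
        else:
            downmix = self._previous(self.last_downmix, "downmix")
        if kind in ("first", "third"):
            params = payload.params
        else:
            params = self._previous(self.last_params, "stereo parameter set")
        self.last_downmix, self.last_params = downmix, params
        return kind, downmix, params  # next step (not shown): restore audio via the third algorithm

    @staticmethod
    def _previous(value, what):
        if value is None:
            raise RuntimeError(f"no previous {what} to conceal from")
        return value


decoder = Decoder()
print(decoder.decode(DecodedPayload([0.1, 0.2], {"ILD": [1.0]})))
print(decoder.decode(DecodedPayload(None, {"ILD": [2.0]})))
print(decoder.decode(DecodedPayload()))
```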
- the fifth type frame includes a downmix signal and a stereo parameter set
- the sixth type frame includes a downmix signal and does not include a stereo parameter set
- the fifth type of frame and the sixth type of frame are both cases of the first type of frame, and the second type of frame includes neither a downmix signal nor a stereo parameter set:
- the decoding unit is further configured to: if the Nth frame code stream is determined to be a first type of frame: when the Nth frame code stream is a fifth type of frame, decode the Nth frame code stream to obtain the Nth frame stereo parameter set together with the Nth frame downmix signal; when the Nth frame code stream is a sixth type of frame, determine a k-frame stereo parameter set from at least one frame stereo parameter set preceding the Nth frame stereo parameter set according to the preset second rule, and obtain the Nth frame stereo parameter set from the k-frame stereo parameter set based on the predetermined fourth algorithm; if the Nth frame code stream is determined to be a second type of frame, determine a k-frame stereo parameter set from at least one frame stereo parameter set preceding the Nth frame stereo parameter set according to the preset second rule, and obtain the Nth frame stereo parameter set from the k-frame stereo parameter set based on the predetermined fourth algorithm;
- the at least one stereo parameter of the Nth frame stereo parameter set is used by the decoder to restore the Nth frame downmix signal to the Nth frame audio signal based on a predetermined third algorithm, where k is a positive integer greater than zero;
- a signal restoring unit, configured to restore the Nth frame downmix signal to the Nth frame audio signal based on the third algorithm, according to at least one stereo parameter in the Nth frame stereo parameter set.
- the fifth type frame includes a downmix signal and a stereo parameter set
- the sixth type frame includes a downmix signal and does not include a stereo parameter set
- the fifth type of frame and the sixth type of frame are cases of the first type of frame;
- the third type of frame contains the stereo parameter set and does not include the downmix signal
- the fourth type of frame does not include the downmix signal and does not include the stereo parameter set
- the third type of frame and the fourth type of frame are cases of the second type of frame:
- the decoding unit is further configured to: if the Nth frame code stream is determined to be a first type of frame: when the Nth frame code stream is a fifth type of frame, decode the Nth frame code stream to obtain the Nth frame stereo parameter set together with the Nth frame downmix signal; when the Nth frame code stream is a sixth type of frame, determine a k-frame stereo parameter set from at least one frame stereo parameter set preceding the Nth frame stereo parameter set according to the preset second rule, and obtain the Nth frame stereo parameter set from the k-frame stereo parameter set based on the predetermined fourth algorithm.
- the decoding unit is further configured to: if the Nth frame code stream is determined to be a second type of frame: when the Nth frame code stream is a third type of frame, decode it to obtain the Nth frame stereo parameter set; when the Nth frame code stream is a fourth type of frame, determine a k-frame stereo parameter set from at least one frame stereo parameter set preceding the Nth frame stereo parameter set according to the preset second rule, and obtain the Nth frame stereo parameter set from the k-frame stereo parameter set based on the predetermined fourth algorithm;
- the at least one stereo parameter of the Nth frame stereo parameter set is used by the decoder to restore the Nth frame downmix signal to the Nth frame audio signal based on a predetermined third algorithm, where k is a positive integer greater than zero;
- the decoder further includes a signal restoration unit
- a signal restoring unit, configured to restore the Nth frame downmix signal to the Nth frame audio signal based on the third algorithm, according to at least one stereo parameter in the Nth frame stereo parameter set.
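In the last variant, the four frame layouts are distinguished by two facts: whether the frame carries a downmix signal and whether it carries a stereo parameter set. The mapping below restates that classification; the function names are hypothetical.

```python
def classify(has_downmix: bool, has_params: bool) -> tuple[str, str]:
    """Map frame contents to (coarse type, fine type) for the last variant above."""
    if has_downmix:
        return ("first", "fifth" if has_params else "sixth")
    return ("second", "third" if has_params else "fourth")


def stereo_param_source(has_params: bool) -> str:
    """Any frame without a stereo parameter set gets one estimated from k earlier
    sets (preset second rule plus predetermined fourth algorithm)."""
    return "decoded from the code stream" if has_params else "estimated from k previous sets"


for has_downmix in (True, False):
    for has_params in (True, False):
        print(classify(has_downmix, has_params), stereo_param_source(has_params))
```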
- a codec system, including the encoder according to the third aspect or any implementation of the third aspect, and the decoder according to the fourth aspect or any implementation of the fourth aspect.
- an embodiment of the present invention further provides a terminal device, including a processor and a memory, where the memory is configured to store a software program, and the processor is configured to read the software program stored in the memory and implement the method provided by the first aspect or any implementation of the first aspect.
- the embodiment of the present invention further provides a computer storage medium, which may be non-volatile, that is, the content is not lost after power off.
- the storage medium stores a software program that, when read and executed by one or more processors, implements the method provided by the first aspect or any implementation of the first aspect.
- FIG. 1 is a schematic flow chart of a method for processing multi-channel audio signals according to an embodiment of the present invention
- FIG. 2 is a schematic flow chart of a method for processing multi-channel audio signals according to Embodiment 2 of the present invention
- FIG. 3a to FIG. 3d are schematic diagrams of an encoder according to an embodiment of the present invention.
- FIG. 4 is a schematic diagram of a decoder according to an embodiment of the present invention.
- FIG. 5 is a schematic diagram of a codec system according to an embodiment of the present invention.
- the audio signal is encoded or decoded in units of frames.
- the Nth frame audio signal is the Nth audio frame
- if the Nth frame audio signal includes a voice signal, the Nth audio frame is a voice frame.
- if the Nth frame audio signal does not include a voice signal but includes a background noise signal, the Nth audio frame is a noise frame.
- N is a positive integer greater than zero.
- the encoder and decoder in the embodiments of the present invention can be installed on a device that supports multi-channel audio signal processing (such as a mobile phone, a notebook computer or a tablet computer), on a server, and the like, to process multi-channel audio signals.
- devices such as terminals and servers are thereby provided with the multi-channel audio signal processing function of the embodiments of the present invention.
- because the audio signal can be encoded with a discontinuous encoding mechanism in a multi-channel communication system, the compression efficiency of the audio signal is greatly improved.
- N is a positive integer greater than zero. It is assumed that the Nth frame downmix signal is obtained by mixing the Nth frame audio signals of the two channels in the multichannel.
- when the multichannel signal has two channels, namely a first channel and a second channel, the two channels of the multichannel signal are the first channel and the second channel, and the Nth frame downmix signal is obtained by mixing the Nth frame audio signal of the first channel with the Nth frame audio signal of the second channel; when the multichannel signal has three or more channels, the downmix signal is obtained by mixing the audio signals of two paired channels of the multichannel signal.
- specifically, taking three channels (a first channel, a second channel and a third channel) as an example: if, according to a set rule, only the first channel is paired with the second channel, the two channels of the multichannel signal are the first channel and the second channel, and the Nth frame audio signal of the first channel and the Nth frame audio signal of the second channel are downmixed to obtain the Nth frame downmix signal; if the first channel is paired with the second channel and the second channel is paired with the third channel, the two channels of the multichannel signal may be either the first channel and the second channel, or the second channel and the third channel.
- a method for processing a multi-channel audio signal according to Embodiment 1 of the present invention includes:
- Step 100: The encoder generates the Nth frame stereo parameter set from the Nth frame audio signals of two of the multiple channels, where the stereo parameter set includes Z stereo parameters.
- the Z stereo parameters include the parameters used by the encoder to downmix the Nth frame audio signals based on a predetermined first algorithm, and Z is a positive integer greater than zero.
- the predetermined first algorithm is a downmix signal generation algorithm that is previously set in the encoder.
- taking the inter-channel level difference (ILD) as the stereo parameter obtained from the Nth frame audio signal, the preset stereo parameter generation algorithm is as follows:
- L(i) is the Discrete Fourier Transform (DFT) coefficient of the audio signal of the Nth frame of the left channel at the ith frequency point
- R(i) is the DFT coefficient of the Nth frame audio signal of the right channel at the ith frequency point
- ReL(i) is the real part of L(i)
- ImL(i) is the imaginary part of L(i)
- ReR(i) is the real part of R(i)
- ImR(i) is the imaginary part of R(i)
- PL(i) is the energy spectrum of the audio signal of the Nth frame of the left channel at the i-th frequency point
- PR(i) is the energy spectrum of the Nth frame audio signal of the right channel at the ith frequency point
- EL(m) is the energy of the Nth frame audio signal in the mth subband of the left channel
- ER(m) is the energy of the Nth frame audio signal in the mth subband of the right channel.
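The generation formulas themselves were lost in extraction. The sketch below assumes the usual definitions implied by the symbol list: PL(i) = ReL(i)² + ImL(i)², EL(m) is the sum of PL(i) over the frequency points of subband m (likewise for the right channel), and ILD(m) = 10·log10(EL(m)/ER(m)) in dB. The subband edges, the small `eps` guard and the naive DFT are choices made for this illustration.

```python
import cmath
import math


def dft(x):
    """Naive DFT (O(n^2)); adequate for a short illustration."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)) for k in range(n)]


def ild_per_subband(left, right, band_edges, eps=1e-12):
    """ILD(m) in dB for each subband [band_edges[m], band_edges[m + 1])."""
    L, R = dft(left), dft(right)
    PL = [c.real ** 2 + c.imag ** 2 for c in L]  # PL(i) = ReL(i)^2 + ImL(i)^2
    PR = [c.real ** 2 + c.imag ** 2 for c in R]  # PR(i) = ReR(i)^2 + ImR(i)^2
    ild = []
    for m in range(len(band_edges) - 1):
        lo, hi = band_edges[m], band_edges[m + 1]
        EL = sum(PL[lo:hi])  # EL(m): left-channel energy in subband m
        ER = sum(PR[lo:hi])  # ER(m): right-channel energy in subband m
        ild.append(10 * math.log10((EL + eps) / (ER + eps)))
    return ild


right = [1.0, 0.5, -0.25, 0.0, 0.3, -0.7, 0.2, 0.1]
left = [2 * s for s in right]  # left is 6 dB louder in every subband
print([round(v, 2) for v in ild_per_subband(left, right, band_edges=[0, 2, 4, 5])])  # [6.02, 6.02, 6.02]
```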
- the preset stereo parameter generation algorithm also includes algorithms for calculating other stereo parameters, such as the inter-channel time difference (ITD), the inter-channel phase difference (IPD) and the inter-channel coherence (IC); the encoder can likewise obtain stereo parameters such as ITD, IPD and IC from the audio signal based on the preset stereo parameter generation algorithm.
- the Nth frame stereo parameter set includes at least one stereo parameter; for example, IPD, ITD, ILD and IC are obtained from the Nth frame audio signals of the two channels based on the preset stereo parameter generation algorithm, and together they form the Nth frame stereo parameter set.
- Step 101: The encoder downmixes the Nth frame audio signals of the two channels into the Nth frame downmix signal based on the predetermined first algorithm, according to at least one stereo parameter in the Nth frame stereo parameter set.
- the Nth frame stereo parameter set includes ITD, ILD, IPD and IC.
- the Nth frame downmix signal is obtained based on the predetermined first algorithm.
- the Nth frame downmix signal DMX(k) satisfies the following expression at the kth frequency point:
- DMX(k) is the Nth frame downmixed signal at the kth frequency point
- $|L(k)|$ represents the amplitude at the kth frequency point of the Nth frame audio signal in the left channel of the Kth pair of channels
- $|R(k)|$ represents the amplitude at the kth frequency point of the Nth frame audio signal in the right channel of the Kth pair of channels
- $\angle L(k)$ represents the phase angle of the Nth frame audio signal in the left channel at the kth frequency point
- ILD(k) represents the ILD of the Nth frame audio signal at the kth frequency point
- IPD(k) represents the IPD of the Nth frame audio signal at the kth frequency point
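The patent's expression for DMX(k) was lost in extraction, so it cannot be reproduced here. Its symbol list (channel amplitudes and the left-channel phase angle) points to an amplitude-and-phase construction; as a loosely related illustration only, the sketch below shows a generic phase-aligned downmix, DMX(k) = (|L(k)| + |R(k)|) / 2 · e^{j∠L(k)}. It does not use ILD(k) or IPD(k) and should not be read as the patent's formula; it only shows why keeping one channel's phase avoids the cancellation a plain average suffers for out-of-phase bins.

```python
import cmath


def phase_aligned_downmix(L, R):
    """Illustrative downmix (not the patent's lost formula):
    DMX(k) = (|L(k)| + |R(k)|) / 2 * exp(j * angle(L(k)))."""
    return [cmath.rect((abs(l) + abs(r)) / 2, cmath.phase(l)) for l, r in zip(L, R)]


L = [1j]   # one left-channel frequency bin
R = [-1j]  # the same bin in the right channel, in antiphase with the left
print(abs((L[0] + R[0]) / 2))               # plain average cancels: 0.0
print(abs(phase_aligned_downmix(L, R)[0]))  # amplitude preserved: 1.0
```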
- the embodiments of the present invention are not limited to this algorithm; other algorithms for obtaining the downmix signal may also be used.
- the Nth frame stereo parameter set is encoded so that the decoder can restore the Nth frame downmix signal to the Nth frame audio signal.
- the encoder encodes the Nth frame stereo parameter set.
- the generated Nth frame stereo parameter set includes ITD, ILD, IPD, and IC.
- if the encoder downmixes the Nth frame audio signals of the two channels into the Nth frame downmix signal based on the predetermined first algorithm using only the ILD and IPD in the Nth frame stereo parameter set, then, to improve compression efficiency, the encoder may encode only the ILD and IPD in the Nth frame stereo parameter set.
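The ILD/IPD example above is one instance of a stereo parameter reduction rule (Z parameters in, X target parameters out). A minimal sketch, assuming the rule is "keep only the parameter types the downmix actually used"; the function name and the parameter values are hypothetical.

```python
def reduce_stereo_params(param_set, keep=("ILD", "IPD")):
    """Hypothetical reduction rule: keep only the parameter types used by the downmix."""
    return {name: value for name, value in param_set.items() if name in keep}


# Z = 4 parameter types in the Nth frame set (hypothetical values)
nth_frame_set = {"ITD": 3, "ILD": [1.5, -0.5], "IPD": [0.1, 0.2], "IC": 0.9}
target = reduce_stereo_params(nth_frame_set)  # X = 2 target parameter types
print(sorted(target))  # ['ILD', 'IPD']
```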
- Step 102: The encoder detects whether the Nth frame downmix signal includes a voice signal. If yes, step 103 is performed; otherwise, step 104 is performed.
- to detect whether the Nth frame downmix signal includes a voice signal, the encoder may, optionally, detect this directly by applying voice activity detection (VAD) to the Nth frame downmix signal.
- alternatively, the encoder may detect indirectly whether the Nth frame downmix signal includes a voice signal, by using VAD to detect whether the Nth frame audio signals include a voice signal. Specifically, when the encoder detects that the audio signal of either of the two channels includes a voice signal, it determines that the downmix signal obtained by mixing the audio signals of the two channels includes a voice signal; when the encoder determines that the audio signals of neither channel include a voice signal, it determines that the downmix signal obtained by mixing the audio signals of the two channels does not include a voice signal. Note that in this indirect detection mode, the order of step 102 relative to steps 100 and 101 is not limited, as long as step 100 precedes step 101.
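The indirect detection mode reduces to a logical OR over the per-channel VAD decisions. The sketch uses a toy energy-threshold detector as a stand-in for a real VAD; the detector, its threshold and the test signals are assumptions for illustration.

```python
def frame_energy(samples):
    return sum(s * s for s in samples) / len(samples)


def energy_vad(samples, threshold=1e-3):
    """Toy energy-threshold VAD (hypothetical stand-in for a real VAD)."""
    return frame_energy(samples) > threshold


def downmix_contains_voice(left, right, vad=energy_vad):
    """Indirect detection: voice in either channel means voice in the downmix."""
    return vad(left) or vad(right)


silence = [0.0] * 160
speech_like = [0.2, -0.3, 0.25, -0.1] * 40
print(downmix_contains_voice(silence, silence))      # False
print(downmix_contains_voice(speech_like, silence))  # True
```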
- Step 103: The encoder encodes the Nth frame downmix signal, and then step 107 is performed.
- the encoder encodes the Nth frame downmix signal to obtain an Nth frame code stream.
- the code stream includes two frame types, a first type of frame and a second type of frame, where a first type of frame includes a downmix signal and a second type of frame does not include a downmix signal; the Nth frame code stream obtained in step 103 is a first type of frame.
- the encoder encodes the Nth frame downmix signal according to a preset speech frame encoding rate; preferably, the preset speech frame encoding rate may be set to 13.2 kbps.
- when encoding the Nth frame downmix signal, the encoder also encodes the Nth frame stereo parameter set.
- Step 104: The encoder determines whether the Nth frame downmix signal satisfies the preset audio frame encoding condition. If yes, step 105 is performed; otherwise, step 106 is performed.
- the preset audio frame encoding condition is a condition, pre-configured in the encoder, for determining whether to encode the Nth frame downmix signal.
- for the first frame downmix signal, the preset audio frame encoding condition is satisfied even if it does not include a voice signal; that is, the first frame downmix signal must be encoded regardless of whether it includes a voice signal.
- Step 105: The encoder encodes the Nth frame downmix signal, and then step 107 is performed.
- the Nth frame code stream obtained through step 105 is also a first type of frame.
- when encoding the Nth frame downmix signal, the encoder also encodes the Nth frame stereo parameter set.
- in step 1 of the embodiment of the present invention, the encoding manner of the Nth frame downmix signal is the same.
- because the Nth frame downmix signal in step 105 does not include a voice signal: when the Nth frame downmix signal satisfies the preset speech frame encoding condition, the encoder encodes it according to the preset speech frame encoding rate; when it does not satisfy the preset speech frame encoding condition but satisfies the preset SID encoding condition, the encoder encodes it according to the preset SID encoding rate, which may be set to 2.8 kbps.
- the encoder encodes the Nth frame downmix signal according to the SID encoding mode, where the SID encoding mode specifies the encoding rate (the preset SID encoding rate), the algorithm used for encoding, and the parameters used for encoding.
- The preset speech frame coding condition may be: the time elapsed from the Mth frame downmix signal to the Nth frame downmix signal is not greater than a preset duration, where the Mth frame downmix signal is the most recent downmix signal before the Nth frame that contains a speech signal.
- The preset SID coding condition may be odd-frame coding: when N is odd, the encoder determines that the Nth frame downmix signal satisfies the preset SID coding condition.
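The decision for a downmix frame made in steps 104 to 106 can be sketched as follows. This is a minimal Python illustration, assuming frames are numbered from 1 and that the preset duration is expressed as a number of frames (`hangover_frames`); the function name, that parameter and the return values are hypothetical, chosen only to make the conditions above concrete.

```python
from typing import Optional, Tuple

SPEECH_RATE_KBPS = 13.2  # preset speech frame coding rate (example value from the text)
SID_RATE_KBPS = 2.8      # preset SID coding rate (example value from the text)


def select_downmix_coding(n: int, has_speech: bool,
                          last_speech_frame: Optional[int],
                          hangover_frames: int) -> Tuple[str, float]:
    """Return (mode, rate_kbps) for the Nth frame downmix signal.

    last_speech_frame is M, the most recent frame whose downmix contained
    speech (None if there has been none); hangover_frames is the preset
    duration expressed in frames (an assumption of this sketch).
    """
    if has_speech:
        return "speech", SPEECH_RATE_KBPS
    # Preset speech frame coding condition: no more than the preset
    # duration has elapsed since frame M.
    if last_speech_frame is not None and n - last_speech_frame <= hangover_frames:
        return "speech", SPEECH_RATE_KBPS
    # Preset SID coding condition: odd-numbered frames.
    if n % 2 == 1:
        return "sid", SID_RATE_KBPS
    # Neither condition holds: the downmix signal is not encoded (step 106).
    return "none", 0.0
```

With frames numbered from 1, frame 1 is odd, so the odd-frame SID rule by itself already guarantees that the first frame downmix signal is encoded even when it contains no speech.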
- Step 106: The encoder does not encode the Nth frame downmix signal, and step 109 is performed.
- The Nth frame code stream obtained in step 106 is a second type frame.
- Here, the encoder determining that the Nth frame downmix signal does not satisfy the preset audio frame coding condition specifically means that the Nth frame downmix signal satisfies neither the preset speech frame coding condition nor the preset SID coding condition.
- Not encoding the Nth frame downmix signal specifically means that the Nth frame code stream does not include the Nth frame downmix signal.
- In this case, the Nth frame stereo parameter set may or may not be encoded.
- This embodiment takes as an example the case in which the encoder encodes the Nth frame stereo parameter set when it does not encode the Nth frame downmix signal; optionally, when the encoder does not encode the Nth frame downmix signal, it may also not encode the Nth frame stereo parameter set.
- For the specific manner in which the decoder obtains the Nth frame downmix signal and the Nth frame stereo parameter set when the encoder encodes neither of them, refer to Embodiment 2 of the present invention.
- Step 107: The encoder sends the Nth frame code stream to the decoder.
- The Nth frame code stream includes not only the Nth frame stereo parameter set but also the Nth frame downmix signal.
- Step 108: The decoder determines that the Nth frame code stream is a first type frame, decodes it to obtain the Nth frame downmix signal and the Nth frame stereo parameter set, and step 111 is performed.
- Because a second type frame does not include a downmix signal, a first type frame is larger than a second type frame, so the decoder can use the size of the Nth frame code stream to determine whether it is a first type frame or a second type frame.
- Alternatively, an identifier bit may be encapsulated in the Nth frame code stream. After partially decoding the Nth frame code stream, the decoder obtains the identifier bit and determines from it whether the Nth frame code stream is a first type frame or a second type frame. For example, if the identifier bit is 1, the Nth frame code stream is a first type frame; if the identifier bit is 0, the Nth frame code stream is a second type frame.
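Both ways of telling the frame types apart can be sketched in a few lines. This is a minimal Python illustration under two assumptions: that the identifier bit is the most significant bit of the first byte of the code stream, and that the decoder knows an upper bound on the size of a second type frame (`max_second_type_bits`, a hypothetical parameter). Neither detail is specified in the text.

```python
def classify_by_flag(payload: bytes) -> str:
    """Read the 1-bit identifier, assumed to be the MSB of the first byte."""
    if not payload:
        raise ValueError("empty code stream")
    flag = payload[0] >> 7
    return "first" if flag == 1 else "second"


def classify_by_size(num_bits: int, max_second_type_bits: int) -> str:
    """A first type frame also carries the downmix signal, so it is larger."""
    return "first" if num_bits > max_second_type_bits else "second"
```

The flag-based variant costs one bit per frame but does not depend on the rates in use; the size-based variant costs nothing but only works while every first type frame is strictly larger than every second type frame.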
- The decoder may also determine the decoding mode from the rate of the Nth frame code stream. For example, if the Nth frame code stream has a rate of 17.4 kbps, of which the downmix signal portion is 13.2 kbps and the stereo parameter set portion is 4.2 kbps, the decoder decodes the downmix signal portion with the decoding method corresponding to 13.2 kbps and the stereo parameter set portion with the decoding method corresponding to 4.2 kbps.
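The rate split in this example can be checked with simple arithmetic: 17.4 kbps = 13.2 kbps + 4.2 kbps. To turn rates into bits per frame, the sketch below assumes a 20 ms frame, which this section does not state; with that assumption, 1 kbit/s sustained over 1 ms is exactly 1 bit.

```python
FRAME_MS = 20  # assumed frame duration in milliseconds; not stated in this section


def bits_per_frame(rate_kbps: float, frame_ms: int = FRAME_MS) -> int:
    """1 kbit/s sustained for 1 ms is 1 bit, so bits = kbps * ms."""
    return round(rate_kbps * frame_ms)


def stereo_share_kbps(total_kbps: float, downmix_kbps: float) -> float:
    """Rate left for the stereo parameter set once the downmix part is known."""
    return round(total_kbps - downmix_kbps, 3)
```

Under that assumption, `bits_per_frame(13.2)` is 264 bits for the downmix portion and `bits_per_frame(4.2)` is 84 bits for the stereo parameter portion, which add up to the 348 bits of a 17.4 kbps frame. An SID frame at 2.8 kbps would carry 56 bits.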
- Alternatively, the decoder determines the coding mode of the Nth frame code stream from a coding mode identifier bit carried in it, and then decodes the Nth frame code stream in the decoding mode corresponding to that coding mode.
- Step 109: The encoder sends the Nth frame code stream to the decoder, where the Nth frame code stream includes the Nth frame stereo parameter set.
- Step 110: The decoder determines that the Nth frame code stream is a second type frame and decodes it to obtain the Nth frame stereo parameter set. According to a preset first rule, the decoder determines m frames of downmix signal from the at least one frame of downmix signal preceding the Nth frame downmix signal, and obtains the Nth frame downmix signal from those m frames based on a predetermined first algorithm, where m is a positive integer.
- For example, the average of the (N-3)th, (N-2)th, and (N-1)th frame downmix signals may be taken as the Nth frame downmix signal, the (N-1)th frame downmix signal may be used directly as the Nth frame downmix signal, or the Nth frame downmix signal may be estimated by another algorithm.
- Alternatively, the Nth frame downmix signal may be computed by a preset algorithm from the (N-1)th frame downmix signal and a preset offset value.
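The three estimation options listed above can be sketched as one helper. This is a minimal Python illustration, assuming each earlier downmix frame is available as a list of samples of equal length; the function name is hypothetical, and the "offset" branch uses a plain additive offset as a stand-in for the unspecified preset algorithm.

```python
from typing import List, Sequence


def conceal_downmix(history: Sequence[Sequence[float]], method: str = "average",
                    offset: float = 0.0) -> List[float]:
    """Estimate the Nth frame downmix signal from earlier decoded frames.

    history holds the previous frames' downmix samples, oldest first,
    all of the same length.
    """
    if not history:
        raise ValueError("no earlier downmix frame available")
    if method == "average":
        # Sample-wise mean of the (N-3)th, (N-2)th and (N-1)th frames
        # (or of fewer frames if fewer are available).
        recent = history[-3:]
        return [sum(samples) / len(recent) for samples in zip(*recent)]
    if method == "repeat":
        # Use the (N-1)th frame directly.
        return list(history[-1])
    if method == "offset":
        # Hypothetical "preset algorithm": add a preset offset to each sample.
        return [x + offset for x in history[-1]]
    raise ValueError(f"unknown method: {method}")
```

Averaging smooths the estimate but lags behind changes; repeating the (N-1)th frame tracks the most recent content but can sound static over long runs of missing downmix frames.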
- Step 111: According to target stereo parameters in the Nth frame stereo parameter set, the decoder restores the Nth frame downmix signal to the Nth frame audio signals of the two channels based on a predetermined second algorithm.
- The target stereo parameters are at least one stereo parameter in the Nth frame stereo parameter set.
- The process in which the decoder restores the Nth frame downmix signal to the Nth frame audio signals of the two channels is the inverse of the process in which the encoder downmixes those audio signals into the Nth frame downmix signal. For example, if the encoder downmixed the Nth frame according to the IPD and ILD in the Nth frame stereo parameter set, the decoder restores the Nth frame downmix signal to the Nth frame signal of each channel in the Kth pair of channels according to the same IPD and ILD.
- The downmix restoration algorithm preset in the decoder may be the inverse of the downmix generation algorithm in the encoder, or an algorithm independent of it.
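As one illustration of restoring two channels from a downmix with the ILD, the sketch below applies a common parametric-stereo construction: per subband, the left and right gains have the ratio given by the ILD and energies that sum to 2. This is an assumption, not the patent's predetermined second algorithm. It also assumes the ILD is expressed in dB as 20·log10 of the left/right amplitude ratio, and it ignores the IPD (no phase restoration).

```python
import math
from typing import List, Tuple


def ild_gains(ild_db: float) -> Tuple[float, float]:
    """Left/right gains whose ratio matches the ILD and whose energies sum to 2."""
    c = 10 ** (ild_db / 20)          # assumed: left/right amplitude ratio
    norm = math.sqrt(1 + c * c)
    return math.sqrt(2) * c / norm, math.sqrt(2) / norm


def upmix_subband(downmix: List[float], ild_db: float) -> Tuple[List[float], List[float]]:
    """Split one subband of the downmix into left and right using only the ILD."""
    g_left, g_right = ild_gains(ild_db)
    return [g_left * x for x in downmix], [g_right * x for x in downmix]
```

At an ILD of 0 dB both gains are 1, so both channels equal the downmix; at 20 dB the left gain is ten times the right gain while the total energy stays unchanged.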
- a method for processing a multi-channel audio signal according to Embodiment 2 of the present invention includes:
- Step 200: The encoder generates an Nth frame stereo parameter set from the Nth frame audio signals of two of the multiple channels, where the stereo parameter set includes Z stereo parameters.
- The Z stereo parameters include the parameters used by the encoder to downmix the Nth frame audio signals based on a predetermined first algorithm, and Z is a positive integer.
- The predetermined first algorithm is a downmix signal generation algorithm preset in the encoder.
- The stereo parameter generation algorithm preset in the encoder determines which stereo parameters are included in the Nth frame stereo parameter set.
- Suppose one of the two channels is the left channel and the other is the right channel, and the preset stereo parameter generation algorithm is as follows; the stereo parameter obtained from the Nth frame audio signals is then the ITD:
- N is the frame length
- l (j) represents the time domain signal frame of the left channel at time j
- r (j) represents the time domain signal frame of the right channel at time j
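The ITD formula itself did not survive extraction. A widely used estimator chooses the lag that maximises the cross-correlation of l(j) and r(j) over the frame; the sketch below implements that estimator in plain Python as an assumption, not as the patent's formula. `max_lag` is a hypothetical search range in samples.

```python
from typing import Sequence


def estimate_itd(l: Sequence[float], r: Sequence[float], max_lag: int) -> int:
    """Return the lag maximising sum_j l[j] * r[j - lag] over |lag| <= max_lag.

    A positive result means the left channel lags the right channel.
    """
    n = len(l)
    best_lag, best_val = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        acc = 0.0
        for j in range(n):
            k = j - lag
            if 0 <= k < len(r):   # only overlapping samples contribute
                acc += l[j] * r[k]
        if acc > best_val:
            best_lag, best_val = lag, acc
    return best_lag
```

This direct search costs O(frame length × number of lags); practical encoders usually compute the cross-correlation in the frequency domain instead.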
- The preset stereo parameter generation algorithm further includes the following algorithm for generating the IPD:
- Alternatively, the IPD can be obtained as follows. Specifically, the IPD of the bth subband satisfies the following expression:
- B is the total number of subbands occupied by the audio signal in the frequency domain
- L(k) is the Nth frame left-channel signal at the kth frequency bin,
- R*(k) is the complex conjugate of the Nth frame right-channel signal at the kth frequency bin.
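The IPD expression is also missing. A common form that is consistent with the symbols defined above takes, for each subband b, the phase angle of the sum of L(k)R*(k) over the frequency bins of that subband, that is $\mathrm{IPD}(b) = \angle \sum_{k \in b} L(k)R^*(k)$. The sketch below assumes that form and assumes the subband boundaries are supplied as a hypothetical `band_edges` list; both are assumptions rather than the patent's exact expression.

```python
import cmath
from typing import List, Sequence


def subband_ipd(L: Sequence[complex], R: Sequence[complex],
                band_edges: Sequence[int]) -> List[float]:
    """IPD(b) = phase of the sum of L(k) * conj(R(k)) over the bins of subband b.

    Subband b covers bins band_edges[b] .. band_edges[b + 1] - 1, so there
    are B = len(band_edges) - 1 subbands.
    """
    ipds = []
    for b in range(len(band_edges) - 1):
        cross = sum((L[k] * R[k].conjugate()
                     for k in range(band_edges[b], band_edges[b + 1])), 0j)
        ipds.append(cmath.phase(cross))  # radians in (-pi, pi]
    return ipds
```

Summing the cross-spectrum before taking the phase weights each bin by its magnitude, so loud bins dominate the subband IPD.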
- The preset stereo parameter generation algorithm may further include the ILD generation algorithm of Embodiment 1 of the present invention, so that the ILD can also be obtained.
- Step 201: According to at least one stereo parameter in the Nth frame stereo parameter set, the encoder downmixes the Nth frame audio signals of the two channels into the Nth frame downmix signal based on a predetermined algorithm.
- For the manner of obtaining the Nth frame downmix signal, refer to Embodiment 1 of the present invention; the embodiments of the present invention are not limited to that manner.
- Step 202: The encoder detects whether the Nth frame downmix signal includes a speech signal. If yes, step 203 is performed; otherwise, step 204 is performed.
- The encoder may detect whether the Nth frame downmix signal includes a speech signal in the manner described in Embodiment 1 of the present invention.
- Step 203: The encoder encodes the Nth frame downmix signal at the preset speech frame coding rate and encodes the Nth frame stereo parameter set, and step 211 is performed.
- When the encoder supports two modes for encoding the stereo parameter set, a first coding mode and a second coding mode, where the coding rate specified by the first coding mode is not less than that specified by the second coding mode and/or, for any stereo parameter in the Nth frame stereo parameter set, the quantization precision specified by the first coding mode is not lower than that specified by the second coding mode, the encoder encodes the Nth frame stereo parameter set in the first coding mode in step 203.
- For example, if the Nth frame stereo parameter set includes the IPD and the ITD, the quantization precision of the IPD specified by the first coding mode is not lower than that specified by the second coding mode, and the quantization precision of the ITD specified by the first coding mode is not lower than that specified by the second coding mode.
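What "quantization precision not lower" means can be illustrated with a uniform scalar quantizer: more bits give a smaller step and a smaller worst-case error. The sketch below is a minimal Python illustration; the 5-bit and 3-bit budgets standing in for the first and second coding modes, and the choice of a uniform quantizer over [-π, π] for the IPD, are hypothetical.

```python
import math
from typing import Tuple


def uniform_quantize(x: float, lo: float, hi: float, bits: int) -> Tuple[int, float]:
    """Quantise x to one of 2**bits evenly spaced levels in [lo, hi].

    Returns (index to transmit, reconstructed value). For inputs inside
    [lo, hi] the error is at most half a step.
    """
    levels = 2 ** bits
    step = (hi - lo) / (levels - 1)
    x = min(max(x, lo), hi)            # clip out-of-range inputs
    index = round((x - lo) / step)
    return index, lo + index * step


FINE_BITS, COARSE_BITS = 5, 3  # hypothetical first / second coding mode precisions

ipd = 1.0
_, fine = uniform_quantize(ipd, -math.pi, math.pi, FINE_BITS)
_, coarse = uniform_quantize(ipd, -math.pi, math.pi, COARSE_BITS)
```

Here the 5-bit quantizer reconstructs an IPD of 1.0 rad as about 0.91 rad, while the 3-bit quantizer gives about 1.35 rad: the finer mode spends more bits for a smaller error.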
- The preset speech frame coding rate can be set to 13.2 kbps.
- Step 204: The encoder determines whether the Nth frame downmix signal satisfies the preset speech frame coding condition. If yes, step 205 is performed; otherwise, step 206 is performed.
- Step 205: The encoder encodes the Nth frame downmix signal at the preset speech frame coding rate and encodes the Nth frame stereo parameter set, and step 211 is performed.
- When the encoder supports two modes for encoding the stereo parameter set, a first coding mode and a second coding mode, where the coding rate specified by the first coding mode is not less than that specified by the second coding mode and/or, for any stereo parameter in the Nth frame stereo parameter set, the quantization precision specified by the first coding mode is not lower than that specified by the second coding mode, the encoder encodes the Nth frame stereo parameter set in the first coding mode in step 205.
- Step 206: The encoder determines whether the Nth frame downmix signal satisfies the preset SID coding condition and whether the Nth frame stereo parameter set satisfies the preset stereo parameter coding condition. If both conditions are satisfied, step 207 is performed. If the Nth frame downmix signal satisfies the preset SID coding condition but the Nth frame stereo parameter set does not satisfy the preset stereo parameter coding condition, step 208 is performed. If the Nth frame downmix signal does not satisfy the preset SID coding condition but the Nth frame stereo parameter set satisfies the preset stereo parameter coding condition, step 209 is performed. If neither condition is satisfied, step 210 is performed.
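The branching of steps 202 to 210 can be condensed into one decision function. This is a minimal Python sketch in which the four boolean inputs stand for the outcomes of the speech detection and of the three preset conditions; the names `Decision`, `decide` and `frame_type` are hypothetical. The frame-type mapping follows this embodiment's definitions of the third to sixth type frames by their contents.

```python
from typing import NamedTuple


class Decision(NamedTuple):
    step: int        # step of Embodiment 2 that is executed
    downmix: str     # "speech", "sid" or "none"
    stereo: bool     # whether (at least part of) the stereo parameter set is encoded


def decide(has_speech: bool, speech_cond: bool, sid_cond: bool,
           stereo_cond: bool) -> Decision:
    """Encoder decision of steps 202 to 210."""
    if has_speech:
        return Decision(203, "speech", True)
    if speech_cond:
        return Decision(205, "speech", True)
    if sid_cond and stereo_cond:
        return Decision(207, "sid", True)
    if sid_cond:
        return Decision(208, "sid", False)
    if stereo_cond:
        return Decision(209, "none", True)
    return Decision(210, "none", False)


def frame_type(d: Decision) -> str:
    """Map the frame contents onto the third to sixth frame types."""
    if d.downmix != "none":
        return "fifth" if d.stereo else "sixth"
    return "third" if d.stereo else "fourth"
```

The stereo parameter condition only matters once the frame has fallen through both the speech detection and the speech frame coding condition, which matches the order of steps 202, 204 and 206.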
- Before the encoder encodes the at least one stereo parameter in the Nth frame stereo parameter set, it determines whether each of those stereo parameters satisfies its corresponding preset stereo parameter coding condition. Specifically, if the at least one stereo parameter in the Nth frame stereo parameter set includes the inter-channel level difference (ILD), the preset stereo parameter coding condition includes $D_L \ge D_0$, where $D_L$ represents the degree of deviation of the ILD from a first standard, the first standard is determined by a predetermined third algorithm from the T frames of stereo parameter sets preceding the Nth frame stereo parameter set, and T is a positive integer;
- if the at least one stereo parameter in the Nth frame stereo parameter set includes the inter-channel time difference (ITD), the preset stereo parameter coding condition includes $D_T \ge D_1$, where $D_T$ represents the degree of deviation of the ITD from a second standard, the second standard is determined by a predetermined fourth algorithm from the T frames of stereo parameter sets preceding the Nth frame stereo parameter set, and T is a positive integer;
- if the at least one stereo parameter in the Nth frame stereo parameter set includes the inter-channel phase difference (IPD), the preset stereo parameter coding condition includes $D_P \ge D_2$, where $D_P$ represents the degree of deviation of the IPD from a third standard, the third standard is determined by a predetermined fifth algorithm from the T frames of stereo parameter sets preceding the Nth frame stereo parameter set, and T is a positive integer.
- the third algorithm, the fourth algorithm, and the fifth algorithm are preset according to actual conditions.
- For example, if the preset stereo parameter coding condition includes only $D_T \ge D_1$, the encoder encodes the at least one stereo parameter in the Nth frame stereo parameter set when the ITD it includes satisfies $D_T \ge D_1$;
- if the at least one stereo parameter in the Nth frame stereo parameter set includes only the ITD and the IPD, and the preset stereo parameter coding condition includes only $D_T \ge D_1$, the at least one stereo parameter in the Nth frame stereo parameter set is encoded when the ITD it includes satisfies $D_T \ge D_1$;
- if the preset stereo parameter coding conditions include both $D_T \ge D_1$ and $D_L \ge D_0$, the encoder encodes the ITD and the ILD only when, in the at least one stereo parameter of the Nth frame stereo parameter set, the ITD satisfies $D_T \ge D_1$ and the ILD satisfies $D_L \ge D_0$.
- $D_L$, $D_T$, and $D_P$ respectively satisfy the following expressions:
- where $\mathrm{ILD}(m)$ is the level difference between the two channels for the Nth frame audio signal in the mth subband,
- M is the total number of subbands occupied by the Nth frame audio signal.
- T is a positive integer greater than 0
- $\mathrm{ILD}^{[-t]}(m)$ is the level difference between the two channels, in the mth subband, for the tth frame audio signal before the Nth frame audio signal; $\mathrm{ITD}$ is the time difference between the two channels for the Nth frame audio signal;
- $\mathrm{ITD}^{[-t]}$ is the time difference between the two channels for the tth frame audio signal before the Nth frame audio signal, and the average ITD over the T frames of stereo parameter sets before the Nth frame is $\frac{1}{T}\sum_{t=1}^{T}\mathrm{ITD}^{[-t]}$; $\mathrm{IPD}(m)$ is the phase difference between the two channels for the part of the Nth frame audio signal in the mth subband; $\mathrm{IPD}^{[-t]}(m)$ is the phase difference between the two channels, in the mth subband, for the tth frame audio signal before the Nth frame audio signal, and the average IPD in the mth subband over the T frames of stereo parameter sets before the Nth frame is $\frac{1}{T}\sum_{t=1}^{T}\mathrm{IPD}^{[-t]}(m)$.
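The expressions for $D_L$, $D_T$ and $D_P$ were lost in extraction. The sketch below assumes mean-absolute-deviation forms built only from the quantities defined above: each standard is taken to be the T-frame average, $D_L$ and $D_P$ average the absolute difference from that standard over the M subbands, and $D_T$ is the absolute difference between the ITD and its T-frame average. These forms, the thresholds, and all function names are assumptions, not the patent's third to fifth algorithms; the IPD is also treated linearly here, without phase wrapping.

```python
from statistics import fmean
from typing import List, Sequence


def average_per_subband(history: Sequence[Sequence[float]]) -> List[float]:
    """Average over the T previous frames, subband by subband."""
    return [fmean(values) for values in zip(*history)]


def subband_deviation(current: Sequence[float],
                      history: Sequence[Sequence[float]]) -> float:
    """Assumed D_L / D_P: mean over M subbands of |X(m) - average of X[-t](m)|."""
    reference = average_per_subband(history)
    return fmean(abs(c - r) for c, r in zip(current, reference))


def itd_deviation(itd: float, itd_history: Sequence[float]) -> float:
    """Assumed D_T: |ITD - average of ITD[-t] over the T previous frames|."""
    return abs(itd - fmean(itd_history))


def stereo_condition(ild: Sequence[float], ild_history: Sequence[Sequence[float]],
                     itd: float, itd_history: Sequence[float],
                     d0: float, d1: float) -> bool:
    """Condition containing both D_L >= D_0 and D_T >= D_1."""
    return (subband_deviation(ild, ild_history) >= d0
            and itd_deviation(itd, itd_history) >= d1)
```

Under this reading, stereo parameters are sent only when they have drifted far enough from recent history; otherwise the decoder's estimate from earlier frames is considered good enough.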
- Step 207: The encoder encodes the Nth frame downmix signal at the preset SID coding rate and encodes at least one stereo parameter in the Nth frame stereo parameter set, and step 211 is performed.
- When the encoder supports two modes for encoding the stereo parameter set, a first coding mode and a second coding mode, where the coding rate specified by the first coding mode is not less than that specified by the second coding mode, the encoder encodes the at least one stereo parameter in the Nth frame stereo parameter set in the second coding mode.
- For example, in the first coding mode the encoder encodes the Nth frame stereo parameter set at 4.2 kbps, and in the second coding mode it encodes the Nth frame stereo parameter set at 1.2 kbps.
- Optionally, the encoder derives X target stereo parameters from the Z stereo parameters in the Nth frame stereo parameter set according to a preset stereo parameter dimension reduction rule and encodes the X target stereo parameters, where X is a positive integer not greater than Z.
- For example, the Nth frame stereo parameter set includes three types of stereo parameters: IPD, ITD, and ILD. The ILD consists of the ILDs of 10 subbands, ILD(0)...ILD(9); the IPD consists of the IPDs of 10 subbands, IPD(0)...IPD(9); and the ITD consists of the ITDs of 2 time-domain subbands, ITD(0) and ITD(1). Suppose the preset stereo parameter dimension reduction rule is to keep only two types of stereo parameters in the stereo parameter set.
- The encoder then selects any two types of stereo parameters from the IPD, ITD, and ILD; for example, if the IPD and ILD are selected, the encoder encodes the IPD and ILD.
- If the preset stereo parameter dimension reduction rule is to keep only half of the stereo parameters of each type, the encoder selects 5 of ILD(0)...ILD(9), 5 of IPD(0)...IPD(9), and 1 of ITD(0) and ITD(1), and encodes the selected parameters. Alternatively, the preset rule may be to select 5 parameters each from the ILD and the IPD, or the preset stereo parameter dimension reduction rule may be to merge adjacent subbands.
- For example, adjacent subbands in ILD(0)...ILD(9) are merged: the mean of ILD(0) and ILD(1) gives the new ILD(0), the mean of ILD(2) and ILD(3) gives the new ILD(1), ..., and the mean of ILD(8) and ILD(9) gives the new ILD(4). The subband of the new ILD(0) equals the subbands of the original ILD(0) and ILD(1) together, ..., and the subband of the new ILD(4) equals the subbands of the original ILD(8) and ILD(9) together.
- Likewise, adjacent subbands in IPD(0)...IPD(9) are merged to obtain the new IPD(0)...IPD(4), and ITD(0) and ITD(1) are averaged to obtain a new ITD(0), whose time-domain signal is the same as the time-domain signals of the original ITD(0) and ITD(1) together.
- The new ILD(0)...ILD(4), the new IPD(0)...IPD(4), and the new ITD(0) are then encoded.
- If the preset stereo parameter dimension reduction rule is to reduce the frequency-domain resolution of the ILD, adjacent subbands in ILD(0)...ILD(9) are merged: the mean of ILD(0) and ILD(1) gives the new ILD(0), the mean of ILD(2) and ILD(3) gives the new ILD(1), ..., and the mean of ILD(8) and ILD(9) gives the new ILD(4), where the subband of the new ILD(0) equals the subbands of the original ILD(0) and ILD(1) together, ..., and the subband of the new ILD(4) equals the subbands of the original ILD(8) and ILD(9) together. The new ILD(0)...ILD(4) are then encoded.
- Step 208: The encoder encodes the Nth frame downmix signal at the preset SID coding rate and does not encode the at least one stereo parameter in the Nth frame stereo parameter set, and step 213 is performed.
- Step 209: The encoder encodes at least one stereo parameter in the Nth frame stereo parameter set and does not encode the Nth frame downmix signal, and step 215 is performed.
- Step 210: The encoder encodes neither the Nth frame downmix signal nor the Nth frame stereo parameter set, and step 217 is performed.
- The code stream therefore includes four types of frames: third type frames, fourth type frames, fifth type frames, and sixth type frames, where:
- a third type frame includes a stereo parameter set but no downmix signal;
- a fourth type frame includes neither a downmix signal nor a stereo parameter set;
- a fifth type frame includes both a downmix signal and a stereo parameter set;
- a sixth type frame includes a downmix signal but no stereo parameter set.
- The fifth and sixth type frames are the two cases of frames that include a downmix signal, and the third and fourth type frames are the two cases of frames that do not include a downmix signal.
- The Nth frame code stream obtained in step 203, 205, or 207 is a fifth type frame; the Nth frame code stream obtained in step 208 is a sixth type frame; the Nth frame code stream obtained in step 209 is a third type frame; and the Nth frame code stream obtained in step 210 is a fourth type frame.
- Step 211: The encoder sends the Nth frame code stream to the decoder, where the Nth frame code stream includes the Nth frame downmix signal and the Nth frame stereo parameter set.
- Step 212: The decoder receives the Nth frame code stream, determines that it is a fifth type frame, and decodes it to obtain the Nth frame downmix signal and the Nth frame stereo parameter set, and step 218 is performed.
- For the specific manner in which the decoder determines the type of the Nth frame code stream, refer to Embodiment 1 of the present invention.
- The decoder decodes the Nth frame code stream according to its corresponding rates. Specifically, if the encoder encoded the Nth frame downmix signal at 13.2 kbps, the decoder decodes the portion of the Nth frame code stream carrying the Nth frame downmix signal at 13.2 kbps; if the encoder encoded the Nth frame stereo parameter set at 4.2 kbps, the decoder decodes the portion of the Nth frame code stream carrying the Nth frame stereo parameter set at 4.2 kbps.
- Step 213: The encoder sends the Nth frame code stream to the decoder, where the Nth frame code stream includes the Nth frame downmix signal.
- Step 214: The decoder determines that the Nth frame code stream is a sixth type frame and decodes it to obtain the Nth frame downmix signal. According to a preset second rule, the decoder determines k frames of stereo parameter sets from the at least one frame of stereo parameter set preceding the Nth frame stereo parameter set, and obtains the Nth frame stereo parameter set from those k frames based on a predetermined sixth algorithm, and step 218 is performed.
- For example, the preset second rule may specify the stereo parameter set of the decoded frame closest to the Nth frame, and each stereo parameter P of the Nth frame is then obtained by the following algorithm: $P = \hat{P} + \delta$,
- where P represents a stereo parameter of the Nth frame, $\hat{P}$ represents the corresponding stereo parameter of the frame that is closest to the Nth frame and was obtained by decoding, and δ is a random number with a relatively small absolute value; for example, δ may be a random number within a small preset range.
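Under the additive reading $P = \hat{P} + \delta$, estimating a missing stereo parameter set amounts to copying the closest decoded set and adding small random jitter to each value. The sketch below is a minimal Python illustration; the bound `max_jitter` is a hypothetical value, since the original range for δ is not given here, and the dictionary layout of a stereo parameter set is also an assumption.

```python
import random
from typing import Dict, List, Optional


def estimate_stereo_set(last_decoded: Dict[str, List[float]],
                        max_jitter: float = 0.05,
                        rng: Optional[random.Random] = None) -> Dict[str, List[float]]:
    """P = P_hat + delta for every parameter, delta uniform in [-max_jitter, max_jitter].

    last_decoded maps a parameter type (e.g. "ILD") to its per-subband values
    from the closest decoded frame.
    """
    rng = rng or random.Random()
    return {name: [p + rng.uniform(-max_jitter, max_jitter) for p in values]
            for name, values in last_decoded.items()}
```

Passing a seeded `random.Random` makes the estimate reproducible, which is convenient for testing; the small random term avoids reproducing exactly the same stereo image frame after frame.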
- The manner of obtaining the stereo parameters in the Nth frame stereo parameter set is not limited to the foregoing method.
- Step 215: The encoder sends the Nth frame code stream to the decoder, where the Nth frame code stream includes at least one stereo parameter in the Nth frame stereo parameter set.
- Step 216: The decoder determines that the Nth frame code stream is a third type frame and decodes it to obtain at least one stereo parameter in the Nth frame stereo parameter set. According to the preset first rule, the decoder determines m frames of downmix signal from the at least one frame of downmix signal preceding the Nth frame downmix signal, and obtains the Nth frame downmix signal from those m frames based on the predetermined second algorithm, where m is a positive integer, and step 218 is performed.
- For example, the average of the (N-3)th, (N-2)th, and (N-1)th frame downmix signals may be taken as the Nth frame downmix signal, the (N-1)th frame downmix signal may be used directly as the Nth frame downmix signal, or the Nth frame downmix signal may be estimated by another algorithm.
- Alternatively, the Nth frame downmix signal may be computed by a preset algorithm from the (N-1)th frame downmix signal and a preset offset value.
- Step 217: After receiving the Nth frame code stream and determining that it is a fourth type frame, the decoder determines, according to the preset second rule, k frames of stereo parameter sets from the at least one frame of stereo parameter set preceding the Nth frame stereo parameter set, and obtains the Nth frame stereo parameter set from them based on the predetermined sixth algorithm;
- the decoder also determines m frames of downmix signal from the at least one frame of downmix signal preceding the Nth frame downmix signal, and obtains the Nth frame downmix signal from them based on the predetermined second algorithm, where m is a positive integer.
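Steps 212, 214, 216 and 217 share one pattern: decode whatever the frame carries and estimate whatever it lacks from earlier frames. The sketch below is a minimal Python illustration of that pattern; the function name and data layout are hypothetical, and for brevity it estimates a missing part by reusing the previous frame (the "use the (N-1)th frame directly" option for the downmix, and the previous stereo set without the random term).

```python
from typing import Dict, List, Optional, Tuple

Downmix = List[float]
StereoSet = Dict[str, List[float]]


def decode_or_estimate(downmix: Optional[Downmix], stereo: Optional[StereoSet],
                       downmix_history: List[Downmix],
                       stereo_history: List[StereoSet]) -> Tuple[Downmix, StereoSet]:
    """Fill in whatever the Nth frame code stream did not carry.

    Fifth type frame: both parts present. Sixth: stereo set missing.
    Third: downmix missing. Fourth: both missing.
    """
    if downmix is None:
        downmix = list(downmix_history[-1])
    if stereo is None:
        stereo = {name: list(values) for name, values in stereo_history[-1].items()}
    # The decoded or estimated values become history for the next frame.
    downmix_history.append(downmix)
    stereo_history.append(stereo)
    return downmix, stereo
```

Appending estimated values to the history lets a run of consecutive fourth type frames keep propagating the last decoded state.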
- Step 218: According to target stereo parameters in the Nth frame stereo parameter set, the decoder restores the Nth frame downmix signal to the Nth frame audio signals of the two channels based on a predetermined seventh algorithm.
- When the encoder detects whether the Nth frame downmix signal includes a speech signal by means of the Nth frame audio signals of the two channels, a coding manner for the stereo parameter set is also provided. Specifically, if the encoder detects that the Nth frame audio signal of either of the two channels includes a speech signal, it obtains the Nth frame stereo parameter set from the Nth frame audio signals based on a first stereo parameter set generation manner and encodes the Nth frame stereo parameter set.
- When the encoder detects that the Nth frame audio signals of the two channels do not include a speech signal: if the Nth frame audio signal satisfies the preset speech frame coding condition, the encoder obtains the Nth frame stereo parameter set from the Nth frame audio signal based on the first stereo parameter set generation manner and encodes it; if the Nth frame audio signal does not satisfy the preset speech frame coding condition, the encoder obtains the Nth frame stereo parameter set from the Nth frame audio signal based on a second stereo parameter set generation manner, and …
- the first stereo parameter set generation manner and the second stereo parameter set generation manner satisfy at least one of the following conditions:
- the number of stereo parameter types in the stereo parameter set specified by the first stereo parameter set generation manner is not less than the number specified by the second stereo parameter set generation manner;
- the number of stereo parameters in the stereo parameter set specified by the first stereo parameter set generation manner is not less than the number specified by the second stereo parameter set generation manner;
- the time-domain resolution of each stereo parameter specified by the first stereo parameter set generation manner is not lower than that of the corresponding stereo parameter specified by the second stereo parameter set generation manner;
- the frequency-domain resolution of each stereo parameter specified by the first stereo parameter set generation manner is not lower than that of the corresponding stereo parameter specified by the second stereo parameter set generation manner.
- Thus, the stereo parameter set obtained by the first stereo parameter set generation manner is at least as precise in the frequency domain or the time domain as the stereo parameter set obtained by the second stereo parameter set generation manner.
- When the encoder detects that the Nth frame downmix signal includes a speech signal, it encodes the Nth frame downmix signal at the speech coding rate and encodes the Nth frame stereo parameter set. When the encoder detects that the Nth frame downmix signal does not include a speech signal: if the Nth frame downmix signal satisfies the preset speech frame coding condition, the encoder encodes it at the speech coding rate and encodes the Nth frame stereo parameter set; if the Nth frame downmix signal does not satisfy the preset speech frame coding condition but satisfies the preset SID coding condition, the encoder encodes it at the SID coding rate and encodes at least one stereo parameter in the Nth frame stereo parameter set; if the Nth frame downmix signal does not satisfy the preset speech frame …
- In this manner, the encoder does not evaluate the stereo parameter set separately: the stereo parameter set is encoded when, and only when, the downmix signal is encoded.
- The code stream obtained with this third coding manner of the embodiments of the present invention includes two types of frames: first type frames, which include both a downmix signal and a stereo parameter set, and second type frames, which include neither a downmix signal nor a stereo parameter set.
- For the specific manner in which the decoder receives the code stream and restores the audio signals of the two channels, refer to Embodiment 2 and Embodiment 1 of the present invention.
- The encoder determines whether the Nth frame stereo parameter set satisfies the preset stereo parameter coding condition. If yes, the encoder does not encode the Nth frame downmix signal but encodes at least one stereo parameter in the Nth frame stereo parameter set; otherwise, the encoder encodes neither the Nth frame downmix signal nor the Nth frame stereo parameter set.
- The code stream obtained with this coding manner includes three types of frames: first type frames, which include both a downmix signal and a stereo parameter set; third type frames, which include a stereo parameter set but no downmix signal; and fourth type frames, which include neither a downmix signal nor a stereo parameter set.
- For the specific manner in which the decoder receives the code stream and restores the audio signals of the two channels, refer to Embodiment 2 and Embodiment 1 of the present invention.
- This technical solution differs from Embodiment 2 of the present invention in that, when the Nth frame downmix signal satisfies neither the preset speech frame coding condition nor the preset SID coding condition, the encoder determines whether the Nth frame stereo parameter set satisfies the preset stereo parameter coding condition.
- When the encoder detects that the Nth frame downmix signal includes a speech signal, it encodes the Nth frame downmix signal at the speech coding rate and encodes the Nth frame stereo parameter set. When the encoder detects that the Nth frame downmix signal does not include a speech signal: if the Nth frame downmix signal satisfies the preset speech frame coding condition, the encoder encodes it at the speech coding rate and encodes the Nth frame stereo parameter set; if the Nth frame downmix signal does not satisfy the preset speech frame coding condition but satisfies the preset SID coding condition, the encoder determines whether the Nth frame stereo parameter set satisfies the preset stereo parameter coding condition, and when the Nth frame stereo parameter set satisfies the preset stereo parameter coding condition, …
- the code stream obtained by the fourth coding method in the embodiment of the present invention includes three types of frames: a fifth type frame, a sixth type frame and a second type frame. The fifth type frame includes a downmix signal and a stereo parameter set, the sixth type frame includes a downmix signal but no stereo parameter set, and the second type frame includes neither a downmix signal nor a stereo parameter set.
- the difference between Embodiment 4 and Embodiment 2 of the present invention is as follows: when the Nth frame downmix signal does not satisfy the preset speech frame coding condition but satisfies the preset SID coding condition, the encoder determines whether to encode at least one stereo parameter in the Nth frame stereo parameter set; when the Nth frame downmix signal satisfies neither the preset speech frame coding condition nor the preset SID coding condition, the Nth frame stereo parameter set is not encoded.
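A minimal sketch of the fourth coding method's frame-type decision, assuming the outcomes of speech detection and of the preset speech frame, SID and stereo parameter encoding conditions are already available as booleans (the names `Method4Frame` and `classify_method4` are hypothetical). The stereo parameter condition only matters in the SID branch, where it decides between a fifth type and a sixth type frame:

```python
from enum import Enum


class Method4Frame(Enum):
    """Frame types in the code stream of the fourth coding method."""
    FIFTH = "downmix signal + stereo parameter set"
    SIXTH = "downmix signal only"
    SECOND = "neither downmix signal nor stereo parameter set"


def classify_method4(
    has_speech: bool,
    meets_speech_frame_cond: bool,
    meets_sid_cond: bool,
    meets_stereo_param_cond: bool,
) -> Method4Frame:
    """Pick the Nth frame's type from four (assumed) detector outcomes."""
    if has_speech or meets_speech_frame_cond:
        return Method4Frame.FIFTH   # downmix at speech rate + parameter set
    if meets_sid_cond:
        # Downmix at SID rate; parameters only if the stereo condition holds.
        if meets_stereo_param_cond:
            return Method4Frame.FIFTH
        return Method4Frame.SIXTH
    return Method4Frame.SECOND      # nothing is encoded for this frame
```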
- for the manner in which the decoder obtains the Nth frame downmix signal and the Nth frame stereo parameter set, refer to Embodiment 2 and Embodiment 1 of the present invention; for the manner of coding the stereo parameters and the downmix signal, reference may also be made to Embodiment 2 and Embodiment 1 of the present invention.
- in the embodiments of the present invention, the ordinals in terms such as "predetermined first algorithm" and "predetermined second algorithm" have no special meaning and serve only to distinguish different algorithms; "third", "fourth", "fifth", "sixth", "seventh" and so on are used in the same way, and this is not repeated here.
- an embodiment of the present invention further provides an encoder, a decoder, and a codec system.
- the methods corresponding to the encoder, the decoder and the codec system of the embodiments of the present invention are the methods of the embodiments of the present invention; therefore, for the implementation of the encoder, the decoder and the codec system, refer to the implementation of the methods, and repeated descriptions are omitted.
- the encoder of the embodiment of the present invention includes a signal detecting unit 300 and a signal encoding unit 310. The signal detecting unit 300 is configured to detect whether the Nth frame downmix signal contains a speech signal, where the Nth frame downmix signal is obtained by mixing the Nth frame audio signals of two channels of the multichannel signal based on a predetermined first algorithm, and N is a positive integer greater than zero. The signal encoding unit 310 is configured to encode the Nth frame downmix signal when the signal detecting unit 300 detects that it contains a speech signal; when the signal detecting unit 300 detects that the Nth frame downmix signal does not contain a speech signal, the signal encoding unit 310 encodes the Nth frame downmix signal if the signal detecting unit 300 determines that it satisfies the preset audio frame coding condition, and does not encode it if the signal detecting unit 300 determines that it does not satisfy that condition.
- the signal encoding unit 310 includes a first signal encoding unit 311 and a second signal encoding unit 312.
- when the signal detecting unit 300 detects that the Nth frame downmix signal contains a speech signal, the signal detecting unit 300 notifies the first signal encoding unit 311 to encode the Nth frame downmix signal;
- when the signal detecting unit 300 detects no speech signal but determines that the Nth frame downmix signal satisfies the preset speech frame encoding condition, it likewise notifies the first signal encoding unit 311 to encode the Nth frame downmix signal;
- the first signal encoding unit 311 is configured to encode the Nth frame downmix signal at the preset speech frame encoding rate;
- when the signal detecting unit 300 determines that the Nth frame downmix signal does not satisfy the preset speech frame encoding condition but satisfies the preset SID encoding condition, the second signal encoding unit 312 is notified to encode the Nth frame downmix signal. Specifically, the second signal encoding unit 312 encodes the Nth frame downmix signal at the preset SID encoding rate, where the SID encoding rate is not greater than the speech frame encoding rate.
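The rate selection performed by the signal encoding units 311 and 312 can be sketched as below. The numeric rates are illustrative placeholders, not values given in this document; the only constraint taken from the text is that the SID encoding rate does not exceed the speech frame encoding rate. The function name `downmix_coding_rate` is hypothetical.

```python
from typing import Optional

SPEECH_FRAME_RATE_BPS = 13_200  # illustrative placeholder, not from the source
SID_RATE_BPS = 2_400            # illustrative placeholder, not from the source

# Constraint stated in the text: SID rate <= speech frame rate.
assert SID_RATE_BPS <= SPEECH_FRAME_RATE_BPS


def downmix_coding_rate(
    has_speech: bool, meets_speech_frame_cond: bool, meets_sid_cond: bool
) -> Optional[int]:
    """Bit rate for the Nth frame downmix, or None if it is not encoded."""
    if has_speech or meets_speech_frame_cond:
        return SPEECH_FRAME_RATE_BPS  # handled by first signal encoding unit 311
    if meets_sid_cond:
        return SID_RATE_BPS           # handled by second signal encoding unit 312
    return None                       # downmix is not encoded for this frame
```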
- the encoder shown in FIG. 3a and FIG. 3b further includes a parameter generating unit 320, a parameter encoding unit 330 and a parameter detecting unit 340.
- the parameter generating unit 320 is configured to obtain an Nth frame stereo parameter set from the Nth frame audio signal, where the Nth frame stereo parameter set includes Z stereo parameters, the Z stereo parameters include the parameters used when the encoder mixes the Nth frame audio signal based on the predetermined first algorithm, and Z is a positive integer greater than zero.
- the parameter encoding unit 330 is configured to encode the Nth frame stereo parameter set when the signal detecting unit 300 detects that the Nth frame downmix signal contains a speech signal. When the signal detecting unit 300 detects that the Nth frame downmix signal does not contain a speech signal, the parameter encoding unit 330 encodes at least one stereo parameter in the Nth frame stereo parameter set if the parameter detecting unit 340 determines that the set satisfies the preset stereo parameter encoding condition, and does not encode the stereo parameter set if the parameter detecting unit 340 determines that the condition is not satisfied.
- the parameter encoding unit 330 is configured to obtain X target stereo parameters from the Z stereo parameters in the Nth frame stereo parameter set according to a preset stereo parameter dimension reduction rule, and to encode the X target stereo parameters, where X is a positive integer greater than zero and less than or equal to Z.
- the second parameter encoding unit 332 is configured to obtain X target stereo parameters from the Z stereo parameters in the Nth frame stereo parameter set according to the preset stereo parameter dimension reduction rule, and to encode the X target stereo parameters.
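The document does not specify the preset stereo parameter dimension reduction rule. One plausible rule, shown here purely as an assumption, is to merge adjacent per-subband parameters (for example ILDs) by averaging, which maps Z values to X = ceil(Z / group_size) values with 0 < X ≤ Z. The function `reduce_stereo_params` and its `group_size` parameter are hypothetical.

```python
from typing import List


def reduce_stereo_params(params: List[float], group_size: int) -> List[float]:
    """Hypothetical dimension reduction rule: average each run of `group_size`
    adjacent parameters, turning Z values into X = ceil(Z / group_size)."""
    if not params or group_size < 1:
        raise ValueError("need at least one parameter and group_size >= 1")
    reduced = []
    for i in range(0, len(params), group_size):
        group = params[i:i + group_size]  # the last group may be shorter
        reduced.append(sum(group) / len(group))
    return reduced
```

For example, five subband ILDs `[1.0, 3.0, 2.0, 4.0, 6.0]` with `group_size=2` reduce to `[2.0, 3.0, 6.0]`, so Z = 5 becomes X = 3.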
- the parameter generating unit 320 of the encoder shown in FIG. 3c includes a first parameter generating unit 321 and a second parameter generating unit 322. When the signal detecting unit 300 detects that the Nth frame audio signal contains a speech signal, or detects that it does not contain a speech signal but determines that it satisfies the preset speech frame coding condition, the first parameter generating unit 321 is notified to generate the Nth frame stereo parameter set; when the signal detecting unit 300 detects that the Nth frame audio signal does not contain a speech signal and does not satisfy the preset speech frame coding condition, the second parameter generating unit 322 is notified to generate the Nth frame stereo parameter set.
- specifically, the first parameter generating unit 321 obtains the Nth frame stereo parameter set from the Nth frame audio signal based on a predefined first stereo parameter set generation mode, and the second parameter generating unit 322 obtains the Nth frame stereo parameter set from the Nth frame audio signal based on a predefined second stereo parameter set generation mode.
- the first stereo parameter set generation mode and the second stereo parameter set generation mode satisfy at least one of the following conditions:
  - the number of stereo parameter types included in the stereo parameter set specified by the first generation mode is not less than the number specified by the second generation mode;
  - the number of stereo parameters included in the stereo parameter set specified by the first generation mode is not less than the number specified by the second generation mode;
  - the time-domain resolution of the stereo parameters specified by the first generation mode is not lower than that of the corresponding stereo parameters specified by the second generation mode;
  - the frequency-domain resolution of the stereo parameters specified by the first generation mode is not lower than that of the corresponding stereo parameters specified by the second generation mode.
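To make the four alternative conditions concrete, the sketch below describes a generation mode by its parameter types, parameter count, update interval (time resolution, where a smaller interval means finer resolution) and subband count (frequency resolution), and checks whether the first mode satisfies at least one condition relative to the second. The `GenerationMode` fields and the example values are hypothetical; the document does not state which parameters each mode uses.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class GenerationMode:
    """Hypothetical summary of a stereo parameter set generation mode."""
    param_types: FrozenSet[str]  # e.g. frozenset({"ILD", "ITD", "IPD"})
    num_params: int              # total number of parameters in one set
    update_interval: int         # frames between updates (smaller = finer in time)
    num_subbands: int            # subbands per parameter (more = finer in frequency)


def satisfies_mode_conditions(first: GenerationMode, second: GenerationMode) -> bool:
    """True if `first` meets at least one of the four conditions relative to `second`."""
    return (
        len(first.param_types) >= len(second.param_types)
        or first.num_params >= second.num_params
        or first.update_interval <= second.update_interval
        or first.num_subbands >= second.num_subbands
    )


# Hypothetical example: 12 ILDs + 1 ITD + 12 IPDs every frame for speech-like
# frames, versus 4 ILDs refreshed every 8 frames otherwise.
SPEECH_MODE = GenerationMode(frozenset({"ILD", "ITD", "IPD"}), 25, 1, 12)
SPARSE_MODE = GenerationMode(frozenset({"ILD"}), 4, 8, 4)
```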
- the Nth frame stereo parameter set generated by the first parameter generating unit 321 is encoded by the parameter encoding unit 330; for the Nth frame stereo parameter set generated by the second parameter generating unit 322, the parameter encoding unit 330 encodes at least one stereo parameter in the set if the set satisfies the preset stereo parameter encoding condition, and otherwise the stereo parameter set is not encoded.
- the parameter encoding unit 330 includes a first parameter encoding unit 331 and a second parameter encoding unit 332: the Nth frame stereo parameter set generated by the first parameter generating unit 321 is encoded by the first parameter encoding unit 331, and the Nth frame stereo parameter set generated by the second parameter generating unit 322 is encoded by the second parameter encoding unit 332. The encoding mode of the first parameter encoding unit 331 is predefined as a first encoding mode, and the encoding mode of the second parameter encoding unit 332 is predefined as a second encoding mode.
- the coding rate specified by the first coding mode is not less than the coding rate specified by the second coding mode; and/or, for any stereo parameter in the Nth frame stereo parameter set, the quantization accuracy specified by the first coding mode is not lower than the quantization accuracy specified by the second coding mode.
- the parameter encoding unit 330 includes a first parameter encoding unit 331 and a second parameter encoding unit 332. Specifically, the first parameter encoding unit 331 is configured to encode the Nth frame stereo parameter set in the first encoding mode when the Nth frame downmix signal contains a speech signal, or when it does not contain a speech signal but satisfies the speech frame coding condition; the second parameter encoding unit 332 is configured to encode at least one stereo parameter in the Nth frame stereo parameter set in the second encoding mode when the Nth frame downmix signal does not satisfy the speech frame coding condition.
- the coding rate specified by the first coding mode is not less than the coding rate specified by the second coding mode; and/or, for any stereo parameter in the Nth frame stereo parameter set, the quantization accuracy specified by the first coding mode is not lower than the quantization accuracy specified by the second coding mode.
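The difference in quantization accuracy between the two encoding modes can be illustrated with a uniform scalar quantizer: a smaller step gives a smaller worst-case error (half the step) but needs more bits per index. The step sizes and the ±12 dB ILD range below are illustrative assumptions, not values from this document, and `quantize`, `dequantize` and `bits_needed` are hypothetical helpers.

```python
import math

FIRST_MODE_STEP_DB = 0.5   # hypothetical step for the first encoding mode
SECOND_MODE_STEP_DB = 2.0  # hypothetical step for the second encoding mode


def quantize(value: float, step: float) -> int:
    """Uniform scalar quantizer: index of the reconstruction level nearest value.
    Python's round() sends exact halves to the even index, fine for a sketch."""
    return round(value / step)


def dequantize(index: int, step: float) -> float:
    return index * step


def bits_needed(max_abs: float, step: float) -> int:
    """Bits for a fixed-length index covering [-max_abs, max_abs] with this step."""
    levels = int(round(2 * max_abs / step)) + 1
    return math.ceil(math.log2(levels))
```

With these values a 3.7 dB ILD is reconstructed as 3.5 dB in the first mode (6-bit index) and 4.0 dB in the second mode (4-bit index), which is the rate/accuracy trade-off the two modes express.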
- if the at least one stereo parameter in the Nth frame stereo parameter set includes the inter-channel level difference (ILD), the preset stereo parameter encoding condition includes $D_L \ge D_0$, where $D_L$ represents the degree of deviation of the ILD from a first standard; the first standard is determined from the stereo parameter sets of the T frames preceding the Nth frame stereo parameter set based on a predetermined second algorithm, and T is a positive integer greater than 0;
- if the at least one stereo parameter in the Nth frame stereo parameter set includes the inter-channel time difference (ITD), the preset stereo parameter encoding condition includes $D_T \ge D_1$, where $D_T$ represents the degree of deviation of the ITD from a second standard; the second standard is determined from the stereo parameter sets of the T frames preceding the Nth frame stereo parameter set based on a predetermined third algorithm, and T is a positive integer greater than 0;
- if the at least one stereo parameter in the Nth frame stereo parameter set includes the inter-channel phase difference (IPD), the preset stereo parameter encoding condition includes $D_P \ge D_2$, where $D_P$ represents the degree of deviation of the IPD from a third standard; the third standard is determined from the stereo parameter sets of the T frames preceding the Nth frame stereo parameter set based on a predetermined fourth algorithm, and T is a positive integer greater than 0.
- $D_L$, $D_T$ and $D_P$ respectively satisfy the following expressions, in which:
  - $\mathrm{ILD}(m)$ is the level difference between the two channels when they transmit the Nth frame audio signal in the mth subband, and M is the total number of subbands occupied by the Nth frame audio signal;
  - $\overline{\mathrm{ILD}}(m)$ is the average ILD in the mth subband over the stereo parameter sets of the T frames preceding the Nth frame, where T is a positive integer greater than 0, and $\mathrm{ILD}^{[-t]}(m)$ is the level difference between the two channels when they transmit, in the mth subband, the tth frame audio signal before the Nth frame audio signal;
  - $\mathrm{ITD}$ is the time difference between the two channels when they transmit the Nth frame audio signal, $\overline{\mathrm{ITD}}$ is the average ITD over the stereo parameter sets of the T frames preceding the Nth frame, and $\mathrm{ITD}^{[-t]}$ is the time difference between the two channels when they transmit the tth frame audio signal before the Nth frame audio signal;
  - $\mathrm{IPD}(m)$ is the phase difference between the two channels when they transmit part of the Nth frame audio signal in the mth subband, $\overline{\mathrm{IPD}}(m)$ is the average IPD in the mth subband over the stereo parameter sets of the T frames preceding the Nth frame, and $\mathrm{IPD}^{[-t]}(m)$ is the phase difference between the two channels when they transmit, in the mth subband, the tth frame audio signal before the Nth frame audio signal.
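The closed-form expressions for $D_L$, $D_T$ and $D_P$ are not given in the text above. The sketch below therefore computes only what the definitions fix, namely the T-frame averages such as $\overline{\mathrm{ILD}}(m) = \frac{1}{T}\sum_{t=1}^{T}\mathrm{ILD}^{[-t]}(m)$, and then uses an ASSUMED deviation measure (mean absolute deviation across subbands, with IPD differences wrapped to $[-\pi, \pi]$) to evaluate a condition such as $D_L \ge D_0$. The function names and the deviation formulas are hypothetical stand-ins, not the patent's expressions.

```python
import math
from typing import List, Sequence


def subband_mean(history: Sequence[Sequence[float]]) -> List[float]:
    """T-frame average per subband; history[t - 1][m] holds X^[-t](m)."""
    T = len(history)
    return [sum(frame[m] for frame in history) / T for m in range(len(history[0]))]


def wrap_phase(phi: float) -> float:
    """Map a phase difference to [-pi, pi]."""
    return math.atan2(math.sin(phi), math.cos(phi))


def ild_deviation(ild: Sequence[float], history: Sequence[Sequence[float]]) -> float:
    """ASSUMED D_L: mean |ILD(m) - average ILD(m)| over the M subbands."""
    avg = subband_mean(history)
    return sum(abs(x - a) for x, a in zip(ild, avg)) / len(ild)


def itd_deviation(itd: float, history: Sequence[float]) -> float:
    """ASSUMED D_T: |ITD - average ITD| over the T previous frames."""
    return abs(itd - sum(history) / len(history))


def ipd_deviation(ipd: Sequence[float], history: Sequence[Sequence[float]]) -> float:
    """ASSUMED D_P: mean wrapped |IPD(m) - average IPD(m)| over the M subbands."""
    avg = subband_mean(history)
    return sum(abs(wrap_phase(x - a)) for x, a in zip(ipd, avg)) / len(ipd)


def meets_ild_condition(ild, history, d0: float) -> bool:
    """Preset stereo parameter encoding condition D_L >= D_0 (ILD case)."""
    return ild_deviation(ild, history) >= d0
```

The ITD and IPD conditions ($D_T \ge D_1$, $D_P \ge D_2$) would be checked the same way with their own thresholds.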
- the parameter detecting unit 340 shown in FIG. 3a to FIG. 3d is optional; that is, the encoder may or may not include the parameter detecting unit 340.
- when the parameter detecting unit 340 is absent, the parameter encoding unit 330 encodes each frame's stereo parameter set generated by the parameter generating unit 320 directly, without checking the stereo parameters.
- the decoder of the embodiment of the present invention includes a receiving unit 400 and a decoding unit 410. The receiving unit 400 is configured to receive a code stream that includes at least two frames, among which there is at least one first type frame and at least one second type frame; the first type frame contains a downmix signal and the second type frame does not.
- for the Nth frame code stream, where N is a positive integer greater than 1, the decoding unit 410 is configured to: decode the Nth frame code stream to obtain the Nth frame downmix signal if it determines that the Nth frame code stream is a first type frame; and, if it determines that the Nth frame code stream is a second type frame, determine m frames of downmix signal from the at least one frame of downmix signal preceding the Nth frame downmix signal according to a preset first rule, and obtain the Nth frame downmix signal from the m frames of downmix signal based on a predetermined first algorithm, where m is a positive integer greater than zero;
- the Nth frame downmix signal is obtained by the encoder by mixing the Nth frame audio signals of two channels of the multichannel signal based on a predetermined second algorithm.
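The preset first rule and the predetermined first algorithm used to rebuild a downmix signal for a second type frame are not specified. As one assumed possibility, resembling comfort noise generation, the sketch takes the m most recent decoded downmix frames and generates Gaussian noise scaled to their mean per-sample energy. `conceal_downmix` and its parameters are hypothetical.

```python
import math
import random
from typing import List, Sequence


def conceal_downmix(
    previous: Sequence[Sequence[float]], m: int, frame_len: int, seed: int = 0
) -> List[float]:
    """Rebuild a missing downmix frame from the m most recent decoded frames.

    Hypothetical 'first rule': pick the last m frames.
    Hypothetical 'first algorithm': Gaussian noise at their mean energy.
    """
    if m < 1 or len(previous) < m:
        raise ValueError("need at least m previous downmix frames")
    chosen = previous[-m:]
    # Mean per-sample energy across the chosen frames.
    target = sum(sum(x * x for x in f) / len(f) for f in chosen) / m
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in range(frame_len)]
    energy = sum(x * x for x in noise) / frame_len
    scale = math.sqrt(target / energy) if energy > 0 else 0.0
    return [scale * x for x in noise]
```

Matching only the energy keeps the rebuilt frame at a plausible background level without transmitting any downmix bits for it.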
- the decoder shown in FIG. 4 further includes a signal restoring unit 420.
- the first type of frame includes a downmix signal and a stereo parameter set
- the second type of frame includes a stereo parameter set and does not include a downmix signal:
- the decoding unit 410 is configured to: if it determines that the Nth frame code stream is a first type frame, decode the Nth frame code stream to obtain the Nth frame stereo parameter set together with the Nth frame downmix signal; if it determines that the Nth frame code stream is a second type frame, decode the Nth frame code stream to obtain the Nth frame stereo parameter set; at least one stereo parameter in the Nth frame stereo parameter set is used by the decoder to restore the Nth frame downmix signal to the Nth frame audio signal based on a predetermined third algorithm;
- the signal restoring unit 420 is configured to restore the Nth frame downmix signal to the Nth frame audio signal based on the third algorithm, using at least one stereo parameter in the Nth frame stereo parameter set.
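The document does not give a formula for the predetermined third algorithm. A common energy-preserving choice, used here only as an assumption, derives two channel gains from a single ILD value in dB so that $g_L/g_R = 10^{\mathrm{ILD}/20}$ and $g_L^2 + g_R^2 = 2$, then scales the downmix by each gain. A real decoder would typically apply this per subband and also use ITD/IPD; `upmix_with_ild` is a hypothetical single-band illustration.

```python
import math
from typing import List, Sequence, Tuple


def upmix_with_ild(downmix: Sequence[float], ild_db: float) -> Tuple[List[float], List[float]]:
    """Split a downmix into left/right using one ILD value (hypothetical 'third algorithm').

    Gains satisfy g_left / g_right = 10**(ild_db / 20) and g_left**2 + g_right**2 == 2,
    so the channel level ratio matches the ILD and L^2 + R^2 equals 2 * M^2 per sample.
    """
    c = 10.0 ** (ild_db / 20.0)  # linear amplitude ratio left/right
    g_left = math.sqrt(2.0 * c * c / (1.0 + c * c))
    g_right = math.sqrt(2.0 / (1.0 + c * c))
    return [g_left * x for x in downmix], [g_right * x for x in downmix]
```

With an ILD of 0 dB both gains are 1, so each channel simply equals the downmix.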
- the first type of frame includes a downmix signal and a stereo parameter set
- the second type of frame does not include a downmix signal and does not include a stereo parameter set
- the decoding unit 410 is further configured to: if it determines that the Nth frame code stream is a first type frame, decode the Nth frame code stream to obtain the Nth frame stereo parameter set together with the Nth frame downmix signal; if it determines that the Nth frame code stream is a second type frame, determine k frames of stereo parameter sets from the at least one frame of stereo parameter set preceding the Nth frame stereo parameter set according to a preset second rule, and obtain the Nth frame stereo parameter set from the k frames of stereo parameter sets based on a predetermined fourth algorithm, where k is a positive integer greater than zero;
- the at least one stereo parameter in the Nth frame stereo parameter set is used by the decoder to restore the Nth frame downmix signal to the Nth frame audio signal based on a predetermined third algorithm;
- the signal restoring unit 420 is configured to restore the Nth frame downmix signal to the Nth frame audio signal based on the third algorithm, using at least one stereo parameter in the Nth frame stereo parameter set.
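The preset second rule and predetermined fourth algorithm for estimating a missing stereo parameter set are left open in the text. The sketch below assumes the simplest choices: take the k most recent stereo parameter sets and average each parameter element-wise. `estimate_params` and the dictionary layout of a parameter set are hypothetical.

```python
from typing import Dict, List, Sequence

StereoParams = Dict[str, List[float]]  # e.g. {"ILD": [...per subband...], "ITD": [...]}


def estimate_params(previous_sets: Sequence[StereoParams], k: int) -> StereoParams:
    """Estimate the Nth frame stereo parameter set from earlier sets.

    Hypothetical 'second rule': use the k most recent sets.
    Hypothetical 'fourth algorithm': element-wise average of each parameter.
    """
    if k < 1 or len(previous_sets) < k:
        raise ValueError("need at least k previous stereo parameter sets")
    chosen = previous_sets[-k:]
    return {
        name: [sum(s[name][i] for s in chosen) / k for i in range(len(values))]
        for name, values in chosen[0].items()
    }
```

With k = 1 this degenerates to repeating the most recent parameter set, which is the cheapest possible fallback.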
- the first type of frame includes a downmix signal and a stereo parameter set
- the third type of frame includes a stereo parameter set and does not include a downmix signal
- the fourth type of frame does not include a downmix signal and does not include a stereo parameter set.
- the third type frame and the fourth type frame are each a case of the second type frame:
- the decoding unit 410 is further configured to: if it determines that the Nth frame code stream is a first type frame, decode the Nth frame code stream to obtain the Nth frame stereo parameter set together with the Nth frame downmix signal; if it determines that the Nth frame code stream is a second type frame, then, when it is a third type frame, decode it to obtain the Nth frame stereo parameter set, and, when it is a fourth type frame, determine k frames of stereo parameter sets from the at least one frame of stereo parameter set preceding the Nth frame stereo parameter set according to a preset second rule, and obtain the Nth frame stereo parameter set from the k frames of stereo parameter sets based on a predetermined fourth algorithm, where k is a positive integer greater than zero;
- the at least one stereo parameter in the Nth frame stereo parameter set is used by the decoder to restore the Nth frame downmix signal to the Nth frame audio signal based on a predetermined third algorithm;
- the signal restoring unit 420 is configured to restore the Nth frame downmix signal to the Nth frame audio signal based on the third algorithm, using at least one stereo parameter in the Nth frame stereo parameter set.
- the fifth type frame includes a downmix signal and a stereo parameter set
- the sixth type frame includes a downmix signal and does not include a stereo parameter set
- the fifth type frame and the sixth type frame are each a case of the first type frame;
- the second type of frame does not contain a downmix signal and does not contain a stereo parameter set:
- the decoding unit 410 is further configured to, if it determines that the Nth frame code stream is a first type frame: when the Nth frame code stream is a fifth type frame, decode it to obtain the Nth frame stereo parameter set together with the Nth frame downmix signal; when the Nth frame code stream is a sixth type frame, determine k frames of stereo parameter sets from the at least one frame of stereo parameter set preceding the Nth frame stereo parameter set according to the preset second rule, and obtain the Nth frame stereo parameter set from the k frames of stereo parameter sets based on the predetermined fourth algorithm;
- the decoding unit 410 is further configured to, if it determines that the Nth frame code stream is a second type frame, determine k frames of stereo parameter sets from the at least one frame of stereo parameter set preceding the Nth frame stereo parameter set according to the preset second rule, and obtain the Nth frame stereo parameter set from the k frames of stereo parameter sets based on the predetermined fourth algorithm;
- the at least one stereo parameter of the Nth frame stereo parameter set is used by the decoder to restore the Nth frame downmix signal to the Nth frame audio signal based on a predetermined third algorithm, where k is a positive integer greater than zero;
- the signal restoring unit 420 is configured to restore the Nth frame downmix signal to the Nth frame audio signal based on the third algorithm, using at least one stereo parameter in the Nth frame stereo parameter set.
- the fifth type frame includes a downmix signal and a stereo parameter set
- the sixth type frame includes a downmix signal and does not include a stereo parameter set
- the fifth type frame and the sixth type frame are each a case of the first type frame;
- the third type of frame includes a stereo parameter set and does not include a downmix signal
- the fourth type of frame does not include a downmix signal and does not include a stereo parameter set
- the third type frame and the fourth type frame are each a case of the second type frame:
- the decoding unit 410 is further configured to, if it determines that the Nth frame code stream is a first type frame: when the Nth frame code stream is a fifth type frame, decode it to obtain the Nth frame stereo parameter set together with the Nth frame downmix signal; when the Nth frame code stream is a sixth type frame, determine k frames of stereo parameter sets from the at least one frame of stereo parameter set preceding the Nth frame stereo parameter set according to the preset second rule, and obtain the Nth frame stereo parameter set from the k frames of stereo parameter sets based on the predetermined fourth algorithm;
- the decoding unit 410 is further configured to, if it determines that the Nth frame code stream is a second type frame: when the Nth frame code stream is a third type frame, decode it to obtain the Nth frame stereo parameter set; when the Nth frame code stream is a fourth type frame, determine k frames of stereo parameter sets from the at least one frame of stereo parameter set preceding the Nth frame stereo parameter set according to the preset second rule, and obtain the Nth frame stereo parameter set from the k frames of stereo parameter sets based on the predetermined fourth algorithm;
- the at least one stereo parameter of the Nth frame stereo parameter set is used by the decoder to restore the Nth frame downmix signal to the Nth frame audio signal based on a predetermined third algorithm, where k is a positive integer greater than zero;
- the signal restoring unit 420 is configured to restore the Nth frame downmix signal to the Nth frame audio signal based on the third algorithm, using at least one stereo parameter in the Nth frame stereo parameter set.
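The decoder behaviour across the fifth, sixth, third and fourth type frames reduces to two independent questions per frame: is a downmix signal present, and is a stereo parameter set present? The dispatcher below sketches that logic. The `Frame` and `DecoderState` types are hypothetical, and the fallback rules (copy the most recent downmix frame, reuse the most recent parameter set, i.e. m = k = 1) are placeholders for the unspecified preset first and second rules and the first and fourth algorithms.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Params = Dict[str, List[float]]


@dataclass
class Frame:
    """Hypothetical parsed frame; None marks a field absent from the bitstream."""
    downmix: Optional[List[float]] = None
    params: Optional[Params] = None


@dataclass
class DecoderState:
    downmix_history: List[List[float]] = field(default_factory=list)
    param_history: List[Params] = field(default_factory=list)


def decode_frame(frame: Frame, state: DecoderState) -> Tuple[str, List[float], Params]:
    """Return (frame type, Nth frame downmix, Nth frame stereo parameter set)."""
    if frame.downmix is not None:                    # first type frame
        kind = "fifth" if frame.params is not None else "sixth"
        downmix = frame.downmix
    else:                                            # second type frame
        kind = "third" if frame.params is not None else "fourth"
        if not state.downmix_history:
            raise ValueError("no earlier downmix frame to rebuild from")
        downmix = list(state.downmix_history[-1])    # placeholder: m = 1, plain copy
    if frame.params is not None:
        params = frame.params
    else:
        if not state.param_history:
            raise ValueError("no earlier stereo parameter set to reuse")
        # Placeholder: k = 1, reuse the most recent set.
        params = {name: list(v) for name, v in state.param_history[-1].items()}
    state.downmix_history.append(downmix)
    state.param_history.append(params)
    return kind, downmix, params
```

Keeping the reconstructed downmix and parameters in the history means consecutive missing frames keep building on the latest available estimate.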
- the codec system of the embodiment of the present invention includes any of the encoders 500 shown in FIGS. 3a to 3b, and the decoder 510 shown in FIG.
- the embodiments of the present invention may be provided as a method, a system, or a computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM and optical storage) that contain computer-usable program code.
- the computer program instructions may also be stored in a computer-readable memory that can direct a computer or another programmable data processing device to operate in a particular manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus, and the instruction apparatus implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
- These computer program instructions may also be loaded onto a computer or another programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce computer-implemented processing, and the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Multimedia (AREA)
- Acoustics & Sound (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Computational Linguistics (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Mathematical Physics (AREA)
- Compression, Expansion, Code Conversion, And Decoders (AREA)
- Stereophonic System (AREA)
- Transmission Systems Not Characterized By The Medium Used For Transmission (AREA)
Claims (29)
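- A method for processing a multichannel audio signal, characterized by comprising: detecting, by an encoder, whether an Nth frame downmix signal contains a speech signal, where the Nth frame downmix signal is obtained by mixing the Nth frame audio signals of two channels of the multiple channels based on a predetermined first algorithm, and N is a positive integer greater than zero; encoding, by the encoder, the Nth frame downmix signal when detecting that the Nth frame downmix signal contains a speech signal; and, when the encoder detects that the Nth frame downmix signal does not contain a speech signal: encoding the Nth frame downmix signal if the encoder determines that the Nth frame downmix signal satisfies a preset audio frame encoding condition, and not encoding the Nth frame downmix signal if the encoder determines that it does not satisfy the preset audio frame encoding condition.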
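- The method according to claim 1, wherein the encoding, by the encoder, of the Nth frame downmix signal when detecting that the Nth frame downmix signal contains a speech signal comprises: encoding, by the encoder, the Nth frame downmix signal at a preset speech frame encoding rate when detecting that the Nth frame downmix signal contains a speech signal; and the encoding of the Nth frame downmix signal if the encoder determines that it satisfies the preset audio frame encoding condition comprises: encoding the Nth frame downmix signal at the preset speech frame encoding rate if the encoder determines that the Nth frame downmix signal satisfies a preset speech frame encoding condition; and encoding the Nth frame downmix signal at a preset SID encoding rate if the encoder determines that the Nth frame downmix signal does not satisfy the preset speech frame encoding condition but satisfies a preset silence insertion descriptor (SID) encoding condition; wherein the SID encoding rate is not greater than the speech frame encoding rate.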
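- The method according to claim 1 or 2, further comprising: obtaining, by the encoder, an Nth frame stereo parameter set according to the Nth frame audio signal, where the Nth frame stereo parameter set includes Z stereo parameters, the Z stereo parameters include the parameters used when the encoder mixes the Nth frame audio signal based on the predetermined first algorithm, and Z is a positive integer greater than zero; encoding, by the encoder, the Nth frame stereo parameter set when detecting that the Nth frame downmix signal contains a speech signal; and, when the encoder detects that the Nth frame downmix signal does not contain a speech signal: encoding at least one stereo parameter in the Nth frame stereo parameter set if the encoder determines that the Nth frame stereo parameter set satisfies a preset stereo parameter encoding condition, and not encoding the stereo parameter set if the encoder determines that the Nth frame stereo parameter set does not satisfy the preset stereo parameter encoding condition.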
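- The method according to claim 3, wherein the encoding, by the encoder, of at least one stereo parameter in the Nth frame stereo parameter set comprises: obtaining, by the encoder, X target stereo parameters from the Z stereo parameters in the Nth frame stereo parameter set according to a preset stereo parameter dimension reduction rule, where X is a positive integer greater than zero and less than or equal to Z; and encoding, by the encoder, the X target stereo parameters.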
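- The method according to claim 2, further comprising: when the encoder detects that the Nth frame audio signal contains a speech signal: obtaining, by the encoder, the Nth frame stereo parameter set from the Nth frame audio signal based on a first stereo parameter set generation mode, and encoding the Nth frame stereo parameter set; when the encoder detects that the Nth frame audio signal does not contain a speech signal: if the encoder determines that the Nth frame audio signal satisfies the preset speech frame encoding condition, obtaining the Nth frame stereo parameter set from the Nth frame audio signal based on the first stereo parameter set generation mode, and encoding the Nth frame stereo parameter set; if the encoder determines that the Nth frame audio signal does not satisfy the preset speech frame encoding condition, obtaining the Nth frame stereo parameter set from the Nth frame audio signal based on a second stereo parameter set generation mode, encoding at least one stereo parameter in the Nth frame stereo parameter set when determining that the Nth frame stereo parameter set satisfies the preset stereo parameter encoding condition, and not encoding the stereo parameter set when determining that the Nth frame stereo parameter set does not satisfy the preset stereo parameter encoding condition; wherein the first stereo parameter set generation mode and the second stereo parameter set generation mode satisfy at least one of the following conditions: the number of stereo parameter types included in the stereo parameter set specified by the first stereo parameter set generation mode is not less than the number of stereo parameter types included in the stereo parameter set specified by the second stereo parameter set generation mode; the number of stereo parameters included in the stereo parameter set specified by the first stereo parameter set generation mode is not less than the number of stereo parameters included in the stereo parameter set specified by the second stereo parameter set generation mode; the time-domain resolution of the stereo parameters specified by the first stereo parameter set generation mode is not lower than the time-domain resolution of the corresponding stereo parameters specified by the second stereo parameter set generation mode; and the frequency-domain resolution of the stereo parameters specified by the first stereo parameter set generation mode is not lower than the frequency-domain resolution of the corresponding stereo parameters specified by the second stereo parameter set generation mode.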
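- The method according to any one of claims 3 to 5, wherein the encoding, by the encoder, of the Nth frame stereo parameter set comprises: encoding, by the encoder, the Nth frame stereo parameter set in a first encoding mode; and the encoding, by the encoder, of at least one stereo parameter in the Nth frame stereo parameter set comprises: encoding, by the encoder, at least one stereo parameter in the Nth frame stereo parameter set in the first encoding mode when the Nth frame downmix signal satisfies the speech frame encoding condition; and encoding, by the encoder, at least one stereo parameter in the Nth frame stereo parameter set in the second encoding mode when the Nth frame downmix signal does not satisfy the speech frame encoding condition; wherein the encoding rate specified by the first encoding mode is not less than the encoding rate specified by the second encoding mode; and/or, for any stereo parameter in the Nth frame stereo parameter set, the quantization precision specified by the first encoding mode is not lower than the quantization precision specified by the second encoding mode.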
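- The method according to any one of claims 3 to 6, wherein: if the at least one stereo parameter in the Nth frame stereo parameter set includes an inter-channel level difference (ILD), the preset stereo parameter encoding condition includes $D_L \ge D_0$, where $D_L$ represents the degree of deviation of the ILD from a first standard, the first standard is determined from the stereo parameter sets of the T frames preceding the Nth frame stereo parameter set based on a predetermined second algorithm, and T is a positive integer greater than 0; if the at least one stereo parameter in the Nth frame stereo parameter set includes an inter-channel time difference (ITD), the preset stereo parameter encoding condition includes $D_T \ge D_1$, where $D_T$ represents the degree of deviation of the ITD from a second standard, the second standard is determined from the stereo parameter sets of the T frames preceding the Nth frame stereo parameter set based on a predetermined third algorithm, and T is a positive integer greater than 0; if the at least one stereo parameter in the Nth frame stereo parameter set includes an inter-channel phase difference (IPD), the preset stereo parameter encoding condition includes $D_P \ge D_2$, where $D_P$ represents the degree of deviation of the IPD from a third standard, the third standard is determined from the stereo parameter sets of the T frames preceding the Nth frame stereo parameter set based on a predetermined fourth algorithm, and T is a positive integer greater than 0.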
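- The method according to claim 7, wherein $D_L$, $D_T$ and $D_P$ respectively satisfy the following expressions, where: $\mathrm{ILD}(m)$ is the level difference when the two channels respectively transmit the Nth frame audio signal in the mth subband; M is the total number of subbands occupied for transmitting the Nth frame audio signal; $\overline{\mathrm{ILD}}(m)$ is the average ILD in the mth subband in the stereo parameter sets of the T frames preceding the Nth frame, T being a positive integer greater than 0; $\mathrm{ILD}^{[-t]}(m)$ is the level difference when the two channels respectively transmit, in the mth subband, the tth frame audio signal before the Nth frame audio signal; $\mathrm{ITD}$ is the time difference when the two channels respectively transmit the Nth frame audio signal; $\overline{\mathrm{ITD}}$ is the average ITD in the stereo parameter sets of the T frames preceding the Nth frame; $\mathrm{ITD}^{[-t]}$ is the time difference when the two channels respectively transmit the tth frame audio signal before the Nth frame audio signal; $\mathrm{IPD}(m)$ is the phase difference when the two channels respectively transmit, in the mth subband, part of the Nth frame audio signal; $\overline{\mathrm{IPD}}(m)$ is the average IPD in the mth subband in the stereo parameter sets of the T frames preceding the Nth frame; and $\mathrm{IPD}^{[-t]}(m)$ is the phase difference when the two channels respectively transmit, in the mth subband, the tth frame audio signal before the Nth frame audio signal.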
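- A method for processing a multichannel audio signal, characterized by comprising: receiving, by a decoder, a code stream, where the code stream includes at least two frames, the at least two frames include at least one first type frame and at least one second type frame, the first type frame contains a downmix signal, and the second type frame does not contain a downmix signal; and, for the Nth frame code stream, where N is a positive integer greater than 1: if the decoder determines that the Nth frame code stream is the first type frame, decoding the Nth frame code stream to obtain an Nth frame downmix signal; if the decoder determines that the Nth frame code stream is the second type frame, determining m frames of downmix signal from at least one frame of downmix signal preceding the Nth frame downmix signal according to a preset first rule, and obtaining the Nth frame downmix signal from the m frames of downmix signal based on a predetermined first algorithm, m being a positive integer greater than zero; where the Nth frame downmix signal is obtained by an encoder by mixing the Nth frame audio signals of two channels of the multiple channels based on a predetermined second algorithm.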
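- The method according to claim 9, wherein the first type frame contains a downmix signal and a stereo parameter set, and the second type frame contains a stereo parameter set but no downmix signal; after the decoder determines that the Nth frame code stream is the first type frame and decodes the Nth frame code stream, the method further comprises: obtaining, by the decoder, an Nth frame stereo parameter set; after the decoder determines that the Nth frame code stream is the second type frame, the method further comprises: decoding, by the decoder, the Nth frame code stream to obtain the Nth frame stereo parameter set; where at least one stereo parameter in the Nth frame stereo parameter set is used by the decoder to restore the Nth frame downmix signal to the Nth frame audio signal based on the predetermined third algorithm; and the method further comprises: restoring, by the decoder, the Nth frame downmix signal to the Nth frame audio signal based on the third algorithm according to the at least one stereo parameter in the Nth frame stereo parameter set.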
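- The method according to claim 9, wherein the first type frame contains a downmix signal and a stereo parameter set, and the second type frame contains neither a downmix signal nor a stereo parameter set; after the decoder determines that the Nth frame code stream is the first type frame and decodes the Nth frame code stream, the method further comprises: obtaining, by the decoder, an Nth frame stereo parameter set; after the decoder determines that the Nth frame code stream is the second type frame, the method further comprises: determining, by the decoder according to a preset second rule, k frames of stereo parameter sets from at least one frame of stereo parameter set preceding the Nth frame stereo parameter set, and obtaining the Nth frame stereo parameter set from the k frames of stereo parameter sets based on a predetermined fourth algorithm, k being a positive integer greater than zero; where at least one stereo parameter in the Nth frame stereo parameter set is used by the decoder to restore the Nth frame downmix signal to the Nth frame audio signal based on the predetermined third algorithm; and the method further comprises: restoring, by the decoder, the Nth frame downmix signal to the Nth frame audio signal based on the third algorithm according to the at least one stereo parameter in the Nth frame stereo parameter set.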
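- The method according to claim 9, wherein the first type frame contains a downmix signal and a stereo parameter set, a third type frame contains a stereo parameter set but no downmix signal, a fourth type frame contains neither a downmix signal nor a stereo parameter set, and the third type frame and the fourth type frame are each a case of the second type frame; after the decoder determines that the Nth frame code stream is the first type frame and decodes the Nth frame code stream, the method further comprises: obtaining, by the decoder, an Nth frame stereo parameter set; after the decoder determines that the Nth frame code stream is the second type frame, the method further comprises: when the Nth frame code stream is the third type frame, decoding, by the decoder, the Nth frame code stream to obtain the Nth frame stereo parameter set; when the Nth frame code stream is the fourth type frame, determining, by the decoder according to a preset second rule, k frames of stereo parameter sets from at least one frame of stereo parameter set preceding the Nth frame stereo parameter set, and obtaining the Nth frame stereo parameter set from the k frames of stereo parameter sets based on a predetermined fourth algorithm, k being a positive integer greater than zero; where at least one stereo parameter in the Nth frame stereo parameter set is used by the decoder to restore the Nth frame downmix signal to the Nth frame audio signal based on the predetermined third algorithm; and the method further comprises: restoring, by the decoder, the Nth frame downmix signal to the Nth frame audio signal based on the third algorithm according to the at least one stereo parameter in the Nth frame stereo parameter set.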
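- The method according to claim 9, wherein a fifth type frame contains a downmix signal and a stereo parameter set, a sixth type frame contains a downmix signal but no stereo parameter set, the fifth type frame and the sixth type frame are each a case of the first type frame, and the second type frame contains neither a downmix signal nor a stereo parameter set; after the decoder determines that the Nth frame code stream is the first type frame, the method further comprises: when the Nth frame code stream is the fifth type frame, decoding, by the decoder, the Nth frame code stream to obtain an Nth frame stereo parameter set; when the Nth frame code stream is the sixth type frame, determining, by the decoder according to a preset second rule, k frames of stereo parameter sets from at least one frame of stereo parameter set preceding the Nth frame stereo parameter set, and obtaining the Nth frame stereo parameter set from the k frames of stereo parameter sets based on a predetermined fourth algorithm; after the decoder determines that the Nth frame code stream is the second type frame, the method further comprises: determining, by the decoder according to the preset second rule, k frames of stereo parameter sets from at least one frame of stereo parameter set preceding the Nth frame stereo parameter set, and obtaining the Nth frame stereo parameter set from the k frames of stereo parameter sets based on the predetermined fourth algorithm; where at least one stereo parameter in the Nth frame stereo parameter set is used by the decoder to restore the Nth frame downmix signal to the Nth frame audio signal based on the predetermined third algorithm, and k is a positive integer greater than zero; and the method further comprises: restoring, by the decoder, the Nth frame downmix signal to the Nth frame audio signal based on the third algorithm according to the at least one stereo parameter in the Nth frame stereo parameter set.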
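- The method according to claim 9, wherein a fifth type frame contains a downmix signal and a stereo parameter set, a sixth type frame contains a downmix signal but no stereo parameter set, the fifth type frame and the sixth type frame are each a case of the first type frame, a third type frame contains a stereo parameter set but no downmix signal, a fourth type frame contains neither a downmix signal nor a stereo parameter set, and the third type frame and the fourth type frame are each a case of the second type frame; after the decoder determines that the Nth frame code stream is the first type frame, the method further comprises: when the Nth frame code stream is the fifth type frame, decoding, by the decoder, the Nth frame code stream to obtain an Nth frame stereo parameter set; when the Nth frame code stream is the sixth type frame, determining, by the decoder according to a preset second rule, k frames of stereo parameter sets from at least one frame of stereo parameter set preceding the Nth frame stereo parameter set, and obtaining the Nth frame stereo parameter set from the k frames of stereo parameter sets based on a predetermined fourth algorithm; after the decoder determines that the Nth frame code stream is the second type frame, the method further comprises: when the Nth frame code stream is the third type frame, decoding, by the decoder, the Nth frame code stream to obtain the Nth frame stereo parameter set; when the Nth frame code stream is the fourth type frame, determining, by the decoder according to the preset second rule, k frames of stereo parameter sets from at least one frame of stereo parameter set preceding the Nth frame stereo parameter set, and obtaining the Nth frame stereo parameter set from the k frames of stereo parameter sets based on the predetermined fourth algorithm; where at least one stereo parameter in the Nth frame stereo parameter set is used by the decoder to restore the Nth frame downmix signal to the Nth frame audio signal based on the predetermined third algorithm, and k is a positive integer greater than zero; and the method further comprises: restoring, by the decoder, the Nth frame downmix signal to the Nth frame audio signal based on the third algorithm according to the at least one stereo parameter in the Nth frame stereo parameter set.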
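- An encoder, characterized by comprising: a signal detecting unit, configured to detect whether an Nth frame downmix signal contains a speech signal, where the Nth frame downmix signal is obtained by mixing the Nth frame audio signals of two channels of the multiple channels based on a predetermined first algorithm, and N is a positive integer greater than zero; and a signal encoding unit, configured to encode the Nth frame downmix signal when the signal detecting unit detects that the Nth frame downmix signal contains a speech signal; the signal encoding unit being further configured to, when the signal detecting unit detects that the Nth frame downmix signal does not contain a speech signal: encode the Nth frame downmix signal if the signal detecting unit determines that the Nth frame downmix signal satisfies a preset audio frame encoding condition, and not encode the Nth frame downmix signal if the signal detecting unit determines that it does not satisfy the preset audio frame encoding condition.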
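- The encoder according to claim 15, wherein the signal encoding unit comprises a first signal encoding unit and a second signal encoding unit; the first signal encoding unit is specifically configured to: encode the Nth frame downmix signal at a preset speech frame encoding rate when the signal detecting unit detects that the Nth frame downmix signal contains a speech signal; and encode the Nth frame downmix signal at the preset speech frame encoding rate if the signal detecting unit determines that the Nth frame downmix signal satisfies a preset speech frame encoding condition; the second signal encoding unit is specifically configured to: encode the Nth frame downmix signal at a preset SID encoding rate if the signal detecting unit determines that the Nth frame downmix signal does not satisfy the preset speech frame encoding condition but satisfies a preset silence insertion descriptor (SID) encoding condition; wherein the SID encoding rate is not greater than the speech frame encoding rate.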
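- The encoder according to claim 15 or 16, further comprising a parameter generation unit, a parameter encoding unit, and a parameter detection unit. The parameter generation unit is configured to obtain an Nth-frame stereo parameter set from the Nth-frame audio signal, where the Nth-frame stereo parameter set includes Z stereo parameters, the Z stereo parameters include the parameters used by the encoder when mixing the Nth-frame audio signal based on the predetermined first algorithm, and Z is a positive integer. The parameter encoding unit is configured to encode the Nth-frame stereo parameter set when the signal detection unit detects that the Nth-frame downmixed signal contains a speech signal. When the signal detection unit detects that the Nth-frame downmixed signal does not contain a speech signal, the parameter encoding unit is further configured to encode at least one stereo parameter in the Nth-frame stereo parameter set if the parameter detection unit determines that the Nth-frame stereo parameter set meets a preset stereo parameter encoding condition, and not to encode the stereo parameter set if the parameter detection unit determines that it does not meet the preset stereo parameter encoding condition.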
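- The encoder according to claim 17, wherein, to encode at least one stereo parameter in the Nth-frame stereo parameter set, the parameter encoding unit is specifically configured to obtain X target stereo parameters from the Z stereo parameters in the Nth-frame stereo parameter set according to a preset stereo parameter dimension reduction rule, and to encode the X target stereo parameters, where X is a positive integer not greater than Z.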
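- The encoder according to claim 16, wherein the parameter generation unit includes a first parameter generation unit and a second parameter generation unit. The first parameter generation unit is configured to operate when the signal detection unit detects that the Nth-frame audio signal contains a speech signal, and also when the signal detection unit detects that the Nth-frame audio signal does not contain a speech signal but determines that it meets the preset speech frame encoding condition. In either case it obtains the Nth-frame stereo parameter set from the Nth-frame audio signal in a first stereo parameter set generation mode, and the Nth-frame stereo parameter set is encoded by the parameter encoding unit. The second parameter generation unit is configured to operate when the signal detection unit detects that the Nth-frame audio signal does not contain a speech signal and determines that it does not meet the preset speech frame encoding condition. In that case it obtains the Nth-frame stereo parameter set from the Nth-frame audio signal in a second stereo parameter set generation mode. At least one stereo parameter in the Nth-frame stereo parameter set is encoded when the parameter detection unit determines that the set meets the preset stereo parameter encoding condition, and the stereo parameter set is not encoded when the parameter detection unit determines that it does not meet that condition. The first and second stereo parameter set generation modes satisfy at least one of the following conditions: the number of stereo parameter types in the stereo parameter set specified by the first generation mode is not less than the number specified by the second generation mode; the number of stereo parameters in the stereo parameter set specified by the first generation mode is not less than the number specified by the second generation mode; the time-domain resolution of the stereo parameters specified by the first generation mode is not lower than that of the corresponding stereo parameters specified by the second generation mode; and the frequency-domain resolution of the stereo parameters specified by the first generation mode is not lower than that of the corresponding stereo parameters specified by the second generation mode.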
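- The encoder according to any one of claims 17 to 19, wherein the parameter encoding unit includes a first parameter encoding unit and a second parameter encoding unit. The first parameter encoding unit is configured to encode the Nth-frame stereo parameter set in a first encoding mode when the signal detection unit detects that the Nth-frame downmixed signal contains a speech signal, and also when the Nth-frame downmixed signal meets the speech frame encoding condition. The second parameter encoding unit is specifically configured to encode at least one stereo parameter in the Nth-frame stereo parameter set in a second encoding mode when the Nth-frame downmixed signal does not meet the speech frame encoding condition. The encoding rate specified by the first encoding mode is not less than the encoding rate specified by the second encoding mode; and/or, for any stereo parameter in the Nth-frame stereo parameter set, the quantization precision specified by the first encoding mode is not lower than the quantization precision specified by the second encoding mode.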
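- The encoder according to any one of claims 17 to 20, wherein: if the at least one stereo parameter in the Nth-frame stereo parameter set includes an inter-channel level difference (ILD), the preset stereo parameter encoding condition includes $D_L \geq D_0$, where $D_L$ denotes the degree of deviation of the ILD from a first standard, and the first standard is determined from the T stereo parameter sets preceding the Nth-frame stereo parameter set based on a predetermined second algorithm. If the at least one stereo parameter includes an inter-channel time difference (ITD), the preset stereo parameter encoding condition includes $D_T \geq D_1$, where $D_T$ denotes the degree of deviation of the ITD from a second standard, and the second standard is determined from the T stereo parameter sets preceding the Nth-frame stereo parameter set based on a predetermined third algorithm. If the at least one stereo parameter includes an inter-channel phase difference (IPD), the preset stereo parameter encoding condition includes $D_P \geq D_2$, where $D_P$ denotes the degree of deviation of the IPD from a third standard, and the third standard is determined from the T stereo parameter sets preceding the Nth-frame stereo parameter set based on a predetermined fourth algorithm. In each case T is a positive integer.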
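- The encoder according to claim 21, wherein $D_L$, $D_T$, and $D_P$ each satisfy a respective expression defined in terms of the following quantities. $ILD(m)$ is the level difference between the two channels when each transmits the Nth-frame audio signal in the m-th subband, and $M$ is the total number of subbands occupied by the Nth-frame audio signal. $\overline{ILD}(m) = \frac{1}{T}\sum_{t=1}^{T} ILD^{[-t]}(m)$ is the average ILD in the m-th subband over the T stereo parameter sets preceding the Nth frame, where T is a positive integer and $ILD^{[-t]}(m)$ is the level difference between the two channels in the m-th subband for the t-th frame before the Nth frame. $ITD$ is the time difference between the two channels for the Nth-frame audio signal. $\overline{ITD} = \frac{1}{T}\sum_{t=1}^{T} ITD^{[-t]}$ is the average ITD over the T stereo parameter sets preceding the Nth frame, where $ITD^{[-t]}$ is the time difference between the two channels for the t-th frame before the Nth frame. $IPD(m)$ is the phase difference between the two channels when each transmits part of the Nth-frame audio signal in the m-th subband. $\overline{IPD}(m) = \frac{1}{T}\sum_{t=1}^{T} IPD^{[-t]}(m)$ is the average IPD in the m-th subband over the T stereo parameter sets preceding the Nth frame, where $IPD^{[-t]}(m)$ is the phase difference between the two channels in the m-th subband for the t-th frame before the Nth frame.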
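- A decoder, comprising a receiving unit and a decoding unit. The receiving unit is configured to receive a bitstream that includes at least two frames, among which there is at least one first-type frame and at least one second-type frame. The first-type frame contains a downmixed signal, and the second-type frame does not contain a downmixed signal. For an Nth-frame bitstream, where N is a positive integer greater than 1, the decoding unit is configured to decode the Nth-frame bitstream to obtain an Nth-frame downmixed signal if it determines that the Nth-frame bitstream is the first-type frame. If it determines that the Nth-frame bitstream is the second-type frame, the decoding unit determines, according to a preset first rule, m downmixed-signal frames from at least one downmixed-signal frame preceding the Nth-frame downmixed signal, and obtains the Nth-frame downmixed signal from those m frames based on a predetermined first algorithm, where m is a positive integer. The Nth-frame downmixed signal is obtained by the encoder by mixing the Nth-frame audio signals of two channels of a multichannel signal based on a predetermined second algorithm.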
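- The decoder according to claim 23, wherein the first-type frame contains a downmixed signal and a stereo parameter set, and the second-type frame contains a stereo parameter set but no downmixed signal. The decoding unit is further configured to obtain an Nth-frame stereo parameter set after decoding the Nth-frame bitstream if it determines that the Nth-frame bitstream is the first-type frame, and to decode the Nth-frame bitstream to obtain the Nth-frame stereo parameter set if it determines that the Nth-frame bitstream is the second-type frame. At least one stereo parameter in the Nth-frame stereo parameter set is used by the decoder to restore the Nth-frame downmixed signal to the Nth-frame audio signal based on the predetermined third algorithm. The decoder further includes a signal restoration unit, configured to restore the Nth-frame downmixed signal to the Nth-frame audio signal according to at least one stereo parameter in the Nth-frame stereo parameter set and based on the third algorithm.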
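- The decoder according to claim 23, wherein the first-type frame contains a downmixed signal and a stereo parameter set, and the second-type frame contains neither a downmixed signal nor a stereo parameter set. If the decoding unit determines that the Nth-frame bitstream is the first-type frame, it is further configured to obtain an Nth-frame stereo parameter set after decoding the Nth-frame bitstream. If it determines that the Nth-frame bitstream is the second-type frame, it is further configured to determine, according to a preset second rule, k stereo parameter sets from at least one stereo parameter set preceding the Nth-frame stereo parameter set, and to obtain the Nth-frame stereo parameter set from the k stereo parameter sets based on a predetermined fourth algorithm, where k is a positive integer. At least one stereo parameter in the Nth-frame stereo parameter set is used by the decoder to restore the Nth-frame downmixed signal to the Nth-frame audio signal based on the predetermined third algorithm. The decoder further includes a signal restoration unit, configured to restore the Nth-frame downmixed signal to the Nth-frame audio signal according to at least one stereo parameter in the Nth-frame stereo parameter set and based on the third algorithm.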
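- The decoder according to claim 23, wherein the first-type frame contains a downmixed signal and a stereo parameter set, a third-type frame contains a stereo parameter set but no downmixed signal, a fourth-type frame contains neither a downmixed signal nor a stereo parameter set, and the third-type frame and the fourth-type frame are each a case of the second-type frame. If the decoding unit determines that the Nth-frame bitstream is the first-type frame, it is further configured to obtain an Nth-frame stereo parameter set after decoding the Nth-frame bitstream. If it determines that the Nth-frame bitstream is the second-type frame, it is further configured as follows: when the Nth-frame bitstream is the third-type frame, it decodes the Nth-frame bitstream to obtain the Nth-frame stereo parameter set; and when the Nth-frame bitstream is the fourth-type frame, it determines, according to a preset second rule, k stereo parameter sets from at least one stereo parameter set preceding the Nth-frame stereo parameter set, and obtains the Nth-frame stereo parameter set from the k stereo parameter sets based on a predetermined fourth algorithm, where k is a positive integer. At least one stereo parameter in the Nth-frame stereo parameter set is used by the decoder to restore the Nth-frame downmixed signal to the Nth-frame audio signal based on the predetermined third algorithm. The decoder further includes a signal restoration unit, configured to restore the Nth-frame downmixed signal to the Nth-frame audio signal according to at least one stereo parameter in the Nth-frame stereo parameter set and based on the third algorithm.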
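- The decoder according to claim 23, wherein a fifth-type frame contains a downmixed signal and a stereo parameter set, a sixth-type frame contains a downmixed signal but no stereo parameter set, the fifth-type frame and the sixth-type frame are each a case of the first-type frame, and the second-type frame contains neither a downmixed signal nor a stereo parameter set. If the decoding unit determines that the Nth-frame bitstream is the first-type frame, it is further configured as follows: when the Nth-frame bitstream is the fifth-type frame, it obtains an Nth-frame stereo parameter set after decoding the Nth-frame bitstream; and when the Nth-frame bitstream is the sixth-type frame, it determines, according to a preset second rule, k stereo parameter sets from at least one stereo parameter set preceding the Nth-frame stereo parameter set, and obtains the Nth-frame stereo parameter set from the k stereo parameter sets based on a predetermined fourth algorithm. If it determines that the Nth-frame bitstream is the second-type frame, it determines, according to the preset second rule, k stereo parameter sets from at least one stereo parameter set preceding the Nth-frame stereo parameter set, and obtains the Nth-frame stereo parameter set from the k stereo parameter sets based on the predetermined fourth algorithm. At least one stereo parameter in the Nth-frame stereo parameter set is used by the decoder to restore the Nth-frame downmixed signal to the Nth-frame audio signal based on the predetermined third algorithm, and k is a positive integer. The decoder further includes a signal restoration unit, configured to restore the Nth-frame downmixed signal to the Nth-frame audio signal according to at least one stereo parameter in the Nth-frame stereo parameter set and based on the third algorithm.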
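- The decoder according to claim 23, wherein a fifth-type frame contains a downmixed signal and a stereo parameter set, a sixth-type frame contains a downmixed signal but no stereo parameter set, and the fifth-type frame and the sixth-type frame are each a case of the first-type frame. A third-type frame contains a stereo parameter set but no downmixed signal, a fourth-type frame contains neither a downmixed signal nor a stereo parameter set, and the third-type frame and the fourth-type frame are each a case of the second-type frame. If the decoding unit determines that the Nth-frame bitstream is the first-type frame, it is further configured as follows: when the Nth-frame bitstream is the fifth-type frame, it obtains an Nth-frame stereo parameter set after decoding the Nth-frame bitstream; and when the Nth-frame bitstream is the sixth-type frame, it determines, according to a preset second rule, k stereo parameter sets from at least one stereo parameter set preceding the Nth-frame stereo parameter set, and obtains the Nth-frame stereo parameter set from the k stereo parameter sets based on a predetermined fourth algorithm. If it determines that the Nth-frame bitstream is the second-type frame, it is further configured as follows: when the Nth-frame bitstream is the third-type frame, it decodes the Nth-frame bitstream to obtain the Nth-frame stereo parameter set; and when the Nth-frame bitstream is the fourth-type frame, it determines, according to the preset second rule, k stereo parameter sets from at least one stereo parameter set preceding the Nth-frame stereo parameter set, and obtains the Nth-frame stereo parameter set from the k stereo parameter sets based on the predetermined fourth algorithm. At least one stereo parameter in the Nth-frame stereo parameter set is used by the decoder to restore the Nth-frame downmixed signal to the Nth-frame audio signal based on the predetermined third algorithm, and k is a positive integer. The decoder further includes a signal restoration unit, configured to restore the Nth-frame downmixed signal to the Nth-frame audio signal according to at least one stereo parameter in the Nth-frame stereo parameter set and based on the third algorithm.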
- An encoding and decoding system, comprising the encoder according to any one of claims 15 to 22 and the decoder according to any one of claims 23 to 28.
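The encoder claim ties encoding of the Nth-frame downmixed signal to speech detection. Speech frames are always encoded. Non-speech frames are encoded only if they meet the preset audio frame encoding condition, which the claims leave open.

The minimal sketch below assumes, purely for illustration, a periodic-refresh rule for that condition: one non-speech frame out of every `update_interval` is encoded. The speech flag is taken as an input from an unspecified voice activity detector. `DtxState` and `should_encode_downmix` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class DtxState:
    """Per-stream encoder state (hypothetical; not defined by the claims)."""
    frames_since_update: int = 0


def should_encode_downmix(has_speech: bool, state: DtxState,
                          update_interval: int = 8) -> bool:
    """Decide whether the Nth-frame downmixed signal is encoded.

    has_speech: result of a voice activity detector (assumed to exist).
    The 'preset audio frame encoding condition' is modelled here, as an
    assumption, by refreshing once every `update_interval` non-speech frames.
    """
    if update_interval < 1:
        raise ValueError("update_interval must be >= 1")
    if has_speech:
        # Speech frames are always encoded.
        state.frames_since_update = 0
        return True
    state.frames_since_update += 1
    if state.frames_since_update >= update_interval:
        # Non-speech frame that meets the (assumed) encoding condition.
        state.frames_since_update = 0
        return True
    # Non-speech frame that does not meet the condition: not encoded.
    return False
```

Consider one speech frame followed by six non-speech frames, with `update_interval=3` and a fresh `DtxState()`. The decisions are `[True, False, False, True, False, False, True]`.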
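The claim that splits the signal encoding unit into a first and a second unit amounts to a three-way rate decision:

- Speech frames, and frames meeting the speech frame encoding condition, use the speech frame rate.
- Frames meeting only the SID condition use the SID rate.
- All other frames are skipped.

The claim fixes only the ordering of the two rates (SID rate not greater than speech rate). The 13.2 kbit/s and 2.4 kbit/s values below are illustrative placeholders, and the function name is hypothetical.

```python
from typing import Optional

# Illustrative rates only; the claim requires just SID rate <= speech rate.
SPEECH_RATE_BPS = 13_200
SID_RATE_BPS = 2_400


def select_downmix_rate(has_speech: bool,
                        meets_speech_frame_condition: bool,
                        meets_sid_condition: bool,
                        speech_rate: int = SPEECH_RATE_BPS,
                        sid_rate: int = SID_RATE_BPS) -> Optional[int]:
    """Return the bit rate for the Nth-frame downmix, or None if it is skipped."""
    if sid_rate > speech_rate:
        raise ValueError("SID encoding rate must not exceed the speech frame rate")
    if has_speech or meets_speech_frame_condition:
        # First signal encoding unit: speech frame encoding rate.
        return speech_rate
    if meets_sid_condition:
        # Second signal encoding unit: SID encoding rate.
        return sid_rate
    # Neither condition holds: the frame is not encoded.
    return None
```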
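The parameter generation unit derives Z stereo parameters from the Nth-frame audio signal, and the deviation claims refer to a per-subband level difference ILD(m). The claims do not give a formula for ILD. A common definition, assumed here, is the ratio of the two channels' energies in each subband, expressed in decibels.

The sketch takes one frame of real-valued spectral coefficients per channel (for example MDCT coefficients) and a hypothetical list of subband edges.

```python
import math
from typing import List, Sequence


def subband_ild_db(left: Sequence[float], right: Sequence[float],
                   band_edges: Sequence[int], eps: float = 1e-12) -> List[float]:
    """Assumed definition: ILD(m) = 10*log10(E_left(m) / E_right(m)).

    Subband m covers coefficient indices band_edges[m] .. band_edges[m+1]-1,
    so M = len(band_edges) - 1. `eps` guards against silent subbands.
    """
    if len(left) != len(right):
        raise ValueError("channels must have the same number of coefficients")
    ild = []
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        e_left = sum(x * x for x in left[lo:hi])
        e_right = sum(x * x for x in right[lo:hi])
        ild.append(10.0 * math.log10((e_left + eps) / (e_right + eps)))
    return ild
```

Take a left channel with four times the right channel's energy in the first subband and equal energy in the second. The result is roughly `[6.02, 0.0]` dB.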
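The dimension-reduction claim maps the Z stereo parameters to X ≤ Z target parameters before encoding, without fixing the rule. One plausible rule, assumed here, is to merge adjacent subband parameters by averaging groups of `group_size`, which gives X = ceil(Z / group_size). Averaging suits level differences in dB. Phase parameters would need a circular mean instead, which this sketch does not attempt.

```python
from typing import List, Sequence


def reduce_parameters(params: Sequence[float], group_size: int) -> List[float]:
    """Average consecutive runs of `group_size` parameters (hypothetical rule).

    With Z = len(params), the result has X = ceil(Z / group_size) entries,
    so 0 < X <= Z as the claim requires. The last group may be shorter.
    """
    if group_size < 1:
        raise ValueError("group_size must be >= 1")
    if not params:
        raise ValueError("the stereo parameter set must not be empty")
    reduced = []
    for start in range(0, len(params), group_size):
        group = params[start:start + group_size]
        reduced.append(sum(group) / len(group))
    return reduced
```

For example, `reduce_parameters([1.0, 3.0, 5.0, 7.0, 9.0], 2)` returns `[2.0, 6.0, 9.0]`, so Z = 5 becomes X = 3.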
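The claim with two stereo parameter set generation modes requires only one thing: the first mode (used for speech frames, or frames meeting the speech frame encoding condition) must be at least as detailed as the second in at least one of four respects:

- number of parameter types
- number of parameters
- time-domain resolution
- frequency-domain resolution

The sketch records those respects in a hypothetical `GenerationMode` type, checks the relation, and selects a mode per frame. The concrete mode values are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class GenerationMode:
    param_types: Tuple[str, ...]  # which stereo parameters are produced
    num_params: int               # total parameters in one set
    updates_per_frame: int        # time-domain resolution
    num_subbands: int             # frequency-domain resolution


# Illustrative values only: 12 ILD + 1 ITD + 12 IPD = 25 parameters vs. 4 ILDs.
ACTIVE_MODE = GenerationMode(("ILD", "ITD", "IPD"), 25, 1, 12)
INACTIVE_MODE = GenerationMode(("ILD",), 4, 1, 4)


def first_mode_not_coarser(first: GenerationMode, second: GenerationMode) -> bool:
    """True if at least one of the claim's four conditions holds."""
    return (len(first.param_types) >= len(second.param_types)
            or first.num_params >= second.num_params
            or first.updates_per_frame >= second.updates_per_frame
            or first.num_subbands >= second.num_subbands)


def pick_mode(has_speech: bool, meets_speech_frame_condition: bool) -> GenerationMode:
    """First mode for speech or speech-encoded frames, second mode otherwise."""
    if has_speech or meets_speech_frame_condition:
        return ACTIVE_MODE
    return INACTIVE_MODE
```

In this example both modes share the same time resolution. That is allowed, because the claim needs only one of the four conditions to hold.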
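The claim contrasting a first and a second parameter encoding mode allows the first mode a higher (or equal) encoding rate and a finer (or equal) quantization precision. A uniform scalar quantizer makes the trade-off concrete: a smaller step gives a lower reconstruction error but needs more bits per index. The 0.5 dB and 2 dB steps and the ±20 dB ILD range are assumptions for illustration.

```python
import math
from typing import Tuple

FIRST_MODE_STEP_DB = 0.5   # assumed finer precision (speech / speech-encoded frames)
SECOND_MODE_STEP_DB = 2.0  # assumed coarser precision (other frames)
ILD_RANGE_DB = 20.0        # assumed representable range [-20, 20] dB


def quantize(value: float, step: float, limit: float = ILD_RANGE_DB) -> Tuple[int, float]:
    """Uniform scalar quantization: returns (index, reconstructed value)."""
    max_index = math.floor(limit / step)
    # Clamp the index so it always fits the fixed-length code below.
    index = max(-max_index, min(max_index, round(value / step)))
    return index, index * step


def bits_per_index(step: float, limit: float = ILD_RANGE_DB) -> int:
    """Bits for a fixed-length code over indices -floor(limit/step)..floor(limit/step)."""
    levels = 2 * math.floor(limit / step) + 1
    return math.ceil(math.log2(levels))
```

For an ILD of 3.3 dB, the first mode reconstructs 3.5 dB using a 7-bit index, while the second reconstructs 4.0 dB using a 5-bit index.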
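The deviation claims compare each stereo parameter with a reference derived from the preceding T stereo parameter sets. The listed quantities define that reference as the per-subband average over those T frames. The expressions for $D_L$, $D_T$, and $D_P$ themselves are not given in this text.

This sketch therefore assumes a sum of squared differences from the T-frame average, compared against a threshold such as $D_0$. The function names and threshold values are hypothetical.

```python
from typing import List, Sequence


def history_mean(history: Sequence[Sequence[float]]) -> List[float]:
    """Per-subband average over the T preceding frames (T = len(history))."""
    if not history:
        raise ValueError("need at least one earlier stereo parameter set (T >= 1)")
    t = len(history)
    return [sum(column) / t for column in zip(*history)]


def deviation(current: Sequence[float], history: Sequence[Sequence[float]]) -> float:
    """Assumed deviation measure: sum over subbands of squared distance to the mean."""
    mean = history_mean(history)
    return sum((c - m) ** 2 for c, m in zip(current, mean))


def meets_encoding_condition(current: Sequence[float],
                             history: Sequence[Sequence[float]],
                             threshold: float) -> bool:
    """Encode the parameter when its deviation reaches the threshold (e.g. D_L >= D_0)."""
    return deviation(current, history) >= threshold
```

A scalar ITD fits the same functions as a one-element list, for example `deviation([itd], [[x] for x in previous_itds])`. For IPD, a practical implementation would likely wrap phase differences into (-π, π]; that is left out here.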
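For a second-type frame, the decoder claim derives the Nth-frame downmixed signal from m earlier downmixed frames, using an unspecified "first algorithm". This sketch swaps in a technique common in discontinuous-transmission systems: comfort noise generation. It is an assumption, not the patent's stated method.

The assumed "first rule" takes the m most recent decoded frames. The sketch averages their power and returns white Gaussian noise scaled to that power. A real codec would typically also shape the noise spectrally, which this sketch does not do.

```python
import math
import random
from typing import List, Sequence


def conceal_downmix(previous_frames: Sequence[Sequence[float]], m: int,
                    frame_len: int, rng: random.Random) -> List[float]:
    """Estimate a missing Nth-frame downmix from earlier decoded frames.

    Assumed 'first rule': use the m most recent frames.
    Assumed 'first algorithm': comfort noise, i.e. white Gaussian noise
    scaled to the average power of those frames (no spectral shaping).
    """
    if m < 1 or frame_len < 1:
        raise ValueError("m and frame_len must be >= 1")
    recent = [f for f in previous_frames[-m:] if len(f) > 0]
    if not recent:
        raise ValueError("need at least one non-empty earlier downmix frame")
    mean_power = sum(sum(x * x for x in f) / len(f) for f in recent) / len(recent)
    noise = [rng.gauss(0.0, 1.0) for _ in range(frame_len)]
    noise_power = sum(x * x for x in noise) / frame_len
    scale = math.sqrt(mean_power / noise_power) if noise_power > 0.0 else 0.0
    return [scale * x for x in noise]
```

Passing a seeded `random.Random` keeps the output reproducible. The generated frame's power equals the average power of the selected earlier frames, up to floating-point error.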
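The decoder claims distinguish frames by whether they carry a downmix and/or a stereo parameter set (first, third, fourth, fifth, and sixth types). Whenever no stereo parameter set is present, the set is derived from k earlier sets using a preset "second rule" and a "fourth algorithm", neither of which is specified.

This sketch makes two assumptions:

- the second rule is "take the k most recent sets";
- the fourth algorithm is "element-wise average".

A second-type frame that carries nothing behaves like `FOURTH` here, and one that carries only parameters behaves like `THIRD`. The enum and function names are hypothetical, and bitstream signalling of the frame type is not modelled.

```python
from enum import Enum
from typing import List, Optional


class FrameType(Enum):
    FIRST = 1   # downmix + stereo parameter set
    THIRD = 3   # stereo parameter set only (a second-type frame)
    FOURTH = 4  # neither (a second-type frame)
    FIFTH = 5   # downmix + stereo parameter set (a first-type frame)
    SIXTH = 6   # downmix only (a first-type frame)


CARRIES_STEREO_PARAMS = {FrameType.FIRST, FrameType.THIRD, FrameType.FIFTH}


def stereo_params_for_frame(frame_type: FrameType,
                            decoded: Optional[List[float]],
                            history: List[List[float]],
                            k: int = 2) -> List[float]:
    """Return the Nth-frame stereo parameter set and append it to `history`."""
    if frame_type in CARRIES_STEREO_PARAMS:
        if decoded is None:
            raise ValueError(f"{frame_type.name} frame must carry a stereo parameter set")
        params = list(decoded)
    else:
        if not history or k < 1:
            raise ValueError("need k >= 1 and at least one earlier stereo parameter set")
        recent = history[-k:]  # assumed 'second rule': the k most recent sets
        # Assumed 'fourth algorithm': element-wise average of those sets.
        params = [sum(values) / len(recent) for values in zip(*recent)]
    history.append(params)
    return params
```

Suppose the earlier sets are `[[1.0, 2.0], [3.0, 4.0]]` and a `FOURTH` frame arrives with `k=2`. The sketch yields `[2.0, 3.0]` and appends it to the history, so later frames can build on it.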
Priority Applications (17)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311267474.8A CN117392988A (zh) | 2016-09-28 | 2016-09-28 | Method, apparatus and system for processing multichannel audio signal |
CN202311262035.8A CN117351966A (zh) | 2016-09-28 | 2016-09-28 | Method, apparatus and system for processing multichannel audio signal |
KR1020227012057A KR102480710B1 (ko) | 2016-09-28 | 2016-09-28 | Multichannel audio signal processing method, apparatus and system |
CN202311261449.9A CN117351965A (zh) | 2016-09-28 | 2016-09-28 | Method, apparatus and system for processing multichannel audio signal |
KR1020217028255A KR102387162B1 (ko) | 2016-09-28 | 2016-09-28 | Multichannel audio signal processing method, apparatus and system |
MX2019003417A MX2019003417A (es) | 2016-09-28 | 2016-09-28 | Multichannel audio signal processing method, apparatus and system |
CN201680010600.3A CN108140393B (zh) | 2016-09-28 | 2016-09-28 | Method, apparatus and system for processing multichannel audio signal |
KR1020197011605A KR20190052122A (ko) | 2016-09-28 | 2016-09-28 | Multichannel audio signal processing method, apparatus and system |
BR112019005983-0A BR112019005983B1 (pt) | 2016-09-28 | | Multichannel audio signal processing method, encoder, decoder, and encoding and decoding system |
CN202311261321.2A CN117476018A (zh) | 2016-09-28 | 2016-09-28 | Method, apparatus and system for processing multichannel audio signal |
PCT/CN2016/100617 WO2018058379A1 (zh) | 2016-09-28 | 2016-09-28 | Method, apparatus and system for processing multichannel audio signal |
JP2019516957A JP6790251B2 (ja) | 2016-09-28 | 2016-09-28 | Multichannel audio signal processing method, apparatus, and system |
EP21163871.3A EP3910629A1 (en) | 2016-09-28 | 2016-09-28 | Multichannel audio signal processing method, apparatus, and system |
EP16917134.5A EP3511934B1 (en) | 2016-09-28 | 2016-09-28 | Method, apparatus and system for processing multi-channel audio signal |
US16/368,208 US10593339B2 (en) | 2016-09-28 | 2019-03-28 | Multichannel audio signal processing method, apparatus, and system |
US16/781,421 US10984807B2 (en) | 2016-09-28 | 2020-02-04 | Multichannel audio signal processing method, apparatus, and system |
US17/232,679 US11922954B2 (en) | 2016-09-28 | 2021-04-16 | Multichannel audio signal processing method, apparatus, and system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2016/100617 WO2018058379A1 (zh) | 2016-09-28 | 2016-09-28 | Method, apparatus and system for processing multichannel audio signal |
Related Child Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US16/368,208 Continuation US10593339B2 (en) | 2016-09-28 | 2019-03-28 | Multichannel audio signal processing method, apparatus, and system |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2018058379A1 (zh) | 2018-04-05 |
Family ID: 61763024
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2016/100617 WO2018058379A1 (zh) | 2016-09-28 | 2016-09-28 | Method, apparatus and system for processing multichannel audio signal |
Country Status (7)
Country | Link |
---|---|
US (3) | US10593339B2 (zh) |
EP (2) | EP3511934B1 (zh) |
JP (1) | JP6790251B2 (zh) |
KR (3) | KR20190052122A (zh) |
CN (5) | CN117351966A (zh) |
MX (1) | MX2019003417A (zh) |
WO (1) | WO2018058379A1 (zh) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110556119A (zh) * | 2018-05-31 | 2019-12-10 | Huawei Technologies Co., Ltd. | Downmixed signal calculation method and apparatus |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
MX2019003417A (es) * | 2016-09-28 | 2019-10-07 | Huawei Tech Co Ltd | Multichannel audio signal processing method, apparatus and system |
KR20210154807A (ko) * | 2019-04-18 | 2021-12-21 | Dolby Laboratories Licensing Corporation | Dialog detector |
CN115867964A (zh) * | 2020-06-11 | 2023-03-28 | Dolby Laboratories Licensing Corporation | Method and device for encoding and/or decoding spatial background noise within a multichannel input signal |
AU2021317755B2 (en) * | 2020-07-30 | 2023-11-09 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Apparatus, method and computer program for encoding an audio signal or for decoding an encoded audio scene |
WO2024056702A1 (en) * | 2022-09-13 | 2024-03-21 | Telefonaktiebolaget Lm Ericsson (Publ) | Adaptive inter-channel time difference estimation |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
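CN101320563A (zh) * | 2007-06-05 | 2008-12-10 | Huawei Technologies Co., Ltd. | Background noise encoding/decoding apparatus and method, and communication device |
CN101556799A (zh) * | 2009-05-14 | 2009-10-14 | Huawei Technologies Co., Ltd. | Audio decoding method and audio decoder |
CN101661749A (zh) * | 2009-09-23 | 2010-03-03 | Tsinghua University | Method for speech/music dual-mode switching encoding/decoding |
CN103188595A (zh) * | 2011-12-31 | 2013-07-03 | Spreadtrum Communications (Shanghai) Co., Ltd. | Method and system for processing multichannel audio signals |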
US20140330415A1 (en) * | 2011-11-10 | 2014-11-06 | Nokia Corporation | Method and apparatus for detecting audio sampling rate |
Family Cites Families (24)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH0713586B2 (ja) | 1987-02-20 | 1995-02-15 | Sanki Engineering Co., Ltd. | Mobile oil and water control device for automobile engine testing |
JP2835483B2 (ja) * | 1993-06-23 | 1998-12-14 | Matsushita Electric Industrial Co., Ltd. | Speech discrimination device and sound reproduction device |
JP2728122B2 (ja) * | 1995-05-23 | 1998-03-18 | NEC Corporation | Silence-compression speech encoding/decoding device |
EP0977172A4 (en) * | 1997-03-19 | 2000-12-27 | Hitachi Ltd | METHOD AND DEVICE FOR DETERMINING THE START AND END POINT OF A SOUND SECTION IN VIDEO |
ATE388542T1 (de) * | 1999-12-13 | 2008-03-15 | Broadcom Corp | Voice gateway with downstream voice synchronization |
JP3526269B2 (ja) | 2000-12-11 | 2004-05-10 | Toshiba Corporation | Inter-network relay device and transfer scheduling method in the relay device |
US7657706B2 (en) | 2003-12-18 | 2010-02-02 | Cisco Technology, Inc. | High speed memory and input/output processor subsystem for efficiently allocating and using high-speed memory and slower-speed memory |
KR100888474B1 (ko) * | 2005-11-21 | 2009-03-12 | Samsung Electronics Co., Ltd. | Apparatus and method for encoding/decoding multichannel audio signal |
JP2008286904A (ja) * | 2007-05-16 | 2008-11-27 | Panasonic Corp | Audio decoding device |
JP2011504250A (ja) * | 2007-11-21 | 2011-02-03 | LG Electronics Inc. | Signal processing method and apparatus |
EP2144229A1 (en) * | 2008-07-11 | 2010-01-13 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Efficient use of phase information in audio encoding and decoding |
KR101137652B1 (ko) * | 2009-10-14 | 2012-04-23 | Kwangwoon University Industry-Academic Collaboration Foundation | Unified speech/audio encoding/decoding apparatus and method that adjusts the overlap region of a window based on a transition section |
US9324337B2 (en) * | 2009-11-17 | 2016-04-26 | Dolby Laboratories Licensing Corporation | Method and system for dialog enhancement |
JP5299327B2 (ja) * | 2010-03-17 | 2013-09-25 | Sony Corporation | Audio processing device, audio processing method, and program |
JP5581449B2 (ja) * | 2010-08-24 | 2014-08-27 | Dolby International AB | Concealment of intermittent mono reception of FM stereo radio receivers |
US8831937B2 (en) * | 2010-11-12 | 2014-09-09 | Audience, Inc. | Post-noise suppression processing to improve voice quality |
WO2012066727A1 (ja) * | 2010-11-17 | 2012-05-24 | Panasonic Corporation | Stereo signal encoding device, stereo signal decoding device, stereo signal encoding method, and stereo signal decoding method |
US9036526B2 (en) * | 2012-11-08 | 2015-05-19 | Qualcomm Incorporated | Voice state assisted frame early termination |
CN105247610B (zh) * | 2013-05-31 | 2019-11-08 | Sony Corporation | Encoding device and method, decoding device and method, and recording medium |
CN105304080B (zh) * | 2015-09-22 | 2019-09-03 | iFLYTEK Co., Ltd. | Speech synthesis device and method |
RU2763374C2 (ru) * | 2015-09-25 | 2021-12-28 | VoiceAge Corporation | Method and system using a long-term correlation difference between the left and right channels for time-domain downmixing of a stereo sound signal into primary and secondary channels |
US20170134282A1 (en) | 2015-11-10 | 2017-05-11 | Ciena Corporation | Per queue per service differentiation for dropping packets in weighted random early detection |
MX2019003417A (es) * | 2016-09-28 | 2019-10-07 | Huawei Tech Co Ltd | Multichannel audio signal processing method, apparatus and system |
CN109285536B (zh) * | 2018-11-23 | 2022-05-13 | Mobvoi Innovation Technology Co., Ltd. | Speech special-effect synthesis method and apparatus, electronic device, and storage medium |
2016
- 2016-09-28 MX MX2019003417A patent/MX2019003417A/es unknown
- 2016-09-28 KR KR1020197011605A patent/KR20190052122A/ko not_active Application Discontinuation
- 2016-09-28 CN CN202311262035.8A patent/CN117351966A/zh active Pending
- 2016-09-28 CN CN202311261449.9A patent/CN117351965A/zh active Pending
- 2016-09-28 KR KR1020227012057A patent/KR102480710B1/ko active IP Right Grant
- 2016-09-28 CN CN202311267474.8A patent/CN117392988A/zh active Pending
- 2016-09-28 EP EP16917134.5A patent/EP3511934B1/en active Active
- 2016-09-28 WO PCT/CN2016/100617 patent/WO2018058379A1/zh active Application Filing
- 2016-09-28 CN CN202311261321.2A patent/CN117476018A/zh active Pending
- 2016-09-28 CN CN201680010600.3A patent/CN108140393B/zh active Active
- 2016-09-28 KR KR1020217028255A patent/KR102387162B1/ko active IP Right Grant
- 2016-09-28 JP JP2019516957A patent/JP6790251B2/ja active Active
- 2016-09-28 EP EP21163871.3A patent/EP3910629A1/en active Pending
2019
- 2019-03-28 US US16/368,208 patent/US10593339B2/en active Active
2020
- 2020-02-04 US US16/781,421 patent/US10984807B2/en active Active
2021
- 2021-04-16 US US17/232,679 patent/US11922954B2/en active Active
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
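CN110556119A (zh) * | 2018-05-31 | 2019-12-10 | Huawei Technologies Co., Ltd. | Downmixed signal calculation method and apparatus |
CN110556119B (zh) * | 2018-05-31 | 2022-02-18 | Huawei Technologies Co., Ltd. | Downmixed signal calculation method and apparatus |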
US11869517B2 (en) | 2018-05-31 | 2024-01-09 | Huawei Technologies Co., Ltd. | Downmixed signal calculation method and apparatus |
Also Published As
Publication number | Publication date |
---|---|
CN117351966A (zh) | 2024-01-05 |
US20190221219A1 (en) | 2019-07-18 |
EP3511934A1 (en) | 2019-07-17 |
CN117476018A (zh) | 2024-01-30 |
MX2019003417A (es) | 2019-10-07 |
US10593339B2 (en) | 2020-03-17 |
CN117392988A (zh) | 2024-01-12 |
JP2019533189A (ja) | 2019-11-14 |
US10984807B2 (en) | 2021-04-20 |
US20200273468A1 (en) | 2020-08-27 |
EP3511934A4 (en) | 2019-08-14 |
EP3511934B1 (en) | 2021-04-21 |
KR20210111898A (ko) | 2021-09-13 |
KR20220053030A (ko) | 2022-04-28 |
EP3910629A1 (en) | 2021-11-17 |
CN117351965A (zh) | 2024-01-05 |
KR102387162B1 (ko) | 2022-04-14 |
US11922954B2 (en) | 2024-03-05 |
US20210312932A1 (en) | 2021-10-07 |
BR112019005983A2 (pt) | 2019-10-01 |
CN108140393A (zh) | 2018-06-08 |
CN108140393B (zh) | 2023-10-20 |
JP6790251B2 (ja) | 2020-11-25 |
KR102480710B1 (ko) | 2022-12-22 |
KR20190052122A (ko) | 2019-05-15 |
Similar Documents
Publication | Title |
---|---|
TWI752281B (zh) | Apparatus and method for encoding or decoding directional audio coding parameters using quantization and entropy coding |
WO2018058379A1 (zh) | Method, apparatus and system for processing multichannel audio signal |
US9384743B2 (en) | Apparatus and method for encoding/decoding multichannel signal |
US9324329B2 (en) | Method for parametric spatial audio coding and decoding, parametric spatial audio coder and parametric spatial audio decoder |
US9275646B2 (en) | Method for inter-channel difference estimation and spatial audio coding device |
US20120093321A1 (en) | Apparatus and method for encoding and decoding spatial parameter |
EP3664083A1 (en) | Signal reconstruction method and device in stereo signal encoding |
WO2011153913A1 (zh) | Side-band residual signal generation method and apparatus |
WO2024052499A1 (en) | Decoder and decoding method for discontinuous transmission of parametrically coded independent streams with metadata |
WO2024051954A1 (en) | Encoder and encoding method for discontinuous transmission of parametrically coded independent streams with metadata |
TW202411984A (zh) | Encoder and encoding method for discontinuous transmission of parametrically coded independent streams with metadata |
BR112019005983B1 (pt) | Multichannel audio signal processing method, encoder, decoder, and encoding and decoding system |
Legal Events
Code | Title | Description |
---|---|---|
WWE | WIPO information: entry into national phase | Ref document number: 201680010600.3; Country of ref document: CN |
121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 16917134; Country of ref document: EP; Kind code of ref document: A1 |
ENP | Entry into the national phase | Ref document number: 2019516957; Country of ref document: JP; Kind code of ref document: A |
NENP | Non-entry into the national phase | Ref country code: DE |
REG | Reference to national code | Ref country code: BR; Ref legal event code: B01A; Ref document number: 112019005983; Country of ref document: BR |
ENP | Entry into the national phase | Ref document number: 2016917134; Country of ref document: EP; Effective date: 20190408 |
ENP | Entry into the national phase | Ref document number: 20197011605; Country of ref document: KR; Kind code of ref document: A |
REG | Reference to national code | Ref country code: BR; Ref legal event code: B01E; Ref document number: 112019005983; Country of ref document: BR; Free format text: Submit the correct page numbering of the claims |
ENP | Entry into the national phase | Ref document number: 112019005983; Country of ref document: BR; Kind code of ref document: A2; Effective date: 20190326 |