WO2023153228A1 - Encoding device and encoding method - Google Patents

Encoding device and encoding method

Info

Publication number
WO2023153228A1
WO2023153228A1 PCT/JP2023/002481 JP2023002481W WO2023153228A1 WO 2023153228 A1 WO2023153228 A1 WO 2023153228A1 JP 2023002481 W JP2023002481 W JP 2023002481W WO 2023153228 A1 WO2023153228 A1 WO 2023153228A1
Authority
WO
WIPO (PCT)
Prior art keywords
encoding
signal
stereo
channel
frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/JP2023/002481
Other languages
English (en)
French (fr)
Japanese (ja)
Inventor
裕一 神谷
拓也 河嶋
宏幸 江原
旭 原田
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Panasonic Intellectual Property Corp of America
Original Assignee
Panasonic Intellectual Property Corp of America
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Panasonic Intellectual Property Corp of America
Priority to JP2023580166A (patent/JPWO2023153228A1/ja)
Priority to US18/835,764 (patent/US20250191596A1/en)
Publication of WO2023153228A1 (patent/WO2023153228A1/ja)

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/008 Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/02 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/08 Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters
    • G10L19/12 Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters the excitation function being a code excitation, e.g. in code excited linear prediction [CELP] vocoders
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/16 Vocoder architecture
    • G10L19/18 Vocoders using multiple modes
    • G10L19/20 Vocoders using multiple modes using sound class specific coding, hybrid encoders or object based coding
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/16 Vocoder architecture
    • G10L19/18 Vocoders using multiple modes
    • G10L19/22 Mode decision, i.e. based on audio signal content versus external parameters

Definitions

  • the present disclosure relates to an encoding device and an encoding method.
  • there is a low-bit-rate encoding technique for speech audio signals (see, for example, Non-Patent Document 1).
  • JP 2021-119383 A Japanese Patent Publication No. 7-501190
  • a non-limiting embodiment of the present disclosure contributes to providing an encoding apparatus and an encoding method that can improve the encoding performance for speech audio signals at a low bit rate.
  • An encoding apparatus includes: a control circuit that, when an input stereo signal is determined to be a signal suitable for encoding using a mid-side stereo scheme, determines according to conditions whether to transform the input stereo signal into a mid-side signal in the time domain and apply a first encoding, or to apply a second encoding; a first encoding circuit for encoding a side signal; and a second encoding circuit for encoding the input stereo signal in the frequency domain when the second encoding is applied.
  • FIG. 4 is a diagram showing an example of switching transitions of coding modes in the coding system. Further diagrams show an example of channel transform transitions in a coding system and a configuration example of a decoding system.
  • Patent Document 1 discloses a highly efficient Modified Discrete Cosine Transform (MDCT) stereo encoding system that combines the Mid-Side (M/S) stereo system and the Left-Right (LR) stereo system. Further, for example, a method of switching between the M/S stereo system and the LR stereo system in transform coding for a stereo signal is known (see Patent Documents 1 and 2, for example).
  • MDCT Modified Discrete Cosine Transform
  • MDCT encoding (or MDCT-based encoding) shown in Patent Document 1 may have insufficient encoding performance for speech signals at low bit rates.
  • for example, a "Full Mid-Side encoding mode", in which the M/S stereo method is set in all of a plurality of subbands (also called frequency bands or spectrum bands) obtained by dividing the spectrum of an input stereo signal, may be selected.
  • the MDCT-based encoding method is applied, but depending on the bit rate, Code Excited Linear Prediction (CELP) encoding (or CELP-based encoding) may improve the coding performance for speech signals.
  • CELP Code Excited Linear Prediction
  • CELP coding can improve coding performance. However, when encoding audio signals using the M/S stereo method, if the inter-channel time difference (ITD) is not zero, the performance of stereo signal encoding using CELP encoding may be degraded or insufficient.
  • ITD inter-channel time difference
  • FIG. 1 is a diagram showing a configuration example of an encoding device 10 (also referred to as an "encoding system").
  • the coding device 10 may include, for example, a transform/analysis/preprocessing/coding control unit 11, an M/S conversion unit 12, a spectrum coding unit 13, an ITD correction unit 14, a mixing unit 15, a CELP-based coding unit 16, and a switching multiplexing unit 17.
  • a stereo signal including, for example, an L channel (Left channel) and an R channel (Right channel) may be input to the transform/analysis/preprocessing/encoding control unit 11 .
  • the transform/analysis/preprocessing/encoding control unit 11 may, for example, transform the L-channel signal and the R-channel signal into frequency-domain signals and output the transformed L-channel and R-channel signals to the M/S conversion unit 12.
  • the transform processing in the transform/analysis/preprocessing/encoding control unit 11 may be any process that converts a time-domain signal into frequency-domain parameters (spectral parameters), such as the Fast Fourier Transform (FFT), the Discrete Fourier Transform (DFT), or the MDCT.
  • FFT Fast Fourier Transform
  • DFT Discrete Fourier Transform
  • MDCT Modified Discrete Cosine Transform
  • the transform/analysis/preprocessing/encoding control unit 11 may, for example, control M/S conversion in the M/S conversion unit 12 and output information on M/S conversion (for example, “M/S conversion control information”) to the M/S conversion unit 12.
  • the M/S conversion control information may include, for example, information regarding the presence or absence of LR-M/S conversion in the M/S conversion section 12, or information regarding subbands on which LR-M/S conversion is performed.
  • the transform/analysis/preprocessing/encoding control unit 11 may output the L-channel signal and the R-channel signal in the time domain to the ITD correction unit 14, for example. Further, the transform/analysis/preprocessing/encoding control unit 11 may, for example, perform control related to ITD correction and output control information related to ITD correction (for example, referred to as “ITD correction control information”) to the ITD correction unit 14.
  • the ITD correction control information may be, for example, information indicating the ITD correction value, or information for determining the ITD correction value in the ITD correction unit 14 .
  • the transform/analysis/preprocessing/encoding control unit 11 may, for example, control mixing in the mixing unit 15 and output control information regarding mixing (for example, referred to as “mixing control information”) to the mixing unit 15.
  • the mixing control information may include, for example, information on parameters (an example of which will be described later) used for mixing in the mixing section 15 .
  • the transform/analysis/preprocessing/encoding control unit 11 may perform, for example, analysis processing for analyzing the features of the L-channel signal and the R-channel signal. The analysis processing may include, for example, inter-channel correlation (ICC) analysis, inter-channel time difference (ITD) analysis, inter-channel level difference (ILD) analysis, or pitch analysis.
  • the transform/analysis/preprocessing/encoding control unit 11 may output, for example, information about the analysis result (for example, referred to as “analysis information”) to the ITD correction unit 14 or other components.
  • the transformation/analysis/preprocessing/encoding control unit 11 may perform preprocessing such as pre-emphasis or perceptual masking (or perceptual weighting).
  • the transform/analysis/preprocessing/encoding control unit 11 may, for example, perform switching control of encoding modes and output control information regarding the switching of encoding modes (for example, referred to as “encoding mode information”) to the switching multiplexing unit 17.
  • the encoding mode information may include, for example, information indicating which of stereo signal encoding in the frequency domain (for example, referred to as "stereo FD (Frequency Domain) encoding") and stereo signal encoding in the time domain (for example, referred to as "stereo TD (Time Domain) encoding") is used.
  • the M/S conversion section 12 and the spectrum encoding section 13 may configure a stereo FD encoding section (for example, corresponding to the second encoding circuit) that performs stereo FD encoding.
  • the M/S conversion unit 12 receives, for example, the L-channel and R-channel signals in the frequency domain (e.g., spectral parameters) and the M/S conversion control information.
  • the M/S conversion unit 12 may perform LR-M/S conversion processing of the spectral parameters of the L channel and the spectral parameters of the R channel, for example, based on the M/S conversion control information.
  • the M/S conversion unit 12 outputs, for example, spectral parameters (2 channels) after LR-M/S conversion processing to the spectrum encoding unit 13 .
  • the M/S conversion unit 12 may perform LR-M/S conversion processing for each subband.
  • for example, the M/S conversion control information may include information indicating whether to perform LR-M/S conversion for each subband, and the M/S conversion unit 12 may perform LR-M/S conversion processing for each subband based on the M/S conversion control information.
  • the M/S conversion control information includes information indicating whether to perform LR-M/S conversion in a plurality of subbands (for example, some or all subbands), and the M/S conversion unit 12 may perform LR-M/S conversion processing based on the M/S conversion control information.
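The per-subband LR-M/S conversion described above can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the function name `ms_convert_subbands`, the `band_edges` layout, the `use_ms` flag list standing in for the M/S conversion control information, and the 0.5 scaling of the mid and side signals are all assumptions.

```python
from typing import List, Sequence, Tuple


def ms_convert_subbands(
    left: Sequence[float],
    right: Sequence[float],
    band_edges: Sequence[int],   # hypothetical: subband b covers bins [band_edges[b], band_edges[b+1])
    use_ms: Sequence[bool],      # hypothetical stand-in for the M/S conversion control information
) -> Tuple[List[float], List[float]]:
    """Apply LR-M/S conversion only in the subbands whose flag is set."""
    if len(left) != len(right):
        raise ValueError("left and right spectra must have the same length")
    if len(band_edges) != len(use_ms) + 1:
        raise ValueError("band_edges must have one more entry than use_ms")

    ch0 = list(left)    # carries M in converted subbands, L elsewhere
    ch1 = list(right)   # carries S in converted subbands, R elsewhere
    for b, flag in enumerate(use_ms):
        if not flag:
            continue
        for k in range(band_edges[b], band_edges[b + 1]):
            l, r = left[k], right[k]
            ch0[k] = 0.5 * (l + r)   # mid (assumed 0.5 scaling)
            ch1[k] = 0.5 * (l - r)   # side (assumed 0.5 scaling)
    return ch0, ch1
```

Setting every flag to true corresponds to converting all subbands, and setting every flag to false leaves the LR spectra unchanged.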
  • the spectral encoding unit 13 performs, for example, encoding processing of the spectral parameters of the two channels input from the M/S conversion unit 12, and outputs the encoding result (for example, referred to as “stereo FD encoded information”) to the switching multiplexing unit 17.
  • the ITD correction unit 14, the mixing unit 15, and the CELP-based encoding unit 16 may constitute a stereo TD encoding unit (for example, corresponding to the first encoding circuit) that performs stereo TD encoding.
  • the ITD correction unit 14 may receive, for example, the preprocessed time-domain L-channel and R-channel signals, the ITD correction control information, and the analysis information from the transform/analysis/preprocessing/encoding control unit 11. For example, based on the ITD correction control information, the ITD correction unit 14 may perform correction processing on the L channel signal and the R channel signal that brings the absolute value of the ITD closer to zero, to a threshold value or less (for example, referred to as "ITD correction processing").
  • the ITD correction section 14 may output the L channel signal and the R channel signal after the ITD correction processing to the mixing section 15 . An example of ITD correction processing in the ITD correction unit 14 will be described later.
  • the ITD correction process is performed on the encoding (encoder) side and does not have to be performed on the decoding (decoder) side (for example, the decoding side does not need to perform restoration processing).
  • at least one of an upper limit and a lower limit may be set for the maximum number of shifts (eg, the number of samples) that can be corrected (eg, shifted).
  • the angular resolution required to reproduce human voice in an arbitrary three-dimensional radial direction (for example, also referred to as azimuth perceptual resolution) is 30 degrees (see, for example, Non-Patent Document 2). Therefore, for example, the range of ITD correction may be set so that the angle of the direction of arrival is within about 30 degrees.
  • the correctable range may be set to ±3 samples.
  • the ITD correction range is not limited to ±3 samples, and other values may be used.
  • the perceptual resolution of the azimuth referred to when setting the ITD correction range is not limited to 30 degrees.
  • if the ITD correction value falls outside the upper or lower limit, the ITD correction unit 14 may clip it at the upper limit value or the lower limit value.
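As a small sketch of the clipping described above, assuming a symmetric correctable range of ±3 samples (the function name `clip_itd_correction` and the parameter `max_shift` are hypothetical):

```python
def clip_itd_correction(itd_samples: int, max_shift: int = 3) -> int:
    """Clip a requested ITD correction (in samples) to [-max_shift, +max_shift]."""
    # max_shift = 3 mirrors the example range in the text; other values may be used.
    return max(-max_shift, min(max_shift, itd_samples))
```

A request of 5 samples would be clipped to 3, and a request of -7 samples to -3, while values already inside the range pass through unchanged.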
  • an ILD correction process for correcting the ILD between the L channel signal and the R channel signal may also be performed.
  • the encoding device 10 may adjust the amplitudes of both channel signals so that the ILD between the L channel signal and the R channel signal after ITD correction processing is zero, that is, the energy of both channel signals is equal.
  • encoding device 10 may adjust the amplitudes of both channel signals so as to have the average energy of the energy of the L channel signal and the energy of the R channel signal.
  • the encoding device 10 may perform amplitude adjustment such that the amount of amplitude adjustment is gradually increased from the frame start point in order to avoid discontinuity between frames.
  • the encoding device 10 may calculate an amplitude adjustment coefficient (for example, gain) and multiply both channel signals after ITD correction processing by the calculated amplitude adjustment coefficient.
  • the amplitude adjustment coefficient calculation procedure includes an energy calculation step, an amplitude ratio calculation step, and an amplitude adjustment coefficient calculation step.
  • the energy calculation step calculates the frame energies (EL and ER) of the L channel signal (L) after ITD correction processing and the R channel signal (R) after ITD correction processing, and outputs them to the amplitude ratio calculation step.
  • the amplitude ratio calculation step obtains the square root of the ratio of EL and ER, and outputs it as the amplitude ratio (RLR) of L and R to the amplitude adjustment coefficient calculation step.
  • the amplitude ratio calculation step may output the amplitude ratio as 1 without calculating the amplitude ratio when the average energy, power, or magnitude of amplitude of both channel signals does not exceed a predetermined threshold.
  • amplitude adjustment processing is not performed on a signal with a low level, and useless processing can be skipped.
  • the amplitude adjustment coefficient calculation step obtains the square root of the ratio of the average of the square of RLR and 1 (for example, 0.5 × (RLR × RLR + 1)) to the square of RLR (for example, RLR × RLR), and uses it as the amplitude adjustment coefficient (GL) for the L channel, that is, GL = sqrt(0.5 × (RLR × RLR + 1) / (RLR × RLR)). The amplitude adjustment coefficient calculation step also obtains the amplitude adjustment coefficient (GR) for the R channel by multiplying GL by RLR, that is, GR = GL × RLR. Note that in the amplitude adjustment coefficient calculation step, when the obtained GL is not within a predetermined threshold range (for example, at or above the lower limit threshold and at or below the upper limit threshold), GL may be clipped to the upper limit threshold if it exceeds the upper limit threshold, and to the lower limit threshold if it is below the lower limit threshold. Keeping the amplitude adjustment coefficient within a specific range in this way avoids an excessive change in amplitude due to amplitude adjustment.
  • the signal after amplitude adjustment is smoothed between frames.
  • the procedure for calculating the amplitude adjustment coefficient is not limited to the processing shown in FIG. 2.
  • the amplitude adjustment coefficient is not limited to the value obtained by the processing shown in FIG. 2, and may be a value calculated so that the amplitudes (or energies) of both channel signals are equal.
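The three steps (energy calculation, amplitude ratio calculation, amplitude adjustment coefficient calculation) can be sketched as below. This is an illustrative reading of the description, not the processing of FIG. 2 itself: the function name, the threshold values `energy_floor`, `g_min`, and `g_max`, the guard for a silent channel, and the choice to clip GL before deriving GR are all assumptions.

```python
import math
from typing import Sequence, Tuple


def amplitude_adjust_gains(
    left: Sequence[float],
    right: Sequence[float],
    energy_floor: float = 1e-6,   # hypothetical low-level threshold (per-sample average energy)
    g_min: float = 0.5,           # hypothetical lower clipping threshold for GL
    g_max: float = 2.0,           # hypothetical upper clipping threshold for GL
) -> Tuple[float, float]:
    """Return (GL, GR) that bring both channels to the same frame energy."""
    # Energy calculation step: frame energies EL and ER.
    e_l = sum(x * x for x in left)
    e_r = sum(x * x for x in right)
    n = max(len(left), 1)

    # Amplitude ratio calculation step: RLR = sqrt(EL / ER).
    # For low-level frames the ratio is output as 1; the zero-energy guard
    # is an added assumption to avoid division by zero.
    if 0.5 * (e_l + e_r) / n <= energy_floor or e_l == 0.0 or e_r == 0.0:
        rlr = 1.0
    else:
        rlr = math.sqrt(e_l / e_r)

    # Amplitude adjustment coefficient calculation step:
    # GL = sqrt(0.5 * (RLR^2 + 1) / RLR^2), clipped to [g_min, g_max].
    g_l = math.sqrt(0.5 * (rlr * rlr + 1.0) / (rlr * rlr))
    g_l = min(max(g_l, g_min), g_max)
    # GR = GL * RLR (computed from the clipped GL in this sketch).
    g_r = g_l * rlr
    return g_l, g_r
```

With unclipped gains, GL² × EL and GR² × ER both equal 0.5 × (EL + ER), so each channel ends up at the average of the two frame energies. Deriving GR from the clipped GL keeps the two energies equal to each other even when clipping is active.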
  • the encoding device 10 may perform processing for bringing ILD closer to zero (for example, ILD correction processing) in addition to processing for bringing ITD closer to zero (for example, ITD correction processing).
  • the mixing unit 15 may receive the L channel signal and the R channel signal after ITD correction processing from the ITD correction unit 14 and may receive mixing control information from the transform/analysis/preprocessing/encoding control unit 11.
  • the mixing unit 15 performs mixing processing of the L-channel signal and the R-channel signal, for example, based on the mixing control information, and outputs the two-channel signal after the mixing processing to the CELP-based encoding unit 16 .
  • An example of mixing processing in the mixing section 15 will be described later.
  • the CELP-based encoding unit 16 may encode each of the two-channel signals input from the mixing unit 15 (for example, the M/S signal obtained by converting the input stereo signal after ITD correction) using, for example, an Enhanced Voice CELP-based codec (e.g., a multi-mode codec or a multi-mode monophonic codec).
  • CELP-based encoding section 16 may output a signal obtained by multiplexing the encoding result of each channel (for example, “stereo TD encoded information”) to switching multiplexing section 17 .
  • the switching multiplexing unit 17 may multiplex the information to be transmitted from among the M/S conversion control information, the mixing control information, the stereo FD encoded information input from the spectrum encoding unit 13, and the stereo TD encoded information input from the CELP-based encoding unit 16, and may output the result to a transmission line such as a communication channel or to a recording medium such as a storage medium.
  • either one of the stereo FD encoded information and the stereo TD encoded information may be input to the switching multiplexing section 17 based on the encoding control information.
  • FIG. 3 is a flow diagram showing an example of the processing procedure of the encoding device 10.
  • the transform/analysis/preprocessing/encoding control unit 11 performs, for example, transform processing, analysis processing, and preprocessing on the L-channel signal and the R-channel signal (S1).
  • the encoding device 10 determines whether the target frame is a frame using stereo TD encoding (S2). For example, encoding device 10 may determine whether or not a condition for applying stereo TD encoding is satisfied. Alternatively, for example, the encoding device 10 may determine whether or not a condition for applying stereo FD encoding is satisfied.
  • Encoding apparatus 10 may determine whether or not to use stereo TD encoding based on, for example, the analysis result of the inter-channel correlation (ICC) between the L channel and the R channel, or based on an LR/MS decision algorithm (e.g., a method for determining M/S conversion control). For example, when the inter-channel correlation (ICC) is high (for example, when the ICC value is equal to or greater than a threshold value), the encoding device 10 may determine that the conditions for applying stereo TD encoding are satisfied, and when the inter-channel correlation (ICC) is low (for example, when the ICC value is less than the threshold value), it may determine that the conditions for applying stereo TD encoding are not satisfied.
  • ICC inter-channel correlation
  • the encoding device 10 may analyze whether the type of the input stereo signal is a speech signal, for example, in the analysis process.
  • the conditions for applying stereo TD encoding may be based, for example, on the type of input stereo signal. For example, when the type of the input stereo signal is a speech signal, the encoding apparatus 10 may determine that the conditions for applying stereo TD encoding are satisfied, and when the type of the input stereo signal is not a speech signal, it may determine that the conditions for applying stereo TD encoding are not satisfied.
  • the conditions for applying stereo TD encoding may be based, for example, on the inter-channel time difference (ITD) in the input stereo signal. For example, when the ITD value obtained from the ITD analysis is within a preset threshold range near 0, the encoding device 10 may determine that the conditions for applying stereo TD encoding are satisfied, and when the ITD value is outside the threshold range, it may determine that the conditions for applying stereo TD encoding are not satisfied.
  • the preset range may be, for example, a range expanded to within about 50% of the correctable range of the ITD correction process (for example, the range based on perceptual resolution).
  • when the ITD changes from within the preset range to outside it, or from outside the preset range to within it, the determination result may be changed only after the post-change state has continued for a predetermined number of frames. This avoids frequent switching between stereo FD encoding and stereo TD encoding from frame to frame for an input signal whose ITD fluctuates near the boundary of the ITD range.
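The frame-count hold on the ITD range decision can be sketched as a small state machine. The class name `ItdRangeHysteresis` and the parameters `itd_limit` (half-width of the preset range, in samples) and `hold_frames` (the predetermined number of frames) are hypothetical names chosen for this illustration.

```python
class ItdRangeHysteresis:
    """Change the in-range/out-of-range decision only after it persists for hold_frames frames."""

    def __init__(self, itd_limit: int, hold_frames: int, initially_inside: bool = True):
        self.itd_limit = itd_limit        # hypothetical: preset range is |ITD| <= itd_limit
        self.hold_frames = hold_frames    # hypothetical: frames the new state must persist
        self.inside = initially_inside    # current (held) decision
        self._pending = 0                 # consecutive frames disagreeing with the decision

    def update(self, itd: int) -> bool:
        """Feed the ITD of one frame and return the held decision."""
        now_inside = abs(itd) <= self.itd_limit
        if now_inside == self.inside:
            self._pending = 0
        else:
            self._pending += 1
            if self._pending >= self.hold_frames:
                self.inside = now_inside
                self._pending = 0
        return self.inside
```

With `hold_frames=3`, a single out-of-range frame surrounded by in-range frames does not flip the decision, which is the behavior the text describes for signals whose ITD hovers near the range boundary.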
  • the conditions for applying stereo TD encoding may be based on, for example, the bit rate for the input stereo signal. For example, encoding apparatus 10 may determine that the conditions for applying stereo TD encoding are satisfied when the bit rate is equal to or less than a threshold, and that the conditions are not satisfied when the bit rate is greater than the threshold.
  • the conditions for applying stereo TD encoding may be based on, for example, at least one of the above-described ICC, LR/MS decision algorithm, type of input stereo signal, ITD, and bit rate.
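One possible combination of these conditions is sketched below. The text only says the decision may be based on at least one of the criteria, so requiring all four at once is an assumption, as are the names `FrameAnalysis` and `use_stereo_td` and every threshold value. The `is_speech` field stands for the signal-type check described above.

```python
from dataclasses import dataclass


@dataclass
class FrameAnalysis:
    icc: float          # inter-channel correlation of the frame
    is_speech: bool     # result of the signal-type analysis
    itd_samples: int    # inter-channel time difference in samples
    bit_rate: int       # bit rate in bits per second


def use_stereo_td(
    a: FrameAnalysis,
    icc_min: float = 0.8,        # hypothetical ICC threshold
    itd_limit: int = 4,          # hypothetical half-width of the preset ITD range
    max_bit_rate: int = 32000,   # hypothetical bit-rate threshold
) -> bool:
    """Return True when this sketch would select stereo TD (CELP-based) encoding."""
    return (
        a.icc >= icc_min
        and a.is_speech
        and abs(a.itd_samples) <= itd_limit
        and a.bit_rate <= max_bit_rate
    )
```

A frame that fails the check would go to stereo FD encoding in the flow of FIG. 3.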
  • when the encoding device 10 determines that the frame uses stereo TD encoding (S2: YES), it performs stereo TD encoding processing (S3). For example, if the encoding device 10 determines to apply the stereo TD encoding described above, the encoding device 10 may convert the stereo audio signal from an LR stereo signal to an M/S stereo signal, and may determine to encode the M signal and the S signal using a CELP-based encoder (for example, the CELP-based encoding unit 16).
  • ACELP Algebraic CELP
  • CELP coding has better coding performance for speech signals than other coding at low to medium bit rates. Therefore, as described above, encoding apparatus 10 can improve the encoding performance of speech signals by performing CELP-based stereo TD encoding when the conditions are satisfied.
  • the encoding apparatus 10 may, for example, apply CELP-based encoding to the M signal and CELP-based encoding to the S signal for stereo audio signals with high inter-channel correlation.
  • when the encoding device 10 determines that the frame does not use stereo TD encoding (S2: NO), stereo FD encoding processing is performed (S4).
  • FIG. 4 is a flow chart showing an example of a processing procedure for stereo TD encoding (for example, processing of S3 shown in FIG. 3).
  • the encoding device 10 performs ITD correction processing for correcting (absolute value of) ITD to a threshold value or less for the L channel signal and the R channel signal (S31).
  • the encoding device 10 performs mixing processing (for example, time-domain LR-M/S conversion processing) on the ITD-corrected R-channel signal and L-channel signal (S32).
  • the encoding device 10 for example, performs encoding processing for each channel on the two channel signals after the mixing processing (S33).
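Steps S31 to S33 can be sketched end to end as follows. This is a simplified illustration under stated assumptions: the ITD correction is an integer-sample delay of the leading channel, the mixing is a plain M = 0.5 × (L + R), S = 0.5 × (L − R) conversion, the sign convention for `itd` is invented for the example, and `encode_channel` is a hypothetical callback standing in for the CELP-based encoder.

```python
from typing import Callable, List, Sequence, Tuple


def delay(signal: Sequence[float], n: int) -> List[float]:
    """Delay a frame by n >= 0 samples, zero-padding the start and keeping the length."""
    n = min(n, len(signal))
    return [0.0] * n + list(signal[: len(signal) - n])


def mix_to_ms(left: Sequence[float], right: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Time-domain LR-M/S conversion (assumed 0.5 scaling)."""
    mid = [0.5 * (l + r) for l, r in zip(left, right)]
    side = [0.5 * (l - r) for l, r in zip(left, right)]
    return mid, side


def stereo_td_encode(
    left: Sequence[float],
    right: Sequence[float],
    itd: int,  # assumed convention: itd > 0 means the R channel lags the L channel by itd samples
    encode_channel: Callable[[List[float]], object],  # hypothetical per-channel encoder
):
    # S31: ITD correction, delaying the leading channel so the ITD approaches zero.
    if itd > 0:
        left_c, right_c = delay(left, itd), list(right)
    else:
        left_c, right_c = list(left), delay(right, -itd)
    # S32: mixing (time-domain LR-M/S conversion).
    mid, side = mix_to_ms(left_c, right_c)
    # S33: encode each of the two channels separately.
    return encode_channel(mid), encode_channel(side)
```

When the two channels differ only by a time offset, aligning them first makes the side signal close to zero, which is the motivation the text gives for correcting the ITD before M/S mixing.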
  • the ITD correction process is performed, for example, after it is determined that the frame to be encoded is a frame to be subjected to stereo TD encoding (for example, called a “stereo TD encoded frame”).
  • stereo TD-encoded frames can be classified into the following three types.
  • the first stereo TD frame (hereinafter also referred to as the "first frame") after switching from a frame for which stereo FD encoding processing is performed (for example, referred to as a "stereo FD encoded frame").
  • a frame within a run of consecutive stereo TD-encoded frames (hereinafter also referred to as a “second frame”).
  • the second frame may be, for example, a frame whose preceding and succeeding frames are not stereo FD frames.
  • the last stereo TD-encoded frame (hereinafter also referred to as a “third frame”).
  • the third frame may be a frame that switches to a stereo FD encoded frame in the next frame.
  • the method of ITD correction processing in each of these three types of frames may be different.
  • in the first frame, the CELP-based encoding unit 16 may select the MDCT-based encoding mode, as described later. In the first frame, if the ITD is not zero, an ITD correction process may be performed to bring it closer to zero.
  • in the second frame, the immediately preceding frame is a stereo TD-encoded frame, so there is a high possibility that ITD correction processing has already been applied.
  • the encoding apparatus 10 may, for example, perform correction processing that gradually delays the signal of one channel (shifts the waveform in the future direction on the time axis) or gradually advances it (shifts the waveform in the past direction on the time axis), according to the difference (change) between the ITD in the previous frame and the ITD in the current frame.
  • when there is no change in the ITD between the immediately preceding frame and the current frame (for example, when the absolute value of the difference is within a threshold value, or is 0), the encoding device 10 need not perform the gradual ITD correction processing (for example, the previous shift amount may be maintained).
  • the encoding device 10 may set an upper limit for the amount of ITD correction (for example, the number of samples by which the signal of one channel is delayed) in order to suppress abrupt changes in the signal due to correction processing.
  • the encoding device 10 may set (eg, limit) the upper limit (eg, maximum value) of the number of samples that can be corrected per frame to 1 sample. In this case, it takes two frames or more to perform ITD correction for more than one sample.
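The per-frame limit can be sketched as a clamped update of the shift amount. The function name `next_shift` and its parameters are hypothetical; `max_step=1` mirrors the one-sample-per-frame example in the text.

```python
def next_shift(current_shift: int, target_shift: int, max_step: int = 1) -> int:
    """Move the applied shift toward the target by at most max_step samples per frame."""
    delta = target_shift - current_shift
    step = max(-max_step, min(max_step, delta))
    return current_shift + step
```

Starting from a shift of 0 with a target of 3 samples, successive frames would apply shifts of 1, 2, and 3, so a correction of more than one sample is spread over several frames.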
  • if the subsequent frames are switched to stereo FD encoding, it is preferable to perform ITD correction processing that restores the corrected ITD. An upper limit may also be set on the number of samples restored per frame.
  • the encoding device 10 gradually advances the channel that was delayed by the ITD correction process (that is, shifted in the future direction on the time axis), shifting it in the past direction on the time axis to return it to its original position.
  • the encoding device 10 may perform ITD correction that gradually shifts the time signal within one sample over a plurality of stereo TD-encoded frames (for example, sections) other than the third frame immediately before the frame in which stereo FD encoding is performed.
  • FIG. 5 is a flow chart showing an example of the processing procedure of the ITD correction process described above (for example, the process of S31 shown in FIG. 4).
  • the encoding device 10 determines whether or not the frame is the first frame for switching to stereo TD encoding (S311).
  • if the frame is the first frame, the encoding device 10 does not need to perform ITD correction processing (for example, it may end the ITD correction processing). Note that, as described above, the encoding device 10 may perform ITD correction processing on this frame. In this case, the process of S311 may be omitted, and the first frame may be treated in the same manner as the second frame.
  • the encoding device 10 determines whether the frame is the third frame for switching to stereo FD encoding (S312).
  • if the frame is not the third frame, the encoding device 10 may perform ITD correction processing (S313).
  • if the frame is the third frame, the encoding device 10 may restore the ITD of the ITD-corrected channel (S314). This processing ends the ITD correction so that the input signal is ultimately output as it is.
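The branching of S311 to S314 can be sketched as follows. This is a simplified illustration under stated assumptions: the names `FrameKind` and `itd_step` are hypothetical, shifts are whole samples, and the shift changes by at most one sample per frame.

```python
from enum import Enum


class FrameKind(Enum):
    FIRST = 1   # stereo TD frame immediately after switching from stereo FD
    SECOND = 2  # stereo TD frame that is not adjacent to a mode switch
    THIRD = 3   # stereo TD frame immediately before switching to stereo FD


def itd_step(kind, shift, target):
    """Return the shift to use in this frame (changes by at most one sample).

    shift  : shift currently applied to the delayed channel
    target : shift that would bring the ITD close to zero (assumed given)
    """
    if kind is FrameKind.FIRST:
        # S311 "yes": no correction in the first frame.
        return shift
    if kind is FrameKind.THIRD:
        # S312 "yes" -> S314: move the channel back toward its original position.
        goal = 0
    else:
        # S312 "no" -> S313: move toward the ITD-cancelling shift.
        goal = target
    if shift < goal:
        return shift + 1
    if shift > goal:
        return shift - 1
    return shift
```

For example, a run of second frames increases the shift one sample at a time, and the third frame steps it back toward zero before stereo FD encoding starts.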
  • FIG. 6 is a diagram showing the processing flow of the ITD correction processing shown in FIG. 5 using pseudo program code.
  • the process of advancing the signal (for example, the process of shifting the signal in the past direction on the time axis)
  • the process of delaying the signal (for example, the process of shifting the signal in the future direction on the time axis)
  • a resolution of less than one sample may be used to achieve the change. This can be done using an interpolating filter that interpolates between the samples. For example, it can be implemented similarly to the fractionally delayed long-term prediction filter used in the known CELP codec.
  • FIG. 7 is a diagram showing an example of a coefficient set of an interpolation filter (for example, an FIR filter) that interpolates with 1/24 sample accuracy using a total of 13 points, 6 samples before and after.
  • the interpolation filter is equivalent to the time-reversed impulse response of a delay filter that delays the signal with an accuracy of 1/24 sample.
  • a filter with a coefficient set consisting only of 0 and 1 is shown for the sake of convenience, but it may be omitted in an implementation (for example, since the input and output either do not change or differ only by a one-sample shift, the filtering step need not be applied).
  • the coefficient set shown in FIG. 7 is gradually switched from the upper coefficient set to the lower coefficient set.
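A fractional delay of this kind can be sketched with a 13-tap FIR filter. The coefficients of FIG. 7 are not reproduced in this text, so the sketch below assumes a Hann-windowed sinc design; the names `frac_delay_coeffs` and `apply_fir` are hypothetical, and the resulting values are not the patent's coefficient sets.

```python
import math

TAPS = 13           # 6 samples before and after the centre tap
CENTER = TAPS // 2  # index of the centre tap
STEPS = 24          # 1/24-sample resolution


def frac_delay_coeffs(k):
    """Coefficients that delay a signal by k/24 sample (0 <= k <= 24)."""
    d = k / STEPS
    coeffs = []
    for i in range(TAPS):
        x = i - CENTER - d
        sinc = 1.0 if abs(x) < 1e-12 else math.sin(math.pi * x) / (math.pi * x)
        # Hann window centred on the delayed position (assumed design choice).
        window = 0.5 * (1.0 + math.cos(math.pi * x / (CENTER + 1)))
        coeffs.append(sinc * window)
    total = sum(coeffs)
    return [c / total for c in coeffs]  # normalise to unit DC gain


def apply_fir(signal, coeffs):
    """Filter `signal` with zero padding, aligned to the centre tap."""
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for i, c in enumerate(coeffs):
            j = n - (i - CENTER)
            if 0 <= j < len(signal):
                acc += c * signal[j]
        out.append(acc)
    return out
```

Stepping `k` from 0 to 24 over successive segments moves the signal by one full sample in 1/24-sample increments. At `k = 0` and `k = 24` the coefficient set reduces to a single 1 among zeros, which corresponds to the case where the filter can be skipped.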
  • FIG. 8 is a diagram showing, as an example, how encoding modes are switched over five frames in which three types of stereo TD encoded frames and stereo FD encoded frames are switched. Time elapses from the left end to the right end of FIG. 8, and the frames are separated by broken lines.
  • the leftmost frame is the second frame of the stereo TD-encoded frames.
  • the second frame from the left is the stereo TD-encoded frame (third frame) immediately before switching to the stereo FD-encoded frame.
  • the third frame from the left is a stereo FD-encoded frame.
  • the fourth frame from the left is the stereo TD-encoded frame (first frame) immediately after switching from the stereo FD-encoded frame.
  • the fifth frame from the left is the second frame of the stereo TD-encoded frames, like the leftmost frame.
  • a section (for example, an "M/S->LR transition section") in which the M/S stereo signal gradually changes to the LR stereo signal is provided.
  • the encoding device 10 may perform mixing processing (an example will be described later) for the M/S->LR transition.
  • an MDCT-based coding mode may be set.
  • MDCT-based coding modes may include, for example, MDCT-based Transform coded excitation (TCX) mode for EVS codecs.
  • a section (for example, an "LR->M/S transition section") in which the LR stereo signal gradually changes to the M/S stereo signal may be provided.
  • the encoding device 10 may perform mixing processing (an example will be described later) for the LR->M/S transition. In the mixing processing for the LR->M/S transition, for example, in order to connect seamlessly (or smoothly) with the previous stereo FD encoded frame, the same kind of encoding mode as in stereo FD encoding (an MDCT-based coding mode) may be set.
  • among a plurality of consecutive frames (for example, sections) in which stereo TD encoding is performed, encoding device 10 performs the MDCT-based encoding of stereo TD encoding in a frame adjacent to a frame in which stereo FD encoding is performed.
  • among the frames to be stereo TD encoded, in the M/S->LR transition section where stereo TD encoding is switched to stereo FD encoding and in the LR->M/S transition section where stereo FD encoding is switched to stereo TD encoding, encoding device 10 may perform encoding based on the encoding mode used in stereo FD encoding (e.g., an MDCT-based encoding mode).
  • FIG. 9 is a diagram showing an example of mixing processing (encoding-side processing) and inverse mixing processing (decoding-side processing) corresponding to the switching transition between stereo TD encoding and stereo FD encoding. Time elapses from the left end to the right end of FIG. 9, and the frames are separated by broken lines. Also, the types of the five frames shown in FIG. 9 (for example, a stereo FD-encoded frame or one of the first to third frames of the stereo TD-encoded frames) are the same as in the five-frame switching example described above.
  • channel conversion processing (mixing processing) is expressed by, for example, the following equation (1).
  • L n and R n indicate the L-channel signal and R-channel signal before conversion processing, respectively, and the suffix n indicates time (sample number). Also, in equation (1), M n and S n denote the M-channel signal and S-channel signal after conversion processing, respectively.
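Equation (1) itself is not reproduced in this text. A commonly used mid/side conversion that is consistent with the symbol definitions above is shown below; the scaling factor of 1/2 is an assumption, and the original equation may use a different normalization.

```latex
M_n = \frac{L_n + R_n}{2}, \qquad S_n = \frac{L_n - R_n}{2}
```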
  • channel conversion processing (mixing processing) represented by the following equation (2) may be performed.
  • N indicates the frame length (or transition section length).
  • the transition interval length N may be shorter than one frame, for example.
  • the stereo signal gradually transitions from the M/S signal to the LR signal as time n elapses.
  • channel conversion processing (mixing processing) represented by the following equation (3) may be performed.
  • N indicates the frame length (or transition section length).
  • the transition interval length N may be shorter than one frame, for example.
  • the stereo signal gradually transitions from the LR signal to the M/S signal as time n elapses.
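Equations (2) and (3) are not reproduced in this text. The sketch below illustrates one way a gradual transition over N samples could look, assuming a linear cross-fade with weight n/N and the mid/side definition M = (L + R)/2, S = (L - R)/2. Both are assumptions rather than the patent's formulas, and `mix_transition` is a hypothetical name.

```python
def ms(l, r):
    """Assumed mid/side conversion of one sample pair."""
    return (l + r) / 2.0, (l - r) / 2.0


def mix_transition(left, right, to_lr=True):
    """Cross-fade the channel pair over N = len(left) samples.

    to_lr=True : output starts as (M, S) and approaches (L, R)
    to_lr=False: output starts as (L, R) and approaches (M, S)
    """
    n_total = len(left)
    ch1, ch2 = [], []
    for n, (l, r) in enumerate(zip(left, right)):
        w = n / n_total  # weight of the target representation, 0 at n = 0
        m, s = ms(l, r)
        start, end = ((m, s), (l, r)) if to_lr else ((l, r), (m, s))
        ch1.append((1.0 - w) * start[0] + w * end[0])
        ch2.append((1.0 - w) * start[1] + w * end[1])
    return ch1, ch2
```

Here N is simply the length of the input lists, so a transition section shorter than one frame corresponds to passing only that part of the frame.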
  • FIG. 10 is a diagram showing a configuration example of a decoding device (also called a "decoding system") 20.
  • the decoding device 20 may include, for example, a separation switching unit 21, a spectrum decoding unit 22, an inverse M/S conversion unit 23, an inverse transform unit 24, a CELP-based decoding unit 25, an inverse mixing unit 26, and a switching unit 27.
  • Multiplexed encoded information is input to the separation switching unit 21 from, for example, a transmission path such as a communication channel or a recording medium such as a storage medium.
  • the separation switching unit 21 may, for example, separate the encoded information into a plurality of pieces of control information and switch the output destination of the separated control information.
  • the separation switching unit 21 outputs the stereo FD encoded information (e.g., spectrum encoded information) to the spectrum decoding unit 22, and may output the M/S conversion control information to the inverse M/S conversion unit 23.
  • the separation switching unit 21 may output the stereo TD encoded information (for example, the encoded information of the CELP-based encoding unit 16) to the CELP-based decoding unit 25 and the mixing control information to the inverse mixing unit 26.
  • the separation switching unit 21 may output, to the switching unit 27, information indicating which of stereo FD encoded information and stereo TD encoded information is transmitted (or which of stereo FD encoding and stereo TD encoding is applied).
  • the spectrum decoding unit 22 and the inverse M/S conversion unit 23 may constitute a stereo FD decoding unit that decodes stereo encoded information in the frequency domain (for example, referred to as "stereo FD decoding").
  • the spectrum decoding unit 22 receives, for example, the spectrum encoded information output from the separation switching unit 21, decodes the spectrum information of two channels, and outputs the decoded spectrum information to the inverse M/S conversion unit 23.
  • the inverse M/S conversion unit 23 receives the two-channel decoded spectrum output from the spectrum decoding unit 22 and the M/S conversion control information output from the separation switching unit 21, performs inverse M/S conversion on the two-channel decoded spectrum based on the M/S conversion control information, and outputs the LR stereo spectrum (for example, an MDCT spectrum) to the inverse transform unit 24.
  • the inverse transform unit 24 receives the LR stereo spectrum (MDCT spectrum) output from the inverse M/S conversion unit 23, performs inverse transform processing (for example, Inverse MDCT (IMDCT)), and outputs the resulting LR stereo signal (time signal) to the switching unit 27.
  • the CELP-based decoding unit 25 and the inverse mixing unit 26 may constitute a stereo TD decoding unit that decodes stereo encoded information in the time domain (for example, called "stereo TD decoding").
  • the CELP-based decoding unit 25 receives, for example, the encoded information of the CELP-based encoding unit 16 output from the separation switching unit 21, decodes the 2-channel audio signals, and outputs them to the inverse mixing unit 26.
  • the inverse mixing unit 26 receives, for example, the two-channel decoded audio signal output from the CELP-based decoding unit 25, performs inverse mixing processing on it based on the mixing control information output from the separation switching unit 21, reconstructs the LR stereo signal, and outputs it to the switching unit 27.
  • the switching unit 27 receives the information output from the separation switching unit 21, receives the decoded LR stereo signal from either the inverse transform unit 24 or the inverse mixing unit 26 according to that information, and outputs it as the final LR stereo signal (e.g., an L channel signal and an R channel signal).
  • the decoding device 20 need not perform processing corresponding to the ITD correction processing performed in stereo TD encoding (for example, inverse correction processing that restores the corrected ITD).
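The routing performed by the separation switching unit 21 and the switching unit 27 can be sketched as a simple per-frame dispatch. The function name `decode_frame`, the string mode flags, and the two decoder callables are hypothetical placeholders; the actual decoders are not implemented here.

```python
def decode_frame(mode, payload, fd_decode, td_decode):
    """Route one frame to the stereo FD or stereo TD path and return (L, R).

    mode      : "FD" or "TD" (assumed representation of the signalled mode)
    fd_decode : callable standing in for spectrum decoding, inverse M/S
                conversion, and the inverse transform
    td_decode : callable standing in for CELP-based decoding and inverse mixing
    """
    if mode == "FD":
        return fd_decode(payload)
    if mode == "TD":
        return td_decode(payload)
    raise ValueError(f"unknown mode: {mode!r}")
```

Note that neither path contains a step that undoes the ITD correction, in line with the statement above that the decoder need not perform such processing.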
  • An example of inverse mixing processing corresponding to the switching transition between stereo TD decoding and stereo FD decoding is described below.
  • channel conversion processing (inverse mixing processing) is expressed by the following equation (4), for example.
  • channel conversion processing (inverse mixing processing) represented by the following equation (5) may be performed.
  • the decoded stereo signal gradually transitions from the M/S signal to the LR signal as time n elapses.
  • channel conversion processing (inverse mixing processing) represented by the following equation (6) may be performed.
  • the decoded stereo signal gradually transitions from the LR signal to the M/S signal as time n elapses.
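Equations (4) to (6) are not reproduced in this text. For the basic inverse conversion alone (the role of equation (4)), a minimal sketch is given below. It assumes the forward conversion is M = (L + R)/2 and S = (L - R)/2, in which case the inverse is L = M + S and R = M - S; the function names are hypothetical.

```python
def to_ms(l, r):
    """Assumed forward mid/side conversion of one sample pair."""
    return (l + r) / 2.0, (l - r) / 2.0


def to_lr(m, s):
    """Inverse of to_ms: recover the left and right samples."""
    return m + s, m - s
```

A round trip through `to_ms` and `to_lr` returns the original pair, so without quantization the conversion itself loses no information.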
  • when it is determined that the input stereo signal is a signal suitable for encoding using the full M/S encoding mode, encoding apparatus 10 determines, according to a condition (for example, the type of the input stereo signal), whether to convert the input stereo signal into an M/S signal in the time domain and apply stereo TD coding, or to apply stereo FD coding. Then, encoding apparatus 10 encodes the M/S signal when stereo TD encoding is applied, and encodes the input stereo signal in the frequency domain when stereo FD encoding is applied.
  • the encoding device 10 may apply CELP-based encoding.
  • the encoding device 10 may use a codec that switches between MDCT encoding and CELP encoding (an MDCT/CELP switching hybrid codec).
  • the encoding device 10 corrects the inter-channel time difference (ITD) between the L channel and the R channel in the input stereo signal to a threshold value or less (for example, near 0), and encodes the M/S signal obtained after the ITD correction.
  • the ITD can be made close to zero in the encoding of speech signals using the M/S stereo method, which can improve coding performance.
  • the ITD correction process is performed by the encoding device 10 and not performed by the decoding device 20 . Therefore, information about ITD correction does not have to be transmitted to the decoding device 20, so an increase in the amount of encoded information or the processing amount of the decoding device 20 can be suppressed.
  • the input stereo signal is a signal suitable for encoding using only the M/S stereo method.
  • the decision to select the full M/S coding mode may be made according to whether the percentage of bands determined to use the M/S stereo scheme, among multiple bands (subbands) of the frequency spectrum of the input stereo signal, is equal to or greater than a threshold. For example, the full M/S coding mode may be selected if the percentage of bands determined to use the M/S stereo scheme is greater than or equal to the threshold.
  • the determination to select the full M/S coding mode may be determined by whether or not it is determined to use the M/S stereo scheme in all of the multiple bands of the frequency spectrum of the input stereo signal.
  • the full M/S coding mode may be selected when it is determined to use the M/S stereo scheme in all of the multiple bands.
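The two selection criteria above (a ratio threshold, or all bands) can be sketched as one function. The name `select_full_ms` and the convention that `ratio_threshold=None` means "all bands" are assumptions made for illustration.

```python
def select_full_ms(band_uses_ms, ratio_threshold=None):
    """Decide whether to select the full M/S coding mode.

    band_uses_ms    : one boolean per band, True if M/S was chosen for it
    ratio_threshold : if None, require M/S in every band; otherwise require
                      the fraction of M/S bands to be >= ratio_threshold
    """
    if not band_uses_ms:
        return False  # no bands to judge
    if ratio_threshold is None:
        return all(band_uses_ms)
    ratio = sum(band_uses_ms) / len(band_uses_ms)
    return ratio >= ratio_threshold
```

For example, with two of three bands using M/S, a threshold of 0.6 selects the full M/S coding mode, while the all-bands criterion does not.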
  • the parameters such as the number of frames, the number of samples, the angular resolution, and the thresholds used in the above embodiment are examples, and other values may be used.
  • Each functional block used in the description of the above embodiments is partially or wholly realized as an LSI, which is an integrated circuit, and each process described in the above embodiments may be partially or wholly controlled by one LSI or a combination of LSIs.
  • An LSI may be composed of individual chips, or may be composed of one chip so as to include some or all of the functional blocks.
  • the LSI may have data inputs and outputs.
  • LSIs are also called ICs, system LSIs, super LSIs, and ultra LSIs depending on the degree of integration.
  • the method of circuit integration is not limited to LSI, and may be realized by a dedicated circuit, a general-purpose processor, or a dedicated processor. Further, an FPGA (Field Programmable Gate Array) that can be programmed after the LSI is manufactured, or a reconfigurable processor that can reconfigure the connections and settings of the circuit cells inside the LSI may be used.
  • the present disclosure may be implemented as digital or analog processing. Furthermore, if an integration technology that replaces the LSI appears due to advances in semiconductor technology or another derived technology, the technology may naturally be used to integrate the functional blocks. Application of biotechnology, etc. is possible.
  • a communication device may include a radio transceiver and processing/control circuitry.
  • a wireless transceiver may include a receiver section and a transmitter section, or functions thereof.
  • a wireless transceiver (transmitter, receiver) may include an RF (Radio Frequency) module and one or more antennas.
  • RF modules may include amplifiers, RF modulators/demodulators, or the like.
  • Non-limiting examples of communication devices include telephones (mobile phones, smart phones, etc.), tablets, personal computers (PCs) (laptops, desktops, notebooks, etc.), cameras (digital still/video cameras, etc.), digital players (digital audio/video players, etc.), wearable devices (wearable cameras, smartwatches, tracking devices, etc.), game consoles, digital book readers, telehealth and telemedicine (remote health care/medicine prescription) devices, vehicles or mobile vehicles with communication capabilities (automobiles, planes, ships, etc.), and combinations of the various devices described above.
  • Communication equipment is not limited to portable or movable equipment, but also includes any type of equipment, device or system that is non-portable or fixed, e.g., smart home devices (household appliances, lighting equipment, smart meters or measuring instruments, control panels, etc.), vending machines, and any other "Things" that can exist on the IoT (Internet of Things) network.
  • Communication includes data communication by cellular system, wireless LAN system, communication satellite system, etc., as well as data communication by a combination of these.
  • Communication apparatus also includes devices such as controllers and sensors that are connected or coupled to communication devices that perform the communication functions described in this disclosure. Examples include controllers and sensors that generate control and data signals used by communication devices to perform the communication functions of the communication device.
  • Communication equipment also includes infrastructure equipment, such as base stations, access points, and any other equipment, device, or system that communicates with or controls the various equipment, not limited to those listed above.
  • An encoding apparatus comprising: a control circuit that, when it is determined that an input stereo signal is a signal suitable for encoding using a mid-side stereo scheme, determines, according to a condition, whether to transform the input stereo signal into a mid-side signal in the time domain and apply a first encoding or to apply a second encoding; a first encoding circuit that encodes the mid-side signal when the first encoding is applied; and a second encoding circuit that encodes the input stereo signal in the frequency domain when the second encoding is applied.
  • the first encoding includes Code-Excited Linear Prediction (CELP)-based encoding, and the second encoding includes Modified Discrete Cosine Transform (MDCT)-based encoding.
  • the first encoding is multimode encoding and further includes Modified Discrete Cosine Transform (MDCT)-based encoding.
  • the condition is based on the type of the input stereo signal, and the control circuit determines to apply the first encoding when the type is a speech signal.
  • the condition is based on an inter-channel time difference between a left channel and a right channel in the input stereo signal, and the control circuit determines the application of the first encoding accordingly.
  • the condition is based on the correlation between the left and right channels in the input stereo signal, and the control circuit determines to apply the first encoding if the correlation is greater than or equal to a threshold.
  • the condition is based on the bitrate, and the control circuit determines to apply the first encoding if the bitrate is less than or equal to a threshold.
  • the determination is made according to whether a ratio of bands determined to use the mid-side stereo scheme, among a plurality of bands of the frequency spectrum of the input stereo signal, is equal to or greater than a threshold, or according to whether it is determined that the mid-side stereo scheme is used in all of the plurality of bands.
  • a correction circuit that performs correction processing to bring the inter-channel time difference between the left channel and the right channel in the input stereo signal closer to 0, wherein the first encoding circuit encodes the mid-side signal obtained by converting the input stereo signal after the inter-channel time difference has been corrected.
  • the range of correction for the inter-channel time difference is based on the angular resolution for reproducing the audio signal.
  • the control circuit performs the Modified Discrete Cosine Transform (MDCT)-based encoding of the first encoding in a section adjacent to a section in which the second encoding is performed, among a plurality of continuous sections in which the first encoding is performed.
  • the encoding apparatus determines whether to transform the input stereo signal into a mid-side signal in the time domain and apply the first encoding or to apply the second encoding, encodes the mid-side signal when the first encoding is applied, and encodes the input stereo signal in the frequency domain when the second encoding is applied.
  • An embodiment of the present disclosure is useful for coding systems and the like.

PCT/JP2023/002481 2022-02-08 2023-01-26 Encoding device and encoding method Ceased WO2023153228A1 (ja)

Priority Applications (2)

Application Number Priority Date Filing Date Title
JP2023580166A JPWO2023153228A1 2022-02-08 2023-01-26
US18/835,764 US20250191596A1 (en) 2022-02-08 2023-01-26 Encoding device and encoding method

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
JP2022017997 2022-02-08
JP2022-017997 2022-02-08
JP2022-143856 2022-09-09
JP2022143856 2022-09-09

Publications (1)

Publication Number Publication Date
WO2023153228A1 true WO2023153228A1 (ja) 2023-08-17

Family

ID=87564084

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2023/002481 Ceased WO2023153228A1 (ja) 2022-02-08 2023-01-26 Encoding device and encoding method

Country Status (3)

Country Link
US (1) US20250191596A1
JP (1) JPWO2023153228A1
WO (1) WO2023153228A1

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2017156767A (ja) * 2012-09-18 2017-09-07 Huawei Technologies Co., Ltd. Audio classification based on perceptual quality for low or medium bit rates
JP2018529122A (ja) * 2016-01-22 2018-10-04 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Apparatus and method for encoding or decoding a multi-channel signal using spectral-domain resampling
JP2021119383A (ja) * 2016-01-22 2021-08-12 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Apparatus and method for MDCT M/S stereo with global ILD with improved mid/side decision
JP2021529354A (ja) * 2018-07-04 2021-10-28 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Multi-signal encoder, multi-signal decoder, and related methods using signal whitening or signal post-processing

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8041042B2 (en) * 2006-11-30 2011-10-18 Nokia Corporation Method, system, apparatus and computer program product for stereo coding
EP2144230A1 (en) * 2008-07-11 2010-01-13 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Low bitrate audio encoding/decoding scheme having cascaded switches
JPWO2014068817A1 (ja) * 2012-10-31 2016-09-08 株式会社ソシオネクスト オーディオ信号符号化装置及びオーディオ信号復号装置
EP2830054A1 (en) * 2013-07-22 2015-01-28 Fraunhofer Gesellschaft zur Förderung der angewandten Forschung e.V. Audio encoder, audio decoder and related methods using two-channel processing within an intelligent gap filling framework
EP3067886A1 (en) * 2015-03-09 2016-09-14 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Audio encoder for encoding a multichannel signal and audio decoder for decoding an encoded audio signal
FR3048808A1 (fr) * 2016-03-10 2017-09-15 Orange Codage et decodage optimise d'informations de spatialisation pour le codage et le decodage parametrique d'un signal audio multicanal
CN109389984B (zh) * 2017-08-10 2021-09-14 华为技术有限公司 时域立体声编解码方法和相关产品
US11270710B2 (en) * 2017-09-25 2022-03-08 Panasonic Intellectual Property Corporation Of America Encoder and encoding method
US10734001B2 (en) * 2017-10-05 2020-08-04 Qualcomm Incorporated Encoding or decoding of audio signals
EP3985665B1 (en) * 2018-04-05 2024-08-21 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Apparatus, method or computer program for estimating an inter-channel time difference

Also Published As

Publication number Publication date
JPWO2023153228A1 2023-08-17
US20250191596A1 (en) 2025-06-12

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23752693

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2023580166

Country of ref document: JP

Kind code of ref document: A

WWE Wipo information: entry into national phase

Ref document number: 18835764

Country of ref document: US

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 23752693

Country of ref document: EP

Kind code of ref document: A1

WWP Wipo information: published in national office

Ref document number: 18835764

Country of ref document: US