WO2013168414A1 - Sound signal hybrid encoder, sound signal hybrid decoder, sound signal encoding method, and sound signal decoding method - Google Patents
- Publication number
- WO2013168414A1 (application PCT/JP2013/002950)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- signal
- frame
- lfd
- decoder
- encoder
Classifications
- All classes below fall under G (Physics) > G10 (Musical instruments; acoustics) > G10L (Speech analysis techniques or speech synthesis; speech recognition; speech or voice processing techniques; speech or audio coding or decoding) > G10L19/00 (Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis).
- G10L19/04: using predictive techniques
- G10L19/20 (under G10L19/04 > G10L19/16 Vocoder architecture > G10L19/18 Vocoders using multiple modes): using sound class specific coding, hybrid encoders or object based coding
- G10L19/0212 (under G10L19/02, using spectral analysis, e.g. transform vocoders or subband vocoders): using orthogonal transformation
- G10L19/06 (under G10L19/04): Determination or coding of the spectral characteristics, e.g. of the short-term prediction coefficients
- G10L19/22 (under G10L19/04 > G10L19/16 > G10L19/18): Mode decision, i.e. based on audio signal content versus external parameters
Definitions
- The present invention relates to a sound signal hybrid encoder and a sound signal hybrid decoder capable of switching between codecs.
- A hybrid codec is a codec that combines the advantages of an audio codec and a speech codec.
- A hybrid codec encodes a sound signal in which content consisting mainly of speech and content consisting mainly of audio are mixed, using the encoding method suited to each by switching between the audio codec and the speech codec.
- A hybrid codec can efficiently encode content that contains both speech and audio signals. For this reason, the hybrid codec is applicable to various applications such as audio books, broadcasting systems, portable media devices, portable communication terminals (for example, smartphones and tablet computers), video conferencing apparatuses, and music performances over a network.
- The present invention provides a sound signal hybrid encoder that can efficiently generate an AC signal.
- A sound signal hybrid encoder includes: a signal analysis unit that analyzes the characteristics of a sound signal and determines the coding method of each frame included in the sound signal; an LFD encoder that generates an LFD frame, in which the frame is encoded, by performing an LFD (Lapped Frequency Domain) transform on the frame; an LP encoder that generates an LP (Linear Prediction) frame, in which the frame is encoded, by calculating linear prediction coefficients of the frame; a switching unit that switches whether the frame is encoded by the LFD encoder or the LP encoder; a local decoder that generates a local decode signal including a signal obtained by decoding at least a part of an AC (Aliasing Cancel) target frame, which is the LFD frame made continuous with the LP frame by the switching control of the switching unit, and a signal obtained by decoding at least a part of the LP frame that is continuous with the AC target frame; and an AC signal generation unit that uses the sound signal and the local decode signal to generate and output an AC signal used for removing aliasing that occurs in decoding the AC target frame.
- The AC signal generation unit includes: when the AC target frame continues immediately after the LP frame, or when the AC target frame is
- The sound signal hybrid encoder of the present invention can efficiently generate an AC signal.
- FIG. 1 is a diagram for explaining removal of aliasing due to partial overlap in encoding / decoding using MDCT.
- FIG. 2 is a diagram illustrating an AC signal generation method used in switching from LP coding to transform coding.
- FIG. 3 is a diagram illustrating a method of generating an AC signal used in switching from transform coding to LP coding.
- FIG. 4 is a block diagram showing a configuration of the sound signal hybrid encoder according to the first embodiment.
- FIG. 5 is a diagram showing the shape of a window having a small overlap.
- FIG. 6 is a block diagram illustrating an example of the configuration of the AC signal generation unit.
- FIG. 7 is a flowchart illustrating an example of the operation of the AC signal generation unit.
- FIG. 8 is a diagram illustrating a second method of AC signal generation used in switching from LP encoding to transform encoding.
- FIG. 9 is a diagram illustrating a second method of AC signal generation used in switching from transform coding to LP coding.
- FIG. 10 is a block diagram showing a configuration of the sound signal hybrid decoder according to the second embodiment.
- FIG. 11 is a block diagram illustrating an example of the configuration of the AC output signal generation unit.
- FIG. 12 is a flowchart illustrating an example of the operation of the AC output signal generation unit.
- The audio codec is suitable for encoding stationary signals with localized spectral content (tonal signals, harmonic signals, and so on).
- In the audio codec, encoding is performed mainly by transforming the signal into the frequency domain.
- The encoder of an audio codec converts the input signal into the frequency (spectral) domain using a time-frequency transform such as the modified discrete cosine transform (MDCT).
- A frame to be encoded partially overlaps in time with the frames adjacent to it (partial overlap), and each frame to be encoded is windowed.
- The partial overlap smooths the frame boundaries on the decoding side.
- The windowing serves two purposes: producing a higher-resolution spectrum, and blurring the boundaries of the encoded frame for that smoothing.
- MDCT converts the time-domain samples into a reduced number of spectral coefficients for encoding.
- A time-frequency transform such as MDCT generates an aliasing component, but thanks to the partial overlap this component is removed on the decoding side.
- One of the main advantages of audio codecs is that psychoacoustic models can be used easily. For example, more bits can be assigned to a perceptual "masker" and fewer bits to a perceptual "maskee" that the human ear cannot perceive. In the audio codec, coding efficiency and sound quality are greatly improved by using a psychoacoustic model.
- MPEG Advanced Audio Coding (AAC) is a good example of a pure audio codec.
- The speech codec is a method based on a model that uses the pitch characteristics of the vocal tract, and is suitable for encoding human speech.
- To obtain the spectral envelope of human speech, the speech codec encoder uses a linear prediction (LP) filter and encodes the LP filter coefficients of the input signal.
- The LP filter then inverse-filters the input signal (spectral separation) to generate a sound source (excitation) signal with a flat spectrum.
- This sound source signal is usually represented by a "codeword" and is sparsely encoded using a vector quantization (VQ) method.
- A long term predictor (LTP) may be incorporated to capture the long-term periodicity of speech.
- If a whitening filter is applied to the signal before the linear prediction filter, encoding that takes psychoacoustic aspects into account becomes possible.
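The LP analysis and inverse filtering described above can be illustrated with a minimal sketch. It is not the patent's implementation: the autocorrelation method with the Levinson-Durbin recursion is one common way to obtain LP coefficients, and the function names, the prediction order, and the synthetic test signal are all assumptions made for illustration.

```python
import random

def autocorrelation(x, max_lag):
    """Autocorrelation r[0..max_lag] of the sequence x."""
    return [sum(x[n] * x[n - lag] for n in range(lag, len(x)))
            for lag in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve the LP normal equations.

    Returns [1, a1, ..., ap], the coefficients of the prediction-error
    (inverse) filter A(z). Assumes r[0] > 0, i.e. a non-silent frame.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= (1.0 - k * k)                # remaining prediction error
    return a

def inverse_filter(x, a):
    """Apply A(z) to x: e[n] = sum_j a[j] * x[n - j] (zero initial state)."""
    return [sum(a[j] * x[n - j] for j in range(len(a)) if n - j >= 0)
            for n in range(len(x))]

def energy(x):
    return sum(v * v for v in x)

def demo_signal(length=400, seed=0):
    """Hypothetical test signal: noise shaped by a two-pole resonance."""
    rng = random.Random(seed)
    x = []
    for n in range(length):
        past1 = x[n - 1] if n >= 1 else 0.0
        past2 = x[n - 2] if n >= 2 else 0.0
        x.append(1.6 * past1 - 0.8 * past2 + rng.uniform(-1.0, 1.0))
    return x

if __name__ == "__main__":
    signal = demo_signal()
    coeffs = levinson_durbin(autocorrelation(signal, 2), 2)
    residual = inverse_filter(signal, coeffs)
    print("A(z) coefficients:", coeffs)
    print("signal energy:", energy(signal), "residual energy:", energy(residual))
```

The residual carries much less energy than the input because the filter has removed the spectral envelope; in a speech codec it is this flattened residual, not the waveform itself, that is vector-quantized.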
- Transform coded excitation (TCX) is a method that combines LP coding and transform coding.
- In TCX, the input signal is perceptually weighted with a perceptual filter derived from the linear prediction filter of the input signal.
- The weighted input signal is then converted to the spectral domain, and the spectral coefficients are encoded with the VQ method.
- TCX is found in ITU-T's Extended Adaptive Multi-Rate Wideband (AMR-WB+) codec.
- The frequency transform used in AMR-WB+ is the discrete Fourier transform (DFT).
- The main encoding methods above can be supplemented with low bit rate tools.
- The two main low bit rate tools are the bandwidth extension tool and the multi-channel extension tool.
- The bandwidth extension (BWE) tool uses the harmonic relationship between the low-frequency part and the high-frequency part of the input signal to parameterize the high-frequency part of the input signal.
- The bandwidth extension parameters are, for example, subband energy and TNR (tone-to-noise ratio).
- The decoder forms a basic high-frequency signal by extending the low-frequency part of the input signal, either by patching or by stretching.
- The decoder then uses the bandwidth extension parameters to shape the amplitude of the spectrally extended signal. That is, the bandwidth extension parameters compensate for the noise floor and tone (tone color) with artificially generated counterparts.
- MPEG High-Efficiency AAC is a codec that includes such a bandwidth extension tool, known as spectral band replication (SBR).
- In SBR, parameter calculation is performed in a hybrid domain (time and frequency domain) generated by a quadrature mirror filter bank (QMF).
- The multi-channel extension tool downmixes multiple channels into a subset of channels for encoding.
- The multi-channel extension tool encodes the relationships between the individual channels parametrically. The multi-channel extension parameters are, for example, level differences between channels, time differences between channels, and correlations between channels.
- The decoder synthesizes the individual channel signals by mixing the decoded downmixed channel signal with an artificially generated "decorrelated" signal. The mixing weights between the downmixed channel signal and the decorrelated signal are calculated from the above parameters.
- As a result, the waveform of the output signal from the decoder is not similar to the waveform of the original input signal, but the output is perceptually similar to the original input signal.
- MPEG Surround (MPS)
- MPS parameters are also calculated in the QMF domain.
- Multi-channel extension tools are also known as stereo extension tools.
- Unified Speech and Audio Codec (USAC)
- According to the characteristics of the input signal, the optimum tools are selected from all of the tools above, namely a method similar to AAC (hereinafter AAC), LP, TCX, the bandwidth extension tool (hereinafter SBR), and the channel extension tool (hereinafter MPS), and are used in combination.
- The USAC encoder downmixes a stereo signal into a monaural signal using the MPS tool, and reduces the full-band monaural signal to a narrowband monaural signal using the SBR tool. Then, to encode the narrowband monaural signal, the USAC encoder analyzes the characteristics of each signal frame with a signal classification unit and decides which of the core codecs (AAC, LP, TCX) to use. In USAC, it is important to remove the aliasing generated between frames by codec switching.
- In MDCT, consecutive frames are concatenated and the concatenated signal is windowed before the transform is performed, as shown in FIG. 1.
- FIG. 1 is a diagram for explaining the removal of aliasing due to partial overlap in encoding / decoding using MDCT.
- a and b indicate the first half and the second half when the frame 1 is divided into two equal parts, respectively.
- c and d indicate the first half and the second half when the frame 2 is divided into two equal parts, respectively.
- e and f respectively indicate the first half and the second half when the frame 3 is divided into two equal parts.
- The first set of MDCT is performed on the signal (a, b, c, d) obtained by combining frames 1 and 2.
- The second set of MDCT is performed on the signal (c, d, e, f) obtained by combining frames 2 and 3.
- c and d form the partial overlap (overlap region).
- Equation (1) shows the case of the first set of MDCT, and equation (2) shows the case of the second set.
- The window has the characteristic shown in equation (3).
- The subscript "R" indicates time reversal. Such a relationship can be seen, for example, in the first half cycle of the sine function.
- The decoder performs an inverse modified discrete cosine transform (IMDCT) on the decoded MDCT coefficients.
- When the signal after IMDCT shown in equation (4) is compared with the original signal shown in equation (1), it can be seen that IMDCT generates the aliasing component shown in equation (5).
- The signal after IMDCT for the second set of MDCT is expressed by equation (6).
- Considering the window characteristic shown in equation (3), adding the last two terms of equation (7) to the first two terms of equation (8) yields the original signals c and d. That is, the aliasing component is eliminated.
- When the frame size in MDCT-based encoding is N samples, an inherent MDCT delay (filter delay) of N samples occurs in addition to the framing delay. Therefore, the total delay is 2N samples.
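The aliasing cancellation described above can be checked numerically. The sketch below is an illustration under stated assumptions, not the patent's equations (1) to (8), which are not reproduced in this text: it uses a direct O(N^2) MDCT/IMDCT pair with a 2/N inverse scaling and a sine window, which is one window that has the time-reversal property mentioned above. The function names are hypothetical.

```python
import math

def sine_window(length):
    """Sine window; satisfies w[n]^2 + w[n + N]^2 = 1 for length 2N."""
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]

def mdct(block):
    """Forward MDCT: 2N (already windowed) samples -> N coefficients."""
    n_coef = len(block) // 2
    n0 = 0.5 + n_coef / 2.0
    return [sum(block[n] * math.cos(math.pi / n_coef * (n + n0) * (k + 0.5))
                for n in range(len(block)))
            for k in range(n_coef)]

def imdct(coeffs):
    """Inverse MDCT: N coefficients -> 2N samples that still contain aliasing."""
    n_coef = len(coeffs)
    n0 = 0.5 + n_coef / 2.0
    return [(2.0 / n_coef) * sum(
                coeffs[k] * math.cos(math.pi / n_coef * (n + n0) * (k + 0.5))
                for k in range(n_coef))
            for n in range(2 * n_coef)]

def reconstruct_overlap(frame1, frame2, frame3):
    """Decode frame 2 (c, d) from the two MDCT sets that overlap on it."""
    n = len(frame2)
    w = sine_window(2 * n)
    block1 = frame1 + frame2          # (a, b, c, d)
    block2 = frame2 + frame3          # (c, d, e, f)
    out1 = imdct(mdct([w[i] * block1[i] for i in range(2 * n)]))
    out2 = imdct(mdct([w[i] * block2[i] for i in range(2 * n)]))
    # Window again on synthesis and overlap-add: the second half of set 1
    # plus the first half of set 2. The aliasing terms have opposite signs.
    return [w[n + i] * out1[n + i] + w[i] * out2[i] for i in range(n)]

if __name__ == "__main__":
    f1 = [math.sin(0.3 * i) for i in range(8)]
    f2 = [math.cos(0.7 * i) + 0.1 * i for i in range(8)]
    f3 = [0.5 - 0.05 * i for i in range(8)]
    rebuilt = reconstruct_overlap(f1, f2, f3)
    print(max(abs(r - x) for r, x in zip(rebuilt, f2)))
```

Each IMDCT output on its own is the windowed signal plus a time-reversed copy; only the sum over the overlap region recovers c and d. This is why a frame whose neighbor is LP-coded, and therefore has no overlapping MDCT partner, needs a separate aliasing cancellation signal.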
- Aliasing can be removed using a forward aliasing cancellation (FAC) tool.
- FIG. 2 is a diagram showing the principle of the FAC tool.
- a and b indicate the first half and the second half, respectively, when frame 1 is divided into two equal parts.
- c and d indicate the first half and the second half, respectively, when frame 2 is divided into two equal parts.
- e and f indicate the first half and the second half, respectively, when frame 3 is divided into two equal parts.
- LP coding is performed on the second half of frame 1 and the first half of frame 2 (that is, b and c).
- The coding method is then switched from LP coding to transform coding, and frame 2 and frame 3 are transform-coded.
- The decoder can completely decode subframe c using only the encoded subframe c.
- Subframe d, however, is encoded by transform coding (MDCT or TCX). If the decoder decodes subframe d as it is, the decoded signal includes an aliasing component. To remove this aliasing component, the encoder generates the following first to third signals.
- The encoder first performs inverse MDCT using a local decoder to generate a windowed first signal x.
- d' and c' are the signals obtained by decoding d and c, respectively, with the local decoder.
- Next, the encoder applies a second window to the signal c'' obtained by decoding the LP-encoded subframe c with the local decoder and inverts it, thereby generating the second signal y.
- The third signal is a zero input response (ZIR) obtained by windowing the preceding LP frame, as shown in equation (11).
- The zero input response (ZIR) is the output computed when zero input is applied to the FIR filter while its internal state, which changes from moment to moment, is still determined by the past input.
- The aliasing cancellation (AC) signal is calculated by subtracting the above three signals from the original signal d.
- The AC signal has the following characteristics. When the encoding performance is sufficient and the waveform of the decoded signal is similar to the waveform of the original signal, equation (12) can be approximated by equation (13).
- At the beginning of the subframe, the AC signal is approximately zero.
- At the end of subframe d, w2 ≈ 1, so the AC signal is also approximately zero at the end of the subframe. That is, the AC signal is shaped like a naturally windowed signal that converges to zero on both sides of subframe d.
- The AC signal above is used when switching from LP coding to transform coding (MDCT / TCX). A similar AC signal is generated when switching from transform coding (MDCT / TCX) to LP coding.
- One difference is that the AC signal used in switching from transform coding to LP coding does not have a ZIR component.
- Another difference is that this AC signal is not zero at the end of the subframe adjacent to the LP-coded frame, and thus is not shaped like a windowed signal.
- FIG. 3 is a diagram illustrating an AC signal generation method used in switching from transform coding to LP coding.
- An AC signal is generated to remove the aliasing components included in subframe c. Specifically, the AC signal shown in equation (16) is obtained by subtracting the first signal x represented by equation (14) and the second signal y represented by equation (15) from the original signal c.
- The total delay, which is the sum of the signal processing time and the signal transmission time (network delay), must be less than 30 milliseconds (see, for example, Non-Patent Document 1). If echo cancellation processing and network delay account for 20 milliseconds of the total delay, the algorithmic delay allowed for encoding and decoding is about 10 milliseconds.
- The main delays in MPEG USAC are caused by the following three factors.
- First, the main delay that occurs in both the encoder and the decoder is caused by the large frame size.
- The MPEG USAC standard allows a frame size of 768 samples or 1024 samples.
- With N as the number of samples per frame, a delay of 2N occurs, that is, 1536 or 2048 samples.
- At a sampling frequency of 48 kHz, this is a core MDCT plus framing delay of 32 ms or 43 ms, respectively.
- Second, a major delay in both the encoder and the decoder occurs in the QMF analysis and synthesis filter banks for SBR and MPS.
- A conventional filter bank with a typical symmetric window adds a delay of 577 samples, or 12 milliseconds at a 48 kHz sampling frequency.
- Third, the main delay specific to the encoder is the look-ahead delay caused by the signal classification unit of the encoder.
- The signal classification unit analyzes signal transition, timbre, and spectral tilt (signal characteristics), and determines which of the MDCT, LP, and TCX methods should be used to encode the signal. This usually causes a further delay of one frame, which is 16 milliseconds or 21 milliseconds at a sampling frequency of 48 kHz.
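The delay figures quoted above follow from dividing sample counts by the sampling frequency. The short sketch below reproduces that arithmetic; the function names and the dictionary keys are illustrative only, and adding the three contributions together is an assumption made here to compare against the 10 millisecond budget, not a figure stated in the text.

```python
SAMPLE_RATE_HZ = 48_000

def samples_to_ms(samples, sample_rate_hz=SAMPLE_RATE_HZ):
    """Convert a delay expressed in samples to milliseconds."""
    return 1000.0 * samples / sample_rate_hz

def usac_delay_breakdown(frame_size, qmf_delay_samples=577,
                         sample_rate_hz=SAMPLE_RATE_HZ):
    """Return the three delay contributions listed above, in milliseconds."""
    parts = {
        "core_mdct_plus_framing": 2 * frame_size,   # 2N samples
        "qmf_filter_banks": qmf_delay_samples,      # SBR and MPS filter banks
        "classifier_lookahead": frame_size,         # one frame of look-ahead
    }
    return {name: samples_to_ms(count, sample_rate_hz)
            for name, count in parts.items()}

if __name__ == "__main__":
    for frame_size in (768, 1024):
        breakdown = usac_delay_breakdown(frame_size)
        print(frame_size, breakdown, "sum:", sum(breakdown.values()))
```

Summed this way, the contributions come to roughly 60 ms for 768-sample frames and roughly 76 ms for 1024-sample frames, several times the 10 ms algorithmic delay budget mentioned above.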
- The first thing to do in order to achieve ultra-low delay is a significant reduction in frame size.
- When the frame size is reduced, the coding efficiency of transform coding decreases, so it is more important than ever to use bits efficiently during quantization.
- The aliasing component of the transform-coded frame is combined with the decoded LP signal (for example, equation (10)).
- The encoder removes aliasing components by generating and encoding an additional aliasing residual signal called an AC signal, as described above.
- The code amount of the AC signal should be as small as possible.
- The aliasing component cannot be sufficiently removed even if the AC signal is used.
- When the coding method is switched from LP coding to transform coding (MDCT / TCX), the AC signal is calculated, based on the ZIR of the preceding LP-coded subframe c, so that it is zero at its beginning.
- The AC signal therefore looks like a windowed signal, and efficient encoding is promoted if a specific quantization method is used.
- However, because the AC signal generation method shown in FIG. 2 predicts the start of subframe d from the ZIR of subframe c, the aliasing component cannot be sufficiently removed when, for example, the signal characteristics change suddenly.
- The waveform of the AC signal is not smaller than the waveform of the encoded original signal, and the MDCT signal and the LP signal from which aliasing has been removed are similar to the original signal.
- The waveform of the original signal and the waveform of the decoded signal may be similar, in which case the AC signal becomes an unnecessary burden during encoding.
- The codec of the present invention, which is based on the overall structure of MPEG USAC, has the following basic configurations 1 to 3 in order to reduce delay.
- The overlap between successive MDCT frames is reduced to further reduce the delay (see, for example, Non-Patent Document 4).
- The recommended overlap is 128 samples.
- The basic configuration also uses a composite low-delay filter bank with a typical asymmetric window.
- The low-delay QMF filter bank is well known; it is described in Non-Patent Document 2 and has already been used in MPEG AAC-ELD (see Non-Patent Document 3).
- The codec of the present invention can achieve an algorithmic delay of 10 milliseconds.
- By selecting one method from a plurality of methods to generate and output an AC signal, the sound signal hybrid encoder can generate the AC signal efficiently.
- The AC signal generation unit may generate and output the AC signal according to one method selected from a first method and a second method different from the first method.
- The encoder may further include a quantizer that quantizes the AC signal. In that case, the AC signal generation unit may generate two AC signals using the first method and the second method, respectively, and output whichever of the two has the smaller code amount after quantization by the quantizer.
- The first method may be a method that generates the AC signal using the zero input response obtained by windowing the LP frame immediately before the AC target frame, and the second method may be a method that generates the AC signal without using the zero input response.
- The first method may be the scheme standardized in the Unified Speech and Audio Codec (USAC), and the second method may be a method for which the code amount of the generated AC signal after quantization is expected to be smaller than with the first method.
- The AC signal generation unit may select the first method, and may select the second method if the frame size of the frames included in the sound signal is less than the predetermined size.
- The encoder may further include a quantizer that quantizes the AC signal, and the AC signal generation unit may first generate the AC signal by the first method. If the code amount of that AC signal after quantization by the quantizer is smaller than a predetermined threshold, the first method is selected. If the code amount is equal to or greater than the predetermined threshold, the AC signal generation unit also generates the AC signal by the second method and outputs whichever of the two AC signals has the smaller code amount after quantization by the quantizer.
- The AC signal generation unit may include a first AC candidate generator that generates the AC signal by the first method, a second AC candidate generator that generates the AC signal by the second method, and an AC candidate selector that (1) outputs the AC signal generated by the AC candidate generator selected from the first AC candidate generator and the second AC candidate generator, and (2) outputs an AC flag indicating which of the first method and the second method was used to generate the output AC signal.
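The two selection strategies described above (compare both candidates, or try the first method against a threshold before falling back) can be sketched as follows. This is an illustration only: the candidate generators are passed in as plain callables, `toy_code_amount` is a hypothetical stand-in for the real quantizer's bit count, the flag values 1 and 2 are arbitrary, and preferring the first method on a tie is an assumption the text does not settle.

```python
def toy_code_amount(ac_signal, step=0.5):
    """Hypothetical cost: number of samples that quantize to a nonzero level."""
    return sum(1 for value in ac_signal if round(value / step) != 0)

def select_smaller(ac_first, ac_second, code_amount=toy_code_amount):
    """Return (ac_flag, ac_signal) for the candidate with the smaller cost."""
    if code_amount(ac_second) < code_amount(ac_first):
        return 2, ac_second
    return 1, ac_first          # ties go to the first method (assumption)

def select_with_threshold(generate_first, generate_second, threshold,
                          code_amount=toy_code_amount):
    """Try the first method; generate the second only if the first is costly."""
    ac_first = generate_first()
    if code_amount(ac_first) < threshold:
        return 1, ac_first
    return select_smaller(ac_first, generate_second(), code_amount)

if __name__ == "__main__":
    first = [0.0, 2.0, -3.0, 1.0]    # made-up candidate from the first method
    second = [0.0, 0.1, -0.1, 1.0]   # made-up candidate from the second method
    print(select_smaller(first, second))
    print(select_with_threshold(lambda: first, lambda: second, threshold=4))
    print(select_with_threshold(lambda: first, lambda: second, threshold=2))
```

The threshold variant trades a possibly suboptimal choice for less computation: when the first candidate is already cheap enough, the second candidate is never generated. The returned flag plays the role of the AC flag that tells the decoder which method produced the signal.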
- the sound signal hybrid encoder may further include: an LD (Low Delay) analysis filter bank that generates an input subband signal by converting the input signal into a time-frequency domain representation; a multi-channel extension unit that generates a multi-channel extension parameter and a downmix subband signal from the input subband signal; a bandwidth extension unit that generates a bandwidth extension parameter and a narrowband subband signal from the downmix subband signal; an LD synthesis filter bank that generates the sound signal by converting the narrowband subband signal from the time-frequency domain representation into a time-domain representation; a quantizer that quantizes the multi-channel extension parameter, the bandwidth extension parameter, the AC signal, and the encoded frames; and a bit stream multiplexer that multiplexes the signals quantized by the quantizer and the AC flag and transmits them.
- the LFD encoder may encode the frame by a TCX method.
- the LFD encoder encodes the frame by MDCT
- the switching unit performs window processing on the frame encoded by the LFD encoder
- the window used for the window processing may be monotonically increasing or monotonically decreasing over a period shorter than half of the frame length.
- the sound signal hybrid decoder decodes an encoded signal that includes an LFD frame encoded by LFD conversion, an LP frame encoded using a linear prediction coefficient, and an AC signal for removing aliasing of an AC target frame, which is an LFD frame continuous with the LP frame. The sound signal hybrid decoder includes: an ILFD (Inverse Lapped Frequency Domain) decoder that decodes the LFD frame; an LP decoder that decodes the LP frame; a switching unit that outputs a second narrowband signal in which a frame obtained by performing window processing on the frame decoded by the ILFD decoder and a frame decoded by the LP decoder are arranged in order; an AC output signal generation unit that obtains an AC flag indicating the scheme used to generate the AC signal and generates an AC output signal by adding a signal output from the switching unit, the ILFD decoder, or the LP decoder to the AC signal according to the scheme indicated by the AC flag; and an adder that outputs a third narrowband signal obtained by adding the AC output signal to the second narrowband signal.
- the sound signal hybrid decoder may further include: a bit stream demultiplexer that obtains a bit stream including the quantized encoded signal and the AC flag; an inverse quantizer that generates the encoded signal by inversely quantizing the quantized encoded signal; an LD analysis filter bank that generates a narrowband subband signal by converting the third narrowband signal output from the adder into a time-frequency domain representation; a bandwidth extension decoding unit that synthesizes a high frequency signal by applying a bandwidth extension parameter included in the encoded signal generated by the inverse quantizer to the narrowband subband signal, thereby generating a subband signal with an extended bandwidth; a multi-channel extension decoding unit that generates a multi-channel subband signal by applying a multi-channel extension parameter included in the encoded signal generated by the inverse quantizer to the subband signal with the extended bandwidth; and an LD synthesis filter bank that generates a multi-channel signal by converting the multi-channel subband signal from a time-frequency representation into a time-domain representation.
- the AC signal is generated by a first method or a second method different from the first method, and the AC output signal generation unit may further include: a first AC candidate generator that generates the AC output signal corresponding to an AC signal generated by the first method; a second AC candidate generator that generates the AC output signal corresponding to an AC signal generated by the second method; and an AC candidate selector that selects either the first AC candidate generator or the second AC candidate generator according to the AC flag and causes the selected AC candidate generator to generate the AC output signal.
- FIG. 4 is a block diagram showing a configuration of the sound signal hybrid encoder according to the first embodiment.
- the sound signal hybrid encoder 100 includes an LD (Low Delay) analysis filter bank 400, an MPS encoder 401, an SBR encoder 402, an LD synthesis filter bank 403, a signal analysis unit 404, and a switching unit 405.
- the sound signal hybrid encoder 100 includes an audio encoder 406 (hereinafter simply referred to as MDCT encoder 406) using an MDCT filter bank, an LP encoder 408, and a TCX encoder 410.
- the sound signal hybrid encoder 100 also includes a plurality of quantizers 407, 409, 411, 414, 416, and 417, a bit stream multiplexer 415, a local decoder 412, and an AC signal generation unit 413.
- the LD analysis filter bank 400 generates an input subband signal represented by a hybrid time / frequency expression by performing low delay analysis filter bank processing on an input signal (multi-channel input signal).
- Specific examples of the low-delay filter bank include the low-delay QMF filter bank shown in Non-Patent Document 2, but are not limited thereto.
- the MPS encoder 401 (multi-channel extension unit) converts the input subband signal generated by the LD analysis filter bank 400 into a downmix subband signal, which is a smaller set of signals, and generates an MPS parameter.
- the downmix subband signal here means a full-band downmix subband signal.
- For example, when the input signal is a stereo signal, only one downmix subband signal is generated.
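- The stereo case above can be illustrated with a minimal sketch that produces one downmix signal plus two spatial parameters. The function name `downmix_stereo`, the averaging downmix, and the use of a single level-difference and correlation value for the whole signal are assumptions made for illustration; the MPS encoder 401 operates on subband signals and its actual parameter extraction is not given in this text.

```python
import numpy as np

def downmix_stereo(left, right, eps=1e-12):
    """Hypothetical sketch: one downmix signal and two spatial parameters.

    Returns (downmix, cld_db, icc), where cld_db is the channel level
    difference in dB and icc is the normalised inter-channel correlation.
    """
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    downmix = 0.5 * (left + right)           # assumed averaging downmix
    e_left = np.sum(left ** 2) + eps          # eps avoids division by zero
    e_right = np.sum(right ** 2) + eps
    cld_db = 10.0 * np.log10(e_left / e_right)
    icc = np.sum(left * right) / np.sqrt(e_left * e_right)
    return downmix, cld_db, icc
```

- In this sketch the two parameters play the role of the MPS parameter that is passed to the quantizer 416.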
- the MPS parameter is quantized by the quantizer 416.
- the SBR encoder 402 (bandwidth extension unit) downsamples the downmix subband signal into a set of narrowband subband signals. In this process, SBR parameters are generated.
- the SBR parameter is quantized by the quantizer 417.
- the LD synthesis filter bank 403 reconverts the narrowband subband signal into the time domain and generates a first narrowband signal (sound signal).
- the low-delay QMF filter bank disclosed in Non-Patent Document 2 can be used.
- the signal analysis unit 404 analyzes the characteristics of the first narrowband signal and selects the optimum encoder from among the MDCT encoder 406, the LP encoder 408, and the TCX encoder 410 in order to encode the first narrowband signal.
- the MDCT encoder 406 and the TCX encoder 410 are also referred to as an LFD (Lapped Frequency Domain) encoder.
- the signal analysis unit 404 can select the MDCT encoder 406 for the first narrowband signal that is very tonal overall and has a small variation in spectral tilt.
- the signal analysis unit 404 selects the LP encoder 408 if the first narrowband signal has strong tone characteristics in the low frequency region and the spectral tilt greatly fluctuates.
- the TCX encoder 410 is selected for the first narrowband signal that does not meet any of the above criteria.
- In other words, the signal analysis unit 404 may analyze the characteristics of the first narrowband signal (sound signal) and determine the encoding method of each frame included in the first narrowband signal.
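- The selection criteria described for the signal analysis unit 404 can be sketched as a simple decision function. The feature names, their normalisation to the range 0 to 1, and both thresholds are hypothetical; the text only gives the criteria qualitatively.

```python
from enum import Enum

class Coder(Enum):
    MDCT = "MDCT encoder 406"
    LP = "LP encoder 408"
    TCX = "TCX encoder 410"

def select_coder(overall_tonality, low_band_tonality, tilt_variation,
                 tonal_thr=0.7, tilt_thr=0.3):
    """Hypothetical decision rule; all features are assumed to lie in [0, 1]."""
    # Very tonal overall with little spectral-tilt variation -> MDCT.
    if overall_tonality >= tonal_thr and tilt_variation < tilt_thr:
        return Coder.MDCT
    # Strongly tonal at low frequencies with large tilt variation -> LP.
    if low_band_tonality >= tonal_thr and tilt_variation >= tilt_thr:
        return Coder.LP
    # Anything that meets neither criterion -> TCX.
    return Coder.TCX
```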
- the switching unit 405 performs switching control of whether the frame is encoded by the LFD encoder (MDCT encoder 406 or TCX encoder 410) or the LP encoder 408 according to the determination result of the signal analysis unit 404. Specifically, based on the encoder selected according to the determination result of the signal analysis unit 404, the switching unit 405 selects a sample subset of the encoding target frames (past and current frames) included in the first narrowband signal, and generates a second narrowband signal from the sample subset for subsequent encoding.
- when the MDCT encoder 406 is selected, the switching unit 405 performs window processing on the selected sample subset.
- FIG. 5 is a diagram showing the shape of a window with a small overlap. As shown in FIG. 5, the desirable window shape in the sound signal hybrid encoder 100 has a small overlap. In Embodiment 1, the switching unit 405 performs such window processing when selecting MDCT.
- the window shown in FIG. 1 and the like monotonously increases in a half period of the frame length and monotonously decreases in a half period of the frame length.
- the window shown in FIG. 5 monotonously increases in a period shorter than half the frame length and monotonically decreases in a period shorter than half the frame length. This means that the overlap is small.
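- A window with a small overlap of the kind described above can be sketched as a short rise, a flat top, and a short fall. The sine-shaped slopes and the function name `low_overlap_window` are assumptions; the exact shape in FIG. 5 is not given in this text, and `frame_len` is taken here to be the full window length.

```python
import numpy as np

def low_overlap_window(frame_len, overlap_len):
    """Sketch of a low-overlap window: rise, flat top, fall.

    The rise and fall each span overlap_len samples, which must be
    shorter than half of frame_len.
    """
    if not 0 < overlap_len < frame_len / 2:
        raise ValueError("overlap_len must be positive and below frame_len / 2")
    n = np.arange(overlap_len)
    # Sine slope (assumed shape); rise**2 + reversed(rise)**2 == 1.
    rise = np.sin(np.pi * (n + 0.5) / (2 * overlap_len))
    flat = np.ones(frame_len - 2 * overlap_len)
    return np.concatenate([rise, flat, rise[::-1]])
```

- With this shape the squared rising and falling slopes sum to one, so adjacent windowed frames can be overlap-added without a level change in the overlap region.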
- the MDCT encoder 406 encodes the encoding target frame by MDCT.
- the LP encoder 408 encodes the encoding target frame by calculating a linear prediction coefficient of the encoding target frame.
- the LP encoder 408 uses, for example, a CELP scheme such as ACELP (Algebraic Code Excited Linear Prediction) or VSELP (Vector Sum Excited Linear Prediction).
- the TCX encoder 410 encodes the encoding target frame by the TCX method. Specifically, the TCX encoder 410 calculates the linear prediction coefficients of the encoding target frame and encodes the frame by performing MDCT processing on the linear prediction residual.
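- The two steps of the TCX method (linear prediction, then MDCT of the residual) can be sketched as follows. This is a simplified illustration, not the TCX encoder 410 itself: the autocorrelation method, the prediction order, the absence of windowing and perceptual weighting, and all function names are assumptions.

```python
import numpy as np

def lpc_coefficients(frame, order):
    """Autocorrelation method: solve the normal equations R a = r."""
    x = np.asarray(frame, dtype=float)
    r = np.array([np.dot(x[:len(x) - k], x[k:]) for k in range(order + 1)])
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    R += 1e-9 * r[0] * np.eye(order)      # small regularisation for stability
    return np.linalg.solve(R, r[1:])

def lpc_residual(frame, a):
    """e[n] = x[n] - sum_k a[k] * x[n-1-k], with zero history before the frame."""
    x = np.asarray(frame, dtype=float)
    pred = np.zeros_like(x)
    for k, ak in enumerate(a):
        pred[k + 1:] += ak * x[:len(x) - k - 1]
    return x - pred

def mdct(block):
    """Direct-form MDCT: 2N input samples give N coefficients."""
    x = np.asarray(block, dtype=float)
    n_half = len(x) // 2
    n = np.arange(len(x))
    k = np.arange(n_half)[:, None]
    basis = np.cos(np.pi / n_half * (n + 0.5 + n_half / 2) * (k + 0.5))
    return basis @ x

def tcx_encode(frame, order=8):
    """Return (LP coefficients, MDCT coefficients of the LP residual)."""
    a = lpc_coefficients(frame, order)
    return a, mdct(lpc_residual(frame, a))
```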
- a frame encoded by the MDCT encoder 406 or the TCX encoder 410 is described as an LFD frame
- a frame encoded by the LP encoder is described as an LP frame.
- An LFD frame in which aliasing occurs due to switching of the switching unit 405 is referred to as an AC target frame.
- the AC target frame is an LFD frame that is continuously encoded with the LP frame by the switching control of the switching unit 405.
- There are two types of AC target frame: a frame encoded immediately after an LP frame (the frame immediately following the LP frame) and a frame encoded immediately before an LP frame (the frame immediately preceding the LP frame).
- Quantizers 407, 409, and 411 quantize the encoder outputs. Specifically, the quantizer 407 quantizes the output of the MDCT encoder 406, the quantizer 409 quantizes the output of the LP encoder 408, and the quantizer 411 quantizes the output of the TCX encoder 410.
- the quantizer 407 is a combination of a dB-step quantizer and Huffman coding
- the quantizer 409 and the quantizer 411 are vector quantizers.
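- As a rough illustration of a dB-step quantizer, the sketch below quantizes coefficient magnitudes on a uniform grid in the dB domain and keeps the sign separately. The 1.5 dB step, the magnitude floor, and the function names are assumptions, and the Huffman coding stage mentioned for the quantizer 407 is omitted.

```python
import numpy as np

def db_step_quantize(coeffs, step_db=1.5, floor=1e-9):
    """Quantize magnitudes in uniform dB steps; return (index, sign)."""
    c = np.asarray(coeffs, dtype=float)
    level_db = 20.0 * np.log10(np.maximum(np.abs(c), floor))
    index = np.round(level_db / step_db).astype(int)
    sign = np.sign(c).astype(int)   # 0 marks an exactly zero coefficient
    return index, sign

def db_step_dequantize(index, sign, step_db=1.5):
    """Inverse mapping: the reconstruction error is at most step_db / 2 in dB."""
    return sign * 10.0 ** (np.asarray(index) * step_db / 20.0)
```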
- the local decoder 412 acquires the AC target frame and the LP frame continuous with the AC target frame from the bit stream multiplexer 415, and generates a local decode signal obtained by decoding at least a part of the acquired frame.
- the local decode signal is a narrowband signal decoded by the local decoder 412. Specifically, the local decode signal is, for example, d′ and c′ in Equation (10), c″ in Equation (11), and d″ in Equation (15) described above.
- the AC signal generation unit 413 generates and outputs an AC signal used for removing aliasing that occurs in decoding of the AC target frame, using the local decode signal and the first narrowband signal. In other words, the AC signal generation unit 413 generates an AC signal by using the decoded past data (past frame) provided by the local decoder 412.
- the AC signal generation unit 413 generates a plurality of AC signals using a plurality of AC processes (methods), and checks which of the generated AC signals has better bit efficiency when encoded. Furthermore, the AC signal generation unit 413 selects the AC signal with better bit efficiency in encoding, and outputs the selected AC signal and an AC flag indicating the AC process used to generate that AC signal. Note that the selected AC signal is quantized by the quantizer 414.
- the bit stream multiplexer 415 writes all encoded frames and sub information to the bit stream. That is, the bit stream multiplexer 415 multiplexes the signals quantized by the quantizers 407, 409, 411, 414, 416, and 417, and the AC flag, and transmits them.
- FIG. 6 is a block diagram illustrating an example of the configuration of the AC signal generation unit 413.
- the AC signal generation unit 413 includes a first AC candidate generator 700, a second AC candidate generator 701, and an AC candidate selector 702.
- Each of the first AC candidate generator 700 and the second AC candidate generator 701 uses the first narrowband signal and the local decode signal to calculate an AC candidate, which is a candidate for the AC signal finally output from the AC signal generation unit 413.
- the AC candidate generated by the first AC candidate generator 700 may be simply referred to as AC
- the AC candidate generated by the second AC candidate generator 701 may be simply referred to as AC2.
- the first AC candidate generator 700 generates an AC candidate (AC signal) using the first scheme, and the second AC candidate generator 701 generates an AC candidate (AC signal) using a second scheme different from the first scheme. Details of the first method and the second method will be described later.
- the AC candidate selector 702 selects one AC candidate of AC and AC2 based on a predetermined condition.
- the predetermined condition is a code amount when each AC candidate is quantized.
- the AC candidate selector 702 outputs the selected AC candidate and an AC flag indicating whether the selected AC candidate is generated using the first method or the second method.
- FIG. 7 is a flowchart showing an example of the operation of the AC signal generation unit 413.
- the first narrowband signal is encoded while the switching unit 405 switches the encoding method according to the determination result of the signal analysis unit 404 (No in S101 and S102).
- the AC signal generation unit 413 first generates an AC signal by the first method (S103). Specifically, the first AC candidate generator 700 generates an AC using the first narrowband signal and the local decode signal.
- the AC signal generation unit 413 generates an AC signal by the second method (S104). Specifically, the second AC candidate generator 701 generates AC2 using the first narrowband signal and the local decode signal.
- the AC signal generation unit 413 selects one AC candidate (AC signal) of AC and AC2 (S105). Specifically, AC candidate selector 702 selects an AC candidate having a small code amount after quantization by quantizer 414 from AC and AC2.
- the AC signal generation unit 413 outputs the AC candidate (AC signal) selected in step S105 and the AC flag indicating the generation method of the AC candidate (S106).
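- Steps S105 and S106 can be sketched as a comparison of code amounts. The candidates AC and AC2 are passed in as arrays because Expressions (12) and (17) are not reproduced here. The uniform quantizer, the signed Exp-Golomb bit count used as a stand-in for the code amount after the quantizer 414, and the flag values 0 and 1 are all assumptions.

```python
import numpy as np

FIRST_SCHEME, SECOND_SCHEME = 0, 1   # hypothetical AC flag values

def code_amount(signal, step=0.05):
    """Rough bit count: uniform quantisation, then signed Exp-Golomb lengths."""
    q = np.round(np.asarray(signal, dtype=float) / step).astype(int)
    mapped = np.where(q > 0, 2 * q - 1, -2 * q)   # signed -> unsigned index
    # Order-0 Exp-Golomb code length of v is 2 * bit_length(v + 1) - 1.
    return sum(2 * (int(v) + 1).bit_length() - 1 for v in mapped)

def select_ac_candidate(ac, ac2, step=0.05):
    """Return (selected AC candidate, AC flag) with the smaller code amount."""
    bits_first = code_amount(ac, step)
    bits_second = code_amount(ac2, step)
    if bits_second < bits_first:
        return np.asarray(ac2, dtype=float), SECOND_SCHEME
    return np.asarray(ac, dtype=float), FIRST_SCHEME
```

- In this sketch a tie keeps the first scheme; the text does not say how ties are resolved.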
- In other words, the AC signal generation unit 413 selects and outputs, based on a predetermined condition, either the AC signal generated by the first method or the AC signal generated by the second method different from the first method.
- the AC signal generation unit 413 outputs an AC flag indicating whether the output AC signal is generated using the first method or the second method.
- the AC signal generation unit 413 generates an AC signal by the two methods in each of the case where the AC target frame is a frame encoded immediately after the LP frame and the case where the AC target frame is a frame encoded immediately before the LP frame.
- the first method and the second method will be described in detail.
- the AC signal generation method is not limited to these specific examples, and other methods may be used.
- the first method is an AC process normally used in MPEG USAC as already described with reference to FIG. 2, and is a method of generating an AC candidate (AC) using Expression (12). That is, the first AC candidate generator 700 generates an AC candidate (AC) using Expression (12).
- the AC signal generation unit 413 further generates an AC signal using the second method without using ZIR.
- the second method is desirably a method in which the code amount after quantization of the generated AC signal is expected to be smaller than that of the first method (a method in which the code amount is prioritized over aliasing removal).
- For example, various methods can be used, such as reducing the number of bits used to quantize the signal below the normal number of quantization bits, or reducing the order of the filter coefficients when the AC signal is expressed with an LPC filter.
- FIG. 8 is a diagram showing a second method of AC signal generation used in switching from LP encoding to transform encoding. That is, the second AC candidate generator 701 generates an AC candidate (AC2) using the following equation (17).
- AC2 is highly likely to be a more bit-efficient signal than AC.
- the AC2 signal described above is more likely than AC to have small signal level fluctuations, and when such a signal is quantized, the quantization accuracy is unlikely to deteriorate even if the number of bits allocated for quantization is reduced to some extent. For this reason, AC2 is likely to be more bit-efficient than AC, particularly when the waveforms of the original signal d and the decoded signal d′ are likely to be similar, or when the encoding conditions use a higher bit rate so that the difference between d and d′ tends to be small.
- the first method is an AC process normally used in MPEG USAC, as already described with reference to FIG. 3, and generates an AC candidate (AC) using Expression (16). That is, the first AC candidate generator 700 generates an AC candidate (AC) using Expression (16).
- the AC signal generation unit 413 further generates an AC signal using the second method.
- FIG. 9 is a diagram showing a second method of AC signal generation used in switching from transform coding to LP coding. That is, the second AC candidate generator 701 generates an AC candidate (AC2) using the following equation (20).
- AC2 is likely to be encoded with higher bit efficiency than AC, particularly when the waveforms of the original signal c and the decoded signal c′ are likely to be similar.
- the simplest selection method of the AC candidate selector 702 is to pass both AC and AC2 through the quantizer 414 and select the AC candidate that requires fewer bits (smaller code amount) for encoding.
- AC candidate selection method is not limited to such a method, and other methods may be used.
- For example, the AC candidate selector 702 (AC signal generation unit 413) may select the first method when the frame size of a frame included in the first narrowband signal is larger than a predetermined size (for example, when the code amount of the frame is large), and may select the second method when the frame size is equal to or smaller than the predetermined size (for example, when the code amount of the frame is small).
- Alternatively, the AC signal generation unit 413 may first generate an AC signal by the first method and select the first method when the code amount of that AC signal after quantization by the quantizer 414 is smaller than a predetermined threshold value.
- When the code amount of the AC signal generated by the first method after quantization is equal to or greater than the predetermined threshold value, the AC signal generation unit 413 further generates the AC signal by the second method. The AC signal generation unit 413 may then output whichever of the AC signal generated by the first method and the AC signal generated by the second method has the smaller code amount after quantization by the quantizer 414.
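- The two alternative selection rules above (by frame size, and by a code-amount threshold) can be sketched as follows. The function names, the flag values 0 and 1, and the callable parameters are hypothetical; the code-amount function stands in for quantization by the quantizer 414 followed by bit counting.

```python
def choose_scheme_by_frame_size(frame_size, predetermined_size):
    """Return 0 (first method) for large frames, 1 (second method) otherwise."""
    return 0 if frame_size > predetermined_size else 1

def choose_ac_two_stage(generate_first, generate_second, code_amount,
                        threshold_bits):
    """Return (AC signal, flag); the second method runs only when needed.

    generate_first / generate_second: zero-argument callables returning an
    AC signal. code_amount: callable returning the bit count of a signal.
    """
    ac = generate_first()
    bits_first = code_amount(ac)
    if bits_first < threshold_bits:
        return ac, 0                      # first method is cheap enough
    ac2 = generate_second()               # generated only above the threshold
    if code_amount(ac2) < bits_first:
        return ac2, 1
    return ac, 0
```

- Deferring the second method until the threshold test fails avoids generating and quantizing AC2 for frames where the first method is already inexpensive.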
- the sound signal hybrid encoder according to Embodiment 1 may be realized as an encoder of any configuration as long as it includes at least an overlap frequency domain transform encoder (LFD encoder, for example, MDCT, TCX) and a linear prediction encoder (LP encoder).
- the sound signal hybrid encoder according to Embodiment 1 may be realized as an encoder including only a TCX encoder and an LP encoder.
- the bandwidth extension tool and the multi-channel extension tool in the first embodiment are arbitrary low bit rate tools and are not essential components.
- the sound signal hybrid encoder according to Embodiment 1 may be realized as an encoder that has only a subset of these tools, or as an encoder that has none of these tools.
- the AC signal generation unit 413 may generate and output an AC signal according to one method selected from a plurality of methods, and output an AC flag indicating the selected one method.
- the AC flag in this case may be any flag as long as it can distinguish one method from a plurality of methods, for example, composed of a plurality of bits.
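- When more than two methods are available, the width of a multi-bit AC flag follows from the number of methods. The sketch below computes the minimum number of bits and a fixed-width binary encoding; the function names and the bit-string representation are assumptions, since the text does not specify the flag format.

```python
def ac_flag_bits(num_methods):
    """Minimum number of bits needed to distinguish num_methods methods."""
    if num_methods < 1:
        raise ValueError("num_methods must be at least 1")
    return max(1, (num_methods - 1).bit_length())

def encode_ac_flag(method_index, num_methods):
    """Fixed-width binary string identifying the selected method."""
    if not 0 <= method_index < num_methods:
        raise ValueError("method_index out of range")
    return format(method_index, "0{}b".format(ac_flag_bits(num_methods)))
```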
- the sound signal hybrid encoder according to Embodiment 1 can adaptively select an AC signal with good bit efficiency at the time of encoding. That is, according to the sound signal hybrid encoder according to the first embodiment, an efficient encoder with a low bit rate can be realized. Such a bit rate reduction effect is particularly noticeable when codec switching is fast and for low-delay encoders that require many bits for encoding.
- FIG. 10 is a block diagram showing a configuration of the sound signal hybrid decoder according to the second embodiment.
- the sound signal hybrid decoder 200 includes an LD analysis filter bank 503, an LD synthesis filter bank 500, an MPS decoder 501, an SBR decoder 502, and a switching unit 505.
- the sound signal hybrid decoder 200 includes an audio decoder 506 using an IMDCT filter bank (hereinafter simply referred to as an IMDCT decoder 506), an LP decoder 508, a TCX decoder 510, inverse quantizers 507, 509, 511, 514, 516, and 517, a bit stream demultiplexer 515, and an AC output signal generation unit 513.
- the bit stream demultiplexer 515 selects, based on the core coder indicator of the bit stream, one of the IMDCT decoder 506, the LP decoder 508, and the TCX decoder 510, and the corresponding one of the inverse quantizers 507, 509, and 511.
- the bit stream demultiplexer 515 dequantizes the bit stream data using the selected inverse quantizer, and decodes the bit stream data using the selected decoder.
- the outputs of the inverse quantizers 507, 509, and 511 are input to the IMDCT decoder 506, the LP decoder 508, or the TCX decoder 510, respectively, and further converted into the time domain in the decoder to generate the first narrowband signal.
- the IMDCT decoder 506 and the TCX decoder 510 are also referred to as an ILFD (Inverse Lapped Frequency Domain) decoder.
- the switching unit 505 first aligns the frames of the first narrowband signal according to the time relationship with the past sample (according to the encoded order).
- the switching unit 505 performs window processing on the decoding target frame and adds an overlapping portion.
- the window used is the same as that used by the encoder shown in FIG. 5, and the window shown in FIG. 5 has a short overlap region in order to achieve low delay.
- When the codec is switched, the aliasing component around the frame boundary of the AC target frame (hereinafter also referred to as a switching frame) matches the signals shown in FIG. 2 and FIG. 3. In addition, the switching unit 505 generates a second narrowband signal.
- the AC signal included in the bit stream is inversely quantized by the inverse quantizer 514.
- the AC flag included in the bitstream determines the next processing method of the AC signal, such as generation of an additional antialiasing component using a past narrowband signal.
- the AC output signal generation unit 513 sums, according to the AC flag, the dequantized AC signal and the AC components (x, y, z, etc.) generated by the switching unit 505, thereby generating an AC_out signal (AC output signal).
- the adder 504 adds the AC_out signal to the second narrowband signal that is aligned by the switching unit 505 and to which the overlap region is added, and removes aliasing components at the frame boundary of the AC target frame.
- a signal from which aliasing components are removed is referred to as a third narrowband signal.
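- The operation of the adder 504 can be sketched as adding the AC_out signal to the samples around the frame boundary of the second narrowband signal. The function name, the `boundary_start` index, and the assumption that AC_out covers one contiguous span are illustrative only.

```python
import numpy as np

def add_ac_output(second_narrowband, ac_out, boundary_start):
    """Return the third narrowband signal: input plus AC_out at the boundary."""
    out = np.array(second_narrowband, dtype=float)   # copy; input is untouched
    ac_out = np.asarray(ac_out, dtype=float)
    end = boundary_start + len(ac_out)
    if boundary_start < 0 or end > len(out):
        raise ValueError("AC_out does not fit inside the signal")
    out[boundary_start:end] += ac_out
    return out
```

- For example, if a clean signal is corrupted by an aliasing term over a few samples and AC_out equals the negative of that term, the output matches the clean signal.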
- the LD analysis filter bank 503 processes the third narrowband signal and generates a narrowband subband signal represented by a hybrid time / frequency representation.
- the low-delay QMF filter bank shown in Non-Patent Document 2 can be cited as a candidate, but is not limited thereto.
- the SBR decoder 502 (bandwidth extension decoding unit) expands the narrowband subband signal to a higher frequency region.
- the expansion method is either a “patch-up” method in which the low frequency band is copied to a higher frequency band or a “stretch-up” method in which the harmonics in the low frequency band are expanded based on the principle of the phase vocoder.
- the characteristics (especially energy, noise floor, and tone color) of the expanded (synthesized) high frequency region are adjusted based on the SBR parameters inversely quantized by the inverse quantizer 517. As a result, a subband signal with an expanded bandwidth is generated.
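- A minimal sketch of the patch-up idea with energy adjustment: low subbands are copied into the high-frequency region and each copied band is scaled to a target energy. The cyclic copy order, the per-band target energies standing in for the dequantized SBR parameters, and the function name are assumptions; noise floor and tonality adjustment are omitted.

```python
import numpy as np

def patch_up(low_bands, num_high_bands, target_energy):
    """Copy low subbands upward and scale each copy to its target energy.

    low_bands: array of shape (num_low, num_slots) of subband samples.
    target_energy: one target energy per generated high band.
    Returns an array of shape (num_low + num_high_bands, num_slots).
    """
    low = np.asarray(low_bands, dtype=complex)
    num_low = low.shape[0]
    # Reuse the low bands cyclically as the source of the high bands.
    high = np.stack([low[k % num_low] for k in range(num_high_bands)])
    energy = np.sum(np.abs(high) ** 2, axis=1)
    gain = np.sqrt(np.asarray(target_energy, dtype=float)
                   / np.maximum(energy, 1e-12))
    return np.vstack([low, high * gain[:, None]])
```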
- the MPS decoder 501 (multi-channel extension decoding unit) generates a multi-channel sub-band signal from the sub-band signal whose bandwidth is extended, using the MPS parameter that is inversely quantized by the inverse quantizer 516. For example, the MPS decoder 501 mixes the non-correlated signal and the downmix signal based on the inter-channel correlation parameter. The MPS decoder 501 further adjusts the amplitude and phase of the mixed signal based on the inter-channel level difference parameter and the inter-channel phase difference parameter to generate a multi-channel subband signal.
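- The mixing step can be illustrated for the two-channel case. The sketch below mixes a downmix signal with a decorrelated signal so that the outputs have a given inter-channel correlation and level difference. The rotation-based mixing rule and the function name are assumptions in the style of parametric stereo, not the MPS decoder 501 itself, and the inter-channel phase difference is omitted.

```python
import numpy as np

def upmix_two_channels(downmix, decorrelated, icc, cld_db):
    """Mix downmix and decorrelated signals into a left/right pair.

    icc: target inter-channel correlation in [-1, 1].
    cld_db: target left/right level difference in dB.
    Assumes the two inputs are uncorrelated and have equal energy.
    """
    m = np.asarray(downmix, dtype=float)
    d = np.asarray(decorrelated, dtype=float)
    c = 10.0 ** (cld_db / 20.0)                   # left/right amplitude ratio
    g_left = np.sqrt(2.0) * c / np.sqrt(1.0 + c * c)
    g_right = np.sqrt(2.0) / np.sqrt(1.0 + c * c)
    alpha = 0.5 * np.arccos(np.clip(icc, -1.0, 1.0))
    # Opposite rotations give a correlation of cos(2 * alpha) == icc.
    left = g_left * (np.cos(alpha) * m + np.sin(alpha) * d)
    right = g_right * (np.cos(alpha) * m - np.sin(alpha) * d)
    return left, right
```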
- the LD synthesis filter bank 500 reconverts the multi-channel subband signal from the hybrid time / frequency domain to the time domain, and outputs a multi-channel signal in the time domain.
- FIG. 11 is a block diagram illustrating an example of the configuration of the AC output signal generation unit 513.
- the AC output signal generation unit 513 includes a first AC candidate generator 800, a second AC candidate generator 801, and AC candidate selectors 802 and 803.
- Each of the first AC candidate generator 800 and the second AC candidate generator 801 calculates an AC candidate (AC output signal, AC_out) using the dequantized AC signal and the decoded narrowband signal.
- the AC candidate selectors 802 and 803 select one of the first AC candidate generator 800 and the second AC candidate generator 801 based on the AC flag in order to remove aliasing.
- FIG. 12 is a flowchart illustrating an example of the operation of the AC output signal generation unit 513.
- the sound signal hybrid decoder 200 performs a process of decoding the acquired frame according to the encoding method of the frame (No in S201 and S202).
- When the AC output signal generation unit 513 acquires the AC flag (Yes in S202), it performs processing according to the AC flag and generates an AC_out signal (S203).
- the AC candidate selectors 802 and 803 select an AC candidate generator indicated by the AC flag.
- the AC candidate selectors 802 and 803 select the first AC candidate generator 800 when the AC flag indicates the first scheme.
- the AC candidate selectors 802 and 803 select the second AC candidate generator 801 when the AC flag indicates the second method.
- the AC output signal generation unit 513 (AC candidate selectors 802 and 803) generates an AC_out signal using the selected AC candidate generator.
- the AC output signal generation unit 513 causes the selected AC candidate generator to generate an AC_out signal.
- the first AC candidate generator 800 generates a first AC_out signal.
- the second AC candidate generator 801 generates a second AC_out signal.
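- The flag-driven selection in step S203 can be sketched as a lookup from the AC flag to a generator. The function name, the dictionary of generators, and the two-argument generator signature are hypothetical; the actual AC_out calculations are not reproduced in this text, so placeholder callables stand in for the first AC candidate generator 800 and the second AC candidate generator 801.

```python
def generate_ac_out(ac_flag, ac_signal, decoded, generators):
    """Run the AC candidate generator that the AC flag indicates.

    generators: mapping from flag value to a callable
    (ac_signal, decoded) -> AC_out signal.
    """
    try:
        generator = generators[ac_flag]
    except KeyError:
        raise ValueError("unknown AC flag: {!r}".format(ac_flag))
    return generator(ac_signal, decoded)
```

- A possible call with placeholder generators is `generate_ac_out(0, ac, decoded, {0: first_gen, 1: second_gen})`, where `first_gen` and `second_gen` are whatever functions implement the two schemes.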
- the adder 504 adds the AC_out signal output from the AC output signal generation unit 513 to the second narrowband signal output from the switching unit 505 to remove aliasing (S204).
- an AC_out signal generation method (calculation method) corresponding to the example shown in Embodiment 1 is shown; however, the AC_out signal generation method is not limited to such a specific example, and other methods may be used.
- the first AC candidate generator 800 calculates the first AC_out signal as follows.
- the second AC candidate generator 801 calculates the second AC_out signal as follows.
- x, y, and z are narrowband signals subjected to the following window processing.
- x is a signal on which the switching unit 505 has performed time alignment and window processing.
- y is a signal obtained by decoding the preceding LP frame, multiplying it by the window twice, and inverting it in the switching unit 505, and matches Equation (10).
- z is the ZIR of the preceding LP frame that has been windowed by the switching unit 505, and matches Equation (11).
- the first AC candidate generator 800 calculates the first AC_out signal as follows.
- the second AC candidate generator 801 calculates the second AC_out signal as follows.
- x is a signal that is time-aligned and windowed by the switching unit 505.
- y is a signal obtained by decoding the subsequent LP frame, multiplying it by the window twice, and inverting it in the switching unit 505, and matches Expression (15).
- the AC candidate selectors 802 and 803 activate the first AC candidate generator 800 or the second AC candidate generator 801 according to the AC flag, and output AC_out1 or AC_out2.
- the sound signal hybrid decoder 200 can remove the aliasing component of the signal encoded by the sound signal hybrid encoder 100 according to Embodiment 1.
- the sound signal hybrid decoder according to the second embodiment can be any decoder as long as it includes at least an overlap frequency domain transform decoder (ILFD decoder, for example, MDCT, TCX) and a linear prediction decoder (LP decoder). It may be realized as a decoder having a configuration.
- the sound signal hybrid decoder according to Embodiment 2 may be realized as a decoder including only a TCX decoder and an LP decoder.
- the bandwidth extension tool and the multi-channel extension tool in the second embodiment are arbitrary low bit rate tools and are not essential components.
- the sound signal hybrid decoder according to Embodiment 2 may be realized as a decoder that has only a subset of these tools, or as a decoder that has none of these tools.
- the signal encoded by the sound signal hybrid encoder according to the first embodiment can be appropriately decoded according to the AC flag.
- the sound signal hybrid encoder according to Embodiment 1 adaptively selects an AC signal with good bit efficiency at the time of encoding. For this reason, the sound signal hybrid decoder according to the second embodiment realizes an efficient decoder with a low bit rate.
- Such a bit rate reduction effect is particularly noticeable when codec switching is fast and for low-delay encoders that require many bits for encoding.
- each of the above devices can be realized by a computer system including a microprocessor, a ROM, a RAM, a hard disk unit, a display unit, a keyboard, a mouse, and the like.
- a computer program is stored in the RAM or the hard disk unit.
- Each device achieves its functions by the microprocessor operating according to the computer program.
- the computer program is configured by combining a plurality of instruction codes indicating instructions for the computer in order to achieve a predetermined function.
- a part or all of the components constituting each of the above devices may be configured by one system LSI (Large Scale Integration).
- the system LSI is a super multifunctional LSI manufactured by integrating a plurality of components on a single chip; specifically, it is a computer system including a microprocessor, ROM, RAM, and the like.
- a computer program is stored in the ROM.
- the system LSI achieves its functions by the microprocessor loading a computer program from the ROM to the RAM and performing operations such as calculations in accordance with the loaded computer program.
- Some or all of the components of each of the above devices may be implemented as an IC card or a stand-alone module that can be attached to and detached from the device.
- The IC card or module is a computer system that includes a microprocessor, a ROM, a RAM, and the like.
- The IC card or module may include the super-multifunctional LSI described above.
- The IC card or module achieves its functions by the microprocessor operating in accordance with the computer program. The IC card or module may be tamper-resistant.
- The present invention may be realized as the methods described above. These methods may in turn be realized as a computer program executed by a computer, or as a digital signal comprising the computer program.
- The present invention may also be realized as the computer program or the digital signal recorded on a computer-readable recording medium, such as a flexible disk, a hard disk, a CD-ROM, an MO, a DVD, a DVD-ROM, a DVD-RAM, a BD (Blu-ray (registered trademark) Disc), or a semiconductor memory. It may also be implemented as the digital signal recorded on such a recording medium.
- The computer program or the digital signal may be transmitted via a telecommunication line, a wireless or wired communication line, a network such as the Internet, data broadcasting, or the like.
- The present invention may also be a computer system including a microprocessor and a memory.
- The memory may store the computer program, and the microprocessor may operate in accordance with it.
- The program or the digital signal may be recorded on a recording medium and transferred, or transferred via a network or the like, so that it is executed by another, independent computer system.
- The present invention is not limited to these embodiments or their variations. Forms obtained by applying various modifications conceived by those skilled in the art to the embodiments or their variations, and forms constructed by combining components of different embodiments or variations, are also included within the scope of the present invention, provided they do not depart from its gist.
- The present invention is used in applications related to the encoding of signals containing audio content, such as audiobooks, broadcasting systems, portable media devices, portable communication terminals (for example, smartphones and tablet computers), video conferencing apparatuses, and networked music performance.
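As a rough illustration of what decoding "according to the AC flag" could look like on the decoder side, the sketch below picks an auxiliary signal by flag, adds it to the received AC signal to form an AC output signal, and then adds that result to the AC target frame region of the decoded narrowband signal. Everything concrete here is an assumption made for illustration: the flag values 0 and 1, the function names `generate_ac_output` and `add_to_target_frame`, and the dictionary of auxiliary signals are hypothetical and are not taken from the embodiments.

```python
from typing import Dict, List, Sequence


def generate_ac_output(ac_flag: int,
                       ac_signal: Sequence[float],
                       aux_by_scheme: Dict[int, Sequence[float]]) -> List[float]:
    """Add the auxiliary signal chosen by the AC flag to the received AC signal.

    aux_by_scheme is a hypothetical container: for each flag value it holds the
    signal (taken from the switching unit, the ILFD decoder, or the LP decoder)
    that the corresponding scheme needs. An unknown flag raises KeyError.
    """
    aux = aux_by_scheme[ac_flag]
    if len(aux) != len(ac_signal):
        raise ValueError("auxiliary signal and AC signal must have equal length")
    return [a + b for a, b in zip(ac_signal, aux)]


def add_to_target_frame(narrowband: Sequence[float],
                        start: int,
                        ac_output: Sequence[float]) -> List[float]:
    """Add the AC output signal to the AC target frame region of the signal.

    start is the (assumed) sample index where the AC target frame region begins.
    """
    if start < 0 or start + len(ac_output) > len(narrowband):
        raise ValueError("AC output signal does not fit inside the signal")
    corrected = list(narrowband)
    for offset, value in enumerate(ac_output):
        corrected[start + offset] += value
    return corrected


if __name__ == "__main__":
    ac_out = generate_ac_output(1, [0.5, -0.5], {0: [1.0, 1.0], 1: [0.25, 0.25]})
    print(add_to_target_frame([1.0, 1.0, 1.0, 1.0], 1, ac_out))
```

Keeping flag interpretation and the final addition in separate functions mirrors the split between an AC output signal generation unit and an adder; a real decoder would derive the auxiliary signals from its own decoding state rather than receive them ready-made.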
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Compression, Expansion, Code Conversion, And Decoders (AREA)
Abstract
Description
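Conventional audio compression technologies can be broadly divided into two categories: audio codecs and speech codecs.
Embodiment 1 describes a sound signal hybrid encoder.
Embodiment 2 describes a sound signal hybrid decoder.
Although the present invention has been described based on the above embodiments, it is of course not limited to those embodiments. The following cases are also included in the present invention.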
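200 Sound signal hybrid decoder
400, 503 LD analysis filter bank
401 MPS encoder
402 SBR encoder
403, 500 LD synthesis filter bank
404 Signal analysis unit
405, 505 Switching unit
406 MDCT encoder
407, 409, 411, 414, 416, 417 Quantizer
408 LP encoder
410 TCX encoder
412 Local decoder
413 AC signal generation unit
415 Bitstream multiplexer
501 MPS decoder
502 SBR decoder
504 Adder (addition unit)
506 IMDCT decoder
507, 509, 511, 514, 516, 517 Inverse quantizer
508 LP decoder
510 TCX decoder
513 AC output signal generation unit
515 Bitstream demultiplexer
700, 800 First AC candidate generator
701, 801 Second AC candidate generator
702, 802, 803 AC candidate selector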
Claims (20)
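- A sound signal hybrid encoder comprising:
a signal analysis unit that analyzes characteristics of a sound signal and determines an encoding method for a frame included in the sound signal;
an LFD encoder that generates an LFD frame, in which the frame is encoded, by applying an LFD (Lapped Frequency Domain) transform to the frame;
an LP encoder that generates an LP (Linear Prediction) frame, in which the frame is encoded, by calculating linear prediction coefficients of the frame;
a switching unit that switches, according to the determination result of the signal analysis unit, between encoding the frame with the LFD encoder and encoding the frame with the LP encoder;
a local decoder that generates a local decoded signal including (i) a signal obtained by decoding at least part of an AC (Aliasing Cancel) target frame, which is an LFD frame made adjacent to an LP frame by the switching control of the switching unit, and (ii) a signal obtained by decoding at least part of the LP frame adjacent to the AC target frame; and
an AC signal generation unit that generates, using the sound signal and the local decoded signal, and outputs an AC signal used to cancel aliasing that arises in decoding the AC target frame,
wherein, when the AC target frame is a frame immediately following the LP frame or a frame immediately preceding the LP frame, the AC signal generation unit (1) generates and outputs the AC signal according to one scheme selected from among a plurality of schemes, and (2) outputs an AC flag indicating the selected scheme.
- The sound signal hybrid encoder according to claim 1, wherein the AC signal generation unit generates and outputs the AC signal according to one scheme selected from a first scheme and a second scheme different from the first scheme.
- The sound signal hybrid encoder according to claim 2, further comprising a quantizer that quantizes the AC signal,
wherein the AC signal generation unit generates two AC signals, one using the first scheme and one using the second scheme, and outputs, of the two generated AC signals, the AC signal whose code amount after quantization by the quantizer is smaller.
- The sound signal hybrid encoder according to claim 2 or 3, wherein, when the AC target frame is a frame immediately following the LP frame,
the first scheme generates the AC signal using a windowed zero-input response of the LP frame immediately preceding the AC target frame, and
the second scheme generates the AC signal without using the zero-input response.
- The sound signal hybrid encoder according to any one of claims 2 to 4, wherein
the first scheme is a scheme standardized in USAC (Unified Speech And Audio Codec), and
the second scheme is a scheme for which the code amount of the generated AC signal after quantization is expected to be smaller than with the first scheme.
- The sound signal hybrid encoder according to claim 5, wherein the AC signal generation unit selects the first scheme when the frame size of the frames included in the sound signal is larger than a predetermined size, and selects the second scheme when the frame size is equal to or smaller than the predetermined size.
- The sound signal hybrid encoder according to any one of claims 2 to 6, further comprising a quantizer that quantizes the AC signal,
wherein the AC signal generation unit generates the AC signal by the first scheme and selects the first scheme when the code amount of that AC signal after quantization by the quantizer is smaller than a predetermined threshold, and
when the code amount of the AC signal generated by the first scheme after quantization by the quantizer is equal to or greater than the predetermined threshold, further generates the AC signal by the second scheme and outputs, of the AC signal generated by the first scheme and the AC signal generated by the second scheme, the one whose code amount after quantization by the quantizer is smaller.
- The sound signal hybrid encoder according to any one of claims 2 to 7, wherein the AC signal generation unit further includes:
a first AC candidate generator that generates the AC signal by the first scheme;
a second AC candidate generator that generates the AC signal by the second scheme; and
an AC candidate selector that (1) outputs the AC signal generated by one AC candidate generator selected from the first AC candidate generator and the second AC candidate generator, and (2) outputs the AC flag indicating which of the first scheme and the second scheme was used to generate the output AC signal.
- The sound signal hybrid encoder according to any one of claims 1 to 8, further comprising:
an LD (Low Delay) analysis filter bank that generates an input subband signal by converting an input signal into a time-frequency domain representation;
a multichannel extension unit that generates multichannel extension parameters and a downmix subband signal from the input subband signal;
a bandwidth extension unit that generates bandwidth extension parameters and a narrowband subband signal from the downmix subband signal;
an LD synthesis filter bank that generates the sound signal by converting the narrowband subband signal from the time-frequency representation into a time-domain representation;
a quantizer that quantizes the multichannel extension parameters, the bandwidth extension parameters, the output AC signal, the LFD frame, and the LP frame; and
a bitstream multiplexer that multiplexes and transmits the signals quantized by the quantizer and the AC flag.
- The sound signal hybrid encoder according to any one of claims 1 to 9, wherein the LFD encoder encodes the frame using the TCX scheme.
- The sound signal hybrid encoder according to any one of claims 1 to 10, wherein
the LFD encoder encodes the frame using an MDCT,
the switching unit applies windowing to the frame to be encoded by the LFD encoder, and
the window used in the windowing monotonically increases or monotonically decreases over a period shorter than one half of the length of the frame.
- A sound signal hybrid decoder that decodes an encoded signal including an LFD frame encoded by an LFD transform, an LP frame encoded using linear prediction coefficients, and an AC signal for canceling aliasing in an AC target frame, which is an LFD frame adjacent to the LP frame, the sound signal hybrid decoder comprising:
an ILFD (Inverse Lapped Frequency Domain) decoder that decodes the LFD frame;
an LP decoder that decodes the LP frame;
a switching unit that outputs a second narrowband signal in which frames decoded by the ILFD decoder and then windowed, and frames decoded by the LP decoder, are arranged in order;
an AC output signal generation unit that obtains an AC flag indicating the scheme used to generate the AC signal and generates an AC output signal by adding, according to the scheme indicated by the AC flag, a signal output from the switching unit, the ILFD decoder, or the LP decoder to the AC signal; and
an addition unit that outputs a third narrowband signal obtained by adding the AC output signal to the portion of the second narrowband signal corresponding to the AC target frame.
- The sound signal hybrid decoder according to claim 12, further comprising:
a bitstream demultiplexer that obtains a bitstream including the quantized encoded signal and the AC flag;
an inverse quantizer that generates the encoded signal by inverse-quantizing the quantized encoded signal;
an LD analysis filter bank that generates a narrowband subband signal by converting the third narrowband signal output from the addition unit into a time-frequency domain representation;
a bandwidth extension decoding unit that synthesizes a high-frequency signal by applying bandwidth extension parameters included in the encoded signal generated by the inverse quantizer to the narrowband subband signal, and generates a bandwidth-extended subband signal;
a multichannel extension decoding unit that generates a multichannel subband signal by applying multichannel extension parameters included in the encoded signal generated by the inverse quantizer to the bandwidth-extended subband signal; and
an LD synthesis filter bank that generates a multichannel signal by converting the multichannel subband signal from the time-frequency representation into a time-domain representation.
- The sound signal hybrid decoder according to claim 12 or 13, wherein
the AC signal is generated by a first scheme or a second scheme different from the first scheme, and
the AC output signal generation unit further includes:
a first AC candidate generator that generates the AC output signal corresponding to the AC signal generated by the first scheme;
a second AC candidate generator that generates the AC output signal corresponding to the AC signal generated by the second scheme; and
an AC candidate selector that selects, according to the AC flag, one of the first AC candidate generator and the second AC candidate generator and causes the selected AC candidate generator to generate the AC output signal.
- A sound signal encoding method comprising:
a signal analysis step of analyzing characteristics of a sound signal and determining an encoding method for a frame included in the sound signal;
an LFD encoding step of generating an LFD frame, in which the frame is encoded, by applying an LFD (Lapped Frequency Domain) transform to the frame;
an LP encoding step of generating an LP (Linear Prediction) frame, in which the frame is encoded, by calculating linear prediction coefficients of the frame;
a switching step of switching, according to the determination result of the signal analysis step, between encoding the frame in the LFD encoding step and encoding the frame in the LP encoding step;
a local decoding step of generating a local decoded signal including (i) a signal obtained by decoding at least part of an AC (Aliasing Cancel) target frame, which is an LFD frame made adjacent to an LP frame by the switching control of the switching step, and (ii) a signal obtained by decoding at least part of the LP frame adjacent to the AC target frame; and
an AC signal generation step of generating, using the sound signal and the local decoded signal, and outputting an AC signal used to cancel aliasing that arises in decoding the AC target frame,
wherein, in the AC signal generation step, when the AC target frame is a frame immediately following the LP frame or a frame immediately preceding the LP frame, (1) the AC signal is generated and output according to one scheme selected from among a plurality of schemes, and (2) an AC flag indicating the selected scheme is output.
- A program for causing a computer to execute the sound signal encoding method according to claim 15.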
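- An integrated circuit comprising:
a signal analysis unit that analyzes characteristics of a sound signal and determines an encoding method for a frame included in the sound signal;
an LFD encoder that generates an LFD frame, in which the frame is encoded, by applying an LFD (Lapped Frequency Domain) transform to the frame;
an LP encoder that generates an LP (Linear Prediction) frame, in which the frame is encoded, by calculating linear prediction coefficients of the frame;
a switching unit that switches, according to the determination result of the signal analysis unit, between encoding the frame with the LFD encoder and encoding the frame with the LP encoder;
a local decoder that generates a local decoded signal including (i) a signal obtained by decoding at least part of an AC (Aliasing Cancel) target frame, which is an LFD frame made adjacent to an LP frame by the switching control of the switching unit, and (ii) a signal obtained by decoding at least part of the LP frame adjacent to the AC target frame; and
an AC signal generation unit that generates, using the sound signal and the local decoded signal, and outputs an AC signal used to cancel aliasing that arises in decoding the AC target frame,
wherein, when the AC target frame is a frame immediately following the LP frame or a frame immediately preceding the LP frame, the AC signal generation unit (1) generates and outputs the AC signal according to one scheme selected from among a plurality of schemes, and (2) outputs an AC flag indicating the selected scheme.
- A sound signal decoding method for decoding an encoded signal including an LFD frame encoded by an LFD transform, an LP frame encoded using linear prediction coefficients, and an AC signal for canceling aliasing in an AC target frame, which is an LFD frame adjacent to the LP frame, the sound signal decoding method comprising:
an ILFD decoding step of decoding the LFD frame;
an LP decoding step of decoding the LP frame;
a switching step of outputting a second narrowband signal in which frames decoded in the ILFD decoding step and then windowed, and frames decoded by the LP decoder, are arranged in order;
an AC output signal generation step of obtaining an AC flag indicating the scheme used to generate the AC signal and generating an AC output signal by adding, according to the scheme indicated by the AC flag, a signal output in the switching step, the ILFD decoding step, or the LP decoding step to the AC signal; and
an addition step of outputting a third narrowband signal obtained by adding the AC output signal to the portion of the second narrowband signal corresponding to the AC target frame.
- A program for causing a computer to execute the sound signal decoding method according to claim 18.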
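- An integrated circuit that decodes an encoded signal including an LFD frame encoded by an LFD transform, an LP frame encoded using linear prediction coefficients, and an AC signal for canceling aliasing in an AC target frame, which is an LFD frame adjacent to the LP frame, the integrated circuit comprising:
an ILFD decoder that decodes the LFD frame;
an LP decoder that decodes the LP frame;
a switching unit that outputs a second narrowband signal in which frames decoded by the ILFD decoder and then windowed, and frames decoded by the LP decoder, are arranged in order;
an AC output signal generation unit that obtains an AC flag indicating the scheme used to generate the AC signal and generates an AC output signal by adding, according to the scheme indicated by the AC flag, a signal output from the switching unit, the ILFD decoder, or the LP decoder to the AC signal; and
an addition unit that outputs a third narrowband signal obtained by adding the AC output signal to the portion of the second narrowband signal corresponding to the decoded AC target frame.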
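The threshold-based selection recited in the encoder claims (try the first scheme; only if its quantized code amount reaches a threshold, also try the second scheme and keep the cheaper result) can be sketched as below. This is a minimal sketch under stated assumptions, not the claimed implementation: the flag values 0 and 1, the names `AcChoice`, `toy_code_bits`, and `select_ac_signal`, the tie-breaking in favor of the first scheme, and the toy bit-cost model standing in for the real quantizer are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class AcChoice:
    ac_flag: int             # 0 = first scheme, 1 = second scheme (illustrative values)
    ac_signal: List[float]   # the AC signal that would be quantized and transmitted
    code_bits: int           # estimated code amount after quantization


def toy_code_bits(signal: List[float], step: float = 0.25) -> int:
    """Hypothetical cost model: uniform quantization with a variable-length
    cost that grows with the magnitude of each quantized level."""
    bits = 0
    for sample in signal:
        level = round(sample / step)
        bits += 1 + 2 * abs(level).bit_length()
    return bits


def select_ac_signal(first_scheme: Callable[[], List[float]],
                     second_scheme: Callable[[], List[float]],
                     threshold_bits: int,
                     code_bits: Callable[[List[float]], int] = toy_code_bits) -> AcChoice:
    """Pick the AC signal and the flag that identifies its generation scheme."""
    first = list(first_scheme())
    first_bits = code_bits(first)
    if first_bits < threshold_bits:
        # Cheap enough already: the second scheme is never evaluated.
        return AcChoice(0, first, first_bits)
    second = list(second_scheme())
    second_bits = code_bits(second)
    if second_bits < first_bits:
        return AcChoice(1, second, second_bits)
    # Assumption: on a tie, keep the first scheme.
    return AcChoice(0, first, first_bits)


if __name__ == "__main__":
    choice = select_ac_signal(lambda: [5.0, -5.0], lambda: [0.5], threshold_bits=10)
    print(choice.ac_flag, choice.code_bits)
```

Passing the two schemes as callables means the second candidate is only computed when the threshold test fails, which matches the motivation for a threshold: avoiding the cost of generating both candidates on every transition frame.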
Priority Applications (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP13786609.1A EP2849180B1 (en) | 2012-05-11 | 2013-05-08 | Hybrid audio signal encoder, hybrid audio signal decoder, method for encoding audio signal, and method for decoding audio signal |
JP2013537355A JP6126006B2 (ja) | 2012-05-11 | 2013-05-08 | 音信号ハイブリッドエンコーダ、音信号ハイブリッドデコーダ、音信号符号化方法、及び音信号復号方法 |
US14/117,738 US9489962B2 (en) | 2012-05-11 | 2013-05-08 | Sound signal hybrid encoder, sound signal hybrid decoder, sound signal encoding method, and sound signal decoding method |
CN201380001328.9A CN103548080B (zh) | 2012-05-11 | 2013-05-08 | 声音信号混合编码器、声音信号混合解码器、声音信号编码方法以及声音信号解码方法 |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2012-108999 | 2012-05-11 | ||
JP2012108999 | 2012-05-11 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2013168414A1 (ja) | 2013-11-14 |
Family
ID=49550477
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2013/002950 WO2013168414A1 (ja) | 2012-05-11 | 2013-05-08 | 音信号ハイブリッドエンコーダ、音信号ハイブリッドデコーダ、音信号符号化方法、及び音信号復号方法 |
Country Status (5)
Country | Link |
---|---|
US (1) | US9489962B2 (ja) |
EP (1) | EP2849180B1 (ja) |
JP (1) | JP6126006B2 (ja) |
CN (1) | CN103548080B (ja) |
WO (1) | WO2013168414A1 (ja) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107454416A (zh) * | 2017-09-12 | 2017-12-08 | 广州酷狗计算机科技有限公司 | 视频流发送方法和装置 |
RU2679571C1 (ru) * | 2015-03-09 | 2019-02-11 | Фраунхофер-Гезелльшафт Цур Фердерунг Дер Ангевандтен Форшунг Е.Ф. | Аудиокодер для кодирования многоканального сигнала и аудиодекодер для декодирования кодированного аудиосигнала |
JP2022174077A (ja) * | 2014-07-28 | 2022-11-22 | フラウンホッファー-ゲゼルシャフト ツァ フェルダールング デァ アンゲヴァンテン フォアシュンク エー.ファオ | スムーズな遷移を取得するために、ゼロ入力応答を用いるオーディオ・デコーダ、方法及びコンピュータ・プログラム |
JP7523563B2 (ja) | 2020-02-28 | 2024-07-26 | エヴィデント・カナダ・インコーポレイテッド | 超音波検査のための位相ベースのアプローチ |
Families Citing this family (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105493182B (zh) * | 2013-08-28 | 2020-01-21 | 杜比实验室特许公司 | 混合波形编码和参数编码语音增强 |
RU2665281C2 (ru) * | 2013-09-12 | 2018-08-28 | Долби Интернэшнл Аб | Временное согласование данных обработки на основе квадратурного зеркального фильтра |
KR101498113B1 (ko) * | 2013-10-23 | 2015-03-04 | 광주과학기술원 | 사운드 신호의 대역폭 확장 장치 및 방법 |
EP2980796A1 (en) * | 2014-07-28 | 2016-02-03 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Method and apparatus for processing an audio signal, audio decoder, and audio encoder |
US10504530B2 (en) | 2015-11-03 | 2019-12-10 | Dolby Laboratories Licensing Corporation | Switching between transforms |
CN108352165B (zh) * | 2015-11-09 | 2023-02-03 | 索尼公司 | 解码装置、解码方法以及计算机可读存储介质 |
CA3045847C (en) | 2016-11-08 | 2021-06-15 | Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. | Downmixer and method for downmixing at least two channels and multichannel encoder and multichannel decoder |
ES2853936T3 (es) * | 2017-01-10 | 2021-09-20 | Fraunhofer Ges Forschung | Decodificador de audio, codificador de audio, método para proporcionar una señal de audio decodificada, método para proporcionar una señal de audio codificada, flujo de audio, proveedor de flujos de audio y programa informático que utiliza un identificador de flujo |
KR20210135492A (ko) * | 2019-03-05 | 2021-11-15 | 소니그룹주식회사 | 신호 처리 장치 및 방법, 그리고 프로그램 |
CN113948085B (zh) * | 2021-12-22 | 2022-03-25 | 中国科学院自动化研究所 | 语音识别方法、系统、电子设备和存储介质 |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2010148516A1 (en) * | 2009-06-23 | 2010-12-29 | Voiceage Corporation | Forward time-domain aliasing cancellation with application in weighted or original signal domain |
WO2011048118A1 (en) * | 2009-10-20 | 2011-04-28 | Fraunhofer Gesellschaft zur Förderung der angewandten Forschung e.V. | Audio signal encoder, audio signal decoder, method for providing an encoded representation of an audio content, method for providing a decoded representation of an audio content and computer program for use in low delay applications |
WO2011158485A2 (ja) * | 2010-06-14 | 2011-12-22 | パナソニック株式会社 | オーディオハイブリッド符号化装置およびオーディオハイブリッド復号装置 |
Family Cites Families (25)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
GB8421498D0 (en) * | 1984-08-24 | 1984-09-26 | British Telecomm | Frequency domain speech coding |
BR9007063A (pt) * | 1989-01-27 | 1991-10-08 | Dolby Lab Licensing Corp | Codificador,descodificador e codificador/descodificador de transformada de taxa de bites baixa para audio de alta qualidade |
US6124811A (en) * | 1998-07-02 | 2000-09-26 | Intel Corporation | Real time algorithms and architectures for coding images compressed by DWT-based techniques |
US6226608B1 (en) * | 1999-01-28 | 2001-05-01 | Dolby Laboratories Licensing Corporation | Data framing for adaptive-block-length coding system |
US6426977B1 (en) * | 1999-06-04 | 2002-07-30 | Atlantic Aerospace Electronics Corporation | System and method for applying and removing Gaussian covering functions |
US6917913B2 (en) * | 2001-03-12 | 2005-07-12 | Motorola, Inc. | Digital filter for sub-band synthesis |
US7516064B2 (en) * | 2004-02-19 | 2009-04-07 | Dolby Laboratories Licensing Corporation | Adaptive hybrid transform for signal analysis and synthesis |
US8682652B2 (en) * | 2006-06-30 | 2014-03-25 | Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. | Audio encoder, audio decoder and audio processor having a dynamically variable warping characteristic |
FR2912249A1 (fr) * | 2007-02-02 | 2008-08-08 | France Telecom | Codage/decodage perfectionnes de signaux audionumeriques. |
CA2708861C (en) * | 2007-12-18 | 2016-06-21 | Lg Electronics Inc. | A method and an apparatus for processing an audio signal |
CA2871268C (en) * | 2008-07-11 | 2015-11-03 | Nikolaus Rettelbach | Audio encoder, audio decoder, methods for encoding and decoding an audio signal, audio stream and computer program |
EP2144231A1 (en) * | 2008-07-11 | 2010-01-13 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Low bitrate audio encoding/decoding scheme with common preprocessing |
PL2301020T3 (pl) * | 2008-07-11 | 2013-06-28 | Fraunhofer Ges Forschung | Urządzenie i sposób do kodowania/dekodowania sygnału audio z użyciem algorytmu przełączania aliasingu |
MY181231A (en) * | 2008-07-11 | 2020-12-21 | Fraunhofer Ges Zur Forderung Der Angenwandten Forschung E V | Audio encoder and decoder for encoding and decoding audio samples |
CN102177426B (zh) * | 2008-10-08 | 2014-11-05 | 弗兰霍菲尔运输应用研究公司 | 多分辨率切换音频编码/解码方案 |
KR101377703B1 (ko) * | 2008-12-22 | 2014-03-25 | 한국전자통신연구원 | 광대역 인터넷 음성 단말 장치 |
KR101622950B1 (ko) * | 2009-01-28 | 2016-05-23 | 삼성전자주식회사 | 오디오 신호의 부호화 및 복호화 방법 및 그 장치 |
JP4892021B2 (ja) * | 2009-02-26 | 2012-03-07 | 株式会社東芝 | 信号帯域拡張装置 |
EP3474279A1 (en) * | 2009-07-27 | 2019-04-24 | Unified Sound Systems, Inc. | Methods and apparatus for processing an audio signal |
CN102498515B (zh) * | 2009-09-17 | 2014-06-18 | 延世大学工业学术合作社 | 处理音频信号的方法和设备 |
WO2011048117A1 (en) * | 2009-10-20 | 2011-04-28 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Audio signal encoder, audio signal decoder, method for encoding or decoding an audio signal using an aliasing-cancellation |
US9613630B2 (en) * | 2009-11-12 | 2017-04-04 | Lg Electronics Inc. | Apparatus for processing a signal and method thereof for determining an LPC coding degree based on reduction of a value of LPC residual |
EP2524374B1 (en) * | 2010-01-13 | 2018-10-31 | Voiceage Corporation | Audio decoding with forward time-domain aliasing cancellation using linear-predictive filtering |
SI3239979T1 (sl) * | 2010-10-25 | 2024-09-30 | Voiceage Evs Llc | Kodiranje generičnih zvočnih signalov pri nizkih bitnih hitrostih in majhni zakasnitvi |
FR2969805A1 (fr) * | 2010-12-23 | 2012-06-29 | France Telecom | Codage bas retard alternant codage predictif et codage par transformee |
2013
- 2013-05-08 JP JP2013537355A (JP6126006B2, ja): Active
- 2013-05-08 CN CN201380001328.9A (CN103548080B, zh): Active
- 2013-05-08 US US14/117,738 (US9489962B2, en): Active
- 2013-05-08 WO PCT/JP2013/002950 (WO2013168414A1, ja): Application Filing
- 2013-05-08 EP EP13786609.1A (EP2849180B1, en): Active
Non-Patent Citations (5)
Title |
---|
CAROT, ALEXANDER ET AL.: "Networked Music Performance: State of the Art", AES 30TH INTERNATIONAL CONFERENCE, 15 March 2007 (2007-03-15) |
SCHNELL, MARKUS ET AL.: "MPEG-4 Enhanced Low Delay AAC - a new standard for high quality communication", AES 125TH CONVENTION, 2 December 2008 (2008-12-02) |
SCHULLER, GERALD ET AL.: "New Framework for Modulated Perfect Reconstruction Filter Banks", IEEE TRANSACTION ON SIGNAL PROCESSING, vol. 44, August 1996 (1996-08-01), pages 1941 - 1954 |
See also references of EP2849180A4 |
VALIN, JEAN-MARC ET AL., A FULL-BANDWIDTH AUDIO CODEC WITH LOW COMPLEXITY AND VERY LOW DELAY |
Cited By (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2022174077A (ja) * | 2014-07-28 | 2022-11-22 | フラウンホッファー-ゲゼルシャフト ツァ フェルダールング デァ アンゲヴァンテン フォアシュンク エー.ファオ | スムーズな遷移を取得するために、ゼロ入力応答を用いるオーディオ・デコーダ、方法及びコンピュータ・プログラム |
US11922961B2 (en) | 2014-07-28 | 2024-03-05 | Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. | Audio decoder, method and computer program using a zero-input-response to obtain a smooth transition |
US10388287B2 (en) | 2015-03-09 | 2019-08-20 | Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. | Audio encoder for encoding a multichannel signal and audio decoder for decoding an encoded audio signal |
US10395661B2 (en) | 2015-03-09 | 2019-08-27 | Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. | Audio encoder for encoding a multichannel signal and audio decoder for decoding an encoded audio signal |
US10777208B2 (en) | 2015-03-09 | 2020-09-15 | Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. | Audio encoder for encoding a multichannel signal and audio decoder for decoding an encoded audio signal |
US11107483B2 (en) | 2015-03-09 | 2021-08-31 | Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. | Audio encoder for encoding a multichannel signal and audio decoder for decoding an encoded audio signal |
US11238874B2 (en) | 2015-03-09 | 2022-02-01 | Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. | Audio encoder for encoding a multichannel signal and audio decoder for decoding an encoded audio signal |
RU2680195C1 (ru) * | 2015-03-09 | 2019-02-18 | Фраунхофер-Гезелльшафт Цур Фердерунг Дер Ангевандтен Форшунг Е.Ф. | Аудиокодер для кодирования многоканального сигнала и аудиодекодер для декодирования кодированного аудиосигнала |
US11741973B2 (en) | 2015-03-09 | 2023-08-29 | Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. | Audio encoder for encoding a multichannel signal and audio decoder for decoding an encoded audio signal |
US11881225B2 (en) | 2015-03-09 | 2024-01-23 | Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. | Audio encoder for encoding a multichannel signal and audio decoder for decoding an encoded audio signal |
RU2679571C1 (ru) * | 2015-03-09 | 2019-02-11 | Фраунхофер-Гезелльшафт Цур Фердерунг Дер Ангевандтен Форшунг Е.Ф. | Аудиокодер для кодирования многоканального сигнала и аудиодекодер для декодирования кодированного аудиосигнала |
CN107454416A (zh) * | 2017-09-12 | 2017-12-08 | 广州酷狗计算机科技有限公司 | 视频流发送方法和装置 |
CN107454416B (zh) * | 2017-09-12 | 2020-06-30 | 广州酷狗计算机科技有限公司 | 视频流发送方法和装置 |
JP7523563B2 (ja) | 2020-02-28 | 2024-07-26 | エヴィデント・カナダ・インコーポレイテッド | 超音波検査のための位相ベースのアプローチ |
Also Published As
Publication number | Publication date |
---|---|
JPWO2013168414A1 (ja) | 2016-01-07 |
CN103548080A (zh) | 2014-01-29 |
EP2849180A4 (en) | 2015-04-22 |
US20140074489A1 (en) | 2014-03-13 |
EP2849180B1 (en) | 2020-01-01 |
JP6126006B2 (ja) | 2017-05-10 |
EP2849180A1 (en) | 2015-03-18 |
CN103548080B (zh) | 2017-03-08 |
US9489962B2 (en) | 2016-11-08 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP6126006B2 (ja) | 音信号ハイブリッドエンコーダ、音信号ハイブリッドデコーダ、音信号符号化方法、及び音信号復号方法 | |
JP6941643B2 (ja) | 全帯域ギャップ充填を備えた周波数ドメインプロセッサと時間ドメインプロセッサとを使用するオーディオ符号器及び復号器 | |
US8321210B2 (en) | Audio encoding/decoding scheme having a switchable bypass | |
JP6310074B2 (ja) | インテリジェントギャップ充填フレームワーク内の2チャネル処理を用いるオーディオ符号器、オーディオ復号器およびその方法 | |
EP2950308B1 (en) | Bandwidth expansion parameter-generator, encoder, decoder, bandwidth expansion parameter-generating method, encoding method, and decoding method | |
RU2485606C2 (ru) | Схема кодирования/декодирования аудио сигналов с низким битрейтом с применением каскадных переключений | |
TWI581251B (zh) | 使用頻域處理器、時域處理器及供不斷初始化的跨處理器之音頻編碼器及解碼器 | |
JP2013508761A (ja) | マルチモードオーディオコーデックおよびそれに適応されるcelp符号化 | |
JP2016524721A (ja) | オブジェクト特有時間/周波数分解能を使用する混合信号からのオーディオオブジェクト分離 | |
Herre et al. | 18. Perceptual Audio Coding of Speech Signals |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| | ENP | Entry into the national phase | Ref document number: 2013537355; Country of ref document: JP; Kind code of ref document: A |
| | WWE | Wipo information: entry into national phase | Ref document number: 14117738, Country of ref document: US; Ref document number: 2013786609, Country of ref document: EP |
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 13786609; Country of ref document: EP; Kind code of ref document: A1 |
| | NENP | Non-entry into the national phase | Ref country code: DE |