WO2013118476A1 - Acoustic/speech encoding device, acoustic/speech decoding device, acoustic/speech encoding method, and acoustic/speech decoding method
Acoustic/speech encoding device, acoustic/speech decoding device, acoustic/speech encoding method, and acoustic/speech decoding method
- Publication number: WO2013118476A1 (PCT/JP2013/000550)
- Authority: WIPO (PCT)
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/02—Speech or audio signals analysis-synthesis techniques for redundancy reduction using spectral analysis, e.g. transform vocoders or subband vocoders
- G10L19/032—Quantisation or dequantisation of spectral components
- G10L19/038—Vector quantisation, e.g. TwinVQ audio
- G10L19/04—Speech or audio signals analysis-synthesis techniques for redundancy reduction using predictive techniques
- G10L19/16—Vocoder architecture
- G10L19/08—Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters
- G10L19/12—Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters, the excitation function being a code excitation, e.g. in code excited linear prediction [CELP] vocoders
Definitions
- the present invention relates to an acoustic/speech encoding device, an acoustic/speech decoding device, an acoustic/speech encoding method, and an acoustic/speech decoding method using vector quantization.
- Transform coding transforms a signal from the time domain to the spectral domain using the discrete Fourier transform (DFT: Discrete Fourier Transform) or the modified discrete cosine transform (MDCT: Modified Discrete Cosine Transform).
- the spectral coefficient is quantized and encoded.
- a psychoacoustic model is applied to determine the perceptual importance of the spectral coefficient, and then the spectral coefficient is quantized or encoded according to the perceptual importance.
- Widely used transform codecs include MPEG MP3, MPEG AAC (see Non-Patent Document 1), and Dolby AC3. Transform coding is useful for music and general acoustic signals. A simple configuration of a transform codec is shown in FIG. 1.
- the encoder shown in FIG. 1 converts the time domain signal S (n) to the frequency domain signal S (f) using a time-to-frequency domain transform method such as discrete Fourier transform (DFT) or modified discrete cosine transform (MDCT) (101).
- the psychoacoustic model analysis is performed on the frequency domain signal S (f) to derive a masking curve (103).
- the frequency domain signal S (f) is quantized according to the masking curve derived by the psychoacoustic model analysis so that the quantization noise becomes inaudible (102).
- Quantization parameters are multiplexed (104) and transmitted to the decoder side.
- on the decoder side, all bitstream information is demultiplexed (105).
- the quantized parameter is inversely quantized to restore the decoded frequency domain signal S ⁇ (f) (106).
- the decoded frequency domain signal S ⁓ (f) is transformed back into the time domain using a frequency-to-time domain transformation method such as inverse discrete Fourier transform (IDFT: Inverse Discrete Fourier Transform) or inverse modified discrete cosine transform (IMDCT: Inverse Modified Discrete Cosine Transform) to restore the decoded time domain signal S ⁓ (n).
- linear predictive coding uses the property that a speech signal can be predicted in the time domain, and obtains a residual excitation signal by applying linear prediction to the input speech signal.
- this modeling produces a very efficient speech representation.
- the residual excitation signal is mainly encoded with two different methods, TCX and CELP.
- in TCX (see Non-Patent Document 2), the residual excitation signal is efficiently transformed and encoded in the frequency domain.
- Widely used TCX codecs include 3GPP AMR-WB+ and MPEG USAC. A simple configuration of the TCX codec is shown in FIG. 2.
- the encoder shown in FIG. 2 performs LPC analysis on the input signal and uses the predictability of the signal in the time domain (201).
- the LPC parameter obtained by the LPC analysis is quantized (202), the quantization index is multiplexed (207), and transmitted to the decoder side.
- a residual (excitation) signal S r (n) is obtained by applying LPC inverse filtering (204) to the input signal S (n) using the LPC parameters inversely quantized by the inverse quantization unit (203).
- the residual signal S r (n) is converted to the frequency domain signal S r (f) using a time-to-frequency domain conversion method such as discrete Fourier transform (DFT) or modified discrete cosine transform (MDCT).
- S r (f) is quantized (206), the quantization parameter is multiplexed (207), and transmitted to the decoder side.
- the decoder shown in FIG. 2 first demultiplexes all bitstream information (208).
- the quantization parameter is inversely quantized to restore the decoded frequency domain residual signal S r ⁇ (f) (210).
- the decoded frequency domain residual signal S r ⁓ (f) is transformed back into the time domain using a frequency-to-time domain transformation method such as inverse discrete Fourier transform (IDFT) or inverse modified discrete cosine transform (IMDCT).
- the decoded time domain residual signal S r ⁓ (n) is processed by the LPC synthesis filter (212) to obtain the decoded time domain signal S ⁓ (n).
- in CELP encoding, the residual excitation signal is quantized using a predetermined codebook. To further improve the sound quality, it is common to transform the difference signal between the original signal and the LPC synthesized signal into the frequency domain and encode it further.
- widely used CELP codecs include ITU-T G.729.1 (see Non-Patent Document 3) and ITU-T G.718 (see Non-Patent Document 4).
- a simple configuration of hierarchical coding (layered coding and embedded coding) of CELP coding and transform coding is shown in FIG.
- the encoder shown in FIG. 3 performs CELP coding on the input signal to use the signal predictability in the time domain (301).
- the CELP parameters are used to restore the synthesized signal in the CELP local decoder (302).
- a prediction error signal S e (n) (the difference signal between the input signal and the synthesized signal) is obtained by subtracting the synthesized signal from the input signal.
- the prediction error signal S e (n) is converted to the frequency domain signal S e (f) using a time-to-frequency domain conversion method such as discrete Fourier transform (DFT) or modified discrete cosine transform (MDCT) (303).
- S e (f) is quantized (304), the quantization parameter is multiplexed (305), and transmitted to the decoder side.
- the decoder shown in FIG. 3 first demultiplexes all bitstream information (306).
- the quantization parameter is inversely quantized to restore the decoded frequency domain residual signal S e ⁇ (f) (308).
- the decoded frequency domain residual signal S e ⁓ (f) is transformed back into the time domain to restore the decoded time domain residual signal S e ⁓ (n) (309).
- the CELP decoder (307) restores the CELP synthesized signal S syn (n), and the decoded time domain signal S ⁓ (n) is restored by adding the CELP synthesized signal S syn (n) and the decoded prediction error signal S e ⁓ (n).
- the transform coding part of both transform coding and linear predictive coding is usually performed using some vector quantization method.
- one of the vector quantization methods is called split multi-rate lattice VQ, or algebraic VQ (AVQ) (see Non-Patent Document 5).
- in AMR-WB+ (see Non-Patent Document 6), the LPC residual is quantized in the TCX domain using split multi-rate lattice VQ (shown in FIG. 4).
- in the newly standardized audio codec ITU-T G.718, the LPC residual is also quantized as residual coding layer 3 in the MDCT domain using split multi-rate lattice VQ.
- split multi-rate lattice VQ is a vector quantization method based on a lattice quantizer. Specifically, the split multi-rate lattice VQ used in AMR-WB+ (see Non-Patent Document 6) uses a vector codebook consisting of subsets of the Gosset lattice, called the RE8 lattice, and quantizes the spectrum in blocks of eight spectral coefficients (see Non-Patent Document 5).
- any lattice point c can be generated as c = s · G, where G is the generator matrix of the lattice, s is a row vector of integer values, and c is the generated lattice point.
- a multi-rate codebook can be formed by taking subsets of lattice points that lie inside spheres of different radii.
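The lattice construction above can be sketched in a few lines. This is only an illustration: the 2-D generator matrix G below is a hypothetical example, not the 8-dimensional RE8 (Gosset) generator used in AMR-WB+.

```python
# Toy illustration of lattice codebook construction: c = s * G.
# The 2-D generator matrix G is hypothetical; AMR-WB+ uses the
# 8-dimensional RE8 (Gosset) lattice instead.

def lattice_point(s, G):
    """Generate the lattice point c = s * G (row vector times matrix)."""
    n = len(G)
    return [sum(s[i] * G[i][j] for i in range(n)) for j in range(n)]

def codebook_within_radius(G, radius, span=3):
    """Subset of lattice points inside a sphere of the given radius.
    `span` limits the integer search range of the components of s."""
    points = []
    for a in range(-span, span + 1):
        for b in range(-span, span + 1):
            c = lattice_point([a, b], G)
            if sum(x * x for x in c) <= radius ** 2:
                points.append(tuple(c))
    return points

G = [[2, 0], [1, 1]]           # hypothetical 2-D generator matrix
small = codebook_within_radius(G, 2.0)
large = codebook_within_radius(G, 4.0)
print(len(small), len(large))  # a larger radius yields a larger codebook
assert set(small) <= set(large)
```

Growing the radius nests the codebooks, which is how a multi-rate family (Q2 ⊂ Q3 ⊂ Q4, ...) can be obtained from a single lattice.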
- FIG. 4 shows a simple configuration using split multi-rate lattice vector quantization in the TCX codec.
- LPC analysis is performed on the input signal to use the predictability of the signal in the time domain (401).
- the LPC parameter obtained by the LPC analysis is quantized (402), the quantization index is multiplexed (407), and transmitted to the decoder side.
- a residual (excitation) signal S r (n) is obtained by applying LPC inverse filtering (404) to the input signal S (n) using the LPC parameters inversely quantized by the inverse quantization unit (403).
- the residual signal S r (n) is converted to the frequency domain signal S r (f) using a time-to-frequency domain conversion method such as discrete Fourier transform (DFT) or modified discrete cosine transform (MDCT).
- the divided multi-rate lattice vector quantization method is applied to S r (f) (406), the quantization parameters are multiplexed (407), and transmitted to the decoder side.
- the quantization parameter is inversely quantized by the division multirate lattice vector inverse quantization method to obtain a decoded frequency domain residual signal S r ⁇ (f) (410).
- the decoded frequency domain residual signal S r ⁓ (f) is transformed back into the time domain using a frequency-to-time domain transformation method such as inverse discrete Fourier transform (IDFT) or inverse modified discrete cosine transform (IMDCT).
- the decoded time domain residual signal S r ⁓ (n) is processed by the LPC synthesis filter (412) to obtain the decoded time domain signal S ⁓ (n).
- FIG. 5 shows a process of the divided multi-rate lattice VQ.
- the input spectrum S (f) is divided into several 8-dimensional blocks (or vectors) (501), and each block (or vector) is quantized by a multi-rate lattice vector quantization method (502).
- the total gain is calculated according to the number of available bits and the energy level of the entire spectrum.
- the ratio of the original spectrum to the total gain is quantized with several different codebooks.
- the quantization parameters of the divided multi-rate lattice VQ are the total gain quantization index, the codebook indication of each block (or vector), and the code vector index of each block (or vector).
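The splitting and gain-normalization steps above can be sketched as follows. Rounding to the nearest integer vector is a simplification standing in for the real RE8 nearest-neighbour search, and the gain value is a placeholder, not a value derived from a bit budget.

```python
# Sketch of the split VQ front end: divide the spectrum into 8-dimensional
# blocks, normalize by a global gain G, then quantize each block.  Rounding
# to the nearest integer vector stands in for the real RE8 nearest-neighbour
# search, which is more involved.

BLOCK = 8

def split_and_quantize(spectrum, gain):
    """Quantize S(f)/G block by block; returns one integer vector per block."""
    assert len(spectrum) % BLOCK == 0
    blocks = [spectrum[i:i + BLOCK] for i in range(0, len(spectrum), BLOCK)]
    return [[round(x / gain) for x in block] for block in blocks]

spectrum = [0.4, -1.9, 2.1, 0.0, 3.2, -0.1, 1.1, 0.9] * 2  # 16 coefficients
quantized = split_and_quantize(spectrum, gain=1.0)         # gain is a placeholder
print(quantized)  # two 8-dimensional quantized blocks
```

Each quantized block would then be mapped to a codebook indication (which Qn contains it) and a code vector index within that codebook.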
- FIG. 6 summarizes the codebook list of the divided multi-rate lattice VQ employed in AMR-WB + (see Non-Patent Document 6).
- codebooks Q0, Q2, Q3, and Q4 are base codebooks. If a given grid point is not included in those base codebooks, Voronoi extension (see Non-Patent Document 7) is applied using only the Q3 or Q4 portion of the base codebook.
- Q5 is the Voronoi extension of Q3
- Q6 is the Voronoi extension of Q4.
- Each code book is composed of several code vectors.
- the code vector index of the code book is represented by several bits.
- the number of bits is derived by the following equation (1):
- N bits = log2(N cv)
- N bits is the number of bits consumed by the code vector index, and N cv is the number of code vectors in the codebook.
- the codebook Q0 contains only the zero vector, which means that the quantized value of the vector is zero. Therefore, no bits are required for its code vector index.
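Equation (1) and the Q0 special case can be checked directly; the codebook sizes passed in below are illustrative powers of two, not the actual sizes of the AMR-WB+ codebooks.

```python
import math

# Bits needed for a code vector index, per equation (1): N_bits = log2(N_cv).
# Q0 contains only the zero vector, so its index costs no bits.

def index_bits(n_codevectors):
    if n_codevectors <= 1:   # Q0: a single (zero) vector, 0 bits
        return 0
    return math.ceil(math.log2(n_codevectors))

print(index_bits(1), index_bits(256), index_bits(65536))  # 0 8 16
```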
- a bitstream is usually formed in two ways. The first method is shown in FIG. 7, and the second method is shown in FIG.
- the input signal S (f) is first divided into several vectors.
- the total gain is then derived according to the number of available bits and the energy level of the spectrum.
- the total gain is quantized with a scalar quantizer, and S (f) / G is quantized with a multirate lattice vector quantizer.
- the total gain index forms the first part, all codebook indications are grouped together to form the second part, and all code vector indices are grouped together to form the last part.
- the input signal S (f) is first divided into several vectors.
- the total gain is then derived according to the number of available bits and the energy level of the spectrum.
- the total gain is quantized with a scalar quantizer, and S (f) / G is quantized with a multirate lattice vector quantizer.
- the total gain index forms the first part
- the codebook indication followed by the code vector index of each vector forms the second part.
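The two packing orders can be contrasted with a small sketch; the field names (`gain`, `cb*`, `iv*`) are illustrative labels, not the AMR-WB+ bitstream syntax.

```python
# Sketch of the two bitstream layouts described above.  Method 1 groups all
# codebook indications together, then all code vector indices; method 2
# interleaves the indication and index of each vector.

def bitstream_method1(gain_index, indications, cv_indices):
    return [gain_index] + list(indications) + list(cv_indices)

def bitstream_method2(gain_index, indications, cv_indices):
    stream = [gain_index]
    for ind, idx in zip(indications, cv_indices):
        stream += [ind, idx]          # indication immediately followed by index
    return stream

gain = "G"
inds = ["cb1", "cb2", "cb3"]
idxs = ["iv1", "iv2", "iv3"]
print(bitstream_method1(gain, inds, idxs))
print(bitstream_method2(gain, inds, idxs))
```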
- the energy of subvector v3 is the largest among the eight subvectors, and according to the split multi-rate lattice vector quantization process, the codebook of v3 has the largest codebook number (the integer n of Qn is referred to as the codebook number here).
- subvector v3 consumes the most bits for codebook indication.
- a codebook with a large codebook number may consume many bits for its codebook indication (for example, several times as many as the codebook indication of a codebook with a small codebook number), so it is desirable to reduce the number of bits consumed by the codebook indication of a codebook with a large codebook number.
- the codebook instruction of v3 consumes 11 bits
- the codebook instruction of v4 consumes 3 bits
- the codebook instruction of other vectors consumes 2 bits.
- a v3 codebook instruction consumes more than five times as many bits as a v1 (v2, v5, v6, v7, or v8) codebook instruction.
- the codebook indication and the code vector index are directly converted to binary numbers to form a bitstream. Therefore, the total number of bits consumed for all vectors can be calculated by the following equation (2):
- Bits total = Bits gain_q + Σ i=1..N (Bits cb_indication(i) + Bits cv_index(i))
- Bits total is the total number of bits consumed
- Bits gain_q is the number of bits consumed to quantize the total gain
- Bits cb_indication is the number of bits consumed by the codebook indication of each vector
- Bits cv_index is the number of bits consumed by the code vector index of each vector
- N is the total number of vectors in the entire spectrum.
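Equation (2) is a straightforward sum, sketched below. The indication bit counts reuse the example of FIG. 11 (v3: 11 bits, v4: 3 bits, the rest: 2 bits); the code vector index and gain bit counts are hypothetical placeholders.

```python
# Total bit consumption of the conventional packing, per equation (2):
# Bits_total = Bits_gain_q + sum of per-vector indication and index bits.

def total_bits(gain_bits, indication_bits, index_bits):
    assert len(indication_bits) == len(index_bits)
    return gain_bits + sum(indication_bits) + sum(index_bits)

indication = [2, 2, 11, 3, 2, 2, 2, 2]   # per-subvector indication bits (FIG. 11)
index = [8] * 8                          # hypothetical code vector index bits
print(total_bits(6, indication, index))  # the 6 gain bits are also hypothetical
```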
- an idea for efficiently coding the quantization parameter of the divided multi-rate lattice vector quantization is introduced.
- the difference value between the actual value and the estimated value is calculated.
- the position of the subvector that uses the codebook consuming the most bits, and the difference value between the actual value and the estimated value, are transmitted.
- Detailed steps in the encoder are as follows. 1) Calculate codebook instructions for all subvectors. 2) Identify the position of the subvector where the codebook indication consumes the most bits and encode the position. Then, the codebook indications of all subvectors other than the subvector consuming the most bits are encoded. 3) Estimate the codebook whose instructions consume the most bits. 4) Encode the difference between the actual value and the estimated value.
- Detailed steps in the decoder are as follows. 1) Decode the position of the subvector where the codebook indication consumes the most bits. 2) Decode the codebook instructions for all other subvectors. 3) Estimate the codebook whose instructions consume the most bits. 4) Decode the difference between the actual value and the estimated value. 5) The decoded value is calculated by adding the estimated value and the difference.
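The encoder/decoder steps above can be sketched as a round trip. To keep the sketch short, the estimate is passed in as an argument rather than derived from the bit budget as in equation (3), and the estimate value 12 is hypothetical.

```python
# Sketch of the position + difference coding of the largest codebook
# indication.  The estimate is an upper bound, so the difference is never
# positive.

def encode(codebook_numbers, estimate):
    """Return (position of max codebook, difference, other indications)."""
    pos = max(range(len(codebook_numbers)), key=lambda i: codebook_numbers[i])
    others = [cb for i, cb in enumerate(codebook_numbers) if i != pos]
    diff = codebook_numbers[pos] - estimate   # never positive: the estimate
    return pos, diff, others                  # is the maximum possible value

def decode(pos, diff, others, estimate):
    cb_max = estimate + diff                  # decoded value = estimate + difference
    result = list(others)
    result.insert(pos, cb_max)
    return result

cbs = [2, 2, 11, 3, 2, 2, 2, 2]              # codebook numbers of FIG. 9's subvectors
pos, diff, others = encode(cbs, estimate=12)  # estimate 12 is hypothetical
assert decode(pos, diff, others, estimate=12) == cbs
print(pos, diff)  # 2 -1
```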
- the spectrum of FIG. 9 is used as an example for the detailed description.
- 1) Referring to the codebook indication table of FIG. 6, the codebook indications for all subvectors are calculated. The detailed results are shown in FIG. 2) The position of the subvector whose codebook indication consumes the most bits is identified and encoded; then the codebook indications of all subvectors other than that subvector are encoded. As shown in FIG. 11, the codebook indication of subvector v3 consumes the most bits. As an example, the position is encoded using the codebook shown in FIG. 12; referring to FIG. 12, the position of v3 is encoded as "010". 3) The codebook whose indication consumes the most bits is estimated according to equation (3), where cb′ max is the estimated value of the codebook that consumes the most bits, Bits available is the total number of available bits, and Bits cbvi is the bit consumption of the codebook indication of subvector vi. 4) The difference value is calculated according to the following equation (4), cb diff = cb max − cb′ max, and encoded with reference to the table of FIG. 13. As shown in FIG. 13, the difference can take negative values. The reason is that the estimate is calculated assuming that all available bits are used in quantization; quantization cannot consume more bits than are available, so the estimate is the maximum possible value and the actual value never exceeds it.
- cb ′ max is the estimated value of the codebook consuming the most bits
- cb max is the actual value of the codebook consuming the most bits
- cb diff is the difference between the actual value and the estimated value.
- on the decoder side, the actual value is restored by adding the estimated value and the difference: cb max = cb′ max + cb diff, where cb′ max is the estimated value of the codebook consuming the most bits, cb max is its actual value, and cb diff is the difference between them.
- the number of bits saved by the method proposed in the present invention is calculated by the following equation (6):
- Bits save = Bits cbmax − (Bits position_cbmax + Bits cbdiff)
- Bits save is the number of bits saved by the method proposed in the present invention
- Bits cbmax is the bit consumption of the codebook indication that consumes the most bits
- Bits position_cbmax is the number of bits consumed to encode the position of the subvector whose codebook consumes the most bits
- Bits cbdiff is the number of bits consumed to encode the difference value.
- the bit consumption of the code book that consumes the most bits is proportional to the code book number.
- the maximum codebook number is 11, and the number of bits consumed for the codebook instruction is 11 bits.
- the number of bits consumed for the difference value is less than the number of bits consumed by the codebook that consumes the most bits. This is because the difference value is smaller than the code book instruction. As shown in the above example, the number of bits consumed for encoding the difference value is 1 bit.
- the number of bits saved in this example is calculated by the following equation (7): Bits save = 11 − (3 + 1) = 7 bits.
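The savings computation of equations (6) and (7) can be checked with the numbers from the example above: an 11-bit indication for v3, a 3-bit position code ("010"), and a 1-bit difference value.

```python
# Worked check of equations (6) and (7): bits saved by sending the position
# and the difference instead of the largest codebook indication itself.

def bits_saved(bits_cbmax, bits_position, bits_diff):
    return bits_cbmax - (bits_position + bits_diff)

print(bits_saved(11, 3, 1))  # 7
```

When `bits_cbmax` is small, the result can be negative, which is exactly the case the later embodiment guards against with a threshold.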
- Diagram showing a simple configuration of a transform codec
- Diagram showing a simple configuration of a TCX codec
- Diagram relating to split multi-rate lattice vector quantization; diagram showing the table
- Diagram showing the codebook indications for all subvectors
- Diagram showing the code table for the position of the subvector whose codebook indication consumes the most bits
- Diagram showing the code table of difference values
- Diagram explaining the structure of the codec according to Embodiment 1 of the present invention
- Diagram showing an example in which only part of the spectrum is encoded
- Diagram showing the encoding parameters other than v2 in the case of the example of FIG.
- Flowchart of the encoding process proposed in Embodiment 7
- Flowchart of the decoding process proposed in Embodiment 7
- Flowchart of the conventional split multi-rate lattice VQ
- Flowchart of the split multi-rate VQ proposed in Embodiment 8
- FIG. 14 shows the codec of the present invention.
- the codec comprises an encoder and a decoder that apply split multirate lattice vector quantization.
- a time domain signal S (n) is converted to a frequency domain signal S (f) using a time-to-frequency domain conversion method such as discrete Fourier transform (DFT) or modified discrete cosine transform (MDCT) (1401).
- a psychoacoustic model analysis is performed on the frequency domain signal S (f) to obtain a masking curve (1402).
- the divided multi-rate lattice vector quantization is applied to the frequency domain signal S (f) so that the quantization noise becomes inaudible (1403).
- three quantization parameter sets including a total gain quantization index, a codebook instruction, and a code vector index are generated.
- the code book instruction is converted in the following manner (1404). 1) Calculate codebook indications for all subvectors. 2) Identify the position of the subvector where the codebook indication consumes the most bits and encode the position. Then, the codebook indications of all subvectors other than the subvector consuming the most bits are encoded. 3) Estimate the codebook whose instructions consume the most bits. 4) Encode the difference between the actual value and the estimated value.
- the total gain index, the code vector index, the position of the maximum code book, the difference value between the actual value and the estimated value, and the code book instructions of other subvectors are multiplexed (1405) and transmitted to the decoder side.
- all bit stream information is demultiplexed by the demultiplexer (1406).
- the position of the maximum codebook and the difference value between the actual value and the estimated value are converted into the maximum codebook instruction by the codebook instruction conversion unit (1407).
- Detailed steps in the codebook instruction conversion unit (1407) are as follows. 1) Decode the position of the subvector where the codebook indication consumes the most bits. 2) Decode the codebook instructions for all other subvectors. 3) Estimate the codebook whose instructions consume the most bits. 4) Decode the difference between the actual value and the estimated value. 5) Calculate the decoded value by adding the estimated value and the difference.
- the total gain index, the code vector index, and the original codebook instruction are inversely quantized by the divided multirate lattice vector inverse quantization method to restore the decoded frequency domain signal S ⁇ (f) (1408).
- the decoded time domain signal S ⁓ (n) is restored by transforming the decoded frequency domain signal S ⁓ (f) back into the time domain using a frequency-to-time domain transformation method such as inverse discrete Fourier transform (IDFT) or inverse modified discrete cosine transform (IMDCT) (1409).
- the encoder shown in FIG. 15 performs LPC analysis on the input signal and uses the predictability of the signal in the time domain (1501).
- the LPC parameter obtained by the LPC analysis is quantized (1502), the quantization index is multiplexed (1508), and transmitted to the decoder side.
- a residual (excitation) signal S r (n) is obtained by applying LPC inverse filtering (1504) to the input signal S (n) using the LPC parameters inversely quantized by the inverse quantization unit (1503).
- the residual signal S r (n) is converted to the frequency domain signal S r (f) using a time-to-frequency domain conversion method such as discrete Fourier transform (DFT) or modified discrete cosine transform (MDCT).
- in the split multi-rate lattice vector quantization, three sets of quantization parameters are generated: a total gain quantization index, a codebook indication, and a code vector index.
- the code book instruction is converted in the following manner (1507). 1) Calculate codebook instructions for all subvectors. 2) Identify the position of the subvector where the codebook indication consumes the most bits and encode the position. Then, the codebook indications of all subvectors other than the subvector consuming the most bits are encoded. 3) Estimate the codebook whose instructions consume the most bits. 4) Encode the difference between the actual value and the estimated value.
- the total gain index, the code vector index, the position of the maximum code book, the difference value between the actual value and the estimated value, and the code book instructions of other subvectors are multiplexed (1508) and transmitted to the decoder side.
- the position of the maximum codebook and the difference value between the actual value and the estimated value are converted into the maximum codebook instruction by the codebook instruction conversion unit (1510).
- Detailed steps in the codebook instruction conversion unit are as follows. 1) Decode the position of the subvector where the codebook indication consumes the most bits. 2) Decode the codebook instructions for all other subvectors. 3) Estimate the codebook whose instructions consume the most bits. 4) Decode the difference between the actual value and the estimated value. 5) Calculate the decoded value by adding the estimated value and the difference.
- the total gain index, the code vector index, and the original codebook instruction are inversely quantized by the divided multirate lattice vector inverse quantization method to restore the decoded frequency domain signal S r ⁇ (f) (1511).
- the decoded frequency domain residual signal S r ⁓ (f) is transformed back into the time domain to restore the decoded time domain residual signal S r ⁓ (n) (1512).
- the decoded time domain residual signal S r ⁓ (n) is processed by the LPC synthesis filter (1514) to obtain the decoded time domain signal S ⁓ (n).
- a feature of this embodiment is that the proposed codebook indication conversion method is applied in hierarchical coding (layered coding, embedded coding) combining CELP coding and transform coding.
- the encoder shown in FIG. 16 performs CELP coding on the input signal, using the predictability of the signal in the time domain (1601). Using the CELP parameters, the synthesized signal is restored by the CELP local decoder (1602); the CELP parameters are multiplexed (1606) and transmitted to the decoder side. By subtracting the synthesized signal from the input signal, a prediction error signal S e (n) (the difference signal between the input signal and the synthesized signal) is obtained.
- the prediction error signal S e (n) is converted to the frequency domain signal S e (f) using a time-to-frequency domain conversion method such as discrete Fourier transform (DFT) or modified discrete cosine transform (MDCT) (1603).
- a divided multi-rate lattice vector quantization is applied to the frequency domain signal S e (f) (1604).
- three quantization parameter sets including a total gain quantization index, a codebook instruction, and a code vector index are generated.
- the code book instruction is converted in the following manner (1605). 1) Calculate codebook instructions for all subvectors. 2) Identify the position of the subvector where the codebook indication consumes the most bits and encode the position. Then, the codebook indications of all subvectors other than the subvector consuming the most bits are encoded. 3) Estimate the codebook whose instructions consume the most bits. 4) Encode the difference between the actual value and the estimated value.
- the total gain index, the code vector index, the position of the maximum codebook, the difference value between the actual value and the estimated value, and the codebook indications of the other subvectors are multiplexed (1606) and transmitted to the decoder side.
- the position of the maximum codebook and the difference value between the actual value and the estimated value are converted into the maximum codebook instruction by the codebook instruction conversion unit (1608).
- Detailed steps in the codebook instruction conversion unit (1608) are as follows. 1) Decode the position of the subvector where the codebook indication consumes the most bits. 2) Decode the codebook instructions for all other subvectors. 3) Estimate the codebook whose instructions consume the most bits. 4) Decode the difference between the actual value and the estimated value. 5) Calculate the decoded value by adding the estimated value and the difference.
- the total gain index, the code vector index, and the original codebook instruction are inversely quantized by the divided multirate lattice vector inverse quantization method to restore the decoded frequency domain signal S e ⁇ (f) (1609).
- the CELP decoder restores the synthesized signal S syn (n) (1611), and the decoded time domain signal S ⁓ (n) is restored by adding the CELP synthesized signal S syn (n) and the decoded prediction error signal S e ⁓ (n).
- depending on the input, the bit consumption of the novel method of the present invention may become higher than that of the conventional method.
- from equation (6), when Bits cbmax < Bits position_cbmax + Bits cbdiff, the bit consumption of the new method of the present invention is larger than that of the conventional method. To prevent this problem, this embodiment proposes one idea.
- This idea is to reduce the number of bits consumed to indicate the position of the codebook that consumes the most bits.
- the codebook indication of a fixed subvector (e.g., the last subvector) is converted, so that no bits are needed to transmit its position.
- the difference between the actual codebook indication and the estimated value is encoded and transmitted to the decoder side.
- the estimated codebook indication is calculated on the assumption that all of the allocated bits except those consumed by the total gain are used for subvector coding, so the estimate is close to the actual value.
- the absolute value of the difference is smaller than the actual codebook indication, so the number of bits consumed in encoding the difference is guaranteed to be smaller than that consumed in encoding the actual value.
- the detailed encoding process is as follows. 1) Calculate the codebook indications for all subvectors. 2) Identify the position of the subvector whose codebook indication consumes the most bits. 3) Compare the codebook indication with a predetermined threshold (the threshold is a value calculated in advance from a large database to ensure that the bit consumption of the method of the invention is less than that of the conventional method).
- if the codebook indication is less than or equal to the threshold, the following is performed. a) Estimate the codebook indication of the last subvector. b) Encode the difference between the actual value and the estimated value, and encode the codebook indications of all subvectors other than the last subvector.
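The threshold branch above can be sketched as a single encoder decision. This is an illustrative simplification: `indication_bits`, the threshold value, and the bit-budget estimate are assumptions, not the patent's measured tables.

```python
def indication_bits(cb):
    """Hypothetical indication cost: Q0 costs 1 bit, Qn (n >= 2) costs n bits."""
    return 1 if cb == 0 else cb

def encode_with_threshold(codebooks, total_bits, threshold):
    costs = [indication_bits(cb) for cb in codebooks]
    pos = max(range(len(costs)), key=lambda i: costs[i])
    if costs[pos] > threshold:
        # large indication: worth transmitting its position explicitly
        target, send_position = pos, True
    else:
        # small indication: convert the fixed last subvector instead,
        # spending no bits on a position
        target, send_position = len(codebooks) - 1, False
    other_bits = sum(c for i, c in enumerate(costs) if i != target)
    estimated = total_bits - other_bits   # upper bound on the real cost
    diff = costs[target] - estimated      # non-positive, cheap to encode
    return send_position, target, diff
```

When the maximum indication is small, the position bits are skipped entirely, which is exactly the saving this embodiment targets.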
- the detailed decoding process is as follows. 1) Decode the codebook indications of all other subvectors. 2) Estimate the codebook indication of the subvector whose indication was converted. 3) Decode the difference between the actual value and the estimated value. 4) Calculate the decoded value by adding the estimated value and the difference. 5) Compare the decoded value with a predetermined threshold.
- by comparing the codebook indication that consumes the most bits with a predetermined threshold, the situation in which the method of the present invention consumes more bits than the original divided multi-rate VQ is avoided. This ensures that bit savings are always achieved.
- the fixed subvector is not limited to the last subvector, and may be determined according to the characteristics of the input spectrum. As an example, if the codebook of the first subvector is statistically larger than the other subvectors, the first subvector can be selected.
- the last subvector's indication is encoded as the maximum codebook indication and its position is fixed, so the bit consumption for signaling the position is avoided. It can therefore be assured that the number of bits saved by the method of the present invention is positive.
- according to Non-Patent Document 7, the indication of Q0 consumes the fewest bits (1 bit), but its usage probability is very low, only 3%.
- the usage probability of Q2 is the highest (29%), but its indication does not consume the fewest bits.
- in this embodiment, the codebook indications are designed using the Huffman table design method, based on the usage probability of each codebook.
- the basic policy is to allocate fewer bits to a codebook with a high usage probability and more bits to a codebook with a low usage probability.
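That policy is exactly what a Huffman code built over the codebook-usage probabilities produces. The sketch below computes Huffman code lengths for six hypothetical indication symbols; the probability list is an illustrative stand-in (only the 3% figure for Q0 and 29% for Q2 come from the text), not the statistics the patent measured.

```python
import heapq
from itertools import count

def huffman_code_lengths(probs):
    """Return the Huffman code length (in bits) for each symbol probability."""
    tick = count()  # tie-breaker so tuples never compare the symbol lists
    heap = [(p, next(tick), [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)   # two least probable subtrees
        p2, _, s2 = heapq.heappop(heap)
        for sym in s1 + s2:               # every symbol below gains one bit
            lengths[sym] += 1
        heapq.heappush(heap, (p1 + p2, next(tick), s1 + s2))
    return lengths

# Illustrative usage probabilities for [Q0, Q2, Q3, Q4, Q5, Q6]
lengths = huffman_code_lengths([0.03, 0.29, 0.25, 0.20, 0.13, 0.10])
```

With these probabilities the most probable codebook (Q2, 29%) receives a 2-bit code while rare Q0 (3%) receives 4 bits, matching the stated policy.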
- the method of the present invention is applied not to the codebook instruction having the largest codebook number but to the codebook instruction that consumes the most bits.
- Detailed steps in the encoder are as follows. 1) Encode the codebook indications of all subvectors. 2) Identify and encode the position of the subvector whose codebook indication consumes the most bits. 3) Estimate the codebook whose indication consumes the most bits. 4) Encode the difference between the actual value and the estimated value.
- Detailed steps in the decoder are as follows. 1) Decode the position of the subvector whose codebook indication consumes the most bits. 2) Decode the codebook indications of all other subvectors. 3) Estimate the codebook whose indication consumes the most bits. 4) Decode the difference between the actual value and the estimated value. 5) Calculate the decoded value by adding the estimated value and the difference.
- the bits saved by the codebook indication conversion method are used to divide the spectrum into small bands and assign a "gain correction factor" to each band, thereby giving finer resolution to the total gain.
- by transmitting the gain correction factor using the saved bits, the quantization performance, and hence the sound quality, can be improved.
- the codebook instruction conversion method of the present invention can be applied to encoding of stereo signals or multichannel signals.
- the method of the present invention is applied to side signal encoding, and the saved bits are used for main signal encoding. Since the main signal is perceptually more important than the side signal, this provides a subjective improvement in sound quality.
- the codebook instruction conversion method of the present invention can be applied to a codec that encodes spectral coefficients in units of a plurality of frames (or units of a plurality of subframes).
- the bits saved by the codebook indication conversion method can be stored and used to encode spectral coefficients or some other parameter in the next encoding stage.
- the sound quality can be maintained in a situation where there is a frame loss.
- the configurations proposed in the first, second, and third embodiments are based on the premise that all subvectors are quantized by AVQ. If all subvectors are quantized with AVQ, all possible values of cb_diff are negative. The reason is that the estimated codebook indication is calculated on the assumption that all available bits are used in quantization; quantization cannot consume more bits than are available, so the estimated codebook indication is the maximum possible value, and the actual codebook indication is never larger than the estimated one.
- however, when not all subvectors are quantized by AVQ, cb_diff may be positive.
- for example, when the energy is concentrated in the low frequency part of the spectrum, all the bits are allocated to the low frequency subvectors and no bits are allocated to the high frequency subvectors.
- the total number of bits allocated to quantize the spectra of 8 subvectors is 72 bits, and the codebook indications of all subvectors are shown in FIG. It can be seen that no bits are left to encode the codebook indications of the last two subvectors. In this case, in order to apply the method of the present invention, the codebook indications of the last two subvectors must be transmitted, and 2 bits are used for them.
- the bit consumption of all subvectors other than v2, which is the subvector consuming the most bits, is shown in FIG.
- the codebook of v2 is estimated by the following formula (8).
- a simple method is to allow positive values of cb_diff in the code table. However, this method increases the bit consumption for encoding cb_diff.
- Another idea is to disable the method proposed by the present invention when not all of the subvectors are quantized by AVQ.
- the problem is that a flag is required to indicate whether the proposed idea is valid.
- this information can be extracted from the available information.
- This idea is to determine whether to enable the method proposed by the present invention from the bit usage information obtained when the AVQ parameters are encoded on the encoder side and, as is conventionally done, when they are decoded on the decoder side.
- in step (hereinafter abbreviated as "ST") 1701, the total number of consumed bits N'_bits of all subvectors is calculated.
- in ST1702, it is checked whether the number of available bits N_bits is sufficient to encode the AVQ parameters of all subvectors (N_bits ≥ N'_bits). If the number of available bits is sufficient, the process proceeds to ST1703; if not, the process proceeds to ST1713.
- the position of the subvector whose codebook indication consumes the most bits is identified.
- the codebook indication is compared with a predetermined threshold; if the codebook indication is larger than the threshold, the process proceeds to ST1705, and if it is equal to or less than the threshold, the process proceeds to ST1709.
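The ST1701-ST1704 branching described above can be sketched as a small decision function. The ST labels follow the flowchart; the inputs are simplified stand-ins for the real AVQ parameters.

```python
def select_branch(n_bits_available, subvector_bits, max_indication, threshold):
    """Return the flowchart step the process jumps to (ST labels as above)."""
    n_bits_consumed = sum(subvector_bits)      # ST1701: total consumed bits
    if n_bits_available < n_bits_consumed:     # ST1702: budget check
        return "ST1713"                        # not enough bits: conventional path
    if max_indication > threshold:             # ST1704: threshold comparison
        return "ST1705"                        # convert with explicit position
    return "ST1709"                            # convert the fixed subvector
```

The same three-way decision is replayed on the decoder side, so both ends agree on which branch was taken.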
- the codebook indications of all subvectors other than the subvector consuming the most bits are encoded.
- the codebook indication of the subvector consuming the most bits is estimated.
- the codebook indications of all subvectors other than a predetermined subvector (for example, the last subvector) are encoded.
- the codebook indication cb_last of the predetermined subvector (for example, the last subvector) is estimated.
- the codebook indication of the subvector whose indication was converted is estimated; that is, the estimated codebook indication cb'_max is calculated.
- the decoded codebook indication is calculated by adding the estimated codebook indication and the difference.
- the decoded codebook indication is compared with a predetermined threshold. If the decoded codebook indication is larger than the threshold, the process proceeds to ST1811; if it is equal to or less than the threshold, the process proceeds to ST1812.
- the position of the subvector whose codebook indication consumes the most bits is decoded.
- This embodiment solves the problem that the value of cb diff becomes positive without using flag information by using information on the number of bits remaining after each subvector is decoded on the decoder side.
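A minimal sketch of that flag-free decision, assuming the decoder can replay the same bit accounting as the encoder: the conversion stays enabled only if every subvector's indication fits within the budget. The per-subvector budget replay below is a simplification of actual AVQ parameter decoding.

```python
def conversion_enabled(total_bits, indication_costs):
    """Replay the bit budget subvector by subvector, as the decoder could;
    if some subvector finds no bits left, cb_diff could turn positive,
    so the proposed conversion is disabled without any transmitted flag."""
    remaining = total_bits
    for cost in indication_costs:
        if remaining <= 0:
            return False          # a subvector ran out of bits: disable
        remaining -= cost
    return remaining >= 0
```

Since both sides derive the same boolean from information they already have, no extra flag bit is ever sent.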
- Non-Patent Document 5 also states that the amount of bits used may be less than the number of allocated bits.
- as shown in Equation (6), when Bits_cbmax < Bits_position_cbmax + Bits_cbdiff, the bit consumption of the new method of the present invention is larger than that of the conventional method; if there are a large number of unused bits, the magnitude of cb_diff, and hence Bits_cbdiff, increases.
- an idea for preventing this problem is proposed.
- This idea is to use all allocated bits in vector quantization.
- One possible scheme is to use the unused bits to increase the codebook number of the subvector with the greatest energy; another possible scheme is to allocate the unused bits to subvectors that are encoded as zero vectors.
- FIG. 21 shows a flowchart of the original divided multirate lattice VQ.
- FIG. 22 shows a flowchart of the method proposed by the present invention.
- the estimated total gain g is used to normalize the subvector, and in ST1904, the normalized subvector is quantized into the RE8 lattice.
- the number of unused bits is calculated, and in ST1908, the unused bits are allocated to the subvector having the maximum energy (the selected subvector), and the codebook and code vector of the selected subvector are updated.
- allocating unused bits to the selected subvector provides two technical benefits. First, most of the allocated bits are used to encode the subvectors of the current frame. Second, the difference value cb_diff becomes very small, so the number of bits used to encode the difference value is reduced. As a result, more bits are saved.
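The ST1908 reallocation can be sketched as below. The figure of 5 bits per codebook increment is an assumption of this sketch (AVQ-style codebooks typically grow in roughly 5-bit steps), not a value taken from the text.

```python
def reallocate_unused_bits(allocated_bits, used_bits, energies, codebooks):
    unused = allocated_bits - used_bits
    # pick the subvector with the greatest energy as the "selected subvector"
    sel = max(range(len(energies)), key=lambda i: energies[i])
    new_codebooks = list(codebooks)
    # raise its codebook number while unused bits remain
    # (assumed cost: 5 bits per codebook step)
    new_codebooks[sel] += unused // 5
    return sel, new_codebooks
```

Spending the leftover bits this way keeps the actual indication close to the all-bits-used estimate, which is what shrinks cb_diff.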
- The acoustic/speech encoding device, acoustic/speech decoding device, acoustic/speech encoding method, and acoustic/speech decoding method according to the present invention can be applied to a wireless communication terminal device, a base station device, a teleconference terminal device, a video conference terminal device, and a voice over internet protocol (VoIP) terminal device in a mobile communication system.
Description
1) Calculate the codebook indications of all subvectors.
2) Identify the position of the subvector whose codebook indication consumes the most bits, and encode that position. Then encode the codebook indications of all subvectors other than the subvector consuming the most bits.
3) Estimate the codebook whose indication consumes the most bits.
4) Encode the difference between the actual value and the estimated value.
1) Decode the position of the subvector whose codebook indication consumes the most bits.
2) Decode the codebook indications of all other subvectors.
3) Estimate the codebook whose indication consumes the most bits.
4) Decode the difference between the actual value and the estimated value.
5) Calculate the decoded value by adding the estimated value and the difference.
1) Referring to the codebook indication table of FIG. 6, calculate the codebook indications of all subvectors. The detailed result is shown in FIG. 11.
2) Identify the position of the subvector whose codebook indication consumes the most bits, and encode that position. Then encode the codebook indications of all subvectors other than the subvector consuming the most bits. As shown in FIG. 11, the codebook indication of subvector v3 consumes the most bits. As an example, the position is encoded using the codebook shown in FIG. 12; referring to FIG. 12, the position of v3 is encoded as "010".
3) Estimate the codebook whose indication consumes the most bits, according to the following equation.
4) Encode the difference between the actual value and the estimated value. The difference is calculated according to Equation (4) below and encoded with reference to the table of FIG. 13. As shown in FIG. 13, the possible values of the difference are negative. The reason is that the estimated value is calculated on the assumption that all available bits are used in quantization; quantization can never consume more bits than are available, so the estimated value is the maximum possible value and the actual value never exceeds it.
1) Decode the position of the subvector whose codebook indication consumes the most bits. Referring to the table of FIG. 12, "010" corresponds to v3.
2) Decode the codebook indications of all subvectors other than v3.
3) Estimate the codebook whose indication consumes the most bits. This is done by Equation (3).
4) Decode the difference between the actual value and the estimated value. The difference is decoded with reference to the table of FIG. 13; the difference code 0 corresponds to a difference of 0.
5) Calculate the decoded value by adding the estimated value and the difference. The detailed calculation is given by Equation (5) below.
FIG. 14 shows the codec of the present invention. This codec comprises an encoder and a decoder that apply divided multirate lattice vector quantization.
1) Calculate the codebook indications of all subvectors.
2) Identify the position of the subvector whose codebook indication consumes the most bits, and encode that position. Then encode the codebook indications of all subvectors other than the subvector consuming the most bits.
3) Estimate the codebook whose indication consumes the most bits.
4) Encode the difference between the actual value and the estimated value.
1) Decode the position of the subvector whose codebook indication consumes the most bits.
2) Decode the codebook indications of all other subvectors.
3) Estimate the codebook whose indication consumes the most bits.
4) Decode the difference between the actual value and the estimated value.
5) Calculate the decoded value by adding the estimated value and the difference.
A feature of this embodiment is that the method of the present invention is applied in a TCX codec.
1) Calculate the codebook indications of all subvectors.
2) Identify the position of the subvector whose codebook indication consumes the most bits, and encode that position. Then encode the codebook indications of all subvectors other than the subvector consuming the most bits.
3) Estimate the codebook whose indication consumes the most bits.
4) Encode the difference between the actual value and the estimated value.
1) Decode the position of the subvector whose codebook indication consumes the most bits.
2) Decode the codebook indications of all other subvectors.
3) Estimate the codebook whose indication consumes the most bits.
4) Decode the difference between the actual value and the estimated value.
5) Calculate the decoded value by adding the estimated value and the difference.
A feature of this embodiment is that the spectral cluster analysis method is applied in hierarchical coding (layered coding, embedded coding) of CELP coding and transform coding.
1) Calculate the codebook indications of all subvectors.
2) Identify the position of the subvector whose codebook indication consumes the most bits, and encode that position. Then encode the codebook indications of all subvectors other than the subvector consuming the most bits.
3) Estimate the codebook whose indication consumes the most bits.
4) Encode the difference between the actual value and the estimated value.
1) Decode the position of the subvector whose codebook indication consumes the most bits.
2) Decode the codebook indications of all other subvectors.
3) Estimate the codebook whose indication consumes the most bits.
4) Decode the difference between the actual value and the estimated value.
5) Calculate the decoded value by adding the estimated value and the difference.
This embodiment describes an idea for preventing the possibility that the novel method of the present invention consumes more bits than the original divided multirate lattice VQ method.
1) Calculate the codebook indications of all subvectors.
2) Identify the position of the subvector whose codebook indication consumes the most bits.
3) Compare the codebook indication with a predetermined threshold (the threshold is a value calculated in advance on the basis of a large database so that the bit consumption of the method of the present invention is reliably less than that of the conventional method).
a) Estimate the codebook index of the codebook indication that consumes the most bits.
b) Encode the difference between the actual value and the estimated value.
c) Encode the position of the subvector whose codebook indication consumes the most bits, and encode the codebook indications of all subvectors other than the subvector consuming the most bits.
a) Estimate the codebook indication of the last subvector.
b) Encode the difference between the actual value and the estimated value, and encode the codebook indications of all subvectors other than the last subvector.
1) Decode the codebook indications of all other subvectors.
2) Estimate the codebook indication of the subvector whose indication was converted.
3) Decode the difference between the actual value and the estimated value.
4) Calculate the decoded value by adding the estimated value and the difference.
5) Compare the decoded value with a predetermined threshold.
a) Decode the position of the subvector whose codebook indication consumes the most bits.
In the prior art, codebook indications are not designed according to the probability of codebook usage; instead, a codebook indication table such as that shown in FIG. 6 is simply in wide use.
1) Encode the codebook indications of all subvectors.
2) Identify and encode the position of the subvector whose codebook indication consumes the most bits.
3) Estimate the codebook whose indication consumes the most bits.
4) Encode the difference between the actual value and the estimated value.
1) Decode the position of the subvector whose codebook indication consumes the most bits.
2) Decode the codebook indications of all other subvectors.
3) Estimate the codebook whose indication consumes the most bits.
4) Decode the difference between the actual value and the estimated value.
5) Calculate the decoded value by adding the estimated value and the difference.
A feature of this embodiment is that the bits saved by the codebook indication conversion method of the present invention are used to improve the gain accuracy of the quantized vectors.
This embodiment is an idea for preventing the possibility that the difference cb_diff between the actual codebook indication cb_max and the estimated codebook indication cb'_max becomes positive.
This embodiment describes an idea for preventing the possibility that the novel method of the present invention consumes more bits than the original divided multirate lattice VQ method.
1402 Psychoacoustic model analysis section
1403, 1506, 1604 Divided multirate lattice VQ section
1404, 1407, 1507, 1510, 1605, 1608 Codebook indication conversion section
1405, 1508, 1606 Multiplexing section
1406, 1509, 1607 Demultiplexing section
1408, 1511, 1609 Divided multirate lattice VQ-1 (inverse VQ) section
1409, 1512, 1610 F/T conversion section
1501 LPC analysis section
1502 Quantization section
1503, 1513 Inverse quantization section
1504 LPC inverse filter
1514 LPC synthesis filter
1601 CELP encoder
1602 CELP local decoder
1611 CELP decoder
Claims (11)
- a time-frequency domain transform section that transforms a time domain input signal into a frequency spectrum;
a vector quantization section that divides the input signal of the frequency spectrum into subbands, quantizes the subband-divided input signal, and generates codebook indications; and
a codebook indication conversion section that converts the codebook indications,
wherein
the codebook indication conversion section
identifies the position of the subvector whose codebook indication consumes the most bits, and encodes the identified position and the codebook indications of all subvectors other than the subvector consuming the most bits,
estimates the codebook whose indication consumes the most bits, and
encodes the difference between the actual codebook indication and the estimated codebook indication.
An acoustic/speech encoding device. - When the codebook indication is larger than the threshold, the codebook indication conversion section
encodes the codebook indications of all subvectors other than the subvector whose codebook indication consumes the most bits,
estimates the codebook indication consuming the most bits on the basis of the total number of available bits and information on the bit usage of the other subvectors, and
encodes the difference between the actual codebook indication and the estimated codebook indication, and the position of the subvector whose codebook indication consumes the most bits.
The acoustic/speech encoding device according to claim 1. - When the codebook indication is equal to or less than the threshold, the codebook indication conversion section
encodes the codebook indications of all subbands other than a predetermined subband,
estimates the codebook indication of the predetermined subband on the basis of the total number of available bits and information on the bit usage of the other subvectors, and
encodes the difference between the actual codebook indication and the estimated codebook indication.
The acoustic/speech encoding device according to claim 1. - When the total number of consumed bits is larger than the total number of allocated bits, the codebook indication conversion section
encodes the codebook indications of the subvectors until no bits remain.
The acoustic/speech encoding device according to claim 1. - The codebook indication conversion section
allocates unused bits to the subvector having the greatest energy, and updates the codebook and code vector of the subvector to which the unused bits were allocated.
The acoustic/speech encoding device according to claim 2. - The codebook indication conversion section
allocates the unused bits to a subvector encoded as a zero vector, and updates the codebook and code vector of the subvector to which the unused bits were allocated.
The acoustic/speech encoding device according to claim 2. - a codebook indication conversion section that decodes the position of the subvector whose codebook indication, encoded by an acoustic/speech encoding device, consumes the most bits, decodes the codebook indications of all subvectors other than the subvector consuming the most bits, estimates the codebook indication consuming the most bits, decodes the difference between the actual codebook indication and the estimated codebook indication, and decodes the codebook indication by adding the decoded difference to the estimated codebook indication;
a vector inverse quantization section that inversely quantizes the spectral coefficients of each subvector including the decoded codebook indication; and
a frequency-time domain transform section that transforms the inversely quantized spectral coefficients into the time domain.
An acoustic/speech decoding device comprising the above. - The codebook indication conversion section
sequentially decodes the codebook indications of the subvectors; when the number of remaining bits is greater than 0 before the codebook indications of all subvectors have been decoded, estimates the codebook indication encoded by the acoustic/speech encoding device on the basis of the total number of available bits and the bit usage of the other subvectors; and when the number of remaining bits becomes 0, ends the decoding process.
The acoustic/speech decoding device according to claim 7. - The codebook indication conversion section,
when the decoded codebook indication is larger than a predetermined threshold, decodes the position of the subvector and assigns the decoded codebook indication to the corresponding subvector, and,
when the decoded codebook indication is equal to or less than the threshold, assigns the decoded codebook indication to a predetermined subband.
The acoustic/speech decoding device according to claim 8. - a time-frequency domain transform step of transforming a time domain input signal into a frequency spectrum;
a vector quantization step of dividing the input signal of the frequency spectrum into subbands, quantizing the subband-divided input signal, and generating codebook indications; and
a codebook indication conversion step of converting the codebook indications,
wherein
the codebook indication conversion step
identifies the position of the subvector whose codebook indication consumes the most bits, and encodes the identified position and the codebook indications of all subvectors other than the subvector consuming the most bits,
estimates the codebook whose indication consumes the most bits, and
encodes the difference between the actual codebook indication and the estimated codebook indication.
An acoustic/speech encoding method. - a codebook indication conversion step of decoding the position of the subvector whose codebook indication, encoded by an acoustic/speech encoding device, consumes the most bits, decoding the codebook indications of all subvectors other than the subvector consuming the most bits, estimating the codebook indication consuming the most bits, decoding the difference between the actual codebook indication and the estimated codebook indication, and decoding the codebook indication by adding the decoded difference to the estimated codebook indication;
a vector inverse quantization step of inversely quantizing the spectral coefficients of each subvector including the decoded codebook indication; and
a frequency-time domain transform step of transforming the inversely quantized spectral coefficients into the time domain.
An acoustic/speech decoding method comprising the above.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US14/376,501 US9454972B2 (en) | 2012-02-10 | 2013-02-01 | Audio and speech coding device, audio and speech decoding device, method for coding audio and speech, and method for decoding audio and speech |
EP13747107.4A EP2814028B1 (en) | 2012-02-10 | 2013-02-01 | Audio and speech coding device, audio and speech decoding device, method for coding audio and speech, and method for decoding audio and speech |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2012-027702 | 2012-02-10 | ||
JP2012027702 | 2012-02-10 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2013118476A1 true WO2013118476A1 (ja) | 2013-08-15 |
Family
ID=48947247
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2013/000550 WO2013118476A1 (ja) | 2012-02-10 | 2013-02-01 | 音響/音声符号化装置、音響/音声復号装置、音響/音声符号化方法および音響/音声復号方法 |
Country Status (4)
Country | Link |
---|---|
US (1) | US9454972B2 (ja) |
EP (1) | EP2814028B1 (ja) |
JP (1) | JPWO2013118476A1 (ja) |
WO (1) | WO2013118476A1 (ja) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113766237A (zh) * | 2021-09-30 | 2021-12-07 | 咪咕文化科技有限公司 | 一种编码方法、解码方法、装置、设备及可读存储介质 |
WO2021256082A1 (ja) * | 2020-06-18 | 2021-12-23 | パナソニック インテレクチュアル プロパティ コーポレーション オブ アメリカ | 符号化装置、復号装置、符号化方法、及び、復号方法 |
WO2022201632A1 (ja) * | 2021-03-23 | 2022-09-29 | パナソニック インテレクチュアル プロパティ コーポレーション オブ アメリカ | 符号化装置、復号装置、符号化方法、及び、復号方法 |
WO2023100494A1 (ja) * | 2021-12-01 | 2023-06-08 | パナソニック インテレクチュアル プロパティ コーポレーション オブ アメリカ | 符号化装置、復号装置、符号化方法、及び、復号方法 |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
RU2643452C2 (ru) | 2012-12-13 | 2018-02-01 | Фраунхофер-Гезелльшафт Цур Фердерунг Дер Ангевандтен Форшунг Е.Ф. | Устройство кодирования аудио/голоса, устройство декодирования аудио/голоса, способ кодирования аудио/голоса и способ декодирования аудио/голоса |
TWM487509U (zh) | 2013-06-19 | 2014-10-01 | 杜比實驗室特許公司 | 音訊處理設備及電子裝置 |
US9418671B2 (en) * | 2013-08-15 | 2016-08-16 | Huawei Technologies Co., Ltd. | Adaptive high-pass post-filter |
EP3044876B1 (en) | 2013-09-12 | 2019-04-10 | Dolby Laboratories Licensing Corporation | Dynamic range control for a wide variety of playback environments |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2005528839A (ja) * | 2002-05-31 | 2005-09-22 | ヴォイスエイジ・コーポレーション | 信号のマルチレートによる格子ベクトル量子化の方法とシステム |
JP2007525707A (ja) * | 2004-02-18 | 2007-09-06 | ヴォイスエイジ・コーポレーション | Acelp/tcxに基づくオーディオ圧縮中の低周波数強調の方法およびデバイス |
WO2012004998A1 (ja) * | 2010-07-06 | 2012-01-12 | パナソニック株式会社 | スペクトル係数コーディングの量子化パラメータを効率的に符号化する装置及び方法 |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6370502B1 (en) * | 1999-05-27 | 2002-04-09 | America Online, Inc. | Method and system for reduction of quantization-induced block-discontinuities and general purpose audio codec |
US20070147518A1 (en) | 2005-02-18 | 2007-06-28 | Bruno Bessette | Methods and devices for low-frequency emphasis during audio compression based on ACELP/TCX |
US8515767B2 (en) | 2007-11-04 | 2013-08-20 | Qualcomm Incorporated | Technique for encoding/decoding of codebook indices for quantized MDCT spectrum in scalable speech and audio codecs |
CN102081927B (zh) * | 2009-11-27 | 2012-07-18 | 中兴通讯股份有限公司 | 一种可分层音频编码、解码方法及系统 |
US9786292B2 (en) * | 2011-10-28 | 2017-10-10 | Panasonic Intellectual Property Corporation Of America | Audio encoding apparatus, audio decoding apparatus, audio encoding method, and audio decoding method |
-
2013
- 2013-02-01 WO PCT/JP2013/000550 patent/WO2013118476A1/ja active Application Filing
- 2013-02-01 EP EP13747107.4A patent/EP2814028B1/en not_active Not-in-force
- 2013-02-01 US US14/376,501 patent/US9454972B2/en active Active
- 2013-02-01 JP JP2013557416A patent/JPWO2013118476A1/ja active Pending
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2005528839A (ja) * | 2002-05-31 | 2005-09-22 | ヴォイスエイジ・コーポレーション | 信号のマルチレートによる格子ベクトル量子化の方法とシステム |
JP2007525707A (ja) * | 2004-02-18 | 2007-09-06 | ヴォイスエイジ・コーポレーション | Acelp/tcxに基づくオーディオ圧縮中の低周波数強調の方法およびデバイス |
WO2012004998A1 (ja) * | 2010-07-06 | 2012-01-12 | パナソニック株式会社 | スペクトル係数コーディングの量子化パラメータを効率的に符号化する装置及び方法 |
Non-Patent Citations (10)
Title |
---|
"Extended AMR Wideband Speech Codec (AMR-WB+", 3GPP TS 26.290 |
"G.729-based embedded variable bit-rate coder: An 8-32kbit/s scalable wideband coder bitstream interoperable with G.729", ITU-T RECOMMENDATION G.729.1, 2007 |
KARL HEINZ BRANDENBURG: "MP3 and AAC Explained", AES 1 INTERNATIONAL CONFERENCE, September 1999 (1999-09-01) |
LEFEBVRE ET AL.: "High quality coding of wideband audio signals using transform coded excitation (TCX", IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, vol. 1, April 1994 (1994-04-01), pages 1,193 - 1,196 |
M. XIE; J.-P. ADOUL: "Embedded algebraic vector quantization (EAVQ) with application to wideband audio coding", IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP, vol. 1, 1996, pages 240 - 243 |
M.XIE ET AL.: "Embedded algebraic vector quantizers (EAVQ) with application to wideband speech coding", PROCEEDINGS OF THE 1996 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP'96), vol. I, May 1996 (1996-05-01), pages 240 - 243, XP002252600 * |
S. RAGOT; B. BESSETTE; R. LEFEBVRE: "Low-complexity Multi-Rate Lattice Vector Quantization with Application to Wideband TCX Speech Coding at 32kbit/s", PROC. IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP, vol. 1, May 2004 (2004-05-01), pages 501 - 504, XP010717675, DOI: doi:10.1109/ICASSP.2004.1326032 |
S.RAGOT ET AL.: "Low-complexity multi-rate lattice vector quantization with application to wideband TCX speech coding at 32kbit/s", PROCEEDINGS OF THE 2004 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP'04), vol. I, May 2004 (2004-05-01), pages 501 - 504, XP010717675 * |
See also references of EP2814028A4 * |
T. VAILLANCOURT ET AL.: "ITU-T EV-VBR: A Robust 8-32 kbit/s Scalable Coder for Error Prone Telecommunication Chamiels", PROC. EUSIPCO, August 2008 (2008-08-01) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2021256082A1 (ja) * | 2020-06-18 | 2021-12-23 | パナソニック インテレクチュアル プロパティ コーポレーション オブ アメリカ | 符号化装置、復号装置、符号化方法、及び、復号方法 |
WO2022201632A1 (ja) * | 2021-03-23 | 2022-09-29 | パナソニック インテレクチュアル プロパティ コーポレーション オブ アメリカ | 符号化装置、復号装置、符号化方法、及び、復号方法 |
CN113766237A (zh) * | 2021-09-30 | 2021-12-07 | 咪咕文化科技有限公司 | 一种编码方法、解码方法、装置、设备及可读存储介质 |
WO2023100494A1 (ja) * | 2021-12-01 | 2023-06-08 | パナソニック インテレクチュアル プロパティ コーポレーション オブ アメリカ | 符号化装置、復号装置、符号化方法、及び、復号方法 |
Also Published As
Publication number | Publication date |
---|---|
EP2814028A4 (en) | 2015-05-06 |
EP2814028B1 (en) | 2016-08-17 |
US20150025879A1 (en) | 2015-01-22 |
JPWO2013118476A1 (ja) | 2015-05-11 |
EP2814028A1 (en) | 2014-12-17 |
US9454972B2 (en) | 2016-09-27 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2013118476A1 (ja) | 音響/音声符号化装置、音響/音声復号装置、音響/音声符号化方法および音響/音声復号方法 | |
CA2923218C (en) | Adaptive bandwidth extension and apparatus for the same | |
JP5357055B2 (ja) | 改良形デジタルオーディオ信号符号化/復号化方法 | |
KR101139172B1 (ko) | 스케일러블 음성 및 오디오 코덱들에서 양자화된 mdct 스펙트럼에 대한 코드북 인덱스들의 인코딩/디코딩을 위한 기술 | |
JP5695074B2 (ja) | 音声符号化装置および音声復号化装置 | |
CN105679327B (zh) | 用于对音频信号进行编码和解码的方法及设备 | |
TWI619116B (zh) | 產生帶寬延伸訊號的裝置及方法、及非暫時性電腦可讀記錄媒體 | |
JP6980871B2 (ja) | 信号符号化方法及びその装置、並びに信号復号方法及びその装置 | |
EP2772912B1 (en) | Audio encoding apparatus, audio decoding apparatus, audio encoding method, and audio decoding method | |
JP2015172779A (ja) | オーディオ及び/またはスピーチ信号符号化及び/または復号化方法及び装置 | |
JP5629319B2 (ja) | スペクトル係数コーディングの量子化パラメータを効率的に符号化する装置及び方法 | |
JP5863765B2 (ja) | 符号化方法および装置、そして、復号化方法および装置 | |
JP2014513813A (ja) | 適応的な利得−シェイプのレート共用 | |
WO2009022193A2 (en) | Devices, methods and computer program products for audio signal coding and decoding | |
US20100280830A1 (en) | Decoder | |
KR102148407B1 (ko) | 소스 필터를 이용한 주파수 스펙트럼 처리 장치 및 방법 |
Legal Events
Date | Code | Title | Description
---|---|---|---
 | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 13747107; Country of ref document: EP; Kind code of ref document: A1
 | ENP | Entry into the national phase | Ref document number: 2013557416; Country of ref document: JP; Kind code of ref document: A
 | REEP | Request for entry into the european phase | Ref document number: 2013747107; Country of ref document: EP
 | WWE | Wipo information: entry into national phase | Ref document number: 14376501; Country of ref document: US; Ref document number: 2013747107; Country of ref document: EP
 | NENP | Non-entry into the national phase | Ref country code: DE