WO2006008817A1 - Audio encoding device and audio encoding method - Google Patents
- Publication number
- WO2006008817A1 (application PCT/JP2004/010416)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- block
- fluctuation ratio
- input signal
- short
- encoding
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/02—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
- G10L19/022—Blocking, i.e. grouping of samples in time; Choice of analysis windows; Overlap factoring
- G10L19/025—Detection of transients or attacks for time/frequency resolution switching
- G10L19/0212—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders using orthogonal transformation
Definitions
- the present invention relates to an audio encoding device and an audio encoding method for encoding an audio signal.
- the mainstream of audio encoding apparatuses is the adaptive transform audio encoding apparatus, which exploits human auditory characteristics.
- the basic encoding process of the adaptive transform audio encoding device is as follows.
- an audio signal in the time domain is converted into the frequency domain.
- the signal on the frequency axis is divided by the frequency band corresponding to the auditory frequency resolution. Then, using the human auditory characteristics, the optimum amount of information necessary for encoding is calculated in each frequency band.
- the signal on the frequency axis is quantized according to the amount of information allocated to each frequency band.
- MPEG: Moving Picture Experts Group
- AAC: Advanced Audio Coding
- ISO: International Organization for Standardization
- IEC: International Electrotechnical Commission
- FIG. 10 is a configuration diagram showing the configuration of the encoder of MPEG-2 AAC, which is the first conventional technology.
- the technique shown in this figure is referred to as the first conventional technique.
- the AAC encoder is described in detail in, for example, Non-Patent Document 1 below.
- the AAC encoder divides an input signal into frames each having a predetermined number of samples.
- the AAC encoder performs an encoding process for each frame.
- the length of one frame and the length of one long block are the same. The following explanation is the processing procedure of the AAC encoder shown in FIG. 10.
- an input signal is input to framing section 1001.
- the framing unit 1001 divides the input signal into frames (long blocks) having a predetermined number of samples.
- the signal output from the framing unit 1001 is input to a long block modified discrete cosine transform unit (hereinafter simply referred to as an MDCT conversion unit) 1002 and a short block MDCT conversion unit 1003.
- the MDCT conversion unit 1002 for the long block performs 1024 points of MDCT conversion on the input signal. Then, the MDCT conversion unit 1002 for the long block calculates the MDCT coefficient (MDCT1). The short block MDCT conversion unit 1003 performs 128-point MDCT conversion on the input signal. Then, the short block MDCT conversion unit 1003 calculates the MDCT coefficient (MDCT2). Since there are 8 short blocks per frame, 8 sets of MDCT2 are generated.
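The relationship between one long-block transform and several short-block transforms per frame can be sketched as follows. This is a minimal illustration, not the AAC reference transform: it uses the direct O(N²) form of the MDCT, omits the windowing and 50% overlap that a real AAC encoder applies, and the function names `mdct` and `short_block_mdcts` are hypothetical.

```python
import math

def mdct(block):
    """Direct-form MDCT: maps 2N time samples to N coefficients.

    O(N^2) and unwindowed; intended only to illustrate the transform.
    """
    two_n = len(block)
    if two_n % 2:
        raise ValueError("block length must be even")
    n_half = two_n // 2
    n0 = 0.5 + n_half / 2.0  # standard MDCT phase offset
    return [
        sum(block[n] * math.cos(math.pi / n_half * (n + n0) * (k + 0.5))
            for n in range(two_n))
        for k in range(n_half)
    ]

def short_block_mdcts(frame, num_short=8):
    """Split one frame into num_short equal blocks and transform each one.

    Assumption: blocks are taken back to back with no overlap.
    Returns one coefficient set per short block.
    """
    size = len(frame) // num_short
    return [mdct(frame[i * size:(i + 1) * size]) for i in range(num_short)]
```

Calling `mdct(frame)` yields a single coefficient set for the long block, while `short_block_mdcts(frame, 8)` yields eight smaller sets, mirroring the single MDCT1 set and the eight MDCT2 sets described above.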
- the framing unit 1001 outputs the divided input signal to the psychoacoustic analysis unit 1004 for long blocks. Then, the psychoacoustic analysis unit 1004 for the long block obtains the masking threshold Th1 for the long block and the psychoacoustic entropy PE1 from the input signal.
- as a method for obtaining Th1 and PE1, the method shown in the psychoacoustic model section of Non-Patent Document 1 is known.
- the framing unit 1001 outputs the input signal divided into frames to the psychoacoustic analysis unit 1005 for short blocks. Then, the psychoacoustic analysis unit 1005 for the short block obtains the masking threshold Th2 for the short block and the psychoacoustic entropy PE2 from the input signal.
- psychoacoustic entropy is the amount of information representing the minimum number of bits necessary to quantize a signal.
- Masking refers to a phenomenon in which human beings cannot perceive an error if the error is below a certain standard when the signal is quantized by the quantizer.
- the reference value indicating the limit of the error that cannot be perceived by humans is called a masking threshold.
- FIG. 11 is a schematic diagram showing an example of pre-echo.
- (a) of FIG. 11 is a schematic diagram showing an input signal before encoding.
- (b) of FIG. 11 is a graph showing a decoded sound when encoding is performed only with a long block. At the beginning of Fig. 11 (b), noise that is not present in the input signal is generated before the attack sound.
- the block length determination unit 1006 determines the nature of the input signal. Then, the block length determination unit 1006 determines an optimal block length for quantization. Specifically, the block length determination unit 1006 selects a long block if PE1 > PE1_thr, and selects a short block otherwise.
- PE1_thr is a threshold value (constant) determined in advance.
- the determination result of the block length determination unit 1006 is output to the selector 1007 that selects MDCT.
- the masking threshold selected by the block length determination unit 1006 is output to the spectrum quantization unit 1008. That is, when the block length determination unit 1006 selects a long block, MDCT1 and Th1 are input to the spectrum quantization unit 1008. In addition, when the block length determination unit 1006 selects a short block, MDCT2 and Th2 are input to the spectrum quantization unit 1008.
- the spectrum quantization unit 1008 quantizes the MDCT coefficient for each frequency band according to the input masking threshold. Then, the spectrum quantization unit 1008 outputs the quantization code 1.
- Quantization code 1 output from spectrum quantization section 1008 is input to Huffman encoding section 1009.
- the Huffman encoding unit 1009 converts the quantization code 1 into the quantization code 2 from which the redundancy is further removed than the quantization code 1.
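How Huffman coding removes redundancy from a stream of quantization codes can be illustrated with a generic Huffman construction. This is a sketch under assumptions: it builds a code from the symbol counts of the input itself, whereas an AAC encoder selects among predefined codebooks, and the function names are hypothetical.

```python
import heapq
from collections import Counter

def huffman_code_lengths(symbols):
    """Return {symbol: code length in bits} for a Huffman code built from counts."""
    counts = Counter(symbols)
    if len(counts) == 1:
        # A single distinct symbol still needs one bit per occurrence.
        return {next(iter(counts)): 1}
    # Heap entries: (count, unique tie-breaker, symbols in this subtree).
    heap = [(c, i, [s]) for i, (s, c) in enumerate(counts.items())]
    heapq.heapify(heap)
    lengths = dict.fromkeys(counts, 0)
    tie = len(heap)
    while len(heap) > 1:
        c1, _, s1 = heapq.heappop(heap)
        c2, _, s2 = heapq.heappop(heap)
        # Merging two subtrees adds one bit to every symbol below them.
        for s in s1 + s2:
            lengths[s] += 1
        heapq.heappush(heap, (c1 + c2, tie, s1 + s2))
        tie += 1
    return lengths

def huffman_total_bits(symbols):
    """Total number of bits needed to encode the sequence with that code."""
    lengths = huffman_code_lengths(symbols)
    return sum(lengths[s] for s in symbols)
```

For a sequence in which small quantization codes dominate, `huffman_total_bits` returns fewer bits than a fixed-length code would need, which is the sense in which quantization code 2 is less redundant than quantization code 1.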
- the quantization code 2 is output from the Huffman encoding unit 1009 to the quantization control unit 1011. Then, the quantization control unit 1011 calculates, from the input quantization code 2, the total number of bits of the bit stream that is finally output. In FIG. 10, a range surrounded by a dotted line is a range that can be controlled by the quantization control unit 1011.
- the quantization control unit 1011 controls the spectrum quantization unit 1008 and the Huffman encoding unit 1009 so that the process (5) and the process (7) are repeated. Also, the quantization control unit 1011 causes the Huffman encoding unit 1009 to output the quantization code 2 to the bit stream generation unit 1010 when the calculated total number of bits is less than the number of bits allowed for the current block. Then, the quantization control unit 1011 controls the bit stream generation unit 1010 to output a bit stream.
- the AAC method transforms the MDCT spectrum into a mantissa part and an exponent part. That is, the AAC method transforms the MDCT spectrum into a floating-point representation. Then, the AAC method quantizes the mantissa part (MDCT quantization).
- the AAC method obtains the number of bits (total number of bits) required when the mantissa part and the exponent part quantized in (b) are Huffman coded.
- the AAC method ends the quantization if the total number of bits obtained in (c) is less than the number of quantization bits allowed in the current frame (allowable number of bits). In the AAC method, if the total number of bits is greater than the allowable number of bits, the exponent part set in (a) is judged to be inappropriate. In the AAC method, the exponent part is changed and the process (b)-(d) is repeated. The AAC method determines the exponent part where the total number of bits is less than the allowable number of bits.
- the exponent part is temporarily fixed.
- the mantissa part is determined and the MDCT spectrum is quantized.
- the AAC method calculates the total number of bits so that the quantization error when the MDCT spectrum is transformed into the exponent part and the mantissa part is less than the allowable error.
- the AAC method is judged to be inappropriate if the total number of bits is larger than the preset bit rate.
- the exponent part is changed, and the exponent part fixing process and the mantissa part quantization process are performed again.
- the AAC scheme determines the optimal exponent part and mantissa part such that the quantization error is less than the allowable error and the total number of bits is less than the set bit rate. As described above, the AAC scheme calculates the total number of bits required after quantization and Huffman coding. The AAC scheme then determines the optimal exponent part and mantissa part so that the total number of bits is less than the number of bits allowed for the current frame.
- “optimal” means “quantization error is less than allowable error”.
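The iterative search for an exponent that fits the bit budget can be sketched as a simple loop. This is only an illustration under stated assumptions: a single exponent shared by the whole spectrum, a stand-in bit estimate (`estimate_bits`, hypothetical) in place of real Huffman bit counting, and no per-band check of the quantization error against the masking threshold.

```python
def estimate_bits(codes):
    """Stand-in for Huffman bit counting (hypothetical): sign bit plus magnitude bits."""
    return sum(1 + abs(q).bit_length() for q in codes)

def rate_loop(spectrum, allowed_bits, max_exponent=60):
    """Raise a shared exponent until the quantized spectrum fits the bit budget.

    Each iteration fixes the exponent, quantizes the mantissas, counts bits,
    and stops as soon as the total is less than allowed_bits.
    """
    for exponent in range(max_exponent + 1):
        step = 2.0 ** exponent
        codes = [int(round(x / step)) for x in spectrum]
        bits = estimate_bits(codes)
        if bits < allowed_bits:
            return exponent, codes, bits
    raise ValueError("bit budget cannot be met")
```

A larger exponent means a coarser step and fewer bits, so the loop trades quantization accuracy for bit rate in the same direction as the procedure described above.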
- the first prior art selects an optimal block length from between a long block and a short block. Therefore, the first conventional technique can obtain good sound quality with less pre-echo.
- the first conventional technology performs MDCT conversion and psychoacoustic analysis for both long blocks and short blocks. Therefore, the first conventional technology has a large amount of processing.
- FIG. 12 is a configuration diagram showing the configuration of the second prior art. This second prior art divides one frame into shorter blocks.
- an input signal is input to the framing unit 1201.
- the framing unit 1201 divides the input signal into frames (long blocks) having a predetermined number of samples.
- the signal output from 01 is the power calculator 1202, selector 1204, psychoacoustic analyzer 1
- the power calculator 1202 calculates power and a power fluctuation ratio from the input signal.
- the power calculation unit 1202 outputs the calculated power fluctuation ratio to the block length determination unit 1203.
- the block length determination unit 1203 determines whether to use a long block or a short block based on the input power fluctuation ratio. Then, the block length determination unit 1203 outputs the determination result to the selector 1204 and the selector 1207. The selector 1204 and the selector 1207 each select whether to use a long block or a short block based on the determination result of the block length determination unit 1203.
- the long block MDCT conversion unit 1205 performs 1024-point MDCT conversion on the input signal. Then, the MDCT conversion unit 1205 for the long block calculates the MDCT coefficient (MDCT1).
- the short block MDCT conversion unit 1206 performs 128-point MDCT conversion on the input signal. Then, the short block MDCT conversion unit 1206 calculates an MDCT coefficient (MDCT2). Since there are 8 short blocks per frame, 8 sets of MDCT2 are generated.
- the psychoacoustic analysis unit 1208 obtains a masking threshold value from the input signal.
- the masking threshold value obtained from the input signal is input to the spectrum quantization unit 1209.
- the spectrum quantization unit 1209 quantizes the MDCT coefficient for each frequency band in accordance with the input masking threshold. Then, the spectrum quantization unit 1209 outputs a quantized code 1 obtained by quantizing the MDCT coefficient.
- Quantization code 1 output from spectrum quantization section 1209 is Huffman coding section 1
- the Huffman encoding unit 1210 converts the quantized code 1 into the quantized code 2 from which the redundancy is further removed than the quantized code 1.
- This quantization code 2 is input to the quantization control unit 1212.
- the quantization control unit 1212 calculates the total number of bits from the input quantization code 2.
- a range surrounded by a dotted line is a range that can be controlled by the quantization control unit 1212.
- the quantization control unit 1212 controls the spectrum quantization unit 1209 and the Huffman encoding unit 1210 so that processing (3) and processing (5) are repeated.
- the quantization control unit 1212 causes the Huffman coding unit 1210 to output the quantization code 2 to the bit stream generation unit 1211 when the calculated total number of bits is less than the number of bits allowed for the current block. Then, the quantization control unit 1212 controls the bit stream generation unit 1211 to output the bit stream.
- FIG. 13 shows an example in which a frame is divided into short blocks in the second prior art.
- FIG. 13 shows the case where one frame is divided into four short blocks.
- the second conventional technique finds the input signal power P (1), P (2), P (3), P (4) for each short block.
- the second conventional technique is the power fluctuation ratio between adjacent short blocks ⁇ (1, 2), ⁇
- ⁇ (i, j) is the power fluctuation ratio between short block i and short block j.
- the power fluctuation ratio increases when the input signal rapidly increases. Conversely, the power fluctuation ratio decreases when the input signal suddenly decreases. Therefore, when the power fluctuation ratio hardly changes, the block length determination unit 1203 selects a long block. Also, the block length determination unit 1203 selects a short block when the power fluctuation ratio suddenly increases or decreases. This process allows the second prior art to select the optimal window length.
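The power-based decision of the second prior art can be sketched as below. The exact definition of the power fluctuation ratio is not legible in this text, so the sketch assumes it is the ratio of mean power between adjacent short blocks, P(j) / P(i); the threshold value, the `eps` guard, and all function names are likewise hypothetical.

```python
def block_powers(frame, num_short=4):
    """Mean power of each of num_short equal short blocks in one frame."""
    size = len(frame) // num_short
    return [
        sum(x * x for x in frame[i * size:(i + 1) * size]) / size
        for i in range(num_short)
    ]

def power_fluctuation_ratios(powers, eps=1e-12):
    """Assumed definition: ratio of power between adjacent short blocks, P(j) / P(i)."""
    return [(powers[i + 1] + eps) / (powers[i] + eps)
            for i in range(len(powers) - 1)]

def choose_block_by_power(frame, threshold=4.0, num_short=4):
    """Select 'short' when any adjacent ratio rises or falls sharply, else 'long'."""
    ratios = power_fluctuation_ratios(block_powers(frame, num_short))
    transient = any(r > threshold or r < 1.0 / threshold for r in ratios)
    return "short" if transient else "long"
```

A frame with constant amplitude gives ratios near 1 and selects a long block, while a frame containing an attack gives one large ratio and selects a short block. Only time-domain power is needed, so the decision can be made before any MDCT or psychoacoustic analysis.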
- the block length is determined before MDCT conversion and psychoacoustic analysis. Therefore, the second prior art performs MDCT conversion and psychoacoustic analysis for only one of the long block and the short block. Therefore, the second conventional technique can encode an audio signal with a smaller processing amount than the first conventional technique.
- the second conventional technique may not be able to detect a change in the nature of the input signal. For example, if a sine wave is input and the frequency of the sine wave changes while the power remains constant, the second prior art cannot detect the signal change point by using only the power fluctuation ratio.
- FIG. 14 is a diagram showing examples of input signals, power fluctuation ratios, and predicted gain fluctuation ratios.
- FIG. 14 (a) is a graph showing the input signal before encoding.
- FIG. 14 (b) is a graph of the power fluctuation ratio.
- FIG. 14 (c) is a graph of the predicted gain fluctuation ratio.
- Patent Document 1: Japanese Patent Laid-Open No. 7-66733
- Non-Patent Document 1: ISO/IEC 13818-7 (Part 7), "Advanced Audio Coding (AAC)"
- the first conventional technique performs MDCT conversion and psychoacoustic analysis for each of the long block and the short block. For this reason, the first conventional technique has a problem that the processing amount is larger than the case of processing only a long block or a short block.
- the second prior art has a problem that an appropriate block length may not be selected.
- An object of the present invention is to provide an audio encoding device and an audio encoding method capable of appropriately selecting a block length while reducing the processing amount.
- the audio encoding device of the present invention includes:
- Power calculating means for calculating a power fluctuation ratio from the input signal; Calculating means for calculating a predicted gain fluctuation ratio from the input signal;
- Block length determination means for determining whether to perform encoding using a long block or encoding using a short block from the power variation ratio and the predicted gain variation ratio.
- the audio encoding device of the present invention includes:
- the block length determination means is
- the audio encoding device of the present invention includes:
- Threshold value determining means is provided for changing a threshold value for determining the block length when the code used by the block length determining means is changed according to the determination result of the block length determining means.
- the audio encoding device of the present invention provides:
- the threshold value determining means is
- the threshold value is set to a value larger than the initial value.
- the audio encoding device of the present invention provides:
- the calculating means is
- the power calculation means uses a predetermined number of blocks for calculating power to form one block, and calculates the predicted gain fluctuation ratio of the one block.
- the audio encoding device of the present invention includes:
- the calculation means uses a predetermined number of blocks for calculating the prediction gain as one block, and calculates the power fluctuation ratio of the one block.
- the audio encoding device of the present invention includes:
- a long block mode that divides the input signal into frames of a certain number of samples and encodes one frame of the input signal;
- an audio coding apparatus provided with a short block mode for dividing the frame into short blocks and coding the short blocks
- Power calculating means for calculating a power fluctuation ratio from the input signal
- Block length determination means for determining whether to perform encoding by a long block or encoding by a short block from the power fluctuation ratio and the prediction gain fluctuation ratio;
- a first conversion unit that obtains a first coefficient by performing discrete cosine transform on an input signal in units of long blocks
- a second conversion unit that, when encoding by a short block is selected by the block length determination means, obtains a second coefficient by performing discrete cosine transform on the input signal in units of short blocks; selecting means for selecting the first coefficient or the second coefficient as a third coefficient according to the determination result of the block length determination means;
- Quantization means for spectrally quantizing the third coefficient according to the masking threshold to obtain a first code
- Huffman encoding means for Huffman encoding the first code to obtain a second code, and calculating the total number of bits of the output bitstream from the second code, and based on the result of the calculation Quantization control means for instructing output of the bitstream;
- Bit stream generating means for generating a bit stream from the second code and outputting the bit stream based on an instruction from the quantization control means.
- the audio encoding device of the present invention includes:
- the block length determination means is
- When at least one of the power fluctuation ratio and the predicted gain fluctuation ratio is larger than a predetermined threshold, encoding by a short block is selected; in all other cases, encoding by a long block is selected.
- the audio encoding device of the present invention includes:
- Threshold for determining the block length when the code used by the block length determination means is input
- Threshold value determining means for changing the value according to the determination result of the block length determining means is provided.
- the audio encoding device of the present invention provides:
- the threshold value determining means is
- the threshold value is set to a value larger than the initial value.
- the audio encoding device of the present invention includes:
- the calculating means is
- the power calculation means uses a predetermined number of blocks for calculating power to form one block, and calculates the predicted gain fluctuation ratio of the one block.
- the audio encoding device of the present invention includes:
- the calculation means uses a predetermined number of blocks for calculating the prediction gain as one block, and calculates the power fluctuation ratio of the one block.
- the audio encoding method of the present invention includes:
- a block length determination step for determining whether to perform encoding by a long block or encoding by a short block from the power fluctuation ratio and the prediction gain fluctuation ratio.
- the audio encoding method of the present invention includes:
- a long block mode that divides the input signal into frames of a certain number of samples and encodes one frame of the input signal
- an audio encoding method comprising: a short block mode for dividing the frame into short blocks and encoding the short blocks;
- a power calculation step of calculating a power fluctuation ratio from the input signal; a calculation step of calculating a predicted gain fluctuation ratio from the input signal;
- a block length determination step for determining whether to perform encoding by a long block or encoding by a short block from the power fluctuation ratio and the prediction gain fluctuation ratio;
- a Huffman encoding step for obtaining a second code by Huffman encoding the first code, and calculating the total number of bits of the output bitstream from the second code, and based on the result of the calculation
- a quantization control step for instructing output of the bitstream;
- the audio encoding device and the audio encoding method of the present invention determine whether to perform encoding by a long block or encoding by a short block from the power fluctuation ratio and the prediction gain fluctuation ratio. Therefore, they do not need to perform both encoding with a long block and encoding with a short block, which reduces the amount of processing. In addition, because the block length is determined using both the power fluctuation ratio and the predicted gain fluctuation ratio, encoding with a more appropriate block length can be performed.
- the audio encoding device and audio encoding method of the present invention change the threshold used for block length determination in accordance with the block length determination result. Therefore, for example, frequent selection of encoding by short blocks can be prevented, and degradation of the sound quality of the output sound can be reduced.
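One way to picture threshold control of this kind is a controller that raises the threshold above its initial value after a short block has been selected. The text here states only that the threshold is set larger than the initial value; the rule for returning to the initial value (immediately after the next long block) is an assumption made for this sketch, and the class name `ThresholdController` is hypothetical.

```python
class ThresholdController:
    """Raises the block length decision threshold after a short block is chosen."""

    def __init__(self, initial, raised):
        if raised <= initial:
            raise ValueError("raised threshold must exceed the initial value")
        self.initial = initial
        self.raised = raised
        self.current = initial

    def update(self, decision):
        """Update the threshold from the latest decision ('short' or 'long').

        Assumption: the threshold returns to its initial value as soon as a
        long block is selected.
        """
        self.current = self.raised if decision == "short" else self.initial
        return self.current
```

With an initial threshold of 4 and a raised threshold of 8, a run of frames whose fluctuation ratio stays at 5 no longer selects short blocks on every frame, which is the kind of suppression of frequent short-block selection described above.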
- the audio encoding device and the audio encoding method of the present invention use a predetermined number of blocks for calculating power to form one block, and calculate the predicted gain fluctuation ratio of this one block, thereby reducing the processing amount.
- the audio encoding device and the audio encoding method of the present invention use a predetermined number of blocks for calculating the prediction gain as one block, and calculate the power fluctuation ratio of this one block, thereby reducing the processing amount.
- FIG. 1 is a schematic diagram of an audio encoding device according to the present invention.
- FIG. 2 is a conceptual diagram of an example of a long block and a short block used in the audio encoding device of the present invention.
- FIG. 3 is a conceptual diagram of a method for calculating a predicted gain fluctuation ratio in the audio encoding device of the present invention.
- FIG. 4 is a configuration diagram of a first embodiment of an audio encoding device according to the present invention.
- FIG. 5 is a flowchart of the operation of the block length determination method performed by the first embodiment of the audio encoding device of the present invention.
- FIG. 6 is a configuration diagram of a second embodiment of an audio encoding device of the present invention.
- FIG. 7 is a graph showing an operation of threshold value control in the threshold value determination unit of the second embodiment of the audio encoding device of the present invention.
- FIG. 8 is a conceptual diagram of a method for obtaining a predicted gain fluctuation ratio and a power fluctuation ratio in the third embodiment of the audio encoding device of the present invention.
- FIG. 9 is a conceptual diagram showing a method of calculating the power fluctuation ratio in the fourth embodiment of the audio encoding device of the present invention.
- FIG. 10 is a configuration diagram showing a configuration of an encoder of MPEG-2 AAC, which is a first prior art.
- FIG. 11 is a schematic diagram showing an example of pre-echo.
- FIG. 12 is a configuration diagram showing the configuration of the second prior art.
- FIG. 1 is a schematic diagram of an audio encoding device according to the present invention. The following description also serves as an overview of the audio encoding method of the present invention.
- a frame unit 101 divides an input signal into input signal frames (long blocks) having a predetermined number of samples.
- the MDCT conversion unit 106 for long blocks, the MDCT conversion unit 107 for short blocks, the power calculation unit 102, and the calculation unit 103 divide one frame into short blocks that are shorter than the long blocks.
- FIG. 2 is a conceptual diagram of an example of a long block and a short block used in the audio encoding device of the present invention.
- FIG. 2 shows the case where one frame (long block) is divided into four short blocks. The following description is based on the example shown in FIG. 2. However, the present invention can be similarly implemented even when one frame is divided into n (n > 0).
- the power calculation unit 102 obtains input signal powers P (1), P (2), P (3), and P (4) for each short block. Next, the power calculator 102 calculates the power fluctuation ratio ⁇ (1, 2) between adjacent blocks,
- FIG. 3 is a conceptual diagram of a method for calculating a predicted gain fluctuation ratio in the audio encoding device of the present invention.
- the k parameter calculation method is arbitrary.
- the present invention can use, for example, a method of obtaining an autocorrelation function from an input signal and calculating a k parameter from the autocorrelation function by a known method such as a Levinson algorithm.
- the calculation unit 103 obtains the prediction gain fluctuation ratio ⁇ (i, j) from the prediction gains G (i) and G (j) obtained from the short blocks i and j.
- the power fluctuation ratio ⁇ (i, j) is input to the block length determination unit 104. Also, the predicted gain fluctuation ratio ⁇ (i, j) is input to the block length determination unit 104.
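The calculation of a prediction gain from k parameters can be sketched with the Levinson-Durbin recursion on the autocorrelation function. The formulas used by the invention are not legible in this text, so the sketch relies on the standard relation G = 1 / Π(1 − k_m²) and assumes the fluctuation ratio is G(j) / G(i); the prediction order and all function names are hypothetical.

```python
import math

def autocorrelation(x, order):
    """Autocorrelation r[0..order] of one block (no window applied)."""
    return [sum(x[n] * x[n - lag] for n in range(lag, len(x)))
            for lag in range(order + 1)]

def reflection_coefficients(r):
    """Levinson-Durbin recursion: returns the k parameters for order len(r) - 1."""
    order = len(r) - 1
    a = [1.0] + [0.0] * order
    err = r[0]
    ks = []
    for m in range(1, order + 1):
        if err <= 0.0:
            break  # nothing left to predict
        acc = r[m] + sum(a[i] * r[m - i] for i in range(1, m))
        k = -acc / err
        ks.append(k)
        prev = a[:]
        for i in range(1, m):
            a[i] = prev[i] + k * prev[m - i]
        a[m] = k
        err *= 1.0 - k * k
    return ks

def prediction_gain(x, order):
    """Standard prediction gain: signal energy over residual energy."""
    r = autocorrelation(x, order)
    if r[0] <= 0.0:
        return 1.0  # silent block: no gain from prediction
    residual = 1.0
    for k in reflection_coefficients(r):
        residual *= max(1.0 - k * k, 1e-9)
    return 1.0 / residual

def prediction_gain_ratio(gain_i, gain_j):
    """Assumed definition of the fluctuation ratio between blocks i and j."""
    return gain_j / gain_i

def sine_block(omega, length):
    """Helper for experiments: a unit-amplitude sine at omega radians per sample."""
    return [math.sin(omega * n) for n in range(length)]
```

For two 256-sample blocks produced by `sine_block(0.2, 256)` and `sine_block(1.5, 256)`, the mean power is nearly identical, but `prediction_gain(..., order=1)` differs markedly between them. This is the constant-power frequency change that a power fluctuation ratio alone does not reveal.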
- the block length determination unit 104 determines whether to quantize the long block or the short block.
- that the block length determination unit selects a long block means that the block length determination unit selects encoding with the long block.
- that the block length determination unit selects a short block means that the block length determination unit selects encoding with a short block. That is, when the block length determination unit selects a block, it means that the block length determination unit selects encoding based on that block.
- the block length determination unit 104 determines the threshold TH for the power fluctuation ratio and the predicted gain fluctuation ratio T.
- the block length determination unit 104 determines the threshold T among ⁇ (1, 2), ⁇ (2, 3), ⁇ (3, 4).
- the block length determination unit 104 sets the threshold value among ⁇ (1, 2), ⁇ (2, 3), ⁇ (3, 4).
- the block length determination unit 104 selects a short block only when one of the power fluctuation ratio and the predicted gain fluctuation ratio in the frame exceeds a preset threshold, and otherwise selects a long block.
- the block length determination unit 104 selects a long block
- the determination result is output to the selector 105 and the selector 108.
- the selector 105 and the selector 108 select a block based on the determination result of the block length determination unit 104. Therefore, when the block length determination unit 104 selects a long block, the selector 105 and the selector 108 select a long block.
- the input signal output from framing section 101 is input to long block MDCT conversion section 106.
- the MDCT conversion unit 106 for long blocks outputs MDCT1.
- the block length determination unit 104 selects a short block
- the determination result is output to the selector 105 and the selector 108. Then, the selector 105 and the selector 108 select the short block.
- the input signal output from framing section 101 is input to short block MDCT conversion section 107.
- the MDCT conversion unit 107 for short blocks outputs MDCT coefficients for the number of short blocks. That is, when one frame is divided into four short blocks, the short block MDCT conversion unit 107 outputs four sets of MDCT coefficients.
- the psychoacoustic analysis unit 109 obtains a masking threshold value from the input signal.
- the psychoacoustic analysis unit 109 obtains a masking threshold for the long block.
- the psychoacoustic analysis unit 109 calculates a masking threshold for the short block when the block length determination unit 104 selects the short block.
- any method can be used as the masking threshold calculation method.
- the psychoacoustic analysis unit 109 can use the method disclosed in Non-Patent Document 1. That is, the psychoacoustic analysis unit 109 performs FFT analysis on the input signal. Then, the psychoacoustic analysis unit 109 obtains an FFT spectrum. Then, the psychoacoustic analysis unit 109 calculates a masking threshold from the FFT spectrum.
- the MDCT coefficient and the masking threshold are input to the quantization unit 110.
- the quantization unit 110 quantizes the MDCT coefficient for each frequency band according to the input masking threshold. Then, the quantization unit 110 outputs the quantization code 1 in which the MDCT coefficient is quantized.
- the quantization code 1 is input to the Huffman encoding unit 111. Then, the Huffman encoding unit 111 converts the quantization code 1 into the quantization code 2, from which redundancy is further removed than from the quantization code 1.
- the Huffman encoding unit 111 outputs the quantization code 2 to the quantization control unit 113.
- the quantization control unit 113 calculates the total number of bits of the bit stream that is finally output from the input quantization code 2.
- a range surrounded by a dotted line is a range that can be controlled by the quantization control unit 113.
- the quantization control unit 113 controls the quantization unit 110 and the Huffman encoding unit 111 so that the process (8) and the process (10) are repeated.
- the quantization control unit 113 causes the Huffman encoding unit 111 to output the quantization code 2 to the bit stream generation unit 112 when the calculated total number of bits is less than the number of bits allowed for the current block.
- the quantization control unit 113 controls the bit stream generation unit 112 to output a bit stream.
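The control loop described above (quantize, entropy-code, count bits, repeat until the budget is met) can be sketched as follows. This is only an illustration under stated assumptions: `estimate_bits` is a hypothetical stand-in for Huffman coding, the step-doubling rule is assumed, and the per-band masking thresholds are omitted for brevity.

```python
def quantize(coefs, step):
    """Uniform quantizer: a simplified stand-in for the quantization unit."""
    return [int(round(c / step)) for c in coefs]


def estimate_bits(codes):
    """Toy bit cost (NOT Huffman coding): 1 flag bit per value,
    plus a sign bit and the magnitude bits for non-zero values."""
    return sum(1 + (abs(q).bit_length() + 1 if q else 0) for q in codes)


def rate_control(coefs, allowed_bits, step=1.0, max_iter=64):
    """Coarsen the quantizer until the coded size fits the bit budget."""
    for _ in range(max_iter):
        codes = quantize(coefs, step)
        bits = estimate_bits(codes)
        if bits <= allowed_bits:
            return codes, step, bits   # hand over to bit stream generation
        step *= 2.0                    # assumed update rule: double the step
    raise ValueError("could not meet the bit budget")
```

For example, `rate_control([8.0, -4.0, 1.0, 0.0], 10)` needs two coarsening passes in this toy model before the budget of 10 bits is met.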
- the audio encoding device shown in FIG. 1 realizes quantization.
- the quantization process in the present invention is the same as the details of the AAC quantization process described in the above-mentioned section of the prior art, and thus detailed description thereof is omitted.
- FIG. 4 is a configuration diagram of the first embodiment of the audio encoding device of the present invention.
- the framing unit 401 divides the input signal into input signal frames (long blocks) having a predetermined number of samples.
- the short block MDCT conversion unit 410, the power calculation unit 402, and the autocorrelation calculation unit 403 divide the input frame into short blocks.
- the frame division in this embodiment will be described with reference to FIG. 2. Fig. 2 is a conceptual diagram showing examples of long blocks and short blocks. In the example shown in Fig. 2, one frame (long block) is divided into four short blocks. The following description is based on this example. However, this embodiment holds true even when one frame is divided into n short blocks (n is a positive integer).
- the power calculation unit 402 obtains input signal powers P(1), P(2), P(3), and P(4) for each short block. Then, the power calculation unit 402 calculates the power fluctuation ratio ⁇ (1
- This power fluctuation ratio is obtained by the aforementioned equation (1).
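The short-block power calculation can be sketched as follows. Equation (1) is not reproduced in this text, so the ratio used here, P(j)/P(i) for adjacent blocks, is an assumption made only for illustration; the function names and the small `eps` guard against division by zero are likewise hypothetical.

```python
def short_block_powers(frame, num_blocks=4):
    """Split one frame into equal short blocks and return the mean-square
    power of each block."""
    size = len(frame) // num_blocks
    return [sum(s * s for s in frame[b * size:(b + 1) * size]) / size
            for b in range(num_blocks)]


def power_fluctuation_ratios(powers, eps=1e-12):
    """Ratio between adjacent short-block powers (assumed form: P(j)/P(i))."""
    return [(powers[i + 1] + eps) / (powers[i] + eps)
            for i in range(len(powers) - 1)]
```

With four short blocks this yields three ratios, corresponding to the block pairs (1, 2), (2, 3), and (3, 4).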
- autocorrelation calculation section 403 obtains autocorrelation from the short block input signal.
- autocorrelation calculation section 403 outputs this autocorrelation to k parameter calculation section 404.
- the k parameter calculation unit 404 calculates the k parameter from the autocorrelation function by a known method such as the Levinson algorithm. Note that the k parameter calculation unit 404 may obtain LPC coefficients from the autocorrelation function and convert the LPC coefficients into k parameters.
- the predicted gain fluctuation ratio calculation unit 406 calculates the predicted gain fluctuation ratio ⁇(i, j), shown in the following equation, from the predicted gains G(i) and G(j) obtained in the short block i and the short block j.
- the autocorrelation calculation section 403, k parameter calculation section 404, prediction gain calculation section 405, and prediction gain fluctuation ratio calculation section 406 may be part of the function of calculation section 103 shown in FIG. 1.
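As a rough sketch of this step: for linear prediction, the prediction gain follows from the k parameters as the reciprocal of the product of (1 - k²) terms, which is a standard result. The equation for the fluctuation ratio is not reproduced in this text, so the form G(j)/G(i) below is an assumption, and both function names are hypothetical.

```python
def prediction_gain(ks, eps=1e-12):
    """Prediction gain from k parameters: G = 1 / prod(1 - k_m^2)."""
    residual = 1.0
    for k in ks:
        residual *= (1.0 - k * k)
    return 1.0 / max(residual, eps)   # guard against |k| == 1


def gain_fluctuation_ratio(gain_i, gain_j):
    """Assumed form of the fluctuation ratio between short blocks i and j."""
    return gain_j / gain_i
```

A block that is easy to predict (strongly tonal) gives a large gain, while a noise-like block gives a gain near 1, so a jump in the ratio signals a change in spectral character even when the power stays flat.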
- the power fluctuation ratio ⁇(i, j) and the predicted gain fluctuation ratio ⁇(i, j) are input to the block length determination unit 407.
- FIG. 5 is a flowchart of the operation of the block length determination method performed by the first embodiment of the audio encoding device of the present invention.
- the fact that the block length determination unit selects a long block means that the block length determination unit selects encoding by the long block.
- that the block length determination unit selects a short block means that the block length determination unit selects encoding with a short block. That is, that the block length determination unit selects a block means that the block length determination unit selects encoding by the block.
- the block length determination unit 407 includes a threshold TH for the power fluctuation ratio and a predicted gain fluctuation ratio.
- the block length determination unit 407 uses the threshold TH among ⁇ (1, 2), ⁇ (2, 3), ⁇ (3, 4).
- the block length determination unit 407 has a threshold TH among ⁇ (1, 2), ⁇ (2, 3), and ⁇ (3, 4).
- the short block is selected (S504, S505, S506, S508), otherwise the long block is selected (S507).
- the block length determination unit 407 selects a short block only when one of the power fluctuation ratio and the predicted gain fluctuation ratio in the frame exceeds a preset threshold. Otherwise, it selects the long block.
- the selector 408 and the selector 411 each select a block length to be used based on the determination result of the block length determination unit 407.
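The decision rule can be sketched as a small function. The names `select_block_length`, `th_power`, and `th_gain` are hypothetical, and the threshold values in the usage note are arbitrary; the text does not give concrete values.

```python
def select_block_length(power_ratios, gain_ratios, th_power, th_gain):
    """Return "short" if any fluctuation ratio in the frame exceeds its
    threshold, otherwise "long"."""
    if any(r > th_power for r in power_ratios):
        return "short"
    if any(r > th_gain for r in gain_ratios):
        return "short"
    return "long"
```

For instance, `select_block_length([1.0, 1.1, 0.9], [1.0, 3.5, 1.0], 2.0, 2.0)` selects a short block on the strength of the predicted gain fluctuation ratio alone, which is the situation the description later illustrates with section A of FIG. 14.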
- the block length determination unit 407 selects a short block
- the input signal is input to the MDCT conversion unit 410 for the short block.
- the MDCT conversion unit 410 for short blocks outputs MDCT coefficients corresponding to the number of short blocks. That is, when one frame is divided into four short blocks, the short block MDCT conversion unit 410 outputs four sets of MDCT coefficients.
- the psychoacoustic analysis unit 412 obtains a masking threshold value from the input signal.
- the input signal output from the framing unit 401 is input to the psychoacoustic analysis unit 412.
- the psychoacoustic analysis unit 412 obtains a masking threshold for the long block.
- the psychoacoustic analysis unit 412 obtains a masking threshold for the short block when the block length determination unit 407 selects the short block.
- an arbitrary method can be used as the masking threshold calculation method.
- the psychoacoustic analysis unit 412 can use the method disclosed in Non-Patent Document 1. That is, the psychoacoustic analysis unit 412 performs FFT analysis on the input signal. Then, the psychoacoustic analysis unit 412 obtains an FFT spectrum. Then, the psychoacoustic analysis unit 412 calculates a masking threshold from the FFT spectrum.
- the MDCT coefficient and the masking threshold are input to the quantization unit 413.
- the quantization unit 413 quantizes the MDCT coefficient for each frequency band according to the input masking threshold.
- the quantization unit 413 outputs a quantization code 1 obtained by quantizing the MDCT coefficient.
- the quantization code 1 is input to the Huffman encoding unit 414. Then, the Huffman encoding unit 414 converts the quantization code 1 into the quantization code 2, from which redundancy is further removed than from the quantization code 1.
- the Huffman encoding unit 414 outputs the quantization code 2 to the quantization control unit 416.
- the quantization control unit 416 calculates the total number of bits of the bit stream that is finally output from the input quantization code 2.
- a range surrounded by a dotted line is a range that can be controlled by the quantization control unit 416.
- the quantization control unit 416 controls the quantization unit 413 and the Huffman encoding unit 414 so that the processing (8) and the processing (10) are repeated.
- the quantization control unit 416 causes the Huffman encoding unit 414 to output the quantization code 2 to the bit stream generation unit 415.
- the quantization control unit 416 controls the bit stream generation unit 415 to output a bit stream.
- since this embodiment determines the block length before MDCT conversion, it can encode a high-quality audio signal with a smaller processing amount than the first conventional technology. Further, since this embodiment determines the block length using the power fluctuation ratio and the predicted gain fluctuation ratio, it determines the block length more accurately than the second conventional technique and can encode an audio signal with higher quality than the prior art.
- the block length to be encoded is determined before MDCT conversion and psychoacoustic analysis. Therefore, this embodiment can perform high-quality encoding with a small amount of processing compared to the first prior art. Furthermore, this embodiment uses a power fluctuation ratio and a predicted gain fluctuation ratio in the block length determination means. Therefore, this embodiment can determine the block length with higher accuracy than the second prior art.
- FIG. 14 shows graphs of calculation results of the power fluctuation ratio and the predicted gain fluctuation ratio.
- for the input signal shown in Fig. 14 (a), the power fluctuation ratio shows almost no change in section A, staying near 0 (Fig. 14 (b)).
- the input signal shown in Fig. 14 (a) has a large fluctuation in the predicted gain fluctuation ratio in section A (Fig. 14 (c)).
- both the power fluctuation ratio and the predicted gain fluctuation ratio are calculated.
- a short block is selected when one of the power fluctuation ratio and the predicted gain fluctuation ratio exceeds a threshold value. Therefore, in this embodiment, the block length can be accurately determined even with an input signal such as section A shown in FIG. 14.
- FIG. 6 is a configuration diagram of the second embodiment of the audio encoding device of the present invention. This embodiment differs from the first embodiment in that it dynamically changes the threshold for the power fluctuation ratio and the threshold for the predicted gain fluctuation ratio. The other parts are the same as in the first embodiment.
- in general, short blocks are often selected in regions where the signal changes rapidly, such as attack sounds.
- the attack sound has a large MDCT spectrum amplitude over a wide frequency range. Therefore, an attack sound requires a large number of quantization bits when it is encoded.
- the threshold for the power fluctuation ratio and the threshold for the predicted gain fluctuation ratio are increased for a certain period of time thereafter. As a result, in this embodiment, short blocks are, as far as possible, not selected consecutively.
- the operation of framing section 601 shown in FIG. 6 is the same as the operation of framing section 401 shown in FIG. 4, and the operation of power calculation section 602 is the same as the operation of the power calculation section 402 shown in FIG. 4.
- the operation of the autocorrelation calculation unit 603 is the same as the operation of the autocorrelation calculation unit 403 shown in FIG. 4, and the operation of the k parameter calculation unit 604 is the same as the operation of the k parameter calculation unit 404 shown in FIG. 4.
- the operation of the prediction gain calculation unit 605 is the same as the operation of the prediction gain calculation unit 405 shown in FIG. 4.
- the operation of the predicted gain fluctuation ratio calculation unit 606 is the same as the operation of the prediction gain fluctuation ratio calculation unit 406 shown in FIG. 4, and the operation of the selector 609 is the same as the operation of the selector 408 shown in FIG. 4.
- the operation of the long block MDCT conversion unit 610 is the same as that of the long block MDCT conversion unit 409 shown in FIG. 4.
- the operation of the MDCT conversion unit 611 for short blocks is the same as the operation of the MDCT conversion unit 410 for short blocks shown in FIG. 4, and the operation of the selector 612 is the same as the operation of the selector 411 shown in FIG. 4.
- the operation of the psychoacoustic analysis unit 613 is the same as the operation of the psychoacoustic analysis unit 412 shown in FIG. 4, and the operation of the quantization unit 614 is the same as the operation of the quantization unit 413 shown in FIG. 4.
- the operation of the Huffman encoder 615 is the same as the operation of the Huffman encoder 414 shown in FIG. 4, and the operation of the bitstream generator 616 is the same as the operation of the bit stream generation unit 415 shown in FIG. 4.
- the operation of the quantization control unit 617 is the same as the operation of the quantization control unit 416 shown in FIG. 4. In FIG. 6, a range surrounded by a dotted line is a range that can be controlled by the quantization control unit 617.
- the block length determination unit 607 shown in FIG. 6 receives the threshold value determined by the threshold value determination unit 608. Further, the block length determination unit 607 outputs the block length determination result to the selector 609, the selector 612, and the threshold value determination unit 608.
- the threshold determination unit 608 determines a threshold based on the determination result output from the block length determination unit 607. That is, the threshold value determination unit 608 outputs the increased threshold value when the determination result output from the block length determination unit 607 is a determination result for selecting a short block. Further, the block length determination unit 607 performs determination processing based on the threshold value received from the threshold value determination unit 608. Except that the thresholds can be changed, the determination process in the block length determination unit 607 is the same as that shown in FIG. 5. Further, the threshold determination unit 608 may be a part of the function of the calculation unit 103 shown in FIG. 1.
- FIG. 7 is a graph showing the threshold control operation in the threshold value determination unit of the second embodiment of the audio encoding device of the present invention.
- the threshold TH is changed to TH + a. Where h> 0.
- short block is selected, the threshold TH is changed to TH + a. Where h> 0.
- the threshold TH is changed to TH + ⁇ .
- the threshold value is changed to the original value (initial value) TH, TH.
- the threshold for the power fluctuation ratio and the threshold for the predicted gain fluctuation ratio are increased for a certain period of time so that short blocks are, as far as possible, not selected consecutively.
- the present embodiment can obtain the same effects as those of the first embodiment described above. Further, in the present embodiment, once a short block is selected, the threshold value is controlled so that the short block is not selected for a certain time thereafter. For this reason, in this embodiment, it is possible to reduce deterioration in sound quality caused by continuously selecting short blocks.
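The threshold control of this embodiment can be sketched as a small state machine. This is a minimal sketch under assumptions: the class name, the step-shaped increase, and measuring the "certain time" as a count of frames (`hold_frames`) are all hypothetical choices; the curve in FIG. 7 may differ.

```python
class ThresholdController:
    """Raise both thresholds for a fixed number of frames after a short
    block is selected, then return them to their initial values."""

    def __init__(self, th_power, th_gain, boost_power, boost_gain, hold_frames):
        self.th_power = th_power        # initial power-ratio threshold
        self.th_gain = th_gain          # initial gain-ratio threshold
        self.boost_power = boost_power  # increment, must be > 0
        self.boost_gain = boost_gain    # increment, must be > 0
        self.hold_frames = hold_frames  # assumed unit of the hold time
        self.remaining = 0

    def current(self):
        """Thresholds to use for the next block length decision."""
        if self.remaining > 0:
            return (self.th_power + self.boost_power,
                    self.th_gain + self.boost_gain)
        return (self.th_power, self.th_gain)

    def update(self, selected_short):
        """Feed back the latest decision of the block length determination."""
        if selected_short:
            self.remaining = self.hold_frames   # (re)start the hold period
        elif self.remaining > 0:
            self.remaining -= 1
```

Calling `update(True)` after a short-block decision makes `current()` return the raised thresholds until `hold_frames` long-block frames have passed.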
- the short block is not selected for a certain period of time.
- the threshold is set to ⁇ + H + Himawari
- the threshold value is based on ⁇ .
- the third embodiment is different from the first embodiment described above in that the predicted gain fluctuation ratio is obtained in units of frames. That is, in the present embodiment, a predetermined number of blocks for calculating power are used as one block, and the predicted gain fluctuation ratio of this one block is calculated.
- LPC analysis is performed for each short block. Therefore, the first embodiment can accurately calculate the predicted gain fluctuation ratio.
- the number of executions of LPC analysis increases, so the amount of processing also increases.
- LPC analysis is performed once for each long block. Therefore, this embodiment can further reduce the amount of calculation compared to the first embodiment.
- FIG. 8 is a conceptual diagram of a method for obtaining a predicted gain fluctuation ratio and a power fluctuation ratio in the third embodiment of the audio encoding device of the present invention.
- the prediction gain is obtained from the k parameter obtained by performing the LPC analysis for each short block.
- the prediction gain fluctuation ratio is calculated based on the ratio to the prediction gain obtained in the same manner in the immediately preceding short block.
- this embodiment performs LPC analysis on the input signal of one long block (the nth frame) to obtain the k parameter. That is, the k parameter calculation unit performs LPC analysis on the input signal of one long block (nth frame) to obtain the k parameter.
- the prediction gain G (n) is calculated from the k parameter.
- from the prediction gain G(n - 1), obtained in the same manner in the previous frame (the (n - 1)th frame), and G(n), the predicted gain fluctuation ratio is calculated using the following equation.
- as in the first embodiment, the present embodiment calculates the power fluctuation ratios ⁇(1, 2), ⁇(2, 3), ⁇(3, 4) for each short block.
- this embodiment determines the optimum block length from the calculated predicted gain fluctuation ratio and power fluctuation ratio. Hereinafter, this determination operation will be described.
- the block length determination unit determines that ⁇ ( ⁇ ) is greater than a predetermined threshold value ⁇ .
- the block length judgment unit is one of ⁇ (1, 2), ⁇ (2, 3), ⁇ (3, 4).
- the block length determination unit selects the long block when the short block is not selected in either (1) or (2).
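The frame-level variant can be sketched as follows. It keeps the previous frame's prediction gain so that only one LPC analysis per long block is needed. The class name, the ratio form G(n)/G(n - 1), and the treatment of the very first frame (no gain ratio available) are assumptions made for illustration.

```python
class FrameLevelDetector:
    """Block length decision using one prediction gain per frame plus the
    short-block power fluctuation ratios."""

    def __init__(self, th_gain, th_power):
        self.th_gain = th_gain
        self.th_power = th_power
        self.prev_gain = None   # G(n - 1); unknown before the first frame

    def decide(self, frame_gain, power_ratios):
        gain_ratio = None
        if self.prev_gain is not None:
            gain_ratio = frame_gain / self.prev_gain   # assumed ratio form
        self.prev_gain = frame_gain
        # (1) frame-level predicted gain fluctuation ratio over threshold
        if gain_ratio is not None and gain_ratio > self.th_gain:
            return "short"
        # (2) any short-block power fluctuation ratio over threshold
        if any(r > self.th_power for r in power_ratios):
            return "short"
        return "long"
```

Compared with a per-short-block analysis, this trades some time resolution in the gain measure for fewer LPC analyses per frame.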
- the configuration and processing contents after selecting a block length are the same as those in the first embodiment. Therefore, the description of the configuration and processing contents after selecting the block length of this embodiment will be omitted.
- the present embodiment can obtain the same effects as those of the first embodiment of the present invention described above. Furthermore, in this embodiment, the block length can be selected with a smaller processing amount than in the first embodiment by performing the LPC analysis only once for the long block.
- the block for calculating the prediction gain is not limited to the case where a block of one frame is used. The prediction gain may be calculated. Even in this case, the present embodiment can obtain the same effects as described above.
- this embodiment is different from the first embodiment in the method of calculating the power fluctuation ratio performed by dividing one frame into eight short blocks. That is, in this embodiment, a predetermined number of blocks for calculating the prediction gain are used as one block, and the power fluctuation ratio of this one block is calculated.
- FIG. 9 is a conceptual diagram showing a method for calculating the power fluctuation ratio in the fourth embodiment of the audio encoding device of the present invention.
- one frame is divided into eight short blocks, and the power fluctuation ratio is calculated.
- this embodiment does not calculate one power fluctuation ratio for one short block as in the first embodiment. That is, this embodiment is different from the first embodiment in that the power fluctuation ratio is obtained from a plurality of adjacent short blocks.
- the calculation method of the power fluctuation ratio of this embodiment is shown below.
- power P (1) is obtained from the first and second short blocks.
- the power P (2) is obtained from the third and fourth short blocks.
- power P (3) is obtained from the fifth and sixth short blocks.
- power P (4) is obtained from the seventh and eighth short blocks.
- the power fluctuation ratio ⁇(1, 2) is obtained from P(1) and P(2).
- the power fluctuation ratio ⁇ (2, 3) is obtained from P (2) and P (3).
- this embodiment the power fluctuation ratio ⁇ (2, 3) is obtained from P (2) and P (3).
- this embodiment is different from the first embodiment in that the power of two short blocks is obtained. That is, in the first embodiment, 8 predicted gain fluctuation ratios and ⁇ power fluctuation ratios are calculated, whereas in this embodiment, 8 predicted gain fluctuation ratios and only 4 power fluctuation ratios are calculated. That is, in the present embodiment, the number of predicted gain fluctuation ratios and power fluctuation ratios calculated within one frame may be different. Since the other parts of the present embodiment are the same as those of the first embodiment, description thereof will be omitted.
- this embodiment can obtain the same effects as those of the first embodiment of the present invention described above. Furthermore, in the present embodiment, by calculating the power of two short blocks, the calculation amount of the power calculation process can be reduced as compared with the first embodiment. Note that the present embodiment is not limited to the case where two short blocks are used as power calculation blocks, but the power may be calculated using any number of three or more short blocks. Even in this case, an effect similar to the above effect can be obtained.
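The grouped power calculation can be sketched as follows, assuming mean-square power and equal-sized short blocks; the function name and parameters are hypothetical.

```python
def grouped_powers(frame, num_short_blocks=8, group=2):
    """Power computed over runs of `group` adjacent short blocks.

    With 8 short blocks and group=2 this returns 4 powers: P(1) from
    short blocks 1-2, P(2) from 3-4, P(3) from 5-6, P(4) from 7-8.
    """
    short_size = len(frame) // num_short_blocks
    size = short_size * group
    count = num_short_blocks // group
    return [sum(s * s for s in frame[g * size:(g + 1) * size]) / size
            for g in range(count)]
```

Setting `group=1` reduces to one power per short block, and a larger `group` covers the case of three or more short blocks per power calculation block.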
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Computational Linguistics (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Compression, Expansion, Code Conversion, And Decoders (AREA)
Priority Applications (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2006527708A JP4533386B2 (ja) | 2004-07-22 | 2004-07-22 | オーディオ符号化装置及びオーディオ符号化方法 |
PCT/JP2004/010416 WO2006008817A1 (ja) | 2004-07-22 | 2004-07-22 | オーディオ符号化装置及びオーディオ符号化方法 |
EP04770880A EP1775718A4 (en) | 2004-07-22 | 2004-07-22 | AUDIOCODING DEVICE AND AUDIOCODING METHOD |
US11/654,679 US20070118368A1 (en) | 2004-07-22 | 2007-01-18 | Audio encoding apparatus and audio encoding method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/JP2004/010416 WO2006008817A1 (ja) | 2004-07-22 | 2004-07-22 | オーディオ符号化装置及びオーディオ符号化方法 |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US11/654,679 Continuation US20070118368A1 (en) | 2004-07-22 | 2007-01-18 | Audio encoding apparatus and audio encoding method |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2006008817A1 (ja) | 2006-01-26 |
Family
ID=35784953
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2004/010416 WO2006008817A1 (ja) | 2004-07-22 | 2004-07-22 | オーディオ符号化装置及びオーディオ符号化方法 |
Country Status (4)
Country | Link |
---|---|
US (1) | US20070118368A1 (ja) |
EP (1) | EP1775718A4 (ja) |
JP (1) | JP4533386B2 (ja) |
WO (1) | WO2006008817A1 (ja) |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2007286146A (ja) * | 2006-04-13 | 2007-11-01 | Nippon Telegr & Teleph Corp <Ntt> | 適応ブロック長符号化装置、その方法、プログラム及び記録媒体 |
JP2007286200A (ja) * | 2006-04-13 | 2007-11-01 | Nippon Telegr & Teleph Corp <Ntt> | 適応ブロック長符号化装置、その方法、プログラム及び記録媒体 |
JP2008102520A (ja) * | 2006-10-18 | 2008-05-01 | Polycom Inc | オーディオ信号の2重変換符号化 |
JP2011509426A (ja) * | 2008-01-04 | 2011-03-24 | ドルビー・インターナショナル・アーベー | オーディオエンコーダおよびデコーダ |
US7966175B2 (en) | 2006-10-18 | 2011-06-21 | Polycom, Inc. | Fast lattice vector quantization |
CN102243872A (zh) * | 2010-05-10 | 2011-11-16 | 炬力集成电路设计有限公司 | 对音频数字信号进行编码、解码的方法及系统 |
JP2018056877A (ja) * | 2016-09-30 | 2018-04-05 | 株式会社モバイルテクノ | 信号圧縮装置、信号伸長装置、信号圧縮プログラム、信号伸長プログラム及び通信装置 |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20090144054A1 (en) * | 2007-11-30 | 2009-06-04 | Kabushiki Kaisha Toshiba | Embedded system to perform frame switching |
WO2010102446A1 (zh) | 2009-03-11 | 2010-09-16 | 华为技术有限公司 | 一种线性预测分析方法、装置及系统 |
CN102930871B (zh) * | 2009-03-11 | 2014-07-16 | 华为技术有限公司 | 一种线性预测分析方法、装置及系统 |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH06259098A (ja) * | 1993-03-08 | 1994-09-16 | Pioneer Electron Corp | 適応ブロック長変換符号化のブロック長選択装置 |
JPH0766733A (ja) | 1993-08-25 | 1995-03-10 | Victor Co Of Japan Ltd | 音声高能率符号化装置 |
JPH09232964A (ja) * | 1996-02-20 | 1997-09-05 | Nippon Steel Corp | ブロック長可変型変換符号化装置および過渡状態検出装置 |
JP2000500247A (ja) * | 1996-07-11 | 2000-01-11 | フラオホッフェル―ゲゼルシャフト ツル フェルデルング デル アンゲヴァンドテン フォルシュング エー.ヴェー. | 可聴信号のコーディングおよびデコーディング方法 |
JP2000134106A (ja) * | 1998-10-29 | 2000-05-12 | Matsushita Electric Ind Co Ltd | オーディオ変換符号化のための周波数領域でのブロックサイズ判定適応方法 |
JP2000206990A (ja) * | 1999-01-12 | 2000-07-28 | Ricoh Co Ltd | デジタル音響信号符号化装置、デジタル音響信号符号化方法及びデジタル音響信号符号化プログラムを記録した媒体 |
JP2001343997A (ja) * | 2000-05-30 | 2001-12-14 | Ricoh Co Ltd | デジタル音響信号符号化装置、方法及び記録媒体 |
JP2003195881A (ja) * | 2001-12-28 | 2003-07-09 | Victor Co Of Japan Ltd | 周波数変換ブロック長適応変換装置及びプログラム |
JP2003233400A (ja) * | 2002-02-08 | 2003-08-22 | Ntt Docomo Inc | 復号装置、符号化装置、復号方法、及び、符号化方法 |
JP2004054156A (ja) * | 2002-07-24 | 2004-02-19 | Victor Co Of Japan Ltd | 音響信号符号化方法及び音響信号符号化装置 |
Family Cites Families (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
TW271524B (ja) * | 1994-08-05 | 1996-03-01 | Qualcomm Inc | |
US5774837A (en) * | 1995-09-13 | 1998-06-30 | Voxware, Inc. | Speech coding system and method using voicing probability determination |
US5848391A (en) * | 1996-07-11 | 1998-12-08 | Fraunhofer-Gesellschaft Zur Forderung Der Angewandten Forschung E.V. | Method subband of coding and decoding audio signals using variable length windows |
WO2001022401A1 (en) * | 1999-09-20 | 2001-03-29 | Koninklijke Philips Electronics N.V. | Processing circuit for correcting audio signals, receiver, communication system, mobile apparatus and related method |
DE60208426T2 (de) * | 2001-11-02 | 2006-08-24 | Matsushita Electric Industrial Co., Ltd., Kadoma | Vorrichtung zur signalkodierung, signaldekodierung und system zum verteilen von audiodaten |
US7460993B2 (en) * | 2001-12-14 | 2008-12-02 | Microsoft Corporation | Adaptive window-size selection in transform coding |
US7389226B2 (en) * | 2002-10-29 | 2008-06-17 | Ntt Docomo, Inc. | Optimized windows and methods therefore for gradient-descent based window optimization for linear prediction analysis in the ITU-T G.723.1 speech coding standard |
TWI275074B (en) * | 2004-04-12 | 2007-03-01 | Vivotek Inc | Method for analyzing energy consistency to process data |
US7177804B2 (en) * | 2005-05-31 | 2007-02-13 | Microsoft Corporation | Sub-band voice codec with multi-stage codebooks and redundant coding |
-
2004
- 2004-07-22 JP JP2006527708A patent/JP4533386B2/ja not_active Expired - Fee Related
- 2004-07-22 EP EP04770880A patent/EP1775718A4/en not_active Withdrawn
- 2004-07-22 WO PCT/JP2004/010416 patent/WO2006008817A1/ja active Application Filing
-
2007
- 2007-01-18 US US11/654,679 patent/US20070118368A1/en not_active Abandoned
Non-Patent Citations (2)
Title |
---|
SEAN A RAMPRASHAD: "The Multi Mode Transform Predictive Coding Paradigm", IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, vol. 11, 2 March 2003 (2003-03-02) |
See also references of EP1775718A4 |
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2007286146A (ja) * | 2006-04-13 | 2007-11-01 | Nippon Telegr & Teleph Corp <Ntt> | 適応ブロック長符号化装置、その方法、プログラム及び記録媒体 |
JP2007286200A (ja) * | 2006-04-13 | 2007-11-01 | Nippon Telegr & Teleph Corp <Ntt> | 適応ブロック長符号化装置、その方法、プログラム及び記録媒体 |
JP2008102520A (ja) * | 2006-10-18 | 2008-05-01 | Polycom Inc | オーディオ信号の2重変換符号化 |
US7953595B2 (en) | 2006-10-18 | 2011-05-31 | Polycom, Inc. | Dual-transform coding of audio signals |
US7966175B2 (en) | 2006-10-18 | 2011-06-21 | Polycom, Inc. | Fast lattice vector quantization |
JP2011509426A (ja) * | 2008-01-04 | 2011-03-24 | ドルビー・インターナショナル・アーベー | オーディオエンコーダおよびデコーダ |
US8484019B2 (en) | 2008-01-04 | 2013-07-09 | Dolby Laboratories Licensing Corporation | Audio encoder and decoder |
US8494863B2 (en) | 2008-01-04 | 2013-07-23 | Dolby Laboratories Licensing Corporation | Audio encoder and decoder with long term prediction |
US8924201B2 (en) | 2008-01-04 | 2014-12-30 | Dolby International Ab | Audio encoder and decoder |
US8938387B2 (en) | 2008-01-04 | 2015-01-20 | Dolby Laboratories Licensing Corporation | Audio encoder and decoder |
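CN102243872A (zh) * | 2010-05-10 | 2011-11-16 | 炬力集成电路设计有限公司 | Method and system for encoding and decoding digital audio signals |
JP2018056877A (ja) * | 2016-09-30 | 2018-04-05 | 株式会社モバイルテクノ | Signal compression device, signal decompression device, signal compression program, signal decompression program, and communication device |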
Also Published As
Publication number | Publication date |
---|---|
EP1775718A4 (en) | 2008-05-07 |
EP1775718A1 (en) | 2007-04-18 |
JPWO2006008817A1 (ja) | 2008-05-01 |
US20070118368A1 (en) | 2007-05-24 |
JP4533386B2 (ja) | 2010-09-01 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
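JP6704037B2 (ja) | | Speech encoding device and method |
KR101162572B1 (ko) | | Apparatus and method for encoding and decoding audio data |
US9361900B2 (en) | | Encoding device and method, decoding device and method, and program |
TWI669706B (zh) | | Method, apparatus, and non-transitory computer-readable storage medium for decoding a higher-order Ambisonics representation |
KR100904605B1 (ko) | | Speech encoding device, speech decoding device, speech encoding method, and speech decoding method |
KR100840439B1 (ko) | | Speech encoding device and speech decoding device |
JP5583881B2 (ja) | | Audio signal conversion method and conversion device, and adaptive audio signal encoding method and adaptive encoding device |
WO1998042083A1 (en) | | Audio coding method and apparatus |
US20070118368A1 (en) | | Audio encoding apparatus and audio encoding method |
JP4063508B2 (ja) | | Bit rate conversion device and bit rate conversion method |
JPWO2009057329A1 (ja) | | Encoding device, decoding device, and methods thereof |
EP2439736A1 (en) | | Down-mixing device, encoder, and method therefor |
JP2003316394A (ja) | | Speech decoding system, speech decoding method, and speech decoding program |
KR101387808B1 (ko) | | High-quality multi-object audio encoding and decoding apparatus using residual signal coding with variable bit rate |
JP2006003580A (ja) | | Audio signal encoding device and audio signal encoding method |
JP4699117B2 (ja) | | Signal encoding device, signal decoding device, signal encoding method, and signal decoding method |
JP2003233397A (ja) | | Audio encoding device, audio encoding program, and audio encoded data transmission device |
KR100880995B1 (ko) | | Audio encoding apparatus and audio encoding method |
JP4625709B2 (ja) | | Stereo audio signal encoding device |
JP4273062B2 (ja) | | Encoding method, encoding device, decoding method, and decoding device |
JP4721355B2 (ja) | | Method and device for converting the coding rule of encoded data |
JP2007304258A (ja) | | Audio signal encoding and decoding devices, methods, and programs |
JP2003271199A (ja) | | Audio signal encoding method and encoding device |
JP2008268792A (ja) | | Audio signal encoding device and bit rate conversion device therefor |
JP2006262295A (ja) | | Encoding device, decoding device, encoding method, and decoding method |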
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | AK | Designated states | Kind code of ref document: A1; Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NA NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW |
| | AL | Designated countries for regional patents | Kind code of ref document: A1; Designated state(s): BW GH GM KE LS MW MZ NA SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LU MC NL PL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG |
| | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | |
| | WWE | WIPO information: entry into national phase | Ref document number: 2006527708; Country of ref document: JP |
| | WWE | WIPO information: entry into national phase | Ref document number: 11654679; Country of ref document: US |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | WWE | WIPO information: entry into national phase | Ref document number: 2004770880; Country of ref document: EP |
| | WWE | WIPO information: entry into national phase | Ref document number: 1020077001898; Country of ref document: KR |
| | WWP | WIPO information: published in national office | Ref document number: 1020077001898; Country of ref document: KR |
| | WWP | WIPO information: published in national office | Ref document number: 2004770880; Country of ref document: EP |
| | WWP | WIPO information: published in national office | Ref document number: 11654679; Country of ref document: US |