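WO2013176177A1 - Encoding method, decoding method, encoding device, decoding device, program, and recording medium - Google Patents
Encoding method, decoding method, encoding device, decoding device, program, and recording medium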
- Publication number
- WO2013176177A1 (application PCT/JP2013/064209)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- pitch period
- frequency domain
- domain pitch
- sample
- interval
- Prior art date
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/002—Dynamic bit allocation
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/0017—Lossless audio signal coding; Perfect reconstruction of coded audio signal by transmission of coding error
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/02—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
- G10L19/0212—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders using orthogonal transformation
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/02—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
- G10L19/032—Quantisation or dequantisation of spectral components
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/04—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
- G10L19/08—Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/04—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
- G10L19/08—Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters
- G10L19/09—Long term prediction, i.e. removing periodical redundancies, e.g. by using adaptive codebook or pitch predictor
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/90—Pitch determination of speech signals
- G10L2025/903—Pitch determination of speech signals using a laryngograph
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/90—Pitch determination of speech signals
- G10L2025/906—Pitch tracking
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/90—Pitch determination of speech signals
Definitions
- The present invention relates to a technique for encoding acoustic signals and to a technique for decoding the code strings obtained by that encoding. More specifically, it relates to encoding and decoding a frequency-domain sample sequence obtained by transforming an acoustic signal into the frequency domain.
- Adaptive coding of orthogonal transform coefficients, such as DFT (Discrete Fourier Transform) and MDCT (Modified Discrete Cosine Transform) coefficients, is known as a coding method for speech and acoustic signals at low bit rates (for example, about 10 kbit/s to 20 kbit/s).
- AMR-WB+ (Extended Adaptive Multi-Rate Wideband)
- TCX (transform coded excitation)
- TwinVQ (Transform domain Weighted Interleave Vector Quantization)
- A collection of samples obtained by rearranging the entire set of MDCT coefficients according to a fixed rule is encoded as a vector.
- For example, a method may be employed in which the large components occurring at every time-domain pitch period are extracted from the MDCT coefficients, information corresponding to the time-domain pitch period is encoded, the MDCT coefficient sequence that remains after those large components are removed is rearranged, and the rearranged MDCT coefficient sequence is encoded by vector quantization for each predetermined number of samples.
- Non-Patent Documents 1 and 2 are examples of documents on TwinVQ.
- Patent Document 1 is an example of a technique that extracts and encodes samples at regular intervals.
- AMR-WB+ and other TCX-based coding schemes do not take into account the periodicity-based variation in the amplitude of the frequency-domain sample sequence, so sample sequences with large amplitude variation are encoded together and coding efficiency drops. To improve coding efficiency, it is effective to encode each group of samples with small amplitude variation according to a different criterion, based on the pitch period of the frequency-domain sample sequence.
- An object of the present invention is to provide a new technique that can efficiently determine and encode the pitch period of the frequency-domain sample sequence at encoding time and identify that pitch period at decoding time.
- In encoding, the time-domain pitch period $L$ corresponding to the time-domain pitch period code of the acoustic signal in a predetermined time interval is used. The frequency-domain sample interval corresponding to $L$ is obtained as the conversion interval $T_1$, the frequency-domain pitch period $T$ is determined from candidate values comprising $T_1$ and integer multiples $U \times T_1$ of $T_1$, and a frequency-domain pitch period code indicating what multiple of $T_1$ the frequency-domain pitch period $T$ is, is obtained. The frequency-domain pitch period code is output so that the decoding side can identify $T$.
- Because the frequency-domain pitch period $T$ is searched for among integer multiples of the conversion interval, the computation needed for the search is small. Furthermore, because the information that identifies $T$ is the multiple of the conversion interval that $T$ represents, the code amount of the frequency-domain pitch period code can be kept small. It is therefore possible to efficiently determine and encode the pitch period of the frequency-domain sample sequence at encoding time and to identify it at decoding time.
- Block diagram of the encoding device of an embodiment. Block diagram of the decoding device of an embodiment.
- Block diagram of the encoding device of an embodiment. Block diagram of the decoding device of an embodiment.
- Diagram illustrating a variable-length codebook of an embodiment. Diagram illustrating a variable-length codebook of an embodiment.
- Block diagram of the encoding device of an embodiment. Block diagram of the decoding device of an embodiment.
- Encoding device 11: The encoding process performed by the encoding device 11 is described with reference to the figure. Each unit of the encoding device 11 performs the following operations in units of frames, which are predetermined time intervals. In the following description, the number of samples per frame is $N_t$, and the digital acoustic signal for one frame is the digital acoustic signal sequence $x(1), \ldots, x(N_t)$.
- In units of frames, the long-term prediction analysis unit 111 obtains the time-domain pitch period $L$ corresponding to the input digital acoustic signal sequence $x(1), \ldots, x(N_t)$ (step S111-1), calculates the pitch gain $g_p$ corresponding to $L$ (step S111-2), and obtains and outputs long-term prediction selection information indicating whether long-term prediction is to be executed, based on $g_p$ (step S111-3). If the long-term prediction selection information indicates that long-term prediction is to be executed, the unit additionally outputs at least the time-domain pitch period $L$ and the time-domain pitch period code $C_L$ that identifies $L$ (step S111-4).
- Step S111-1: Time-domain pitch period $L$
- For example, from among predetermined time-domain pitch period candidates $\tau$, the long-term prediction analysis unit 111 selects the candidate $\tau$ that maximizes the value given by equation (A1) as the time-domain pitch period $L$ corresponding to the digital acoustic signal sequence $x(1), \ldots, x(N_t)$.
- The candidates $\tau$ and the time-domain pitch period $L$ may be expressed with integers only (integer precision) or with integers and fractional values (fractional precision).
- For a candidate $\tau$ with fractional precision, $x(t-\tau)$ is obtained using an interpolation filter that performs a weighted average over a plurality of digital acoustic signal samples.
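The search in step S111-1 can be sketched as follows. Equation (A1) is not reproduced in this text, so the sketch assumes it is the plain correlation sum of $x(t)\,x(t-\tau)$ over the samples of the frame for which $x(t-\tau)$ exists. The function name `select_time_domain_pitch` is hypothetical, and only integer-precision candidates are handled (the interpolation filter for fractional candidates is not shown).

```python
import numpy as np

def select_time_domain_pitch(x, candidates):
    """Return the integer lag tau that maximizes sum_t x(t) * x(t - tau).

    Assumption: equation (A1) is taken to be this correlation sum,
    restricted to samples inside the current frame.
    """
    x = np.asarray(x, dtype=float)
    best_tau, best_score = None, -np.inf
    for tau in candidates:
        # x[tau:] is x(t); x[:-tau] is x(t - tau) for the same t
        score = float(np.dot(x[tau:], x[:-tau]))
        if score > best_score:
            best_tau, best_score = tau, score
    return best_tau

# Usage: a sinusoid with a period of 40 samples, candidate lags 20..100
frame = np.sin(2 * np.pi * np.arange(400) / 40)
L = select_time_domain_pitch(frame, range(20, 101))
```

Because fewer products are summed for larger lags, this form of the sum tends to favor the smaller of two equally periodic candidates.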
- Step S111-2: Pitch gain $g_p$
- The long-term prediction analysis unit 111 calculates the pitch gain $g_p$ by equation (A2).
- Step S111-3: Long-term prediction selection information
- When the pitch gain $g_p$ is equal to or greater than a predetermined value, the long-term prediction analysis unit 111 obtains and outputs long-term prediction selection information indicating that long-term prediction is to be executed. When $g_p$ is less than that predetermined value, it obtains and outputs long-term prediction selection information indicating that long-term prediction is not to be executed.
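Steps S111-2 and S111-3 can be illustrated with the following minimal sketch. Equation (A2) is not reproduced in this text; the sketch assumes the commonly used normalized correlation at lag $L$. The function names and the threshold value 0.5 are hypothetical, since the text only says "a predetermined value".

```python
import numpy as np

def pitch_gain(x, L):
    """Pitch gain at integer lag L (assumed form of equation (A2)):
    sum x(t) x(t-L) / sum x(t-L)^2, over samples inside the frame."""
    x = np.asarray(x, dtype=float)
    num = float(np.dot(x[L:], x[:-L]))
    den = float(np.dot(x[:-L], x[:-L]))
    return num / den if den > 0.0 else 0.0

def use_long_term_prediction(g_p, threshold=0.5):
    """Long-term prediction selection: execute when g_p >= threshold.
    The threshold value is a placeholder, not taken from the source."""
    return g_p >= threshold

# Usage: a perfectly periodic frame gives a gain of 1
periodic = np.tile(np.array([1.0, -2.0, 0.5, 3.0]), 10)
g_p = pitch_gain(periodic, 4)
selected = use_long_term_prediction(g_p)
```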
- Step S111-4: When long-term prediction is to be executed, the long-term prediction analysis unit 111 performs the following.
- The long-term prediction analysis unit 111 stores the predetermined time-domain pitch period candidates $\tau$, each assigned an index that uniquely identifies it.
- The long-term prediction analysis unit 111 selects the index that identifies the candidate $\tau$ chosen as the time-domain pitch period $L$ as the time-domain pitch period code $C_L$ that identifies $L$. In addition to the long-term prediction selection information described above, the long-term prediction analysis unit 111 then outputs the time-domain pitch period $L$ and the time-domain pitch period code $C_L$.
- When the long-term prediction analysis unit 111 also outputs a quantized pitch gain $\hat{g}_p$ and a pitch gain code $C_{gp}$, it stores predetermined pitch gain candidates, each assigned an index that uniquely identifies it. The long-term prediction analysis unit 111 selects, as the pitch gain code $C_{gp}$ that identifies the quantized pitch gain $\hat{g}_p$, the index of the pitch gain candidate closest to the pitch gain $g_p$.
- In units of frames, the long-term prediction residual generation unit 112 generates and outputs a long-term prediction residual signal sequence obtained by removing the long-term predicted signal from the input digital acoustic signal sequence. For example, the long-term prediction residual signal sequence $x_p(1), \ldots, x_p(N_t)$ is generated by calculating equation (A3) from the input digital acoustic signal sequence $x(1), \ldots, x(N_t)$, the time-domain pitch period $L$, and the quantized pitch gain $\hat{g}_p$.
- When no quantized pitch gain is available, a predetermined value such as 0.5 is used as $\hat{g}_p$.
- $x_p(t) = x(t) - \hat{g}_p \, x(t-L)$  (A3)
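Equation (A3) can be written as a short sketch. It assumes an integer $L$ and treats samples before the start of the frame as zero; an actual encoder would more likely use samples of the preceding frame. The function name is hypothetical.

```python
import numpy as np

def long_term_prediction_residual(x, L, g_hat=0.5):
    """Compute x_p(t) = x(t) - g_hat * x(t - L) for one frame.

    Assumptions: L is an integer lag, and x(t - L) is taken as 0
    when t - L falls before the first sample of the frame.
    """
    x = np.asarray(x, dtype=float)
    delayed = np.zeros_like(x)
    delayed[L:] = x[:-L]          # x(t - L)
    return x - g_hat * delayed

# Usage with the default gain of 0.5 mentioned in the text
residual = long_term_prediction_residual([1, 2, 3, 4, 5, 6], L=2)
```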
- Frequency domain transform unit 113a: In units of frames, the frequency domain transform unit 113a transforms its input into an MDCT coefficient sequence $X(1), \ldots, X(N)$ of $N$ points in the frequency domain ($N$ is called the "transformed frame length") (step S113a). The input is the long-term prediction residual signal sequence $x_p(1), \ldots, x_p(N_t)$ when the long-term prediction selection information output by the long-term prediction analysis unit 111 indicates that long-term prediction is to be executed, and the digital acoustic signal sequence $x(1), \ldots, x(N_t)$ when it indicates that long-term prediction is not to be executed.
- The frequency domain transform unit 113a applies a window to $2*N$ points of the long-term prediction residual signal sequence or digital acoustic signal sequence in the time domain, performs an MDCT on the windowed signal sequence, and obtains $N$ coefficients in the frequency domain.
- The symbol $*$ represents multiplication.
- The frequency domain transform unit 113a updates the frame by shifting the window by $N$ points in the time domain, so the samples of adjacent frames overlap by $N$ points.
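The framing described above (a window of $2*N$ points, $N$ coefficients per frame, a shift of $N$ points) can be sketched with a direct, unoptimized MDCT. The sine window and the unscaled transform are assumptions; the text does not specify the window shape or the normalization. The function name is hypothetical.

```python
import numpy as np

def mdct_frames(signal, N):
    """Windowed MDCT of consecutive 2*N-point blocks, hopping by N points.

    Assumptions: sine window, no scaling factor. Returns an array of
    shape (num_frames, N).
    """
    signal = np.asarray(signal, dtype=float)
    n = np.arange(2 * N)
    k = np.arange(N)
    window = np.sin(np.pi * (n + 0.5) / (2 * N))
    # basis[k, n] = cos(pi/N * (n + 1/2 + N/2) * (k + 1/2))
    basis = np.cos(np.pi / N * np.outer(k + 0.5, n + 0.5 + N / 2))
    num_frames = (len(signal) - 2 * N) // N + 1
    blocks = [signal[i * N : i * N + 2 * N] for i in range(num_frames)]
    return np.array([basis @ (window * b) for b in blocks])

# Usage: 32 samples with N = 8 give 3 overlapping frames of 8 coefficients
coeffs = mdct_frames(np.arange(32, dtype=float), N=8)
```

A production encoder would normally compute the same transform with an FFT-based algorithm; the matrix form is used here only because it mirrors the definition.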
- The samples targeted by long-term prediction analysis and the samples covered by the MDCT window are independent of each other, and the window shape can be set according to the delay and the degree of overlap. For example, the $N_t$ points in the part of the window without overlap may be taken as the target samples of long-term prediction analysis.
- Weighted envelope normalization unit 113b: In units of frames, the weighted envelope normalization unit 113b normalizes each coefficient of the input MDCT coefficient sequence by the power spectrum envelope coefficient sequence of the digital acoustic signal sequence, which is estimated using linear prediction coefficients obtained by linear prediction analysis of the digital acoustic signal sequence, and outputs a weighted normalized MDCT coefficient sequence (step S113b).
- Here, the weighted envelope normalization unit 113b normalizes each coefficient of the MDCT coefficient sequence in units of frames using a weighted power spectrum envelope coefficient sequence in which the power spectrum envelope has been blunted.
- As a result, the weighted normalized MDCT coefficient sequence does not have as large an amplitude slope or amplitude unevenness as the input MDCT coefficient sequence, but it has a magnitude relationship similar to that of the power spectrum envelope coefficient sequence of the digital acoustic signal: slightly larger amplitudes in the region of coefficients corresponding to low frequencies, and a fine structure due to the time-domain pitch period.
- In a $p$-th order autoregressive process, which is an all-pole model, the digital acoustic signal $x(t)$ at the sample point $t$ is expressed in terms of its own past values going back $p$ samples ($p$ is a positive integer).
- In this case, each coefficient $W(n)$ $[1 \le n \le N]$ of the power spectrum envelope coefficient sequence is expressed by equation (2).
- Here $\exp(\cdot)$ is the exponential function with Napier's number as its base, $j$ is the imaginary unit, and $\sigma^2$ is the prediction residual energy.
- The linear prediction coefficients may be obtained by the weighted envelope normalization unit 113b performing linear prediction analysis on the same digital acoustic signal sequence that is input to the long-term prediction analysis unit 111, or they may be obtained by linear prediction analysis of the digital acoustic signal by other means, not shown, in the encoding device 11. In either case, the weighted envelope normalization unit 113b obtains the coefficients $W(1), \ldots, W(N)$ of the power spectrum envelope coefficient sequence using the linear prediction coefficients.
- When the coefficients $W(1), \ldots, W(N)$ of the power spectrum envelope coefficient sequence have already been obtained by other means in the encoding device 11 (a power spectrum envelope coefficient sequence calculation unit), the weighted envelope normalization unit 113b can use those coefficients.
- Because the decoding device 12 described later must obtain the same values as the encoding device 11, quantized linear prediction coefficients and/or a quantized power spectrum envelope coefficient sequence are used.
- In the following, "linear prediction coefficient" and "power spectrum envelope coefficient sequence" mean the quantized linear prediction coefficient and the quantized power spectrum envelope coefficient sequence.
- The linear prediction coefficients are encoded by, for example, a conventional encoding technique, and the resulting prediction coefficient code is transmitted to the decoding side.
- Examples of conventional encoding techniques are: a technique that uses a code corresponding to the linear prediction coefficients themselves as the prediction coefficient code; a technique that converts the linear prediction coefficients into LSP parameters and uses a code corresponding to the LSP parameters as the prediction coefficient code; and a technique that converts the linear prediction coefficients into PARCOR coefficients and uses a code corresponding to the PARCOR coefficients as the prediction coefficient code.
- When the power spectrum envelope coefficient sequence is obtained by other means in the encoding device 11, those other means encode the linear prediction coefficients by a conventional encoding technique, and the prediction coefficient code is transmitted to the decoding side.
- For example, the weighted envelope normalization unit 113b divides each coefficient $X(1), \ldots, X(N)$ of the MDCT coefficient sequence by the correction value $W_\gamma(1), \ldots, W_\gamma(N)$ of the corresponding coefficient of the power spectrum envelope coefficient sequence, and thereby obtains the coefficients $X(1)/W_\gamma(1), \ldots, X(N)/W_\gamma(N)$ of the weighted normalized MDCT coefficient sequence.
- The correction value $W_\gamma(n)$ $[1 \le n \le N]$ is given by equation (3).
- Here $\gamma$ is a positive constant of 1 or less that blunts the power spectrum coefficients.
- Alternatively, the weighted envelope normalization unit 113b divides each coefficient $X(1), \ldots, X(N)$ of the MDCT coefficient sequence by the $\beta$-th power ($0 < \beta < 1$) of the corresponding coefficient of the power spectrum envelope coefficient sequence, $W(1)^\beta, \ldots, W(N)^\beta$, and thereby obtains the coefficients $X(1)/W(1)^\beta, \ldots, X(N)/W(N)^\beta$ of the weighted normalized MDCT coefficient sequence.
- As a result, a weighted normalized MDCT coefficient sequence is obtained for each frame. This sequence does not have as large an amplitude slope or amplitude unevenness as the input MDCT coefficient sequence, but it has a magnitude relationship similar to the power spectrum envelope of the input MDCT coefficient sequence: slightly larger amplitudes in the region of coefficients corresponding to low frequencies, and a fine structure due to the time-domain pitch period.
- Because the inverse of the weighted envelope normalization, that is, restoring the MDCT coefficient sequence from the weighted normalized MDCT coefficient sequence, is performed on the decoding side, the method of calculating the weighted power spectrum envelope coefficient sequence from the power spectrum envelope coefficient sequence must be set in common on the encoding side and the decoding side.
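The $\beta$-power variant of the normalization and its inverse on the decoding side can be sketched as follows. The envelope values $W(n)$ are taken as given, because equation (2) is not reproduced in this text; the function names and the value $\beta = 0.5$ are hypothetical.

```python
import numpy as np

def weighted_envelope_normalize(X, W, beta=0.5):
    """Encoder side: X(n) / W(n)**beta, with 0 < beta < 1."""
    X = np.asarray(X, dtype=float)
    W = np.asarray(W, dtype=float)
    return X / W ** beta

def weighted_envelope_restore(Y, W, beta=0.5):
    """Decoder side: inverse processing, Y(n) * W(n)**beta.
    W and beta must match the values used by the encoder."""
    Y = np.asarray(Y, dtype=float)
    W = np.asarray(W, dtype=float)
    return Y * W ** beta

# Usage: a steep envelope is flattened, then restored exactly
X = np.array([8.0, 4.0, 2.0])
W = np.array([16.0, 4.0, 1.0])
Y = weighted_envelope_normalize(X, W)
X_restored = weighted_envelope_restore(Y, W)
```

Using an exponent below 1 removes only part of the envelope, which is why the normalized sequence still keeps slightly larger amplitudes at low frequencies.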
- The normalization gain calculation unit 113c receives the weighted normalized MDCT coefficient sequence as input. For each frame, it determines a quantization step width, using the sum of amplitude values or the energy value over the frequency range, so that each coefficient of the weighted normalized MDCT coefficient sequence can be quantized with the given total number of bits, and it obtains a coefficient (hereinafter called the gain) by which each coefficient of the weighted normalized MDCT coefficient sequence is divided so as to attain that quantization step width (step S113c).
- Information representing this gain is transmitted to the decoding side as gain information.
- For each frame, the normalization gain calculation unit 113c normalizes (divides) each coefficient of the input weighted normalized MDCT coefficient sequence by this gain and outputs the result.
- For each frame, the quantization unit 113d quantizes each coefficient of the gain-normalized weighted normalized MDCT coefficient sequence with the quantization step width determined in step S113c, and outputs the resulting quantized MDCT coefficient sequence as the "frequency-domain sample sequence" (step S113d).
- The quantized MDCT coefficient sequence (frequency-domain sample sequence) obtained in step S113d is input to the frequency domain pitch period analysis unit 115 and the rearrangement processing unit 116a.
- the pitch period of the region may be determined.
- The period conversion unit 114 does nothing when the long-term prediction selection information indicates that long-term prediction is not to be executed. However, there is no problem if it performs the same processing as when long-term prediction is to be executed. That is, the period conversion unit 114 may be configured so that the long-term prediction selection information is not input to it: it receives the time-domain pitch period $L$ and the number of frequency-domain sample points $N$, and obtains and outputs the conversion interval $T_1$.
- The frequency domain pitch period analysis unit 115 takes the input conversion interval $T_1$ and integer multiples $U \times T_1$ of the conversion interval as candidate values, determines the frequency-domain pitch period $T$, and outputs $T$ together with a frequency-domain pitch period code indicating what multiple of the conversion interval $T_1$ the frequency-domain pitch period $T$ is.
- $U$ is an integer in a predetermined first range. For example, $U$ is an integer other than 0, such as $U \ge 2$.
- For example, a total of eight values, $T_1, 2T_1, \ldots, 7T_1$, and $8T_1$, are the candidate values of the frequency-domain pitch period, and the frequency-domain pitch period $T$ is selected from these candidate values.
- In this case, the frequency-domain pitch period code is a code of at least 3 bits that corresponds one-to-one to the integers from 1 to 8.
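The eight candidates and the 3-bit code can be sketched as follows. The particular mapping of the multiple 1..8 to the bit strings `000`..`111` is an assumption; the text only requires a one-to-one code of at least 3 bits. All function names are hypothetical.

```python
def candidate_periods(T1, max_multiple=8):
    """Candidate frequency-domain pitch periods T1, 2*T1, ..., 8*T1."""
    return [U * T1 for U in range(1, max_multiple + 1)]

def encode_multiple(U, bits=3):
    """Fixed-length code for the multiple U in 1..2**bits (assumed mapping)."""
    if not 1 <= U <= 2 ** bits:
        raise ValueError("multiple out of range")
    return format(U - 1, f"0{bits}b")

def decode_period(code, T1):
    """Decoder side: recover T from the code and the conversion interval T1."""
    return (int(code, 2) + 1) * T1

# Usage: with T1 = 4.5 the encoder picked T = 3 * T1
T1 = 4.5
code = encode_multiple(3)
T = decode_period(code, T1)
```

The decoder only needs $T_1$, which it can derive from the time-domain pitch period code, plus these 3 bits, which is the source of the code-amount saving described earlier.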
- The frequency domain pitch period analysis unit 115 determines the frequency-domain pitch period $T$ using predetermined integer values in a second range as candidate values, and outputs the frequency-domain pitch period $T$ and a frequency-domain pitch period code indicating $T$.
- For example, when the integer values in the second range are 5 or more and 36 or less, a total of $2^5 = 32$ values, $5, 6, \ldots, 36$, are the candidate values of the frequency-domain pitch period, and the frequency-domain pitch period $T$ is selected from these candidate values.
- In this case, the frequency-domain pitch period code is a code of at least 5 bits that corresponds one-to-one to the integers from 0 to 31.
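For the second range, the count of candidates and the 5-bit code can be checked with a short sketch. Mapping $T$ to the integer $T - 5$ is an assumed one-to-one correspondence; the function names are hypothetical.

```python
def encode_direct(T, low=5, high=36):
    """Code for an integer T in [low, high] as the offset T - low (assumed mapping)."""
    if not low <= T <= high:
        raise ValueError("period out of range")
    bits = (high - low).bit_length()   # 31 needs 5 bits
    return format(T - low, f"0{bits}b")

def decode_direct(code, low=5):
    """Decoder side: recover T from the offset code."""
    return int(code, 2) + low

# Usage: the range 5..36 holds 32 candidates, coded in 5 bits
num_candidates = len(range(5, 37))
code_for_36 = encode_direct(36)
```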
- The frequency domain pitch period analysis unit 115 determines as the frequency-domain pitch period $T$, for example, the candidate value that maximizes an index value indicating the degree of energy concentration in the sample group selected according to a predetermined rearrangement rule.
- The index value indicating the degree of energy concentration is, for example, the sum of energies or the sum of absolute values. When the index value is the sum of energies, the candidate value that maximizes the total energy of all samples in the sample group selected according to the predetermined rearrangement rule is determined as the frequency-domain pitch period $T$.
- When the index value is the sum of absolute values, the candidate value that maximizes the sum of the absolute values of all samples in the sample group selected according to the predetermined rearrangement rule is determined as the frequency-domain pitch period $T$.
- The "sample group selected according to a predetermined rearrangement rule" is described in detail in the section on the rearrangement processing unit 116a.
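The energy-based selection can be sketched as follows. The rearrangement rule itself is described elsewhere, so this sketch assumes a simple stand-in: the sample group consists of the samples within `half_width` of the first `num_harmonics` integer multiples of the candidate $T$. Both parameters and the function name are hypothetical.

```python
import numpy as np

def select_frequency_pitch(samples, candidates, num_harmonics=4, half_width=1):
    """Return the candidate T whose sample group has the largest energy sum.

    Assumed sample group: indices within half_width of round(m * T)
    for m = 1..num_harmonics, clipped to the sequence length.
    """
    samples = np.asarray(samples, dtype=float)
    best_T, best_energy = None, -1.0
    for T in candidates:
        picked = set()
        for m in range(1, num_harmonics + 1):
            center = int(np.floor(m * T + 0.5))
            for idx in range(center - half_width, center + half_width + 1):
                if 0 <= idx < len(samples):
                    picked.add(idx)
        energy = float(sum(samples[i] ** 2 for i in picked))
        if energy > best_energy:
            best_T, best_energy = T, energy
    return best_T

# Usage: peaks every 12 samples, conversion interval T1 = 6
spectrum = np.zeros(64)
spectrum[[12, 24, 36, 48]] = 10.0
T = select_frequency_pitch(spectrum, [6, 12, 18, 24])
```

Fixing the number of harmonics keeps the group size comparable across candidates; without some such limit, the group for $T_1$ would contain the groups of all its multiples and would always have the largest energy sum.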
- Alternatively, the frequency domain pitch period analysis unit 115 actually encodes the sample sequence rearranged according to the predetermined rearrangement rule and determines the candidate value that minimizes the code amount as the frequency-domain pitch period $T$.
- The "sample sequence rearranged according to a predetermined rearrangement rule" is described in detail in the section on the rearrangement processing unit 116a.
- As a further alternative, the frequency domain pitch period analysis unit 115 first selects a predetermined number of candidate values in descending order of the index value indicating the degree of energy concentration in the sample group selected according to the predetermined rearrangement rule. From the selected candidate values, it then determines as the frequency-domain pitch period $T$ the candidate value that minimizes the code amount when the sample sequence rearranged according to the predetermined rearrangement rule is actually encoded.
- Frequency domain pitch period analysis section 115 when the long-term prediction selection information indicates to perform long term prediction, the conversion interval T 1 and converted interval T 1 integral multiple of U ⁇ T 1 as candidate values, frequency domain The meaning of determining the pitch period T will be described below.
- each MDCT coefficient sequence X (k) includes, for example, the following 2 * N-dimensional orthonormal basis vectors B (k) and signal sequence vectors (x p '(1), ..., x p ' (2 * N)).
- x (1), ..., x (N t ) and X (1), ..., X (N) are discrete values.
- An integer multiple of the adjacent sample interval of x (1), ..., x (N t ) in the time domain is not necessarily the fundamental period P f , and furthermore, X (1),. ., X (N) is not necessarily an integral multiple of the adjacent sample interval being the ideal conversion interval 2 * N / P f . Therefore, the pitch period L in the time domain selected in step S111-1 may be an integer multiple of the basic period P f or a candidate ⁇ in the vicinity thereof instead of the basic period P f or a candidate ⁇ in the vicinity thereof.
- the interval T 1 ′ obtained by converting the time-domain pitch period L into the frequency domain is an integral fraction of the ideal conversion interval, that is, (2 * N / P f ) / n.
- In some cases, the index value indicating the degree of energy concentration in the selected sample group can be increased.
- The time-domain pitch period L selected in step S111-1 is the candidate τ that maximizes the value obtained by equation (A1).
- x(t)x(t-τ) in equation (A1) takes its maximum when τ is the fundamental period P_f of the digital acoustic signal sequence x(1), ..., x(N_t) or an integer multiple thereof. That is, the candidate τ closest to one of n*P_f (where n is a positive integer) tends to be selected as the time-domain pitch period L.
- If the fundamental period P_f is an integer multiple of the sampling period (adjacent sample interval) of the digital acoustic signal sequence x(1), ..., x(N_t), the fundamental period P_f or the candidate τ closest to it is highly likely to maximize the value obtained by equation (A1) and become the time-domain pitch period L.
- If the fundamental period P_f is not an integer multiple of the sampling period, an n*P_f other than the fundamental period P_f, or the candidate τ closest to it, may maximize the value obtained by equation (A1) and become the time-domain pitch period L. For example, in the example of FIG.
- the fundamental period P_f is not an integer multiple of the sampling period, and 2*P_f is selected as the time-domain pitch period L.
- The smaller a candidate value is, the larger the value of equation (A1) tends to be, so smaller candidates are more easily selected as the time-domain pitch period L.
- For example, if both 2*P_f and 4*P_f are integer multiples of the sampling period, 2*P_f is more likely to be selected as the time-domain pitch period L because it gives a larger value of equation (A1). That is, the smaller the value of n above, the more likely it is to be used.
- Likewise, an integer multiple of the sampling interval in the frequency domain does not necessarily correspond to the ideal conversion interval 2*N/P_f.
- If the ideal conversion interval 2*N/P_f is not an integer multiple of the adjacent sample interval of the MDCT coefficient sequence X(1), ..., X(N), a sample group cannot be selected with 2*N/P_f as the frequency domain pitch period T.
- Even when the ideal conversion interval 2*N/P_f itself cannot be selected as the frequency domain pitch period, the index value indicating the degree of energy concentration in the sample group can still be increased. That is, for the purpose of increasing the degree of energy concentration in the selected sample group, the relationship between the frequency domain pitch period T and the conversion interval T_1' can be written as follows using equation (A41).
- Equation (A42) can be approximated as follows using the conversion interval T_1 of equation (A4).
- That is, the frequency domain pitch period T can be approximated by an integer multiple of the conversion interval T_1.
- An integer multiple of the conversion interval T_1 is therefore more likely than other values to be a frequency domain pitch period T that increases the index value indicating the degree of energy concentration in the sample group.
- Accordingly, by using the conversion interval T_1, integer multiples of the conversion interval T_1, and values in their vicinity as candidate values, a frequency domain pitch period T that increases the index value indicating the degree of energy concentration in the sample group can be determined.
- As noted above, the smaller the value of n, the more likely it is to be used.
- Likewise, in the frequency domain, the smaller the multiplier m*n of the conversion interval T_1 that gives the frequency domain pitch period T, the more likely that value is to be determined as the frequency domain pitch period T. That is, an integer multiple of the conversion interval T_1 with a smaller multiplier tends to be determined as the frequency domain pitch period T more easily.
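- The relationships described above can be summarized as follows. This is a reconstruction from the surrounding description only and is not a reproduction of equations (A41) and (A42). The symbol m is assumed here to be a positive integer multiplier of the ideal conversion interval, matching the m in the multiplier m*n mentioned above.

```latex
T_1' = \frac{2N / P_f}{n}, \qquad
T = m \cdot \frac{2N}{P_f} = m \cdot n \cdot T_1' \approx m \cdot n \cdot T_1
```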
- FIG. 5 shows the relationship between the time domain pitch period and the frequency domain pitch period that increases the index value indicating the degree of energy concentration in the sample group. FIG. 5 shows that the frequency domain pitch period T is frequently an integer multiple (in particular 1, 2, 3, or 4 times) of the conversion interval T_1 or a value in its vicinity, and is rarely a value that is not an integer multiple of the conversion interval T_1.
- That is, the frequency domain pitch period T that increases the concentration of energy in the sample group has a very high probability of being an integer multiple of the conversion interval T_1 or a value in its vicinity.
- It can also be seen that a value with a smaller multiplier m*n relative to the conversion interval T_1 tends to be determined as the frequency domain pitch period T. Therefore, by searching over integer multiples of the conversion interval T_1 and values in their vicinity as candidate values, a value that increases the degree of energy concentration in the sample group can be obtained as the frequency domain pitch period.
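- The candidate search described above might be sketched as follows. This is only an illustration under stated assumptions: the index used here (mean energy of the selected samples) is a hypothetical stand-in, since this section does not define the actual index, and the function names, the search range `max_multiple`, and the toy spectrum are not from the source. Only exact integer multiples of T_1 are searched, not values in their vicinity.

```python
def group_indices(T, jmax, width=1):
    """Return the 1-based indices within +/-width of n*T for n = 1, 2, ...

    Assumes an integer T larger than width, so every index is at least 1.
    """
    picked = []
    n = 1
    while n * T + width <= jmax:
        picked.extend(range(n * T - width, n * T + width + 1))
        n += 1
    return picked


def concentration_index(F, T, width=1):
    """Hypothetical index: mean energy of the samples selected for period T."""
    idx = group_indices(T, len(F), width)
    if not idx:
        return 0.0
    return sum(F[j - 1] ** 2 for j in idx) / len(idx)


def pick_period(F, T1, max_multiple=4):
    """Search the integer multiples U*T1 (U = 1..max_multiple) of T1."""
    candidates = [U * T1 for U in range(1, max_multiple + 1)]
    # max() keeps the first maximum, so ties go to the smaller multiplier.
    return max(candidates, key=lambda T: concentration_index(F, T))


# Toy spectrum (assumed data): weak floor with decaying peaks at multiples of 8.
F = [0.1] * 40
for j, amp in ((8, 4.0), (16, 3.0), (24, 2.0), (32, 1.0)):
    F[j - 1] = amp
best_T = pick_period(F, T1=4)
```

- In this toy example the conversion interval is 4 but the spectral peaks are spaced 8 apart, so the search selects 2 × T_1 = 8 rather than T_1 itself.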
- The frequency domain pitch period consideration encoding unit 116 includes a rearrangement processing unit 116a and an encoding unit 116b, encodes the input frequency domain sample sequence using an encoding method based on the frequency domain pitch period T, and outputs the resulting code string.
- The rearrangement processing unit 116a outputs, as the rearranged sample sequence, a sample sequence that (1) includes all samples of the frequency domain sample sequence and (2) in which at least some of the samples have been rearranged so that all or some of the following are gathered together: one or more consecutive samples including the sample corresponding to the frequency domain pitch period T determined by the frequency domain pitch period analysis unit 115, and one or more consecutive samples including samples corresponding to integer multiples of the frequency domain pitch period T.
- That is, at least some of the samples included in the input sample sequence are rearranged so that one or more consecutive samples including the sample corresponding to the frequency domain pitch period T and one or more consecutive samples including samples corresponding to integer multiples of the frequency domain pitch period T are gathered together.
- For example, one or more consecutive samples including the sample corresponding to the frequency domain pitch period T and one or more consecutive samples including samples corresponding to integer multiples of the frequency domain pitch period T are gathered together as a group on the low-frequency side.
- As a specific example, the rearrangement processing unit 116a selects from the input sample sequence three samples F(nT-1), F(nT), and F(nT+1): the sample F(nT) corresponding to an integer multiple of the frequency domain pitch period T and the samples F(nT-1) and F(nT+1) immediately before and after it. The group of selected samples is the “sample group selected according to a predetermined rearrangement rule” in the frequency domain pitch period analysis unit 115.
- F(j) is the sample corresponding to the number j, which is the sample index corresponding to frequency.
- n is each integer from 1 up to the largest value for which nT+1 does not exceed the preset upper limit N of the samples subject to rearrangement.
- N may be the maximum value of the number j representing the sample index corresponding to frequency.
- The collection of samples selected for a given n is called a sample group.
- The upper limit N may be equal to jmax. However, in acoustic signals such as speech and musical sounds, the index reflecting the magnitude of high-frequency samples is generally small enough, so, with a view to improving the encoding efficiency described later, N may be a value smaller than jmax. For example, N may be about half of jmax.
- If nmax denotes the maximum value of n determined by the upper limit N, the samples corresponding to the frequencies from the lowest frequency up to a first predetermined frequency nmax*T+1, among the samples included in the input sample sequence, are subject to rearrangement.
- The symbol * represents multiplication.
- The rearrangement processing unit 116a generates the sample sequence A by arranging the selected samples F(j) in order from the head of the sample sequence while preserving the order of their original numbers j. For example, when n takes each integer from 1 to 5, the rearrangement processing unit 116a arranges, from the head of the sample sequence, the first sample group F(T-1), F(T), F(T+1), the second sample group F(2T-1), F(2T), F(2T+1), the third sample group F(3T-1), F(3T), F(3T+1), the fourth sample group F(4T-1), F(4T), F(4T+1), and the fifth sample group F(5T-1), F(5T), F(5T+1).
- Further, the rearrangement processing unit 116a arranges the samples F(j) that were not selected in order after the end of the sample sequence A while preserving the order of their original numbers.
- The unselected samples F(j) are the samples located between the sample groups constituting the sample sequence A, and each such run of consecutive samples is referred to as a sample set. In the above example, the first sample set F(1), ..., F(T-2), the second sample set F(T+2), ..., F(2T-2), the third sample set F(2T+2), ..., F(3T-2), the fourth sample set F(3T+2), ..., F(4T-2), the fifth sample set F(4T+2), ..., F(5T-2), and the sixth sample set F(5T+2), ..., F(jmax) are arranged in order after the end of the sample sequence A, and these samples constitute the sample sequence B.
- In short, in this example the input sample sequence F(j) (1 ≤ j ≤ jmax) is rearranged into F(T-1), F(T), F(T+1), F(2T-1), F(2T), F(2T+1), F(3T-1), F(3T), F(3T+1), F(4T-1), F(4T), F(4T+1), F(5T-1), F(5T), F(5T+1), F(1), ..., F(T-2), F(T+2), ..., F(2T-2), F(2T+2), ..., F(3T-2), F(3T+2), ..., F(4T-2), F(4T+2), ..., F(5T-2), F(5T+2), ..., F(jmax) (see FIG. 6).
- This rearranged sample sequence is the “sample sequence rearranged according to a predetermined rearrangement rule” in the frequency domain pitch period analysis unit 115.
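- The rearrangement in this example can be sketched as follows. This is a minimal illustration, not the implementation of the embodiment: the function name `rearrange` and the parameters `n_max` and `width` are hypothetical, T is assumed to be an integer, and the groups are assumed not to overlap.

```python
def rearrange(F, T, n_max, width=1):
    """Move the groups around n*T (n = 1..n_max) to the head of the sequence.

    F[j - 1] holds the sample F(j). Assumes an integer T > 2 * width so that
    groups do not overlap, and n_max * T + width <= len(F).
    """
    jmax = len(F)
    if n_max * T + width > jmax:
        raise ValueError("last sample group exceeds the sequence length")
    # Sample sequence A: the selected groups, in increasing order of j.
    seq_a = [j for n in range(1, n_max + 1)
             for j in range(n * T - width, n * T + width + 1)]
    selected = set(seq_a)
    # Sample sequence B: the unselected samples, also in increasing order of j.
    seq_b = [j for j in range(1, jmax + 1) if j not in selected]
    return [F[j - 1] for j in seq_a + seq_b]


# Toy input in which F(j) = j, so the output shows the new order of indices.
F = list(range(1, 13))
rearranged = rearrange(F, T=4, n_max=2)
```

- With T = 4 and two sample groups, the groups F(3), F(4), F(5) and F(7), F(8), F(9) come first, followed by the remaining samples F(1), F(2), F(6), F(10), F(11), F(12).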
- In the low-frequency band, each sample often has a large amplitude or power even if it is not a sample corresponding to the frequency domain pitch period T or an integer multiple thereof. Therefore, the samples corresponding to the frequencies from the lowest frequency up to a predetermined frequency f may be left unrearranged.
- For example, if the predetermined frequency f is nT+α, the samples F(1), ..., F(nT+α) before rearrangement are not rearranged, and the samples from F(nT+α+1) onward before rearrangement are subject to rearrangement.
- α is set in advance to an integer that is greater than or equal to 0 and somewhat smaller than T (for example, an integer not exceeding T/2).
- Here, n may be an integer of 2 or more.
- Alternatively, the P consecutive samples F(1), ..., F(P) starting from the sample corresponding to the lowest frequency before rearrangement may be left unrearranged, and the samples from F(P+1) onward before rearrangement may be subject to rearrangement. In this case, the predetermined frequency f is P.
- The criteria for rearranging the collection of samples subject to rearrangement are as described above. Note that when the first predetermined frequency is set, the predetermined frequency f (second predetermined frequency) is smaller than the first predetermined frequency.
- For example, when the samples F(1), ..., F(T+1) before rearrangement are left unrearranged, the input sample sequence F(j) (1 ≤ j ≤ jmax) is rearranged into F(1), ..., F(T+1), F(2T-1), F(2T), F(2T+1), F(3T-1), F(3T), F(3T+1), F(4T-1), F(4T), F(4T+1), F(5T-1), F(5T), F(5T+1), F(T+2), ..., F(2T-2), F(2T+2), ..., F(3T-2), F(3T+2), ..., F(4T-2), F(4T+2), ..., F(5T-2), F(5T+2), ..., F(jmax) (see FIG. 7).
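- The variant that leaves the low-frequency samples in place can be sketched in the same way. The function name `rearrange_above` and its parameters are hypothetical. The sketch assumes an integer T and treats a sample group as movable only if all of its samples lie above the predetermined frequency f, which is one possible reading of the description.

```python
def rearrange_above(F, T, n_max, f, width=1):
    """Keep F(1)..F(f) in place and gather the groups above f right after them.

    F[j - 1] holds the sample F(j). Assumes n_max * T + width <= len(F).
    """
    jmax = len(F)
    kept = list(range(1, f + 1))
    # Only groups lying entirely above f are moved (assumption of this sketch).
    groups = [j for n in range(1, n_max + 1)
              if n * T - width > f
              for j in range(n * T - width, n * T + width + 1)]
    selected = set(groups)
    rest = [j for j in range(f + 1, jmax + 1) if j not in selected]
    return [F[j - 1] for j in kept + groups + rest]


# Toy input with F(j) = j, T = 4 and f = T + 1 = 5.
F = list(range(1, 17))
result = rearrange_above(F, T=4, n_max=3, f=5)
```

- Here F(1), ..., F(5) stay at the head, the groups around 2T and 3T follow, and the remaining samples F(6), F(10), F(14), F(15), F(16) come last.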
- The upper limit N or the first predetermined frequency, which determines the maximum value of the number j subject to rearrangement, need not be a value common to all frames; a different upper limit N or first predetermined frequency may be set for each frame.
- In this case, information specifying the upper limit N or the first predetermined frequency for each frame may be sent to the decoding side.
- Instead of specifying the maximum value of the number j subject to rearrangement, the number of sample groups to be rearranged may be specified. In this case, the number of sample groups may be set for each frame, and information specifying the number of sample groups may be sent to the decoding side. Of course, the number of sample groups to be rearranged may be common to all frames.
- Likewise, the second predetermined frequency f need not be a value common to all frames; a different second predetermined frequency f may be set for each frame. In this case, information specifying the second predetermined frequency for each frame may be sent to the decoding side.
- In the rearranged sample sequence, the envelope of the index reflecting the sample magnitude shows a downward trend as the frequency increases.
- As a characteristic of acoustic signals, particularly speech and musical sound signals, the frequency domain sample sequence generally has few high-frequency components.
- In other words, the rearrangement processing unit 116a may rearrange at least some of the samples included in the input sample sequence so that the envelope of the index reflecting the sample magnitude shows a downward trend as the frequency increases.
- Each sample included in the frequency domain sample sequence may take a positive, negative, or zero value. Even in such a case, the above-described rearrangement process or the rearrangement process described later may simply be performed.
- In the above example, the one or more consecutive samples including the sample corresponding to the frequency domain pitch period T and the one or more consecutive samples including samples corresponding to integer multiples of the frequency domain pitch period T are gathered on the low-frequency side. Conversely, a rearrangement may be performed that gathers these samples on the high-frequency side.
- In this case, the sample groups are arranged in reverse order in the sample sequence A, the sample sets are arranged in reverse order in the sample sequence B, the sample sequence B is placed on the low-frequency side, and the sample sequence A is placed after the sample sequence B.
- In other words, the rearrangement processing unit 116a may rearrange at least some of the samples included in the input sample sequence so that the envelope of the index reflecting the sample magnitude shows an increasing trend as the frequency increases.
- The frequency domain pitch period T may be a fractional value rather than an integer.
- In this case, for example, F(R(nT-1)), F(R(nT)), and F(R(nT+1)) are selected, where R(nT) denotes nT rounded to the nearest integer.
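- The index selection for a fractional T can be illustrated as follows. The helper names are hypothetical, and R is assumed here to round halves upward; the rounding convention for exact halves is an assumption of this sketch.

```python
import math


def R(x):
    """Round to the nearest integer, with halves rounded up (assumed convention)."""
    return math.floor(x + 0.5)


def group_for(n, T):
    """Indices R(nT-1), R(nT), R(nT+1) of the n-th sample group."""
    return [R(n * T - 1), R(n * T), R(n * T + 1)]


# Example with a fractional period T = 4.4.
groups = [group_for(n, 4.4) for n in (1, 2, 3)]
```

- With T = 4.4 the group centers fall on 4, 9, and 13, so the spacing between adjacent groups alternates between 5 and 4 samples.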
- When the frequency domain pitch period analysis unit 115 determines, as the frequency domain pitch period T, the candidate value that minimizes the actual code amount, the rearranged sample sequence is already generated in the frequency domain pitch period analysis unit 115, so the frequency domain pitch period consideration encoding unit 116 need not include the rearrangement processing unit 116a.
- The above description shows an example in which the number of samples included in each sample group is fixed at three: the sample corresponding to the frequency domain pitch period T or an integer multiple thereof (hereinafter referred to as the central sample) and one sample before and one sample after it.
- Alternatively, the rearrangement processing unit 116a may select one of a plurality of options that differ in the combination of the number of samples included in the sample group and the sample indices, and output information representing the selected option as auxiliary information (first auxiliary information).
- To choose among the options, for example, the rearrangement processing unit 116a performs the rearrangement corresponding to each option, the encoding unit 116b described later obtains the code amount of the code string corresponding to each option, and the option with the smallest code amount is selected.
- In this case, the first auxiliary information is output from the encoding unit 116b rather than from the rearrangement processing unit 116a. This method is also valid when n can be selected.
- The encoding unit 116b encodes the sample sequence output from the rearrangement processing unit 116a and outputs the obtained code string (step S116b). For example, the encoding unit 116b switches the variable-length encoding method according to the bias in amplitude of the samples included in the sample sequence output from the rearrangement processing unit 116a. That is, since the rearrangement processing unit 116a gathers samples with large amplitudes on the low-frequency side (or high-frequency side) of the frame, the encoding unit 116b performs variable-length encoding using a method suited to that bias.
- For example, the average code amount can be reduced by performing Rice coding with a different Rice parameter for each region.
- In the following, the case where samples having large amplitudes are gathered on the low-frequency side of the frame (the side closer to the head of the frame) is described as an example.
- The encoding unit 116b applies Rice coding (also called Golomb-Rice coding) to each sample in the region where samples having large amplitudes are gathered. In the remaining region, the encoding unit 116b applies entropy coding (such as Huffman coding or arithmetic coding) that is also suited to encoding sets of multiple samples.
- The region to which Rice coding is applied and the Rice parameter may be fixed, or the configuration may allow one of a plurality of options that differ in the combination of the Rice coding region and the Rice parameter to be selected.
- When one of a plurality of options is selected, variable-length codes such as those described below (binary values enclosed in quotation marks) can be used as the Rice coding selection information, and the encoding unit 116b also outputs the selection information.
- Rice coding is applied to the first 1/16 of the region from the head with a Rice parameter of 2.
- Rice coding is applied to the first 1/32 of the region from the head with a Rice parameter of 3.
- As a method of choosing among these options, the code amounts of the code strings obtained by the encoding process for the respective Rice coding options may be compared, and the option with the smallest code amount selected.
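- The selection among Rice coding options can be sketched as follows. This is an illustration under several assumptions that are not from the source: signed samples are mapped to non-negative integers by a zigzag mapping, the unary part is written as ones terminated by a zero, and the region outside the Rice coding region is coded with a fixed Rice parameter 0 as a simple stand-in for the other entropy coding. All function names are hypothetical.

```python
def zigzag(v):
    """Map a signed integer to a non-negative one: 0, -1, 1, -2 -> 0, 1, 2, 3."""
    return 2 * v if v >= 0 else -2 * v - 1


def rice_encode(v, k):
    """Rice code of v with parameter k: unary quotient, then k remainder bits."""
    u = zigzag(v)
    bits = "1" * (u >> k) + "0"
    if k > 0:
        bits += format(u & ((1 << k) - 1), "0{}b".format(k))
    return bits


def encode_frame(frame, denom, k, tail_k=0):
    """Rice-code the first 1/denom of the frame with parameter k.

    The remainder is coded with parameter tail_k, a stand-in for the other
    entropy coding (assumption of this sketch).
    """
    split = len(frame) // denom
    head = "".join(rice_encode(v, k) for v in frame[:split])
    tail = "".join(rice_encode(v, tail_k) for v in frame[split:])
    return head + tail


def choose_option(frame, options):
    """Return the (denom, k) option giving the shortest code string."""
    return min(options, key=lambda opt: len(encode_frame(frame, *opt)))


# Toy frame: two large samples gathered at the head, zeros elsewhere.
frame = [9, -7] + [0] * 30
options = [(16, 2), (32, 3)]  # (region = 1/denom of the frame, Rice parameter)
best = choose_option(frame, options)
```

- In this toy frame both large samples fall in the first 1/16 of the frame, so the option with region 1/16 and Rice parameter 2 yields the shorter code string.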
- The average code amount can also be reduced by, for example, run-length encoding the number of consecutive samples having an amplitude of 0.
- In this case, the encoding unit 116b (1) applies Rice coding to each sample in the region where samples having large amplitudes are gathered, and (2) in the remaining region, (a) for a run of samples having an amplitude of 0, outputs a code representing the number of consecutive samples having an amplitude of 0, and (b) for the rest, applies entropy coding (such as Huffman coding or arithmetic coding) that is also suited to encoding sets of multiple samples. Even in such a case, the selection among Rice coding options described above may be performed. In such a case, information indicating the region to which run-length encoding was applied needs to be transmitted to the decoding side; for example, this information is included in the selection information. Further, when a plurality of encoding methods belonging to entropy coding are prepared as options, information specifying which encoding was selected needs to be transmitted to the decoding side; this information is also included, for example, in the selection information.
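- The run-length step for zero-amplitude samples can be illustrated as follows. The token representation and the function names are hypothetical; an actual encoder would further convert the run lengths and values into codes.

```python
def run_length_zeros(samples):
    """Replace each run of zeros with a ("zeros", run_length) token."""
    tokens = []
    run = 0
    for v in samples:
        if v == 0:
            run += 1
            continue
        if run:
            tokens.append(("zeros", run))
            run = 0
        tokens.append(("value", v))
    if run:
        tokens.append(("zeros", run))
    return tokens


def expand(tokens):
    """Inverse of run_length_zeros, as a decoder would apply it."""
    out = []
    for kind, x in tokens:
        out.extend([0] * x if kind == "zeros" else [x])
    return out


samples = [0, 0, 0, 2, 0, -1, 0, 0]
tokens = run_length_zeros(samples)
```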
- The rearrangement processing unit 116a may also output the sample sequence before rearrangement (a sample sequence that has not been rearranged). The encoding unit 116b then variable-length encodes both the sample sequence before rearrangement and the sample sequence after rearrangement, and compares the code amount of the code string obtained by variable-length encoding the sample sequence before rearrangement with the code amount of the code string obtained by variable-length encoding the sample sequence after rearrangement while switching the method for each region. When the code amount for the sample sequence before rearrangement is the smallest, the code string obtained by variable-length encoding the sample sequence before rearrangement is output.
- In this case, the encoding unit 116b also outputs auxiliary information (second auxiliary information) indicating whether or not the sample sequence corresponding to the code string is a sample sequence whose samples have been rearranged. One bit suffices for the second auxiliary information. If the second auxiliary information specifies that the sample sequence corresponding to the code string has not been rearranged, the first auxiliary information need not be output.
- Alternatively, it may be decided in advance that the rearrangement of the sample sequence is applied only when the prediction gain or its estimated value is larger than a predetermined threshold.
- This exploits the property of speech and musical sounds that, when the prediction gain is large, vocal cord vibration or instrument vibration is strong and the periodicity is often high.
- The prediction gain is the energy of the original sound divided by the energy of the prediction residual.
- Quantized parameters can be used in common by the encoding device and the decoding device.
- For example, the encoding unit 116b uses the i-th order quantized PARCOR coefficient k(i), obtained by other means (not shown) in the encoding device 11, to calculate an estimate of the prediction gain expressed as the reciprocal of the product of (1-k(i)*k(i)) over all orders. If the calculated estimate is larger than a predetermined threshold, the encoding unit 116b outputs the code string obtained by variable-length encoding the rearranged sample sequence; otherwise, it outputs the code string obtained by variable-length encoding the sample sequence before rearrangement. In this case, there is no need to output the second auxiliary information indicating whether or not the sample sequence corresponding to the code string has been rearranged. In other words, for noise-like speech or silence, where prediction does not work, the rearrangement is likely to have little effect, so deciding in advance not to perform the rearrangement wastes less on the second auxiliary information and on computation.
- Alternatively, the rearrangement processing unit 116a may calculate the prediction gain or its estimate. If the prediction gain or its estimate is larger than a predetermined threshold, the rearrangement processing unit 116a rearranges the sample sequence and outputs the rearranged sample sequence to the encoding unit 116b; otherwise, it outputs the sample sequence input to the rearrangement processing unit 116a as is to the encoding unit 116b without rearranging it. The encoding unit 116b may then variable-length encode the sample sequence output from the rearrangement processing unit 116a.
- In this configuration, the threshold is set in advance to a value common to the encoding side and the decoding side.
- Since the quantized PARCOR coefficients can be converted from linear prediction coefficients or LSP parameters, instead of obtaining the quantized PARCOR coefficients by other means (not shown) in the encoding device 11, the encoding device 11 may first obtain quantized linear prediction coefficients or quantized LSP parameters by other means (not shown), then obtain the quantized PARCOR coefficients from them, and then obtain the estimate of the prediction gain. In short, the estimate of the prediction gain is obtained based on quantized coefficients corresponding to the linear prediction coefficients.
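- The prediction gain estimate and the threshold decision described above can be sketched as follows. The function names and the example coefficient values and threshold are hypothetical; every PARCOR coefficient is assumed to satisfy |k(i)| < 1 so that the product is positive.

```python
def estimated_prediction_gain(parcor):
    """Reciprocal of the product of (1 - k(i)*k(i)) over all orders i."""
    product = 1.0
    for k in parcor:
        product *= 1.0 - k * k
    return 1.0 / product


def should_rearrange(parcor, threshold):
    """Apply the rearrangement only when the estimate exceeds the threshold."""
    return estimated_prediction_gain(parcor) > threshold


# Assumed example values: a strongly predictable frame and a noise-like frame.
periodic = should_rearrange([0.9, -0.5], threshold=2.0)
noisy = should_rearrange([0.1], threshold=2.0)
```

- Because the decision depends only on quantized coefficients available on both sides, the decoder can repeat the same computation and reach the same decision without the second auxiliary information.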
- In arithmetic coding, a frequency table of symbol sequences is selected based on the immediately preceding symbol sequence.
- Arithmetic coding is then performed in which the half-open interval [0, 1) is divided according to the appearance probabilities of the selected symbol sequence, and the code for a symbol sequence is assigned to a binary fraction indicating a position within the divided interval.
- Specifically, the rearranged frequency domain sample sequence (the quantized MDCT coefficient sequence in the above example) is divided into symbols sequentially from the low-frequency side, and a frequency table for arithmetic coding is generated.
- The half-open interval [0, 1) is divided according to the appearance probabilities of the selected symbol sequence, and the code for the symbol sequence is assigned to a binary fraction indicating a position within that interval.
- Because the rearrangement process has already rearranged the sample sequence so that samples with equal or similar indices reflecting the sample magnitude (for example, the absolute value of the amplitude) are gathered together, the variation of that index between adjacent samples is small, the accuracy of the symbol frequency table is high, and the total code amount of the codes obtained by arithmetic coding of the symbols can be kept low.
- The decoding process performed by the decoding device 12 will be described with reference to FIG.
- The decoding device 12 receives at least the long-term prediction selection information, the gain information, the frequency domain pitch period code, and the code string.
- When the long-term prediction selection information indicates that long-term prediction is to be performed, at least the time domain pitch period code C_L is also input.
- The pitch gain code C_gp may also be input.
- When selection information, first auxiliary information, or second auxiliary information is output from the encoding device 11, that selection information, first auxiliary information, or second auxiliary information is also input to the decoding device 12.
- The frequency domain pitch period consideration decoding unit 123 includes a decoding unit 123a and a recovery unit 123b, decodes the input code string by a decoding method based on the frequency domain pitch period T to obtain the original sample sequence, and outputs it.
- The decoding unit 123a decodes the input code string for each frame and outputs a frequency domain sample sequence (step S123a).
- When second auxiliary information is input to the decoding device 12, the output destination of the frequency domain sample sequence obtained by the decoding unit 123a differs depending on whether or not the second auxiliary information indicates that the sample sequence corresponding to the code string is a sample sequence whose samples have been rearranged.
- When the second auxiliary information indicates that the sample sequence has been rearranged, the frequency domain sample sequence obtained by the decoding unit 123a is output to the recovery unit 123b.
- When the second auxiliary information indicates that the sample sequence has not been rearranged, the frequency domain sample sequence obtained by the decoding unit 123a is output to the gain multiplication unit 124a.
- Alternatively, the decoding unit 123a uses the i-th order quantized PARCOR coefficient k(i), obtained by other means (not shown) in the decoding device 12, to calculate an estimate of the prediction gain expressed as the reciprocal of the product of (1-k(i)*k(i)) over all orders.
- If the calculated estimate is larger than the predetermined threshold, the decoding unit 123a outputs the frequency domain sample sequence it obtained to the recovery unit 123b. Otherwise, the decoding unit 123a outputs the frequency domain sample sequence it obtained, as a sample sequence that was not rearranged, to the gain multiplication unit 124a.
- As the method of obtaining the quantized PARCOR coefficients by other means (not shown) in the decoding device 12, a well-known method may be employed, such as decoding a code corresponding to the PARCOR coefficients to obtain the quantized PARCOR coefficients, or decoding a code corresponding to the LSP parameters to obtain quantized LSP parameters and converting them into quantized PARCOR coefficients.
- All of these methods obtain quantized coefficients corresponding to the linear prediction coefficients from a code corresponding to the linear prediction coefficients. That is, the estimate of the prediction gain is based on the quantized coefficients corresponding to the linear prediction coefficients obtained by decoding the code corresponding to the linear prediction coefficients.
- When selection information is input from the encoding device 11 to the decoding device 12, the decoding unit 123a decodes the input code string using a decoding method according to the selection information. Naturally, the decoding method corresponding to the encoding method executed to obtain the code string is executed.
- The details of the decoding process performed by the decoding unit 123a correspond to the details of the encoding process performed by the encoding unit 116b of the encoding device 11. The description of that encoding process is therefore incorporated here: the decoding corresponding to the executed encoding is the decoding process performed by the decoding unit 123a, and this serves as the detailed description of the decoding process.
- When selection information is input, the selection information specifies which encoding method was executed.
- For example, when the selection information includes information specifying the region to which Rice coding was applied and the Rice parameter, information indicating the region to which run-length encoding was applied, and information specifying the type of entropy coding, the decoding methods corresponding to these encoding methods are applied to the corresponding regions of the input code string. Since the decoding process corresponding to Rice coding, the decoding process corresponding to entropy coding, and the decoding process corresponding to run-length encoding are all well known, their description is omitted.
- When the long-term prediction selection information indicates that long-term prediction is to be performed, the long-term prediction information decoding unit 121 decodes the input time domain pitch period code C_L and outputs the resulting time-domain pitch period L.
- When the pitch gain code C_gp is also input, the long-term prediction information decoding unit 121 further decodes the pitch gain code C_gp to obtain and output the quantized pitch gain ĝ_p.
- When the long-term prediction selection information indicates that long-term prediction is to be performed, the period conversion unit 122 decodes the input frequency domain pitch period code to obtain an integer value indicating how many times the conversion interval T_1 the frequency domain pitch period T is, obtains the conversion interval T_1 by equation (A4) from the time-domain pitch period L and the number N of sample points in the frequency domain, and obtains and outputs the frequency domain pitch period T by multiplying the conversion interval T_1 by the integer value. When the long-term prediction selection information indicates that long-term prediction is not to be performed, the period conversion unit 122 decodes the input frequency domain pitch period code to obtain and output the frequency domain pitch period T.
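- The two cases handled by the period conversion unit 122 can be sketched as follows. Equation (A4) is not reproduced in this section, so the sketch assumes that it rounds 2*N/L to the nearest integer, consistent with the ideal conversion interval 2*N/P_f described earlier; this assumed form, the function names, and the example values are all hypothetical.

```python
def conversion_interval(L, N):
    """Assumed stand-in for equation (A4): 2*N/L rounded to the nearest integer."""
    return int(2 * N / L + 0.5)


def frequency_domain_pitch_period(use_long_term_prediction, decoded_value,
                                  L=None, N=None):
    """Recover T from the decoded frequency domain pitch period code.

    With long-term prediction, decoded_value is the integer multiplier of T_1.
    Without it, decoded_value is the frequency domain pitch period T itself.
    """
    if use_long_term_prediction:
        return decoded_value * conversion_interval(L, N)
    return decoded_value


# Assumed example: N = 160 frequency domain samples and time-domain period L = 40.
T_with_ltp = frequency_domain_pitch_period(True, 2, L=40, N=160)
T_without_ltp = frequency_domain_pitch_period(False, 13)
```

- With long-term prediction only a small multiplier needs to be decoded, because T_1 is recomputed from L and N, which the decoder already has.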
- the recovery unit 123b follows the frequency domain pitch period T obtained by the period conversion unit 122, or the frequency domain obtained by the period conversion unit 122 when auxiliary information is input to the decoding device 12.
- the original sample sequence is obtained from the frequency domain sample sequence output by the decoding unit 123a and output (step S123b).
- the “original sample arrangement” corresponds to the “frequency domain sample sequence” output from the frequency domain sample sequence generation unit 113 of the encoding device 11.
- the rearrangement can be specified by the frequency domain pitch period T and the auxiliary information.
- The details of the recovery processing by the recovery unit 123b correspond to the details of the rearrangement processing by the rearrangement processing unit 116a of the encoding device 11. The description of the rearrangement processing is therefore incorporated here: the reverse of the rearrangement processing (reverse rearrangement) is the recovery processing performed by the recovery unit 123b, and this serves as the detailed description of the recovery processing. To aid understanding, an example of recovery processing corresponding to the specific example of rearrangement processing described above follows.
- In the specific example described above, the rearrangement processing unit 116a gathers the sample groups on the low-frequency side and outputs F(T-1), F(T), F(T+1), F(2T-1), F(2T), F(2T+1), F(3T-1), F(3T), F(3T+1), F(4T-1), F(4T), F(4T+1), F(5T-1), F(5T), F(5T+1), F(1), ..., F(T-2), F(T+2), ..., F(2T-2), F(2T+2), ..., F(3T-2), F(3T+2), ..., F(4T-2), F(4T+2), ..., F(5T-2), F(5T+2), ..., F(jmax).
- In this case, the recovery unit 123b receives, as the frequency-domain sample sequence output by the decoding unit 123a, the sequence F(T-1), F(T), F(T+1), F(2T-1), F(2T), F(2T+1), ..., F(5T-1), F(5T), F(5T+1), F(1), ..., F(T-2), F(T+2), ..., F(5T-2), F(5T+2), ..., F(jmax).
- Based on the frequency-domain pitch period T and the auxiliary information, the recovery unit 123b returns this input sample sequence to the original sample order F(j) (1 ≤ j ≤ jmax).
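- The rearrangement and its reversal in this example can be sketched as a permutation and its inverse. This is an illustrative sketch only: the function names, the parameter `n_max` (standing in for auxiliary information about how many multiples of T are gathered), and the assumption T ≥ 3 (so that the three-sample groups do not overlap) are not taken from the specification.

```python
def rearrangement_order(jmax, T, n_max=5):
    """Return the 1-based sample indices in rearranged order.

    Gathers F(nT-1), F(nT), F(nT+1) for n = 1..n_max at the low-frequency
    side, followed by all remaining samples in their original order.
    Assumes T >= 3 so the gathered groups do not overlap.
    """
    gathered = []
    for n in range(1, n_max + 1):
        for j in (n * T - 1, n * T, n * T + 1):
            if 1 <= j <= jmax:
                gathered.append(j)
    gathered_set = set(gathered)
    rest = [j for j in range(1, jmax + 1) if j not in gathered_set]
    return gathered + rest


def rearrange(samples, T, n_max=5):
    """Encoder side: samples[0] holds F(1)."""
    order = rearrangement_order(len(samples), T, n_max)
    return [samples[j - 1] for j in order]


def recover(rearranged, T, n_max=5):
    """Decoder side: undo rearrange() using the same T and n_max."""
    order = rearrangement_order(len(rearranged), T, n_max)
    original = [None] * len(rearranged)
    for position, j in enumerate(order):
        original[j - 1] = rearranged[position]
    return original


# Toy input where F(j) = j, so values double as sample indices.
F = list(range(1, 41))
shuffled = rearrange(F, T=6)
restored = recover(shuffled, T=6)
```

- Because the decoder rebuilds the same index order from T and the auxiliary information alone, no per-sample position data needs to be transmitted.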
- The gain multiplication unit 124a multiplies, for each frame, each coefficient of the sample sequence output from the decoding unit 123a or the recovery unit 123b by the gain specified by the gain information, thereby obtaining and outputting a “normalized weighted normalized MDCT coefficient sequence” (step S124a).
- The weighted envelope denormalization unit 124b applies, for each frame, the power spectrum envelope transmitted as described above to each coefficient of the “normalized weighted normalized MDCT coefficient sequence” output from the gain multiplication unit 124a, thereby obtaining and outputting an “MDCT coefficient sequence” (step S124b).
- For example, the weighted envelope denormalization unit 124b multiplies each coefficient of the “normalized weighted normalized MDCT coefficient sequence” output from the gain multiplication unit 124a by the corresponding one of the values W(1)^, ..., thereby obtaining each coefficient X(1), ..., X(N) of the MDCT coefficient sequence.
- The time domain conversion unit 124c converts, for each frame, the “MDCT coefficient sequence” output from the weighted envelope denormalization unit 124b into the time domain to obtain a signal sequence in units of frames (time-domain signal sequence) (step S124c).
- When the long-term prediction selection information indicates that long-term prediction is to be executed, the signal sequence obtained by the time domain conversion unit 124c is input to the long-term prediction synthesis unit 125 as the long-term prediction residual signal sequence x_p(1), ..., x_p(N_t).
- When the long-term prediction selection information indicates that long-term prediction is not to be executed, the signal sequence obtained by the time domain conversion unit 124c is output from the decoding device 12 as the digital acoustic signal sequence x(1), ..., x(N_t).
- Long-term prediction synthesis unit 125: When the long-term prediction selection information indicates that long-term prediction is to be executed, the long-term prediction synthesis unit 125 obtains the digital acoustic signal sequence x(1), ..., x(N_t) by equation (A5) from the long-term prediction residual signal sequence x_p(1), ..., x_p(N_t), the time-domain pitch period L output by the long-term prediction information decoding unit 121, the quantized pitch gain g_p^, and the past digital acoustic signal generated by the long-term prediction synthesis unit 125.
- When the long-term prediction information decoding unit 121 does not output the quantized pitch gain g_p^, that is, when the pitch gain code C_gp is not input to the decoding device 12, a predetermined value such as 0.5 is used as g_p^.
- The value of g_p^ in this case is stored in advance in the long-term prediction information decoding unit 121 so that the same value can be used in the encoding device 11 and the decoding device 12.
- the signal sequence obtained by the long-term prediction synthesis unit 125 is output from the decoding device 12 as a digital acoustic signal sequence x (1),..., X (N t ).
- The long-term prediction synthesis unit 125 does nothing when the long-term prediction selection information indicates that long-term prediction is not to be executed.
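- The synthesis step can be illustrated with a short sketch. Equation (A5) is not reproduced in this part of the description, so the single-tap form x(n) = x_p(n) + g_p^ · x(n − L) used below is an assumption, chosen because it uses exactly the inputs listed above; the function name, the `history` argument, and the constant name are likewise hypothetical.

```python
# Fallback used when no pitch gain code is received; the text gives 0.5
# as an example of the predetermined value.
DEFAULT_PITCH_GAIN = 0.5


def long_term_prediction_synthesis(residual, L, history,
                                   gain=DEFAULT_PITCH_GAIN):
    """Rebuild x(1..N_t) from the residual x_p(1..N_t).

    Assumed form (not confirmed by the text): x(n) = x_p(n) + gain * x(n - L).
    `history` holds previously synthesized samples, oldest first, and must
    contain at least L samples.
    """
    if L < 1 or len(history) < L:
        raise ValueError("history must hold at least L past samples")
    buffer = list(history)
    start = len(buffer)
    for xp in residual:
        # buffer[len(buffer) - L] is the sample L positions before the
        # sample being synthesized now.
        buffer.append(xp + gain * buffer[len(buffer) - L])
    return buffer[start:]


# A single impulse in the residual reappears one pitch period later,
# scaled by the pitch gain.
frame = long_term_prediction_synthesis(
    residual=[1, 0, 0, 0, 0, 0, 0, 0], L=4, history=[0, 0, 0, 0])
```

- Feeding each synthesized frame back in as `history` for the next frame mirrors the use of "the past digital acoustic signal generated by the long-term prediction synthesis unit 125".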
- As described above, when the frequency-domain pitch period T is clear, encoding the sample sequence rearranged according to the frequency-domain pitch period T enables efficient encoding (that is, the average code length can be reduced). In addition, because rearranging the sample sequence concentrates samples having equal or similar indices in each local region, it becomes possible not only to improve the efficiency of variable-length coding but also to reduce quantization distortion and the code amount.
- In the first embodiment, the encoding device 11 uses the conversion interval T_1 and integer multiples U×T_1 of the conversion interval T_1 as candidate values. However, the frequency-domain pitch period T may also be determined using, as candidate values, multiples of the conversion interval T_1 other than the integer multiples U×T_1.
- the encoding device 11 ′ of this modification is different from the encoding device 11 of the first embodiment in that a frequency domain pitch period analysis unit 115 ′ is provided instead of the frequency domain pitch period analysis unit 115.
- When the long-term prediction selection information indicates that long-term prediction is to be executed, the frequency-domain pitch period analysis unit 115′ determines and outputs the frequency-domain pitch period T using, as candidate values, the conversion interval T_1, integer multiples U×T_1 of the conversion interval T_1, and predetermined multiples of the conversion interval T_1 other than integer multiples.
- When the long-term prediction selection information indicates that long-term prediction is not to be executed, the frequency-domain pitch period analysis unit 115′ determines and outputs the frequency-domain pitch period T using integer values in the predetermined second range as candidate values, as in the first embodiment.
- Frequency-domain pitch period analysis unit 115′: The frequency-domain pitch period analysis unit 115′ determines the frequency-domain pitch period T using, as candidate values, the conversion interval T_1, integer multiples U×T_1 of the conversion interval T_1, and predetermined multiples of the conversion interval T_1 other than integer multiples (that is, the frequency-domain pitch period T is determined from candidate values that include the conversion interval T_1 and the integer multiples U×T_1), and outputs the frequency-domain pitch period T together with a frequency-domain pitch period code indicating what multiple of the conversion interval T_1 the frequency-domain pitch period T is.
- For example, when the integers in the predetermined first range are 2 to 9 inclusive, a total of 16 values are the frequency-domain pitch period candidate values: the conversion interval T_1; its integer multiples 2T_1, 3T_1, 4T_1, 5T_1, 6T_1, 7T_1, 8T_1, 9T_1; and the predetermined non-integer multiples 1.9375T_1, 2.0625T_1, 2.125T_1, 2.1875T_1, 2.25T_1, 2.9375T_1, 3.0625T_1. The frequency-domain pitch period T is selected from these candidate values.
- the frequency domain pitch period code is a code of at least 4 bits corresponding to each of the 16 candidate values on a one-to-one basis.
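- As a rough illustration of the 16-candidate case, the sketch below lists the multipliers from the example and maps each to a 4-bit code. The ordering of the table, and therefore which bit pattern denotes which candidate, is an assumption made for illustration, as are the function names; the criterion by which the encoder picks a candidate is outside the scope of the sketch.

```python
# Multipliers of the conversion interval T_1 from the example:
# T_1 itself, integer multiples 2..9, and seven non-integer multiples.
MULTIPLIERS = [1, 2, 3, 4, 5, 6, 7, 8, 9,
               1.9375, 2.0625, 2.125, 2.1875, 2.25, 2.9375, 3.0625]


def candidate_periods(t1):
    """All candidate frequency-domain pitch periods for a given T_1."""
    return [m * t1 for m in MULTIPLIERS]


def encode_candidate(index):
    """Fixed-length 4-bit code for a candidate (hypothetical ordering)."""
    if not 0 <= index < len(MULTIPLIERS):
        raise ValueError("index out of range")
    return format(index, "04b")


def decode_period(code, t1):
    """Decoder side: look up the multiplier and scale T_1 by it."""
    return MULTIPLIERS[int(code, 2)] * t1


code = encode_candidate(10)        # candidate 2.0625 * T_1
period = decode_period(code, 32.0)
```

- Sixteen candidates fit exactly into 4 bits, which is why the text speaks of a code of at least 4 bits in one-to-one correspondence with the candidates.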
- The integers in the predetermined first range need not include every integer between a given lower bound and a given upper bound.
- For example, the integers from 2 to 9 inclusive, excluding 5, may be the integers in the predetermined first range.
- In this case, the candidate values are the conversion interval T_1, its integer multiples 2T_1, 3T_1, 4T_1, 6T_1, 7T_1, 8T_1, 9T_1, and predetermined multiples of the conversion interval T_1 other than integer multiples.
- the frequency domain pitch period code is a code of at least 4 bits corresponding to each of the 16 candidate values on a one-to-one basis.
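- When the long-term prediction selection information indicates that long-term prediction is not to be executed, the frequency-domain pitch period analysis unit 115′ determines the frequency-domain pitch period T using integer values in the predetermined second range as candidate values, as in the first embodiment.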
- The decoding device 12′ of this modification differs from the decoding device 12 of the first embodiment in that a period conversion unit 122′ is provided instead of the period conversion unit 122.
- Period conversion unit 122′: When the long-term prediction selection information indicates that long-term prediction is to be performed, the period conversion unit 122′ decodes the frequency-domain pitch period code to obtain a value (multiple value) indicating what multiple of the conversion interval T_1 the frequency-domain pitch period T is, obtains the conversion interval T_1 by equation (A4) from the time-domain pitch period L and the number N of sample points in the frequency domain, and multiplies the conversion interval T_1 by the multiple value to obtain and output the frequency-domain pitch period T.
- When the long-term prediction selection information indicates that long-term prediction is not to be performed, the period conversion unit 122′ decodes the frequency-domain pitch period code to obtain and output the frequency-domain pitch period T.
- the frequency domain pitch period T is determined by using a multiple value other than an integer multiple U ⁇ T 1 of the conversion interval T 1 as a candidate value.
- the length of the frequency domain pitch period code is determined by a variable length codebook.
- the frequency domain pitch period analysis unit 115 ′′ determines the pitch period T in consideration of the length of the frequency domain pitch period code.
- the encoding device 11 ′′ of the present modification is different from the encoding device 11 of the first embodiment in that a frequency domain pitch period analysis unit 115 ′′ is provided instead of the frequency domain pitch period analysis unit 115.
- Frequency-domain pitch period analysis unit 115′′: The frequency-domain pitch period analysis unit 115′′ determines the frequency-domain pitch period T using, as candidate values, the conversion interval T_1, integer multiples U×T_1 of the conversion interval T_1, and predetermined multiples of the conversion interval T_1 other than integer multiples (that is, the frequency-domain pitch period T is determined from candidate values that include the conversion interval T_1 and the integer multiples U×T_1), and outputs the frequency-domain pitch period T together with a frequency-domain pitch period code indicating what multiple of the conversion interval T_1 the frequency-domain pitch period T is.
- Here, the frequency-domain pitch period code is determined using a variable-length codebook in which the code length of the code corresponding to an integer multiple V×T_1 of the conversion interval T_1 is shorter than the code length of the codes corresponding to the other candidates.
- V is an integer. For example, V is an integer other than 0, such as a positive integer.
- For example, the frequency-domain pitch period code may be determined using a variable-length codebook (Example 1) in which the code length of the variable-length code when the frequency-domain pitch period T is the conversion interval T_1 itself, and the code length of the variable-length code when the frequency-domain pitch period T is an integer multiple U×T_1 of the conversion interval T_1, are shorter than the code length of the variable-length code in the other cases.
- A “variable-length code” here means a code that shortens the average code length by assigning shorter codes to more frequent events than to less frequent events.
- In this case, the code length of the frequency-domain pitch period code when the frequency-domain pitch period T is the conversion interval T_1 itself or an integer multiple of it is shorter than the code length in the other cases.
- An example of such a variable length codebook is shown in FIG. Because integer multiples of the conversion interval T_1 tend to be determined as the frequency-domain pitch period more often than other values, determining the frequency-domain pitch period code with such a variable-length codebook shortens the average code length.
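- The effect of such a codebook on the average code length can be sketched as follows. The codebook of the figure is not reproduced here; instead, this sketch builds code lengths with a standard Huffman construction from made-up selection frequencies (integer multiples assumed ten times as frequent as the other candidates). Both the weights and the use of Huffman coding are illustrative assumptions, not the method of the specification.

```python
import heapq
from itertools import count


def huffman_code_lengths(weights):
    """Return {symbol: code length} for a Huffman code built from weights."""
    tiebreak = count()  # unique counter so heap entries never compare lists
    heap = [(w, next(tiebreak), [sym]) for sym, w in weights.items()]
    heapq.heapify(heap)
    lengths = {sym: 0 for sym in weights}
    while len(heap) > 1:
        w1, _, syms1 = heapq.heappop(heap)
        w2, _, syms2 = heapq.heappop(heap)
        # Every symbol under the merged node gets one more bit.
        for sym in syms1 + syms2:
            lengths[sym] += 1
        heapq.heappush(heap, (w1 + w2, next(tiebreak), syms1 + syms2))
    return lengths


INTEGER_MULTIPLES = [1, 2, 3, 4, 5, 6, 7, 8, 9]
OTHER_MULTIPLES = [1.9375, 2.0625, 2.125, 2.1875, 2.25, 2.9375, 3.0625]

# Hypothetical selection frequencies, for illustration only.
weights = {m: 10 for m in INTEGER_MULTIPLES}
weights.update({m: 1 for m in OTHER_MULTIPLES})

lengths = huffman_code_lengths(weights)
total_weight = sum(weights.values())
average_length = sum(weights[m] * lengths[m] for m in weights) / total_weight
```

- Under these assumed frequencies every integer multiple receives a code no longer than any non-integer multiple, and the average length falls below the 4 bits that a fixed-length code for 16 candidates would need.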
- Alternatively, the frequency-domain pitch period code may be determined using a variable-length codebook (Example 2) in which the code length of the variable-length code when the frequency-domain pitch period T is the conversion interval T_1 itself, when it is an integer multiple U×T_1 of the conversion interval T_1, when it is in the vicinity of the conversion interval T_1, and when it is in the vicinity of an integer multiple of the conversion interval T_1 is shorter than the code length of the variable-length code in the other cases.
- In this case, the code length of the frequency-domain pitch period code is shorter in these four cases than in the other cases. Because values in these four cases are selected as the frequency-domain pitch period more often than other values, making the corresponding code lengths shorter than the code lengths in the other cases shortens the average code length.
- The frequency-domain pitch period code may also be determined using a variable-length codebook (Example 3) in which the code length of the variable-length code when the frequency-domain pitch period T is the conversion interval T_1 itself is shorter than the code length of the variable-length code when the frequency-domain pitch period T is an integer multiple U×T_1 of the conversion interval T_1. In this case, the code length of the frequency-domain pitch period code when the frequency-domain pitch period T is the conversion interval T_1 itself is shorter than the code length when it is near the conversion interval T_1.
- A variable-length codebook (Example 4) in which the code length of the variable-length code when the frequency-domain pitch period T is an integer multiple of the conversion interval T_1 is shorter than the code length of the variable-length code when it is near an integer multiple of the conversion interval T_1 may also be used.
- In this case, the code length of the frequency-domain pitch period code when the frequency-domain pitch period T is an integer multiple of the conversion interval T_1 is shorter than the code length when it is near an integer multiple of the conversion interval T_1.
- The frequency-domain pitch period code may also be determined using a variable-length codebook (Example 5) in which variable-length codes are assigned so that the code length is monotonically non-decreasing with respect to the magnitude of the frequency-domain pitch period T.
- In that codebook, at least the code length of the frequency-domain pitch period code when the frequency-domain pitch period T is an integer multiple V×T_1 of the conversion interval T_1 is monotonically non-decreasing with respect to the magnitude of the integer V.
- variable length codebook (Example 6) having the characteristics of Examples 1 and 3 may be used, or the variable length codebook (Example 7) having the characteristics of Examples 2 and 3 may be used.
- the variable-length codebook (Example 8) having the characteristics of Examples 2 and 4 may be used, and the variable-length codebook (Example 9) having the characteristics of Examples 2, 3, and 4 may be used.
- a variable-length codebook (Example 10) that combines the features of any of [9] to [9] and Example 5 may be used.
- the encoding device 21 of the present embodiment is different from the encoding device 11 of the first embodiment in that a frequency domain pitch period analysis unit 215 is provided instead of the frequency domain pitch period analysis unit 115.
- When the long-term prediction selection information indicates that long-term prediction is to be executed, the frequency-domain pitch period analysis unit 215 determines an intermediate candidate value from among the conversion interval T_1 and integer multiples U×T_1 of the conversion interval T_1, and then determines and outputs the frequency-domain pitch period T from among the intermediate candidate value and values in a predetermined third range in the vicinity of the intermediate candidate value.
- When the long-term prediction selection information indicates that long-term prediction is not to be executed, the frequency-domain pitch period analysis unit 215 determines and outputs the frequency-domain pitch period T using integer values in the predetermined second range as candidate values, as in the first embodiment.
- Frequency-domain pitch period analysis unit 215: When the long-term prediction selection information indicates that long-term prediction is to be performed, the frequency-domain pitch period analysis unit 215 first determines an intermediate candidate value using the conversion interval T_1 and integer multiples U×T_1 of the conversion interval T_1 as candidate values. Next, it determines and outputs the frequency-domain pitch period T using the intermediate candidate value and values in the predetermined third range in the vicinity of the intermediate candidate value as candidate values. It also outputs, as the frequency-domain pitch period code, information indicating what multiple of the conversion interval T_1 the intermediate candidate value is together with information indicating the difference between the frequency-domain pitch period T and the intermediate candidate value.
- For example, a total of eight values T_1, 2T_1, 3T_1, 4T_1, 5T_1, 6T_1, 7T_1, and 8T_1 are candidates for the intermediate candidate value, and the intermediate candidate value T_cand is selected from these candidates.
- In this case, the information indicating what multiple of the conversion interval T_1 the intermediate candidate value is is a code of at least 3 bits in one-to-one correspondence with the integers 1 to 8.
- Likewise, a total of eight values T_cand−3, T_cand−2, T_cand−1, T_cand, T_cand+1, T_cand+2, T_cand+3, and T_cand+4 are candidates for the frequency-domain pitch period T, and the frequency-domain pitch period T is selected from these candidates.
- In this case, the information indicating the difference between the frequency-domain pitch period T and the intermediate candidate value is a code of at least 3 bits in one-to-one correspondence with the integers −3 to 4.
- the value in the predetermined third range may be an integer value or a decimal value.
- The intermediate candidate value may also be determined using, as candidate values, multiples of the conversion interval T_1 other than integer multiples, in addition to the conversion interval T_1 and the integer multiples U×T_1. That is, the intermediate candidate value may be determined from candidate values that include the conversion interval T_1 and the integer multiples U×T_1 of the conversion interval T_1.
- The decoding device 22 of this embodiment differs from the decoding device 12 of the first embodiment in that a period conversion unit 222 is provided instead of the period conversion unit 122.
- When the long-term prediction selection information indicates that long-term prediction is to be performed, the period conversion unit 222 decodes the frequency-domain pitch period code to obtain an integer value indicating what multiple of the conversion interval T_1 the intermediate candidate value is and a difference value between the frequency-domain pitch period T and the intermediate candidate value. It then multiplies the conversion interval T_1 by the integer value, adds the difference value to the product, and outputs the result as the frequency-domain pitch period T.
- When the long-term prediction selection information indicates that long-term prediction is not to be performed, the period conversion unit 222 decodes the frequency-domain pitch period code to obtain and output the frequency-domain pitch period T.
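- The two-part code of this embodiment (a multiple of T_1 plus a small difference) can be sketched as below for the example ranges of 1 to 8 and −3 to 4. The bit layout (3 bits for the multiple followed by 3 bits for the difference, each stored as an offset binary number) and the function names are assumptions made for illustration.

```python
def encode_pitch_period_code(multiple, difference):
    """Pack the multiple (1..8) and the difference (-3..4) into 6 bits."""
    if not 1 <= multiple <= 8:
        raise ValueError("multiple must be in 1..8")
    if not -3 <= difference <= 4:
        raise ValueError("difference must be in -3..4")
    return format(multiple - 1, "03b") + format(difference + 3, "03b")


def decode_pitch_period(code, t1):
    """Decoder side: T = multiple * T_1 + difference."""
    multiple = int(code[:3], 2) + 1
    difference = int(code[3:], 2) - 3
    return multiple * t1 + difference


# Intermediate candidate 3 * T_1 refined by a difference of -2.
code = encode_pitch_period_code(multiple=3, difference=-2)
period = decode_pitch_period(code, t1=20)
```

- Splitting the code this way lets 6 bits address 64 periods clustered around the integer multiples of T_1, rather than spreading the same bits over the whole search range.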
- The encoding device 31 of this embodiment differs from the encoding devices 11, 11′, and 21 of the first embodiment, the modification of the first embodiment, and the second embodiment in that a frequency-domain pitch period analysis unit 315 is provided instead of the frequency-domain pitch period analysis units 115, 115′, and 215.
- The frequency-domain pitch period analysis unit 315 performs its processing with the condition “when the long-term prediction selection information indicates that long-term prediction is to be performed” replaced by “when the quantized pitch gain g_p^ is greater than or equal to a predetermined value”, and the condition “when the long-term prediction selection information indicates that long-term prediction is not performed” replaced by “when the quantized pitch gain g_p^ is smaller than a predetermined value”. Except for this, it is the same as in the first embodiment and the second embodiment. Note that this embodiment is premised on the configuration of the first embodiment in which the encoding device 31 obtains the quantized pitch gain g_p^ and the pitch gain code C_gp.
- The decoding device 32 of this embodiment differs from the decoding devices 12, 12′, and 22 of the first and second embodiments in that a period conversion unit 322 is provided instead of the period conversion units 122, 122′, and 222.
- the period conversion unit 322 replaces “when the long-term prediction selection information indicates that long-term prediction is performed” with “when the quantized pitch gain g p ⁇ is greater than or equal to a predetermined value”, Instead of “when long-term prediction selection information indicates that long-term prediction is not performed”, “when quantized pitch gain g p ⁇ is smaller than a predetermined value”, processing is performed. Except this, it is the same as the first embodiment and the second embodiment. Note that this embodiment is premised on the configuration of the first embodiment in which the pitch gain code C gp is input to the decoding device 32 to obtain the quantized pitch gain g p ⁇ .
- The encoding device 41 of this embodiment differs from the encoding devices 11, 11′, and 21 of the first embodiment, the modification of the first embodiment, and the second embodiment in that a long-term prediction analysis unit 411, a long-term prediction residual generation unit 412, a frequency domain conversion unit 413a, a period conversion unit 414, and a frequency-domain pitch period analysis unit 415 are provided instead of the long-term prediction analysis unit 111, the long-term prediction residual generation unit 112, the frequency domain conversion unit 113a, the period conversion unit 114, and the frequency-domain pitch period analysis units 115, 115′, and 215.
- The long-term prediction analysis unit 411 of this embodiment performs long-term prediction regardless of the value of the pitch gain g_p. More specifically, regardless of the value of the pitch gain g_p, the long-term prediction analysis unit 411 performs the processing that the long-term prediction analysis unit 111 performs “when the long-term prediction selection information indicates that long-term prediction is to be performed”. The long-term prediction analysis unit 411 therefore need not decide whether to execute long-term prediction based on whether the pitch gain g_p is greater than or equal to a predetermined value, and need not output the long-term prediction selection information.
- The long-term prediction residual generation unit 412, the frequency domain conversion unit 413a, the period conversion unit 414, and the frequency-domain pitch period analysis unit 415 respectively perform the processing that the long-term prediction residual generation unit 112, the frequency domain conversion unit 113a, the period conversion unit 114, and the frequency-domain pitch period analysis units 115, 115′, and 215 perform “when the long-term prediction selection information output by the long-term prediction analysis unit 111 indicates that long-term prediction is to be executed”.
- The decoding device 42 of this embodiment differs from the decoding devices 12, 12′, and 22 of the first and second embodiments in that a decoding unit 423a, a long-term prediction information decoding unit 421, a period conversion unit 422, a time domain conversion unit 424c, and a long-term prediction synthesis unit 425 are provided instead of the decoding unit 123a, the long-term prediction information decoding unit 121, the period conversion units 122, 122′, and 222, the time domain conversion unit 124c, and the long-term prediction synthesis unit 125.
- long-term prediction synthesis is performed regardless of the long-term prediction selection information and the value of the quantized pitch gain g p ⁇ . Therefore, it is not necessary to input the long-term prediction selection information to the decoding device 42 of the present embodiment.
- the decoding unit 423a, the long-term prediction information decoding unit 421, the period conversion unit 422, the time domain conversion unit 424c, and the long-term prediction synthesis unit 425 of the present embodiment are respectively a decoding unit 123a, a long-term prediction information decoding unit 121, and a period conversion unit 122. , 122 ′, 222, the time domain conversion unit 124c, and the long-term prediction synthesis unit 125 perform processing corresponding to “when the long-term prediction selection information indicates that long-term prediction is executed”.
- The encoding devices 11, 11′, 21, 31, and 41 of the above embodiments include the frequency domain conversion units 113a and 413a, the weighted envelope normalization unit 113b, the normalization gain calculation unit 113c, and the quantization unit 113d, and the quantized MDCT coefficient sequence in units of frames obtained by the quantization unit 113d is used as the input of the frequency-domain pitch period analysis units 115, 115′, 215, 315, and 415.
- However, the encoding devices 11, 11′, 21, 31, and 41 may include processing units other than the frequency domain conversion units 113a and 413a, the weighted envelope normalization unit 113b, the normalization gain calculation unit 113c, and the quantization unit 113d.
- That is, the encoding devices 11, 11′, 21, 31, and 41 are provided with the frequency domain sample sequence generation unit 113, which is configured, as an example, by the frequency domain conversion units 113a and 413a, the weighted envelope normalization unit 113b, the normalization gain calculation unit 113c, and the quantization unit 113d.
- the frequency domain sample sequence generation unit 113 included in the encoding devices 11, 11 ′, 21, 31, and 41 performs processing for obtaining a frequency domain sample sequence derived from the long-term prediction residual signal. If long-term prediction is not performed, processing for obtaining a frequency-domain sample string derived from the acoustic signal is performed.
- the sample sequence obtained by the frequency domain sample sequence generation unit 113 is input to the frequency domain pitch period analysis units 115, 115 ′, 215, 315, and 415.
- The decoding devices 12, 12′, 22, 32, and 42 are provided with the time domain signal sequence generation unit 124, which is configured, for example, by the gain multiplication unit 124a, the weighted envelope denormalization unit 124b, and the time domain conversion units 124c and 424c.
- The time domain signal sequence generation unit 124 included in the decoding devices 12, 12′, 22, 32, and 42 performs processing for obtaining a time-domain signal sequence derived from the frequency-domain sample sequence input from the decoding units 123a and 423a or the recovery unit 123b.
- When long-term prediction is performed, the signal sequence obtained by the time domain signal sequence generation unit 124 is input to the long-term prediction synthesis units 125 and 425 as the long-term prediction residual signal sequence x_p(1), ..., x_p(N_t).
- When long-term prediction is not performed, the signal sequence obtained by the time domain signal sequence generation unit 124 is output from the decoding devices 12, 12′, 22, 32, and 42 as the digital acoustic signal sequence x(1), ..., x(N_t).
- The encoding device 51 of this embodiment differs from the encoding devices 11, 11′, 21, 31, and 41 of the first embodiment, the modification of the first embodiment, the second embodiment, the third embodiment, and the fourth embodiment in that the encoding device 51 does not include the frequency-domain pitch period consideration encoding unit 116.
- the encoding device 51 functions as an encoding device that obtains a code for specifying the frequency domain pitch period.
- The frequency-domain sample sequence output from the encoding device 51 is, for example, input to and encoded by a frequency-domain pitch period consideration encoding unit 116 outside the encoding device 51, but it may instead be encoded by other encoding means.
- Others are the same as the encoding devices 11, 11 ′, 21, 31, 41 of the first embodiment, the modified example of the first embodiment, the second embodiment, the third embodiment, and the fourth embodiment.
- The decoding device 52 of this embodiment differs from the decoding devices 12, 12′, 22, 32, and 42 of the first embodiment, the modification of the first embodiment, the second embodiment, the third embodiment, and the fourth embodiment in that the decoding device 52 does not include the frequency-domain pitch period consideration decoding unit 123, the time domain signal sequence generation unit 124, and the long-term prediction synthesis unit 125.
- The decoding device 52 functions as a decoding device that obtains at least the frequency-domain pitch period T and the time-domain pitch period L of long-term prediction from at least the frequency-domain pitch period code and the time-domain pitch period code included in the code string.
- the time-domain pitch period L and the quantized pitch gain g p ⁇ output from the decoding device 52 are input to the long-term prediction synthesis unit 125.
- the code sequence, the frequency domain pitch period T output from the decoding device 52 (and auxiliary information when auxiliary information is input) are input to the frequency domain pitch period considering decoding unit 123.
- Others are the same as those of the decoding devices 12, 12 ′, 22, 32, and 42 of the first embodiment, the modified example of the first embodiment, the second embodiment, the third embodiment, and the fourth embodiment.
- The encoding device 61 and the decoding device 62 of this embodiment differ from those of the first embodiment, the modification of the first embodiment, the second embodiment, the third embodiment, and the fourth embodiment in that a frequency-domain pitch period consideration encoding unit 616 is provided instead of the frequency-domain pitch period consideration encoding unit 116, and a frequency-domain pitch period consideration decoding unit 623 is provided instead of the frequency-domain pitch period consideration decoding unit 123.
- the frequency domain sample string is input to the frequency domain pitch period consideration encoding unit 616.
- the code string, frequency domain pitch period T, and auxiliary information are input to the frequency domain pitch period considering decoding unit 623.
- the frequency domain pitch period consideration encoding unit 616 and the frequency domain pitch period consideration decoding unit 623 will be described.
- the frequency domain pitch period consideration encoding unit 616 includes an encoding unit 616b, encodes the input frequency domain sample sequence by an encoding method based on the frequency domain pitch period T, and outputs the resulting code string.
- the encoding unit 616b encodes, according to different criteria (that is, distinguishing between them), a sample group G1 consisting of all or a part of (i) one or a plurality of consecutive samples including the sample corresponding to the frequency domain pitch period T in the frequency domain sample sequence and (ii) one or a plurality of consecutive samples including a sample corresponding to an integer multiple of the frequency domain pitch period T in the frequency domain sample sequence, and a sample group G2 consisting of the samples in the frequency domain sample sequence that are not included in the sample group G1, and outputs the resulting code string.
- [Sample groups G1 and G2] Specific examples of "all or a part of one or a plurality of consecutive samples including the sample corresponding to the frequency domain pitch period T in the frequency domain sample sequence and one or a plurality of consecutive samples including a sample corresponding to an integer multiple of the frequency domain pitch period T in the frequency domain sample sequence" are the same as in the first embodiment, and the group formed by such samples is the sample group G1.
- An example of the sample group G1 is the collection of sample groups each consisting of the three samples F(nT-1), F(nT), and F(nT+1).
- For example, when n represents each integer from 1 to 5, the group consisting of the first sample group F(T-1), F(T), F(T+1), the second sample group F(2T-1), F(2T), F(2T+1), the third sample group F(3T-1), F(3T), F(3T+1), the fourth sample group F(4T-1), F(4T), F(4T+1), and the fifth sample group F(5T-1), F(5T), F(5T+1) is the sample group G1.
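- the group of samples in the sample string input to the encoding unit 616b that are not included in the sample group G1 is the sample group G2.
- For example, when n represents each integer from 1 to 5, the group consisting of the first sample set F(1), ..., F(T-2), the second sample set F(T+2), ..., F(2T-2), the third sample set F(2T+2), ..., F(3T-2), the fourth sample set F(3T+2), ..., F(4T-2), the fifth sample set F(4T+2), ..., F(5T-2), and the sixth sample set F(5T+2), ..., F(jmax) is an example of the sample group G2.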
- When the frequency domain pitch period T is a non-integer (decimal) value, the collection of sample groups F(R(nT-1)), F(R(nT)), F(R(nT+1)) may, for example, be the sample group G1, where R(nT) is the value obtained by rounding nT to the nearest integer.
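The split of sample indices into G1 and G2 can be illustrated with a minimal sketch. The function names `round_half_up` and `split_sample_groups`, the parameters `n_max` and `width`, and the choice of rounding halves upward for R(·) are assumptions made for illustration; the source only states that three samples around each multiple nT (n = 1 to 5) form G1 and that all remaining samples form G2.

```python
import math


def round_half_up(x: float) -> int:
    """R(x): round to the nearest integer, halves rounded up (assumed convention)."""
    return math.floor(x + 0.5)


def split_sample_groups(T: float, jmax: int, n_max: int = 5, width: int = 1):
    """Return (g1, g2), the sample indices (1-based) of sample groups G1 and G2.

    G1 collects R(nT - width) ... R(nT + width) for n = 1..n_max, limited to
    the valid index range 1..jmax. G2 is every index in 1..jmax not in G1.
    """
    g1 = set()
    for n in range(1, n_max + 1):
        for d in range(-width, width + 1):
            j = round_half_up(n * T + d)
            if 1 <= j <= jmax:
                g1.add(j)
    g2 = [j for j in range(1, jmax + 1) if j not in g1]
    return sorted(g1), g2


if __name__ == "__main__":
    g1, g2 = split_sample_groups(T=7.4, jmax=48)
    print("G1:", g1)
    print("G2:", g2)
```

Because the decoder receives the same T (and, if used, the same first auxiliary information), it can run the identical index computation to find which codes belong to which group, so no rearrangement information has to be transmitted.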
- the number of samples and the sample indices included in each sample group constituting the sample group G1 may be variable, and information indicating one option selected from a plurality of options that differ in the combination of the number of samples and the sample indices included in each sample group constituting the sample group G1 may be output as auxiliary information (first auxiliary information).
- the encoding unit 616b encodes the sample group G1 and the sample group G2 according to different criteria without rearranging the samples included in the sample groups G1 and G2, and outputs a code string obtained thereby.
- the samples included in the sample group G1 have, on average, larger amplitudes than the samples included in the sample group G2.
- the samples included in the sample group G1 are variable-length encoded according to a criterion corresponding to the magnitude of the amplitudes of the samples included in the sample group G1 or an estimate thereof, and the samples included in the sample group G2 are variable-length encoded according to a criterion corresponding to the magnitude of the amplitudes of the samples included in the sample group G2 or an estimate thereof.
- the encoding unit 616b Rice-encodes each sample included in the sample group G1 using the Rice parameter corresponding to the magnitude of the amplitudes of the samples included in the sample group G1 or an estimate thereof.
- the encoding unit 616b Rice-encodes each sample included in the sample group G2 using the Rice parameter corresponding to the magnitude of the amplitudes of the samples included in the sample group G2 or an estimate thereof.
- the encoding unit 616b outputs a code string obtained by the Rice encoding and auxiliary information for specifying the Rice parameter.
- the encoding unit 616b obtains the rice parameter of the sample group G1 in the frame from the average amplitude of the samples included in the sample group G1 in each frame.
- the encoding unit 616b obtains the Rice parameter of the sample group G2 in the frame from the average amplitude of the samples included in the sample group G2 in each frame.
- the Rice parameter is an integer greater than or equal to zero.
- the encoding unit 616b Rice-encodes the samples included in the sample group G1 using the Rice parameter of the sample group G1, and Rice-encodes the samples included in the sample group G2 using the Rice parameter of the sample group G2. As a result, the average code amount can be reduced. This will be described in detail below.
- the code obtained by Rice-encoding a sample X(k) included in the sample group G1 includes prefix(k), which is obtained by alpha-coding the quotient q(k) obtained by dividing the sample X(k) by a value corresponding to the Rice parameter s of the sample group G1, and sub(k), which specifies the remainder. That is, the code corresponding to the sample X(k) in this example includes prefix(k) and sub(k). Note that the sample X(k) to be Rice-encoded is expressed as an integer.
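- Expressions (B1) to (B4) can be unified to express the quotient q(k) as follows.
- q(k) = floor{(2*|X(k)| - z) / 2^s} (z = 0 or 1 or 2) ... (B7)
- prefix(k) is the code obtained by alpha-coding the quotient q(k), and its code amount can be expressed using Expression (B7) as floor{(2*|X(k)| - z) / 2^s} + 1.
- This Rice parameter s corresponds to the average amplitude D/|G1| of the samples included in the sample group G1, where D is the sum of the amplitudes |X(k)| of the samples in the sample group G1 and |G1| is the number of those samples.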
- the Rice parameter for the sample group G1 is obtained from the average amplitude of the samples included in the sample group G1, and the Rice parameter for the sample group G2 is obtained from the average amplitude of the samples included in the sample group G2.
- the total code amount can be minimized by obtaining the Rice parameters separately for the sample group G1 and the sample group G2 and Rice-encoding each group with its own parameter.
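The effect of using separate Rice parameters for G1 and G2 can be checked with a small sketch. Several points here are assumptions rather than the method of the source: the sign is folded into a non-negative integer u = 2|X| - z with z = 0 for X ≥ 0 and z = 1 for X < 0 (one instance of Expression (B7)), the alpha code is written as q zeros followed by a one, and the Rice parameter is found by exhaustive search over total code length instead of being derived from the average amplitude. The function names are hypothetical.

```python
def rice_encode(x: int, s: int) -> str:
    """Rice-encode one signed integer sample as a bit string prefix(k) + sub(k)."""
    u = 2 * x if x >= 0 else -2 * x - 1   # fold the sign: u = 2|x| - z, z in {0, 1}
    q = u >> s                            # quotient q(k) = floor(u / 2^s)
    prefix = "0" * q + "1"                # alpha (unary) code, q + 1 bits (assumed bit polarity)
    sub = format(u & ((1 << s) - 1), f"0{s}b") if s > 0 else ""  # s-bit remainder
    return prefix + sub


def total_bits(samples, s: int) -> int:
    """Total code length when every sample is Rice-encoded with parameter s."""
    return sum(len(rice_encode(x, s)) for x in samples)


def best_rice_parameter(samples, s_max: int = 15) -> int:
    """Pick the integer s >= 0 minimising the total code length (exhaustive search)."""
    return min(range(s_max + 1), key=lambda s: total_bits(samples, s))


if __name__ == "__main__":
    g1 = [12, -9, 15, -11, 10, 13]        # large amplitudes near multiples of T
    g2 = [1, 0, -2, 1, 0, -1, 2, 0]       # small amplitudes elsewhere
    s1, s2 = best_rice_parameter(g1), best_rice_parameter(g2)
    s_shared = best_rice_parameter(g1 + g2)
    separate = total_bits(g1, s1) + total_bits(g2, s2)
    shared = total_bits(g1 + g2, s_shared)
    print(f"s1={s1}, s2={s2}: {separate} bits; shared s={s_shared}: {shared} bits")
```

With per-group parameters the total can never exceed the total obtained with one shared parameter, because each group is free to choose the shared value; the gain grows as the amplitude gap between G1 and G2 widens.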
- [Example 1 of auxiliary information for specifying Rice parameters] When the Rice parameter corresponding to the sample group G1 and the Rice parameter corresponding to the sample group G2 are handled separately, the decoding side requires auxiliary information (third auxiliary information) for specifying the Rice parameter corresponding to the sample group G1 and auxiliary information (fourth auxiliary information) for specifying the Rice parameter corresponding to the sample group G2. Therefore, the encoding unit 616b may output the third auxiliary information and the fourth auxiliary information in addition to the code string formed by the codes obtained by Rice-encoding the sample string sample by sample.
- [Example 2 of auxiliary information for specifying Rice parameters] When an acoustic signal is to be encoded, the average amplitude of the samples included in the sample group G1 is larger than the average amplitude of the samples included in the sample group G2, and the Rice parameter corresponding to the sample group G1 is larger than the Rice parameter corresponding to the sample group G2. This fact can be used to reduce the code amount of the auxiliary information for specifying the Rice parameters.
- the encoding unit 616b may output only one of the third auxiliary information and the fourth auxiliary information in addition to the code string.
- [Example 3 of auxiliary information for specifying Rice parameters] Information that by itself specifies the Rice parameter corresponding to the sample group G1 may be the fifth auxiliary information, and information that specifies the difference between the Rice parameter corresponding to the sample group G1 and the Rice parameter corresponding to the sample group G2 may be the sixth auxiliary information. Conversely, information that by itself specifies the Rice parameter corresponding to the sample group G2 may be the sixth auxiliary information, and information that specifies the difference between the Rice parameter corresponding to the sample group G1 and the Rice parameter corresponding to the sample group G2 may be the fifth auxiliary information.
- Since it is known that the Rice parameter corresponding to the sample group G1 is larger than the Rice parameter corresponding to the sample group G2, auxiliary information indicating the magnitude relationship between the two Rice parameters (such as information indicating positive / negative) is not required.
- [Example 4 of auxiliary information for specifying Rice parameters] When the number of code bits assigned to the entire frame is determined, the gain value obtained in step S113c is also considerably restricted, and the possible range of the sample amplitudes is also greatly restricted. In this case, the average amplitude of the samples can be estimated with a certain degree of accuracy from the number of code bits assigned to the entire frame.
- the encoding unit 616b may perform the Rice encoding using a Rice parameter estimated from the estimated average amplitude of the samples.
- For example, the encoding unit 616b may use the value obtained by adding a first difference value (for example, 1) to the estimated Rice parameter as the Rice parameter corresponding to the sample group G1, and use the estimated Rice parameter as the Rice parameter corresponding to the sample group G2.
- Alternatively, the encoding unit 616b may use the estimated Rice parameter as the Rice parameter corresponding to the sample group G1, and use the value obtained by subtracting a second difference value (for example, 1) from the estimated Rice parameter as the Rice parameter corresponding to the sample group G2.
- In these cases, the encoding unit 616b may output, in addition to the code string, for example, auxiliary information (seventh auxiliary information) for specifying the first difference value or auxiliary information (eighth auxiliary information) for specifying the second difference value.
- [Example 5 of auxiliary information for specifying Rice parameters] Even when the amplitudes of the samples included in the sample group G1 are not uniform or the amplitudes of the samples included in the sample group G2 are not uniform, a Rice parameter with a larger code amount reduction effect can be estimated based on the envelope information of the amplitudes of the sample sequence X(1), ..., X(N). For example, when the sample amplitudes become larger as the frequency becomes higher, the code amount can be further reduced by increasing, by a fixed amount, the Rice parameter corresponding to the high-frequency samples among the samples included in the sample group G1 and the Rice parameter corresponding to the high-frequency samples among the samples included in the sample group G2. Specific examples are shown below.
- s1 and s2 are Rice parameters respectively corresponding to the sample groups G1 and G2 exemplified in [Examples 1 to 4 of auxiliary information for specifying Rice parameters].
- const.1 to const.10 are predetermined positive integers.
- the encoding unit 616b may output auxiliary information (ninth auxiliary information) for specifying the envelope information, in addition to the code string and the auxiliary information exemplified in Examples 2 and 3 of auxiliary information for specifying Rice parameters.
- the encoding unit 616b may not output the seventh auxiliary information.
- the frequency domain pitch period consideration decoding unit 623 includes a decoding unit 623a, decodes the code string by a decoding method based on the frequency domain pitch period T, and obtains and outputs a frequency domain sample string.
- the decoding unit 623a decodes the code string by decoding processes that follow different criteria (that is, distinguish) for the sample group G1, which consists of all or a part of one or a plurality of consecutive samples including the sample corresponding to the frequency domain pitch period T in the frequency domain sample sequence and one or a plurality of consecutive samples including a sample corresponding to an integer multiple of the frequency domain pitch period T in the frequency domain sample sequence, and for the sample group G2, which consists of the samples in the frequency domain sample sequence that are not included in the sample group G1, and obtains and outputs the frequency domain sample sequence.
- Based on the input frequency domain pitch period T (or, if the first auxiliary information is input, based on the frequency domain pitch period T and the first auxiliary information), the decoding unit 623a identifies, for each frame, the code groups C1 and C2 included in the input code string and the sample numbers included in the sample groups G1 and G2 to which the respective code groups correspond. It then assigns the sample values obtained by decoding the code groups C1 and C2 to the sample numbers to which the respective codes correspond, thereby obtaining the sample groups G1 and G2 and hence the frequency domain sample sequence.
- the code group C1 includes codes corresponding to samples included in the sample group G1 in the code string
- the code group C2 includes codes corresponding to samples included in the sample group G2 in the code string.
- the method of identifying the code groups C1 and C2 in the decoding unit 623a corresponds to the method of setting the sample groups G1 and G2 in the encoding unit 616b. For example, it is the method obtained by replacing, in the setting method of the sample groups G1 and G2 described above, "sample" with "code", "F(j)" with "C(j)", "sample group G1" with "code group C1", and "sample group G2" with "code group C2".
- C (j) is a code corresponding to the sample F (j).
- For example, when the encoding unit 616b sets, as the sample group G1, the three samples F(nT-1), F(nT), and F(nT+1) consisting of the sample F(nT) corresponding to an integer multiple of the frequency domain pitch period T and the samples F(nT-1) and F(nT+1) before and after it in the sample sequence input to the encoding unit 616b, the decoding unit 623a takes, from the input code string C(1), ..., C(jmax), the codes C(nT-1), C(nT), and C(nT+1) corresponding to the three sample numbers nT-1, nT, and nT+1 as the code group C1, and takes the group of codes not included in the code group C1 as the code group C2. It decodes the codes C(nT-1), C(nT), and C(nT+1) included in the code group C1 to obtain the sample F(nT-1) of sample number nT-1, the sample F(nT) of sample number nT, and the sample F(nT+1) of sample number nT+1, and decodes the codes included in the code group C2 to obtain the samples whose sample numbers are other than nT-1, nT, and nT+1.
- For example, when n represents each integer from 1 to 5, the group consisting of the first code group C(T-1), C(T), C(T+1), the second code group C(2T-1), C(2T), C(2T+1), the third code group C(3T-1), C(3T), C(3T+1), the fourth code group C(4T-1), C(4T), C(4T+1), and the fifth code group C(5T-1), C(5T), C(5T+1) is the code group C1, and the group consisting of the first code set C(1), ..., C(T-2), the second code set C(T+2), ..., C(2T-2), the third code set C(2T+2), ..., C(3T-2), the fourth code set C(3T+2), ..., C(4T-2), the fifth code set C(4T+2), ..., C(5T-2), and the sixth code set C(5T+2), ..., C(jmax) is the code group C2.
- These code groups and code sets are respectively decoded to obtain the first sample group F(T-1), F(T), F(T+1), the second sample group F(2T-1), F(2T), F(2T+1), the third sample group F(3T-1), F(3T), F(3T+1), the fourth sample group F(4T-1), F(4T), F(4T+1), the fifth sample group F(5T-1), F(5T), F(5T+1), the first sample set F(1), ..., F(T-2), the second sample set F(T+2), ..., F(2T-2), the third sample set F(2T+2), ..., F(3T-2), the fourth sample set F(3T+2), ..., F(4T-2), the fifth sample set F(4T+2), ..., F(5T-2), and the sixth sample set F(5T+2), ..., F(jmax).
- the decoding unit 623a decodes the code group C1 and the code group C2 according to different criteria, thereby obtaining and outputting a frequency domain sample string. For example, the decoding unit 623a decodes the codes included in the code group C1 according to a criterion corresponding to the magnitude of the amplitudes of the samples included in the sample group G1 corresponding to the code group C1 or an estimate thereof, and decodes the codes included in the code group C2 according to a criterion corresponding to the magnitude of the amplitudes of the samples included in the sample group G2 corresponding to the code group C2 or an estimate thereof.
- For each frame, the decoding unit 623a sets the Rice parameter corresponding to the sample group G1, identified from the input auxiliary information (at least part of the first to ninth auxiliary information), as the Rice parameter corresponding to the code group C1, and sets the Rice parameter corresponding to the sample group G2 as the Rice parameter corresponding to the code group C2.
- the following is an example of a rice parameter identification method corresponding to [Examples 1 to 5 of auxiliary information for identifying rice parameters] described above.
- [Example 1 of auxiliary information for specifying Rice parameters] For example, the decoding unit 623a to which the third auxiliary information and the fourth auxiliary information are input identifies the Rice parameter corresponding to the sample group G1 from the third auxiliary information and sets it as the Rice parameter corresponding to the code group C1, and identifies the Rice parameter corresponding to the sample group G2 from the fourth auxiliary information and sets it as the Rice parameter corresponding to the code group C2.
- [Example 2 of auxiliary information for specifying Rice parameters] For example, the decoding unit 623a to which only the fourth auxiliary information is input in addition to the code string identifies the Rice parameter corresponding to the code group C2 from the fourth auxiliary information, and sets the value obtained by adding a fixed value (for example, 1) to it as the Rice parameter corresponding to the code group C1.
- Alternatively, the decoding unit 623a to which only the third auxiliary information is input in addition to the code string identifies the Rice parameter corresponding to the code group C1 from the third auxiliary information, and sets the value obtained by subtracting a fixed value (for example, 1) from it as the Rice parameter corresponding to the code group C2.
- [Example 3 of auxiliary information for specifying Rice parameters] For example, the decoding unit 623a to which the fifth auxiliary information specifying the Rice parameter and the sixth auxiliary information specifying the difference are input identifies the Rice parameter corresponding to the sample group G1 from the fifth auxiliary information and sets it as the Rice parameter corresponding to the code group C1. It then sets the value obtained by subtracting the difference specified by the sixth auxiliary information from the Rice parameter corresponding to the code group C1 as the Rice parameter corresponding to the code group C2.
- Alternatively, the decoding unit 623a to which the fifth auxiliary information specifying the difference and the sixth auxiliary information specifying the Rice parameter are input identifies the Rice parameter corresponding to the sample group G2 from the sixth auxiliary information and sets it as the Rice parameter corresponding to the code group C2. It then sets the value obtained by adding the difference specified by the fifth auxiliary information to the Rice parameter corresponding to the code group C2 as the Rice parameter corresponding to the code group C1.
- [Example 4 of auxiliary information for specifying Rice parameters] For example, the decoding unit 623a to which the seventh auxiliary information is input uses the Rice parameter estimated from the number of code bits assigned to the entire frame as the Rice parameter corresponding to the code group C2, and sets the value obtained by adding the first difference value specified by the seventh auxiliary information to it as the Rice parameter corresponding to the code group C1.
- Alternatively, the decoding unit 623a to which the eighth auxiliary information is input uses the Rice parameter estimated from the number of code bits assigned to the entire frame as the Rice parameter corresponding to the code group C1, and sets the value obtained by subtracting the second difference value specified by the eighth auxiliary information from it as the Rice parameter corresponding to the code group C2.
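The decoder-side rules of Examples 1 to 4 can be summarised in one small sketch. The dictionary keys (`s_g1`, `s_g2`, `diff`, `diff1`, `diff2`), the function name, and the clamping of the result to zero are hypothetical choices made for illustration; the source defines only which quantity each piece of auxiliary information specifies, not a bitstream syntax.

```python
from typing import Optional, Tuple


def resolve_rice_parameters(aux: dict, estimated: Optional[int] = None,
                            fixed_delta: int = 1) -> Tuple[int, int]:
    """Return (s1, s2), the Rice parameters for code groups C1 and C2.

    aux keys (all hypothetical names):
      s_g1, s_g2 : parameters sent explicitly (third / fourth, or fifth / sixth)
      diff       : difference between the two parameters (Example 3)
      diff1      : first difference value (seventh auxiliary information)
      diff2      : second difference value (eighth auxiliary information)
    estimated    : parameter estimated from the frame's bit budget (Example 4)
    """
    if "s_g1" in aux and "s_g2" in aux:          # Example 1: both explicit
        s1, s2 = aux["s_g1"], aux["s_g2"]
    elif "s_g1" in aux and "diff" in aux:        # Example 3: G1 plus difference
        s1 = aux["s_g1"]
        s2 = s1 - aux["diff"]
    elif "s_g2" in aux and "diff" in aux:        # Example 3, reversed roles
        s2 = aux["s_g2"]
        s1 = s2 + aux["diff"]
    elif "s_g2" in aux:                          # Example 2: only G2 sent
        s2 = aux["s_g2"]
        s1 = s2 + fixed_delta
    elif "s_g1" in aux:                          # Example 2: only G1 sent
        s1 = aux["s_g1"]
        s2 = s1 - fixed_delta
    elif estimated is not None and "diff1" in aux:   # Example 4, seventh aux info
        s2 = estimated
        s1 = estimated + aux["diff1"]
    elif estimated is not None and "diff2" in aux:   # Example 4, eighth aux info
        s1 = estimated
        s2 = estimated - aux["diff2"]
    else:
        raise ValueError("not enough information to determine the Rice parameters")
    # A Rice parameter is an integer >= 0; clamping is an added safeguard.
    return max(s1, 0), max(s2, 0)
```

In every branch the parameter for C1 is at least as large as the one for C2, which is why no sign bit is needed for the difference.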
- [Example 5 of auxiliary information for specifying Rice parameters] For example, the decoding unit 623a to which the ninth auxiliary information is input in addition to the auxiliary information for specifying the Rice parameters described above specifies s1 and s2 using at least part of the third to eighth auxiliary information, and obtains the Rice parameters corresponding to the code groups C1 and C2 by adjusting s1 and s2 as shown in [Table 1] based on the ninth auxiliary information. Even if the ninth auxiliary information is not input, when the envelope information is known and the encoding unit 616b has obtained the Rice parameters corresponding to the sample groups G1 and G2 by adjusting s1 and s2 as described in [Table 1], the decoding unit 623a likewise adjusts s1 and s2 as described in [Table 1] to obtain the Rice parameters corresponding to the code groups C1 and C2.
- the decoding unit 623a that has obtained the Rice parameters as described above decodes, for each frame, the codes included in the code group C1 using the Rice parameter corresponding to the code group C1 and the codes included in the code group C2 using the Rice parameter corresponding to the code group C2, thereby obtaining and outputting the original sample sequence. Note that the decoding process corresponding to Rice encoding is well known, and thus its description is omitted.
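Although the source omits the Rice decoding procedure as well known, a minimal round-trip sketch may help show how the decoder switches parameters by code group without any rearrangement. The bit layout (unary prefix of zeros terminated by a one, followed by an s-bit remainder) and the sign folding are assumptions carried over for illustration, and `rice_encode` / `decode_frame` are hypothetical names.

```python
def rice_encode(x: int, s: int) -> str:
    """Rice-encode one signed integer as prefix (unary) + sub (s-bit remainder)."""
    u = 2 * x if x >= 0 else -2 * x - 1
    q = u >> s
    sub = format(u & ((1 << s) - 1), f"0{s}b") if s > 0 else ""
    return "0" * q + "1" + sub


def decode_frame(bits: str, g1: set, s1: int, s2: int, jmax: int) -> list:
    """Decode C(1)..C(jmax) in index order, using s1 for indices in G1, else s2."""
    samples, pos = [], 0
    for j in range(1, jmax + 1):
        s = s1 if j in g1 else s2
        q = 0
        while bits[pos] == "0":           # read the unary prefix
            q += 1
            pos += 1
        pos += 1                          # skip the terminating "1"
        r = int(bits[pos:pos + s], 2) if s > 0 else 0
        pos += s
        u = (q << s) | r                  # rebuild the folded value
        samples.append(u // 2 if u % 2 == 0 else -(u + 1) // 2)  # unfold the sign
    return samples


if __name__ == "__main__":
    T, jmax = 8, 20
    g1 = {n * T + d for n in (1, 2) for d in (-1, 0, 1)}   # indices 7..9 and 15..17
    frame = [0, 1, -1, 0, 2, 0, 9, -14, 11, 1, 0, -2, 0, 1, -8, 12, 7, 0, 1, 0]
    s1, s2 = 4, 0
    bits = "".join(rice_encode(x, s1 if j in g1 else s2)
                   for j, x in enumerate(frame, start=1))
    assert decode_frame(bits, g1, s1, s2, jmax) == frame
    print(len(bits), "bits for", jmax, "samples")
```

The codes stay in their original frequency order; only the parameter changes from code to code, which is the difference from the rearrangement-based configuration of the earlier embodiments.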
- the encoding device 81 of the present embodiment differs from the encoding device 51 of the fifth embodiment in that the encoding device 81 does not include the long-term prediction analysis unit 111, the long-term prediction residual generation unit 112, and the frequency domain sample string generation unit 113.
- the encoding device 81 receives a time domain pitch period L, a time domain pitch period code C_L, and a frequency domain sample sequence from outside the encoding device 81, and functions as an encoding device that obtains a code for specifying the frequency domain pitch period for the frequency domain sample sequence.
- the time domain pitch period L and the time domain pitch period code C_L input to the encoding device 81 are, for example, calculated by the long-term prediction analysis unit 111, but may be calculated using other time domain pitch period calculation means.
- the frequency domain sample sequence input to the encoding device 81 is a sample sequence corresponding to the sample sequence obtained by converting the input digital acoustic signal sequence into N points in the frequency domain. It may be the quantized MDCT coefficient sequence calculated by the frequency domain sample string generation unit 113, or may be a frequency domain sample sequence generated using other frequency domain sample sequence generation means.
- the period conversion unit 814 of the encoding device 81 receives the time domain pitch period L and the number N of sample points in the frequency domain, and calculates and outputs the conversion interval T_1.
- the process for obtaining the conversion interval T_1 is the same as that of the period conversion unit 114.
- Instead of the time domain pitch period L, the time domain pitch period code C_L corresponding to the time domain pitch period L may be input. In that case, the time domain pitch period L corresponding to the input time domain pitch period code C_L is obtained, and the conversion interval T_1 is obtained from the time domain pitch period L and output.
- the frequency domain pitch period analysis unit 815 receives the conversion interval T_1 and the frequency domain sample string. The frequency domain pitch period analysis unit 815 determines the frequency domain pitch period from candidate values including the conversion interval T_1 and integer multiples U×T_1 of the conversion interval T_1 (where U is an integer in a predetermined first range), and obtains and outputs a code for specifying the frequency domain pitch period.
- the process of determining the frequency domain pitch period and the process of obtaining the code for specifying the frequency domain pitch period are the same as the processes performed by the frequency domain pitch period analysis units 115, 115′, 215, 315, and 415 when the long-term prediction selection information indicates that long-term prediction is performed.
- Like the period conversion units 114 and 414 and the frequency domain pitch period analysis units 115, 115′, 215, 315, and 415, the period conversion unit 814 and the frequency domain pitch period analysis unit 815 may be configured to perform different processing depending on whether the long-term prediction selection information indicates that long-term prediction is performed or that it is not performed. In this case, the long-term prediction selection information obtained by the long-term prediction analysis unit 111 outside the encoding device 81 is also input to the encoding device 81.
- the decoding device 82 of the present embodiment is different from the decoding device 52 of the fifth embodiment in that the decoding device 82 does not include the long-term prediction information decoding unit 121.
- the decoding device 82 functions as a decoding device that obtains at least the frequency domain pitch period T from the time domain pitch period L obtained by the long-term prediction information decoding unit 121 outside the decoding device 82 and from at least the frequency domain pitch period code and the time domain pitch period code included in the input code string.
- the code string, the frequency domain pitch period T output from the decoding device 52 (and auxiliary information when auxiliary information is input) are input to the frequency domain pitch period considering decoding unit 123.
- The rest is the same as in the decoding device 52 of the fifth embodiment.
- the encoding devices 51 and 81 output the frequency domain pitch period code corresponding to the frequency domain pitch period T on the premise that the frequency domain pitch period T they obtain is used by the external frequency domain pitch period consideration encoding units 116 and 616 to encode the frequency domain sample sequence.
- However, the frequency domain pitch period T can also be used for purposes other than encoding, in which case the frequency domain pitch period code corresponding to the frequency domain pitch period T need not be output. Examples of purposes other than encoding include analysis of voices and musical sounds, separation of multiple voices and musical sounds, and recognition of voices and musical sounds.
- the frequency domain pitch period analyzer 91 of the ninth embodiment differs from the encoding devices 51 and 81 of the fifth embodiment, the seventh embodiment, and the eighth embodiment in that it does not output the frequency domain pitch period code corresponding to the frequency domain pitch period T.
- the frequency domain pitch period analyzer 91 functions as a frequency domain pitch period analyzer that determines the frequency domain pitch period for the frequency domain sample sequence from the time domain pitch period L input from the outside.
- the period conversion unit 914 of the ninth embodiment receives the time domain pitch period L and the number N of sample points in the frequency domain, and calculates and outputs the conversion interval T_1.
- the process for obtaining the conversion interval T_1 is the same as that of the period conversion unit 114.
- the frequency domain pitch period analysis unit 915 receives the conversion interval T_1 and the frequency domain sample sequence, determines the frequency domain pitch period from candidate values including the conversion interval T_1 and integer multiples U×T_1 of the conversion interval T_1 (where U is an integer in a predetermined first range), and outputs the determined frequency domain pitch period.
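The candidate search over U×T_1 can be sketched as follows. The selection criterion used here (maximising the summed amplitude at the rounded multiples of the candidate period) is purely an assumption for illustration, since this section defers the actual criterion to the earlier frequency domain pitch period analysis units; the function name, `multipliers`, and `n_max` are likewise hypothetical.

```python
import math


def choose_frequency_domain_pitch_period(spectrum, t1: float,
                                         multipliers=range(1, 9), n_max: int = 5):
    """Pick T among the candidates U * t1 and return (U, T).

    spectrum[j - 1] holds the frequency domain sample F(j), j = 1..jmax.
    Assumed criterion: the candidate whose rounded multiples T, 2T, ..., n_max*T
    collect the largest total amplitude wins.
    """
    jmax = len(spectrum)

    def score(T: float) -> float:
        total = 0.0
        for n in range(1, n_max + 1):
            j = math.floor(n * T + 0.5)   # nearest sample index to n*T
            if 1 <= j <= jmax:
                total += abs(spectrum[j - 1])
        return total

    best_u = max(multipliers, key=lambda u: score(u * t1))
    return best_u, best_u * t1


if __name__ == "__main__":
    # Toy spectrum with peaks at multiples of 12, while the conversion interval is 4.
    spectrum = [0.0] * 64
    for j in (12, 24, 36, 48, 60):
        spectrum[j - 1] = 10.0
    print(choose_frequency_domain_pitch_period(spectrum, t1=4.0))
```

Restricting the search to multiples of T_1 keeps the candidate set small, so an encoder that does output a code could, for example, transmit only the chosen multiplier U.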
- In the above embodiments, the configuration including the rearrangement processing unit 116a and the encoding unit 116b and the configuration including the encoding unit 616b have been described as the frequency domain pitch period consideration encoding unit.
- Either frequency domain pitch period consideration encoding unit is intended to "encode the input frequency domain sample sequence by an encoding method based on the frequency domain pitch period T and output the resulting code string", and more specifically to "encode, according to different criteria (that is, distinguishing between them), a sample group G1 consisting of all or a part of one or a plurality of consecutive samples including the sample corresponding to the frequency domain pitch period T in the frequency domain sample sequence and one or a plurality of consecutive samples including a sample corresponding to an integer multiple of the frequency domain pitch period T in the frequency domain sample sequence, and a sample group G2 consisting of the samples in the frequency domain sample sequence that are not included in the sample group G1, and output the resulting code string".
- Similarly, the frequency domain pitch period consideration decoding unit is intended to "decode the input code string by a decoding method based on the frequency domain pitch period T and output a frequency domain sample string", and more specifically to "decode the input code string according to different criteria (that is, distinguishing between them) for a sample group G1 consisting of all or a part of one or a plurality of consecutive samples including the sample corresponding to the frequency domain pitch period T in the frequency domain sample sequence and one or a plurality of consecutive samples including a sample corresponding to an integer multiple of the frequency domain pitch period T in the frequency domain sample sequence, and for a sample group G2 consisting of the samples in the frequency domain sample sequence that are not included in the sample group G1, and obtain and output the frequency domain sample sequence".
- the encoding device / decoding device has, for example, an input unit to which a keyboard or the like can be connected, an output unit to which a liquid crystal display or the like can be connected, a CPU (Central Processing Unit) [which may include a cache memory or the like], a RAM (Random Access Memory) and a ROM (Read Only Memory) as memories, an external storage device such as a hard disk, and a bus that connects the input unit, output unit, CPU, RAM, ROM, and external storage device so that data can be exchanged among them. If necessary, the encoding device / decoding device may be provided with a device (drive) that can read from and write to a storage medium such as a CD-ROM.
- the external storage device of the encoding device / decoding device stores a program for executing encoding / decoding and the data necessary for the processing of this program [the program is not limited to the external storage device and may, for example, be stored in a ROM, which is a read-only storage device]. Data obtained by the processing of these programs is stored as appropriate in the RAM or the external storage device.
- a storage device that stores data, addresses of storage areas, and the like is simply referred to as a “storage unit”.
- the storage unit of the encoding device stores a program for rearranging the frequency domain sample sequences derived from the audio-acoustic signal, a program for encoding the sample sequences obtained by the rearrangement, and the like.
- the storage unit of the decoding device stores a program for decoding the input code sequence, a program for restoring the sample sequence obtained by decoding to the sample sequence before it was rearranged by the encoding device, and the like.
- In the encoding device, each program stored in the storage unit and the data necessary for the processing of each program are read into the RAM as necessary, and interpreted and executed by the CPU.
- As a result, encoding is realized by the CPU realizing predetermined functions (such as a rearrangement processing unit and an encoding unit).
- In the decoding device, each program stored in the storage unit and the data necessary for the processing of each program are read into the RAM as necessary, and interpreted and executed by the CPU.
- As a result, decoding is realized by the CPU realizing predetermined functions (such as a decoding unit and a recovery unit).
- the present invention is not limited to the above-described embodiment, and can be modified as appropriate without departing from the spirit of the present invention.
- the processing described in the above embodiment may be executed not only in time series in the order described but also in parallel or individually, depending on the processing capability of the device that executes the processing or as required.
- the process by the long-term prediction information decoding unit 121 and the processes by the decoding units 123a and 523a can be executed in parallel.
- when the processing functions of the hardware entity (encoding device / decoding device) described in the above embodiment are realized by a computer, the processing contents of the functions that the hardware entity should have are described by a program. By executing this program on a computer, the processing functions of the hardware entity are realized on the computer.
- the program describing the processing contents can be recorded on a computer-readable recording medium.
- a computer-readable recording medium is a non-transitory recording medium.
- the computer-readable recording medium may be any recording medium, for example a magnetic recording device, an optical disk, a magneto-optical recording medium, or a semiconductor memory.
- specifically, a hard disk device, a flexible disk, a magnetic tape, or the like can be used as the magnetic recording device; a DVD (Digital Versatile Disc), a DVD-RAM (Random Access Memory), a CD-ROM (Compact Disc Read Only Memory), a CD-R (Recordable) / RW (ReWritable), or the like as the optical disk; an MO (Magneto-Optical disc) or the like as the magneto-optical recording medium; and an EEP-ROM (Electronically Erasable and Programmable-Read Only Memory) or the like as the semiconductor memory.
- this program is distributed by selling, transferring, or lending a portable recording medium such as a DVD or CD-ROM in which the program is recorded. Furthermore, the program may be distributed by storing the program in a storage device of the server computer and transferring the program from the server computer to another computer via a network.
- a computer that executes such a program first stores a program recorded on a portable recording medium or a program transferred from a server computer in its own storage device.
- when executing a process, the computer reads the program stored in its own storage device and executes the process according to the read program.
- as another form of execution, the computer may read the program directly from a portable recording medium and execute processing according to it; furthermore, each time the program is transferred from the server computer to the computer,
- the computer may sequentially execute processing according to the received program.
- alternatively, the above-described processing may be executed by a so-called ASP (Application Service Provider) type service, which realizes the processing functions only through execution instructions and result acquisition, without transferring the program from the server computer to the computer.
- the program in this embodiment includes information that is provided for processing by an electronic computer and is equivalent to a program (such as data that is not a direct command to the computer but has the property of defining the computer's processing).
- the hardware entity is configured by executing a predetermined program on the computer.
- at least a part of these processing contents may be realized in hardware.
Description
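"Encoding device 11"
The encoding process performed by the encoding device 11 is described with reference to FIG. 1. Each unit of the encoding device 11 performs the following operations for each frame, which is a predetermined time interval. In the following description, the number of samples in a frame is Nt, and the digital audio signal for one frame is the digital audio signal sequence x(1),...,x(Nt).
(Overview)
For each frame, the long-term prediction analysis unit 111 obtains the time domain pitch period L corresponding to the input digital audio signal sequence x(1),...,x(Nt) (step S111-1), calculates the pitch gain gp corresponding to that time domain pitch period L (step S111-2), determines from the pitch gain gp and outputs long-term prediction selection information indicating whether or not long-term prediction is to be performed (step S111-3), and, when the long-term prediction selection information indicates that long-term prediction is to be performed, additionally outputs at least the time domain pitch period L and a time domain pitch period code CL that identifies the time domain pitch period L (step S111-4).
For example, the long-term prediction analysis unit 111 selects, from among predetermined candidates τ for the time domain pitch period, the candidate τ that maximizes the value given by Equation (A1) as the time domain pitch period L corresponding to the digital audio signal sequence x(1),...,x(Nt).
The candidates τ and the time domain pitch period L may be expressed using integers only (integer precision) or using integers and fractional values (fractional precision). When the value of Equation (A1) is evaluated for a fractional-precision candidate τ, x(t-τ) is obtained with an interpolation filter that takes a weighted average of several digital audio signal samples.
The long-term prediction analysis unit 111 obtains and outputs long-term prediction selection information indicating that long-term prediction is to be performed when the pitch gain gp is greater than or equal to a predetermined value, and long-term prediction selection information indicating that long-term prediction is not to be performed when the pitch gain gp is less than that predetermined value.
When the long-term prediction selection information indicates that long-term prediction is to be performed, the long-term prediction analysis unit 111 does the following.
The long-term prediction analysis unit 111 then outputs, in addition to the above long-term prediction selection information, the time domain pitch period L and the time domain pitch period code CL.
The long-term prediction analysis unit 111 then outputs, in addition to the above long-term prediction selection information, the time domain pitch period L, and the time domain pitch period code CL, the quantized pitch gain gp^ and the pitch gain code Cgp.
When the long-term prediction selection information output by the long-term prediction analysis unit 111 indicates that long-term prediction is to be performed, the long-term prediction residual generation unit 112 generates and outputs, for each frame, a long-term prediction residual signal sequence obtained by removing the long-term predicted signal from the input digital audio signal sequence. For example, it computes the long-term prediction residual signal sequence xp(1),...,xp(Nt) by Equation (A3) from the input digital audio signal sequence x(1),...,x(Nt), the time domain pitch period L, and the quantized pitch gain gp^. When the long-term prediction analysis unit 111 does not output a quantized pitch gain gp^, a predetermined value such as 0.5 is used as gp^.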
xp(t) = x(t)-gp^x(t-L) (A3)
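Equation (A3) can be illustrated with a short sketch. It assumes an integer pitch period L and a hypothetical `history` buffer that holds at least L samples preceding the frame; the function name and the buffer handling are illustrative choices, not taken from the patent.

```python
def ltp_residual(x, history, L, gp_hat=0.5):
    """Long-term prediction residual xp(t) = x(t) - gp^ * x(t - L), Equation (A3).

    x       : samples of the current frame
    history : samples preceding the frame, most recent last (assumed len >= L)
    L       : integer time domain pitch period
    gp_hat  : quantized pitch gain (0.5 is the fallback value named in the text)
    """
    if len(history) < L:
        raise ValueError("history must hold at least L past samples")
    buf = list(history) + list(x)
    off = len(history)
    # x(t - L) is read from the original signal, reaching into the past frame if needed.
    return [buf[off + t] - gp_hat * buf[off + t - L] for t in range(len(x))]


residual = ltp_residual([4, 6, 8, 10], history=[2, 2], L=2)
```

Fractional-precision pitch periods would need the interpolation filter mentioned in the description and are outside this sketch.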
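First, for each frame, the frequency domain conversion unit 113a converts the input long-term prediction residual signal sequence xp(1),...,xp(Nt), when the long-term prediction selection information output by the long-term prediction analysis unit 111 indicates that long-term prediction is to be performed, or the input digital audio signal sequence x(1),...,x(Nt), when it indicates that long-term prediction is not to be performed, into an MDCT coefficient sequence X(1),...,X(N) of N points in the frequency domain (N is called the "transform frame length") (step S113a). The frequency domain conversion unit 113a applies a window to a 2*N-point long-term prediction residual signal sequence or digital audio signal sequence in the time domain, applies the MDCT to the windowed signal sequence, and obtains N coefficients in the frequency domain. The symbol * denotes multiplication. The frequency domain conversion unit 113a updates the frame by shifting the time-domain window by N points at a time, so the samples of adjacent frames overlap by N points. The samples subject to long-term prediction analysis and the samples covered by the MDCT window are independent, and the window shape can be set according to the delay and the degree of overlap. For example, Nt points may be taken from the non-overlapping portion as the samples subject to long-term prediction analysis. When long-term prediction analysis is also applied to overlapping samples, the order in which the overlap processing and the long-term prediction difference and synthesis processing are applied must be set so that no large error arises between the encoding device and the decoding device.
The weighted envelope normalization unit 113b normalizes each coefficient of the input MDCT coefficient sequence by the power spectrum envelope coefficient sequence of the digital audio signal sequence, which is estimated using linear prediction coefficients obtained by linear prediction analysis of the frame's digital audio signal sequence, and outputs a weighted normalized MDCT coefficient sequence (step S113b). Here, in order to achieve quantization with little perceptual distortion, the weighted envelope normalization unit 113b normalizes each coefficient of the MDCT coefficient sequence frame by frame using a weighted power spectrum envelope coefficient sequence in which the power spectrum envelope has been blunted. As a result, the weighted normalized MDCT coefficient sequence does not have amplitude slopes or amplitude irregularities as large as those of the input MDCT coefficient sequence, but keeps a magnitude relationship similar to that of the power spectrum envelope coefficient sequence of the audio signal; that is, it has somewhat larger amplitudes in the region of coefficients corresponding to low frequencies and has a fine structure caused by the time domain pitch period.
The coefficients W(1),...,W(N) of the power spectrum envelope coefficient sequence corresponding to the coefficients X(1),...,X(N) of the N-point MDCT coefficient sequence can be obtained by converting the linear prediction coefficients to the frequency domain. For example, under a p-th order autoregressive process, which is an all-pole model, the digital audio signal x(t) at the sample point t corresponding to time is expressed by Equation (1) in terms of its own past values x(t-1),...,x(t-p) going back p time points (p is a positive integer), the prediction residual e(t), and the linear prediction coefficients α1,...,αp. In this case, each coefficient W(n) [1≦n≦N] of the power spectrum envelope coefficient sequence is expressed by Equation (2), where exp(·) is the exponential function with Napier's number as its base, j is the imaginary unit, and σ^2 is the prediction residual energy.
<Example 1>
The weighted envelope normalization unit 113b divides each coefficient X(1),...,X(N) of the MDCT coefficient sequence by the corrected value W_γ(1),...,W_γ(N) of the corresponding coefficient of the power spectrum envelope coefficient sequence, thereby obtaining the coefficients X(1)/W_γ(1),...,X(N)/W_γ(N) of the weighted normalized MDCT coefficient sequence. The corrected value W_γ(n) [1≦n≦N] is given by Equation (3), where γ is a positive constant not greater than 1 that blunts the power spectrum coefficients.
The weighted envelope normalization unit 113b divides each coefficient X(1),...,X(N) of the MDCT coefficient sequence by the β-th power (0<β<1) W(1)^β,...,W(N)^β of the corresponding coefficient of the power spectrum envelope coefficient sequence, thereby obtaining the coefficients X(1)/W(1)^β,...,X(N)/W(N)^β of the weighted normalized MDCT coefficient sequence.
Next, the normalization gain calculation unit 113c takes the weighted normalized MDCT coefficient sequence as input and, for each frame, determines a quantization step size using the sum of the amplitude values or the energy value over all frequencies so that the coefficients of the weighted normalized MDCT coefficient sequence can be quantized with the given total number of bits, and obtains a coefficient (hereinafter called the gain) by which each coefficient of the weighted normalized MDCT coefficient sequence is divided so as to yield this quantization step size (step S113c). Information representing this gain is transmitted to the decoding side as gain information. For each frame, the normalization gain calculation unit 113c normalizes (divides) each coefficient of the input weighted normalized MDCT coefficient sequence by this gain and outputs the result.
Next, for each frame, the quantization unit 113d quantizes each coefficient of the gain-normalized weighted normalized MDCT coefficient sequence with the quantization step size determined in step S113c, and outputs the resulting quantized MDCT coefficient sequence as the "frequency domain sample sequence" (step S113d).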
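When the long-term prediction selection information indicates that long-term prediction is to be performed, the period conversion unit 114 obtains and outputs the conversion interval T1 by Equation (A4) from the input time domain pitch period L and the number of frequency domain sample points N. "INT()" in Equation (A4) denotes the value in parentheses with its fractional part truncated.
T1 = INT(N*2/L) (A4)
Note that the theoretical conversion period is N*2/L - 1/2; when the conversion interval T1 is to be an integer, 1/2 is added and the result is truncated in order to round this value to the nearest integer. Alternatively, N*2/L - 1/2 may be rounded at a predetermined number of fractional digits to give the conversion interval T1. For example, when N*2/L - 1/2 is held in a pseudo floating-point format with a 5-bit binary fractional part and the pitch period as an integer value is obtained by rounding, the value obtained by truncating 2^5*(N*2/L - 1/2 + 1/2) is used as the conversion interval T1, and the values obtained by multiplying T1 by an integer and then scaling the result by 1/2^5 = 1/32 to return to a floating-point number may be used as candidates for determining the frequency domain pitch period.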
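The conversion in Equation (A4) and the fixed-point variant with a 5-bit fractional part can be sketched as follows. The function names are hypothetical, and the sketch assumes L is a positive number (integer or fractional precision).

```python
import math


def conversion_interval(N, L):
    """T1 = INT(N*2/L), Equation (A4): truncate the fractional part."""
    return math.floor(2 * N / L)


def conversion_interval_q5(N, L):
    """Variant kept with a 5-bit binary fraction: truncate 2^5 * (N*2/L)."""
    return math.floor((1 << 5) * 2 * N / L)


def candidates_from_q5(t1_q5, multipliers):
    """Scale integer multiples of the fixed-point interval back by 1/2^5 = 1/32."""
    return [u * t1_q5 / 32 for u in multipliers]


t1 = conversion_interval(256, 40)        # 512 / 40 = 12.8, truncated
t1_q5 = conversion_interval_q5(256, 40)  # 32 * 12.8 = 409.6, truncated
cands = candidates_from_q5(t1_q5, [1, 2, 3])
```

Keeping the fractional bits reduces the error that accumulates when the interval is multiplied by larger integers.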
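When the long-term prediction selection information indicates that long-term prediction is not to be performed, the period conversion unit 114 does nothing. However, there is no problem if it performs the same processing as when long-term prediction is to be performed. That is, the period conversion unit 114 may be configured so that the long-term prediction selection information is not input to it, the time domain pitch period L and the number of frequency domain sample points N are input, and the conversion interval T1 is obtained and output.
When the long-term prediction selection information indicates that long-term prediction is to be performed, the frequency domain pitch period analysis unit 115 determines the frequency domain pitch period T using the input conversion interval T1 and integer multiples U×T1 of the conversion interval T1 as candidate values, and outputs the frequency domain pitch period T together with a frequency domain pitch period code indicating what multiple of the conversion interval T1 the frequency domain pitch period T is. Here, U is an integer in a predetermined first range. For example, U is a non-zero integer, for example U≧2. For example, when the predetermined first range is the integers from 2 to 8, a total of eight values, namely the conversion interval T1 and its 2-fold to 8-fold multiples 2T1, 3T1, 4T1, 5T1, 6T1, 7T1, and 8T1, are the candidate values for the frequency domain pitch period, and the frequency domain pitch period T is selected from among them. In this case, the frequency domain pitch period code is a code of at least 3 bits that corresponds one-to-one with the integers from 1 to 8.
Here, ρ is a coefficient such as (1/N)^(1/2), and k is the index corresponding to frequency, k=1,...,N. That is, each MDCT coefficient X(k) is, for example, the inner product of the following 2*N-dimensional orthonormal basis vector B(k) and the signal sequence vector (xp'(1),...,xp'(2*N)).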
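The candidate search can be sketched as below. This passage does not define the index that measures how strongly energy concentrates in the selected samples, so the sketch assumes a stand-in: the mean energy of the samples at positions nT-1, nT, nT+1 (1-based). The function names and the mapping of the multiplier to a 3-bit code (multiplier minus 1) are likewise assumptions for illustration.

```python
def concentration(X, T):
    """Assumed index: mean energy of samples nT-1, nT, nT+1 (1-based positions)."""
    N = len(X)
    picked = set()
    n = 1
    while n * T - 1 <= N:
        for k in (n * T - 1, n * T, n * T + 1):
            if 1 <= k <= N:
                picked.add(k)
        n += 1
    if not picked:
        return 0.0
    return sum(X[k - 1] ** 2 for k in picked) / len(picked)


def select_pitch_period(X, T1, multipliers=range(1, 9)):
    """Pick T among T1, 2*T1, ..., 8*T1; return (T, hypothetical 3-bit code value)."""
    best = max(multipliers, key=lambda u: concentration(X, u * T1))
    return best * T1, best - 1


# Toy spectrum: flat background with decaying peaks every 12 samples.
X = [1] * 64
for pos, amp in ((12, 10), (24, 8), (36, 6), (48, 4), (60, 2)):
    X[pos - 1] = amp
T, code = select_pitch_period(X, T1=6)
```

With eight candidates the decoder only needs the 3-bit multiplier, because it can recompute T1 from L and N.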
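T1' = 2*N/L ≈ 2*N/(n*Pf) = (2*N/Pf)/n (A41)
That is, the interval T1' can be approximated by 1/n times the ideal conversion interval (2*N/Pf). In such a case, it is not the interval T1' itself but its integer multiple n*T1' that corresponds to the ideal conversion interval 2*N/Pf.
Furthermore, an integer multiple of the sampling interval in the frequency domain does not necessarily correspond to the ideal conversion interval 2*N/Pf. In the example of FIG. 4, the ideal conversion interval 2*N/Pf is not an integer multiple of the spacing between adjacent samples of the MDCT coefficient sequence X(1),...,X(N), so a sample group cannot be selected with the ideal conversion interval 2*N/Pf as the frequency domain pitch period T. However, for the purpose of increasing the degree to which energy is concentrated in the sample group selected on the basis of the frequency domain pitch period, even if the ideal conversion interval 2*N/Pf itself cannot be selected as the frequency domain pitch period, the index value indicating the degree of energy concentration in the selected sample group can be increased by selecting the sample group with m times the ideal conversion interval (where m is a positive integer) as the frequency domain pitch period T=m*2*N/Pf. In other words, for this purpose, the relationship between the frequency domain pitch period T and the conversion interval T1' can be written as follows using Equation (A41).
T = m*(2*N/Pf) ≈ m*n*T1' (A42)
Furthermore, Equation (A42) can be approximated as follows using the conversion interval T1 of Equation (A4).
T ≈ m*n*INT(T1') = m*n*INT(2*N/L) = m*n*T1 (A43)
That is, the frequency domain pitch period T can be approximated by an integer multiple of the conversion interval T1. In other words, an integer multiple of the conversion interval T1 is more likely than other values to be a frequency domain pitch period T that increases the index value indicating the degree of energy concentration in the sample group. Therefore, by determining the frequency domain pitch period T using the conversion interval T1, integer multiples of T1, and values in their vicinity as candidate values, the index value indicating the degree of energy concentration in the sample group can be increased.
As described above, smaller values of n tend to be more likely to be used, and m is a positive integer; hence, in the frequency domain, the smaller the multiplier m*n of the frequency domain pitch period T relative to the conversion interval T1, the more likely that value is to be determined as the frequency domain pitch period T. That is, the smaller the integer multiplier of the conversion interval T1, the more likely the value is to be determined as the frequency domain pitch period T.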
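The frequency domain pitch period consideration encoding unit 116 includes a rearrangement processing unit 116a and an encoding unit 116b, encodes the input frequency domain sample sequence by an encoding method based on the frequency domain pitch period T, and outputs the resulting code sequence.
The rearrangement processing unit 116a outputs, as the rearranged sample sequence, a sequence that (1) contains all the samples of the frequency domain sample sequence and (2) in which at least some of the samples in the sample sequence have been rearranged so that all or some of the following are gathered together: one or a plurality of consecutive samples including the sample corresponding to the frequency domain pitch period T determined by the frequency domain pitch period analysis unit 115, and one or a plurality of consecutive samples including a sample corresponding to an integer multiple of the frequency domain pitch period T. That is, at least some of the samples in the input sample sequence are rearranged so that the one or more consecutive samples including the sample corresponding to the frequency domain pitch period T and the one or more consecutive samples including a sample corresponding to an integer multiple of T are gathered together.
In the sample sequence rearranged in this way, when frequency is taken on the horizontal axis and the index of the samples on the vertical axis, the envelope of the sample index shows an increasing trend as frequency increases. In other words, the rearrangement processing unit 116a can be said to rearrange at least some of the samples in the input sample sequence so that the envelope of the sample index shows an increasing trend as frequency increases.
In this embodiment, the number of samples in each sample group was fixed at three: the sample corresponding to the frequency domain pitch period T or an integer multiple of it (hereinafter called the center sample) plus one sample before it and one after it. However, when the number of samples in a sample group and the sample indices are made variable, the rearrangement processing unit 116a outputs, as auxiliary information (first auxiliary information), information representing the one selected from among a plurality of options that differ in the combination of the number of samples in a sample group and the sample indices.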
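The rearrangement and its inverse (the recovery performed on the decoding side) can be sketched as follows. The sketch assumes three-sample groups around each multiple of T, 0-based list positions standing in for the sample indices, and that the gathered samples are placed at the start of the sequence; the function names are hypothetical.

```python
def group_positions(N, T, half_width=1):
    """Positions nT-half_width .. nT+half_width (n = 1, 2, ...) inside 0..N-1."""
    seen, order = set(), []
    n = 1
    while n * T - half_width < N:
        for j in range(n * T - half_width, n * T + half_width + 1):
            if 0 <= j < N and j not in seen:
                seen.add(j)
                order.append(j)
        n += 1
    return order


def permutation(N, T):
    """Gathered positions first, then the remaining positions in original order."""
    g = group_positions(N, T)
    gset = set(g)
    return g + [j for j in range(N) if j not in gset]


def rearrange(F, T):
    return [F[j] for j in permutation(len(F), T)]


def recover(R, T):
    """Inverse of rearrange: the permutation is rebuilt from T alone."""
    F = [None] * len(R)
    for pos, j in enumerate(permutation(len(R), T)):
        F[j] = R[pos]
    return F


F = list(range(12))
R = rearrange(F, T=4)
```

Because the permutation depends only on T (and on the group shape, which the first auxiliary information would select), no per-sample side information is needed to undo it.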
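For example, suppose the following options are defined:
(1) the center sample only: F(nT)
(2) the center sample and one sample before and after it, three samples in total: F(nT-1), F(nT), F(nT+1)
(3) the center sample and the two preceding samples, three samples in total: F(nT-2), F(nT-1), F(nT)
(4) the center sample and the three preceding samples, four samples in total: F(nT-3), F(nT-2), F(nT-1), F(nT)
(5) the center sample and the two following samples, three samples in total: F(nT), F(nT+1), F(nT+2)
(6) the center sample and the three following samples, four samples in total: F(nT), F(nT+1), F(nT+2), F(nT+3)
If option (4) is selected, information indicating that (4) was selected is used as the first auxiliary information. In this example, 3 bits suffice as the information representing the selected option.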
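Next, the encoding unit 116b encodes the sample sequence output by the rearrangement processing unit 116a and outputs the resulting code sequence (step S116b). For example, the encoding unit 116b switches the variable-length coding method according to the bias in the amplitudes of the samples in the sample sequence output by the rearrangement processing unit 116a. That is, because the rearrangement processing unit 116a has gathered large-amplitude samples toward the low-frequency side (or the high-frequency side) within the frame, the encoding unit 116b performs variable-length coding by a method suited to that bias. When samples of equal or similar amplitude are gathered in each local region, as in the sample sequence output by the rearrangement processing unit 116a, the average code amount can be reduced by, for example, Rice coding with a different Rice parameter for each region. The following description takes as an example the case where large-amplitude samples are gathered toward the low-frequency side of the frame (the side near the start of the frame).
As a specific example, the encoding unit 116b applies Rice coding (also called Golomb-Rice coding) sample by sample in the region where large-amplitude samples are gathered. In the other regions, the encoding unit 116b applies entropy coding (Huffman coding, arithmetic coding, or the like) that is also suited to coding sets of several samples together. As for the application of Rice coding, the region of application and the Rice parameter may be fixed, or the configuration may allow one to be chosen from a plurality of options that differ in the combination of region of application and Rice parameter. When one of such options is chosen, a variable-length code such as the following (binary values enclosed in quotation marks "") can be used as the Rice coding selection information, and the encoding unit 116b also outputs the selection information.
"1": Rice coding is not applied.
"01": Rice coding is applied to the first 1/32 of the sequence with Rice parameter 1.
"001": Rice coding is applied to the first 1/32 of the sequence with Rice parameter 2.
"0001": Rice coding is applied to the first 1/16 of the sequence with Rice parameter 1.
"00001": Rice coding is applied to the first 1/16 of the sequence with Rice parameter 2.
"00000": Rice coding is applied to the first 1/32 of the sequence with Rice parameter 3.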
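The decoding process performed by the decoding device 12 is described with reference to FIG. 2.
At least the above long-term prediction selection information, the gain information, the frequency domain pitch period code, and the code sequence are input to the decoding device 12. When the long-term prediction selection information indicates that long-term prediction is to be performed, at least the time domain pitch period code CL is also input. The pitch gain code Cgp may be input in addition to the time domain pitch period code CL. If the encoding device 11 output selection information, first auxiliary information, or second auxiliary information, these are also input to the decoding device 12.
The frequency domain pitch period consideration decoding unit 123 includes a decoding unit 123a and a recovery unit 123b, decodes the input code sequence by a decoding method based on the frequency domain pitch period T, and obtains and outputs the original sample arrangement.
The decoding unit 123a decodes the input code sequence for each frame and outputs a frequency domain sample sequence (step S123a).
When the long-term prediction selection information indicates that long-term prediction is to be performed, the long-term prediction information decoding unit 121 decodes the input time domain pitch period code CL to obtain and output the time domain pitch period L. When the pitch gain code Cgp is also input, it further decodes the pitch gain code Cgp to obtain and output the quantized pitch gain gp^.
When the long-term prediction selection information indicates that long-term prediction is to be performed, the period conversion unit 122 decodes the input frequency domain pitch period code to obtain an integer value indicating what multiple of the conversion interval T1 the frequency domain pitch period T is, obtains the conversion interval T1 by Equation (A4) from the time domain pitch period L and the number of frequency domain sample points N, and obtains and outputs the frequency domain pitch period T by multiplying the conversion interval T1 by the integer value.
When the long-term prediction selection information indicates that long-term prediction is not to be performed, the period conversion unit 122 decodes the input frequency domain pitch period code to obtain and output the frequency domain pitch period T.
Next, for each frame, the recovery unit 123b obtains and outputs the original sample arrangement from the frequency domain sample sequence output by the decoding unit 123a, according to the frequency domain pitch period T obtained by the period conversion unit 122 or, when auxiliary information has been input to the decoding device 12, according to that frequency domain pitch period T and the input auxiliary information (step S123b). Here, the "original sample arrangement" corresponds to the "frequency domain sample sequence" output from the frequency domain sample sequence generation unit 113 of the encoding device 11. As noted above, various rearrangement methods and rearrangement options are available to the rearrangement processing unit 116a of the encoding device 11, but when rearrangement has been performed only one rearrangement was executed, and that rearrangement can be identified from the frequency domain pitch period T and the auxiliary information.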
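Next, for each frame, the gain multiplication unit 124a multiplies each coefficient of the sample sequence output by the decoding unit 123a or the recovery unit 123b by the gain identified by the above gain information, and obtains and outputs a "normalized weighted normalized MDCT coefficient sequence" (step S124a).
Next, for each frame, the weighted envelope inverse normalization unit 124b obtains and outputs an "MDCT coefficient sequence" by applying, to each coefficient of the "normalized weighted normalized MDCT coefficient sequence" output by the gain multiplication unit 124a, a correction coefficient obtained from the power spectrum envelope coefficient sequence transmitted as described above (step S124b). To give a specific example corresponding to the weighted envelope normalization example performed in the encoding device 11, the weighted envelope inverse normalization unit 124b obtains the coefficients X(1),...,X(N) of the MDCT coefficient sequence by multiplying each coefficient of the "normalized weighted normalized MDCT coefficient sequence" output by the gain multiplication unit 124a by W(1)^β,...,W(N)^β, the β-th power (0<β<1) of the corresponding coefficient of the power spectrum envelope coefficient sequence.
Next, for each frame, the time domain conversion unit 124c converts the "MDCT coefficient sequence" output by the weighted envelope inverse normalization unit 124b to the time domain and obtains and outputs a frame-by-frame signal sequence (time domain signal sequence) (step S124c). When the long-term prediction selection information output by the long-term prediction information decoding unit 121 indicates that long-term prediction is to be performed, the signal sequence obtained by the time domain conversion unit 124c is input to the long-term prediction synthesis unit 125 as the long-term prediction residual signal sequence xp(1),...,xp(Nt). When it indicates that long-term prediction is not to be performed, the signal sequence obtained by the time domain conversion unit 124c is output from the decoding device 12 as the digital audio signal sequence x(1),...,x(Nt).
When the long-term prediction selection information indicates that long-term prediction is to be performed, the long-term prediction synthesis unit 125 obtains the digital audio signal sequence x(1),...,x(Nt) by Equation (A5) from the long-term prediction residual signal sequence xp(1),...,xp(Nt) obtained by the time domain conversion unit 124c, the time domain pitch period L and the quantized pitch gain gp^ output by the long-term prediction information decoding unit 121, and the past digital audio signal generated by the long-term prediction synthesis unit 125. When the long-term prediction information decoding unit 121 does not output a quantized pitch gain gp^, that is, when no pitch gain code Cgp was input to the decoding device 12, a predetermined value such as 0.5 is used as gp^. The value of gp^ in this case is stored in advance in the long-term prediction information decoding unit 121 so that the encoding device 11 and the decoding device 12 can use the same value.
x(t) = xp(t) + gp^ x(t-L) (A5)
The signal sequence obtained by the long-term prediction synthesis unit 125 is then output from the decoding device 12 as the digital audio signal sequence x(1),...,x(Nt).
When the long-term prediction selection information indicates that long-term prediction is not to be performed, the long-term prediction synthesis unit 125 does nothing.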
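The synthesis of Equation (A5) can be sketched as follows, assuming an integer pitch period L and a hypothetical `history` buffer holding at least L previously synthesized samples. Unlike the residual computation on the encoding side, x(t-L) here is read from samples the decoder has already reconstructed, including samples of the current frame when L is shorter than the frame.

```python
def ltp_synthesize(xp, history, L, gp_hat=0.5):
    """Long-term prediction synthesis x(t) = xp(t) + gp^ * x(t - L), Equation (A5).

    xp      : decoded long-term prediction residual of the current frame
    history : previously synthesized samples, most recent last (assumed len >= L)
    """
    if len(history) < L:
        raise ValueError("history must hold at least L past samples")
    buf = list(history)
    off = len(buf)
    for t in range(len(xp)):
        # buf already contains the samples synthesized earlier in this frame.
        buf.append(xp[t] + gp_hat * buf[off + t - L])
    return buf[off:]


frame = ltp_synthesize([3, 5, 6, 7], history=[2, 2], L=2)
```

The returned frame would be appended to the history before the next frame is processed.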
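In the encoding device 11 of the first embodiment, the frequency domain pitch period T was determined using the conversion interval T1 and integer multiples U×T1 of the conversion interval T1 as candidate values, but multiples of T1 other than the integer multiples U×T1 may also be used as candidate values. The differences from the first embodiment are described below.
The encoding device 11' of this modification differs from the encoding device 11 of the first embodiment in that it includes a frequency domain pitch period analysis unit 115' in place of the frequency domain pitch period analysis unit 115. In this modification, the frequency domain pitch period analysis unit 115' determines and outputs the frequency domain pitch period T using as candidate values the conversion interval T1, integer multiples U×T1 of T1, and predetermined multiples of T1 other than the integer multiples U×T1. When the long-term prediction selection information indicates that long-term prediction is not to be performed, the frequency domain pitch period analysis unit 115' determines and outputs the frequency domain pitch period T using integer values in a predetermined second range as candidate values, as in the first embodiment.
The frequency domain pitch period analysis unit 115' determines the frequency domain pitch period T using as candidate values the conversion interval T1, integer multiples U×T1 of T1, and predetermined multiples of T1 other than the integer multiples U×T1 (that is, it determines T from among candidate values that include T1 and the integer multiples U×T1), and outputs the frequency domain pitch period T together with a frequency domain pitch period code indicating what multiple of the conversion interval T1 the frequency domain pitch period T is.
The decoding device 12' of this modification differs from the decoding device 12 of the first embodiment in that it includes a period conversion unit 122' in place of the period conversion unit 122.
When the long-term prediction selection information indicates that long-term prediction is to be performed, the period conversion unit 122' decodes the frequency domain pitch period code to obtain a value (multiple value) indicating what multiple of the conversion interval T1 the frequency domain pitch period T is, obtains the conversion interval T1 by Equation (A4) from the time domain pitch period L and the number of frequency domain sample points N, and obtains and outputs the frequency domain pitch period T by multiplying the conversion interval T1 by the multiple value.
When the long-term prediction selection information indicates that long-term prediction is not to be performed, the period conversion unit 122' decodes the frequency domain pitch period code to obtain and output the frequency domain pitch period T.
In Modification 1 of the first embodiment, multiples of the conversion interval T1 other than the integer multiples U×T1 were also used as candidate values for determining the frequency domain pitch period T. Reflecting the property that the integer multiples U×T1 are more likely than other values to become the frequency domain pitch period T, Modification 2 of the first embodiment determines the length of the frequency domain pitch period code using a variable-length codebook.
In addition, the frequency domain pitch period analysis unit 115'' determines the pitch period T while also taking the length of the frequency domain pitch period code into account.
"Frequency domain pitch period analysis unit 115''"
The frequency domain pitch period analysis unit 115'' determines the frequency domain pitch period T using as candidate values the conversion interval T1, integer multiples U×T1 of T1, and predetermined multiples of T1 other than the integer multiples U×T1 (that is, it determines T from among candidate values that include T1 and the integer multiples U×T1), and outputs the frequency domain pitch period T together with a frequency domain pitch period code indicating what multiple of the conversion interval T1 the frequency domain pitch period T is.
Modified concentration index = concentration index - c*(length of the code indicating the relationship to the conversion interval T1)
With this definition, the frequency domain pitch period T that maximizes the modified concentration index is determined.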
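[Encoding device 21]
The encoding device 21 of this embodiment differs from the encoding device 11 of the first embodiment in that it includes a frequency domain pitch period analysis unit 215 in place of the frequency domain pitch period analysis unit 115. In this embodiment, when the long-term prediction selection information indicates that long-term prediction is to be performed, the frequency domain pitch period analysis unit 215 determines an intermediate candidate value from among the conversion interval T1 and integer multiples U×T1 of T1, and determines and outputs the frequency domain pitch period T from among the intermediate candidate value and the values in a predetermined third range in its vicinity. When the long-term prediction selection information indicates that long-term prediction is not to be performed, the frequency domain pitch period analysis unit 215 determines and outputs the frequency domain pitch period T using integer values in a predetermined second range as candidate values, as in the first embodiment. The differences from the first embodiment are described below.
When the long-term prediction selection information indicates that long-term prediction is to be performed, the frequency domain pitch period analysis unit 215 first determines the intermediate candidate value using the conversion interval T1 and integer multiples U×T1 of T1 as candidate values. Next, it determines and outputs the frequency domain pitch period T using the intermediate candidate value and the values in a predetermined third range in its vicinity as candidate values. Furthermore, it outputs, as the frequency domain pitch period code, information indicating what multiple of the conversion interval T1 the intermediate candidate value is and information indicating the difference between the frequency domain pitch period T and the intermediate candidate value.
The decoding device 22 of this embodiment differs from the decoding device 12 of the first embodiment in that it includes a period conversion unit 222 in place of the period conversion unit 122. In this embodiment, when the long-term prediction selection information indicates that long-term prediction is to be performed, the period conversion unit 222 decodes the frequency domain pitch period code to obtain an integer value indicating what multiple of the conversion interval T1 the intermediate candidate value is and the value of the difference between the frequency domain pitch period T and the intermediate candidate value, and obtains and outputs as the frequency domain pitch period T the value obtained by adding the difference value to the product of the conversion interval T1 and the integer value. When the long-term prediction selection information indicates that long-term prediction is not to be performed, the period conversion unit 222 decodes the frequency domain pitch period code to obtain and output the frequency domain pitch period T.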
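On the decoding side, the frequency domain pitch period is rebuilt from the time domain pitch period and the two fields carried by the code. A minimal sketch, assuming the fields have already been decoded into an integer `multiple` and a signed integer `diff` (both names are hypothetical); with `diff=0` the function reduces to the first embodiment, where only the multiple is transmitted.

```python
import math


def decode_pitch_period(N, L, multiple, diff=0):
    """T = multiple * T1 + diff, with T1 = INT(N*2/L) from Equation (A4)."""
    T1 = math.floor(2 * N / L)
    return multiple * T1 + diff


T_first = decode_pitch_period(N=256, L=40, multiple=3)            # first embodiment
T_second = decode_pitch_period(N=256, L=40, multiple=3, diff=-1)  # second embodiment
```

Because T1 is recomputed from L and N, which the decoder already has, the code only needs to carry a small multiplier and a small offset rather than T itself.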
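[Encoding device 31]
The encoding device 31 of this embodiment differs from the encoding devices 11, 11', and 21 of the first embodiment, the modifications of the first embodiment, and the second embodiment in that it includes a frequency domain pitch period analysis unit 315 in place of the frequency domain pitch period analysis units 115, 115', and 215. In this embodiment, the frequency domain pitch period analysis unit 315 operates with "when the long-term prediction selection information indicates that long-term prediction is to be performed" replaced by "when the quantized pitch gain gp^ is greater than or equal to a predetermined value" and with "when the long-term prediction selection information indicates that long-term prediction is not to be performed" replaced by "when the quantized pitch gain gp^ is smaller than the predetermined value". Everything else is the same as in the first and second embodiments. This embodiment presupposes the configuration of the first embodiment in which the encoding device 31 obtains the quantized pitch gain gp^ and the pitch gain code Cgp.
The decoding device 32 of this embodiment differs from the decoding devices 12, 12', and 22 of the first and second embodiments in that it includes a period conversion unit 322 in place of the period conversion units 122, 122', and 222. In this embodiment, the period conversion unit 322 operates with "when the long-term prediction selection information indicates that long-term prediction is to be performed" replaced by "when the quantized pitch gain gp^ is greater than or equal to a predetermined value" and with "when the long-term prediction selection information indicates that long-term prediction is not to be performed" replaced by "when the quantized pitch gain gp^ is smaller than the predetermined value". Everything else is the same as in the first and second embodiments. This embodiment presupposes the configuration of the first embodiment in which the pitch gain code Cgp is input to the decoding device 32 and the quantized pitch gain gp^ is obtained.
[Encoding device 41]
The encoding device 41 of this embodiment differs from the encoding devices 11, 11', and 21 of the first embodiment, the modifications of the first embodiment, and the second embodiment in that it includes a long-term prediction analysis unit 411, a long-term prediction residual generation unit 412, a frequency domain conversion unit 413a, a period conversion unit 414, and a frequency domain pitch period analysis unit 415 in place of the long-term prediction analysis unit 111, the long-term prediction residual generation unit 112, the frequency domain conversion unit 113a, the period conversion unit 114, and the frequency domain pitch period analysis units 115, 115', and 215, respectively.
The decoding device 42 of this embodiment differs from the decoding devices 12, 12', and 22 of the first and second embodiments in that it includes a decoding unit 423a, a long-term prediction information decoding unit 421, a period conversion unit 422, a time domain conversion unit 424c, and a long-term prediction synthesis unit 425 in place of the decoding unit 123a, the long-term prediction information decoding unit 121, the period conversion units 122, 122', and 222, the time domain conversion unit 124c, and the long-term prediction synthesis unit 125, respectively. This embodiment performs long-term prediction synthesis regardless of the long-term prediction selection information and the value of the quantized pitch gain gp^. Therefore, the long-term prediction selection information need not be input to the decoding device 42 of this embodiment.
The encoding devices 11, 11', 21, 31, and 41 of the above embodiments include the frequency domain conversion units 113a and 413a, the weighted envelope normalization unit 113b, the normalization gain calculation unit 113c, and the quantization unit 113d, and use the frame-by-frame quantized MDCT coefficient sequence obtained by the quantization unit 113d as the input to the frequency domain pitch period analysis units 115, 115', 215, 315, and 415. However, the encoding devices 11, 11', 21, 31, and 41 may include processing units other than these, or may perform processing that omits some of them. That is, the encoding devices 11, 11', 21, 31, and 41 include a frequency domain sample sequence generation unit 113 that consists, as one example, of the frequency domain conversion units 113a and 413a, the weighted envelope normalization unit 113b, the normalization gain calculation unit 113c, and the quantization unit 113d. The frequency domain sample sequence generation unit 113 obtains a frequency domain sample sequence derived from the long-term prediction residual signal when long-term prediction is performed, and a frequency domain sample sequence derived from the audio signal when long-term prediction is not performed. The sample sequence obtained by the frequency domain sample sequence generation unit 113 is input to the frequency domain pitch period analysis units 115, 115', 215, 315, and 415.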
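[Encoding device 51]
As shown in FIG. 8, the encoding device 51 of this embodiment differs from the encoding devices 11, 11', 21, 31, and 41 of the first embodiment, the modifications of the first embodiment, and the second, third, and fourth embodiments in that it does not include the frequency domain pitch period consideration encoding unit 116. In this case, the encoding device 51 functions as an encoding device that obtains a code for identifying the frequency domain pitch period. When the frequency domain sample sequence output from the encoding device 51 is also to be encoded, it is input to, for example, a frequency domain pitch period consideration encoding unit 116 outside the encoding device 51 and encoded there, although other encoding means may be used. In all other respects it is the same as the encoding devices 11, 11', 21, 31, and 41.
As shown in FIG. 9, the decoding device 52 of this embodiment differs from the decoding devices 12, 12', 22, 32, and 42 of the first embodiment, the modifications of the first embodiment, and the second, third, and fourth embodiments in that it does not include the frequency domain pitch period consideration decoding unit 123, the time domain signal sequence generation unit 124, or the long-term prediction synthesis unit 125. In this case, the decoding device 52 functions as a decoding device that obtains at least the frequency domain pitch period T and the time domain pitch period L from at least the frequency domain pitch period code and the time domain pitch period code included in the code sequence. For example, the time domain pitch period L and the quantized pitch gain gp^ output from the decoding device 52 are input to the long-term prediction synthesis unit 125. Likewise, for example, the code sequence and the frequency domain pitch period T output from the decoding device 52 (and the auxiliary information, when auxiliary information has been input) are input to the frequency domain pitch period consideration decoding unit 123. In all other respects it is the same as the decoding devices 12, 12', 22, 32, and 42.
As shown in FIGS. 10 and 11, the encoding device 61 and the decoding device 62 of this embodiment differ from the first embodiment, the modifications of the first embodiment, and the second, third, and fourth embodiments in that a frequency domain pitch period consideration encoding unit 616 is provided in place of the frequency domain pitch period consideration encoding unit 116 and a frequency domain pitch period consideration decoding unit 623 is provided in place of the frequency domain pitch period consideration decoding unit 123. The frequency domain sample sequence is input to the frequency domain pitch period consideration encoding unit 616. The code sequence, the frequency domain pitch period T, and the auxiliary information are input to the frequency domain pitch period consideration decoding unit 623. Only the frequency domain pitch period consideration encoding unit 616 and the frequency domain pitch period consideration decoding unit 623 are described below.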
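The frequency domain pitch period consideration encoding unit 616 includes an encoding unit 616b, encodes the input frequency domain sample sequence by an encoding method based on the frequency domain pitch period T, and outputs the resulting code sequence.
The encoding unit 616b encodes, according to different criteria (that is, distinguishing between them), a sample group G1 consisting of all or some of one or a plurality of consecutive samples including the sample corresponding to the frequency domain pitch period T in the frequency domain sample sequence and one or a plurality of consecutive samples including a sample corresponding to an integer multiple of the frequency domain pitch period T, and a sample group G2 consisting of the samples of the frequency domain sample sequence that are not included in the sample group G1, and outputs the resulting code sequence.
Specific examples of "all or some of one or a plurality of consecutive samples including the sample corresponding to the frequency domain pitch period T in the frequency domain sample sequence and one or a plurality of consecutive samples including a sample corresponding to an integer multiple of the frequency domain pitch period T" are the same as in the first embodiment, and the group made up of such samples is the sample group G1. As explained in the first embodiment, there are various options for setting the sample group G1. For example, in the sample sequence input to the encoding unit 616b, the set of sample groups each consisting of the three samples F(nT-1), F(nT), F(nT+1), that is, the sample F(nT) corresponding to an integer multiple of the frequency domain pitch period T together with the samples F(nT-1) and F(nT+1) before and after it, is an example of the sample group G1. For example, when n represents each integer from 1 to 5, the sample group G1 is made up of the first sample group F(T-1), F(T), F(T+1), the second sample group F(2T-1), F(2T), F(2T+1), the third sample group F(3T-1), F(3T), F(3T+1), the fourth sample group F(4T-1), F(4T), F(4T+1), and the fifth sample group F(5T-1), F(5T), F(5T+1).
The encoding unit 616b encodes the sample group G1 and the sample group G2 according to mutually different criteria without rearranging the samples in the sample groups G1 and G2, and outputs the resulting code sequence.
An example in which Rice coding of each sample is used as the variable-length coding is described.
In this case, the encoding unit 616b Rice-codes the samples in the sample group G1 one sample at a time using a Rice parameter that corresponds to the amplitude of the samples in G1 or an estimate of it. The encoding unit 616b also Rice-codes the samples in the sample group G2 one sample at a time using a Rice parameter that corresponds to the amplitude of the samples in G2 or an estimate of it. The encoding unit 616b outputs the code sequence obtained by the Rice coding and auxiliary information for identifying the Rice parameters.
The code obtained by Rice-coding a sample X(k) in the sample group G1 includes prefix(k), obtained by alpha-coding the quotient q(k) that results from dividing the sample X(k) by a value corresponding to the Rice parameter s of the sample group G1, and sub(k), which identifies the remainder. That is, the code corresponding to the sample X(k) in this example includes prefix(k) and sub(k). The samples X(k) to be Rice-coded are expressed as integers.
When the Rice parameter s>0, the quotient q(k) is generated as follows, where floor(χ) is the largest integer not exceeding χ.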
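The benefit of coding G1 and G2 with separate Rice parameters can be illustrated by counting bits. The sketch assumes three-sample groups around each multiple of T (1-based sample indices), a unary (alpha) prefix of q+1 bits, and a remainder field of s bits, with the quotient computed as this description defines it for Rice coding (Equations (B1) to (B4)). The function names and the toy data are hypothetical.

```python
def rice_bits(x, s):
    """Code length of integer x with Rice parameter s: (q + 1) prefix bits + s bits."""
    if s == 0:
        q = 2 * x if x >= 0 else -2 * x - 1
    else:
        q = (x if x >= 0 else -x - 1) // (1 << (s - 1))
    return q + 1 + s


def split_groups(N, T):
    """Return 1-based index lists (G1, G2) for three-sample groups around n*T."""
    g1 = sorted({j for n in range(1, N // T + 2)
                 for j in (n * T - 1, n * T, n * T + 1) if 1 <= j <= N})
    g1set = set(g1)
    g2 = [j for j in range(1, N + 1) if j not in g1set]
    return g1, g2


def total_bits(X, T, s1, s2):
    """Total code length when G1 uses Rice parameter s1 and G2 uses s2."""
    g1, g2 = split_groups(len(X), T)
    return (sum(rice_bits(X[j - 1], s1) for j in g1)
            + sum(rice_bits(X[j - 1], s2) for j in g2))


g1, g2 = split_groups(12, 4)
X = [8 if j in g1 else 0 for j in range(1, 13)]  # large values only in G1
separate = total_bits(X, 4, s1=3, s2=0)
shared = total_bits(X, 4, s1=3, s2=3)
```

In this toy case the separate parameters save bits because the small-amplitude samples in G2 no longer pay for remainder bits they do not need.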
q(k) = floor(X(k) / 2^(s-1)) (for X(k)≧0) …(B1)
q(k) = floor{(-X(k)-1) / 2^(s-1)} (for X(k)<0) …(B2)
When the Rice parameter s=0, the quotient q(k) is generated as follows.
q(k) = 2*X(k) (for X(k)≧0) …(B3)
q(k) = -2*X(k)-1 (for X(k)<0) …(B4)
When the Rice parameter s>0, sub(k) is generated as follows.
sub(k) = X(k) - 2^(s-1)*q(k) + 2^(s-1) (for X(k)≧0) …(B5)
sub(k) = (-X(k)-1) - 2^(s-1)*q(k) (for X(k)<0) …(B6)
When the Rice parameter s=0, sub(k) is null (sub(k)=null).
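Equations (B1) to (B6) can be checked with a small round-trip sketch. The function names are hypothetical, and the bit polarity of the alpha (unary) prefix, q ones followed by a terminating zero, is an assumed convention. For s>0 the top bit of sub(k) ends up acting as a sign flag: it is 1 for non-negative samples and 0 for negative ones.

```python
def rice_split(x, s):
    """Return (q, sub) for integer sample x; sub is None when s == 0."""
    if s == 0:
        return (2 * x if x >= 0 else -2 * x - 1), None   # (B3), (B4)
    half = 1 << (s - 1)
    if x >= 0:
        q = x // half                                    # (B1)
        return q, x - half * q + half                    # (B5)
    q = (-x - 1) // half                                 # (B2)
    return q, (-x - 1) - half * q                        # (B6)


def rice_codeword(x, s):
    """prefix(k) as unary code (assumed polarity) followed by sub(k) in s bits."""
    q, sub = rice_split(x, s)
    suffix = "" if sub is None else format(sub, "0{}b".format(s))
    return "1" * q + "0" + suffix


def rice_join(q, sub, s):
    """Inverse of rice_split."""
    if s == 0:
        return q // 2 if q % 2 == 0 else -((q + 1) // 2)
    half = 1 << (s - 1)
    if sub >= half:                       # top bit set: non-negative sample
        return half * q + (sub - half)
    return -(half * q + sub) - 1          # top bit clear: negative sample


word_pos = rice_codeword(5, 2)
word_neg = rice_codeword(-3, 2)
```

The codeword length is q(k) + 1 + s bits, so a parameter matched to the typical amplitude keeps the unary prefix short.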
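q(k) = floor{(2*|X(k)| - z) / 2^s} (z = 0 or 1 or 2) …(B7)
In Rice coding, prefix(k) is the code obtained by alpha-coding the quotient q(k), and its code amount can be expressed as follows using Equation (B7).
floor{(2*|X(k)| - z) / 2^s} + 1 …(B8)
s' = log2{ln2 * (2*D/|G1| - z)} …(B11)
If D/|G1| is sufficiently larger than z, Equation (B11) can be approximated as follows.
s' = log2{ln2 * (2*D/|G1|)} …(B12)
The s' obtained by Equation (B12) is not an integer, so the value obtained by quantizing s' to an integer is used as the Rice parameter s. This Rice parameter s corresponds to the average amplitude D/|G1| of the samples in the sample group G1 (see Equation (B12)) and minimizes the total code amount of the codes corresponding to the samples X(k) in the sample group G1.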
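The estimate of Equation (B12) can be sketched as follows. The definition of D is not included in this passage; the sketch assumes D is the sum of the absolute sample values in the group, which is consistent with D/|G1| being described as the average amplitude. Rounding to the nearest integer and clamping at 0 are also assumptions, since the text only says that s' is quantized to an integer.

```python
import math


def estimate_rice_parameter(samples):
    """s = quantized log2(ln2 * 2 * D / |G|), after Equation (B12).

    D is assumed to be the sum of absolute amplitudes of the group.
    """
    if not samples:
        raise ValueError("sample group must not be empty")
    D = sum(abs(v) for v in samples)
    if D == 0:
        return 0  # all-zero group: the logarithm is undefined, fall back to s = 0
    s_real = math.log2(math.log(2) * 2 * D / len(samples))
    return max(0, round(s_real))


s_large = estimate_rice_parameter([8, -8, 8, -8])   # mean amplitude 8
s_small = estimate_rice_parameter([1, -1, 1, 0])    # mean amplitude 0.75
```

Applying the same function to G1 and G2 separately yields the two group-specific parameters that the auxiliary information then has to convey.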
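When the Rice parameter corresponding to the sample group G1 and the Rice parameter corresponding to the sample group G2 are handled separately, the decoding side needs auxiliary information for identifying the Rice parameter corresponding to G1 (third auxiliary information) and auxiliary information for identifying the Rice parameter corresponding to G2 (fourth auxiliary information). The encoding unit 616b may therefore output the third and fourth auxiliary information in addition to the code sequence made up of the codes obtained by Rice-coding the sample sequence one sample at a time.
When an audio signal is being encoded, the average amplitude of the samples in the sample group G1 is larger than the average amplitude of the samples in the sample group G2, and the Rice parameter corresponding to G1 is larger than the Rice parameter corresponding to G2. This can be exploited to reduce the code amount of the auxiliary information for identifying the Rice parameters.
Information that by itself identifies the Rice parameter corresponding to the sample group G1 may be used as fifth auxiliary information, and information that identifies the difference between the Rice parameter corresponding to G1 and the Rice parameter corresponding to G2 as sixth auxiliary information. Conversely, information that by itself identifies the Rice parameter corresponding to the sample group G2 may be used as sixth auxiliary information, and information that identifies the difference between the Rice parameters corresponding to G1 and G2 as fifth auxiliary information. Because the Rice parameter corresponding to G1 is known to be larger than that corresponding to G2, no auxiliary information representing which of the two is larger (such as sign information) is needed.
When the number of code bits allocated to the whole frame is fixed, the value of the gain obtained in step S113c is considerably constrained, and so is the range of amplitudes the samples can take. In this case, the average amplitude of the samples can be estimated with a certain accuracy from the number of code bits allocated to the whole frame. The encoding unit 616b may perform Rice coding using a Rice parameter estimated from this estimate of the average amplitude.
Even when the amplitudes of the samples in the sample group G1 are not uniform, or the amplitudes of the samples in the sample group G2 are not uniform, Rice parameters with a greater code-amount reduction effect can be estimated with the help of the amplitude envelope information of the sample sequence X(1),...,X(N). For example, when the sample amplitudes are larger at higher frequencies, the code amount can be reduced further by increasing, by a fixed amount, the Rice parameter corresponding to the high-frequency samples in G1 and the Rice parameter corresponding to the high-frequency samples in G2. A specific example is shown below.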
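The frequency domain pitch period consideration decoding unit 623 includes a decoding unit 623a, decodes the code sequence by a decoding method based on the frequency domain pitch period T, and obtains and outputs the frequency domain sample sequence.
The decoding unit 623a obtains and outputs the frequency domain sample sequence by decoding the code sequence with decoding processes that follow different criteria (that is, that are distinguished) for a sample group G1, consisting of all or some of one or a plurality of consecutive samples including the sample corresponding to the frequency domain pitch period T in the frequency domain sample sequence and one or a plurality of consecutive samples including a sample corresponding to an integer multiple of the frequency domain pitch period T, and a sample group G2, consisting of the samples of the frequency domain sample sequence that are not included in the sample group G1.
For each frame, the decoding unit 623a uses the input frequency domain pitch period T (or, when the first auxiliary information is input, the frequency domain pitch period T and the first auxiliary information) to identify the code groups C1 and C2 included in the input code sequence and the sample numbers in the sample groups G1 and G2 to which the respective code groups correspond, and obtains the frequency domain sample sequence by assigning the sample values obtained by decoding C1 and C2 to the sample numbers corresponding to the respective codes, thereby obtaining the sample groups G1 and G2. The code group C1 consists of the codes in the code sequence that correspond to the samples in the sample group G1, and the code group C2 consists of the codes that correspond to the samples in the sample group G2. The method by which the decoding unit 623a identifies the code groups C1 and C2 corresponds to the method by which the encoding unit 616b sets the sample groups G1 and G2; for example, it is the aforementioned method of setting G1 and G2 with "sample" replaced by "code", "F(j)" by "C(j)", "sample group G1" by "code group C1", and "sample group G2" by "code group C2", where C(j) is the code corresponding to the sample F(j).
The decoding unit 623a decodes the code group C1 and the code group C2 according to mutually different criteria, and thereby obtains and outputs the frequency domain sample sequence. For example, the decoding unit 623a decodes the codes in C1 according to a criterion that corresponds to the amplitude, or an estimate of the amplitude, of the samples in the sample group G1 corresponding to C1, and decodes the codes in C2 according to a criterion that corresponds to the amplitude, or an estimate of the amplitude, of the samples in the sample group G2 corresponding to C2.
The case where the code sequence was obtained by Rice coding of each sample is illustrated.
In this case, for each frame, the decoding unit 623a uses the Rice parameter corresponding to the sample group G1, identified from the input auxiliary information (at least some of the first to ninth auxiliary information), as the Rice parameter corresponding to the code group C1, and the Rice parameter corresponding to the sample group G2 as the Rice parameter corresponding to the code group C2. Methods of identifying the Rice parameters that correspond to the aforementioned [Examples 1 to 5 of auxiliary information for identifying Rice parameters] are illustrated below.
For example, the decoding unit 623a to which the third and fourth auxiliary information have been input identifies the Rice parameter corresponding to the sample group G1 from the third auxiliary information and uses it as the Rice parameter corresponding to the code group C1, and identifies the Rice parameter corresponding to the sample group G2 from the fourth auxiliary information and uses it as the Rice parameter corresponding to the code group C2.
For example, the decoding unit 623a to which only the fourth auxiliary information has been input besides the code sequence identifies the Rice parameter corresponding to the code group C2 from the fourth auxiliary information, and uses that parameter plus a fixed value (for example, 1) as the Rice parameter corresponding to the code group C1. Alternatively, the decoding unit 623a to which only the third auxiliary information has been input besides the code sequence identifies the Rice parameter corresponding to the code group C1 from the third auxiliary information, and uses that parameter minus a fixed value (for example, 1) as the Rice parameter corresponding to the code group C2.
For example, the decoding unit 623a to which fifth auxiliary information identifying a Rice parameter and sixth auxiliary information identifying a difference have been input identifies the Rice parameter corresponding to the sample group G1 from the fifth auxiliary information and uses it as the Rice parameter corresponding to the code group C1. It then uses the value obtained by subtracting the difference identified from the sixth auxiliary information from the Rice parameter corresponding to C1 as the Rice parameter corresponding to the code group C2.
For example, the decoding unit 623a to which fifth auxiliary information identifying a difference and sixth auxiliary information identifying a Rice parameter have been input identifies the Rice parameter corresponding to the sample group G2 from the sixth auxiliary information and uses it as the Rice parameter corresponding to the code group C2. It then uses the value obtained by adding the difference identified from the fifth auxiliary information to the Rice parameter corresponding to C2 as the Rice parameter corresponding to the code group C1.
For example, the decoding unit 623a to which seventh auxiliary information has been input uses the Rice parameter estimated from the number of code bits allocated to the whole frame as the Rice parameter corresponding to the code group C2, and uses that parameter plus the first difference value identified from the seventh auxiliary information as the Rice parameter corresponding to the code group C1.
For example, the decoding unit 623a to which eighth auxiliary information has been input uses the Rice parameter estimated from the number of code bits allocated to the whole frame as the Rice parameter corresponding to the code group C1, and uses that parameter minus the second difference value identified from the eighth auxiliary information as the Rice parameter corresponding to the code group C2.
For example, the decoding unit 623a to which ninth auxiliary information has been input in addition to the above auxiliary information for identifying Rice parameters identifies s1 and s2 using at least some of the third to eighth auxiliary information, and adjusts s1 and s2 as in [Table 1] described above on the basis of the ninth auxiliary information, thereby obtaining the Rice parameters corresponding to the code groups C1 and C2, respectively.
Even when the ninth auxiliary information is not input, if the envelope information is known and the encoding unit 616b obtains the Rice parameters corresponding to the sample groups G1 and G2 by adjusting s1 and s2 as in [Table 1] described above, the decoding unit 623a obtains the Rice parameters corresponding to the code groups C1 and C2 by adjusting s1 and s2 in the same way.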
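The sixth embodiment showed an example in which the frequency domain pitch period consideration encoding unit 616 is provided inside the encoding device 61 and the frequency domain pitch period consideration decoding unit 623 inside the decoding device 62. However, the encoding device 61 may be configured without the frequency domain pitch period consideration encoding unit 616, and the decoding device 62 without the frequency domain pitch period consideration decoding unit 623. This is the same structural difference as that of the fifth embodiment relative to the first embodiment, the modifications of the first embodiment, and the second, third, and fourth embodiments, so a detailed description is omitted.
[Encoding device 81]
As shown in FIG. 14, the encoding device 81 of this embodiment differs from the encoding device 51 of the fifth embodiment in that it does not include the long-term prediction analysis unit 111, the long-term prediction residual generation unit 112, or the frequency domain sample sequence generation unit 113. In this case, the encoding device 81 receives the time domain pitch period L, the time domain pitch period code CL, and the frequency domain sample sequence from outside the encoding device 81, and functions as an encoding device that obtains a code for identifying the frequency domain pitch period of the frequency domain sample sequence.
As shown in FIG. 15, the decoding device 82 of this embodiment differs from the decoding device 52 of the fifth embodiment in that it does not include the long-term prediction information decoding unit 121. In this case, the decoding device 82 functions as a decoding device that obtains at least the frequency domain pitch period T from the time domain pitch period L obtained by a long-term prediction information decoding unit 121 outside the decoding device 82 and from at least the frequency domain pitch period code and the time domain pitch period code included in the input code sequence. For example, the code sequence and the frequency domain pitch period T output from the decoding device 52 (and the auxiliary information, when auxiliary information has been input) are input to the frequency domain pitch period consideration decoding unit 123. In all other respects it is the same as the decoding device 52 of the fifth embodiment.
[Frequency domain pitch period analysis device 91]
The fifth, seventh, and eighth embodiments assumed that the frequency domain pitch period T obtained by the encoding devices 51 and 81 is used by the external frequency domain pitch period consideration encoding units 116 and 616 for encoding the frequency domain sample sequence, and therefore output the frequency domain pitch period code corresponding to the frequency domain pitch period T. However, the frequency domain pitch period T can also be used for purposes other than encoding, in which case the frequency domain pitch period code corresponding to T need not be output. Possible purposes other than encoding include, for example, analysis of speech and musical sounds, separation of multiple speech or musical sounds, and recognition of speech and musical sounds.
In the first embodiment, the modifications of the first embodiment, and the second, third, and fourth embodiments, a configuration consisting of the rearrangement processing unit 116a and the encoding unit 116b was described as the frequency domain pitch period consideration encoding unit, and in the sixth embodiment a configuration consisting of the encoding unit 616b was described. Either frequency domain pitch period consideration encoding unit "encodes the input frequency domain sample sequence by an encoding method based on the frequency domain pitch period T and outputs the resulting code sequence." More specifically, it "encodes, according to different criteria (that is, distinguishing between them), a sample group consisting of all or some of one or a plurality of consecutive samples including the sample corresponding to the frequency domain pitch period T in the frequency domain sample sequence and one or a plurality of consecutive samples including a sample corresponding to an integer multiple of the frequency domain pitch period T, and a sample group consisting of the samples of the frequency domain sample sequence that are not included in the sample group G1, and outputs the resulting code sequence."
The encoding device / decoding device according to the above embodiments includes an input unit to which a keyboard or the like can be connected, an output unit to which a liquid crystal display or the like can be connected, a CPU (Central Processing Unit) [which may include cache memory or the like], RAM (Random Access Memory) and ROM (Read Only Memory) as memory, an external storage device such as a hard disk, and a bus that connects the input unit, output unit, CPU, RAM, ROM, and external storage device so that they can exchange data. If necessary, the encoding device / decoding device may also be provided with a device (drive) that can read and write a storage medium such as a CD-ROM.
The present invention is not limited to the above embodiments and can be modified as appropriate without departing from the spirit of the invention. The processing described in the above embodiments may be executed not only in time series in the order described but also in parallel or individually, depending on the processing capability of the device that executes the processing or as required. For example, in the decoding process described above, the processing by the long-term prediction information decoding unit 121 and the processing by the decoding units 123a and 523a can be executed in parallel.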
Claims (41)
- 所定の時間区間の音響信号の時間領域ピッチ周期符号に時間領域のピッチ周期Lが対応し、上記時間領域のピッチ周期Lに対応する周波数領域のサンプル間隔を換算間隔T1として得る周期換算ステップと、
上記換算間隔T1および上記換算間隔T1の整数倍の値U×T1(ただし、Uは予め定めた第1の範囲の整数)を含む候補値の中から上記音響信号に由来する周波数領域サンプル列のピッチ周期(以下、「第1周波数領域ピッチ周期」という。)Tを決定し、上記第1周波数領域ピッチ周期Tが上記換算間隔T1の何倍であるかを示す第1周波数領域ピッチ周期符号を得る周波数領域ピッチ周期分析ステップと、
を有する符号化方法。 - 所定の時間区間の音響信号の時間領域での長期予測分析を行い時間領域のピッチ周期Lと当該時間領域のピッチ周期Lに対応する時間領域ピッチ周期符号を得る長期予測分析ステップと、
上記時間領域のピッチ周期Lを用いて上記音響信号の長期予測残差信号を得る長期予測残差生成ステップと、
上記長期予測残差信号に由来する周波数領域のサンプル列または上記音響信号に由来する周波数領域のサンプル列を得る周波数領域サンプル列生成ステップと、
上記時間領域のピッチ周期Lに対応する周波数領域のサンプル間隔を換算間隔T1として得る周期換算ステップと、
上記換算間隔T1および上記換算間隔T1の整数倍の値U×T1(ただし、Uは予め定めた第1の範囲の整数)を含む候補値の中から上記周波数領域サンプル列のピッチ周期(以下、「第1周波数領域ピッチ周期」という。)Tを決定し、上記第1周波数領域ピッチ周期Tが上記換算間隔T1の何倍であるかを示す第1周波数領域ピッチ周期符号を得る周波数領域ピッチ周期分析ステップと、
を有する符号化方法。 - 請求項1または2に記載の符号化方法であって、
上記周波数領域ピッチ周期分析ステップは、
上記換算間隔T1および上記換算間隔T1の整数倍の値U×T1を含む候補値の中から中間候補値を決定し、上記中間候補値および上記中間候補値の近傍の予め定めた第3の範囲の値の中から上記第1周波数領域ピッチ周期Tを決定し、上記中間候補値が上記換算間隔T1の何倍であるかを示す情報と、上記第1周波数領域ピッチ周期Tと上記中間候補値との差を示す情報と、を上記第1周波数領域ピッチ周期符号として得る
ことを特徴とする符号化方法。 - 所定の時間区間の音響信号の時間領域での長期予測分析を行い、長期予測を実行するか否かを示す長期予測選択情報、長期予測を実行する場合は時間領域のピッチ周期Lと当該時間領域のピッチ周期に対応する時間領域ピッチ周期符号、を得る長期予測分析ステップと、
長期予測を実行する場合には、上記時間領域のピッチ周期Lを用いて上記音響信号の長期予測残差信号を得る長期予測残差生成ステップと、
長期予測を実行する場合には上記長期予測残差信号に、長期予測を実行しない場合には上記音響信号に、由来する周波数領域のサンプル列を得る周波数領域サンプル列生成ステップと、
上記時間領域のピッチ周期Lに対応する周波数領域のサンプル間隔を換算間隔T1として得る周期換算ステップと、
長期予測を実行する場合には、上記換算間隔T1および上記換算間隔T1の整数倍の値U×T1(ただし、Uは予め定めた第1の範囲の整数)を含む候補値の中から上記周波数領域サンプル列のピッチ周期(以下、「第1周波数領域ピッチ周期」という。)Tを決定し、上記周波数領域ピッチ周期Tが上記換算間隔T1の何倍であるかを示す第1周波数領域ピッチ周期符号を得て、長期予測を実行しない場合には、予め定めた第2の範囲の整数値を候補値の中から上記周波数領域サンプル列のピッチ周期(以下、「第2周波数領域ピッチ周期」という。)Tを決定し、上記第2周波数領域ピッチ周期Tを示す第2周波数領域ピッチ周期符号を得る周波数領域ピッチ周期分析ステップと、
を有する符号化方法。 - 請求項4に記載の符号化方法であって、
上記周波数領域ピッチ周期分析ステップは、
長期予測を実行する場合には、上記換算間隔T1および上記換算間隔T1の整数倍の値U×T1を含む候補値の中から中間候補値を決定し、上記中間候補値および上記中間候補値の近傍の予め定めた第3の範囲の値の中から上記第1周波数領域ピッチ周期Tを決定し、上記中間候補値が上記換算間隔T1の何倍であるかを示す情報と、上記第1周波数領域ピッチ周期Tと上記中間候補値との差を示す情報と、を上記第1周波数領域ピッチ周期符号として得て、長期予測を実行しない場合には、上記予め定めた第2の範囲の整数値を候補値として上記第2周波数領域ピッチ周期Tを決定し、上記第2周波数領域ピッチ周期Tと上記第2周波数領域ピッチ周期Tを示す上記第2周波数領域ピッチ周期符号とを得る
ことを特徴とする符号化方法。 - 所定の時間区間の音響信号の時間領域での長期予測分析を行い、長期予測を実行するか否かを示す長期予測選択情報、長期予測を実行する場合は時間領域のピッチ周期Lと当該時間領域のピッチ周期に対応する時間領域ピッチ周期符号とピッチ利得、を得る長期予測分析ステップと、
長期予測を実行する場合には、上記時間領域のピッチ周期Lと上記ピッチ利得とを用いて上記音響信号の長期予測残差信号を得る長期予測残差生成ステップと、
長期予測を実行する場合には上記長期予測残差信号に、長期予測を実行しない場合には上記音響信号に、由来する周波数領域のサンプル列を得る周波数領域サンプル列生成ステップと、
上記時間領域のピッチ周期Lに対応する周波数領域のサンプル間隔を換算間隔T1として得る周期換算ステップと、
量子化済みピッチ利得が予め定めた値以上である場合には、上記換算間隔T1および上記換算間隔T1の整数倍の値U×T1(ただし、Uは予め定めた第1の範囲の整数)を含む候補値の中から上記周波数領域サンプル列のピッチ周期(以下、「第1周波数領域ピッチ周期」という。)Tを決定し、上記第1周波数領域ピッチ周期Tと上記第1周波数領域ピッチ周期Tが上記換算間隔T1の何倍であるかを示す第1周波数領域ピッチ周期符号とを得て、上記量子化済みピッチ利得が予め定めた値より小さい場合には、予め定めた第2の範囲の整数値を候補値の中から上記周波数領域サンプル列のピッチ周期(以下、「第2周波数領域ピッチ周期」という。)Tを決定し、上記第2周波数領域ピッチ周期Tと上記第2周波数領域ピッチ周期Tを示す第2周波数領域ピッチ周期符号とを得る周波数領域ピッチ周期分析ステップと、
を有する符号化方法。 - 請求項6に記載の符号化方法であって、
上記周波数領域ピッチ周期分析ステップは、
上記量子化済みピッチ利得が予め定めた値以上である場合には、上記換算間隔T1および上記換算間隔T1の整数倍の値U×T1を含む候補値の中から中間候補値を決定し、上記中間候補値および上記中間候補値の近傍の予め定めた第3の範囲の値の中から上記第1周波数領域ピッチ周期Tを決定し、上記中間候補値が上記換算間隔T1の何倍であるかを示す情報と、上記第1周波数領域ピッチ周期Tと上記中間候補値との差を示す情報と、を上記第1周波数領域ピッチ周期符号として得て、上記量子化済みピッチ利得が予め定めた値より小さい場合には、上記予め定めた第2の範囲の整数値を候補値として上記第2周波数領域ピッチ周期Tを決定し、上記第2周波数領域ピッチ周期Tと上記第2周波数領域ピッチ周期Tを示す上記第2周波数領域ピッチ周期符号とを得る
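- The encoding method according to any one of claims 1 to 7, further comprising
a frequency-domain-pitch-period-based encoding step of encoding the frequency-domain sample sequence by an encoding method based on the first or second frequency-domain pitch period T.
- The encoding method according to claim 8, wherein
the encoding method based on the first or second frequency-domain pitch period T
is an encoding method that encodes, according to different criteria, (a) a sample group consisting of all or some of: one or more consecutive samples of the frequency-domain sample sequence that include the sample corresponding to the first or second frequency-domain pitch period T, and one or more consecutive samples of the frequency-domain sample sequence that include a sample corresponding to an integer multiple of the first or second frequency-domain pitch period T; and (b) a sample group consisting of the samples of the sample sequence that are not included in sample group (a).
- The encoding method according to claim 8, wherein
the frequency-domain-pitch-period-based encoding step includes:
a rearrangement step of obtaining, as a rearranged sample sequence, a sequence that
(1) contains all the samples of the sample sequence, and
(2) is obtained by rearranging at least some of the samples in the sample sequence so that all or some of the following samples are gathered together: one or more consecutive samples of the sample sequence that include the sample corresponding to the first or second frequency-domain pitch period T, and one or more consecutive samples of the sample sequence that include a sample corresponding to an integer multiple of the first or second frequency-domain pitch period T; and
an encoding step of encoding the sample sequence obtained in the rearrangement step.
- The encoding method according to claim 10, wherein,
in the rearrangement step,
when a prediction gain corresponding to the audio signal in the predetermined time interval, or an estimate of that prediction gain, is equal to or less than a predetermined threshold, the sample sequence itself is output as the rearranged sample sequence.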
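- The encoding method according to any one of claims 1, 2, 4, and 6, wherein
the code length of the first frequency-domain pitch period code when the first frequency-domain pitch period T is
the converted interval T1 itself, or
an integer multiple of the converted interval T1,
is shorter than
the code length in all other cases.
- The encoding method according to any one of claims 1, 2, 4, and 6, wherein
the code length of the first frequency-domain pitch period code when the first frequency-domain pitch period T is
the converted interval T1 itself,
an integer multiple of the converted interval T1,
in the vicinity of the converted interval T1, or
in the vicinity of an integer multiple of the converted interval T1,
is shorter than
the code length in all other cases.
- The encoding method according to any one of claims 1, 2, 4, and 6, wherein
the code length of the first frequency-domain pitch period code when the first frequency-domain pitch period T is
the converted interval T1 itself
is shorter than
the code length when the first frequency-domain pitch period T is in the vicinity of the converted interval T1.
- The encoding method according to any one of claims 1, 2, 4, and 6, wherein
the code length of the first frequency-domain pitch period code when the first frequency-domain pitch period T is
an integer multiple of the converted interval T1
is shorter than
the code length when the first frequency-domain pitch period T is in the vicinity of an integer multiple of the converted interval T1.
- The encoding method according to any one of claims 12 to 15, wherein,
at least when the first frequency-domain pitch period T is an integer multiple V×T1 of the converted interval T1 (where V is a positive integer), the code length of the first frequency-domain pitch period code
is monotonically non-decreasing with respect to the magnitude of the integer V.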
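- A decoding method comprising:
a long-term prediction information decoding step of decoding a time-domain pitch period code to obtain a time-domain pitch period L; and
a period conversion step of obtaining, as a converted interval T1, the frequency-domain sample interval corresponding to the time-domain pitch period L, decoding a first frequency-domain pitch period code to obtain a multiple value indicating how many times the converted interval T1 a first frequency-domain pitch period T is, and obtaining, as the first frequency-domain pitch period T, the product of the converted interval T1 and the multiple value.
- The decoding method according to claim 17, wherein
the period conversion step
obtains, as the converted interval T1, the frequency-domain sample interval corresponding to the time-domain pitch period L, decodes the first frequency-domain pitch period code to obtain a multiple value indicating how many times the converted interval T1 an intermediate candidate value is and a difference value between the first frequency-domain pitch period T and the intermediate candidate value, and obtains, as the first frequency-domain pitch period T, the value obtained by multiplying the converted interval T1 by the multiple value and adding the difference value to the result.
- A decoding method comprising:
a long-term prediction information decoding step of, when long-term prediction selection information indicates that long-term prediction is to be performed, decoding a time-domain pitch period code to obtain a time-domain pitch period L; and
a period conversion step of: when the long-term prediction selection information indicates that long-term prediction is to be performed, obtaining, as a converted interval T1, the frequency-domain sample interval corresponding to the time-domain pitch period L, decoding a first frequency-domain pitch period code to obtain a multiple value indicating how many times the converted interval T1 a first frequency-domain pitch period T is, and obtaining, as the first frequency-domain pitch period T, the product of the converted interval T1 and the multiple value; and, when the long-term prediction selection information indicates that long-term prediction is not to be performed, decoding a second frequency-domain pitch period code to obtain a second frequency-domain pitch period T.
- The decoding method according to claim 19, wherein
the period conversion step,
when the long-term prediction selection information indicates that long-term prediction is to be performed, obtains, as the converted interval T1, the frequency-domain sample interval corresponding to the time-domain pitch period L, decodes the first frequency-domain pitch period code to obtain a multiple value indicating how many times the converted interval T1 an intermediate candidate value is and a difference value between the first frequency-domain pitch period T and the intermediate candidate value, and obtains, as the first frequency-domain pitch period T, the value obtained by multiplying the converted interval T1 by the multiple value and adding the difference value to the result; and, when the long-term prediction selection information indicates that long-term prediction is not to be performed, decodes the second frequency-domain pitch period code to obtain the second frequency-domain pitch period T.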
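- A decoding method comprising:
a long-term prediction information decoding step of, when long-term prediction selection information indicates that long-term prediction is to be performed, decoding a time-domain pitch period code to obtain a time-domain pitch period L and decoding a gain code to obtain a quantized pitch gain; and
a period conversion step of: when the quantized pitch gain is equal to or greater than a predetermined value, obtaining, as a converted interval T1, the frequency-domain sample interval corresponding to the time-domain pitch period L, decoding a first frequency-domain pitch period code to obtain a multiple value indicating how many times the converted interval T1 a first frequency-domain pitch period T is, and obtaining, as the first frequency-domain pitch period T, the product of the converted interval T1 and the multiple value; and, when the quantized pitch gain is less than the predetermined value, decoding a second frequency-domain pitch period code to obtain a second frequency-domain pitch period T.
- The decoding method according to claim 21, wherein
the period conversion step,
when the quantized pitch gain is equal to or greater than the predetermined value, obtains, as the converted interval T1, the frequency-domain sample interval corresponding to the time-domain pitch period L, decodes the first frequency-domain pitch period code to obtain a multiple value indicating how many times the converted interval T1 an intermediate candidate value is and a difference value between the first frequency-domain pitch period T and the intermediate candidate value, and obtains, as the first frequency-domain pitch period T, the value obtained by multiplying the converted interval T1 by the multiple value and adding the difference value to the result; and, when the quantized pitch gain is less than the predetermined value, decodes the second frequency-domain pitch period code to obtain the second frequency-domain pitch period T.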
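- The decoding method according to any one of claims 17 to 22, further comprising:
a frequency-domain-pitch-period-based decoding step of decoding a code sequence by a decoding method based on the first or second frequency-domain pitch period T to obtain a frequency-domain sample sequence;
a time-domain signal sequence generation step of obtaining a time-domain signal sequence derived from the frequency-domain sample sequence; and
a long-term prediction synthesis step of obtaining a decoded audio signal sequence using the time-domain signal sequence, the time-domain pitch period L, and a past decoded audio signal sequence.
- The decoding method according to claim 23, wherein
the decoding method based on the first or second frequency-domain pitch period T
is a decoding method in which (a) a sample group consisting of all or some of: one or more consecutive samples of the frequency-domain sample sequence that include the sample corresponding to the first or second frequency-domain pitch period T, and one or more consecutive samples of the frequency-domain sample sequence that include a sample corresponding to an integer multiple of the first or second frequency-domain pitch period T; and (b) a sample group consisting of the samples of the frequency-domain sample sequence that are not included in sample group (a), are obtained by decoding processes that follow different criteria.
- The decoding method according to claim 23, wherein
the frequency-domain-pitch-period-based decoding step includes:
a decoding step of decoding the code sequence to obtain a sample sequence; and
a recovery step of obtaining, from the sample sequence and in accordance with the first or second frequency-domain pitch period T, a frequency-domain sample sequence in which the samples are arranged in frequency order.
- The decoding method according to claim 25, wherein,
in the recovery step,
when an estimate of the prediction gain for a predetermined time interval is equal to or less than a predetermined threshold, the sample sequence obtained by decoding the code sequence is output as the frequency-domain sample sequence in the original sample order.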
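- A frequency-domain pitch period analysis method for determining a pitch period T of a frequency-domain sample sequence derived from an audio signal in a predetermined time interval (hereinafter the "frequency-domain pitch period"), the method comprising:
a period conversion step of obtaining, as a converted interval T1, the frequency-domain sample interval corresponding to a time-domain pitch period L of the audio signal; and
a frequency-domain pitch period analysis step of determining the frequency-domain pitch period T from among candidate values that include the converted interval T1 and integer multiples U×T1 of the converted interval T1 (where U is an integer in a predetermined first range).
- The frequency-domain pitch period analysis method according to claim 27, wherein
the frequency-domain pitch period analysis step
determines an intermediate candidate value from among candidate values that include the converted interval T1 and integer multiples U×T1 of the converted interval T1, and determines the frequency-domain pitch period T from among the intermediate candidate value and values in a predetermined third range in the vicinity of the intermediate candidate value.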
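- An encoder comprising:
a period conversion unit that obtains, as a converted interval T1, the frequency-domain sample interval corresponding to a time-domain pitch period L, the time-domain pitch period L corresponding to a time-domain pitch period code of an audio signal in a predetermined time interval; and
a frequency-domain pitch period analysis unit that determines the pitch period T of a frequency-domain sample sequence derived from the audio signal (hereinafter the "first frequency-domain pitch period") from among candidate values that include the converted interval T1 and integer multiples U×T1 of the converted interval T1 (where U is an integer in a predetermined first range), and obtains a first frequency-domain pitch period code indicating how many times the converted interval T1 the first frequency-domain pitch period T is.
- An encoder comprising:
a long-term prediction analysis unit that performs time-domain long-term prediction analysis on an audio signal in a predetermined time interval to obtain a time-domain pitch period L and a time-domain pitch period code corresponding to the time-domain pitch period L;
a long-term prediction residual generation unit that obtains a long-term prediction residual signal of the audio signal using the time-domain pitch period L;
a frequency-domain sample sequence generation unit that obtains a frequency-domain sample sequence derived from the long-term prediction residual signal or a frequency-domain sample sequence derived from the audio signal;
a period conversion unit that obtains, as a converted interval T1, the frequency-domain sample interval corresponding to the time-domain pitch period L; and
a frequency-domain pitch period analysis unit that determines the pitch period T of the frequency-domain sample sequence (hereinafter the "first frequency-domain pitch period") from among candidate values that include the converted interval T1 and integer multiples U×T1 of the converted interval T1 (where U is an integer in a predetermined first range), and obtains a first frequency-domain pitch period code indicating how many times the converted interval T1 the first frequency-domain pitch period T is.
- The encoder according to claim 29 or 30, wherein
the frequency-domain pitch period analysis unit
determines an intermediate candidate value from among candidate values that include the converted interval T1 and integer multiples U×T1 of the converted interval T1, determines the first frequency-domain pitch period T from among the intermediate candidate value and values in a predetermined third range in the vicinity of the intermediate candidate value, and obtains, as the first frequency-domain pitch period code, information indicating how many times the converted interval T1 the intermediate candidate value is and information indicating the difference between the first frequency-domain pitch period T and the intermediate candidate value.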
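- A decoder comprising:
a long-term prediction information decoding unit that decodes a time-domain pitch period code to obtain a time-domain pitch period L; and
a period conversion unit that obtains, as a converted interval T1, the frequency-domain sample interval corresponding to the time-domain pitch period L, decodes a first frequency-domain pitch period code to obtain a multiple value indicating how many times the converted interval T1 a first frequency-domain pitch period T is, and obtains, as the first frequency-domain pitch period T, the product of the converted interval T1 and the multiple value.
- The decoder according to claim 32, wherein
the period conversion unit
obtains, as the converted interval T1, the frequency-domain sample interval corresponding to the time-domain pitch period L, decodes the first frequency-domain pitch period code to obtain a multiple value indicating how many times the converted interval T1 an intermediate candidate value is and a difference value between the first frequency-domain pitch period T and the intermediate candidate value, and obtains, as the first frequency-domain pitch period T, the value obtained by multiplying the converted interval T1 by the multiple value and adding the difference value to the result.
- A frequency-domain pitch period analysis device for determining a pitch period T of a frequency-domain sample sequence derived from an audio signal in a predetermined time interval (hereinafter the "frequency-domain pitch period"), the device comprising:
a period conversion unit that obtains, as a converted interval T1, the frequency-domain sample interval corresponding to a time-domain pitch period L of the audio signal; and
a frequency-domain pitch period analysis unit that determines the frequency-domain pitch period T from among candidate values that include the converted interval T1 and integer multiples U×T1 of the converted interval T1 (where U is an integer in a predetermined first range).
- The frequency-domain pitch period analysis device according to claim 34, wherein
the frequency-domain pitch period analysis unit
determines an intermediate candidate value from among candidate values that include the converted interval T1 and integer multiples U×T1 of the converted interval T1, and determines the frequency-domain pitch period T from among the intermediate candidate value and values in a predetermined third range in the vicinity of the intermediate candidate value.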
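- A program for causing a computer to execute the steps of the encoding method according to any one of claims 1 to 16.
- A program for causing a computer to execute the steps of the decoding method according to any one of claims 17 to 26.
- A program for causing a computer to execute the steps of the frequency-domain pitch period analysis method according to claim 27 or 28.
- A computer-readable recording medium storing a program for causing a computer to execute the steps of the encoding method according to any one of claims 1 to 16.
- A computer-readable recording medium storing a program for causing a computer to execute the steps of the decoding method according to any one of claims 17 to 26.
- A computer-readable recording medium storing a program for causing a computer to execute the steps of the frequency-domain pitch period analysis method according to claim 27 or 28.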
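As an informal illustration of the two-stage analysis and the matching decoder-side reconstruction recited in the claims, the following Python sketch may help. It is not taken from the specification. The conversion rule T1 ≈ 2N/L (N frequency-domain samples per frame), the scoring criterion, the candidate ranges, and all function names are assumptions chosen only to make the example runnable; the claims themselves leave these details open.

```python
def conversion_interval(pitch_period_l, num_bins):
    """Converted interval T1 for a time-domain pitch period L.

    Assumption: num_bins samples span 0..fs/2, so harmonics of fs/L are
    spaced about 2*num_bins/L samples apart.
    """
    return max(1, round(2 * num_bins / pitch_period_l))


def harmonic_score(spectrum, period, half_width=1):
    """Hypothetical criterion: mean magnitude of the samples at and next to
    period, 2*period, 3*period, ... in the frequency-domain sequence."""
    if period <= half_width:
        return 0.0
    total, count = 0.0, 0
    for center in range(period, len(spectrum), period):
        stop = min(center + half_width + 1, len(spectrum))
        for i in range(center - half_width, stop):
            total += abs(spectrum[i])
            count += 1
    return total / count if count else 0.0


def analyze_pitch_period(spectrum, t1, multiples=(1, 2, 3, 4), offsets=(-1, 0, 1)):
    """Pick an intermediate candidate U*T1, then refine it within a small
    neighbourhood. Returns (T, code) where code = (multiple, difference)."""
    # Stage 1: intermediate candidate among the integer multiples of T1.
    u = max(multiples, key=lambda m: harmonic_score(spectrum, m * t1))
    intermediate = u * t1
    # Stage 2: search the neighbourhood of the intermediate candidate.
    d = max(offsets, key=lambda o: harmonic_score(spectrum, intermediate + o))
    return intermediate + d, (u, d)


def decode_pitch_period(t1, code):
    """Decoder side: T = multiple * T1 + difference."""
    multiple, difference = code
    return multiple * t1 + difference


# Toy frame: 64 samples with peaks at multiples of 17.
spectrum = [0.0] * 64
for k, magnitude in ((17, 4.0), (34, 2.0), (51, 3.0)):
    spectrum[k] = magnitude

t1 = conversion_interval(16, 64)
period, code = analyze_pitch_period(spectrum, t1)
print(t1, period, code, decode_pitch_period(t1, code))  # 8 17 (2, 1) 17
```

Because the decoder can recompute T1 from the already transmitted time-domain pitch period L, only the small pair (multiple, difference) needs to be sent, which is the motivation for restricting the search to candidates near integer multiples of T1.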
Priority Applications (17)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811010320.XA CN108962270B (zh) | 2012-05-23 | 2013-05-22 | Decoding method, decoding device, and recording medium |
CN201380026430.4A CN104321814B (zh) | 2012-05-23 | 2013-05-22 | Frequency-domain pitch period analysis method and frequency-domain pitch period analysis device |
KR1020177016696A KR101762204B1 (ko) | 2012-05-23 | 2013-05-22 | Encoding method, decoding method, encoder, decoder, program, and recording medium |
PL18173806T PL3385950T3 (pl) | 2012-05-23 | 2013-05-22 | Audio decoding methods, audio decoders and corresponding program and recording medium |
EP18173806.3A EP3385950B1 (en) | 2012-05-23 | 2013-05-22 | Audio decoding methods, audio decoders and corresponding program and recording medium |
PL13793620T PL2830057T3 (pl) | 2012-05-23 | 2013-05-22 | Encoding of an audio signal |
KR1020147030874A KR20140143438A (ko) | 2012-05-23 | 2013-05-22 | Encoding method, decoding method, encoder, decoder, program, and recording medium |
KR1020167018299A KR101750071B1 (ko) | 2012-05-23 | 2013-05-22 | Encoding method, decoding method, encoder, decoder, program, and recording medium |
US14/391,534 US9947331B2 (en) | 2012-05-23 | 2013-05-22 | Encoding method, decoding method, encoder, decoder, program and recording medium |
EP19185171.6A EP3576089B1 (en) | 2012-05-23 | 2013-05-22 | Encoding of an audio signal |
CN201811009738.9A CN109147827B (zh) | 2012-05-23 | 2013-05-22 | Encoding method, encoding device, and recording medium |
EP13793620.9A EP2830057B1 (en) | 2012-05-23 | 2013-05-22 | Encoding of an audio signal |
ES13793620.9T ES2689072T3 (es) | 2012-05-23 | 2013-05-22 | Encoding of an audio signal |
KR1020167021875A KR101663607B1 (ko) | 2012-05-23 | 2013-05-22 | Encoding method, decoding method, frequency-domain pitch period analysis method, encoder, decoder, frequency-domain pitch period analysis device, and recording medium |
JP2014516829A JP6053196B2 (ja) | 2012-05-23 | 2013-05-22 | Encoding method, decoding method, encoder, decoder, program, and recording medium |
US15/904,159 US10096327B2 (en) | 2012-05-23 | 2018-02-23 | Long-term prediction and frequency domain pitch period based encoding and decoding |
US15/904,140 US10083703B2 (en) | 2012-05-23 | 2018-02-23 | Frequency domain pitch period based encoding and decoding in accordance with magnitude and amplitude criteria |
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2012117172 | 2012-05-23 | ||
JP2012-117172 | 2012-05-23 | ||
JP2012171155 | 2012-08-01 | ||
JP2012-171155 | 2012-08-01 |
Related Child Applications (3)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US14/391,534 A-371-Of-International US9947331B2 (en) | 2012-05-23 | 2013-05-22 | Encoding method, decoding method, encoder, decoder, program and recording medium |
US15/904,140 Continuation US10083703B2 (en) | 2012-05-23 | 2018-02-23 | Frequency domain pitch period based encoding and decoding in accordance with magnitude and amplitude criteria |
US15/904,159 Continuation US10096327B2 (en) | 2012-05-23 | 2018-02-23 | Long-term prediction and frequency domain pitch period based encoding and decoding |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2013176177A1 (ja) | 2013-11-28 |
Family
ID=49623862
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2013/064209 WO2013176177A1 (ja) | 2012-05-23 | 2013-05-22 | Encoding method, decoding method, encoder, decoder, program, and recording medium |
Country Status (8)
Country | Link |
---|---|
US (3) | US9947331B2 (ja) |
EP (3) | EP2830057B1 (ja) |
JP (1) | JP6053196B2 (ja) |
KR (4) | KR101762204B1 (ja) |
CN (3) | CN109147827B (ja) |
ES (3) | ES2762160T3 (ja) |
PL (2) | PL3385950T3 (ja) |
WO (1) | WO2013176177A1 (ja) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
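WO2016167215A1 (ja) * | 2015-04-13 | 2016-10-20 | Nippon Telegraph and Telephone Corporation | Linear predictive encoding device, linear predictive decoding device, methods therefor, program, and recording medium |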
Families Citing this family (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
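KR101762204B1 (ko) * | 2012-05-23 | 2017-07-27 | Nippon Telegraph and Telephone Corporation | Encoding method, decoding method, encoder, decoder, program, and recording medium |
WO2016121826A1 (ja) * | 2015-01-30 | 2016-08-04 | Nippon Telegraph and Telephone Corporation | Encoding device, decoding device, methods therefor, program, and recording medium |
CN107430869B (zh) * | 2015-01-30 | 2020-06-12 | Nippon Telegraph and Telephone Corporation | Parameter determination device, method, and recording medium |
WO2016142002A1 (en) | 2015-03-09 | 2016-09-15 | Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. | Audio encoder, audio decoder, method for encoding an audio signal and method for decoding an encoded audio signal |
CN106373594B (zh) * | 2016-08-31 | 2019-11-26 | Huawei Technologies Co., Ltd. | Tone detection method and device |
CN110291583B (zh) * | 2016-09-09 | 2023-06-16 | DTS, Inc. | System and method for long-term prediction in an audio codec |
JP6712643B2 (ja) * | 2016-09-15 | 2020-06-24 | Nippon Telegraph and Telephone Corporation | Sample sequence transformation device, signal encoding device, signal decoding device, sample sequence transformation method, signal encoding method, signal decoding method, and program |
WO2019142513A1 (ja) * | 2018-01-17 | 2019-07-25 | Nippon Telegraph and Telephone Corporation | Encoding device, decoding device, fricative determination device, methods therefor, and program |
CN110728990B (zh) * | 2019-09-24 | 2022-04-05 | Vivo Mobile Communication Co., Ltd. | Pitch detection method, device, terminal equipment, and medium |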
US11769071B2 (en) * | 2020-11-30 | 2023-09-26 | IonQ, Inc. | System and method for error correction in quantum computing |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH0792998A (ja) * | 1993-07-27 | 1995-04-07 | Sony Corp | Speech signal encoding method and decoding method |
JP2002515610A (ja) * | 1998-05-11 | 2002-05-28 | Koninklijke Philips Electronics N.V. | Speech coding based on determination of a noise contribution from a phase change |
JP2002516420A (ja) * | 1998-05-21 | 2002-06-04 | University of Surrey | Speech coder |
JP2003216189A (ja) * | 2002-10-21 | 2003-07-30 | Sony Corp | Encoding device and decoding device |
JP2005528647A (ja) * | 2002-05-31 | 2005-09-22 | VoiceAge Corporation | Method and device for frequency-selective pitch enhancement of synthesized speech |
JP2009156971A (ja) | 2007-12-25 | 2009-07-16 | Nippon Telegr & Teleph Corp <Ntt> | Encoding device, decoding device, encoding method, decoding method, encoding program, decoding program, and recording medium |
Family Cites Families (19)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4797926A (en) | 1986-09-11 | 1989-01-10 | American Telephone And Telegraph Company, At&T Bell Laboratories | Digital speech vocoder |
US5003604A (en) * | 1988-03-14 | 1991-03-26 | Fujitsu Limited | Voice coding apparatus |
US5127053A (en) * | 1990-12-24 | 1992-06-30 | General Electric Company | Low-complexity method for improving the performance of autocorrelation-based pitch detectors |
WO1996006489A1 (fr) * | 1994-08-22 | 1996-02-29 | Sony Corporation | Transmitter-receiver |
TW321810B (ja) * | 1995-10-26 | 1997-12-01 | Sony Co Ltd | |
US7072832B1 (en) * | 1998-08-24 | 2006-07-04 | Mindspeed Technologies, Inc. | System for speech encoding having an adaptive encoding arrangement |
JP4550176B2 (ja) * | 1998-10-08 | 2010-09-22 | Toshiba Corporation | Speech encoding method |
JP2000267700A (ja) * | 1999-03-17 | 2000-09-29 | Yrp Kokino Idotai Tsushin Kenkyusho:Kk | Speech encoding and decoding method and device |
JP4005359B2 (ja) * | 1999-09-14 | 2007-11-07 | Fujitsu Limited | Speech encoding and speech decoding devices |
JP3404350B2 (ja) * | 2000-03-06 | 2003-05-06 | Panasonic Mobile Communications Co., Ltd. | Speech coding parameter acquisition method, speech decoding method, and device |
WO2004097796A1 (ja) * | 2003-04-30 | 2004-11-11 | Matsushita Electric Industrial Co., Ltd. | Speech encoding device, speech decoding device, and methods therefor |
JP5036317B2 (ja) | 2004-10-28 | 2012-09-26 | Panasonic Corporation | Scalable encoding device, scalable decoding device, and methods therefor |
EP1837997B1 (en) * | 2005-01-12 | 2011-03-16 | Nippon Telegraph And Telephone Corporation | Long-term prediction encoding method, long-term prediction decoding method, devices thereof, program thereof, and recording medium |
UA92341C2 (ru) * | 2005-04-01 | 2010-10-25 | Qualcomm Incorporated | Systems, methods, and apparatus for wideband speech coding |
KR100647336B1 (ko) * | 2005-11-08 | 2006-11-23 | Samsung Electronics Co., Ltd. | Adaptive time/frequency-based audio encoding/decoding apparatus and method |
US8909521B2 (en) * | 2009-06-03 | 2014-12-09 | Nippon Telegraph And Telephone Corporation | Coding method, coding apparatus, coding program, and recording medium therefor |
JP5612698B2 (ja) | 2010-10-05 | 2014-10-22 | Nippon Telegraph and Telephone Corporation | Encoding method, decoding method, encoder, decoder, program, and recording medium |
KR101762204B1 (ko) * | 2012-05-23 | 2017-07-27 | Nippon Telegraph and Telephone Corporation | Encoding method, decoding method, encoder, decoder, program, and recording medium |
US9589570B2 (en) * | 2012-09-18 | 2017-03-07 | Huawei Technologies Co., Ltd. | Audio classification based on perceptual quality for low or medium bit rates |
-
2013
- 2013-05-22 KR KR1020177016696A patent/KR101762204B1/ko active IP Right Grant
- 2013-05-22 JP JP2014516829A patent/JP6053196B2/ja active Active
- 2013-05-22 US US14/391,534 patent/US9947331B2/en active Active
- 2013-05-22 EP EP13793620.9A patent/EP2830057B1/en active Active
- 2013-05-22 PL PL18173806T patent/PL3385950T3/pl unknown
- 2013-05-22 CN CN201811009738.9A patent/CN109147827B/zh active Active
- 2013-05-22 EP EP19185171.6A patent/EP3576089B1/en active Active
- 2013-05-22 ES ES18173806T patent/ES2762160T3/es active Active
- 2013-05-22 ES ES13793620.9T patent/ES2689072T3/es active Active
- 2013-05-22 PL PL13793620T patent/PL2830057T3/pl unknown
- 2013-05-22 KR KR1020167018299A patent/KR101750071B1/ko active IP Right Grant
- 2013-05-22 EP EP18173806.3A patent/EP3385950B1/en active Active
- 2013-05-22 ES ES19185171T patent/ES2834391T3/es active Active
- 2013-05-22 CN CN201811010320.XA patent/CN108962270B/zh active Active
- 2013-05-22 KR KR1020147030874A patent/KR20140143438A/ko active Application Filing
- 2013-05-22 WO PCT/JP2013/064209 patent/WO2013176177A1/ja active Application Filing
- 2013-05-22 KR KR1020167021875A patent/KR101663607B1/ko active IP Right Grant
- 2013-05-22 CN CN201380026430.4A patent/CN104321814B/zh active Active
-
2018
- 2018-02-23 US US15/904,159 patent/US10096327B2/en active Active
- 2018-02-23 US US15/904,140 patent/US10083703B2/en active Active
Non-Patent Citations (3)
Title |
---|
J. HERRE; E. ALLAMANCHE; K. BRANDENBURG; M. DIETZ; B. TEICHMANN; B. GRILL; A. JIN; T. MORIYA; N. IWAKAMI; T. NORIMATSU: "The Integrated Filterbank Based Scalable MPEG-4, Audio Coder", 105TH CONVENTION AUDIO ENGINEERING SOCIETY, 1998, pages 4810 |
See also references of EP2830057A4 |
T. MORIYA; N. IWAKAMI; A. JIN; K. IKEDA; S. MIKI: "A Design of Transform Coder for Both Speech and Audio Signals at 1 bit/sample", PROC. ICASSP '97, 1997, pages 1371 - 1374, XP000822711, DOI: doi:10.1109/ICASSP.1997.596202 |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
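WO2016167215A1 (ja) * | 2015-04-13 | 2016-10-20 | Nippon Telegraph and Telephone Corporation | Linear predictive encoding device, linear predictive decoding device, methods therefor, program, and recording medium |
CN107408390A (zh) * | 2015-04-13 | 2017-11-28 | Nippon Telegraph and Telephone Corporation | Linear predictive encoding device, linear predictive decoding device, methods therefor, program, and recording medium |
JPWO2016167215A1 (ja) * | 2015-04-13 | 2018-02-01 | Nippon Telegraph and Telephone Corporation | Linear predictive encoding device, linear predictive decoding device, methods therefor, program, and recording medium |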
Similar Documents
Publication | Publication Date | Title |
---|---|---|
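JP6053196B2 (ja) | Encoding method, decoding method, encoder, decoder, program, and recording medium | |
JP5612698B2 (ja) | Encoding method, decoding method, encoder, decoder, program, and recording medium | |
JP5596800B2 (ja) | Encoding method, periodicity feature determination method, periodicity feature determination device, and program | |
JP5603484B2 (ja) | Encoding method, decoding method, encoder, decoder, program, and recording medium | |
JP5893153B2 (ja) | Encoding method, encoder, program, and recording medium | |
JP5694751B2 (ja) | Encoding method, decoding method, encoder, decoder, program, and recording medium |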
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| | 121 | Ep: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 13793620; Country of ref document: EP; Kind code of ref document: A1 |
| | WWE | Wipo information: entry into national phase | Ref document number: 14391534; Country of ref document: US |
| | ENP | Entry into the national phase | Ref document number: 2014516829; Country of ref document: JP; Kind code of ref document: A |
| | WWE | Wipo information: entry into national phase | Ref document number: 2013793620; Country of ref document: EP |
| | ENP | Entry into the national phase | Ref document number: 20147030874; Country of ref document: KR; Kind code of ref document: A |
| | NENP | Non-entry into the national phase | Ref country code: DE |