WO2011071335A2 - Method and apparatus for encoding a speech signal - Google Patents
Method and apparatus for encoding a speech signal
- Publication number
- WO2011071335A2 (PCT/KR2010/008848)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- current frame
- vector
- codebook
- quantized
- spectral
- Prior art date
Links
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/02—Techniques using spectral analysis, e.g. transform vocoders or subband vocoders
- G10L19/032—Quantisation or dequantisation of spectral components
- G10L19/04—Techniques using predictive techniques
- G10L19/06—Determination or coding of the spectral characteristics, e.g. of the short-term prediction coefficients
- G10L19/07—Line spectrum pair [LSP] vocoders
- G10L19/08—Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters
- G10L19/09—Long term prediction, i.e. removing periodical redundancies, e.g. by using adaptive codebook or pitch predictor
- G10L19/10—The excitation function being a multipulse excitation
- G10L19/107—Sparse pulse excitation, e.g. by using algebraic codebook
- G10L2019/0001—Codebooks
- G10L2019/0007—Codebook element generation
- G10L2019/001—Interpolation of codebook vectors
- G10L2019/0013—Codebook search algorithms
- G10L2019/0016—Codebook for LPC parameters
Definitions
- The present invention relates to a speech signal encoding method and apparatus.
- BACKGROUND ART: Linear prediction, adaptive codebook search, and fixed codebook search techniques are used to increase the compression rate of speech signals.
- The present invention aims to minimize spectral quantization errors in speech signal coding.
- The present invention proposes a speech signal encoding method characterized by extracting candidates usable as optimal spectral vectors for the speech signal according to first best information.
- The present invention proposes a speech signal encoding method comprising extracting candidates usable as optimal adaptive codebooks for the speech signal according to second best information.
- The present invention proposes a speech signal encoding method comprising extracting candidates usable as optimal fixed codebooks for the speech signal according to third best information.
- In the speech signal coding method based on this best information, the optimal coding parameters are determined through a search process that combines the candidate coding parameters for all cases. In this way, the parameters that minimize the quantization error can be found, improving the sound quality of the synthesized speech signal.
- FIG. 1 is a block diagram showing a speech coder of an Analysis by Synthesis method.
- FIG. 2 is a block diagram illustrating a structure of a speech coder of a CELP scheme according to an embodiment to which the present invention is applied.
- FIG. 3 illustrates a process of sequentially obtaining the coding parameters required for the speech signal encoding process, as an embodiment to which the present invention is applied.
- FIG. 4 illustrates a process of quantizing an input signal using quantized spectral candidate vectors based on first best information, according to an embodiment to which the present invention is applied.
- FIG. 5 illustrates a process of obtaining quantized spectral candidate vectors using first best information.
- FIG. 6 illustrates a process of quantizing an input signal using an adaptive codebook candidate based on second best information as an embodiment to which the present invention is applied.
- FIG. 7 illustrates a process of quantizing an input signal using a fixed codebook candidate based on third best information according to an embodiment to which the present invention is applied.
- The speech signal encoding method according to the present invention obtains linear prediction filter coefficients of a current frame from an input signal using linear prediction, obtains quantized spectral candidate vectors of the current frame corresponding to the linear prediction filter coefficients based on first best information, and interpolates the quantized spectral candidate vector of the current frame with the quantized spectral vector of the previous frame.
- The first best information is information on the number of codebook indices extracted per frame.
- The quantized spectral candidate vector is obtained by converting the linear prediction filter coefficients of the current frame into a spectral vector of the current frame, calculating an error between the spectral vector of the current frame and the codebook of the current frame, and extracting the codebook indices of the current frame in consideration of the error and the first best information.
- The error between the spectral vector of the current frame and the codebook is calculated, and the quantized code vectors or codebook indices are sorted in ascending order of error.
- The codebook indices of the current frame are extracted in ascending order of the error between the spectral vector of the current frame and the codebook.
- The quantized code vector corresponding to a codebook index is a quantized spectral candidate vector of the current frame.
- The apparatus for encoding a speech signal according to the present invention includes a linear prediction analyzer 200 for obtaining linear prediction filter coefficients of a current frame from an input signal using linear prediction, and a quantization unit 210 for obtaining quantized spectral candidate vectors of the current frame based on the linear prediction filter coefficients of the current frame and first best information, and for interpolating the quantized spectral candidate vector of the current frame with the quantized spectral vector of the previous frame.
- The first best information is information on the number of codebook indices extracted per frame.
- The quantization unit 210 obtains the quantized spectral candidate vectors by converting the linear prediction filter coefficients of the current frame into a spectral vector of the current frame and calculating an error between the spectral vector of the current frame and the codebook of the current frame.
- The quantization unit 210 sorts the quantized code vectors or codebook indices in ascending order of the error between the spectral vector of the current frame and the codebook.
- The codebook indices of the current frame are extracted in ascending order of the error between the spectral vector of the current frame and the codebook.
- The quantized code vector corresponding to a codebook index is a quantized spectral candidate vector of the current frame.
- FIG. 1 is a block diagram illustrating a speech coder of the analysis-by-synthesis method.
- The analysis-by-synthesis method determines the optimal coding parameters of the speech encoder by comparing the signal synthesized through the encoder's block diagram with the original input signal. It does not measure the mean square error at the excitation-signal stage, but determines the optimal coding parameters by measuring the mean square error at the synthesis stage; that is, it is a closed-loop search method.
- The analysis-by-synthesis speech coder may include an excitation signal generator 100, a long-term synthesis filter 110, and a short-term synthesis filter 120.
- A weighting filter 130 may further be included depending on the method.
- The excitation signal generator 100 may obtain a residual signal according to long-term prediction and finally model the remaining uncorrelated component with a fixed codebook.
- An algebraic codebook, which encodes a fixed number of pulse positions within a subframe, may be used; it can vary the data rate according to the number of pulses and saves codebook memory.
- The long-term synthesis filter 110 reproduces the long-term correlation, which is physically related to the pitch of the excitation signal.
- The long-term synthesis filter 110 may be implemented using a delay value D and a gain value g_p obtained through long-term prediction or pitch analysis, for example as in Equation 1 below: 1/P(z) = 1/(1 − g_p·z^(−D)).
- The short-term synthesis filter 120 models the short-term correlation in the input signal.
- The short-term synthesis filter 120 may be implemented using linear prediction filter coefficients obtained through linear prediction, for example as in Equation 2 below: 1/A(z) = 1/(1 − Σ_{i=1..p} a_i·z^(−i)).
- The linear prediction filter coefficients may be obtained in a process of minimizing the error due to linear prediction; a covariance method, an autocorrelation method, a lattice filter, the Levinson-Durbin algorithm, and the like can be used.
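The Levinson-Durbin recursion mentioned above can be sketched as follows. This is a generic illustration of the well-known algorithm, not code from the patent; function and variable names are illustrative. The returned coefficients are those of A(z) = 1 + a1·z^(−1) + … + ap·z^(−p).

```python
def levinson_durbin(r, order):
    """Solve the LPC normal equations via the Levinson-Durbin recursion.
    r[0..order] are autocorrelation values; returns the coefficients
    a[1..order] of A(z) and the final prediction-error energy."""
    a = [0.0] * (order + 1)
    a[0] = 1.0
    energy = r[0]
    for i in range(1, order + 1):
        # reflection coefficient k_i from the current residual correlation
        acc = r[i]
        for j in range(1, i):
            acc += a[j] * r[i - j]
        k = -acc / energy
        # symmetric in-place update of a[1..i]
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        a = new_a
        energy *= (1.0 - k * k)   # error energy shrinks each order
    return a[1:], energy

# Example: autocorrelation of a strongly correlated signal
coeffs, err = levinson_durbin([1.0, 0.9, 0.8], 2)
```

Each iteration raises the predictor order by one and reduces the residual energy, which is why the recursion is a popular O(p²) alternative to directly inverting the autocorrelation matrix.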
- The weighting filter 130 may shape the quantization noise according to the energy level of the input signal; for example, it can allow more noise near the formants of the input signal and suppress noise in relatively low-energy regions. For example, Equation 3.
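The perceptual weighting filter in CELP-type coders commonly takes the form W(z) = A(z/γ1)/A(z/γ2) with 0 < γ2 < γ1 ≤ 1, which is the role Equation 3 plays here. A minimal sketch under that assumption (the function names and the γ values are illustrative, not taken from the patent):

```python
def weight_lpc(a, gamma):
    """Bandwidth-expand LPC coefficients: a_i -> a_i * gamma**i.
    a = [1, a1, ..., ap] are the coefficients of A(z)."""
    return [c * gamma ** i for i, c in enumerate(a)]

def iir_filter(b, a, x):
    """Direct-form filter y(n) = sum_k b[k] x(n-k) - sum_{k>=1} a[k] y(n-k),
    with a[0] = 1."""
    y = []
    for n in range(len(x)):
        acc = sum(b[k] * x[n - k] for k in range(len(b)) if n - k >= 0)
        acc -= sum(a[k] * y[n - k] for k in range(1, len(a)) if n - k >= 0)
        y.append(acc)
    return y

def perceptual_weight(x, a, g1=0.92, g2=0.68):
    """Apply W(z) = A(z/g1) / A(z/g2) to the signal x."""
    return iir_filter(weight_lpc(a, g1), weight_lpc(a, g2), x)
```

With γ1 = γ2 the filter reduces to unity, which is a convenient sanity check; moving γ2 below γ1 deemphasizes the formant regions so that quantization noise is hidden where the signal energy is high.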
- In the analysis-by-synthesis method, the optimal coding parameters may be obtained by performing a closed-loop search that minimizes the error between the original input signal s(n) and the synthesized signal.
- the coding parameter may include an index of a fixed codebook, a delay value and a gain value of an adaptive codebook, and a linear prediction filter coefficient.
- The analysis-by-synthesis approach may be implemented by various coding methods depending on how the excitation signal is modeled.
- Hereinafter, a CELP speech coder will be described as one method of modeling the excitation signal.
- However, the present invention is not limited thereto, and the same technical idea may be applied to multi-pulse excitation (MPE), ACELP (Algebraic CELP), and the like.
- FIG. 2 is a block diagram illustrating a structure of a speech coder of a CELP scheme according to an embodiment to which the present invention is applied.
- The linear prediction analyzer 200 may obtain linear prediction filter coefficients by performing linear prediction analysis on the input signal. Linear prediction analysis (short-term prediction) uses an autocorrelation approach, exploiting the fact that the current state of time-series data is closely related to its past or future states, and is the basis of CELP (Code-Excited Linear Prediction) coding.
- The quantization unit 210 may convert the obtained linear prediction filter coefficients into a spectral pair representation, a parameter suitable for quantization, and then quantize and interpolate it.
- The interpolated spectral pairs are transformed back to the linear prediction domain, which can be used to calculate the synthesis filter and the weighting filter for each subframe. Quantization will be described with reference to FIGS. 4 and 5.
- The pitch analyzer 220 computes the pitch period of the input signal, obtaining the delay value and the gain value of the long-term synthesis filter through pitch analysis of the signal to which the perceptual weighting filter 280 has been applied.
- From these, the adaptive codebook 230 may be generated.
- the fixed codebook 240 may model the aperiodic random signal from which the short-term prediction component and the long-term prediction component are removed, and store the random signals in the form of a codebook.
- the adder 250 multiplies a gain value by each of the periodic sound source signals extracted from the adaptive codebook 230 and the random signals output from the fixed codebook 240 according to the pitch period estimated by the pitch analyzer 220, and then adds them.
- the synthesis filter 260 may generate a synthesis signal by performing synthesis filtering based on the quantized linear prediction coefficients on the excitation signal output from the adder 250.
- the error calculator 270 may calculate an error between the input signal that is the original signal and the synthesized signal.
- The error minimization unit 290 may determine the delay value and gain value of the adaptive codebook and the random signal of the fixed codebook that minimize the error, taking auditory characteristics into account through the perceptual weighting filter 280.
- the speech coder analyzes the excitation signal corresponding to the residual signal of the linear prediction analysis by dividing it into an adaptive codebook and a fixed codebook, and may be modeled as in Equation 4 below.
- Equation 4: u(n) = g_p·v(n) + g_c·c(n), for n = 0, …, N−1
- That is, the excitation signal u(n) may be expressed using the adaptive codebook vector v(n), the adaptive codebook gain g_p, the fixed codebook vector c(n), and the fixed codebook gain g_c.
- A weighted input signal may be generated from the input signal through the weighting filter 300.
- An initial value of the weighted synthesis filter 310 may be generated.
- To remove the memory influence, the zero input response (ZIR) of the weighted synthesis filter 310 may be subtracted from the weighted input signal to generate the target signal of the adaptive codebook.
- The weighted synthesis filter 310 may be formed by combining the synthesis filter with the weighting filter 300.
- The pitch period is found by minimizing the mean square error (MSE) between the target signal of the adaptive codebook and the zero state response (ZSR) of the weighted synthesis filter 310 driven by the adaptive codebook 320.
- From this, the delay value and the gain value of the adaptive codebook can be obtained.
- The adaptive codebook 320 may be constructed using the long-term synthesis filter 120.
- the long-term synthesis filter may use an optimal delay value and a gain value for minimizing an error between a target signal of the adaptive codebook and a signal passed through the long-term synthesis filter.
- the optimal delay value can be obtained as shown in Equation 6 below.
- The delay value D that maximizes Equation 6 is used, where L denotes the length of one subframe at the decoding end.
- The gain value of the long-term synthesis filter is obtained by applying the delay value D obtained from Equation 6 to Equation 7.
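Equations 6 and 7 themselves are not reproduced in this text; the sketch below assumes the standard CELP form, in which the delay maximizes the normalized correlation between the adaptive codebook target signal and the delayed past excitation, and the gain then follows from a least-squares fit. All names are illustrative.

```python
def search_adaptive_codebook(target, past_exc, dmin, dmax):
    """Pick the delay D in [dmin, dmax] maximizing
    (sum_n x(n)*y_D(n))**2 / sum_n y_D(n)**2, where y_D is the past
    excitation delayed by D samples, then compute the gain
    g_p = <x, y_D> / <y_D, y_D>.  Requires dmin >= len(target)."""
    L = len(target)
    best_d, best_crit = dmin, -1.0
    for d in range(dmin, dmax + 1):
        y = [past_exc[n - d] for n in range(L)]   # negative indices: past samples
        num = sum(x * v for x, v in zip(target, y)) ** 2
        den = sum(v * v for v in y)
        if den > 0.0 and num / den > best_crit:
            best_d, best_crit = d, num / den
    y = [past_exc[n - best_d] for n in range(L)]
    g_p = sum(x * v for x, v in zip(target, y)) / sum(v * v for v in y)
    return best_d, g_p

# Past excitation with pitch period 5; the target continues it at half amplitude
past = [0.0, 0.0, 1.0, 0.0, 0.0] * 3
delay, gain = search_adaptive_codebook([0.0, 0.0, 0.5, 0.0, 0.0], past, 5, 10)
```

Maximizing the normalized correlation first and deriving the gain afterwards is what makes this a stepwise (open-loop then closed-loop) pitch search rather than a joint search over delay and gain.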
- the fixed codebook 330 models the remaining components from which the influence of the adaptive codebook is removed from the excitation signal.
- the fixed codebook 330 may be searched by a process of minimizing an error between the weighted input signal and the weighted composite signal.
- The target signal of the fixed codebook may be updated as the signal obtained by removing the zero state response (ZSR) of the adaptive codebook 320 from the input signal to which the weighting filter 300 has been applied.
- the target signal of the fixed codebook may be expressed as Equation 8 below.
- Here, c(n) is the target signal of the fixed codebook, s_w(n) is the signal to which the weighting filter 300 has been applied, and v(n) represents the adaptive codebook contribution made using the long-term synthesis filter.
- The fixed codebook 330 may be searched by minimizing Equation 9 in the process of minimizing the error between the target signal of the fixed codebook and the filtered fixed codebook.
- In Equation 9, H is a lower-triangular Toeplitz convolution matrix made from the impulse response h(n) of the weighted short-term synthesis filter, with main diagonal h(0) and lower diagonals h(1), …, h(L−1).
- The numerator term of Equation 9 is calculated as Equation 10, where N_p is the number of pulses in the fixed codebook and s_i is the sign of the i-th pulse.
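Equations 9 through 11 are only partially legible here; the sketch below assumes the standard algebraic codebook search criterion, maximizing (dᵀc)² / (cᵀΦc) with d = Hᵀx and Φ = HᵀH over sparse pulse codevectors c, consistent with the description of H above. A brute-force search over two pulses is shown for illustration only; all names are illustrative.

```python
from itertools import combinations, product

def build_H(h, L):
    """Lower-triangular Toeplitz convolution matrix from impulse response h."""
    return [[h[i - j] if 0 <= i - j < len(h) else 0.0 for j in range(L)]
            for i in range(L)]

def search_fixed_codebook(target, h, n_pulses=2):
    """Exhaustive search over pulse positions and signs maximizing
    Q(c) = (d^T c)^2 / (c^T Phi c)."""
    L = len(target)
    H = build_H(h, L)
    # d = H^T x : backward-filtered target
    d = [sum(H[i][j] * target[i] for i in range(L)) for j in range(L)]
    # Phi = H^T H : correlations between filtered unit pulses
    phi = [[sum(H[i][j] * H[i][k] for i in range(L)) for k in range(L)]
           for j in range(L)]
    best, best_q = None, -1.0
    for pos in combinations(range(L), n_pulses):
        for signs in product((-1.0, 1.0), repeat=n_pulses):
            num = sum(s * d[p] for s, p in zip(signs, pos)) ** 2
            den = sum(si * sj * phi[pi][pj]
                      for si, pi in zip(signs, pos)
                      for sj, pj in zip(signs, pos))
            if den > 0.0 and num / den > best_q:
                best_q, best = num / den, (pos, signs)
    return best
```

Real algebraic codebooks restrict each pulse to an interleaved track precisely to avoid this combinatorial blow-up while keeping the same criterion.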
- The coding parameters of the speech coder may be estimated stepwise: first finding the optimal adaptive codebook and then finding the fixed codebook.
- FIG. 4 illustrates a process of quantizing an input signal using quantized spectral candidate vectors based on first best information, according to an embodiment to which the present invention is applied.
- the linear prediction analyzer 200 may obtain linear prediction filter coefficients through linear prediction analysis on the input signal (S400).
- The linear prediction filter coefficients may be obtained in a process of minimizing the error due to linear prediction; a covariance method, an autocorrelation method, a lattice filter, the Levinson-Durbin algorithm, and the like can be used as described above.
- the linear prediction filter coefficients may be obtained in frame units.
- the quantization unit 210 may obtain a quantized spectral candidate vector based on the linear prediction filter coefficients (S410).
- the quantized spectral candidate vector may be obtained using first best information, which will be described with reference to FIG. 5.
- FIG. 5 illustrates a process of obtaining quantized spectral candidate vectors using first best information.
- The quantization unit 210 may convert the linear prediction filter coefficients of the current frame into a spectral vector of the current frame (S500).
- Here, the spectral vector may be an immittance spectral frequency vector.
- However, the present invention is not limited thereto, and the coefficients may be converted into parameters such as a line spectral frequency or a line spectral pair.
- the spectral vector may be divided into several subvectors to find respective codebooks.
- a multi-stage vector quantizer having several stages may be used, but the present invention is not limited thereto.
- the transformed spectral vector of the current frame can be used as it is.
- a technique of quantizing the spectral residual vector of the current frame may be used.
- the spectral residual vector of the current frame may be generated using the spectral vector of the current frame and the predictive vector of the current frame.
- The predictive vector of the current frame may be derived from the quantized spectral vector of the previous frame. For example, the spectral residual vector of the current frame may be derived as in Equation 12 below.
- Equation 12: r(n) = z(n) − p(n)
- Here, r(n) is the spectral residual vector of the current frame, z(n) is the spectral vector of the current frame with its mean value removed, p(n) is the predictive vector of the current frame, and ẑ(n−1) is the quantized spectral vector of the previous frame from which p(n) is derived.
- the quantization unit 210 may calculate an error between the spectral vector of the current frame and the codebook of the current frame (S520).
- the codebook of the current frame may mean a codebook used for spectral vector quantization.
- the codebook of the current frame may consist of a quantized code vector and a codebook index corresponding to the quantized code vector.
- The quantization unit 210 may calculate the error between the spectral vector of the current frame and the codebook and sort the quantized code vectors or codebook indices in ascending order of error.
- The codebook indices may then be extracted in consideration of the error calculated in S520 and the first best information (S530).
- the first best information may mean information on the number of codebook indices extracted on a frame basis.
- the first best information may be a value determined by an encoder.
- Codebook indices (or quantized code vectors) may be extracted in ascending order of the error between the spectral vector of the current frame and the codebook, according to the first best information.
- Each quantized spectral candidate vector corresponding to the extracted codebook index may be obtained (S540).
- the quantized code vector based on the extracted codebook index may be used as the quantized spectral candidate vector of the current frame.
- the first best information may mean information on the number of quantized spectral candidate vectors obtained on a frame basis.
- According to the first best information, either one quantized spectral candidate vector or a plurality of quantized spectral candidate vectors may be obtained.
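The first best information N thus selects the N codebook entries with the smallest quantization error, rather than only the single best one. A minimal sketch of this N-best extraction (names are illustrative; the patent does not fix a distance measure, so squared Euclidean error is assumed):

```python
def n_best_indices(spectral_vec, codebook, n_best):
    """Return the n_best codebook indices sorted in ascending order of
    squared error against the input spectral vector, together with the
    corresponding quantized candidate vectors."""
    def sq_err(cv):
        return sum((a - b) ** 2 for a, b in zip(spectral_vec, cv))
    ranked = sorted(range(len(codebook)), key=lambda i: sq_err(codebook[i]))
    indices = ranked[:n_best]
    candidates = [codebook[i] for i in indices]
    return indices, candidates

# Toy 2-dimensional codebook; first best information N = 2
codebook = [[0.1, 0.2], [0.4, 0.4], [0.35, 0.3], [0.9, 0.9]]
idx, cands = n_best_indices([0.3, 0.3], codebook, n_best=2)
```

Keeping several candidates instead of one delays the final decision to the later closed-loop stage, which is the central idea of the best-information search described above.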
- the quantized spectral candidate vector of the current frame obtained in step S410 may be used as a quantized spectral candidate vector for any one of subframes in the current frame.
- the quantization unit 210 may interpolate the quantized spectral candidate vector (S420). The interpolation may acquire quantized spectral candidate vectors for the remaining subframes in the current frame.
- the quantized spectral candidate vectors obtained for each subframe in the current frame will be referred to as a quantized spectral candidate vector set.
- the first best information may mean information on the number of quantized spectral candidate vector sets obtained on a frame basis. Therefore, one or several quantized spectral candidate vector sets may be obtained for the current frame according to the first best information.
- The quantized spectral candidate vector of the current frame obtained in S410 may be used as the quantized spectral candidate vector for the last subframe of the current frame.
- the quantized spectral candidate vector for the remaining subframes may be obtained through linear interpolation between the quantized spectral candidate vector of the current frame extracted in S410 and the quantized spectral vector of the previous frame.
- the quantized spectral candidate vector corresponding to each subframe may be generated as shown in Equation 13.
- Here, ẑ denotes the quantized spectral vector corresponding to the last subframe of the previous frame,
- and q̂ denotes the quantized spectral candidate vector corresponding to the last subframe of the current frame.
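Equation 13 is not legible in this text; the sketch below assumes simple linear interpolation between ẑ (last subframe of the previous frame) and q̂ (last subframe of the current frame), as the surrounding description suggests. The number of subframes and the weights are illustrative.

```python
def interpolate_subframes(z_prev, q_curr, n_sub=4):
    """For each subframe k = 1..n_sub, blend the previous frame's
    quantized spectral vector z_prev with the current frame's candidate
    q_curr; the last subframe uses q_curr itself."""
    out = []
    for k in range(1, n_sub + 1):
        w = k / n_sub                       # 0.25, 0.5, 0.75, 1.0
        out.append([(1 - w) * a + w * b for a, b in zip(z_prev, q_curr)])
    return out

# Previous frame ended at [0, 0]; current candidate is [1, 2]
subs = interpolate_subframes([0.0, 0.0], [1.0, 2.0])
```

Because only the last-subframe vector is transmitted, this interpolation gives each intermediate subframe a smoothly evolving spectral envelope at no extra bit cost.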
- the quantization unit 210 may obtain linear prediction filter coefficients for the interpolated quantized spectral candidate vector.
- The interpolated quantized spectral candidate vectors can be transformed to the linear prediction domain, which can be used to calculate the linear prediction filter and the weighting filter for each subframe.
- The perceptual weighting filter 280 may generate a weighted input signal from the input signal (S430).
- The weighting filter may be obtained from Equation 3 using the linear prediction filter coefficients obtained from the interpolated quantized spectral candidate vector.
- The adaptive codebook 230 may obtain an adaptive codebook with respect to the weighted input signal (S440).
- The adaptive codebook can be obtained using the long-term synthesis filter.
- The long-term synthesis filter may use the optimal delay value and gain value that minimize the error between the target signal of the adaptive codebook and the signal passed through the long-term synthesis filter.
- The delay value and the gain value, i.e., the coding parameters of the adaptive codebook, may be extracted for each of the quantized spectral candidate vectors according to the first best information.
- the delay value and the gain value are as described above with reference to Equations 6 and 7.
- the fixed codebook 240 may search for the fixed codebook with respect to the target signal of the fixed codebook (S450).
- The target signal of the fixed codebook and the fixed codebook search process have been described above with reference to Equations 8 and 9.
- The fixed codebook may be obtained for each quantized spectral candidate vector, or each quantized spectral candidate vector set, according to the first best information.
- The adder 250 may generate an excitation signal by multiplying each of the adaptive codebook obtained in S440 and the fixed codebook found in S450 by its gain value and then adding them (S460).
- the synthesis filter 260 may generate a synthesis signal by performing synthesis filtering on the excitation signal output from the adder 250 based on the linear prediction filter coefficients obtained from the interpolated quantized spectral candidate vector. (S470). When a weighted filter is applied to the synthesis filter 260, a weighted synthesis signal may be generated.
- the error minimizing unit 290 may obtain a coding parameter for minimizing an error between an input signal (or a weighted input signal) and the composite signal (or the weighted composite signal) (S480).
- the coding parameters may include linear prediction filter coefficients, delay and gain values of the adaptive codebook, and index and gain values of the fixed codebook. For example, a coding parameter for minimizing the error may be obtained using Equation 14 below.
- In Equation 14, s_w denotes the weighted input signal, and ŝ_w,i denotes the weighted synthesized signal according to the i-th coding parameter candidate.
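Equation 14 is described only in words here; the sketch below assumes it selects, among the candidate coding parameter sets, the index minimizing the mean square error between the weighted input signal and the corresponding weighted synthesized signal. Names are illustrative.

```python
def select_best_candidate(weighted_input, weighted_synth_candidates):
    """Return the index i minimizing sum_n (s_w(n) - s_hat_w_i(n))**2,
    i.e. the closed-loop choice among the N-best candidate parameter sets."""
    def mse(synth):
        return sum((a - b) ** 2 for a, b in zip(weighted_input, synth))
    return min(range(len(weighted_synth_candidates)),
               key=lambda i: mse(weighted_synth_candidates[i]))

# Three candidate synthesized signals competing against one weighted input
best = select_best_candidate([1.0, 2.0, 3.0],
                             [[1.0, 2.5, 3.0], [1.1, 2.0, 3.0], [0.0, 0.0, 0.0]])
```

This final comparison is what closes the loop: every spectral, adaptive codebook, and fixed codebook candidate combination is synthesized and the jointly best set wins.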
- FIG. 6 illustrates a process of quantizing an input signal using adaptive codebook candidates based on second best information, as an embodiment to which the present invention is applied. Referring to FIG. 6, the linear prediction analyzer 200 may obtain linear prediction filter coefficients through linear prediction analysis of the input signal (S600).
- The linear prediction filter coefficients may be obtained in a process of minimizing the error due to linear prediction; a covariance method, an autocorrelation method, a lattice filter, the Levinson-Durbin algorithm, and the like can be used as described above.
- the linear prediction filter coefficients may be obtained in units of frames.
- The quantization unit 210 may obtain a quantized spectral vector corresponding to the linear prediction filter coefficients (S610). Hereinafter, a method of obtaining the quantized spectral vector will be described.
- The quantization unit 210 may convert the linear prediction filter coefficients of the current frame into a spectral vector of the current frame to quantize them in the spectral frequency domain.
- the quantization unit 210 may measure an error between the spectral vector of the current frame and the codebook of the current frame.
- the codebook of the current frame may mean a codebook used for spectral vector quantization.
- the codebook of the current frame may consist of a quantized code vector and an index assigned to the quantized code vector.
- the quantization unit 210 may measure the error between the spectral vector of the current frame and the codebook, sort the quantized code vectors or codebook indices in ascending order of error, and store them.
- a codebook index (or quantized code vector) that minimizes the error between the spectral vector of the current frame and the codebook may be extracted.
- the quantized code vector corresponding to the extracted codebook index may be used as the quantized spectral vector of the current frame.
- the obtained quantized spectral vector of the current frame may be used as a quantized spectral vector for any one of the subframes in the current frame.
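The codebook search described above (measure the error against every code vector, sort in ascending order of error, keep the best entries) can be sketched as follows; the Euclidean error measure and all names are assumptions for illustration.

```python
# Sketch: error between the current frame's spectral vector and every
# code vector, indices sorted smallest-error-first, top-N retained.

def n_best_codebook_indices(spectral_vec, codebook, n_best):
    errors = []
    for idx, code_vec in enumerate(codebook):
        err = sum((s - c) ** 2 for s, c in zip(spectral_vec, code_vec))
        errors.append((err, idx))
    errors.sort()                      # ascending order of error
    return [idx for _, idx in errors[:n_best]]

codebook = [[0.1, 0.2], [0.5, 0.5], [0.09, 0.21]]
print(n_best_codebook_indices([0.1, 0.2], codebook, 2))  # [0, 2]
```

Taking `n_best = 1` gives the single error-minimizing index described in the text; larger `n_best` keeps the sorted candidates that the later "best information" steps draw on.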
- the quantization unit 210 may interpolate the quantized spectral vector (S620).
- the interpolation has been described with reference to FIG. 4, and thus a detailed description thereof will be omitted.
- the quantization unit 210 may obtain linear prediction filter coefficients corresponding to the interpolated quantized spectral vector.
- that is, the interpolated quantized spectral vector may be converted into the linear prediction domain. This can be used to calculate the linear prediction filter and the weighting filter for each subframe.
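A minimal sketch of the subframe interpolation step: spectral vectors for the subframes are interpolated between the previous frame's quantized vector and the current frame's. The linear weights and the four-subframe layout are assumptions; the patent defers the details to FIG. 4.

```python
# Sketch: per-subframe spectral vectors blended linearly between the
# previous and current quantized vectors (weights are illustrative).

def interpolate_subframes(prev_q, curr_q, num_subframes=4):
    out = []
    for m in range(1, num_subframes + 1):
        w = m / num_subframes          # last subframe uses curr_q fully
        out.append([(1 - w) * p + w * c for p, c in zip(prev_q, curr_q)])
    return out

subs = interpolate_subframes([0.0, 0.0], [1.0, 2.0])
print(subs)  # [[0.25, 0.5], [0.5, 1.0], [0.75, 1.5], [1.0, 2.0]]
```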
- the perceptual weighting filter 280 may generate a weighted input signal from the input signal (S630).
- the weighting filter may be obtained from Equation 3 using the linear prediction filter coefficients obtained from the interpolated quantized spectral vector.
- the adaptive codebook 230 may obtain an adaptive codebook candidate in consideration of the second best information with respect to the weighted input signal (S640).
- the second best information may mean information on the number of adaptive codebooks obtained in units of frames, or information on the number of coding parameters of the adaptive codebook obtained in units of frames.
- the coding parameters of the adaptive codebook may include a delay value and a gain value of the adaptive codebook.
- the adaptive codebook candidate may mean an adaptive codebook obtained according to the second best information.
- the adaptive codebook 230 may obtain delay and gain values according to the error between the target signal of the adaptive codebook and the signal passed through the long-term synthesis filter.
- the delay and gain values may be sorted in ascending order of error and stored.
- the delay and gain values may be extracted in ascending order of the error between the target signal of the adaptive codebook and the signal passed through the long-term synthesis filter, according to the second best information.
- the extracted delay and gain values may be used as the delay and gain values of the adaptive codebook candidates.
- the long-term synthesis filter candidate may be obtained using the extracted delay value and gain value.
- the adaptive codebook candidate may be obtained by applying the long-term synthesis filter candidate to an input signal or a weighted input signal.
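The second-best step above keeps several (delay, gain) pairs rather than only the single error-minimizing one. A sketch under simple assumptions (normalized-correlation gain, squared error, illustrative names):

```python
# Sketch: score each candidate delay against the target, keep the N
# (delay, gain) pairs with the smallest long-term prediction error.

def n_best_delays(target, past_excitation, delays, n_best):
    scored = []
    for d in delays:
        pred = past_excitation[-d:][:len(target)]   # delayed excitation
        num = sum(t * p for t, p in zip(target, pred))
        den = sum(p * p for p in pred) or 1e-12
        gain = num / den                            # optimal gain for d
        err = sum((t - gain * p) ** 2 for t, p in zip(target, pred))
        scored.append((err, d, gain))
    scored.sort()                                   # ascending error
    return [(d, g) for _, d, g in scored[:n_best]]

target = [1.0, 2.0, 3.0, 4.0]
past = [1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0]
print(n_best_delays(target, past, [4, 5], 2))  # delay 4 ranks first
```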
- the fixed codebook 240 may search for the fixed codebook with respect to the target signal of the fixed codebook (S650).
- the target signal of the fixed codebook and the fixed codebook search process have been described above with reference to Equations 8 and 9.
- the target signal of the fixed codebook may mean a signal obtained by removing the zero-state response of the adaptive codebook candidate from the input signal to which the weighting filter 300 is applied.
- the fixed codebook may be searched for each of the adaptive codebook candidates according to the second best information.
- the adder 250 may generate an excitation signal by multiplying each of the adaptive codebook candidates obtained in S640 and the fixed codebook found in S650 by its gain value and then adding the results (S660).
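The excitation construction in S660 — each contribution scaled by its gain and summed — is the standard CELP form u[n] = g_p·v[n] + g_c·c[n]. A minimal sketch (names are illustrative):

```python
def make_excitation(adaptive_vec, g_p, fixed_vec, g_c):
    """u[n] = g_p * v[n] + g_c * c[n]: gain-scaled adaptive plus
    gain-scaled fixed codebook contribution."""
    return [g_p * v + g_c * c for v, c in zip(adaptive_vec, fixed_vec)]

print(make_excitation([1.0, 2.0], 0.5, [4.0, 8.0], 0.25))  # [1.5, 3.0]
```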
- the synthesis filter 260 may generate a synthesis signal by performing synthesis filtering on the excitation signal output from the adder 250 based on the linear prediction filter coefficients obtained from the interpolated quantized spectral vector (S670). When a weighting filter is applied to the synthesis filter 260, a weighted synthesis signal may be generated.
- the error minimizing unit 290 may acquire a coding parameter for minimizing the error between the input signal (or the weighted input signal) and the synthesized signal (or the weighted synthesized signal) (S680).
- the coding parameters may include linear prediction filter coefficients, delay and gain values of adaptive codebook candidates, and index and gain values of fixed codebooks. The coding parameter for minimizing the error is as described in Equation 14, and a detailed description thereof will be omitted.
- FIG. 7 illustrates a process of quantizing an input signal using a fixed codebook candidate based on third best information according to an embodiment to which the present invention is applied.
- the linear prediction analyzer 200 may obtain linear prediction filter coefficients through linear prediction analysis on a frame basis with respect to the input signal (S700).
- the linear prediction filter coefficients may be obtained in a process of minimizing an error due to linear prediction.
- the quantization unit 210 may obtain a quantized spectral vector corresponding to the linear prediction filter coefficients (S710). The method of obtaining the quantized spectral vector is described with reference to FIG. 4, and thus a detailed description thereof will be omitted.
- the obtained quantized spectral vector of the current frame may be used as a quantized spectral frequency vector for any one of the subframes in the current frame.
- the quantization unit 210 may interpolate the quantized spectral vector (S720).
- the interpolated quantized spectral frequency vectors of the remaining subframes in the current frame may be obtained through the interpolation; the interpolation method has been described with reference to FIG. 4 and is omitted here.
- the quantization unit 210 may obtain linear prediction filter coefficients corresponding to the interpolated quantized spectral vector.
- that is, the interpolated quantized spectral frequency vector may be converted into the linear prediction domain. This can be used to calculate the linear prediction filter and the weighting filter for each subframe.
- the perceptual weighting filter 280 may generate a weighted input signal from the input signal (S730).
- the weighting filter may be obtained from Equation 3 using the linear prediction filter coefficients obtained from the interpolated quantized spectral vector.
- the adaptive codebook 230 may obtain an adaptive codebook with respect to the weighted input signal (S740).
- the adaptive codebook may be obtained using a long-term synthesis filter.
- the long-term synthesis filter may use the optimal delay and gain values that minimize the error between the target signal of the adaptive codebook and the signal passed through the long-term synthesis filter. The method of obtaining the delay value and the gain value is as described above with reference to Equations 6 and 7.
- the fixed codebook 240 may search for the fixed codebook candidate for the target signal of the fixed codebook based on the third best information (S750).
- the third best information may refer to information on the number of coding parameters of a fixed codebook extracted on a frame basis.
- the coding parameter of the fixed codebook may include an index and a gain value of the fixed codebook.
- the target signal of the fixed codebook is as described in Equation (8).
- the fixed codebook 240 may calculate the error between the target signal of the fixed codebook and the fixed codebook. The errors between the target signal of the fixed codebook and the fixed codebook may be sorted in ascending order and stored.
- the indices and gain values of the fixed codebook may be sorted and stored in ascending order of the error between the target signal of the fixed codebook and the fixed codebook.
- the indices and gain values of the fixed codebook may then be extracted in ascending order of that error according to the third best information.
- the extracted index and gain values of the fixed codebook may be used as indexes and gain values of the fixed codebook candidates.
- the adder 250 may generate an excitation signal by multiplying each of the adaptive codebook obtained in S740 and the fixed codebook candidates found in S750 by its gain value and then adding the results (S760).
- the synthesis filter 260 may generate a synthesis signal by performing synthesis filtering on the excitation signal output from the adder 250 based on the linear prediction filter coefficients obtained from the interpolated quantized spectral vector (S770). When a weighting filter is applied to the synthesis filter 260, a weighted synthesis signal may be generated.
- the error minimizing unit 290 may acquire a coding parameter for minimizing the error between the input signal (or the weighted input signal) and the synthesized signal (or the weighted synthesized signal) (S780).
- the coding parameters may include linear prediction filter coefficients, delay and gain values of the adaptive codebook, and index and gain values of the fixed codebook candidates.
- the coding parameter for minimizing the error is as described in Equation 14, and a detailed description thereof will be omitted.
- the input signal may be quantized by combining the first best information, the second best information, and the third best information.
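Combining the first, second, and third best information amounts to evaluating combinations of spectral, adaptive codebook, and fixed codebook candidates and keeping the combination with the least error. A toy sketch of that joint selection; the `evaluate` callback stands in for the full synthesis-and-compare step, and all names are illustrative.

```python
from itertools import product

# Sketch: exhaustive search over the cross-product of the retained
# candidate lists, keeping the combination with minimal error.

def best_combination(spec_cands, acb_cands, fcb_cands, evaluate):
    return min(product(spec_cands, acb_cands, fcb_cands), key=evaluate)

# Toy example: the "error" is a made-up function of the candidate triple.
combo = best_combination(
    [0, 1], [10, 11], [20, 21],
    lambda c: abs(c[0] - 1) + abs(c[1] - 10) + abs(c[2] - 21))
print(combo)  # (1, 10, 21)
```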
- Industrial Applicability: the present invention can be used for speech signal encoding.
Priority Applications (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201080056249.4A CN102656629B (zh) | 2009-12-10 | 2010-12-10 | Method and device for encoding speech signals |
EP10836230.2A EP2511904A4 (en) | 2009-12-10 | 2010-12-10 | METHOD AND APPARATUS FOR ENCODING A SPEECH SIGNAL |
KR1020127017163A KR101789632B1 (ko) | 2009-12-10 | 2010-12-10 | Method and apparatus for encoding a speech signal |
US13/514,613 US9076442B2 (en) | 2009-12-10 | 2010-12-10 | Method and apparatus for encoding a speech signal |
Applications Claiming Priority (8)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US28518409P | 2009-12-10 | 2009-12-10 | |
US61/285,184 | 2009-12-10 | ||
US29516510P | 2010-01-15 | 2010-01-15 | |
US61/295,165 | 2010-01-15 | ||
US32188310P | 2010-04-08 | 2010-04-08 | |
US61/321,883 | 2010-04-08 | ||
US34822510P | 2010-05-25 | 2010-05-25 | |
US61/348,225 | 2010-05-25 |
Publications (2)
Publication Number | Publication Date |
---|---|
WO2011071335A2 true WO2011071335A2 (ko) | 2011-06-16 |
WO2011071335A3 WO2011071335A3 (ko) | 2011-11-03 |
Family ID=44146063
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/KR2010/008848 WO2011071335A2 (ko) | 2009-12-10 | 2010-12-10 | Method and apparatus for encoding a speech signal |
Country Status (5)
Country | Link |
---|---|
US (1) | US9076442B2 (ko) |
EP (1) | EP2511904A4 (ko) |
KR (1) | KR101789632B1 (ko) |
CN (1) | CN102656629B (ko) |
WO (1) | WO2011071335A2 (ko) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9728200B2 (en) | 2013-01-29 | 2017-08-08 | Qualcomm Incorporated | Systems, methods, apparatus, and computer-readable media for adaptive formant sharpening in linear prediction coding |
EP3786949B1 (en) * | 2014-05-01 | 2022-02-16 | Nippon Telegraph And Telephone Corporation | Coding of a sound signal |
Family Cites Families (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR960015861B1 (ko) * | 1993-12-18 | 1996-11-22 | Hughes Aircraft Company | Method and quantizer for quantizing a line spectrum frequency vector |
US6108624A (en) | 1997-09-10 | 2000-08-22 | Samsung Electronics Co., Ltd. | Method for improving performance of a voice coder |
US7072832B1 (en) * | 1998-08-24 | 2006-07-04 | Mindspeed Technologies, Inc. | System for speech encoding having an adaptive encoding arrangement |
US6574593B1 (en) * | 1999-09-22 | 2003-06-03 | Conexant Systems, Inc. | Codebook tables for encoding and decoding |
US7389227B2 (en) * | 2000-01-14 | 2008-06-17 | C & S Technology Co., Ltd. | High-speed search method for LSP quantizer using split VQ and fixed codebook of G.729 speech encoder |
KR20010084468A (ko) * | 2000-02-25 | 2001-09-06 | Seo Seung-mo | Fast search method for the LSP quantizer of a speech coder |
US7003454B2 (en) | 2001-05-16 | 2006-02-21 | Nokia Corporation | Method and system for line spectral frequency vector quantization in speech codec |
CN1975861B (zh) * | 2006-12-15 | 2011-06-29 | 清华大学 | 声码器基音周期参数抗信道误码方法 |
US8719011B2 (en) | 2007-03-02 | 2014-05-06 | Panasonic Corporation | Encoding device and encoding method |
- 2010-12-10 CN CN201080056249.4A patent/CN102656629B/zh not_active Expired - Fee Related
- 2010-12-10 US US13/514,613 patent/US9076442B2/en active Active
- 2010-12-10 EP EP10836230.2A patent/EP2511904A4/en not_active Ceased
- 2010-12-10 WO PCT/KR2010/008848 patent/WO2011071335A2/ko active Application Filing
- 2010-12-10 KR KR1020127017163A patent/KR101789632B1/ko active IP Right Grant
Non-Patent Citations (2)
Title |
---|
None |
See also references of EP2511904A4 |
Also Published As
Publication number | Publication date |
---|---|
WO2011071335A3 (ko) | 2011-11-03 |
KR101789632B1 (ko) | 2017-10-25 |
EP2511904A2 (en) | 2012-10-17 |
CN102656629B (zh) | 2014-11-26 |
CN102656629A (zh) | 2012-09-05 |
US9076442B2 (en) | 2015-07-07 |
KR20120109539A (ko) | 2012-10-08 |
US20120245930A1 (en) | 2012-09-27 |
EP2511904A4 (en) | 2013-08-21 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
KR100872538B1 | Vector quantization apparatus for LPC parameters, LPC parameter decoding apparatus, LPC coefficient decoding apparatus, recording medium, speech coding apparatus, speech decoding apparatus, speech signal transmitting apparatus, and speech signal receiving apparatus | |
US8392178B2 | Pitch lag vectors for speech encoding | |
KR100756298B1 | Method and apparatus for fast code-excited linear prediction parameter mapping | |
JP4005154B2 | Speech decoding method and apparatus | |
CA2061803C | Speech coding method and system | |
JP6316398B2 | Apparatus and method for quantizing the gains of the adaptive and fixed contributions of the excitation signal in a CELP codec | |
KR101849613B1 | Concept for encoding an audio signal and decoding an audio signal using speech-related spectral shaping information | |
JPH0990995A | Speech coding apparatus | |
WO2010079164A1 | Speech coding | |
KR20180021906A | Concept for encoding an audio signal and decoding an audio signal using deterministic and noise-like information | |
CN112927703A | Method and apparatus for quantizing linear prediction coefficients and method and apparatus for dequantizing them | |
KR20130069546A | Flexible and scalable combined innovation codebook for use in a CELP coder and decoder | |
JPH0341500A | Low-delay low-bit-rate speech coder | |
WO2011071335A2 | Method and apparatus for encoding a speech signal | |
CN101192408A | Method and apparatus for selecting vector quantization of immittance spectral frequency coefficients | |
WO2000057401A1 | Computation and quantization of voiced excitation pulse shapes in linear predictive coding of speech | |
KR0155798B1 | Speech signal encoding and decoding method | |
Yen et al. | Introducing compact: An oscillator-based approach to toll-quality speech coding at low bit rates | |
JPH0455899A | Speech signal coding system | |
JP2003195899A | Method for encoding speech/acoustic signals and electronic device | |
JPH0594200A | Code-excited linear prediction coding apparatus | |
JP2001100799A | Speech coding apparatus, speech coding method, and computer-readable recording medium storing a speech coding algorithm | |
JPH0473699A | Speech coding system | |
JPH1097299A | Vector quantization method, speech coding method and apparatus, and speech decoding method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
WWE | Wipo information: entry into national phase |
Ref document number: 201080056249.4 Country of ref document: CN |
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 10836230 Country of ref document: EP Kind code of ref document: A1 |
REEP | Request for entry into the european phase |
Ref document number: 2010836230 Country of ref document: EP |
WWE | Wipo information: entry into national phase |
Ref document number: 2010836230 Country of ref document: EP |
WWE | Wipo information: entry into national phase |
Ref document number: 13514613 Country of ref document: US |
NENP | Non-entry into the national phase |
Ref country code: DE |
ENP | Entry into the national phase |
Ref document number: 20127017163 Country of ref document: KR Kind code of ref document: A |