WO1994029965A1 - Code-excited linear predictive encoder and decoder - Google Patents

Code-excited linear predictive encoder and decoder

Info

Publication number
WO1994029965A1
Authority
WO
WIPO (PCT)
Prior art keywords
code
excitation
vector
excited linear
decoder
Application number
PCT/JP1993/000776
Other languages
English (en)
Japanese (ja)
Inventor
Kenichiro Hosoda
Hiromi Aoyagi
Hiroshi Katsuragawa
Yoshihiro Ariyama
Original Assignee
Oki Electric Industry Co., Ltd.
Application filed by Oki Electric Industry Co., Ltd.
Priority to PCT/JP1993/000776 (WO1994029965A1)
Priority to EP03013629A (EP1355298B1)
Priority to US08/379,653 (US5727122A)
Priority to EP93913500A (EP0654909A4)
Priority to DE69334115T (DE69334115T2)
Priority to SG1996004078A (SG43128A1)
Priority claimed from SG1996004078A (SG43128A1)
Publication of WO1994029965A1
Priority to NO950490A (NO950490L)

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/06 Determination or coding of the spectral characteristics, e.g. of the short-term prediction coefficients
    • G10L19/07 Line spectrum pair [LSP] vocoders
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/08 Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters
    • G10L19/10 Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters the excitation function being a multipulse excitation
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/08 Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters
    • G10L19/12 Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters the excitation function being a code excitation, e.g. in code excited linear prediction [CELP] vocoders
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L2019/0001 Codebooks
    • G10L2019/0004 Design or structure of the codebook
    • G10L2019/0005 Multi-stage vector quantisation
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/24 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being the cepstrum

Definitions

  • the present invention relates to an encoder and a decoder that follow a code-excited linear predictive coding system (CELP).
  • CELP: code-excited linear predictive coding
  • VSELP: vector-sum-excited linear predictive coding, a modification of code-excited linear predictive coding
  • Coders using the code-excited linear predictive coding method are described in, for example, N. S. Jayant and J. H. Chen, "Speech Coding with Time-Varying Bit Allocations to Excitation and LPC Parameters", Proc. ICASSP, pp. 65-68, 1989.
  • the basic configuration of an encoding method for a speech signal is to find a vocal tract parameter representing vocal tract characteristics of speech and a sound source parameter representing sound source information.
  • in these coding methods, the excitation signal that serves as sound source information is represented by an adaptive excitation code vector, which contributes to voiced sound with statistically strong periodicity, and a statistical excitation code vector, which contributes to random unvoiced sound with statistically weak periodicity.
  • both kinds of code vector are stored in codebooks, and coding is performed so that the sum of the weighted error powers between the input speech vector and the synthesized speech vector is minimized.
  • the encoding process is performed by finding the optimal adaptive excitation code vector and statistical excitation code vector in each codebook.
  • at least the sound source parameters, that is, the information on the optimal adaptive excitation code and statistical excitation code, are transmitted, regardless of whether the coding is of the forward type, which obtains the vocal tract parameters from the input speech vector, or of the backward type, which obtains them from the synthesized speech vector.
  • the adaptive excitation codebook is updated adaptively with the vector synthesized from the optimal adaptive excitation code vector and the statistical excitation code vector, so it can be said to be formed on the basis of the statistical excitation code vector. For this reason, voiced sounds with strong periodicity rise slowly, code vectors with strong pulse characteristics cannot be formed even in the stationary part of voiced sounds, and the reproduced speech lacks clarity.
  • the present invention has been made in consideration of the above points, and its objective is to provide a code-excited linear predictive encoder and decoder capable of obtaining high-quality reproduced speech even when a noise component having a strong pulse property is included in the input speech vector.
  • the present invention is also intended to provide a code-excited linear predictive encoder and decoder that can improve the quality of reproduced speech even at low coding rates.
  • Disclosure of the invention
  • the present invention relates to a code-excited linear predictive encoder that uses an excitation signal drawn from excitation codebooks as the sound source information of speech. The encoder is provided with a code vector conversion circuit that converts the frequency characteristics of a fixed code vector, such as a statistical excitation code vector output from the excitation codebook, into frequency characteristics determined at the time the excitation code vector is output.
  • the code vector conversion circuit is provided for the following reason.
  • the frequency characteristics of the excitation signal have been theoretically modeled as white, but it has been experimentally confirmed that they are not white and are close to the frequency characteristics of the input speech vector.
  • information representing the frequency characteristics includes the LPC (linear prediction coefficient) parameters and the information of the optimal adaptive excitation code (including the corresponding VQ gain), which carries the pitch prediction information. The code vector conversion circuit therefore manipulates the frequency characteristics of the fixed code vector, such as the statistical excitation code vector, on the basis of such information.
  • the code vector conversion circuit that brings the frequency characteristics of the fixed code vector closer to those of the input speech vector is also provided in the code-excited linear predictive decoder. In this circuit, the fixed code vector is convolved with an impulse response determined from the vocal tract parameters through the filter transfer function H(Z) of equation (1), or with an impulse response determined from the pitch lag L:
  • $H(Z) = \dfrac{1 - \sum_{j=1}^{p} A^{j} a_{j} Z^{-j}}{1 - \sum_{j=1}^{p} B^{j} a_{j} Z^{-j}} \qquad (1)$
  • the adaptive excitation vector is then added to the converted vector to create the excitation code vector.
  • $a_j$ (j = 1 to p) are the LPC parameters.
  • p is the vocal tract analysis order.
  • A, B, and ε are predetermined constants, each lying between 0 and 1, and L is the pitch lag calculated from the index of the adaptive excitation code vector.
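The weighted pole-zero transfer function of equation (1) can be illustrated with a minimal Python sketch that computes the first n samples of its impulse response by direct recursion. The function name `impulse_response` and the example coefficient values are assumptions made for illustration; they are not taken from the patent.

```python
def impulse_response(a, A, B, n):
    """First n samples of the impulse response of
    H(Z) = (1 - sum_j A^j a_j Z^-j) / (1 - sum_j B^j a_j Z^-j), j = 1..p.

    a: LPC parameters a_1..a_p (hypothetical values in the example below).
    A, B: weighting constants between 0 and 1.
    """
    p = len(a)
    # Numerator (feed-forward) taps: 1, -A a_1, -A^2 a_2, ...
    num = [1.0] + [-(A ** j) * a[j - 1] for j in range(1, p + 1)]
    # Feedback taps from the denominator: y[t] = w[t] + sum_j B^j a_j y[t-j]
    fb = [(B ** j) * a[j - 1] for j in range(1, p + 1)]
    h = []
    for t in range(n):
        y = num[t] if t <= p else 0.0
        for j in range(1, min(t, p) + 1):
            y += fb[j - 1] * h[t - j]
        h.append(y)
    return h


# Example with a single hypothetical LPC coefficient.
h = impulse_response([0.5], A=0.4, B=0.9, n=4)
```

When A equals B the numerator and denominator cancel and the response reduces to a unit impulse, so the difference between the two constants is what shapes the spectrum of the fixed code vector.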
  • a pulse excitation codebook that stores pulse excitation vectors is provided so that a voiced sound with strong periodicity can rise quickly and an excitation code vector with a strong pulse characteristic can be formed even in the stationary part of a voiced sound.
  • either an excitation vector from the statistical excitation codebook or one from the pulse excitation codebook is selected and used, and the selection information is transmitted to the code-excited linear predictive decoder.
  • in the decoder, the excitation vector from the statistical excitation codebook or the pulse excitation codebook is selected on the basis of the selection information given from the code-excited linear predictive encoder. This improves reproduction quality at low coding rates.
  • the vocal tract parameters output by the encoder are LSP (line spectrum pair) parameters, and the code-excited linear predictive decoder reproduces these LSP parameters.
  • this improves the reproduction quality at low coding rates with respect to the vocal tract parameters.
  • the LSP parameter is used as the vocal tract parameter because it interpolates well with respect to the vocal tract frequency characteristics, because it distorts the spectrum less than the LPC parameter when coded with a small number of bits, and because it can be coded efficiently in combination with vector quantization.
  • FIG. 1 is a block diagram showing the structure of the first and second code-excited linear predictive encoders according to the present invention.
  • FIG. 2 is a block diagram showing the structure of the code-excited linear predictive decoder corresponding to the first and second code-excited linear predictive encoders.
  • FIG. 3 is a block diagram showing the structure of a third code-excited linear predictive encoder according to the present invention.
  • FIG. 4 is a block diagram showing the structure of the code-excited linear predictive decoder corresponding to the third code-excited linear predictive encoder.
  • FIG. 5 is a block diagram showing a detailed configuration of the code vector conversion circuit described in FIG. 3 or FIG. 4.
  • Best mode for carrying out the invention
  • Preferred embodiments of the code-excited linear predictive encoder and the code-excited linear predictive decoder are described in detail with reference to the drawings.
  • FIG. 1 is a block diagram showing a structure of a first code excitation linear prediction encoder according to the present invention.
  • an input speech vector S which is grouped in a frame unit from an input terminal 101 and input as a vector, is first input to a vocal tract analysis circuit 102, and a vocal tract prediction parameter aj is calculated.
  • the LPC (linear prediction coefficient) quantization circuit 103 LPC-quantizes the vocal tract prediction parameters aj and sends the code Ic (LPC code) to the LPC inverse quantization circuit 104 and the multiplexing circuit 106.
  • the LPC inverse quantization circuit 104 inversely transforms the LPC code Ic into a vocal tract prediction parameter aqj and sends it to the synthesis filter 105.
  • the code vector conversion circuit 109 convolves the statistical excitation code vector esl from the statistical excitation codebook 108 with the impulse response of the filter transfer function H(Z) shown in equation (3), and outputs the modified statistical excitation code vector escl.
  • $H(Z) = \dfrac{1 - \sum_{j=1}^{p} 0.4^{j} a_{qj} Z^{-j}}{1 - \sum_{j=1}^{p} 0.9^{j} a_{qj} Z^{-j}} \qquad (3)$
  • the adaptive excitation vector eai is multiplied by the gain βk in the multiplier 113 to become the vector eaik, while the modified statistical excitation code vector escl is multiplied by the gain γk in the multiplier 114 to become the vector esclk.
  • the adder 115 calculates the excitation code vector e by adding the vectors eaik and esclk component by component.
  • the synthesis filter 105 calculates a synthesized speech vector Sw for the excitation code vector e, and sends it to the subtractor 116.
  • the subtractor 116 takes the component-by-component difference between the input speech vector S and the synthesized speech vector Sw, and sends the resulting error vector er to the perceptual filter 111.
  • the perceptual filter 111 sends a perceptual error vector ew for the error vector er to the perceptual error calculation circuit 112.
  • the perceptual error calculation circuit 112 calculates the root mean square of the components of the perceptual error vector ew, and determines the excitation code vector (that is, the combination of i, l and k) that minimizes this value as the optimal excitation code vector for the current input speech vector. The indexes Ia, Is and Ig of each codebook at that time are then sent to the adaptive excitation codebook 107, the statistical excitation codebook 108, the VQ gain codebook 110 and the multiplexing circuit 106.
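The closed-loop search over the combination of i, l and k described above can be sketched in Python as an exhaustive loop. This is a simplified illustration under stated assumptions: the perceptual filter is omitted, the synthesis filter is represented by a given impulse response `h`, the error is measured as a plain sum of squares (which has the same minimizer as the root mean square), and all function and variable names are hypothetical.

```python
def synthesize(e, h):
    """Convolve the excitation e with the impulse response h, truncated to len(e)."""
    n = len(e)
    return [sum(e[j] * h[i - j] for j in range(i + 1)) for i in range(n)]


def search(target, adaptive, statistical, gains, h):
    """Return (error, i, l, k) for the best adaptive vector i,
    converted statistical vector l and gain pair k."""
    best = None
    for i, ea in enumerate(adaptive):
        for l, esc in enumerate(statistical):
            for k, (beta, gamma) in enumerate(gains):
                # Excitation code vector: e = beta * ea + gamma * esc
                e = [beta * x + gamma * y for x, y in zip(ea, esc)]
                sw = synthesize(e, h)
                err = sum((s - t) ** 2 for s, t in zip(sw, target))
                if best is None or err < best[0]:
                    best = (err, i, l, k)
    return best


# Toy codebooks; h = unit impulse so the synthesis filter is the identity.
adaptive = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
statistical = [[0.0, 0.0, 1.0], [1.0, 1.0, 0.0]]
gains = [(1.0, 1.0), (0.5, 2.0)]
result = search([0.0, 0.5, 2.0], adaptive, statistical, gains, [1.0, 0.0, 0.0])
```

The returned indices play the role of Ia, Is and Ig in the text.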
  • the adaptive excitation codebook 107 outputs the optimal adaptive excitation code vector eaopt based on the index Ia, the statistical excitation codebook 108 outputs the optimal statistical excitation code vector esopt based on the index Is, and the VQ gain codebook 110 outputs the optimal VQ gains βopt and γopt based on the index Ig.
  • the code vector conversion circuit 109 converts the optimal statistical excitation code vector esopt output from the statistical excitation codebook 108 into the optimal modified statistical excitation code vector escopt and outputs it.
  • the optimal excitation code vector eopt is input to the adaptive excitation codebook 107 and updates its contents.
  • FIG. 2 is a block diagram of the code-excited linear predictive decoder corresponding to the code-excited linear predictive encoder of FIG. 1.
  • a total code C input from an input terminal 201 is separated by a demultiplexing circuit 212 into an LPC code Ic, an adaptive excitation code index Ia, a statistical excitation code index Is and a VQ gain code index Ig, which are sent to an LPC inverse quantization circuit 202, an adaptive excitation codebook 204, a statistical excitation codebook 205, and a VQ gain codebook 207, respectively.
  • the LPC inverse quantization circuit 202 converts the LPC code Ic into a vocal tract prediction parameter aj and sends it to the synthesis filter 203.
  • the adaptive excitation codebook 204 outputs the adaptive excitation code vector ea based on the index Ia.
  • the statistical excitation codebook 205 outputs the statistical excitation code vector es based on the index Is.
  • the VQ gain codebook 207 outputs the excitation gains β and γ based on the index Ig.
  • the code vector conversion circuit 206 converts the vector es into the vector esc in the same manner as in the above-mentioned code-excited linear predictive encoder, and outputs it.
  • the adaptive excitation vector ea is multiplied by the gain β in a multiplier 208, while the vector esc is multiplied by the gain γ in a multiplier 209. The adder 210 then adds these multiplied vectors component by component to obtain an excitation code vector e.
  • the synthesis filter 203 calculates a synthesized speech vector S for the excitation code vector e, and outputs it from the output terminal 211.
  • the content of the adaptive excitation codebook 204 is updated by the vector e.
  • the configuration of the second code-excited linear prediction encoder is the same as that of the first code-excited linear prediction encoder except for the code vector conversion circuit 109. Now, only the operation of the code vector conversion circuit 109 will be described in detail.
  • the code vector conversion circuit 109 has a filter transfer function represented by the following equation (4).
  • here, L is the pitch lag calculated from the index of the adaptive excitation code.
  • the index of the adaptive excitation code and the pitch lag correspond one-to-one as follows, for example.
  • the convolution performed in the first and second code-excited linear predictive encoders is given by equation (5), where esl is the statistical excitation code vector output from the statistical excitation codebook, escl is the converted statistical excitation code vector, and h is the impulse response.
  • x, y and h denote the elements of these vectors, and n is the subframe length (or frame length).
  • in the first code-excited linear predictive encoder, which uses the short-time nature of the input speech vector, the impulse response is that of the transfer function expressed using the vocal tract parameters.
  • in the second code-excited linear predictive encoder, the impulse response is that of the transfer function expressed using the pitch lag.
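The conversion of a statistical excitation code vector by convolution with a pitch-lag impulse response can be sketched in Python as follows. The comb-filter form 1 / (1 - ε Z^-L) used for the pitch-lag transfer function is an assumption chosen for illustration, as are the function names and the truncation of the convolution to the subframe length n.

```python
def pitch_impulse_response(eps, L, n):
    """First n samples of the impulse response of the ASSUMED comb filter
    1 / (1 - eps * Z^-L): eps^m at t = m*L, zero elsewhere."""
    return [eps ** (t // L) if t % L == 0 else 0.0 for t in range(n)]


def convolve_truncated(x, h):
    """y[i] = sum_{j=0..i} x[j] * h[i-j] for i = 0..n-1, with n = len(x).
    Requires len(h) >= len(x)."""
    n = len(x)
    return [sum(x[j] * h[i - j] for j in range(i + 1)) for i in range(n)]


# Hypothetical statistical excitation vector, pitch lag L = 2, eps = 0.5.
x = [1.0, 2.0, 0.0, 0.0]
h = pitch_impulse_response(0.5, 2, len(x))
y = convolve_truncated(x, h)
```

With this assumed form, each input sample reappears every L samples scaled by a further factor of ε, which imposes the pitch periodicity of the adaptive excitation on the fixed code vector.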
  • FIG. 3 is a block diagram showing a structure of a third code excitation linear prediction encoder according to the present invention.
  • the code-excited linear predictive encoder generally includes an input speech processing unit 301, an optimum synthesized speech search unit 302, and a multiplexing circuit 303.
  • the input speech processing unit 301 includes an LSP parameter analysis circuit 311, an LSP parameter encoding circuit 312, an LSP parameter decoding circuit 313, an LPC inverse quantization circuit 314, a weighting filter 315, a synthesis filter zero input response generation circuit 316, It comprises a weighting filter zero input response generation circuit 317, a subtractor 318, and a subtractor 319.
  • the digitized discrete input speech vector sequence is accumulated for a time corresponding to the analysis frame length used to obtain the vocal tract parameters, divided into several subframes, and processed by the input speech processing unit 301.
  • the input speech vector is provided to the LSP parameter analysis circuit 311, which analyzes it and converts it into LSP parameters as vocal tract parameters.
  • the LSP parameters are encoded (for example, vector-quantized) by the LSP parameter encoding circuit 312, provided to the multiplexing circuit 303, and transmitted to the code-excited linear predictive decoder side. The encoded LSP parameters are also decoded (vector inverse quantization) by the LSP parameter decoding circuit 313 and then converted to LPC by the LPC inverse quantization circuit 314.
  • the above-mentioned input speech vector is given to the weighting filter 315 and weighted in consideration of human auditory characteristics, and then given to the subtractor 318 as the minuend input. The subtractor 318 also receives, as the subtrahend input, a zero input response vector for the synthesis filter 329 generated by the synthesis filter zero input response generation circuit 316 using the LPC as tap coefficients. A speech vector is thus obtained from which the influence of the state of the synthesis filter 329 in the immediately preceding analysis frame has been removed, and this is provided to the subtractor 319 as the minuend input.
  • the subtractor 319 also receives, as the subtrahend input, a zero input response vector for the weighting filter 315 generated by the weighting filter zero input response generation circuit 317 using the LPC as tap coefficients.
  • a speech vector from which the influence of the state of the weighting filter 315 in the immediately preceding analysis frame has been removed is obtained, and this is given to a subtractor 330 described later as a target speech vector.
  • the optimum synthesized speech search unit 302 searches for sound source parameters whose synthesized speech vector obtained by local reproduction is most similar to the target speech vector.
  • the adaptive excitation codebook 320, the statistical excitation codebook 321 and the pulse excitation codebook 322 store adaptive excitation code vectors, statistical excitation code vectors, and pulse excitation code vectors, respectively, which are waveform codes related to excitation signals.
  • the VQ gain codebook 323 stores the VQ gain codes for the adaptive excitation code vector and the fixed excitation code vector (a general term for the statistical excitation code vector and the pulse excitation code vector).
  • as in the past, the adaptive excitation code vector is a waveform excitation code vector that contributes to voiced sound with statistically strong periodicity, and the statistical excitation code vector is a waveform excitation code vector that contributes to random unvoiced sound with statistically weak periodicity. Note that the adaptive excitation code vector of the adaptive excitation codebook 320 is adaptively updated as described later.
  • the pulse excitation code vector is a waveform excitation code vector consisting of an isolated impulse. It is provided so as to contribute to the rise of a voiced sound with strong periodicity and to the steady portion of a voiced sound with a clear pulse.
  • the VQ gain code is, for example, vector quantized, and one component of the vector relates to the VQ gain of the adaptive excitation code vector, and the other component relates to the VQ gain of the fixed code vector.
  • since the pulse excitation vector is a simple signal having periodicity, it could in principle be generated by a pulse signal generation circuit.
  • however, it is preferable to generate it by reading it from the codebook 322, as in this code-excited linear predictive encoder, for the following reasons: the excitation vector can be synchronized with the output from the adaptive excitation codebook 320, and the multiplexing processing (described later) performed when a statistical excitation code vector or a pulse excitation code vector is selected and the selection is transmitted to the decoder becomes easy.
  • for each of the various codes, the optimum code whose locally reproduced synthesized speech vector is most similar to the target speech vector is obtained, and its index is given to the multiplexing circuit 303 and transmitted to the code-excited linear predictive decoder side.
  • the search for the optimal codes, including the selection between the statistical excitation code vector and the pulse excitation code vector, is executed in the following order: the adaptive excitation code, the statistical excitation code, the pulse excitation code, and the VQ gain code.
  • first, the optimal adaptive excitation code is searched for. During this search, the outputs from the statistical excitation codebook 321 and the pulse excitation codebook 322 are set to 0, and the VQ gain controller 324 multiplies its input by an appropriate factor (for example, 1).
  • the adaptive excitation codebook 320 outputs all stored adaptive excitation code vectors in time order or in parallel, and they are given to the synthesis filter 329 as excitation code vectors via the VQ gain controller 324 and the adder 325.
  • the synthesis filter 329 performs convolution processing on each excitation code vector using the LPC given from the LPC inverse quantization circuit 314 as tap coefficients, so that a synthesized speech vector reflecting only the content of the adaptive excitation code vector as a sound source parameter is obtained for every adaptive excitation code vector.
  • the subtractor 330 obtains, for every adaptive excitation code vector, the error vector between the target speech vector and the synthesized speech vector reflecting only the content of that adaptive excitation code vector.
  • the error power sum calculation circuit 331 calculates the sum of squares of the components of the error vector (the error power sum) for every adaptive excitation code vector, and gives it to the minimum error power sum code selection circuit 332.
  • the minimum error power sum code selection circuit 332 determines the adaptive excitation code vector having the minimum error power sum as the optimal one.
  • next, a search for the optimal statistical excitation code vector is performed.
  • the fixed code selection switch 326 is switched to the statistical excitation codebook 321, and the output of the adaptive excitation codebook 320 is set to 0. Alternatively, the optimal adaptive excitation code vector obtained earlier may be output at this time.
  • the statistical excitation codebook 321 outputs all the stored statistical excitation code vectors in time order or in parallel, and they are input to the code vector conversion circuit 328 via the fixed code selection switch 326 and the VQ gain controller 324.
  • the code vector conversion circuit 328 performs a conversion operation that brings the frequency characteristics of the input statistical excitation code vector closer to the frequency characteristics of the input speech vector over the time span corresponding to the length of the statistical excitation code vector. All the statistical excitation vectors whose frequency characteristics have been converted in this way are given to the synthesis filter 329 as excitation code vectors via the adder 325 (which in this case does not function as an adder). Subsequent processing is performed in the same manner as the search for the optimal adaptive excitation vector, and the minimum error power sum code selection circuit 332 determines the optimal statistical excitation vector.
  • the search for the optimal pulse excitation code vector is performed next.
  • the fixed code selection switch 326 is switched to the pulse excitation codebook 322, and the output of the adaptive excitation codebook 320 is set to 0.
  • the pulse excitation codebook 322 outputs all the stored pulse excitation code vectors in time order or in parallel. Subsequent processing is the same as in the search for the optimal statistical excitation vector, and its description is omitted.
  • the minimum error power sum code selection circuit 332 compares the error power sum of the optimal statistical excitation code vector with the error power sum of the optimal pulse excitation code vector, and the one with the smaller error power sum is determined as the fixed code to be transmitted to the code-excited linear predictive decoder.
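The comparison between the two fixed codebooks can be sketched in Python as follows. The sketch assumes the synthesized speech vectors for every candidate are already available; the function names, the string labels, and the tie-breaking rule (statistical wins on equal error) are assumptions not stated in the text.

```python
def error_power_sum(synth, target):
    """Sum of squares of the components of the error vector."""
    return sum((t - s) ** 2 for s, t in zip(synth, target))


def best_index(candidates, target):
    """Index and error power sum of the candidate closest to the target."""
    errs = [error_power_sum(c, target) for c in candidates]
    idx = min(range(len(errs)), key=errs.__getitem__)
    return idx, errs[idx]


def select_fixed_code(stat_synth, pulse_synth, target):
    """Pick the fixed codebook whose best candidate has the smaller error power sum."""
    s_idx, s_err = best_index(stat_synth, target)
    p_idx, p_err = best_index(pulse_synth, target)
    if p_err < s_err:
        return "pulse", p_idx
    return "statistical", s_idx  # assumed tie-break


target = [0.0, 0.0, 1.0, 0.0]
stat_synth = [[0.2, 0.1, 0.3, 0.1], [0.1, 0.0, 0.5, 0.0]]
pulse_synth = [[0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 0.9, 0.0]]
choice = select_fixed_code(stat_synth, pulse_synth, target)
```

The returned label corresponds to the fixed code selection switch information and the index to the fixed code index sent to the decoder.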
  • finally, the search for the optimal VQ gain code is performed.
  • the adaptive excitation codebook 320 outputs the optimal adaptive excitation code vector, the fixed code selection switch 326 is switched to whichever of the statistical excitation codebook 321 and the pulse excitation codebook 322 was selected, and the optimum fixed code vector is output from the selected fixed codebook 321 or 322.
  • each VQ gain code in the VQ gain codebook 323 consists of a VQ gain for the adaptive excitation code vector and a VQ gain for the fixed code vector. The VQ gain for the adaptive excitation code vector is provided to the VQ gain controller 324, and the VQ gain for the fixed code vector is provided to the VQ gain controller 327.
  • the VQ gain-controlled optimal adaptive excitation code vector and the optimal fixed code vector, which has undergone the frequency characteristic operation and VQ gain control, are added by the adder 325, and the result is provided to the synthesis filter 329 as the excitation code vector. Such processing is performed on all the VQ gain codes in the VQ gain codebook 323 in time order or in parallel. The processing from the synthesis filter 329 onward is the same as in the search for the other codes.
  • the multiplexing circuit 303 multiplexes the LSP parameter code given from the LSP parameter encoding circuit 312 with the indices of the optimal codes and the fixed code selection switch information, and outputs the multiplexed result to the code-excited linear predictive decoder side.
  • the transmitted index is a vector number.
  • the minimum error power sum code selection circuit 332 gives the indices and the fixed code selection switch information that are sent to the multiplexing circuit 303 to the corresponding codebooks (320, 323, and 321 or 322) and to the fixed code selection switch 326 as well. The switch 326 is switched accordingly, and the optimal code is output from each codebook. As a result, the excitation vector that forms the synthesized speech vector closest to the target speech vector for the current subframe is output from the adder 325 and given to the adaptive excitation codebook 320, which then performs the adaptive excitation code update process.
  • the above encoding process is repeated for each subframe, and the encoded speech vector is sequentially transmitted to the code excitation linear prediction decoder.
  • FIG. 5 shows a detailed configuration of the code vector conversion circuit 328 described above.
  • The code vector conversion circuit 328 includes two cascade-connected filters 328a and 328b and a pitch lag determination circuit 328c.
  • The fixed code vector output from the fixed code selection switch 326 is provided to the first filter 328a.
  • The impulse response of the first filter 328a, expressed by its transfer function H1(Z), is chosen as in equation (6) below, so that a frequency characteristic manipulation is applied to the input fixed code vector.
  • $H_1(Z) = \dfrac{1 - \sum_{j=1}^{p} A^{j} a_j Z^{-j}}{1 - \sum_{j=1}^{p} B^{j} a_j Z^{-j}}$ (6)
  • Here, a_j (j = 1 to p) are the tap coefficients for the synthesis filter 329, supplied from the LPC inverse quantization circuit 314, and p is the vocal tract analysis order.
  • A and B are constants predetermined in the ranges 0 < A < 1 and 0 < B < 1.
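As an illustration of how the first filter could be applied to a fixed code vector, the sketch below assumes that equation (6) has the pole-zero form H1(Z) = (1 - sum_j A^j a_j Z^-j) / (1 - sum_j B^j a_j Z^-j), which gives the difference equation y[n] = x[n] - sum_j A^j a_j x[n-j] + sum_j B^j a_j y[n-j]. The function name and the zero initial state are assumptions made for the example.

```python
from typing import List, Sequence


def first_filter(x: Sequence[float], lpc: Sequence[float], A: float, B: float) -> List[float]:
    """Apply the assumed pole-zero filter H1(Z) to x, starting from zero state.

    lpc[j-1] holds the tap coefficient a_j; A scales the zeros, B scales the poles.
    """
    num = [(A ** j) * a for j, a in enumerate(lpc, start=1)]  # A^j * a_j
    den = [(B ** j) * a for j, a in enumerate(lpc, start=1)]  # B^j * a_j
    y: List[float] = []
    for n in range(len(x)):
        acc = x[n]
        for j in range(1, len(lpc) + 1):
            if n - j >= 0:
                acc -= num[j - 1] * x[n - j]  # feed-forward (numerator) part
                acc += den[j - 1] * y[n - j]  # feedback (denominator) part
        y.append(acc)
    return y
```

When A equals B the numerator and denominator cancel and the vector passes through unchanged, so the amount of spectral shaping depends on how far apart the two constants are chosen.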
  • The fixed code vector whose frequency characteristic has been manipulated by the first filter 328a is input to the second filter 328b.
  • The pitch lag determination circuit 328c obtains the pitch lag L from the index of the optimal adaptive excitation code for the adaptive excitation codebook 320, and provides the pitch lag L to the second filter 328b.
  • The impulse response H2(Z) of the second filter 328b is chosen as in equation (7), so that a further frequency characteristic manipulation is applied to the input fixed code vector.
  • The constant in equation (7) is predetermined in the range from 0 to 1.
  • The output of the second filter 328b is provided to the VQ gain controller 327 shown in FIG.
  • With this configuration, the code vector conversion circuit 328 can bring the frequency characteristics of the input fixed code vector, over the time length of that vector, close to the frequency characteristics of the input speech vector.
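Equation (7) is not legible in this text, so the following sketch rests on an explicit assumption: the second filter is taken to be a pitch comb filter of the form 1 / (1 - eps * Z^-L), where L is the pitch lag and eps is a constant between 0 and 1. The mapping from adaptive codebook index to lag (`min_lag + index`) and all names are likewise hypothetical, chosen only to show how a lag-dependent filter could be driven by the adaptive code index.

```python
from typing import List, Sequence


def pitch_lag_from_index(index: int, min_lag: int = 20) -> int:
    """Hypothetical index-to-lag mapping; min_lag is an arbitrary example value."""
    return min_lag + index


def second_filter(x: Sequence[float], lag: int, eps: float) -> List[float]:
    """Assumed comb filter: y[n] = x[n] + eps * y[n - lag], zero initial state."""
    if lag < 1:
        raise ValueError("lag must be at least 1")
    y: List[float] = []
    for n, value in enumerate(x):
        acc = value
        if n - lag >= 0:
            acc += eps * y[n - lag]  # reinforce components one pitch period back
        y.append(acc)
    return y
```

Under this assumption a single impulse at the input produces decaying echoes spaced L samples apart, which adds pitch-periodic structure to the fixed code vector.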
  • FIG. 4 is a block diagram showing the structure of a code-excited linear prediction decoder corresponding to the code-excited linear prediction encoder of FIG.
  • The code-excited linear predictive decoder consists of a demultiplexing circuit 440, an LSP parameter decoding circuit 441, an LPC dequantizer 442, an adaptive excitation codebook 443, a statistical excitation codebook 444, a pulse excitation codebook 445, a VQ gain codebook 446, VQ gain controllers 447 and 449, a fixed code selection switch 448, a code vector conversion circuit 450 that manipulates frequency characteristics, an adder 451, and a synthesis filter 452.
  • The coded speech vector given from the code-excited linear predictive encoder side is input to the demultiplexing circuit 440.
  • The demultiplexing circuit 440 separates the coded speech vector into the LSP parameters, the index of the optimal adaptive excitation code, the index of the optimal fixed code, the index of the optimal VQ gain code, and the fixed code selection switch information.
  • The LSP parameters are given to the LSP parameter decoding circuit 441, the index of the optimal adaptive excitation code is given to the adaptive excitation codebook 443, and the index of the optimal VQ gain code is given to the VQ gain codebook 446.
  • The fixed code selection switch information is provided to the fixed code selection switch 448.
  • The index of the optimal fixed code is given to whichever of the statistical excitation codebook 444 and the pulse excitation codebook 445 is designated by the fixed code selection switch information.
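The routing of the demultiplexed fields can be pictured with a small data structure. The sketch below is illustrative only: the class and field names are hypothetical, and representing the fixed code selection switch information as a boolean is an assumption.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class CodedSubframe:
    """Fields recovered by demultiplexing one coded speech vector."""
    lsp_params: List[float]
    adaptive_index: int        # index of the optimal adaptive excitation code
    fixed_index: int           # index of the optimal fixed code
    gain_index: int            # index of the optimal VQ gain code
    use_pulse_codebook: bool   # fixed code selection switch information (assumed boolean)


def select_fixed_vector(
    frame: CodedSubframe,
    statistical_codebook: Sequence[Sequence[float]],
    pulse_codebook: Sequence[Sequence[float]],
) -> List[float]:
    """Route the fixed code index to the codebook chosen by the switch information."""
    codebook = pulse_codebook if frame.use_pulse_codebook else statistical_codebook
    return list(codebook[frame.fixed_index])
```

Only one fixed code index is carried per subframe; the switch information decides which codebook that index is looked up in.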
  • The adaptive excitation codebook 443 outputs the adaptive excitation code vector determined by the given index; this vector is VQ gain-controlled by the VQ gain controller 447 and given to the adder 451. The adaptive excitation codebook 443 also provides the adaptive excitation code vector to the code vector conversion circuit 450.
  • The statistical excitation codebook 444 or the pulse excitation codebook 445 outputs the statistical excitation code vector or pulse excitation code vector corresponding to the given index, and this code vector is supplied to the code vector conversion circuit 450 via the fixed code selection switch 448.
  • The code vector conversion circuit 450 manipulates the frequency characteristic of this code vector, based on the LPC and the index of the adaptive excitation code vector, so that it approaches the frequency characteristic of the input speech vector.
  • The detailed configuration of the code vector conversion circuit 450 is the same as that shown in FIG. 5 described above.
  • The fixed code vector whose frequency characteristics have been manipulated in this way is VQ gain-controlled by the VQ gain controller 449 and provided to the adder 451.
  • The adder 451 adds the given adaptive excitation code vector and fixed code vector, and provides the sum to the synthesis filter 452 as the excitation code vector.
  • The synthesis filter 452 convolves the excitation code vector with the LPC to form a synthesized speech vector, which it outputs.
  • This code-excited linear predictive decoder performs the above processing each time a coded speech vector is given, that is, for each subframe.
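The per-subframe decoder processing can be sketched as below. This is a simplified illustration under stated assumptions: the function name is hypothetical, the gains are passed as a pair (adaptive gain, fixed gain), the fixed vector is assumed to have already passed through the code vector conversion circuit, and the synthesis filter is modelled as an all-pole filter whose memory is carried from one subframe to the next.

```python
from typing import List, Sequence, Tuple


def decode_subframe(
    adaptive_vec: Sequence[float],
    fixed_vec: Sequence[float],
    gains: Tuple[float, float],
    lpc: Sequence[float],
    memory: Sequence[float],
) -> Tuple[List[float], List[float], List[float]]:
    """Return (excitation, synthesized speech, updated filter memory).

    `memory` holds the last len(lpc) output samples, most recent last.
    """
    p = len(lpc)
    if len(memory) != p:
        raise ValueError("memory must hold exactly len(lpc) samples")
    g_a, g_f = gains
    # Gain control followed by the adder.
    excitation = [g_a * a + g_f * f for a, f in zip(adaptive_vec, fixed_vec)]
    past = list(memory)
    speech: List[float] = []
    for e in excitation:
        # s[n] = e[n] + sum_j a_j * s[n-j], using samples kept in `past`.
        acc = e + sum(a * past[-j] for j, a in enumerate(lpc, start=1))
        speech.append(acc)
        past = (past + [acc])[-p:] if p else []
    return excitation, speech, past
```

Returning the updated memory lets the caller feed it into the next call, so consecutive subframes join without a discontinuity in the filter state.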
  • The characteristic features of the present invention are that LSP parameters are transmitted as the vocal tract parameters, that a pulse excitation codebook is provided to supply sound source parameters, and that the frequency characteristics of the fixed code vector are manipulated. Each of these features is effective even when incorporated in the encoder and the decoder on its own.
  • The encoder and decoder described above are forward-type code-excited linear predictive encoders and decoders, but the present invention can also be applied to backward-type code-excited linear predictive encoders and decoders.
  • The above encoder and decoder are designed to solve the problems that arise at coding speeds of 4 kbit/s or less, but good reproduced speech can also be obtained when the invention is applied to encoders and decoders operating at higher coding speeds. If the coding speed allows, the statistical excitation codebook and the pulse excitation codebook need not be selected between; both may always be used.
Industrial Applicability
  • Considering that the frequency characteristics of an actual excitation code vector are close to those of the input speech vector, means is provided that convolves the statistical excitation code vector with a specific impulse response and then adds it to the adaptive excitation code vector to form the excitation code vector. An excitation code vector well suited to the input speech vector can therefore be obtained even with a small number of vectors. In addition, this conversion produces a masking effect on the quantization error vector, so reproduction quality can be improved.
  • A pulse excitation codebook that stores pulse excitation code vectors consisting of isolated impulses is provided, so that for strongly periodic voiced sound an excitation code vector with clear pulse characteristics can be formed, even in the stationary part of the voiced sound.
  • Because the pulse excitation code vector and the statistical excitation code vector are switched, low coding speeds can be handled, and good reproduced sound can be obtained even for signals in periods where random and pulse-like components are mixed.
  • Because the excitation code vector from either the statistical excitation codebook or the pulse excitation codebook is selected and used, good reproduced sound can be obtained even when the number of bits used to encode the sound source parameters is small.
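The selection between the two fixed codebooks can be illustrated with a toy search. The sketch is a simplification made for the example: the helper names are hypothetical, candidates are compared against the target directly in the excitation domain rather than after synthesis filtering, and gains are ignored.

```python
from typing import List, Sequence, Tuple


def pulse_vector(length: int, positions: Sequence[int], amplitude: float = 1.0) -> List[float]:
    """Code vector made of isolated impulses at the given sample positions."""
    vec = [0.0] * length
    for pos in positions:
        vec[pos] = amplitude
    return vec


def select_fixed_code(
    target: Sequence[float],
    statistical_codebook: Sequence[Sequence[float]],
    pulse_codebook: Sequence[Sequence[float]],
) -> Tuple[bool, int]:
    """Return (use_pulse_codebook, index) of the entry with the smallest squared error."""
    best_err, best_switch, best_index = float("inf"), False, -1
    for use_pulse, codebook in ((False, statistical_codebook), (True, pulse_codebook)):
        for index, vec in enumerate(codebook):
            err = sum((t - v) ** 2 for t, v in zip(target, vec))
            if err < best_err:
                best_err, best_switch, best_index = err, use_pulse, index
    return best_switch, best_index
```

Because the two codebooks share a single index field plus one switch flag, adding the pulse codebook in this arrangement costs only the extra bit needed for the switch.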
  • Because the vocal tract parameters used for speech synthesis are transmitted as LSP parameters, which suffer less distortion than LPC or the like when encoded with a small number of bits, reproduction quality at low coding speeds can be improved from the standpoint of the vocal tract parameters.

Abstract

This encoder-decoder, which uses code-excited linear predictive coding (CELP), operates by adaptively transforming linear predictive coding (LPC) code vectors on the basis of data obtained from speech analysis, the code vectors being supplied by a codebook of fixed codes, such as a statistical excitation codebook built from excitation signal codebooks. This method produces high-quality speech reproduction at a low coding speed. In addition, to obtain similar effects, the adaptive excitation codebook and the statistical excitation codebook are supplemented by a pulse excitation codebook made up of isolated impulses. This allows the statistical excitation codebook and the pulse excitation codebook to be used selectively; the vocal tract parameters consist of line spectrum pair (LSP) parameters.
PCT/JP1993/000776 1993-06-10 1993-06-10 Codeur-decodeur predictif lineaire a excitation par codes WO1994029965A1 (fr)

Priority Applications (7)

Application Number Priority Date Filing Date Title
PCT/JP1993/000776 WO1994029965A1 (fr) 1993-06-10 1993-06-10 Codeur-decodeur predictif lineaire a excitation par codes
EP03013629A EP1355298B1 (fr) 1993-06-10 1993-06-10 Codeur-décodeur prédictif linéaire à excitation par codes
US08/379,653 US5727122A (en) 1993-06-10 1993-06-10 Code excitation linear predictive (CELP) encoder and decoder and code excitation linear predictive coding method
EP93913500A EP0654909A4 (fr) 1993-06-10 1993-06-10 Codeur-decodeur predictif lineaire a excitation par codes.
DE69334115T DE69334115T2 (de) 1993-06-10 1993-06-10 CELP Kodierer und Dekodierer
SG1996004078A SG43128A1 (en) 1993-06-10 1993-06-10 Code excitation linear predictive (celp) encoder and decoder
NO950490A NO950490L (no) 1993-06-10 1995-02-09 Kode-eksiterende, lineært forutsigbar (CELP) koder og dekoder

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
PCT/JP1993/000776 WO1994029965A1 (fr) 1993-06-10 1993-06-10 Codeur-decodeur predictif lineaire a excitation par codes
SG1996004078A SG43128A1 (en) 1993-06-10 1993-06-10 Code excitation linear predictive (celp) encoder and decoder

Publications (1)

Publication Number Publication Date
WO1994029965A1 true WO1994029965A1 (fr) 1994-12-22

Family

ID=26434408

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP1993/000776 WO1994029965A1 (fr) 1993-06-10 1993-06-10 Codeur-decodeur predictif lineaire a excitation par codes

Country Status (2)

Country Link
NO (1) NO950490L (fr)
WO (1) WO1994029965A1 (fr)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH0333900A (ja) * 1989-06-30 1991-02-14 Fujitsu Ltd 音声符号化方式
JPH03171828A (ja) * 1989-11-29 1991-07-25 Sony Corp 圧縮符号化装置及び方法
JPH0451100A (ja) * 1990-06-18 1992-02-19 Sharp Corp 音声情報圧縮装置
JPH0451199A (ja) * 1990-06-18 1992-02-19 Fujitsu Ltd 音声符号化・復号化方式

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP0654909A4 *

Also Published As

Publication number Publication date
NO950490D0 (no) 1995-02-09
NO950490L (no) 1995-03-29

Similar Documents

Publication Publication Date Title
JP3134817B2 (ja) 音声符号化復号装置
EP1235203A2 (fr) Procédé de dissimulation de pertes de trames de parole et décodeur pour cela
US5727122A (en) Code excitation linear predictive (CELP) encoder and decoder and code excitation linear predictive coding method
EP0926660B1 (fr) Procédé de codage et décodage de la parole
JP2002202799A (ja) 音声符号変換装置
JPH0353300A (ja) 音声符号化装置
WO2004097796A1 (fr) Dispositif et procede de codage audio et dispositif et procede de decodage audio
KR20010099763A (ko) 광대역 신호들의 효율적 코딩을 위한 인식적 가중디바이스 및 방법
JPH10187197A (ja) 音声符号化方法及び該方法を実施する装置
EP1019907A2 (fr) Codage de signal vocal
US6826527B1 (en) Concealment of frame erasures and method
JP2003223189A (ja) 音声符号変換方法及び装置
WO2005106850A1 (fr) Appareil de codage de hiérarchie et procédé de codage de hiérarchie
JP2002518694A (ja) 音声符号化装置及び音声復号化装置
JP3063668B2 (ja) 音声符号化装置及び復号装置
JP2001154699A (ja) フレーム消去の隠蔽及びその方法
US7346503B2 (en) Transmitter and receiver for speech coding and decoding by using additional bit allocation method
JP3199142B2 (ja) 音声の励振信号符号化方法および装置
JP2968109B2 (ja) コード励振線形予測符号化器及び復号化器
WO1994029965A1 (fr) Codeur-decodeur predictif lineaire a excitation par codes
JPS6238500A (ja) 高能率音声符号化方式とその装置
JP3490325B2 (ja) 音声信号符号化方法、復号方法およびその符号化器、復号器
JP3232701B2 (ja) 音声符号化方法
JP2004348120A (ja) 音声符号化装置、音声復号化装置及びこれらの方法
EP1355298A2 (fr) Codeur-décodeur prédictif linéaire à excitation par codes

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): NO US

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): AT BE CH DE DK ES FR GB GR IE IT LU MC NL PT SE

WWE Wipo information: entry into national phase

Ref document number: 08379653

Country of ref document: US

WWE Wipo information: entry into national phase

Ref document number: 1993913500

Country of ref document: EP

121 Ep: the epo has been informed by wipo that ep was designated in this application
WWP Wipo information: published in national office

Ref document number: 1993913500

Country of ref document: EP

WWR Wipo information: refused in national office

Ref document number: 1993913500

Country of ref document: EP

WWW Wipo information: withdrawn in national office

Ref document number: 1993913500

Country of ref document: EP