WO1991003790A1 - Digital speech coder having improved sub-sample resolution long-term predictor - Google Patents

Digital speech coder having improved sub-sample resolution long-term predictor

Info

Publication number
WO1991003790A1
Authority
WO
WIPO (PCT)
Prior art keywords
long
signal
samples
vector
filter
Prior art date
Application number
PCT/US1990/003625
Other languages
English (en)
French (fr)
Inventor
Ira Alan Gerson
Mark A. Jasiuk
Original Assignee
Motorola, Inc.
Priority date
Filing date
Publication date
Family has litigation
First worldwide family litigation filed litigation Critical https://patents.darts-ip.com/?family=23590969&utm_source=google_patent&utm_medium=platform_link&utm_campaign=public_patent_search&patent=WO1991003790(A1) "Global patent litigation dataset” by Darts-ip is licensed under a Creative Commons Attribution 4.0 International License.
Application filed by Motorola, Inc.
Priority to EP91905041A (patent EP0450064B2, de)
Priority to DK91905041T (patent DK0450064T4, da)
Priority to AT91905041T (patent ATE191987T1, de)
Priority to DE69033510T (patent DE69033510T3, de)
Publication of WO1991003790A1 (patent WO1991003790A1, en)

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/08 Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters
    • G10L19/12 Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters the excitation function being a code excitation, e.g. in code excited linear prediction [CELP] vocoders
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L2019/0001 Codebooks
    • G10L2019/0011 Long term prediction filters, i.e. pitch estimation
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L2019/0001 Codebooks
    • G10L2019/0012 Smoothing of parameters of the decoder interpolation

Definitions

  • the present invention generally relates to digital speech coding at low bit rates, and more particularly, is directed to an improved method for determining long-term predictor output responses for code-excited linear prediction speech coders.
  • Code-excited linear prediction (CELP) is a speech coding technique which has the potential of producing high quality synthesized speech at low bit rates, i.e., 4.8 to 9.6 kilobits per second (kbps).
  • This class of speech coding, also known as vector-excited linear prediction or stochastic coding, will most likely be used in numerous speech communications and speech synthesis applications.
  • CELP may prove to be particularly applicable to digital speech encryption and digital radiotelephone communication systems wherein speech quality, data rate, size, and cost are significant issues.
  • the term "code-excited" or "vector-excited" is derived from the fact that the excitation sequence for the speech coder is vector quantized, i.e., a single codeword is used to represent a sequence, or vector, of excitation samples. In this way, data rates of less than one bit per sample are possible for coding the excitation sequence.
  • the stored excitation code vectors generally consist of independent random white Gaussian sequences. One code vector from the codebook is chosen to represent each block of N excitation samples. Each stored code vector is represented by a codeword, i.e., the address of the code vector memory location. It is this codeword that is subsequently sent over a communications channel to the speech synthesizer to reconstruct the speech frame at the receiver. See M.R.
  • CELP Code-Excited Linear Prediction
  • the excitation code vector from the codebook is applied to two time-varying linear filters which model the characteristics of the input speech signal.
  • the first filter includes a long-term predictor in its feedback loop, which has a long delay, i.e., 2 to 15 milliseconds, used to introduce the pitch periodicity of voiced speech.
  • the second filter includes a short-term predictor in its feedback loop, which has a short delay, i.e., less than 2 msec, used to introduce a spectral envelope or formant structure.
  • the speech coder applies each individual code vector to the filters to generate a reconstructed speech signal, and compares the original input speech signal to the reconstructed signal to create an error signal.
  • the error signal is then weighted by passing it through a weighting filter having a response based on human auditory perception.
  • the optimum excitation signal is determined by selecting the code vector which produces the weighted error signal having the minimum energy for the current frame.
  • the codeword for the optimum code vector is then transmitted over a communications channel.
  • the codeword received from the channel is used to address the codebook of excitation vectors.
  • the single code vector is then multiplied by a gain factor, and filtered by the long-term and short-term filters to obtain a reconstructed speech vector.
  • the gain factor and the predictor parameters are also obtained from the channel. It has been found that a better quality synthesized signal can be generated if the actual parameters used by the synthesizer are used in the analysis stage, thus minimizing the quantization errors.
  • the use of these synthesis parameters in the CELP speech analysis stage to produce higher quality speech is referred to as analysis-by-synthesis speech coding.
  • the short-term predictor attempts to predict the current output sample s(n) by a linear combination of the immediately preceding output samples s(n-i), according to the equation:
  • $$s(n) = \alpha_1 s(n-1) + \alpha_2 s(n-2) + \cdots + \alpha_p s(n-p) + e(n)$$
  • p is the order of the short-term predictor
  • e(n) is the prediction residual, i.e., that part of s(n) that cannot be represented by the weighted sum of p previous samples.
  • the predictor order p typically ranges from 8 to 12, assuming an 8 kilohertz (kHz) sampling rate.
  • the weights α_1, α_2, ..., α_p in this equation are called the predictor coefficients.
  • the short-term predictor coefficients are determined from the speech signal using conventional linear predictive coding (LPC) techniques.
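  • As an illustration of the short-term prediction equation above, the following is a minimal Python sketch and not the coder's actual implementation. The function names `lpc_residual` and `lpc_synthesize` are hypothetical, the coefficients are taken as given (no LPC analysis is performed), and samples before the start of the input are assumed to be zero.

```python
def lpc_residual(s, alpha):
    """Return e(n) = s(n) - sum_{i=1..p} alpha[i-1] * s(n-i).

    Assumption for this sketch: s(n) = 0 for n < 0.
    """
    p = len(alpha)
    e = []
    for n in range(len(s)):
        # Weighted sum of up to p previous samples (the prediction).
        pred = sum(alpha[i - 1] * s[n - i] for i in range(1, p + 1) if n - i >= 0)
        e.append(s[n] - pred)
    return e


def lpc_synthesize(e, alpha):
    """Rebuild s(n) from the residual by running the predictor recursively."""
    p = len(alpha)
    s = []
    for n in range(len(e)):
        pred = sum(alpha[i - 1] * s[n - i] for i in range(1, p + 1) if n - i >= 0)
        s.append(pred + e[n])
    return s


if __name__ == "__main__":
    samples = [1.0, 0.5, -0.25, 0.75]
    coeffs = [0.9, -0.2]  # illustrative values only, p = 2
    residual = lpc_residual(samples, coeffs)
    print(residual)
    print(lpc_synthesize(residual, coeffs))
```

  • Feeding the residual back through the recursive form recovers the original samples, which is the sense in which e(n) is the part of s(n) that the weighted sum of p previous samples cannot represent.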
  • LPC linear predictive coding
  • the long-term filter, on the other hand, must predict the next output sample from preceding samples that extend over a much longer time period. If only a single past sample is used in the predictor, then the predictor is a single-tap predictor. Typically, one to three taps are used.
  • the output response for a long-term filter incorporating a single-tap, long-term predictor is given in z-transform notation as:
  • $$H(z) = \frac{1}{1 - \beta z^{-L}}$$
  • this output response is a function of only the delay or lag L of the filter and the filter coefficient β.
  • the lag L would typically be the pitch period of the speech, or a multiple of it.
  • a suitable range for the lag L would be between 16 and 143, which corresponds to a pitch range between 56 and 500 Hz.
  • the long-term predictor lag L and long-term predictor coefficient β can be determined from either an open-loop or a closed-loop configuration. Using the open-loop configuration, the lag L and coefficient β are computed from the input signal (or its residual) directly. In the closed-loop configuration, the lag L and the coefficient β are computed at the frame rate from coded data representing the past output of the long-term filter and the input speech signal. In using the coded data, the long-term predictor lag determination is based on the actual long-term filter state that will exist at the synthesizer. Hence, the closed-loop configuration gives better performance than the open-loop method, since the pitch filter itself contributes to the optimization of the error signal. Moreover, a single-tap predictor works very well in the closed-loop configuration.
  • This technique is straightforward for pitch lags L which are greater than the frame length N, i.e., when L ≥ N, since the term
  • a pitch of 250 Hz at an 8 kHz sampling rate corresponds to a long-term predictor lag L of 32 samples. It is not desirable, however, to employ a frame length N of less than 4 msec, since the CELP excitation vector can be coded more efficiently when longer frame lengths are used. Accordingly, utilizing a frame length time of 7.5 msec at a sampling rate of 8 kHz, the frame length N would be equal to 60 samples. This means only 32 past samples would be available to predict the next 60 samples of the frame. Hence, if the long-term predictor lag L is less than the frame length N, only L past samples of the required N samples are defined.
  • a third solution is to reduce the size of the frame length N.
  • the long-term predictor lag L can always be determined from past samples. This approach, however, suffers from a severe bit rate penalty.
  • with a shorter frame length, a greater number of long-term predictor parameters and excitation vectors must be coded, and accordingly, the bit rate of the channel must be greater to accommodate the extra coding.
  • the sampling rate used in the coder places an upper limit on the performance of a single-tap pitch predictor. For example, if the pitch frequency is actually 485 Hz, the closest lag value would be 16 which corresponds to 500 Hz. This results in an error of 15 Hz for the fundamental pitch frequency which degrades voice quality. This error is multiplied for the harmonics of the pitch frequency causing further degradation.
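  • The arithmetic in the 485 Hz example can be checked with a short Python sketch. The function names and the `steps_per_sample` parameter are hypothetical; rounding is done in the lag domain (nearest representable lag), which is an assumption consistent with the lag of 16 quoted above. Setting `steps_per_sample=2` illustrates half-sample lag resolution.

```python
def nearest_lag(pitch_hz, fs=8000.0, steps_per_sample=1):
    """Round the exact lag fs/pitch to the nearest multiple of 1/steps_per_sample."""
    exact = fs / pitch_hz
    return round(exact * steps_per_sample) / steps_per_sample


def pitch_error_hz(pitch_hz, fs=8000.0, steps_per_sample=1):
    """Absolute error between the true pitch and the pitch implied by the rounded lag."""
    lag = nearest_lag(pitch_hz, fs, steps_per_sample)
    return abs(fs / lag - pitch_hz)


if __name__ == "__main__":
    for steps in (1, 2):
        lag = nearest_lag(485.0, steps_per_sample=steps)
        err = pitch_error_hz(485.0, steps_per_sample=steps)
        print(f"resolution 1/{steps} sample: lag={lag}, error={err:.3f} Hz")
```

  • With integer lags the 485 Hz pitch maps to a lag of 16 (500 Hz, a 15 Hz error); with half-sample lags it maps to 16.5, which implies roughly 484.85 Hz.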
  • a general object of the present invention is to provide an improved digital speech coding technique that produces high quality speech at low bit rates.
  • a more specific object of the present invention is to provide a method to determine long-term predictor parameters using the closed-loop approach.
  • Another object of the present invention is to provide an improved method for determining the output response of a long-term predictor when the long-term predictor lag parameter L is a non-integer number.
  • a further object of the present invention is to provide an improved CELP speech coder which permits joint optimization of the gain factor γ and the long-term predictor coefficient β during the codebook search for the optimum excitation code vector.
  • the resolution of the parameter L is increased by allowing L to take on values which are not integers.
  • This is achieved by the use of interpolating filters to provide interpolated samples of the long-term predictor state.
  • future samples of the long-term predictor state are not available to the interpolating filters.
  • This problem is circumvented by pitch-synchronously extending the long-term predictor state into the future for use by the interpolation filter.
  • the long-term predictor state is updated to reflect the actual excitation samples (replacing those based on the pitch-synchronously extended samples).
  • the interpolation can be used to interpolate one sample between each existing sample thus doubling the resolution of L to half a sample.
  • a higher interpolation factor could also be chosen, such as three or four, which would increase the resolution of L to a third or a fourth of a sample.
  • Figure 1 is a general block diagram of a code-excited linear predictive speech coder, illustrating the location of a long-term filter for use with the present invention
  • Figure 2A is a detailed block diagram of an embodiment of the long-term filter of Figure 1, illustrating the long-term predictor response where filter lag L is an integer;
  • Figure 2B is a simplified diagram of a shift register which can be used to illustrate the operation of the long-term predictor in Figure 2A
  • Figure 2C is a detailed block diagram of another embodiment of the long-term filter of Figure 1, illustrating the long-term predictor response where filter lag L is an integer;
  • Figure 3 is a detailed flowchart diagram illustrating the operations performed by the long-term filter of Figure 2A;
  • Figure 4 is a general block diagram of a speech synthesizer for use in accordance with the present invention
  • Figure 5 is a detailed block diagram of the long-term filter of Figure 1, illustrating the sub-sample resolution long-term predictor response in accordance with the present invention
  • Figures 6A and 6B are detailed flowchart diagrams illustrating the operations performed by the long-term filter of Figure 5
  • Figure 7 is a detailed block diagram of a pitch post filter for intercoupling the short term filter and D/A converter of the speech synthesizer in Figure 4.
  • Referring now to Figure 1, there is shown a general block diagram of code-excited linear predictive speech coder 100 utilizing the long-term filter in accordance with the present invention.
  • An acoustic input signal to be analyzed is applied to speech coder 100 at microphone 102.
  • the input signal, typically a speech signal, is then applied to filter 104.
  • Filter 104 generally will exhibit bandpass filter characteristics. However, if the speech bandwidth is already adequate, filter 104 may comprise a direct wire connection.
  • the analog speech signal from filter 104 is then converted into a sequence of N pulse samples, and the amplitude of each pulse sample is then represented by a digital code in analog-to-digital (A/D) converter 108, as known in the art.
  • the sampling rate is determined by sample clock SC, which represents an 8.0 kHz rate in the preferred embodiment.
  • the sample clock SC is generated along with the frame clock FC via clock 112.
  • A/D 108 which may be represented as input speech vector s(n)
  • This input speech vector s(n) is repetitively obtained in separate frames, i.e., blocks of time, the length of which is determined by the frame clock FC.
  • LPC linear predictive coding
  • the short-term predictor parameters α_i, long-term predictor coefficient β, nominal long-term predictor lag parameter L, weighting filter parameters WFP, and excitation gain factor γ (along with the best excitation codeword I as described later) are applied to multiplexer 150 and sent over the channel for use by the speech synthesizer.
  • the input speech vector s(n) is also applied to subtractor 130 the function of which will subsequently be described.
  • Codebook ROM 120 contains a set of M excitation vectors u_i(n), wherein 1 ≤ i ≤ M, each comprised of N samples, wherein
  • Codebook ROM 120 generates these pseudorandom excitation vectors in response to a particular one of a set of excitation codewords i.
  • Each of the M excitation vectors is comprised of a series of random white Gaussian samples, although other types of excitation vectors may be used with the present invention. If the excitation signal were coded at a rate of 0.2 bits per sample for each of the 60 samples, then each codeword would carry 12 bits and there would be 4096 codewords i corresponding to the possible excitation vectors. For each individual excitation vector u_i(n), a reconstructed speech vector s'_i(n) is generated for comparison to the input speech vector s(n).
  • Gain block 122 scales the excitation vector u_i(n) by the excitation gain factor γ, which is constant for the frame.
  • the excitation gain factor γ may be pre-computed by coefficient analyzer 110 and used to analyze all excitation vectors as shown in Figure 1, or may be optimized jointly with the search for the best excitation codeword I and generated by codebook search controller 140.
  • the scaled excitation signal γu_i(n) is then filtered by long-term filter 124 and short-term filter 126 to generate the reconstructed speech vector s'_i(n).
  • Filter 124 utilizes the long-term predictor parameters β and L to introduce voice periodicity
  • filter 126 utilizes the short-term predictor parameters α_i to introduce the spectral envelope, as described above.
  • Long-term filter 124 will be described in detail in the following figures. Note that blocks 124 and 126 are actually recursive filters which contain the long-term predictor and short-term predictor in their respective feedback paths.
  • the reconstructed speech vector s'i(n) for the i-th excitation code vector is compared to the same block of the input speech vector s(n) by subtracting these two signals in subtracter 130.
  • the difference vector e_i(n) represents the difference between the original and the reconstructed blocks of speech.
  • the difference vector is perceptually weighted by weighting filter 132, utilizing the weighting filter parameters WFP generated by coefficient analyzer 110. Refer to the preceding reference for a representative weighting filter transfer function. Perceptual weighting accentuates those frequencies where the error is perceptually more important to the human ear, and attenuates other frequencies.
  • Energy calculator 134 computes the energy of the weighted difference vector e'_i(n), and applies this error signal E_i to codebook search controller 140.
  • the search controller compares the i-th error signal for the present excitation vector u_i(n) against previous error signals to determine the excitation vector producing the minimum error.
  • the code of the i-th excitation vector having a minimum error is then output over the channel as the best excitation code I.
  • search controller 140 may determine a particular codeword which provides an error signal having some predetermined criteria, such as meeting a predefined error threshold.
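  • The search loop described above (gain block, filters, subtracter, weighting filter, energy calculator, search controller) can be sketched in Python as follows. This is a simplified illustration under stated assumptions: `search_codebook`, `synthesize` and `weight` are hypothetical names, `synthesize` stands in for the cascade of long-term filter 124 and short-term filter 126 and is assumed to start from the same filter state for every candidate, and `weight` stands in for weighting filter 132 (identity by default).

```python
def search_codebook(codebook, target, gain, synthesize, weight=lambda e: e):
    """Return (best_index, best_energy) over all excitation vectors.

    codebook   -- list of excitation vectors u_i(n), each a list of N samples
    target     -- input speech vector s(n) for the current frame
    gain       -- excitation gain factor, held constant for the frame
    synthesize -- callable mapping a scaled excitation vector to s'_i(n)
    weight     -- callable applying perceptual weighting to the difference vector
    """
    best_index, best_energy = None, float("inf")
    for i, u in enumerate(codebook):
        scaled = [gain * x for x in u]                    # gain block
        recon = synthesize(scaled)                        # long-term + short-term filters
        diff = [t - r for t, r in zip(target, recon)]     # subtracter
        weighted = weight(diff)                           # weighting filter
        energy = sum(x * x for x in weighted)             # energy calculator
        if energy < best_energy:                          # search controller keeps the minimum
            best_index, best_energy = i, energy
    return best_index, best_energy


if __name__ == "__main__":
    book = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    # Identity "filters" keep the example tiny; a real coder would filter here.
    print(search_codebook(book, [2.0, 2.0], 2.0, synthesize=lambda v: v))
```

  • An exhaustive search like this evaluates every code vector; the threshold-based alternative mentioned above would simply return as soon as `energy` falls below a chosen limit.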
  • Figure 1 illustrates one embodiment of the invention for a code-excited linear predictive speech coder.
  • the long-term filter parameters L and ⁇ are determined in an open-loop configuration by coefficient analyzer 110.
  • the long-term filter parameters can be determined in a closed-loop configuration as described in the aforementioned Singhal and Atal reference.
  • performance of the speech coder is improved using long-term filter parameters determined in the closed-loop configuration.
  • the novel structure of the long-term predictor according to the present invention greatly facilitates the use of the closed-loop determination of these parameters for lags L less than the frame length N.
  • Figure 2A illustrates an embodiment of long-term filter 124 of Figure 1, where L is constrained to be an integer.
  • Although Figure 1 shows the scaled excitation vector γu_i(n) from gain block 122 as being input to long-term filter 124, a representative input speech vector s(n) has been used in Figure 2A for purposes of explanation. Hence, a frame of N samples of input speech vector s(n) is applied to adder 210.
  • the output of adder 210 produces the output vector b(n) for the long-term filter 124.
  • the output vector b(n) is fed back to delay block 230 of the long-term predictor.
  • the nominal long-term predictor lag parameter L is also input to delay block 230.
  • the long-term predictor delay block provides output vector q(n) to long-term predictor multiplier block 220, which scales the long-term predictor response by the long-term predictor coefficient β.
  • the scaled output βq(n) is then applied to adder 210 to complete the feedback loop of the recursive filter.
  • the output response H_n(z) of long-term filter 124 is defined in z-transform notation as:
  • $$H_n(z) = \frac{1}{1 - \beta z^{-\lfloor (n+L)/L \rfloor L}}$$
  • n represents a sample number of a frame containing N samples, 0 ≤ n ≤ N-1, wherein β represents a filter coefficient, wherein L represents the nominal lag or delay of the long-term predictor, and wherein ⌊(n+L)/L⌋ represents the closest integer less than or equal to (n+L)/L.
  • the long-term predictor delay ⌊(n+L)/L⌋L varies as a function of the sample number n.
  • the actual long-term predictor delay becomes kL, wherein L is the basic or nominal long-term predictor lag, and wherein k is an integer chosen from the set {1,
  • the long-term filter output response b(n) is a function of the nominal long-term predictor lag parameter L and the filter state FS which exists at the beginning of the frame. This statement holds true for all values of L — even for the problematic case of when the pitch lag L is less than the frame length N.
  • the function of the long-term predictor delay block 230 is to store the current input samples in order to predict future samples.
  • Figure 2B represents a simplified diagram of a shift register, which may be helpful in understanding the operation of long-term predictor delay block 230 of Figure 2A.
  • the current output sample b(n) is applied to the input of the shift register, which is shown on the right on Figure 2B.
  • the previous sample b(n) is shifted left into the shift register. This sample now becomes the first past sample b(n-1).
  • another sample of b(n) is shifted into the register, and the original sample is again shifted left to become the second past sample b(n-2).
  • the original sample has been shifted left L number of times such that it may be represented as b(n-L).
  • the lag L would typically be the pitch period of voiced speech or a multiple of it.
  • if the long-term predictor lag parameter L is shorter than the frame length N, then an insufficient number of samples would have been shifted into the shift register by the beginning of the next frame.
  • the pitch lag L would be equal to 32.
  • b(n-L) would normally be b(27), which represents a future sample with respect to the beginning of the frame of 60 samples.
  • the complete long-term predictor response is needed at the beginning of the frame such that closed-loop analysis of the predictor parameters can be performed.
  • the same stored samples b(n-L), 0 ≤ n < L, are repeated such that the output response of the long-term predictor is always a function of samples which have been input into the long-term predictor delay block prior to the start of the current frame.
  • the shift register has thus been extended to store another kL samples, which represents a modification of the structure of the long-term predictor delay block 230.
  • k must be chosen such that b(n-kL) represents a sample which existed in the shift register prior to the start of the frame.
  • The operation of long-term filter 124 of Figure 2A will now be described in accordance with the flowchart of Figure 3.
  • the sample number n is initialized to zero at step 351.
  • the nominal long-term predictor lag parameter L and the long-term predictor coefficient β are input from coefficient analyzer 110 in step 352.
  • the sample number n is tested to see if an entire frame has been output. If n ≥ N, operation ends at step 361. If all samples have not yet been computed, a signal sample s(n) is input in step 354.
  • In step 357, the samples in the shift register are shifted left one position, for all register locations between b(n-2) and b(n-LMAX), where LMAX represents the maximum long-term predictor lag that can be assigned. In the preferred embodiment, LMAX would be equal to 143.
  • In step 358, the output sample b(n) is input into the first location b(n-1) of the shift register.
  • Step 359 outputs the filtered sample b(n). The sample number n is then incremented in step 360, and then tested in step 353. When all N samples have been computed, the process ends at step 361.
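  • The behaviour of the integer-lag long-term filter of Figure 2A can be sketched in Python as below. This is an illustrative sketch, not the flowchart of Figure 3 itself: it indexes a list instead of shifting a register, and it updates the stored state once per frame rather than once per sample. The function name and the `history` argument (past outputs, with `history[-1]` holding b(-1)) are hypothetical.

```python
def long_term_filter(s, history, lag, beta):
    """Compute b(n) = s(n) + beta * b(n - k*lag) for one frame.

    k = floor((n + lag) / lag), so n - k*lag always lies in [-lag, -1]:
    every delayed sample comes from the state stored before the frame began,
    even when lag is shorter than the frame length.
    Returns (frame_output, updated_history).
    """
    if not 1 <= lag <= len(history):
        raise ValueError("lag must be between 1 and the stored history length")
    b = []
    for n in range(len(s)):
        k = (n + lag) // lag
        q = history[n - k * lag]      # negative index: sample from before the frame
        b.append(s[n] + beta * q)
    # Keep the most recent len(history) outputs as the state for the next frame.
    new_history = (list(history) + b)[-len(history):]
    return b, new_history


if __name__ == "__main__":
    # Zero input exposes the predictor contribution: the last `lag` stored
    # samples repeat across the frame, each scaled by beta.
    out, state = long_term_filter([0.0] * 5, [1.0, 2.0, 3.0], lag=2, beta=0.5)
    print(out)    # [1.0, 1.5, 1.0, 1.5, 1.0]
    print(state)  # [1.0, 1.5, 1.0]
```

  • With the numbers used earlier (lag 32, frame of 60 samples), sample n = 59 gives k = 2 and reads b(-5), a sample stored before the frame, instead of the not-yet-available b(27).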
  • Figure 2C is an alternative embodiment of a long-term filter incorporating the present invention.
  • Filter 124' is the feedforward inverse version of the recursive filter configuration of Figure 2A.
  • Input vector s(n) is applied to both subtracter 240 and long-term predictor delay block 260. Delayed vector q(n) is output to multiplier 250, which scales the vector by the long-term predictor coefficient β.
  • n represents the sample number of a frame containing
  • the structure of the long-term predictor has again been modified so as to repeatedly output the same stored samples of the long-term predictor when the long-term predictor lag L is less than the frame length N.
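  • A feedforward counterpart of the recursive filter can be sketched in the same way. The difference equation used here, y(n) = x(n) - β x(n - kL) with k = ⌊(n+L)/L⌋, is an assumption inferred from the description of subtracter 240, multiplier 250 and delay block 260; the function name and the `history` argument (samples stored in the delay block before the frame, `history[-1]` being the most recent) are hypothetical.

```python
def long_term_inverse_filter(x, history, lag, beta):
    """Compute y(n) = x(n) - beta * x(n - k*lag) for one frame.

    As in the recursive form, k = floor((n + lag) / lag), so only samples
    stored before the start of the frame are read from `history`.
    """
    if not 1 <= lag <= len(history):
        raise ValueError("lag must be between 1 and the stored history length")
    y = []
    for n in range(len(x)):
        k = (n + lag) // lag
        y.append(x[n] - beta * history[n - k * lag])
    return y


if __name__ == "__main__":
    # A frame that consists only of the scaled, repeated history is cancelled.
    print(long_term_inverse_filter([1.0, 1.5, 1.0, 1.5, 1.0], [1.0, 2.0, 3.0], 2, 0.5))
```

  • In this example the input frame is exactly β times the repeated stored samples, so the feedforward structure removes it and returns zeros.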
  • Referring now to Figure 5, there is illustrated the preferred embodiment of the long-term filter 124 of Figure 1 which allows for subsample resolution for the lag parameter L.
  • a frame of N samples of input speech vector s(n) is applied to adder 510.
  • the output of adder 510 produces the output vector b(n) for the long term filter 124.
  • the output vector b(n) is fed back to delayed vector generator block 530 of the long-term predictor.
  • the nominal long-term predictor lag parameter L is also input to delayed vector generator block 530.
  • the long-term predictor lag parameter L can take on non-integer values.
  • the preferred embodiment allows L to take on values which are a multiple of one half. Alternate implementations of the sub-sample resolution long-term predictor of the present invention could allow values which are multiples of one third or one fourth or any other rational fraction.
  • the delayed vector generator 530 includes a memory which holds past samples of b(n).
  • interpolated samples of b(n) are also calculated by delayed vector generator 530 and stored in its memory.
  • the state of the long-term predictor which is contained in delayed vector generator 530 has two samples for every stored sample of b(n).
  • samples of b(n) can be obtained from delayed vector generator 530 which correspond to integer delays or multiples of half sample delays.
  • the interpolation is done using interpolating finite impulse response filters as described in the book by R. Crochiere and L. Rabiner entitled Multirate Digital Signal Processing, published by Prentice Hall in 1983.
  • the operation of delayed vector generator 530 is described in further detail hereinbelow in conjunction with the flowcharts in Figures 6A and 6B.
  • Delayed vector generator 530 provides output vector q(n) to long-term multiplier block 520, which scales the long-term predictor response by the long-term predictor coefficient β.
  • the scaled output βq(n) is then applied to adder 510 to complete the feedback loop of the recursive filter 124 in Figure 5.
  • Referring to Figures 6A and 6B, there are illustrated detailed flowchart diagrams detailing the operations performed by the long-term filter of Figure 5.
  • the resolution of the long-term predictor memory is extended by mapping an N point sequence b(n) onto a 2N point vector ex(i).
  • the negative indexed samples of ex(i) contain the extended resolution past values of long-term filter output b(n), or the extended resolution long term history.
  • the mapping process doubles the temporal resolution of the long-term predictor memory, each time it is applied.
  • single stage mapping is described, although additional stages may be implemented in other embodiments of the present invention.
  • In step 604, L, β and s(n) are input.
  • long term predictor lag L may be the pitch period or a multiple of the pitch period.
  • L may be an integer or a real number whose fractional part is 0.5 in the preferred embodiment. When the fractional part of L is 0.5, L has an effective resolution of half a sample.
  • In step 612, vector b(n) of the long-term filter is output.
  • In step 614, the extended resolution state ex(n) is updated to generate and store the interpolated values of b(n) in the memory of delayed vector generator 530. Step 614 is illustrated in more detail in Figure 6B.
  • In step 616, the process has been completed and stops.
  • the interpolated samples of ex(i) initialized to zero are reconstructed through FIR interpolation, using a symmetric, zero-phase shift filter, assuming that the order of such FIR filter is 2M+1 as explained hereinabove.
  • the minimum value of 2L to be used in this scheme is 2M+1.
  • the parameter λ, the history extension scaling factor, may be set equal to β, which is the pitch predictor coefficient, or set to unity.
  • the process has been completed and stops.
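  • The mapping of an N point sequence onto a 2N point extended-resolution vector, and the reading of a half-sample delay from it, can be sketched in Python as follows. This is a simplified illustration under several assumptions: the Hamming-windowed sinc taps, the default M = 7 and all function names are hypothetical choices, not values from the source, and samples beyond the ends of the stored history are treated as zero. The pitch-synchronous extension into the future described in the text is deliberately omitted, so the most recent interpolated samples are not reliable in this sketch.

```python
import math


def interpolation_taps(M):
    """Symmetric, zero-phase FIR taps h[-M..M] (2M+1 taps) for 2x interpolation.

    h[0] = 1 and the other even-indexed taps are 0, so original samples pass
    through unchanged; odd-indexed taps are a windowed sinc.
    """
    taps = {}
    for m in range(-M, M + 1):
        if m == 0:
            taps[m] = 1.0
        elif m % 2 == 0:
            taps[m] = 0.0
        else:
            x = math.pi * m / 2.0
            window = 0.54 + 0.46 * math.cos(math.pi * m / (M + 1))
            taps[m] = window * math.sin(x) / x
    return taps


def extend_resolution(b, M=7):
    """Map b (N points) onto ex (2N points): ex[2i] = b[i], ex[2i+1] interpolated."""
    n_points = len(b)
    stuffed = [0.0] * (2 * n_points)
    stuffed[0::2] = list(b)            # interpolated positions initialized to zero
    taps = interpolation_taps(M)
    ex = []
    for j in range(2 * n_points):
        acc = 0.0
        for m, h in taps.items():
            if 0 <= j - m < 2 * n_points:
                acc += h * stuffed[j - m]
        ex.append(acc)
    return ex


def delayed_sample(ex, lag):
    """Return the stored sample delayed by `lag` (a multiple of 0.5) from the frame start."""
    two_lag = round(2 * lag)
    if two_lag != 2 * lag or not 1 <= two_lag <= len(ex):
        raise ValueError("lag must be a positive multiple of 0.5 within the stored history")
    return ex[-two_lag]


if __name__ == "__main__":
    history = [math.sin(2 * math.pi * n / 40) for n in range(80)]
    ex = extend_resolution(history)
    print(delayed_sample(ex, 16), delayed_sample(ex, 16.5))
```

  • Integer lags read the original samples back exactly, while a lag such as 16.5 reads an interpolated value lying between b(-17) and b(-16).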
  • Referring now to Figure 4, a speech synthesizer block diagram is illustrated using the long-term filter of the present invention.
  • Synthesizer 400 obtains the short-term predictor parameters α_i, long-term predictor parameters β and L, excitation gain factor γ and the codeword I received from the channel, via de-multiplexer 450.
  • the codeword I is applied to codebook ROM 420 to address the codebook of excitation vectors.
  • the single excitation vector u_I(n) is then multiplied by the gain factor γ in block 422, filtered by long-term predictor filter 424 and short-term predictor filter 426 to obtain reconstructed speech vector s'_I(n).
  • This vector, which represents a frame of reconstructed speech, is then applied to digital-to-analog (D/A) converter 408 to produce a reconstructed analog signal, which is then low pass filtered to reduce aliasing by filter 404, and applied to an output transducer such as speaker 402.
  • D/A digital-to-analog
  • the CELP synthesizer utilizes the same codebook, gain block, long-term filter, and short-term filter as the CELP analyzer of Figure 1.
  • FIG. 7 is a detailed block diagram of a pitch post filter for intercoupling the short term filter 426 and D/A converter 408 of the speech synthesizer in Figure 4.
  • a pitch post filter enhances the speech quality by removing noise introduced by the filters 424 and 426.
  • a frame of N samples of reconstructed speech vector s'_I(n) is applied to adder 710.
  • the output of adder 710 produces the output vector s"_I(n) for the pitch post filter.
  • the output vector s"_I(n) is fed back to delayed sample generator block 730 of the pitch post filter.
  • the nominal long-term predictor lag parameter L is also input to delayed sample generator block 730. L may take on non-integer values for the present invention.
  • Delayed sample generator 730 provides output vector q(n) to multiplier block 720, which scales the pitch post filter response by coefficient R which is a function of the long-term predictor coefficient β. The scaled output Rq(n) is then applied to adder 710 to complete the feedback loop of the pitch post filter in Figure 7.
  • the excitation gain factor γ and the long-term predictor coefficient β can be simultaneously optimized for all values of L in a closed-loop configuration.
  • This joint optimization technique was heretofore impractical for values of L < N, since the joint optimization equations would become nonlinear in the single parameter β.
  • the present invention modifies the structure of the long-term predictor to allow a linear joint optimization equation.
  • the present invention allows the long-term predictor lag to have better resolution than one sample thereby enhancing its performance.
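
The feedback loop of the pitch post filter described above (adder 710, multiplier 720, delayed sample generator 730) can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the patent's implementation: the function name `pitch_post_filter` is hypothetical, the coefficient R is passed in directly rather than derived from the long-term predictor coefficient, the filter history is assumed to start at zero, and the non-integer lag is handled by simple linear interpolation between the two neighbouring past output samples. The excerpt above does not say which interpolation method the delayed sample generator uses.

```python
import math


def pitch_post_filter(s_rec, lag, r, history=None):
    """Sketch of s''(n) = s'(n) + R * q(n), with q(n) ~ s''(n - lag).

    s_rec   : frame of reconstructed speech samples s'(n)
    lag     : long-term predictor lag L in samples (may be non-integer)
    r       : scaling coefficient R (assumed to be given)
    history : optional past output samples, oldest first (assumed zero if absent)
    """
    if lag < 1.0:
        raise ValueError("lag must be at least one sample")

    k = math.floor(lag)   # integer part of the lag
    frac = lag - k        # fractional part of the lag, in [0, 1)
    need = k + 1          # number of past outputs the interpolator reads

    # Build a buffer holding exactly `need` past outputs, zero-padded in front.
    past = list(history) if history is not None else []
    buf = [0.0] * max(0, need - len(past)) + past[-need:]

    for x in s_rec:
        # Delayed sample generator: linear interpolation (an assumption)
        # between s''(n - k) and s''(n - k - 1).
        q = (1.0 - frac) * buf[-k] + frac * buf[-k - 1]
        # Multiplier and adder: scale q(n) by R and add to the input sample.
        buf.append(x + r * q)

    # Drop the history prefix and return only the samples of this frame.
    return buf[need:]
```

For a unit impulse and an integer lag of 2 with R = 0.5, the output repeats the impulse every two samples at half the previous amplitude; with a lag of 2.5 the echo is split across the two neighbouring samples, which is the effect of the sub-sample lag in this simplified model.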

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Computational Linguistics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)
  • Complex Calculations (AREA)
  • Analogue/Digital Conversion (AREA)
  • Signal Processing For Digital Recording And Reproducing (AREA)
  • Near-Field Transmission Systems (AREA)
PCT/US1990/003625 1989-09-01 1990-06-25 Digital speech coder having improved sub-sample resolution long-term predictor WO1991003790A1 (en)

Priority Applications (4)

Application Number Priority Date Filing Date Title
EP91905041A EP0450064B2 (de) 1989-09-01 1990-06-25 Numerischer sprachkodierer mit verbesserter langzeitvorhersage durch subabtastauflösung
DK91905041T DK0450064T4 (da) 1989-09-01 1990-06-25 Digital talekoder med forbedret langtidsforudsigter med subsampleoplösning
AT91905041T ATE191987T1 (de) 1989-09-01 1990-06-25 Numerischer sprachkodierer mit verbesserter langzeitvorhersage durch subabtastauflösung
DE69033510T DE69033510T3 (de) 1989-09-01 1990-06-25 Numerischer sprachcodierer mit verbesserter langzeitvorhersage durch subabtastauflösung

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US40220689A 1989-09-01 1989-09-01
US402,206 1989-09-01

Publications (1)

Publication Number Publication Date
WO1991003790A1 (en) 1991-03-21

Family

ID=23590969

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US1990/003625 WO1991003790A1 (en) 1989-09-01 1990-06-25 Digital speech coder having improved sub-sample resolution long-term predictor

Country Status (12)

Country Link
EP (1) EP0450064B2 (de)
JP (1) JP3268360B2 (de)
CN (1) CN1026274C (de)
AT (1) ATE191987T1 (de)
AU (1) AU634795B2 (de)
CA (1) CA2037899C (de)
DE (1) DE69033510T3 (de)
DK (1) DK0450064T4 (de)
ES (1) ES2145737T5 (de)
MX (1) MX167644B (de)
SG (1) SG47028A1 (de)
WO (1) WO1991003790A1 (de)

Cited By (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0573216A2 (de) * 1992-06-04 1993-12-08 AT&T Corp. CELP-Vocoder
FR2702590A1 (fr) * 1993-03-12 1994-09-16 Massaloux Dominique Dispositif de codage et de décodage numériques de la parole, procédé d'exploration d'un dictionnaire pseudo-logarithmique de délais LTP, et procédé d'analyse LTP.
EP0623916A1 (de) 1993-05-06 1994-11-09 Nokia Mobile Phones Ltd. Verfahren und Vorrichtung zum Einbau eines langterm Synthesefilters
WO1995029480A2 (en) * 1994-04-22 1995-11-02 Philips Electronics N.V. Analogue signal coder
EP0689191A3 (de) * 1994-06-22 1997-05-28 Philips Patentverwaltung Mobilfunkendgerät
EP0689195A3 (de) * 1994-06-21 1997-10-15 Nec Corp Verfahren und Vorrichtung zur Kodierung eines Anregungssignals
US5708757A (en) * 1996-04-22 1998-01-13 France Telecom Method of determining parameters of a pitch synthesis filter in a speech coder, and speech coder implementing such method
US5899968A (en) * 1995-01-06 1999-05-04 Matra Corporation Speech coding method using synthesis analysis using iterative calculation of excitation weights
US5963898A (en) * 1995-01-06 1999-10-05 Matra Communications Analysis-by-synthesis speech coding method with truncation of the impulse response of a perceptual weighting filter
US5974377A (en) * 1995-01-06 1999-10-26 Matra Communication Analysis-by-synthesis speech coding method with open-loop and closed-loop search of a long-term prediction delay
US7467083B2 (en) 2001-01-25 2008-12-16 Sony Corporation Data processing apparatus
WO2010079167A1 (en) * 2009-01-06 2010-07-15 Skype Limited Speech coding
US8396706B2 (en) 2009-01-06 2013-03-12 Skype Speech coding
WO2013056388A1 (en) * 2011-10-18 2013-04-25 Telefonaktiebolaget L M Ericsson (Publ) An improved method and apparatus for adaptive multi rate codec
US8655653B2 (en) 2009-01-06 2014-02-18 Skype Speech coding by quantizing with random-noise signal
US8849658B2 (en) 2009-01-06 2014-09-30 Skype Speech encoding utilizing independent manipulation of signal and noise spectrum
FR3015754A1 (fr) * 2013-12-20 2015-06-26 Orange Re-echantillonnage d'un signal audio cadence a une frequence d'echantillonnage variable selon la trame
US9530423B2 (en) 2009-01-06 2016-12-27 Skype Speech encoding by determining a quantization gain based on inverse of a pitch correlation

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4857468B2 (ja) * 2001-01-25 2012-01-18 ソニー株式会社 データ処理装置およびデータ処理方法、並びにプログラムおよび記録媒体

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4080660A (en) * 1975-07-11 1978-03-21 James Nickolas Constant Digital signal time scale inversion
US4573135A (en) * 1983-04-25 1986-02-25 Rca Corporation Digital lowpass filter having controllable gain
US4918729A (en) * 1988-01-05 1990-04-17 Kabushiki Kaisha Toshiba Voice signal encoding and decoding apparatus and method

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
NL168669C (nl) * 1974-09-16 1982-04-16 Philips Nv Interpolerend digitaal filter met ingangsbuffer.
US4020332A (en) * 1975-09-24 1977-04-26 Bell Telephone Laboratories, Incorporated Interpolation-decimation circuit for increasing or decreasing digital sampling frequency
NL8105801A (nl) * 1981-12-23 1983-07-18 Philips Nv Recursief digitaal filter.
JPS60116000A (ja) * 1983-11-28 1985-06-22 ケイディディ株式会社 音声符号化装置
JPS63214032A (ja) * 1987-03-02 1988-09-06 Fujitsu Ltd 符号化伝送装置
JPS63249200A (ja) * 1987-04-06 1988-10-17 日本電信電話株式会社 ベクトル量子化方式

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
IEEE Transactions on Acoustics, Speech and Signal Processing, Vol. ASSP-22, No. 4, August 1974 (BELLANGER et al) "Interpolation, Extrapolation, and Reduction of Computation Speed in Digital Filters," pages 231-235. *
Proceedings of the IEEE, Vol. 61, No. 6, June 1973 (SCHAFER et al) "A Digital Signal Processing Approach to Interpolation", pages 692-702. *

Cited By (28)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0573216A3 (en) * 1992-06-04 1994-07-13 At & T Corp Celp vocoder
EP0573216A2 (de) * 1992-06-04 1993-12-08 AT&T Corp. CELP-Vocoder
FR2702590A1 (fr) * 1993-03-12 1994-09-16 Massaloux Dominique Dispositif de codage et de décodage numériques de la parole, procédé d'exploration d'un dictionnaire pseudo-logarithmique de délais LTP, et procédé d'analyse LTP.
EP0616315A1 (de) * 1993-03-12 1994-09-21 France Telecom Vorrichtung zur digitalen Sprachkodierung und -dekodierung, Verfahren zum Durchsuchen eines pseudologarithmischen LTP-Verzögerungskodebuchs und Verfahren zur LTP-Analyse
US5704002A (en) * 1993-03-12 1997-12-30 France Telecom Etablissement Autonome De Droit Public Process and device for minimizing an error in a speech signal using a residue signal and a synthesized excitation signal
EP0623916A1 (de) 1993-05-06 1994-11-09 Nokia Mobile Phones Ltd. Verfahren und Vorrichtung zum Einbau eines langterm Synthesefilters
US5793930A (en) * 1994-04-22 1998-08-11 U.S. Philips Corporation Analogue signal coder
WO1995029480A2 (en) * 1994-04-22 1995-11-02 Philips Electronics N.V. Analogue signal coder
WO1995029480A3 (en) * 1994-04-22 1995-12-07 Philips Electronics Nv Analogue signal coder
EP0689195A3 (de) * 1994-06-21 1997-10-15 Nec Corp Verfahren und Vorrichtung zur Kodierung eines Anregungssignals
EP0689191A3 (de) * 1994-06-22 1997-05-28 Philips Patentverwaltung Mobilfunkendgerät
US5974377A (en) * 1995-01-06 1999-10-26 Matra Communication Analysis-by-synthesis speech coding method with open-loop and closed-loop search of a long-term prediction delay
US5899968A (en) * 1995-01-06 1999-05-04 Matra Corporation Speech coding method using synthesis analysis using iterative calculation of excitation weights
US5963898A (en) * 1995-01-06 1999-10-05 Matra Communications Analysis-by-synthesis speech coding method with truncation of the impulse response of a perceptual weighting filter
US5708757A (en) * 1996-04-22 1998-01-13 France Telecom Method of determining parameters of a pitch synthesis filter in a speech coder, and speech coder implementing such method
US7467083B2 (en) 2001-01-25 2008-12-16 Sony Corporation Data processing apparatus
US8849658B2 (en) 2009-01-06 2014-09-30 Skype Speech encoding utilizing independent manipulation of signal and noise spectrum
US8396706B2 (en) 2009-01-06 2013-03-12 Skype Speech coding
US8655653B2 (en) 2009-01-06 2014-02-18 Skype Speech coding by quantizing with random-noise signal
WO2010079167A1 (en) * 2009-01-06 2010-07-15 Skype Limited Speech coding
US9263051B2 (en) 2009-01-06 2016-02-16 Skype Speech coding by quantizing with random-noise signal
US9530423B2 (en) 2009-01-06 2016-12-27 Skype Speech encoding by determining a quantization gain based on inverse of a pitch correlation
US10026411B2 (en) 2009-01-06 2018-07-17 Skype Speech encoding utilizing independent manipulation of signal and noise spectrum
WO2013056388A1 (en) * 2011-10-18 2013-04-25 Telefonaktiebolaget L M Ericsson (Publ) An improved method and apparatus for adaptive multi rate codec
EP2761616A4 (de) * 2011-10-18 2015-06-24 Ericsson Telefon Ab L M Verbessertes verfahren und vorrichtung für einen adaptiven multiraten-codec
FR3015754A1 (fr) * 2013-12-20 2015-06-26 Orange Re-echantillonnage d'un signal audio cadence a une frequence d'echantillonnage variable selon la trame
WO2015092229A3 (fr) * 2013-12-20 2015-11-19 Orange Ré-échantillonnage d'un signal audio cadencé à une fréquence d'échantillonnage variable selon la trame
US9940943B2 (en) 2013-12-20 2018-04-10 Orange Resampling of an audio signal interrupted with a variable sampling frequency according to the frame

Also Published As

Publication number Publication date
EP0450064B2 (de) 2006-08-09
MX167644B (es) 1993-03-31
DE69033510D1 (de) 2000-05-25
DK0450064T3 (da) 2000-10-02
EP0450064A1 (de) 1991-10-09
DK0450064T4 (da) 2006-09-04
CN1050633A (zh) 1991-04-10
JP3268360B2 (ja) 2002-03-25
SG47028A1 (en) 1998-03-20
EP0450064A4 (en) 1995-04-05
ES2145737T5 (es) 2007-03-01
EP0450064B1 (de) 2000-04-19
CN1026274C (zh) 1994-10-19
CA2037899A1 (en) 1991-03-02
AU634795B2 (en) 1993-03-04
ES2145737T3 (es) 2000-07-16
DE69033510T3 (de) 2007-06-06
DE69033510T2 (de) 2000-11-23
ATE191987T1 (de) 2000-05-15
JPH04502675A (ja) 1992-05-14
AU5952590A (en) 1991-04-08
CA2037899C (en) 1996-09-17

Similar Documents

Publication Publication Date Title
US5359696A (en) Digital speech coder having improved sub-sample resolution long-term predictor
EP0450064B1 (de) Numerischer sprachkodierer mit verbesserter langzeitvorhersage durch subabtastauflösung
JP2523031B2 (ja) 改良されたベクトル励起源を有するデジタル音声コ―ダ
RU2417457C2 (ru) Способ конкатенации кадров в системе связи
US6694292B2 (en) Apparatus for encoding and apparatus for decoding speech and musical signals
US5903866A (en) Waveform interpolation speech coding using splines
KR100304682B1 (ko) 음성 코더용 고속 여기 코딩
RU2679228C2 (ru) Передискретизация звукового сигнала для кодирования/декодирования с малой задержкой
JP2003512654A (ja) 音声の可変レートコーディングのための方法およびその装置
WO1994023426A1 (en) Vector quantizer method and apparatus
KR19980080463A (ko) 코드여기 선형예측 음성코더내에서의 벡터 양자화 방법
EP0415675B1 (de) Codierung unter Anwendung von beschränkter stochastischer Anregung
US5924061A (en) Efficient decomposition in noise and periodic signal waveforms in waveform interpolation
JP3070955B2 (ja) 音声符号器において使用するためのスペクトルノイズ重み付けフィルタを発生する方法
US7337110B2 (en) Structured VSELP codebook for low complexity search
JP3168238B2 (ja) 再構成音声信号の周期性を増大させる方法および装置
US6041298A (en) Method for synthesizing a frame of a speech signal with a computed stochastic excitation part
KR100341398B1 (ko) 씨이엘피형 보코더의 코드북 검색 방법
JP3749838B2 (ja) 音響信号符号化方法、音響信号復号方法、これらの装置、これらのプログラム及びその記録媒体
JPH05273998A (ja) 音声符号化装置
JP4007730B2 (ja) 音声符号化装置、音声符号化方法および音声符号化アルゴリズムを記録したコンピュータ読み取り可能な記録媒体
KR950001437B1 (ko) 음성부호화방법
Kao Thesis Report
JPH0588699A (ja) 音声駆動信号のベクトル量子化方式
Eng Pitch Modelling for Speech Coding at 4.8 kbitsls

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AU CA JP

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): AT BE CH DE DK ES FR GB IT LU NL SE

WWE Wipo information: entry into national phase

Ref document number: 2037899

Country of ref document: CA

WWE Wipo information: entry into national phase

Ref document number: 1991905041

Country of ref document: EP

WWP Wipo information: published in national office

Ref document number: 1991905041

Country of ref document: EP

ENP Entry into the national phase

Ref country code: CA

Ref document number: 2037899

Kind code of ref document: A

Format of ref document f/p: F

WWG Wipo information: grant in national office

Ref document number: 1991905041

Country of ref document: EP