WO1995006310A1 - Adaptive speech coder having code excited linear prediction - Google Patents

Publication number
WO1995006310A1
Authority
WO
WIPO (PCT)
Prior art keywords
speech
codevector
coder
transmission
signal
Prior art date
Application number
PCT/US1993/008095
Other languages
English (en)
French (fr)
Inventor
Harprit S. Chhatwal
Original Assignee
Pacific Communication Sciences, Inc.
Priority date
Filing date
Publication date
Application filed by Pacific Communication Sciences, Inc.
Priority to JP7507532A (JPH09506182A)
Priority to AU50951/93A (AU5095193A)
Priority to EP93920386A (EP0803117A1)
Priority to PCT/US1993/008095 (WO1995006310A1)
Publication of WO1995006310A1

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00: Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04: Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/08: Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters
    • G10L19/12: Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters the excitation function being a code excitation, e.g. in code excited linear prediction [CELP] vocoders
    • G10L2019/0001: Codebooks
    • G10L2019/0013: Codebook search algorithms

Definitions

  • the present invention relates to the field of speech coding, and more particularly, to improvements in the field of adaptive coding of speech or voice signals wherein code excited linear prediction (CELP) techniques are utilized.
  • CELP: code excited linear prediction
  • Digital telecommunication carrier systems have existed in the United States since approximately 1962 when the T1 system was introduced. This system utilized a 24-voice channel digital signal transmitted at an overall rate of 1.544 Mb/s. In view of cost advantages over existing analog systems, the T1 system became widely deployed.
  • An individual voice channel in the T1 system was typically generated by band limiting a voice signal in a frequency range from about 300 to 3400 Hz, sampling the limited signal at a rate of 8 kHz, and thereafter encoding the sampled signal with an 8 bit logarithmic quantizer.
  • the resultant digital voice signal was a 64 kb/s signal.
  • 24 individual digital voice signals were multiplexed into a single data stream.
  • the T1 system is limited to 24 voice channels if 64 kb/s voice signals are used.
  • the individual signal transmission rate must be reduced from 64 kb/s to some lower rate.
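  • The rate arithmetic above can be checked with a minimal sketch. The constants come from the text; the 16 kb/s and 8 kb/s per-channel rates in the usage lines are purely hypothetical illustrations, and the remainder between the overall rate and the voice payload is simply computed, not something the text discusses.

```python
# Rates quoted in the text.
SAMPLE_RATE_HZ = 8_000      # sampling rate of each voice channel
BITS_PER_SAMPLE = 8         # 8 bit logarithmic quantizer
CHANNELS = 24               # voice channels in one T1 stream
T1_RATE_BPS = 1_544_000     # overall T1 rate

channel_rate_bps = SAMPLE_RATE_HZ * BITS_PER_SAMPLE   # per-channel rate
payload_bps = CHANNELS * channel_rate_bps             # rate used by voice samples
remainder_bps = T1_RATE_BPS - payload_bps             # what is left of the overall rate


def channels_at(rate_bps: int) -> int:
    """Voice channels that fit in the same payload at a lower per-channel rate."""
    return payload_bps // rate_bps


# Hypothetical reduced rates, for illustration only.
print(channel_rate_bps, payload_bps, remainder_bps)
print(channels_at(16_000), channels_at(8_000))
```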
  • the problem with lowering the transmission rate in the typical T1 voice signal generation scheme, by either reducing the sampling rate or reducing the size of the quantizer, is that certain portions of the voice signal essential for accurate reproduction of the original speech are lost.
  • TC: transform coding
  • ATC: adaptive transform coding
  • LPC: linear prediction coding
  • a speech signal is divided into sequential blocks of speech samples.
  • the samples in each block are arranged in a vector and transformed from the time domain to an alternate domain, such as the frequency domain.
  • each block of speech samples is analyzed in order to determine the linear prediction coefficients for that block and other information such as long term predictors (LTP).
  • Linear prediction coefficients are equation components which reflect certain aspects of the spectral envelope associated with a particular block of speech signal samples. Such spectral information represents the dynamic properties of speech, namely formants.
  • Speech is produced by generating an excitation signal which is either periodic (voiced sounds), aperiodic (unvoiced sounds), or a mixture (e.g. voiced fricatives).
  • the periodic component of the excitation signal is known as the pitch.
  • the excitation signal is filtered by a vocal tract filter, determined by the position of the mouth, jaw, lips, nasal cavity, etc. This filter has resonances or formants which determine the nature of the sound being heard.
  • the vocal tract filter provides an envelope to the excitation signal. Since this envelope contains the filter formants, it is known as the formant or spectral envelope. It is this spectral envelope which is reflected in the linear prediction coefficients.
  • Long Term Predictors are filters reflective of redundant pitch structure in the speech signal.
  • Such structure is removed by estimating the LTP values for each block and subtracting those values from current signal values.
  • the removal of such information permits the speech signal to be converted to a digital signal using fewer bits.
  • the LTP values are transmitted separately and added back to the remaining speech signal at the receiver.
  • a generalized prior art LPC vocoder is shown in Fig. 1.
  • the device shown converts transmitted digital signals into synthesized voice signals, i.e., blocks of synthesized speech samples.
  • a synthesis filter utilizing the LPCs determined for a given block of samples, produces a synthesized speech output by filtering the excitation signal in relation to the LPCs.
  • Both the synthesis filter coefficients (LPCs) and the excitation signal are updated for each sample block or frame (i.e. every 20-30 milliseconds).
  • the excitation signal can be either a periodic excitation signal or a noise excitation signal.
  • synthesized speech produced by an LPC vocoder can be broken down into three basic elements:
  • the speech signal has a definite pitch period (or periodicity) and this is accounted for by the periodic excitation signal which is composed largely of pulses spaced at the pitch period (determined from the LTP);
  • the speech signal is much more like random noise and has no periodicity and this is provided for by the noise excitation signal.
  • a switch controls which form of excitation signal is fed to the synthesis filter.
  • the gain controls the actual volume level of the output speech.
  • Both types of excitation (2) and (3) are, therefore, very different in the time domain (one being made up of equally spaced pulses while the other is noise-like) but both have the common property of a flat spectrum in the frequency domain. The correct spectral shape will be provided at the output of the synthesis by the LPCs.
  • An LPC vocoder requires the transmission of only the LPCs and the excitation information, i.e., whether the switch provides periodic or noise-like excitation to the speech synthesizer. Consequently, a reduced bit rate can be used to transmit speech signals processed in an LPC vocoder.
  • CELP vocoders overcome this problem by leaving ON both the periodic and noise-like signals at the same time.
  • the degree to which each of these signals makes up the excitation signal (e(n)) for provision to the synthesis filter is determined by separate gains which are assigned to each of the two excitations.
  • $e(n) = \beta\, p(n) + g\, c(n)$ (1)
  • p(n): pulse-like periodic component
  • c(n): noise-like component
  • $\beta$: gain for periodic component
  • g: gain for noise component
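  • Equation (1) can be illustrated with a minimal sketch. The frame length, the sample values and the two gains below are hypothetical values chosen only so the arithmetic is easy to follow; the function name is likewise an assumption.

```python
def mix_excitation(p, c, beta, g):
    """Return e(n) = beta * p(n) + g * c(n) for one frame."""
    if len(p) != len(c):
        raise ValueError("p and c must cover the same frame length")
    return [beta * pn + g * cn for pn, cn in zip(p, c)]


# Hypothetical 8-sample frame: pulses every 4 samples plus a noise-like sequence.
p = [1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
c = [0.5, -0.5, 0.5, 0.5, -0.5, 0.5, -0.5, -0.5]
e = mix_excitation(p, c, beta=0.75, g=0.25)
print(e)
```

  • Both components stay switched on; only the two gains decide how pulse-like or noise-like the resulting excitation is.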
  • During a coding operation in an LPC vocoder, the input speech is analyzed in a step-by-step manner to determine what the most likely value is for the pitch period of the input speech. The important point to note is that this decision about the best pitch period is final. There is no comparison made against other possible pitch periods.
  • the CELP vocoder has stored within it several hundred (or possibly several thousand) noise-like signals each of which is one frame long.
  • the CELP vocoder uses each of these noise-like signals, in turn, to synthesize output speech and chooses the one which produces the minimum error between the input and synthesized speech signals, i.e., another closed-loop procedure.
  • This stored set of noise-like signals is known as a codebook and the process of searching through each of the codebook signals in turn to find the best one is known as a codebook search.
  • the major advantage of the closed-loop CELP approach is that, at the end of the search, the best possible values have been chosen for a given input speech signal - leading to major improvements in speech quality.
  • CELP coding techniques require the transmission of only the LPC values, LTP values and the address of the chosen codebook signal. It is not necessary to transmit an excitation signal. Consequently, CELP coding techniques are particularly desirable to increase the number of voice channels in the T1 system.
  • the primary disadvantage with current CELP coding techniques is the amount of computing power required. In CELP coding it is necessary to search a large set of possible pitch values and codebook entries. The high complexity of the traditional CELP approach is only incurred at the transmitter since the receiver consists of just the simple synthesis structure shown in Fig. 2.
  • the present invention overcomes the need to perform traditional codebook searching. In order to understand the significance of such an improvement, it is helpful to review the traditional CELP coding techniques.
  • the general CELP speech signal conversion operation is shown in Fig. 3.
  • the order of conversion processes is as follows: (i) compute LPC coefficients, (ii) use LPC coefficients in determining LTP parameters (i.e. best pitch period and corresponding gain $\beta$), (iii) use LPC coefficients and LTP parameters in a codebook search to determine the codebook parameters (i.e. the best codeword c(n) and corresponding gain g).
  • the codebook signal c(n) can be represented in matrix form by an (N-by-1) vector c. This vector will have exactly the same elements as c(n) except in matrix form.
  • the operation of filtering c by the impulse response of the LPC synthesis filter A can be represented by the matrix multiple Ac. This multiple produces the same result as the signal y(n) in equation (3) for $\beta$ equal to zero.
  • r and e are the (N-by-1) vector representations of the signals r(n), e(n) (the ringing signal and the excitation signal) respectively.
  • the result is the same as equation (4) but now in matrix form. From equation (1) , the synthesized speech signal can be rewritten in matrix form as: s'
  • equation (6) can be rearranged as: $gAc \approx s - r - \beta A p$ (7)
  • the input speech signal s has the ringing vector r removed.
  • the LTP vector p (i.e. the pitch or periodic component p(n) of the excitation), filtered by the LPC synthesis filter to give Ap and scaled by the gain $\beta$, is also removed.
  • the resulting signal is the so-called target vector x which is approximated by the term gAc.
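  • The formation of the target vector in equation (7) can be sketched as follows. This assumes that A is the lower-triangular matrix built from the synthesis filter impulse response h, so that multiplying by A filters a vector within one frame; the short impulse response and the signal values are hypothetical.

```python
def synthesis_matrix(h, n):
    """Lower-triangular Toeplitz matrix A built from impulse response h (assumed form)."""
    return [[h[i - j] if 0 <= i - j < len(h) else 0.0 for j in range(n)]
            for i in range(n)]


def matvec(a, v):
    """Plain matrix-vector product."""
    return [sum(aij * vj for aij, vj in zip(row, v)) for row in a]


def target_vector(s, r, a, p, beta):
    """x = s - r - beta * A p, following equation (7)."""
    ap = matvec(a, p)
    return [sn - rn - beta * apn for sn, rn, apn in zip(s, r, ap)]


# Hypothetical 4-sample frame.
h = [1.0, 0.5, 0.25]
A = synthesis_matrix(h, 4)
s = [2.0, 1.0, 1.0, 0.0]      # input speech
r = [0.5, 0.25, 0.0, 0.0]     # ringing of the synthesis filter
p = [1.0, 0.0, 0.0, 0.0]      # periodic (LTP) component
x = target_vector(s, r, A, p, beta=0.5)
print(x)
```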
  • $C_i$ and $G_i$ are the important variables which must be computed.
  • the codebook is populated by many hundreds of possible vectors c. Consequently, it is desirable not to form Ac or $c^{T}A^{T}$ for each possible codebook vector.
  • the selected codebook vector is that vector associated with the largest value for: $C_i^2/G_i$
  • the correct gain g for a given codebook vector is given by: $g = C_i/G_i$
  • codebook search involves the following steps for each vector: scaling the vector; filtering the vector by long term predictor components to add pitch information to the vector; filtering the vector by short term predictors to add spectral information; subtracting the scaled and double filtered vector from the original speech signal and analyzing the answer to determine whether the best codebook vector has been chosen.
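  • A traditional exhaustive search of this kind can be sketched as below. It is an illustration under assumptions, not the prior art implementation: the pitch contribution is taken to be already removed from the target x, each codevector is filtered only by a short hypothetical impulse response h, and the score is the ratio $C^2/G$ with gain C/G.

```python
def filter_vector(h, c):
    """Zero-state filtering of c by impulse response h, cut at the frame end (equals A c)."""
    n = len(c)
    return [sum(h[k] * c[i - k] for k in range(min(i + 1, len(h)))) for i in range(n)]


def exhaustive_search(codebook, h, x):
    """Filter every codevector and keep the one with the largest C^2 / G."""
    best = None
    for index, c in enumerate(codebook):
        y = filter_vector(h, c)
        corr = sum(xn * yn for xn, yn in zip(x, y))   # C: correlation with the target
        energy = sum(yn * yn for yn in y)             # G: energy of the filtered vector
        if energy == 0.0:
            continue
        score = corr * corr / energy
        if best is None or score > best[0]:
            best = (score, index, corr / energy)
    if best is None:
        raise ValueError("no codevector produced a non-zero filtered output")
    return best[1], best[2]   # codebook address and gain


# Hypothetical three-entry codebook and target.
h = [1.0, 0.5]
codebook = [[1, 0, 0, 0], [0, 1, 0, 0], [1, -1, 0, 0]]
x = [0.0, 2.0, 1.0, 0.0]
address, gain = exhaustive_search(codebook, h, x)
print(address, gain)
```

  • Every entry costs one full filtering operation, which is the complexity the numerator and denominator precomputations are meant to avoid.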
  • the problems of the prior art are overcome and the advantages of the invention are achieved in an apparatus and method for speech coding in which analog speech signals are converted to digital speech signals for transmission.
  • the speech coder utilizing CELP techniques, includes a first filter for filtering out the spectral information from the speech signal. The spectral information is provided for transmission.
  • a second filter is provided for filtering out the pitch information from the speech signal and such pitch information is also provided for transmission.
  • a codevector generator determines, in one embodiment, the characteristics of a bi-pulse codevector representative of the speech signal. In this embodiment the impulse response of the first filter is truncated for determining the codevector characteristics.
  • the codevector generator includes a transformer for transforming codevector possibilities from being representative of pulse-like sound to being representative of noise-like sound. It is especially preferred for the transform to be a Hadamard transform. It is also preferred to scramble the transformed codevector to modify the sequency properties.
  • the bi-pulse codevector generator and the scrambled codevector generator are combined with a single pulse codevector generator. In such an embodiment, it is preferred to include a comparator for evaluating the characteristics determined by the three codebook generators and choosing the output of the one providing the best codebook vector.
  • Fig. 1 is a block diagram of a prior art generalized LPC vocoder
  • Fig. 2 is a block diagram of a prior art generalized CELP vocoder-receiver
  • Fig. 3 is a block diagram of a prior art generalized CELP vocoder-transmitter
  • Fig. 4 is a flow chart of a prior art CELP codebook search
  • Fig. 5 is a schematic view of an adaptive speech coder in accordance with the present invention
  • Fig. 6 is a general flow chart of those operations performed in the adaptive coder shown in Fig. 5, prior to transmission;
  • Fig. 7 is a flow chart of a codebook search technique in accordance with the present invention
  • Fig. 8 is a flow chart of another codebook search technique in accordance with the present invention
  • Fig. 9 is a flow chart of those operations performed in the adaptive transform coder shown in Fig. 5, subsequent to reception to perform speech synthesis.
  • the present invention is embodied in a new and novel apparatus and method for adaptive speech coding wherein rates have been significantly reduced.
  • the present invention enhances CELP coding for reduced transmission rates by providing more efficient methods for performing a codebook search.
  • An adaptive CELP coder constructed in accordance with the present invention is depicted in Fig. 5 and is generally referred to as 10.
  • the heart of coder 10 is a digital signal processor 12, which in the preferred embodiment is a TMS320C51 digital signal processor manufactured and sold by Texas Instruments, Inc. of Houston, Texas. Such a processor is capable of processing pulse code modulated signals having a word length of 16 bits.
  • Processor 12 is shown to be connected to three major bus networks, namely serial port bus 14, address bus 16, and data bus 18.
  • Program memory 20 is provided for storing the programming to be utilized by processor 12 in order to perform CELP coding techniques in accordance with the present invention. Such programming is explained in greater detail in reference to Figs. 6 through 9.
  • Program memory 20 can be of any conventional design, provided it has sufficient speed to meet the specification requirements of processor 12. It should be noted that the processor of the preferred embodiment (TMS320C51) is equipped with an internal memory. Data memory 22 is provided for the storing of data which may be needed during the operation of processor 12.
  • a clock signal is provided by conventional clock signal generation circuitry (not shown) to clock input 24.
  • the clock signal provided to input 24 is a 20 MHz clock signal.
  • a reset input 26 is also provided for resetting processor 12 at appropriate times, such as when processor 12 is first activated. Any conventional circuitry may be utilized for providing a signal to input 26, as long as such signal meets the specifications called for by the chosen processor.
  • Processor 12 is connected to transmit and receive telecommunication signals in two ways. First, when communicating with CELP coders constructed in accordance with the present invention, processor 12 is connected to receive and transmit signals via serial port bus 14.
  • Channel interface 28 is provided in order to interface bus 14 with the compressed voice data stream. Interface 28 can be any known interface capable of transmitting and receiving data in conjunction with a data stream operating at the prescribed transmission rate.
  • Second, when communicating with existing 64 kb/s channels or with analog devices, processor 12 is connected to receive and transmit signals via data bus 18.
  • Converter 30 is provided to convert individual 64 kb/s channels appearing at input 32 from a serial format to a parallel format for application to bus 18. As will be appreciated, such conversion is accomplished utilizing known codecs and serial/parallel devices which are capable of use with the types of signals utilized by processor 12.
  • processor 12 receives and transmits parallel 16 bit signals on bus 18.
  • an interrupt signal is provided to processor 12 at input 34.
  • analog interface 36 serves to convert analog signals by sampling such signals at a predetermined rate for presentation to converter 30.
  • interface 36 converts the sampled signal from converter 30 to a continuous signal.
  • Adaptive speech coding for transmission of telecommunications signals in accordance with the CELP techniques of the present invention is shown in Fig. 6.
  • Telecommunication signals to be coded and transmitted appear on bus 18 and are presented to input buffer 40.
  • Such telecommunication signals are sampled signals made up of 16 bit PCM representations of each sample where sampling occurs at a frequency of 8 kHz. For purposes of the present description, assume that a voice signal sampled at 8 kHz is to be coded for transmission.
  • Buffer 40 accumulates a predetermined number of samples into a sample block.
  • LPCs are determined for each block of speech samples at 42.
  • the technique for determining the LPCs can be any desired technique such as that described in U.S. Patent No. 5,012,517 - Wilson et al. , incorporated herein by reference. It is noted that the cited U.S. Patent concerns adaptive transform coding, however, the techniques described for determining LPCs are applicable to the present invention.
  • the determined LPCs are formatted for transmission as side information at 44.
  • the determined LPCs are also provided for LTP processing at 46, particularly to form the LPC synthesis filter.
  • LTPs are determined for each block of speech samples at 46.
  • the periodicity or pitch based information can be determined through the use of any known technique such as that described previously.
  • the fundamental prerequisite for deriving an LTP filter is the calculation of a precise pitch or fundamental frequency estimate.
  • the determined LTPs are also formatted for transmission as side information.
  • the ringing vector associated with the synthesis filter is removed from the speech signal and the vector p (representative of LTP pitch information) is removed from the speech signal in accordance with equation (7) , thereby forming the target vector x.
  • the so-modified speech signal is thereafter provided for codebook searching in accordance with the present invention.
  • Three forms of codebook searching are performed in the present invention, namely, bi-pulse searching at 50, scrambled searching at 52 and single pulse searching at 54.
  • Consider first the bi-pulse searching technique shown in Fig. 7. It will be recalled that codebooks can be populated by many hundreds of possible vectors c. Since it is not desirable to form Ac or $c^{T}A^{T}$ for each possible vector, two variables are precomputed before the codebook search: the (N-by-1) vector d and the (N-by-N) matrix F (equation 9). The process of pre-forming d by backward filtering is performed at 60.
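  • The two precomputed quantities can be sketched as follows. Equation (9) itself is not reproduced in this text, so the forms used here, $d = A^{T}x$ and $F = A^{T}A$, are assumptions based on the usual CELP formulation and on the description of d as a backward filtering of the target; the impulse response and target values are hypothetical.

```python
def backward_filter(h, x):
    """d = A^T x (assumed form): correlate the target x with the impulse response h."""
    n = len(x)
    return [sum(h[k] * x[j + k] for k in range(min(len(h), n - j))) for j in range(n)]


def autocorrelation_matrix(h, n):
    """F = A^T A (assumed form); column j of A is h delayed by j samples, cut at the frame end."""
    cols = [[h[i - j] if 0 <= i - j < len(h) else 0.0 for i in range(n)]
            for j in range(n)]
    return [[sum(a * b for a, b in zip(cols[i], cols[j])) for j in range(n)]
            for i in range(n)]


# Hypothetical 4-sample frame.
h = [1.0, 0.5]
x = [0.0, 2.0, 1.0, 0.0]
d = backward_filter(h, x)
F = autocorrelation_matrix(h, 4)
print(d)
print(F)
```

  • Both quantities depend only on the frame, not on the codevector, so they are computed once per frame and reused for every candidate.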
  • Two major requirements on codebook vectors c are (i) that they have a flat frequency spectrum (since they will be shaped into the correct form for each particular sound by the synthesis filter) and (ii) that each codeword is sufficiently different from every other so that entries in the codebook are not wasted by having several that are almost identical.
  • all the entries in the codebook effectively consist of an (N-by-1) vector which is zero in all of its N samples except for two entries which are +1 and -1 respectively.
  • the preferred value of N is 64, however, in order to illustrate the principles of the invention, a smaller number of samples per vector is shown.
  • each codevector c is of the form:
  • This form of vector is called a bi-pulse vector since it has only two non-zero pulses.
  • This vector has the property of being spectrally flat as desired for codebook vectors. Since the +1 pulse can be in any of N possible positions and the -1 pulse can be in any one of (N-1) positions, the total number of combinations allowed is N(N-1). Since it is preferred that N equal 64, the potential size of the codebook is 4032 vectors. It is noted that use of a bi-pulse vector for the form of the codebook vector permits all the speech synthesis calculations to be made by knowing the positioning of the +1, -1 pulses in the codevector c. Since only position information is required, no codebook need be stored.
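  • A minimal sketch of why no codebook need be stored: a bi-pulse codevector is rebuilt from its two positions alone. The function names are hypothetical.

```python
def bipulse(n, i, j):
    """Codevector with +1 at position i and -1 at position j, zero elsewhere."""
    if i == j:
        raise ValueError("the two pulses must occupy different positions")
    c = [0] * n
    c[i], c[j] = 1, -1
    return c


def codebook_size(n):
    """Number of distinct (+1, -1) position pairs: N(N-1)."""
    return n * (n - 1)


print(codebook_size(64))
print(bipulse(5, 0, 3))
```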
  • $C_i = d_i - d_j$, $G_i = F_{ii} + F_{jj} - 2F_{ij}$ (11), where $d_i$ is the element i of the vector d, $d_j$ is the element j of the vector d and $F_{ij}$ is the element in row i and column j of the matrix F.
  • the search for the optimum codeword reduces to determining position information only, which in turn reduces to manipulating the values in the d vector and the F matrix in accordance with equation (11) .
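  • The position-only computation can be checked against the full vector products with a short sketch. It assumes that the general forms are $C = c^{T}d$ and $G = c^{T}Fc$ with F symmetric; the values of d and F are hypothetical.

```python
def bipulse_terms(d, f, i, j):
    """C and G for a bi-pulse with +1 at i and -1 at j, using only position lookups."""
    c_val = d[i] - d[j]
    g_val = f[i][i] + f[j][j] - 2.0 * f[i][j]
    return c_val, g_val


def direct_terms(d, f, c):
    """Reference computation: C = c^T d and G = c^T F c (assumed general forms)."""
    n = len(c)
    c_val = sum(ck * dk for ck, dk in zip(c, d))
    g_val = sum(c[a] * f[a][b] * c[b] for a in range(n) for b in range(n))
    return c_val, g_val


# Hypothetical frame of 4 samples with a symmetric F.
d = [1.0, 2.5, 1.0, 0.0]
F = [[1.25, 0.5, 0.0, 0.0],
     [0.5, 1.25, 0.5, 0.0],
     [0.0, 0.5, 1.25, 0.5],
     [0.0, 0.0, 0.5, 1.0]]
fast = bipulse_terms(d, F, i=1, j=3)
slow = direct_terms(d, F, [0, 1, 0, -1])
print(fast, slow)
```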
  • the original impulse response is chopped off after a certain number of samples, NTRUNC. Therefore, the energy produced by the filtered vector Ac will now be mostly concentrated in this frame wherever the pulses happen to be. It is presently preferred for the value of NTRUNC to be 8.
  • Precomputing the (N-by-N) matrix F (equation 9) is performed at 64.
  • the full response computation is used for the gain calculation since, although the truncated impulse response evens up the chances of all pulse positions being picked for a particular frame, the values of $C_i$, $G_i$ produced by the bi-pulse process are not quite "exact" in the sense that they no longer exactly minimize the error between the gain-scaled filtered codevector gAc and the target vector x. Therefore, the un-truncated response must be used to compute the value of the gain g which does actually minimize this error.
  • $C_i^2/G_i$ and $C_i/G_i$ were also used in traditional codebook searching in order to find the best codeword and the appropriate gain. By use of the present invention, these values are calculated more quickly. However, the time necessary to calculate the best codebook vector and the efficiency of such calculations can be improved even further.
  • Recall that N = 64. Consequently, even the simplified truncated search described above still requires the computation of $C_i$, $G_i$ for N(N-1) or 4,032 vectors and this would be prohibitive in terms of the processing power required. In the present invention only a very small subset of these possible codewords is searched. This reduced search yields almost identical performance to the full codebook search.
  • Equation (10) for Gi then becomes: w-i
  • the codebook search procedure just consists of scanning the d vector for its largest positive component which reveals i (the position of the +1 within the codebook vector c) and the largest negative component which reveals j (the position of the -1 within the codebook vector c) .
  • the numerator only search is much simpler than the alternative of computing $C_i$, $G_i$ for each codevector. However, it relies on the assumption that $G_i$ remains constant for all pulse positions and this assumption is only approximately valid, especially if the +1, -1 pulses are close together.
  • Accordingly, the NDBUF largest positive elements of d, {d(i_max_k)}, and the NDBUF largest negative elements of d, {d(j_min_l)}, are identified. The assumption is now made that, even allowing for the slight variation in $G_i$ with pulse position, the "best" codeword will still come from the pulse positions corresponding to these two sets.
  • this numerator only search to select NDBUF largest positive elements and NDBUF largest negative elements is performed at 66.
  • the energy value E is set to zero at 68.
  • $C_i$, $G_i$ can now be computed at 70, 72 from the following modification of equation (11):
  • $C_i = d(i\_max_k) - d(j\_min_l)$
  • $G_i = F(i\_max_k, i\_max_k) + F(j\_min_l, j\_min_l) - 2F(i\_max_k, j\_min_l)$ (16), where F(i,j) is the element in row i, column j of the matrix F.
  • the maximum $C_i^2/G_i$ is determined in the loop including 70, 72, 74, 76 and 78.
  • $C_i$, $G_i$ are computed at 72.
  • the value of E or $C_i^2/G_i$ is compared to the recorded value of E at 74. If the new value of E exceeds the recorded value, the new values of E, g and c are recorded at 76.
  • the complexity reduction process of doing a numerator-only search has the effect of winnowing down the number of codevectors to be searched from approximately 4000 to around 25 by calculating the largest set of $C_i$ values based on the assumption that $G_i$ is approximately constant. For each of these 25, both $C_i$, $G_i$ (using the truncated impulse response) are then computed and the best codeword (position of +1 and -1) is found. For this one best codeword, the un-truncated impulse response is then used to compute the codebook gain g at 80. Both positions i and j as well as the gain g are provided for transmission.
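  • The reduced search can be sketched as below. This is a simplified illustration, not the flow chart of Fig. 7: candidates are taken as the ndbuf largest and ndbuf smallest elements of d (which coincide with the largest positive and largest negative elements when d has enough of each sign), F is assumed to be already computed from the truncated impulse response, and the final gain computation with the un-truncated response is left out. The values of d, F and ndbuf are hypothetical; an ndbuf of 5 would give the 25 candidates mentioned above.

```python
def reduced_bipulse_search(d, f, ndbuf):
    """Pick the bi-pulse (i, j) maximizing C^2/G among at most ndbuf * ndbuf candidates."""
    order = sorted(range(len(d)), key=lambda k: d[k])
    j_candidates = order[:ndbuf]            # smallest (most negative) elements of d
    i_candidates = order[::-1][:ndbuf]      # largest (most positive) elements of d
    best_e, best = 0.0, None                # E starts at zero, as at step 68
    for i in i_candidates:
        for j in j_candidates:
            if i == j:
                continue
            c_val = d[i] - d[j]
            g_val = f[i][i] + f[j][j] - 2.0 * f[i][j]
            if g_val <= 0.0:
                continue
            e = c_val * c_val / g_val
            if e > best_e:                  # record only when E improves
                best_e, best = e, (i, j)
    return best, best_e


# Hypothetical 6-sample frame with F equal to the identity matrix.
d = [0.5, 3.0, -2.0, 0.1, -1.0, 2.0]
F = [[1.0 if a == b else 0.0 for b in range(6)] for a in range(6)]
positions, energy = reduced_bipulse_search(d, F, ndbuf=2)
print(positions, energy)
```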
  • Unvoiced sounds can be classified into two definite types.
  • for plosives (e.g. t, p, k), the speech waveform resembles a sharp pulse which quickly decays to almost zero.
  • the bi-pulse codebook described above is very effective at representing these signals since it itself consists of pulses.
  • the other class of unvoiced signals is the fricatives (e.g. s, sh, f) which have a speech waveform which resembles random noise.
  • This type of signal is not well modeled by the sequence of pulses produced by the bi-pulse codebook and the effect of using bi-pulses on these signals is the introduction of a very coarse raspiness to the output speech.
  • the ideal solution would be to take the bi-pulse codebook vectors and transform them in some way such that they produced noise-like waveforms. Such an operation has the additional constraint that the transformation be easy to compute since this computation will be done many times in each frame.
  • the transformation of the preferred embodiment is achieved using the Hadamard Transform. While the Hadamard Transform is known, its use for the purpose described below is new.
  • the Hadamard transform is associated with an (N-by-N) transform matrix H which operates on the codebook vector c.
  • the transformed codevector c' will have elements which have one of the three values 0, -2, +2. The proportion of these three values occurring within c' will be 1/2, 1/4, 1/4 respectively.
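  • The ternary property can be observed with a small sketch. It assumes the natural-ordered (Sylvester) Hadamard matrix and an unnormalized fast transform, neither of which is specified in the text; the pulse positions are hypothetical and deliberately avoid index 0, whose Hadamard column is constant in this ordering.

```python
def fwht(v):
    """Fast Walsh-Hadamard transform (natural order, unnormalized); length must be a power of two."""
    out = list(v)
    n = len(out)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for k in range(start, start + h):
                a, b = out[k], out[k + h]
                out[k], out[k + h] = a + b, a - b
        h *= 2
    return out


# Hypothetical bi-pulse vector for N = 64: +1 at position 3, -1 at position 40.
N = 64
c = [0] * N
c[3], c[40] = 1, -1
c_prime = fwht(c)                       # c' = H c
counts = {value: c_prime.count(value) for value in (0, 2, -2)}
print(counts)
```

  • Because c has only two non-zero entries, c' is the difference of two columns of H, which is why only the values 0, +2 and -2 can occur.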
  • This form of codevector is called a ternary codevector (since it assumes three distinct values) . While ternary vectors have been used in traditional random CELP codebooks, the ternary vector processing of the invention is new.
  • the transform matrix H has a very wide range of sequencies within its columns. Since c' is composed of a combination of columns of H as in equation (19), the vector c' will have similar sequency properties to H in the respect that in some speech frames there will be many changes of sign within c' while other frames will have c' vectors with relatively few changes. The actual sequency will depend on the +1, -1 pulse positions within c.
  • a high sequency c' vector has the frequency transform characteristic of being dominated by lots of energy at high frequencies while a low sequency c' has mainly low frequency components.
  • the effect of this wide range of sequency is that there are very rapid changes in the frequency content of the output speech from one frame to the next. This has the effect of introducing a warbly, almost underwater effect to the synthesized speech.
  • the preferred 64 diagonal values for the scrambling matrix S are as follows: -1, -1, -1, -1, -1, -1, 1, -1, 1, 1, -1, -1, -1, 1, 1, 1, -1, -1, 1, 1, 1, 1, 1, 1, -1, -1, 1, 1, 1, 1, 1, 1, 1, -1, 1, 1, 1, 1, 1, -1, -1, 1, 1, -1, 1, -1, -1, 1, -1, 1, -1, 1, -1, 1, -1, 1, -1, 1, -1, 1, -1, 1, -1, 1, -1, -1, 1, 1, 1, 1, -1, 1, -1, 1, -1, 1, -1, 1, -1, 1, -1, 1, -1, 1, -1, -1, 1, -1, -1, -1.
  • $C_i = c''^{T} A^{T} x$
  • This computation is made up of three stages: (i) the calculation of $A^{T}x$ is just the backward filtering operation described above, (ii) the multiplication by the scrambling matrix S is trivial since it just involves inverting the sign of certain entries (it will be noted that only the +1, -1 entries in S need be stored in memory rather than the whole (N-by-N) matrix), (iii) the Hadamard transform can be computed efficiently by fast algorithms. Once d'' has been computed, all that remains is to compute $C_i$ from:
  • $C_i = c^{T} d''$ (25), where c is still the bi-pulse vector. This is exactly the same as equation (10) with d being replaced by d'' and so the same principles used to simplify the search for the bi-pulse codebook are also used with this scrambled Hadamard codebook.
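  • The three-stage computation can be sketched and checked numerically. The sketch assumes that the scrambled codevector is c'' = S H c with S diagonal and H the symmetric natural-ordered Hadamard matrix, so that d'' = H (S d) with d the backward-filtered target; the frame length of 8, the sign pattern and the values of d are hypothetical and are not the preferred values of the text.

```python
def fwht(v):
    """Fast Walsh-Hadamard transform (natural order, unnormalized); length must be a power of two."""
    out = list(v)
    n = len(out)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for k in range(start, start + h):
                a, b = out[k], out[k + h]
                out[k], out[k + h] = a + b, a - b
        h *= 2
    return out


def scrambled_correlation(d, signs):
    """d'' = H (S d): flip signs with the diagonal of S, then Hadamard-transform."""
    return fwht([s * dk for s, dk in zip(signs, d)])


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


# Hypothetical values for an 8-sample frame.
d = [3, -1, 4, 1, -5, 9, 2, -6]            # backward-filtered target
signs = [1, -1, -1, 1, 1, -1, 1, -1]       # diagonal of S
c = [0, 1, 0, 0, 0, -1, 0, 0]              # bi-pulse: +1 at 1, -1 at 5

d2 = scrambled_correlation(d, signs)
c2 = [s * v for s, v in zip(signs, fwht(c))]   # c'' = S H c
c_direct = dot(c2, d)                          # correlation using the scrambled codevector
c_fast = dot(c, d2)                            # same value from position lookups in d''
print(c_direct, c_fast)
```

  • The second form needs only d''[i] - d''[j], so the scrambled codevector never has to be formed during the search.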
  • the numerator-only search can be employed to reduce the number of codebook entries searched from
  • $G_i = y''^{T} y''$ (26), which is just the correlation of this filtered signal y'' with itself.
  • the scrambled codevector c' ' is formed at 90 and filtered through the LPC synthesis filter to form y' ' at 92.
  • the single pulse codebook is made up of vectors that are zero in every sample except one which has a +1 value.
  • This codebook is not only similar in form to the bi-pulse codebook but also in its computational details. Consequently, a flow chart similar to that shown in Fig. 7 has not been shown. If the +1 value occurs in row k of the codeword c, the values $C_i$, $G_i$ are now computed as:
  • $C_i = d_k$
  • this codebook is identical to the bi-pulse codebook so that the concepts of a truncated impulse response for the codebook search and a numerator-only search are again utilized.
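  • A single pulse search can be sketched in the same way. The numerator $C = d_k$ is taken from the text; the denominator $G = F_{kk}$ is an assumption that follows from the quadratic form $c^{T}Fc$ when c has a single +1 entry, since the corresponding expression is not reproduced here. The values of d and the diagonal of F are hypothetical.

```python
def single_pulse_search(d, f_diag):
    """Return (k, gain) maximizing d[k]^2 / F[k][k] for a single +1 pulse at position k."""
    best_k = max(range(len(d)), key=lambda k: d[k] * d[k] / f_diag[k])
    return best_k, d[best_k] / f_diag[best_k]


# Hypothetical 4-sample frame.
d = [1.0, 2.5, 1.0, 0.0]
f_diag = [1.25, 1.25, 1.25, 1.0]    # diagonal of F
k, gain = single_pulse_search(d, f_diag)
print(k, gain)
```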
  • the reason for the modification is that the scrambled Hadamard codebook (SHC) was designed to operate well for fricative unvoiced sounds (e.g. s, f, sh).
  • the speech waveforms associated with these sounds are best described as being made up of a noise-like waveform with occasional large spikes/pulses.
  • the bi-pulse codebook will represent these spikes very well but not the noise component, while the SHC will model the noise component but perform relatively poorly on the spikes.
  • $C_i^2/G_i$. Since the maximization of $C_i^2/G_i$ is associated with the minimization of a squared error between input and synthesized speech signals, an error at the spikes is weighted very heavily in the total error, and so the SHC will occasionally produce large squared errors even for fricative speech inputs.
  • the squared error is not necessarily the best error criterion since the ear itself is sensitive to signals on a dB (or log) scale which gives small signals a larger importance relative to larger signals than a squared error criterion would imply. This means that, even if choosing the SHC would be the best decision perceptually, the squared error criterion may not come to the same final choice. Therefore, it is necessary to artificially weigh the decision at 102 in Fig.
  • a receiver constructed in accordance with the present invention is disclosed. It is noted that Fig. 9, similar to Fig. 6, is representative of programming used in conjunction with device 10 shown in Fig. 5. Transmitted telecommunication signals appearing on bus 18 are first buffered at 120 to ensure that all of the bits associated with a single block are operated upon at substantially the same time. The buffered signals are thereafter deformatted at 122. LPC information is provided to synthesis filter 124. LTP information is provided to the periodic excitation generator 126. The output of generator 126 is multiplied by the long-term predictor gain at multiplier 128. The i and j information, together with the identification of the particular search method chosen at 100 in Fig. 5, is provided to codevector construction generator 130. The output of generator 130 is multiplied by the gain g at multiplier 132. The outputs of multipliers 128 and 132 are summed in summer 134. The summed signal is provided to synthesis filter 124 as the excitation signal.
  • the codevector will be a bi-pulse having a +1 at row i and a -1 at row j. If the scrambled search technique is used, the codevector c for the SHC can be readily formed, since the pulse positions are known. This vector is then transformed and scrambled. Thereafter it is gain-scaled at 132 and filtered at 124 to form the output speech vector $g\,A\,S\,H\,c$. If the single pulse method was used, the codevector c is likewise quick to construct.
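
The scrambled Hadamard search of equations (25) and (26) and the receiver-side reconstruction of the SHC codevector can be illustrated with a minimal Python sketch. It is not the patent's implementation: the block length of 8, the sign pattern `signs`, the truncated impulse response `h`, the target vector `x`, and all function names are hypothetical choices for illustration, and the least-squares gain g = C/G is the usual CELP choice rather than something stated in this passage. The sketch checks that the fast form (backward filter, sign scramble, fast Hadamard transform, then a two-entry difference for a bi-pulse) matches the direct correlation of the target with the filtered scrambled codevector.

```python
import random

def fwht(v):
    """Unnormalized fast Walsh-Hadamard transform (length must be a power of 2)."""
    out = list(v)
    n, step = len(out), 1
    while step < n:
        for start in range(0, n, 2 * step):
            for k in range(start, start + step):
                a, b = out[k], out[k + step]
                out[k], out[k + step] = a + b, a - b
        step *= 2
    return out

def synth_filter(c, h):
    """y = A c: zero-state filtering of c by the truncated impulse response h."""
    return [sum(h[m] * c[k - m] for m in range(min(k + 1, len(h))))
            for k in range(len(c))]

def backward_filter(x, h):
    """A^t x: correlation of the target x with the impulse response h."""
    n = len(x)
    return [sum(h[m - k] * x[m] for m in range(k, min(n, k + len(h))))
            for k in range(n)]

def shc_codevector(i, j, signs):
    """c'' = S H c for a bi-pulse c with +1 at row i and -1 at row j."""
    c = [0.0] * len(signs)
    c[i], c[j] = 1.0, -1.0
    return [s * t for s, t in zip(signs, fwht(c))]

random.seed(0)
N = 8                                                # hypothetical block length
signs = [random.choice((-1, 1)) for _ in range(N)]   # hypothetical diagonal of S
h = [1.0, 0.6, 0.3, 0.1]                             # hypothetical impulse response
x = [random.uniform(-1.0, 1.0) for _ in range(N)]    # hypothetical target vector

# Encoder side: d'' = H S A^t x, computed once per block.
d2 = fwht([s * b for s, b in zip(signs, backward_filter(x, h))])

i, j = 2, 5
C_fast = d2[i] - d2[j]                  # equation (25) for a bi-pulse c
c2 = shc_codevector(i, j, signs)        # scrambled codevector c''
y2 = synth_filter(c2, h)                # filtered codevector y''
C_direct = sum(a * b for a, b in zip(x, y2))
G = sum(v * v for v in y2)              # equation (26)

# Receiver side: rebuild c'' from (i, j), gain-scale, then filter: g A S H c.
g = C_direct / G                        # assumed least-squares gain
decoded = synth_filter([g * v for v in c2], h)
```

Because the Hadamard matrix is symmetric and S is diagonal, the per-entry cost of the search drops to one subtraction once d'' is available, which is the point of reusing the bi-pulse search structure.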

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)
PCT/US1993/008095 1993-08-27 1993-08-27 Adaptive speech coder having code excited linear prediction WO1995006310A1 (en)

Priority Applications (4)

Application Number Priority Date Filing Date Title
JP7507532A JPH09506182A (ja) 1993-08-27 1993-08-27 Adaptive speech coder having code excited linear prediction
AU50951/93A AU5095193A (en) 1993-08-27 1993-08-27 Adaptive speech coder having code excited linear prediction
EP93920386A EP0803117A1 (en) 1993-08-27 1993-08-27 Adaptive speech coder having code excited linear prediction
PCT/US1993/008095 WO1995006310A1 (en) 1993-08-27 1993-08-27 Adaptive speech coder having code excited linear prediction

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/US1993/008095 WO1995006310A1 (en) 1993-08-27 1993-08-27 Adaptive speech coder having code excited linear prediction

Publications (1)

Publication Number Publication Date
WO1995006310A1 true WO1995006310A1 (en) 1995-03-02

Family

ID=22236901

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US1993/008095 WO1995006310A1 (en) 1993-08-27 1993-08-27 Adaptive speech coder having code excited linear prediction

Country Status (4)

Country Link
EP (1) EP0803117A1 (ja)
JP (1) JPH09506182A (ja)
AU (1) AU5095193A (ja)
WO (1) WO1995006310A1 (ja)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1999041737A1 (en) * 1998-02-17 1999-08-19 Motorola Inc. Method and apparatus for high speed determination of an optimum vector in a fixed codebook
CN101609677B (zh) * 2009-03-13 2012-01-04 华为技术有限公司 Preprocessing method, apparatus and coding device

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5119423A (en) * 1989-03-24 1992-06-02 Mitsubishi Denki Kabushiki Kaisha Signal processor for analyzing distortion of speech signals
US5138662A (en) * 1989-04-13 1992-08-11 Fujitsu Limited Speech coding apparatus
US5224167A (en) * 1989-09-11 1993-06-29 Fujitsu Limited Speech coding apparatus using multimode coding

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH0451200A (ja) * 1990-06-18 1992-02-19 Fujitsu Ltd Speech coding system
US5457783A (en) * 1992-08-07 1995-10-10 Pacific Communication Sciences, Inc. Adaptive speech coder having code excited linear prediction

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP0803117A4 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1999041737A1 (en) * 1998-02-17 1999-08-19 Motorola Inc. Method and apparatus for high speed determination of an optimum vector in a fixed codebook
US6807527B1 (en) 1998-02-17 2004-10-19 Motorola, Inc. Method and apparatus for determination of an optimum fixed codebook vector
CN101609677B (zh) * 2009-03-13 2012-01-04 华为技术有限公司 Preprocessing method, apparatus and coding device
US8566085B2 (en) 2009-03-13 2013-10-22 Huawei Technologies Co., Ltd. Preprocessing method, preprocessing apparatus and coding device
US8831961B2 (en) 2009-03-13 2014-09-09 Huawei Technologies Co., Ltd. Preprocessing method, preprocessing apparatus and coding device

Also Published As

Publication number Publication date
JPH09506182A (ja) 1997-06-17
EP0803117A1 (en) 1997-10-29
EP0803117A4 (ja) 1997-10-29
AU5095193A (en) 1995-03-21

Similar Documents

Publication Publication Date Title
US5457783A (en) Adaptive speech coder having code excited linear prediction
US5717824A (en) Adaptive speech coder having code excited linear predictor with multiple codebook searches
EP0573216B1 (en) CELP vocoder
US5255339A (en) Low bit rate vocoder means and method
EP0409239B1 (en) Speech coding/decoding method
EP0470975B1 (en) Methods and apparatus for reconstructing non-quantized adaptively transformed voice signals
EP0360265B1 (en) Communication system capable of improving a speech quality by classifying speech signals
US6006174A (en) Multiple impulse excitation speech encoder and decoder
WO1980002211A1 (en) Residual excited predictive speech coding system
JPH0668680B2 (ja) Improved multi-pulse linear predictive coding speech processor
EP0390975B1 (en) Encoder Device capable of improving the speech quality by a pair of pulse producing units
US4945565A (en) Low bit-rate pattern encoding and decoding with a reduced number of excitation pulses
EP0374941B1 (en) Communication system capable of improving a speech quality by effectively calculating excitation multipulses
JP2000155597A (ja) Speech coding method for use in a digital speech coder
US5839098A (en) Speech coder methods and systems
US5673361A (en) System and method for performing predictive scaling in computing LPC speech coding coefficients
US5235670A (en) Multiple impulse excitation speech encoder and decoder
WO1995006310A1 (en) Adaptive speech coder having code excited linear prediction
EP0573215A2 (en) Vocoder synchronization
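JP2946528B2 (ja) Speech coding and decoding method and apparatus therefor
JP2003323200A (ja) Gradient descent optimization of linear prediction coefficients for speech coding
JP2615862B2 (ja) Speech coding and decoding method and apparatus therefor
KR100205060B1 (ko) Pitch search method for a CELP vocoder using regular pulse excitation
JP3271966B2 (ja) Coding apparatus and coding method
JP3035960B2 (ja) Speech coding and decoding method and apparatus therefor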

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AU BB BG BR BY CA CZ FI HU JP KP KR KZ LK MG MN MW NO NZ PL RO RU SD SK UA VN

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): AT BE CH DE DK ES FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN ML MR NE SN TD TG

121 EP: the EPO has been informed by WIPO that EP was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (PCT application filed before 20040101)
WWE WIPO information: entry into national phase

Ref document number: 1993920386

Country of ref document: EP

NENP Non-entry into the national phase

Ref country code: CA

WWP WIPO information: published in national office

Ref document number: 1993920386

Country of ref document: EP

WWW WIPO information: withdrawn in national office

Ref document number: 1993920386

Country of ref document: EP