EP0203940A1

EP0203940A1 - Relp vocoder implemented in digital signal processors

Info

Publication number: EP0203940A1
Application number: EP85905709A
Authority: EP
Inventors: Philip John Wilson
Original assignee: Hughes Network Systems LLC; MA Com Government Systems Inc
Current assignee: Hughes Network Systems LLC; MA Com Government Systems Inc
Priority date: 1984-11-01
Filing date: 1985-11-01
Publication date: 1986-12-10
Also published as: CA1240396A; AU5019885A; WO1986002726A1; DK311386A; NO862602L; EP0203940A4; DK311386D0; JPS63500896A; AU577641B2; NO862602D0

Abstract

A RELP (residual-excited linear prediction) vocoder is implemented in two digital signal processors, one for a transmitter system (Fig. 1) and the other for a remotely located receiver system (Fig. 2). The transmitter processor processes digital speech data signal samples to provide a formatted transmission signal comprising (a) a quantized residual signal generated by inverse filtering the samples in accordance with linear predictive coding (LPC) coefficients generated from the samples, (b) quantized LPC coefficients, and (c) pitch and gain parameters generated during quantization of the residual signal from the inverse-filtered samples; all of these are generated by the processor from the digital speech data signal samples. The receiver digital signal processor processes the formatted transmission signal to synthesize reconstructed digital speech data signal samples. Transmitter and receiver systems located at the same site may be included in a single digital signal processor.

Description

RELP VOCODER IMPLEMENTED IN DIGITAL SIGNAL PROCESSORS

BACKGROUND OF THE INVENTION

The present invention generally pertains to voice coders (vocoders) and is particularly directed to Residual-Excited Linear Prediction (RELP) vocoders. Vocoders convert speech signals into digital form for transmission and synthesize speech signals from these digital signals upon reception. Vocoders typically operate at flexible binary data rates varying from 32 kbps (kilobits per second) down to about 2.4 kbps.

Vocoders traditionally are divided into two basic types, waveform coders and pitch-excited source coders. Waveform coders operate at high data rates (above 16 kbps) and produce good quality, natural sounding speech which is robust against both acoustic and transmitted noise. Source coders operate at low data rates (less than 4.8 kbps) in an analysis/synthesis mode governed by a mathematical model of the human vocal apparatus. Source vocoders typically sound robotic and do not perform well under poor acoustic conditions.

The RELP vocoder was originally proposed by Un and Magill, "The Residual-Excited Linear Prediction Vocoder with Transmission Rate Below 9.6 kbits/s", IEEE Trans. COM-23, 1975, pp. 1466-1473; and an enhanced RELP vocoder was proposed by Dankberg and Wong, "Development of a 4.8-9.6 kbps RELP vocoder", ICASSP-79. The purpose of the RELP vocoder was to provide satisfactory performance in the gap between the operating ranges of waveform coders and source coders, to wit: 4.8 kbps to 16 kbps. The RELP vocoder contains some features of both waveform coders and source coders.

In prior art RELP vocoders, digital speech data signal samples are analyzed over relatively short time segments (typically in the range of 10-30 ms) by a linear predictive coding (LPC) vocal tract modeling technique to provide LPC coefficients for each block of samples. The LPC coefficients represent the vocal tract, glottal flow and radiation of the speech represented by the digital signal samples. Using the LPC coefficients, the digital speech data signal samples are inverse filtered by a time-variant, all-pole recursive digital filter over each short time segment to provide residual signal (prediction error signal) samples. The time-variant character of speech is handled by a succession of such filters with different parameters.

The residual signal and the LPC coefficients are encoded (quantized) and formatted for transmission. Upon reception, speech is synthesized by processing the residual signal in accordance with the LPC coefficients.

In prior art RELP vocoders, the residual signal samples are bandlimited and downsampled prior to quantization in order to provide residual signal samples at a reduced data rate. The upper band harmonics are generated during synthesis of the speech signal when the downsampled residual signal is upsampled and zeros are inserted between data points. In the Un and Magill RELP vocoder the residual signal is quantized prior to transmission by adaptive delta modulation. Dankberg and Wong considered various other quantization techniques and concluded that pitch predictive adaptive differential pulse code modulation (PPADPCM) provided the best signal-to-quantizing noise ratio.

In accordance with the PPADPCM technique, the residual signal samples are processed by pitch analysis to determine the pitch delay, are processed by pitch predictor gain analysis to determine the pitch predictor gain in accordance with the determined pitch delay, are processed by gain analysis to provide a maximum deviation quantizer gain, and are further processed by PPADPCM in accordance with the quantizer gain, pitch predictor gain and delay parameters to thereby provide the quantized residual signal. The quantizer gain, pitch predictor gain and the pitch delay parameters are combined with the quantized residual signal and the quantized LPC coefficients for transmission.

RELP vocoders of the prior art have required complex hardware and have been so expensive to implement as to be commercially impractical.

SUMMARY OF THE INVENTION

The present invention provides a commercially practical RELP vocoder that is implemented by two digital signal processors, one for a transmitter system and one for a remotely located receiver system. The transmitter digital signal processor is adapted for processing digital speech data signal samples to provide a formatted transmission signal including (a) a quantized residual signal generated by inverse filtering of the samples in accordance with linear predictive coding (LPC) coefficients generated from the samples, (b) quantized LPC coefficients and (c) pitch and gain parameters generated during quantization of the residual signal from the inverse filtered samples, all of which are generated by the processor from the digital speech data samples. The receiver digital signal processor is adapted for processing the formatted transmission signal to synthesize reconstructed digital speech data signal samples.

The transmitter digital signal processor is adapted for performing a routine for generating the LPC coefficients; a routine for generating the residual signal; and a routine for quantizing the residual signal and the LPC coefficients. The routine for generating the LPC coefficients includes a subroutine for pre-emphasizing the samples in order to emphasize the high frequencies of speech; a subroutine for defining an auto-correlation function (ACF) from the pre-emphasized samples in order to generate ACF coefficients; and a subroutine for generating the LPC coefficients from the generated ACF coefficients. The routine for generating the residual signal includes a subroutine for inverse filtering the pre-emphasized samples in accordance with the generated LPC coefficients; a subroutine for bandlimiting the residual signal by low-pass filtering in a manner which will reduce the effects of quantization; and a subroutine for downsampling the bandlimited residual signal to reduce the number of residual signal samples that are quantized and formatted for transmission.

The routine for quantizing the residual signal and LPC coefficients includes a subroutine for quantizing the LPC coefficients; a subroutine for estimating the pitch period of the downsampled residual signal by ACF analysis of the current downsampled residual signal frame in accordance with the ACF coefficients generated for the previous frame to thereby provide a pitch delay parameter for the current frame; a subroutine for providing a pitch predictor gain parameter for each residual signal frame in accordance with the estimated pitch delay parameter for each corresponding frame; a subroutine for providing a quantizer gain parameter for each residual signal frame in accordance with the pitch delay and pitch predictor gain parameters for each corresponding frame; and a subroutine for quantizing each residual signal frame by pitch predictive adaptive differential pulse code modulation (PPADPCM) in accordance with the pitch delay, pitch predictor gain and quantizer gain parameters for each corresponding frame.

The receiver digital signal processor is adapted for processing the formatted transmission signal to synthesize reconstructed digital speech data signal samples by performing a synthesis routine that includes a subroutine for regenerating the LPC coefficients from the quantized LPC coefficients included in the transmission signal; a subroutine for decoding the quantized residual signal included in the transmission signal in accordance with the pitch delay, pitch predictor gain and quantizer gain parameters included in the transmission signal to thereby provide a decoded downsampled residual signal; a subroutine for spectrally regenerating a full-band residual signal from the decoded downsampled residual signal; a subroutine for regenerating pre-emphasized digital speech data signal samples by auto-regressively filtering the regenerated full-band residual signal in accordance with the regenerated LPC coefficients; and a subroutine for de-emphasizing the regenerated pre-emphasized samples in order to de-emphasize the high frequencies of speech, to thereby provide the reconstructed digital speech data signal samples. The decoding subroutine includes a subroutine for scaling quantizer coefficients for each quantized residual signal frame in accordance with the quantizer gain parameter included in the transmission signal; a subroutine for providing data samples from the quantized residual signal included in the transmission signal in accordance with the scaled quantizer coefficients; and a subroutine for providing the decoded downsampled residual signal from the data samples by pitch excitation in accordance with the pitch delay and pitch predictor gain parameters.

Additional features of the present invention are discussed in relation to the description of the preferred embodiment.

BRIEF DESCRIPTION OF THE DRAWING

Figure 1 is a functional block diagram illustrating the process implemented by the transmitter digital signal processor to code an input signal sample for transmission.

Figure 2 is a functional block diagram illustrating the process implemented by the receiver signal processor to decode a sample which is coded in accordance with the process illustrated in Figure 1.

Figure 3 is a flow chart of the LPC coefficient generation routine performed by the transmitter digital signal processor.

Figure 4 is a flow chart of the residual signal generation routine performed by the transmitter digital signal processor.

Figure 5 is a flow chart of the quantization routine performed by the transmitter digital signal processor.

Figure 6 is a diagram of a quantization filter implemented during the PPADPCM quantization subroutine included in the routine of Figure 5.

Figure 7 is a flow chart of the synthesis routine performed by the receiver digital signal processor.

DESCRIPTION OF THE PREFERRED EMBODIMENT

In the preferred embodiment of the present invention, the transmitter digital signal processor and receiver digital signal processor are each Texas Instruments Model TMS32010 Digital Signal Processors. The TMS32010 processor is a 16-bit, 200 ns cycle time, stand-alone processor with a 32-bit ALU and accumulator. The processor has a four-level stack for nested subroutines, and arithmetic performance is enhanced by a hardware 16 x 16-bit parallel multiplier, which performs a pipelined multiply/accumulate operation in 400 ns. The TMS32010 processor has 144 16-bit words available as internal RAM, which may be augmented by addressing external RAM, for buffer storage, via TBLR/TBLW (table read/write) commands. These commands allow a trade-off between data memory requirements and speed of operation. Program memory may be redefined as external data memory, but its access time is 600 ns. External program memory may be expanded to 8K bytes at full speed. The two processors must perform all operations of the RELP vocoder in real time. The processor choice is constrained by two key factors: operating speed and available internal RAM (especially important because frame storage is required). The TMS32010 processor is chosen based on its fast operating speed (5 MHz), data storage capabilities, and extensive development tools.

The principal functions of the transmitter processor are described with reference to Figure 1. Digital speech data signal samples 10 are pre-emphasized 11 to improve the representation of high frequencies during the subsequent LPC analysis. Pre-emphasized samples 12 are subjected to LPC analysis 13 to provide LPC reflection coefficients 14.

The LPC reflection coefficients 14 are quantized 15 to provide quantized LPC reflection coefficients 16. The LPC reflection coefficients 14 are quantized to minimize distortion during subsequent transmission to the receiver. LPC coefficients 17 are generated 18 from the quantized LPC reflection coefficients 16.

The pre-emphasized samples 12 are inverse filtered 19 in accordance with the LPC coefficients 17 to provide a residual signal 20. The residual signal 20 is bandlimited 21 and downsampled 22 to provide a baseband residual signal 23.

The baseband residual signal 23 is quantized by PPADPCM quantization 24 in order to minimize the effects of distortion during subsequent transmission of the quantized residual signal 25. Three of the parameters of the PPADPCM quantization 24 are pitch delay, pitch predictor gain and quantizer gain. These three parameters are generated during PPADPCM quantization 24 and are necessary to decode the quantized residual signal received by the receiver system. Accordingly, a pitch delay signal is provided on line 26, a pitch predictor gain signal is provided on line 27 and a quantizer gain signal is provided on line 28 incident to the PPADPCM quantization 24 of the baseband residual signal 23.

The quantized residual signal 25, the pitch delay signal on line 26, the pitch predictor gain signal 27, the quantizer gain signal 28 and the quantized LPC reflection coefficients 16 are combined linearly by formatting 32 to provide a transmission frame 34.

The principal functions of the receiver processor are described with reference to Figure 2. The format of each received data transmission frame 36 is decoded 37 to provide the quantized residual signal 39, the pitch delay parameter 40, the pitch predictor gain parameter 41, the quantizer gain parameter 42 and the quantized LPC reflection coefficients 43.

The quantized residual signal 39 is decoded by PPADPCM decoding 46 in accordance with the pitch delay 40, pitch predictor gain 41 and quantizer gain 42 to provide a decoded baseband residual signal 47. The decoded baseband residual signal 47 is spectrally regenerated 48 to provide a full-band residual signal 49. The quantized LPC reflection coefficients 43 are processed 50 to generate the LPC coefficients 51.

The full-band residual signal 49 is filtered 52 in accordance with the generated LPC coefficients 51 to synthesize decoded speech data signal samples 53. The decoded speech data signal samples 53 are de-emphasized 54 to provide regenerated digital speech data signal samples 55.

The processing routines performed by the transmitter processor to perform the above-described signal processing functions are described below with reference to the flow charts of Figures 3, 4 and 5.

The processing routine represented by the flow chart of Figure 3 generally pertains to LPC analysis. This routine generates the LPC coefficients from a buffered frame of pre-emphasized speech data signal samples. The routine of Figure 4 is generally directed to generation of the residual signal; and the routine of Figure 5 is generally directed to quantization of the residual signal and the LPC coefficients.

The LPC analysis routine includes the subroutines of initialization 58, sample input 59, pre-emphasis 61, ACF generation 63, ACF normalization 65 and LPC analysis 66.

The sample input subroutine 59 reads in digital speech data signal samples from an external data memory buffer. The pre-emphasis subroutine 61 applies first-order digital pre-emphasis to the input speech data signal samples. The input to the algorithm is the input speech sample S and the output is the pre-emphasized speech sample S', both located in internal RAM. First-order digital pre-emphasis is applied to the input speech signal to emphasize the high frequencies of speech. This leads to a more accurate estimate of the vocal tract frequency response, which is controlled by the LPC parameters. Pre-emphasis uses a single-delay high-pass filter. Experimentation shows that the choice of the pre-emphasis constant (a) is not critical and it is normally set to 0.9375. The difference equation for the filter is:

$$S'_n = S_n - a \cdot S_{n-1} \qquad \text{(Eq. 1)}$$

The pre-emphasis function is complemented at the receiver system by applying a de-emphasis function.
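As an illustration only, the pre-emphasis difference equation and its complementary de-emphasis can be sketched in floating-point Python. The function names, the zero initial state and the explicit form of the de-emphasis recursion (taken here as the exact inverse of the pre-emphasis filter) are assumptions; the fixed-point TMS32010 implementation is not reproduced.

```python
def pre_emphasize(samples, a=0.9375):
    """Apply S'[n] = S[n] - a * S[n-1], assuming S[-1] = 0."""
    out = []
    prev = 0.0
    for s in samples:
        out.append(s - a * prev)
        prev = s
    return out


def de_emphasize(samples, a=0.9375):
    """Assumed inverse filter: S[n] = S'[n] + a * S[n-1], with S[-1] = 0."""
    out = []
    prev = 0.0
    for s in samples:
        prev = s + a * prev
        out.append(prev)
    return out
```

Applying `de_emphasize` to the output of `pre_emphasize` with the same constant returns the original samples, which is the sense in which the receiver complements the transmitter.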

The pre-emphasized samples are stored in an external data memory for use in the residual signal generation routine of Figure 4.

The ACF generation subroutine 63 iteratively updates a correlation buffer for each input speech data signal sample. This buffer must be zeroed prior to the first call to the subroutine. The output of this subroutine is a 32-bit precision auto-correlation function (ACF) for delays between zero and ten points.

In order to generate the LPC coefficients, an auto-correlation function (ACF) must be defined from a windowed buffer of pre-emphasized speech samples ($s_j$). The ACF of a sequence is defined as:

$$R_k = \sum_{j=0}^{N-1-k} x_j \cdot x_{j+k}, \qquad k = 0, \ldots, N-1 \qquad \text{(Eq. 2)}$$

where

$$x_j = w_j \cdot s_j \qquad \text{(Eq. 3)}$$

The window ($w_j$) is chosen to be rectangular for ease of implementation:

$$w_n = \begin{cases} 1, & n = 0, \ldots, N-1 \\ 0, & \text{elsewhere} \end{cases} \qquad \text{(Eq. 4)}$$

A tenth-order LPC analysis requires the ACF coefficients $R_0, \ldots, R_{10}$. These coefficients may be updated iteratively for each input speech data signal sample:

$$R_k^{(n+1)} = R_k^{(n)} + x_n \cdot x_{n-k} \qquad \text{(Eq. 5)}$$

where $R_k^{(n)}$ is the n-th iteration of the k-th ACF coefficient. This equation is implemented by the ACF generation subroutine 63. The coefficients $R_k$ are maintained with 32-bit accuracy to remove round-off error problems. The algorithm is implemented by creating a delay buffer that is initialized to zero and ripples after each iteration. This implementation also ensures that the 32-bit result will not overflow. The maximum value of the ACF is the zero-delay element. If, for example, each input sample has a maximum of 12-bit resolution, the maximum value attained by the accumulator, for a data buffer of 180 samples, is:

$$\log_2\left[2^{11} \cdot 2^{11} \cdot 180\right] = 29.492 \text{ bits} \qquad \text{(Eq. 6)}$$
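The iterative update with a rippling delay buffer, and the bit-growth estimate, can be sketched as follows. This is a minimal Python illustration using arbitrary-precision integers, not the 32-bit accumulator code of the TMS32010; the function names and the list-based delay buffer are assumptions made for the example.

```python
import math


def update_acf(acf, delay, x):
    """One iteration of the update: R_k <- R_k + x_n * x_{n-k} for every k."""
    # Ripple the delay buffer so that delay[k] holds x_{n-k}.
    delay.insert(0, x)
    delay.pop()
    for k in range(len(acf)):
        acf[k] += delay[0] * delay[k]


def acf_of_frame(frame, order=10):
    """Accumulate R_0..R_order over one frame, sample by sample."""
    acf = [0] * (order + 1)     # correlation buffer, zeroed before the first call
    delay = [0] * (order + 1)   # delay buffer, initialized to zero
    for x in frame:
        update_acf(acf, delay, x)
    return acf


def worst_case_bits(sample_bits=12, frame_len=180):
    """log2 of the largest possible R_0 for signed samples of the given width."""
    peak = 2 ** (sample_bits - 1)
    return math.log2(peak * peak * frame_len)
```

With the default arguments `worst_case_bits()` evaluates to about 29.49, which is below the 31 magnitude bits of a signed 32-bit accumulator.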

Upon completion of sample input, the 32-bit ACF result must be converted to 16-bit coefficients. The ACF normalization subroutine 65 performs all operations required to convert the 32-bit ACF to a 16-bit result. The LPC analysis subroutine 66 is transparent to a scaled ACF input. Therefore, to obtain the maximum dynamic range of the 16-bit ACF, the 32-bit results are scaled to the maximum, $R_0$, prior to truncation to 16 bits. The optimal procedure for this would be to divide all coefficients by $R_0$. However, execution efficiency is greatly improved by simply left-shifting the 32-bit numbers to remove leading zeros in the $R_0$ value.
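A hedged sketch of the shift-based normalization is given below. It assumes that the accumulated values fit in a signed 32-bit word and that truncation keeps the upper 16 bits; the function name and these details are illustrative assumptions rather than the subroutine's actual code.

```python
def normalize_acf(acf32):
    """Left-shift all coefficients until R_0 has no leading zeros, keep the top 16 bits."""
    r0 = acf32[0]
    if r0 <= 0:
        # A silent frame has no energy to normalize.
        return [0] * len(acf32)
    # A positive signed 32-bit value has 31 magnitude bits available.
    shift = 31 - r0.bit_length()
    return [(r << shift) >> 16 for r in acf32]
```

After normalization the zero-delay coefficient always lies between 16384 and 32767, so the full 16-bit range is used whatever the input level.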

A decision 67 that the 32-bit correlation frame is complete enables the processor to proceed to the ACF normalization subroutine 65.

The LPC analysis subroutine 66 implements the Durbin algorithm to generate the ten LPC coefficients and ten LPC reflection coefficients 36. The Durbin algorithm's input is the normalized 16-bit ACF. The Durbin algorithm is an extremely efficient algorithm for generating the LPC coefficients. See J. Makhoul, "Linear Prediction: A Tutorial Review", Proc. IEEE, Vol. 63, pp. 561-80, 1975. The algorithm is suitable for fixed-point arithmetic implementation and also generates, as a by-product, the reflection coefficients, which may be used for quantization and coding prior to transmission to the receiver.

Alternatively, the LPC coefficients may be generated by the Le Roux-Gueguen (LG) recursion, which is described in J. Le Roux and C. Gueguen, "A Fixed Point Computation of Partial Correlation Coefficients in Linear Prediction", Proc. ICASSP-77, pp. 742-3. The LG recursion, although faster than the Durbin algorithm, generates only the LPC reflection coefficients and not the LPC coefficients per se, which must be generated separately.

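Durbin's recursive procedure is as follows:

Initialization:

$$E_0 = R_0 \qquad \text{(Eq. 7)}$$

$$k_1 = a_1^{(1)} = -R_1 / R_0 \qquad \text{(Eq. 8)}$$

$$E_1 = \left[1 - k_1^2\right] E_0 \qquad \text{(Eq. 9)}$$

Recursion ($i = 2, \ldots, P$):

$$k_i = -\left(R_i + \sum_{j=1}^{i-1} a_j^{(i-1)} \cdot R_{i-j}\right) / E_{i-1} \qquad \text{(Eq. 10)}$$

$$a_i^{(i)} = k_i \qquad \text{(Eq. 11)}$$

$$a_j^{(i)} = a_j^{(i-1)} + k_i \cdot a_{i-j}^{(i-1)}, \qquad 1 \le j \le i-1 \qquad \text{(Eq. 12)}$$

$$E_i = \left[1 - k_i^2\right] \cdot E_{i-1} \qquad \text{(Eq. 13)}$$

Symbols defined:

$E_i$ is the prediction error energy

$R_i$ is the i-th auto-correlation function coefficient

$k_i$ is the i-th reflection coefficient

$a_i^{(j)}$ is the i-th LPC coefficient (j-th iteration)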

The order of the LPC analysis, P, is determined experimentally, and a 10th order analysis is sufficient to adequately model the vocal tract frequency response.
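The recursion can be illustrated with a short floating-point Python sketch. It assumes the sign convention of the Makhoul tutorial cited above, in which each reflection coefficient is the negated normalized correlation, and it folds the initialization into the first pass of the loop; the function name and return layout are choices made for this example, not the fixed-point subroutine 66.

```python
def durbin(r, order=10):
    """Durbin recursion on autocorrelation values r[0..order].

    Returns (a, k, e): LPC coefficients a_1..a_P, reflection
    coefficients k_1..k_P and the final prediction error energy.
    """
    a = [0.0] * (order + 1)
    k = [0.0] * (order + 1)
    e = float(r[0])                                  # E_0 = R_0
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k[i] = -acc / e                              # reflection coefficient k_i
        prev = a[:]                                  # coefficients of order i-1
        a[i] = k[i]
        for j in range(1, i):
            a[j] = prev[j] + k[i] * prev[i - j]      # update lower-order terms
        e *= 1.0 - k[i] * k[i]                       # E_i = (1 - k_i^2) E_{i-1}
    return a[1:], k[1:], e
```

For the autocorrelation sequence 1, 0.5, 0.25 of a first-order process, a second-order analysis returns coefficients -0.5 and 0, showing that the second stage adds nothing once the first has captured the correlation.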

The LPC parameters must be quantized and coded prior to transmission and resynthesis of the digital speech data signal at the receiver. However, the LPC coefficients, $a_i$, are sensitive to quantization noise and introduce significant distortion to the signal. A solution is to quantize and code the LPC reflection coefficients, $k_i$, which are much less sensitive to quantization noise. This operation is performed by an LPC coefficient quantization subroutine 68, which is a part of the quantization routine of Figure 5. At the receiver the LPC coefficients may be recovered from the quantized reflection coefficients using a subset of the recursion above.

The initialization subroutine 58 and the sample input subroutine 59 are both contained in the main program for the transmitter processor. The main program controls the calling of the other subroutines in the LPC analysis routine of Figure 3 in accordance with the following hierarchy: pre-emphasis 61, ACF generation 63, ACF normalization 65 and LPC analysis 66. The main program implements the LPC analysis routine of Figure 3 to generate a frame of a predetermined number of pre-emphasized speech data signal samples and the ten LPC coefficients. The term "LPC coefficients" as used herein refers to either LPC coefficients or LPC reflection coefficients unless the latter is specified.
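The recovery of LPC coefficients from reflection coefficients at the receiver, described above as a subset of the Durbin recursion, can be sketched as follows. Only the coefficient update steps are needed, since the reflection coefficients are already known; the function name is hypothetical and floating-point arithmetic is assumed.

```python
def reflection_to_lpc(k):
    """Rebuild a_1..a_P from reflection coefficients k_1..k_P."""
    a = []
    for i, ki in enumerate(k, start=1):
        prev = a[:]  # a_1..a_{i-1} of the previous order, stored at indices 0..i-2
        # a_j(i) = a_j(i-1) + k_i * a_{i-j}(i-1) for j < i, then a_i(i) = k_i
        a = [prev[j] + ki * prev[i - 2 - j] for j in range(i - 1)] + [ki]
    return a
```

For example, `reflection_to_lpc([0.5, 0.25])` gives `[0.625, 0.25]`.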

The residual signal generation routine is represented by the flow chart of Figure 4. This routine includes the subroutines of initialization 70, sample input 71, inverse filter 72, bandlimit 73 and downsample 74.

The initialization subroutine 70 transfers second-order section filter coefficients from external data memory to the internal RAM of the transmitter processor for use during the bandlimit subroutine 73.

The sample input subroutine 71 inputs the pre-emphasized samples from a speech data buffer located in the external data memory to the zero-delay position of a speech delay buffer, which is located in the internal RAM of the transmitter processor. The delay buffer is used for the implementation by the inverse filter subroutine 72 of the all-zero Finite-Impulse-Response (FIR) filter in accordance with the LPC coefficients.

The inverse filter subroutine 72 implements an all-zero inverse filter in accordance with the LPC coefficients to generate the residual signal 20 (Figure 1). The output from this subroutine 72 is provided to a residual signal data buffer which is located in the external data memory.

The residual signal 20 is generated by inverse filtering the pre-emphasized speech data signal samples 12 in accordance with the LPC coefficients 17 (see Figure 1). The LPC coefficients are formulated mathematically to estimate the transfer function of the vocal tract. This function is represented by the polynomial H(z):

$$H(z) = \left[1 + \sum_{k=1}^{10} a_k \cdot z^{-k}\right]^{-1} \qquad \text{(Eq. 14)}$$

where $a_k$ is the k-th LPC coefficient. The residual signal 20 is obtained by filtering the speech data signal samples 12 by the all-zero filter $H(z)^{-1}$. If $x_n$ represents the input speech sample at time n and $y_n$ represents the corresponding output sample, the filter can be represented by the following difference equation:

$$y_n = x_n + a_1 \cdot x_{n-1} + a_2 \cdot x_{n-2} + \ldots + a_{10} \cdot x_{n-10} \qquad \text{(Eq. 15)}$$

The simplest way to implement this structure is to place the coefficients $a_k$ in a fixed register and to implement the delay buffer using a shift register. The TMS32010 micro-code is optimized to perform this operation using the LTD/MPY commands: the processor has a pipelined Multiply/Accumulate instruction that executes in 400 ns.
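A minimal Python sketch of the all-zero inverse filter with a shifting delay buffer follows. It assumes that the LPC coefficients enter the difference equation with a plus sign, as produced by a Durbin recursion with negated reflection coefficients, and that the delay buffer starts at zero; the function name is hypothetical and the LTD/MPY pipeline of the TMS32010 is not modeled.

```python
def inverse_filter(samples, lpc):
    """Compute y_n = x_n + a_1*x_{n-1} + ... + a_P*x_{n-P} for each sample."""
    delay = [0.0] * len(lpc)          # delay[i] holds x_{n-1-i}
    out = []
    for x in samples:
        y = x + sum(a * d for a, d in zip(lpc, delay))
        out.append(y)
        delay = [x] + delay[:-1]      # shift the buffer by one position
    return out
```

Filtering the decaying sequence 1, 0.5, 0.25, 0.125 with the single coefficient -0.5 leaves only the initial impulse, which is the whitening behavior expected of the residual.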

The bandlimit subroutine 73 low-pass filters the residual signal 20 by implementing an eighth-order elliptic half-band filter, which in turn is implemented by using a cascade of four second-order sections. The transfer function of the elliptic filter is:

$$H(z) = A(z) / B(z) \qquad \text{(Eq. 16)}$$

where

$$A(z) = \sum_{k=0}^{8} a_k \cdot z^{-k} \qquad \text{(Eq. 17)}$$

$$B(z) = \sum_{k=0}^{8} b_k \cdot z^{-k}, \qquad b_0 = 1 \qquad \text{(Eq. 18)}$$

It is important to implement this filter in a manner which will reduce the effects of coefficient quantization and finite register length effects, which are described in L. A. Rabiner and B. Gold, "Theory and Application of Digital Signal Processing", Prentice-Hall, 1975. This is best achieved by factorizing the polynomial H(z) into second-order polynomials:

$$H(z) = H_1(z) \cdot H_2(z) \cdot H_3(z) \cdot H_4(z) \qquad \text{(Eq. 19)}$$

where:

$$H_m(z) = \frac{a_0 + a_1 \cdot z^{-1} + a_2 \cdot z^{-2}}{1 + b_1 \cdot z^{-1} + b_2 \cdot z^{-2}} \qquad \text{(Eq. 20)}$$

The second-order polynomial $H_m(z)$ is implemented by a second-order filter section. The second-order section is implemented by an internal subroutine that is called four times to provide a cascade of four second-order sections. A cascade of four sections is equivalent to an eighth-order elliptic low-pass filter. Each section uses a set of filter coefficients and requires its own delay buffer, which must be shifted at each iteration.
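The cascade of second-order sections can be sketched as below. The direct form II arrangement, the class and function names, and the zero initial state are assumptions for illustration; the elliptic filter coefficients themselves are not given in the text and are not invented here, so the caller supplies them.

```python
class SecondOrderSection:
    """One section (a0 + a1 z^-1 + a2 z^-2) / (1 + b1 z^-1 + b2 z^-2)."""

    def __init__(self, a0, a1, a2, b1, b2):
        self.a = (a0, a1, a2)
        self.b = (b1, b2)
        self.w1 = 0.0   # section's own delay buffer, one sample back
        self.w2 = 0.0   # two samples back

    def step(self, x):
        # Direct form II: recursive part first, then the feed-forward part.
        w0 = x - self.b[0] * self.w1 - self.b[1] * self.w2
        y = self.a[0] * w0 + self.a[1] * self.w1 + self.a[2] * self.w2
        self.w2, self.w1 = self.w1, w0   # shift the delay buffer
        return y


def cascade(sections, samples):
    """Pass every sample through each section in turn."""
    out = []
    for x in samples:
        for section in sections:
            x = section.step(x)
        out.append(x)
    return out
```

Four `SecondOrderSection` objects passed to `cascade` realize an eighth-order filter; each object keeps its own state, mirroring the per-section delay buffers described above.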

The downsample subroutine 74 implements downsampling by discarding predetermined samples. The downsample algorithm uses the frame counter to alternate between discarding the input data point or scaling it to maintain the energy per frame. The downsampling function reduces the filtered residual signal sample data rate. This function is executed by a frame position pointer. The sample is either discarded or magnitude-scaled (multiplied by a predetermined factor to maintain the average frame energy of the residual signal). If, for example, the downsampling ratio is two, the scaling factor is also two.
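A small sketch of this discard-or-scale step is shown below. Which positions in the frame are kept (here the even ones) is an assumption, as is the function name; the default scaling factor equals the downsampling ratio, following the example in the text.

```python
def downsample(frame, ratio=2, scale=None):
    """Keep every ratio-th sample, scaled; discard the others."""
    if scale is None:
        scale = ratio   # per the text: ratio of two implies a factor of two
    return [x * scale for pos, x in enumerate(frame) if pos % ratio == 0]
```

For instance, `downsample([1, 2, 3, 4, 5, 6])` returns `[2, 6, 10]`.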

A decision 75 that the frame is complete concludes the residual signal generation routine of Figure 4.

The sample input 71 and inverse filter 72 subroutines and the decision 75 are integrated together and control the calling hierarchy for the other subroutines in the residual signal generation routines of Figure 4. The order of such calling hierarchy is bandlimit 73 and downsample 74.

The quantization routine represented by the flow chart of Figure 5 includes the following subroutines: LPC coefficient quantization 68 (discussed above in relation to the LPC analysis subroutine 66), pitch delay 78, pitch predictor gain 80, quantizer gain 81, CRC 82, PPADPCM quantization 83 and data format 8

The LPC coefficient quantization subroutine 68 quantizes the ten LPC reflection coefficients 14. This subroutine obtains its input data from the LPC reflection coefficients 14 and quantizer look-up subroutine 68 during the operation of the LPC analysis subroutine 66. This subroutine 68 is called by the LPC analysis subroutine 66.

The reflection coefficients are quantized with a variable number of bits per coefficient compatible with DOD standard LPC-10 coding, which is described in T. E. Tremain, "The Government Standard Linear Predictive Coding Algorithm: LPC-10", Speech Technology, April 1982.

Data management is necessary because of the limited availability of internal RAM in the TMS32010. Additional data buffers may be located in external data memory, which has a very slow access time (800 ns). A data management algorithm performs buffer transfers between internal RAM and external data memory to enable all routines to execute using internal RAM memory.

The pitch delay subroutine 78 estimates the pitch period to determine the pitch delay parameter T of the downsampled residual signal 22 (Figure 1) used for the PPADPCM quantization, using an auto-correlation function (ACF) analysis of the signal 22. The inputs to the algorithm are the partial ACF of the previous frame and the current residual signal frame. The output from the algorithm is the estimated pitch delay T and the updated partial ACF.

The pitch delay is updated at the frame rate. Pitch analysis uses a simple auto-correlation detector:

$$R(T) = \sum_{n=0}^{N-1} s_n \cdot s_{n-T} \qquad \text{(Eq. 21)}$$

The pitch delay, T, is chosen as the delay that maximizes R(T), evaluating R(T) between $T_{min}$ and $T_{max}$. To enable an accurate estimate of the pitch delay, the analysis must cover three pitch periods, i.e., $N > 3T_{max}$. The limits of the pitch detection are chosen experimentally using Fortran simulations of the RELP vocoder algorithm; for example, $T_{min}$ is a 15 sample delay and $T_{max}$ is a 40 sample delay. This corresponds to pitch frequencies of 267 Hz and 100 Hz respectively if the downsampled residual signal 22 has a sampling rate of 4 kHz. The value N is therefore chosen to be two downsampled frames. The auto-correlation detector R(T) is evaluated as two partial ACFs, $R_1(T)$ and $R_2(T)$, where:

$$R_1(T) = \sum_{n=0}^{M-1} s_n \cdot s_{n-T} \qquad \text{(Eq. 22)}$$

$$R_2(T) = \sum_{n=M}^{N-1} s_n \cdot s_{n-T} \qquad \text{(Eq. 23)}$$

M is a single downsampled frame. R(T) is calculated by adding the current frame's partial ACF, $R_2(T)$, and the previous frame's partial ACF, $R_1(T)$, that was stored in external data memory.
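The two-frame search can be sketched as follows. The sketch assumes that enough past samples are held before each frame for the delayed term to be available, and it uses dictionaries keyed by delay for the partial ACFs; the function names and this storage layout are hypothetical, and the 15 to 40 sample limits are the example values quoted above.

```python
def partial_acf(signal, start, stop, t_min=15, t_max=40):
    """Partial ACF over samples start..stop-1 for each candidate delay.

    Assumes start >= t_max so that signal[n - t] is always a stored sample.
    """
    return {t: sum(signal[n] * signal[n - t] for n in range(start, stop))
            for t in range(t_min, t_max + 1)}


def pitch_delay(prev_partial, cur_partial):
    """Add the stored and current partial ACFs and pick the maximizing delay."""
    total = {t: prev_partial[t] + cur_partial[t] for t in cur_partial}
    best = max(total, key=total.get)
    return best, total
```

At a 4 kHz sampling rate a detected delay of T samples corresponds to a pitch frequency of 4000 / T Hz, so the limits of 15 and 40 samples bracket roughly 267 Hz down to 100 Hz. After each frame the current partial ACF would be stored and reused as the previous partial ACF for the next frame.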

The pitch predictor gain subroutine 80 evaluates the pitch predictor gain parameter B for the PPADPCM quantization and updates such evaluation at the frame rate. The pitch predictor gain B is evaluated as:

where M is a single downsampled frame and T is the pitch delay. B is constrained between two limits:

If B > 1.0, then B = 1.0

If B < 0.1, then B = 0.0
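The constraint on B can be sketched in Python as follows. The sketch assumes a conventional normalized cross-correlation estimate for the unconstrained gain, which may differ from the expression used in the described embodiment; only the two limits are taken from the text. The name `pitch_predictor_gain` and the `history` argument are hypothetical.

```python
def pitch_predictor_gain(history, frame, T):
    """Estimate and constrain the pitch predictor gain B for one frame.

    ASSUMPTION: the raw estimate is sum(s[n] * s[n-T]) / sum(s[n-T]**2)
    over the frame, a common least-squares choice. `history` must hold at
    least T samples that immediately precede `frame`.
    """
    assert len(history) >= T
    buf = list(history) + list(frame)
    offset = len(history)
    num = sum(buf[offset + n] * buf[offset + n - T] for n in range(len(frame)))
    den = sum(buf[offset + n - T] ** 2 for n in range(len(frame)))
    B = num / den if den else 0.0
    # Limits taken from the text.
    if B > 1.0:
        B = 1.0
    if B < 0.1:
        B = 0.0
    return B
```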

The quantizer gain subroutine 81 evaluates the quantizer gain parameter q_gain for the PPADPCM quantization and updates such evaluation at the frame rate. This parameter is used to scale the quantizer to the input signal level; each input and output level of the quantizer is multiplied by q_gain. The parameter is chosen to be the maximum x_n:

$$x_n = \left| S_n - B \cdot S_{n-T} \right|, \quad n = 0, \ldots, M-1$$

where M is a single downsampled frame, T is the pitch delay, and B is the pitch predictor gain.
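A minimal Python sketch of this peak search is given below, assuming that the prediction error is formed from the unquantized residual samples. The name `quantizer_gain` and the `history` argument are hypothetical.

```python
def quantizer_gain(history, frame, T, B):
    """Return the largest |s[n] - B * s[n-T]| over one downsampled frame.

    `history` must hold at least T samples that immediately precede `frame`.
    """
    assert len(history) >= T
    buf = list(history) + list(frame)
    offset = len(history)
    return max(
        abs(buf[offset + n] - B * buf[offset + n - T]) for n in range(len(frame))
    )
```

Scaling by the peak prediction error keeps every sample of the frame within the outermost quantizer levels, at the cost of coarser resolution in frames that contain one isolated large sample.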

The CRC subroutine 82 introduces an n-bit cyclic redundancy code (CRC) on part of the transmission frame to enable detection of bit errors during transmission. The code protects the LPC coefficients and PPADPCM parameters. The input to the subroutine is the relevant quantized coefficients. The output from the subroutine is an n-bit CRC to be transmitted.

The PPADPCM subroutine 83 quantizes the downsampled residual signal 22, using Pitch Predictive Adaptive Differential Pulse Code Modulation (PPADPCM). The term "pitch predictive" is misleading, however. The pitch predictor is used to remove the dominant periodic frequency from the residual signal 22 prior to quantization. While this frequency is most commonly the pitch period, the predictor may lock onto an alternate frequency without degrading the operation of the quantizer. Therefore a rigorous pitch extraction algorithm is not necessary. The predictor removes the dominant periodicity of the waveform to generate a "white noise" signal with a Gaussian probability density function (pdf). This signal may then be quantized using a classical Max quantizer, as described in J. Max, "Quantizing for Minimum Distortion," IRE Trans on Information Theory, March 1960.

Figure 6 shows the structure of the PPADPCM quantizer.

The quantizer is embedded in the predictor loop so that the error spectrum introduced by quantization is uniform. The parameters of the quantizer are the pitch delay (T), the quantizer gain (q_gain), the pitch predictor gain (B), and the order of the quantizer (Q). Experimentation has shown that a 3-bit quantizer is adequate to ensure good subjective speech quality at the receiver.
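The closed-loop arrangement can be sketched in Python as follows. The eight normalized levels in `LEVELS` are uniform placeholders chosen for this sketch and are not the Max quantizer table; the names `ppadpcm_encode` and `history` are likewise hypothetical.

```python
# ASSUMPTION: placeholder 3-bit table (uniform), not the Max quantizer levels.
LEVELS = [-0.875, -0.625, -0.375, -0.125, 0.125, 0.375, 0.625, 0.875]


def ppadpcm_encode(frame, history, T, B, q_gain):
    """Quantize one frame with the quantizer inside the pitch predictor loop.

    The prediction uses previously *reconstructed* samples, so the encoder
    tracks exactly what the decoder will rebuild. `history` holds at least
    T reconstructed samples that precede the frame.
    """
    recon = list(history)
    assert len(recon) >= T
    codes = []
    for s in frame:
        predicted = B * recon[-T]      # B times the reconstructed sample T steps back
        x = s - predicted              # prediction error to be quantized
        code = min(range(len(LEVELS)), key=lambda i: abs(q_gain * LEVELS[i] - x))
        codes.append(code)
        recon.append(q_gain * LEVELS[code] + predicted)
    return codes, recon[len(history):]
```

Because the prediction is formed from reconstructed rather than original samples, the quantization error of one sample is not accumulated by the predictor, which is the usual reason for placing the quantizer inside the loop.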

The data format subroutine 84 formats a data frame 34 (Figure 1) for transmission. The inputs to the subroutine 84 are a predetermined number of quantized residual signal samples 25, the pitch delay parameter 26, the pitch predictor gain 27, the quantizer gain 28, the quantized LPC coefficients 31 (Figure 1) and the CRC. The output from the subroutine 84 is a transmission data frame 34, which is placed in the output buffer.

A decision 85 that the frame is complete concludes the quantization routine of Figure 5.

The calling hierarchy of the subroutines in the quantization routine of Figure 5 is under the control of the main program. The following subroutines are integrated together in a subroutine designated PPQNT: pitch predictor gain 80, quantizer gain 81 and PPADPCM quantization 83. The calling hierarchy is as follows: pitch delay 78, PPQNT, CRC 82 and data format 84. The subroutine 68 is called by the LPC analysis subroutine 66 in the LPC analysis routine of Figure 3.

The receiver digital signal processor utilizes a synthesis processing routine. Referring to Figure 7, the synthesis routine includes the following subroutines: initialization 88, data input 89, CRC check 90, LPC coefficient generation 91, PPADPCM decoding 92, spectral regeneration 93, LPC synthesis filter 94, de-emphasis 95, and speech output 97.

The initialization subroutine 88 is included in the main program for the receiver processor. The initialization subroutine 88 initializes all registers and data locations within the processor prior to the execution of each subroutine.

The data input subroutine 89 also is included in the main program for the receiver processor. This subroutine inputs the data transmission frame 36 received from the transmitter by inputting the frame from a frame buffer in external data memory.

The CRC check subroutine 90 uses the received transmission data frame to generate an n-bit CRC, which it compares to the n-bit CRC in the received transmission data frame to check for transmission errors. If any errors are detected, a subset of the LPC and PPADPCM parameters for the current frame is discarded and a subset of the previous frame's parameters is substituted. The input to this subroutine is an n-bit CRC word from the data transmission frame. The output from this subroutine is a flag indicating which set of parameters to use during the rest of the subroutine.
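The check-and-substitute behaviour can be sketched in Python as follows. The text does not give the CRC width or generator polynomial, so the 8-bit width and the polynomial 0x07 used here are assumptions for illustration; the names `crc8` and `select_parameters` are hypothetical.

```python
def crc8(data: bytes, poly: int = 0x07) -> int:
    """Bitwise CRC over `data`. ASSUMPTION: n = 8, polynomial 0x07, zero initial value."""
    crc = 0
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 0x80:
                crc = ((crc << 1) ^ poly) & 0xFF
            else:
                crc = (crc << 1) & 0xFF
    return crc


def select_parameters(received: bytes, received_crc: int, previous: bytes):
    """Return (parameters to use, flag); fall back to the previous frame on a CRC mismatch."""
    ok = crc8(received) == received_crc
    return (received if ok else previous), ok
```

The transmitter would compute the same function over the protected parameter bytes and append the result to the frame; the receiver recomputes it and compares.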

The LPC coefficient generation subroutine 91 reads in the transmitted quantized LPC parameters, calls a subroutine IQRC to decode the LPC reflection coefficients, and performs a step-up algorithm to transform the LPC reflection coefficients to the LPC coefficients. The input to this subroutine is the transmitted quantized LPC reflection coefficients 43 and the output is the LPC coefficients 51 (Figure 2).

Prior to LPC synthesis filtering 52, the LPC coefficients must be generated from the transmitted quantized LPC reflection coefficients. These quantized LPC reflection coefficients must be unpacked and decoded using the quantizer look-up tables described in T. E. Tremain, "The Government Standard Linear Predictive Coding Algorithm: LPC-10", Speech Technology, April 1982. The LPC coefficients are then generated from the decoded LPC reflection coefficients using the step-up algorithm, a recursive algorithm which is a subset of the Durbin algorithm described in J. Makhoul, "Linear Prediction: A Tutorial Review," Proc IEEE, Vol 63, pp 561-80, 1975.

The recursion is as follows:

Initialization:

$$a_1^{(1)} = k_1 \qquad \text{(Eq. 26)}$$

Recursion (i = 2, ..., P):

$$a_i^{(i)} = k_i \qquad \text{(Eq. 27)}$$

$$a_j^{(i)} = a_j^{(i-1)} + k_i \cdot a_{i-j}^{(i-1)}, \quad 1 \le j \le i-1 \qquad \text{(Eq. 28)}$$

where k_i is the i-th reflection coefficient and a_j^{(i)} is the j-th LPC coefficient at the i-th iteration. The order of the transmitter LPC analysis, P, is ten.
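The step-up recursion can be sketched in Python as follows, using the sign convention of the recursion as written. The function name `step_up` and the use of a zero-based list for the coefficients are choices made for this sketch.

```python
def step_up(k):
    """Convert reflection coefficients k[0..P-1] to LPC coefficients a[0..P-1].

    Iteration i (1-based) sets a_i = k_i and updates the earlier coefficients
    as a_j <- a_j + k_i * a_(i-j) for j = 1 .. i-1, using the values from
    iteration i-1 on the right-hand side.
    """
    a = []
    for i, k_i in enumerate(k, start=1):
        prev = a
        # prev[j] is a_(j+1) and prev[i - 2 - j] is a_(i-(j+1)) in 1-based terms.
        a = [prev[j] + k_i * prev[i - 2 - j] for j in range(i - 1)] + [k_i]
    return a
```

For example, two reflection coefficients k_1 = 0.5 and k_2 = 0.2 give a_2 = 0.2 and a_1 = 0.5 + 0.2 × 0.5 = 0.6.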

The PPADPCM decoding subroutine 92 reads in the bit-packed quantized residual signal 39 and quantizer parameters 40, 41, 42 received from the transmitter and generates a decoded baseband (downsampled) residual signal 47 (Figure 2). This subroutine 92 must perform the inverse operation of the transmitter's PPADPCM coding. It therefore divides into three parts: unpacking, quantizer look-up, and pitch excitation.

The PPADPCM decoding subroutine 92 first transfers the PPADPCM quantizer coefficients to internal RAM and scales them using the quantizer gain parameter. The inputs to this operation are the coefficient buffer stored in external data memory and the quantizer gain. The output of this operation is the scaled look-up table located in internal RAM.

This subroutine 92 next reads in packed data bytes from a data buffer in external data memory, unpacks each byte, and decodes the data samples using the quantizer look-up table. The inputs to this operation are the bit-packed data word and the quantizer coefficient table. The output from this operation is the set of decoded data samples. The received data bytes are unpacked into individual data samples by masking off each individual data sample, which may then be decoded using the quantizer look-up table that is identical to the one used at the transmitter to quantize the data samples.
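The scaling, unpacking and look-up steps can be sketched in Python as follows. The bit layout is not given in the text, so the sketch assumes 3-bit codes packed contiguously, most significant bit first; the names `scale_table`, `unpack_codes` and `decode_samples` are hypothetical.

```python
def scale_table(coefficients, q_gain):
    """Scale the normalized quantizer coefficients by the received quantizer gain."""
    return [q_gain * c for c in coefficients]


def unpack_codes(data: bytes, count: int, bits: int = 3):
    """Mask `count` codes of `bits` bits each out of a packed byte buffer.

    ASSUMPTION: codes are packed back to back, most significant bit first.
    """
    mask = (1 << bits) - 1
    acc, nbits, out = 0, 0, []
    for byte in data:
        acc = (acc << 8) | byte
        nbits += 8
        while nbits >= bits and len(out) < count:
            nbits -= bits
            out.append((acc >> nbits) & mask)
        acc &= (1 << nbits) - 1  # keep only the bits not yet consumed
    return out


def decode_samples(codes, scaled_table):
    """Look each code up in the scaled table to recover the data samples."""
    return [scaled_table[c] for c in codes]
```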

The PPADPCM decoding subroutine 92 then implements a variable delay first-order difference equation to "pitch excite" the input data and recover the downsampled residual signal 47. The input to this operation is the transmitted data sample, the pitch delay parameter and the pitch predictor gain parameter. The output from this operation is the downsampled residual signal 47. The difference equation for this operation is:

$$S_n = x_n + B \cdot S_{n-T} \qquad \text{(Eq. 29)}$$

where S is the downsampled residual signal sample, x is the transmitted data sample, B is the pitch predictor gain, and T is the current frame's pitch delay (period) .
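A minimal Python sketch of this pitch excitation step follows. The name `pitch_excite` and the `history` argument, which carries the previously decoded residual samples across the frame boundary, are assumptions made for this sketch.

```python
def pitch_excite(x, history, T, B):
    """Recover residual samples with S[n] = x[n] + B * S[n - T].

    `history` must hold at least T previously decoded samples.
    """
    s = list(history)
    assert len(s) >= T
    for x_n in x:
        s.append(x_n + B * s[-T])  # s[-T] is the sample T steps before the new one
    return s[len(history):]
```

With T = 2 and B = 0.5, a single unit sample followed by zeros decodes to 1, 0, 0.5, 0: the pulse reappears every T samples, scaled by B each time.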

The spectral regeneration subroutine 93 is included in the main program for the receiver processor. The spectral regeneration subroutine 93 generates a full-band residual signal 49 from downsampled residual signal 47. The effect is to convert a 4 kHz downsampled signal 47 to an 8 kHz full-band signal 49.

The LPC synthesis filter subroutine 94 implements an auto-regressive LPC synthesis filter governed by the LPC coefficients. The inputs to this subroutine are the LPC coefficients 51 and the regenerated full-band residual signal 49. The output from this subroutine is the regenerated pre-emphasized speech data signal sample 53. This subroutine 94 generates the speech data signal samples 53 by filtering the residual signal 49 with a tenth-order all-pole filter. The filter is governed by the generated LPC coefficients 51. The transfer function of the filter is:

$$H(z) = \frac{1}{1 - \sum_{k=1}^{10} a_k z^{-k}} \qquad \text{(Eq. 30)}$$

where a_k is the k-th LPC coefficient. If x_n represents the residual signal sample 49 at time n and y_n represents the corresponding regenerated pre-emphasized speech data signal sample 53, the filter operation can be represented by the following difference equation:

$$y_n = x_n + a_1 y_{n-1} + a_2 y_{n-2} + \cdots + a_{10} y_{n-10} \qquad \text{(Eq. 31)}$$

The simplest way to implement this equation is to place the coefficients a_k in a fixed register and to implement the delay buffer using a shift register. The TMS32010 micro-code is optimized to perform this operation using the LTD/MPY commands; the processor has a pipelined multiply-accumulate instruction that executes in 400 ns.
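The shift-register form of the synthesis filter can be sketched in Python as follows. This is a floating-point illustration only and does not model the fixed-point TMS32010 code; the name `lpc_synthesis` and the optional `state` argument are assumptions made for this sketch.

```python
def lpc_synthesis(x, a, state=None):
    """All-pole filter y[n] = x[n] + a[0]*y[n-1] + ... + a[P-1]*y[n-P].

    `delay` acts as the shift register: delay[0] holds y[n-1], delay[1]
    holds y[n-2], and so on. Returns the output samples and the final
    register contents so filtering can continue on the next frame.
    """
    p = len(a)
    delay = list(state) if state is not None else [0.0] * p
    y = []
    for x_n in x:
        y_n = x_n + sum(a_k * y_k for a_k, y_k in zip(a, delay))
        delay = [y_n] + delay[:-1]  # shift in the new output, drop the oldest
        y.append(y_n)
    return y, delay
```

Returning the register contents lets the caller pass them back as `state` for the next frame, so the filter memory is preserved across frame boundaries.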

The de-emphasis subroutine 95 implements a first-order digital de-emphasis filter. The inputs to this subroutine are the current regenerated sample 53, the previous regenerated sample, and the pre-emphasis constant. The output from this subroutine is the regenerated speech data signal sample 55.

First-order digital de-emphasis is applied to complement the pre-emphasis function in the transmitter processor. De-emphasis uses a single-delay low-pass filter. The de-emphasis constant (A) is also set to 0.9375. The difference equation for the filter is:

$$y_n = x_n + A \cdot y_{n-1} \qquad \text{(Eq. 32)}$$
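A minimal Python sketch of the de-emphasis filter follows, using the constant 0.9375 given in the text. The name `de_emphasis` and the `y_prev` argument, which carries the last output of the previous frame, are assumptions made for this sketch.

```python
def de_emphasis(x, A=0.9375, y_prev=0.0):
    """First-order low-pass recursion y[n] = x[n] + A * y[n-1]."""
    y = []
    for x_n in x:
        y_prev = x_n + A * y_prev
        y.append(y_prev)
    return y
```

A unit sample followed by zeros produces 1, 0.9375, 0.87890625, and so on, a slowly decaying response that restores the low-frequency emphasis removed at the transmitter.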

The speech output subroutine 97 also is included in the main program for the receiver processor. This subroutine outputs the regenerated speech data signal samples to a data buffer in external data memory from which the samples are provided.

A decision 98 that the frame has been completed concludes the synthesis routine of Figure 7.

The calling hierarchy for the synthesis routine of Figure 7 is controlled by the main program for the receiver processor and calls the following subroutines in the following order: CRC check 90, LPC coefficient generation 91, PPADPCM decoding 92, LPC synthesis filter 94 and de-emphasis 95.

Transmitter and receiver systems that are commonly located may be included in a single digital processor.

Claims

1. A residual-excited linear prediction (RELP) vocoder comprising a digital signal processor adapted for processing digital speech data signal samples to provide a formatted transmission signal including (a) a quantized residual signal generated by inverse filtering of the samples in accordance with linear predictive coding (LPC) coefficients generated from the samples, (b) quantized LPC coefficients and (c) pitch and gain parameters generated during quantization of the residual signal from the inverse filtered samples, all of which are generated by the processor from the digital speech data signal samples.

2. A RELP vocoder according to Claim 1, wherein the digital signal processor is adapted for performing a routine for generating the LPC coefficients; a routine for generating the residual signal; and a routine for quantizing the residual signal and the LPC coefficients.

3. A RELP vocoder according to Claim 2, wherein the routine for generating the LPC coefficients comprises a subroutine for pre-emphasizing the samples in order to emphasize the high frequencies of speech; a subroutine for defining an auto-correlation function (ACF) from the pre-emphasized samples in order to generate ACF coefficients; and a subroutine for generating the LPC coefficients from the generated ACF coefficients.

4. A RELP vocoder according to Claim 3, wherein the routine for generating the residual signal comprises a subroutine for inverse filtering the pre-emphasized samples in accordance with the generated LPC coefficients; a subroutine for bandlimiting the residual signal by low-pass filtering; and a subroutine for downsampling the bandlimited residual signal to reduce the number of residual signal samples that are quantized and formatted for transmission.

5. A RELP vocoder according to Claim 4, wherein the routine for quantizing the residual signal and LPC coefficients comprises a subroutine for quantizing the LPC coefficients; a subroutine for estimating the pitch period of the downsampled residual signal by ACF analysis of the current downsampled residual signal frame in accordance with the ACF coefficients generated for the previous frame to thereby provide a pitch delay parameter for the current frame; a subroutine for providing a pitch predictor gain parameter for each residual signal frame in accordance with the estimated pitch delay parameter for each corresponding frame; a subroutine for providing a quantizer gain parameter for each residual signal frame in accordance with the pitch delay and pitch predictor gain parameters for each corresponding frame; and a subroutine for quantizing each residual signal frame by pitch predictive adaptive differential pulse code modulation (PPADPCM) in accordance with the pitch delay, pitch predictor gain and quantizer gain parameters for each corresponding frame.

6. A RELP vocoder according to Claim 5, further comprising a second digital signal processor adapted for processing said formatted transmission signal to synthesize reconstructed digital speech data signal samples, wherein the second processor is adapted for performing a synthesis routine comprising a subroutine for regenerating the LPC coefficients from the quantized LPC coefficients included in the transmission signal; a subroutine for decoding the quantized residual signal included in the transmission signal in accordance with the pitch delay, pitch predictive gain and quantizer gain parameters included in the transmission signal to thereby provide a decoded downsampled residual signal; a subroutine for spectrally regenerating a full-band residual signal from the decoded downsampled residual signal; a subroutine for regenerating pre-emphasized digital speech data signal samples by auto-regressively filtering the regenerated full-band residual signal in accordance with the regenerated LPC coefficients; and a subroutine for de-emphasizing the regenerated pre-emphasized samples in order to de-emphasize the high frequencies of speech, to thereby provide the reconstructed digital speech data signal samples.

7. A RELP vocoder according to Claim 6, wherein the decoding subroutine comprises a subroutine for scaling quantizer coefficients for each quantized residual signal frame in accordance with the quantizer gain parameter included in the transmission signal; a subroutine for providing data samples from the quantized residual signal included in the transmission signal in accordance with the scaled quantizer coefficients; and a subroutine for providing the decoded downsampled residual signal from the data samples by pitch excitation in accordance with the pitch delay and pitch predictor gain parameters.

8. A RELP vocoder according to Claim 7, wherein the pitch excitation subroutine comprises processing the data samples in accordance with a variable delay first order difference equation:

$$S_n = X_n + B \cdot S_{n-T}$$

where S is the provided decoded downsampled residual signal sample, X is the provided data sample, B is the pitch predictor gain and T is the pitch delay for the current residual signal frame.

9. A RELP vocoder according to Claim 5, further comprising a second digital signal processor adapted for processing said formatted transmission signal to synthesize signal samples

10. A RELP vocoder according to Claim 2, further comprising a second digital signal processor adapted for processing said formatted transmission signal to synthesize reconstructed digital speech data signal samples.

11. A RELP vocoder according to Claim 1, further comprising a second digital signal processor adapted for processing said formatted transmission signal to synthesize reconstructed digital speech data signal samples.

12. A RELP vocoder according to Claim 11, wherein the second processor is adapted for performing a synthesis routine comprising a subroutine for regenerating the LPC coefficients from the quantized LPC coefficients included in the transmission signal; a subroutine for decoding the quantized residual signal included in the transmission signal in accordance with the pitch delay, pitch predictive gain and quantizer gain parameters included in the transmission signal to thereby provide a decoded downsampled residual signal; a subroutine for spectrally regenerating a full-band residual signal from the decoded downsampled residual signal; a subroutine for regenerating pre-emphasized digital speech data signal samples by auto-regressively filtering the regenerated full-band residual signal in accordance with the regenerated LPC coefficients; and a subroutine for de-emphasizing the regenerated pre-emphasized samples in order to de-emphasize the high frequencies of speech, to thereby provide the reconstructed digital speech data signal samples.

13. A RELP vocoder according to Claim 12, wherein the decoding subroutine comprises a subroutine for scaling quantizer coefficients for each quantized frame in accordance with the quantizer gain parameter included in the transmission signal; a subroutine for providing data samples from the quantized residual signal included in the transmission signal in accordance with the scaled quantizer coefficients; and a subroutine for providing the decoded downsampled residual signal from the data samples by pitch excitation in accordance with the pitch delay and pitch predictor gain parameters.

14. A RELP vocoder according to Claim 13, wherein the pitch excitation subroutine comprises processing the data samples in accordance with a variable delay first order difference equation:

$$S_n = X_n + B \cdot S_{n-T}$$

where S is the provided decoded downsampled residual signal sample, X is the provided data sample, B is the pitch predictor gain and T is the pitch delay for the current residual signal frame.

15. A residual-excited linear prediction (RELP) vocoder comprising a digital signal processor adapted for processing a formatted transmission signal including (a) a quantized residual signal generated by inverse filtering of digital speech data signal samples in accordance with linear predictive coding (LPC) coefficients generated from the samples, (b) quantized LPC coefficients and (c) pitch and gain parameters generated during quantization of the residual signal from the inverse filtered samples, to synthesize reconstructed digital speech data signal samples.

16. A RELP vocoder according to Claim 15, wherein the processor is adapted for performing a synthesis routine comprising a subroutine for regenerating the LPC coefficients from the quantized LPC coefficients included in the transmission signal; a subroutine for decoding the quantized residual signal included in the transmission signal in accordance with the pitch delay, pitch predictive gain and quantizer gain parameters included in the transmission signal to thereby provide a decoded downsampled residual signal; a subroutine for spectrally regenerating a full-band residual signal from the decoded downsampled residual signal; a subroutine for regenerating pre-emphasized digital speech data signal samples by auto-regressively filtering the regenerated full-band residual signal in accordance with the regenerated LPC coefficients; and a subroutine for de-emphasizing the regenerated pre-emphasized samples in order to de-emphasize the high frequencies of speech, to thereby provide the reconstructed digital speech data signal samples.

17. A RELP vocoder according to Claim 16, wherein the decoding subroutine comprises a subroutine for scaling quantizer coefficients for each quantized frame in accordance with the quantizer gain parameter included in the transmission signal; a subroutine for providing data samples from the quantized residual signal included in the transmission signal in accordance with the scaled quantizer coefficients; and a subroutine for providing the decoded downsampled residual signal from the data samples by pitch excitation in accordance with the pitch delay and pitch predictor gain parameters.

18. A RELP vocoder according to Claim 17, wherein the pitch excitation subroutine comprises processing the data samples in accordance with a variable delay first order difference equation:

$$S_n = X_n + B \cdot S_{n-T}$$

where S is the provided decoded downsampled residual signal sample, X is the provided data sample, B is the pitch predictor gain and T is the pitch delay for the current residual signal frame.