EP0203940A1 - Relp vocoder implemented in digital signal processors - Google Patents
- Publication number
- EP0203940A1 EP85905709A
- Authority
- EP
- European Patent Office
- Prior art keywords
- subroutine
- signal
- samples
- residual signal
- pitch
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/04—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
- G10L19/06—Determination or coding of the spectral characteristics, e.g. of the short-term prediction coefficients
Definitions
- The present invention generally pertains to voice coders (vocoders) and is particularly directed to Residual-Excited Linear Prediction (RELP) vocoders.
- Vocoders convert speech signals into digital form for transmission and synthesize speech signals from these digital signals upon reception.
- Vocoders typically operate at flexible binary data rates varying from 32 kbps (kilobits per second) down to about 2.4 kbps.
- Vocoders traditionally are divided into two basic types, waveform coders and pitch-excited source coders.
- Waveform coders operate at high data rates (above 16 kbps) and produce good quality, natural sounding speech which is robust against both acoustic and transmitted noise.
- Source coders operate at low data rates (less than 4.8 kbps) in an analysis/synthesis mode governed by a mathematical model of the human vocal apparatus.
- Source vocoders typically sound robotic and do not perform well under poor acoustic conditions.
- The RELP vocoder was originally proposed by Un and Magill, "The Residual-Excited Linear Prediction Vocoder with Transmission Rate Below 9.6 kbits/s", IEEE Trans. COM-23, 1975, pp. 1466-1473; and an enhanced RELP vocoder was proposed by Dankberg and Wong, "Development of a 4.8-9.6 kbps RELP vocoder", ICASSP-79.
- The purpose of the RELP vocoder was to provide satisfactory performance in the gap between the operating ranges of waveform coders and source coders, to wit: 4.8 kbps to 16 kbps.
- The RELP vocoder contains some features of both waveform coders and source coders.
- Digital speech data signal samples are analyzed over relatively short time segments (typically in the range of 10-30 ms) by a linear predictive coding (LPC) vocal tract modeling technique to provide LPC coefficients for each block of samples.
- LPC coefficients represent the vocal tract, glottal flow and radiation of the speech represented by the digital signal samples.
- The digital speech data signal samples are inverse filtered by a time-variant, all-pole recursive digital filter over each short time segment to provide residual signal (prediction error signal) samples.
- The time-variant character of speech is handled by a succession of such filters with different parameters.
- The residual signal and the LPC coefficients are encoded (quantized) and formatted for transmission.
- Speech is synthesized by processing the residual signal in accordance with the LPC coefficients.
- The residual signal samples are bandlimited and downsampled prior to quantization in order to provide residual signal samples at a reduced data rate.
- The upper band harmonics are generated during synthesis of the speech signal when the downsampled residual signal is upsampled and zeros are inserted between data points.
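The zero-insertion step described above can be sketched as follows. This is only an illustration under assumptions: the function name `zero_insert` and the default ratio of two are hypothetical. Inserting zeros between baseband samples raises the sampling rate and leaves spectral images of the baseband in the upper band, which is how the upper band harmonics come about.

```python
def zero_insert(samples, ratio=2):
    """Upsample by inserting (ratio - 1) zeros after every input sample.

    The name and the default ratio are illustrative assumptions.
    """
    out = []
    for x in samples:
        out.append(x)
        out.extend([0] * (ratio - 1))
    return out


upsampled = zero_insert([1, 2, 3])
```

The output has `ratio` times as many samples as the input, with the original samples at every `ratio`-th position.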
- The residual signal is quantized prior to transmission by adaptive delta modulation.
- Dankberg and Wong considered various other quantization techniques and concluded that pitch predictive adaptive differential pulse code modulation (PPADPCM) provided the best signal-to-quantizing noise ratio.
- The residual signal samples are processed by pitch analysis to determine the pitch delay, are processed by pitch predictor gain analysis to determine the pitch predictor gain in accordance with the determined pitch delay, are processed by gain analysis to provide a maximum deviation quantizer gain, and are further processed by PPADPCM in accordance with the quantizer gain, pitch predictor gain and delay parameters to thereby provide the quantized residual signal.
- RELP vocoders of the prior art have required complex hardware and have been so expensive to implement as to be commercially impractical.
- The present invention provides a commercially practical RELP vocoder that is implemented by two digital signal processors, one for a transmitter system and one for a remotely located receiver system.
- The transmitter digital signal processor is adapted for processing digital speech data signal samples to provide a formatted transmission signal including (a) a quantized residual signal generated by inverse filtering of the samples in accordance with linear predictive coding (LPC) coefficients generated from the samples, (b) quantized LPC coefficients and (c) pitch and gain parameters generated during quantization of the residual signal from the inverse filtered samples, all of which are generated by the processor from the digital speech data samples.
- The receiver digital signal processor is adapted for processing the formatted transmission signal to synthesize reconstructed digital speech data signal samples.
- The transmitter digital signal processor is adapted for performing a routine for generating the LPC coefficients; a routine for generating the residual signal; and a routine for quantizing the residual signal and the LPC coefficients.
- The routine for generating the LPC coefficients includes a subroutine for pre-emphasizing the samples in order to emphasize the high frequencies of speech; a subroutine for defining an auto-correlation function (ACF) from the pre-emphasized samples in order to generate ACF coefficients; and a subroutine for generating the LPC coefficients from the generated ACF coefficients.
- The routine for generating the residual signal includes a subroutine for inverse filtering the pre-emphasized samples in accordance with the generated LPC coefficients; a subroutine for bandlimiting the residual signal by low-pass filtering in a manner which will reduce the effects of quantization; and a subroutine for downsampling the bandlimited residual signal to reduce the number of residual signal samples that are quantized and formatted for transmission.
- The routine for quantizing the residual signal and LPC coefficients includes a subroutine for quantizing the LPC coefficients; a subroutine for estimating the pitch period of the downsampled residual signal by ACF analysis of the current downsampled residual signal frame in accordance with the ACF coefficients generated for the previous frame, to thereby provide a pitch delay parameter for the current frame; a subroutine for providing a pitch predictor gain parameter for each residual signal frame in accordance with the estimated pitch delay parameter for each corresponding frame; a subroutine for providing a quantizer gain parameter for each residual signal frame in accordance with the pitch delay and pitch predictor gain parameters for each corresponding frame; and a subroutine for quantizing each residual signal frame by pitch predictive adaptive differential pulse code modulation (PPADPCM) in accordance with the pitch delay, pitch predictor gain and quantizer gain parameters for each corresponding frame.
- The receiver digital signal processor is adapted for processing the formatted transmission signal to synthesize reconstructed digital speech data signal samples by performing a synthesis routine that includes a subroutine for regenerating the LPC coefficients from the quantized LPC coefficients included in the transmission signal; a subroutine for decoding the quantized residual signal included in the transmission signal in accordance with the pitch delay, pitch predictor gain and quantizer gain parameters included in the transmission signal to thereby provide a decoded downsampled residual signal; a subroutine for spectrally regenerating a full-band residual signal from the decoded downsampled residual signal; a subroutine for regenerating pre-emphasized digital speech data signal samples by auto-regressively filtering the regenerated full-band residual signal in accordance with the regenerated LPC coefficients; and a subroutine for de-emphasizing the regenerated pre-emphasized samples in order to de-emphasize the high frequencies of speech, to thereby provide the reconstructed digital speech data signal samples.
- The decoding subroutine includes a subroutine for scaling quantizer coefficients for each quantized residual signal frame in accordance with the quantizer gain parameter included in the transmission signal; a subroutine for providing data samples from the quantized residual signal included in the transmission signal in accordance with the scaled quantizer coefficients; and a subroutine for providing the decoded downsampled residual signal from the data samples by pitch excitation in accordance with the pitch delay and pitch predictor gain parameters.
- Figure 1 is a functional block diagram illustrating the process implemented by the transmitter digital signal processor to code an input signal sample for transmission.
- Figure 2 is a functional block diagram illustrating the process implemented by the receiver digital signal processor to decode a sample which is coded in accordance with the process illustrated in Figure 1.
- Figure 3 is a flow chart of the LPC coefficient generation routine performed by the transmitter digital signal processor.
- Figure 4 is a flow chart of the residual signal generation routine performed by the transmitter digital signal processor.
- Figure 5 is a flow chart of the quantization routine performed by the transmitter digital signal processor.
- Figure 6 is a diagram of a quantization filter implemented during the PPADPCM quantization subroutine included in the routine of Figure 5.
- Figure 7 is a flow chart of the synthesis routine performed by the receiver digital signal processor.
- The transmitter digital signal processor and the receiver digital signal processor are each Texas Instruments Model TMS32010 Digital Signal Processors.
- The TMS32010 processor is a 16-bit, 200 ns cycle time, stand-alone processor with a 32-bit ALU and Accumulator.
- The processor has a four-level stack for nested subroutines, and arithmetic performance is enhanced by a hardware 16×16-bit parallel multiplier, which performs a pipelined multiply/accumulate operation in 400 ns.
- The TMS32010 processor has 144 16-bit words available as internal RAM, which may be augmented by addressing external RAM, for buffer storage, via TBLR/TBLW (table read/write) commands.
- Program memory may be redefined as external data memory, but its access time is 600 ns. External program memory may be expanded to 8K bytes at full speed.
- The two processors must perform all operations of the RELP vocoder in real time.
- The processor choice is constrained by two key factors: operating speed and available internal RAM (especially important because frame storage is required).
- The TMS32010 processor is chosen based on its fast operating speed (5 MHz), data storage capabilities, and extensive development tools.
- Digital speech data signal samples 10 are pre-emphasized 11 to improve the representation of high frequencies during the subsequent LPC analysis.
- Pre-emphasized samples 12 are subjected to LPC analysis 13 to provide LPC reflection coefficients 14.
- The LPC reflection coefficients 14 are quantized 15 to provide quantized LPC reflection coefficients 16.
- The LPC reflection coefficients 14 are quantized to minimize distortion during subsequent transmission to the receiver.
- LPC coefficients 17 are generated 18 from the quantized LPC reflection coefficients 16.
- The pre-emphasized samples 12 are inverse filtered 19 in accordance with the LPC coefficients 17 to provide a residual signal 20.
- The residual signal 20 is bandlimited 21 and downsampled 22 to provide a baseband residual signal 23.
- The baseband residual signal 23 is quantized by PPADPCM quantization 24 in order to minimize the effects of distortion during subsequent transmission of the quantized residual signal 25.
- Three of the parameters of the PPADPCM quantization 24 are pitch delay, pitch predictor gain and quantizer gain. These three parameters are generated during PPADPCM quantization 24 and are necessary to decode the quantized residual signal received by the receiver system. Accordingly, a pitch delay signal is provided on line 26, a pitch predictor gain signal is provided on line 27 and a quantizer gain signal is provided on line 28 incident to the PPADPCM quantization 24 of the baseband residual signal 23.
- The quantized residual signal 25, the pitch delay signal on line 26, the pitch predictor gain signal 27, the quantizer gain signal 28 and the quantized LPC reflection coefficients 16 are combined linearly by formatting 32 to provide a transmission frame 34.
- The principal functions of the receiver processor are described with reference to Figure 2.
- The format of each received data transmission frame 36 is decoded 37 to provide the quantized residual signal 39, the pitch delay parameter 40, the pitch predictor gain parameter 41, the quantizer gain parameter 42 and the quantized LPC reflection coefficients 43.
- The quantized residual signal 39 is decoded by PPADPCM decoding 46 in accordance with the pitch delay 40, pitch predictor gain 41 and quantizer gain 42 to provide a decoded baseband residual signal 47.
- The decoded baseband residual signal 47 is spectrally regenerated 48 to provide a full-band residual signal 49.
- The quantized LPC reflection coefficients 43 are processed 50 to generate the LPC coefficients 51.
- The full-band residual signal 49 is filtered 52 in accordance with the generated LPC coefficients 51 to synthesize decoded speech data signal samples 53.
- The decoded speech data signal samples 53 are de-emphasized 54 to provide regenerated digital speech data signal samples 55.
- The processing routine represented by the flow chart of Figure 3 generally pertains to LPC analysis.
- This routine generates the LPC coefficients from a buffered frame of pre-emphasized speech data signal samples.
- The routine of Figure 4 is generally directed to generation of the residual signal; and the routine of Figure 5 is generally directed to quantization of the residual signal and the LPC coefficients.
- The LPC analysis routine includes the subroutines of initialization 58, sample input 59, pre-emphasis 61, ACF generation 63, ACF normalization 65 and LPC analysis 66.
- The sample input subroutine 59 reads in digital speech data signal samples from an external data memory buffer.
- The pre-emphasis subroutine 61 applies first-order digital pre-emphasis to the input speech data signal samples.
- The input to the algorithm is the input speech sample S and the output is the pre-emphasized speech sample S′, both located in internal RAM.
- First-order digital pre-emphasis is applied to the input speech signal to emphasize the high frequencies of speech. This leads to a more accurate estimate of the vocal tract frequency response, which is controlled by the LPC parameters.
- Pre-emphasis uses a single-delay high-pass filter. Experimentation shows that the choice of the pre-emphasis constant (a) is not critical and it is normally set to 0.9375.
- The difference equation for the filter is:
- The pre-emphasis function is complemented at the receiver system by applying a de-emphasis function.
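A minimal floating-point sketch of the pre-emphasis and de-emphasis pair, assuming the usual first-order form S′(n) = S(n) − a·S(n−1) for the single-delay high-pass filter and its recursive inverse at the receiver. The function names and the zero initial state are illustrative assumptions; the processors themselves work in fixed point.

```python
A = 0.9375  # pre-emphasis constant quoted in the text


def pre_emphasize(samples, a=A):
    """Assumed form: out[n] = s[n] - a * s[n-1], with s[-1] taken as 0."""
    out = []
    prev = 0.0
    for s in samples:
        out.append(s - a * prev)
        prev = s
    return out


def de_emphasize(samples, a=A):
    """Inverse recursion: out[n] = s'[n] + a * out[n-1], with out[-1] taken as 0."""
    out = []
    prev = 0.0
    for s in samples:
        prev = s + a * prev
        out.append(prev)
    return out


speech = [1.0, 2.0, 3.0, -1.0]
restored = de_emphasize(pre_emphasize(speech))
```

Applying the two functions in sequence returns the original samples, which is the sense in which de-emphasis complements pre-emphasis.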
- The pre-emphasized samples are stored in an external data memory for use in the residual signal generation routine of Figure 4.
- The ACF generation subroutine 63 iteratively updates a correlation buffer for each input speech data signal sample.
- This buffer must be zeroed prior to the first call to the subroutine.
- The output of this subroutine is a 32-bit precision auto-correlation function (ACF) for delays between zero and ten points.
- An auto-correlation function (ACF) must be defined from a windowed buffer of pre-emphasized speech samples ($s_n$).
- The window ($w_n$) is chosen to be rectangular for ease of implementation.
- A tenth-order LPC analysis requires the ACF coefficients $R_0, \ldots, R_{10}$. These coefficients may be updated iteratively for each input speech data signal sample.
- $R_k(n+1) = R_k(n) + x_n \, x_{n-k}$ (Eq. 5)
- $R_k(n)$ is the $n$th iteration of the $k$th ACF coefficient.
- This equation is implemented by the ACF generation subroutine 63.
- The coefficients $R_k$ are maintained with 32-bit accuracy to remove round-off error problems.
- The algorithm is implemented by creating a delay buffer that is initialized to zero and ripples after each iteration. This implementation also ensures that the 32-bit result will not overflow.
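The iterative update of Eq. 5 with a rippling delay buffer can be sketched as below. Python integers stand in for the 32-bit accumulators, and the names `acf_update` and `acf_of_frame` are hypothetical; this is an illustration of the update rule, not the TMS32010 routine.

```python
ORDER = 10  # a tenth-order analysis needs R_0 .. R_10


def acf_update(acf, delay, x):
    """Fold one new sample into the running ACF (Eq. 5).

    delay[k] holds x_{n-k}; the buffer ripples by one place per call.
    """
    for k in range(ORDER, 0, -1):
        delay[k] = delay[k - 1]
    delay[0] = x
    for k in range(ORDER + 1):
        acf[k] += x * delay[k]


def acf_of_frame(frame):
    """Zero the correlation and delay buffers, then process the frame sample by sample."""
    acf = [0] * (ORDER + 1)
    delay = [0] * (ORDER + 1)
    for x in frame:
        acf_update(acf, delay, x)
    return acf


r = acf_of_frame([1, 2, 3, 4])
```

Because the delay buffer starts at zero, the result equals the rectangular-window ACF of the frame.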
- The maximum value attained by the accumulator for a data buffer of 180 samples is:
- Upon completion of sample input, the 32-bit ACF result must be converted to 16-bit coefficients.
- The ACF normalization subroutine 65 performs all operations required to convert the 32-bit ACF to a 16-bit result.
- The LPC analysis subroutine 66 is transparent to a scaled ACF input. Therefore, to obtain the maximum dynamic range of the 16-bit ACF, the 32-bit results are scaled to the maximum, $R_0$, prior to truncation to 16 bits. The optimal procedure for this would be to divide all coefficients by $R_0$. However, execution efficiency is greatly improved by simply left-shifting the 32-bit numbers to remove leading zeros in the $R_0$ value.
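The shift-based normalization can be sketched as follows, assuming signed 32-bit ACF values with $R_0$ positive and below $2^{31}$, and assuming that truncation to 16 bits means keeping the upper 16 bits of the shifted word. The function name `normalize_acf` is hypothetical.

```python
def normalize_acf(acf32):
    """Left-shift every value by the number of leading zeros in R_0 (below the
    sign bit), then keep the upper 16 bits of each 32-bit word.

    Assumes 0 < acf32[0] < 2**31 and |acf32[k]| <= acf32[0].
    """
    r0 = acf32[0]
    if r0 <= 0:
        return [0] * len(acf32)
    shift = 31 - r0.bit_length()
    return [(r << shift) >> 16 for r in acf32]


acf16 = normalize_acf([30, 20, 11, 4])
```

After the shift, $R_0$ lies between $2^{14}$ and $2^{15} - 1$ in the 16-bit result, so the ratios between coefficients are preserved while most of the 16-bit range is used, without any division.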
- A decision 67 that the 32-bit correlation frame is complete enables the processor to proceed to the ACF normalization subroutine 65.
- The LPC analysis subroutine 66 implements the Durbin algorithm to generate the ten LPC coefficients and ten LPC reflection coefficients 36.
- The Durbin algorithm's input is the normalized 16-bit ACF.
- The Durbin algorithm is an extremely efficient algorithm for generating the LPC coefficients. See J. Makhoul, "Linear Prediction: A Tutorial Review", Proc. IEEE, Vol. 63, pp. 561-580, 1975.
- The algorithm is suitable for fixed-point arithmetic implementation and also generates, as a by-product, the reflection coefficients, which may be used for quantization and coding prior to transmission to the receiver.
- Alternatively, the LPC coefficients may be generated by the Le Roux-Gueguen (LG) recursion, which is described in J. Le Roux and C. Gueguen, "A Fixed Point Computation of Partial Correlation Coefficients in Linear Prediction", Proc. ICASSP-77, pp. 742-743.
- The LG recursion, although faster than the Durbin algorithm, generates only the LPC reflection coefficients and not the LPC coefficients per se, which must be generated separately.
- $R_i$ is the $i$th value of the auto-correlation function, $k_i$ is the $i$th reflection coefficient, and $a_i^{(j)}$ is the $i$th LPC coefficient at the $j$th iteration.
- The order of the LPC analysis, P, is determined experimentally, and a 10th-order analysis is sufficient to adequately model the vocal tract frequency response.
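A minimal floating-point sketch of the Durbin recursion, producing both the LPC coefficients and the reflection coefficients from an ACF. The sign convention A(z) = 1 + Σ a_i z^(−i) is an assumption (other texts use the opposite sign), as are the function name and the return layout; the subroutine in the processor works in 16-bit fixed point.

```python
def durbin(r, order):
    """Levinson-Durbin recursion on ACF values r[0..order].

    Returns (a, k, err) where a[0] = 1 and a[1..order] are LPC coefficients
    for the assumed convention A(z) = 1 + sum a[i] z^-i, k[1..order] are the
    reflection coefficients, and err is the final prediction error energy.
    """
    a = [0.0] * (order + 1)
    a[0] = 1.0
    k = [0.0] * (order + 1)
    err = float(r[0])
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k[i] = -acc / err
        new_a = a[:]
        new_a[i] = k[i]
        for j in range(1, i):
            new_a[j] = a[j] + k[i] * a[i - j]
        a = new_a
        err *= 1.0 - k[i] * k[i]
    return a, k, err


# ACF of a first-order process with correlation 0.5 per lag.
a, k, err = durbin([1.0, 0.5, 0.25], order=2)
```

For this input the second reflection coefficient comes out as zero, since a first-order predictor already explains the correlation.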
- The LPC parameters must be quantized and coded prior to transmission and resynthesis of the digital speech data signal at the receiver.
- The LPC coefficients, $a_i$, are sensitive to quantization noise, which introduces significant distortion into the signal.
- A solution is to quantize and code the LPC reflection coefficients, $k_i$, which are much less sensitive to quantization noise. This is done by the LPC coefficient quantization subroutine 68, which is a part of the quantization routine of Figure 5.
- The initialization subroutine 58 and the sample input subroutine 59 are both contained in the main program for the transmitter processor.
- The main program controls the calling of the other subroutines in the LPC analysis routine of Figure 3 in accordance with the following hierarchy: pre-emphasis 61, ACF generation 63, ACF normalization 65 and LPC analysis 66.
- The main program implements the LPC analysis routine of Figure 3 to generate a frame of a predetermined number of pre-emphasized speech data signal samples and the ten LPC coefficients.
- The term "LPC coefficients" refers to either LPC coefficients or LPC reflection coefficients unless the latter is specified.
- The residual signal generation routine is represented by the flow chart of Figure 4. This routine includes the subroutines of initialization 70, sample input 71, inverse filter 72, bandlimit 73 and downsample 74.
- The initialization subroutine 70 transfers second-order section filter coefficients from external data memory to the internal RAM of the transmitter processor for use during the bandlimit subroutine 73.
- The sample input subroutine 71 inputs the pre-emphasized samples from a speech data buffer located in the external data memory to the zero-delay position of a speech delay buffer, which is located in the internal RAM of the transmitter processor.
- The delay buffer is used for the implementation by the inverse filter subroutine 72 of the all-zero Finite-Impulse-Response (FIR) filter in accordance with the LPC coefficients.
- The inverse filter subroutine 72 implements an all-zero inverse filter in accordance with the LPC coefficients to generate the residual signal 20 (Figure 1).
- The output from this subroutine 72 is provided to a residual signal data buffer which is located in the external data memory.
- The residual signal 20 is generated by inverse filtering the pre-emphasized speech data signal samples 12 in accordance with the LPC coefficients 17 (see Figure 1).
- The residual signal 20 is obtained by filtering the speech data signal samples 12 by the all-zero filter H(z). If $x_n$ represents the input speech sample at time n and $y_n$ represents the corresponding output sample, the filter can be represented by the following difference equation:
- The simplest way to implement this structure is to place the coefficients $a_k$ in a fixed register and to implement the delay buffer using a shift register.
- The TMS32010 micro-code is optimized to perform this operation using the LTD/MPY commands: the processor has a pipelined Multiply/Accumulate instruction that executes in 400 ns.
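The fixed-coefficient, shift-register structure can be sketched as below. The difference equation is assumed here to be $y_n = x_n + \sum_{k=1}^{P} a_k x_{n-k}$; the sign of the $a_k$ terms depends on the LPC convention in use, so this is an assumption, as is the function name. Each loop iteration mirrors what an LTD/MPY sequence does: shift the delay line and accumulate one product per tap.

```python
def inverse_filter(samples, a):
    """All-zero (FIR) inverse filter.

    Assumed form: y[n] = x[n] + sum_{k=1..P} a[k-1] * x[n-k], where
    a holds the P coefficients a_1..a_P and the delay line starts at zero.
    """
    delay = [0.0] * len(a)  # delay[k-1] holds x[n-k]
    out = []
    for x in samples:
        y = x + sum(ak * dk for ak, dk in zip(a, delay))
        out.append(y)
        delay = [x] + delay[:-1]  # shift register: newest sample enters at the front
    return out


# A first-order signal x[n] = 0.5 * x[n-1] is whitened to a single impulse.
residual = inverse_filter([1.0, 0.5, 0.25], [-0.5])
```

When the coefficients match the signal's predictor, the residual carries only the part of the signal that the predictor cannot explain.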
- The bandlimit subroutine 73 low-pass filters the residual signal 20 by implementing an eighth-order elliptic half-band filter, which in turn is implemented by using a cascade of four second-order sections.
- The transfer function of the elliptic filter is:
- The second-order polynomial H(z) is implemented by a second-order filter section.
- The second-order section is implemented by an internal subroutine that is called four times to provide a cascade of four second-order sections.
- A cascade of four sections is equivalent to an eighth-order elliptic low-pass filter. Each section uses a set of filter coefficients and requires its own delay buffer, which must be shifted at each iteration.
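The cascade structure can be sketched as follows. The section form (direct form II, with H(z) = (b0 + b1·z⁻¹ + b2·z⁻²)/(1 + a1·z⁻¹ + a2·z⁻²)) is an assumption, and the coefficient values in the example are placeholders chosen for easy checking, not the elliptic filter coefficients, which are not given in this text.

```python
class Section:
    """One second-order section with its own two-word delay buffer (direct form II)."""

    def __init__(self, b0, b1, b2, a1, a2):
        self.b = (b0, b1, b2)
        self.a = (a1, a2)
        self.w1 = 0.0
        self.w2 = 0.0

    def step(self, x):
        w0 = x - self.a[0] * self.w1 - self.a[1] * self.w2
        y = self.b[0] * w0 + self.b[1] * self.w1 + self.b[2] * self.w2
        self.w2, self.w1 = self.w1, w0  # shift the section's delay buffer
        return y


def cascade(sections, samples):
    """Pass each sample through every section in turn, as one routine called per section."""
    out = []
    for x in samples:
        for s in sections:
            x = s.step(x)
        out.append(x)
    return out


# Placeholder coefficients: two identical two-tap averaging sections.
smoothed = cascade([Section(0.5, 0.5, 0, 0, 0), Section(0.5, 0.5, 0, 0, 0)], [1.0, 1.0, 1.0])
```

Calling the same section routine once per stage keeps the code small, at the cost of a separate coefficient set and delay buffer for each stage.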
- The downsample subroutine 74 implements downsampling by discarding predetermined samples.
- The downsample algorithm uses the frame counter to alternate between discarding the input data point and scaling it to maintain the energy per frame.
- The downsampling function reduces the filtered residual signal sample data rate. This function is executed by a frame position pointer. The sample is either discarded or magnitude-scaled (multiplied by a predetermined factor to maintain the average frame energy of the residual signal). If, for example, the downsampling ratio is two, the scaling factor is also two.
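The discard-or-scale rule can be sketched in a few lines. Which sample of each pair is kept is not stated in the text; keeping the first is an assumption, and the function name is hypothetical.

```python
def downsample(frame, ratio=2):
    """Keep one sample in every `ratio` (assumed: the first), scaled by `ratio`;
    discard the others."""
    return [ratio * x for i, x in enumerate(frame) if i % ratio == 0]


baseband = downsample([1, 2, 3, 4, 5, 6])
```

With a ratio of two, a frame of 180 samples yields 90 samples, each doubled in magnitude.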
- A decision 75 that the frame is complete concludes the residual signal generation routine of Figure 4.
- The sample input 71 and inverse filter 72 subroutines and the decision 75 are integrated together and control the calling hierarchy for the other subroutines in the residual signal generation routine of Figure 4.
- The order of such calling hierarchy is bandlimit 73 and downsample 74.
- The quantization routine represented by the flow chart of Figure 5 includes the following subroutines: LPC coefficient quantization 68 (discussed above in relation to the LPC analysis subroutine 66), pitch delay 78, pitch predictor gain 80, quantizer gain 81, CRC 82, PPADPCM quantization 83 and data format 84.
- The LPC coefficient quantization subroutine 68 quantizes the ten LPC reflection coefficients 14. This subroutine obtains its input data from the LPC reflection coefficients 14 and quantizer look-up subroutine 68 during the operation of the LPC analysis subroutine 66. This subroutine 68 is called by the LPC analysis subroutine 66.
- the reflection coefficients are quantised with a variable number of bits per coefficient compatible with DOD standard LPC-10 coding, which is described in T. E. Tremain, "The
- A data management algorithm performs buffer transfers between internal RAM and external data memory to enable all routines to execute using internal RAM.
- The pitch delay subroutine 78 estimates the pitch period to determine the pitch delay parameter T of the downsampled residual signal 22 (Figure 1) used for the PPADPCM quantization, using an auto-correlation function (ACF) analysis of the signal 22.
- The inputs to the algorithm are the partial ACF of the previous frame and the current residual signal frame.
- The output from the algorithm is the estimated pitch delay T and the updated partial ACF.
- The pitch delay is updated at the frame rate.
- Pitch analysis uses a simple auto-correlation detector:
- The pitch delay T is chosen as the delay at which R(T) is maximum, with R(T) evaluated between $T_{min}$ and $T_{max}$. To enable an accurate estimate of the pitch delay, the analysis must cover three pitch periods, i.e., $N > 3T_{max}$.
- The limits of the pitch detection are chosen experimentally using Fortran simulations of the RELP vocoder algorithm; for example, $T_{min}$ is a 15-sample delay and $T_{max}$ is a 40-sample delay. This corresponds to pitch frequencies of 267 Hz and 100 Hz respectively if the downsampled residual signal 22 has a sampling rate of 4 kHz.
- The value N is therefore chosen to be two downsampled frames.
- The auto-correlation detector R(T) is evaluated as two partial ACFs, $R_1(T)$ and $R_2(T)$, where:
- R(T) is calculated by adding the current frame's partial ACF, $R_2(T)$, and the previous frame's partial ACF, $R_1(T)$, that was stored in external data memory.
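The two-frame search can be sketched as below. The partial ACF is assumed to be a plain sum of products x(n)·x(n−T) over one frame, reaching back into the preceding frame for the delayed samples; that definition, the frame length of 90 samples and the function names are all assumptions made for illustration.

```python
T_MIN, T_MAX = 15, 40  # example delay limits quoted in the text
FRAME = 90             # hypothetical downsampled frame length


def partial_acf(prev_frame, cur_frame):
    """Assumed partial ACF of cur_frame for each candidate delay T.

    Delayed samples that fall before the frame start are taken from prev_frame,
    which must hold at least T_MAX samples.
    """
    hist = list(prev_frame) + list(cur_frame)
    off = len(prev_frame)
    return {
        t: sum(hist[off + n] * hist[off + n - t] for n in range(len(cur_frame)))
        for t in range(T_MIN, T_MAX + 1)
    }


def pitch_delay(prev_partial, cur_partial):
    """Add the stored and current partial ACFs and return the delay with the largest sum."""
    total = {t: prev_partial[t] + cur_partial[t] for t in cur_partial}
    return max(total, key=total.get)


# Impulse train with a 25-sample period, split into three frames.
signal = [1 if n % 25 == 0 else 0 for n in range(3 * FRAME)]
f0, f1, f2 = signal[:FRAME], signal[FRAME:2 * FRAME], signal[2 * FRAME:]
stored = partial_acf(f0, f1)   # would be kept in external data memory
current = partial_acf(f1, f2)
T = pitch_delay(stored, current)
```

Storing the previous frame's partial ACF means only one frame of products is computed per frame, while the decision still covers two frames. At a 4 kHz sampling rate the delay limits of 15 and 40 samples correspond to 4000/15 ≈ 267 Hz and 4000/40 = 100 Hz.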
- The pitch predictor gain subroutine 80 evaluates the pitch predictor gain parameter B for the PPADPCM quantization and updates such evaluation at the frame rate.
- The pitch predictor gain B is evaluated as:
- M is a single downsampled frame and T is the pitch delay. B is constrained between two limits:
- The quantizer gain subroutine 81 evaluates the quantizer gain parameter $q_{gain}$ for the PPADPCM quantization and updates such evaluation at the frame rate. This parameter is used to scale the quantizer to the input signal level; each input and output level of the quantizer is multiplied by $q_{gain}$.
- The parameter is chosen to be the maximum x:
- T is the pitch delay
- B is the pitch predictor gain
- the CRC subroutine 82 introduces an n-bit cyclic redun- dancy code (CRC) on part of the transmission frame to enable detection of bit errors during transmission.
- CRC cyclic redun- dancy code
- the code protects the LPC coefficients and PPADPCM parameters.
- the input to the subroutine is the relevant quantized coefficients.
- the output from the subroutine is an n-bit CRC- to be transmitted.
- the PPADPCM subroutine 83 quantizes the downsampled residual signal 22, using Pitch Predictive Adaptive Differential Pulse Code Modulation (PPADPCM) .
- PPADPCM Pitch Predictive Adaptive Differential Pulse Code Modulation
- the term "pitch predictive" is misleading, however.
- the pitch predictor is used to remove the dominant periodic frequency from the residual signal 22 prior to quantization. While this frequency most commonly corresponds to the pitch period, the predictor may lock onto an alternate frequency without degrading the operation of the quantizer. Therefore a rigorous pitch extraction algorithm is not necessary.
- the predictor removes the dominant periodicity of the waveform to generate a "white noise" signal with a Gaussian probability density function (pdf). This signal may then be quantized using a classical Max quantizer, as described in J. Max, "Quantizing for Minimum Distortion," IRE Trans on Information Theory, March 1960. Figure 6 shows the structure of the PPADPCM quantizer.
- the quantizer is embedded in the predictor loop so that the error spectrum introduced by quantization is uniform.
- the parameters of the quantizer are the pitch delay (T), the quantizer gain (q_gain), the pitch predictor gain (B), and the order of the quantizer (Q).
- T the pitch delay
- q_gain the quantizer gain
- B the pitch predictor gain
- Q the order of the quantizer
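A quantizer embedded in the predictor loop can be sketched as follows. This is an illustration under stated assumptions, not the structure of Figure 6: the zero-initialized history, the nearest-level search, and the `levels` table passed in by the caller are all hypothetical. In practice the table would hold Max quantizer output levels for the chosen order Q.

```python
def ppadpcm_encode(s, T, B, q_gain, levels):
    """Quantize residual samples with a pitch predictor inside the loop.

    levels: normalized quantizer output levels; each is scaled by q_gain.
    Returns the level indices and the locally reconstructed signal.
    """
    recon = [0.0] * T                      # reconstructed history (assumed zero)
    codes = []
    for sample in s:
        pred = B * recon[-T]               # prediction from T samples back
        d = sample - pred                  # prediction error to be quantized
        idx = min(range(len(levels)),
                  key=lambda i: abs(d - q_gain * levels[i]))
        codes.append(idx)
        recon.append(pred + q_gain * levels[idx])  # value the decoder rebuilds
    return codes, recon[T:]
```

Because the prediction is formed from reconstructed samples rather than from the clean input, encoder and decoder stay in step and quantization error does not accumulate through the pitch loop.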
- the data format subroutine 84 formats a data frame 34 (Figure 1) for transmission.
- the input to the subroutine 84 is a predetermined number of quantized residual signal samples 25, the pitch delay parameter 26, the pitch predictor gain 27, the quantizer gain 28, the quantized LPC coefficients 31 (Figure 1) and the CRC.
- the output from the subroutine 84 is a transmission data frame 34 which is placed in the output buffer.
- a decision 85 that the frame is complete concludes the quantization routine of Figure 5.
- the calling hierarchy of the subroutines in the quantization routine of Figure 5 is under the control of the main program.
- the following subroutines are integrated together in a subroutine designated PPQNT: pitch predictor gain 80, quantizer gain 81 and PPADPCM quantization 83.
- the calling hierarchy is as follows: pitch 78, PPQNT, CRC 82 and data format 84.
- the subroutine 68 is called by the LPC analysis subroutine 66 in the LPC analysis routine of Figure 3.
- the receiver digital signal processor utilizes a synthesis processing routine.
- the synthesis routine includes the following subroutines: initialization 88, data input 89, CRC check 90, LPC coefficient generation 91, PPADPCM decoding 92, spectral regeneration 93, LPC synthesis filter 94, de-emphasis 95, and speech output 97.
- the initialization subroutine 88 is included in the main program for the receiver processor. The initialization subroutine 88 initializes all registers and data locations within the processor prior to the execution of each subroutine.
- the data input subroutine 89 also is included in the main program for the receiver processor. This subroutine inputs the data transmission frame 36 received from the transmitter by inputting the frame from a frame buffer in external data memory.
- the CRC check subroutine 90 uses the received transmission data frame to generate an n-bit CRC which it compares to the n-bit CRC in the received transmission data frame to check for transmission errors. If any errors are detected, a subset of the LPC and PPADPCM parameters for the current frame is discarded and a subset of the previous frame's parameters is substituted.
- the input to this subroutine is an n-bit CRC word from the data transmission frame.
- the output from this subroutine is a flag indicating which set of parameters to use during the rest of the subroutine.
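The substitution logic can be illustrated with the short sketch below. The dictionary representation of a frame's parameters and the `protected` list of parameter names are assumptions for illustration; the text does not say which subset is replaced.

```python
def select_parameters(received_crc, computed_crc, current, previous, protected):
    """Return (parameters, error_flag) for the frame being decoded.

    current, previous: dicts of decoded parameters for this and the last frame.
    protected: names of the CRC-covered parameters to substitute on error
               (hypothetical; the exact subset is not given in the text).
    """
    error = received_crc != computed_crc
    params = dict(current)
    if error:
        for name in protected:
            params[name] = previous[name]   # reuse the last frame's value
    return params, error
```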
- the LPC coefficient generation subroutine 91 reads in the transmitted quantized LPC parameters, calls a subroutine, IQRC, to decode the LPC reflection coefficients, and performs a step-up algorithm to transform the LPC reflection coefficients to the LPC coefficients.
- the input to this subroutine is the transmitted quantized LPC reflection coefficients 43 and the output is the LPC coefficients 51 (Figure 2).
- Prior to LPC synthesis filtering 52, the LPC coefficients must be generated from the transmitted quantized LPC reflection coefficients. These quantized LPC reflection coefficients must be unpacked and decoded using the quantizer look-up tables described in T. E. Tremain, "The Government Standard Linear Predictive Coding Algorithm: LPC-10", Speech Technology, April 1982. The LPC coefficients are then generated from the decoded LPC reflection coefficients using the step-up algorithm, a recursive algorithm which is a subset of the Durbin algorithm described in J. Makhoul, "Linear Prediction: A Tutorial Review," Proc IEEE, Vol 63, pp 561-80, 1975.
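The step-up recursion can be sketched as below. The sign convention is an assumption: it follows the form used in Makhoul's tutorial, where A(z) = 1 + Σ a_j z^(-j), a_i(i) = k_i and a_j(i) = a_j(i−1) + k_i·a_(i−j)(i−1). Other references flip the sign of k_i, so the convention must match whatever the reflection coefficient decoder produces.

```python
def step_up(k):
    """Convert reflection coefficients k_1..k_p to LPC coefficients a_1..a_p.

    Returns a list whose element j-1 holds a_j for A(z) = 1 + sum_j a_j z^-j
    (assumed sign convention).
    """
    a = []
    for i, ki in enumerate(k, start=1):
        # a_j(i) = a_j(i-1) + k_i * a_(i-j)(i-1) for j = 1..i-1; a_i(i) = k_i
        a = [a[j] + ki * a[i - 2 - j] for j in range(i - 1)] + [ki]
    return a
```

For a tenth-order filter the recursion runs ten passes and needs only the previous pass's coefficients, which suits a processor with a small internal RAM.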
- the PPADPCM decoding subroutine 92 reads in the bit-packed quantized residual signal 39 and quantizer parameters 40, 41, 42 received from the transmitter and generates a decoded baseband (downsampled) residual signal 47 (Figure 2).
- This subroutine 92 must perform the inverse operation of the transmitter's PPADPCM coding. It therefore divides into three parts: unpacking, quantizer look-up, and pitch excitation.
- the PPADPCM decoding subroutine 92 first transfers the PPADPCM quantizer coefficients to internal RAM and scales them using the quantizer gain parameter.
- the inputs to this operation are the coefficient buffer stored in external data memory and the quantizer gain.
- the output of this operation is the scaled look-up table located in internal RAM.
- This subroutine 92 next reads in packed data bytes from a data buffer in external data memory, unpacks the byte, and decodes the data samples using the quantizer look-up table.
- the input to this operation is the bit-packed data word and the quantizer coefficient table.
- the output from this operation is the set of decoded data samples.
- the received data bytes are unpacked into individual data samples by masking off each individual data sample, which may then be decoded using the quantizer look-up table that is identical to the one used at the transmitter to quantize the data samples.
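The unpack-and-look-up steps can be sketched as follows. The field width, the most-significant-field-first ordering within a byte, and the example table are assumptions; the text only states that samples are masked out of each byte and decoded through a gain-scaled table.

```python
def unpack_samples(data, bits):
    """Split each byte into 8 // bits codes, most significant field first."""
    if bits <= 0 or 8 % bits != 0:
        raise ValueError("bits must divide 8 in this simplified sketch")
    mask = (1 << bits) - 1
    codes = []
    for byte in data:
        for shift in range(8 - bits, -1, -bits):
            codes.append((byte >> shift) & mask)   # mask off one sample
    return codes


def scale_table(table, q_gain):
    """Scale the normalized quantizer levels once per frame."""
    return [q_gain * level for level in table]


def decode_samples(codes, scaled_table):
    """Map each unpacked code to its scaled quantizer output level."""
    return [scaled_table[c] for c in codes]
```

Scaling the table once per frame, rather than multiplying every decoded sample, leaves only a table look-up in the per-sample loop.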
- the PPADPCM decoding subroutine 92 then implements a variable delay first-order difference equation to "pitch excite" the input data and recover the downsampled residual signal 47.
- the input to this operation is the transmitted data sample, the pitch delay parameter and the pitch predictor gain parameter.
- the output from this operation is the downsampled residual signal 47.
- the difference equation for this operation is:
- S is the downsampled residual signal sample
- x is the transmitted data sample
- B is the pitch predictor gain
- T is the current frame's pitch delay (period).
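Given the symbols defined above, a variable-delay first-order recursion of the form S(n) = x(n) + B·S(n−T) is the natural reading of this step; that form is inferred from the definitions rather than quoted from the text. A minimal sketch, with the `history` argument (the last T or more reconstructed samples carried over from earlier frames) as an assumption:

```python
def pitch_excite(x, T, B, history):
    """Rebuild the downsampled residual from the decoded samples x.

    history: at least T previously reconstructed samples, oldest first.
    """
    if len(history) < T:
        raise ValueError("history must hold at least T samples")
    s = list(history)
    for sample in x:
        s.append(sample + B * s[-T])   # S(n) = x(n) + B * S(n - T)
    return s[len(history):]
```

A single decoded pulse therefore reappears every T samples, attenuated by B each time, which restores the periodicity the transmitter's predictor removed.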
- the spectral regeneration subroutine 93 is included in the main program for the receiver processor.
- the spectral regeneration subroutine 93 generates a full-band residual signal 49 from downsampled residual signal 47. The effect is to convert a 4 kHz downsampled signal 47 to an 8 kHz full-band signal 49.
- the LPC synthesis filter subroutine 94 implements an autoregressive LPC synthesis filter governed by the LPC coefficients.
- the inputs to this subroutine are the LPC coefficients 51 and the regenerated full-band residual signal 49.
- the output from this subroutine is the regenerated pre-emphasized speech data signal sample 53.
- This subroutine 94 generates the speech data signal samples 53 by filtering the residual signal 49 with a tenth-order all-pole filter.
- the filter is governed by the generated LPC coefficients 51.
- the transfer function of the filter is:
- the filter operation can be represented by the following difference equation:
- the simplest way to implement this equation is to place the coefficients a_k in a fixed register and to implement the delay buffer using a shift register.
- the TMS32010 microcode is optimized to perform this operation using the LTD/MPY commands; the processor has a pipelined multiply-accumulate instruction that executes in 400 ns.
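The shift-register form of the all-pole filter can be sketched as below. The sign convention is an assumption, taken to match Makhoul's H(z) = 1 / (1 + Σ a_k z^(-k)), which gives y(n) = e(n) − Σ a_k·y(n−k). The Python list stands in for the delay buffer that the LTD instruction shifts in hardware.

```python
def lpc_synthesize(residual, a, state=None):
    """Filter the residual through 1 / (1 + sum_k a_k z^-k).

    a: coefficients a_1..a_p (p = 10 in the text).
    state: previous outputs y(n-1)..y(n-p), newest first; zeros if omitted.
    Returns (output samples, updated state) so frames can be chained.
    """
    delay = list(state) if state is not None else [0.0] * len(a)
    out = []
    for e in residual:
        y = e - sum(ak * d for ak, d in zip(a, delay))  # multiply-accumulate
        delay = [y] + delay[:-1]                        # shift the delay line
        out.append(y)
    return out, delay
```

Returning the delay line lets the caller carry the filter memory across frame boundaries, so there is no discontinuity when the coefficients are updated.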
- the de-emphasis subroutine 95 implements a first-order digital de-emphasis filter.
- the inputs to this subroutine are the current regenerated sample 53, the previous regenerated sample, and the pre-emphasis constant.
- the output from this subroutine is the regenerated speech data signal sample 55.
- First-order digital de-emphasis is applied to complement the pre-emphasis function in the transmitter processor.
- De-emphasis uses a single-delay low-pass filter.
- the de-emphasis constant (A) is also set to 0.9375.
- the difference equation for the filter is:
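A single-delay de-emphasis filter that complements a pre-emphasis of the form x(n) − A·x(n−1) is y(n) = x(n) + A·y(n−1); that form is assumed here, since it is the standard inverse and matches the inputs listed for the subroutine. A minimal sketch with illustrative names:

```python
def de_emphasize(x, A=0.9375, prev=0.0):
    """Apply y(n) = x(n) + A * y(n-1) to a block of samples.

    prev is the last output sample of the preceding block (0.0 at start-up).
    """
    out = []
    for sample in x:
        prev = sample + A * prev
        out.append(prev)
    return out
```

The constant 0.9375 equals 15/16, so on a fixed-point processor the product can be formed with a shift and a subtract instead of a general multiply.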
- the speech output subroutine 97 also is included in the main program for the receiver processor. This subroutine outputs the regenerated speech data signal samples to a data buffer in external data memory from which the samples are provided.
- the main program for the receiver processor calls the following subroutines in the following order: CRC check 90 / LPC coefficient generation 91, PPADPCM decoding 92, LPC synthesis filter 94 and de-emphasis 95.
- Transmitter and receiver systems that are commonly located may be included in a single digital processor.
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Computational Linguistics (AREA)
- Compression, Expansion, Code Conversion, And Decoders (AREA)
- Cash Registers Or Receiving Machines (AREA)
- Telephone Function (AREA)
- Input From Keyboards Or The Like (AREA)
- Silicon Polymers (AREA)
- Stereo-Broadcasting Methods (AREA)
- Signal Processing For Digital Recording And Reproducing (AREA)
Abstract
A RELP (residual excited linear predictive) vocoder is implemented in two digital signal processors, one for a transmitter system (Fig. 1) and the other for a remote receiver system (Fig. 2). The transmitter processes digital speech data signal samples to provide a formatted transmission signal comprising (a) a quantized residual signal generated by inverse filtering the samples in accordance with linear predictive coding (LPC) coefficients generated from the samples, (b) quantized LPC coefficients, and (c) pitch and gain parameters generated during quantization of the residual signal of the inverse-filtered samples; all of these are generated by the processor from the digital speech data signal samples. The receiver digital signal processor processes the formatted transmission signal to synthesize reconstructed digital speech data signals. Transmitter and receiver systems that are commonly located may be included in a single digital signal processor.
Description
Claims
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US667446 | 1984-11-01 | ||
US66744684A | 1984-11-02 | 1984-11-02 |
Publications (2)
Publication Number | Publication Date |
---|---|
EP0203940A1 true EP0203940A1 (en) | 1986-12-10 |
EP0203940A4 EP0203940A4 (en) | 1987-04-07 |
Family
ID=24678262
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
EP19850905709 Withdrawn EP0203940A4 (en) | 1984-11-02 | 1985-11-01 | Relp vocoder implemented in digital signal processors. |
Country Status (7)
Country | Link |
---|---|
EP (1) | EP0203940A4 (en) |
JP (1) | JPS63500896A (en) |
AU (1) | AU577641B2 (en) |
CA (1) | CA1240396A (en) |
DK (1) | DK311386A (en) |
NO (1) | NO862602L (en) |
WO (1) | WO1986002726A1 (en) |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4675863A (en) * | 1985-03-20 | 1987-06-23 | International Mobile Machines Corp. | Subscriber RF telephone system for providing multiple speech and/or data signals simultaneously over either a single or a plurality of RF channels |
JP2626223B2 (en) * | 1990-09-26 | 1997-07-02 | 日本電気株式会社 | Audio coding device |
US6006174A (en) * | 1990-10-03 | 1999-12-21 | Interdigital Technology Coporation | Multiple impulse excitation speech encoder and decoder |
US5235670A (en) * | 1990-10-03 | 1993-08-10 | Interdigital Patents Corporation | Multiple impulse excitation speech encoder and decoder |
ES2143396B1 (en) * | 1998-02-04 | 2000-12-16 | Univ Malaga | LOW RATE MONOLITHIC CODEC-ENCRYPTOR MONOLITHIC CIRCUIT FOR VOICE SIGNALS. |
US7907977B2 (en) | 2007-10-02 | 2011-03-15 | Agere Systems Inc. | Echo canceller with correlation using pre-whitened data values received by downlink codec |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US3431362A (en) * | 1966-04-22 | 1969-03-04 | Bell Telephone Labor Inc | Voice-excited,bandwidth reduction system employing pitch frequency pulses generated by unencoded baseband signal |
US3750024A (en) * | 1971-06-16 | 1973-07-31 | Itt Corp Nutley | Narrow band digital speech communication system |
GB2102254B (en) * | 1981-05-11 | 1985-08-07 | Kokusai Denshin Denwa Co Ltd | A speech analysis-synthesis system |
-
1985
- 1985-11-01 CA CA000494448A patent/CA1240396A/en not_active Expired
- 1985-11-01 EP EP19850905709 patent/EP0203940A4/en not_active Withdrawn
- 1985-11-01 JP JP50505785A patent/JPS63500896A/en active Pending
- 1985-11-01 AU AU50198/85A patent/AU577641B2/en not_active Ceased
- 1985-11-01 WO PCT/US1985/002168 patent/WO1986002726A1/en not_active Application Discontinuation
-
1986
- 1986-06-27 NO NO86862602A patent/NO862602L/en unknown
- 1986-06-30 DK DK311386A patent/DK311386A/en not_active Application Discontinuation
Non-Patent Citations (3)
Title |
---|
ICASSP 84 - IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 19th-21st March 1984, San Diego, US, vol. 2, pages 27.8.1-27.8.4, IEEE, New York, US; M. DANKBERG et al.: "Implementation of the RELP Vocoder using the TMS320" * |
ICASSP 85 - IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 26th-29th March 1985, Tampa, US, vol. 3, pages 969-972, IEEE, New York, US; R.L. ZINSER: "An efficient pitch-aligned high-frequency regeneration technique for RELP Vocoders" * |
See also references of WO8602726A1 * |
Also Published As
Publication number | Publication date |
---|---|
NO862602L (en) | 1986-09-01 |
EP0203940A4 (en) | 1987-04-07 |
NO862602D0 (en) | 1986-06-27 |
DK311386D0 (en) | 1986-06-30 |
AU577641B2 (en) | 1988-09-29 |
WO1986002726A1 (en) | 1986-05-09 |
AU5019885A (en) | 1986-05-15 |
CA1240396A (en) | 1988-08-09 |
DK311386A (en) | 1986-06-30 |
JPS63500896A (en) | 1988-03-31 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US5903866A (en) | Waveform interpolation speech coding using splines | |
EP0392126B1 (en) | Fast pitch tracking process for LTP-based speech coders | |
US5339384A (en) | Code-excited linear predictive coding with low delay for speech or audio signals | |
EP0666557B1 (en) | Decomposition in noise and periodic signal waveforms in waveform interpolation | |
US6691084B2 (en) | Multiple mode variable rate speech coding | |
Andersen et al. | Internet low bit rate codec (iLBC) | |
EP0331857B1 (en) | Improved low bit rate voice coding method and system | |
US8392176B2 (en) | Processing of excitation in audio coding and decoding | |
EP0865029B1 (en) | Efficient decomposition in noise and periodic signal waveforms in waveform interpolation | |
EP2120234B1 (en) | Speech coding apparatus and method | |
US6047254A (en) | System and method for determining a first formant analysis filter and prefiltering a speech signal for improved pitch estimation | |
EP0416036A1 (en) | Improved adaptive transform coding | |
WO1990013110A1 (en) | Adaptive transform coder having long term predictor | |
EP0673015B1 (en) | Computational complexity reduction during frame erasure or packet loss | |
JP2003050600A (en) | Method and system for generating and encoding line spectrum square root | |
US4710959A (en) | Voice encoder and synthesizer | |
JP3236592B2 (en) | Speech coding method for use in a digital speech coder | |
AU577641B2 (en) | Relp vocoder implemented in digital signal processors | |
Chu et al. | A frequency weighted Itakura-Saito spectral distance measure | |
Cuperman et al. | Backward adaptation for low delay vector excitation coding of speech at 16 kbit/s | |
US5673361A (en) | System and method for performing predictive scaling in computing LPC speech coding coefficients | |
JP3237178B2 (en) | Encoding method and decoding method | |
US5937374A (en) | System and method for improved pitch estimation which performs first formant energy removal for a frame using coefficients from a prior frame | |
Eriksson et al. | On waveform-interpolation coding with asymptotically perfect reconstruction | |
Sunwoo et al. | Real-time implementation of the VSELP on a 16-bit DSP chip |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PUAI | Public reference made under article 153(3) epc to a published international application that has entered the european phase |
Free format text: ORIGINAL CODE: 0009012 |
|
17P | Request for examination filed |
Effective date: 19860703 |
|
AK | Designated contracting states |
Kind code of ref document: A1 Designated state(s): AT BE CH DE FR GB IT LI LU NL SE |
|
A4 | Supplementary search report drawn up and despatched |
Effective date: 19870407 |
|
RAP1 | Party data changed (applicant data changed or rights of an application transferred) |
Owner name: HUGHES NETWORK SYSTEMS, INC. (A DELAWARE CORPORAT Owner name: M/A-COM GOVERNMENT SYSTEMS, INC. |
|
17Q | First examination report despatched |
Effective date: 19881207 |
|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN |
|
18D | Application deemed to be withdrawn |
Effective date: 19890418 |
|
RIN1 | Information on inventor provided before grant (corrected) |
Inventor name: WILSON, PHILIP, JOHN |