EP0119033B1 - Speech encoder - Google Patents

Speech encoder

Publication number
EP0119033B1
EP0119033B1 EP19840301302 EP84301302A EP0119033B1 EP 0119033 B1 EP0119033 B1 EP 0119033B1 EP 19840301302 EP19840301302 EP 19840301302 EP 84301302 A EP84301302 A EP 84301302A EP 0119033 B1 EP0119033 B1 EP 0119033B1
Authority
EP
European Patent Office
Prior art keywords
signal
weighting
parameters
encoder
speech
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired
Application number
EP19840301302
Other languages
German (de)
French (fr)
Other versions
EP0119033A1 (en
Inventor
Gideon Abraham Senensieb
Anthony John Milbourn
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Prutec Ltd
Original Assignee
Prutec Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from GB8306685A (GB2137054B)
Application filed by Prutec Ltd
Publication of EP0119033A1
Application granted
Publication of EP0119033B1
Expired

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/08 Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters
    • G10L19/10 Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters the excitation function being a multipulse excitation

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Description

  • This invention relates to a speech encoder, this being a circuit for converting a speech signal into a pulse train. The pulse train may either be transmitted, encrypted, or stored and from it the original speech can be reproduced.
  • The design of a vocoder for commercial application requires a careful compromise between three main parameters: perceived voice quality, data rate, and complexity (roughly equivalent to cost) of the hardware implementation. Other important performance parameters that must be considered are voice quality in the presence of acoustic noise at the input, robustness against errors in the low bit-rate digital stream and performance in tandem with other voice coding equipment.
  • There is already known in the art a system of speech encoding which makes use of the technique of linear predictive coding (LPC). In order to explain the principles employed in this method of encoding, reference will first be made to Figure 1 which shows a linear predictor.
  • The linear predictor in Figure 1 is a recursive digital filter comprising a summation circuit 10 which has an input line 12 and an output line 14. The output line 14 is connected to a shift register or to a tapped delay line 16, each tapping of which is fed back to the summation circuit by way of a respective multiplication circuit 18_1 to 18_n.
  • Assume that it is desired to produce a particular sequence of output signals corresponding to a sampled speech signal. At any given instant, the output signal has a first component determined by the weighted summed outputs from the tappings of the delay line and a second component determined by the value of the input signal at that instant. The first of these two components may be regarded as the predicted value based on previous values of the output signal and the second as the residual error. If the weighting parameters p_1 to p_n of the circuits 18 are optimised then the residual error will be minimised. To enable the reproduction by a linear predictor of an original speech signal it is only necessary to transmit or store in each frame the weighting parameters and an excitation signal. The residual error, if used as the excitation, yields perfect reproduction of the original speech.
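  • The relationship between the predictor, the residual error and perfect reproduction can be illustrated with a minimal Python sketch. The function names, the example samples and the two parameter values are assumptions chosen for illustration; they do not come from the patent.

```python
def synthesise(excitation, p):
    """Recursive predictor of Figure 1: y[n] = x[n] + sum_i p[i] * y[n-1-i]."""
    out = []
    for n, x in enumerate(excitation):
        predicted = sum(p[i] * out[n - 1 - i]
                        for i in range(len(p)) if n - 1 - i >= 0)
        out.append(x + predicted)
    return out


def residual(speech, p):
    """Residual error: the part of each sample not predicted from earlier samples."""
    res = []
    for n, s in enumerate(speech):
        predicted = sum(p[i] * speech[n - 1 - i]
                        for i in range(len(p)) if n - 1 - i >= 0)
        res.append(s - predicted)
    return res


# Hypothetical frame and weighting parameters p_1, p_2 (illustrative values only).
speech = [0.0, 1.0, 0.5, -0.25, -0.75, 0.1, 0.4, -0.2]
p = [0.9, -0.5]

# Exciting the predictor with the residual rebuilds the original samples.
rebuilt = synthesise(residual(speech, p), p)
```

    Here `rebuilt` matches `speech` up to floating-point rounding, which is the sense in which the residual, used as excitation, reproduces the original signal.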
  • The technique described above works well for speech signals because the operation simulates the acoustic properties of the human vocal tract. When a sound is uttered a vibration is transmitted down the vocal tract which is configured to produce the desired sound.
  • The configuration of the vocal tract, being due to physical movement of articulatory organs, can only change quite slowly. The analogy between the configuration of the vocal tract and the weighting parameters allows much of the information in the speech signal to be transmitted at a low data rate. While this ensures good intelligibility, the quality and naturalness of the reproduced speech is largely dependent on the excitation signal used.
  • In a system which has been proposed in the past, the parameters of the predictor are transmitted or stored and the excitation signal is selected either as white noise or as a regular series of pulses depending on the type of sound to be produced. Even using such crude simulation of the residual signal it was possible to produce recognisable speech. However, though the quality was acceptable for certain applications, for example military applications where maximum signal compression was of most importance, it fell below acceptable commercial standards.
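  • The two-state excitation of such earlier systems might be sketched as follows. The frame length, pitch period, gain and the use of Gaussian noise are assumptions made for this example only.

```python
import random


def classic_excitation(frame_len, voiced, pitch_period=40, gain=1.0, seed=0):
    """Excitation for a classic LPC vocoder frame.

    Voiced frames get a regular pulse train, unvoiced frames get white noise.
    All default values are hypothetical.
    """
    if voiced:
        return [gain if n % pitch_period == 0 else 0.0 for n in range(frame_len)]
    rng = random.Random(seed)  # seeded so the sketch is repeatable
    return [gain * rng.gauss(0.0, 1.0) for _ in range(frame_len)]


voiced_frame = classic_excitation(160, voiced=True)
unvoiced_frame = classic_excitation(160, voiced=False)
```

    The hard choice between the two branches is the voiced/unvoiced classification that the multi-pulse approach described next avoids.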
  • In order to improve the quality of the reproduced speech, it is necessary to put more information into the excitation signal so that it should resemble the residual signal more closely. With this aim in mind, it has been proposed that in each frame the predictor should be excited by a train of pulses, in which the timing and the magnitude of each pulse in the train should be selected in order to minimise the difference between the re-synthesised speech and the original speech signal. In this last case, the excitation signal does not depend on the type of sound to be produced but for each frame the ideal excitation pulse train is computed.
  • The multi-pulse-excited linear-predictive model of speech generation was presented by Atal and Remde in their paper "A New Model of LPC Excitation for Producing Natural Sounding Speech at Low Bit Rates" Proc. ICASSP 1982 pp 614-617. This model bypasses neatly the problem of inflexible classification of speech segments into voiced and unvoiced sounds encountered in previous approaches to vocoding. It has been used to demonstrate very good reproduction of speech from parameters encoded at bit-rates estimated to be in the region of 10 kbits/s.
  • There now follows the derivation of multi-pulse excitation parameters based on a model comprising a multi-pulse excitation generator coupled to a linear predictor. The multi-pulse excitation signal, u(n), is a sequence of samples whose values are zero in all but a few positions. The amplitudes and positions of the non-zero samples are chosen so as to minimise a perceptually meaningful error. The details of a possible formulation are summarized below.
  • A sequence s(n) is to be synthesised over the interval n=1 ... N by exciting a linear predictor with the multi-pulse sequence u(n). The linear predictor has a transfer function H(z), corresponding to an impulse response h(n). The sequence u(n) contains at most K non-zero samples u(n_k), k=1 ... K, where K<N. The positions n_k and the values u(n_k) are to be chosen so as to minimise the energy in the error sequence.
    Figure imgb0001
    where s(n) is the sequence of samples of original speech, w(n) is the impulse response corresponding to a spectral weighting function W(z), and * denotes convolution.
  • The problem is therefore to determine
    Figure imgb0002
    so as to minimise
    Figure imgb0003
    In order to avoid the complexity of determining all 2K unknowns simultaneously, an iterative procedure can be adopted in which position and amplitude are evaluated for one non-zero sample at a time.
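  • The equation images referred to above are not reproduced in this text. Going only by the definitions of s(n), u(n), h(n) and w(n) given in the surrounding paragraphs, the error sequence and the quantity to be minimised presumably take a form like the following. This is a reconstruction under that assumption, and the equation numbers (1) and (2) are inferred from the later cross-references rather than read from the figures.

```latex
\begin{align}
e(n) &= \bigl[\, s(n) - u(n) * h(n) \,\bigr] * w(n) \tag{1} \\
E    &= \sum_{n=1}^{N} e^{2}(n) \tag{2}
\end{align}
% Unknowns to be determined: the positions n_k and amplitudes u(n_k), k = 1, ..., K.
```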
  • The jth iteration establishes the values n_j, u(n_j) once n_k, u(n_k) for k=1 ... j-1 have been determined by the previous j-1 iterations and with u(n_k) set to zero for k>j. At the jth iteration we minimise
    Figure imgb0004
    Setting
    Figure imgb0005
    and noting that
    Figure imgb0006
    we have
    Figure imgb0007
  • The optimum value of u(n_j) is found by differentiating E_j partially with respect to u(n_j) and setting the derivative to 0. This yields
    Figure imgb0008
  • The minimum value of E_j for a given n_j can then be obtained by combining (6) and (7) into
    Figure imgb0009
  • From (6), E_jmin cannot be negative. Therefore E_j is minimised in (8) if n_j is chosen such that |u(n_j)| is a maximum.
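  • Since the images for equations (3) to (8) are missing from this text, the chain of steps can be sketched as a worked derivation. The forms below are a reconstruction from the surrounding prose and from the cross-references to (4), (6), (7) and (8); the exact notation of the original figures is an assumption.

```latex
\begin{align}
E_j &= \sum_{n} \Bigl[ \Bigl( s(n) - \sum_{k=1}^{j} u(n_k)\, h(n - n_k) \Bigr) * w(n) \Bigr]^{2} \tag{3} \\
h'(n) &= h(n) * w(n) \tag{4} \\
e(n) &= \Bigl( s(n) - \sum_{k=1}^{j-1} u(n_k)\, h(n - n_k) \Bigr) * w(n) \tag{5} \\
E_j &= \sum_{n} \bigl[ e(n) - u(n_j)\, h'(n - n_j) \bigr]^{2} \tag{6} \\
\frac{\partial E_j}{\partial u(n_j)} = 0
  \;&\Rightarrow\;
  u(n_j) = \frac{\sum_{n} e(n)\, h'(n - n_j)}{\sum_{n} h'^{2}(n - n_j)} \tag{7} \\
E_{j\,\min} &= \sum_{n} e^{2}(n) - u^{2}(n_j) \sum_{n} h'^{2}(n - n_j) \tag{8}
\end{align}
```

    Substituting (7) into the expanded form of (6) gives (8) directly. Read strictly, (8) is smallest where u^2(n_j) multiplied by the energy sum is largest; this coincides with the largest |u(n_j)| when that energy sum hardly varies with n_j, which is an interpretation added here rather than a statement from the text.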
  • The sequence e(n) and values of u(n_j) for all possible n_j must be recomputed at each iteration over the interval of interest. The procedure can be refined by re-adjusting the amplitudes of all selected samples simultaneously, once their positions are all known.
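  • The iterative procedure can be sketched in Python as a greedy search. This is an illustration under stated assumptions, not the published algorithm: the function name, the test signal and the truncation of the impulse response at the frame edge are choices made for this example, the weighted impulse response h' is taken as given, and the final joint re-adjustment of amplitudes is omitted.

```python
def multipulse_search(target, h_w, num_pulses):
    """Pick pulses one at a time.

    target: weighted speech for the frame (s * w in the text's notation).
    h_w:    weighted impulse response h' (assumed already computed).
    Returns the chosen (position, amplitude) pairs and the remaining error.
    """
    n_len = len(target)
    error = list(target)
    pulses = []
    for _ in range(num_pulses):
        best = None
        for pos in range(n_len):
            span = min(len(h_w), n_len - pos)  # response truncated at frame end
            energy = sum(h_w[m] ** 2 for m in range(span))
            if energy == 0.0:
                continue
            corr = sum(error[pos + m] * h_w[m] for m in range(span))
            amp = corr / energy                # optimum amplitude at this position
            gain = amp * amp * energy          # reduction in error energy
            if best is None or gain > best[0]:
                best = (gain, pos, amp)
        if best is None:
            break
        _, pos, amp = best
        pulses.append((pos, amp))
        # Recompute the error sequence before the next iteration.
        for m in range(min(len(h_w), n_len - pos)):
            error[pos + m] -= amp * h_w[m]
    return pulses, error


# Hypothetical check: a target built from two known pulses.
h_w = [1.0, 0.5, 0.25]
target = [0.0] * 16
for pos, amp in [(3, 2.0), (10, -1.0)]:
    for m, g in enumerate(h_w):
        target[pos + m] += amp * g

pulses, leftover = multipulse_search(target, h_w, num_pulses=2)
```

    Every candidate position is re-evaluated on each pass, which is the computational burden the following paragraph refers to.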
  • The procedure described above is ill-suited to implementation in real-time at low-cost with current hardware technology because of the large computation rate and because of the inherent block-processed structure of the algorithm.
  • The present invention is intended to encode speech using linear predictive coding in which the LPC filter is excited by a series of pulses whose positions and amplitude are capable of being computed in real time.
  • According to the present invention, there is provided an encoder for encoding speech signals, comprising means for sampling frames of the speech signal to be encoded, a linear prediction analyser for determining for each frame parameters of a linear predictor to minimise the residual signal for the sampled frame, and means for producing an excitation signal for transmission or storage in conjunction with the parameters to enable each frame of the speech signal to be resynthesised, characterised in that the encoder comprises a weighting filter with weighting parameters for damping reverberations within the speech signal caused by resonances in the vocal tract and a circuit for time weighting the parameters determined by the analyser by multiplication with a factor and in that the means for producing an excitation signal comprises correlating means for correlating the outputs of the weighting filter and the circuit.
  • A linear recursive filter if excited by a single pulse may have an impulse response of very long time duration and provided that it is not unstable will eventually decay rather than oscillate. The effect of a long time response is that responses from consecutive excitation pulses tend to run into each other and it is difficult when performing a correlation to separate the pulse response of one excitation from another.
  • In the encoder of the present invention, the speech signal is passed through a weighting filter, preferably a pole-zero filter, which has the effect of damping reverberations. The weighting filter has a non-recursive part the weighting parameters of which are of the same magnitude as, but of opposite sign to, those of the linear predictor in the decoder. In the analogy mentioned above, one may regard the purpose of the non-recursive side of the weighting filter as negating the effect of the vocal tract on the pulses originally generated within the throat of the speaker. The other side of the filter, on the other hand, the recursive part, has weighting coefficients which are related to those of the linear predictor but are weighted by a factor which follows a power law k^n (k<1), so that time-weighting of the impulse response is achieved.
  • If one correlates the speech signal after passing it through such a weighting filter with the impulse response of a filter which consists only of the recursive side of the weighting filter when excited by a single excitation pulse, then the correlator will produce a high correlation output at the times when impulses should be applied to the linear prediction filter in order to simulate the speech signal.
  • Thus, in the preferred embodiment, the weighting filter is followed by a correlator of which the output is fed to an impulse selector. The purpose of the impulse selector is to select from amongst the peaks of the output of the correlator a number of peaks having the highest magnitude. These peaks determine the time at which the residual signal should be applied to the linear predictor in the decoder in order to resynthesise the speech signal.
  • Also in the preferred embodiment, the peaks are selected such that they are all of the same polarity. This polarity can be set so as to match the polarity of the microphone being used. If the polarities of the peak selection and microphone are correctly matched, then this improves the quality of the resynthesised speech by helping to preserve its harmonic content.
  • It is also preferred that the excitation pulses should have an amplitude related to the amplitude of the peak produced by the correlator. Because the auto-correlation functions of the pulse responses of the LPC filter are not constant but vary with the weighting parameters, it is preferred that the excitation pulse amplitude should be derived by dividing the correlator output by the value of the auto-correlation function of the impulse response of the filter with the prevailing time weighted parameters.
  • The invention will now be described further, by way of example, with reference to the accompanying drawings, in which:
    • Figure 1 is, as earlier described, a diagram of a linear predictive filter;
    • Figure 2 is a block circuit diagram of an encoder in accordance with the present invention; and
    • Figure 3 is a diagram showing a weighting filter.
  • In Figure 2, the speech signal to be encoded is received over an input line 30. The input signal is applied to a known circuit 32 which is a linear prediction analyser. This circuit computes the values of the weighting parameters of the digital recursive filter which would minimise the residual signal and outputs these parameters. As is known, a linear prediction analyser more readily computes so called reflection co-efficients which are not the same as the weighting parameters but from which these parameters can be computed. The reflection co-efficients are applied to a line 34.
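  • One common way to obtain predictor parameters from reflection coefficients is the step-up recursion, sketched below. The patent does not say which recursion or sign convention its analyser uses, so the function name and the conventions here are assumptions: the inverse filter is taken as A(z) = 1 + a_1 z^-1 + ... and the parameters of Figure 1 as p_i = -a_i.

```python
def reflection_to_predictor(reflection):
    """Step-up recursion: A_m(z) = A_{m-1}(z) + k_m * z^-m * A_{m-1}(1/z).

    Returns p_1..p_n for the recursion y[n] = x[n] + sum_i p_i * y[n-i].
    Sign conventions differ between texts; this is one assumed choice.
    """
    a = [1.0]
    for k in reflection:
        padded = a + [0.0]
        mirrored = padded[::-1]
        a = [padded[i] + k * mirrored[i] for i in range(len(padded))]
    return [-c for c in a[1:]]


# Hypothetical reflection coefficients for a second-order predictor.
p = reflection_to_predictor([0.5, 0.2])
```

    For these two illustrative values the inverse filter works out as 1 + 0.6 z^-1 + 0.2 z^-2, so `p` is approximately [-0.6, -0.2].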
  • The speech signal is also applied via a line 36 to a weighting filter 38 which will now be described by reference to Figure 3. The weighting filter comprises an input line 40 connected to a summation circuit 42 having an output line 44. A multi-tapped delay line (or shift register) 46 is connected to the input line 40 and a similar multi-tapped delay line 48 is connected to the output line 44. The tappings of the delay line 46 are connected by way of a first set of weighting circuits 50 to the circuit 42 which also receives signals from the tappings of the delay line 48 through weighting circuits 52. The values of the parameters used in the multiplication circuits of the weighting filter 38 in Figure 3 are derived from the linear prediction analyser 32.
  • In analyser 32, the weighting parameters p_1 to p_n, equivalent to the reflection coefficients, are computed. In the coefficient weighting circuits 32, two sets of parameters are derived from the parameters p_1 to p_n for setting the parameters of the weighting filter 38. The first set of parameters is applied to the weighting circuits 50 and is equal to -p_1 to -p_n. Thus the combination of the summation circuit with the delay line 46 and the weighting circuits 50 results in a digital non-recursive filter having parameters which are the opposite of those used in the receiving circuit to resynthesize the speech signal. As previously stated, the effect of the non-recursive part of the weighting filter is to negate the effect of the vocal tract.
  • The second set of parameters evaluated by the coefficient weighting circuit 32 is equal to k·p_1 to k^n·p_n, where k is less than 1. Thus, the delay line 48 and the weighting circuits 52 produce in conjunction with the summation circuit 42 a recursive digital filter whose pulse response is similar to that of the filter used to resynthesize the speech but with more rapid decay. The effect of combination of the non-recursive and recursive filters which constitute the weighting filter 38, which is also termed a pole-zero filter, is to produce from the speech signal one in which reverberations are more severely damped to reduce the interaction between the effects of consecutive excitation pulses.
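  • The pole-zero weighting filter of Figure 3 can be sketched in Python as a single difference equation. The function names and the values of p and k are assumptions for illustration. Feeding the filter with the impulse response of the synthesis predictor is used here as a check: the non-recursive part cancels the predictor, leaving the faster-decaying response of the recursive part.

```python
def impulse_response(p, length):
    """Impulse response of the recursive predictor y[n] = x[n] + sum_i p_i * y[n-i]."""
    h = []
    for n in range(length):
        acc = 1.0 if n == 0 else 0.0
        for i, pi in enumerate(p):
            if n - 1 - i >= 0:
                acc += pi * h[n - 1 - i]
        h.append(acc)
    return h


def weighting_filter(speech, p, k):
    """Non-recursive taps -p_i on the input, recursive taps k^i * p_i on the output."""
    fir = [-pi for pi in p]
    rec = [(k ** (i + 1)) * pi for i, pi in enumerate(p)]
    out = []
    for n in range(len(speech)):
        acc = speech[n]
        for i in range(len(p)):
            if n - 1 - i >= 0:
                acc += fir[i] * speech[n - 1 - i]  # delay line 46, circuits 50
                acc += rec[i] * out[n - 1 - i]     # delay line 48, circuits 52
        out.append(acc)
    return out


# Hypothetical second-order predictor and time-weighting factor.
p = [0.9, -0.5]
k = 0.8
h = impulse_response(p, 20)
weighted = weighting_filter(h, p, k)
```

    In this example `weighted[n]` equals `h[n] * k**n` up to rounding, showing the more rapid decay described above.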
  • The output of the digital weighting filter 38 is applied to a correlator 64 connected to a circuit 66 which evaluates the impulse response of a digital recursive filter of the same construction as that shown in Fig. 1 but with weighting parameters k·p_1 to k^n·p_n.
  • The correlator 64 may consist of a shift register whose tappings are connected to multiplication circuits the multiplication factors of which are determined by the impulse response evaluating circuit 66. When there is a high level of correlation between the output of the weighting filter 38 and the impulse response evaluated by the circuit 66, a high output is produced by the correlator. The output of the correlator 64 thus contains peaks which coincide with impulses in the excitation signal which, if applied to the linear predictor at the decoder, will cause a good approximation to the original speech signal to be produced. However, in order to reduce the bit rate, it is necessary to select from amongst the correlator output only a small number of pulses and these should coincide with the impulses of maximum energy in the excitation signal.
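  • A software analogue of the correlator might look like the sketch below. It assumes the impulse response has been truncated to a finite list and ignores the fixed delay that a shift-register implementation would introduce; the function name and sample values are illustrative only.

```python
def correlate(weighted_speech, response):
    """Correlate the weighting-filter output with a truncated impulse response.

    out[n] = sum_m weighted_speech[n + m] * response[m]
    """
    n_len = len(weighted_speech)
    out = []
    for n in range(n_len):
        span = min(len(response), n_len - n)
        out.append(sum(weighted_speech[n + m] * response[m] for m in range(span)))
    return out


# Hypothetical input: one copy of the response starting at sample 5.
response = [1.0, 0.6, 0.3, 0.1]
signal = [0.0] * 12
for m, g in enumerate(response):
    signal[5 + m] += g

corr = correlate(signal, response)
peak_index = max(range(len(corr)), key=lambda n: corr[n])
```

    The largest output falls at sample 5, the instant at which the excitation pulse was applied in this example.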
  • The purpose of the pulse selector circuit 70 in Figure 2 is to select the timing of the pulses which are to be encoded. One could merely store the output values from the correlator and select the highest peaks but this could result in consecutive high values being used to produce excitation pulses when they are truly the flanks of the same pulse. Therefore, it is preferable that the impulse circuit locate local maxima and minima and disregard the values adjacent to these peaks. One possible algorithm would be to disregard high values adjacent a local maximum or minimum if they are not separated from the local maximum or minimum by a zero crossing or a turning point. Another possible algorithm is to select a fixed number of the greatest peaks in each time frame and to ensure that they are separated by at least some minimum time.
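  • The second selection rule, combined with the single-polarity preference mentioned earlier, could be sketched as follows. The function name, the local-maximum test and the example values are assumptions; the patent does not prescribe this exact procedure.

```python
def select_pulses(corr, max_pulses, min_spacing, polarity=1):
    """Pick up to max_pulses local peaks of one polarity, at least min_spacing apart.

    polarity is +1 or -1; end samples are not considered as peaks in this sketch.
    """
    candidates = []
    for n in range(1, len(corr) - 1):
        v = polarity * corr[n]
        if v > 0 and v >= polarity * corr[n - 1] and v > polarity * corr[n + 1]:
            candidates.append(n)
    # Strongest peaks first, so flanks and weaker neighbours are rejected later.
    candidates.sort(key=lambda n: polarity * corr[n], reverse=True)
    chosen = []
    for n in candidates:
        if all(abs(n - m) >= min_spacing for m in chosen):
            chosen.append(n)
        if len(chosen) == max_pulses:
            break
    return sorted(chosen)


# Hypothetical correlator output for one frame.
corr = [0.0, 1.0, 3.0, 2.0, 0.5, 2.5, 0.0, -4.0, 0.0, 1.5, 0.0]
positive_picks = select_pulses(corr, max_pulses=2, min_spacing=4)
negative_picks = select_pulses(corr, max_pulses=2, min_spacing=4, polarity=-1)
```

    With these values the positive search keeps samples 2 and 9 and rejects sample 5 as too close to the stronger peak, while the negative search keeps only sample 7.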
  • The amplitude of the selected pulses will be related to the amplitude of an optimal excitation signal. In order to normalise these pulses to take into account the different values of the auto-correlation function of the impulse responses, the impulse response circuit 66 additionally evaluates the auto-correlation function of each pulse response and applies a signal over a line 72 to a divider circuit 74. In the divider circuit 74, the selected pulses are divided by the auto-correlation value and the output signal from the divider is fed to a multiplexer 76 which encodes the reflection coefficient received over the line 34 and the signals from the divider 74 to produce the encoded signal on output line 78 for transmission or storage.
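The normalisation performed by the divider circuit 74 can be illustrated with a short sketch. It is assumed here that the auto-correlation value used is the zero-lag value, that is, the energy of the impulse response; the function name pulse_amplitude is hypothetical.

```python
def pulse_amplitude(correlator_peak, impulse_response):
    """Normalise a selected correlator peak to a pulse amplitude.

    The peak is divided by the zero-lag auto-correlation (energy) of
    the impulse response. Using the zero-lag value is an assumption.
    """
    energy = sum(h * h for h in impulse_response)
    return correlator_peak / energy


# A response scaled by 2.0 correlates with itself to 2.0 * energy,
# so the normalised amplitude recovers the scale factor.
h = [1.0, 0.5, 0.25]
scaled = [2.0 * v for v in h]
peak = sum(s * v for s, v in zip(scaled, h))
amplitude = pulse_amplitude(peak, h)
```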
  • The mathematical considerations underlying the invention are now considered for completeness but the successful operation of the apparatus of the invention is not dependent upon the accuracy of the analysis.
  • The preferred embodiment of the invention proposes making some simplifying assumptions in order to derive a modified algorithm which permits implementation in real-time of a 7.2 kbits/s vocoder using standard components on a double Eurosize circuit board.
  • Defining the linear predictor of order M as
    Figure imgb0010
    We now define the weighting function, W(z) by
    Figure imgb0011
    where r is a real number between 0 and 1. The filter W(z) serves to de-emphasize the error signal e(n) in the formant regions, reflecting the fact that distortion in these regions is masked by relatively large concentrations of energy in the speech signal. Broadly speaking, the de-emphasis effect is enhanced by reducing r.
  • If the linear-predictive analysis method employed leads to an unconditionally-stable linear predictor, then the envelope of its impulse response, h(n), decays with time.
  • The impulse response h'(n), defined in (4), corresponds to the cascade of the transfer functions H(z) and W(z). Some thought shows that h'(n) = h(n)·r^n (11)
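Relation (11) can be checked numerically with a short sketch. It is assumed here that the cascade of H(z) and W(z) reduces to an all-pole recursive filter whose i-th parameter is multiplied by r^i, in the manner described for the circuit 66; the function names and the example parameters are hypothetical.

```python
def impulse_response(coeffs, length):
    """Impulse response of y[n] = x[n] + sum_i coeffs[i-1] * y[n-i]."""
    h = []
    for n in range(length):
        acc = 1.0 if n == 0 else 0.0
        for i, a in enumerate(coeffs, start=1):
            if n - i >= 0:
                acc += a * h[n - i]
        h.append(acc)
    return h


def damped_coeffs(coeffs, r):
    """Multiply the i-th parameter by r**i (i counted from 1)."""
    return [a * r ** i for i, a in enumerate(coeffs, start=1)]


coeffs = [1.2, -0.5]   # example stable second-order predictor (assumed values)
r = 0.8
h = impulse_response(coeffs, 20)
h_damped = impulse_response(damped_coeffs(coeffs, r), 20)
# Largest deviation from h(n) * r**n over the first 20 samples.
max_error = max(abs(h_damped[n] - h[n] * r ** n) for n in range(20))
```

The deviation is limited to rounding error, since substituting h(n-i)·r^(n-i) into the recursion with parameters scaled by r^i factors out r^n at every step.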
  • Since r<1, the envelope of h'(n) decays more rapidly than that of h(n). Combining this with the causality of the linear predictor we can write
    Figure imgb0012
  • In (12) ng is an arbitrary positive integer. The approximation in (12) can be improved by increasing ng and/or by reducing r.
  • Furthermore, (12) can be applied to (5) to yield
    Figure imgb0013
  • We now apply the restriction
    Figure imgb0014
    requiring a minimum separation of ng between non-zero samples of the excitation sequence. This restriction can extend (13) to
    Figure imgb0015
  • The sequence eo(n) is defined as
    Figure imgb0016
    where So(n) is the output of the linear predictor driven with zero input.
  • Using equations (12) and (15), we can rewrite (7) in approximate form as
    Figure imgb0017
    subject to the restriction of (14).
  • An approximate solution to the problem of determining the positions and amplitudes of the non-zero excitation samples can therefore be found by computing the following equation:
    Figure imgb0018
    and selecting values of n=nk, for which the corresponding values of |φ(n)| are local maxima, subject to the restriction of (14).
  • The modified computation exploits an alternative interpretation of the role of the weighting function W(z). The effect of the weighting function can be viewed as an attempt to separate the response of the system to successive non-zero excitation samples. If these samples are far enough apart, their values can be optimized independently.
  • In low bit-rate applications it is desirable to place the non-zero samples far apart, so as to distribute them roughly evenly across the interval of synthesis.

Claims (6)

1. An encoder for encoding speech signals, comprising means for sampling frames of the speech signal to be encoded, a linear prediction analyser (32) for determining for each frame parameters (Pn) of a linear predictor (10,16,18) to minimise the residual signal for the sampled frame, and means (64, 70, 74, 76) for producing an excitation signal for transmission or storage in conjunction with the parameters to enable each frame of the speech signal to be resynthesised, characterised in that the encoder comprises a weighting filter (38) with weighting parameters (-Pn) for damping reverberations within the speech signal caused by resonances in the vocal tract and a circuit (66) for time weighting the parameters (pn) determined by the analyser (32) by multiplication with a factor (k^n) and in that the means for producing an excitation signal comprises correlating means (64) for correlating the outputs of the weighting filter (38) and the circuit (66).
2. A signal encoder as claimed in Claim 1, in which the weighting filter (38) comprises a pole-zero filter.
3. A signal encoder as claimed in claim 1 or 2, in which the correlating means (64) comprises a tapped delay line, means for multiplying the tapped signals by the said time weighted impulse response, and means for summing the outputs of the multiplication circuits.
4. A signal encoder as claimed in any preceding claim, in which the output of the correlating means is connected to a pulse selector (70) which is operative to select a number of pulses from the correlator output.
5. A signal encoder as claimed in Claim 4, in which the pulse selector comprises means for detecting local peaks and means for selecting amongst the local peaks, those having the most positive or the most negative amplitudes.
6. A signal encoder as claimed in any preceding claim, comprising means (66, 74) for determining the magnitude of the transmitted pulses by dividing the output of the correlating means (64) by the auto-correlation function of the said time weighted impulse response.
EP19840301302 1983-03-11 1984-02-28 Speech encoder Expired EP0119033B1 (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
GB8306685A GB2137054B (en) 1983-03-11 1983-03-11 Speech encoder
GB8306685 1983-03-11
GB8333037 1983-12-10
GB8333037 1983-12-10

Publications (2)

Publication Number Publication Date
EP0119033A1 EP0119033A1 (en) 1984-09-19
EP0119033B1 EP0119033B1 (en) 1987-04-15

Family

ID=26285473

Family Applications (1)

Application Number Title Priority Date Filing Date
EP19840301302 Expired EP0119033B1 (en) 1983-03-11 1984-02-28 Speech encoder

Country Status (3)

Country Link
EP (1) EP0119033B1 (en)
CA (1) CA1202419A (en)
DE (1) DE3463192D1 (en)

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4472832A (en) * 1981-12-01 1984-09-18 At&T Bell Laboratories Digital speech coder

Also Published As

Publication number Publication date
DE3463192D1 (en) 1987-05-21
EP0119033A1 (en) 1984-09-19
CA1202419A (en) 1986-03-25

Similar Documents

Publication Publication Date Title
CN100369112C (en) Variable rate speech coding
JP4005359B2 (en) Speech coding and speech decoding apparatus
US6385577B2 (en) Multiple impulse excitation speech encoder and decoder
US6055496A (en) Vector quantization in celp speech coder
JP3094908B2 (en) Audio coding device
CA2016462A1 (en) Hybrid switched multi-pulse/stochastic speech coding technique
EP0342687B1 (en) Coded speech communication system having code books for synthesizing small-amplitude components
JP2002268686A (en) Voice coder and voice decoder
EP1103953B1 (en) Method for concealing erased speech frames
JP2615548B2 (en) Highly efficient speech coding system and its device.
EP0119033B1 (en) Speech encoder
JPH08328597A (en) Sound encoding device
JPH07101358B2 (en) Multi-pulse coding method and apparatus
JP3299099B2 (en) Audio coding device
JPH0258100A (en) Voice encoding and decoding method, voice encoder, and voice decoder
JP2853170B2 (en) Audio encoding / decoding system
Srivastava Fundamentals of linear prediction
GB2137054A (en) Speech encoder
JP3103108B2 (en) Audio coding device
JPH08320700A (en) Sound coding device
JP3071800B2 (en) Adaptive post filter
JPH0511799A (en) Voice coding system
JP2615862B2 (en) Voice encoding / decoding method and apparatus
JP3274451B2 (en) Adaptive postfilter and adaptive postfiltering method
JPH02160300A (en) Voice encoding system

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

AK Designated contracting states

Designated state(s): BE CH DE FR GB LI NL

17P Request for examination filed

Effective date: 19850207

GRAA (expected) grant

Free format text: ORIGINAL CODE: 0009210

AK Designated contracting states

Kind code of ref document: B1

Designated state(s): BE CH DE FR GB LI NL

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: NL

Effective date: 19870415

Ref country code: LI

Effective date: 19870415

Ref country code: FR

Free format text: THE PATENT HAS BEEN ANNULLED BY A DECISION OF A NATIONAL AUTHORITY

Effective date: 19870415

Ref country code: CH

Effective date: 19870415

Ref country code: BE

Effective date: 19870415

REF Corresponds to:

Ref document number: 3463192

Country of ref document: DE

Date of ref document: 19870521

REG Reference to a national code

Ref country code: CH

Ref legal event code: PL

EN Fr: translation not filed
NLV1 Nl: lapsed or annulled due to failure to fulfill the requirements of art. 29p and 29m of the patents act
PLBE No opposition filed within time limit

Free format text: ORIGINAL CODE: 0009261

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: NO OPPOSITION FILED WITHIN TIME LIMIT

26N No opposition filed
PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: DE

Effective date: 19881101

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: GB

Effective date: 19890228

GBPC Gb: european patent ceased through non-payment of renewal fee