EP0361432B1 - Method of and device for speech signal coding and decoding by means of a multipulse excitation - Google Patents

Info

Publication number
EP0361432B1
EP0361432B1 EP89117837A EP89117837A EP0361432B1 EP 0361432 B1 EP0361432 B1 EP 0361432B1 EP 89117837 A EP89117837 A EP 89117837A EP 89117837 A EP89117837 A EP 89117837A EP 0361432 B1 EP0361432 B1 EP 0361432B1
Authority
EP
European Patent Office
Prior art keywords
signal
long
term
gain
excitation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Lifetime
Application number
EP89117837A
Other languages
German (de)
French (fr)
Other versions
EP0361432A3 (en)
EP0361432A2 (en)
Inventor
Maurizio Omologo
Daniele Sereno
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
SIP SAS
Italtel SpA
Telecom Italia SpA
Original Assignee
SIP SAS
Italtel SpA
Italtel Societa Italiana Telecomunicazioni SpA
SIP Societa Italiana per l'Esercizio delle Telecomunicazioni SpA
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by SIP SAS, Italtel SpA, Italtel Societa Italiana Telecomunicazioni SpA, SIP Societa Italiana per l'Esercizio delle Telecomunicazioni SpA
Publication of EP0361432A2
Publication of EP0361432A3
Application granted
Publication of EP0361432B1
Anticipated expiration
Expired - Lifetime

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/08 Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters
    • G10L19/10 Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters the excitation function being a multipulse excitation

Definitions

  • A signal r(n), hereinafter referred to as the "residual signal", is obtained at the output of F1 (connection 2), and the spectrally shaped speech signal sw(n) is obtained on output connection 3 of F2: both signals are used in long-term analysis.
  • Short-term analysis circuit STA determines linear prediction coefficients a(k), which depend on short-term correlations deriving from the non-flat spectral envelope of the speech signal. Circuit STA calculates coefficients a(k) according to the classical autocorrelation method, as described in "Digital Processing of Speech Signals" by L.R. Rabiner and R.W. Schafer (Prentice-Hall, Englewood Cliffs, N.J., USA, 1978), page 401, and for this purpose uses a set of digital samples sh(n) which can comprise, besides the samples of the current frame, a certain number of samples of both the preceding and the following frames.
  • Block STA also comprises circuits for transforming the coefficients into a group of parameters ω(k) in the frequency domain, known as "line spectrum pairs", which are presented on output 5 of STA.
  • Line spectrum pairs denote the resonant frequencies at which the acoustic tube, to which the vocal tract can be likened, exhibits a line spectrum structure under extreme boundary conditions corresponding to complete opening and complete closure at the glottis.
  • The conversion of linear prediction coefficients into line spectrum pairs is described e.g. by N. Sugamura and F. Itakura in the paper "Speech analysis and synthesis method developed at ECL in NTT - From LPC to LSP", Speech Communication, Vol. 5, No. 2, June 1986, pages 199-215.
  • Line spectrum pairs ω(k), or the differences between adjacent line pairs, are then vectorially quantized in a vector quantization circuit VQ exploiting techniques of the type described in published European Patent application EP-A-186763 (CSELT), applied to a set of codebooks.
  • That vector, instead of being coded by a single word with that number of bits, is quantized by a group of words of smaller size chosen out of suitable sub-codebooks.
  • The quantization modalities of the above patent application are applied to obtain each of said words.
  • Vector quantizer VQ is one of the characteristics of the present invention and allows the number of bits necessary to code the results of the short-term analysis to be reduced from about 34-36 bits (scalar quantization) to 24 bits (vector quantization), while maintaining the same quality of the coded signal.
  • The differences, organized into three vectors of 3, 3 and 4 components respectively, may be quantized with 24 bits organized into three groups of 256 words, each group corresponding to one of said vectors.
  • the indices of the vectors are sent by VQ on a connection 6 which belongs to channel CH.
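  • The split-codebook idea described above can be illustrated with a minimal Python sketch. It is not the quantizer of EP-A-186763: the squared Euclidean distortion, the function names and the toy (untrained) codebooks are assumptions made only for illustration. Only the 3/3/4 split and the 256 words per sub-codebook come from the text.

```python
# Sketch of split-codebook vector quantization (hypothetical names, toy codebooks).

SPLIT = (3, 3, 4)        # sub-vector sizes quoted in the text
CODEBOOK_SIZE = 256      # words per sub-codebook quoted in the text


def bits_per_frame(split=SPLIT, codebook_size=CODEBOOK_SIZE):
    """Bits needed to send one index per sub-codebook."""
    bits_per_index = (codebook_size - 1).bit_length()  # 256 words -> 8 bits
    return len(split) * bits_per_index


def nearest_index(codebook, vector):
    """Index of the codeword with the smallest squared Euclidean distance.

    The distortion measure is an assumption; the source does not state it.
    """
    def distance(i):
        return sum((c - v) ** 2 for c, v in zip(codebook[i], vector))
    return min(range(len(codebook)), key=distance)


def split_quantize(vector, codebooks, split=SPLIT):
    """Quantize each sub-vector independently against its own sub-codebook."""
    indices, start = [], 0
    for size, codebook in zip(split, codebooks):
        indices.append(nearest_index(codebook, vector[start:start + size]))
        start += size
    return indices


def toy_codebook(dim, size=CODEBOOK_SIZE):
    """Placeholder codebook: NOT trained, only used to exercise the search."""
    return [[float(i + d) for d in range(dim)] for i in range(size)]


codebooks = [toy_codebook(dim) for dim in SPLIT]
# Build a 10-component test vector from three known codewords.
vector = codebooks[0][7] + codebooks[1][200] + codebooks[2][255]
print(bits_per_frame())                    # 24
print(split_quantize(vector, codebooks))   # [7, 200, 255]
```

  • Three 8-bit indices give the 24 bits mentioned in the text, whereas a single codebook addressed by one 24-bit word would need 2^24 entries.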
  • a circuit DCO obtains from said indices quantized linear prediction coefficients â(k) which are supplied, through connection 4, to filters F1, F2 of circuit SW, to an excitation generator EG and to a long-term analysis circuit LTA.
  • LTA supplies information dependent on the fine spectral structure of the signal, which information is used to make the synthesized signal more natural-sounding.
  • the samples relevant to M preceding sampling instants, weighted by a weighting factor (gain) B, are used.
  • The task of LTA is precisely to determine both M and B.
  • Lag M, in the case of a voiced sound, corresponds to the pitch period.
  • The lag can range from 20 to 83 samples and is updated every frame. The gain, by contrast, is updated every half frame.
  • Values M and B are emitted on a connection 7 and are supplied to excitation generator EG which also receives, through a connection 8, a signal swe(n), obtained from sw(n) in a manner which will be described hereinafter. Values M and B are also sent to a coder LTC, which transfers the coded signals onto a connection 9 belonging to channel CH.
  • Long-term analysis circuit LTA performs a closed-loop analysis as a part of the procedure for determining the pulse positions, with modalities allowing a good coder performance to be maintained even if a sub-optimum procedure is used, as will be better described hereinafter.
  • The task of excitation generator EG is to supply the sequence of Ns pulses (e.g. 6), distributed within a time period Ls (more particularly corresponding to half a frame), forming the excitation signal; such a signal is computed so as to minimize a mean squared error, frequency shaped as mentioned, between the original signal and the reconstructed one.
  • Excitation generator EG supplies, through a connection 10, the pulses it has generated to a circuit PAC, which codes the amplitudes and the positions of such pulses and also calculates and codes the r.m.s. values of said pulses.
  • The coded values σ(i), A(i) (1 ≦ i ≦ Ns) and Cp are emitted on a connection 11, also belonging to channel CH.
  • The structure of circuit PAC is known to those skilled in the art.
  • an excitation decoder ED reconstructs the excitation starting from the coded values σ(i), A(i), Cp.
  • reconstructed excitation pulses ê are supplied by ED to a long-term synthesis filter LTP1 which, together with a short-term synthesis filter STP, forms synthesizer SYN.
  • Reconstructed residual signal r̂ is present at the output of LTP1 and is sent via a connection 14 to short-term synthesis filter STP.
  • This is a filter whose transfer function in z transform is 1/A(z), where A(z) is the function already examined for filter F1 of spectral shaping circuit SW.
  • Coefficients â(k) for filter STP are supplied through a connection 15 from a circuit STD, which reconstructs them by decoding the information relevant to line spectrum pairs.
  • Filter STP emits on connection 16 the reconstructed or synthesized speech signal ŝ.
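  • The decoder-side cascade LTP1 followed by STP can be sketched as below. This is only an illustration under stated assumptions: the sign convention A(z) = 1 - Σ â(k)·z^(-k) is assumed (the defining equation is not reproduced in this text), the filter memories start at zero, and all function names are hypothetical.

```python
def long_term_synthesis(excitation, gain, lag, memory=None):
    """Long-term synthesis filter 1/(1 - B z^-M): r[n] = e[n] + B * r[n - M].

    `memory` holds past output samples; zeros are assumed when it is omitted.
    """
    history = list(memory) if memory is not None else [0.0] * lag
    if len(history) < lag:
        raise ValueError("memory must hold at least `lag` samples")
    out = []
    for e in excitation:
        r = e + gain * history[-lag]   # history[-lag] is r[n - M]
        history.append(r)
        out.append(r)
    return out


def short_term_synthesis(residual, coeffs):
    """Short-term synthesis filter 1/A(z).

    ASSUMPTION: A(z) = 1 - sum_k a(k) z^-k, hence y[n] = r[n] + sum_k a(k) y[n-k].
    """
    out = []
    for n, r in enumerate(residual):
        acc = r
        for k, a in enumerate(coeffs, start=1):
            if n - k >= 0:
                acc += a * out[n - k]
        out.append(acc)
    return out


def synthesize(excitation, gain, lag, coeffs):
    """Cascade of the two filters, in the order LTP1 then STP."""
    return short_term_synthesis(long_term_synthesis(excitation, gain, lag), coeffs)


impulse = [1.0] + [0.0] * 6
print(long_term_synthesis(impulse, gain=0.5, lag=3))
# [1.0, 0.0, 0.0, 0.5, 0.0, 0.0, 0.25]
print(short_term_synthesis([1.0, 0.0, 0.0], [0.5]))
# [1.0, 0.5, 0.25]
```

  • A real decoder would carry the memories of both filters from one frame to the next; the zero initial state here only keeps the example short.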
  • the optimum solution would be determining, for each pair of possible values m, b of the lag and gain used to determine the optimum values M, B to be exploited in the synthesis, the combination of excitation pulses, gain and lag minimizing the mean squared error between the original signal and the reconstructed signal.
  • The optimum solution is too complex and hence, according to the invention, the determination of M and B is separated from that of the excitation pulses. There are hence two successive operation phases.
  • Values M, B of m and b are to be found which minimize the mean squared error between frequency-shaped speech signal sw(n) and a signal sw0(n) obtained by weighting, in the same way as the residual signal, a signal r0 obtained as the response of a long-term synthesis filter (similar to the one of the synthesizer) when a zero is forced at the filter input (i.e. the response due to the long-term synthesis filter memory).
  • A predetermined value b is allotted to the gain and the error is minimized for each value m of the lag; once the optimum lag M has been found, the next step is that of determining the optimum gain B.
  • value B of b is chosen which renders E(M, b) minimum.
  • B is computed every half frame, and hence also the excitation pulses will be computed every half frame.
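  • The sequential lag-then-gain search described above can be sketched as follows. This is a simplified illustration, not the patented procedure: the weighting by 1/A(z/γ) is omitted (treated as identity), the gain grid and trial gain are hypothetical, and the error is evaluated by brute force rather than through the functions used in the embodiment. The lag range 20 to 83 is the one given in the text.

```python
import random


def zero_input_response(memory, lag, gain, length):
    """Response of 1/(1 - b z^-m) to a null input, driven only by its memory."""
    history = list(memory)
    if len(history) < lag:
        raise ValueError("memory shorter than lag")
    for _ in range(length):
        history.append(gain * history[-lag])
    return history[-length:]


def squared_error(target, candidate):
    return sum((t - c) ** 2 for t, c in zip(target, candidate))


def search_lag_then_gain(target, memory, lags, gains, trial_gain=1.0):
    """Step 1: best lag M for a fixed trial gain. Step 2: best gain B for M."""
    length = len(target)

    def error(m, b):
        return squared_error(target, zero_input_response(memory, m, b, length))

    best_lag = min(lags, key=lambda m: error(m, trial_gain))
    best_gain = min(gains, key=lambda b: error(best_lag, b))
    return best_lag, best_gain


# Synthetic check: the target is itself a zero-input response (lag 40, gain 0.8).
rng = random.Random(1)
memory = [rng.uniform(-1.0, 1.0) for _ in range(120)]
target = zero_input_response(memory, lag=40, gain=0.8, length=60)
lags = range(20, 84)                      # lag range quoted in the text
gains = [i / 10 for i in range(13)]       # hypothetical gain grid 0.0 .. 1.2
print(search_lag_then_gain(target, memory, lags, gains))   # (40, 0.8)
```

  • Searching the lag first and the gain second needs len(lags) + len(gains) error evaluations instead of len(lags) × len(gains), which is the kind of saving a sequential, sub-optimum procedure buys.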
  • Fig. 3 shows a block diagram of the devices of LTP and EG in case signal 0 is used to determine M and B.
  • a synthesis filter LTP2 having a transfer function similar to that of LTP1 (Fig. 1), is fed with a null signal.
  • filter LTP2 successively uses the different values m and, for each of them, an optimum value b(ott) which is implicitly obtained in the above-mentioned derivative operation.
  • For the determination of B, LTP2 uses value M of the lag determined in the preceding step and different values b.
  • Values m and b are supplied to LTP2 by a processing unit CMB, carrying out the computations and comparisons mentioned above.
  • Signal r0 is present on output 20 of LTP2.
  • Output 20 is connected to a first input of a multiplexer MX1 receiving at a second input the residual signal r(n) present on connection 2, and letting through signal r0 or signal r depending on the relative value of m and n.
  • signal 0 is present on output connection 21 of MX1, and that signal is delayed by a time equal to m samples in a delay element DL1 before being sent to CMB.
  • the latter receives also signal r(n) and, for each frame and for all values m, calculates function R'(m) and determines the value M of m which maximizes such function.
  • the value is stored into a register RM and made available on wires 7a of connection 7.
  • Output 20 of LTP2 is also connected to a weighting filter F3, which is enabled only while B is being computed and has the same transfer function 1/A(z/γ) as filter F2 in SW (Fig. 1).
  • Filter F3 weights signal r0 (or r'0, when the gain used in LTP2 is 1) giving at output 22 signal sw0 (s'w0).
  • the latter is supplied at an input of an adder SM1 where it is subtracted from signal sw coming from spectral shaping filter SW (Fig. 1) via connection 3.
  • SM1 supplies on output 8 signal swe.
  • device CMB determines, every half frame, value B of b which minimizes E and stores it into register RB which keeps it available, for the whole half frame, on a group of wires 7b of connection 7.
  • Values B, M computed by CMB are supplied to LTC (Fig.1) and to a long-term synthesis filter LTP3 which is part of the excitation generator EG and is followed by a weighting filter F4.
  • Filters LTP3, F4 have transfer functions similar to those of LTP1 and F2, respectively;
  • LTP3 is fed, during the analysis-by-synthesis procedure, with the excitation pulses e(i) supplied via connection 10 by a processing unit CE which sequentially determines the positions and the amplitudes of the various pulses.
  • F4 emits on output 24 signal ŝwe which is supplied to a first input of an adder SM2 receiving at a second input signal swe outgoing from SM1. The difference between the two signals is then supplied via connection 25 to CE, which determines pulses e(i) by minimizing mean squared error dw.
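  • The sequential determination of pulse positions and amplitudes by unit CE can be illustrated with a generic greedy multipulse search. This is a sketch under assumptions, not the algorithm of the patent: the combined effect of LTP3 and F4 is represented by a fixed, hypothetical impulse response, and each pulse is chosen by maximizing the error reduction on the remaining target.

```python
def shifted(response, position, length):
    """Impulse response placed at `position`, truncated to the segment length."""
    out = [0.0] * length
    for k, h in enumerate(response):
        if position + k < length:
            out[position + k] = h
    return out


def multipulse_search(target, response, num_pulses):
    """Greedy analysis-by-synthesis: place one pulse at a time.

    For each candidate position the optimum amplitude is corr/energy and the
    squared-error reduction is corr**2/energy; the best position is kept and
    its contribution is subtracted from the remaining target.
    """
    length = len(target)
    residual = list(target)
    pulses = []
    for _ in range(num_pulses):
        best = None  # (error reduction, position, amplitude)
        for pos in range(length):
            contrib = shifted(response, pos, length)
            energy = sum(c * c for c in contrib)
            if energy == 0.0:
                continue
            corr = sum(r * c for r, c in zip(residual, contrib))
            score = corr * corr / energy
            if best is None or score > best[0]:
                best = (score, pos, corr / energy)
        _, pos, amp = best
        pulses.append((pos, amp))
        contrib = shifted(response, pos, length)
        residual = [r - amp * c for r, c in zip(residual, contrib)]
    return pulses, residual


# Synthetic check: the target is built from two known pulses.
h = [1.0, 0.5, 0.25]                       # hypothetical weighted impulse response
n = 30
target = [2.0 * a - 1.0 * b for a, b in zip(shifted(h, 5, n), shifted(h, 20, n))]
pulses, residual = multipulse_search(target, h, num_pulses=2)
print(pulses)   # [(5, 2.0), (20, -1.0)]
```

  • In the coder described here the target would be swe(n) and the search would run once per half frame with Ns pulses (e.g. 6).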

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)
  • Analogue/Digital Conversion (AREA)
  • Dc Digital Transmission (AREA)

Abstract

A coding-decoding method using a multipulse analysis-by-synthesis excitation technique comprises, in the decoding phase, cascaded long-term and short-term synthesis filterings. The lag and gain of the long-term synthesis and the excitation pulses are determined during the coding phase within the analysis-by-synthesis procedure in two subsequent steps, in the first of which the lag and the gain are determined, while in the second the positions and the amplitudes of the excitation pulses are determined. The invention also concerns the device performing the method.

Description

  • The present invention concerns medium-low bit-rate speech signal coding systems, and more particularly it relates to a coding-decoding method and device using a multipulse analysis-by-synthesis excitation technique.
  • Multipulse linear prediction coding is one of the most promising techniques for obtaining high quality synthetic speech at bit rates below 16 kbit/s. This technique was originally proposed by B. S. Atal and J. R. Remde in the paper entitled "A new method of LPC excitation for producing natural-sounding speech at low bit rates", International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 614-617, Paris, 1982. According to this technique, the excitation signal for the synthesis filter consists of a train of pulses whose amplitudes and time positions are determined so as to minimize a perceptually meaningful distortion measurement; such a measurement is obtained by comparing the samples at the synthesis filter output with the original speech samples and simultaneously weighting the difference by a function which takes into account how human perception evaluates the distortion introduced (analysis-by-synthesis procedure).
  • Different coding-decoding systems using this excitation technique have been suggested. Among those systems, the ones where the synthesizer comprises the cascade of a long-term and a short-term synthesis filter are of particular interest: in fact they provide signals whose quality gradually decreases as the bit rate decreases and do not present a dramatic performance deterioration below a threshold rate.
  • Examples of said systems are described e.g. in the papers "High quality multipulse speech coder with pitch prediction" presented by K. Ozawa and T. Araseki at the conference ICASSP 86, Tokyo, 7-11 April 1986, and published at pages 33.3.1 - 33.3.4 of the conference proceedings, and "Experimental evaluation of different approaches to the multipulse coder", presented by P. Kroon and E. F. Deprettere at the conference ICASSP 84, San Diego, 19-21 March 1984, and published at pages 10.4.1-10.4.4 of the conference proceedings.
  • In those systems all parameters relevant to the long-term synthesis filter and to the excitation are optimized within the analysis-by-synthesis procedure. This procedure gives highly complex optimum algorithms. If the optimum procedure is not followed, there is a performance reduction for a given transmission rate, or a transmission rate increase is required to maintain a certain performance level.
  • The aforementioned paper by Kroon and Deprettere shows determination of long-term analysis delay and gain in a separate step from determination of pulse amplitudes and locations. Yet such delay and gain are determined directly from the residual signal (open loop analysis), which does not lead to optimal performance.
  • The invention provides a method and a device allowing quality to be increased while leaving the bit rate unchanged, or a given quality to be maintained even at a lower bit rate. This can be achieved by using a combined optimization technique, of sequential type, of the parameters of the long-term synthesis filter and of the excitation within the analysis-by-synthesis procedure; the sequential procedure is sub-optimum with respect to the original optimum one, but it is easier to implement. More specifically, a method is provided in which the parameters are optimized according to a particular error minimization procedure, which is a closed loop analysis. The terms "open loop analysis" and "closed loop analysis" are here used as explained e.g. in IEEE Journal on Selected Areas in Communications, Vol. 6, No. 2, Feb. 1988, p. 353-363, Kroon and Deprettere.
  • The method of speech signal coding and decoding according to the invention, using a multipulse analysis-by-synthesis excitation technique, comprises a coding phase including the following operations: speech signal conversion into frames of digital samples [s(n)]; short-term analysis of the speech signal, to determine a group of linear prediction coefficients [a(k)] (k = 1, ..., p) relevant to a current frame and a representation thereof as line spectrum pairs; coding of said representation of the linear prediction coefficients, and obtaining quantized linear prediction coefficients [â(k)] from said representation; spectral shaping of the speech signal, by weighting the digital samples [s(n)] in a frame by a first and a second weighting function A(z), 1/A(z/γ), where
    Figure imgb0001

    the weighting by the first weighting function generating a residual signal [r(n)], which is then weighted by the second function to generate a spectrally-shaped speech signal [sw(n)]; long-term analysis of the speech signal, by using said residual signal [r(n)] and said spectrally weighted signal [sw(n)], to determine the lag separating a current sample from a preceding sample [r(n-M)] used to process said current sample, and the gain by which said preceding sample is weighted for the processing; determination of the positions and amplitudes of the excitation pulses, by exploiting the results of short-term and long-term analysis; coding of the values of said lag and gain of long-term analysis and of said amplitudes and positions of the excitation pulses, the coded values forming, jointly with the coded representation of the linear prediction coefficients and with coded r.m.s. values of said excitation pulses, the coded speech signal; and also comprises a decoding phase, where the excitation is reconstructed starting from the coded values of the amplitudes, the positions and the r.m.s. values of the pulses and where a synthesized speech signal [ŝ(n)] is generated by passing said reconstructed excitation through a long-term synthesis filter 1/(1 - B·z^(-M)) followed by a short-term synthesis filter 1/A(z), which filters exploit the long-term analysis parameters and the quantized linear prediction coefficients, respectively; wherein said long-term analysis and excitation pulse generation are performed in successive steps, in the first of which long-term analysis gain and lag are determined by minimizing a mean squared error between the spectrally-shaped speech signal [sw(n)] and a further signal [sw0(n)] obtained by weighting by said second weighting function 1/A(z/γ) the signal resulting from a long-term synthesis filtering, which is similar to that performed during decoding and in which the signal used for the synthesis is a null signal, while in the second step the amplitudes and positions of the excitation pulses [e(i)] are actually determined by minimizing the mean squared error between a signal [swe(n)] representing the difference between the spectrally-shaped speech signal [sw(n)] and said further signal [sw0(n)], and a third weighted signal [ŝwe(n)], obtained by submitting the excitation pulses to a long-term synthesis filtering and to a weighting by said second weighting function; and wherein the coding of said representation of the linear prediction coefficients consists in a vector quantization of the line spectrum pairs or of the adjacent line pair differences according to a split-codebook quantization technique.
  • The invention provides also a device for speech signal coding and decoding by multipulse analysis-by-synthesis excitation techniques, for implementing the above method, comprising, for speech signal coding: means for converting the speech signal into frames of digital samples [s(n)]; means for the short-term analysis of the speech signal, which means receive a group of samples from said converting means, compute a set of linear prediction coefficients [a(k)], (k = 1, ...,p) relevant to a current frame and emit a representation of said linear prediction coefficients [a(k)] as line spectrum pairs; means for coding said representation of the linear prediction coefficients; means for obtaining quantized linear prediction coefficients [â(k)] from said coded representation; a circuit for the spectral shaping of the speech signal, connected to the converting means and to the means obtaining the quantized linear prediction coefficients and comprising a pair of cascaded weighting digital filters, weighting the digital samples [s(n)] according to a first and a second weighting function A(z), 1/A(z/γ), where
    Figure imgb0002

    respectively, said first filter supplying a residual signal r(n); means for the long-term analysis of the speech signal, connected to the outputs of said first filter and of the spectral shaping circuit to determine the lag which separates a current sample from a preceding sample [r(n-M)], used to process said current sample, and the gain by which said preceding sample is weighted for the processing; an excitation generator for determining the positions and the amplitudes of the excitation pulses, connected to said short-term and long-term analysis means and to said spectral shaping circuit; means for coding the values of said long-term analysis lag and gain and excitation pulse positions and amplitudes, the coded values forming, jointly with the coded representation of the linear prediction coefficients and with r.m.s. values of said excitation pulses, the coded speech signal; and also comprising, for speech signal decoding (synthesis): means for reconstructing the excitation, the long-term analysis lag and gain and the linear prediction coefficients [a(k)] starting from the coded signal; and a synthesizer, comprising the cascade of a first long-term synthesis filter, which receives the reconstructed excitation pulses, gain and lag and filters them according to a first transfer function 1/(1 - B·z^(-M)) and a short-term synthesis filter having a second transfer function 1/A(z) which is the reciprocal of said first spectral weighting function A(z), whereby the long-term analysis means are apt to determine said lag and gain in two successive steps, preceding a step in which the amplitudes and positions of the excitation pulses are determined by said excitation generator, and comprise: a second long-term synthesis filter, which is fed with a null signal and in which, for the computation of the lag, there is used a predetermined set of values of the number of samples separating a current sample being synthesized from a previous sample used for the synthesis, and, for the computation of the gain, a predetermined set of possible values of the gain itself is used; a multiplexer receiving at a first input a sample of the residual signal [r(n)] and at a second input a sample of the output signal of the second long-term synthesis filter and supplying the samples present at either input depending on whether or not said number of samples is lower than a frame length; a third weighting filter, which has the same transfer function as said second digital filter of the spectral shaping circuit, is connected to the output of said second long-term synthesis filter and is enabled only during the determination of the long-term analysis gain; a first adder, which receives at a first input the spectrally-shaped signal (sw) and at a second input the output signal of said third weighting filter and supplies the difference between the signals present at its first and second input; a first processing unit, which receives in the first of said two successive steps the signal outgoing from said multiplexer and determines the optimum value of said number of samples, and in the second of said two successive steps receives the output signal of said first adder and determines, by using the lag computed in the first step, the value of the gain which minimizes the mean squared error, within a validity period of the excitation pulses, between the input signals of the first adder; and whereby the excitation generator for generating the excitation pulses [e(i)] comprises: a third long-term synthesis filter, which has the same transfer function as the first long-term synthesis filter and is fed with the excitation pulses generated; a fourth weighting filter, connected to the output of the third synthesis filter and having the same transfer function as said second and third weighting filters; a second adder, which receives at a first input the output signal of said first adder and at a second input the output signal of the fourth weighting filter, and supplies the difference between the signals present at its first and second input; a second processing unit which is connected to the output of said second adder and determines the amplitudes and positions of said pulses by minimizing the mean squared error, within a pulse validity period, between the input signals of the second adder.
  • The invention will be better understood from the following description of an exemplary embodiment thereof, with reference to the annexed drawings, in which:
    • Fig. 1 is a block diagram of the coder-decoder according to the invention;
    • Fig. 2 is a flow chart of the operations concerning the determination of long-term analysis gain;
    • Fig. 3 is a block diagram of the circuits for long-term analysis and excitation pulse generation.
  • With reference to Fig. 1, a generic speech signal coding-decoding system can be schematized by a coder COD, a transmission channel CH and a decoder DEC.
  • In the case of a system based on a multipulse excitation technique and exploiting speech signal long-term and short-term correlations, coder COD receives digital samples s(n) of the original speech signal, organized into frames each comprising a predetermined number of samples, and sends onto channel CH, for each sample frame, the coding of a suitable representation ω(k) of a group of linear prediction coefficients a(k) obtained by a short-term analysis of the speech signal, the coded amplitudes and positions A(i), Cp of the pulses forming the excitation signal, the coded r.m.s. values σ(i) of the excitation pulses, and the codings of two parameters (gain B and lag M) determined by the long-term analysis. Decoder DEC reconstructs the excitation and generates a synthesized speech signal on the basis of the reconstructed excitation, the linear prediction coefficients reconstructed starting from the transmitted representation thereof, and the long-term analysis parameters.
  • By way of example, whenever necessary, reference will be made to a 15 ms frame duration, which corresponds to 120 samples if an 8 kHz sampling frequency is assumed.
  • For the coding in COD, the digital sample frames, present on connection 1, are supplied to a spectral shaping circuit SW and to a short-term analysis circuit STA.
  • Spectral shaping circuit SW performs a frequency-shaping of the speech signal in order to render the differences between the original and the reconstructed speech signals less perceptible in correspondence with the formants of the original speech signal. Such a circuit consists of a pair of cascaded digital filters F1, F2, whose transfer functions, in z transform, are given in a non-limiting example respectively by relations
    Figure imgb0003

    where z^{-1} represents a delay of one sampling interval; â(k) is a quantized linear prediction coefficient vector (1 ≦ k ≦ p, where p is the filter order) reconstructed from the coded representation of the linear prediction coefficients obtained as short-term analysis result; γ is an experimentally determined constant correcting factor, determining the bandwidth increase around the formants. Spectral shaping circuit SW as a whole has a transfer function W(z) = A(z)/A(z,γ). A signal r(n), hereinafter referred to as "residual signal", is obtained on output connection 2 of F1, and spectrally shaped speech signal sw(n) is obtained on output connection 3 of F2: both signals are used in long-term analysis.
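The cascade F1, F2 can be illustrated by the following minimal sketch. It is not the patent's implementation: it assumes the common sign convention A(z) = 1 - Σ â(k)·z^{-k} and the usual bandwidth-expansion form in which A(z,γ) uses coefficients â(k)·γ^k. The coefficient values, γ = 0.8 and all function names are hypothetical.

```python
import numpy as np

def fir_inverse_filter(s, a):
    """F1 (assumed sign convention): r(n) = s(n) - sum_k a[k-1] * s(n-k)."""
    r = np.zeros(len(s))
    for n in range(len(s)):
        acc = s[n]
        for k in range(1, min(len(a), n) + 1):
            acc -= a[k - 1] * s[n - k]
        r[n] = acc
    return r

def all_pole_filter(x, a):
    """Filtering by 1/A(z): y(n) = x(n) + sum_k a[k-1] * y(n-k)."""
    y = np.zeros(len(x))
    for n in range(len(x)):
        acc = x[n]
        for k in range(1, min(len(a), n) + 1):
            acc += a[k - 1] * y[n - k]
        y[n] = acc
    return y

def spectral_shaping(s, a_hat, gamma):
    """Return (r, sw): residual from F1 and shaped signal from F2."""
    # Assumed form of A(z, gamma): coefficients a(k) * gamma**k
    a_gamma = a_hat * gamma ** np.arange(1, len(a_hat) + 1)
    r = fir_inverse_filter(s, a_hat)
    sw = all_pole_filter(r, a_gamma)
    return r, sw

rng = np.random.default_rng(0)
s = rng.standard_normal(120)        # one 120-sample frame (15 ms at 8 kHz)
a_hat = np.array([0.9, -0.4])       # hypothetical 2nd-order coefficients
r, sw = spectral_shaping(s, a_hat, gamma=0.8)
```

With γ = 1 the second filter exactly undoes the first, so sw(n) equals s(n); values of γ below 1 widen the formant bandwidths of the weighting filter.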
  • Short-term analysis circuit STA determines linear prediction coefficients a(k), which depend on short-term correlations deriving from a non-flat spectral envelope of the speech signal. Circuit STA calculates coefficients a(k) according to the classical autocorrelation method, as described in "Digital Processing of Speech Signals" by L.R. Rabiner and R.W. Schafer (Prentice-Hall, Englewood Cliffs, N.J., USA, 1978), page 401, and uses for this purpose a set of digital samples sh(n) which can comprise, besides the samples of the current frame, a certain number of samples of both the preceding and the following frames.
  • More particularly, with reference to the exemplary frame length, the set of samples sh(n) can comprise 200 samples, overlapping the frame which is being processed. Block STA also comprises circuits for transforming the coefficients into a group of parameters ω(k) in the frequency domain, known as "line spectrum pairs", which are presented on output 5 of STA. As known, line spectrum pairs denote the resonant frequencies at which the acoustic tube to which the vocal tract can be assimilated exhibits a line spectrum structure under extreme boundary conditions corresponding to complete opening and closure at the glottis.
    The conversion of linear prediction coefficients into line spectrum pairs is described e.g. by N. Sugamura and F. Itakura in the paper "Speech analysis and synthesis method developed at ECL in NTT - From LPC to LSP", Speech Communication, Vol. 5, No. 2, June 1986, pages 199-215.
  • Line spectrum pairs ω(k) or the differences Δω between adjacent line pairs are then vectorially quantized in a vector quantization circuit VQ exploiting techniques of the type described in published European Patent application EP-A-186763 (CSELT), applied to a set of codebooks. In other words, leaving unchanged the number of bits by which each vector of ω (or Δω) is to be coded, that vector, instead of being coded by a single word with that number of bits, is quantized by a group of words of smaller size chosen out of suitable sub-codebooks. The quantization procedures of the above patent application are applied to obtain each of said words. The presence of vector quantizer VQ is one of the characteristics of the present invention and allows a reduction in the number of bits necessary to code the results of the short-term analysis, while maintaining the same quality of the coded signal, from about 34-36 bits (scalar quantization) to 24 (vector quantization). By way of example, differences Δω, organized into three vectors of 3, 3 and 4 components respectively, may be quantized with 24 bits organized into three groups of 256 words, each group corresponding to one of said vectors. The indices of the vectors are sent by VQ on a connection 6 which belongs to channel CH.
  • A circuit DCO obtains from said indices quantized linear prediction coefficients â(k) which are supplied, through connection 4, to filters F1, F2 of circuit SW, to an excitation generator EG and to a long-term analysis circuit LTA.
  • Long-term analysis circuit LTA supplies information dependent on the fine spectral structure of the signal, which information is used to make the synthesized signal more natural-sounding. For the analysis concerning a sample frame, the samples relevant to M preceding sampling instants, weighted by a weighting factor (gain) B, are used. The task of LTA is precisely to determine both M and B. Lag M, in case of a voiced sound, corresponds to the pitch period. In the example considered, the lag can range from 20 to 83 samples and it is updated every frame. The gain, on the contrary, is updated every half frame. Values M and B are emitted on a connection 7 and are supplied to excitation generator EG which also receives, through a connection 8, a signal swe(n), obtained from sw(n) in a manner which will be described hereinafter. Values M and B are also sent to a coder LTC, which transfers the coded signals onto a connection 9 belonging to channel CH.
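The split-codebook arrangement of the example (three sub-vectors of 3, 3 and 4 components, 256 words each, 24 bits in total) might be sketched as follows. The codebooks here are random placeholders rather than trained ones, and the nearest-neighbour search under squared error is an assumption; the text refers to EP-A-186763 for the actual quantization procedure.

```python
import numpy as np

SPLIT = (3, 3, 4)   # component counts of the three sub-vectors (from the text)
WORDS = 256         # words per sub-codebook, i.e. 8 bits per index

def split_vq_encode(delta_omega, codebooks):
    """Quantize each sub-vector independently; return one index per sub-codebook."""
    indices, start = [], 0
    for dim, cb in zip(SPLIT, codebooks):
        sub = delta_omega[start:start + dim]
        dist = np.sum((cb - sub) ** 2, axis=1)   # squared error to every word
        indices.append(int(np.argmin(dist)))
        start += dim
    return indices

def split_vq_decode(indices, codebooks):
    """Rebuild the full vector by concatenating the selected words."""
    return np.concatenate([cb[i] for i, cb in zip(indices, codebooks)])

rng = np.random.default_rng(1)
# Hypothetical, untrained codebooks used only to make the sketch runnable
codebooks = [rng.uniform(0.0, 0.5, size=(WORDS, d)) for d in SPLIT]
delta_omega = rng.uniform(0.0, 0.5, size=sum(SPLIT))

indices = split_vq_encode(delta_omega, codebooks)
reconstructed = split_vq_decode(indices, codebooks)
total_bits = len(SPLIT) * (WORDS - 1).bit_length()   # 3 indices of 8 bits each
```

Searching three 256-word codebooks costs 768 distance evaluations per frame, whereas a single codebook addressed by one 24-bit word would need 2^24 entries, which is the practical motivation for splitting.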
  • The structure and the operation of a device such as LTC are known in the art.
  • In the paper "A Class of Analysis-by-Synthesis Predictive Coders for High Quality Speech Coding at Rates Between 4.8 and 16 kbits/s" by P. Kroon and F. Deprettere, in IEEE Journal on Selected Areas in Communications, vol. 6, No. 2, 1988, at pages 353-363, the terms "open-loop" and "closed-loop" and their meanings in the art are well described. Open-loop configurations use the residual signal and closed-loop configurations use analysis-by-synthesis.
  • Long-term analysis circuit LTA performs a closed-loop analysis as a part of the procedure for determining the pulse positions, with modalities allowing a good coder performance to be maintained even if a sub-optimum procedure is used, as will be better described hereinafter.
  • Excitation generator EG is to supply the sequence of Ns pulses (e.g. 6), distributed within a time period Ls (more particularly corresponding to half a frame), forming the excitation signal; such a signal is computed so as to minimize a mean squared error, frequency shaped as mentioned, between the original signal and the reconstructed one.
  • The operations carried out by blocks LTA and EG will be described in more detail hereinafter, making also reference to Fig. 3.
  • Excitation generator EG supplies, through a connection 10, the pulses it has generated to a circuit PAC coding the amplitudes and the positions of such pulses, which circuit also calculates and codes the r.m.s. values of said pulses. The coded values σ(i), A(i) (1≦i≦Ns) and Cp are emitted on a connection 11, also belonging to channel CH.
  • The structure of circuit PAC is known to those skilled in the art.
  • In decoder DEC, an excitation decoder ED reconstructs the excitation starting from the coded values σ(i), A(i), Cp. Through a connection 12, reconstructed excitation pulses ê are supplied by ED to a long-term synthesis filter LTP1 which, together with a short-term synthesis filter STP, forms synthesizer SYN. The long-term synthesis filter is a filter whose transfer function, in z transform, is 1/P(z) = 1/(1 - B·z^{-M}), where M and B have the meanings stated above and are supplied to LTP1, through a connection 13, by a circuit LTD decoding the long-term analysis parameters.
  • Reconstructed residual signal r̂ is present at the output of LTP1 and is sent via a connection 14 to short-term synthesis filter STP. This is a filter whose transfer function in z transform is 1/A(z), where A(z) is the function already examined for filter F1 of spectral shaping circuit SW. Coefficients â(k) for filter STP are supplied through a connection 15 from a circuit STD, which reconstructs them by decoding the information relevant to line spectrum pairs.
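The synthesizer cascade LTP1, STP can be written as two short recursions. This is a minimal sketch under stated assumptions: the sign convention A(z) = 1 - Σ â(k)·z^{-k} is assumed, filter memories start at zero unless supplied, and the function names and numeric values are illustrative only.

```python
import numpy as np

def long_term_synthesis(e, B, M, memory=None):
    """1/(1 - B z^-M): r_hat(n) = e(n) + B * r_hat(n - M).

    `memory`, if given, must hold at least the last M past output samples.
    """
    hist = np.zeros(M) if memory is None else np.asarray(memory, dtype=float)[-M:]
    buf = np.concatenate([hist, np.zeros(len(e))])
    for n in range(len(e)):
        buf[M + n] = e[n] + B * buf[n]   # buf[n] is the output M samples earlier
    return buf[M:]

def short_term_synthesis(r_hat, a_hat):
    """1/A(z) with the assumed sign: s_hat(n) = r_hat(n) + sum_k a(k) s_hat(n-k)."""
    s_hat = np.zeros(len(r_hat))
    for n in range(len(r_hat)):
        acc = r_hat[n]
        for k in range(1, min(len(a_hat), n) + 1):
            acc += a_hat[k - 1] * s_hat[n - k]
        s_hat[n] = acc
    return s_hat

e = np.zeros(120)
e[0] = 1.0                                    # a single excitation pulse
r_hat = long_term_synthesis(e, B=0.5, M=40)   # pulse repeats every 40 samples, scaled by B
s_hat = short_term_synthesis(r_hat, np.array([0.9, -0.4]))
```

A single pulse at n = 0 reappears at n = 40 and n = 80 with amplitudes B and B², which shows how the long-term filter restores the pitch periodicity before the short-term filter imposes the spectral envelope.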
  • Filter STP emits on connection 16 the reconstructed or synthesized speech signal ŝ.
  • To simplify the drawing we have not represented the devices for converting speech signal into sample frames, the buffers for the samples to be processed and the time base for timing the various operations. On the other hand said devices are wholly conventional.
  • Considering again long-term analysis and excitation generation, the optimum solution would be determining, for each pair of possible values m, b of the lag and gain used to determine the optimum values M, B to be exploited in the synthesis, the combination of excitation pulses, gain and lag minimizing the mean squared error between the original signal and the reconstructed signal. However, the optimum solution is too complex and hence, according to the invention, the determination of M and B is separated from that of the excitation pulses. There are hence two successive operation phases.
  • In the first phase (determination of M, B) values M, B of m and b are to be found which minimize mean squared error
    Figure imgb0006

    between frequency-shaped speech signal sw(n) and a signal sw0(n) obtained by weighting, in the same way as the residual signal, a signal r₀ obtained as a response from a long-term synthesis filter (similar to the one of the synthesizer), when at the filter input a zero has been forced (long-term synthesis filter memory). In the second phase the positions and amplitudes of the excitation pulses are actually determined, so as to minimize, in a perceptually meaningful way, a squared error
    Figure imgb0007

    where swe(n) has the meaning above and ŝwe(n) is the signal obtained by filtering excitation pulses e(i) according to a function H(z) = 1/[P(z)A(z)].
  • For the first phase an analytical approach could be followed, by taking into account that determining the minimum of E(m,b) corresponds to determining the maximum of a function

    R(m) = x²(m)/y(m)   (3)


    where
    Figure imgb0010

    L being the frame length.
  • This can be easily deduced by differentiating the relation which gives the error and setting the derivative equal to 0. However, for generic values of n and m, signal r₀(n-m) may be unavailable, unless the lag exceeds the frame duration.
  • According to the invention two sub-optimum solutions allowing elimination of the constraint between the lag and the duration of the frame are proposed for computing B and M.
  • According to the first sub-optimum solution, a predetermined value b is allotted to the gain and the error is minimized for each value m of the lag; once the optimum lag M has been found, the next step is to determine the optimum gain B.
  • A second and simpler solution is that of computing M by using a signal
    Figure imgb0011
    ₀ which consists of the signal r₀, when the lag is greater than the frame length (or, more generally, when a sample of the current frame is processed by using a sample of the preceding frame), while in the opposite case it is equal to residual signal r(n), and minimising the error
    Figure imgb0012

    Under said conditions the previous constraint for the lag is eliminated, since signals r₀ are always available, and hence M can be determined as the number m of samples which maximizes the function

    R'(m) = X²(m)/Y(m)   (6)


    where
    Figure imgb0014
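The lag search of the second solution can be sketched as below. Since the definitions of X(m) and Y(m) are not reproduced in the text here, the sketch assumes the usual forms X(m) = Σ r(n)·d(n-m) and Y(m) = Σ d²(n-m), where the delayed sample d(n-m) is taken from the samples of the preceding frames when n-m falls before the current frame and from the current residual r otherwise. The function name, the buffer layout and the test signal are hypothetical.

```python
import numpy as np

def search_lag(r, past, m_min=20, m_max=83):
    """Return the lag m in [m_min, m_max] maximizing X(m)**2 / Y(m).

    r    : residual of the current frame
    past : samples preceding the frame (at least m_max of them)
    """
    L = len(r)
    buf = np.concatenate([past, r])   # past samples followed by the current frame
    off = len(past)                   # buffer index of current-frame sample n = 0
    best_m, best_R = m_min, -np.inf
    for m in range(m_min, m_max + 1):
        delayed = buf[off - m: off - m + L]   # sample n - m for n = 0..L-1
        X = np.dot(r, delayed)
        Y = np.dot(delayed, delayed)
        if Y > 0.0:
            R = X * X / Y
            if R > best_R:
                best_m, best_R = m, R
    return best_m

rng = np.random.default_rng(2)
pattern = rng.standard_normal(50)     # one "pitch period" of 50 samples
signal = np.tile(pattern, 6)          # perfectly periodic test signal
past, r = signal[:180], signal[180:]  # 180 past samples, one 120-sample frame
lag = search_lag(r, past)
```

Because delayed samples are read from the current residual whenever n-m lies inside the frame, lags shorter than the frame length (here all of 20 to 83 against a 120-sample frame) can be evaluated without any constraint.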

       Once M has been determined, gain B can be determined either in exhaustive manner or by the following procedure, which reduces the necessary amount of computations. First, value s'w0 of sw0 when b=1 is determined, according to relation
    Figure imgb0015

    where r'₀(n) is the value of r₀ for b=1, and mean squared error E(M,1) is calculated. For each b ≠ 1, sw0 is calculated starting from s'w0, according to relations:

    sw0(n) = b·s'w0(n) for n ≦ M

    sw0(n) = b[sw0(n-M) + s'w0(n) - s'w0(n-M)] for n > M   (9)


    and the corresponding error E(M, b) is determined. Lastly, the value B of b which renders E(M, b) minimum is chosen. Once M and B have been found, the positions of the individual pulses e(i) of the excitation signal and then their amplitudes are determined so as to minimize dw, e.g. by the procedures described in the paper "Efficient computation and encoding of the multipulse excitation for LPC" by M. Berouti, H. Garten, P. Kabal and P. Mermelstein, presented at the already mentioned conference ICASSP 84 and published at pages 10.1.1-10.1.4 of the conference proceedings.
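The saving offered by relations (9) is that the weighting filter is run only once, for b = 1, and every other candidate gain is obtained by a cheap recursion. The sketch below checks this numerically. It assumes a zero-state weighting filter represented by a hypothetical impulse response h, a zero-input long-term filter started from M past samples, 0-based indexing (so that "n ≦ M" in the text becomes n < M), and an illustrative set of candidate gains.

```python
import numpy as np

def weight(x, h):
    """Zero-state filtering of x by impulse response h (truncated convolution)."""
    return np.convolve(x, h)[:len(x)]

def zero_input_response(memory, b, M, L):
    """r0(n) = b * r0(n - M) for n = 0..L-1, started from the last M past samples."""
    buf = np.concatenate([np.asarray(memory, dtype=float)[-M:], np.zeros(L)])
    for n in range(L):
        buf[M + n] = b * buf[n]
    return buf[M:]

def sw0_from_unit_gain(sw0_unit, b, M):
    """Relations (9): derive sw0 for gain b from s'w0 computed with b = 1."""
    sw0 = np.zeros(len(sw0_unit))
    for n in range(len(sw0_unit)):
        if n < M:
            sw0[n] = b * sw0_unit[n]
        else:
            sw0[n] = b * (sw0[n - M] + sw0_unit[n] - sw0_unit[n - M])
    return sw0

def search_gain(sw, memory, M, h, gains):
    """Return the candidate gain minimizing E(M, b) = sum (sw - sw0)**2."""
    sw0_unit = weight(zero_input_response(memory, 1.0, M, len(sw)), h)  # s'w0
    errors = [np.sum((sw - sw0_from_unit_gain(sw0_unit, b, M)) ** 2) for b in gains]
    return gains[int(np.argmin(errors))]

rng = np.random.default_rng(3)
M, L = 25, 60                           # lag and half-frame length
memory = rng.standard_normal(M)         # hypothetical long-term filter memory
h = 0.8 ** np.arange(L)                 # hypothetical weighting impulse response
gains = [0.25, 0.5, 0.75, 1.0, 1.25]    # hypothetical set of candidate gains
sw = weight(zero_input_response(memory, 0.75, M, L), h)   # target built with b = 0.75
best_b = search_gain(sw, memory, M, h, gains)
```

The recursion reproduces the directly filtered signal for any b, so the search returns the gain used to build the target.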
  • As said, B is computed every half frame, and hence also the excitation pulses will be computed every half frame.
  • Fig. 3 shows a block diagram of the devices of LTA and EG in case signal
    Figure imgb0018
    ₀ is used to determine M and B.
  • In circuit LTA a synthesis filter LTP2, having a transfer function similar to that of LTP1 (Fig. 1), is fed with a null signal. When M is being determined, filter LTP2 successively uses the different values m and, for each of them, an optimum value b(ott) which is implicitly obtained in the above-mentioned derivative operation. When B is being determined, LTP2 uses value M of the lag determined in the preceding step and different values b. Values m and b are supplied to LTP2 by a processing unit CMB, carrying out the computations and comparisons mentioned above. Signal r₀ is present on output 20 of LTP2.
  • Output 20 is connected to a first input of a multiplexer MX1 receiving at a second input the residual signal r(n) present on connection 2, and letting through signal r₀ or signal r depending on the relative value of m and n. Hence signal
    Figure imgb0019
    ₀ is present on output connection 21 of MX1, and that signal is delayed by a time equal to m samples in a delay element DL1 before being sent to CMB. The latter receives also signal r(n) and, for each frame and for all values m, calculates function R'(m) and determines the value M of m which maximizes such function. The value is stored into a register RM and made available on wires 7a of connection 7.
  • Output 20 of LTP2 is also connected to a weighting filter F3, which is enabled only while B is being computed and has the same transfer function 1/A(z/γ) as filter F2 in SW (Fig. 1). Filter F3 weights signal r₀ (or r'₀, when the gain used in LTP2 is 1) giving at output 22 signal sw0 (s'w0). The latter is supplied at an input of an adder SM1 where it is subtracted from signal sw coming from spectral shaping filter SW (Fig. 1) via connection 3. SM1 supplies on output 8 signal swe. By using the procedure above (relations (8) and (9)), device CMB determines, every half frame, value B of b which minimizes E and stores it into register RB which keeps it available, for the whole half frame, on a group of wires 7b of connection 7.
  • Values B, M computed by CMB are supplied to LTC (Fig.1) and to a long-term synthesis filter LTP3 which is part of the excitation generator EG and is followed by a weighting filter F4. Filters LTP3, F4 have transfer functions similar to those of LTP1 and F2, respectively; LTP3 is fed, during the analysis-by-synthesis procedure, with the excitation pulses e(i) supplied via connection 10 by a processing unit CE which sequentially determines the positions and the amplitudes of the various pulses. F4 emits on output 24 signal ŝwe which is supplied to a first input of an adder SM2 receiving at a second input signal swe outgoing from SM1. The difference between the two signals is then supplied via connection 25 to CE, which determines pulses e(i) by minimizing mean squared error dw.
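The loop formed by CE, LTP3, F4 and SM2 can be illustrated with a generic greedy multipulse search. This is a sketch only, not the procedure of the cited Berouti et al. paper or necessarily that of unit CE: pulses are placed one at a time where they most reduce the squared error against the target swe, using a hypothetical impulse response h standing in for the cascade of LTP3 and F4.

```python
import numpy as np

def multipulse_search(target, h, n_pulses):
    """Greedy analysis-by-synthesis: pick one pulse per iteration.

    target   : signal to be matched (swe in the text)
    h        : impulse response of the synthesis cascade (assumed, h[0] != 0)
    n_pulses : number of pulses to place (Ns in the text)
    """
    L = len(target)
    # Column p holds h shifted to position p, truncated at the end of the period
    H = np.zeros((L, L))
    for p in range(L):
        seg = h[:L - p]
        H[p:p + len(seg), p] = seg
    energy = np.sum(H ** 2, axis=0)
    residual = np.asarray(target, dtype=float).copy()
    positions, amplitudes = [], []
    for _ in range(n_pulses):
        corr = H.T @ residual                    # cross-correlation with each shift
        p = int(np.argmax(corr ** 2 / energy))   # position giving largest error reduction
        amp = corr[p] / energy[p]                # optimum amplitude at that position
        residual -= amp * H[:, p]                # remove this pulse's contribution
        positions.append(p)
        amplitudes.append(float(amp))
    return positions, amplitudes, residual

h = np.array([1.0, 0.5, 0.25])   # hypothetical short impulse response
target = np.zeros(60)            # one half frame
target[5:8] += 2.0 * h           # contribution of a pulse of amplitude 2 at position 5
target[30:33] += -1.0 * h        # contribution of a pulse of amplitude -1 at position 30
positions, amplitudes, residual = multipulse_search(target, h, n_pulses=2)
```

In this toy case the two pulses do not overlap, so the greedy search recovers them exactly; with overlapping responses a greedy choice is generally sub-optimum, which is why amplitudes are often re-optimized once the positions are fixed.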
  • It is clear that what has been described has been given only by way of non-limiting example and that variations and modifications are possible without going out of the scope of the invention as defined in the following claims.

Claims (6)

  1. A method of speech signal coding and decoding, using a multipulse analysis-by-synthesis excitation technique, which method comprises a coding phase including the following operations:
    - speech signal conversion into frames of digital samples [s(n)];
    - short-term analysis of the speech signal, to determine a group of linear prediction coefficients [a(k)] (k = 1, ..., p) relevant to a current frame and a representation thereof as line spectrum pairs;
    - coding of said representation of the linear prediction coefficients, and obtaining quantized linear prediction coefficients [â(k)] from said representation;
    - spectral shaping of the speech signal, by weighting the digital samples [s(n)] in a frame by a first and a second weighting functions A(z), 1/A(z/γ), where
    Figure imgb0020
    the weighting by the first weighting function generating a residual signal [r(n)], which is then weighted by the second function to generate a spectrally-shaped speech signal [sw(n)];
    - long-term analysis of the speech signal, by using said residual signal [r(n)] and said spectrally shaped signal [sw(n)], to determine the lag (M) separating a current sample from a preceding sample [r(n-M)] used to process said current sample, and the gain (B) by which said preceding sample is weighted for the processing;
    - determination of the positions and amplitudes of the excitation pulses, by exploiting the results of short-term and long-term analysis;
    - coding of the values of said lag and gain of long-term analysis and of said amplitudes and positions of the excitation pulses, the coded values forming, jointly with the coded representation of the linear prediction coefficients and with coded r.m.s. values of said excitation pulses, the coded speech signal;
    and also comprising a decoding phase, where the excitation is reconstructed starting from the coded values of the amplitudes, the positions and the r.m.s. values of the pulses and where a synthesized speech signal [ŝ(n)] is generated by passing said reconstructed excitation (ê) through a long-term synthesis filter 1/(1 - B·z^{-M}) followed by a short-term synthesis filter 1/A(z), which filters exploit the long-term analysis parameters and respectively the quantized linear prediction coefficients; wherein said long-term analysis and excitation pulse generation are performed in successive steps, in the first of which long-term analysis lag (M) and gain (B) are determined by minimizing a mean squared error between the spectrally-shaped speech signal [sw(n)] and a further signal [sw0(n)] obtained by weighting by said second weighting function 1/A(z/γ) the signal resulting from a long-term synthesis filtering, which is similar to that performed during decoding and in which the signal used for the synthesis is a null signal, while in the second step the amplitudes and positions of the excitation pulses [e(i)] are actually determined by minimizing the mean squared error between a signal [swe(n)] representing the difference between the spectrally-shaped speech signal [sw(n)] and said further signal [sw0(n)], and a third weighted signal [ŝwe(n)], obtained by submitting the excitation pulses to a long-term synthesis filtering and to a weighting by said second weighting function; and wherein the coding of said representation of the linear prediction coefficients consists in a vector quantization of the line spectrum pairs or of the adjacent line pair differences according to a split-codebook quantization technique.
  2. A method as claimed in claim 1, characterized in that the lag (M) and the gain (B) are determined in two successive steps, in the first of which an optimum value of the lag is determined by minimizing said error for a predetermined gain value, while in the second the optimum gain value is determined, by using said optimum lag value.
  3. A method as claimed in claim 1, characterized in that the lag (M) and the gain (B) are determined in two successive steps, in the first of which the mean squared error is minimized between the residual signal [r(n)] and a signal [
    Figure imgb0021
    ₀(n)] which is the signal [r₀(n)] resulting from said long-term synthesis filtering with null input, if the synthesis relevant to a sample of the current frame is performed on the basis of a sample of a preceding frame, and is said residual signal [r(n)] if the synthesis relevant to a sample of the current frame is performed on the basis of a preceding sample of the same frame, while in the second step the gain (B) is calculated with the following sequence of operations: a value [s'w0(n)] of said further signal is determined for a unitary gain value; a first error value E(M,1) is hence determined, and the operations for determining the value of the signal weighted with said second weighting function and of the error are repeated for each value possible for the gain, the value adopted being the one which minimizes the error.
  4. A method as claimed in claim 3, characterized in that the lag (M) is computed every frame, and the gain (B) every half frame.
  5. Device for speech signal coding and decoding by multipulse analysis-by-synthesis excitation techniques, for implementing the method as claimed in any of claims 1, 3 or 4, comprising, for speech signal coding:
    - means for converting the speech signal into frames of digital samples [s(n)];
    - means (STA) for the short-term analysis of the speech signal, which means receive a group of samples from said converting means, compute a set of linear prediction coefficients [a(k)], (k = 1, ...,p) relevant to a current frame and emit a representation of said linear prediction coefficients [a(k)] as line spectrum pairs;
    - means (VQ) for coding said representation of the linear prediction coefficients;
    - means (DCO) for obtaining quantized linear prediction coefficients [â(k)] from said coded representation;
    - a circuit (SW) for the spectral shaping of the speech signal, connected to the converting means and to the means (DCO) obtaining the quantized linear prediction coefficients and comprising a pair of cascaded weighting digital filters (F1, F2), weighting the digital samples [s(n)] according to a first and a second weighting function A(z), 1/A(z/γ), where
    Figure imgb0022
    respectively, said first filter (F1) supplying a residual signal r(n);
    - means (LTA) for the long-term analysis of the speech signal, connected to the outputs of said first filter (F1) and of the spectral shaping circuit (SW) to determine the lag (M) which separates a current sample from a preceding sample [r(n-M)], used to process said current sample, and the gain (B) by which said preceding sample is weighted for the processing;
    - an excitation generator (EG) for determining the positions and the amplitudes of the excitation pulses, connected to said short-term and long-term analysis means (STA, LTA) and to said spectral shaping circuit (SW);
    - means (LTC, PAC) for coding the values of said long-term analysis lag and gain and excitation pulse positions and amplitudes, the coded values forming, jointly with the coded representation of the linear prediction coefficients and with r.m.s. values of said excitation pulses, the coded speech signal;
    and also comprising, for speech signal decoding (synthesis):
    - means (ED, LTD, STD) for reconstructing the excitation, the long-term analysis lag (M) and gain (B) and the linear prediction coefficients [a(k)] starting from the coded signal; and
    - a synthesizer, comprising the cascade of a first long-term synthesis filter (LTP1), which receives the reconstructed excitation pulses, gain and lag and filters said pulses according to a first transfer function 1/(1 - B·z^{-M}), and a short-term synthesis filter (STP) having a second transfer function 1/A(z) which is the reciprocal of said first spectral weighting function A(z), whereby
    the long-term analysis means (LTA) are apt to determine said lag (M) and gain (B) in two successive steps, preceding a step in which the amplitudes and positions of the excitation pulses are determined by said excitation generator (EG), and comprise:
    - a second long-term synthesis filter (LTP2), which is fed with a null signal and in which, for the computation of the lag (M), there is used a predetermined set of values of the number of samples separating a current sample being synthesized from a previous sample used for the synthesis, and, for the computation of the gain (B), a predetermined set of possible values of the gain itself is used;
    - a multiplexer (MX1) receiving at a first input a sample of the residual signal [r(n)] and at a second input a sample of the output signal of the second long-term synthesis filter (LTP2) and supplying the samples present at either input depending on whether or not said number of samples is lower than a frame length;
    - a third weighting filter (F3), which has the same transfer function as said second digital filter (F2) of the spectral shaping circuit (SW), is connected to the output of said second long-term synthesis filter (LTP2) and is enabled only during the determination of the long-term analysis gain (B);
    - a first adder (SM1), which receives at a first input the spectrally-shaped signal (sw) and at a second input the output signal of said third weighting filter (F3) and supplies the difference between the signals present at its first and second input;
    - a first processing unit (CMB), which receives in a first of said two successive steps the signal outgoing from said multiplexer (MX1) and determines the optimum value of said number of samples, and in the second of said two successive steps receives the output signal of said first adder (SM1) and determines, by using the lag computed in the first step, the value of the gain which minimizes the mean squared error, within a validity period of the excitation pulses, between the input signals of the first adder (SM1);
    and whereby the excitation generator (EG) for generating the excitation pulses [e(i)] comprises:
    - a third long-term synthesis filter (LTP3), which has the same transfer function as the first long-term synthesis filter (LTP1) and is fed with the excitation pulses generated;
    - a fourth weighting filter (F4), connected to the output of the third synthesis filter (LTP3) and having the same transfer function as said second and third weighting filters (F2, F3);
    - a second adder (SM2), which receives at a first input the output signal of said first adder (SM1) and at a second input the output signal of the fourth weighting filter (F4), and supplies the difference between the signals present at its first and second input;
    - a second processing unit (CE) which is connected to the output of said second adder (SM2) and determines the amplitudes and positions of said pulses by minimizing the mean squared error, within a pulse validity period, between the input signals of the second adder (SM2).
  6. A device as claimed in claim 5, characterized in that the means (VQ) coding said representation of the linear prediction coefficients consist of a vector quantizer (VQ) for split-codebook vector quantization of the line spectrum pairs or of the differences between adjacent line spectrum pairs.
EP89117837A 1988-09-28 1989-09-27 Method of and device for speech signal coding and decoding by means of a multipulse excitation Expired - Lifetime EP0361432B1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
IT6786888 1988-09-28
IT67868/88A IT1224453B (en) 1988-09-28 1988-09-28 PROCEDURE AND DEVICE FOR CODING DECODING OF VOICE SIGNALS WITH THE USE OF MULTIPLE PULSE EXCITATION

Publications (3)

Publication Number Publication Date
EP0361432A2 EP0361432A2 (en) 1990-04-04
EP0361432A3 EP0361432A3 (en) 1990-09-26
EP0361432B1 EP0361432B1 (en) 1994-08-17

Family

ID=11305936

Family Applications (1)

Application Number Title Priority Date Filing Date
EP89117837A Expired - Lifetime EP0361432B1 (en) 1988-09-28 1989-09-27 Method of and device for speech signal coding and decoding by means of a multipulse excitation

Country Status (6)

Country Link
EP (1) EP0361432B1 (en)
AT (1) ATE110180T1 (en)
DE (2) DE361432T1 (en)
ES (1) ES2017906T3 (en)
GR (1) GR900300170T1 (en)
IT (1) IT1224453B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0910064B1 (en) * 1991-02-26 2002-12-18 Nec Corporation Speech parameter coding apparatus
FI98104C (en) * 1991-05-20 1997-04-10 Nokia Mobile Phones Ltd Procedures for generating an excitation vector and digital speech encoder
ES2042410B1 (en) * 1992-04-15 1997-01-01 Control Sys S A ENCODING METHOD AND VOICE ENCODER FOR EQUIPMENT AND COMMUNICATION SYSTEMS.
FI95086C (en) * 1992-11-26 1995-12-11 Nokia Mobile Phones Ltd Method for efficient coding of a speech signal
FI96248C (en) * 1993-05-06 1996-05-27 Nokia Mobile Phones Ltd Method for providing a synthetic filter for long-term interval and synthesis filter for speech coder
GB9408037D0 (en) * 1994-04-22 1994-06-15 Philips Electronics Uk Ltd Analogue signal coder

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ICASSP 86, IEEE-IECEJ-ASJ INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, Tokyo, 7th - 11th April 1986, vol. 4, pages 3067-3070, IEEE, New York, US; G. OHYAMA et al.: "A novel approach to estimating excitation code in code-excited linear prediction coding" *
SIGNAL PROCESSING, Tokyo, 7th - 11th April 1986, vol. 3, pages 1689-1692, IEEE, New York, US; K. OZAWA et al.: "High quality multi-pulse speech coder with pitch prediction" *

Also Published As

Publication number Publication date
DE68917552D1 (en) 1994-09-22
EP0361432A3 (en) 1990-09-26
ES2017906T3 (en) 1994-10-16
ES2017906A4 (en) 1991-03-16
IT8867868A0 (en) 1988-09-28
ATE110180T1 (en) 1994-09-15
IT1224453B (en) 1990-10-04
GR900300170T1 (en) 1991-09-27
EP0361432A2 (en) 1990-04-04
DE361432T1 (en) 1991-03-21
DE68917552T2 (en) 1995-01-12

Similar Documents

Publication Publication Date Title
EP0409239B1 (en) Speech coding/decoding method
EP1221694B1 (en) Voice encoder/decoder
EP1232494B1 (en) Gain-smoothing in wideband speech and audio signal decoder
US7260521B1 (en) Method and device for adaptive bandwidth pitch search in coding wideband signals
CA1181854A (en) Digital speech coder
EP0360265B1 (en) Communication system capable of improving a speech quality by classifying speech signals
US5602961A (en) Method and apparatus for speech compression using multi-mode code excited linear predictive coding
US7280959B2 (en) Indexing pulse positions and signs in algebraic codebooks for coding of wideband signals
EP1224662B1 (en) Variable bit-rate celp coding of speech with phonetic classification
KR100264863B1 (en) Method for speech coding based on a celp model
US5339384A (en) Code-excited linear predictive coding with low delay for speech or audio signals
JPH10187196A (en) Low bit rate pitch delay coder
US5027405A (en) Communication system capable of improving a speech quality by a pair of pulse producing units
EP0361432B1 (en) Method of and device for speech signal coding and decoding by means of a multipulse excitation
Cuperman et al. Backward adaptation for low delay vector excitation coding of speech at 16 kbit/s
JPH086597A (en) Device and method for coding exciting signal of voice
US4908863A (en) Multi-pulse coding system
US5708756A (en) Low delay, middle bit rate speech coder
JPH0720897A (en) Method and apparatus for quantization of spectral parameter in digital coder
KR0155798B1 (en) Vocoder and the method thereof
JP3296411B2 (en) Voice encoding method and decoding method
JP2853170B2 (en) Audio encoding / decoding system
JPH08320700A (en) Sound coding device
JP3144244B2 (en) Audio coding device
WO2001009880A1 (en) Multimode vselp speech coder

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

AK Designated contracting states

Kind code of ref document: A2

Designated state(s): AT BE CH DE ES FR GB GR LI NL SE

PUAL Search report despatched

Free format text: ORIGINAL CODE: 0009013

AK Designated contracting states

Kind code of ref document: A3

Designated state(s): AT BE CH DE ES FR GB GR LI NL SE

17P Request for examination filed

Effective date: 19901019

EL Fr: translation of claims filed
TCAT At: translation of patent claims filed
DET De: translation of patent claims
TCNL Nl: translation of patent claims filed
17Q First examination report despatched

Effective date: 19920814

GRAA (expected) grant

Free format text: ORIGINAL CODE: 0009210

AK Designated contracting states

Kind code of ref document: B1

Designated state(s): AT BE CH DE ES FR GB GR LI NL SE

REF Corresponds to:

Ref document number: 110180

Country of ref document: AT

Date of ref document: 19940915

Kind code of ref document: T

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: SE

Payment date: 19940831

Year of fee payment: 6

Ref country code: BE

Payment date: 19940831

Year of fee payment: 6

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: CH

Payment date: 19940906

Year of fee payment: 6

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: GB

Payment date: 19940919

Year of fee payment: 6

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: GR

Payment date: 19940921

Year of fee payment: 6

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: ES

Payment date: 19940922

Year of fee payment: 6

REF Corresponds to:

Ref document number: 68917552

Country of ref document: DE

Date of ref document: 19940922

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: DE

Payment date: 19940929

Year of fee payment: 6

Ref country code: AT

Payment date: 19940929

Year of fee payment: 6

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: NL

Payment date: 19940930

Year of fee payment: 6

Ref country code: FR

Payment date: 19940930

Year of fee payment: 6

ET Fr: translation filed
REG Reference to a national code

Ref country code: ES

Ref legal event code: FG2A

Ref document number: 2017906

Country of ref document: ES

Kind code of ref document: T3

REG Reference to a national code

Ref country code: GR

Ref legal event code: FG4A

Free format text: 3012980

EAL Se: european patent in force in sweden

Ref document number: 89117837.8

PLBE No opposition filed within time limit

Free format text: ORIGINAL CODE: 0009261

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: NO OPPOSITION FILED WITHIN TIME LIMIT

26N No opposition filed
PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: GB

Effective date: 19950927

Ref country code: AT

Effective date: 19950927

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: SE

Effective date: 19950928

Ref country code: ES

Free format text: LAPSE BECAUSE OF THE APPLICANT RENOUNCES

Effective date: 19950928

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: LI

Effective date: 19950930

Ref country code: CH

Effective date: 19950930

Ref country code: BE

Effective date: 19950930

BERE Be: lapsed

Owner name: SOCIETA ITALIANA TELECOMUNICAZIONI S.P.A. ITALTE

Effective date: 19950930

Owner name: SOCIETA ITALIANA PER L'ESERCIZIO DELLE TELECOMUNIC

Effective date: 19950930

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: GR

Free format text: THE PATENT HAS BEEN ANNULLED BY A DECISION OF A NATIONAL AUTHORITY

Effective date: 19960331

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: NL

Effective date: 19960401

REG Reference to a national code

Ref country code: CH

Ref legal event code: PL

GBPC Gb: european patent ceased through non-payment of renewal fee

Effective date: 19950927

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: FR

Effective date: 19960531

REG Reference to a national code

Ref country code: GR

Ref legal event code: MM2A

Free format text: 3012980

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: DE

Effective date: 19960601

NLV4 Nl: lapsed or annulled due to non-payment of the annual fee

Effective date: 19960401

EUG Se: european patent has lapsed

Ref document number: 89117837.8

REG Reference to a national code

Ref country code: FR

Ref legal event code: ST

REG Reference to a national code

Ref country code: ES

Ref legal event code: FD2A

Effective date: 19991007