EP0347307B1 - Coding method and linear prediction speech coder - Google Patents

Info

Publication number
EP0347307B1
EP0347307B1 EP89401644A EP89401644A EP0347307B1 EP 0347307 B1 EP0347307 B1 EP 0347307B1 EP 89401644 A EP89401644 A EP 89401644A EP 89401644 A EP89401644 A EP 89401644A EP 0347307 B1 EP0347307 B1 EP 0347307B1
Authority
EP
European Patent Office
Prior art keywords
filtering
vector
excitation
vectors
subjected
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Lifetime
Application number
EP89401644A
Other languages
German (de)
French (fr)
Other versions
EP0347307A2 (en)
EP0347307A3 (en)
Inventor
Michel Lever
Marc Delprat
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nortel Networks France SAS
Original Assignee
Matra Communication SA
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Matra Communication SA
Publication of EP0347307A2
Publication of EP0347307A3
Application granted
Publication of EP0347307B1
Anticipated expiration
Expired - Lifetime

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/005 Correction of errors induced by the transmission channel, if related to the coding algorithm
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/08 Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters
    • G10L19/10 Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters the excitation function being a multipulse excitation
    • G10L19/113 Regular pulse excitation
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L2019/0001 Codebooks
    • G10L2019/0003 Backward prediction of gain
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L2019/0001 Codebooks
    • G10L2019/0013 Codebook search algorithms
    • G10L2019/0014 Selection criteria for distances
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/06 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being correlation coefficients

Definitions

  • The subject of the present invention is a coding method and a speech coder of the linear prediction analysis type. It relates more particularly to methods and speech coders of this type that use excitation by an excitation vector, often designated by the abbreviation CELP, which are to be distinguished from linear prediction analysis coding methods with multi-pulse excitation (MPLPC), an example of which is given in document EP-A-0 195 487, to which reference may be made.
  • Linear prediction analysis coding with vector excitation provides an attractive solution to the problem of speech transmission in a narrow-band channel, for example transmission between mobiles and to mobiles in a 12.5 kHz channel, which reduces the available bit rate to about 8 kbit/s; in the latter case, the bit rate assigned to the parameters representing the speech signal is reduced to about 6 kbit/s, since part of the overall bit rate must be assigned to an error correction code.
  • Speech coders with linear prediction and vector excitation are already known; they are usable at a low bit rate, usually between a quarter of a bit and half a bit per speech sample.
  • SCHROEDER and ATAL, "Code excited linear prediction (CELP): high quality speech at very low bit rates"
  • FIG. 1 gives a schematic diagram of such a coder 10.
  • The speech signal is applied to this coder via a digitization chain.
  • The chain comprises, starting from a microphone 12, a low-pass filter 14 limiting the bandwidth to approximately 4000 Hz and a sampler-coder 16.
  • The sampler takes speech samples at a rate of, for example, 8 kHz and supplies successive samples grouped into vocoder frames occupying time windows of fixed duration, for example 20 ms.
  • The coder 10 transforms the speech signal into a coded signal with a lower bit rate, passed to the transmission equipment by a multiplexer 18 which receives, for each frame, the indices k of the optimal excitation vectors ck, the associated gains Gk and coefficients identifying the prediction parameters, for each of the constituent blocks of the frame, each block occupying a sub-window.
  • The coder 10 shown by way of example in FIG. 1 uses analysis by synthesis: the speech spectrum in each window is modeled by a linear predictor filter whose coefficients vary over time.
  • The residual signal, obtained by subtraction, is vector-quantized using a dictionary of waveforms.
  • The excitation vectors stored in the dictionary 20 are chosen empirically, taking account of statistical data on the language, or randomly, or from conventional binary codes such as the Golay codes.
  • The article by SCHROEDER and ATAL mentioned above proposes, for example, a dictionary of 1024 excitation vectors, each made up of 40 samples. This number of vectors lies between the minimum below which the excitation would be poorly represented and the maximum beyond which too few bits would remain to transmit the predictor parameters.
  • The output of amplifier 22 is applied to a predictive synthesis filter consisting of a long-term predictor filter 24, intended to introduce the long-term periodicity of the signal, and a short-term predictor filter 26.
  • The output Ŝn of the predictor filter, which is a synthesized estimate of the speech signal, is applied to the subtractive input of a subtractor 28, which receives the sampled and digitized speech signal Sn on its additive input.
  • The coding operation consists in determining the optimal innovation sequence ck and the gain Gk for each speech frame by an analysis-by-synthesis process.
  • The synthesis signal obtained Ŝk is compared with the original signal S, and the difference signal obtained in the subtractor 28 is processed in a perceptual weighting filter 30 with transfer function W(z), whose role is to attenuate the frequencies at which errors matter less perceptually and to amplify those at which they matter more.
  • A circuit 32 searches for the coding sequence for which the energy of the weighted error signal ek over a sub-window is minimal; this sequence is selected for the current block, and the optimum gain Gk is then calculated.
  • The function A(z) of the short-term predictor filter 26 is of the form:
  • The coefficients a(i) constitute the linear prediction parameters. Their number is generally between 8 and 16 for 20 ms windows.
  • The transfer function B(z) can be of the form 1 - b·z^(-T) and involve a delay T ranging from 40 to 120 samples.
  • The perceptual weighting filter 30 has a transfer function W(z) which is generally of the form:
  • A CELP coding method in accordance with the preamble of claim 1 is also known (IEEE Journal on Selected Areas in Communications, Vol. 6, No. 2, February 1988, pages 353-363); the present invention aims to provide a coding method of this type, with linear prediction and excitation by coding vectors, which meets practical requirements better than previously known methods, in particular in that it reduces by at least an order of magnitude the volume of calculation needed to code a segment.
  • The invention notably proposes a speech coding method with linear prediction and vector excitation, according to the characterizing part of claim 1.
  • Because each coding sequence consists of several equidistant pulses separated by zeros, the pulses advantageously being binary (that is, excitation by regular pulse sequences, or RPCELP, is used), the time needed to search for the optimal sequence is reduced very considerably, especially if the characteristics of the perceptual weighting filter are chosen appropriately.
  • The perceptual weighting filter 30, placed at the output of the subtractor 28 in FIG. 1, is moved to the two input branches of the subtractor in the form of filters 34 and 36 with transfer function 1/A(z/γ). The branch assigned to the original signal S(n) thus contains, in cascade, the filter 33 with transfer function A(z) and the filter 36, which has the same transfer function as the filter 34.
  • Filtering all the vectors through the synthesis filter, with transfer function 1/A(z/γ) and time-varying coefficients, represents an enormous volume of calculation. According to a first aspect of the invention, this volume is reduced very considerably by adopting a perceptual weighting filter with a small number of coefficients that are fixed in time, chosen according to the average characteristics of speech over a long time interval.
  • The perceptual weighting filter then has a transfer function W(z) which can be written: where C(z/γ) is the transfer function of a short-term speech predictor, for example of the form:
  • The contribution of the memory of the long-term predictor filter 24, with transfer function 1/B(z), and of the weighted short-term predictor filter, with transfer function 1/A(z/γ), is subtracted from the weighted original signal to obtain a signal xn before the search in the vector dictionary 20 begins.
  • This operation is carried out in FIG. 3 by a subtractor 38, which receives only the memory component of the long-term predictor filter 24.
  • Each vector ck is processed only by the weighted synthesis filter 34.
  • Each of the filters 34 and 36 is shown broken down into a filter 34a or 36a with transfer function 1/⁇(z/⁇), without memory, and a filter 34b or 36b corresponding only to the contribution of the memory terms.
  • The filtering operation by filter 34a is expressed by the convolution of two finite sequences, represented by the product of a matrix and a vector: where H is an L x L lower triangular matrix (L being the common length of the sequences) whose elements are taken from the impulse response h(i) of 1/A(z/γ), of the form: which merges with that of 1/⁇(z/γ)
  • The next step in the process consists in eliminating the memory terms, that is to say the operations shown diagrammatically at 34a and 36a, to arrive at the constitution shown in FIG. 5.
  • W'(z) = A(z) / C(z/γ)
  • Yet another embodiment of the invention implements a modified error evaluation criterion to be minimized.
  • The sample frames, each occupying a window, are applied successively; consequently, the impulse response of the weighted synthesis filter for a frame (or a block) carries over into the next frame (or the next block).
  • The damping of the filters is exploited: instead of a sequence consisting only of L samples, a sequence consisting of L samples and J zeros is applied to their input, J being chosen so that the impulse response of the synthesis filter W(z)/A(z) is practically zero after J samples.
  • The impulse response matrix then becomes a rectangular "strip" matrix with (L + J) x L terms, of the type:
  • A is then calculated for each frame, while k and Gk are calculated for each block.
  • A particularly attractive solution in this case consists in using pulse sequences of length L with a regular structure made up of q equidistant pulses separated by D-1 zeros, the first pulse occupying one of the positions 0 to D-1 and the number of sequences being such that all of these positions are occupied in turn. This gives a satisfactory representation of the phase information in the excitation signal.
  • The dictionary consists of a basic set of K/D sequences, with a zero phase and with three successive shifts, i.e. K sequences in all.
  • Excitation by regular excitation sequences reduces the number of operations to be performed, since many of the products are zero, one of the factors being a zero whose position is known for each sample.
  • EP-A-0 195 487 relates to an MPLPC coding method in which it is necessary first to determine an optimal pulse phase and then to search for the optimal amplitude of all the pulses constituting the sequence among discrete values, quantized for example on 3 bits.
  • H ck all become equal and we have: where dm denotes one of the sequences (K/D in number) resulting from decimating the components of the K vectors by eliminating the zeros; the sequence dm for 0 ≤ k ≤ 3 is given in FIG. 7 by way of example.
  • The coder then has the basic constitution shown in FIG. 6.
  • A single filtering operation is performed on the speech signal frame by the filter 33.
  • The sequence ck under test, in a form which no longer needs to be prefiltered, is applied to the circuit 32, which calculates the scalar product ck^T·y and determines the maximum, for which an index selection command is sent at 40.
  • The sequence ck, amplified at 22, is applied to the long-term predictor 24, shown with a single coefficient b.
  • The term r is formed in the subtractor 38 by subtracting the output of the long-term predictor 24 from the output of the filter 34 on the speech channel.
  • The filter 42, which receives the residue, has a fixed response R(z) represented by a symmetric Toeplitz matrix.
  • This process reduces the number of calculations required by a ratio that is typically about three orders of magnitude compared with the conventional CELP method, regardless of the length L chosen for the speech blocks.
  • For transmission, the gain Gk is quantized in a quantizer 46.
  • When each signal frame is split into several blocks, an intermediate memory 48 must be interposed between the components 33 and 44.
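
The excerpts above describe memoryless filtering of a codevector as the product of an L x L lower triangular matrix H with the vector, H being built from the impulse response h(i) of 1/A(z/γ). The matrix itself did not survive extraction, so the following Python sketch is only an illustration under stated assumptions: the sign convention A(z) = 1 - Σ a(i) z^(-i) is assumed, and the function names and the numeric values of `a` and `gamma` are hypothetical.

```python
def impulse_response(a, gamma, length):
    """First `length` samples of the impulse response of 1/A(z/gamma).

    Assumes A(z) = 1 - sum_i a[i-1] * z^-i (sign convention is an assumption).
    """
    h = []
    for n in range(length):
        acc = 1.0 if n == 0 else 0.0          # unit impulse at n = 0
        for i, ai in enumerate(a, start=1):
            if n - i >= 0:
                acc += ai * gamma ** i * h[n - i]
        h.append(acc)
    return h


def lower_triangular_toeplitz(h):
    """L x L matrix H with H[r][c] = h[r - c] for r >= c, zero above the diagonal."""
    size = len(h)
    return [[h[r - c] if r >= c else 0.0 for c in range(size)]
            for r in range(size)]


def matvec(matrix, vector):
    """Plain matrix-vector product."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


# Hypothetical first-order predictor and weighting factor, block length L = 4.
h = impulse_response([0.5], 0.8, 4)
H = lower_triangular_toeplitz(h)
filtered = matvec(H, [1.0, 0.0, -1.0, 0.0])   # zero-state response to one codevector
print(filtered)
```

Each row of H truncates the convolution at the block boundary, which is why the product gives the zero-state (memoryless) response only; the contribution of the filter memory has to be handled separately, as the excerpts describe.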

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)
  • Transmission Systems Not Characterized By The Medium Used For Transmission (AREA)

Abstract

The method, usable in particular for low-bit-rate speech transmission, uses vector excitation. A signal frame is represented, on the one hand, by prediction parameters and, on the other hand, by a succession of excitation vectors contained in a dictionary (20) and by amplification gains (Gk) for these vectors. The vectors adopted are determined by searching for the minimum energy of an error signal obtained by subtracting each vector in turn, after filtering, from the frame of the speech signal. Before subtraction, each frame of the speech signal is subjected to short-term analysis filtering and to weighted synthesis filtering, with coefficients that may be fixed in time, and the amplified vector is subjected to long-term predictive filtering and to the same perceptually weighted synthesis filtering as the speech signal.

Description

The subject of the present invention is a coding method and a speech coder of the linear prediction analysis type. It relates more particularly to methods and speech coders of this type that use excitation by an excitation vector, often designated by the abbreviation CELP, which are to be distinguished from linear prediction analysis coding methods with multi-pulse excitation (MPLPC), an example of which is given in document EP-A-0 195 487, to which reference may be made.

Linear prediction analysis coding with vector excitation provides an attractive solution to the problem of speech transmission in a narrow-band channel, for example transmission between mobiles and to mobiles in a 12.5 kHz channel, which reduces the available bit rate to about 8 kbit/s; in the latter case, the bit rate assigned to the parameters representing the speech signal is reduced to about 6 kbit/s, since part of the overall bit rate must be assigned to an error correction code.

Speech coders with linear prediction and vector excitation are already known; they are usable at a low bit rate, usually between a quarter of a bit and half a bit per speech sample. An embodiment of such a coder can be found in particular in the article by SCHROEDER and ATAL, "Code excited linear prediction (CELP): high quality speech at very low bit rates", Proc. ICASSP, March 1985.

FIG. 1 gives a schematic diagram of such a coder 10. The speech signal is applied to this coder via a digitization chain. In the embodiment shown in FIG. 1, the chain comprises, starting from a microphone 12, a low-pass filter 14 limiting the bandwidth to approximately 4000 Hz and a sampler-coder 16. The sampler takes speech samples at a rate of, for example, 8 kHz and supplies successive samples grouped into vocoder frames occupying time windows of fixed duration, for example 20 ms.

The coder 10 transforms the speech signal into a coded signal with a lower bit rate, passed to the transmission equipment by a multiplexer 18 which receives, for each frame, the indices k of the optimal excitation vectors ck, the associated gains Gk and coefficients identifying the prediction parameters, for each of the constituent blocks of the frame, each block occupying a sub-window.

The coder 10 shown by way of example in FIG. 1 uses analysis by synthesis: the speech spectrum in each window is modeled by a linear predictor filter whose coefficients vary over time. The residual signal, obtained by subtraction, is vector-quantized using a dictionary of waveforms. In FIG. 1, the dictionary 20 contains K + 1 excitation vectors ck (with k = 0, ..., k, ..., K) and drives an amplifier 22 of gain Gk.

Usually the excitation vectors stored in the dictionary 20 are chosen empirically, taking account of statistical data on the language, or randomly, or from conventional binary codes such as the Golay codes.

The article by SCHROEDER and ATAL mentioned above proposes, for example, a dictionary of 1024 excitation vectors, each made up of 40 samples. This number of vectors lies between the minimum below which the excitation would be poorly represented and the maximum beyond which too few bits would remain to transmit the predictor parameters.
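
The figures quoted so far (8 kHz sampling, 20 ms frames, 1024 vectors of 40 samples) can be tied together with a little arithmetic. The sketch below is only a worked illustration; it assumes that one codebook index is sent per 40-sample block and that a frame divides evenly into such blocks, neither of which is stated explicitly for the prior-art coder. All variable names are hypothetical.

```python
import math

SAMPLE_RATE_HZ = 8000   # sampling rate quoted in the text
FRAME_MS = 20           # frame duration quoted in the text
CODEBOOK_SIZE = 1024    # vectors in the Schroeder-Atal example
BLOCK_LEN = 40          # samples per excitation vector

samples_per_frame = SAMPLE_RATE_HZ * FRAME_MS // 1000
blocks_per_frame = samples_per_frame // BLOCK_LEN        # assumes an even split
index_bits = int(math.log2(CODEBOOK_SIZE))                # bits to address one vector
bits_per_sample = index_bits / BLOCK_LEN                  # excitation index cost per sample
index_rate_bps = index_bits * SAMPLE_RATE_HZ // BLOCK_LEN

print(samples_per_frame, blocks_per_frame, index_bits, bits_per_sample, index_rate_bps)
```

Under these assumptions the index alone costs a quarter of a bit per sample, which matches the lower end of the range quoted earlier; gains and prediction parameters come on top of that.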

The output of amplifier 22 is applied to a predictive synthesis filter consisting of a long-term predictor filter 24, intended to introduce the long-term periodicity of the signal, and a short-term predictor filter 26. The output Ŝn of the predictor filter, which is a synthesized estimate of the speech signal, is applied to the subtractive input of a subtractor 28, which receives the sampled and digitized speech signal Sn on its additive input.

Once the respective transfer functions 1/B(z) and 1/A(z) of the filters 24 and 26 have been calculated and quantized, the coding operation consists in determining the optimal innovation sequence ck and the gain Gk for each speech frame by an analysis-by-synthesis process. For each of the coding sequences ck, the synthesis signal obtained Ŝk is compared with the original signal S, and the difference signal obtained in the subtractor 28 is processed in a perceptual weighting filter 30 with transfer function W(z), whose role is to attenuate the frequencies at which errors matter less perceptually and to amplify those at which they matter more.

A circuit 32 searches for the coding sequence for which the energy of the weighted error signal ek over a sub-window is minimal; this sequence is selected for the current block, and the optimum gain Gk is then calculated.
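
The search loop described above can be sketched in a few lines of Python. This is a deliberately reduced illustration, not the coder of FIG. 1: the long-term predictor and the perceptual weighting filter are omitted, the filter starts from zero state for every candidate, and the sign convention A(z) = 1 - Σ a(i) z^(-i) is an assumption. The function names and the toy data are hypothetical.

```python
def all_pole_filter(x, a):
    """Filter x through 1/A(z) with zero initial state.

    Assumes A(z) = 1 - sum_i a[i-1] * z^-i (sign convention is an assumption).
    """
    y = []
    for n, xn in enumerate(x):
        acc = xn
        for i, ai in enumerate(a, start=1):
            if n - i >= 0:
                acc += ai * y[n - i]
        y.append(acc)
    return y


def search_codebook(target, codebook, a):
    """Exhaustive search: return (index, gain, error_energy) of the best vector.

    For each candidate the gain is the least-squares optimum, so the error
    energy is |target|^2 - corr^2 / energy.
    """
    target_energy = sum(t * t for t in target)
    best = None
    for k, c in enumerate(codebook):
        s = all_pole_filter(c, a)              # one full filtering per candidate
        energy = sum(v * v for v in s)
        if energy == 0.0:
            continue                            # skip an all-zero candidate
        corr = sum(t * v for t, v in zip(target, s))
        err = target_energy - corr * corr / energy
        if best is None or err < best[2]:
            best = (k, corr / energy, err)
    return best


# Toy data: the target is exactly twice the filtered version of vector 1.
codebook = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [1.0, -1.0, 0.0, 0.0]]
a = [0.5]
target = [2.0 * v for v in all_pole_filter(codebook[1], a)]
print(search_codebook(target, codebook, a))
```

The cost that the patent sets out to reduce is visible in the loop body: every candidate vector is passed through the synthesis filter before its error can be evaluated.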

Conventionally, the function A(z) of the short-term predictor filter 26 is of the form:

Figure imgb0001
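
The equation itself is an image that did not survive extraction. For orientation only, a form commonly used for a short-term predictor in the CELP literature is shown below; the sign in front of the sum and the symbol p for the predictor order are assumptions, not taken from the patent.

```latex
% Assumed conventional form; the patent's own figure (imgb0001) is not available.
% p = predictor order (an assumed symbol), a(i) = linear prediction coefficients.
A(z) = 1 - \sum_{i=1}^{p} a(i)\, z^{-i}
```

With this convention the synthesis filter 1/A(z) corresponds to the recursion y(n) = x(n) + a(1) y(n-1) + ... + a(p) y(n-p).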

In this formula, which uses conventional z notation, the coefficients a(i) constitute the linear prediction parameters. Their number is generally between 8 and 16 for 20 ms windows.

As for the transfer function B(z), it can be of the form 1 - b·z^(-T) and involve a delay T ranging from 40 to 120 samples.
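
With B(z) = 1 - b·z^(-T), the long-term synthesis filter 1/B(z) reduces to the recursion y(n) = x(n) + b·y(n-T). The Python sketch below illustrates that recursion only; the function name, the `history` argument used to carry filter memory across blocks, and the tiny delay T = 2 (chosen for readability instead of the 40 to 120 samples quoted above) are all hypothetical.

```python
def long_term_synthesis(x, b, T, history=None):
    """Apply 1/B(z), B(z) = 1 - b * z^-T, i.e. y[n] = x[n] + b * y[n - T].

    `history` holds the last T output samples of the previous block,
    oldest first; it defaults to silence.
    """
    past = list(history) if history is not None else [0.0] * T
    if len(past) != T:
        raise ValueError("history must contain exactly T samples")
    y = []
    for n, xn in enumerate(x):
        # For n < T the delayed sample lies in the previous block.
        delayed = y[n - T] if n >= T else past[n]
        y.append(xn + b * delayed)
    return y


# A single pulse is repeated every T samples with decaying amplitude.
print(long_term_synthesis([1.0, 0.0, 0.0, 0.0, 0.0, 0.0], b=0.5, T=2))
```

The `history` argument makes explicit the filter memory that the later parts of the description subtract from the target before the dictionary search.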

The perceptual weighting filter 30 has a transfer function W(z) which is generally of the form:

Figure imgb0002
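
This equation is also an image lost in extraction. The excerpts elsewhere in this record refer to a filter with transfer function A(z) in cascade with one of transfer function 1/A(z/γ), which is consistent with the weighting filter customary in CELP coders. The reconstruction below is offered on that basis only; the sign convention for A(z), the order symbol p, and the range given for γ are assumptions.

```latex
% Assumed reconstruction; the patent's own figure (imgb0002) is not available.
% Assumes A(z) = 1 - \sum_{i=1}^{p} a(i) z^{-i} and a weighting factor 0 < \gamma < 1.
W(z) = \frac{A(z)}{A(z/\gamma)}
     = \frac{1 - \sum_{i=1}^{p} a(i)\, z^{-i}}
            {1 - \sum_{i=1}^{p} a(i)\, \gamma^{i}\, z^{-i}}
```

Replacing z by z/γ multiplies the i-th coefficient by γ^i, which broadens the formant peaks of 1/A(z/γ) and so lets more error through where the speech spectrum itself is strong.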

Despite its merits, the coding method just described can hardly be implemented in real time, because of the enormous volume of computation required to search for the optimal innovation sequence (that is, the optimal excitation vector) through K + 1 successive passes of the loop, each pass constituting the filtering of an excitation vector by filters with time-varying coefficients.

A CELP coding method in accordance with the preamble of claim 1 is also known (IEEE Journal on Selected Areas in Communications, Vol. 6, No. 2, February 1988, pages 353-363); the present invention aims to provide a coding method of this type, with linear prediction and excitation by coding vectors, which meets practical requirements better than previously known methods, in particular in that it reduces by at least an order of magnitude the volume of calculation needed to code a segment.

To this end, the invention notably proposes a speech coding method with linear prediction and vector excitation, according to the characterizing part of claim 1.

Because each coding sequence consists of several equidistant pulses separated by zeros, the pulses advantageously being binary (that is, excitation by regular pulse sequences, or RPCELP, is used), the time needed to search for the optimal sequence is reduced very considerably, especially if the characteristics of the perceptual weighting filter are chosen appropriately.
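
A regular-pulse dictionary of this kind can be sketched as follows. The notation (q pulses, spacing D, K/D base sequences each shifted through D phases) follows the excerpts earlier in this record; taking "binary" to mean amplitudes of +1 or -1 is an assumption, and the function names and the two base patterns are hypothetical.

```python
def regular_pulse_codebook(base_patterns, spacing):
    """Expand each base pattern of q amplitudes into `spacing` shifted sequences.

    Each sequence has length q * spacing, with pulses at phase, phase + spacing, ...
    """
    codebook = []
    for pattern in base_patterns:
        length = len(pattern) * spacing
        for phase in range(spacing):
            seq = [0] * length
            for j, amplitude in enumerate(pattern):
                seq[phase + j * spacing] = amplitude
            codebook.append(seq)
    return codebook


def sparse_correlation(y, pattern, phase, spacing):
    """Scalar product of y with a regular-pulse sequence, touching only q samples."""
    return sum(amp * y[phase + j * spacing] for j, amp in enumerate(pattern))


# Two hypothetical base patterns with q = 3 pulses of amplitude +1 or -1, D = 4.
base = [(1, 1, -1), (1, -1, 1)]
D = 4
codebook = regular_pulse_codebook(base, D)     # K = 2 * 4 = 8 sequences of length 12

y = [float(n) for n in range(12)]              # stand-in for a filtered target signal
dense = sum(c * v for c, v in zip(codebook[5], y))
sparse = sparse_correlation(y, base[1], phase=1, spacing=D)
print(len(codebook), codebook[5], dense, sparse)
```

Because the pulse positions are known in advance, each scalar product needs q multiplications instead of L, and with amplitudes of +1 or -1 those reduce to additions and subtractions.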

Other features of the invention are defined in claims 2 to 6.

The invention will be better understood on reading the following description of particular embodiments, given by way of non-limiting examples. The description refers to the accompanying drawings, in which:

  • - Figure 1, already mentioned, is a block diagram of a known speech coder with linear prediction and vector excitation;
  • - Figure 2, similar to Figure 1, is a variant showing a possible arrangement of the coder of Figure 1, which can be simplified to form a first embodiment of the invention;
  • - Figures 3, 4 and 5 are diagrams showing successive developments of the coder of Figure 2;
  • - Figure 6, similar to Figure 5, shows more completely an embodiment of the invention that further reduces the amount of computation;
  • - Figure 7 shows a possible distribution of coding sequences in the dictionary;
  • - Figure 8 shows another possible arrangement of the dictionary.

In the speech coder shown schematically in Figure 2 (where elements corresponding to those of Figure 1 carry the same reference numbers), the perceptual weighting filter 30, placed at the output of subtractor 28 in Figure 1, is moved onto the two input branches of the subtractor in the form of filters 34 and 36, with transfer function 1/A(z/γ). The branch carrying the original signal S(n) thus contains, in cascade, filter 33 with transfer function A(z) and filter 36, which has the same transfer function as filter 34.

Filtering all the vectors through the synthesis filter with transfer function 1/A(z/γ), whose coefficients vary over time, represents an enormous amount of computation. According to a first aspect of the invention, this amount is reduced very considerably by adopting a perceptual weighting filter with a small number of coefficients that are fixed in time and chosen according to the average characteristics of speech over a long time interval. The perceptual weighting filter then has a transfer function W(z) which can be written:

Figure imgb0003
where C(z/γ) is the transfer function of a short-term speech predictor, for example of the form:
Figure imgb0004

The transfer functions of components 34 and 36 in Figure 2 then become 1/C(z/γ).
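
As a rough illustration of what a fixed weighted synthesis filter 1/C(z/γ) involves, the sketch below scales a set of predictor coefficients by powers of γ and computes the resulting impulse response. The convention C(z) = 1 - Σ c_i z^(-i), the coefficient values, γ = 0.8 and the function names are assumptions made for illustration only; the actual form of C is the one given in the patent figures.

```python
def weight_coefficients(c, gamma):
    """Coefficients of C(z/gamma): c_i is scaled by gamma**i, with i starting at 1."""
    return [ci * gamma ** (i + 1) for i, ci in enumerate(c)]


def impulse_response(c, n):
    """First n samples of the impulse response of 1/C(z),
    assuming C(z) = 1 - sum_i c[i-1] * z**-i (an illustrative convention)."""
    h = []
    for t in range(n):
        acc = 1.0 if t == 0 else 0.0
        for i, ci in enumerate(c, start=1):
            if t - i >= 0:
                acc += ci * h[t - i]
        h.append(acc)
    return h


# Hypothetical fixed predictor coefficients and weighting factor.
c_fixed = [0.9, -0.2]
gamma = 0.8
weighted = weight_coefficients(c_fixed, gamma)
h = impulse_response(weighted, 12)
```

Because these coefficients never change, the impulse response `h` can be computed once and reused for every frame, which is where the saving described above comes from.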

Another embodiment of the invention, which can be combined with the first, is best understood by following the successive transformations applied to the circuit of Figure 2 to arrive at it.

First, as shown in Figures 3 and 4, the contribution of the memory of the long-term predictor filter 24, with transfer function 1/B(z), and of the weighted short-term predictor filter, with transfer function 1/A(z/γ), is subtracted from the weighted original signal to obtain a signal x(n), before the search in the vector dictionary 20 begins. In Figure 3 this operation is carried out by a subtractor 38, which receives only the memory component of the long-term predictor filter 24.

Thus, during the search for the optimal vector, each vector c_k is processed only by the weighted synthesis filter 34.

It will now be shown, with reference to Figure 4, how the amount of computation can be reduced considerably further. In that figure, each of the filters 34 and 36 is shown decomposed into a memoryless filter 34a or 36a, with transfer function 1/Ã(z/γ), and a filter 34b or 36b corresponding only to the contribution of the memory terms.

During the search for the optimal vector c_k, each vector c_k amplified by the gain G_k is now processed only by the memoryless weighted synthesis filter 1/Ã(z/γ), which outputs a signal z(n). If quantities without memory are marked with a tilde, and if the following notation is used:

  • r: the residual signal after subtraction of the effects of the long-term predictor 24,
  • x: the original signal from which the long-term redundancy has been removed in subtractor 38 and which has been weighted by W(z),
  • z_k: the synthesized signal,
  • x° and z°: the contributions of the filter memories to the computation of x and z.

We can write:

  • x = H r + x°
    Figure imgb0005
  • z_k = G_k H c_k + z°

The filtering operation by the memoryless filter 34a is expressed above as the convolution of two finite sequences, represented by the product of a matrix and a vector:

Figure imgb0006
where H is an L×L lower triangular matrix (L being the common length of the sequences) whose elements are taken from the impulse response h(i) of 1/A(z/γ), of the form:
Figure imgb0007
which coincides with that of 1/Ã(z/γ).
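
The matrix form of memoryless filtering can be checked numerically. The sketch below assumes the usual lower triangular Toeplitz layout, H[i][j] = h(i - j) for i ≥ j and zero elsewhere, and verifies that H c gives the first L output samples of a zero-state convolution. The impulse response and excitation values are illustrative, not taken from the patent.

```python
def lower_triangular_h(h, L):
    """L x L lower triangular Toeplitz matrix with H[i][j] = h[i - j] for i >= j."""
    return [[h[i - j] if i >= j else 0.0 for j in range(L)] for i in range(L)]


def matvec(M, v):
    """Product of a matrix (list of rows) and a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def truncated_convolution(h, c):
    """First len(c) output samples of filtering c by h with zero initial state."""
    return [sum(h[n - j] * c[j] for j in range(n + 1)) for n in range(len(c))]


h = [1.0, 0.5, 0.25, 0.125]   # illustrative impulse response samples
c = [1.0, 0.0, -1.0, 0.0]     # illustrative excitation vector
H = lower_triangular_h(h, len(c))
z = matvec(H, c)
```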

The vector x' at the input of subtractor 28, after subtraction of the memory effects, can itself be written:

Figure imgb0008
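
Separating a memoryless part from a memory contribution relies on the linearity of the filters: the output of a recursive filter over one block is the sum of its zero-state response to the block and its zero-input response to the stored past outputs. The sketch below checks this for a hypothetical second-order all-pole filter; the recursion convention, coefficients and signal values are assumptions chosen only to make the check concrete.

```python
def all_pole_filter(x, a, past):
    """y(n) = x(n) + sum_i a[i-1] * y(n-i).
    `past` holds at least len(a) earlier outputs, most recent last."""
    y = list(past)
    start = len(y)
    for n, xn in enumerate(x):
        acc = xn
        for i, ai in enumerate(a, start=1):
            acc += ai * y[start + n - i]
        y.append(acc)
    return y[start:]


a = [0.5, -0.25]              # hypothetical filter coefficients
past = [2.0, -1.0]            # filter memory left by the previous block
r = [1.0, 0.0, -1.0, 0.5]     # current input block

with_memory = all_pole_filter(r, a, past)
zero_state = all_pole_filter(r, a, [0.0, 0.0])          # memoryless part
zero_input = all_pole_filter([0.0] * len(r), a, past)   # memory contribution only
after_subtraction = [f - m for f, m in zip(with_memory, zero_input)]
```

Since the zero-input response does not depend on the candidate vector, it can be computed and subtracted once per block, before the dictionary search starts.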

The energy E_k of the weighted error for the vector of index k (with 0 ≤ k < K) can be written:

Figure imgb0009

The search for the optimal innovation sequence (index k of the vector c_k and amplification gain G_k) comprises two steps, which follow from equation (6) given the known fact (J.P. Adoul et al., "Fast CELP coding based on algebraic codes", Proc. ICASSP, April 1987) that minimizing the energy E_k amounts to maximizing a scalar product P_w:

  • - search for the index k for which the scalar product P_w(k) is maximum:
    Figure imgb0010
  • - calculation of the corresponding gain G_k:
    Figure imgb0011
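
The two steps above can be sketched as follows. Since the formulas themselves are only available as figures, the code assumes the expressions commonly used in CELP coders: the criterion (x'ᵗ H c_k)² / ‖H c_k‖² for selecting k, and the least-squares gain x'ᵗ H c_k / ‖H c_k‖². The function names and the toy data are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def search_codebook(x_prime, filtered_codebook):
    """Return (k, gain) maximizing (x'.Hc_k)^2 / |Hc_k|^2 over prefiltered vectors Hc_k."""
    best_k, best_score, best_gain = -1, -1.0, 0.0
    for k, hc in enumerate(filtered_codebook):
        energy = dot(hc, hc)
        if energy == 0.0:
            continue                      # skip an all-zero vector
        corr = dot(x_prime, hc)
        score = corr * corr / energy
        if score > best_score:
            best_k, best_score, best_gain = k, score, corr / energy
    return best_k, best_gain


def error_energy(x_prime, hc, gain):
    """Energy of x' - gain * Hc."""
    return sum((a - gain * b) ** 2 for a, b in zip(x_prime, hc))


x_prime = [1.0, 2.0, 0.0]                 # toy target vector
codebook = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
k, gain = search_codebook(x_prime, codebook)
```

With this toy data the vector of index 2 is selected with a gain of 1.5, and it is also the vector that leaves the smallest error energy, which is the equivalence the text relies on.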

Computing a scalar product is obviously faster than computing a Euclidean distance, so the arrangement of Figure 3 by itself already reduces the amount of computation.

The next step is to eliminate the memory terms, that is, the operations shown schematically at 34a and 36a, to arrive at the arrangement shown in Figure 5.

As in the case of Figure 2, an important simplification is to replace filters 34a and 36a, with transfer function 1/Ã(z/γ), by fixed synthesis filters with transfer function 1/C(z/γ), which again amounts to adopting a perceptual weighting filter of the form W'(z) = A(z)/C(z/γ). Filter 34a no longer has to perform a repetitive filtering operation, since the excitation vectors are stored in dictionary 20 in two forms: prefiltered, so that they can be applied directly to the scalar product maximization circuit 38, and in original form, for application to the amplifier 22 of gain G_k. The simplification is immediately apparent on comparison with conventional minimum-search procedures.

Yet another embodiment of the invention uses a modified criterion for the error to be minimized. Frames of samples, each occupying a window, are applied in succession; consequently, the impulse response of the weighted synthesis filter for one frame (or block) carries over into the following frame (or block). To remove this effect, the damping of the filters is exploited: instead of a sequence consisting only of L samples, a sequence consisting of L samples followed by J zeros is applied to their input, J being chosen so that the impulse response of the synthesis filter W(z)/A(z) is practically zero after J samples. A value J = 10 is generally sufficient for the damping of the filters to make the terms representing their memory negligible. The impulse response matrix then becomes a rectangular band matrix with (L + J)×L terms, of the form:

Figure imgb0012

The matrix HᵗH = R is then a symmetric Toeplitz matrix, built from the autocorrelation R(i) of the impulse response h(n). Hᵗ denotes the transpose of H.
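
The Toeplitz property can be verified on a small case. The sketch below assumes that the impulse response is truncated to J + 1 samples, so that every column of the (L + J)×L band matrix contains the complete response; under that assumption each entry (i, j) of HᵗH depends only on |i - j| and equals the autocorrelation R(|i - j|). The numerical values are illustrative.

```python
def band_matrix(h, L):
    """(L + J) x L matrix whose column j holds h(0..J) starting at row j, J = len(h) - 1."""
    J = len(h) - 1
    return [[h[i - j] if 0 <= i - j <= J else 0.0 for j in range(L)]
            for i in range(L + J)]


def autocorrelation(h):
    """R(m) = sum_n h(n) h(n + m) for m = 0 .. len(h) - 1."""
    return [sum(h[n] * h[n + m] for n in range(len(h) - m)) for m in range(len(h))]


def gram(H):
    """H^t H for a matrix given as a list of rows."""
    rows, cols = len(H), len(H[0])
    return [[sum(H[n][i] * H[n][j] for n in range(rows)) for j in range(cols)]
            for i in range(cols)]


h = [1.0, 0.5, 0.25]          # illustrative truncated impulse response, J = 2
L = 5
R = autocorrelation(h)
G = gram(band_matrix(h, L))
```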

The memory error that appears in the equation for x' is then small enough to be treated as zero, and equation (7) can be written:

Figure imgb0013

The vector yᵗ = rᵗ Hᵗ H can be calculated exactly, only once per frame, by a filtering operation, using an adaptive filter whose coefficients are the autocorrelation terms R(i).
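
A minimal sketch of this single filtering pass, assuming an impulse response truncated to J + 1 samples: y is obtained by running a symmetric FIR filter with coefficients R(i) over the residual block, and the result is compared with the direct product Hᵗ(H r). Function names and data are hypothetical.

```python
def autocorrelation(h):
    """R(m) = sum_n h(n) h(n + m)."""
    return [sum(h[n] * h[n + m] for n in range(len(h) - m)) for m in range(len(h))]


def y_by_filtering(r, R):
    """y(i) = sum_j R(|i - j|) r(j): one pass of a symmetric FIR filter over the block."""
    L, J = len(r), len(R) - 1
    return [sum(R[abs(i - j)] * r[j] for j in range(max(0, i - J), min(L, i + J + 1)))
            for i in range(L)]


def y_direct(r, h):
    """Reference computation y = H^t (H r) with the (L + J) x L band matrix H."""
    L, J = len(r), len(h) - 1
    u = [sum(h[n - j] * r[j] for j in range(L) if 0 <= n - j <= J) for n in range(L + J)]
    return [sum(h[m] * u[i + m] for m in range(J + 1)) for i in range(L)]


h = [1.0, 0.5, 0.25]                  # illustrative truncated impulse response
r = [1.0, -2.0, 0.5, 3.0, 0.0]        # illustrative residual block
y = y_by_filtering(r, autocorrelation(h))
```

Once y is available, each candidate vector costs only one scalar product yᵗ c_k, with no further filtering.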

To implement this method for a speech signal sampled at 8 kHz, whose samples are grouped into frames of 160 samples each lasting 20 ms, each frame can for example be split, after filtering at 33 (Figure 5), into four blocks of L = 40 samples, which are applied in succession to filter 36a, each followed by J = 10 zeros.
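
The framing described here can be sketched in a few lines. The function name and the ramp used as a stand-in for speech samples are assumptions for illustration.

```python
def split_frame(frame, L=40, J=10):
    """Split a frame into blocks of L samples, each followed by J zeros."""
    if len(frame) % L != 0:
        raise ValueError("frame length must be a multiple of L")
    return [list(frame[i:i + L]) + [0.0] * J for i in range(0, len(frame), L)]


SAMPLE_RATE_HZ = 8000
frame = [float(n) for n in range(160)]    # stand-in for one 20 ms frame
blocks = split_frame(frame)
frame_duration_ms = 1000 * len(frame) / SAMPLE_RATE_HZ
```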

A(z) will then be calculated for each frame, while k and G_k will be calculated for each block.

A particularly advantageous solution in this case is to use pulse sequences of length L having a regular structure consisting of q equidistant pulses separated by D-1 zeros, the first pulse occupying one of the positions 0 to D-1 and the number of sequences being such that all of these positions are occupied in turn. This gives a satisfactory representation of the phase information in the excitation signal. Figure 7 shows, by way of example, four sequences (for k = 0, 1, 2 and 3) that are identical except that they correspond to the D = 4 different phases. The dictionary can be regarded as a base set of K/D sequences, each present with zero phase and with three successive shifts, giving K sequences in all.
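
A regular-pulse dictionary of this kind can be generated from its base set. The sketch below uses L = 40, D = 4 and q = 10, which are consistent with the block length given in the text; the single base sequence is a hypothetical example, and the ordering of the vectors within the dictionary is an arbitrary choice.

```python
def expand(d, phase, D, L):
    """Place the q pulses of d at positions phase, phase + D, ... in a length-L vector."""
    if not 0 <= phase < D or phase + (len(d) - 1) * D >= L:
        raise ValueError("pulses do not fit in the block")
    c = [0] * L
    for i, amp in enumerate(d):
        c[phase + i * D] = amp
    return c


def build_dictionary(base, D, L):
    """All D phase shifts of every base sequence: K = D * len(base) vectors."""
    return [expand(d, p, D, L) for d in base for p in range(D)]


L, D = 40, 4
base = [[1, -1, 1, 1, -1, -1, 1, -1, 1, 1]]   # one hypothetical base sequence, q = 10
dictionary = build_dictionary(base, D, L)
```

Only the base sequences and the phase need to be stored or transmitted, since each full vector is recovered by re-inserting the zeros.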

Excitation by regular pulse sequences reduces the number of operations to be performed, because many of the products are zero, one of the factors being a zero whose position is known for each sample. The calculations can be simplified further by building the sequences only from binary samples that can take only the values +1 and -1 (and 0), as shown in Figure 8. All the sequences then contain the same energy; the search for the optimum sequence is carried out with pure scalar products and amounts to finding the binary vector that gives the best result. It may be noted in this connection that document EP-A-0 195 487 concerns an MPLPC coding method in which an optimal pulse phase must first be determined and the optimal amplitude of every pulse in the sequence must then be sought among discrete values, quantized for example on 3 bits.

With the modified criterion and excitation by regular sequences, in particular sequences made up of binary samples, and on the additional condition that the autocorrelation is normalized and has zero terms whose spacing corresponds to the non-zero samples of the sequences, the terms ‖H c_k‖² all become equal and we have:

Figure imgb0014
where d_m denotes one of the sequences (K/D in number) obtained by decimating the components of the K vectors, that is, by removing the zeros; the sequence d_m for 0 ≤ k ≤ 3 is given in Figure 7 by way of example.

If the sequences are normalized, the search procedure reduces to finding the sequence for which the scalar product P(k) = yᵗ c_k is maximum.

The conditions required for formula (11) to apply can be obtained in particular:

  • - either by adopting a fixed filter R such that R(iD) is zero for i > 0,
  • - or by adopting a filter with variable coefficients whose finite impulse response (FIR) is truncated for sample indices greater than D.

The coder then has the basic arrangement shown in Figure 6. A single filtering operation is performed on the speech signal frame by filter 33. The sequence c_k under test, which no longer needs to be prefiltered, is applied to circuit 32, which calculates the scalar product c_kᵗ y and determines the maximum, for which an index selection command is sent at 40. The sequence c_k amplified at 22 is applied to the long-term predictor 24, shown with a single coefficient b. The term r is formed in subtractor 38 by subtracting the output of the long-term predictor 24 from the output of filter 34 on the speech path. Filter 42, which receives the residual r, has a fixed response R(z) represented by a symmetric Toeplitz matrix.

The search for the optimal vector can then be carried out with a reduced number of multiplications and additions, provided that the response is truncated if the filter is variable, for example by the following procedure if the regular excitation vectors are binary:

  • - determination of the phase p which gives M(p) its maximum value:
    Figure imgb0015
  • - then, among the vectors having the selected phase, selection of the vector d_m such that
    • yᵗ c_k = M(p)
    • that is:
    • d_m(i) = sign of y(p + iD)
    • for i = 0, ..., q-1
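
The procedure above can be sketched and checked against an exhaustive search. The expression for M(p) is only available as a figure; the code assumes M(p) = Σ |y(p + iD)| for i = 0 to q-1, which is what the relations yᵗ c_k = M(p) and d_m(i) = sign of y(p + iD) imply. Function names and the test vector are hypothetical, and the dictionary is taken to contain every ±1 pattern for every phase.

```python
from itertools import product


def fast_search(y, D, q):
    """Return (phase, d, M(phase)) for binary regular-pulse vectors, without scanning them."""
    def m(p):
        return sum(abs(y[p + i * D]) for i in range(q))
    phase = max(range(D), key=m)
    d = [1 if y[phase + i * D] >= 0 else -1 for i in range(q)]
    return phase, d, m(phase)


def brute_force(y, D, q):
    """Exhaustive reference: try every phase and every +/-1 pattern."""
    best = None
    for p in range(D):
        for d in product((1, -1), repeat=q):
            score = sum(d[i] * y[p + i * D] for i in range(q))
            if best is None or score > best[2]:
                best = (p, list(d), score)
    return best


y = [0.3, -1.2, 0.5, -0.7, 0.4, 2.0, 0.1, -0.9, -0.6, 1.1, 0.2, 0.8]
D, q = 3, 4
phase, d, m_max = fast_search(y, D, q)
```

The fast search needs on the order of L additions and comparisons, whereas the exhaustive search evaluates D·2^q scalar products.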

Once the optimal vector has been selected, the gain G_k to be used follows directly, since ‖H c_k‖² is equal to a constant value q, whatever the value of k, in the case of binary vectors, which all have the same norm.
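
The constant norm and the resulting gain can be worked out explicitly. The derivation below assumes a normalized autocorrelation with R(0) = 1 and R(iD) = 0 for i ≠ 0, binary pulses d_m(i) = ±1, and the usual least-squares gain with the memory terms neglected; M(p) is the maximized scalar product yᵗ c_k.

```latex
\begin{aligned}
\lVert H c_k \rVert^2 &= c_k^t R\, c_k
  = \sum_{i=0}^{q-1}\sum_{j=0}^{q-1} d_m(i)\, d_m(j)\, R\big((i-j)D\big) \\
 &= R(0) \sum_{i=0}^{q-1} d_m(i)^2 = q, \\
G_k &= \frac{r^t H^t H\, c_k}{\lVert H c_k \rVert^2}
     = \frac{y^t c_k}{q} = \frac{M(p)}{q}.
\end{aligned}
```

Under these assumptions the gain requires a single division once the phase search has produced M(p).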

This process reduces the number of calculations required by a factor that is typically about three orders of magnitude compared with the conventional CELP method, whatever the length L chosen for the speech blocks.

The data to be transmitted by multiplexer 18 are:

  • - the single coefficient b and the period T (corresponding to the periodicity of the speech signal) of the long-term predictor filter 24, once or several times per window, and the coefficients a of filter 33, with transfer function A(z), once per window,
  • - the index k of the optimal vector and the corresponding gain G_k, once per block, a block corresponding to a sub-window of, for example, 40 samples.

For transmission, the gain G_k is quantized in a quantizer 46. Since each signal frame is split into several blocks, a buffer memory 48 must be placed between components 33 and 44.

It should further be noted that, because the excitation is binary and regular, it is not very sensitive to transmission errors: an error that changes the value of one bit modifies the vector only locally. The phase bits, which are few in number, can be protected by an error-correcting code.

Claims (6)

1. A word coding method, with linear prediction and vectorial excitation, allowing the coding of word signals put in the form of digitized samples distributed in patterns, wherein:
a signal pattern is represented on the one hand by prediction parameters and on the other hand by a succession of excitation vectors contained in a dictionary (20) and by amplification gains (Gk) of said vectors, the vectors adopted being determined by searching (32) for the minimum energy of an error signal obtained by subtracting each vector in its turn, which has previously been subjected to filtering, from the pattern of the word signal; and,
before subtraction, subjecting each word signal pattern to a short-term analytical filtering (33) and to a perceptual weighted synthesis filtering (36) and subjecting the amplified vector to a long-term predictive filtering (24) and to the same perceptual weighted synthesis filtering (34,36) as the word signal,

characterized in that all the excitation vectors are formed by the same number of equi-distant pulses separated by zeros.
2. A method according to Claim 1, characterized in that the pulses separated by zeros are binary.
3. A method according to Claims 1 or 2, characterized in that each search (32) for the minimum energy of the error signal is performed by subjecting to filtering (34,36 or 34a,36a) an assembly comprising, in addition to the real samples of a block forming a fraction of the pattern, zero samples in an adequate number for the predictive filtering pulse response corresponding to the last real sample to be substantially damped, the filtering (34,36 or 34a,36a) being performed without storing from one block to the other.
4. A method according to Claims 1 or 2, characterized in that after being subjected to a short-term analytical predictive filtering A(z), each word signal pattern is applied to the additive input of a subtracter (38) which receives at its subtractive input the contribution from the store of the long-term predictive filter (24),
the output of the subtracter is subjected to a filtering (42),
the scalar product (32) of the filtered output of the subtracter and of each unamplified sequence is calculated in its turn, while searching for that sequence for which the scalar product is the maximum.
5. A method according to Claim 4, characterized in that the filtering (42) is with fixed coefficients.
6. A word coding method, with linear prediction and vectorial excitation, allowing the coding of word signals put in the form of digitised samples distributed in patterns, wherein:
each block forming a fraction of signal pattern is represented by one of the vectors contained in a dictionary (20), by a vector amplification gain (Gk) and by prediction parameters, the vector adopted being determined by searching the minimum energy of an error signal obtained by subtracting each vector in its turn, previously subjected to a filtering, from the word signal pattern; and,
before subtraction, subjecting each word signal pattern to a short-term analytical filtering A(z), characterized in that
the result of the subtraction (38) is subjected to a perceptual weighted synthesis filtering (36) with coefficients fixed in time and
the excitation vector, stored in a precalculated form and filtered, is subjected to a filtering by a perceptual weighted synthesis filter (34) 1/C(z/γ) which is fixed and without store, all the excitation vectors being formed by the same number of equi-distant pulses, separated by zeros.
EP89401644A 1988-06-13 1989-06-13 Coding method and linear prediction speech coder Expired - Lifetime EP0347307B1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
FR8807846 1988-06-13
FR8807846A FR2632758B1 (en) 1988-06-13 1988-06-13 LINEAR PREDICTION SPEECH CODING AND ENCODING METHOD

Publications (3)

Publication Number Publication Date
EP0347307A2 (en) 1989-12-20
EP0347307A3 (en) 1990-12-27
EP0347307B1 (en) 1994-05-04

Family

ID=9367205

Family Applications (1)

Application Number Title Priority Date Filing Date
EP89401644A Expired - Lifetime EP0347307B1 (en) 1988-06-13 1989-06-13 Coding method and linear prediction speech coder

Country Status (4)

Country Link
EP (1) EP0347307B1 (en)
DE (1) DE68915057T2 (en)
ES (1) ES2052043T3 (en)
FR (1) FR2632758B1 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
ATE186607T1 (en) * 1990-12-21 1999-11-15 British Telecomm VOICE CODING
ES2042410B1 (en) * 1992-04-15 1997-01-01 Control Sys S A ENCODING METHOD AND VOICE ENCODER FOR EQUIPMENT AND COMMUNICATION SYSTEMS.
FR2720849B1 (en) * 1994-06-03 1996-08-14 Matra Communication Method and device for preprocessing an acoustic signal upstream of a speech coder.
CN101615394B (en) * 2008-12-31 2011-02-16 华为技术有限公司 Method and device for allocating subframes

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CA1337217C (en) * 1987-08-28 1995-10-03 Daniel Kenneth Freeman Speech coding

Also Published As

Publication number Publication date
FR2632758A1 (en) 1989-12-15
EP0347307A2 (en) 1989-12-20
ES2052043T3 (en) 1994-07-01
DE68915057T2 (en) 1994-08-18
DE68915057D1 (en) 1994-06-09
FR2632758B1 (en) 1991-06-07
EP0347307A3 (en) 1990-12-27

Similar Documents

Publication Publication Date Title
EP0749626B1 (en) Speech coding method using linear prediction and algebraic code excitation
EP0608174B1 (en) System for predictive encoding/decoding of a digital speech signal by an adaptive transform with embedded codes
EP0782128B1 (en) Method of analysing by linear prediction an audio frequency signal, and its application to a method of coding and decoding an audio frequency signal
EP3330964B1 (en) Resampling of an audio signal for encoding/decoding with low delay
EP0542585B1 (en) Predictive filter quantification method for low bit rate vocoder
EP0428445B1 (en) Method and apparatus for coding of predictive filters in very low bitrate vocoders
EP0481895B1 (en) Method and apparatus for low bit rate transmission of a speech signal using CELP coding
EP1836699B1 (en) Method and device for carrying out optimized audio coding between two long-term prediction models
EP0490740A1 (en) Method and apparatus for pitch period determination of the speech signal in very low bitrate vocoders
WO2006114494A1 (en) Method for adapting for an interoperability between short-term correlation models of digital signals
FR2674710A1 (en) METHOD AND SYSTEM FOR PROCESSING PREECHOS OF AUDIO-DIGITAL SIGNAL CODE BY FREQUENCY TRANSFORMATION
EP0078581B1 (en) Differential pcm transmission system with adaptive prediction
EP0347307B1 (en) Coding method and linear prediction speech coder
EP0334714A1 (en) Differential coder with a self-adaptive filter, and decoder for said coder
EP2652735B1 (en) Improved encoding of an improvement stage in a hierarchical encoder
WO2023165946A1 (en) Optimised encoding and decoding of an audio signal using a neural network-based autoencoder
EP0616315A1 (en) Digital speech coding and decoding device, process for scanning a pseudo-logarithmic LTP codebook and process of LTP analysis
EP0341129A1 (en) Method and apparatus for coding speech signal energy in very low bitrate vocoders
EP0030194A1 (en) Predictive stage for a dataflow compression system
WO2011144863A1 (en) Encoding with noise shaping in a hierarchical encoder
EP0469997B1 (en) Coding method and speech coder using linear prediction analysis
EP1192618B1 (en) Audio coding with adaptive liftering
FR2771543A1 (en) Noise reduction algorithm
EP0454552A2 (en) Method and apparatus for low bitrate speech coding
EP0300880A1 (en) Adaptive predictor

Legal Events

Date Code Title Description
PUAI Public reference made under Article 153(3) EPC to a published international application that has entered the European phase

Free format text: ORIGINAL CODE: 0009012

AK Designated contracting states

Kind code of ref document: A2

Designated state(s): DE ES GB IT

PUAL Search report despatched

Free format text: ORIGINAL CODE: 0009013

AK Designated contracting states

Kind code of ref document: A3

Designated state(s): DE ES GB IT

17P Request for examination filed

Effective date: 19901221

17Q First examination report despatched

Effective date: 19930126

GRAA (expected) grant

Free format text: ORIGINAL CODE: 0009210

AK Designated contracting states

Kind code of ref document: B1

Designated state(s): DE ES GB IT

REF Corresponds to:

Ref document number: 68915057

Country of ref document: DE

Date of ref document: 19940609

GBT Gb: translation of ep patent filed (gb section 77(6)(a)/1977)

Effective date: 19940516

REG Reference to a national code

Ref country code: ES

Ref legal event code: FG2A

Ref document number: 2052043

Country of ref document: ES

Kind code of ref document: T3

ITF IT: translation for an EP patent filed

Owner name: GUZZI E RAVIZZA S.R.L.

PLBE No opposition filed within time limit

Free format text: ORIGINAL CODE: 0009261

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: NO OPPOSITION FILED WITHIN TIME LIMIT

26N No opposition filed
REG Reference to a national code

Ref country code: GB

Ref legal event code: IF02

REG Reference to a national code

Ref country code: GB

Ref legal event code: 732E

REG Reference to a national code

Ref country code: ES

Ref legal event code: PC2A

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: GB

Payment date: 20040608

Year of fee payment: 16

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: DE

Payment date: 20040609

Year of fee payment: 16

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: ES

Payment date: 20040611

Year of fee payment: 16

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: IT

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20050613

Ref country code: GB

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20050613

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: ES

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20050614

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: DE

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20060103

GBPC Gb: european patent ceased through non-payment of renewal fee

Effective date: 20050613

REG Reference to a national code

Ref country code: ES

Ref legal event code: FD2A

Effective date: 20050614