EP0608174B1 - System for predictive encoding/decoding of a digital speech signal by an adaptive transform with embedded codes - Google Patents

System for predictive encoding/decoding of a digital speech signal by an adaptive transform with embedded codes

Info

Publication number
EP0608174B1
EP0608174B1 (application EP94400109A)
Authority
EP
European Patent Office
Prior art keywords
signal
module
perceptual
speech signal
vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Lifetime
Application number
EP94400109A
Other languages
German (de)
French (fr)
Other versions
EP0608174A1 (en)
Inventor
Bruno Lozach
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Orange SA
Original Assignee
France Telecom SA
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by France Telecom SA filed Critical France Telecom SA
Publication of EP0608174A1 publication Critical patent/EP0608174A1/en
Application granted granted Critical
Publication of EP0608174B1 publication Critical patent/EP0608174B1/en
Anticipated expiration legal-status Critical
Expired - Lifetime legal-status Critical Current

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 - Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/02 - Speech or audio signals analysis-synthesis techniques using spectral analysis, e.g. transform vocoders or subband vocoders
    • G10L19/0212 - Speech or audio signals analysis-synthesis techniques using orthogonal transformation
    • G10L2019/0001 - Codebooks
    • G10L2019/0002 - Codebook adaptations
    • G10L2019/0003 - Backward prediction of gain
    • G10L2019/0004 - Design or structure of the codebook
    • G10L2019/0005 - Multi-stage vector quantisation
    • G10L2019/0011 - Long term prediction filters, i.e. pitch estimation

Definitions

  • the present invention relates to a system for predictive coding-decoding of a digital speech signal by adaptive transform with nested codes.
  • the digital signal to be coded Sn, derived from an analog source speech signal, is subjected to a short-term prediction process (LPC analysis), the prediction coefficients being obtained by prediction of the speech signal over windows of M samples.
  • the digital speech signal to be coded Sn is filtered using a perceptual weighting filter W(z) deduced from the above coefficients, in order to obtain the perceptual signal pn.
  • a long-term prediction process then makes it possible to take into account the periodicity of the residue for voiced sounds, over all the sub-windows of N samples, N < M, in the form of a contribution p̂n, which is subtracted from the perceptual signal pn so as to obtain the signal p'n in the form of a vector P' ∈ R^N.
  • a transformation followed by a quantization is then carried out on the aforementioned vector P' for digital transmission.
  • the inverse operations allow, after transmission, the modeling of the synthetic signal Ŝn.
  • the Karhunen-Loève transform, obtained from the eigenvectors of the auto-correlation matrix of the training vectors (I being the number of vectors contained in the training corpus), maximizes the modeled energy for a given number K of basis vectors, where K is an integer, K ≤ N.
  • the mean square error of the Karhunen-Loève transform is lower than that of any other transformation for a given modeling order K; this transform is, in this sense, optimal.
  • this type of transform was introduced into an orthogonal-transform predictive coder by N. Moreau and P. Dymarski; see the publication "Successive Orthogonalisations in the Multistage CELP Coder", ICASSP 92, Vol. 1, pp. I-61 to I-64.
  • suboptimal transforms may be used instead, such as the Fast Fourier Transform (FFT), the Discrete Cosine Transform (DCT), the Discrete Hadamard Transform (DHT) or the Walsh-Hadamard Transform (DWHT), for example.
  • Another method for constructing an orthonormal transform consists in computing the singular value decomposition of the lower triangular Toeplitz matrix H built from h(n), the impulse response of the short-term prediction filter 1/A(z) of the current window.
  • the matrix H can then be decomposed into a sum of rank-1 matrices (its singular value decomposition).
  • currently known encoders with nested codes allow data to be transmitted by stealing bits normally allocated to speech on the transmission channel, in a manner transparent to the encoder, which encodes the speech signal at the maximum bit rate.
  • a 64 kbit/s encoder with a scalar quantizer and nested codes was standardized in 1986 as Recommendation G.722 of the CCITT.
  • this encoder, operating on wideband speech (audio signal with a bandwidth of 50 Hz to 7 kHz, sampled at 16 kHz), is based on coding in two sub-bands, each containing an Adaptive Differential Pulse Code Modulation (ADPCM) encoder.
  • this coding technique makes it possible to transmit wideband speech signals and, if necessary, data on a 64 kbit/s channel, at three different speech bit rates of 64, 56 and 48 kbit/s, with 0, 8 or 16 kbit/s for data.
  • the prior-art predictive transform coders mentioned above do not allow the transmission of data and therefore cannot fulfil the function of coders with nested codes.
  • the coders with nested codes of the prior art do not use the orthonormal-transform technique, which does not allow optimal transform coding to be approached or reached.
  • the object of the present invention is to remedy the aforementioned drawback by the implementation of a system of predictive coding-decoding of a digital speech signal by adaptive transform with nested codes.
  • Another object of the present invention is the implementation of a predictive coding-decoding system for a digital speech and data signal allowing transmission at reduced and flexible bit rates.
  • the predictive coding system for coding a digital signal into a digital signal with nested codes, in which the coded digital signal consists of a coded speech signal and, if necessary, an auxiliary data signal inserted into the coded speech signal after coding of the latter, which is an object of the present invention, includes a perceptual weighting filter driven by a short-term prediction loop to generate a perceptual signal, a long-term prediction circuit delivering an estimated perceptual signal, this long-term prediction circuit forming a long-term prediction loop which delivers, from the perceptual signal and the estimated past excitation signal, a modeled perceptual excitation signal, and adaptive transform and quantization circuits which generate the coded speech signal from the perceptual excitation signal.
  • the perceptual weighting filter consists of a short-term prediction filter of the speech signal to be coded, so as to achieve a frequency distribution of the quantization noise; the system includes a circuit for subtracting the contribution of the past excitation signal from the perceptual signal in order to deliver an updated perceptual signal, the long-term prediction circuit being formed, in a closed loop, from a dictionary updated by the modeled past excitation corresponding to the lowest bit rate and delivering an optimal waveform and a gain associated with it, which together constitute the estimated perceptual signal.
  • the transform circuit is formed by an orthonormal transform module comprising an adaptive orthogonal transformation module and a progressive modeling module using orthogonal vectors.
  • the progressive modeling module and the long-term prediction circuit deliver indices representative of the coded speech signal.
  • An auxiliary data insertion circuit is coupled to the transmission channel.
  • the adaptive-transform predictive decoding system for a digital signal coded with nested codes, in which the coded digital signal consists of a coded speech signal and, where appropriate, an auxiliary data signal inserted into the coded speech signal after coding of the latter, is remarkable in that it comprises a data signal extraction circuit enabling, on the one hand, the extraction of the data for auxiliary use and, on the other hand, the transmission of indices representative of the coded speech signal. It further includes a circuit for modeling the speech signal at the minimum bit rate and a circuit for modeling the speech signal at at least one bit rate greater than the minimum bit rate.
  • the predictive coding-decoding system for a digital speech signal by adaptive transform with nested codes which is the object of the present invention applies, in general, to the transmission of speech and data at flexible bit rates and, more specifically, to audio-visual conference protocols, videophones, speakerphone telephony, storage and transport of digital audio signals over long-distance links, transmission with mobiles and channel-concentration systems.
  • the digital signal coded by implementing the coding system which is the object of the present invention consists of a coded speech signal and, if necessary, an auxiliary data signal inserted into the coded speech signal after coding of this digital speech signal.
  • the coding system which is the object of the present invention may include, downstream of a transducer delivering the analog speech signal, an analog-to-digital converter and an input memory circuit or input buffer delivering the digital signal to be coded, Sn.
  • the coding system which is the object of the present invention also comprises a perceptual weighting filter 11 controlled by a short-term prediction loop making it possible to generate a perceptual signal, denoted Pn.
  • the long-term prediction circuit 13 forms a long-term prediction loop delivering, from the perceptual signal and the estimated past excitation signal, denoted P̂0/n, a modeled perceptual excitation signal.
  • the coding system which is the subject of the invention, as shown in FIG. 2, further comprises an adaptive transform and quantization circuit making it possible to generate the coded speech signal from the perceptual excitation signal Pn, as described below.
  • the perceptual weighting filter 11 consists of a filter for short-term prediction of the speech signal to be coded, so as to achieve a frequency distribution of the quantization noise.
  • the perceptual weighting filter 11 delivering the perceptual signal thus comprises, as shown in the same FIG. 2, a circuit 120 for subtracting the contribution of the past excitation signal P̂0/n from the perceptual signal, so as to deliver an updated perceptual signal, this updated perceptual signal being denoted Pn.
  • the long-term prediction circuit 13 is formed in a closed loop from a dictionary updated by the modeled past excitation corresponding to the lowest bit rate; this dictionary delivers an optimal waveform and an estimated gain associated therewith.
  • the modeled past excitation corresponding to the lowest bit rate is denoted r̂1/n.
  • the transform circuit is formed by an orthonormal transform module 14, comprising an adaptive orthogonal transformation module proper and a progressive modeling module using orthogonal vectors, denoted 16.
  • the progressive modeling module 16 and the long-term prediction circuit 13 deliver indices representative of the coded speech signal, these indices being denoted i(0), j(0) and i(l), j(l) respectively, with l ∈ [1, L] in figure 2.
  • the coding system further includes an auxiliary data insertion circuit 19 coupled to the transmission channel, denoted 18.
  • the operation of the coding device which is the object of the present invention can be illustrated as follows.
  • the synthetic signal Ŝn is of course the signal reconstructed on reception, that is to say at the decoder after transmission, as described later in the description.
  • a short-term prediction analysis formed by the analysis circuit 10 of the LPC type for "Linear Predictive Coding" and by the perceptual weighting filter 11 is carried out for the digital signal to be coded by a conventional prediction technique on windows comprising for example M samples.
  • the analysis circuit 10 then delivers the coefficients a i , where the aforementioned coefficients a i are the linear prediction coefficients.
  • the speech signal to be coded Sn is then filtered by the perceptual weighting filter 11 of transfer function W(z), which delivers the perceptual signal proper, denoted Pn.
  • the coefficients of the perceptual weighting filter are obtained from a short-term prediction analysis on the first correlation coefficients of the sequence of the coefficients a i of the analysis filter A (z) of circuit 10 for the current window.
  • This operation makes it possible to achieve a good frequency distribution of the quantization noise.
  • the perceptual signal delivered tolerates greater coding noise in high-energy areas where the noise is less audible, since it is largely masked by the signal. It is indicated that the perceptual filtering operation is broken down into two stages, the digital signal to be coded Sn being filtered a first time by the filter constituted by the analysis circuit 10, in order to obtain the residue to be modeled, then a second time by the perceptual weighting filter 11 to deliver the perceptual signal Pn.
  • the second operation then consists in removing the contribution of the past excitation, or estimated past excitation signal, denoted P̂0/n, from the aforementioned perceptual signal.
  • hn is the impulse response of the double filtering performed by the circuit 10 and the perceptual weighting filter 11 in the current window, and r̂1/n is the modeled past excitation corresponding to the lowest bit rate, as described later in the description.
  • the operating mode of the closed-loop long-term prediction circuit 13 is then as follows.
  • this circuit takes into account the periodicity of the residue for voiced sounds, this long-term prediction being performed on every sub-window of N samples, as described in connection with FIG. 3.
  • the long-term closed-loop prediction circuit 13 comprises a first stage constituted by an adaptive dictionary 130, which is updated on every aforementioned sub-window by the modeled excitation, denoted r̂1/n, delivered by the module 17, which is described later in the description.
  • the adaptive dictionary 130 makes it possible to minimize the error with respect to the two parameters g(0) and q.
  • the waveform of index j is filtered by a filter 131 and corresponds to the excitation modeled at the lowest bit rate, r̂1/n, delayed by q samples by the aforementioned filter.
  • the optimal waveform f1/n is delivered by the filtered adaptive dictionary 133.
  • a module 132 for calculating and quantizing the prediction gain makes it possible, from the perceptual signal Pn and all the waveforms fj(0)/n, to compute and quantize the prediction gain and to deliver an index i(0) representative of the number of the quantization range, as well as its associated quantized gain g(0).
  • a multiplier circuit 134 delivers, from the filtered adaptive dictionary 133, that is to say from the result of filtering the waveform with index j, Cj/n, i.e. fj/n, and from the associated quantized gain g(0), the modeled and perceptually filtered long-term prediction excitation, denoted P̂1/n.
  • a module 136 makes it possible to calculate the Euclidean norm of the corresponding error.
  • a module 137 makes it possible to search for the optimal waveform corresponding to the minimum value of the above-mentioned Euclidean norm and to deliver the index j(0).
  • the parameters transmitted by the coding system which is the object of the invention for modeling the long-term prediction signal are then the index j(0) of the optimal waveform fj(0) as well as the number i(0) of the quantization range of its associated quantized gain g(0).
  • the method used for construction of this transform corresponds to that proposed by B.S.Atal and E.Ofer, as previously mentioned in the description.
  • this consists in decomposing, not the short-term prediction filtering matrix, but the perceptual weighting matrix W, a lower triangular Toeplitz matrix defined by relation (4), in which w(n) denotes the impulse response of the perceptual weighting filter W(z) of the current window.
  • in FIG. 4a there is shown the partial diagram of a predictive transform coder and, in FIG. 4b, the corresponding equivalent scheme in which the matrix or perceptual weighting filter W, designated by 140, has been highlighted, an inverse perceptual weighting filter 121 having, however, been inserted between the long-term prediction circuit 13 and the subtractor circuit 120. It is indicated that the filter 140 performs a linear combination of the basis vectors obtained from a singular value decomposition of the matrix representing the perceptual weighting filter W.
  • the signal S', corresponding to the speech signal to be coded Sn from which the contribution of the past excitation delivered by the module 12 has been subtracted, as well as that of the long-term prediction P̂1/n filtered by an inverse perceptual weighting module with transfer function (W(z))⁻¹, is filtered by the perceptual weighting filter with transfer function W(z), so as to obtain the vector P'.
  • the matrix W is then decomposed into a sum of rank-1 matrices through its singular value decomposition.
  • the short-term analysis filter circuit 10 being updated on windows of M samples, the singular value decomposition of the perceptual weighting matrix W is performed at the same rate.
  • the orthonormal transform process is built by learning.
  • the orthonormal transform module can be formed by a stochastic transform sub-module constructed by drawing a Gaussian random variable for initialization, this sub-module comprising, in FIG. 5, the steps 1000, 1001, 1002 and 1003 and being denoted SMTS.
  • step 1002 can consist in applying the K-means algorithm to the aforementioned vector corpus.
  • the SMTS sub-module is successively followed by a module 1004 for building centres, a module 1005 for building classes and, in order to obtain a vector G whose components are relatively ordered, a module 1006 for reordering the transform according to the cardinality of each class.
  • the aforementioned module 1006 is followed by a Gram-Schmidt computation module, denoted 1007a, so as to obtain an orthonormal transform.
  • the aforementioned module 1007a is associated with a module 1007b for calculating the error under the conventional conditions of implementation of the Gram-Schmidt process.
  • the module 1007a is itself followed by a module 1008 testing the number of iterations, so as to obtain an orthonormal transform computed offline by learning.
  • a read-only memory 1009 stores the orthonormal transform in the form of transform vectors. It is indicated that the relative ordering of the components of the gain vector G is accentuated by the orthogonalization process.
  • figure 5b shows the ordering of the components of the gain vector G, i.e. the normalized mean value of G, for a transform obtained on the one hand by singular value decomposition of the perceptual weighting matrix W and, on the other hand, by learning.
  • the orthonormal transform F can be obtained according to two different methods.
  • the new dimension of the gain vector G then becomes equal to N-1, which increases the number of bits per sample during its vector quantization and therefore the quality of its modeling.
  • a first solution for calculating the transform F' can then consist in performing a long-term prediction analysis, shifting the transform obtained by learning up by one position so as to place the long-term predictor in the first position, then applying the Gram-Schmidt algorithm, in order to obtain a new transform F'.
  • the transformation used must preserve the dot product.
  • the transformation is applied only to the perceptual signal P, and the modeled perceptual signal P̂ can then be calculated by the inverse transformation.
  • the adaptive transformation module 14 can include a Householder transformation module 140 receiving the estimated perceptual signal, constituted by the optimal waveform and the estimated gain, and the perceptual signal P, to generate a transformed perceptual signal P'' (an illustrative sketch of such a Householder reflection is given after this list). It is indicated that the Householder transformation module 140 comprises a module 1401 for calculating the parameters B and wB as defined previously by relation 13. It also includes a module 1402 comprising a multiplier and a subtractor making it possible to carry out the transformation proper according to relation 14. It is indicated that the transformed perceptual signal P'' is delivered in the form of a transformed perceptual signal vector with components P''k, with k ∈ [0, N-1].
  • the adaptive transformation module 14, as represented in FIG. 7, also includes a plurality of N registers for storing the orthonormal waveforms, the current register being denoted r, with r ∈ [1, N].
  • the aforementioned N storage registers form the read-only memory previously described, each register comprising N storage cells, each component of rank k of each vector, denoted f^r/orth(k), being stored in the corresponding cell of the current register r considered.
  • the module 14 comprises a plurality of N multiplier circuits associated with each register of rank r of the plurality of previously mentioned storage registers.
  • each multiplier of rank k receives, on the one hand, the component of rank k of the stored vector and, on the other hand, the component P''k of the transformed perceptual signal vector of corresponding rank k.
  • the multiplier circuit Mrk delivers the product P''k · f^r/orth(k) of the corresponding components.
  • a plurality of N-1 summing circuits is associated with each register of rank r, each summing circuit of rank k, denoted Srk, receiving the accumulated product of previous rank k-1 and the product of corresponding rank k delivered by the multiplier circuit Mrk of the same rank.
  • the summing circuit of highest rank, SrN-1, then delivers a component g(r) of the estimated gain expressed as a gain vector G.
  • the progressive modeling module using orthogonal vectors in fact comprises a module 15 for normalizing the gain vector, generating a normalized gain vector, denoted GK, by comparing the normalized value of the gain vector G with a threshold value.
  • this normalization module 15 also makes it possible to signal to the decoding system the length of the normalized gain vector, linked to the modeling order K.
  • the progressive modeling module using orthogonal vectors further comprises, in cascade with the gain-vector normalization module 15, a stage 16 of progressive modeling by orthogonal vectors.
  • this modeling stage 16 receives the normalized vector GK and delivers indices representative of the coded speech signal, these indices being denoted i(l), j(l) and being representative of the selected vectors and their associated gains.
  • the transmission of auxiliary data is performed by overwriting the parts of the frame allocated to the indices and quantization range numbers, so as to form the auxiliary data signal.
  • the operation of the normalization module 15 is as follows.
  • the gain vector GK thus obtained is then quantized and its length K is transmitted by the coding system which is the object of the invention, in order to be taken into account by the corresponding decoding system, as described later in the description.
  • the mean norm criterion as a function of the modeling order K is given in figure 8a for an orthonormal transform obtained on the one hand by singular value decomposition of the perceptual weighting matrix W and, on the other hand, by learning.
  • a particularly advantageous embodiment of the progressive modeling module using orthogonal vectors 16 will now be described in connection with FIG. 8b.
  • the aforementioned module in fact makes it possible to perform a multi-stage vector quantization.
  • γ̂l is the gain associated with the optimal vector Φj(l)/K taken from the stochastic dictionary of rank l, denoted 16l.
  • the vectors selected iteratively are generally not linearly independent and therefore do not form a basis.
  • the subspace generated by the L optimal vectors Φj(l)/K is of dimension less than L.
  • r(l,j) = ⟨Φj/orth(l), Φj(l)/l⟩ represents the intercorrelation of the optimal vectors of rank j and of rank j(l).
  • TK = IN − Φj(l)/orth(l) (Φj(l)/orth(l))T represents the orthogonalization matrix.
  • the previous operation removes from the dictionary the contribution of the previously selected waveform and thus imposes linear independence of any optimal vector of rank i between l+1 and L with respect to the optimal vectors of lower rank (an illustrative sketch of this multi-stage orthogonalised search is given after this list).
  • Q is an orthonormal matrix
  • R is an upper triangular matrix whose main-diagonal elements are all positive, which ensures the uniqueness of the decomposition.
  • the upper triangular matrix R thus makes it possible to recursively calculate the gains γ(k) relative to the original basis.
  • the parameters transmitted by the coding system which is the subject of the invention for modeling the gain vector G are then the indices j(l) of the selected vectors as well as the numbers i(l) of the quantization ranges of their associated gains.
  • data transmission is then performed by overwriting the parts of the frame allocated to the indices and quantization range numbers j(l), i(l), for l ∈ [L1, L2-1] and [L2, L], according to the communication requirements.
  • the previously mentioned process uses the recursive modified Gram-Schmidt algorithm in order to code the gain vector G.
  • the parameters transmitted by the coding system according to the invention being the aforementioned indices j(0) to j(L) of the different dictionaries as well as the quantized gains g(0) and γ̂, it is necessary to code the various aforementioned gains g(0) and γ̂.
  • a study has shown that, the gains relative to the orthogonal basis Φj(l)/orth(L) being decorrelated, they have good properties for quantization.
  • the gains γ̂ are ordered in a relatively decreasing order, and it is possible to exploit this property by coding not the aforementioned gains but their ratios. Several solutions can be used to code the aforementioned ratios.
  • the coding device which is the object of the present invention comprises a module for modeling the excitation of the synthesis filter corresponding to the lowest bit rate, this module being denoted 17 in the aforementioned figure.
  • the principle diagram for calculating the excitation signal of the synthesis filter corresponding to the lowest bit rate is given in FIG. 11.
  • an inverse transformation is applied to the modeled gain vector Ĝ1; this inverse adaptive transformation can for example correspond to an inverse Householder-type transformation, which is described later in conjunction with the decoding device which is the subject of the present invention.
  • the signal obtained after inverse adaptive transformation is added to the long-term prediction signal P̂1/n by means of an adder 171, the estimated perceptual signal or long-term prediction signal being delivered by the closed-loop long-term prediction circuit 13.
  • the resulting signal delivered by the adder 171 is filtered by a filter 172, which corresponds from the point of view of the transfer function to the filter 131 of FIG. 3.
  • the filter 172 delivers the modeled residual signal r̂1/n.
  • an adaptive-transform predictive decoding system with nested codes, for a coded digital signal consisting of a coded speech signal and, where appropriate, an auxiliary data signal inserted into the coded speech signal after coding of the latter, will now be described in conjunction with Figure 12.
  • the decoding system comprises a circuit 20 for extracting the data signal, allowing on the one hand the extraction of the data for auxiliary use, via an auxiliary data output, and on the other hand the transmission of indices representative of the coded speech signal.
  • the aforementioned indices are the indices i(l) and j(l), for l between 0 and L1-1 as previously described, and for l between L1 and L under the conditions described below.
  • the decoding system according to the invention comprises a circuit 21 for modeling the speech signal at the minimum bit rate, as well as a circuit 22 or 23 for modeling the speech signal at at least one bit rate greater than the aforementioned minimum bit rate.
  • the decoding system comprises, in addition to the data extraction system 20, a first module 21 for modeling the speech signal at the minimum bit rate, receiving the coded signal directly and delivering a first estimated speech signal, denoted Ŝ1/n, and a second module 22 for modeling the speech signal at an intermediate bit rate, connected to the data extraction system 20 via a switching circuit 27 conditional on the actual bit rate allocated to the speech signal, and delivering a second estimated speech signal, denoted Ŝ2/n.
  • the decoding system shown in Figure 12 also includes a third module 23 for modeling the speech signal at the maximum bit rate, this module being connected to the data extraction system 20 via a switching circuit 28 conditional on the actual bit rate allocated to speech, and delivering a third estimated speech signal Ŝ3/n.
  • a summing circuit 24 receives the first, the second and the third estimated speech signals, and delivers at its output a resulting estimated speech signal, denoted Ŝn.
  • an adaptive filtering circuit 25 receives the resulting estimated speech signal Ŝn and delivers a reconstituted estimated speech signal, denoted Ŝ'n.
  • a digital-to-analog converter 26 may be provided to receive the reconstructed speech signal and to output an audio-frequency reconstituted speech signal.
  • each of the speech signal modeling modules at minimum, intermediate and maximum bit rates includes an inverse adaptive transformation sub-module, followed by an inverse perceptual weighting filter.
  • the decoding system which is the object of the present invention takes into account the constraints imposed by data transmission at the coding system level, in particular at the level of the adaptive dictionary, as well as the contribution of the past excitation.
  • the circuit 21 for modeling the speech signal at the minimum bit rate is identical to that described for circuit 17 of the coding system according to the invention, starting from an inverse adaptive transformation module similar to the module 170 described in relation to figure 11.
  • an advantageous embodiment of this is shown in FIG. 13b. It is indicated that the embodiment represented in FIG. 13b corresponds to an inverse Householder-type transform using elements identical to those of the Householder transform represented in FIG. 7. It is simply indicated that, for a perceptual signal delivered by the long-term prediction circuit 13, this signal being denoted P̂1 and entering a similar module 140, the signals entering the module 1402, respectively at the level of the multipliers associated with each register, are inverted. The resulting signal delivered by the adder corresponding to the adder 171 of FIG. 11 is filtered by a filter whose transfer function is the inverse of that of the perceptual weighting matrix and which corresponds to the filter 172 of the same FIG. 11.
  • the speech signal modeling modules at intermediate or maximum bit rate, module 22 or 23, are shown in Figures 14a and 14b.
  • the modeled gain vectors Ĝ2, Ĝ3 are added, as shown in FIG. 14b, by an adder 220, subjected to the inverse adaptive transformation process in a module 221 identical to the module 210 of FIG. 13a, then filtered by the previously mentioned inverse weighting filter W⁻¹(z), this filter being designated by 222, the filtering starting from zero initial conditions, which makes it possible to perform an operation equivalent to multiplication by the inverse matrix W⁻¹, in order to obtain a progressive modeling of the synthesis signal Ŝn.
  • FIG. 14b shows the presence of switching devices, which are none other than the switching devices 24 and 28 shown in FIG. 12, controlled as a function of the actual bit rate of the data transmitted.
  • this adaptive filter makes it possible to improve the perceptual quality of the synthesis signal Ŝn obtained after the summation by the adder 24.
  • such a filter comprises, for example, a long-term post-filtering module denoted 250, followed by a short-term post-filtering module and an energy control module 252, which is controlled by a module 253 for calculating the scale factor.
  • the adaptive filter 25 delivers the filtered signal Ŝ'n, this signal corresponding to the signal in which the quantization noise introduced by the coder on the synthesized speech signal has been filtered out in the places of the spectrum where this is possible.
  • FIG. 15 corresponds to the publication of J.H. Chen and A. Gersho, "Real-Time Vector APC Speech Coding at 4800 bps with Adaptive Postfiltering", ICASSP 87, Vol. 3, pp. 2185-2188.
  • the coding system which is the object of the invention allows wideband coding at speech/data rates of 32/0 kbit/s, 24/8 kbit/s and 16/16 kbit/s.
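
As mentioned in the bullet on the Householder transformation module 140 above, the adaptive transformation step can be pictured as a Householder reflection that maps the normalized estimated perceptual vector onto the first coordinate axis before the fixed learned transform is used. The sketch below uses the textbook reflection-vector choice as an assumption, since relations 13 and 14 of the patent are not reproduced on this page; dimensions and signals are illustrative.

```python
import numpy as np

def householder_vector(x):
    """Vector B such that (I - 2 B B^T / B^T B) maps x onto the first coordinate axis."""
    e1 = np.zeros_like(x)
    e1[0] = 1.0
    sign = 1.0 if x[0] >= 0 else -1.0
    return x + sign * np.linalg.norm(x) * e1

def householder_apply(B, v):
    """Apply the reflection H = I - 2 B B^T / (B^T B) to v without forming H."""
    return v - 2.0 * (B @ v) / (B @ B) * B

rng = np.random.default_rng(6)
N = 40
p_hat = rng.standard_normal(N)            # estimated perceptual signal (waveform x gain)
p = rng.standard_normal(N)                # perceptual signal of the current sub-window
B = householder_vector(p_hat / np.linalg.norm(p_hat))
p_dd = householder_apply(B, p)            # transformed perceptual signal P''
# applying the same reflection to the stored learned basis adapts it to the current frame
```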
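
As mentioned in the bullet on the Gram-Schmidt orthogonalization above, the progressive modeling by orthogonal vectors (stages 16l) can be sketched as a multi-stage search in which, after each stage, the direction of the selected code vector is removed from the remaining dictionaries so that later selections are orthogonal to earlier ones. The dictionary sizes, the plain residual criterion and the direct (rather than recursive) gain computation below are illustrative assumptions, not the patent's exact procedure.

```python
import numpy as np

def progressive_orthogonal_modeling(g, dictionaries):
    """Select one vector per stage and orthogonalise the later dictionaries against it."""
    residual = g.astype(float).copy()
    dicts = [d.astype(float).copy() for d in dictionaries]
    selected = []
    for l, D in enumerate(dicts):
        energies = np.einsum('ij,ij->i', D, D)
        gains = (D @ residual) / np.maximum(energies, 1e-12)
        errors = np.sum((residual[None, :] - gains[:, None] * D) ** 2, axis=1)
        j = int(np.argmin(errors))                 # index j(l) of the selected vector
        gamma = gains[j]                           # gain associated with stage l
        selected.append((l, j, gamma))
        residual -= gamma * D[j]
        u = D[j] / np.linalg.norm(D[j])            # modified Gram-Schmidt step:
        for D_next in dicts[l + 1:]:               # remove u's direction from later stages
            D_next -= np.outer(D_next @ u, u)
    return selected, residual

rng = np.random.default_rng(7)
K = 16
g = rng.standard_normal(K)                                 # normalised gain vector GK
dicts = [rng.standard_normal((64, K)) for _ in range(3)]   # stochastic dictionaries 16l
selected, res = progressive_orthogonal_modeling(g, dicts)
```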

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Description

The present invention relates to a system for predictive coding-decoding of a digital speech signal by adaptive transform with nested codes.

In the predictive transform coders currently in use, this type of coder being represented in figure 1, the aim is to construct a synthetic signal Ŝn that resembles the digital speech signal to be coded Sn as closely as possible, resemblance being understood in the sense of a perceptual criterion.

The digital signal to be coded Sn, derived from an analog source speech signal, is subjected to a short-term prediction process (LPC analysis), the prediction coefficients being obtained by prediction of the speech signal over windows of M samples. The digital speech signal to be coded Sn is filtered by means of a perceptual weighting filter W(z) deduced from the aforementioned prediction coefficients, in order to obtain the perceptual signal pn.
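
As an illustration of this step, the sketch below derives LPC coefficients for one window and applies a perceptual weighting filter to it. The weighting-filter form W(z) = A(z)/A(z/γ) with γ = 0.8, the window length and the LPC order are common choices assumed here for illustration; the patent page does not give the exact form.

```python
import numpy as np
from scipy.signal import lfilter

def lpc_coefficients(frame, order=10):
    """Short-term (LPC) analysis by the Levinson-Durbin recursion on one window."""
    r = np.array([frame[:len(frame) - k] @ frame[k:] for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + a[1:i] @ r[i - 1:0:-1]
        k = -acc / err
        a_new = a.copy()
        a_new[1:i + 1] = a[1:i + 1] + k * a[i - 1::-1]
        a = a_new
        err *= 1.0 - k * k
    return a                                   # A(z) = 1 + a1 z^-1 + ... + a_order z^-order

def perceptual_weighting(a, gamma=0.8):
    """Assumed weighting W(z) = A(z) / A(z/gamma): numerator and denominator coefficients."""
    return a, a * gamma ** np.arange(len(a))

rng = np.random.default_rng(0)
frame = rng.standard_normal(320)               # one analysis window of M samples (stand-in)
a = lpc_coefficients(frame, order=10)
b_w, a_w = perceptual_weighting(a)
p = lfilter(b_w, a_w, frame)                   # perceptual signal p_n for this window
```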

A long-term prediction process then makes it possible to take into account the periodicity of the residue for voiced sounds, over all the sub-windows of N samples, N < M, in the form of a contribution p̂n, which is subtracted from the perceptual signal pn so as to obtain the signal p'n in the form of a vector P' ∈ R^N.
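
The long-term contribution p̂n can be pictured as an adaptive-codebook search over candidate pitch delays in the past excitation. The exhaustive delay search, the delay range and the least-squares gain below are illustrative assumptions, not the patent's exact procedure.

```python
import numpy as np

def delayed_excitation(past, q, n):
    """Past excitation delayed by q samples, periodically extended when q < n."""
    seg = past[len(past) - q:len(past) - q + min(q, n)]
    while len(seg) < n:                           # repeat the lag-q segment for short lags
        seg = np.concatenate([seg, seg[:n - len(seg)]])
    return seg[:n]

def long_term_prediction(p, past, n, lag_range=(20, 144)):
    """Search delay q and gain g0 minimising ||p - g0 * past(q)||^2 (adaptive codebook)."""
    best_q, best_g, best_err = lag_range[0], 0.0, np.inf
    for q in range(*lag_range):
        cand = delayed_excitation(past, q, n)
        energy = cand @ cand
        if energy <= 0.0:
            continue
        g0 = (p @ cand) / energy                  # least-squares gain for this delay
        err = np.sum((p - g0 * cand) ** 2)
        if err < best_err:
            best_q, best_g, best_err = q, g0, err
    return best_q, best_g

N = 40
rng = np.random.default_rng(1)
p = rng.standard_normal(N)                        # perceptual signal of one sub-window
past = rng.standard_normal(400)                   # modeled past excitation (stand-in)
q, g0 = long_term_prediction(p, past, N)
p_prime = p - g0 * delayed_excitation(past, q, N) # residual handed to the transform stage
```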

A transformation followed by a quantization is then carried out on the aforementioned vector P' in order to perform digital transmission. The inverse operations allow, after transmission, the modeling of the synthetic signal Ŝn.

In order to obtain good perceptual behaviour, according to the usual criteria established by experience, it is necessary to establish a process of transformation by an orthonormal transform F and of quantization of the vector P', in the presence of gain values G verifying well-determined properties, G = F^T·P', where F^T denotes the transpose of the matrix F.
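
In matrix form, this step amounts to computing the transform coefficients and reconstructing a truncated model. The minimal numpy check below only assumes that F has orthonormal columns; a random orthonormal basis stands in for the transform of the patent.

```python
import numpy as np

rng = np.random.default_rng(2)
N, K = 40, 8
F, _ = np.linalg.qr(rng.standard_normal((N, N)))   # stand-in orthonormal transform F
p_prime = rng.standard_normal(N)

G = F.T @ p_prime                                  # gain vector G = F^T . P'
p_model = F[:, :K] @ G[:K]                         # modeling truncated to order K
residual_energy = np.sum((p_prime - p_model) ** 2) # energy left after K components
```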

A first solution, proposed by G. Davidson and A. Gersho in the publication "Multiple-Stage Vector Excitation Coding of Speech Waveforms", ICASSP 88, Vol. 1, pp. 163-166, consists in using a non-singular transformation matrix V = HC, where H is a lower triangular matrix and C a non-singular dictionary constructed by learning, ensuring the invertibility of the transformation matrix V for any sub-window.

In order to be able to exploit certain decorrelation and ordering properties of the components of the vector of transform coefficients G during the quantization step, several solutions using orthonormal transforms have been proposed.

The Karhunen-Loève transform, obtained from the eigenvectors of the auto-correlation matrix of the training vectors (computed over the I vectors contained in the training corpus), maximizes the energy modeled by the first K basis vectors, where K is an integer, K ≤ N. It can be shown that the mean square error of the Karhunen-Loève transform is lower than that of any other transformation for a given modeling order K, this transform being, in this sense, optimal. This type of transform was introduced into an orthogonal-transform predictive coder by N. Moreau and P. Dymarski; see the publication "Successive Orthogonalisations in the Multistage CELP Coder", ICASSP 92, Vol. 1, pp. I-61 to I-64.
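
A sketch of building such a Karhunen-Loève basis from a training corpus: eigenvectors of the empirical auto-correlation matrix, sorted by decreasing eigenvalue. The corpus below is random data standing in for the real training vectors.

```python
import numpy as np

def klt_basis(training_vectors):
    """Eigenvectors of R = (1/I) * sum_i P'_i P'_i^T, sorted by decreasing eigenvalue."""
    I = len(training_vectors)
    R = sum(np.outer(v, v) for v in training_vectors) / I
    eigvals, eigvecs = np.linalg.eigh(R)           # ascending eigenvalues
    order = np.argsort(eigvals)[::-1]
    return eigvecs[:, order], eigvals[order]

rng = np.random.default_rng(3)
corpus = [rng.standard_normal(40) for _ in range(1000)]   # stand-in training corpus
F_klt, lam = klt_basis(corpus)
# keeping the first K columns of F_klt maximises the modeled energy for any K <= N
```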

However, in order to reduce the computational complexity of the gain vector G, it is possible to use suboptimal transforms, such as the Fast Fourier Transform (FFT), the Discrete Cosine Transform (DCT), the Discrete Hadamard Transform (DHT) or the Walsh-Hadamard Transform (DWHT), for example.
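
As an example of such a fixed suboptimal transform, an orthonormal DCT can play the role of F at much lower cost than a learned or signal-dependent basis; scipy is used here purely for illustration.

```python
import numpy as np
from scipy.fft import dct, idct

rng = np.random.default_rng(4)
p_prime = rng.standard_normal(40)
G = dct(p_prime, norm='ortho')          # orthonormal DCT plays the role of G = F^T . P'
p_back = idct(G, norm='ortho')          # the inverse transform recovers P' exactly
assert np.allclose(p_back, p_prime)
```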

Another method for constructing an orthonormal transform consists in computing the singular value decomposition of the lower triangular Toeplitz matrix H defined by H(i, j) = h(i - j) for i ≥ j and H(i, j) = 0 otherwise, in which h(n) is the impulse response of the short-term prediction filter 1/A(z) of the current window.

The matrix H can then be decomposed into a sum of rank-1 matrices through its singular value decomposition, H = U S V^T. The matrix U being unitary, it can be used as an orthonormal transform. Such a construction was proposed by B.S. Atal in the publication "A Model of LPC Excitation in Terms of Eigenvectors of the Autocorrelation Matrix of the Impulse Response of the LPC Filter", ICASSP 89, Vol. 1, pp. 45-48, and by E. Ofer in the publication "A Unified Framework for LPC Excitation Representation in Residual Speech Coders", ICASSP 89, Vol. 1, pp. 41-44.
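
This construction can be sketched as follows: build the lower triangular Toeplitz matrix from the impulse response, compute its singular value decomposition and use the unitary factor as the orthonormal transform. The impulse response below is random, standing in for that of 1/A(z).

```python
import numpy as np

def toeplitz_svd_transform(h):
    """Lower triangular Toeplitz matrix H with H[i, j] = h[i - j] for i >= j, and its SVD."""
    N = len(h)
    H = np.zeros((N, N))
    for i in range(N):
        H[i, :i + 1] = h[i::-1]                    # row i is h(i), h(i-1), ..., h(0)
    U, s, Vt = np.linalg.svd(H)                    # H = sum_k s_k u_k v_k^T (rank-1 terms)
    return H, U, s, Vt

rng = np.random.default_rng(5)
h = rng.standard_normal(40)                        # stand-in impulse response of 1/A(z)
H, U, s, Vt = toeplitz_svd_transform(h)
# U is unitary and can therefore serve as the orthonormal transform F
```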

Currently known encoders with nested codes allow data to be transmitted by stealing bits normally allocated to speech on the transmission channel, in a manner transparent to the encoder, which encodes the speech signal at the maximum bit rate.

Among this type of encoder, a 64 kbit/s encoder with a scalar quantizer and nested codes was standardized in 1986 as Recommendation G.722 of the CCITT. This encoder, operating on wideband speech (audio signal with a bandwidth of 50 Hz to 7 kHz, sampled at 16 kHz), is based on coding in two sub-bands, each containing an Adaptive Differential Pulse Code Modulation (ADPCM) encoder. This coding technique makes it possible to transmit wideband speech signals and, if necessary, data on a 64 kbit/s channel, at three different speech bit rates of 64, 56 and 48 kbit/s, with 0, 8 or 16 kbit/s for data.
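
The embedded (nested) code principle behind such bit stealing can be illustrated as follows: the decoder simply ignores the least significant bits of each code word when they have been replaced by data. The 6-bit code words and the number of stolen bits are purely illustrative and do not reflect the actual G.722 bit allocation.

```python
def steal_bits(codewords, n_stolen, data_bits):
    """Replace the n_stolen least significant bits of each speech code word by data bits."""
    out, it = [], iter(data_bits)
    for c in codewords:
        payload = 0
        for _ in range(n_stolen):
            payload = (payload << 1) | next(it, 0)
        out.append((c >> n_stolen << n_stolen) | payload)
    return out

def decode_embedded(codewords, n_stolen):
    """An embedded decoder simply ignores the (possibly stolen) least significant bits."""
    return [c >> n_stolen << n_stolen for c in codewords]

speech = [0b101101, 0b010011, 0b111000]            # illustrative 6-bit speech code words
tx = steal_bits(speech, 2, [1, 0, 1, 1, 0, 1])     # 2 bits per sample stolen for data
rx = decode_embedded(tx, 2)                        # speech decoded at the reduced rate
```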

In addition, in the context of the implementation of code-excited coders (CELP coders), M. Johnson and T. Tanigushi have described a multi-stage CELP coder with nested codes; see the publication by these authors entitled "Pitch Orthogonal Code-Excited LPC", Globecom 90, Vol. 1, pp. 542-546.

Finally, R. Drogo De Iacovo and D. Sereno have described a modified CELP-type coder making it possible to obtain nested codes by modeling the excitation signal of the LPC analysis filter as a sum of different contributions and using only the first of them to update the memory of the synthesis filter; see the publication by these authors, "Embedded CELP Coding For Variable Bit-Rate Between 6.4 and 9.6 kbit/s", ICASSP 91, Vol. 1, pp. 681-684.

The aforementioned prior-art predictive transform coders do not allow the transmission of data and therefore cannot fulfil the function of coders with nested codes. In addition, the coders with nested codes of the prior art do not use the orthonormal-transform technique, which does not allow optimal transform coding to be approached or reached.

The object of the present invention is to remedy the aforementioned drawback by implementing a system for predictive coding-decoding of a digital speech signal by adaptive transform with nested codes.

Another object of the present invention is the implementation of a predictive coding-decoding system for a digital speech and data signal allowing transmission at reduced and flexible bit rates.

The predictive coding system, which is an object of the present invention and which codes a digital signal into a digital signal with nested codes, in which the coded digital signal consists of a coded speech signal and, if necessary, an auxiliary data signal inserted into the coded speech signal after coding of the latter, comprises a perceptual weighting filter driven by a short-term prediction loop to generate a perceptual signal, a long-term prediction circuit delivering an estimated perceptual signal, this long-term prediction circuit forming a long-term prediction loop which delivers, from the perceptual signal and the estimated past excitation signal, a modeled perceptual excitation signal, and adaptive transform and quantization circuits which generate the coded speech signal from the perceptual excitation signal.

It is remarkable in that the perceptual weighting filter consists of a short-term prediction filter of the speech signal to be coded, so as to achieve a frequency distribution of the quantization noise, and in that it includes a circuit for subtracting the contribution of the past excitation signal from the perceptual signal in order to deliver an updated perceptual signal, the long-term prediction circuit being formed, in a closed loop, from a dictionary updated by the modeled past excitation corresponding to the lowest bit rate and delivering an optimal waveform and a gain associated with it, which constitute the estimated perceptual signal. The transform circuit is formed by an orthonormal transform module comprising an adaptive orthogonal transformation module and a progressive modeling module using orthogonal vectors. The progressive modeling module and the long-term prediction circuit deliver indices representative of the coded speech signal. An auxiliary data insertion circuit is coupled to the transmission channel.

The adaptive-transform predictive decoding system for a digital signal coded with nested codes, in which the coded digital signal consists of a coded speech signal and, where appropriate, an auxiliary data signal inserted into the coded speech signal after coding of the latter, is remarkable in that it comprises a data signal extraction circuit enabling, on the one hand, the extraction of the data for auxiliary use and, on the other hand, the transmission of indices representative of the coded speech signal. It further comprises a circuit for modeling the speech signal at the minimum bit rate and a circuit for modeling the speech signal at at least one bit rate greater than the minimum bit rate.

The predictive coding-decoding system for a digital speech signal by adaptive transform with nested codes which is the object of the present invention applies, in general, to the transmission of speech and data at flexible bit rates and, more specifically, to audio-visual conference protocols, videophones, speakerphone telephony, storage and transport of digital audio signals over long-distance links, transmission with mobiles and channel-concentration systems.

A more detailed description of the coding-decoding system which is the subject of the invention will be given below in relation to the drawings in which, in addition to FIG. 1 relating to the prior art concerning a predictive transform coder,
  • FIG. 2 represents a block diagram of the system for predictive coding of a speech signal by adaptive transform with embedded codes which is the subject of the present invention,
  • FIG. 3 represents a detail of an embodiment of a closed-loop long-term prediction module used in the coding system represented in FIG. 2,
  • FIGS. 4a and 4b represent a partial diagram of a predictive transform coder and a diagram equivalent to the partial diagram of FIG. 4a,
  • FIG. 5a represents a flow chart of an orthonormal transform process constructed by learning,
  • FIG. 5b represents two comparative diagrams of normalized gain values obtained by singular value decomposition and by learning respectively,
  • FIGS. 6a and 6b schematically represent the Householder transformation process applied to the perceptual signal,
  • FIG. 7 represents an adaptive transformation module implementing a Householder transformation,
  • FIG. 8a represents, for singular value decomposition and for construction by learning respectively, a normalized gain criterion as a function of the number of components of the gain vector,
  • FIG. 8b represents a block diagram of multistage vector quantization in which the gain vector Ĝ is obtained by linear combination of vectors taken from stochastic dictionaries,
  • FIG. 9 is a geometric representation of the search for the gain vector G in a subspace of vectors taken from stochastic dictionaries,
  • FIGS. 10a and 10b represent the block diagram of a process of vector quantization of the gain by orthogonal progressive modeling, corresponding to an optimal projection of this gain vector represented in FIG. 9, in the case of a single stochastic dictionary and of several stochastic dictionaries respectively,
  • FIG. 11 represents an embodiment of the modeling of the excitation of the synthesis filter corresponding to the lowest bit rate,
  • FIG. 12 represents a block diagram of a system for predictive decoding of a speech signal by adaptive transform with embedded codes which is the subject of the present invention,
  • FIG. 13a represents a block diagram of a module for modeling the speech signal at the minimum bit rate,
  • FIG. 13b represents an embodiment of an inverse orthonormal transformation module,
  • FIG. 14a represents a diagram of a module for modeling the speech signal at bit rates other than the minimum bit rate,
  • FIG. 14b represents a diagram equivalent to the modeling module represented in FIG. 14a,
  • FIG. 15 represents the implementation of an adaptive post-filtering filter intended to improve the perceptual quality of the synthetic speech signal Ŝn.

A more detailed description of a system for predictive coding of a digital speech signal by adaptive transform into a digital signal with embedded codes will now be given in connection with FIG. 2 and the following figures.

In general, it is considered that the digital signal coded by implementing the coding system which is the subject of the present invention consists of a coded speech signal and, where appropriate, of an auxiliary data signal inserted into the coded speech signal after coding of this digital speech signal.

Of course, the coding system which is the subject of the present invention can comprise, downstream of a transducer delivering the analog speech signal, an analog-to-digital converter and an input storage circuit, or input buffer, making it possible to deliver the digital signal to be coded Sn.

The coding system which is the subject of the present invention also comprises a perceptual weighting filter 11 driven by a short-term prediction loop, making it possible to generate a perceptual signal.

It also comprises a long-term prediction circuit, denoted 13, delivering an estimated perceptual signal, which is denoted P̂^1_n.

The long-term prediction circuit 13 forms a long-term prediction loop making it possible to deliver, from the perceptual signal and from the estimated past excitation signal, denoted P̂^0_n, a modeled perceptual excitation signal.

The coding system which is the subject of the invention, as represented in FIG. 2, further comprises an adaptive transform and quantization circuit making it possible, from the perceptual excitation signal Pn, to generate the coded speech signal, as will be described below in the description.

According to a first particularly advantageous aspect of the coding system which is the subject of the present invention, the perceptual weighting filter 11 consists of a short-term prediction filter of the speech signal to be coded, so as to achieve a frequency distribution of the quantization noise. The perceptual weighting filter 11 delivering the perceptual signal, the coding device according to the invention comprises, as represented in the same FIG. 2, a circuit 120 for subtracting the contribution of the past excitation signal P̂^0_n from the perceptual signal in order to deliver an updated perceptual signal, this updated perceptual signal being denoted Pn.

According to another particularly advantageous characteristic of the coding device which is the subject of the present invention, the long-term prediction circuit 13 is formed in a closed loop from a dictionary updated by the modeled past excitation corresponding to the lowest bit rate, this dictionary making it possible to deliver an optimal waveform and an estimated gain associated with it. In FIG. 2, the modeled past excitation corresponding to the lowest bit rate is denoted r̂^1_n. It is further indicated that the optimal waveform and the estimated gain associated with it constitute the estimated perceptual signal P̂^1_n delivered by the long-term prediction circuit 13.

According to another characteristic of the coding system which is the subject of the present invention, as represented in FIG. 2, the transform module circuit, denoted MT, is formed by an orthonormal transform module 14, comprising an adaptive orthogonal transformation module proper and a progressive modeling module using orthogonal vectors, denoted 16.

In accordance with a particularly advantageous aspect of the coding system which is the subject of the present invention, the progressive modeling module 16 and the long-term prediction circuit 13 make it possible to deliver indexes representative of the coded speech signal, these indexes being denoted i(0), j(0) and i(l), j(l) respectively, with l ∈ [1,L] in FIG. 2.

Finally, the coding system according to the invention further comprises an auxiliary data insertion circuit 19 coupled to the transmission channel, denoted 18.

The operation of the coding device which is the subject of the present invention can be illustrated as follows.

As indicated previously, the aim is to reconstruct a synthetic signal Ŝn resembling, perceptually, as closely as possible the digital signal to be coded Sn.

The synthetic signal Ŝn is of course the signal reconstructed on reception, that is to say at the decoding level after transmission, as will be described later in the description.

A short-term prediction analysis, performed by the analysis circuit 10 of the LPC ("Linear Predictive Coding") type and by the perceptual weighting filter 11, is carried out on the digital signal to be coded by a conventional prediction technique over windows comprising, for example, M samples. The analysis circuit 10 then delivers the coefficients ai, the aforementioned coefficients ai being the linear prediction coefficients.
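
Purely by way of illustration, the short-term (LPC) analysis over an M-sample window can be sketched as follows; the autocorrelation method, the 160-sample window and the order of 10 are assumptions chosen for the example, not values taken from the patent.

```python
import numpy as np
from scipy.linalg import solve_toeplitz

def lpc_coefficients(frame, order=10):
    """Short-term (LPC) analysis of one window of M samples.

    Returns linear prediction coefficients a_i such that
    A(z) = 1 - a_1 z^-1 - ... - a_p z^-p (sign convention assumed)."""
    # Autocorrelation of the windowed frame, lags 0..order
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:][:order + 1]
    # Solve the Yule-Walker (normal) equations R a = r
    return solve_toeplitz((r[:order], r[:order]), r[1:order + 1])

# Example on a synthetic M-sample window
M = 160
frame = np.random.randn(M) * np.hanning(M)
a_i = lpc_coefficients(frame, order=10)
```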

The speech signal to be coded Sn is then filtered by the perceptual weighting filter 11 of transfer function W(z), which makes it possible to deliver the perceptual signal proper.

The coefficients of the perceptual weighting filter are obtained from a short-term prediction analysis on the first correlation coefficients of the sequence of the coefficients ai of the analysis filter A(z) of circuit 10 for the current window. This operation makes it possible to achieve a good frequency distribution of the quantization noise. Indeed, the perceptual signal delivered tolerates greater coding noise in high-energy zones, where the noise is less audible because it is frequency-masked by the signal. It is indicated that the perceptual filtering operation is broken down into two steps, the digital signal to be coded Sn being filtered a first time by the filter constituted by the analysis circuit 10, in order to obtain the residual to be modeled, then a second time by the perceptual weighting filter 11 in order to deliver the perceptual signal.
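
The two-step filtering described above can be sketched as follows; how the W(z) coefficients are actually derived from the ai coefficients is left abstract here (the w_num and w_den arrays are placeholders), so this is only a hedged illustration of the signal path.

```python
import numpy as np
from scipy.signal import lfilter

def perceptual_filtering(s, a, w_num, w_den):
    """Two-step perceptual filtering of one window.

    s            : digital speech window Sn
    a            : LPC coefficients a_i of the analysis filter A(z)
    w_num, w_den : numerator / denominator of the weighting filter W(z)
                   (their derivation from the a_i is not reproduced here)."""
    A = np.concatenate(([1.0], -np.asarray(a)))   # A(z), sign convention assumed
    residual = lfilter(A, [1.0], s)               # first pass: analysis filter (circuit 10)
    perceptual = lfilter(w_num, w_den, residual)  # second pass: weighting filter 11
    return perceptual
```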

In the operating process of the coding device which is the subject of the present invention, the second operation then consists in removing the contribution of the past excitation, or estimated past excitation signal, denoted P̂^0_n, from the aforementioned perceptual signal.

Indeed, it can be shown that:

Figure 00110001

In this relation, hn is the impulse response of the double filtering performed by circuit 10 and by the perceptual weighting filter 11 in the current window, and r̂^1_n is the modeled past excitation corresponding to the lowest bit rate, as will be described later in the description.

The operating mode of the closed-loop long-term prediction circuit 13 is then as follows. This circuit makes it possible to take into account the periodicity of the residual for voiced sounds, this long-term prediction being carried out for every sub-window of N samples, as will be described in connection with FIG. 3.

The closed-loop long-term prediction circuit 13 comprises a first stage constituted by an adaptive dictionary 130, which is updated at every one of the aforementioned sub-windows by the modeled excitation denoted r̂^1_n delivered by the module 17, which will be described later in the description. The adaptive dictionary 130 makes it possible to minimize the error, denoted

Figure 00120001

with respect to the two parameters g0 and q.

Such an operation corresponds, in the frequency domain, to filtering by the filter of transfer function: B(z) = 1 / (1 − g0·z^(−q)).
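
As a minimal illustration of this long-term filter (the gain g0 and lag q values below are arbitrary, not taken from the patent), B(z) = 1/(1 − g0·z^(−q)) behaves as a one-tap IIR comb filter:

```python
import numpy as np
from scipy.signal import lfilter

g0, q = 0.8, 40                              # arbitrary long-term gain and lag
den = np.zeros(q + 1)
den[0], den[q] = 1.0, -g0                    # denominator 1 - g0 z^-q
excitation = np.random.randn(160)
filtered = lfilter([1.0], den, excitation)   # y[n] = x[n] + g0 * y[n-q]
```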

This operation is equivalent to searching for the optimal waveform, denoted fj(0), and its associated gain g0 in a judiciously constructed dictionary. See the article published by R. Rose and T. Barnwell, entitled "Design and Performance of an Analysis by Synthesis Class of Predictive Speech Coders", IEEE Trans. on Acoustic Speech Signal Processing, September 1990.

The waveform of index j, denoted C^j_n = r̂^1_(n−q), taken from the adaptive dictionary, is filtered by a filter 131 and corresponds to the excitation modeled at the lowest bit rate, r̂^1_n, delayed by q samples by the aforementioned filter. The optimal waveform f^1_n is delivered by the filtered adaptive dictionary 133.

A module 132 for calculating and quantizing the prediction gain makes it possible, from the perceptual signal Pn and from the set of waveforms f^j(0)_n, to carry out a calculation and quantization of the prediction gain, and to deliver an index i(0) representative of the number of the quantization range, as well as its associated quantized gain g(0).

A multiplier circuit 134 delivers, from the filtered adaptive dictionary 133, that is to say from the result of filtering the waveform of index j, C^j_n, namely f^j_n, and from the associated quantized gain g(0), the modeled and perceptually filtered long-term prediction excitation, denoted P̂^1_n.

A subtractor circuit 135 then makes it possible to perform a minimization relating to en = |Pn − P̂^1_n|, this expression representing the error signal. A module 136 makes it possible to calculate the Euclidean norm |en|².

A module 137 makes it possible to search for the optimal waveform corresponding to the minimum value of the aforementioned Euclidean norm and to deliver the index j(0). The parameters transmitted by the coding system which is the subject of the invention for the modeling of the long-term prediction signal are then the index j(0) of the optimal waveform fj(0), as well as the number i(0) of the quantization range of its associated quantized gain g(0).
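
The closed-loop search performed by modules 130 to 137 can be sketched as follows; the candidate lags, the scalar gain quantizer and the assumption that each lag q is at least one sub-window long are simplifications introduced for this example.

```python
import numpy as np
from scipy.signal import lfilter

def ltp_search(P, past_excitation, h, lags, gain_levels):
    """Closed-loop long-term prediction: select the lag q (waveform index j)
    and the quantized gain minimizing |P - g * f_j|^2."""
    N = len(P)
    best = (0, 0, np.inf)
    for j, q in enumerate(lags):
        start = len(past_excitation) - q
        c_j = past_excitation[start:start + N]   # C^j_n = r^1_{n-q}, assumes q >= N
        f_j = lfilter(h, [1.0], c_j)             # filtered waveform (filter 131 / dictionary 133)
        g_opt = np.dot(P, f_j) / max(np.dot(f_j, f_j), 1e-12)
        i0 = int(np.argmin(np.abs(np.asarray(gain_levels) - g_opt)))  # gain quantization (module 132)
        e = P - gain_levels[i0] * f_j            # error signal (modules 134, 135)
        err = float(np.dot(e, e))                # Euclidean norm (module 136)
        if err < best[2]:
            best = (j, i0, err)
    return best[0], best[1]                      # indexes j(0) and i(0) (module 137)
```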

A more detailed description of the adaptive orthogonal transformation module MT of FIG. 2 will be given in connection with FIGS. 4a and 4b.

In the context of the implementation of the system for predictive coding by orthonormal transform which is the subject of the present invention, the method used for the construction of this transform corresponds to that proposed by B.S. Atal and E. Ofer, as mentioned previously in the description.

In accordance with the embodiment of the coding system according to the present invention, this consists in decomposing, not the short-term prediction filtering matrix, but the perceptual weighting matrix W, formed by a lower triangular Toeplitz matrix defined by relation (4):

$$ W = \begin{pmatrix} w(0) & 0 & \cdots & 0 \\ w(1) & w(0) & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ w(N-1) & w(N-2) & \cdots & w(0) \end{pmatrix} $$

In this relation, w(n) denotes the impulse response of the perceptual weighting filter W(z) for the previously mentioned current window.
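
A minimal sketch of building this lower triangular Toeplitz matrix from the impulse response w(n), assuming numpy and scipy are available:

```python
import numpy as np
from scipy.linalg import toeplitz

def weighting_matrix(w):
    """Lower triangular Toeplitz matrix W built from the impulse response
    w(n) of the perceptual weighting filter (relation (4))."""
    first_column = np.asarray(w, dtype=float)      # w(0), w(1), ..., w(N-1)
    first_row = np.zeros_like(first_column)        # zeros above the diagonal
    first_row[0] = first_column[0]
    return toeplitz(first_column, first_row)
```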

FIG. 4a shows the partial diagram of a predictive transform coder and FIG. 4b the corresponding equivalent diagram, in which the matrix, or perceptual weighting filter, W, designated by 140, has been made explicit, an inverse perceptual weighting filter 121 having on the other hand been inserted between the long-term prediction module 13 and the subtractor circuit 120. It is indicated that the filter 140 performs a linear combination of the basis vectors obtained from a singular value decomposition of the matrix representative of the perceptual weighting filter W.

As represented in FIG. 4b, the signal S', corresponding to the speech signal to be coded Sn from which the contribution of the past excitation delivered by the module 12 has been subtracted, as well as that of the long-term prediction P̂^1_n filtered by an inverse perceptual weighting module of transfer function (W(z))^(-1), is filtered by the perceptual weighting filter of transfer function W(z), so as to obtain the vector P'.

This filtering operation is written P' = W·S' and can be expressed in the form of a linear combination of basis vectors, using the singular value decomposition of the matrix W.

As regards the embodiment of the perceptual weighting filter 140, it is indicated that, for any matrix W representative of the perceptual weighting filter, it comprises a first matrix module U = (U1, ..., UN) and a second matrix module V = (V1, ..., VN).

The first and second matrix modules satisfy the relation U^T·W·V = D, a relation in which:
  • U^T denotes the transposed matrix module of the module U,
  • D is a diagonal matrix module whose coefficients constitute the aforementioned singular values,
  • Ui and Vj respectively denote the i-th left singular vector and the j-th right singular vector, the right singular vectors {Vj} forming an orthonormal basis.

Such a decomposition makes it possible to replace the filtering operation by convolution product with a filtering operation by linear combination.

It is indicated that the singular value decomposition of the perceptual filtering matrix W makes it possible to obtain the two unitary matrices U and V satisfying the aforementioned relation, where U^T·W·V = diag(d1, ..., dN), with the ordering property di ≥ di+1 > 0. The elements di are called singular values, and the vectors Ui and Vj the i-th left singular vector and the j-th right singular vector respectively.

The matrix W is then decomposed into a sum of rank-1 matrices and satisfies the relation:

$$ W = \sum_{i=1}^{N} d_i\, U_i V_i^T $$
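
A small numerical check of this decomposition (the matrix below is a random stand-in, not an actual weighting matrix):

```python
import numpy as np

N = 8
w = np.random.randn(N)                        # stand-in impulse response w(n)
W = np.array([[w[i - j] if i >= j else 0.0 for j in range(N)] for i in range(N)])

U, d, Vt = np.linalg.svd(W)                   # U^T W V = diag(d1,...,dN), d1 >= ... >= dN
W_rank1_sum = sum(d[i] * np.outer(U[:, i], Vt[i, :]) for i in range(N))
assert np.allclose(W, W_rank1_sum)            # W = sum_i d_i U_i V_i^T
```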

The matrix V being unitary, the right singular vectors {Vi} form an orthonormal basis, and the signal S', expressed in this basis in the form

$$ S' = \sum_{k=1}^{N} s'(k)\, V_k , $$

makes it possible to obtain the vector P' satisfying the relation

$$ P' = \sum_{k=1}^{N} g(k)\, U_k \qquad \text{with } g(k) = s'(k)\, d_k , $$

where s'(k) denotes the k-th coordinate of S' on the basis {Vk}.

Through the singular value decomposition process, it is indicated that a change in a component of the excitation S' associated with a small singular value produces a small change at the output of the filter 140, and vice versa for the inverse perceptual filtering operation performed by the module 121.

In order to use these properties, the unitary matrix U can be used as an orthonormal transform satisfying the relation F = [f^1_orth, ..., f^N_orth], that is to say f^i_orth = Ui for i = 1 to N.

The weighted perceptual signal P' is then decomposed as follows: G = U^T·P'.

After vector quantization of the gains G, the modeled weighted perceptual signal P̂' is calculated in the following manner: P̂' = F·Ĝ = U·Ĝ.

It is indicated that the left singular vectors associated with the largest singular values play a preponderant role in the modeling of the weighted perceptual signal P'. Thus, in order to model the latter, it is possible to keep only the components associated with the K largest singular values, K < N, that is to say the K first components of the gain vector G, satisfying the relation G = (g1, g2, ..., gK, 0, ..., 0).
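
A short sketch of this truncation of the gain vector to its K first components (all names and sizes below are hypothetical):

```python
import numpy as np

N, K = 8, 4                                  # keep the K largest singular values, K < N
W = np.tril(np.random.randn(N, N))           # stand-in perceptual weighting matrix
U, d, Vt = np.linalg.svd(W)

S_prime = np.random.randn(N)                 # excitation signal S'
P_prime = W @ S_prime                        # P' = W S'
G = U.T @ P_prime                            # G = U^T P'
G_K = np.concatenate((G[:K], np.zeros(N - K)))   # G = (g1, ..., gK, 0, ..., 0)
P_hat = U @ G_K                              # modeled weighted perceptual signal
```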

The short-term analysis filtering circuit 10 being updated over windows of M samples, the singular value decomposition of the perceptual weighting matrix W is performed at the same frequency.

Singular value decomposition processes for an arbitrary matrix allowing fast processing have been developed, but the computations remain relatively complex.

In accordance with an object of the present invention, in order to simplify the aforementioned processing operations, it is proposed to construct a fixed, sub-optimal orthonormal transform nevertheless possessing good perceptual properties, whatever the current window.

In a first embodiment, as represented in FIG. 5a, the orthonormal transform process is constructed by learning. In such a case, the orthonormal transform module can be formed by a stochastic transform sub-module constructed by drawing a Gaussian random variable for the initialization, this sub-module comprising, in FIG. 5a, the process steps 1000, 1001, 1002 and 1003 and being denoted SMTS. Step 1002 can consist in applying the K-means algorithm to the aforementioned vector corpus.

The sub-module SMTS is successively followed by a module 1004 for constructing the centers, by a module 1005 for constructing the classes and, in order to obtain a vector G whose components are relatively ordered, by a module 1006 for reordering the transform according to the cardinality of each class.

The aforementioned module 1006 is followed by a Gram-Schmidt computation module, denoted 1007a, so as to obtain an orthonormal transform. With the aforementioned module 1007a is associated a module 1007b for computing the error under the conventional conditions of implementation of the Gram-Schmidt processing.

The module 1007a is itself followed by a module 1008 testing the number of iterations, in order to obtain an orthonormal transform computed off-line by learning. Finally, the memory 1009, of read-only memory type, makes it possible to store the orthonormal transform in the form of transform vectors. It is indicated that the relative ordering of the components of the gain vector G is accentuated by the orthogonalization process. When the construction process by learning has converged, an orthonormal transform is obtained whose waveforms are gradually correlated with the learning corpus of vectors delivered by the initial transform step 1001.
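
A very rough sketch of this off-line construction, using scikit-learn's K-means as a stand-in for steps 1000 to 1008; the stopping rule and the exact roles of modules 1004 to 1007b are simplified here, so this is an interpretation rather than the patent's procedure.

```python
import numpy as np
from sklearn.cluster import KMeans

def learn_orthonormal_transform(corpus, N, iterations=10):
    """Build an N x N orthonormal transform F by learning (off-line).

    corpus : array of perceptual vectors of dimension N (rows)."""
    rng = np.random.default_rng(0)
    F = rng.standard_normal((N, N))                  # Gaussian initialization (step 1000)
    for _ in range(iterations):                      # iteration test (module 1008)
        km = KMeans(n_clusters=N, init=F, n_init=1, max_iter=20).fit(corpus)
        counts = np.bincount(km.labels_, minlength=N)
        order = np.argsort(-counts)                  # reorder by class cardinality (module 1006)
        centers = km.cluster_centers_[order]         # centers and classes (modules 1004, 1005)
        Q, _ = np.linalg.qr(centers.T)               # Gram-Schmidt orthonormalization (module 1007a)
        F = Q.T                                      # waveforms stored as rows
    return F                                         # stored in the read-only memory 1009
```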

FIG. 5b represents the ordering of the components of the gain vector G, that is to say of the normalized mean value of G, for a transform obtained on the one hand by singular value decomposition of the perceptual weighting matrix W and on the other hand by learning. The transform F obtained by this latter method yields orthonormal waveforms whose frequency spectra are band-pass and relatively ordered as a function of k, which makes it possible to attribute pseudo-frequency properties to this transform. An evaluation of the quality of the transformation in terms of energy concentration has shown that, by way of indication, over a corpus of 38,000 perceptual vectors P', the transformation gain is 10.35 decibels for the optimal Karhunen-Loeve transform and 10.29 decibels for a transform constructed by learning, the latter therefore tending towards the optimal transform in terms of energy concentration.

As previously mentioned in the description, the orthonormal transform F can be obtained according to two different methods.

Observing that, in general, the waveform most correlated with the perceptual signal P is the one taken from the adaptive dictionary, it is possible to envisage producing an adaptive orthonormal transform F' whose first waveform f'^1_orth is equal to the normalized optimal waveform fj(0) taken from the adaptive dictionary, the first component of the gain vector G then being equal to the normalized long-term prediction gain g(0), which it is not necessary to recompute, since it was quantized during this prediction.

The new dimension of the gain vector G then becomes equal to N−1, which makes it possible to increase the number of bits per sample during its vector quantization and therefore the quality of its modeling.

A first solution for computing the transform F' can then consist in performing a long-term prediction analysis, shifting the transform obtained by learning by one position, placing the long-term predictor in the first position, then applying the Gram-Schmidt algorithm in order to obtain a new transform F'.

A second, more advantageous solution consists in using a transformation making it possible to rotate the orthonormal basis so that the first waveform coincides with the long-term predictor, that is to say: F' = T·F with f'^1_orth = fj(0), i.e. fj(0) = T·f^1_orth.

In order to preserve the orthogonality property, the transformation used must preserve the scalar product. A particularly suitable transformation is the Householder transform satisfying the relation:

$$ T = I - 2\,\frac{B B^T}{B^T B} \qquad \text{with } B = f_{j(0)} - |f_{j(0)}|\, f^1_{orth} . $$
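
A minimal sketch of this rotation (hypothetical unit-norm vectors; the assert only checks that the first waveform of the rotated basis coincides with fj(0), and B is assumed nonzero):

```python
import numpy as np

def householder_rotation(f1_orth, f_j0):
    """Householder matrix T mapping the first basis waveform f1_orth onto the
    direction of the optimal waveform f_j0 (T = I - 2 B B^T / B^T B)."""
    B = f_j0 - np.linalg.norm(f_j0) * f1_orth
    return np.eye(len(B)) - 2.0 * np.outer(B, B) / np.dot(B, B)

N = 8
F = np.linalg.qr(np.random.randn(N, N))[0].T      # some orthonormal transform, rows = waveforms
f_j0 = np.random.randn(N)
f_j0 /= np.linalg.norm(f_j0)                      # normalized optimal LTP waveform

T = householder_rotation(F[0], f_j0)
F_prime = F @ T.T                                 # rotated basis F' = T F (rows transformed)
assert np.allclose(F_prime[0], f_j0)              # first waveform now coincides with f_j0
```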

A geometric representation of the aforementioned transform is given in FIGS. 6a and 6b.

For a more detailed definition of this type of transformation, reference may usefully be made to the publication by Alan O. Steinhardt entitled "Householder Transforms in Signal Processing", IEEE ASSP Magazine, July 1988, pp. 4-12.

Through the use of this transformation, it is possible to reduce the complexity of the computations, and the projection of the perceptual signal P onto this new basis is written: G = F'^T·P = F^T·T·P = F^T·P'', with P'' = T·P = P − B·(w·B^T·P).

In this relation, w denotes a scalar equal to w = 2/(B^T·B).

It is indicated that, in this embodiment of the orthonormal transform, the transformation is applied only to the perceptual signal P, and the modeled perceptual signal P̂ can then be calculated by the inverse transformation.
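
A self-contained sketch of this reduced-complexity path, applying T only to the perceptual signal and recovering it through the inverse transformation (quantization of G is omitted and all names are hypothetical):

```python
import numpy as np

N = 8
F = np.linalg.qr(np.random.randn(N, N))[0].T     # orthonormal transform, rows = waveforms f_orth
f_j0 = np.random.randn(N)
f_j0 /= np.linalg.norm(f_j0)                     # normalized optimal LTP waveform
P = np.random.randn(N)                           # perceptual signal of the current sub-window

B = f_j0 - np.linalg.norm(f_j0) * F[0]           # Householder vector B
w = 2.0 / np.dot(B, B)                           # scalar w = 2 / B^T B
P_second = P - B * (w * np.dot(B, P))            # P'' = T P, computed in O(N), no matrix formed
G = F @ P_second                                 # G = F^T T P = F^T P''

X = F.T @ G                                      # F G in the fixed basis (quantization omitted)
P_hat = X - B * (w * np.dot(B, X))               # apply T again (T is its own inverse): P̂
assert np.allclose(P_hat, P)                     # without quantization the signal is recovered
```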

A particularly advantageous embodiment of the orthonormal transform module proper 14, in the case where a Householder transformation is used, will now be described in connection with FIG. 7.

As represented in the aforementioned FIG. 7, the adaptive transformation module 14 can comprise a Householder transformation module 140 receiving the estimated perceptual signal, constituted by the optimal waveform and by the estimated gain, and the perceptual signal P, in order to generate a transformed perceptual signal P''. It is indicated that the Householder transformation module 140 comprises a module 1401 for computing the parameters B and wB as defined previously by relation 13. It also comprises a module 1402, comprising a multiplier and a subtractor, making it possible to carry out the transformation proper according to relation 14. It is indicated that the transformed perceptual signal P'' is delivered in the form of a transformed perceptual signal vector with components P''k, with k ∈ [0, N−1].

The adaptive transformation module 14 as represented in FIG. 7 also comprises a plurality N of registers for storing the orthonormal waveforms, the current register being denoted r, with r ∈ [1, N]. It is indicated that the aforementioned N storage registers form the read-only memory previously described in the description, each register comprising N storage cells, each component of rank k of each vector, component denoted f^r_orth(k), being stored in a cell of corresponding rank of the current register r considered.

In addition, as will be observed in FIG. 7, the module 14 comprises a plurality of N multiplier circuits associated with each register of rank r forming the previously mentioned plurality of storage registers. Furthermore, each multiplier circuit of rank k receives, on the one hand, the component of rank k of the stored vector and, on the other hand, the component P''k of the transformed perceptual signal vector of corresponding rank k. The multiplier circuit Mrk delivers the product P''k·f^r_orth(k) of the transformed perceptual signal components.

Finally, a plurality of N−1 summing circuits is associated with each register of rank r, each summing circuit of rank k, denoted Srk, receiving the product of previous rank k−1 and the product of corresponding rank k delivered by the multiplier circuit Mrk of the same rank k. The summing circuit of highest rank, Sr,N−1, then delivers a component g(r) of the estimated gain expressed in the form of a gain vector G.

It is indicated that the predictive coding system using the adaptive orthonormal transform constructed by learning is likely to give better results, whereas the Householder transformation makes it possible to obtain reduced complexity.

As will be observed in FIG. 2, the progressive modeling module using orthogonal vectors in fact comprises a module 15 for normalizing the gain vector in order to generate a normalized gain vector, denoted GK, by comparing the norm of the gain vector G with a threshold value. This normalization module 15 furthermore makes it possible to send, as a function of the modeling order K, a signal giving the length of the normalized gain vector, linked to this modeling order, towards the decoder system.

The progressive modeling module using orthogonal vectors furthermore comprises, in cascade with the gain vector normalization module 15, a stage 16 of progressive modeling by orthogonal vectors. This modeling stage 16 receives the normalized vector GK and delivers the indexes representative of the coded speech signal, these indexes being denoted i(l), j(l) and being representative of the selected vectors and of their associated gain. The transmission of the auxiliary data is carried out by overwriting the parts of the frame allocated to the indexes and range numbers, so as to form the auxiliary data signal.

The operation of the normalization module 15 is as follows.

The energy of the perceptual signal, given by |P'|² = |G|², is constant for a given sub-window. Under these conditions, maximizing this energy is equivalent to minimizing the expression:

Figure 00220001

where GK = (0, g2, g3, ..., gK, 0, ..., 0).

It is indicated that, during such an operation, an additional way of increasing the number of bits per sample during the vector quantization of the vector G is to use the following normalized criterion, which consists in choosing K such that:

Figure 00220002

The gain vector GK thus obtained is then quantized, and its length K is transmitted by the coding system which is the subject of the invention in order to be taken into account by the corresponding decoding system, as will be described later in the description.
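
The normalized criterion itself appears only as an image in the source; the sketch below assumes it compares the residual energy |G − GK|²/|G|² to a threshold, which is one plausible reading rather than the patent's exact formula.

```python
import numpy as np

def choose_modeling_order(G, threshold=0.05):
    """Pick the smallest K such that the truncated vector G_K keeps enough of
    the energy of G (assumed form of the normalized criterion)."""
    total = np.dot(G, G)
    for K in range(1, len(G) + 1):
        G_K = np.concatenate((G[:K], np.zeros(len(G) - K)))
        if np.dot(G - G_K, G - G_K) / total <= threshold:
            return K, G_K
    return len(G), np.array(G, dtype=float)
```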

The average normalized criterion as a function of the modeling order K is given in FIG. 8a, for an orthonormal transform obtained on the one hand by singular value decomposition of the perceptual weighting matrix W and on the other hand by learning.

A particularly advantageous embodiment of the progressive modeling module using orthogonal vectors 16 will now be given in connection with FIG. 8b. The aforementioned module in fact performs a multistage vector quantization.

The gain vector Ĝ is obtained by a linear combination of vectors, denoted Ψ^j_K = (0, Ψ^j_2, Ψ^j_3, ..., Ψ^j_K, 0, 0, ..., 0).

These vectors come from stochastic dictionaries, denoted 16.1, 16.2, ..., 16.L, constructed either by drawing a Gaussian random variable or by learning. The estimated gain vector Ĝ satisfies the relation:

Ĝ = Σ_{l=1}^{L} λ_l Ψ^{j(l)}_K

In this relation, λ_l is the gain associated with the optimal vector Ψ^{j(l)}_K coming from the stochastic dictionary of rank l, denoted 16.l.

However, the iteratively selected vectors are generally not linearly independent and therefore do not form a basis. In such a case, the subspace spanned by the L optimal vectors Ψ^{j(l)}_K has dimension less than L.

Figure 9 shows the projection of the vector G onto the subspace spanned by the optimal vectors of rank l and of rank l−1 respectively, this projection being optimal when the aforementioned vectors are orthogonal.

It is therefore particularly advantageous to orthogonalize the stochastic dictionary of rank l with respect to the optimal vector of the stage of previous rank, Ψ^{j(l-1)}_K.

Thus, whatever the optimal vector of rank l selected from the new dictionary, or stage of corresponding rank l, it will be orthogonal to the optimal vector Ψ^{j(l-1)}_K of previous rank, and one obtains:

Ψ^j_orth(l+1) = Ψ^j_orth(l) − r(l,j) · Ψ^{j(l)}_orth(l) / √(α^{j(l)}_l)

In this relation:

α^{j(l)}_l = ‖Ψ^{j(l)}_orth(l)‖^2 corresponds to the energy of the waveform selected at step l,

r(l,j) = ⟨ Ψ^{j(l)}_orth(l) / √(α^{j(l)}_l) , Ψ^j_orth(l) ⟩ represents the cross-correlation between the optimal vectors of index j and of index j(l), and

T_K = I_N − (Ψ^{j(l)}_orth(l) / √(α^{j(l)}_l)) · (Ψ^{j(l)}_orth(l) / √(α^{j(l)}_l))^T represents the orthogonalization matrix.

The above operation removes from the dictionary the contribution of the previously selected waveform, and thus imposes linear independence of any optimal vector of rank i, with i between l+1 and L, with respect to the optimal vectors of lower rank.
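
To make the orthogonalization step concrete, the short sketch below (the function and variable names are assumptions) applies the projection I_N − u·u^T, with u the normalized selected vector, to every codevector of the stage dictionary, which is exactly the removal of the previously selected contribution described above.

```python
import numpy as np

def orthogonalize_dictionary(codebook, selected):
    """Project every codevector of a stochastic dictionary onto the
    orthogonal complement of the previously selected vector, so that any
    vector chosen at the next stage is orthogonal to it.
    `codebook` is an (M, N) array of M codevectors; `selected` has length N."""
    u = selected / np.linalg.norm(selected)      # normalized selected waveform
    return codebook - np.outer(codebook @ u, u)  # psi <- psi - <psi, u> u
```

After this projection, every row of the returned array has (numerically) zero inner product with `selected`.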

Block diagrams of vector quantization by orthogonal progressive modelling are given in Figures 10a and 10b, according to whether one or several stochastic dictionaries are used.

In order to reduce the complexity of the vector quantization process, the recursive modified Gram-Schmidt algorithm may be used, as proposed by N. Moreau, P. Dymarski and A. Vigier in the publication entitled "Optimal and Suboptimal Algorithms for Selecting the Excitation in Linear Predictive Coders", Proc. ICASSP 90, pp. 485-488.

Taking the orthogonalization properties into account, it can be shown that:

[equation not reproduced]

Given this expression, the recursive modified Gram-Schmidt algorithm as proposed above can be used.

It is then no longer necessary to recompute the dictionaries explicitly at each step of the orthogonalization.

The above computation process can be made explicit in matrix form, starting from the matrix:

[equation not reproduced]

It is noted that Q is an orthonormal matrix and R an upper triangular matrix whose main-diagonal elements are all positive, which ensures the uniqueness of the decomposition.

The gain vector G satisfies the matrix relation:

[equation not reproduced]

which implies that R = [equation not reproduced].

The upper triangular matrix R thus makes it possible to compute recursively the gains λ(k) relative to the original basis.
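
The recursive computation mentioned above can be illustrated with a QR factorization, as in the sketch below. Only the statement that the upper triangular matrix R yields the gains recursively comes from the text; the use of numpy, the function name and the assembly of the selected codevectors into the columns of a matrix are assumptions.

```python
import numpy as np

def gains_in_original_basis(selected_vectors, orthogonal_gains):
    """Recover the gains lambda(k) attached to the originally selected
    codevectors from the gains found in the orthonormalized basis: with
    Psi = Q R (Q orthonormal, R upper triangular), the upper-triangular
    system R @ lam = orthogonal_gains is solved, which amounts to a
    recursive back-substitution."""
    Psi = np.column_stack(selected_vectors)   # one selected codevector per column
    Q, R = np.linalg.qr(Psi)                  # note: numpy does not enforce a
                                              # positive diagonal for R
    return np.linalg.solve(R, np.asarray(orthogonal_gains, dtype=float))
```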

The contribution of the optimal vectors of the orthonormal basis, denoted {Ψ^{j(l)}_orth(L)}, to the modelling of the gain vector G_K tends to decrease, and the gains {λ̂_k} are ordered in decreasing fashion. The residual can thus be modelled gradually in the manner below, where λ̂^cod_k denotes the quantized gain associated with the optimal orthogonal vector Ψ^{j(k)}_orth(k), taking into account the relations defining the partial gain vectors Ĝ1, Ĝ2 and Ĝ3 [equations not reproduced]:

G = Ĝ1 + Ĝ2 + Ĝ3 (highest bit rate)
G = Ĝ1 + Ĝ2 (intermediate bit rate)
G = Ĝ1 (lowest bit rate)

with 1 ≤ L1 ≤ L2 ≤ L.
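
A minimal sketch of this embedded structure on the reconstruction side is given below; only the principle that the lowest rate uses Ĝ1 alone and that the higher rates add Ĝ2 and then Ĝ3 comes from the text, while the function name and the representation of the layers as arrays are assumptions.

```python
import numpy as np

def rebuild_gain_vector(G1, G2=None, G3=None):
    """Sum the embedded contributions available at the receiver: only G1 at
    the lowest bit rate, G1 + G2 at the intermediate rate, and
    G1 + G2 + G3 at the highest rate."""
    G = np.array(G1, dtype=float)
    for layer in (G2, G3):
        if layer is not None:
            G = G + np.asarray(layer, dtype=float)
    return G
```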

The orthogonal gain vectors Ĝ1, Ĝ2, Ĝ3 are thus obtained, whose contributions to the modelling of the gain vector G are decreasing, which allows the residual r_n to be modelled gradually and efficiently. The parameters transmitted by the coding system of the invention for the modelling of the gain vector G are then the indices j(l) of the selected vectors, together with the numbers i(l) of the quantization ranges of their associated gains [expression not reproduced]. Data transmission is then performed by overwriting the parts of the frame allocated to the indices and range numbers j(l), i(l), for l ∈ [L1, L2−1] and [L2, L], according to the needs of the communication.
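
The overwriting mechanism can be pictured with the toy sketch below. The frame layout, the field names and the helper are entirely hypothetical; the only point taken from the text is that the bit fields carrying j(l) and i(l) for the upper stages are reused to carry auxiliary data when the bit rate allocated to speech is reduced.

```python
def overwrite_upper_layers(frame_fields, aux_bits, first_overwritten_stage):
    """Replace, in a frame represented as a dict of bit strings such as
    {'j(0)': '010110', 'i(0)': '1001', 'j(1)': ...}, the fields of the stages
    whose index is >= first_overwritten_stage (e.g. L1 or L2) with auxiliary
    data bits.  Purely illustrative; the real frame format is not described
    at this level of the text."""
    out, cursor = dict(frame_fields), 0
    for name, bits in frame_fields.items():
        stage = int(name[name.index('(') + 1:name.index(')')])
        if stage >= first_overwritten_stage:
            out[name] = aux_bits[cursor:cursor + len(bits)].ljust(len(bits), '0')
            cursor += len(bits)
    return out
```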

The processing described above uses the recursive modified Gram-Schmidt algorithm to code the gain vector G. Since the parameters transmitted by the coding system according to the invention are the aforementioned indices j(0) to j(L) of the different dictionaries, together with the quantized gains g(0) and {λ̂_k}, it is necessary to code these gains g(0) and {λ̂_k}. A study has shown that the gains relative to the orthogonal basis {Ψ^{j(l)}_orth(L)}, being decorrelated, have good properties for quantization. In addition, since the contribution of the optimal vectors to the modelling of the gain vector G tends to decrease, the gains {λ̂_k} are ordered in a roughly decreasing fashion, and it is possible to exploit this property by coding not the gains themselves but their ratio [expression not reproduced]. Several solutions can be used to code these ratios.
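
Since the exact ratio used is not reproduced above, the sketch below only illustrates the general idea of transmitting the first gain absolutely and every further gain as a ratio to its predecessor; the function names, the absence of an actual quantizer and the assumption of non-zero gains are all simplifications.

```python
import numpy as np

def gains_to_ratios(gains):
    """Keep the first gain as an absolute value and express each further
    gain as a ratio to its predecessor (gains assumed non-zero)."""
    g = np.asarray(gains, dtype=float)
    return np.concatenate([[g[0]], g[1:] / g[:-1]])

def ratios_to_gains(ratios):
    """Inverse mapping: rebuild the gains by cumulative multiplication."""
    r = np.asarray(ratios, dtype=float)
    return r[0] * np.concatenate([[1.0], np.cumprod(r[1:])])
```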

As will be noted from Figure 2, the coding device of the present invention comprises a module for modelling the excitation of the synthesis filter corresponding to the lowest bit rate, this module being denoted 17 in the aforementioned figure.

The block diagram for computing the excitation signal of the synthesis filter corresponding to the lowest bit rate is given in Figure 11. An inverse transformation is applied to the modelled gain vector Ĝ1; this inverse adaptive transformation may for example be an inverse Householder-type transformation, which will be described later in the description in connection with the decoding device of the present invention. The signal obtained after the inverse adaptive transformation is added, by means of an adder 171, to the long-term prediction signal P̂^1_n, the estimated perceptual signal or long-term prediction signal being delivered by the closed-loop long-term prediction circuit 13. The resulting signal delivered by the adder 171 is filtered by a filter 172, which corresponds, in terms of its transfer function, to the filter 131 of Figure 3. The filter 172 delivers the modelled residual signal r̂^1_n.
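
The three-step path of Figure 11 can be summarized by the sketch below; the two callables stand in for the inverse adaptive transform and for the filter 172, whose exact definitions are given elsewhere, so this is only an illustration of the data flow (numpy arrays are assumed for the signals).

```python
def lowest_rate_excitation(G1_hat, p_ltp, inverse_transform, residual_filter):
    """Sketch of module 17: bring the modelled gain vector back to the
    perceptual domain, add the closed-loop long-term prediction contribution
    (adder 171), then filter (filter 172) to obtain the modelled residual.
    `inverse_transform` and `residual_filter` are assumed to be callables."""
    perceptual = inverse_transform(G1_hat)  # e.g. inverse Householder-type transform
    excitation = perceptual + p_ltp         # adder 171
    return residual_filter(excitation)      # filter 172 -> modelled residual
```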

A system for predictive decoding, by embedded-code adaptive transform, of a coded digital signal consisting of a coded speech signal and, where appropriate, of an auxiliary data signal inserted into the coded speech signal after coding of the latter, will now be described in conjunction with Figure 12.

According to the aforementioned figure, the decoding system comprises a circuit 20 for extracting the data signal, which allows, on the one hand, the extraction of the data for auxiliary use via an auxiliary data output and, on the other hand, the transmission of the indices representative of the coded speech signal. It will of course be understood that the aforementioned indices are the indices i(l) and j(l), for l between 0 and L1−1 as previously described, and for l between L1 and L under the conditions described below. As further shown in Figure 12, the decoding system according to the invention comprises a circuit 21 for modelling the speech signal at the minimum bit rate, as well as a circuit 22 or 23 for modelling the speech signal at at least one bit rate higher than the aforementioned minimum bit rate.

In a preferred embodiment, as shown in Figure 12, the decoding system according to the invention comprises, in addition to the data extraction system 20, a first module 21 for modelling the speech signal at the minimum bit rate, which receives the coded signal directly and delivers a first estimated speech signal, denoted Ŝ^1_n, and a second module 22 for modelling the speech signal at an intermediate bit rate, connected to the data extraction system 20 via a circuit 27 for conditional switching based on the actual bit rate allocated to the speech signal, and delivering a second estimated speech signal, denoted Ŝ^2_n.

The decoding system shown in Figure 12 also comprises a third module 23 for modelling the speech signal at a maximum bit rate, this module being connected to the data extraction system 20 via a circuit 28 for conditional switching based on the actual bit rate allocated to speech, and delivering a third estimated speech signal Ŝ^3_n.

In addition, a summing circuit 24 receives the first, second and third estimated speech signals and delivers at its output a resulting estimated speech signal, denoted Ŝ_n. Connected in cascade at the output of the summing circuit 24 are an adaptive filtering circuit 25, which receives the resulting estimated speech signal Ŝ_n and delivers a reconstructed estimated speech signal denoted Ŝ'_n, and a digital-to-analog converter 26, which may be provided to receive the reconstructed speech signal and to deliver an audio-frequency reconstructed speech signal.

According to a particularly advantageous feature of the decoding device of the present invention, each of the modules for modelling the speech signal at the minimum, intermediate and maximum bit rates, that is to say the modules 21, 22 and 23 of Figure 12, comprises an inverse adaptive transformation sub-module followed by an inverse perceptual weighting filter.

The block diagram of the module for modelling the speech signal at the minimum bit rate is given in Figure 13a.

In general, the decoding system of the present invention takes into account the constraints imposed by the data transmission on the coding system, in particular at the level of the adaptive dictionary, as well as the contribution of the past excitation.

The circuit 21 for modelling the speech signal at the minimum bit rate is identical to that described for the circuit 17 of the coding system according to the invention, built around an inverse adaptive transformation module similar to the module 170 described in connection with Figure 11. It is simply noted that Figure 13a makes explicit how the perceptual signal P̂^1_n is obtained from the indices {i(0), j(0)}, from the modelling order K and from the indices i(l), j(l) for l = 1 to L1−1.

As regards the inverse adaptive transformation, an advantageous embodiment thereof is shown in Figure 13b. The embodiment shown in Figure 13b corresponds to an inverse Householder-type transform using elements identical to those of the Householder transform shown in Figure 7. It is simply noted that, for a perceptual signal delivered by the long-term prediction circuit 13, this signal being denoted P̂1 and entering a module 140 similar to that previously described, the signals entering the module 1402, at the level of the multipliers associated with each register, are inverted. The resulting signal, delivered by the adder corresponding to the adder 171 of Figure 11, is filtered by a filter whose transfer function is the inverse of that of the perceptual weighting matrix and which corresponds to the filter 172 of the same Figure 11.
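
One convenient property worth noting in connection with an inverse Householder-type transformation is that a Householder reflection is its own inverse. The sketch below is purely illustrative (it does not reproduce the module 140/1402 register structure of the patent): it builds a reflection from a vector v and checks that applying it twice restores the input.

```python
import numpy as np

def householder_reflection(v):
    """Return the Householder matrix H = I - 2 v v^T / (v^T v), an orthogonal,
    symmetric matrix satisfying H @ H = I, so the inverse transform applies
    the same reflection again."""
    v = np.asarray(v, dtype=float)
    return np.eye(len(v)) - 2.0 * np.outer(v, v) / (v @ v)

# Example: reflecting twice gives back the original signal.
v = np.array([1.0, -2.0, 0.5, 3.0])
H = householder_reflection(v)
x = np.array([0.3, 1.2, -0.7, 2.0])
assert np.allclose(H @ (H @ x), x)
```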

The modules for modelling the speech signal at the intermediate bit rate or at the maximum bit rate, modules 22 and 23, are shown in Figures 14a and 14b.

Of course, for reasons of complexity it is possible to group the different modellings of the speech signal corresponding to the other bit rates into a single block, as shown in Figures 14a and 14b. Depending on the actual bit rate allocated to speech, the modelled gain vectors Ĝ2, Ĝ3 are added, as shown in Figure 14b, by an adder 220, subjected to the inverse adaptive transformation process in a module 221 identical to the module 210 of Figure 13a, and then filtered by the previously mentioned inverse weighting filter W^{-1}(z), designated 222, the filtering starting from zero initial conditions, which makes it possible to perform an operation equivalent to multiplication by the inverse matrix W^{-1} and thus to obtain a progressive modelling of the synthesis signal Ŝ_n. Note in Figure 14b the presence of switching devices, which are none other than the switching devices 24 and 28 shown in Figure 12 and which are controlled as a function of the actual bit rate of the transmitted data.

Finally, as regards the adaptive filter 25, a particularly advantageous embodiment is given in Figure 15. This adaptive filter makes it possible to improve the perceptual quality of the synthesis signal Ŝ_n obtained after summation by the adder 24. Such a filter comprises, for example, a long-term post-filtering module 250, followed by a short-term post-filtering module and by an energy control module 252, the latter being driven by a module 253 for computing the scale factor. The adaptive filter 25 thus delivers the filtered signal Ŝ'_n, which corresponds to the signal in which the quantization noise introduced by the coder onto the synthesized speech signal has been filtered out in those parts of the spectrum where this is possible. The scheme shown in Figure 15 corresponds to the publication by J. H. Chen and A. Gersho, "Real-Time Vector APC Speech Coding at 4800 bps with Adaptive Postfiltering", Proc. ICASSP 87, Vol. 3, pp. 2185-2188.
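
As a rough illustration of the kind of short-term post-filtering and energy control performed by such an adaptive filter, a sketch in the spirit of the Chen and Gersho postfilter is given below; the coefficient values, the energy-matching rule and the function name are assumptions, and the long-term part of the modules 250 to 253 is not modelled.

```python
import numpy as np
from scipy.signal import lfilter

def short_term_postfilter(signal, lpc, gamma_num=0.5, gamma_den=0.8):
    """Apply a short-term postfilter H(z) = A(z/gamma_num) / A(z/gamma_den)
    built from LPC coefficients lpc = [1, a1, ..., aP], then rescale the
    output so that its energy matches that of the input (crude energy
    control).  Illustrative only; parameter values are assumptions."""
    s = np.asarray(signal, dtype=float)
    lpc = np.asarray(lpc, dtype=float)
    num = lpc * gamma_num ** np.arange(len(lpc))   # A(z/gamma_num)
    den = lpc * gamma_den ** np.arange(len(lpc))   # A(z/gamma_den)
    y = lfilter(num, den, s)
    scale = np.sqrt(float(s @ s) / max(float(y @ y), 1e-12))
    return scale * y
```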

A predictive coding system using an orthonormal transform with embedded codes has thus been described, providing novel solutions in the field of embedded-code coders. In general, the coding system of the invention allows wideband coding at speech/data bit rates of 32/0 kbit/s, 24/8 kbit/s and 16/16 kbit/s.

Claims (10)

  1. System for predictive coding of a digital signal as an embedded-code digital signal, coded by embedded-code adaptive transformation, in which the coded digital signal consists of a coded speech signal and, if appropriate, of an auxiliary data signal inserted into the coded speech signal after coding the latter, a system of the type including a perceptual weighting filter (11) driven by a short-term prediction loop allowing the generation of a perceptual signal, and a long-term prediction circuit delivering an estimated perceptual signal P̂^1_n, this long-term prediction circuit forming a long-term prediction loop making it possible to deliver, from the perceptual signal and from the estimated past excitation signal, a modelled perceptual excitation signal, and adaptive transform and quantization means making it possible from the perceptual excitation signal to generate the coded speech signal, characterized in that the perceptual weighting filter consists of a filter for short-term prediction of the speech signal to be coded, so as to produce a frequency distribution of the quantization noise, and in that it comprises a means (12) for subtracting the contribution of the past excitation signal P̂^0_n from the said perceptual signal to deliver an updated perceptual signal P_n, and in that the long-term prediction circuit is formed, as a closed loop, from a dictionary updated by the modelled past excitation corresponding to the lowest throughput making it possible to deliver an optimal waveform and an estimated gain associated therewith, which make up the estimated perceptual signal, and in that the transform means are formed by an orthonormal transform module including an adaptive orthogonal transformation module and a module for progressive modelling by orthogonal vectors, these means of progressive modelling and the long-term prediction circuit making it possible to deliver indices representing the coded speech signal, the said system furthermore including means (19) for inserting auxiliary data, coupled to the transmission channel.
  2. Coding system according to Claim 1, characterized in that the said adaptive orthogonal transformation module includes:
    a filter producing a linear combination of the basis vectors obtained from a singular-value decomposition of the matrix representing the perceptual weighting filter.
  3. Coding system according to Claim 2, characterized in that the said filter comprises, for every matrix W representing the perceptual weighting filter:
    a first matrix module U = (U1,...,UN) and
    a second matrix module V = (V1,...,VN), the said first and second matrix modules satisfying the relation: UTWV = D in which
    UT
    denotes the matrix transpose module of the module U and where
    D
    is a diagonal matrix module whose coefficients constitute the said singular values,
    Ui and Vj denoting respectively the ith left singular vector and the jth right singular vector, the said right singular vectors {Vj} forming an orthonormal basis, thus making it possible to transform the operation for filtering by convolution product by an operation for filtering by a linear combination.
  4. Coding system according to Claim 1, characterized in that the said orthonormal transform module is formed by:
    a stochastic transform sub-module constructed by drawing a Gaussian random variable, for initialization,
    a module for global averaging over a plurality of vectors arising from a predictive transform coder,
    a reordering module,
    a Gram-Schmidt processing module, one reiteration of the processing by the preceding modules making it possible to obtain an orthonormal transform, performed off-line, formed by learning,
    a memory of read-only memory type, making it possible to store the orthonormal transform in the form of transformed vectors.
  5. Coding system according to Claim 4, characterized in that the said transform is formed by orthonormal waveforms whose frequency spectra are band-pass and relatively ordered, the first waveform of the relatively ordered orthonormal waveforms being equal to the normalized optimal waveform arising from the said adaptive dictionary and the first component of estimated gain is equal to the normalized long-term prediction gain.
  6. Coding system according to Claims 2 and 5, characterized in that the said adaptive transformation module includes:
    a Householder transformation module receiving the said estimated perceptual signal P̂^1_1, consisting of the said optimal waveform and of the said estimated gain, and the said perceptual signal, to generate a transformed perceptual signal P″ in the form of a transformed perceptual signal vector with components P″_k, and
    a plurality of N registers for storing the said orthonormal waveforms, the said plurality of registers forming the said read-only memory, each register of rank r including N storage cells, a component of rank k of each vector being stored in a cell of corresponding rank,
    a plurality of N multiplier circuits associated with each register forming the said plurality of storage registers, each multiplier circuit of rank k receiving, on the one hand, the component of rank k of the stored vector and, on the other hand, the component P″_k of the transformed perceptual signal vector of rank k, and delivering the product P″_k · f_k^orth(k) of these two components,
    a plurality of N-1 summing circuits associated with each register of rank r, each summing circuit of rank k receiving the product of previous rank k-1 delivered by the multiplier circuit of previous rank and the product of corresponding rank k delivered by the multiplier circuit of like rank k, the summing circuit of highest rank, N-1, delivering a component g(r) of the estimated gain, expressed in the form of the gain vector G.
  7. System according to Claim 1, characterized in that the said module for progressive modelling by orthogonal vector includes:
    a module for normalizing the gain vector to generate a normalized gain vector Gk, by comparing the normed value of the gain vector G with respect to a threshold value, the said normalization module making it possible to generate furthermore a length signal for the normalized gain vector Gk, destined for the decoder system as a function of the order of modelling,
    a stage for progressive modelling by orthogonal vectors properly speaking receiving the said normalized vector Gk and delivering the said indices representing the coded speech signal, the said indices being representative of the selected vectors and of their associated gains, the transmission of the auxiliary data formed by the indices being performed by overwriting the parts of the frame allocated to the indices and range numbers to form the auxiliary data signal.
  8. System for predictive decoding by adaptive transform of a digital signal coded with embedded codes in which the coded digital signal consists of a coded speech signal and, if appropriate, of an auxiliary data signal inserted into the coded speech signal after coding the latter, characterized in that it comprises:
    means for extracting the said data signal making it possible, on the one hand, to extract the said data with a view to an auxiliary use, and on the other hand, to transmit the indices representing the coded speech signal,
    means for modelling the speech signal at the minimum throughput,
    means for modelling the speech signal at at least one throughput above the minimum throughput.
  9. Decoding system according to Claim 8, characterized in that the decoder includes, apart from the data extraction system,
    a first module for modelling the speech signal at the minimum throughput, receiving the coded signal directly and delivering a first estimated speech signal Ŝ^1_n,
    a second module for modelling the speech signal at an intermediate throughput, connected with the said data extraction system by way of means for conditional switching by criterion of the value of the said indices, and delivering a second estimated speech signal Ŝ^2_n,
    a third module for modelling the speech signal at a maximum throughput, connected with the said data extraction system by way of means for conditional switching by criterion of the value of the said indices and delivering a third estimated speech signal Ŝ^3_n,
    a summing circuit receiving on its summing inputs the first, the second and the third estimated speech signals respectively and delivering at its output a resultant estimated speech signal Ŝ_n, and, cascaded at the output of the said summing circuit,
    an adaptive filtering circuit receiving the said resultant estimated speech signal and delivering a reproduced estimated speech signal, and a digital/analog converter receiving the said reproduced estimated speech signal and delivering an audio-frequency reproduced speech signal.
  10. Decoding system according to Claim 9, characterized in that each of the minimum, intermediate or maximum throughput speech signal modelling modules comprises an inverse adaptive transformation sub-module followed by an inverse perceptual weighting filter.
EP94400109A 1993-01-21 1994-01-18 System for predictive encoding/decoding of a digital speech signal by an adaptive transform with embedded codes Expired - Lifetime EP0608174B1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
FR9300601 1993-01-21
FR9300601A FR2700632B1 (en) 1993-01-21 1993-01-21 Predictive coding-decoding system for a digital speech signal by adaptive transform with nested codes.

Publications (2)

Publication Number Publication Date
EP0608174A1 EP0608174A1 (en) 1994-07-27
EP0608174B1 true EP0608174B1 (en) 1998-08-12

Family

ID=9443261

Family Applications (1)

Application Number Title Priority Date Filing Date
EP94400109A Expired - Lifetime EP0608174B1 (en) 1993-01-21 1994-01-18 System for predictive encoding/decoding of a digital speech signal by an adaptive transform with embedded codes

Country Status (4)

Country Link
US (1) US5583963A (en)
EP (1) EP0608174B1 (en)
DE (1) DE69412294T2 (en)
FR (1) FR2700632B1 (en)

Families Citing this family (51)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5822436A (en) * 1996-04-25 1998-10-13 Digimarc Corporation Photographic products and methods employing embedded information
FR2722631B1 (en) * 1994-07-13 1996-09-20 France Telecom Etablissement P METHOD AND SYSTEM FOR ADAPTIVE FILTERING BY BLIND EQUALIZATION OF A DIGITAL TELEPHONE SIGNAL AND THEIR APPLICATIONS
FR2729245B1 (en) * 1995-01-06 1997-04-11 Lamblin Claude LINEAR PREDICTION SPEECH CODING AND EXCITATION BY ALGEBRIC CODES
JP3046213B2 (en) * 1995-02-02 2000-05-29 三菱電機株式会社 Sub-band audio signal synthesizer
IT1277194B1 (en) * 1995-06-28 1997-11-05 Alcatel Italia METHOD AND RELATED APPARATUS FOR THE CODING AND DECODING OF A SAMPLED VOICE SIGNAL
US5781882A (en) * 1995-09-14 1998-07-14 Motorola, Inc. Very low bit rate voice messaging system using asymmetric voice compression processing
JPH11504733A (en) * 1996-02-26 1999-04-27 エイ・ティ・アンド・ティ・コーポレーション Multi-stage speech coder by transform coding of prediction residual signal with quantization by auditory model
US6107430A (en) * 1996-03-14 2000-08-22 The Dow Chemical Company Low application temperature hot melt adhesive comprising ethylene α-olefin
JP3878254B2 (en) * 1996-06-21 2007-02-07 株式会社リコー Voice compression coding method and voice compression coding apparatus
US6038528A (en) * 1996-07-17 2000-03-14 T-Netix, Inc. Robust speech processing with affine transform replicated data
JP3263347B2 (en) * 1997-09-20 2002-03-04 松下電送システム株式会社 Speech coding apparatus and pitch prediction method in speech coding
JP2000197054A (en) * 1998-12-24 2000-07-14 Hudson Soft Co Ltd Dynamic image encoding method, recording medium with program thereof recorded therein and apparatus
AU2001249785A1 (en) * 2000-04-03 2001-10-15 Flint Hills Scientific, L.L.C. Method, computer program, and system for automated real-time signal analysis fordetection, quantification, and prediction of signal changes
US6768969B1 (en) 2000-04-03 2004-07-27 Flint Hills Scientific, L.L.C. Method, computer program, and system for automated real-time signal analysis for detection, quantification, and prediction of signal changes
SE522261C2 (en) * 2000-05-10 2004-01-27 Global Ip Sound Ab Encoding and decoding of a digital signal
US6993477B1 (en) * 2000-06-08 2006-01-31 Lucent Technologies Inc. Methods and apparatus for adaptive signal processing involving a Karhunen-Loève basis
US7339605B2 (en) 2004-04-16 2008-03-04 Polycom, Inc. Conference link between a speakerphone and a video conference unit
US8948059B2 (en) 2000-12-26 2015-02-03 Polycom, Inc. Conference endpoint controlling audio volume of a remote device
US8964604B2 (en) 2000-12-26 2015-02-24 Polycom, Inc. Conference endpoint instructing conference bridge to dial phone number
US7864938B2 (en) 2000-12-26 2011-01-04 Polycom, Inc. Speakerphone transmitting URL information to a remote device
US9001702B2 (en) 2000-12-26 2015-04-07 Polycom, Inc. Speakerphone using a secure audio connection to initiate a second secure connection
US8977683B2 (en) * 2000-12-26 2015-03-10 Polycom, Inc. Speakerphone transmitting password information to a remote device
US8934382B2 (en) 2001-05-10 2015-01-13 Polycom, Inc. Conference endpoint controlling functions of a remote device
US8976712B2 (en) 2001-05-10 2015-03-10 Polycom, Inc. Speakerphone and conference bridge which request and perform polling operations
JP4231698B2 (en) 2001-05-10 2009-03-04 ポリコム イスラエル リミテッド Multi-point multimedia / audio system control unit
US8705719B2 (en) 2001-12-31 2014-04-22 Polycom, Inc. Speakerphone and conference bridge which receive and provide participant monitoring information
US8934381B2 (en) * 2001-12-31 2015-01-13 Polycom, Inc. Conference endpoint instructing a remote device to establish a new connection
US8885523B2 (en) 2001-12-31 2014-11-11 Polycom, Inc. Speakerphone transmitting control information embedded in audio information through a conference bridge
US7978838B2 (en) 2001-12-31 2011-07-12 Polycom, Inc. Conference endpoint instructing conference bridge to mute participants
US8144854B2 (en) * 2001-12-31 2012-03-27 Polycom Inc. Conference bridge which detects control information embedded in audio information to prioritize operations
US8223942B2 (en) * 2001-12-31 2012-07-17 Polycom, Inc. Conference endpoint requesting and receiving billing information from a conference bridge
US7787605B2 (en) 2001-12-31 2010-08-31 Polycom, Inc. Conference bridge which decodes and responds to control information embedded in audio information
US7742588B2 (en) * 2001-12-31 2010-06-22 Polycom, Inc. Speakerphone establishing and using a second connection of graphics information
US8102984B2 (en) * 2001-12-31 2012-01-24 Polycom Inc. Speakerphone and conference bridge which receive and provide participant monitoring information
US8947487B2 (en) 2001-12-31 2015-02-03 Polycom, Inc. Method and apparatus for combining speakerphone and video conference unit operations
EP1914722B1 (en) 2004-03-01 2009-04-29 Dolby Laboratories Licensing Corporation Multichannel audio decoding
US8126029B2 (en) * 2005-06-08 2012-02-28 Polycom, Inc. Voice interference correction for mixed voice and spread spectrum data signaling
US7796565B2 (en) * 2005-06-08 2010-09-14 Polycom, Inc. Mixed voice and spread spectrum data signaling with multiplexing multiple users with CDMA
US8199791B2 (en) * 2005-06-08 2012-06-12 Polycom, Inc. Mixed voice and spread spectrum data signaling with enhanced concealment of data
US8190251B2 (en) * 2006-03-24 2012-05-29 Medtronic, Inc. Method and apparatus for the treatment of movement disorders
US7761145B2 (en) * 2006-04-21 2010-07-20 Medtronic, Inc. Method and apparatus for detection of nervous system disorders
US20070249953A1 (en) * 2006-04-21 2007-10-25 Medtronic, Inc. Method and apparatus for detection of nervous system disorders
US20070249956A1 (en) * 2006-04-21 2007-10-25 Medtronic, Inc. Method and apparatus for detection of nervous system disorders
US8165683B2 (en) * 2006-04-21 2012-04-24 Medtronic, Inc. Method and apparatus for detection of nervous system disorders
US7764989B2 (en) * 2006-04-21 2010-07-27 Medtronic, Inc. Method and apparatus for detection of nervous system disorders
US7761146B2 (en) * 2006-04-21 2010-07-20 Medtronic, Inc. Method and apparatus for detection of nervous system disorders
US8108438B2 (en) * 2008-02-11 2012-01-31 Nir Asher Sochen Finite harmonic oscillator
GB2495468B (en) 2011-09-02 2017-12-13 Skype Video coding
GB2495469B (en) 2011-09-02 2017-12-13 Skype Video coding
GB2495467B (en) * 2011-09-02 2017-12-13 Skype Video coding
FI3444818T3 (en) 2012-10-05 2023-06-22 Fraunhofer Ges Forschung An apparatus for encoding a speech signal employing acelp in the autocorrelation domain

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
NL8802291A (en) * 1988-09-16 1990-04-17 Koninkl Philips Electronics Nv DEVICE FOR TRANSMITTING DATA WORDS WHICH REPRESENT A DIGITIZED ANALOGUE SIGNAL AND A DEVICE FOR RECEIVING THE TRANSMITTED DATA WORDS.
US5208862A (en) * 1990-02-22 1993-05-04 Nec Corporation Speech coder
JPH0451199A (en) * 1990-06-18 1992-02-19 Fujitsu Ltd Sound encoding/decoding system
IT1241358B (en) * 1990-12-20 1994-01-10 Sip VOICE SIGNAL CODING SYSTEM WITH NESTED SUBCODE
US5371853A (en) * 1991-10-28 1994-12-06 University Of Maryland At College Park Method and system for CELP speech coding and codebook for use therewith

Also Published As

Publication number Publication date
DE69412294T2 (en) 1999-04-15
EP0608174A1 (en) 1994-07-27
FR2700632A1 (en) 1994-07-22
FR2700632B1 (en) 1995-03-24
US5583963A (en) 1996-12-10
DE69412294D1 (en) 1998-09-17

Similar Documents

Publication Publication Date Title
EP0608174B1 (en) System for predictive encoding/decoding of a digital speech signal by an adaptive transform with embedded codes
EP0782128B1 (en) Method of analysing by linear prediction an audio frequency signal, and its application to a method of coding and decoding an audio frequency signal
EP0749626B1 (en) Speech coding method using linear prediction and algebraic code excitation
EP1692689B1 (en) Optimized multiple coding method
EP1593116B1 (en) Method for differentiated digital voice and music processing, noise filtering, creation of special effects and device for carrying out said method
FR2731548A1 (en) DEPTH-FIRST SEARCH IN AN ALGEBRAIC CODEBOOK FOR FAST CODING OF SPEECH
FR2481026A1 (en)
EP0481895B1 (en) Method and apparatus for low bit rate transmission of a speech signal using CELP coding
EP0428445B1 (en) Method and apparatus for coding of predictive filters in very low bitrate vocoders
FR2880724A1 (en) OPTIMIZED CODING METHOD AND DEVICE BETWEEN TWO LONG-TERM PREDICTION MODELS
EP2652735B1 (en) Improved encoding of an improvement stage in a hierarchical encoder
Brauer et al. Learning to dequantize speech signals by primal-dual networks: an approach for acoustic sensor networks
FR3133265A1 (en) Optimized encoding and decoding of an audio signal using a neural network-based autoencoder
CA2108663C (en) Filtering method and device for reducing digital audio signal pre-echoes
EP1383109A1 (en) Method and device for wide band speech coding
EP0347307B1 (en) Coding method and linear prediction speech coder
WO2011144863A1 (en) Encoding with noise shaping in a hierarchical encoder
EP1192618B1 (en) Audio coding with adaptive liftering
Pavlov Inter-frame interpolation of the spectral envelope of the speech signal in the space of linear spectral frequencies of the highest regression
FR2709366A1 (en) Method of storing reflection coefficient vectors
FR2980620A1 (en) Method for processing decoded audio frequency signal, e.g. coded voice signal including music, involves performing spectral attenuation of residue, and combining residue and attenuated signal from spectrum of tonal components
EP1383111A2 (en) Method and device for speechcoding with enlarged bandwidth
EP1383110A1 (en) Method and device for wide band speech coding, particularly allowing for an improved quality of voised speech frames
Kao Thesis Report
FR2709387A1 (en) Vector sum excited linear predictive coding speech coder

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): DE GB

17P Request for examination filed

Effective date: 19940714

GRAG Despatch of communication of intention to grant

Free format text: ORIGINAL CODE: EPIDOS AGRA

17Q First examination report despatched

Effective date: 19971024

GRAG Despatch of communication of intention to grant

Free format text: ORIGINAL CODE: EPIDOS AGRA

GRAH Despatch of communication of intention to grant a patent

Free format text: ORIGINAL CODE: EPIDOS IGRA

GRAH Despatch of communication of intention to grant a patent

Free format text: ORIGINAL CODE: EPIDOS IGRA

GRAA (expected) grant

Free format text: ORIGINAL CODE: 0009210

AK Designated contracting states

Kind code of ref document: B1

Designated state(s): DE GB

REF Corresponds to:

Ref document number: 69412294

Country of ref document: DE

Date of ref document: 19980917

GBT Gb: translation of ep patent filed (gb section 77(6)(a)/1977)

Effective date: 19980902

PLBE No opposition filed within time limit

Free format text: ORIGINAL CODE: 0009261

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: NO OPPOSITION FILED WITHIN TIME LIMIT

26N No opposition filed
REG Reference to a national code

Ref country code: GB

Ref legal event code: IF02

REG Reference to a national code

Ref country code: GB

Ref legal event code: 732E

Free format text: REGISTERED BETWEEN 20090528 AND 20090603

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: GB

Payment date: 20101215

Year of fee payment: 18

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: DE

Payment date: 20110131

Year of fee payment: 18

GBPC Gb: european patent ceased through non-payment of renewal fee

Effective date: 20120118

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: GB

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20120118

Ref country code: DE

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20120801

REG Reference to a national code

Ref country code: DE

Ref legal event code: R119

Ref document number: 69412294

Country of ref document: DE

Effective date: 20120801