EP0749626B1 - Method for speech coding using linear prediction and excitation by algebraic codes

Publication number
EP0749626B1
EP0749626B1 EP96901020A EP96901020A EP0749626B1 EP 0749626 B1 EP0749626 B1 EP 0749626B1 EP 96901020 A EP96901020 A EP 96901020A EP 96901020 A EP96901020 A EP 96901020A EP 0749626 B1 EP0749626 B1 EP 0749626B1
Authority
EP
European Patent Office
Legal status
Expired - Lifetime
Application number
EP96901020A
Other languages
English (en)
French (fr)
Other versions
EP0749626A1 (de)
Inventor
Claude Lamblin
Current Assignee
Orange SA
Original Assignee
France Telecom SA
Priority date
1995-01-06
Filing date
1996-01-04

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00: Speech synthesis; Text to speech systems
    • G10L19/00: Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04: Speech or audio signals analysis-synthesis techniques for redundancy reduction, using predictive techniques
    • G10L19/08: Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters
    • G10L19/10: Determination or coding of the excitation function, the excitation function being a multipulse excitation
    • G10L19/107: Sparse pulse excitation, e.g. by using algebraic codebook
    • G10L2019/0001: Codebooks
    • G10L2019/0007: Codebook element generation
    • G10L2019/0008: Algebraic codebooks
    • G10L2019/0013: Codebook search algorithms
    • G10L2019/0014: Selection criteria for distances

Definitions

  • The present invention relates to a method for the digital coding of signals, in particular speech signals.
  • CELP: Code Excited Linear Prediction.
  • This type of coding is widely used, mainly in terrestrial or satellite transmission systems, or in storage applications.
  • The first generation of CELP coders, which used stochastic codebooks, was very complex to implement and required large memory capacities.
  • A second generation of CELP coders then appeared: CELP coders with an algebraic codebook. They are less complex to implement and require less memory, but the gains are still insufficient.
  • CELP coding with an algebraic codebook has been further improved by the introduction of ACELP coders, which use an algebraic codebook combined with a focused search using adaptive thresholds to adjust the computational complexity.
  • ACELP: Algebraic Code Excited Linear Prediction.
  • The amount of RAM required nevertheless remains significant.
  • CELP coders belong to the family of analysis-by-synthesis coders, in which the synthesis model is used at the coder.
  • The compression ratio varies from 1 to 16:
  • CELP coders operate at rates of 2 to 16 kbit/s in the telephone band, and at rates of 16 to 32 kbit/s in wideband.
  • the speech signal is sampled and converted into a series of frames of L samples.
  • Each frame is synthesized by filtering a waveform extracted from a codebook (also called a repertoire, directory or dictionary), multiplied by a gain, through two time-varying filters.
  • the excitation repertoire is a set of K codes or waveforms of L samples.
  • the waveforms are numbered by an integer index k, k ranging from 0 to K- 1, K being the size of the repertoire.
  • the first filter is the long-term prediction filter.
  • LTP: Long Term Prediction.
  • LPC: Linear Prediction Coding.
  • The innovation sequence is determined by analysis by synthesis: at the coder, all the innovation sequences of the excitation codebook are filtered by the two filters (LTP and LPC), and the waveform selected is the one producing the synthetic signal closest to the original speech signal according to a perceptual weighting criterion.
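The analysis-by-synthesis selection described above can be sketched as an exhaustive search. This is a minimal illustration, not the coder's actual implementation: it assumes that a single impulse response `h` stands in for the whole filter cascade, that closeness is measured by squared error with an optimal gain, and the names `convolve_truncated` and `select_code` are hypothetical.

```python
def convolve_truncated(code, h):
    """Filter `code` with impulse response `h`, keeping len(code) output samples."""
    L = len(code)
    out = [0.0] * L
    for n in range(L):
        acc = 0.0
        for k in range(min(n + 1, len(h))):
            acc += h[k] * code[n - k]
        out[n] = acc
    return out


def select_code(target, codebook, h):
    """Return (index, gain) of the code whose filtered version best matches `target`."""
    best_index, best_gain, best_score = -1, 0.0, -1.0
    for k, code in enumerate(codebook):
        y = convolve_truncated(code, h)
        energy = sum(v * v for v in y)
        if energy == 0.0:
            continue  # a silent filtered code cannot match anything
        corr = sum(t * v for t, v in zip(target, y))
        # With the optimal gain corr/energy, the squared error is
        # |target|^2 - corr^2/energy, so maximizing corr^2/energy minimizes it.
        score = corr * corr / energy
        if score > best_score:
            best_index, best_gain, best_score = k, corr / energy, score
    return best_index, best_gain
```

Every code is filtered once per search here, which is the cost that a structured (algebraic) codebook is meant to reduce.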
  • In a CELP coder, the synthesis excitation is therefore made up of waveforms extracted from a codebook. Depending on the type of codebook, there are two kinds of CELP coders.
  • The codebooks of the first CELP coders consisted of stochastic waveforms. These codebooks are obtained either by training or by random generation. Their major drawback is their lack of structure, which means that they must be stored and makes them highly complex to use.
  • The excitation codebook of the first CELP coder was a stochastic dictionary composed of a set of 1024 waveforms of 40 Gaussian samples. This CELP coder did not run in real time on the most powerful computers of the time. Other stochastic dictionaries that reduce both the memory and the computation time required have been introduced; however, both the complexity and the memory requirements remained significant.
  • ACELP coders (see WO 91/13432) have been proposed as candidates for several standardizations: ITU (International Telecommunication Union) standardization at 8 kbit/s, and ITU standardization for PSTN videotelephony at 6.8 kbit/s-5.4 kbit/s. The short-term prediction, LTP analysis and perceptual weighting modules are similar to those used in a conventional CELP coder.
  • The originality of the ACELP coder lies in the excitation signal search module.
  • The ACELP coder has two major advantages: great flexibility in bit rate and adjustable implementation complexity. The flexibility in bit rate comes from the codebook generation method. The ability to adjust the complexity comes from the waveform selection procedure, which uses a focused search with adaptive thresholds.
  • The excitation codebook is a virtual set (in the sense that it is not stored), generated algebraically.
  • The algebraic code generator produces, in response to an index k (k varying from 0 to K-1), a code vector of L samples having very few non-zero components.
  • Let N be the number of non-zero components.
  • The dimension of the code words is extended to L+N, and the last N components are zero. It is assumed here, without loss of generality, that L is a multiple of N.
  • The code words c_k are therefore composed of N pulses. The amplitudes of the pulses are fixed (for example ±1).
  • L' = L/N.
  • The position can be greater than or equal to L, in which case the corresponding pulse is simply canceled.
  • With h = (h(0), h(1), ..., h(L-1)) denoting the impulse response, H is the lower triangular L×L Toeplitz matrix formed from this impulse response.
  • U = H^T·H is the covariance matrix of h.
  • The waveform c_k is composed of N pulses at positions pos_{i(q,k),q} and of amplitude S_q (0 ≤ q < N).
  • Writing pos_q for the position of pulse q, the scalar product P_k of the target vector D with a waveform c_k and the energy α_k² of the filtered waveform c_k have the expressions:
    $$P_k = D \cdot c_k^T = \sum_{q=0}^{N-1} S_q \, D(\mathrm{pos}_q)$$
    $$\alpha_k^2 = c_k \, U \, c_k^T = \sum_{q=0}^{N-1} S_q^2 \, U(\mathrm{pos}_q, \mathrm{pos}_q) + 2 \sum_{p=0}^{N-2} \sum_{q=p+1}^{N-1} S_p S_q \, U(\mathrm{pos}_p, \mathrm{pos}_q)$$
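The quantities P_k and α_k² can be checked numerically against direct filtering. The sketch below is only an illustration: it assumes that the target vector is obtained by backward filtering, D = x·H (this exact expression is an assumption here), that pulses sit at distinct positions below L, and the function names are hypothetical.

```python
def covariance_matrix(h):
    """U = H^T H, with H the lower triangular Toeplitz matrix built from h."""
    L = len(h)
    return [[sum(h[n - i] * h[n - j] for n in range(max(i, j), L))
             for j in range(L)] for i in range(L)]


def backward_filter(x, h):
    """Assumed target: D = x.H, i.e. D(n) = sum over k >= n of x(k) h(k - n)."""
    L = len(h)
    return [sum(x[k] * h[k - n] for k in range(n, L)) for n in range(L)]


def criterion_terms(pulses, D, U):
    """pulses: list of (position, amplitude). Returns (P_k, alpha_k squared)."""
    P = sum(s * D[pos] for pos, s in pulses)
    energy = sum(s * s * U[pos][pos] for pos, s in pulses)
    for a in range(len(pulses)):
        for b in range(a + 1, len(pulses)):
            (pa, sa), (pb, sb) = pulses[a], pulses[b]
            energy += 2.0 * sa * sb * U[pa][pb]   # cross terms, counted once each
    return P, energy
```

Only N entries of D and N(N+1)/2 entries of U are touched per code, which is why the sparse pulse structure makes the search cheap.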
  • The exploration is accelerated by calculating, before entering the search procedure, an adaptive threshold for each loop. The search loop for pulse q is entered only if a partial quantity Cr_k(q-1), calculated from the pulses 0 to q-1 previously determined in the outer loops, exceeds a threshold calculated for loop q-1.
  • A main object of the present invention is to propose an ACELP-type coding method which notably reduces the size of the memory required by the coder.
  • The invention thus proposes a code-excited linear prediction (CELP) speech coding method, in which a speech signal is digitized in successive frames of L samples; synthesis parameters defining synthesis filters are determined adaptively, together with excitation parameters including, for each frame, pulse positions of an excitation code of L samples belonging to a predetermined algebraic codebook and an associated excitation gain; and quantization values representative of the determined parameters are transmitted.
  • The algebraic codebook is defined from at least one group of N sets of possible pulse positions in codes of at least L samples, a code of the codebook being represented by N pulse positions belonging respectively to the N sets of a group.
  • The stored components of the covariance matrix are only, for at least one group of N sets, those of the form $U(\mathrm{pos}_{i,p}, \mathrm{pos}_{i,p})$ with 0 ≤ p < N and those of the form $U(\mathrm{pos}_{i,p}, \mathrm{pos}_{j,q})$ with 0 ≤ p < q < N, pos_{i,p} and pos_{j,q} designating respectively the positions of order i and j in the sets of said group containing possible positions for the pulses p and q of the codes of the codebook.
  • the stored components of the covariance matrix are structured, for a group, in the form of N correlation vectors and N (N-1) / 2 correlation matrices.
  • This arrangement of the components of the covariance matrix facilitates their access during the search for the ACELP excitation, so as to reduce or at least not increase the complexity of this module.
  • the method according to the invention is applicable to various types of algebraic codes, that is to say whatever the structure of the sets of possible positions for the different pulses of the codes of the directory.
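The storage implied by this structure can be counted directly. The sketch below is illustrative only: it assumes all N sets have the same size L', compares against storing the upper triangle of the full symmetric matrix, and the function names and example values are hypothetical rather than taken from the patent's examples.

```python
def stored_components(N, L_prime):
    """N correlation vectors of L' terms plus N(N-1)/2 matrices of L' x L' terms."""
    return N * L_prime + (N * (N - 1) // 2) * L_prime * L_prime


def full_symmetric_components(L):
    """Upper triangle (diagonal included) of a symmetric L x L matrix."""
    return L * (L + 1) // 2
```

For instance, with the hypothetical values N = 5, L' = 8 and L = 40, the structured storage needs 680 components against 820 for the full upper triangle. The terms that are dropped are the correlations between two different positions of the same set, which a code never uses because it places exactly one pulse in each set.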
  • The speech synthesis process implemented in a CELP coder and decoder is illustrated in FIG. 1.
  • An excitation generator 10 delivers an excitation code c_k belonging to a predetermined codebook in response to an index k.
  • An amplifier 12 multiplies this excitation code by an excitation gain β, and the resulting signal is fed to a long-term synthesis filter 14.
  • The output signal u of the filter 14 is in turn fed to a short-term synthesis filter 16, whose output ŝ constitutes what is considered here as the synthesized speech signal.
  • Other filters can also be implemented at the decoder, for example post-filters, as is well known in the field of speech coding.
  • The aforementioned signals are digital signals represented, for example, by 16-bit words at a sampling rate Fe equal, for example, to 8 kHz.
  • the synthesis filters 14, 16 are generally purely recursive filters.
  • the delay T and the gain G constitute long-term prediction parameters (LTP) which are determined adaptively by the coder.
  • the LPC parameters of the short-term synthesis filter 16 are determined at the coder by a linear prediction of the speech signal.
  • The transfer function of the filter 16 is thus of the form 1/A(z), where A(z) is defined, in the case of a linear prediction of order P (typically P ≈ 10), by coefficients a_i, a_i representing the i-th linear prediction coefficient.
  • FIG. 2 shows the diagram of a CELP coder.
  • the speech signal s (n) is a digital signal, for example supplied by an analog-to-digital converter 20 processing the amplified and filtered output signal from a microphone 22.
  • LPC, LTP and EXC parameters (EXC: index k and excitation gain β).
  • These parameters are then quantized in a known manner for efficient digital transmission, and then fed to a multiplexer 30 which forms the coder output signal.
  • These parameters are also supplied to a module 32 which calculates the initial states of certain coder filters.
  • This module 32 essentially comprises a decoding chain such as that shown in FIG. 1. The module 32 makes known, at the coder, the previous states of the synthesis filters 14, 16 of the decoder, determined from the synthesis and excitation parameters prior to the subframe considered.
  • The short-term analysis module 24 determines the LPC parameters (coefficients a_i of the short-term synthesis filter) by analyzing the short-term correlations of the speech signal s(n). This determination is made, for example, once per frame, so as to adapt to the evolution of the spectral content of the speech signal. LPC analysis methods are well known in the art and will therefore not be detailed here. Reference may be made, for example, to the book "Digital Processing of Speech Signals" by L. R. Rabiner and R. W. Schafer, Prentice-Hall Int., 1978.
  • The next step in coding is the determination of the long-term synthesis (LTP) parameters. These are determined, for example, once per subframe of L samples.
  • A subtractor 34 subtracts from the speech signal s(n) the response of the short-term synthesis filter 16 to a zero input signal. This response is determined by a filter 36 with transfer function 1/A(z), whose coefficients are given by the LPC parameters determined by module 24, and whose initial states ŝ are provided by module 32 so as to correspond to the last P samples of the synthetic signal.
  • The output signal of the subtractor 34 is fed to a perceptual weighting filter 38.
  • The transfer function W(z) of this perceptual weighting filter is determined from the LPC parameters.
  • W(z) = A(z)/A(z/γ), where γ is a coefficient on the order of 0.8.
  • The role of the perceptual weighting filter 38 is to emphasize the portions of the spectrum where errors are most perceptible.
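The coefficients of A(z/γ) follow from those of A(z) by scaling the i-th coefficient by γ^i, since replacing z by z/γ turns z^-i into γ^i z^-i. A minimal sketch, assuming A(z) is given as a list of coefficients of z^0, z^-1, ..., z^-P; the function names are hypothetical.

```python
def bandwidth_expand(a, gamma):
    """Coefficients of A(z/gamma), given the coefficients a[i] of z^-i in A(z)."""
    return [coef * gamma ** i for i, coef in enumerate(a)]


def weighting_filter(a, gamma=0.8):
    """Numerator and denominator coefficient lists of W(z) = A(z) / A(z/gamma)."""
    return list(a), bandwidth_expand(a, gamma)
```

The result holds whichever sign convention is used for the prediction coefficients inside A(z), because the scaling acts on each power of z^-1 separately.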
  • The closed-loop LTP analysis performed by the module 26 consists, in a conventional manner, in selecting for each subframe the delay T which maximizes the normalized correlation between x'(n) and y_T(n), where x'(n) denotes the output signal of the filter 38 during the subframe considered, and y_T(n) denotes the convolution product u(n-T) * h'(n).
  • Here h'(0), h'(1), ..., h'(L-1) denotes the impulse response of the weighted synthesis filter, with transfer function W(z)/A(z).
  • This impulse response h' is obtained by a module 40 for calculating impulse responses, as a function of the LPC parameters which have been determined for the subframe.
  • The samples u(n-T) are the previous states of the long-term synthesis filter 14, provided by the module 32.
  • The missing samples u(n-T) are obtained by interpolation from the previous samples, or from the speech signal.
  • The delays T, integer or fractional, are selected within a given window, ranging for example from 20 to 143 samples.
  • The open-loop search consists more simply in determining the delay T which maximizes the autocorrelation of the speech signal s(n), possibly filtered by the inverse filter with transfer function A(z). Once the delay T has been determined, the long-term prediction gain G is computed.
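The open-loop search can be sketched as a scan over the delay window. This is a simplified illustration: it assumes integer delays only, an unnormalized autocorrelation, and no inverse filtering of s(n); the function name and default window (taken from the example range of 20 to 143 samples) are hypothetical choices.

```python
def open_loop_delay(s, t_min=20, t_max=143):
    """Return the integer delay T in [t_min, t_max] maximizing sum of s(n) s(n - T)."""
    best_t, best_corr = t_min, float("-inf")
    for T in range(t_min, t_max + 1):
        corr = sum(s[n] * s[n - T] for n in range(T, len(s)))
        if corr > best_corr:          # keep the first maximum found
            best_t, best_corr = T, corr
    return best_t
```

On a pulse train of period 50, for example, the scan returns 50 rather than a multiple of it, because fewer products contribute at longer delays.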
  • The signal G·y_T(n), which has been calculated by the module 26 for the optimal delay T, is first subtracted from the signal x'(n) by the subtractor 42.
  • The resulting signal x(n) is fed to a backward filter 44 which provides a signal D(n) computed from x(n) and from h(0), h(1), ..., h(L-1), the impulse response of the filter composed of the synthesis filters and the perceptual weighting filter, calculated by the module 40.
  • The composite filter has the transfer function W(z)/[A(z)·B(z)].
  • The vector D constitutes a target vector for the module 28 for searching for the excitation.
  • The algebraic codebook of possible excitation codes is defined from at least one group of N sets E_0, E_1, ..., E_{N-1} of possible positions for pulses of order 0, 1, ..., N-1 and of amplitude S_0, S_1, ..., S_{N-1} in codes of at least L samples.
  • a directory code is represented by N pulse positions belonging respectively to the sets E 0 , E 1 , ..., E N-1 of the same group of N sets.
  • The cardinalities L'_0, L'_1, ..., L'_{N-1} of the sets E_0, E_1, ..., E_{N-1} can be equal or different, and these sets may or may not be disjoint.
  • After having calculated and stored certain terms of the covariance matrix U, the module 28 proceeds to search for the excitation code for the current subframe.
  • The stored components of the covariance matrix are, on the one hand, those of the form $R_{p,p}(i) = U(\mathrm{pos}_{i,p}, \mathrm{pos}_{i,p})$, structured as N correlation vectors R_{p,p} (0 ≤ p < N) with L' components, and, on the other hand, those of the form $R_{p,q}(i,j) = U(\mathrm{pos}_{i,p}, \mathrm{pos}_{j,q})$, structured as N(N-1)/2 correlation matrices R_{p,q} (0 ≤ p < q < N) with L' rows and L' columns.
  • The calculation of the N correlation vectors R_{p,p} is carried out by the module 28 in the manner illustrated in FIG. 3.
  • This calculation comprises a loop indexed by an integer i decreasing from L'-1 to 0.
  • At initialization, the integer variable k is set to L-δL'N-ε (it is assumed here that L-δL'N-ε ≤ 0), and the accumulation variable cor is set to 0.
  • The components R_{p,p}(i) are successively calculated for p decreasing from N-1 to 0.
  • The variable p is first set to N-1 (step 52).
  • The calculation of the N correlation vectors requires on the order of δL'N additions, δL'N multiplications and L'N loads into memory.
  • The initialization 50 of the calculation could be different.
  • The integer k can also be initialized to L-δL'N in step 50, each iteration in the loops indexed by p decreasing from N-1 to 0 then consisting of δ-ε executions of step 54, followed by step 56, followed by ε executions of step 54.
  • The calculation remains correct because in total δ executions of step 54 are carried out between two successive stores of terms R_{p,p}(i).
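The running-sum calculation of the correlation vectors can be sketched as follows. This is an illustration under stated assumptions rather than the flowchart of FIG. 3 itself: positions follow pos_{i,p} = δ·(iN+p)+ε as in claim 3, h(k) is treated as zero for k < 0, and the function name is hypothetical.

```python
def correlation_vectors(h, L, N, L_prime, delta, eps):
    """Return R with R[p][i] = U(pos_ip, pos_ip), pos_ip = delta*(i*N + p) + eps."""
    k = L - delta * L_prime * N - eps        # initial value of k
    if k > 0:
        raise ValueError("this sketch assumes L - delta*L'*N - eps <= 0")
    cor = 0.0                                # running sum of h(k)^2
    R = [[0.0] * L_prime for _ in range(N)]
    for i in range(L_prime - 1, -1, -1):     # i decreasing from L'-1 to 0
        for p in range(N - 1, -1, -1):       # p decreasing from N-1 to 0
            for _ in range(delta):           # delta accumulations per stored term
                if k >= 0:                   # samples before time 0 contribute nothing
                    cor += h[k] * h[k]
                k += 1
            R[p][i] = cor
    return R
```

Each stored term reuses the sum accumulated for the previous, larger position, which is what keeps the operation count on the order of δL'N.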
  • The calculation of the N(N-1)/2 correlation matrices R_{p,q} can be carried out by module 28 in the manner illustrated in FIG. 4.
  • This calculation includes a loop B_{t,d'}, indexed by an integer i decreasing from L'-1-d' to 0.
  • The integer t is first set to 1.
  • The integer d' is then set to 0 in step 72.
  • Step 74 corresponds to the initialization of the loop indexed by the integer i.
  • The integer i is initialized to L'-1-d', the integer j to L'-1, and the integer d to δ·(t+d'N).
  • Step 78 is then executed δ times; it consists in adding the term h(k)·h(k+d) to the accumulation variable cor and in incrementing the variable k by one.
  • In step 80, the component R_{p,q}(i,j) is set to the accumulation variable cor, and the integers p and q are each decremented by one.
  • Test 82 is then performed on the integer p. If p ≥ 0, the process returns to just before step 78, which is executed again δ times. If test 82 shows that p < 0, test 84 is performed on the integer i. If i > 0, the process goes to step 86, where the integer p' is initialized to N-1, the integer q remaining equal to t-1.
  • Step 86 is followed by δ successive executions of step 88, which consists, like step 78, in adding h(k)·h(k+d) to the accumulation variable cor and in incrementing the integer variable k by one. Then, in step 90, the component R_{q,p'}(j,i-1) is set to the accumulation variable cor, and the integers p' and q are each decremented by one. Test 92 is then performed on the value of the integer q. If q ≥ 0, the process returns to just before step 88, which is executed again δ times.
  • When test 92 shows that q < 0, the integers i and j are each decremented by one in step 94, and the process returns to just before step 76 for the next iteration of the loop B_{t,d'}.
  • This loop ends when test 84 shows that i = 0.
  • If t < N, the process returns to just before step 72 to calculate the components of the matrices R_{p,p+t} and R_{q,q+N-t} for the new value of t.
  • The calculation of the N(N-1)/2 correlation matrices requires only about δL'²N(N-1)/2 additions, δL'²N(N-1)/2 multiplications and L'²N(N-1)/2 loads into memory.
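The loops B_{t,d'} can be sketched in the form given in claim 7. This is an illustrative reading, not a transcription of FIG. 4: it assumes positions pos_{i,p} = δ·(iN+p)+ε, the same initial value of k as for the correlation vectors, h(k) taken as zero for k < 0, and hypothetical function names.

```python
def _accumulate(h, k, cor, d, delta):
    """Add delta successive terms h(k) h(k + d) to cor, advancing k each time."""
    for _ in range(delta):
        if k >= 0 and k + d < len(h):
            cor += h[k] * h[k + d]
        k += 1
    return k, cor


def correlation_matrices(h, L, N, L_prime, delta, eps):
    """Return {(p, q): R} with R[i][j] = U(pos_ip, pos_jq) for 0 <= p < q < N."""
    R = {(p, q): [[0.0] * L_prime for _ in range(L_prime)]
         for p in range(N) for q in range(p + 1, N)}
    k0 = L - delta * L_prime * N - eps
    if k0 > 0:
        raise ValueError("this sketch assumes L - delta*L'*N - eps <= 0")
    for t in range(1, N):
        for d_prime in range(L_prime):
            d = delta * (t + d_prime * N)    # constant lag inside loop B_{t,d'}
            k, cor = k0, 0.0
            for i in range(L_prime - 1 - d_prime, -1, -1):
                j = i + d_prime
                for p in range(N - 1 - t, -1, -1):
                    k, cor = _accumulate(h, k, cor, d, delta)
                    R[(p, p + t)][i][j] = cor
                if i > 0:
                    for q in range(t - 1, -1, -1):
                        k, cor = _accumulate(h, k, cor, d, delta)
                        R[(q, q + N - t)][j][i - 1] = cor
    return R
```

Within one loop B_{t,d'} every stored entry has the same lag d between its two positions, so a single running sum of h(k)·h(k+d) serves all of them.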
  • The search for the excitation code can be performed by module 28 in accordance with the flowchart shown in FIGS. 5A and 5B.
  • In step 120, N-1 partial thresholds T(0), ..., T(N-2) are first calculated, and the threshold T(N-1) is initialized to a negative value, for example -1.
  • The partial thresholds T(0), ..., T(N-2) are positive and are calculated as a function of the input vector D and of the desired trade-off between the effectiveness of the excitation search and the simplicity of this search. High values of the partial thresholds tend to reduce the amount of computation needed for the excitation search, while low values lead to a more exhaustive search of the ACELP codebook.
  • The search for the excitation code includes N loops B_0, B_1, ..., B_{N-1} nested one inside the other.
  • The index i_0 is first set to 0.
  • Loop B_{N-1} consists of the same instructions as the previous loops. However, if the comparison 126_{N-1} shows that P²(N-1) > T(N-1)·α²(N-1), a step 128 is executed before going to step 130_{N-1}, which increments the index i_{N-1}.
  • The excitation parameters stored in step 128 include the excitation gain β, set to P(N-1)/α²(N-1), and the N indexes i_0, i_1, ..., i_{N-1} from which the positions of the N pulses of the code are found.
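The nested loops with thresholds can be sketched with a recursive function. Several points are assumptions made for illustration: recursion replaces the N explicit loops, the partial sums P(q) and α²(q) are built from the expressions of P_k and α_k², a dense matrix U is indexed instead of the stored correlation vectors and matrices, the comparison is taken as strict, and the name `focused_search` is hypothetical.

```python
def focused_search(D, U, positions, amplitudes, thresholds):
    """positions[q]: candidate positions of pulse q; thresholds[q]: T(q), q < N-1.

    Returns (indices, gain) for the best code found, or (None, 0.0).
    """
    N = len(positions)
    best = {"crit": -1.0, "indices": None, "gain": 0.0}  # crit plays the role of T(N-1)
    chosen = []                                          # indexes i_0 .. i_(q-1)

    def loop(q, P, energy):
        s = amplitudes[q]
        for i, pos in enumerate(positions[q]):
            P_q = P + s * D[pos]
            e_q = energy + s * s * U[pos][pos]
            for p, ip in enumerate(chosen):
                e_q += 2.0 * amplitudes[p] * s * U[positions[p][ip]][pos]
            limit = best["crit"] if q == N - 1 else thresholds[q]
            if e_q > 0.0 and P_q * P_q > limit * e_q:
                if q == N - 1:
                    # update of the final threshold and of the excitation parameters
                    best.update(crit=P_q * P_q / e_q,
                                indices=chosen + [i], gain=P_q / e_q)
                else:
                    chosen.append(i)
                    loop(q + 1, P_q, e_q)    # enter the next inner loop
                    chosen.pop()

    loop(0, 0.0, 0.0)
    return best["indices"], best["gain"]
```

Passing negative partial thresholds disables the pruning and makes the search exhaustive, which is convenient for checking; positive thresholds skip inner loops whose partial criterion is too low.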
  • The N indexes i_0, i_1, ..., i_{N-1} can be combined into a global index k, this index k being coded on N·log2(L') bits.
  • The last sequence numbers are preferably assigned to the pulses in question. If there are n_q possible amplitude values for the pulse q, then the loop B_q of the flowchart of FIGS. 5A and 5B is executed n_q times, each time with a different value of the amplitude S_q, and the number of times that the loop B_q had been executed before a greater value of P²(N-1)/α²(N-1) was encountered is also stored in step 128. This number is also transmitted to the decoder, which can therefore recover the amplitude S_q to be applied to the corresponding pulse of the excitation code.
  • The ACELP decoder includes a demultiplexer 8 receiving the bit stream from the coder.
  • The quantized values of the EXC excitation parameters and of the LTP and LPC synthesis parameters are supplied to the generator 10, the amplifier 12 and the filters 14, 16 to reconstruct the synthetic signal ŝ, which can for example be converted to analog by the converter 18 before being amplified and then applied to a loudspeaker 19 to reproduce the original speech.
  • A code of the codebook is then characterized by a group index m and by N position indexes i.
  • The excitation search can then simply be carried out by performing, once for each of the M groups, the nested-loop search shown in FIGS. 5A and 5B. It then suffices to store in step 128 the number of times that the nested-loop search had been fully executed before the current search, in order to obtain the group index m from which the selected excitation code can be reconstructed.
  • the second embodiment with M> 1 makes it possible to implement a sub-optimal search procedure which still provides significant savings in memory space.
  • This procedure consists in storing the correlation vectors R_{p,p}^(m) and the correlation matrices R_{p,q}^(m) only for µ of the group indices m (1 ≤ µ < M). The additional saving in memory space is then a factor µ/M.
  • This procedure amounts to subdividing the covariance matrix U into sub-blocks, with the approximation U(i,j) ≈ U(i-1,j-1) within each sub-block.
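The size of the error introduced by this approximation can be read off the definition of U. A short derivation, assuming the expression of U(i,j) that follows from U = H^T·H with H lower triangular Toeplitz, and indices i, j ≥ 1:

$$U(i,j) = \sum_{n=\max(i,j)}^{L-1} h(n-i)\,h(n-j)$$

$$U(i-1,j-1) = \sum_{n=\max(i,j)-1}^{L-1} h(n-i+1)\,h(n-j+1) = \sum_{m=\max(i,j)}^{L} h(m-i)\,h(m-j) = U(i,j) + h(L-i)\,h(L-j)$$

The two entries therefore differ by a single product of samples taken from the end of the impulse response, which tends to be small when h decays over the subframe.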
  • The steps 55_m, 79_m and 89_m are bypassed for those indexes m for which the correlation vectors R_{p,p}^(m) and the correlation matrices R_{p,q}^(m) are not stored.
  • The search for the excitation can be carried out in accordance with the flowchart of FIGS. 5A and 5B by modifying the loops B_q (0 ≤ q < N) in the manner indicated in FIG. 9.
  • If test 126_q shows that P²(q)/α²(q) is greater than the threshold T(q), the inner loops are executed starting with B_{q+1} or, if q = N-1, the update 128 of the threshold and of the excitation parameters is performed, these parameters also including the index m, then set to 0.
  • The process then goes to step 125_q, which is executed directly if test 126_q shows that P²(q) ≤ T(q)·α²(q).
  • The allocation of bits per frame is presented in Table II. 204 bits per frame correspond to a bit rate of 6.8 kbit/s.

    Table II
    Parameters       Subframes 1 and 4   Subframes 2, 3, 5 and 6   Total per frame
    LPC                                                             30
    LTP delay (T)    8                   5                          36
    Pulses           14 + 1              14 + 1                     90
    β sign           1                   1                          6
    G and β gains    7                   7                          42
    Total                                                           204
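The totals of Table II can be checked with a few lines of arithmetic. This sketch assumes six subframes per frame, as the column headings suggest, and derives the frame duration implied by the stated bit rate; the dictionary keys and function names are hypothetical.

```python
# Bits per frame, following the rows of Table II (six subframes per frame).
BITS_PER_FRAME = {
    "LPC": 30,
    "LTP delay": 2 * 8 + 4 * 5,    # 8 bits in subframes 1 and 4, 5 bits in the others
    "pulses": 6 * (14 + 1),
    "gain sign": 6 * 1,
    "gains": 6 * 7,
}


def total_bits(allocation):
    """Sum of the per-frame bit counts."""
    return sum(allocation.values())


def frame_duration_ms(bits, rate_kbit_s):
    """Frame duration implied by a bit count and a bit rate (bits / (kbit/s) = ms)."""
    return bits / rate_kbit_s
```

The rows sum to 204 bits, and 204 bits at 6.8 kbit/s imply a frame of 30 ms.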
  • The LPC coefficients are converted into line spectrum pair (LSP) parameters, which are vector-quantized.
  • The LTP delays, which can take 256 integer or fractional values between 19 1/3 and 143, are quantized on 8 bits. These 8 bits are transmitted in subframes 1 and 4 and, for the other subframes, a differential value is coded on 5 bits only.
  • The implementation of the invention divides by 2.5 the memory size required by the coder to store the components of the covariance matrix, while producing output signals identical to those obtained by the previous ACELP coder.
  • DSPs: common digital signal processors.
  • The LPC and LTP parameters are determined as in Example 1.
  • the bit rate is then 158 bits per frame, or 5.3 kbit / s.
  • The implementation of the invention divides by 2.8 the memory required by the coder to store the components of the covariance matrix, while producing identical output signals (a saving of 1488 16-bit words, allowing 12-bit addressing in the RAM).
  • the synthesis parameters being coded as in the case of Examples 1 and 2, the coder produces 153 bits per frame, which represents a bit rate of 5.1 kbit / s.
  • the second embodiment of the invention applied without the sub-optimal procedure, would require storing 832 components of the matrix U.

Claims (7)

  1. Verfahren zur Sprachcodierung mit linarer Prädiktion und Anregung mittels Codes (CELP), mit Digitalisierung eines Sprachsignals in aufeinanderfolgende Raster von L Abtastproben, adaptiver Bestimmung von Syntheseparametern, welche Synthesefilter definieren, einerseits, und Anregungsparametern, welche für jedes Raster Impulspositionen eines Anregungscode von L Abtastproben, der einem vorgegebenen algebraischen Verzeichnis zugehört, und eine zugeordnete Anregungsverstärkung umfassen, andererseits, und Übertragung von für die bestimmten Parameter repräsentativen Quantifizierungswerten, bei dem das algebraische Verzeichnis ausgehend von mindestens einer Gruppe von N Mengen (E0, E1, ..., EN-1) von möglichen Impulspositionen in Codes mit mindestens L Abtastproben definiert ist, wobei ein Code des Verzeichnisses durch N Impulspositionen dargestellt ist, die jeweils den N Mengen von Positionen einer Gruppe zugehören, und bei dem die Bestimmung der Anregungsparameter relativ zu einem Raster die Auswahl eines Code aus dem Verzeichnis beinhaltet, der die Menge Pk 2k 2 maximiert, in der Pk = D·ck T das Skalarprodukt zwischen dem Code ck des Verzeichnisses und einem Zielvektor D bezeichnet, der von dem Sprachsignal des Rasters und den Syntheseparametern abhängt, und αk 2 die Energie des durch ein aus den Synthesefiltern und einem Wahrnehmungswichtungsfilter zusammengesetztes Filter gefilterten Code ck an dem Raster bezeichnet, wobei die Berechnung der Energien αk 2 eine Berechnung und eine Speicherung von Komponenten einer Kovarianzmatrix U = HT·H beinhaltet, wobei H eine absteigend dreiecksförmige Toeplitz-Matrix mit L Zeilen und L Spalten bezeichnet, die ausgehend von der Impulsantwort h(0), h(1), ..., h(L-1) des zusammengesetzten Filters gebildet ist,
    dadurch gekennzeichnet, daß die gespeicherten Komponenten der Kovarianzmatrix für mindestens eine Gruppe von N Mengen nur diejenigen mit der Form:
    Figure 00340001
    mit 0≤p<N, und diejenigen mit der Form:
    Figure 00340002
    mit 0≤p<q<N sind, wobei posi,p und posj,q jeweils die Positionen der Ordnung i bzw. j in den Mengen (Ep,Eq) der Gruppe bezeichnen, welche mögliche Positionen für die Impulse p und q der Codes des Verzeichnisses enthält.
  2. Verfahren nach Anspruch 1, dadurch gekennzeichnet, daß für eine Gruppe von N Mengen die gespeicherten Komponenten der Kovarianzmatrix in Form von N Korrelationsvektoren und N(N-1)/2 Korrelationsmatrizen strukturiert sind, wobei jeder Korrelationsvektor Rp,p einer Impulsnummer p in den Codes des Verzeichnisses (0 ≤ p < N) zugeordnet ist und eine Dimension Lp, besitzt, die gleich der Kardinalzahl der Menge (Ep) der Gruppe ist, welche mögliche Positionen für den Impuls p mit Komponenten i (0≤i<Lp') der Form Rp,p(i)=U(posi,p,posi,p) enthält, und jede Korrelationsmatrix Rp,q in den Codes des Verzeichnisses (0≤p<q<N) zwei unterschiedlichen Impulsnummern p,q zugeordnet ist und Lp, Zeilen und Lq' Spalten mit Komponenten der Form Rp,q(i,j) = U(posi,p,posj,q) in der Zeile i und in der Spalte j (0≤i<Lp, und 0≤j<Lq,) aufweist.
  3. Verfahren nach Anspruch 2, dadurch gekennzeichnet, daß alle Mengen (E0, E1, ..., EN-1) der Gruppe, die mögliche Positionen für einen Impuls der Codes des Verzeichnisses enthalten, die gleiche Kardinalzahl L' besitzen, wobei die Position der Ordnung i in der Menge (Ep) der möglichen Positionen für den Impuls p (0≤i<L', 0≤p<N) gegeben ist durch: posi,p = δ·(iN+p)+ε, wobei δ und ε zwei ganze Zahlen wie δ > 0 und ε ≥ 0 sind.
  4. Verfahren nach Anspruch 2, dadurch gekennzeichnet, daß das algebraische Verzeichnis ausgehend von M Gruppen von N Mengen von L' möglichen Positionen für einen Impuls eines Code des Verzeichnisses definiert ist, mit M>1, wobei die Position der Ordnung i in der Menge (Ep (m)) der Gruppe m, die mögliche Positionen für den Impuls p (0≤i<L', 0≤m<M, 0≤p<N) enthält, gegeben ist durch: posi,p (m) = δ·(iN+p)+ε(m) wobei δ,ε(0),..., ε(M-1) ganze Zahlen wie 0 ≤ ε(0)<... < ε(M-1)<δ sind.
  5. Verfahren nach Anspruch 4, dadurch gekennzeichnet, daß die Korrelationsvektoren (Rp,p (m)) und die Korrelationsmatrizen (Rp,q (m)) nur für µ der Gruppen gespeichert werden, wobei µ eine ganze Zahl wie 1≤µ<M ist.
  6. Method according to claim 3, 4 or 5, characterized in that the calculation of the N correlation vectors relating to a group comprises an initialization of an integer variable k and of an accumulation variable cor, and a loop indexed by an integer i decreasing from L'-1 to 0, iteration i of the loop comprising the successive calculations of the components R_{p,p}(i) of the vectors for p decreasing from N-1 to 0, a component R_{p,p}(i) being taken equal to the accumulation variable cor after δ increments of the integer variable k and δ corresponding additions of the terms h(k)·h(k) to the accumulation variable cor.
  7. Method according to claim 3, 4 or 5, characterized in that the calculation of the N(N-1)/2 correlation matrices relating to a group comprises, for each integer t in the interval [1, N-1] and each integer d' in the interval [0, L'-1], an initialization of an integer variable k and of an accumulation variable cor, and a loop (B_{t,d'}) indexed by an integer i decreasing from L'-1-d' to 0, iteration i of the loop comprising the successive calculations of the components R_{p,p+t}(i,i+d') of the matrices for p decreasing from N-1-t to 0 and then, if i > 0, the successive calculations of the components R_{q,q+N-t}(i+d',i-1) of the matrices for q decreasing from t-1 to 0, a component R_{p,p+t}(i,i+d') or R_{q,q+N-t}(i+d',i-1) being taken equal to the accumulation variable cor after δ increments of the integer variable k and δ corresponding additions of the terms h(k)·h(k+d) to the accumulation variable cor, with d = δ·(t+d'N).
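The position rule of claim 3 and the accumulation loops of claims 6 and 7 can be illustrated with a minimal Python sketch. It is not the patented implementation, and it rests on assumptions that this section does not state: the covariance term is taken as U(a,b) = Σ h(k-a)·h(k-b) for k from max(a,b) to L-1, the excitation length is taken as L = δ·N·L', and k is initialized to -ε with h(k) treated as zero for k < 0. All function names are hypothetical.

```python
def positions(N, Lp, delta=1, eps=0):
    """pos[p][i] = delta*(i*N + p) + eps, the position rule of claim 3."""
    return [[delta * (i * N + p) + eps for i in range(Lp)] for p in range(N)]


def covariance_term(h, a, b):
    """Direct reference value; ASSUMED form U(a,b) = sum_k h(k-a)*h(k-b)."""
    return sum(h[k - a] * h[k - b] for k in range(max(a, b), len(h)))


def _accumulate(h, k, cor, delta, d):
    """delta increments of k, each adding h(k)*h(k+d) to cor (skipped for k < 0)."""
    for _ in range(delta):
        if k >= 0:
            cor += h[k] * h[k + d]
        k += 1
    return k, cor


def correlation_vectors(h, N, Lp, delta=1, eps=0):
    """Claim 6 loop: R[p][i] = U(pos_{i,p}, pos_{i,p}), visited in descending position."""
    R = [[0.0] * Lp for _ in range(N)]
    k, cor = -eps, 0.0  # assumed initialization
    for i in range(Lp - 1, -1, -1):
        for p in range(N - 1, -1, -1):
            k, cor = _accumulate(h, k, cor, delta, 0)
            R[p][i] = cor
    return R


def correlation_matrices(h, N, Lp, delta=1, eps=0):
    """Claim 7 loops: R[(p,q)][i][j] = U(pos_{i,p}, pos_{j,q}) for p < q."""
    R = {(p, q): [[0.0] * Lp for _ in range(Lp)]
         for p in range(N) for q in range(p + 1, N)}
    for t in range(1, N):
        for dp in range(Lp):            # dp plays the role of d'
            d = delta * (t + dp * N)    # constant lag along this pass
            k, cor = -eps, 0.0          # assumed initialization
            for i in range(Lp - 1 - dp, -1, -1):
                for p in range(N - 1 - t, -1, -1):
                    k, cor = _accumulate(h, k, cor, delta, d)
                    R[(p, p + t)][i][i + dp] = cor
                if i > 0:
                    for q in range(t - 1, -1, -1):
                        k, cor = _accumulate(h, k, cor, delta, d)
                        R[(q, q + N - t)][i + dp][i - 1] = cor
    return R
```

Under these assumptions, each pass over (t, d') follows one lag d of the covariance matrix from the last position towards the first, so every stored component costs only δ multiply-add operations beyond the previous one.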
EP96901020A 1995-01-06 1996-01-04 Method for speech coding using linear prediction and algebraic code excitation Expired - Lifetime EP0749626B1 (de)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
FR9500133A FR2729245B1 (fr) 1995-01-06 1995-01-06 Method for speech coding using linear prediction and algebraic code excitation
FR9500133 1995-01-06
PCT/FR1996/000017 WO1996021221A1 (fr) 1995-01-06 1996-01-04 Method for speech coding using linear prediction and algebraic code excitation

Publications (2)

Publication Number Publication Date
EP0749626A1 1996-12-27
EP0749626B1 1999-10-20

Family

ID=9474930

Country Status (8)

Country Link
US (1) US5717825A (de)
EP (1) EP0749626B1 (de)
JP (1) JP3481251B2 (de)
KR (1) KR100389693B1 (de)
CA (1) CA2182386C (de)
DE (1) DE69604729T2 (de)
FR (1) FR2729245B1 (de)
WO (1) WO1996021221A1 (de)

Families Citing this family (41)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
FR2729246A1 (fr) * 1995-01-06 1996-07-12 Matra Communication Procede de codage de parole a analyse par synthese
FR2729247A1 (fr) * 1995-01-06 1996-07-12 Matra Communication Procede de codage de parole a analyse par synthese
US5646867A (en) * 1995-07-24 1997-07-08 Motorola Inc. Method and system for improved motion compensation
JP3094908B2 (ja) * 1996-04-17 2000-10-03 日本電気株式会社 音声符号化装置
JP3707154B2 (ja) * 1996-09-24 2005-10-19 ソニー株式会社 音声符号化方法及び装置
DE19641619C1 (de) * 1996-10-09 1997-06-26 Nokia Mobile Phones Ltd Verfahren zur Synthese eines Rahmens eines Sprachsignals
US5970444A (en) * 1997-03-13 1999-10-19 Nippon Telegraph And Telephone Corporation Speech coding method
US5924062A (en) * 1997-07-01 1999-07-13 Nokia Mobile Phones ACLEP codec with modified autocorrelation matrix storage and search
CA2254620A1 (en) * 1998-01-13 1999-07-13 Lucent Technologies Inc. Vocoder with efficient, fault tolerant excitation vector encoding
US6266412B1 (en) * 1998-06-15 2001-07-24 Lucent Technologies Inc. Encrypting speech coder
US6556966B1 (en) 1998-08-24 2003-04-29 Conexant Systems, Inc. Codebook structure for changeable pulse multimode speech coding
US6714907B2 (en) * 1998-08-24 2004-03-30 Mindspeed Technologies, Inc. Codebook structure and search for speech coding
SE521225C2 (sv) * 1998-09-16 2003-10-14 Ericsson Telefon Ab L M Förfarande och anordning för CELP-kodning/avkodning
JP4005359B2 (ja) 1999-09-14 2007-11-07 富士通株式会社 音声符号化及び音声復号化装置
WO2001024166A1 (en) * 1999-09-30 2001-04-05 Stmicroelectronics Asia Pacific Pte Ltd G.723.1 audio encoder
JP3449339B2 (ja) * 2000-06-08 2003-09-22 日本電気株式会社 復号化装置および復号化方法
US7363219B2 (en) * 2000-09-22 2008-04-22 Texas Instruments Incorporated Hybrid speech coding and system
JP3449348B2 (ja) * 2000-09-29 2003-09-22 日本電気株式会社 相関行列学習方法および装置ならびに記憶媒体
JP3536921B2 (ja) * 2001-04-18 2004-06-14 日本電気株式会社 相関行列学習方法、装置及びプログラム
DE10140507A1 (de) * 2001-08-17 2003-02-27 Philips Corp Intellectual Pty Verfahren für die algebraische Codebook-Suche eines Sprachsignalkodierers
US7383283B2 (en) * 2001-10-16 2008-06-03 Joseph Carrabis Programable method and apparatus for real-time adaptation of presentations to individuals
US8655804B2 (en) 2002-02-07 2014-02-18 Next Stage Evolution, Llc System and method for determining a characteristic of an individual
US8195597B2 (en) * 2002-02-07 2012-06-05 Joseph Carrabis System and method for obtaining subtextual information regarding an interaction between an individual and a programmable device
JP4290917B2 (ja) * 2002-02-08 2009-07-08 株式会社エヌ・ティ・ティ・ドコモ 復号装置、符号化装置、復号方法、及び、符号化方法
US7003461B2 (en) * 2002-07-09 2006-02-21 Renesas Technology Corporation Method and apparatus for an adaptive codebook search in a speech processing system
EP1383109A1 (de) * 2002-07-17 2004-01-21 STMicroelectronics N.V. Verfahren und Vorrichtung für breitbandige Sprachkodierung
US7249014B2 (en) 2003-03-13 2007-07-24 Intel Corporation Apparatus, methods and articles incorporating a fast algebraic codebook search technique
GB0307752D0 (en) * 2003-04-03 2003-05-07 Seiko Epson Corp Apparatus for algebraic codebook search
FI118835B (fi) * 2004-02-23 2008-03-31 Nokia Corp Koodausmallin valinta
KR100668299B1 (ko) * 2004-05-12 2007-01-12 삼성전자주식회사 구간별 선형양자화를 이용한 디지털 신호 부호화/복호화방법 및 장치
SG123639A1 (en) 2004-12-31 2006-07-26 St Microelectronics Asia A system and method for supporting dual speech codecs
JP3981399B1 (ja) * 2006-03-10 2007-09-26 松下電器産業株式会社 固定符号帳探索装置および固定符号帳探索方法
KR101542069B1 (ko) * 2006-05-25 2015-08-06 삼성전자주식회사 고정 코드북 검색 방법 및 장치와 그를 이용한 음성 신호의부호화/복호화 방법 및 장치
RU2439721C2 (ru) 2007-06-11 2012-01-10 Фраунхофер-Гезелльшафт цур Фёрдерунг дер ангевандтен Аудиокодер для кодирования аудиосигнала, имеющего импульсоподобную и стационарную составляющие, способы кодирования, декодер, способ декодирования и кодированный аудиосигнал
CN101842833B (zh) * 2007-09-11 2012-07-18 沃伊斯亚吉公司 语音和音频编码中快速代数码本搜索的方法和设备
ATE500588T1 (de) 2008-01-04 2011-03-15 Dolby Sweden Ab Audiokodierer und -dekodierer
CN101615394B (zh) 2008-12-31 2011-02-16 华为技术有限公司 分配子帧的方法和装置
WO2012095924A1 (ja) * 2011-01-14 2012-07-19 パナソニック株式会社 符号化装置、通信処理装置および符号化方法
CN102623012B (zh) * 2011-01-26 2014-08-20 华为技术有限公司 矢量联合编解码方法及编解码器
WO2014053261A1 (en) * 2012-10-05 2014-04-10 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. An apparatus for encoding a speech signal employing acelp in the autocorrelation domain
EP2919232A1 (de) * 2014-03-14 2015-09-16 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Codierer, Decodierer und Verfahren zur Codierung und Decodierung

Family Cites Families (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CA1229681A (en) * 1984-03-06 1987-11-24 Kazunori Ozawa Method and apparatus for speech-band signal coding
CA1255802A (en) * 1984-07-05 1989-06-13 Kazunori Ozawa Low bit-rate pattern encoding and decoding with a reduced number of excitation pulses
US4868867A (en) * 1987-04-06 1989-09-19 Voicecraft Inc. Vector excitation speech or audio coder for transmission or storage
US4899385A (en) * 1987-06-26 1990-02-06 American Telephone And Telegraph Company Code excited linear predictive vocoder
US4910781A (en) * 1987-06-26 1990-03-20 At&T Bell Laboratories Code excited linear predictive vocoder using virtual searching
CA2005115C (en) * 1989-01-17 1997-04-22 Juin-Hwey Chen Low-delay code-excited linear predictive coder for speech or audio
EP0422232B1 (de) * 1989-04-25 1996-11-13 Kabushiki Kaisha Toshiba Stimmenkodierer
CA2027705C (en) * 1989-10-17 1994-02-15 Masami Akamine Speech coding system utilizing a recursive computation technique for improvement in processing speed
CA2010830C (en) * 1990-02-23 1996-06-25 Jean-Pierre Adoul Dynamic codebook for efficient speech coding based on algebraic codes
US5195137A (en) * 1991-01-28 1993-03-16 At&T Bell Laboratories Method of and apparatus for generating auxiliary information for expediting sparse codebook search
US5495555A (en) * 1992-06-01 1996-02-27 Hughes Aircraft Company High quality low bit rate celp-based speech codec
FR2700632B1 (fr) * 1993-01-21 1995-03-24 France Telecom Système de codage-décodage prédictif d'un signal numérique de parole par transformée adaptative à codes imbriqués.

Also Published As

Publication number Publication date
FR2729245A1 (fr) 1996-07-12
DE69604729D1 (de) 1999-11-25
DE69604729T2 (de) 2002-07-25
EP0749626A1 (de) 1996-12-27
WO1996021221A1 (fr) 1996-07-11
US5717825A (en) 1998-02-10
JP3481251B2 (ja) 2003-12-22
CA2182386C (fr) 2003-09-09
KR970701901A (ko) 1997-04-12
CA2182386A1 (fr) 1996-07-11
FR2729245B1 (fr) 1997-04-11
KR100389693B1 (ko) 2003-12-01
JPH10502191A (ja) 1998-02-24

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 19960912

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): DE GB IT NL SE

GRAG Despatch of communication of intention to grant

Free format text: ORIGINAL CODE: EPIDOS AGRA

17Q First examination report despatched

Effective date: 19990210

GRAG Despatch of communication of intention to grant

Free format text: ORIGINAL CODE: EPIDOS AGRA

GRAH Despatch of communication of intention to grant a patent

Free format text: ORIGINAL CODE: EPIDOS IGRA

GRAH Despatch of communication of intention to grant a patent

Free format text: ORIGINAL CODE: EPIDOS IGRA

GRAA (expected) grant

Free format text: ORIGINAL CODE: 0009210

AK Designated contracting states

Kind code of ref document: B1

Designated state(s): DE GB IT NL SE

REF Corresponds to:

Ref document number: 69604729

Country of ref document: DE

Date of ref document: 19991125

GBT Gb: translation of ep patent filed (gb section 77(6)(a)/1977)

Effective date: 19991203

ITF It: translation for a ep patent filed

Owner name: BARZANO' E ZANARDO MILANO S.P.A.

PLBE No opposition filed within time limit

Free format text: ORIGINAL CODE: 0009261

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: NO OPPOSITION FILED WITHIN TIME LIMIT

26N No opposition filed
REG Reference to a national code

Ref country code: GB

Ref legal event code: IF02

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: SE

Payment date: 20141223

Year of fee payment: 20

Ref country code: GB

Payment date: 20141219

Year of fee payment: 20

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: NL

Payment date: 20141217

Year of fee payment: 20

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: IT

Payment date: 20141222

Year of fee payment: 20

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: DE

Payment date: 20141218

Year of fee payment: 20

REG Reference to a national code

Ref country code: DE

Ref legal event code: R071

Ref document number: 69604729

Country of ref document: DE

REG Reference to a national code

Ref country code: NL

Ref legal event code: MK

Effective date: 20160103

REG Reference to a national code

Ref country code: GB

Ref legal event code: PE20

Expiry date: 20160103

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: GB

Free format text: LAPSE BECAUSE OF EXPIRATION OF PROTECTION

Effective date: 20160103

REG Reference to a national code

Ref country code: SE

Ref legal event code: EUG