WO2007096550A2 - Improved coding/decoding of a digital audio signal, in CELP technique
- Publication number
- WO2007096550A2 (PCT/FR2007/050780)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- dictionary
- dictionaries
- vector
- pattern
- basic
- Prior art date
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/04—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
- G10L19/08—Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters
- G10L19/10—Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters the excitation function being a multipulse excitation
- G10L19/12—Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters the excitation function being a code excitation, e.g. in code excited linear prediction [CELP] vocoders
- G10L2019/0001—Codebooks
- G10L2019/0007—Codebook element generation
Definitions
- the present invention relates to the coding/decoding of digital audio signals using the technique known as "CELP" (for "Code Excited Linear Prediction").
- the compression coding of such signals can be used for their transmission or storage.
- the signals may be speech signals or, more generally, digitized sound signals. More particularly, the invention relates to the predictive coding technique in which:
- LPC Linear Prediction Coding
- the invention relates to the family of CELP coders (for "Code Excited Linear Prediction"), which select the excitation signal from among a set of candidate signals by comparing the output of the synthesis filter, excited by this signal, with the original signal, with the introduction of perceptual weighting.
- such coders have been widely used for encoding speech signals at bit rates of 6 to 24 kbit/s, and have been adopted in particular in the ITU-T G.729, GSM-EFR and 3GPP/WB-AMR standards.
- the invention finds an advantageous application in hierarchical coding systems, described in detail below, for which the bit stream is formed of a base layer followed by additional layers that improve the quality.
- a general diagram of a CELP encoder is given in Figure 1.
- Figure 2 shows the associated decoder.
- CELP Code-Excited Linear Prediction
- the encoder segments an input signal S(n) into sample blocks or "frames" (typically of the order of 10 to 20 ms of signal).
- an LPC analysis is performed to estimate and quantize the parameters of the short-term linear prediction filter.
- the excitation signal exc(z) is modeled using two dictionaries:
- the adaptive dictionary DICa intended to model the periodicity of the harmonic sounds
- the present invention is instead directed at the "fixed" dictionary DICf; the adaptive dictionary DICa is preferably not dealt with in what follows.
- the modeling of the excitation signal is generally performed on sample blocks corresponding to signal subframes typically of the order of 5 ms.
- the selection of an optimal code word in a dictionary (also called a "code vector" or "waveform") is performed by minimizing the energy of the perceptually weighted error signal, which is expressed by a relation of the type:
- E(z) = W(z) (S(z) − Ŝ(z)), where E(z), S(z) and Ŝ(z) represent the z-transforms of the weighted error signal, of the original signal to be encoded and of the reconstructed signal, respectively.
- the filter W(z) is the perceptual weighting filter 11 (typically
- the weighted error signal E (z) can be expressed by a relation of the type:
- the signals exc_past(n) and exc_current(n) respectively represent the past excitation signal (zero on the current block) and the current excitation signal (zero-memory signal).
- the CELP minimization criterion (subsequent modules 13 and 14) is then expressed as a search, in a dictionary, for the waveform {c(n); 0 ≤ n ≤ N−1} which minimizes the quantity:
- the elements {h(n)} represent the impulse response of the filter H (defined by the relation (1) above).
- the filter H is causal, that is to say that the elements h(n) such that n < 0 are zero.
- the elements h ( ⁇ ) such that n ⁇ 0 can be non-zero.
- the optimal gain associated with the selected code vector is quantized.
- a quantization index and the index associated with the selected code vector are transmitted (via a telecommunication network) or simply stored for subsequent transmission. It is on the basis of these indices that the decoding can then take place.
- the design of the excitation dictionary is guided by constraints of bit rate, quality (or efficiency for a given bit rate) and complexity. At a restricted bit rate, it is difficult to obtain good reproduction quality for every signal to be encoded. Complexity is also an important factor: for all communication applications, the real-time constraint limits the calculation time.
- the first CELP dictionaries proposed in the literature consisted of random code vectors, which required calculating the numerator and the denominator of the criterion for each dictionary vector. The search for the best code word was then of prohibitive complexity.
- Structured dictionaries were then proposed to speed up the search for the optimal waveform, some search computations being performed once for different input signals (or "pooled calculations") owing to the relationships between the vectors induced by the dictionary structure.
- One of the most popular categories of structured dictionaries is the family of algebraic dictionaries, composed of pulses whose positions are defined by an algebraic code or by a lattice of points (typically a Gosset lattice), regular or not.
- the most classic representatives of such dictionaries are known as ACELP (for "Algebraic CELP").
- the filtering of the fixed dictionary presupposes a certain continuity of the process, because the filters tend to widen the support of the filtered signal; since it is generally not possible to correct the excitation of the preceding block, irregularities that are poorly controlled by the process may appear at the edges of the coded sample blocks.
- orthogonal dictionaries can also be provided in this context.
- Hierarchical coding structures are now briefly described. Such structures, also called "scalable", produce coded binary data divided into successive layers.
- a base layer is formed of the bits absolutely necessary for the decoding of the bitstream, and determining a minimum quality of decoding.
- the following layers progressively improve the quality of the decoded signal, each new layer providing new information which, when exploited at decoding, yields an output signal of increasing quality.
- One of the peculiarities of hierarchical coders is the possibility of intervening at any level of the transmission or storage chain to remove a part of the bitstream without having to give any particular indication to the coder or the decoder.
- the decoder uses the binary information it receives and produces a signal of corresponding quality.
- Hierarchical CELP coders, also called "nested CELPs"
- dictionaries which can be different at each stage or identical.
- the present invention improves the situation.
- an initial dictionary (hereinafter also called "basic dictionary") is constructed by: providing one and the same sequence of pulses forming a basic pattern,
- pulse sequence is understood to mean a succession of samples comprising pulses and possibly one or more zero samples between the pulses, and / or at the beginning and / or at the end of the succession.
- the dictionary thus constructed is a CELP excitation dictionary of the so-called "fixed" type (referenced DICf for example in FIGS. 1 and 2 described above).
- the basic pattern appearing at each occurrence in an excitation vector is multiplied by an amplitude associated with said occurrence, this amplitude being for example chosen from a set comprising the values +1 and -1.
- all the vectors of the initial dictionary include the same number of occurrences of the basic pattern.
- an initial dictionary can be defined by:
- CELP excitation vector dictionaries, these dictionaries being defined by specifying a basic pattern appearing in one or more occurrences, each occurrence being multiplied by an amplitude.
- the patterns possibly appearing at the edge of the block are truncated to fit exactly in the block.
- a dictionary obtained by the method within the meaning of the invention, gathering vectors of dimension N, is then defined by a basic pattern which is "displaced" within the block of length N.
- Each pattern appears in K occurrences that are summed, each occurrence being itself defined by:
- a multi-pulse dictionary, well known in the state of the art, constitutes a particular case of a dictionary thus obtained, insofar as the length of the pattern in the case of a multi-pulse dictionary is simply 1.
- This type of multi-pulse dictionary will be referred to hereinafter as the "trivial basic dictionary".
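The construction described above (one basic pattern, placed at several positions, scaled by an amplitude, truncated at the block edges, and summed) can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `build_code_vector` and the example pattern values are assumptions introduced here.

```python
def build_code_vector(pattern, centers, amplitudes, n):
    """Sum scaled occurrences of `pattern` in a block of n samples.

    `pattern` has odd length 2p+1 and is indexed -p..p around its center
    (assumption of this sketch). Samples that fall outside [0, n-1] are
    dropped, which models the truncation at the block edges.
    """
    p = len(pattern) // 2
    c = [0.0] * n
    for a, s in zip(centers, amplitudes):
        for j, y in enumerate(pattern):
            pos = a + j - p
            if 0 <= pos < n:
                c[pos] += s * y  # overlapping occurrences simply add up
    return c


# Hypothetical three-sample pattern, two occurrences with amplitudes +1 and -1.
pattern = [-0.5, 1.0, -0.5]
print(build_code_vector(pattern, centers=[0, 5], amplitudes=[+1, -1], n=8))
# -> [1.0, -0.5, 0.0, 0.0, 0.5, -1.0, 0.5, 0.0]

# A pattern of length 1 gives the "trivial" multi-pulse case.
print(build_code_vector([1.0], centers=[0, 5], amplitudes=[+1, -1], n=8))
# -> [1.0, 0.0, 0.0, 0.0, 0.0, -1.0, 0.0, 0.0]
```

In the first call the occurrence centered on sample 0 loses its leading sample, which is the truncation case mentioned in the text.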
- the method in the sense of the invention makes it possible to construct combinations of dictionaries (initial and constructed as described above without also excluding the use of one or more additional multi-pulse dictionaries).
- a dictionary obtained by the method in the sense of the invention may consist of: a single non-trivial basic dictionary, defined by a basic pattern (of length greater than 1), by the positions of the pattern and by the associated amplitude according to the different occurrences, or
- a global dictionary can be constructed by a sum of basic dictionaries of which at least one is an initial dictionary defined by a basic pattern.
- the vectors of the global dictionary are formed in this case by adding, position by position, the pulses of the vectors of the basic dictionaries, each preferably weighted by a gain associated with its dictionary.
- a global dictionary can be constructed by a union of basic dictionaries, at least one of which is an initial dictionary defined by a basic pattern. In this case, the global dictionary simply includes all the vectors of all the basic dictionaries.
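The two ways of combining basic dictionaries (sum and union) can be contrasted with a short sketch. It assumes that each basic dictionary is stored explicitly as a list of equal-length vectors, which a real coder would normally avoid; the names `union_dictionary` and `sum_dictionary` are hypothetical.

```python
from itertools import product


def union_dictionary(dictionaries):
    """Global dictionary containing every vector of every basic dictionary."""
    return [list(v) for d in dictionaries for v in d]


def sum_dictionary(dictionaries, gains):
    """Global dictionary whose vectors are gain-weighted, sample-by-sample
    sums of one vector taken from each basic dictionary."""
    out = []
    for combo in product(*dictionaries):
        length = len(combo[0])
        out.append([sum(g * v[i] for g, v in zip(gains, combo))
                    for i in range(length)])
    return out


d1 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]   # single-pulse vectors
d2 = [[-0.5, 1.0, -0.5]]                  # one pattern-based vector

print(len(union_dictionary([d1, d2])))            # 3 vectors
print(sum_dictionary([d1, d2], gains=(1.0, 0.5)))
# -> [[0.75, 0.5, -0.25], [-0.25, 1.5, -0.25]]
```

The union grows additively with the sizes of the basic dictionaries, whereas the sum grows multiplicatively, which is one reason an explicit enumeration like this is only suitable for illustration.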
- the basic pattern comprises at least one central pulse, preceded and succeeded by at least one pulse of sign opposite to the sign of the central pulse. More precisely, the pattern may comprise in all three pulses of which:
- a second pulse preceding the central pulse and a third pulse succeeding the central pulse, the signs of the second and third pulses being opposite to that of the central pulse, and the amplitude of the second and third pulses being lower, in absolute value, than that of the central pulse and, advantageously, variable between 0 (not included) and about half the amplitude of the central pulse, in absolute value.
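As an illustration of such a three-pulse pattern, the sketch below builds one from a single ratio. It assumes that the two side pulses have equal amplitude (a symmetrical pattern) and treats "about half" as a hard upper bound of 0.5; both are simplifications made here, and `three_pulse_pattern` is a hypothetical name.

```python
def three_pulse_pattern(side_ratio, central=1.0):
    """Return [side, central, side] with side pulses of opposite sign.

    side_ratio is |side| / |central|; this sketch restricts it to
    the interval (0, 0.5].
    """
    if not 0.0 < side_ratio <= 0.5:
        raise ValueError("side_ratio must lie in (0, 0.5]")
    side = -side_ratio * central
    return [side, central, side]


print(three_pulse_pattern(0.25))                 # [-0.25, 1.0, -0.25]
print(three_pulse_pattern(0.5, central=-1.0))    # [0.5, -1.0, 0.5]
```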
- a coding/decoding device comprising a cascade of dictionaries, in which at least one initial dictionary comes later in the cascade, this initial dictionary comprising such a symmetrical pattern, with a central pulse and with preceding and following pulses whose amplitudes are of opposite sign to that of the central pulse.
- This device may advantageously comprise high-pass filtering in a global perceptual weighting filter used in coding, in particular in the search for an optimal excitation vector.
- this embodiment proposes cascading a multi-pulse dictionary with a dictionary defined by a pattern that is symmetrical with respect to its center, the occurrences of the pattern center describing the same set as the occurrences of the pulses of the multi-pulse dictionary.
- This implementation makes it possible to broaden the spectral range of the initial basic dictionary by adding one or more additional basic dictionaries, the search in these additional basic dictionaries then being spectrally focused by modifying the perceptual weighting filter used in the search for the optimal vector; the choice of this modification and that of the pattern of these additional basic dictionaries may be linked.
- the positions of the patterns and/or pulses in the vectors of the dictionaries, in particular when they are cascaded, preferably describe identical sets, the position of a pattern being marked substantially by the position of a central pulse in the sequence of pulses forming the pattern.
- the position of a pattern can be identified by the position, in the sample block, of the center of the pattern if the pattern includes an odd number of samples. However, in a strictly equivalent way, a pattern of even length may be completed by a zero to produce an odd length. More generally, any other variant for locating the position of the patterns may be considered.
- the invention proposes very simple techniques for decoding the index of the vectors of such dictionaries, by adding the scaled occurrences of the pattern or patterns whose position and the amplitude factor for each occurrence are transmitted.
- an index is formed preferably comprising at least indications:
- the index further includes an indication of the dictionary in which the best candidate vector has been found.
- the index includes in particular an indication relative to the aforementioned initial dictionary and hence an indication as to the basic pattern that made it possible to construct the dictionary and therefore the best candidate vector.
- In the case of a single basic dictionary, the index already reflects the amplitude and position associated with each of its occurrences. To decode the best candidate vector, it is then sufficient to place the basic pattern at the position it must occupy in each occurrence, multiply it by the associated amplitudes, and sum the occurrences. In the case of a union of basic dictionaries, the index further identifies the selected basic dictionary, as indicated above. In the case of a sum of basic dictionaries, the amplitudes and positions of the occurrences of each basic pattern are available, and decoding proceeds as in the case of the union, but by summing the contributions of all the patterns.
- the best candidate vector is reconstructed preferentially from the index:
- the indices of the vectors in each of the dictionaries are preferably determined, and from there, for each index, the last three steps described above are applied.
- the dictionary constructed within the meaning of the invention preferably comprises allowable pattern positions which describe a highly structured set, advantageously as a set of pulse positions of an ACELP dictionary.
- the cascading of dictionaries including at least one basic dictionary is very advantageous. This variant is particularly suitable for the case of hierarchical coding structures. Nevertheless, the different basic dictionaries do not play the same role because, typically, the first dictionary ensures the coding of a minimum quality of the signals that it is desired to reproduce. The following dictionaries are intended to improve this quality, and will consolidate the coding, reduce sensitivity to the type of signal, or other.
- the cascading of a plurality of dictionaries amounts to constructing a single global dictionary obtained by summation of the dictionaries weighted by gains, as indicated above.
- each excitation vector corresponds to the sum of vectors derived from basic dictionaries, each multiplied by a gain, the basic dictionaries being explored one after the other, subtracting the known contribution of the partial excitation produced by the vectors of the previous dictionaries.
- the cascaded dictionaries are explored one after the other, subtracting, for a current dictionary, a known contribution of a partial excitation produced by the vectors of at least one preceding dictionary, which confers a hierarchical coding structure.
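The cascaded exploration can be sketched as a loop that searches one dictionary at a time and removes the contribution already explained from the target. This is only a sketch under several assumptions that are not in the text: each dictionary is stored explicitly as a list of vectors, the filter is a causal impulse response `h`, the gain is the unquantized optimal gain, and the search is exhaustive. All function names are hypothetical.

```python
def filt(h, c):
    """Causal convolution of c with impulse response h, truncated to len(c)."""
    return [sum(h[k] * c[i - k] for k in range(min(i, len(h) - 1) + 1))
            for i in range(len(c))]


def best_in_dictionary(target, h, dictionary):
    """Pick the vector maximizing (x.z)^2 / (z.z), with z the filtered vector."""
    best = None
    for idx, c in enumerate(dictionary):
        z = filt(h, c)
        energy = sum(v * v for v in z)
        if energy == 0.0:
            continue
        corr = sum(t * v for t, v in zip(target, z))
        score = corr * corr / energy
        if best is None or score > best[0]:
            best = (score, idx, corr / energy, z)
    if best is None:
        raise ValueError("dictionary has no vector with non-zero energy")
    return best[1], best[2], best[3]


def cascaded_search(target, h, dictionaries):
    """Explore the dictionaries in order, subtracting each known contribution."""
    residual = list(target)
    choices = []
    for d in dictionaries:
        idx, gain, z = best_in_dictionary(residual, h, d)
        choices.append((idx, gain))
        residual = [r - gain * v for r, v in zip(residual, z)]
    return choices, residual


unit = [[1.0 if i == k else 0.0 for i in range(4)] for k in range(4)]
choices, residual = cascaded_search([2.0, 0.0, -1.0, 0.0], [1.0], [unit, unit])
print(choices)    # [(0, 2.0), (2, -1.0)]
print(residual)   # [0.0, 0.0, 0.0, 0.0]
```

Because each stage only refines what the previous stages left over, a decoder that receives only the first stage's index and gain can still produce a usable, lower-quality excitation, which is the hierarchical property the text refers to.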
- the search, in a dictionary within the meaning of the invention, for a candidate excitation vector is carried out according to an estimate of a CELP criterion that is little modified with respect to the prior art and includes the steps of: a) calculating the convolution of the impulse response of a filter, resulting from the product of an LPC synthesis filter and a perceptual filter, with the basic pattern of the dictionary, to obtain a convolved filter vector, b) calculating the elements of a cross-correlation vector between a candidate target vector and the convolved filter vector, c) optionally correcting elements of the cross-correlation vector to take account of a truncation of the basic pattern at at least one block edge, d) calculating the elements of an autocorrelation matrix of the convolved filter vector, e) optionally correcting elements of said matrix to take account of a truncation of the basic pattern at at least one block edge, f) searching for the best candidate vector using a CELP criterion expressed
- the estimate of the CELP criterion is slightly modified by the addition of the steps c) and e), with respect to the estimation of the criterion CELP in the sense of the prior art.
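Steps a), b) and d) can be sketched as below; the optional edge corrections of steps c) and e) are deliberately left out. The sketch assumes a causal impulse response `h`, an odd-length pattern indexed from −p to p, and stores the convolved filter vector in a dictionary keyed by lag, because that vector starts at lag −p. The function names are hypothetical.

```python
def convolve_pattern(h, pattern):
    """Step a: h'(m) = sum_j y(j) h(m - j), keyed by lag m (m >= -p)."""
    p = len(pattern) // 2
    hp = {}
    for j, y in enumerate(pattern, start=-p):
        for k, hk in enumerate(h):
            hp[j + k] = hp.get(j + k, 0.0) + y * hk
    return hp


def cross_correlation(x, hp, positions):
    """Step b: d(a) = sum over the block of x(n) h'(n - a)."""
    return {a: sum(x[n] * hp.get(n - a, 0.0) for n in range(len(x)))
            for a in positions}


def autocorrelation(hp, positions, n_block):
    """Step d: phi(i, j) = sum over the block of h'(n - i) h'(n - j)."""
    return {(i, j): sum(hp.get(n - i, 0.0) * hp.get(n - j, 0.0)
                        for n in range(n_block))
            for i in positions for j in positions}


h = [1.0, 0.5]                      # toy impulse response
pattern = [-0.5, 1.0, -0.5]         # toy basic pattern, p = 1
x = [0.0, 1.0, 2.0, 3.0, 0.0, 0.0]  # toy target vector, N = 6

hp = convolve_pattern(h, pattern)
d = cross_correlation(x, hp, positions=[2, 3])
phi = autocorrelation(hp, positions=[2, 3], n_block=len(x))
print(hp)            # {-1: -0.5, 0: 0.75, 1: 0.0, 2: -0.25}
print(d[2])          # 1.0
print(phi[(2, 2)])   # 0.875
print(phi[(2, 3)])   # -0.375
```

Convolving the pattern into the filter once means that the rest of the search can treat each occurrence like a single pulse at its center, which is why the criterion is only slightly modified compared with a multi-pulse search.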
- the present invention aims not only at the method defined above, but also at the dictionary, itself, of CELP excitation vectors, capable of being constructed by a device for encoding / decoding digital audio signals, by an implementation of the process within the meaning of the invention.
- It also relates to a computer program comprising instructions for implementing the method of constructing a dictionary as defined above.
- an advantageous embodiment consists in providing a device including means (such as a processor, a calculation memory, etc.) for generating the CELP excitation vectors of one or more dictionaries, at least one of which is a dictionary that can be constructed by implementing the method within the meaning of the invention.
- these dictionaries can be constructed by executing a computer program of the aforementioned type and then stored in a memory of such a coding/decoding device, for example by using an algebraic law associating the indices of vectors with the code vectors themselves (as, for example, in the ACELP technique).
- the present invention also relates to a use of such a device for the coding / decoding of digital audio signals (thus typically a coding / decoding method), as well as the computer program intended for a device for encoding / decoding digital audio signals, and comprising instructions for the implementation of such use.
- all or part of the general and optional characteristics expressed above can be applied to the construction of the dictionary, to the dictionary itself, to the coding/decoding device comprising at least one dictionary thus constructed, to the use of such a device, to the computer program generating the dictionary, or to the computer program for the use of the device.
- the invention proposes dictionaries of excitation vectors of the CELP type and their use, which offer a great potential wealth of contents for a moderate size.
- the decoding of the associated indices is of low complexity, despite this variety of forms.
- the present invention proposes a category of CELP dictionaries permitting the encoding of a large variety of excitation signals for relatively moderate data rates, and furthermore offering fast and efficient algorithms for the selection of the appropriate vector.
- FIG. 3a illustrates a basic pattern for the implementation of the invention
- FIGS. 3b and 3c respectively illustrate a first set A_0 and a second set A_1 of the positions of the first and second occurrences of a basic pattern
- FIG. 3d illustrates an example of a code vector selected by implementing the invention
- FIG. 4 is a table of modifications of the autocorrelation matrix in the estimation of the CELP criterion using a dictionary in the sense of the invention
- FIG. 5 illustrates the main steps of finding the best code vector in a dictionary within the meaning of the invention, by applying the "corrected" CELP criterion to take account of the presence of patterns, part of which is located outside a current block,
- FIG. 6 illustrates an exemplary union of dictionaries within the meaning of the invention
- FIG. 7 illustrates an exemplary sum of dictionaries within the meaning of the invention
- FIGS. 8a and 8b illustrate a first and a second basic dictionary in an exemplary embodiment of the present invention for improving a CELP coder according to the G.729 standard
- FIG. 8c compares the shape of the average spectra of the waveforms of the dictionary of FIG. 8a and the dictionary of FIG. 8b,
- FIG. 9 illustrates an exemplary embodiment of a CELP coder according to the G.729 standard perfected by an exemplary implementation of the present invention.
- the code vectors of a base dictionary are obtained by defining a base pattern y(j) (−p ≤ j ≤ p) as a series of samples (FIG. 3a) which moves within a block of length N, the pattern being truncated when it overflows the block.
- the box in dashed lines bearing the reference D2 of FIG. 7 illustrates some vectors V21, V22, V2n of a basic dictionary thus constructed.
- the first vector V21 comprises a base pattern Pat (D2) comprising a succession of eleven pulses. To the left of this pattern is the "end" of an inverse polarity pattern and truncated so that only its ninth to eleventh pulses appear in the vector V21.
- the next vector V22 takes the whole Pat (D2) pattern and another right truncated pattern of reverse polarity. In vectors V21 and V22, the patterns are disjoint.
- in the last vector V2n, two basic patterns are taken up with the same polarity, but their respective centers occupy positions close enough for the two patterns to overlap partially.
- the overlapping pulses are added together, taking their signs into account.
- the last vector V2n of the dictionary D2 in the example of FIG. 7 comprises the sum of the pulses of the two basic patterns at their edges, the right edge for one and the left edge for the other (tenth and eleventh pulses of the overall pattern, counting from the left).
- the (negative) pulse of the center of the second pattern of the vector V21 vanishes with the second (positive) pulse of the vector V12 in the sum of the vectors V21 + V12.
- pattern positions are such that patterns overlap at least partially (in the case of the vector V2n).
- the pulses of the overlapping patterns are added one by one.
- the notation y(j) (−p ≤ j ≤ p), while having the advantage of making the following developments clearer, seems to impose a priori an odd number 2p + 1 of elements in the basic pattern (−p ≤ j ≤ p).
- this feature is not necessary for the implementation of the present invention. If one wishes to use a pattern having an even number of elements, it suffices to add a null element on one of the edges, and the formulation applied here is still usable.
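The padding of an even-length pattern can be written in one line. The sketch below appends the null element on the trailing edge, which is an arbitrary choice made here (the text allows either edge); `to_odd_length` is a hypothetical name.

```python
def to_odd_length(pattern):
    """Return the pattern unchanged if its length is odd, else append a zero."""
    padded = list(pattern)
    if len(padded) % 2 == 0:
        padded.append(0.0)
    return padded


print(to_odd_length([1.0, -1.0]))        # [1.0, -1.0, 0.0]
print(to_odd_length([-0.5, 1.0, -0.5]))  # [-0.5, 1.0, -0.5]
```

After padding, the half-length p = len(pattern) // 2 and the center index are well defined, so the formulation with −p ≤ j ≤ p applies without change.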
- Each occurrence k is characterized by: the amplitude assigned to it, s_k, taking its values in a set S_k; and the position of the basic pattern, which can be represented, for example, by the position a_k at which its center is placed.
- a_k takes its values in a set A_k, and can possibly lie outside the interval [0, N−1], the only constraint being, of course, that the intersection of the pattern and the block is not empty.
- the second occurrence is characterized by the center a_1 which can be placed at the four positions of
- Each vector {c(n)} is defined by the set of positions of the centers of the basic patterns of each of the occurrences that compose it.
- the vectors {c(n)} of the basic dictionary are deduced from the vectors {c_0(n)} by convolution with the base pattern y and truncation at the bounds of the segment [0, N−1].
- the truncation function t(n) introduces nonlinearities in the expression of c(n), which can be overcome by extending the vector {c(n)} of dimension N to the vector {c'(n)} of dimension (N + 2p):
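The relationship between the extended vector and the truncated vector can be illustrated as follows. The sketch assumes that all pattern centers lie inside [0, N−1], so that the pulse-position vector has N samples and its full convolution with a pattern of length 2p + 1 has exactly N + 2p samples; the truncation is then a plain slice. The names `extended_vector` and `truncate` are hypothetical.

```python
def extended_vector(c0, pattern):
    """Full convolution c' = c0 * y, of length N + 2p (no truncation)."""
    out = [0.0] * (len(c0) + len(pattern) - 1)
    for n, s in enumerate(c0):
        for j, y in enumerate(pattern):
            out[n + j] += s * y
    return out


def truncate(c_ext, p):
    """Keep the N samples of c' that fall inside the block [0, N-1]."""
    return c_ext[p:len(c_ext) - p]


c0 = [1.0, 0.0, 0.0, 0.0, 0.0, -1.0]   # signed pulses at the pattern centers
pattern = [-0.5, 1.0, -0.5]            # toy pattern, p = 1

c_ext = extended_vector(c0, pattern)
print(c_ext)                # [-0.5, 1.0, -0.5, 0.0, 0.0, 0.5, -1.0, 0.5]
print(truncate(c_ext, 1))   # [1.0, -0.5, 0.0, 0.0, 0.5, -1.0]
```

The extended vector depends linearly on the pulse-position vector, whereas the truncated one does not in general, which is the point of working in dimension N + 2p.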
- the CELP criterion involves the calculation of two quantities: the numerator Num and the denominator Den.
- the number of non-zero elements h m (n, j) thus depends on the number of non-zero elements h ( ⁇ ) such that n ⁇ 0. If we assume that the filter H (z) is causal, all elements b d ( ⁇ ) such that n ⁇ N-1 are zero.
- central is conventionally expressed by
- Φ(i, j) = Σ_{n=0}^{N−1} h'(n−i) h'(n−j) is an element of the autocorrelation matrix of
- the modified matrix thus makes it possible to write the denominator of the search in the dictionary within the meaning of the invention in the form: Den = Σ_{k=0}^{K−1} Σ_{l=0}^{K−1} s_k s_l Φ'(a_k, a_l)
- step 53: correlation vector between the target vector x(n) and the vector {h'(i)} (obtained in step 51).
- these elements are optionally corrected (general step 63 of FIG. 5) to take account of the patterns appearing at the edge of the block. Indeed, for all the pairs (a_k, a_l) of which at least one of the elements corresponds to the occurrence of a pattern that overflows one of the block edges (arrow O at the output of the test 58), corrected elements Φ'(a_k, a_l) are calculated in step 60.
- the search for the best waveform is then carried out (step 61) using the conventional CELP search criterion, expressed as the maximization of a ratio in which the numerator involves the vector {d(a_k)} and the denominator the elements Φ'(a_k, a_l), to finally obtain the best code vector VC (step 62).
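The final search step can be sketched as an exhaustive maximization of the ratio Num/Den. The sketch assumes the usual quadratic form for pulse-type dictionaries, Num = (Σ_k s_k d(a_k))² and Den = Σ_k Σ_l s_k s_l Φ'(a_k, a_l), and assumes that d and Φ' are available as lookup tables; the function name and the toy values are hypothetical, and a practical coder would use a faster, non-exhaustive search.

```python
from itertools import product


def search_best(d, phi, position_sets, amplitudes=(+1, -1)):
    """Return ((positions, signs), ratio) maximizing Num / Den."""
    best, best_ratio = None, -1.0
    for pos in product(*position_sets):
        for sgn in product(amplitudes, repeat=len(pos)):
            num = sum(s * d[a] for s, a in zip(sgn, pos)) ** 2
            den = sum(sk * sl * phi[(ak, al)]
                      for sk, ak in zip(sgn, pos)
                      for sl, al in zip(sgn, pos))
            if den > 0 and num / den > best_ratio:
                best, best_ratio = (pos, sgn), num / den
    return best, best_ratio


# Toy tables: identity autocorrelation matrix, two occurrences (K = 2).
d = {0: 1.0, 1: -2.0, 2: 0.5, 3: 0.0}
phi = {(i, j): 1.0 if i == j else 0.0 for i in range(4) for j in range(4)}

best, ratio = search_best(d, phi, position_sets=[(0, 2), (1, 3)])
print(best, ratio)   # ((0, 1), (1, -1)) 4.5
```

The globally inverted sign pattern gives the same ratio, since the overall sign is absorbed by the gain; the sketch simply keeps the first maximum it meets.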
- FIG. 5 can illustrate, as a flowchart, a part of the algorithm of the computer program allowing the use of a coding / decoding device comprising at least one dictionary within the meaning of the invention.
- Simplifications of the above method may also be provided.
- if the relative energy of the elements removed by the truncation operation is small compared with the energy of the elements that remain in the block, for the occurrences at the edges, the edge effects can simply be neglected (without then conducting tests 54 and 58).
- at least one (preferably step 63), or both, of the correction steps 53 and 63 can simply be omitted.
- Two methods of combination can be provided to provide a global dictionary capable of providing various representations of waveforms, in particular to provide a very satisfactory spectral richness. Indeed, it is possible to orient the contents of each basic dictionary to one or more categories of signals.
- FIG. 6 illustrates such a dictionary, presenting the union of two basic dictionaries D1 and D2, constructed from the same sets of positions for the centers of occurrences and the same sets of amplitudes, with two respective patterns: a single pulse Pat(D1) for the first base dictionary D1; and the pulse sequence Pat(D2) according to the pattern of FIG. 3a for the second base dictionary D2.
- each of the basic dictionaries is preferably explored separately, the best waveforms resulting from the search in each basic dictionary then being compared with each other in order to select the most appropriate one.
- the complexity of the search is in this case equivalent to the sum of the complexities of searches in each basic dictionary. The rapid searches, induced by the advantageous structure of the basic dictionaries as we saw earlier, have proved very effective.
- Exploration variants may also be proposed. For example, it is possible to first determine one (or more) basic dictionary (s) among the dictionaries that make up the global dictionary, and then to limit the search to the basic dictionaries thus preselected.
- The decoding of the indices can be carried out by first identifying the basic dictionary that has been selected (for example by comparing the index of the selected code vector with values stored in memory corresponding to the boundaries of the basic dictionaries in the full dictionary). Then, the index of the code vector is decoded in the basic dictionary as previously indicated.
- The vectors of the dictionaries are formed simply by adding, one by one and sample by sample, all the vectors of the basic dictionaries, possibly weighted by gains as in the second embodiment, which is described later.
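As a minimal sketch of this index decoding, the boundaries of the basic dictionaries can be stored as cumulative sizes and the global index located among them. The function names, and the assumption that the basic dictionaries are simply concatenated in order inside the full dictionary, are illustrative choices rather than details given in the text.

```python
from bisect import bisect_right
from itertools import accumulate


def make_boundaries(sizes):
    """Cumulative end index of each basic dictionary in the full dictionary."""
    return list(accumulate(sizes))


def split_global_index(index, boundaries):
    """Return (basic dictionary number, index inside that dictionary)."""
    if not 0 <= index < boundaries[-1]:
        raise ValueError("index outside the full dictionary")
    # First boundary strictly greater than the index identifies the dictionary.
    dict_id = bisect_right(boundaries, index)
    start = boundaries[dict_id - 1] if dict_id > 0 else 0
    return dict_id, index - start
```

For two basic dictionaries of 8 and 4 code vectors, global index 8 falls in the second dictionary at local index 0; the local index would then be decoded within that basic dictionary as previously indicated.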
- Figure 7 illustrates the principle of such an addition of basic dictionaries.
- Only two dictionaries D1 and D2 are added here, and it will be noted that the weights of the pulses of the vectors V1i of the dictionary D1 are the same, in the sum D1 + D2, as those of the pulses of the vectors V2j of the dictionary D2.
- A code vector belonging to a basic dictionary D2 can be represented by indicating the positions of the centers of the patterns and the amplitudes of the occurrences in the different dictionaries, that is to say for the different patterns, and then adding up the patterns thus scaled and placed.
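The construction just described can be sketched as follows, purely as an illustration. The sketch assumes that each occurrence is given as a `(center position, amplitude)` pair, that a pattern is a short list of samples with a known center offset, and that samples falling outside the block are truncated; all names are hypothetical.

```python
def build_code_vector(block_len, pattern, center_offset, occurrences):
    """Sum scaled copies of `pattern`, truncated at the block edges.

    pattern:       list of pattern samples
    center_offset: index of the pattern center inside `pattern`
    occurrences:   list of (center position, amplitude) pairs
    """
    vec = [0.0] * block_len
    for center, amp in occurrences:
        for i, value in enumerate(pattern):
            n = center - center_offset + i
            if 0 <= n < block_len:  # samples outside the block are dropped
                vec[n] += amp * value
    return vec


def add_vectors(v1, v2):
    """Sample-by-sample sum of two code vectors of the same length."""
    return [a + b for a, b in zip(v1, v2)]
```

A vector of a sum dictionary D1 + D2 would then be obtained by calling `build_code_vector` once per basic dictionary (for example with the pattern `[1.0]` for a single pulse) and combining the results with `add_vectors`.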
- the components of the code vectors of such a dictionary, obtained by summation of basic dictionaries, are expressed by a relation of the type:
- a second embodiment of a sum of basic dictionaries gives rise to simpler search algorithms.
- the principle consists in cascading the summation of the basic dictionaries, a different gain being associated with each sub-vector coming from the basic dictionaries.
- the excitation vector is expressed by:
- Since each basic dictionary is more particularly intended to enrich the global dictionary, for example for a particular type of excitation signal, it may be advantageous to use different perceptual filters $W_i(z)$ (for $i$ from 0 to $I-1$) for the different searches in the basic dictionaries.
- a first basic dictionary better suited to representing the low-frequency part of the excitation signal;
- a second basic dictionary intended rather to represent the high-frequency part.
- the conventional perceptual filter can be cascaded with a high-pass filter. Such an operation could also be called "spectral focusing". It will be described in detail below, with reference to Figure 9, to illustrate a particular embodiment.
- this second embodiment advantageously adapts to hierarchical CELP coding structures.
- The bitstream is hierarchical and, in the implementation of this second embodiment, the bits corresponding to the indices and the gains of each of the code sub-vectors of the basic dictionaries can constitute separate hierarchical layers (or "participate" in distinct layers). If the decoder receives only a part of this information, it can reconstruct at least a part of the excitation by decoding the received indices and gains associated with the code sub-vectors of the basic dictionaries of the first layers and by summing the partial excitations thus obtained.
- the first basic dictionary then provides the minimum quality coding and the following will allow a gradual increase in quality and a better consideration of the possible variety of signals, for example by offering an expanded spectral content.
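The layered reconstruction can be illustrated with the short sketch below. It assumes that each layer has already been decoded into a `(gain, sub_vector)` pair and that the decoder knows how many leading layers it received; this data layout and the function name are assumptions made for the example.

```python
def partial_excitation(layers, received):
    """Sum the gain-weighted sub-vectors of the first `received` layers.

    layers:   list of (gain, sub_vector) pairs, in layer order
    received: number of leading layers actually received by the decoder
    """
    block_len = len(layers[0][1])
    excitation = [0.0] * block_len
    for gain, sub_vector in layers[:received]:
        for n, sample in enumerate(sub_vector):
            excitation[n] += gain * sample
    return excitation
```

With only the first layer received, the result is the minimum-quality excitation; each additional layer adds its own weighted sub-vector to the sum.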
- The bitstream of the first layer is "compatible" with that of the ITU-T G.729 standard encoder, so that an encoder or a decoder within the meaning of the invention can operate with a decoder or an encoder according to G.729 and its annexes for the 8 kbit/s rate.
- the hierarchy is ensured by the use of a dictionary according to the cascaded summation variant of the basic dictionaries in the sense of the invention.
- The block size is 5 ms, i.e. 40 samples at 8 kHz.
- The first basic dictionary D1 (FIG. 8a) is of the "trivial" type and corresponds simply to the ACELP dictionary of the G.729 encoder, whose vectors are obtained by adding four signed pulses whose positions belong to the sets indicated in Table 2 given later.
- ITU-T G.729 ("Coding of Speech at 8 kbit/s using Conjugate-Structure Algebraic-Code-Excited Linear Prediction (CS-ACELP)", March 1996).
- The second basic dictionary D2 (FIG. 8b) is a non-trivial dictionary whose base pattern (or "tri-pulse"), of length three, comprises three pulses of respective amplitudes $-\alpha$, $+1$ and $-\alpha$, with $\alpha$ preferably between 0 and 0.35.
- The value $\alpha$ may advantageously be chosen dynamically according to the characteristics of the input signal. The number of occurrences, the amplitudes and the positions of the centers of the pattern are identical to those of the first dictionary.
- FIG. 8c shows the shape of the average spectra of the waveforms of the first dictionary (arrow D1) and the second dictionary (arrow D2). It is found that the first dictionary has a spectrally flat content, while the second dictionary is richer in high frequencies.
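The high-frequency emphasis of the tri-pulse pattern can be checked on the pattern alone. For a pattern with amplitudes (-α, +1, -α) centered on its middle sample, the frequency response is 1 - 2α cos ω, which grows from 1 - 2α at ω = 0 to 1 + 2α at ω = π, whereas a single pulse has a flat response of 1. The sketch below evaluates this numerically; it concerns only the pattern, not the average spectra of FIG. 8c, and the function name is hypothetical.

```python
import cmath
import math


def pattern_magnitude(pattern, center_offset, omega):
    """Magnitude of the pattern's frequency response at angular frequency omega."""
    response = sum(
        value * cmath.exp(-1j * omega * (i - center_offset))
        for i, value in enumerate(pattern)
    )
    return abs(response)


alpha = 0.35
tri_pulse = [-alpha, 1.0, -alpha]
low = pattern_magnitude(tri_pulse, 1, 0.0)        # close to 1 - 2*alpha
high = pattern_magnitude(tri_pulse, 1, math.pi)   # close to 1 + 2*alpha
flat = pattern_magnitude([1.0], 0, math.pi / 3)   # single pulse: always 1
```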
- FIG. 9 illustrates an encoder according to this embodiment.
- A first stage ET-1 introduces the adaptive dictionary DICa (vector $\{p(n)\}$) and its associated gain $g_p$, then the first fixed dictionary D1 (vector $\{c_1(n)\}$) and the associated gain $g_1$.
- A second stage ET-2 presents the search in the second fixed dictionary D2 (vector $\{c_2(n)\}$) and the associated gain $g_2$.
- The search in the first basic dictionary D1 is known and uses, for example, one or the other of the fast and focused algorithms described in the G.729 standard and in its reduced-complexity Annex A (ITU-T Recommendation G.729, "Annex A: Reduced complexity 8 kbit/s CS-ACELP speech codec", November 1996).
- the search in the second base dictionary D2 also takes advantage of this fast algorithm, as described above.
- FIG. 9 can then schematically represent a device within the meaning of the invention, in particular here a coding device.
- $h(n)$ is zero for $n < 0$ or $n \ge 40$;
- $h'(n)$ is in principle nonzero for $-1 \le n \le 40$.
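- The correction (step 60) to be made to the elements $\Phi'(a_k, a_l)$ to take account of the left edge is as follows, $\alpha$ being the amplitude of the side pulses of the tri-pulse pattern and the sums running over the samples $n$ of the block:
- $$\Phi'(0,0) = \Phi(0,0) + \alpha^2 \sum_{n} h(n+1)^2 + 2\alpha \sum_{n} h'(n)\, h(n+1)$$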
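A left-edge correction of this kind can be checked numerically under stated assumptions. The sketch assumes a tri-pulse pattern (-α, +1, -α), a filtered pattern $h'(n) = h(n) - \alpha h(n-1) - \alpha h(n+1)$, an uncorrected element $\Phi(0,0) = \sum_n h'(n)^2$ summed over the block, and a pattern centered at position 0 whose left pulse is truncated, so that its filtered version is $h'(n) + \alpha h(n+1)$. These definitions and all function names are assumptions made for the example, not statements taken from the patent.

```python
def h_at(h, n):
    """Impulse response, taken as zero outside the block."""
    return h[n] if 0 <= n < len(h) else 0.0


def h_prime(h, alpha, n):
    """Filtered full tri-pulse pattern centered at position 0 (assumed form)."""
    return h_at(h, n) - alpha * h_at(h, n - 1) - alpha * h_at(h, n + 1)


def corrected_phi00(h, alpha):
    """Uncorrected energy plus the two left-edge correction terms."""
    block = range(len(h))
    phi00 = sum(h_prime(h, alpha, n) ** 2 for n in block)
    term1 = alpha ** 2 * sum(h_at(h, n + 1) ** 2 for n in block)
    term2 = 2 * alpha * sum(h_prime(h, alpha, n) * h_at(h, n + 1) for n in block)
    return phi00 + term1 + term2


def truncated_energy(h, alpha):
    """Direct energy of the filtered pattern once its left pulse is removed."""
    return sum((h_at(h, n) - alpha * h_at(h, n - 1)) ** 2 for n in range(len(h)))
```

Under these assumptions the two functions agree up to rounding, because the correction terms are exactly the expansion of the square of $h'(n) + \alpha h(n+1)$.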
- the present invention is not limited to the embodiment described above by way of example; it extends to other variants.
- The dictionaries defined by the implementation of the invention offer great flexibility of use. Since each block is totally independent of those that precede or follow it, a dictionary totally different from that used for the neighboring blocks can be used for a given block without particular precautions. This avoids possible problems of continuity. It is then very easy to adapt the dictionaries used to the signal to be coded, for example by modifying the pattern(s) used for the basic dictionaries. It is also possible to modify the sets that define the positions of the centers of the patterns in the occurrences and/or the sets of amplitudes. These possible modifications are particularly suitable, for example, for source-controlled variable-rate encoders.
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Compression, Expansion, Code Conversion, And Decoders (AREA)
Priority Applications (6)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP07731605A EP1994531B1 (fr) | 2006-02-22 | 2007-02-13 | Codage ou decodage perfectionnes d'un signal audionumerique, en technique celp |
JP2008555849A JP5188990B2 (ja) | 2006-02-22 | 2007-02-13 | Celp技術における、デジタルオーディオ信号の改善された符号化/復号化 |
KR1020087023140A KR101370017B1 (ko) | 2006-02-22 | 2007-02-13 | Celp 기술에서의 디지털 오디오 신호의 개선된 코딩/디코딩 |
CN2007800065199A CN101401153B (zh) | 2006-02-22 | 2007-02-13 | Celp技术中改进的数字音频信号的编码/解码 |
AT07731605T ATE520121T1 (de) | 2006-02-22 | 2007-02-13 | Verbesserte celp kodierung oder dekodierung eines digitalen audiosignals |
US12/224,205 US8271274B2 (en) | 2006-02-22 | 2007-02-13 | Coding/decoding of a digital audio signal, in CELP technique |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
FR0601563 | 2006-02-22 | ||
FR0601563 | 2006-02-22 |
Publications (2)
Publication Number | Publication Date |
---|---|
WO2007096550A2 (fr) | 2007-08-30 |
WO2007096550A3 (fr) | 2007-10-11 |
Family
ID=37308852
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/FR2007/050780 WO2007096550A2 (fr) | 2006-02-22 | 2007-02-13 | Codage/decodage perfectionnes d'un signal audionumerique, en technique celp |
Country Status (7)
Country | Link |
---|---|
US (1) | US8271274B2 (fr) |
EP (1) | EP1994531B1 (fr) |
JP (1) | JP5188990B2 (fr) |
KR (1) | KR101370017B1 (fr) |
CN (1) | CN101401153B (fr) |
AT (1) | ATE520121T1 (fr) |
WO (1) | WO2007096550A2 (fr) |
Families Citing this family (21)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
GB2466673B (en) | 2009-01-06 | 2012-11-07 | Skype | Quantization |
GB2466674B (en) * | 2009-01-06 | 2013-11-13 | Skype | Speech coding |
GB2466669B (en) * | 2009-01-06 | 2013-03-06 | Skype | Speech coding |
GB2466671B (en) * | 2009-01-06 | 2013-03-27 | Skype | Speech encoding |
US20100174539A1 (en) * | 2009-01-06 | 2010-07-08 | Qualcomm Incorporated | Method and apparatus for vector quantization codebook search |
GB2466675B (en) | 2009-01-06 | 2013-03-06 | Skype | Speech coding |
GB2466672B (en) * | 2009-01-06 | 2013-03-13 | Skype | Speech coding |
GB2466670B (en) * | 2009-01-06 | 2012-11-14 | Skype | Speech encoding |
US8452606B2 (en) * | 2009-09-29 | 2013-05-28 | Skype | Speech encoding using multiple bit rates |
WO2011052221A1 (fr) * | 2009-10-30 | 2011-05-05 | パナソニック株式会社 | Codeur, décodeur et procédés associés |
US9123334B2 (en) * | 2009-12-14 | 2015-09-01 | Panasonic Intellectual Property Management Co., Ltd. | Vector quantization of algebraic codebook with high-pass characteristic for polarity selection |
US8924203B2 (en) * | 2011-10-28 | 2014-12-30 | Electronics And Telecommunications Research Institute | Apparatus and method for coding signal in a communication system |
MX347921B (es) * | 2012-10-05 | 2017-05-17 | Fraunhofer Ges Forschung | Un aparato para la codificacion de una señal de voz que emplea prediccion lineal excitada por codigos algebraico en el dominio de autocorrelacion. |
CA3163664A1 (fr) | 2013-05-24 | 2014-11-27 | Dolby International Ab | Codeur et decodeur audio |
US9418671B2 (en) * | 2013-08-15 | 2016-08-16 | Huawei Technologies Co., Ltd. | Adaptive high-pass post-filter |
MY180722A (en) | 2013-10-18 | 2020-12-07 | Fraunhofer Ges Forschung | Concept for encoding an audio signal and decoding an audio signal using speech related spectral shaping information |
CA2927722C (fr) | 2013-10-18 | 2018-08-07 | Fraunhofer-Gesellschaft Zur Forderung Der Angewandten Forschung E.V. | Concept pour l'encodage d'un signal audio et le decodage d'un signal audio au moyen d'informations deterministiques et de type bruit |
PL3069338T3 (pl) | 2013-11-13 | 2019-06-28 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Koder do kodowania sygnału audio, system przesyłania audio i sposób określania wartości korekcji |
EP2980794A1 (fr) | 2014-07-28 | 2016-02-03 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Codeur et décodeur audio utilisant un processeur du domaine fréquentiel et processeur de domaine temporel |
EP2980795A1 (fr) | 2014-07-28 | 2016-02-03 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Codage et décodage audio à l'aide d'un processeur de domaine fréquentiel, processeur de domaine temporel et processeur transversal pour l'initialisation du processeur de domaine temporel |
US10847170B2 (en) * | 2015-06-18 | 2020-11-24 | Qualcomm Incorporated | Device and method for generating a high-band signal from non-linearly processed sub-ranges |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP0967594A1 (fr) * | 1997-10-22 | 1999-12-29 | Matsushita Electric Industrial Co., Ltd. | Codeur de sons et decodeur de sons |
US20020138256A1 (en) * | 1998-08-24 | 2002-09-26 | Jes Thyssen | Low complexity random codebook structure |
Family Cites Families (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
FI98104C (fi) * | 1991-05-20 | 1997-04-10 | Nokia Mobile Phones Ltd | Menetelmä herätevektorin generoimiseksi ja digitaalinen puhekooderi |
JPH10133697A (ja) * | 1996-09-05 | 1998-05-22 | Seiko Epson Corp | 音声符号化方法およびその装置 |
JP3174756B2 (ja) * | 1998-03-31 | 2001-06-11 | 松下電器産業株式会社 | 音源ベクトル生成装置及び音源ベクトル生成方法 |
JP3235543B2 (ja) * | 1997-10-22 | 2001-12-04 | 松下電器産業株式会社 | 音声符号化/復号化装置 |
JP3175667B2 (ja) * | 1997-10-28 | 2001-06-11 | 松下電器産業株式会社 | ベクトル量子化法 |
JP4173940B2 (ja) * | 1999-03-05 | 2008-10-29 | 松下電器産業株式会社 | 音声符号化装置及び音声符号化方法 |
US6449313B1 (en) * | 1999-04-28 | 2002-09-10 | Lucent Technologies Inc. | Shaped fixed codebook search for celp speech coding |
US6236960B1 (en) * | 1999-08-06 | 2001-05-22 | Motorola, Inc. | Factorial packing method and apparatus for information coding |
2007
- 2007-02-13 AT AT07731605T patent/ATE520121T1/de not_active IP Right Cessation
- 2007-02-13 EP EP07731605A patent/EP1994531B1/fr not_active Not-in-force
- 2007-02-13 KR KR1020087023140A patent/KR101370017B1/ko not_active IP Right Cessation
- 2007-02-13 US US12/224,205 patent/US8271274B2/en not_active Expired - Fee Related
- 2007-02-13 CN CN2007800065199A patent/CN101401153B/zh not_active Expired - Fee Related
- 2007-02-13 WO PCT/FR2007/050780 patent/WO2007096550A2/fr active Application Filing
- 2007-02-13 JP JP2008555849A patent/JP5188990B2/ja not_active Expired - Fee Related
Non-Patent Citations (2)
Title |
---|
EHARA H ET AL: "A HIGH QUALITY 4-KBIT/S SPEECH CODING ALGORITHM BASED ON MDP-CELP" VTC 2000-SPRING. 2000 IEEE 51ST. VEHICULAR TECHNOLOGY CONFERENCE PROCEEDINGS. TOKYO, JAPAN, MAY 15-18, 2000, IEEE VEHICULAR TECHNOLOGY CONFERENCE, NEW YORK, NY : IEEE, US, vol. 2 OF 3. CONF. 51, 15 May 2000 (2000-05-15), pages 1572-1576, XP000968135 ISBN: 0-7803-5719-1 * |
YASUNAGA K ET AL: "Dispersed-pulse codebook and its application to a 4KB/S speech coder" ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2000. ICASSP '00. PROCEEDINGS. 2000 IEEE INTERNATIONAL CONFERENCE ON 5-9 JUNE 2000, PISCATAWAY, NJ, USA, IEEE, vol. 3, 5 June 2000 (2000-06-05), pages 1503-1506, XP010507636 ISBN: 0-7803-6293-4 * |
Also Published As
Publication number | Publication date |
---|---|
US8271274B2 (en) | 2012-09-18 |
CN101401153B (zh) | 2011-11-16 |
US20090222273A1 (en) | 2009-09-03 |
EP1994531B1 (fr) | 2011-08-10 |
EP1994531A2 (fr) | 2008-11-26 |
WO2007096550A3 (fr) | 2007-10-11 |
ATE520121T1 (de) | 2011-08-15 |
CN101401153A (zh) | 2009-04-01 |
JP5188990B2 (ja) | 2013-04-24 |
JP2009527784A (ja) | 2009-07-30 |
KR101370017B1 (ko) | 2014-03-05 |
KR20080110757A (ko) | 2008-12-19 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
EP1994531B1 (fr) | Codage ou decodage perfectionnes d'un signal audionumerique, en technique celp | |
EP1692689B1 (fr) | Procede de codage multiple optimise | |
DK2102619T3 (en) | METHOD AND DEVICE FOR CODING TRANSITION FRAMEWORK IN SPEECH SIGNALS | |
FR2742568A1 (fr) | Procede d'analyse par prediction lineaire d'un signal audiofrequence, et procedes de codage et de decodage d'un signal audiofrequence en comportant application | |
EP0801790B1 (fr) | Procede de codage de parole a analyse par synthese | |
WO2009033288A1 (fr) | Procédé et dispositif de recherche dans un livre de codes algébriques lors d'un codage vocal ou audio | |
WO1996021221A1 (fr) | Procede de codage de parole a prediction lineaire et excitation par codes algebriques | |
EP0801788B1 (fr) | Procede de codage de parole a analyse par synthese | |
EP1692687B1 (fr) | Transcodage entre indices de dictionnaires multi-impulsionnels utilises en codage en compression de signaux numeriques | |
EP1836699B1 (fr) | Procédé et dispositif de codage audio optimisé entre deux modèles de prediction à long terme | |
FR2654542A1 (fr) | Procede et dispositif de codage de filtres predicteurs de vocodeurs tres bas debit. | |
EP2589045B1 (fr) | Codage/décodage prédictif linéaire adaptatif | |
EP1192619B1 (fr) | Codage et decodage audio par interpolation | |
EP1192618B1 (fr) | Codage audio avec liftrage adaptif | |
EP1192621B1 (fr) | Codage audio avec composants harmoniques | |
WO2002029786A1 (fr) | Procede et dispositif de codage segmental d'un signal audio | |
EP1194923B1 (fr) | Procedes et dispositifs d'analyse et de synthese audio | |
WO2013135997A1 (fr) | Modification des caractéristiques spectrales d'un filtre de prédiction linéaire d'un signal audionumérique représenté par ses coefficients lsf ou isf | |
WO2001003119A1 (fr) | Codage et decodage audio incluant des composantes non harmoniques du signal |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | |
| WWE | WIPO information: entry into national phase | Ref document number: 2007731605; Country of ref document: EP |
| WWE | WIPO information: entry into national phase | Ref document number: 2008555849; Country of ref document: JP |
| WWE | WIPO information: entry into national phase | Ref document number: 200780006519.9; Country of ref document: CN. Ref document number: 3442/KOLNP/2008; Country of ref document: IN |
| NENP | Non-entry into the national phase | Ref country code: DE |
| WWE | WIPO information: entry into national phase | Ref document number: 1020087023140; Country of ref document: KR |
| WWE | WIPO information: entry into national phase | Ref document number: 12224205; Country of ref document: US |