EP1836699B1 - Method and device for carrying out optimized audio coding between two long-term prediction models

Publication number
EP1836699B1
EP1836699B1 EP06709052A EP06709052A EP1836699B1 EP 1836699 B1 EP1836699 B1 EP 1836699B1 EP 06709052 A EP06709052 A EP 06709052A EP 06709052 A EP06709052 A EP 06709052A EP 1836699 B1 EP1836699 B1 EP 1836699B1
Authority
EP
European Patent Office
Prior art keywords
dictionary
format
coding
ltp
orders
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Not-in-force
Application number
EP06709052A
Other languages
English (en)
French (fr)
Other versions
EP1836699A1 (de)
Inventor
Mohamed Ghenania
Claude Lamblin
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Orange SA
Original Assignee
France Telecom SA
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by France Telecom SA
Publication of EP1836699A1
Application granted
Publication of EP1836699B1

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/16 Vocoder architecture
    • G10L19/173 Transcoding, i.e. converting between two coded representations avoiding cascaded coding-decoding
    • G10L19/08 Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters

Definitions

  • the present invention relates to the compression encoding/decoding of digital audio signals, especially speech signals and/or multimedia signals, in particular for transmission or storage applications. More particularly, it relates to the efficient determination of the parameters of a second long-term prediction model ("LTP", for "Long Term Prediction") based on the parameters of at least one first LTP model.
  • LTP long term prediction model
  • Compression encoders exploit properties of the digital audio signal such as its local stationarity, exploited by short-term prediction filters, and its harmonic structure, exploited by long-term prediction (LTP) filters.
  • voiced speech sounds (such as vowels) have a long-term correlation due to vocal cord vibration.
  • the delay T is also called the "pitch" period, or simply the "pitch".
  • the filter parameters vary according to the signals to be coded and for the same signal over time.
  • the range of pitch periods seeks to cover the range of fundamental frequencies of the human voice (from low to high voices). For the same speaker, this frequency also varies temporally.
  • the coefficient(s) of the filter also evolve over time.
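For reference, the two filter forms discussed here are commonly written as below. This is a sketch in standard CELP notation, not a quotation of the patent: the symbols \beta (gain), \beta_i (gain vector), T (delay) and the tap half-width K are assumptions, since the patent's own formulas are not reproduced in this extract.

```latex
% Monotap (single-coefficient) long-term predictor
P(z) = 1 - \beta \, z^{-T}

% Multitap long-term predictor with 2K+1 coefficients centred on the delay T
P(z) = 1 - \sum_{i=-K}^{K} \beta_i \, z^{-(T+i)}
```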
  • the parameters of P(z) are determined either by an open-loop analysis or by a closed-loop analysis or, most often, by a combination of the two analyses.
  • Open loop analysis is performed by minimizing the prediction error on the signal to be modeled.
  • Closed-loop analysis (known as "analysis by synthesis") minimizes the squared error, usually weighted, between the voice signal to be modeled and the synthesis signal.
  • an open-loop search is first performed to determine a first estimate of the pitch, called the "open-loop pitch". Then an analysis-by-synthesis search over a restricted neighborhood around this anchor value yields a more precise value of the pitch.
  • These analyses are performed on blocks of samples. The lengths of the open-loop and closed-loop analysis blocks are not necessarily equal. Often, a single open-loop analysis is performed for several closed-loop analyses.
  • the determination of the LTP parameters is computationally very expensive. It usually consists of an open-loop search on a large block of samples followed by closed-loop searches on several sub-blocks of samples (also called subframes).
  • the open-loop search for the harmonic delay is a very expensive operation. It usually requires computing an autocorrelation function of the signal for many values (in fact over the whole range of variation of the delays). In the ITU-T G.723.1 coder, this delay range contains 125 integer delays (from 18 to 142) and the open-loop delay is estimated every 15 ms (i.e. for blocks of 120 samples).
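To make the cost of the open-loop search concrete, here is a minimal sketch of a correlation-based delay search over the 125 integer delays quoted above. The function name, the scoring rule (squared correlation divided by the energy of the delayed block) and the synthetic test signal are assumptions for illustration; the actual G.723.1 procedure differs in its details.

```python
import numpy as np

def open_loop_delay(signal, start, block_len=120, min_delay=18, max_delay=142):
    """Return the integer delay that best predicts the block from its own past.

    Hypothetical helper: scores each delay by corr^2 / energy, a common
    normalised-correlation measure (not necessarily the one used by G.723.1).
    Requires start >= max_delay so that every delayed block exists.
    """
    block = signal[start:start + block_len]
    best_delay, best_score = min_delay, -np.inf
    for delay in range(min_delay, max_delay + 1):
        past = signal[start - delay:start - delay + block_len]
        energy = float(np.dot(past, past))
        if energy <= 0.0:
            continue  # silent segment, no usable prediction
        corr = float(np.dot(block, past))
        score = corr * corr / energy if corr > 0.0 else 0.0
        if score > best_score:
            best_delay, best_score = delay, score
    return best_delay

# Synthetic signal with an exact period of 80 samples (illustrative data only).
rng = np.random.default_rng(0)
periodic = np.tile(rng.standard_normal(80), 6)
estimated = open_loop_delay(periodic, start=200)
```

Each call evaluates one correlation and one energy per candidate delay, which is why restricting the set of candidates pays off so directly.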
  • the closed loop is also extremely expensive in calculations and, consequently, in resources. It requires the generation of adaptive excitations and their filtering.
  • the closed-loop analysis jointly determines the gain vector ( ⁇ i ) and a delay ⁇ (as a candidate pitch) of each sub-frame by exploring a dictionary of gain vectors for several candidate pitch values. This analysis accounts for nearly half of the total complexity of the 5.3 kbps G.723.1 encoder.
  • the complexity of the LTP analysis is particularly critical when several codings must be performed by the same processing unit such as a gateway responsible for managing many parallel communications or a server distributing a large number of multimedia contents.
  • the problem of complexity is further increased by the multiplicity of compression formats that circulate on networks. Multiple codings are then required, either in cascade ("transcoding") or in parallel (multi-format coding or multi-mode coding).
  • Transcoding is typically used when, in a transmission chain, a compressed signal frame transmitted by an encoder cannot continue in this format. Transcoding makes it possible to convert this frame into another format compatible with the rest of the transmission chain.
  • the most basic solution (and the most common at the moment) is to place a decoder and an encoder end to end.
  • in multi-format compression systems, where the same content is compressed in several formats (typically in the case of content servers that broadcast the same content in several formats adapted to the access conditions, networks and terminals of the various end users), the multi-coding operation becomes extremely complex as the number of desired formats increases, which can quickly saturate the resources of the systems.
  • Another case of multiple coding in parallel is multi-mode compression with a posteriori decision, in which, for each signal segment to be coded, several compression modes are executed and the mode which optimizes a given criterion or obtains the best rate/distortion trade-off is selected.
  • the complexity of each compression mode limits their number and/or forces an a priori selection of a very limited number of modes.
  • the solutions proposed today focus on limiting the number of values explored for the parameters of a second LTP model by using the parameters chosen by the first format, in order to reduce the complexity of the LTP search of the second format.
  • Transcoding between two monotap LTP models is the simplest case. Most of the methods currently proposed concern the transcoding of delays, the transcoding of the LTP gain being done most often in the signal domain (a so-called "partial" tandem). When the two models are identical (same dictionary of delays and same subframe length), a simple copy of the delay bit fields from one bit stream to the other is enough. When the dictionaries differ in their resolution (integer or fractional 1/3, 1/6, etc.) and/or their ranges of values, a transcoding in the binary domain or in the parameter domain, with a possible transformation, is used. The transformation can be quantization, truncation, doubling, or splitting.
  • an interpolation of the delays can be provided. For example, the delays of a first format covering an output subframe are interpolated. This interpolated delay is then used only when it is close to the delay obtained for the previous subframe; otherwise a conventional search is conducted.
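The interpolation rule just described can be sketched as follows. The weighting of each input delay, the closeness threshold `max_gap` and the use of `None` to signal a fallback to the conventional search are all assumptions made for this illustration; the prior-art methods referred to may define them differently.

```python
def interpolated_delay(input_delays, weights, previous_delay, max_gap=5):
    """Weighted mean of the first-format delays covering one output subframe.

    Returns the interpolated delay when it is within `max_gap` of the delay of
    the previous output subframe, otherwise None so that the caller can run a
    conventional search. All parameter names are hypothetical.
    """
    total = sum(weights)
    candidate = sum(d * w for d, w in zip(input_delays, weights)) / total
    if abs(candidate - previous_delay) <= max_gap:
        return candidate
    return None

accepted = interpolated_delay([40, 44], [1, 1], previous_delay=41)
rejected = interpolated_delay([40, 44], [1, 1], previous_delay=80)
```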
  • Another, more direct method, without interpolation, is to select one of the delays of the first format. This selection can be made according to several criteria: the last subframe, the subframe having the most samples in common with the subframe of the second format, or the subframe which maximizes a criterion dependent on the LTP gain.
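As an illustration of the "most samples in common" criterion, the sketch below picks the delay of the input subframe that overlaps the output subframe the most. The tuple layout `(start, length, delay)` and the subframe sizes in the example (60-sample input subframes, 40-sample output subframes) are assumptions chosen for the example only.

```python
def overlap(a_start, a_len, b_start, b_len):
    """Number of samples shared by two sample intervals."""
    return max(0, min(a_start + a_len, b_start + b_len) - max(a_start, b_start))

def select_delay(input_subframes, out_start, out_len):
    """Delay of the input subframe sharing the most samples with the output one.

    input_subframes is a list of (start, length, delay) tuples (hypothetical
    layout). On a tie, the earliest subframe in the list is kept.
    """
    best = max(input_subframes,
               key=lambda sf: overlap(sf[0], sf[1], out_start, out_len))
    return best[2]

# Two 60-sample input subframes with delays 50 and 52 (illustrative values).
input_subframes = [(0, 60, 50), (60, 60, 52)]
first_choice = select_delay(input_subframes, out_start=0, out_len=40)
last_choice = select_delay(input_subframes, out_start=80, out_len=40)
```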
  • the determined delay is an anchor value for finding the delay of the second format. It can be used as the open-loop delay of the second format, around which a conventional or restricted closed-loop search is performed, or as a first estimate thereof, or as an anchor of a delay trajectory.
  • this difficulty could be circumvented because the stability control procedure retains the maximum gain among the estimated gains (which can then be very dissimilar) and the adaptive pre-filter is inhibited for any gain vector of the multitap model when, over the range of delays considered, the estimated gains are too different or the jitters on the delay are too dissimilar or too large. While, for the adaptive pre-filtering and instability-control modules of the long-term prediction filter, it is possible to circumvent the estimation difficulty without degrading performance, this is more difficult to achieve for the LTP analysis module itself, which plays a crucial role in quality.
  • the 170 global gains calculated for each vector of the 170 entries of the dictionary can be very far from optimal gains.
  • the calculation of the fractional delay ⁇ ' may lead to a poor determination of fractional delay.
  • the present invention improves the situation.
  • the present invention addresses the transition from a single-coefficient (monotap) LTP model to a multi-coefficient (multitap) LTP model and vice versa, as well as the transition between two multitap LTP models.
  • it proposes a process whose complexity can be adjusted, in particular according to a desired compromise between a targeted complexity and a desired quality.
  • a device for implementing the method according to the invention is, moreover, very useful for multiple coding in cascade (transcoding) or in parallel (multi-coding and multi-mode coding).
  • the invention first relates to a coding method, as defined in claim 1, according to a second format, using information obtained by implementing at least one coding step in a first format.
  • the first and second formats implement, in particular for the coding of a speech signal, a step of searching for long-term prediction (LTP) parameters by scanning at least one dictionary of candidate parameters, at least one of the first and second coding formats using multi-coefficient filtering (called "multitap" above) for a fine search of the LTP parameters.
  • the invention is thus distinguished from existing solutions by the definition of orders in the dictionary and the exploitation of these orders in the dictionary search procedure.
  • the present invention therefore applies to multiple coding in cascade or in parallel, or to any other system using monotap or multitap modeling to represent the long-term periodicity of a signal.
  • the invention makes it possible, from knowledge of the parameters of a first model, to determine the parameters of a second model in the case where at least one of the two models uses multitap modeling.
  • For the sake of brevity, only the case of a transition from a first model to a second is described, but it will be understood that the invention also applies to the case of passing from m (m ⁇ 1) first models to n ( n ⁇ 2) second models (where m and n are absolutely arbitrary).
  • the second coder COD2 has only the bit stream BS1 generated by the first coder COD1 and then including the bit codes of the parameters LTP1.
  • the invention is here applicable to intelligent transcoding.
  • the second coder COD2 also has the original signal s o (or a derived version) available to the first coder COD1, and the invention applies here to intelligent multi-coding. The invention can also be applied to the particular case of multiple coding in parallel that is multi-mode coding with a posteriori decision.
  • the first coder COD1 determined the parameters LTP1, in step 21, using at least its dictionary DIC1 (step 22).
  • the parameters LTP2 obtained (step 30) by applying the dictionary classification of the second coder within the meaning of the invention can themselves be used for the classification of a dictionary according to a third coding format (not shown), where appropriate, and so on for cascading transcoding or multiple coding in parallel.
  • figure 2 is given here mainly for didactic purposes.
  • the notation e_i^2, e_j^2, e_k^2, ... of the elements of the dictionary DIC2 is not really conventional, as will be seen later.
  • the classification of the DIC2 dictionary (step 25b) and the limitation of its elements to be taken into account for the search according to the quality / complexity criterion (step 28) can be conducted jointly substantially in one and the same step.
  • a first coder COD1 delivers a priori information (step 23) to the second coder COD2.
  • the second coder COD2 can simply recover from the first coder COD1 the binary codes of the parameters LTP1 that the first coder has determined and derive this a priori information from them, thanks in particular to the knowledge of the type of coding and of the dictionary used by the first coder COD1.
  • the processor 35 manages all or part of the modules of the device. For this purpose, it can be driven by a computer program product.
  • the present invention also aims at such a computer program product, stored in a memory of a processing unit or on a removable medium intended to cooperate with a reader of said processing unit or downloadable from a remote site, and comprising instructions for implementing all or part of the steps of the method according to the invention.
  • the device COD2 within the meaning of the invention can directly recover the parameters LTP1 of the first coder COD1 in order to deduce the aforementioned information and, hence, the order of its dictionary DIC2, or, alternatively, can receive directly from the first coder COD1 the a priori information on the order of its dictionary. In the latter case, the first coder COD1 already plays a particular role in the invention.
  • the present invention also provides a system including the first encoder and the device within the meaning of the invention.
  • the device of the figure 3 can be inserted into a coding system implementing at least a first and a second coding format.
  • This system then comprises at least one coding device according to the first format COD1 and a coding device within the meaning of the invention applying the second format COD2.
  • the invention aims at such a system.
  • the coding device according to the first format and the coding device according to the second format can be cascaded for transcoding, as shown in figure 1a.
  • the coding device according to the first format and the coding device according to the second format can be put in parallel, for multiple coding, as shown in figure 1b.
  • the second coder COD2 can recover from the first coder COD1 (when the latter has determined the parameters LTP1) information that will enable it to order its dictionary DIC2 (see figure 2). An LTP search among only the first elements (e_i^2, e_j^2) of the dictionary DIC2 so ordered then maintains good quality for the second coding.
  • This adjustment can be made at the beginning of processing. It can also be performed for each block to be processed according to parameters of the first coding format and/or the characteristics of the signal to be coded (for example, according to a voicing criterion). For the same block, the complexity may also vary depending on the LTP subframes.
  • the invention offers a great flexibility which makes it possible to dynamically distribute the computing power available between the modules of the second encoder and / or the resources for processing the LTP subframes.
  • a partition of the dictionary DIC1 associated with a parameter of the first LTP model, as well as orders of the dictionary DIC2 associated with a parameter of the second LTP model, are determined. The determination of an order consists in classifying the elements of the second dictionary DIC2 according to a certain criterion. A ranking (or "order") is given by indexing the elements of the dictionary DIC2.
  • a first example is the elementary partition of a dictionary DIC1 of N elements into N disjoint classes of size 1. N orders of the second dictionary are then determined. More sophisticated partitions may be chosen, in particular by techniques known per se of quantization (vector or scalar) or of data classification.
  • the classes of the first dictionary are not necessarily disjoint.
  • the same element can be associated with more than one order of the second dictionary. The choice of an order, or of a combination of orders, can then take into account factors other than the current LTP parameter of the first dictionary.
  • the number of orders and the orders that are appropriate in the second dictionary are determined by a statistical and / or analytical study, as a function of successive sets of LTP parameters according to the first model.
  • This study defines, for each class of the partition of the dictionary associated with a LTP parameter of the first format, a classification of the dictionary of a parameter of the second format.
  • a statistical study was carried out on an off-line bench by associating in the same coder the LTP model of the first format and the LTP model of the second format. Running the two LTP analyses in parallel was the preferred learning configuration. Of course, other configurations may be used, including a conventional tandem cascading the two encodings.
  • the statistical study ensures, for each element of the first dictionary (or each class of its partition), a classification of the elements of the second dictionary according to a certain criterion.
  • this criterion evaluates the impact on the quality of the reconstructed signal.
  • the quality criterion can be that used in the coding to select the second parameter LTP.
  • other criteria can be used, in particular how often an element of the second dictionary is selected for a class of the first dictionary.
  • a combination of criteria can also be used.
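A minimal sketch of how such orders could be learned off-line from the frequency-of-selection criterion is given below. It assumes a training set of pairs (class of the first-dictionary element, index chosen in the second dictionary) collected while both LTP analyses run in parallel; the function name, the data layout and the tie-breaking rule are hypothetical.

```python
from collections import Counter, defaultdict

def learn_orders(pairs, second_dict_size):
    """Build one order of the second dictionary per class of the first one.

    pairs: iterable of (class_index, second_index) observations (assumed
    training data). Elements are ranked by decreasing selection count; ties
    and never-selected elements are ranked by increasing index.
    """
    counts = defaultdict(Counter)
    for class_index, second_index in pairs:
        counts[class_index][second_index] += 1
    orders = {}
    for class_index, counter in counts.items():
        orders[class_index] = sorted(range(second_dict_size),
                                     key=lambda j: (-counter[j], j))
    return orders

training_pairs = [(0, 2), (0, 2), (0, 1), (1, 0)]
learned = learn_orders(training_pairs, second_dict_size=4)
```

A quality-based criterion could replace the counts by, for example, an accumulated error measure per element; the ranking step would stay the same.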
  • An analytical study can also be performed to determine orders of the second dictionary based on a partition of the first dictionary.
  • the analytical study complements the statistical study described above. It is preferentially limited to dictionary parts that lead to satisfactory analytical approximations.
  • the partition of the first dictionary, and the orders of the second dictionary associated with this partition, are preferably exploited.
  • each current subframe of the second coding format corresponds to a single sub-frame of the first coding format.
  • the first coding format has selected a set of parameters LTP (called "first set LTP1" ). Thanks to the partition of the dictionary associated with one of the LTP parameters of the first model, a search order of the second dictionary is selected by choosing the order associated with the class of the element of the first set LTP1. Then, the second dictionary is explored according to the order thus determined.
  • the number of elements tested is restricted. In general, among all the elements of the second dictionary, only the first elements given by the chosen order are tested.
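The restricted exploration can be sketched as a lookup followed by a truncated search. The names `class_of`, `orders`, `n_max` and the toy cost function are assumptions for illustration; in a real coder the cost would be the coder's own selection criterion.

```python
def restricted_search(first_index, class_of, orders, n_max, cost):
    """Test only the first n_max elements of the order tied to the first-format choice.

    class_of maps an element of the first dictionary to its class, orders maps
    a class to a stored ranking of second-dictionary indices, and cost is the
    selection criterion to minimise (all hypothetical names).
    """
    order = orders[class_of[first_index]]
    return min(order[:n_max], key=cost)

class_of = {0: 0, 1: 0, 2: 1}
orders = {0: [2, 1, 0, 3], 1: [0, 3, 1, 2]}
toy_cost = lambda j: abs(j - 1)  # stand-in for the real coding criterion

choice_a = restricted_search(0, class_of, orders, n_max=2, cost=toy_cost)
choice_b = restricted_search(2, class_of, orders, n_max=2, cost=toy_cost)
```

Raising `n_max` trades complexity for quality without changing the stored orders.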
  • When the two coding formats have LTP subframes of different durations, a current subframe of the second format may correspond to more than one subframe of the first format. This situation is illustrated in figure 5b, for example.
  • the first coding format selected sets of LTP parameters. Thanks to the partition of the dictionary associated with one of the LTP parameters of the first model, it is possible to preselect exploration orders of the second dictionary by choosing the orders associated with the classes of the elements of the first sets. It may be that only one order is finally selected if the parameters chosen for the first subframes belong to the same class of the partition of the first dictionary. However, this is a special case, which brings us back to the previous scheme corresponding to LTP subframes of identical duration. If, on the contrary, more than one order has been preselected, only one order may be retained (for example the most frequently preselected order), or the one which corresponds to the subframe of the first format that covers most of the current subframe of the second format.
  • If K orders have been selected, the first element of each of the K orders is examined first, eliminating any redundancies. This gives K_1 elements (K_1 ≤ K). Then K_2 elements, such that K_2 ≤ K and K_2 ≤ N - K_1, are chosen from the set consisting of the second element of each of the K orders (eliminating any redundancies), and so on until N elements are obtained, N being the maximum number of elements of the second dictionary to be tested.
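The rank-by-rank construction of a dynamic order can be sketched as follows. The function name and the example orders are illustrative assumptions; the logic follows the description above: take the first elements of all K orders, drop duplicates, then the second elements, and stop at N elements.

```python
def merge_orders_by_rank(orders, n_max):
    """Interleave K orders rank by rank, skipping duplicates, up to n_max elements."""
    merged, seen = [], set()
    depth = max((len(order) for order in orders), default=0)
    for rank in range(depth):
        for order in orders:
            if rank < len(order) and order[rank] not in seen:
                seen.add(order[rank])
                merged.append(order[rank])
                if len(merged) == n_max:
                    return merged
    return merged

preselected = [[3, 1, 2, 0], [3, 2, 0, 1]]  # K = 2 illustrative orders
dynamic_order = merge_orders_by_rank(preselected, n_max=3)
```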
  • the set of N elements e_i, e_j, ..., e_k, ... taken from the first elements of the K orders ORD1, ORD2, ..., ORDK is represented schematically in figure 10.
  • the number N of elements retained in the set ENS can be chosen for example according to the maximum complexity allowed. In this ranking, it is also possible to focus on the items most often ranked among the first.
  • the choice of N i is such that ⁇ Ni ⁇ N and makes it possible to treat the rankings fairly or, on the contrary, to favor certain rankings. Then, one selects all the elements present in the K subsets, then the elements present in K-1 subsets and so on until retaining N elements. If N elements have not been so obtained, the number of elements is completed by, for example, successively taking the following elements in the K subsets.
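A possible reading of this second construction is sketched below: each order contributes a subset of its first elements, elements present in all K subsets are taken first, then those present in K-1 subsets, and so on, with a completion step that draws the following elements of each order in turn. The subset sizes, the scan order inside each group and the completion rule are assumptions; the text leaves these details open.

```python
from collections import Counter

def select_by_presence(orders, sizes, n_max):
    """Pick up to n_max elements, favouring those present in the most subsets.

    orders: K stored rankings; sizes: how many leading elements each ranking
    contributes (hypothetical parameters N_1..N_K).
    """
    subsets = [order[:size] for order, size in zip(orders, sizes)]
    presence = Counter(e for subset in subsets for e in set(subset))
    selected = []

    def take(element):
        if element not in selected and len(selected) < n_max:
            selected.append(element)

    # Elements found in K subsets first, then K-1, and so on.
    for count in range(len(subsets), 0, -1):
        for subset in subsets:
            for element in subset:
                if presence[element] == count:
                    take(element)
    # Completion: continue down each order beyond its subset, one rank at a time.
    longest = max((len(order) for order in orders), default=0)
    for rank in range(longest):
        for order, size in zip(orders, sizes):
            if size + rank < len(order):
                take(order[size + rank])
    return selected

two_orders = [[3, 1, 2, 0, 4], [3, 2, 0, 1, 4]]
picked = select_by_presence(two_orders, sizes=[2, 2], n_max=4)
```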
  • the second dictionary is preferably explored according to a "dynamic" order thus determined. This dynamic ordering procedure from predetermined and stored orders can also be applied when the classes of the partition are not disjoint and an element of the first dictionary belongs to more than one class.
  • the parameters of the monotap model of a COD1 format are available and it is sought to determine at a lower cost of calculation and / or resources those of the multitap model of a COD2 format.
  • the coder COD1 determined the pair ( ⁇ e , ⁇ e ) of parameters of the LTP monotap filter.
  • the coding of a sub-frame of COD2 requires the determination of pairs ( ⁇ s , ( ⁇ i ) s ) (where i is a gain index) of LTP multitap filter parameters.
  • the set of parameters of the first model is therefore ( ⁇ e , ⁇ e ).
  • the set of parameters of the second model is ( ⁇ s , ( ⁇ i ) s ).
  • the determination of the delay ⁇ s is done by one of the known methods of the state of the art. For example, it is possible to use the intelligent transcoding method which directly determines this delay ⁇ s by choosing as delay the one determined by COD1 on its subframe which shares the most samples with the current subframe of COD2 (if this delay ⁇ e is fractional, its integer part or the nearest integer is taken). This situation will be described later with particular reference to figures 7a and 7b.
  • the vector of gains ( ⁇ i ) s for each COD2 subframe from at least one of the gains ⁇ e of the COD1 subframes is then determined with low complexity in the sense of the invention.
  • from a study associating the two LTP models, a partition of the first dictionary (here the dictionary of the scalar gains ⁇ e ) is made.
  • orders of the second dictionary associated with this partition are determined. These orders correspond here to all the gain vectors ( ⁇ i ) s . From the scalar LTP gains ⁇ e chosen by the first COD1 format for its subframes corresponding to a current COD2 subframe, the orders of the second dictionary associated with the classes of these scalar gains are preselected.
  • the first N gain vectors determined by this order are tested to select the best vector (according to a criterion such as the usual CELP criterion). Thanks to the orders, the number N can be easily adjusted as a function, for example, of the desired quality/complexity trade-off. In general, N is much smaller than the size of the second dictionary.
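The test of the first N gain vectors can be sketched as below, using an unweighted squared error between a target and a linear combination of filtered adaptive excitations as a stand-in for the CELP criterion. The array shapes (5 taps, 60-sample subframe, 170-entry dictionary), the random data and the function name are assumptions for illustration; a real coder applies perceptual weighting and its own dictionary.

```python
import numpy as np

def best_gain_vector(target, filtered_excitations, gain_dictionary, order, n_max):
    """Return (index, error) of the best gain vector among the first n_max of `order`.

    filtered_excitations has shape (taps, L): one filtered adaptive excitation
    per tap. gain_dictionary has shape (size, taps). The error is a plain
    squared error (simplifying assumption).
    """
    best_index, best_error = None, np.inf
    for index in order[:n_max]:
        synthesis = gain_dictionary[index] @ filtered_excitations
        error = float(np.sum((target - synthesis) ** 2))
        if error < best_error:
            best_index, best_error = index, error
    return best_index, best_error

rng = np.random.default_rng(1)
excitations = rng.standard_normal((5, 60))     # 5 taps, 60-sample subframe
dictionary = rng.standard_normal((170, 5))     # illustrative gain dictionary
target = dictionary[7] @ excitations           # target exactly matches entry 7
order = [12, 7, 3] + [j for j in range(170) if j not in (12, 7, 3)]

index_n3, error_n3 = best_gain_vector(target, excitations, dictionary, order, n_max=3)
index_n1, _ = best_gain_vector(target, excitations, dictionary, order, n_max=1)
```

With `n_max=1` only the top-ranked entry is tried, which shows how a small N can miss the best vector when the stored order is a poor match for the current subframe.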
  • the optimal gain vector of a multitap LTP filter of a second coding format is thus determined from at least one gain of a monotap LTP filter of a first format, significantly reducing the exploration complexity of the second dictionary of gain vectors and limiting the number of gain vectors to be tested.
  • the solution according to the invention makes it possible to adjust the exploration of the dictionary according to the quality sought and the complexity constraints. It will be understood that the invention relies on different orders of the dictionary of gain vectors rather than on predefined and fixed subsets as in the aforementioned reference.
  • the steps outlined above can be applied to focusing the closed-loop search in the two G.723.1 gain vector dictionaries using the LTP gains of the G.729 encoder.
  • the second dictionary consists of the set of jitter values ( ⁇ e - ⁇ s ). From the gain vectors ( ⁇ i ) e chosen by the first format COD1, for its subframes which correspond to the current subframe of COD2, the orders of the second dictionary associated with the classes of these gain vectors are preselected. Then, only one of these orders can be retained, or an order can be dynamically constituted. Finally, the "neighborhood" values thus determined around one or more anchoring delays ⁇ ' s are explored. The determination of the anchoring delay(s) is made by a method known in the state of the art.
  • the present invention therefore proposes an original solution making it possible to reduce the complexity of the determination of the delay ⁇ s by reducing the number of tested delay values of a monotap LTP model of a second coding format based on the knowledge of the parameters of a multitap LTP model of a first coding format.
  • Most of the methods of the prior art only use the delay without exploiting the vector of gains.
  • both types of parameters are used here.
  • a vector of gains points to a set of several jitter values and not to a single value as in this reference. According to one of the advantages afforded by the invention, the problems associated with the approximation of a multitap LTP filter by a single monotap filter are thus overcome.
  • the ordered neighborhoods are intervals of increasing size. This measure is particularly advantageous for focusing the open-loop and/or closed-loop search.
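One way to picture neighborhoods as nested intervals of increasing size is the sketch below, which lists candidate delays around an anchor delay, nearest first, for a given radius. The function name, the symmetric shape of the interval and the delay bounds used in the example are assumptions; the patent only states that the neighborhoods are ordered intervals.

```python
def jitter_candidates(anchor_delay, radius, min_delay, max_delay):
    """Candidate delays within `radius` of the anchor, nearest first.

    Radius 0 gives the anchor alone; each larger radius gives a larger interval
    containing the previous one. Values outside the legal range are dropped.
    """
    candidates = [anchor_delay]
    for step in range(1, radius + 1):
        candidates.extend([anchor_delay - step, anchor_delay + step])
    return [d for d in candidates if min_delay <= d <= max_delay]

narrow = jitter_candidates(50, radius=1, min_delay=20, max_delay=143)
wide = jitter_candidates(50, radius=2, min_delay=20, max_delay=143)
```

Because each interval contains the previous one, the radius can be chosen per class of gain vector and enlarged when more complexity is affordable.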
  • An exemplary embodiment will be described below, relating to the closed-loop search of the LTP delay of the 8 kbit/s ITU-T G.729 encoder from the LTP parameters of the 6.3 kbit/s ITU-T G.723.1 coder.
  • the set of parameters of the first model is therefore written ( ⁇ e , ( ⁇ i ) e ).
  • the set of parameters of the second model is also written ( ⁇ s , ( ⁇ i ) s ). From at least one set of parameters selected by the first COD1 format, it is sought to obtain a delay ⁇ s and a gain vector ( ⁇ i ) s for the second format COD2.
  • the determination of the delay ⁇ s from at least one delay ⁇ e is done by a known method of the state of the art. It should be noted that the implementation of the present invention makes it possible here to determine with a low complexity the vector of gains ( ⁇ i ) s for each subframe of the second format COD2 from at least one vector of gains ( ⁇ i ) e of the subframes of the first format COD1.
  • a partition of the first dictionary which is in this case that of the vectors of gains ( ⁇ i ) e .
  • the orders of the second dictionary (here that of the gain vectors ( ⁇ i ) s ) that are associated with this partition are then determined. From the gain vectors ( ⁇ i ) e chosen by the first format COD1 for its subframes which correspond to the current subframe of the second format COD2, the orders of the second dictionary associated with the classes of these gain vectors are preselected. Then, only one of these orders can be retained, or an order can be built dynamically. Finally, the first gain vectors determined by this order are tested to select the best one.
  • Three exemplary embodiments are presented: the first two concern transcoding between two different coding formats, ITU-T G.729 and ITU-T G.723.1, and the last concerns a bit rate change within a multi-rate coder (G.723.1).
  • a description of these two ITU-T coders is first given as well as their LTP modelings.
  • the synthesis model is used to extract the parameters modeling the signals to be coded.
  • the compression ratio varies from 1 to 16 so that these encoders operate at rates of 2 to 16 kbit/s in the telephone band, and at rates of 6 to 32 kbit/s in wideband.
  • the CELP-type digital coding and decoding device, the analysis-by-synthesis coder most widely used today for coding speech signals, is shown in figure 4a.
  • the speech signal s 0 is sampled and converted into a series of blocks of (L ') samples called frames. In general, each frame is cut into smaller blocks of (L) samples, called subframes. Each block is synthesized by filtering a waveform extracted from a repertoire (also called fixed excitation dictionary), multiplied by a gain, through two filters varying in time.
  • the excitation dictionary is a finite set of waveforms of L samples.
  • the first filter is the long-term prediction filter.
  • an LTP (Long Term Prediction) analysis is used to evaluate the parameters of this long-term predictor, which exploits the periodicity of voiced sounds. This predictor is equivalent to a dictionary storing the past excitation for different delays. This dictionary is generally called the "adaptive excitation dictionary".
  • the second filter is the short-term prediction filter.
  • the Linear Prediction Coding (LPC) analysis methods make it possible to obtain these short-term prediction parameters, which are representative of the vocal tract transfer function and characteristics of the signal spectrum.
  • the speech signal s 0 undergoes the LPC analysis 41 (not shown in detail), as well as an LTP analysis with a construction of the repertoire of fixed excitations 46 and adaptive excitations 45 for supplying the synthesis filter 44.
  • a perceptual weighting module 42 and an error minimization module 43 are also provided.
  • the method used to determine the innovation sequence is therefore analysis by synthesis.
  • a large number of innovation sequences from the excitation dictionary are filtered by the two filters, LTP and LPC, and the selected waveform is the one producing the synthetic signal closest to the original signal according to a perceptual weighting criterion, commonly known as the CELP criterion.
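The selection step of analysis by synthesis can be sketched as follows. The text does not give the formula of the CELP criterion; the sketch assumes the form commonly used in CELP coders, namely maximizing the normalized correlation (x·y)²/(y·y) between the weighted target x and the filtered candidate y, which is equivalent to minimizing the squared error when the gain is chosen optimally. The function names are hypothetical, and the inputs are assumed to be already perceptually weighted and filtered.

```python
def celp_criterion(target, filtered):
    """Normalized correlation (x.y)^2 / (y.y), assumed form of the CELP criterion."""
    cross = sum(x * y for x, y in zip(target, filtered))
    energy = sum(y * y for y in filtered)
    return cross * cross / energy if energy > 0 else 0.0


def select_waveform(target, filtered_candidates):
    """Return the index of the filtered candidate that maximizes the criterion."""
    scores = [celp_criterion(target, candidate) for candidate in filtered_candidates]
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical two-sample example: the second candidate is collinear with the target.
best = select_waveform([1.0, 0.0], [[0.0, 1.0], [2.0, 0.0], [1.0, 1.0]])
print(best)
```

The cost of this search grows with the number of candidates tested, which is what a restricted exploration of the dictionary reduces.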
  • the ITU-T G.729 coder operates on a speech signal band-limited to 3.4 kHz, sampled at 8 kHz and cut into 10 ms frames (i.e. 80 samples per frame). Each frame is divided into two subframes (numbered 0 and 1 hereinafter) of 40 samples (5 ms).
  • the LTP model of the ITU-T G.729 coder is based on monotap modeling with fractional resolution. At each frame, the LTP analysis determines a delay λ_i and a gain β_i for each subframe. Figure 4b presents the main steps. At each frame, a search for the open-loop delay, denoted λ_OL, is performed in the value range [20; 143] (step 401).
  • the delay of the first subframe is searched in closed loop around the open-loop delay λ_OL over the range [λ_OL - 3; λ_OL + 3] (step 402).
  • the delay λ_0 of the even subframe is determined with a fractional resolution of 1/3 in the range [19 1/3; 84 2/3] and with integer resolution in the range [85; 143].
  • the delay λ_1 of the second subframe is determined with a fractional resolution of 1/3 by analysis by synthesis around λ_0 over the range [int(λ_0) - 5 2/3; int(λ_0) + 4 2/3], int(λ_0) being the integer part of the possibly fractional delay λ_0 (step 404).
  • the gain β is calculated once the closed-loop delay has been determined (steps 403 and 405). After the search for the fixed excitation, the gain β is quantized jointly with the gain of the fixed excitation by a seven-bit vector quantization.
  • the definition set (or dictionary) of the G.729 monotap LTP gain therefore has 128 entries.
  • the ITU-T G.723.1 coder operates on a 3.4 kHz band-limited speech signal sampled at 8 kHz and cut into 30 ms frames (240 samples per frame). Each frame has 4 subframes of 7.5 ms (60 samples) grouped 2 by 2 in super subframes of 15 ms (120 samples).
  • the ITU-T G.723.1 coder uses fifth-order multitap modeling.
  • the long-term predictor coefficients are vector quantized using two previously stored dictionaries with 85 or 170 entries for the 6.3 kbit/s mode, while the 5.3 kbit/s mode uses only the 170-entry dictionary. In the 6.3 kbit/s mode, the choice of the dictionary explored depends on the delay value of the even subframes.
  • Figure 4c illustrates the main steps of the LTP analysis of the G.723.1 coder.
  • two closed-loop LTP analyses are performed for each super subframe.
  • the delays λ_{2i} of the even subframes are searched in closed loop around the corresponding open-loop delay λ^i_OL over the range [λ^i_OL - 1; λ^i_OL + 1].
  • the gain vector dictionary is also explored by analysis by synthesis (step 411).
  • a similar search (joint search of the gain vector and the closed-loop delay)
  • the closed-loop search for a delay λ_{2i+1} is limited to the neighborhood of the closed-loop delay of the previous subframe, [λ_{2i} - 1; λ_{2i} + 2] (step 412).
  • a G.723.1 coding frame corresponds to three G.729 coding frames. The subframes of G.729 therefore do not coincide with those of G.723.1; rather, the G.723.1 subframes (7.5 ms) overlap the G.729 subframes (5 ms).
  • Figure 5b shows one G.723.1 coding frame and three G.729 coding frames with their respective subframes. The subframes of the G.723.1 frame are numbered from 0 to 3. The three G.729 frames are grouped and their subframes are numbered from 0 to 5.
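The overlap between the two subframe grids follows directly from the subframe lengths given above (60 samples for G.723.1, 40 samples for G.729, over a common span of 240 samples). The short Python sketch below computes it; the function name and the dictionary-based return format are choices made for this illustration.

```python
def subframe_overlaps(len_a, len_b, total):
    """For each subframe of length len_a, list the overlapping subframes of
    length len_b together with the number of shared samples."""
    result = {}
    for a in range(total // len_a):
        a_start, a_end = a * len_a, (a + 1) * len_a
        shared = {}
        for b in range(total // len_b):
            b_start, b_end = b * len_b, (b + 1) * len_b
            common = min(a_end, b_end) - max(a_start, b_start)
            if common > 0:
                shared[b] = common
        result[a] = shared
    return result


# G.723.1 subframes (60 samples) against G.729 subframes (40 samples) over 30 ms.
print(subframe_overlaps(60, 40, 240))
# The reverse association: G.729 subframes against G.723.1 subframes.
print(subframe_overlaps(40, 60, 240))
```

With these lengths every G.723.1 subframe overlaps two G.729 subframes, while a G.729 subframe overlaps either one or two G.723.1 subframes.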
  • the delay is taken equal to the integer part of that of the sub-frames 1 and 4 of G.729.
  • a closed-loop search is performed around the previous delay (even subframe). This closed-loop search can be identical to that of G.723.1, but it can also be restricted according to the desired complexity, or even omitted, in which case the same delay value is kept for both the even and the odd subframe.
  • once the delay has been determined, it is still necessary to determine a vector of 5 gains in the dictionary of 5-coefficient vectors selected by the G.723.1 coder.
  • the implementation of the present invention makes it possible to restrict its exploration to a limited number of gain vectors determined from the monotap LTP gains of the G.729 coder subframes.
  • Each subframe of G.723.1 covers (at least partially) two subframes of G.729.
  • Each of these two gains is associated with a ranking C(g_i) of the vectors of the dictionary of multitap coefficient vectors. This dictionary is selected by the delay value of the even subframe of G.723.1.
  • the exploration of the gain vector dictionary is limited to the N vectors determined by the "dynamic" order thus constituted.
  • This focused exploration makes it possible to select the best gain vector.
  • the selection criterion is the CELP criterion conventionally used by G.723.1 for exploring the dictionaries of vectors with LTP coefficients.
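The restricted exploration can be sketched in Python as follows. The text states that a "dynamic" order is constituted from the rankings associated with the two G.729 gains but does not detail how; the interleaving used here is therefore only an assumption, as are the function names and the `score` callable standing in for the CELP criterion.

```python
def merge_orders(order_a, order_b, n):
    """Build one order of at most n distinct indices by alternating between
    two rankings (assumed merging rule, for illustration only)."""
    merged, seen = [], set()
    for a, b in zip(order_a, order_b):
        for index in (a, b):
            if index not in seen:
                seen.add(index)
                merged.append(index)
                if len(merged) == n:
                    return merged
    return merged


def restricted_search(order, score):
    """Test only the listed gain-vector indices and keep the best one."""
    return max(order, key=score)


# Hypothetical rankings of a 4-entry dictionary and hypothetical criterion values.
order = merge_orders([3, 1, 0, 2], [1, 2, 3, 0], n=3)
criterion = [0.1, 0.5, 0.9, 0.4]
print(order, restricted_search(order, lambda i: criterion[i]))
```

Only N criterion evaluations are needed instead of one per dictionary entry, which is the source of the complexity reduction.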
  • the solution presented here allows a very strong reduction in the complexity of the LTP analysis of G.723.1 coding without compromising quality.
  • Figures 9a and 9b show, for the two dictionaries, the histogram of the exploration sizes that guarantee a loss on the CELP criterion strictly lower than 1% compared to a complete exploration.
  • the exploration sizes are much smaller than the total size of the dictionary.
  • the average size is 39 for the dictionary with 85 vectors and 49 for the dictionary with 170 vectors.
  • the statistical study shows that, even for average exploration sizes well below the dictionary sizes (48 instead of 85 and 58 instead of 170), the restricted exploration is optimal according to the CELP criterion (practically no loss on the CELP criterion). A focused search can therefore achieve performance equivalent to an exhaustive search while exploring just over half of the 85-entry dictionary and one third of the 170-entry dictionary.
  • the parameters of the multitap LTP model of a G.723.1 frame are available and the G.729 monotap LTP parameters are sought for three frames, that is to say six subframes (see Figure 5b).
  • each of the three G.729 frames first adopts as its open-loop delay the delay of one of the subframes of the G.723.1 coder.
  • the correspondence between G.729 frames and G.723.1 subframes is illustrated in Figure 6.
  • the delay chosen by the G.723.1 encoder may be outside the range of values allowed by the G.729 encoder. Indeed, the smallest value allowed by the G.729 encoder is 19 while it is 18 for the G.723.1 encoder.
  • Several solutions are possible to work around this problem. Typically, one can for example double the delay from the G.723.1 coder, or more simply add 1.
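The two workarounds mentioned above can be written as a small Python helper. The minimum value of 19 is the one stated in the text; the function name and the `strategy` parameter are hypothetical.

```python
G729_MIN_DELAY = 19  # smallest delay allowed by G.729 according to the text


def adapt_delay(delay, strategy="increment"):
    """Bring a G.723.1 delay into the G.729 range using one of the two
    workarounds mentioned in the text (doubling, or adding 1)."""
    if delay >= G729_MIN_DELAY:
        return delay
    if strategy == "double":
        return 2 * delay
    if strategy == "increment":
        return delay + 1
    raise ValueError("unknown strategy: " + strategy)


print(adapt_delay(18, "double"), adapt_delay(18, "increment"), adapt_delay(40))
```

Doubling keeps the adapted delay a multiple of the original pitch period, whereas adding 1 stays as close as possible to the original value.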
  • the closed-loop search remains to be performed for each subframe. It is recalled that the value ranges are as follows: λ_0 ∈ [λ_OL - 3; λ_OL + 3] and λ_1 ∈ [int(λ_0) - 5 2/3; int(λ_0) + 4 2/3].
  • the standard closed-loop search of the G.729 coder consists firstly in successively testing all the integer values of the range (7 values for λ_0 and 10 for λ_1). Once the best integer value has been selected, the different fractions (-2/3, -1/3, 1/3, 2/3) are tested to determine the best one according to the chosen criterion, in this case the one that maximizes the CELP criterion. For the even subframe, it should be noted that the fractional part is only searched if the integer part of λ_0 is less than 85.
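The two-stage search just described (integer values first, then the fractions around the best integer) can be sketched in Python as follows. The `score` callable stands in for the CELP criterion and, like the function names, is an assumption of this example; it is not code from the G.729 recommendation.

```python
from fractions import Fraction


def even_subframe_candidates(open_loop_delay):
    """Integer delays in [open_loop_delay - 3; open_loop_delay + 3] (7 values)."""
    return list(range(open_loop_delay - 3, open_loop_delay + 4))


def odd_subframe_candidates(int_delay_0):
    """Integer delays from int_delay_0 - 5 to int_delay_0 + 4 (10 values)."""
    return list(range(int_delay_0 - 5, int_delay_0 + 5))


def closed_loop_search(integer_candidates, score, allow_fraction=lambda t: True):
    """Pick the best integer delay, then test the fractions -2/3, -1/3, 1/3, 2/3."""
    best_int = max(integer_candidates, key=score)
    candidates = [Fraction(best_int)]
    if allow_fraction(best_int):
        candidates += [best_int + Fraction(k, 3) for k in (-2, -1, 1, 2)]
    return max(candidates, key=score)


# Hypothetical criterion peaking at a delay of 50 1/3.
score = lambda d: -abs(d - Fraction(151, 3))
print(closed_loop_search(even_subframe_candidates(50), score))
```

For the even subframe, passing `allow_fraction=lambda t: t < 85` reproduces the rule that the fractional part is only searched below 85.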
  • the first dictionary in the definition of the invention given above is one of the two LTP gain vector dictionaries of the G.723.1 coder, the second dictionary being one of two sets of integer values of neighborhood (or jitter) around an anchoring delay. It will be understood that the invention can easily be applied to more than one first dictionary, on the one hand, and more than one second dictionary, on the other hand.
  • each G.729 sub-frame is associated with one or two G.723.1 subframes.
  • the neighborhood values of λ' are ranked in order of decreasing importance. The number of values tested is then determined according to the targeted complexity or the quality/complexity ratio.
  • The association between the even (respectively odd) subframes of the G.729 coder and the set of parameters (λ_j, (β_i)_j) from the G.723.1 coder is illustrated in Figure 7a (respectively Figure 7b).
  • the anchor value λ' may differ from the delay λ_j of the set of parameters (λ_j, (β_i)_j) determined for the associated G.723.1 subframe. This point is explained later, where the parity of the subframes (even or odd) is taken into account. In a first alternative, a possible difference can simply be ignored.
  • the set of ordered neighborhoods is modified according to the difference (λ_j - λ'), and the size of this set may also be modified.
  • the difference (λ_j - λ') is subtracted from each element of this neighborhood ordered according to the gains (β_i)_j, and its intersection with the definition set of the neighborhoods is considered (here the interval [-3; 3] for even subframes and the interval [-5; 4] for odd subframes, as discussed below).
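The modification of an ordered neighborhood can be sketched in Python as follows. The sketch follows the wording of the text (the difference is subtracted from each jitter value, and only the values inside the definition interval are kept); the function name, the optional `limit` parameter and the example order are hypothetical.

```python
def shift_neighbourhood(ordered_jitters, difference, low, high, limit=None):
    """Subtract `difference` from each jitter value, keep the values that fall
    inside [low; high] while preserving their order, and optionally keep only
    the first `limit` values."""
    shifted = [jitter - difference for jitter in ordered_jitters]
    kept = [jitter for jitter in shifted if low <= jitter <= high]
    return kept if limit is None else kept[:limit]


# Hypothetical order of the 7 jitter values for an even subframe.
order = [0, 1, -1, 2, -2, 3, -3]
print(shift_neighbourhood(order, difference=2, low=-3, high=3))
print(shift_neighbourhood(order, difference=10, low=-3, high=3))
```

When the difference is large the result can be empty, in which case a caller would need another strategy, for example searching the whole range.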
  • the strategy can therefore be adapted to the sub-frame or the gap between the delays, or to the two criteria combined.
  • the search must be carried out around the open-loop delay λ_OL over the range [λ_OL - 3; λ_OL + 3].
  • from the gain vector(s) chosen by the G.723.1 coder, orders of all 7 jitter values (-3, -2, -1, 0, 1, 2, 3) are determined.
  • for subframe 0 (respectively 2) of the G.729 coder, there is only one associated G.723.1 subframe and therefore only one gain vector and, thus, a single order.
  • two subframes of the G.723.1 coder are associated with subframe 4 of the G.729 coder, as shown in Figure 7a.
  • the search must be carried out around the integer part λ'_{2p} of the delay of the previous (even) subframe, in the range [λ'_{2p} - 5 2/3; λ'_{2p} + 4 2/3].
  • the delay λ_j of the set of parameters (λ_j, (β_i)_j) of the associated G.723.1 subframe(s) can be different from this anchor value λ'_{2p}.
  • from the gain vector(s) (β_i)_j chosen by the G.723.1 coder, orders of all the jitter values are preselected and modified as a function of the difference (λ_j - λ'_{2p}).
  • let N (N ≤ 10) be the maximum number of values tested.
  • the following is preferentially performed for each odd subframe.
  • the total search range is [λ'_2 - 5 2/3; λ'_2 + 4 2/3].
  • An order corresponding to the gain vector (β_i)_2 is selected.
  • the ordered neighborhood is modified according to the difference (λ_2 - λ'_2).
  • the difference between λ_2 and λ'_2 can be large and the intersection of the ordered neighborhood, modified by subtracting (λ_2 - λ'_2), can be empty.
  • the search is made over the entire range [λ'_1 - 5 2/3; λ'_1 + 4 2/3].
  • the use of ordered neighborhoods can also be conditioned by a threshold on
  • the total search range is [λ'_4 - 5 2/3; λ'_4 + 4 2/3].
  • An order corresponding to the gain vector (β_i)_3 is selected.
  • the ordered neighborhood is modified according to the difference (λ_3 - λ'_4).
  • this gap is limited. Indeed, the closed-loop delay of G.729, λ'_2, is in the neighborhood ([-3; 3]) of the open-loop delay (here taken as the closed-loop delay λ_3 of G.723.1).
  • the solution presented here allows a very sharp reduction in the complexity of LTP analysis of G.729 coding.
  • the invention makes it possible to test only 60% (respectively 40%) of the neighborhood values if the gain vector of the G.723.1 coder is in the dictionary with 170 entries (respectively 85 entries).
  • the delay of the even subframes can be used as the open loop delay of the super subframe and then restricted.
  • the variation range of the closed-loop delay of the 5.3 kbit / s mode as a function of the vector of five coefficients of the filter chosen by the 6.3 kbit / s mode.
  • no treatment other than a simple copy of the delay is necessary.
  • each 5.3 kbit/s subframe adopts as its delay the one that the 6.3 kbit/s mode has chosen for the same subframe.
  • the gain vector that maximizes a criterion is then selected as described above.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Claims (17)

  1. Method for coding an audio signal in a second format using information obtained by carrying out at least one step of coding in a first format, the first and second formats performing, in particular for coding a speech signal, a step of searching for long-term prediction (LTP) parameters by exploring at least one dictionary containing candidate parameters, the first and/or the second coding format using filtering with several coefficients for a fine search of LTP parameters, characterized in that it comprises the following steps:
    - accessing results of a statistical and/or analytical study carried out as a function of successive sets of LTP parameters in the first coding format, in order to determine a number of orders and suitable orders in a dictionary used by the second coding format,
    - recovering an item of hypothetical information on the partition of the first dictionary, which relates to a class of the partition to which an LTP parameter obtained during coding in the first format belongs and which is obtained after the determination of the LTP parameters during coding in the first format, in order to select at least one order of the dictionary used by the second coding format,
    - applying the selected order to candidates of the dictionary used by the second coding format, in order to select a limited number of first candidates, and
    - in order to carry out the second coding, performing the LTP search exclusively among the limited number of candidates.
  2. Method according to claim 1, characterized in that an elementary partition of the first dictionary is first provided, containing N elements in N disjoint classes of size 1.
  3. Method according to claim 1, the first coding format using a first dictionary and the second coding format using a second dictionary, characterized in that a partition of the first dictionary into non-disjoint classes is provided, such that the same element can be associated with more than one order of the second dictionary.
  4. Method according to one of claims 1 to 3, the first coding format using a first dictionary and the second coding format using a second dictionary, characterized in that a regrouping of similar orders is provided in order to dynamically modify the initial partition of the first dictionary and, from there, the number of orders of the second dictionary.
  5. Method according to claim 4, characterized in that an operation is additionally provided which consists in successively recalculating the orders of the second dictionary once they have been regrouped, and in that the initial partition of the first dictionary and/or the orders regrouped in this way are dynamically modified.
  6. Method according to one of claims 4 to 5, the first coding format using a first dictionary and the second coding format using a second dictionary, characterized in that, for each of the orders of the second dictionary, a maximum number of elements of the second dictionary to be considered is chosen as a function of the classes of the first dictionary and/or of the orders of the second dictionary, in order to limit a memory resource used for storing the orders of the second dictionary.
  7. Method according to one of the preceding claims, characterized in that the limited number of candidates is chosen as a function of a compromise between the quality and the complexity of the second coding.
  8. Method according to claim 7, an input signal to be coded being processed in data blocks, characterized in that the compromise is set for each data block to be processed as a function of parameters of the first coding format and/or of characteristics of the signal to be coded, and preferably as a function of LTP subframes contained in each data block.
  9. Method according to one of claims 1 to 8, an input signal to be coded being processed in data blocks, each of which contains first LTP subframes for the first coding format and second LTP subframes for the second coding format, characterized in that, for first and second subframes of equal duration, a single subframe of the first coding format corresponds to each current subframe of the second coding format, and in that:
    - the first coding format selects a first set of LTP parameters for the current subframe,
    - on the basis of the partition into classes of the dictionary associated with one of the LTP parameters of the first format, an order for exploring the dictionary of the second format is selected by choosing an order associated with the class of the element of the first set, and
    - in accordance with the order selected in this way, a limited number of first candidates of the dictionary of the second format is explored.
  10. Method according to one of claims 1 to 8, an input signal to be coded being processed in data blocks, each of which contains first LTP subframes for the first coding format and second LTP subframes for the second coding format, characterized in that, for first and second subframes of different durations:
    - the first coding format selects several sets of LTP parameters for first subframes which substantially correspond to a current second subframe,
    - on the basis of the partition into classes of the dictionary associated with one of the LTP parameters of the first format, orders for exploring the dictionary of the second format are preselected by choosing the orders associated with the classes of the elements of the sets of LTP parameters,
    - on the basis of the preselection of these orders, at least one preferred order is determined, and
    - the dictionary of the second format is explored in accordance with the preferred order, restricted to its first elements.
  11. Method according to one of the preceding claims, the first coding format using filtering with a single coefficient for the first LTP subframes, while the second coding format uses filtering with several coefficients for the second LTP subframes, characterized in that:
    - for each first subframe, a pair of first parameters (λe, βe) of the single-coefficient LTP filter is determined using the first coding format,
    - for the coding of a current second subframe, several pairs of parameters (λs, (βi)s) of the multi-coefficient LTP filter are determined on the basis of the set of parameters (λe, βe) of the first format, with:
    a determination of an LTP delay (λs) which preferably corresponds to the one determined by the first coding format on a first subframe that covers the current second subframe to the greatest extent,
    a determination of a vector of gains (βi)s for the current second subframe on the basis of at least one gain βe of the first subframes, for carrying out steps b), c) and d), the orders of the dictionary of the second format corresponding to a set of gain vectors (βi)s of the second subframe.
  12. Method according to claim 11, characterized in that, for the coding of a current second subframe:
    - on the basis of first LTP gains of the first format (βe) selected for one or more first subframes corresponding to a current second subframe, the orders of the dictionary of the second format that are associated with classes of first LTP gains are preselected,
    - a single one of these orders is formed, preferably dynamically, from the preselected orders for the current second subframe, and
    - the N first vectors of second gains determined by the order thus formed are tested in order to select, according to a chosen criterion, a best gain vector to be associated with the second subframe.
  13. Method according to one of claims 1 to 10, the second coding format using filtering with a single coefficient for second LTP subframes, while the first coding format uses filtering with several coefficients for first LTP subframes, characterized in that:
    - for each first subframe, using the first coding format, a first set of LTP parameters λe, (βi)e is determined, corresponding to a pair containing an LTP delay λe and a vector (βi)e of associated gains of the multi-coefficient LTP filter,
    - a partition of a dictionary of gain vectors (βi)e of the first format is carried out,
    - for the coding of a current second subframe with the second format, orders of a dictionary of the second format are determined for first subframes corresponding to the current second subframe, the dictionary of the second format being formed of a set of jitter values and the orders of this dictionary being associated with the partition of the dictionary of the first format,
    - an order of the jitter values is determined, and LTP delay values for the second format are explored successively at the jitter values ordered in this way, around one or more anchoring delays determined as a function of the delays λe in the first subframes.
  14. Method according to one of claims 1 to 10, the first coding format using filtering with several coefficients on first LTP subframes and the second coding format using filtering with several coefficients on second LTP subframes, characterized in that:
    - on the basis of at least one first set of parameters which is selected by the first format and contains at least one vector of gains (βi)e determined for at least one first subframe, a partition is made of the dictionary of the first format, which corresponds to a dictionary of gain vectors of the first format (βi)e,
    - orders of the dictionary of the second format, which corresponds to a dictionary of gain vectors (βi)s of the second format, are derived therefrom, these orders being associated with the partition,
    - on the basis of the gain vectors (βi)e selected with the first format for first subframes that substantially cover the current second subframe, orders of the second dictionary associated with classes of this partition are preselected,
    - one of the preselected orders is taken into account,
    - several gain vectors to be associated with the current second subframe are determined as a function of the order taken into account, and
    - the best gain vector is selected according to a chosen criterion by tests on the several gain vectors.
  15. Device for coding an audio signal in a second format, designed to use coding information obtained by carrying out coding in a first format, the first and second formats performing, in particular for coding a speech signal, a search for long-term prediction (LTP) parameters by exploring a dictionary containing candidate parameters, the first and/or the second coding format using filtering with several coefficients for a fine search of LTP parameters, characterized in that it comprises:
    - a memory storing a correspondence table which defines, as a function of LTP parameters determined by the first coding format, orders of a dictionary used by the second coding format, the correspondence table being defined on the basis of results of a statistical and/or analytical study carried out as a function of successive sets of LTP parameters with the first coding format, in order to determine a number of orders and suitable orders in a dictionary used by the second coding format,
    - means for recovering a signal indicating at least one item of hypothetical information on the partition of the first dictionary, which relates to a class of the partition to which an LTP parameter obtained during coding with the first format belongs, and which is obtained after the determination of LTP parameters during coding with the first format, in order to select at least one order of the dictionary used by the second coding format,
    - means active on receiving the signal for consulting the correspondence table and for selecting at least one order of the dictionary used by the second coding format,
    - computing means for:
    ordering the dictionary used by the second coding format according to the selected order, so as to select a limited number of first candidates in the dictionary, and
    continuing the coding with the second format by performing the LTP search exclusively among this limited number of candidates.
  16. Coding method using at least a first and a second coding format, characterized in that it comprises at least one device for coding with the first format and a coding device according to claim 15 applying the second format.
  17. Computer program product, stored in a memory of a processing unit or on a removable medium intended to cooperate with a reader of the processing unit, or downloadable from a remote site, characterized in that it contains instructions for carrying out all or some of the steps of the method according to one of claims 1 to 14.
EP06709052A 2005-01-11 2006-01-09 Verfahren und Vorrichtung zur Ausführung einer optimalizierten Audiokodierung zwischen zwei Langzeitvorhersagemodellen Not-in-force EP1836699B1 (de)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
FR0500272A FR2880724A1 (fr) 2005-01-11 2005-01-11 Procede et dispositif de codage optimise entre deux modeles de prediction a long terme
PCT/FR2006/000038 WO2006075078A1 (fr) 2005-01-11 2006-01-09 Procede et dispositif de codage optimise entre deux modeles de prediction a long terme

Publications (2)

Publication Number Publication Date
EP1836699A1 EP1836699A1 (de) 2007-09-26
EP1836699B1 true EP1836699B1 (de) 2011-06-29

Family

ID=34954835

Family Applications (1)

Application Number Title Priority Date Filing Date
EP06709052A Not-in-force EP1836699B1 (de) 2005-01-11 2006-01-09 Verfahren und Vorrichtung zur Ausführung einer optimalizierten Audiokodierung zwischen zwei Langzeitvorhersagemodellen

Country Status (6)

Country Link
US (1) US8670982B2 (de)
EP (1) EP1836699B1 (de)
CN (1) CN101124625B (de)
AT (1) ATE515019T1 (de)
FR (1) FR2880724A1 (de)
WO (1) WO2006075078A1 (de)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
FR2839969B1 (fr) 2002-05-27 2005-04-01 Jean Couturier Liant hydraulique resultant du melange d'un liant sulfatique et d'un liant a caractere pouzzolanique
US7912700B2 (en) * 2007-02-08 2011-03-22 Microsoft Corporation Context based word prediction
US7809719B2 (en) * 2007-02-08 2010-10-05 Microsoft Corporation Predicting textual candidates
US8521520B2 (en) * 2010-02-03 2013-08-27 General Electric Company Handoffs between different voice encoder systems
CN103138874B (zh) * 2011-11-23 2016-07-06 中国移动通信集团公司 一种编解码动态协商方法及设备
US9830920B2 (en) 2012-08-19 2017-11-28 The Regents Of The University Of California Method and apparatus for polyphonic audio signal prediction in coding and networking systems
US9406307B2 (en) * 2012-08-19 2016-08-02 The Regents Of The University Of California Method and apparatus for polyphonic audio signal prediction in coding and networking systems

Family Cites Families (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6260009B1 (en) * 1999-02-12 2001-07-10 Qualcomm Incorporated CELP-based to CELP-based vocoder packet translation
US6687668B2 (en) * 1999-12-31 2004-02-03 C & S Technology Co., Ltd. Method for improvement of G.723.1 processing time and speech quality and for reduction of bit rate in CELP vocoder and CELP vococer using the same
JP2002202799A (ja) * 2000-10-30 2002-07-19 Fujitsu Ltd 音声符号変換装置
JP2002229599A (ja) * 2001-02-02 2002-08-16 Nec Corp 音声符号列の変換装置および変換方法
JP4231987B2 (ja) * 2001-06-15 2009-03-04 日本電気株式会社 音声符号化復号方式間の符号変換方法、その装置、そのプログラム及び記憶媒体
CN100527225C (zh) * 2002-01-08 2009-08-12 迪里辛姆网络控股有限公司 基于celp的语音代码之间的代码转换方案
US6829579B2 (en) * 2002-01-08 2004-12-07 Dilithium Networks, Inc. Transcoding method and system between CELP-based speech codes
JP4263412B2 (ja) * 2002-01-29 2009-05-13 富士通株式会社 音声符号変換方法
CA2388439A1 (en) * 2002-05-31 2003-11-30 Voiceage Corporation A method and device for efficient frame erasure concealment in linear predictive based speech codecs
US20040057521A1 (en) * 2002-07-17 2004-03-25 Macchina Pty Ltd. Method and apparatus for transcoding between hybrid video CODEC bitstreams
US7519532B2 (en) * 2003-09-29 2009-04-14 Texas Instruments Incorporated Transcoding EVRC to G.729ab
FR2867648A1 (fr) * 2003-12-10 2005-09-16 France Telecom Transcodage entre indices de dictionnaires multi-impulsionnels utilises en codage en compression de signaux numeriques
US7792670B2 (en) * 2003-12-19 2010-09-07 Motorola, Inc. Method and apparatus for speech coding

Also Published As

Publication number Publication date
WO2006075078A1 (fr) 2006-07-20
CN101124625B (zh) 2012-02-29
FR2880724A1 (fr) 2006-07-14
US8670982B2 (en) 2014-03-11
ATE515019T1 (de) 2011-07-15
US20080306732A1 (en) 2008-12-11
CN101124625A (zh) 2008-02-13
EP1836699A1 (de) 2007-09-26

Similar Documents

Publication Publication Date Title
EP1692689B1 (de) Optimized multiple coding method
EP1994531B1 (de) Improved CELP coding or decoding of a digital audio signal
BE1005622A3 (fr) Methods for coding speech segments and adjusting pitch for speech synthesis systems.
EP2277172B1 (de) Concealment of transmission errors in a digital signal in a hierarchical decoding structure
EP0749626B1 (de) Method for speech coding by linear prediction with algebraic-code excitation
EP1836699B1 (de) Method and device for carrying out optimized audio coding between two long-term prediction models
EP1692687B1 (de) Transcoding between the indices of multipulse dictionaries used for coding in digital signal compression
EP2727107B1 (de) Delay-optimized weighting windows for coding/decoding by overlapped transform
JP2003122400A (ja) Signal modification based on continuous time warping for low bit-rate CELP coding
FR3001593A1 (fr) Improved frame loss correction when decoding a signal.
EP2795618B1 (de) Method for detecting a predetermined frequency band in an audio data signal, detection device and computer program therefor
EP0428445B1 (de) Method and device for coding prediction filters in very low bit-rate vocoders
EP2080194B1 (de) Attenuation of overvoicing, in particular for generating an excitation at a decoder in the absence of information
FR2762464A1 (fr) Method and device for coding an audio-frequency signal by "forward" and "backward" LPC analysis
WO2006114494A1 (fr) Adaptation method for interoperability between short-term correlation models of digital signals
FR2784218A1 (fr) Low bit-rate speech coding method
EP1197952B1 (de) Method for coding prosody for very low bit-rate speech coding
WO2023165946A1 (fr) Optimized coding and decoding of an audio signal using a neural-network-based auto-encoder
EP2652735B1 (de) Improved coding of an enhancement stage in a hierarchical encoder
EP0573358B1 (de) Method and device for variable-speed speech synthesis
EP2589045B1 (de) Adaptive linear predictive coding/decoding
WO2011144863A1 (fr) Coding with noise shaping in a hierarchical coder
WO2002029786A1 (fr) Method and device for segmental coding of an audio signal
FR2980620A1 (fr) Processing for improving the quality of decoded audio-frequency signals

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 20070705

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LI LT LU LV MC NL PL PT RO SE SI SK TR

DAX Request for extension of the european patent (deleted)
17Q First examination report despatched

Effective date: 20100225

GRAP Despatch of communication of intention to grant a patent

Free format text: ORIGINAL CODE: EPIDOSNIGR1

RTI1 Title (correction)

Free format text: METHOD AND DEVICE FOR CARRYING OUT OPTIMIZED AUDIO CODING BETWEEN TWO LONG-TERM PREDICTION MODELS

GRAS Grant fee paid

Free format text: ORIGINAL CODE: EPIDOSNIGR3

GRAA (expected) grant

Free format text: ORIGINAL CODE: 0009210

AK Designated contracting states

Kind code of ref document: B1

Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LI LT LU LV MC NL PL PT RO SE SI SK TR

REG Reference to a national code

Ref country code: GB

Ref legal event code: FG4D

Free format text: NOT ENGLISH

REG Reference to a national code

Ref country code: CH

Ref legal event code: EP

REG Reference to a national code

Ref country code: IE

Ref legal event code: FG4D

Free format text: LANGUAGE OF EP DOCUMENT: FRENCH

REG Reference to a national code

Ref country code: DE

Ref legal event code: R096

Ref document number: 602006022775

Country of ref document: DE

Effective date: 20110825

REG Reference to a national code

Ref country code: NL

Ref legal event code: VDEP

Effective date: 20110629

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: LT

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20110629

Ref country code: SE

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20110629

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: GR

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20110930

Ref country code: SI

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20110629

Ref country code: FI

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20110629

Ref country code: AT

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20110629

Ref country code: LV

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20110629

REG Reference to a national code

Ref country code: IE

Ref legal event code: FD4D

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: CZ

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20110629

Ref country code: NL

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20110629

Ref country code: PT

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20111031

Ref country code: IE

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20110629

Ref country code: EE

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20110629

Ref country code: IS

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20111029

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: SK

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20110629

Ref country code: CY

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20110629

Ref country code: RO

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20110629

Ref country code: PL

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20110629

PLBE No opposition filed within time limit

Free format text: ORIGINAL CODE: 0009261

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: NO OPPOSITION FILED WITHIN TIME LIMIT

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: IT

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20110629

26N No opposition filed

Effective date: 20120330

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: DK

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20110629

BERE Be: lapsed

Owner name: FRANCE TELECOM

Effective date: 20120131

REG Reference to a national code

Ref country code: DE

Ref legal event code: R097

Ref document number: 602006022775

Country of ref document: DE

Effective date: 20120330

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: MC

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20120131

REG Reference to a national code

Ref country code: CH

Ref legal event code: PL

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: CH

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20120131

Ref country code: LI

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20120131

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: BE

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20120131

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: ES

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20111010

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: BG

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20110929

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: TR

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20110629

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: LU

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20120109

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: HU

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20060109

REG Reference to a national code

Ref country code: FR

Ref legal event code: PLFP

Year of fee payment: 10

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: GB

Payment date: 20141219

Year of fee payment: 10

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: DE

Payment date: 20141218

Year of fee payment: 10

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: FR

Payment date: 20150121

Year of fee payment: 10

REG Reference to a national code

Ref country code: DE

Ref legal event code: R119

Ref document number: 602006022775

Country of ref document: DE

GBPC Gb: european patent ceased through non-payment of renewal fee

Effective date: 20160109

REG Reference to a national code

Ref country code: FR

Ref legal event code: ST

Effective date: 20160930

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: GB

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20160109

Ref country code: DE

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20160802

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: FR

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20160201