EP1192619B1 - Audio coding and decoding by interpolation - Google Patents
Audio coding and decoding by interpolation
- Publication number
- EP1192619B1 (EP00949621A)
- Authority
- EP
- European Patent Office
- Prior art keywords
- audio signal
- frames
- spectral
- cepstral coefficients
- subset
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Lifetime
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/02—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/04—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
- G10L19/08—Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters
- G10L19/10—Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters the excitation function being a multipulse excitation
Definitions
- the present invention relates to the field of audio signal coding. It applies in particular, but not exclusively, to the coding of speech, narrowband or wideband, in various coding bit-rate ranges.
- the main purpose of designing an audio codec is to provide a good compromise between the bit rate of the stream transmitted by the encoder and the quality of the audio signal that the decoder is able to reconstruct from this stream.
- the coder estimates a fundamental frequency of the signal, representing its pitch.
- spectral analysis consists of determining parameters representing the harmonic structure of the signal at frequencies which are integer multiples of this fundamental frequency.
- modeling of the non-harmonic, or unvoiced, component can also be performed in the spectral domain.
- the parameters transmitted to the decoder typically represent the modulus of the spectrum of the voiced and unvoiced components. To these is added information representing either voiced/unvoiced decisions relating to different portions of the spectrum, or information on the probability of voicing of the signal, allowing the decoder to determine in which portions of the spectrum it should use the voiced component or the unvoiced component.
- these coder families include coders of the MBE ("Multi-Band Excitation") type and of the STC ("Sinusoidal Transform Coder") type.
- MBE: Multi-Band Excitation
- STC: Sinusoidal Transform Coder
- An object of the present invention is to make it possible, in a coding scheme with analysis in the spectral domain, to improve the modeling of the phases of the signal spectrum by the decoder.
- the invention thus proposes a method for decoding a digital input stream representing a coded audio signal, in which a set of successive frames of N samples of the audio signal is synthesized from coding data included in the digital input stream, and in which the coding data includes, for only a subset of the frames, data representative of spectral amplitudes associated with frequencies of the audio signal spectrum.
- for each of the frames of said subset, cepstral coefficients representative of at least some of said spectral amplitudes are determined on the basis of the coding data. For frames that are not part of said subset, said cepstral coefficients are interpolated, and the interpolated cepstral coefficients are used to generate a spectral estimate of the audio signal, which is transformed into the time domain to obtain the synthesized frame.
- the method applies advantageously when the coding data directly include quantization data of cepstral coefficients. But if the spectrum modeling employs other parameters quantized in the stream which allow the cepstral coefficients to be recovered by a known transformation, for example LSPs ("Line Spectrum Parameters"), the decoder can use this transformation to carry out the interpolation of the parameters in the cepstral domain and thus benefit from the advantages of the invention.
- LSP: Line Spectrum Parameters
- the interpolated cepstral coefficients can be corrected, for frames not part of said subset, on the basis of quantization data of an interpolation error included in the coding data.
- Another aspect of the present invention relates to a method of coding an audio signal, in which a spectrum of the audio signal is determined by a frequency-domain transform of a frame of the audio signal, and data representative of spectral amplitudes associated with at least some of the frequencies of the spectrum are included in a digital output stream, in which the spectrum of the audio signal is determined for a set of successive frames of N samples of the audio signal, and in which cepstral coefficients representative of at least some of said spectral amplitudes are determined for each of the frames of said set.
- said data representative of the spectral amplitudes are included in the digital output stream for only a subset of the frames. For the frames not part of said subset, the digital output stream includes either quantization data of an interpolation error of said cepstral coefficients, or data representing an optimal interpolator filter determined for said cepstral coefficients.
- the invention also provides an audio coder and decoder comprising means for implementing the above methods.
- the encoder and decoder described below are digital circuits which can, as is usual in the field of audio signal processing, be produced by programming a digital signal processor (DSP) or an application-specific integrated circuit (ASIC).
- DSP: digital signal processor
- ASIC: application-specific integrated circuit
- the audio coder shown in FIG. 1 processes an input audio signal x which, in the nonlimiting example considered below, is a speech signal.
- the signal x is available in digital form, for example at a sampling frequency Fe of 8 kHz. It is for example delivered by an analog-to-digital converter processing the amplified output signal of a microphone.
- the input signal x can also be formed from another version, analog or digital, coded or not, of the speech signal.
- the encoder includes a module 1 which forms successive frames of the audio signal for the different processing operations performed, and an output multiplexer 6 which delivers an output stream Φ containing, for each frame, sets of quantization parameters from which a decoder will be able to synthesize a decoded version of the audio signal.
- Each frame 2 is composed of a number N of consecutive samples of the audio signal x.
- N = 256
- the module 1 multiplies the samples of each frame 2 by a windowing function fA, preferably chosen for its good spectral properties.
- the encoder in Figure 1 performs an analysis of the audio signal in the spectral domain. It includes a module 3 which calculates the fast Fourier transform (TFR) of each signal frame.
- TFR: fast Fourier transform
- the TFR module 3 obtains the signal spectrum for each frame, the modulus and phase of which are respectively denoted |X| and φX.
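The framing, windowing and transform steps above can be sketched as follows. This is only an illustration under stated assumptions: the Hann window stands in for the unspecified window fA, the frame is zero-padded to a transform of size 2N, a naive DFT replaces a fast transform for readability, and the function name `analyze_frame` is hypothetical.

```python
import cmath
import math

def analyze_frame(frame, fft_size=None):
    """Window one frame and return (modulus, phase) of its zero-padded DFT."""
    n = len(frame)
    size = fft_size or 2 * n  # assumed transform size 2N
    # Hann window: an assumed stand-in for the analysis window fA.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * t / (n - 1)) for t in range(n)]
    padded = [s * w for s, w in zip(frame, window)] + [0.0] * (size - n)
    modulus, phase = [], []
    for i in range(size // 2 + 1):  # bins 0 .. size/2 (non-negative frequencies)
        value = sum(x * cmath.exp(-2j * math.pi * i * t / size)
                    for t, x in enumerate(padded))
        modulus.append(abs(value))
        phase.append(cmath.phase(value))
    return modulus, phase

# Example: a 1 kHz cosine sampled at 8 kHz, N = 32 samples.
frame = [math.cos(2 * math.pi * 1000 * t / 8000) for t in range(32)]
modulus, phase = analyze_frame(frame)
peak_bin = max(range(len(modulus)), key=lambda i: modulus[i])
```

With a transform of size 64 at 8 kHz, bin i corresponds to the frequency i·8000/64 Hz, so the 1 kHz tone is expected near bin 8.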
- a fundamental frequency detector 4 estimates for each signal frame a value of the fundamental frequency F 0 .
- the detector 4 can apply any known method of analysis of the speech signal of the frame to estimate the fundamental frequency F0, for example a method based on the autocorrelation function or the AMDF function, possibly preceded by a whitening module using linear prediction.
- the estimation can also be performed in the spectral domain or in the cepstral domain.
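As an illustration of the autocorrelation option mentioned above, here is a minimal sketch. It is not the detector 4 of the patent: the search range `f_min`..`f_max`, the absence of whitening and the function name `estimate_f0` are assumptions made for the example.

```python
import math

def estimate_f0(frame, fs, f_min=60.0, f_max=400.0):
    """Return an F0 estimate (Hz) from the peak of the autocorrelation."""
    lag_min = max(1, int(fs / f_max))
    lag_max = min(int(fs / f_min), len(frame) - 1)
    best_lag, best_value = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        # Autocorrelation at this lag (fewer terms at larger lags, which
        # mildly favours the shortest period among its multiples).
        value = sum(frame[t] * frame[t - lag] for t in range(lag, len(frame)))
        if value > best_value:
            best_lag, best_value = lag, value
    return fs / best_lag

# Example: a 200 Hz tone with a second harmonic, sampled at 8 kHz.
fs = 8000
frame = [math.sin(2 * math.pi * 200 * t / fs)
         + 0.5 * math.sin(2 * math.pi * 400 * t / fs) for t in range(256)]
f0 = estimate_f0(frame, fs)
```

The resolution of such an estimate is limited to integer lags; a real detector would typically refine it, which is one reason the envelope extraction described later searches for local maxima instead of trusting k·F0 exactly.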
- Another possibility is to evaluate the time intervals between consecutive breaks in the speech signal attributable to closures of the speaker's glottis occurring during the frame.
- Well-known methods which can be used to detect such micro-ruptures are described in the following articles: M.
- the estimated fundamental frequency F0 is subject to quantization, for example scalar, by a module 5, which provides the output multiplexer 6 with a quantization index iF of the fundamental frequency for each frame of the signal.
- the coder uses cepstral parametric models to represent an upper envelope and a lower envelope of the spectrum of the audio signal.
- the first step of the cepstral transformation consists in applying a spectral compression function, which can be a logarithmic or root function, to the modulus of the signal spectrum.
- the compressed spectrum LX of the audio signal is processed by a module 9 which extracts spectral amplitudes associated with the harmonics of the signal, corresponding to the multiples of the estimated fundamental frequency F0. These amplitudes are then interpolated by a module 10 in order to obtain a compressed upper envelope denoted LX_sup.
- the spectral compression could equivalently be performed after determining the amplitudes associated with the harmonics. It could also be performed after the interpolation, which would only change the form of the interpolation functions.
- the module 9 for extracting the maxima takes account of the possible variation of the fundamental frequency on the analysis frame, of the errors that the detector 4 can make, as well as of the inaccuracies linked to the discrete nature of the frequency sampling.
- the search for the amplitudes of the spectral peaks does not simply consist in taking the values LX(i) corresponding to the indexes i such that i·Fe/2N is the frequency closest to a harmonic of frequency k·F0 (k ≥ 1).
- the spectral amplitude retained for a harmonic of order k is a local maximum of the modulus of the spectrum in the vicinity of the frequency k·F0 (this amplitude is obtained directly in compressed form when the spectral compression 8 is carried out before the extraction of the maxima 9).
- FIGS. 4 and 5 show an example of the shape of the compressed spectrum LX, where it can be seen that the maximum amplitudes of the harmonic peaks do not necessarily coincide with the amplitudes corresponding to the integer multiples of the estimated fundamental frequency F 0 .
- the sides of the peaks being quite steep, a small positioning error of the fundamental frequency F0, amplified by the harmonic index k, can strongly distort the estimated upper envelope of the spectrum and cause poor modeling of the formant structure of the signal.
- the interpolation is performed between points whose abscissa is the frequency corresponding to the maximum amplitude of a spectral peak, and whose ordinate is this maximum, before or after compression.
- the interpolation performed to calculate the upper envelope LX_sup is a simple linear interpolation.
- Another form of interpolation could be used (e.g. polynomial or spline).
- the interpolation is carried out between points whose abscissa is a frequency k·F0 multiple of the fundamental frequency (in fact the closest frequency in the discrete spectrum) and whose ordinate is the maximum amplitude, before or after compression, of the spectrum in the vicinity of this multiple frequency.
- the maximum amplitude search interval associated with a harmonic of rank k is centered on the index i of the frequency of the TFR closest to k·F0, that is to say $i = \lfloor 2N k F_0 / F_e + 1/2 \rfloor$, where $\lfloor a \rfloor$ denotes the integer equal to or immediately less than the number a.
- the width of this search interval depends on the sampling frequency F e , the size 2N of the TFR and the range of possible variation of the fundamental frequency. This width is typically of the order of ten frequencies with the examples of values previously considered. It can be made adjustable as a function of the value F 0 of the fundamental frequency and of the number k of the harmonic.
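The peak search and the linear interpolation described above can be sketched as follows. This is a sketch under assumptions: `lx` holds the compressed spectrum on bins 0 to N of a transform of size 2N, the search half-width is a free parameter, the envelope is held constant outside the first and last peaks, and the function names are hypothetical.

```python
import math

def harmonic_peaks(lx, f0, fs, half_width):
    """Return [(bin, amplitude)] of the local maximum found near each k*F0."""
    size = 2 * (len(lx) - 1)  # transform size, lx holding bins 0 .. size/2
    peaks = []
    k = 1
    while k * f0 < fs / 2:
        center = math.floor(size * k * f0 / fs + 0.5)  # bin closest to k*F0
        lo = max(0, center - half_width)
        hi = min(len(lx) - 1, center + half_width)
        best = max(range(lo, hi + 1), key=lambda i: lx[i])
        peaks.append((best, lx[best]))
        k += 1
    return peaks

def upper_envelope(peaks, n_bins):
    """Linearly interpolate between peaks; hold the end values (assumption)."""
    first_bin, first_amp = peaks[0]
    last_bin, last_amp = peaks[-1]
    env = [first_amp if i <= first_bin else last_amp for i in range(n_bins)]
    for (i0, a0), (i1, a1) in zip(peaks, peaks[1:]):
        for i in range(i0, i1):  # empty when two searches hit the same bin
            env[i] = a0 + (a1 - a0) * (i - i0) / (i1 - i0)
    return env

# Example: peaks slightly off the nominal harmonic bins 8, 16 and 24.
lx = [0.0] * 33
lx[9], lx[15], lx[24] = 5.0, 3.0, 4.0
peaks = harmonic_peaks(lx, 1000.0, 8000.0, half_width=2)
env = upper_envelope(peaks, len(lx))
```

In this example the first two maxima are found one bin away from the nominal positions, which is the situation the local search is meant to handle.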
- a non-linear distortion of the frequency scale is applied to the compressed upper envelope by a module 12 before the module 13 performs the inverse fast Fourier transform (TFRI) providing the cepstral coefficients cx_sup.
- TFRI: inverse fast Fourier transform
- the non-linear distortion makes it possible to more effectively minimize the modeling error. It is for example performed according to a Mel or Bark type frequency scale. This distortion may possibly depend on the estimated fundamental frequency F 0 .
- Figure 1 illustrates the case of the Mel scale. The relationship between the frequencies F of the linear spectrum, expressed in hertz, and the frequencies F' of the Mel scale is as follows:
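The relation itself is not reproduced in this text. Purely as an illustration, the sketch below uses a widely used Mel mapping, 2595·log10(1 + F/700), which is an assumption and not necessarily the relation of the patent, and resamples an envelope onto a uniform Mel grid by linear interpolation. All function names are hypothetical.

```python
import math

def hz_to_mel(f_hz):
    """Common Mel mapping (assumed here, not taken from the patent)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def warp_to_mel(envelope, fs, n_out):
    """Resample an envelope given on bins 0 .. fs/2 onto a uniform Mel grid."""
    n_in = len(envelope)
    mel_max = hz_to_mel(fs / 2)
    out = []
    for j in range(n_out):
        f = mel_to_hz(mel_max * j / (n_out - 1))  # frequency of Mel point j
        pos = f / (fs / 2) * (n_in - 1)           # fractional input bin
        i = min(int(pos), n_in - 2)
        frac = pos - i
        out.append((1 - frac) * envelope[i] + frac * envelope[i + 1])
    return out

# Example: a linear ramp becomes a curve that rises slowly at first.
ramp = [float(i) for i in range(33)]
warped = warp_to_mel(ramp, 8000.0, 17)
```

Because the Mel grid is denser at low frequencies, the warped envelope devotes more of its points to the low part of the spectrum.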
- the TFRI module 13 needs to calculate only a cepstral vector of NCS cepstral coefficients of orders 0 to NCS-1.
- NCS can be equal to 16.
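A minimal sketch of the truncated inverse transform follows. It assumes the compressed envelope is real and given on bins 0 to N of an even spectrum of size 2N, so that the inverse DFT reduces to a cosine sum; the function name is hypothetical and a real implementation would use a fast transform.

```python
import math

def cepstrum_from_log_envelope(lx, n_coeffs):
    """First n_coeffs terms of the inverse DFT of a real, even log-spectrum."""
    n = len(lx) - 1  # lx holds bins 0 .. N
    coeffs = []
    for m in range(n_coeffs):
        acc = lx[0] + (-1) ** m * lx[n]
        acc += 2.0 * sum(lx[i] * math.cos(math.pi * m * i / n)
                         for i in range(1, n))
        coeffs.append(acc / (2 * n))
    return coeffs

# Example: an envelope built from two known cepstral coefficients.
n = 32
lx = [1.5 + 2 * 0.25 * math.cos(math.pi * i / n) for i in range(n + 1)]
c = cepstrum_from_log_envelope(lx, 4)
```

With this convention the envelope is recovered as c[0] + 2·Σ c[m]·cos(m·ω), so the example returns c[0] close to 1.5 and c[1] close to 0.25.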
- a post-liftering (post-filtering in the cepstral domain) is applied by a module 15 to the compressed upper envelope LX_sup.
- This post-liftering corresponds to a manipulation of the cepstral coefficients cx_sup delivered by the TFRI module 13, which corresponds approximately to a post-filtering of the harmonic part of the signal by a transfer function having the classical form $H(z) = (1 - \mu z^{-1})\,\dfrac{A(z/\gamma_1)}{A(z/\gamma_2)}$, where:
- A(z) is the transfer function of a linear prediction filter of the audio signal
- γ1 and γ2 are coefficients between 0 and 1
- μ is a possibly zero pre-emphasis coefficient.
- a normalization module 16 further modifies the cepstral coefficients by imposing the constraint of exact modeling of one point of the initial spectrum, which is preferably the most energetic point among the spectral maxima extracted by the module 9. In practice, this normalization only modifies the value of the coefficient cp(0).
- the normalization module 16 operates as follows: it recalculates a value of the synthesized spectrum at the frequency of the maximum indicated by the module 9, by Fourier transform of the truncated and post-liftered cepstral coefficients, taking into account the non-linear distortion of the frequency axis; it determines a normalization gain gN as the logarithmic difference between the value of the maximum provided by the module 9 and this recalculated value; and it adds the gain gN to the post-liftered cepstral coefficient cp(0). This normalization can be seen as part of the post-liftering.
- the post-liftered and normalized cepstral coefficients are quantized by a module 18 which transmits the corresponding quantization indexes to the output multiplexer 6 of the encoder.
- the module 18 can operate by vector quantization of cepstral vectors formed from the post-liftered and normalized coefficients, denoted here cx[n] for the signal frame of rank n.
- for example, the cepstral vector cx[n] of NCS = 16 cepstral coefficients cx[n,0], cx[n,1], ..., cx[n,NCS-1] is divided into four cepstral sub-vectors each containing four coefficients of consecutive orders.
- the cepstral vector cx [n] can be processed by the means shown in FIG. 6, which are part of the quantization module 18.
- the prediction residue is given by relation (10): $rcx[n,i] = \dfrac{cx[n,i] - \alpha(i)\,rcx\_q[n-1,i]}{2 - \alpha(i)}$, where rcx[n] denotes a residual prediction vector for the frame of rank n whose components are respectively denoted rcx[n,0], rcx[n,1], ..., rcx[n,NCS-1], and α(i) denotes a prediction coefficient chosen to be representative of an assumed inter-frame correlation.
- the numerator of relation (10) is obtained by a subtractor 20, the components of whose output vector are divided by the quantities 2-α(i) at 21.
- the residual vector rcx [n] is subdivided into four sub-vectors, corresponding to the subdivision into four cepstral sub-vectors.
- the unit 22 proceeds to the vector quantization of each sub-vector of the residual vector rcx [n].
- This quantization can consist, for each sub-vector srcx[n], in selecting from the dictionary the quantized sub-vector srcx_q[n] which minimizes the quadratic error ‖srcx[n] - srcx_q[n]‖².
- the set icxs of the quantization indices icx, corresponding to the addresses in the dictionary or dictionaries of the residual quantized sub-vectors srcx_q [n] is supplied to the output multiplexer 6.
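The dictionary search can be illustrated by a plain nearest-neighbour search over each sub-vector. The tiny dictionaries and the function names below are hypothetical and only serve the example; trained dictionaries would be used in practice.

```python
def nearest_codeword(sub_vector, codebook):
    """Index of the codeword minimizing the squared error."""
    best_index, best_error = 0, float("inf")
    for index, codeword in enumerate(codebook):
        error = sum((a - b) ** 2 for a, b in zip(sub_vector, codeword))
        if error < best_error:
            best_index, best_error = index, error
    return best_index

def quantize_split(vector, codebooks):
    """Split a vector into equal sub-vectors and quantize each one."""
    size = len(vector) // len(codebooks)
    indices, quantized = [], []
    for j, codebook in enumerate(codebooks):
        sub = vector[j * size:(j + 1) * size]
        index = nearest_codeword(sub, codebook)
        indices.append(index)
        quantized.extend(codebook[index])
    return indices, quantized

# Example with two sub-vectors of two components each.
codebooks = [[[0.0, 0.0], [1.0, 1.0]], [[0.0, 2.0], [2.0, 0.0]]]
indices, quantized = quantize_split([0.9, 1.2, 1.8, 0.1], codebooks)
```

The returned indices play the role of the addresses icx sent to the multiplexer, and the concatenated codewords form the quantized residual vector.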
- Unit 22 also delivers the values of the quantized residual sub-vectors, which form the vector rcx_q[n]. This vector is delayed by one frame at 23, and its components are multiplied by the coefficients α(i) at 24 to supply the vector applied to the negative input of the subtractor 20. This last vector is also supplied to an adder 25, the other input of which receives a vector formed by the components of the quantized residue rcx_q[n] respectively multiplied by the quantities 2-α(i) at 26. The adder 25 thus delivers the quantized cepstral vector cx_q[n] that the decoder will recover.
- the prediction coefficient α(i) can be optimized separately for each of the cepstral coefficients.
- The quantization dictionaries can also be optimized separately for each of the four cepstral sub-vectors.
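The prediction loop built around the subtractor 20, the delay 23 and the adder 25 can be sketched as below. This is a reconstruction under assumptions: the residue is divided by 2-α(i) as stated for divider 21, and the multiplier feeding the adder is assumed to be the same quantity 2-α(i), so that the reconstruction exactly inverts the residue computation when there is no quantization error. The function names and the `quantize` callback are hypothetical.

```python
def encode_frame(cx, prev_rcx_q, alpha, quantize):
    """Compute and quantize the prediction residue of one cepstral vector."""
    rcx = [(c - a * p) / (2.0 - a) for c, p, a in zip(cx, prev_rcx_q, alpha)]
    return quantize(rcx)

def decode_frame(rcx_q, prev_rcx_q, alpha):
    """Rebuild the quantized cepstral vector from the quantized residues."""
    return [a * p + (2.0 - a) * r for r, p, a in zip(rcx_q, prev_rcx_q, alpha)]

# Example with a transparent quantizer, then with a uniform 0.25 step.
alpha = [0.5, 0.3]
prev = [0.2, -0.1]
cx = [1.0, 2.0]
exact = decode_frame(encode_frame(cx, prev, alpha, list), prev, alpha)
coarse_q = lambda v: [round(x / 0.25) * 0.25 for x in v]
coarse = decode_frame(encode_frame(cx, prev, alpha, coarse_q), prev, alpha)
```

Since the decoder only needs the previous quantized residue and the current indexes, the encoder and decoder stay synchronized as long as both use rcx_q and not the unquantized residue in the delay line.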
- the adaptation module 29 controls the post-lifter 15 so as to minimize a modulus deviation between the spectrum of the audio signal and the corresponding modulus values calculated at 28.
- This modulus deviation can be expressed as a sum of absolute values of differences in amplitudes, compressed or not, corresponding to one or more of the harmonic frequencies. This sum can be weighted according to the spectral amplitudes associated with these frequencies.
- the modulus deviation taken into account in the adaptation of the post-liftering would take into account all the harmonics of the spectrum.
- the module 28 can resynthesize the spectral amplitudes only for one or more frequencies that are multiples of the fundamental frequency F0, selected on the basis of the magnitude of the modulus of the spectrum in absolute value.
- the adaptation module 29 can for example consider the three most intense spectral peaks in the calculation of the modulus deviation to be minimized.
- the adaptation module 29 estimates a spectral masking curve of the audio signal using a psychoacoustic model, and the frequencies taken into account in the calculation of the modulus deviation to be minimized are selected on the basis of the magnitude of the spectrum modulus relative to the masking curve (for example, the three frequencies at which the spectrum modulus most exceeds the masking curve can be taken).
- Various conventional methods can be used to calculate the masking curve from the audio signal. For example, the one developed by J.D. Johnston ("Transform Coding of Audio Signals Using Perceptual Noise Criteria", IEEE Journal on Selected Areas in Communications, Vol. 6, No. 2, February 1988) can be used.
- the module 29 can use a filter identification model.
- a simpler method consists in predefining a set of post-liftering parameter sets, that is to say a set of pairs γ1, γ2 in the case of a post-liftering according to relations (8), in carrying out the operations of the modules 15, 16, 18 and 28 for each of these parameter sets, and in retaining the parameter set which leads to the minimum modulus deviation between the signal spectrum and the recalculated values.
- the quantization indexes provided by the module 18 are then those which relate to the best set of parameters.
- the coder determines coefficients cx_inf representing a compressed lower envelope LX_inf.
- a module 30 extracts from the compressed spectrum LX spectral amplitudes associated with frequencies located in regions of the spectrum lying between the multiples of the estimated fundamental frequency F0.
- the amplitude associated with the region between two consecutive harmonics k·F0 and (k+1)·F0 simply corresponds to the modulus of the spectrum at the frequency (k+1/2)·F0 located in the middle of the interval separating the two harmonics.
- this amplitude could be an average of the spectrum modulus over a small range surrounding this frequency (k+1/2)·F0.
- a module 31 performs an interpolation, for example linear, of the spectral amplitudes associated with the frequencies located in the intermediate regions to obtain the compressed lower envelope LX_inf.
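The sampling of the spectrum midway between harmonics can be sketched as follows, assuming `lx` holds the compressed spectrum on bins 0 to N of a transform of size 2N and that k starts at 1. The function name is hypothetical; the points returned would then be interpolated, for example linearly, to form LX_inf.

```python
import math

def midpoint_amplitudes(lx, f0, fs):
    """Return [(bin, amplitude)] taken at the frequencies (k + 1/2) * F0."""
    size = 2 * (len(lx) - 1)  # transform size, lx holding bins 0 .. size/2
    points = []
    k = 1
    while (k + 0.5) * f0 < fs / 2:
        i = math.floor(size * (k + 0.5) * f0 / fs + 0.5)  # closest bin
        points.append((i, lx[i]))
        k += 1
    return points

# Example: with F0 = 1 kHz at 8 kHz, midpoints fall at 1.5, 2.5 and 3.5 kHz.
lx = [float(i) for i in range(33)]
points = midpoint_amplitudes(lx, 1000.0, 8000.0)
```

Replacing `lx[i]` by a short average around bin i would give the variant mentioned above.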
- the cepstral transformation applied to this compressed lower envelope LX_inf is performed according to a frequency scale resulting from a non-linear distortion applied by a module 32.
- the TFRI module 33 calculates a cepstral vector of NCI cepstral coefficients cx_inf of orders 0 to NCI-1 representing the lower envelope.
- the non-linear transformation of the frequency scale for the cepstral transformation of the lower envelope can be achieved towards a finer scale at high frequencies than at low frequencies, which advantageously makes it possible to model the non-voiced components of the signal at high frequencies.
- the cepstral coefficients cx_inf representing the compressed lower envelope are quantized by a module 34, which can operate in the same way as the module 18 for quantizing the cepstral coefficients representing the compressed upper envelope.
- the vector thus formed is subjected to a vector quantization of prediction residue, performed by means identical to those shown in Figure 6 but without subdivision into sub-vectors.
- the encoder shown in Figure 1 has no particular device for coding the phases of the spectrum at the harmonics of the audio signal.
- it includes means 36-40 for coding temporal information related to the phase of the non-harmonic component represented by the lower envelope.
- a spectral decompression module 36 and a TFRI module 37 form a temporal estimate of the frame of the non-harmonic component.
- the module 36 applies a decompression function reciprocal to the compression function applied by the module 8 (that is to say an exponential or a power 1/γ function) to the compressed lower envelope LX_inf produced by the interpolation module 31. This provides the modulus of the estimated frame of the non-harmonic component, the phase of which is taken to be equal to the phase φX of the spectrum of the signal X over the frame.
- the inverse Fourier transform performed by the module 37 provides the estimated frame of the non-harmonic component.
- the module 38 subdivides this estimated frame of the non-harmonic component into several time segments.
- for each segment, the module 38 calculates the energy, equal to the sum of the squares of the samples, and forms a vector E1 of eight positive real components equal to the eight calculated energies. The largest of these eight energies, denoted EM, is also determined and supplied, with the vector E1, to a normalization module 39. This module divides each component of the vector E1 by EM, so that the normalized vector Emix is made up of eight components between 0 and 1. It is this normalized vector Emix, or weighting vector, which is quantized by the module 40. This module can perform vector quantization with a dictionary determined during prior training. The quantization index iEm is provided by the module 40 to the output multiplexer 6 of the encoder.
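The computation of the weighting vector can be sketched as follows. The handling of an all-zero frame (returning zeros) is an assumption added for robustness, and the function name is hypothetical.

```python
def energy_weighting(frame, n_segments=8):
    """Return (Emix, EM): per-segment energies normalized by the largest."""
    seg_len = len(frame) // n_segments
    energies = [sum(s * s for s in frame[j * seg_len:(j + 1) * seg_len])
                for j in range(n_segments)]
    e_max = max(energies)
    if e_max == 0.0:
        # Assumption: a silent frame yields an all-zero weighting vector.
        return [0.0] * n_segments, 0.0
    return [e / e_max for e in energies], e_max

# Example: 16 samples, i.e. eight segments of two samples.
frame = [1.0, 1.0, 2.0, 2.0] + [0.0] * 12
emix, em = energy_weighting(frame)
```

Each component of Emix lies between 0 and 1, with the most energetic segment mapped to 1.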
- FIG. 7 shows an alternative embodiment of the means used by the coder of FIG. 1 to determine the vector Emix of energy weighting of the frame of the non-harmonic component.
- the spectral decompression and TFRI modules 36, 37 operate like those which have the same references in FIG. 1.
- a selection module 42 is added to determine the value of the spectrum modulus subjected to the inverse Fourier transform 37. On the basis of the estimated fundamental frequency F0, the module 42 identifies harmonic regions and non-harmonic regions of the spectrum of the audio signal.
- a frequency will be considered to belong to a harmonic region if it is in a frequency interval centered on a harmonic kF 0 and of width corresponding to a width of spectral line synthesized, and to a non-harmonic region otherwise.
- the complex signal subjected to TFRI 37 is equal to the value of the spectrum, that is to say that its modulus and its phase correspond to the values
- this complex signal has the same phase φX as the spectrum and a modulus given by the lower envelope after spectral decompression 36. This procedure according to Figure 7 provides more precise modeling of the non-harmonic regions.
- the decoder represented in FIG. 8 comprises an input demultiplexer 45 which extracts from the bit stream Φ, coming from an encoder according to FIG. 1, the quantization indexes iF, icxs, icxi, iEm of the fundamental frequency F0, of the cepstral coefficients representing the compressed upper envelope, of the coefficients representing the compressed lower envelope, and of the weighting vector Emix, and distributes them respectively to modules 46, 47, 48 and 49.
- These modules 46-49 include quantization dictionaries similar to those of modules 5, 18, 34 and 40 of FIG. 1, in order to restore the values of the quantized parameters.
- the modules 47 and 48 have dictionaries to form the quantized prediction residues rcx_q [n], and they deduce therefrom the quantized cepstral vectors cx_q [n] with elements identical to the elements 23-26 of FIG. 6. These quantified cepstral vectors cx_q [n] provide the cepstral coefficients cx_sup_q and cx_inf_q processed by the decoder.
- a module 51 calculates the fast Fourier transform of the cepstral coefficients cx_sup for each signal frame.
- the frequency scale of the resulting compressed spectrum is modified non-linearly by a module 52 applying the reciprocal non-linear transformation of that of module 12 of FIG. 1, and which provides the estimate LX_sup of the compressed upper envelope .
- a spectral decompression of LX_sup, performed by a module 53, provides the upper envelope X_sup comprising the estimated values of the modulus of the spectrum at the frequencies that are multiples of the fundamental frequency F0.
- the module 54 synthesizes the spectral estimate Xv of the harmonic component of the audio signal, as a sum of spectral lines centered on the frequencies that are multiples of the fundamental frequency F0 and whose amplitudes (in modulus) are those given by the upper envelope X_sup.
- the decoder of figure 8 is capable of extracting information on this phase from the cepstral coefficients cx_sup_q representing the compressed upper envelope. This phase information is used to assign a phase φ(k) to each of the spectral lines determined by the module 54 in the estimation of the harmonic component of the signal.
- the speech signal can be considered to be minimum-phase.
- the minimum phase information can easily be deduced from a cepstral modeling. This minimum phase information is therefore calculated for each harmonic frequency.
- the minimum phase assumption means that the energy of the synthesized signal is localized at the start of each period of the fundamental frequency F 0 .
- from the post-liftered cepstral coefficients smoothed by the module 56, the minimum phase assigned to each spectral line representing a harmonic peak of the spectrum is deduced.
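The deduction of a minimum phase from cepstral coefficients can be illustrated with the standard relation between a log-modulus cepstrum and the phase of the corresponding minimum-phase spectrum. The sketch assumes natural-logarithm compression, the convention log|X(ω)| = c[0] + 2·Σ c[m]·cos(m·ω), and a linear frequency axis (with a warped scale the frequencies would be taken on the warped axis). The function names are hypothetical.

```python
import math

def minimum_phase(cepstrum, omega):
    """Minimum phase (radians) at normalized angular frequency omega."""
    return -2.0 * sum(c * math.sin(m * omega)
                      for m, c in enumerate(cepstrum) if m > 0)

def harmonic_phases(cepstrum, f0, fs):
    """Minimum phase assigned to each harmonic k*F0 below fs/2."""
    phases = []
    k = 1
    while k * f0 < fs / 2:
        phases.append(minimum_phase(cepstrum, 2 * math.pi * k * f0 / fs))
        k += 1
    return phases

# Check against a known minimum-phase filter H(z) = 1 / (1 - a z^-1),
# whose log-modulus cepstrum is c[m] = a**m / (2 m) for m >= 1.
a = 0.5
cepstrum = [0.0] + [a ** m / (2 * m) for m in range(1, 60)]
omega = 1.0
expected = -math.atan2(a * math.sin(omega), 1 - a * math.cos(omega))
computed = minimum_phase(cepstrum, omega)
phases = harmonic_phases(cepstrum, 1000.0, 8000.0)
```

The coefficient of order 0 only sets the overall gain and does not contribute to the phase.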
- the operations performed by the modules 56, 57 for smoothing and extracting the minimum phase are illustrated by the flowchart in FIG. 9.
- the module 56 examines the variations of the cepstral coefficients so as to apply less smoothing in the presence of sudden variations than in the presence of slow variations. To do so, it smooths the cepstral coefficients by means of a forgetting factor λc chosen according to a comparison between a threshold d_th and a distance d between two successive sets of post-liftered cepstral coefficients.
- the threshold d_th is itself adapted as a function of the variations of the cepstral coefficients.
- the first step 60 consists in calculating the distance d between the two successive vectors relating to frames n-1 and n. These vectors, denoted here cxp[n-1] and cxp[n], correspond for each frame to the set of NCS post-liftered cepstral coefficients representing the compressed upper envelope.
- the distance used can in particular be the Euclidean distance between the two vectors or a quadratic distance.
- Two smoothings are first carried out, by means of forgetting factors λmin and λmax respectively, to determine a minimum distance d_min and a maximum distance d_max.
- the forgetting factors λmin and λmax are themselves each selected from two distinct values between 0 and 1, respectively λmin1, λmin2 and λmax1, λmax2, the values λmin1, λmax1 each being substantially closer to 0 than the values λmin2, λmax2. If d > d_min (test 61), the forgetting factor λmin is equal to λmin1 (step 62); otherwise it is taken equal to λmin2 (step 63). In step 64, the minimum distance d_min is taken equal to λmin·d_min + (1 − λmin)·d.
- a corresponding test on d selects the forgetting factor λmax: it is taken equal to λmax1 (step 66) or, otherwise, to λmax2 (step 67).
- in step 68, the maximum distance d_max is taken equal to λmax·d_max + (1 − λmax)·d.
- if the distance d between the two consecutive cepstral vectors is greater than the threshold d_th (test 71), a value λc1 relatively close to 0 is adopted for the forgetting factor λc (step 72). In this case, the corresponding signal is considered to be of the non-stationary type, so that there is no need to keep a long memory of the previous cepstral coefficients. If d ≤ d_th, a value λc2 less close to 0 is adopted for the forgetting factor λc in step 73, in order to smooth the cepstral coefficients further.
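The adaptive smoothing of steps 60 to 73 can be sketched as follows. This is only an illustration under stated assumptions: the numeric forgetting factors are hypothetical, the test selecting the d_max factor is assumed to be d > d_max (by analogy with test 61), the threshold d_th is passed in rather than derived, and the smoothing recursion itself is assumed to have the same first-order form as the d_min and d_max updates.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two cepstral vectors (step 60)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def update_trackers(d, d_min, d_max, lam_min=(0.1, 0.9), lam_max=(0.1, 0.9)):
    """Update the running minimum and maximum distances (steps 61 to 68).

    lam_min and lam_max hold hypothetical (value1, value2) pairs, value1
    being the one closer to 0.
    """
    lam = lam_min[0] if d > d_min else lam_min[1]   # test 61, steps 62/63
    d_min = lam * d_min + (1.0 - lam) * d           # step 64
    # Assumption: the test for the d_max factor mirrors test 61.
    lam = lam_max[0] if d > d_max else lam_max[1]   # steps 66/67
    d_max = lam * d_max + (1.0 - lam) * d           # step 68
    return d_min, d_max


def smooth_frame(cxp_prev, cxp_cur, cxl_prev, d_th, lam_c1=0.1, lam_c2=0.8):
    """Smooth cxp[n] into cxl[n] with a forgetting factor chosen from d."""
    d = euclidean(cxp_prev, cxp_cur)
    lam_c = lam_c1 if d > d_th else lam_c2          # test 71, steps 72/73
    # Assumed first-order recursion; the text does not spell it out.
    cxl = [lam_c * p + (1.0 - lam_c) * c for p, c in zip(cxl_prev, cxp_cur)]
    return cxl, d, lam_c
```

A sudden change (large d) selects the small factor, so the smoothed vector follows the new frame closely; a slow change selects the larger factor and keeps more memory of the past.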
- the module 57 then calculates the minimum phases φ(k) associated with the harmonics kF0.
- the minimum phase for a harmonic of order k is given by $\varphi(k) = -2 \sum_{m=1}^{NCS-1} cxl[n,m] \, \sin(2\pi m k F_0 / F_e)$, where cxl[n, m] denotes the smoothed cepstral coefficient of order m for the frame n.
- in step 75, the harmonic index k is initialized to 1.
- the phase φ(k) and the cepstral index m are initialized to 0 and 1 respectively in step 76.
- in step 77, the module 57 adds to the phase φ(k) the quantity −2·cxl[n, m]·sin(2πmk·F0/Fe).
- the cepstral index m is incremented in step 78 and compared to NCS in step 79. Steps 77 and 78 are repeated as long as m < NCS.
- when m = NCS, the calculation of the minimum phase is completed for the harmonic k, and the index k is incremented in step 80.
- the calculation of minimum phases 76-79 is repeated for the following harmonic as long as kF0 < Fe/2 (test 81).
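The loop of steps 75 to 81 can be written compactly. The sketch below assumes the smoothed coefficients of one frame are given as a list `cxl` indexed by cepstral order (index 0 is present but unused); the function name is hypothetical.

```python
import math


def minimum_phases(cxl, f0, fe):
    """Return [phi(1), phi(2), ...] for every harmonic k*f0 below fe/2.

    cxl[m] is the smoothed cepstral coefficient of order m for the
    current frame; NCS is taken as len(cxl).
    """
    ncs = len(cxl)
    phases = []
    k = 1                                    # step 75
    while k * f0 < fe / 2.0:                 # test 81
        phi = 0.0                            # step 76
        for m in range(1, ncs):              # steps 77 to 79, m = 1 .. NCS-1
            phi += -2.0 * cxl[m] * math.sin(2.0 * math.pi * m * k * f0 / fe)
        phases.append(phi)
        k += 1                               # step 80
    return phases
```

For example, with F0 = 100 Hz and Fe = 8000 Hz the function returns one phase per harmonic from 100 Hz up to 3900 Hz.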
- the module 54 takes into account a constant phase over the width of each spectral line, equal to the minimum phase φ(k) supplied for the corresponding harmonic k by the module 57.
- the estimate X_v of the harmonic component is synthesized by summing spectral lines positioned at the harmonic frequencies of the fundamental frequency F0.
- the spectral lines can be positioned on the frequency axis with a resolution finer than that of the Fourier transform. To that end, a reference spectral line is precalculated once and for all at the higher resolution. This calculation can consist of a Fourier transform of the analysis window f_A with a transform size of 16384 points, providing a resolution of about 0.5 Hz per point.
- each harmonic line is then synthesized by the module 54 by positioning the high-resolution reference line on the frequency axis and sub-sampling it down to the 15.625 Hz resolution of the 512-point Fourier transform. This makes it possible to position the spectral line precisely.
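The positioning of a line by sub-sampling a high-resolution reference can be illustrated as below. It is a minimal sketch, not the decoder's implementation: the function names are hypothetical, a naive DFT stands in for a fast transform, and the sizes are parameters so that small values can be used for experimentation.

```python
import cmath
import math


def reference_line(window, n_hi):
    """Naive zero-padded DFT of the analysis window on n_hi points."""
    return [sum(w * cmath.exp(-2j * math.pi * q * t / n_hi)
                for t, w in enumerate(window))
            for q in range(n_hi)]


def place_line(ref, f_harm, fe, n_fft):
    """Sample the reference line, centred on f_harm, at the n_fft-point bins.

    Returns the values for bins 0 .. n_fft/2 of the coarse transform.
    """
    n_hi = len(ref)
    hi_res = fe / n_hi                       # fine step in Hz per point
    line = []
    for i in range(n_fft // 2 + 1):
        # Distance from the line centre, expressed in fine-grid points.
        offset = round((i * fe / n_fft - f_harm) / hi_res)
        line.append(ref[offset % n_hi])
    return line
```

With a sampling frequency of 8000 Hz, a 16384-point reference and a 512-point transform, the fine step is 8000/16384 ≈ 0.49 Hz and the coarse step is 8000/512 = 15.625 Hz, so consecutive coarse bins are 32 fine points apart.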
- the TFR module 85 of the decoder of FIG. 8 receives the NCI quantized cepstral coefficients cx_inf_q of orders 0 to NCI − 1, and advantageously supplements them with the NCS − NCI cepstral coefficients cx_sup_q of orders NCI to NCS − 1 representing the upper envelope. Indeed, it can be considered as a first approximation that the rapid variations of the compressed lower envelope are well reproduced by those of the compressed upper envelope. In another embodiment, the TFR module 85 could consider only the NCI cepstral parameters cx_inf_q.
- the module 86 converts the frequency scale reciprocally to the conversion carried out by the module 32 of the coder, in order to restore the estimate LX_inf of the compressed lower envelope, which is submitted to the spectral decompression module 87.
- the decoder thus has available a lower envelope X_inf comprising the values of the modulus of the spectrum in the valleys between the harmonic peaks.
- This envelope X_inf modulates the spectrum of a noise frame whose phase is processed as a function of the quantized weighting vector Emix extracted by the module 49.
- a generator 88 delivers a normalized noise frame whose 4 ms segments are weighted in a module 89 in accordance with the normalized components of the Emix vector provided by the module 49 for the current frame.
- This noise is a high-pass filtered white noise to take account of the low level which in principle the unvoiced component has at low frequencies.
- the Fourier transform of the resulting frame is calculated by the TFR module 91.
- the spectral estimate X uv of the non-harmonic component is determined by the spectral synthesis module 92 which performs frequency-by-frequency weighting. This weighting consists in multiplying each complex spectral value supplied by the TFR module 91 by the value of the lower envelope X_inf obtained for the same frequency by the spectral decompression module 87.
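The chain formed by modules 89, 91 and 92 can be sketched as follows. The sketch makes several assumptions: the function name is hypothetical, a naive DFT replaces the fast transform, the high-pass filtering of the noise is left out, and the default segment length of 32 samples corresponds to 4 ms only if the sampling frequency is 8 kHz.

```python
import cmath
import math


def unvoiced_spectrum(noise, emix, x_inf, seg_len=32):
    """Weight noise segments by Emix, transform, then shape by X_inf.

    noise: normalized noise frame (list of floats)
    emix:  one weight per segment of seg_len samples
    x_inf: lower-envelope value for each frequency index
    """
    # Module 89: per-segment temporal weighting.
    weighted = [s * emix[t // seg_len] for t, s in enumerate(noise)]
    n = len(weighted)
    # Module 91: Fourier transform of the weighted frame (naive DFT).
    spectrum = [sum(s * cmath.exp(-2j * math.pi * i * t / n)
                    for t, s in enumerate(weighted))
                for i in range(n)]
    # Module 92: frequency-by-frequency weighting by the lower envelope.
    return [x_inf[i] * spectrum[i] for i in range(n)]
```

Because the envelope value is real, the multiplication changes only the modulus of each spectral value and leaves the noise phase untouched.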
- the spectral estimates X_v, X_uv of the harmonic (voiced in the case of a speech signal) and non-harmonic (or unvoiced) components are combined by a mixing module 95 controlled by a module 96 for analyzing the degree of harmonicity (or voicing) of the signal.
- the analysis module 96 comprises a unit 97 for estimating a frequency-dependent degree of voicing W, from which four frequency-dependent gains are calculated, namely two gains g_v, g_uv controlling the relative importance of the harmonic and non-harmonic components in the synthesized signal, and two gains g_v_φ, g_uv_φ used to add noise to the phase of the harmonic component.
- the degree of voicing W(i) is a continuously variable value between 0 and 1 determined for each frequency index i (0 ≤ i < N) as a function of the upper envelope X_sup(i) and the lower envelope X_inf(i) obtained for this frequency i by the decompression modules 53, 87.
- the threshold Vth (F 0 ) corresponds to the average dynamics calculated on a synthetic spectrum purely voiced at the fundamental frequency. It is advantageously chosen depending on the fundamental frequency F 0 .
- the degree of voicing W (i) for a frequency other than the harmonic frequencies is obtained simply as being equal to that estimated for the nearest harmonic.
- the gain g v (i), which depends on the frequency, is obtained by applying a non-linear function to the degree of voicing W (i) (block 98).
- This non-linear function has for example the form represented in FIG. 11, the thresholds W1, W2 being such that 0 < W1 < W2 < 1.
- the phase φ'_v of the mixed harmonic component is the result of a linear combination of the phases φ_v, φ_uv of the harmonic and non-harmonic components X_v, X_uv synthesized by the modules 54, 92.
- the gains g_v_φ, g_uv_φ respectively applied to these phases are calculated from the degree of voicing W and also weighted according to the frequency index i, since adding noise to the phase is only really useful beyond a certain frequency.
- a first gain g_v1_φ is calculated by applying a non-linear function to the degree of voicing W(i), as shown diagrammatically by block 100 in FIG. 10.
- This non-linear function can have the form represented in FIG. 12, the thresholds W3 and W4 being such that 0 < W3 < W4 < 1, and the minimum gain G1 being between 0 and 1.
- a multiplier 101 multiplies, for each frequency index i, the gain g_v1_φ by another gain g_v2_φ depending only on the frequency index i, to form the gain g_v_φ(i).
- the gain g_v2_φ(i) depends non-linearly on the frequency index i, for example as shown in FIG. 13, the indices i1 and i2 being such that 0 < i1 < i2 < N, and the minimum gain G2 being between 0 and 1.
- the complex spectrum Y of the synthesized signal is produced by the mixing module 95 according to $Y(i) = g_v(i) \, |X_v(i)| \, \exp[j\varphi'_v(i)] + g_{uv}(i) \, X_{uv}(i)$, with $\varphi'_v(i) = g_{v\_\varphi}(i) \, \varphi_v(i) + g_{uv\_\varphi}(i) \, \varphi_{uv}(i)$, where φ_v(i) denotes the argument of the complex number X_v(i) supplied by the module 54 for the frequency of index i (block 104 of FIG. 10), and φ_uv(i) denotes the argument of the complex number X_uv(i) supplied by the module 92 (block 105 of FIG. 10). This combination is carried out by the multipliers 106-110 and the adders 111-112 shown in FIG. 10.
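The mixing of the two components can be sketched as below. The sketch assumes that the mixed spectrum is Y(i) = g_v(i)·|X_v(i)|·exp[jφ'_v(i)] + g_uv(i)·X_uv(i), with φ'_v(i) = g_v_φ(i)·φ_v(i) + g_uv_φ(i)·φ_uv(i); the leading term of that relation is not fully legible in the text, so it is an assumption here, as are the function and parameter names.

```python
import cmath


def mix(x_v, x_uv, g_v, g_uv, g_v_phi, g_uv_phi):
    """Combine harmonic and non-harmonic spectra, one frequency at a time."""
    y = []
    for i in range(len(x_v)):
        phi_v = cmath.phase(x_v[i])          # block 104
        phi_uv = cmath.phase(x_uv[i])        # block 105
        # Linear combination of the two phases (noised harmonic phase).
        phi_mixed = g_v_phi[i] * phi_v + g_uv_phi[i] * phi_uv
        y.append(g_v[i] * abs(x_v[i]) * cmath.exp(1j * phi_mixed)
                 + g_uv[i] * x_uv[i])
    return y
```

With g_v_φ = 1 and g_uv_φ = 0 the harmonic component keeps its own phase, so the output reduces to a plain gain-weighted sum of the two spectra.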
- the frames successively obtained in this way are finally processed by the time synthesis module 116, which forms the decoded audio signal x̂.
- the time synthesis module 116 performs an overlap-add of frames that are modified with respect to those successively evaluated at the output of the module 115.
- the modification can be viewed as two steps, illustrated respectively in FIGS. 14 and 15.
- the first step (FIG. 14) consists in multiplying each frame 2' delivered by the TFRI module 115 by a window 1/f_A, the inverse of the analysis window f_A used by the module 1 of the coder.
- the samples of the resulting frame 2'' are therefore weighted uniformly.
- the synthesis window f_S(i) increases progressively from 0 to A for i going from 0 to L. It is for example a raised half-sinusoid.
- each sample of the decoded audio signal x̂ thus obtained is assigned a uniform overall weight, equal to A.
- This overall weight comes from the contribution of a single frame if the sample has in this frame a rank i such that L ≤ i < N − L, and comprises the summed contributions of two successive frames if 0 ≤ i < L or N − L ≤ i < N.
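The two-step modification and the overlap-add can be illustrated with the sketch below. The exact expression of the synthesis window is not reproduced in the text, so the raised half-sinusoid used here, f_S(i) = (A/2)(1 − cos(πi/L)) on the rising edge with a mirrored falling edge, is an assumption chosen so that overlapping edges sum to A; the frame shift is taken as N − L, and all function names are hypothetical.

```python
import math


def hamming(n):
    """Hamming analysis window f_A (never zero, so 1/f_A is defined)."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1))
            for i in range(n)]


def synthesis_window(n, l, a=1.0):
    """Assumed f_S: rising ramp of l samples, plateau at a, falling ramp."""
    ramp = [0.5 * a * (1.0 - math.cos(math.pi * i / l)) for i in range(l)]
    return ramp + [a] * (n - 2 * l) + [a - r for r in ramp]


def overlap_add(frames, f_a, f_s, shift):
    """Divide each frame by f_A, multiply by f_S, then sum with overlap."""
    n = len(f_a)
    out = [0.0] * (shift * (len(frames) - 1) + n)
    for idx, frame in enumerate(frames):
        for i in range(n):
            out[idx * shift + i] += frame[i] / f_a[i] * f_s[i]
    return out
```

When the frames are the analysis-windowed segments of a signal x, every output sample away from the very first and last L samples equals A·x, which is the uniform overall weight described above.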
- FIG. 16 shows the appearance of the compound window f_C in the case where the analysis window f_A is a Hamming window and the synthesis window f_S has the form given by the relations (19) to (21).
- the encoder of FIG. 1 can increase the rate at which frames are formed and analyzed, so as to transmit more quantization parameters to the decoder.
- a frame of N = 256 samples (32 ms) is formed every 20 ms.
- the notations cx_q[n-1] and cx_q[n] denote quantized cepstral vectors determined, for two successive frames of integer rank, by the quantization module 18 and/or by the quantization module 34. These vectors comprise for example four consecutive cepstral coefficients each. They could also comprise more cepstral coefficients.
- a module 120 performs an interpolation of these two cepstral vectors cx_q[n-1] and cx_q[n], in order to estimate an intermediate value cx_i[n-1/2].
- the interpolation performed by the module 120 can be a simple arithmetic mean of the vectors cx_q [n-1] and cx_q [n].
- the module 120 could apply a more sophisticated interpolation formula, for example polynomial, also based on cepstral vectors obtained for frames prior to frame n-1.
- the interpolation takes into account the relative position of each interpolated frame.
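A minimal sketch of the simplest case, linear interpolation between the two quantized vectors, is given below. The function name and the `position` parameter (0 at frame n-1, 1 at frame n) are hypothetical; the arithmetic mean corresponds to position 0.5.

```python
def interpolate_cepstrum(cx_prev, cx_next, position=0.5):
    """Linearly interpolate between cx_q[n-1] and cx_q[n].

    position is the relative position of the interpolated frame between
    the two integer-rank frames (0.5 for the frame of rank n-1/2).
    """
    return [(1.0 - position) * a + position * b
            for a, b in zip(cx_prev, cx_next)]
```

With more than one interpolated frame between two analysis frames, the same function can be called once per intermediate position, for example 0.25, 0.5 and 0.75.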
- the coder also calculates the cepstral coefficients cx[n-1/2] relating to the frame of half-integer rank.
- these cepstral coefficients are those provided by the TFRI module 13 after post-liftering 15 (for example with the same post-lifter coefficients as for the previous frame n-1) and normalization 16.
- the cepstral coefficients cx[n-1/2] are those delivered by the TFRI module 33.
- a subtractor 121 forms the difference ecx[n-1/2] between the cepstral coefficients cx[n-1/2] calculated for the frame of half-integer rank and the coefficients cx_i[n-1/2] estimated by interpolation.
- This difference is supplied to a quantization module 122, which sends quantization indexes icx[n-1/2] to the output multiplexer 6 of the encoder.
- the module 122 operates for example by vector quantization of the interpolation errors ecx[n-1/2] successively determined for the frames of half-integer rank.
- This quantization of the interpolation error can be carried out by the coder for each of the NCS + NCI cepstral coefficients used by the decoder, or only for some of them, typically those of the lowest orders.
- the corresponding means of the decoder are illustrated in FIG. 19.
- the decoder essentially operates like the one described with reference to FIG. 8 to determine the signal frames of integer rank.
- an interpolation module 124, identical to the module 120 of the coder, estimates the intermediate coefficients cx_i[n-1/2] from the quantized coefficients cx_q[n-1] and cx_q[n] supplied by the module 47 and/or the module 48 from the indexes icxs, icxi extracted from the stream Φ.
- a parameter extraction module 125 receives the quantization index icx[n-1/2] from the input demultiplexer 45 of the decoder, and deduces from it the quantized interpolation error ecx_q[n-1/2] using the same quantization dictionary as that used by the module 122 of the coder.
- an adder 126 sums the cepstral vectors cx_i[n-1/2] and ecx_q[n-1/2] in order to provide the cepstral coefficients cx[n-1/2] which will be used by the decoder (modules 51-57, 95, 96, 115 and/or modules 85-87, 92, 95, 96, 115) to form the interpolated frame of rank n-1/2.
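The coding of the interpolation error and its use at the decoder can be sketched as a round trip. This is an illustration only: a toy uniform scalar quantizer with a hypothetical step stands in for the vector quantization dictionary, the interpolation is the plain arithmetic mean, and the function names are invented for the example.

```python
def quantize(vector, step=0.05):
    """Toy uniform scalar quantizer returning integer indexes."""
    return [round(v / step) for v in vector]


def dequantize(indexes, step=0.05):
    """Inverse of quantize: map indexes back to reconstruction values."""
    return [i * step for i in indexes]


def encode_half_frame(cx_q_prev, cx_q_next, cx_half, step=0.05):
    """Coder side: interpolate, form the error, quantize it."""
    cx_i = [(a + b) / 2.0 for a, b in zip(cx_q_prev, cx_q_next)]  # module 120
    ecx = [c - i for c, i in zip(cx_half, cx_i)]                  # subtractor 121
    return quantize(ecx, step)                                    # module 122


def decode_half_frame(cx_q_prev, cx_q_next, icx, step=0.05):
    """Decoder side: same interpolation, then add the decoded error."""
    cx_i = [(a + b) / 2.0 for a, b in zip(cx_q_prev, cx_q_next)]  # module 124
    ecx_q = dequantize(icx, step)                                 # module 125
    return [i + e for i, e in zip(cx_i, ecx_q)]                   # adder 126
```

Because both sides interpolate from the same quantized vectors, the only mismatch between the coder's cx[n-1/2] and the decoder's reconstruction is the quantization error of the residual, bounded here by half the step.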
- the decoder can also interpolate the other parameters F 0 , Emix used to synthesize the signal frames.
- the fundamental frequency F 0 can be interpolated linearly, either in the time domain, or (preferably) directly in the frequency domain.
- the interpolation should be carried out after denormalization and of course taking account of the time offsets between frames.
- the coder uses the cepstral vectors cx_q[n], cx_q[n-1], ..., cx_q[n-r] and cx_q[n-1/2] calculated for the most recent frames (r ≥ 1) to identify an optimal interpolator filter which, when fed with the quantized cepstral vectors cx_q[n-r], ..., cx_q[n] relating to the frames of integer rank, delivers an interpolated cepstral vector cx_i[n-1/2] at a minimum distance from the vector cx[n-1/2] calculated for the last frame of half-integer rank.
- this interpolator filter 128 is present in the encoder, and a subtractor 129 subtracts its output cx_i[n-1/2] from the calculated cepstral vector cx[n-1/2].
- a minimization module 130 determines the set of parameters {P} of the interpolator filter 128 for which the interpolation error ecx[n-1/2] delivered by the subtractor 129 has a minimum norm.
- This set of parameters {P} is sent to a quantization module 131, which supplies a corresponding quantization index iP to the output multiplexer 6 of the encoder.
- from the quantization indexes iP of the parameters {P} obtained in the bit stream Φ, the decoder reconstructs the interpolator filter 128 (to within quantization errors) and processes the cepstral vectors cx_q[n-r], ..., cx_q[n] in order to estimate the cepstral coefficients cx[n-1/2] used to synthesize the frames of half-integer rank.
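One possible reading of the minimization is a least-squares fit of the filter taps. The sketch below assumes r = 1 and a filter made of two scalar taps shared by all cepstral orders, so that cx_i[n-1/2] = p_prev·cx_q[n-1] + p_cur·cx_q[n]; the structure of the filter is not specified in the text, and the function names are hypothetical.

```python
def fit_two_tap_filter(cx_q_prev, cx_q_cur, cx_half):
    """Least-squares taps (p_prev, p_cur) minimising the error norm."""
    s_aa = sum(a * a for a in cx_q_prev)
    s_bb = sum(b * b for b in cx_q_cur)
    s_ab = sum(a * b for a, b in zip(cx_q_prev, cx_q_cur))
    s_at = sum(a * t for a, t in zip(cx_q_prev, cx_half))
    s_bt = sum(b * t for b, t in zip(cx_q_cur, cx_half))
    det = s_aa * s_bb - s_ab * s_ab
    if abs(det) < 1e-12:
        return 0.5, 0.5          # degenerate case: fall back to the mean
    # Cramer's rule on the 2x2 normal equations.
    p_prev = (s_at * s_bb - s_ab * s_bt) / det
    p_cur = (s_aa * s_bt - s_ab * s_at) / det
    return p_prev, p_cur


def apply_filter(taps, cx_q_prev, cx_q_cur):
    """Interpolated vector produced by the two-tap filter."""
    p_prev, p_cur = taps
    return [p_prev * a + p_cur * b for a, b in zip(cx_q_prev, cx_q_cur)]
```

In this reading only the two taps need to be quantized and transmitted, and the decoder applies `apply_filter` to the quantized vectors it already holds.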
- the decoder can use a simple interpolation method (without transmission of parameters from the encoder for the half-integer ranks), an interpolation method taking into account a quantized interpolation error (according to FIGS. 17 and 18), or an interpolation method with an optimal interpolator filter (according to FIG. 19) to evaluate the frames of half-integer rank in addition to the frames of integer rank evaluated directly as explained with reference to FIGS. 8 to 13.
- the time synthesis module 116 can then combine all of these evaluated frames to form the synthesized signal x̂ in the manner explained below with reference to FIGS. 14, 21 and 22.
- the module 116 performs an overlap-add of frames modified with respect to those successively evaluated at the output of the module 115, and this modification can be viewed as two steps, the first of which is identical to that previously described with reference to FIG. 14 (dividing the samples of the frame 2' by the analysis window f_A).
- the synthesis window f'_S(i) increases progressively for i going from N/2 − M/p to N/2. It is for example a raised sinusoid over the interval from N/2 − M/p to N/2 + M/p.
- the synthesis window f'_S can be, over this interval, a Hamming window (as shown in FIG. 21) or a Hanning window.
- FIG. 21 shows the successive frames 2'' repositioned in time by the module 116.
- the hatching indicates the portions eliminated from the frames (synthesis window equal to 0).
- property (25) ensures a uniform weighting of the samples of the synthesized signal.
- FIG. 22 shows the shape of the compound window f'_C in the case where the windows f_A and f'_S are of the Hamming type.
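The uniform-weighting property of the narrower synthesis window can be checked numerically. The sketch below assumes a raised-sinusoid (Hann-shaped) window centred on N/2 with half-width M/p, and takes the property to be f'_S(i) + f'_S(i + M/p) = A for N/2 − M/p ≤ i < N/2, as stated in the claims; the function name and parameter names are hypothetical.

```python
import math


def interp_synthesis_window(n, shift, a=1.0):
    """Assumed f'_S: raised sinusoid on N/2 - shift .. N/2 + shift, else 0.

    shift stands for M/p, the time offset between consecutive frames
    once the interpolated frames are included.
    """
    half = n // 2
    return [0.5 * a * (1.0 + math.cos(math.pi * (i - half) / shift))
            if abs(i - half) <= shift else 0.0
            for i in range(n)]
```

For N = 256, M = 160 (20 ms at 8 kHz) and p = 2, the window is non-zero only between ranks 48 and 208, and two windows offset by 80 samples always sum to A where they overlap.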
- Interpolated frames may be the subject of a reduced transmission of coding parameters, as described above, but this is not mandatory.
- This embodiment makes it possible to keep a relatively large interval M between two analysis frames, and therefore to limit the required transmission bit rate, while limiting the discontinuities likely to appear because of the size of this interval compared with the typical time scales of variations in the parameters of the audio signal, such as the cepstral coefficients and the fundamental frequency.
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Human Computer Interaction (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Compression, Expansion, Code Conversion, And Decoders (AREA)
- Error Detection And Correction (AREA)
- Reduction Or Emphasis Of Bandwidth Of Signals (AREA)
Claims (17)
- 1. A method of decoding a digital input stream (Φ) representing a coded audio signal, in which a set of successive frames of N samples of the audio signal is synthesized on the basis of coding data included in the digital input stream, in which the coding data include, for only a subset of the frames, data representative of spectral amplitudes associated with frequencies of the spectrum of the audio signal, characterized in that, on the basis of the coding data, cepstral coefficients representative of at least some of the spectral amplitudes are determined for each of the frames of the subset, and in that, for the frames not belonging to said subset, said cepstral coefficients are interpolated and the interpolated cepstral coefficients are used to generate a spectral estimate (Y) of the audio signal, which is transformed to the time domain to obtain the synthesized frame.
- 2. A method according to claim 1, in which the coding data include data for quantizing said cepstral coefficients.
- 3. A method according to claim 1 or 2, in which a fundamental frequency (F0) of the audio signal is determined from quantization data included in the binary input stream (Φ), in which an upper spectral envelope (X_sup) of the audio signal, corresponding to spectral amplitudes associated with frequencies that are multiples of the fundamental frequency, is determined from the coding data for each frame of said subset, and in which, for each frame not belonging to the subset, the upper spectral envelope of the audio signal is determined from the interpolated cepstral coefficients.
- 4. A method according to any one of claims 1 to 3, in which a fundamental frequency (F0) of the audio signal is determined from quantization data included in the binary input stream (Φ), in which a lower spectral envelope (X_inf) of the audio signal, corresponding to spectral amplitudes associated with frequencies situated in regions of the spectrum lying between the multiples of the fundamental frequency, is determined from the coding data for each frame of said subset, and in which, for each frame not belonging to the subset, the lower spectral envelope of the audio signal is determined from the interpolated cepstral coefficients.
- 5. A method according to any one of claims 1 to 4, in which the successive frames of said set overlap and are composed of N samples of the audio signal weighted by an analysis window (fA), in which the successive frames of said subset have mutual time shifts of M samples, the number M being greater than N/2, whereas the successive frames of said set have mutual time shifts of M/p samples, p being an integer greater than 1, in which each synthesized frame of the set is modified by applying processing corresponding to a division by said analysis window (fA) and a multiplication by a synthesis window (f'S), and the decoded audio signal (x̂) is formed as an overlap sum of the modified frames, and in which, the samples of a frame having ranks i numbered from 0 to N-1, the synthesis window f'S(i) has a support limited to the ranks i from N/2 - M/p to N/2 + M/p and satisfies f'S(i) + f'S(i+M/p) = A for N/2 - M/p ≤ i < N/2, A being a positive constant.
- 6. A method according to any one of claims 1 to 5, in which, for the frames not belonging to the subset, the interpolated cepstral coefficients (cx_i[n-1/2]) are corrected on the basis of data (icx[n-1/2]) for quantizing interpolation errors (ecx_[n-1/2]) included in the coding data.
- 7. A method according to any one of claims 1 to 5, in which, for the frames not belonging to the subset, the cepstral coefficients (cx_q[n]) are interpolated by a filter (128) determined on the basis of data (iP) for quantizing an interpolator filter, included in the coding data.
- 8. An audio decoder comprising means arranged to carry out a method according to any one of claims 1 to 7.
- 9. A method of coding an audio signal (x), in which a spectrum of the audio signal is determined by transforming a frame of the audio signal to the frequency domain, and data representative of spectral amplitudes associated with at least some of the frequencies of the spectrum are included in a digital output stream (Φ), in which the spectrum of the audio signal is determined for a set of successive frames of N samples of the audio signal, and in which cepstral coefficients representative of at least some of the spectral amplitudes are determined for each of the frames of said set, characterized in that the data representative of the spectral amplitudes are included in the digital output stream for only a subset of the frames, and in that, for the frames not belonging to said subset, data (icx[n-1/2]) for quantizing an interpolation error (ecx[n-1/2]) of the cepstral coefficients are included in the digital output stream (Φ).
- 10. A method of coding an audio signal (x), in which a spectrum of the audio signal is determined by transforming a frame of the audio signal to the frequency domain, and data representative of spectral amplitudes associated with at least some of the frequencies of the spectrum are included in a digital output stream (Φ), in which the spectrum of the audio signal is determined for a set of successive frames of N samples of the audio signal, and in which cepstral coefficients representative of at least some of the spectral amplitudes are determined for each of the frames of said set, characterized in that the data representative of the spectral amplitudes are included in the digital output stream for only a subset of the frames, and in that, for the frames not belonging to the subset, an optimal interpolator filter (128) for the cepstral coefficients is determined and data (iP) representing said optimal interpolator filter are included in the digital output stream (Φ).
- 11. A method according to claim 9 or 10, in which the data representative of the spectral amplitudes include data for quantizing the cepstral coefficients.
- 12. A method according to any one of claims 9 to 11, in which a fundamental frequency (F0) of the audio signal is estimated, and in which the interpolated cepstral coefficients include cepstral coefficients computed by transforming a compressed upper envelope (LX_sup) of the spectrum of the audio signal to the cepstral domain.
- 13. A method according to claim 12, in which the compressed upper envelope (LX_sup) is determined by interpolating spectral amplitudes associated with frequencies that are multiples of the fundamental frequency (F0), with application of a spectral compression function.
- 14. A method according to any one of claims 9 to 13, in which a fundamental frequency (F0) of the audio signal is estimated, and in which the interpolated cepstral coefficients include cepstral coefficients computed by transforming a compressed lower envelope (LX_inf) of the spectrum of the audio signal to the cepstral domain.
- 15. A method according to claim 14, in which the compressed lower envelope (LX_inf) is determined, with application of a spectral compression function, by interpolating spectral amplitudes associated with frequencies situated in regions of the spectrum lying between the multiples of the fundamental frequency (F0).
- 16. A method according to any one of claims 9 to 15, in which the successive frames of the subset have mutual time shifts of more than N/2 samples, and the successive frames of the set have mutual time shifts of at least N/2 samples.
- 17. An audio coder comprising means arranged to carry out a method according to any one of claims 9 to 16.
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
FR9908635A FR2796191B1 (fr) | 1999-07-05 | 1999-07-05 | Audio coding and decoding methods and devices |
FR9908635 | 1999-07-05 | | |
PCT/FR2000/001906 WO2001003118A1 (fr) | 1999-07-05 | 2000-07-04 | Audio coding and decoding by interpolation |
Publications (2)
Publication Number | Publication Date |
---|---|
EP1192619A1 (de) | 2002-04-03 |
EP1192619B1 (de) | 2004-09-22 |
Family
ID=9547704
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
EP00949621A Expired - Lifetime EP1192619B1 (de) | 1999-07-05 | 2000-07-04 | Audio coding and decoding by interpolation |
Country Status (6)
Country | Link |
---|---|
EP (1) | EP1192619B1 (de) |
AT (1) | ATE277403T1 (de) |
AU (1) | AU6292000A (de) |
DE (1) | DE60014085D1 (de) |
FR (1) | FR2796191B1 (de) |
WO (1) | WO2001003118A1 (de) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
AT500235B1 (de) * | 2002-06-21 | 2007-05-15 | Joanneum Res Forschungsgesells | System and method for the automatic monitoring of a traffic route |
FR2910996A1 (fr) * | 2006-12-29 | 2008-07-04 | France Telecom | Coding of acoustic units by interpolation |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5630011A (en) * | 1990-12-05 | 1997-05-13 | Digital Voice Systems, Inc. | Quantization of harmonic amplitudes representing speech |
JP3747492B2 (ja) * | 1995-06-20 | 2006-02-22 | ソニー株式会社 (Sony Corporation) | Method and apparatus for reproducing speech signals |
US5774837A (en) * | 1995-09-13 | 1998-06-30 | Voxware, Inc. | Speech coding system and method using voicing probability determination |
- 1999-07-05: FR, application FR9908635A, patent FR2796191B1 (fr), not active (Expired - Fee Related)
- 2000-07-04: EP, application EP00949621A, patent EP1192619B1 (de), not active (Expired - Lifetime)
- 2000-07-04: AT, application AT00949621T, patent ATE277403T1 (de), not active (IP Right Cessation)
- 2000-07-04: AU, application AU62920/00A, patent AU6292000A (en), not active (Abandoned)
- 2000-07-04: WO, application PCT/FR2000/001906, patent WO2001003118A1 (fr), active (IP Right Grant)
- 2000-07-04: DE, application DE60014085T, patent DE60014085D1 (de), not active (Expired - Lifetime)
Also Published As
Publication number | Publication date |
---|---|
DE60014085D1 (de) | 2004-10-28 |
WO2001003118A1 (fr) | 2001-01-11 |
AU6292000A (en) | 2001-01-22 |
ATE277403T1 (de) | 2004-10-15 |
FR2796191B1 (fr) | 2001-10-05 |
EP1192619A1 (de) | 2002-04-03 |
FR2796191A1 (fr) | 2001-01-12 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PUAI | Public reference made under article 153(3) EPC to a published international application that has entered the European phase | Free format text: ORIGINAL CODE: 0009012 |
| 17P | Request for examination filed | Effective date: 20020107 |
| AK | Designated contracting states | Kind code of ref document: A1; Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LI LU MC NL |
| AX | Request for extension of the European patent | Free format text: AL;LT;LV;MK;RO;SI |
| RAP1 | Party data changed (applicant data changed or rights of an application transferred) | Owner name: NORTEL NETWORKS FRANCE |
| GRAP | Despatch of communication of intention to grant a patent | Free format text: ORIGINAL CODE: EPIDOSNIGR1 |
| GRAS | Grant fee paid | Free format text: ORIGINAL CODE: EPIDOSNIGR3 |
| GRAA | (expected) grant | Free format text: ORIGINAL CODE: 0009210 |
| AK | Designated contracting states | Kind code of ref document: B1; Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE |
| PG25 | Lapsed in a contracting state [announced via postgrant information from national office to EPO] | Ref country codes: NL, IT, IE, FI, ES, AT; Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT; Effective date: 20040922. For IT: WARNING: LAPSES OF ITALIAN PATENTS WITH EFFECTIVE DATE BEFORE 2007 MAY HAVE OCCURRED AT ANY TIME BEFORE 2007. THE CORRECT EFFECTIVE DATE MAY BE DIFFERENT FROM THE ONE RECORDED. |
|
REG | Reference to a national code |
Ref country code: GB Ref legal event code: FG4D Free format text: NOT ENGLISH |
|
REG | Reference to a national code |
Ref country code: CH Ref legal event code: EP |
|
REG | Reference to a national code |
Ref country code: IE Ref legal event code: FG4D Free format text: FRENCH |
|
REF | Corresponds to: |
Ref document number: 60014085 Country of ref document: DE Date of ref document: 20041028 Kind code of ref document: P |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: SE Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20041222 Ref country code: GR Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20041222 Ref country code: DK Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20041222 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: DE Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20041223 |
|
GBT | Gb: translation of ep patent filed (gb section 77(6)(a)/1977) |
Effective date: 20050110 |
|
LTIE | Lt: invalidation of european patent or patent extension |
Effective date: 20040922 |
|
NLV1 | Nl: lapsed or annulled due to failure to fulfill the requirements of art. 29p and 29m of the patents act |
|
REG | Reference to a national code |
Ref country code: IE Ref legal event code: FD4D |
|
PGFP | Annual fee paid to national office [announced via postgrant information from national office to epo] |
Ref country code: GB Payment date: 20050614 Year of fee payment: 6 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: LU Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20050704 Ref country code: CY Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20050704 |
|
PGFP | Annual fee paid to national office [announced via postgrant information from national office to epo] |
Ref country code: FR Payment date: 20050706 Year of fee payment: 6 |
|
PLBE | No opposition filed within time limit |
Free format text: ORIGINAL CODE: 0009261 |
|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: NO OPPOSITION FILED WITHIN TIME LIMIT |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: MC Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20050731 Ref country code: LI Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20050731 Ref country code: CH Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20050731 Ref country code: BE Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20050731 |
|
26N | No opposition filed |
Effective date: 20050623 |
|
REG | Reference to a national code |
Ref country code: CH Ref legal event code: PL |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: GB Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20060704 |
|
GBPC | Gb: european patent ceased through non-payment of renewal fee |
Effective date: 20060704 |
|
REG | Reference to a national code |
Ref country code: FR Ref legal event code: ST Effective date: 20070330 |
|
BERE | Be: lapsed |
Owner name: *NORTEL NETWORKS FRANCE Effective date: 20050731 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: PT Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20050222 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: FR Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20060731 |