WO2001003116A1 - Methods and devices for audio analysis and synthesis - Google Patents
Methods and devices for audio analysis and synthesis
- Publication number
- WO2001003116A1 (PCT/FR2000/001904)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- frames
- samples
- synthesis
- module
- frame
- Prior art date
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/02—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/04—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
- G10L19/08—Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters
- G10L19/10—Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters the excitation function being a multipulse excitation
Definitions
- the present invention relates to the analysis and synthesis of audio signals, from representations of these signals in the spectral domain.
- the signal spectrum is obtained by transforming successive frames towards the frequency domain.
- the transformation used is most often the fast Fourier transform (TFR, from the French "transformée de Fourier rapide", i.e. FFT); but other known transforms can be used.
- TFR: fast Fourier transform
- the number N of samples per frame is typically of the order of 100 to 500, which represents frames of a few tens of milliseconds.
- the TFR is performed on 2N points, N samples at zero being added to the N samples of the frame.
- the spectrum obtained by Fourier transform of the signal frame is the convolution of the real spectrum of the signal by the Fourier transform of the signal analysis window.
- This analysis window, which weights the samples of each frame, is necessary to take into account the finite duration of the frame. If the signal frame is directly subjected to the TFR, that is to say if a rectangular analysis window is used, the spectrum obtained is disturbed by the side lobes of the TFR of the analysis window.
- In practice, windows having better spectral properties are used, that is to say weighting functions whose support is limited to N samples and whose Fourier transform has its energy concentrated in a narrow main lobe with strong attenuation of the side lobes. The most common of these windows are the Hamming, Hanning and Kaiser windows.
- the successive frames have mutual overlaps of 50% (N / 2 samples).
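The conventional analysis framing described above (N-sample frames, 50% overlap, weighting by an analysis window) can be sketched as follows; this is an illustrative Python/NumPy sketch, and the function name and sizes are not taken from the patent:

```python
import numpy as np

def make_frames(x, N, hop):
    """Split signal x into frames of N samples spaced hop samples apart,
    each weighted by a Hamming analysis window."""
    w = np.hamming(N)  # analysis window f_A
    n_frames = 1 + (len(x) - N) // hop
    return np.stack([x[k * hop : k * hop + N] * w for k in range(n_frames)])

# 50% overlap: hop = N/2, as in the conventional scheme described above
x = np.arange(1024, dtype=float)
frames = make_frames(x, N=256, hop=128)
```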
- WOLA Weighted OLA
- An object of the invention is to propose an analysis and synthesis scheme for audio signals which makes it possible to limit the rate of the analysis frames, while using analysis windows having good spectral properties.
- the invention provides a method of analyzing an audio signal processed by successive frames of N samples, in which the samples of each frame are weighted by an analysis window of the Hamming, Hanning, Kaiser or similar type, a spectrum of the audio signal is obtained by transforming each frame of weighted samples into the frequency domain, and the spectrum of the audio signal is processed to deliver synthesis parameters of a signal derived from the analyzed audio signal.
- the successive frames comprise an alternation of frames for which complete sets of synthesis parameters are delivered, which have mutual overlaps of less than N/2 samples, i.e. less than 50%, and of frames for which incomplete sets of synthesis parameters are delivered.
- the frames for which complete sets of synthesis parameters are not delivered may not be subjected to any spectral analysis. As a variant, an analysis can nevertheless be carried out for these frames, in order to deliver incomplete sets of synthesis parameters including data representing an interpolation error of at least one of the synthesis parameters and/or data representing an interpolation filter of at least one of the synthesis parameters.
- the processing of the spectrum of the audio signal comprises an extraction of coding parameters with a view to the transmission and / or storage of the coded audio signal.
- the processing of the spectrum of the audio signal comprises a denoising by spectral subtraction.
- Other fields of application can also be envisaged among the audio processing.
- a second aspect of the invention relates to a method of synthesizing an audio signal, in which successive spectral estimates are obtained corresponding respectively to frames of N samples of the audio signal weighted by an analysis window, the successive frames presenting mutual overlaps of L samples; each frame of the audio signal is evaluated by transforming the spectral estimates into the time domain, and the evaluated frames are combined to form the synthesized signal.
- each evaluated frame is modified by applying to it a processing corresponding to a division by said analysis window and to a multiplication by a synthesis window, and the synthesized signal is formed as a sum with overlap of the modified frames.
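This division by the analysis window, multiplication by a synthesis window, and overlap-add can be sketched as follows (illustrative Python/NumPy; the window choices and sizes are assumptions, not taken from the patent — the check simply verifies that a constant signal is reproduced in the interior when the shifted synthesis windows sum to one):

```python
import numpy as np

def wola_synthesis(frames, analysis_win, synth_win, hop):
    """Divide each evaluated frame by the analysis window, multiply by a
    synthesis window, and sum the modified frames with overlap."""
    n_frames, N = frames.shape
    out = np.zeros((n_frames - 1) * hop + N)
    for k, f in enumerate(frames):
        out[k * hop : k * hop + N] += (f / analysis_win) * synth_win
    return out

# Check on an unmodified constant signal: with a periodic Hann synthesis
# window and 50% overlap, the shifted windows sum to 1, so the interior
# of the synthesized signal reproduces the original.
N, hop = 256, 128
aw = np.hamming(N)                                     # analysis window
sw = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(N) / N)  # periodic Hann
x = np.ones(1024)
frames = np.stack([x[k * hop : k * hop + N] * aw for k in range(7)])
y = wola_synthesis(frames, aw, sw, hop)
```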
- a set of successive overlapping frames of N samples of the audio signal weighted by an analysis window is evaluated, by transforming into the time domain spectral estimates corresponding respectively to said frames, and the evaluated frames are combined to form the synthesized signal.
- the spectral estimates are obtained by processing synthesis parameters respectively associated with the frames of said subset, while, for the frames not forming part of the subset, the spectral estimates are obtained with an interpolation of at least part of the synthesis parameters.
- the successive frames of said subset have mutual time offsets of M samples, the number M being greater than N/2, while the successive frames of said set have mutual time offsets of M/p samples, p being an integer greater than 1.
- Each evaluated frame is modified by applying to it a processing corresponding to a division by said analysis window and to a multiplication by a synthesis window, and the synthesized signal is formed as an overlapping sum of the modified frames.
- the invention also provides audio processing devices comprising means for implementing the above analysis and synthesis methods.
- FIG. 1 is a block diagram of an audio encoder according to the invention.
- Figures 2 and 3 are diagrams illustrating the formation of audio signal frames in the encoder of Figure 1;
- FIGS. 4 and 5 are graphs showing an example of the audio signal spectrum and illustrating the extraction of the upper and lower envelopes of this spectrum
- FIG. 6 is a block diagram of an example of quantization means usable in the encoder of Figure 1;
- FIG. 7 is a block diagram of means used to extract parameters relating to the phase of the non-harmonic component in a variant of the encoder of Figure 1;
- FIG. 8 is a block diagram of an audio decoder corresponding to the encoder of Figure 1;
- FIG. 9 is a flow diagram of an example of a procedure for smoothing spectral coefficients and extracting minimum phases implemented in the decoder of FIG. 8;
- FIG. 10 is a block diagram of analysis and spectral mixing modules of harmonic and non-harmonic components of the audio signal
- FIGS. 14 and 15 are diagrams illustrating one way of proceeding to the temporal synthesis of the signal frames in the decoder of FIG. 8;
- FIGS. 16 and 17 are graphs showing windowing functions usable in the synthesis of the frames according to Figures 14 and 15;
- FIGS. 18 and 19 are block diagrams of interpolation means which can be used in an alternative embodiment of the coder and the decoder,
- FIG. 20 is a block diagram of interpolation means which can be used in another variant embodiment of the coder.
- FIGS. 21 and 22 are diagrams illustrating another way of carrying out the temporal synthesis of the signal frames in the decoder of FIG. 8, using an interpolation of parameters
- the coder and the decoder described below are digital circuits which can, as is usual in the field of audio signal processing, be produced by programming a digital signal processor (DSP) or an application-specific integrated circuit (ASIC).
- DSP: digital signal processor
- ASIC: application-specific integrated circuit
- the audio coder represented in FIG. 1 processes an input audio signal x which, in the nonlimiting example considered below, is a speech signal
- the signal x is available in digital form, for example at a sampling frequency F e of 8 kHz, and is for example delivered by an analog-to-digital converter processing the amplified output signal of a microphone
- the input signal x can also be formed from another version, analog or digital, coded or not, of the speech signal
- the coder comprises a module 1 which forms successive frames of the audio signal for the various processing operations carried out, and an output multiplexer 6 which delivers an output bit stream containing, for each frame, sets of quantization parameters from which a decoder will be able to synthesize a decoded version of the audio signal
- the structure of the frames is illustrated by Figures 2 and 3
- Each frame 2 is composed of a number N of consecutive samples of the audio signal x
- the module 1 multiplies the samples of each frame 2 by a windowing function f A , preferably chosen for its good spectral properties
- a windowing function f A preferably chosen for its good spectral properties
- β is a coefficient, for example equal to 6
- I0(.) denotes the zeroth-order modified Bessel function of the first kind (this is the definition of the Kaiser window)
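NumPy provides this window directly; a minimal sketch with the example value β = 6 (N is illustrative):

```python
import numpy as np

N, beta = 256, 6.0      # β = 6 as suggested above
w = np.kaiser(N, beta)  # Kaiser window, built from I0, the zeroth-order
                        # modified Bessel function of the first kind
```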
- the coder in FIG. 1 analyzes the audio signal in the spectral domain. It includes a module 3 which calculates the fast Fourier transform (TFR) of each signal frame.
- the TFR module 3 obtains the signal spectrum for each frame, in the form of a modulus and a phase
- a fundamental frequency detector 4 estimates for each signal frame a value of the fundamental frequency F 0 .
- the detector 4 can apply any known method of analysis of the speech signal of the frame to estimate the fundamental frequency F 0, for example a method based on the autocorrelation function or the AMDF function, possibly preceded by a linear-prediction whitening module.
- the estimation can also be performed in the spectral domain or in the cepstral domain.
- Another possibility is to evaluate the time intervals between the consecutive breaks in the speech signal attributable to closures of the speaker's glottis occurring during the frame.
- Well-known methods which can be used to detect such micro-ruptures are described in the following articles: M.
- the estimated fundamental frequency F 0 is subjected to quantization, for example scalar quantization, by a module 5, which supplies the output multiplexer 6 with a fundamental-frequency quantization index iF for each frame of the signal.
- the encoder uses cepstral parametric models to represent an upper envelope and a lower envelope of the spectrum of the audio signal.
- the first step of the cepstral transformation consists in applying to the signal spectrum module a spectral compression function, which can be a logarithmic or root function.
- the coder module 8 thus operates, for each value X(i) of the signal spectrum (0 ≤ i < N), the following transformation (written here for the root-function case): LX(i) = X(i)^γ,
- γ being an exponent between 0 and 1.
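A minimal sketch of such a compression function (illustrative; module 8 uses whichever of the two variants, logarithmic or root, is selected):

```python
import numpy as np

def compress(X, gamma=None):
    """Spectral compression of module 8: a root function X**gamma with
    0 < gamma < 1, or a logarithm when gamma is None."""
    mag = np.abs(X)
    if gamma is None:
        return np.log(mag + 1e-12)  # small floor avoids log(0)
    return mag ** gamma
```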
- the compressed spectrum LX of the audio signal is processed by a module 9 which extracts spectral amplitudes associated with the harmonics of the signal corresponding to the multiples of the estimated fundamental frequency F0. These amplitudes are then interpolated by a module 10 in order to obtain a compressed upper envelope denoted LX_sup
- the module 9 for extracting the maxima takes into account the possible variation of the fundamental frequency over the analysis frame, the errors that the detector 4 can make, as well as inaccuracies linked to the discrete nature of the frequency sampling. Thus, the search for the amplitudes of the spectral peaks does not simply consist in taking the values LX(i) corresponding to the indices i such that i F e / 2N is the frequency closest to a harmonic of frequency k F 0 (k ≥ 1)
- the spectral amplitude retained for a harmonic of order k is a local maximum of the spectrum modulus in the vicinity of the frequency k F 0 (this amplitude is obtained directly in compressed form when the spectral compression 8 is carried out before the extraction of the maxima 9)
- Figures 4 and 5 show an example of the shape of the compressed spectrum
- the interpolation is carried out between points whose abscissa is the frequency corresponding to the maximum of the amplitude of a spectral peak, and whose ordinate is this maximum, before or after compression
- the interpolation carried out to calculate the upper envelope LX_sup is a simple linear interpolation
- another form of interpolation could be used (for example polynomial or spline)
- the interpolation is carried out between points whose abscissa is a frequency k F 0 multiple of the fundamental frequency (in fact the closest frequency in the discrete spectrum) and whose ordinate is the maximum amplitude, before or after compression, of the spectrum in the vicinity of this multiple frequency
- the extraction mode according to Figure 5, which repositions the peaks on the harmonic frequencies, leads to better precision on the amplitude of the peaks that the decoder will assign to frequencies that are multiples of the fundamental frequency. A slight frequency shift of the position of these peaks can occur, which is not perceptually very important, and is not avoided in the case of Figure 4 either.
- In the case of Figure 4, the anchor points for the interpolation coincide with the vertices of the harmonic peaks.
- the maximum-amplitude search interval associated with a harmonic of rank k is centered on the index i of the TFR frequency closest to k F 0
- The width of this search interval depends on the sampling frequency F e, the size 2N of the TFR and the range of possible variation of the fundamental frequency; it is typically of the order of ten frequency bins with the example values previously considered.
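The harmonic-maxima search described above can be sketched as follows (illustrative Python; the search half-width and the `(k, bin, amplitude)` peak representation are assumptions, not taken from the patent):

```python
import numpy as np

def harmonic_peaks(mag, F0, Fe, half_width):
    """For each harmonic k*F0 below Fe/2, retain the local maximum of the
    spectrum modulus within a search interval centred on the nearest bin.
    mag: modulus of a 2N-point TFR; bin i maps to frequency i*Fe/(2N)."""
    two_N = len(mag)
    peaks = []
    k = 1
    while k * F0 < Fe / 2:
        i = int(round(k * F0 * two_N / Fe))  # bin nearest to harmonic k*F0
        lo = max(i - half_width, 0)
        hi = min(i + half_width + 1, two_N // 2)
        peaks.append((k, lo + int(np.argmax(mag[lo:hi])), mag[lo:hi].max()))
        k += 1
    return peaks

# synthetic 512-point spectrum with peaks near harmonics of F0 = 200 Hz
mag = np.zeros(512)
mag[13] = 5.0   # near 1 * 200 Hz (bin 12.8)
mag[26] = 3.0   # near 2 * 200 Hz (bin 25.6)
peaks = harmonic_peaks(mag, F0=200.0, Fe=8000.0, half_width=2)
```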
- TFRI: inverse fast Fourier transform
- the non-linear distortion makes it possible to minimize the modeling error more effectively. It is for example carried out according to a Mel or Bark type frequency scale. This distortion may possibly depend on the estimated fundamental frequency F 0 .
- Figure 1 illustrates the case of the Mel scale. The relationship between the frequencies F of the linear spectrum, expressed in hertz, and the frequencies F 'of the Mel scale is as follows:
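The Mel relation itself is not reproduced in this extract; a commonly used form of the mapping (the exact constants are an assumption here) is:

```python
import numpy as np

def hz_to_mel(f):
    """Common form of the Mel-scale mapping from linear frequency F (Hz)
    to Mel frequency F' (assumed constants)."""
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    """Inverse mapping, from Mel frequency back to hertz."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
```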
- NCS can be equal to 16.
- a post-filtering in the cepstral domain is applied by a module 15 to the compressed upper envelope LX_sup.
- This post-liftering corresponds to a manipulation of the cx_sup cepstral coefficients delivered by the TFRI module 13, which corresponds approximately to a post-filtering of the harmonic part of the signal by a transfer function having the classical form:
- a(z) is the transfer function of a linear prediction filter of the audio signal
- γ1 and γ2 are coefficients between 0 and 1
- β is a possibly zero pre-emphasis coefficient.
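Relation (8) is not reproduced in this extract; the classical short-term post-filter this passage alludes to is commonly written H(z) = A(z/γ1)/A(z/γ2)·(1 − β·z⁻¹). A sketch of how its coefficient arrays can be built (assumed form, illustrative values):

```python
import numpy as np

def postfilter_coeffs(a, g1, g2, beta):
    """Coefficients of H(z) = A(z/g1) / A(z/g2) * (1 - beta * z^-1),
    where a holds the coefficients of A(z) = 1 + a[1] z^-1 + ...
    Evaluating A(z/g) scales coefficient i by g**i (bandwidth expansion)."""
    scale = np.arange(len(a), dtype=float)
    num = a * g1 ** scale                 # numerator polynomial A(z/g1)
    den = a * g2 ** scale                 # denominator polynomial A(z/g2)
    num = np.convolve(num, [1.0, -beta])  # pre-emphasis factor (1 - beta z^-1)
    return num, den

num, den = postfilter_coeffs(np.array([1.0, -0.9]), g1=0.5, g2=0.8, beta=0.0)
```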
- a normalization module 16 further modifies the cepstral coefficients by imposing the constraint of exact modeling of one point of the initial spectrum, which is preferably the most energetic point among the spectral maxima extracted by the module 9. In practice, this normalization only modifies the value of the coefficient cp(0)
- the normalization module 16 operates as follows: it recalculates a value of the spectrum synthesized at the frequency of the maximum indicated by the module 9, by Fourier transform of the truncated and post-liftered cepstral coefficients, taking into account the non-linear distortion of the frequency axis; it determines a normalization gain g N as the logarithmic difference between the maximum value provided by the module 9 and this recalculated value; and it adds the gain g N to the post-liftered cepstral coefficient cp(0)
- the post-liftered and normalized cepstral coefficients are subjected to quantization by a module 18 which transmits the corresponding quantization indexes icxs to the output multiplexer 6 of the encoder
- the module 18 can operate by vector quantization of cepstral vectors formed from the post-liftered and normalized coefficients, denoted here cx[n] for the signal frame of rank n.
- the cepstral vector cx[n] of NCS = 16 cepstral coefficients cx[n, 0], cx[n, 1], …, cx[n, NCS-1] is distributed into four cepstral sub-vectors each containing four coefficients of consecutive orders
- the cepstral vector cx[n] can be processed by the means shown in Figure 6.
- rcx[n] denotes a prediction residual vector for the frame of rank n, whose components are respectively denoted rcx[n, 0], rcx[n, 1], …, rcx[n, NCS-1]
- rcx_q[n-1] denotes the quantized residual vector for the frame of rank n-1, whose components are respectively denoted rcx_q[n-1, 0], rcx_q[n-1, 1], …, rcx_q[n-1, NCS-1]
- the numerator of relation (10) is obtained by a subtractor 20; the components of its output vector are divided by the quantities 2 - α(i) at 21
- the residual vector rcx [n] is subdivided into four sub-vectors, corresponding to the subdivision into four cepstral sub-vectors
- the unit 22 proceeds to the vector quantization of each sub-vector of the residual vector rcx[n]. This quantization can consist, for each sub-vector srcx[n], in selecting in the dictionary the quantized sub-vector srcx_q[n] which minimizes the quadratic error
- the prediction coefficient α(i) can be optimized separately for each of the cepstral coefficients
- the quantization dictionaries can also be optimized separately for each of the four cepstral sub-vectors
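The split vector quantization performed by unit 22 can be sketched as follows (illustrative Python; the codebook contents below are placeholders, not trained dictionaries):

```python
import numpy as np

def split_vq(rcx, codebooks):
    """Split vector quantization of the residual vector rcx: each of the
    four 4-coefficient sub-vectors is matched against its own codebook
    by minimum squared error, as unit 22 does."""
    indexes, rcx_q = [], []
    for j, cb in enumerate(codebooks):  # cb has shape (n_codewords, 4)
        sub = rcx[4 * j : 4 * (j + 1)]
        err = np.sum((cb - sub) ** 2, axis=1)  # quadratic error per codeword
        i = int(np.argmin(err))
        indexes.append(i)
        rcx_q.append(cb[i])
    return indexes, np.concatenate(rcx_q)

# placeholder codebooks: codeword 1 of each is the exact sub-vector
rcx = np.arange(16.0)
codebooks = [np.stack([np.zeros(4), rcx[4 * j : 4 * j + 4], np.ones(4)])
             for j in range(4)]
indexes, rcx_q = split_vq(rcx, codebooks)
```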
- a second quantization mode can be provided, together with a process for selecting whichever of the two modes minimizes a least-squares criterion on the cepstral coefficients to be quantized; a bit indicating which of the two modes has been selected is then transmitted with the quantization indexes of the frame
- These spectral amplitudes are for example calculated in compressed form, by applying the Fourier transform to the quantized cepstral coefficients, taking into account the non-linear distortion of the frequency scale used in the cepstral transformation
- the amplitudes thus recalculated are supplied to an adaptation module 29 which compares them to the extracted maxima amplitudes
- the adaptation module 29 controls the post-liftering 15 so as to minimize a modulus difference between the spectrum of the audio signal and the corresponding modulus values calculated at 28
- This modulus difference can be expressed as a sum of absolute values of amplitude differences, compressed or not, corresponding to one or more of the harmonic frequencies. This sum can be weighted according to the spectral amplitudes associated with these frequencies
- the modulus difference taken into account in the adaptation of the post-liftering would take into account all the harmonics of the spectrum
- the module 28 can re-synthesize the spectral amplitudes only for one or more frequencies multiple of the fundamental frequency F 0, selected on the basis of the importance of the spectrum modulus in absolute value
- the adaptation module 29 can for example consider the three most intense spectral peaks in the calculation of the modulus deviation to be minimized
- the adaptation module 29 estimates a spectral masking curve of the audio signal by means of a psychoacoustic model, and the frequencies taken into account in the calculation of the modulus deviation to be minimized are selected on the basis of the importance of the spectrum modulus relative to the masking curve (one can for example take the three frequencies for which the spectrum modulus exceeds the masking curve the most)
- Different conventional methods can be used to calculate the masking curve from the audio signal. One can for example use the one developed by J. D. Johnston ("Transform Coding of Audio Signals Using Perceptual Noise Criteria", IEEE Journal on Selected Areas in Communications, Vol. 6, No. 2, February 1988)
- the module 29 can use a filter identification model.
- a simpler method consists in predefining a set of post-liftering parameter sets, that is to say a set of pairs (γ1, γ2) in the case of post-liftering according to relation (8), in carrying out the operations incumbent on modules 15, 16, 18 and 28 for each of these parameter sets, and in retaining the parameter set which leads to the minimum modulus deviation between the signal spectrum and the recalculated values
- the quantization indexes provided by the module 18 are then those which relate to the best parameter set
- the coder determines coefficients cx_inf representing a compressed lower envelope LX_inf. A module 30 extracts from the compressed spectrum LX the spectral amplitudes associated with frequencies located in intermediate areas of the spectrum with respect to the multiples of the estimated fundamental frequency F 0
- each amplitude associated with a frequency situated in an intermediate zone between two successive harmonics k F 0 and (k + 1) F 0 simply corresponds to the modulus of the spectrum for the frequency (k + 1/2) F 0 located in the middle of the interval separating the two harmonics
- this amplitude could be an average of the spectrum modulus over a small range surrounding this frequency (k + 1/2) F 0
- a module 31 proceeds to an interpolation, for example linear, of the spectral amplitudes associated with the frequencies located in the intermediate zones, to obtain the compressed lower envelope LX_inf
- the cepstral transformation applied to this compressed lower envelope LX_inf is carried out according to a frequency scale resulting from a non-linear distortion applied by a module 32
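Modules 30 and 31 together can be sketched as follows (illustrative Python; the bin rounding and the mid-point sampling are assumptions consistent with the description above):

```python
import numpy as np

def lower_envelope(mag_c, F0, Fe):
    """Lower envelope of the compressed spectrum mag_c: sample the modulus
    at the mid-points (k + 1/2) * F0 between successive harmonics
    (module 30), then interpolate linearly over all bins (module 31).
    mag_c: compressed modulus of a 2N-point TFR."""
    two_N = len(mag_c)
    xs, ys = [], []
    k = 0
    while (k + 0.5) * F0 < Fe / 2:
        i = int(round((k + 0.5) * F0 * two_N / Fe))  # bin at mid-point
        xs.append(i)
        ys.append(mag_c[i])
        k += 1
    bins = np.arange(two_N // 2)
    return np.interp(bins, xs, ys)  # linear interpolation between samples

env = lower_envelope(np.ones(512), F0=200.0, Fe=8000.0)
```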
- the non-linear transformation of the frequency scale for the cepstral transformation of the lower envelope can be performed towards a finer scale at high frequencies than at low frequencies, which advantageously makes it possible to model the unvoiced components of the signal at high frequencies.
- However, it may be preferable to adopt in module 32 the same scale as in module 12 (Mel in the example considered).
- the cepstral coefficients cx_inf representing the compressed lower envelope are quantized by a module 34, which can operate in the same way as the module 18 quantizing the cepstral coefficients representing the compressed upper envelope.
- the vector thus formed is subjected to a vector quantization of prediction residue, carried out by means identical to those represented in FIG. 6 but without subdivision into sub-vectors.
- the coder shown in Figure 1 does not include any particular device for coding the phases of the spectrum at the harmonics of the audio signal. On the other hand, it includes means 36-40 for coding temporal information linked to the phase of the non-harmonic component represented by the lower envelope.
- a spectral decompression module 36 and a TFRI module 37 form a temporal estimate of the frame of the non-harmonic component.
- the module 36 applies to the compressed lower envelope LX_inf produced by the interpolation module 31 a decompression function reciprocal to the compression function applied by the module 8 (that is to say an exponential or a power 1/γ function). This provides the modulus of the estimated frame of the non-harmonic component, whose phase is taken to be equal to that of the spectrum of the signal X over the frame.
- the inverse Fourier transform performed by the module 37 provides the estimated frame of the non-harmonic component.
- the module 38 subdivides this estimated frame of the non-harmonic component into several time segments.
- For each of these segments, the module 38 calculates the energy, equal to the sum of the squares of the samples, and forms a vector E1 of eight positive real components equal to the eight calculated energies.
- the largest of these eight energies, denoted EM, is also determined and supplied, together with the vector E1, to a normalization module 39.
- the latter divides each component of the vector E1 by EM, so that the normalized vector Emix is formed of eight components between 0 and 1. It is this normalized vector Emix, or weighting vector, which is subjected to quantization by the module 40. This can be a vector quantization with a dictionary determined during a prior training stage.
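The segmentation and normalization performed by modules 38 and 39 can be sketched as follows (illustrative Python; the frame below is a toy example):

```python
import numpy as np

def energy_weighting(frame, n_seg=8):
    """Subdivide the estimated non-harmonic frame into n_seg time segments,
    compute each segment's energy (sum of squared samples) to form E1,
    and normalize by the largest energy EM, as modules 38 and 39 do."""
    segs = np.array_split(frame, n_seg)
    E1 = np.array([np.sum(s ** 2) for s in segs])
    EM = E1.max()
    Emix = E1 / EM  # normalized weighting vector, components in [0, 1]
    return Emix, EM

# toy frame whose energy is concentrated in the first segment
frame = np.concatenate([np.ones(32), np.zeros(224)])
Emix, EM = energy_weighting(frame)
```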
- the quantization index iEm is supplied by the module 40 to the output multiplexer 6 of the coder.
- FIG. 7 shows an alternative embodiment of the means used by the coder of FIG. 1 to determine the vector Emix of energy weighting of the frame of the non-harmonic component.
- the spectral decompression and TFRI modules 36, 37 operate like those which have the same references in FIG. 1.
- a selection module 42 is added to determine the value of the modulus of the spectrum subjected to the inverse Fourier transform 37. On the basis of the estimated fundamental frequency F 0, the module 42 identifies harmonic regions and non-harmonic regions of the audio signal spectrum.
- a frequency is considered to belong to a harmonic region if it lies in a frequency interval centered on a harmonic k F 0 and of a width corresponding to that of a synthesized spectral line, and to a non-harmonic region otherwise.
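The harmonic/non-harmonic classification made by module 42 can be sketched as a boolean mask over the spectrum bins (illustrative; the line half-width value is an assumption):

```python
import numpy as np

def harmonic_mask(two_N, F0, Fe, line_half_width):
    """Boolean mask over the first two_N//2 bins: True where the bin lies
    within +/- line_half_width bins of a harmonic k*F0 (harmonic region),
    False elsewhere (non-harmonic region), as module 42 decides."""
    n_bins = two_N // 2
    mask = np.zeros(n_bins, dtype=bool)
    k = 1
    while k * F0 < Fe / 2:
        i = int(round(k * F0 * two_N / Fe))  # bin nearest to harmonic k*F0
        mask[max(i - line_half_width, 0) : min(i + line_half_width + 1, n_bins)] = True
        k += 1
    return mask

mask = harmonic_mask(512, F0=1000.0, Fe=8000.0, line_half_width=1)
```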
- In the harmonic regions, the complex signal subjected to the TFRI 37 is equal to the value of the signal spectrum, that is to say that its modulus and its phase correspond to the values obtained by the TFR module 3.
- In the non-harmonic regions, this complex signal has the same phase as the spectrum and a modulus given by the lower envelope after spectral decompression 36. This way of proceeding according to Figure 7 provides a more precise modeling of the non-harmonic regions.
- the decoder shown in Figure 8 includes an input demultiplexer 45 which extracts from the bit stream, coming from an encoder according to Figure 1, the quantization indexes iF, icxs, icxi and iEm of the fundamental frequency F 0, of the cepstral coefficients representing the compressed upper envelope, of the coefficients representing the compressed lower envelope, and of the weighting vector Emix, and distributes them respectively to modules 46, 47, 48 and 49
- These modules 46-49 include quantization dictionaries similar to those of modules 5, 18, 34 and 40 of Figure 1.
- the modules 47 and 48 have dictionaries to form the quantized prediction residues rcx_q[n], and they deduce the quantized cepstral vectors cx_q[n] therefrom with elements identical to elements 23-26 of Figure 6
- These quantized cepstral vectors cx_q[n] provide the cepstral coefficients cx_sup_q and cx_inf_q processed by the decoder
- a module 51 calculates the fast Fourier transform of the cepstral coefficients cx_sup_q for each signal frame.
- the frequency scale of the resulting compressed spectrum is modified non-linearly by a module 52 applying the non-linear transformation reciprocal to that of the module 12 of FIG. 1, which provides the estimate LX_sup of the compressed upper envelope.
- a spectral decompression of LX_sup, operated by a module 53, provides the upper envelope X_sup comprising the estimated values of the spectrum modulus at frequencies that are multiples of the fundamental frequency F0.
- the module 54 synthesizes the spectral estimate X_v of the harmonic component of the audio signal as a sum of spectral lines centered on the multiples of the fundamental frequency F0, whose amplitudes (in modulus) are those given by the upper envelope X_sup.
- the decoder of FIG. 8 is capable of extracting phase information from the cepstral coefficients cx_sup_q representing the compressed upper envelope. This phase information is used to assign a phase φ(k) to each of the spectral lines determined by the module 54 in the estimation of the harmonic component of the signal.
- the speech signal can be considered to be at minimum phase.
- the minimum-phase information can easily be deduced from a cepstral model; it is therefore calculated for each harmonic frequency.
- the minimum phase assumption means that the energy of the synthesized signal is localized at the start of each period of the fundamental frequency F 0 .
- a slight phase dispersion is introduced by means of a specific post-liftering of the cepstra during the synthesis of the phase.
- This post-liftering, carried out by the module 55 in FIG. 8, makes it possible to accentuate the formant resonances of the envelope and therefore to control the dispersion of the phases.
- This post-liftering is for example of the form (8).
- from the post-liftered cepstral coefficients, smoothed by the module 56, the module 57 deduces the minimum phase assigned to each spectral line representing a harmonic peak of the spectrum.
- the operations performed by the modules 56, 57 for smoothing and extracting the minimum phase are illustrated by the flowchart in FIG. 9.
- the module 56 examines the variations of the cepstral coefficients in order to apply a lesser smoothing in the presence of sudden variations than in the presence of slow variations. For this, it smooths the cepstral coefficients by means of a forgetting factor λc chosen as a function of a comparison between a threshold d_th and a distance d between two successive sets of post-liftered cepstral coefficients.
- the threshold d th is itself adapted as a function of the variations of the cepstral coefficients.
- the first step 60 consists in calculating the distance d between the two successive vectors relating to the frames n-1 and n. These vectors, denoted here cxp[n-1] and cxp[n], correspond for each frame to the set of NCS post-liftered cepstral coefficients representing the compressed upper envelope.
- the distance used can in particular be the Euclidean distance between the two vectors, or a quadratic distance.
- Two smoothings are first carried out, by means of forgetting factors λmin and λmax respectively, to determine a minimum distance d_min and a maximum distance d_max.
- the forgetting factors λmin and λmax are themselves selected from two distinct values, respectively λmin1, λmin2 and λmax1, λmax2, lying between 0 and 1, the values λmin1 and λmax1 each being substantially closer to 0 than λmin2 and λmax2. If d > d_min (test 61), the forgetting factor λmin is taken equal to λmin1 (step 62); otherwise it is taken equal to λmin2 (step 63).
- in step 64, the minimum distance d_min is taken equal to λmin.d_min + (1 - λmin).d. If d > d_max (test 65), the forgetting factor λmax is taken equal to λmax1 (step 66); otherwise it is taken equal to λmax2 (step 67).
- in step 68, the maximum distance d_max is taken equal to λmax.d_max + (1 - λmax).d.
- If the distance d between the two consecutive cepstral vectors is greater than the threshold d_th (test 71), a value λc1 relatively close to 0 is adopted for the forgetting factor λc (step 72). In this case, the corresponding signal is considered to be of the non-stationary type, so that there is no need to keep a large memory of the previous cepstral coefficients. If d ≤ d_th, a value λc2 less close to 0 is adopted for λc in step 73, in order to smooth the cepstral coefficients more strongly.
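The adaptive smoothing of steps 60-73 can be sketched as follows. The numeric forgetting-factor values and the rule deriving d_th from d_min and d_max are assumptions; the text only states that d_th is adapted as a function of the two smoothed distances.

```python
import numpy as np

def adapt_smoothing(cxp_prev, cxp_cur, d_min, d_max,
                    lmin=(0.1, 0.9), lmax=(0.1, 0.9),
                    lc=(0.1, 0.8), th_ratio=0.5):
    """One update of the adaptive forgetting-factor scheme (steps 60-73).

    Returns the forgetting factor lambda_c to use for smoothing the
    cepstral coefficients, plus the updated d_min and d_max.
    """
    d = float(np.linalg.norm(cxp_cur - cxp_prev))     # step 60 (Euclidean)
    l_min = lmin[0] if d > d_min else lmin[1]         # tests 61-63
    d_min = l_min * d_min + (1.0 - l_min) * d         # step 64
    l_max = lmax[0] if d > d_max else lmax[1]         # tests 65-67
    d_max = l_max * d_max + (1.0 - l_max) * d         # step 68
    d_th = d_min + th_ratio * (d_max - d_min)         # assumed adaptation
    l_c = lc[0] if d > d_th else lc[1]                # tests 71-73
    return l_c, d_min, d_max
```

A large jump between consecutive cepstral vectors thus yields a forgetting factor close to 0, i.e. little memory of past coefficients, matching the non-stationary case described above.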
- the module 57 then calculates the minimum phases ⁇ (k) associated with the harmonics kF 0 .
- the minimum phase for a harmonic of order k is given by:
- in step 75, the harmonic index k is initialized to 1.
- the phase φ(k) and the cepstral index m are initialized respectively to 0 and 1 in step 76.
- in step 77, the module 57 adds to the phase φ(k) the quantity -2.cxl[n,m].sin(2π.m.k.F0/Fe).
- the cepstral index m is incremented in step 78 and compared to NCS in step 79. Steps 77 and 78 are repeated as long as m ⁇ NCS.
- the calculation of minimum phases 76-79 is repeated for the next harmonic as long as k.F0 < Fe/2 (test 81).
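The minimum-phase computation of steps 75-81 can be sketched as follows, accumulating the summation quantity given above for each harmonic below Fe/2 (the function and argument names are illustrative):

```python
import math

def minimum_phases(cxl, f0, fe, ncs):
    """Minimum phase per harmonic (flowchart of FIG. 9, steps 75-81).

    phi(k) = -2 * sum_{m=1}^{NCS-1} cxl[m] * sin(2*pi*m*k*F0/Fe)
    """
    phases = {}
    k = 1                                   # step 75
    while k * f0 < fe / 2:                  # test 81 (stay below Nyquist)
        phi = 0.0                           # step 76
        for m in range(1, ncs):             # steps 77-79
            phi += -2.0 * cxl[m] * math.sin(2 * math.pi * m * k * f0 / fe)
        phases[k] = phi
        k += 1                              # next harmonic
    return phases
```

With F0 = 1000 Hz and Fe = 8000 Hz, phases are produced for harmonics k = 1, 2, 3 only, since 4.F0 reaches Fe/2.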
- the module 54 takes into account a constant phase over the width of each spectral line, equal to the minimum phase φ(k) supplied for the corresponding harmonic k by the module 57.
- the estimate X v of the harmonic component is synthesized by summing spectral lines positioned at the harmonic frequencies of the fundamental frequency F 0 .
- the spectral lines can be positioned on the frequency axis with a resolution greater than that of the Fourier transform. To that end, a reference spectral line at the higher resolution is precalculated once and for all. This calculation can consist of a Fourier transform of the analysis window fA with a transform size of 16384 points, providing a resolution of 0.5 Hz per point.
- each harmonic line is then synthesized by the module 54 by positioning the high-resolution reference line on the frequency axis, and by sub-sampling this reference spectral line down to the 15.625 Hz resolution of the 512-point Fourier transform. This makes it possible to position the spectral line precisely.
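A minimal sketch of this positioning-by-subsampling, assuming the reference line is centered in its array and using nearest-neighbor rounding of the fractional offset (the rounding rule is an assumption; the usage below uses small toy resolutions rather than the 0.5 Hz / 15.625 Hz values of the text):

```python
import numpy as np

def place_line(ref_line, center_hz, hi_res=0.5, lo_res=15.625, n_out=512):
    """Position a precomputed high-resolution spectral line on the coarse
    FFT grid by sub-sampling it (sketch of the module-54 trick).

    ref_line is assumed centered at index len(ref_line)//2, with hi_res Hz
    per point; the output grid has n_out points at lo_res Hz per point.
    """
    center_hi = len(ref_line) // 2
    out = np.zeros(n_out)
    for i in range(n_out):
        # hi-res index corresponding to output bin i, shifted by the
        # (possibly fractional-bin) line center
        j = center_hi + int(round((i * lo_res - center_hz) / hi_res))
        if 0 <= j < len(ref_line):
            out[i] = ref_line[j]
    return out
```

Because the shift is computed on the fine grid before sub-sampling, the line center can fall between coarse FFT bins, which is precisely the benefit described above.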
- the TFR module 85 of the decoder of FIG. 8 receives the NCI quantized cepstral coefficients cx_inf_q of orders 0 to NCI - 1, and it advantageously supplements them with the NCS - NCI cepstral coefficients cx_sup_q of orders NCI to NCS - 1 representing the upper envelope. Indeed, it can be estimated as a first approximation that the rapid variations of the compressed lower envelope are well reproduced by those of the compressed upper envelope. In another embodiment, the TFR module 85 could consider only the NCI cepstral parameters cx_inf_q.
- the module 86 converts the frequency scale reciprocally to the conversion operated by the module 32 of the coder, in order to restore the estimate LX_inf of the compressed lower envelope, which is subjected to the spectral decompression module 87.
- the decoder thus has a lower envelope X_inf comprising the values of the spectrum modulus in the valleys located between the harmonic peaks.
- This envelope X_inf will modulate the spectrum of a noise frame shaped as a function of the quantized weighting vector Emix extracted by the module 49.
- a generator 88 delivers a normalized noise frame whose 4 ms segments are weighted in a module 89 in accordance with the normalized components of the vector Emix provided by the module 49 for the current frame.
- This noise is a high-pass filtered white noise, to take account of the low level which the unvoiced component has, in principle, at low frequencies.
- the Fourier transform of the resulting frame is calculated by the TFR module 91.
- the spectral estimate X_uv of the non-harmonic component is determined by the spectral synthesis module 92, which performs a frequency-by-frequency weighting. This weighting consists in multiplying each complex spectral value supplied by the TFR module 91 by the value of the lower envelope X_inf obtained for the same frequency by the spectral decompression module 87.
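Modules 88-92 can be sketched as follows. The first-order-difference high-pass filter and the RMS normalization are assumptions, as the patent specifies neither the filter nor the normalization rule.

```python
import numpy as np

def synthesize_unvoiced(lower_env, emix, fe=8000, seed=0):
    """Sketch of modules 88-92: build the non-harmonic spectral estimate
    X_uv from a normalized noise frame shaped by Emix.
    """
    n = 2 * (len(lower_env) - 1)                 # frame length for rfft
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal(n)
    noise = np.diff(noise, prepend=0.0)          # crude high-pass (assumed)
    noise /= np.sqrt(np.mean(noise ** 2))        # normalize (module 88)
    seg = int(0.004 * fe)                        # 4 ms segments (module 89)
    for s, g in enumerate(emix):
        noise[s * seg:(s + 1) * seg] *= g
    spec = np.fft.rfft(noise)                    # module 91
    return spec * lower_env                      # module 92 weighting
```

With Fe = 8000 Hz and a 256-sample frame, Emix carries eight weights, one per 32-sample (4 ms) segment, matching the segmentation described above.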
- the analysis module 96 comprises a unit 97 for estimating a frequency-dependent degree of voicing W, from which four frequency-dependent gains are derived: two gains g_v, g_uv controlling the relative importance of the harmonic and non-harmonic components in the synthesized signal, and two further gains used to noise the phase of the harmonic component.
- the degree of voicing W(i) is a continuously variable value between 0 and 1, determined for each frequency index i (0 ≤ i ≤ N) as a function of the upper envelope X_sup(i) and the lower envelope X_inf(i) obtained for this frequency i by the decompression modules 53, 87.
- the degree of voicing W(i) is estimated by the unit 97 for each frequency index i corresponding to a harmonic of the fundamental frequency F0.
- the threshold Vth(F0) corresponds to the average dynamic range calculated on a purely voiced synthetic spectrum at the fundamental frequency. It is advantageously chosen as a function of the fundamental frequency F0.
- the degree of voicing W(i) for a frequency other than the harmonic frequencies is simply taken equal to that estimated for the nearest harmonic.
- the gain g v (i), which depends on the frequency, is obtained by applying a non-linear function to the degree of voicing W (i) (block 98).
- the phase of the mixed harmonic component is the result of a linear combination of the phases φv, φuv of the harmonic and non-harmonic components X_v, X_uv synthesized by the modules 54, 92.
- the gains respectively applied to these phases are calculated from the degree of voicing W and are also weighted as a function of the frequency index i, since the phase noise is only really useful beyond a certain frequency.
- a first gain g v1 is calculated by applying a non-linear function to the degree of voicing W (i), as shown diagrammatically by block 100 in FIG. 10.
- This non-linear function can have the form represented in FIG. 12:
- g_v1(i) = 1 if W4 ≤ W(i) ≤ 1, the thresholds W3 and W4 being such that 0 < W3 < W4 < 1, and the minimum gain G1 being between 0 and 1.
- a multiplier 101 multiplies, for each frequency index i, the gain g_v1 by another gain g_v2 depending only on the frequency index i, to form the gain g_v(i).
- the gain g_v2(i) depends non-linearly on the frequency index i, for example as shown in FIG. 13:
- g_v2(i) = G2 if i2 ≤ i ≤ N, the indices i1 and i2 being such that 0 < i1 < i2 < N, and the minimum gain G2 being between 0 and 1.
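A hedged sketch of the two gain curves. The linear ramps between the thresholds are assumptions read off the described shapes of FIGS. 12 and 13, since only the saturation branches are given in the text; the default threshold values are illustrative.

```python
import numpy as np

def gain_v1(w, w3=0.3, w4=0.7, g1=0.2):
    """Voicing-dependent gain of FIG. 12 (block 100): G1 below W3,
    1 above W4, assumed linear in between."""
    return np.clip(g1 + (1.0 - g1) * (w - w3) / (w4 - w3), g1, 1.0)

def gain_v2(i, n, i1=None, i2=None, g2=0.2):
    """Frequency-dependent gain of FIG. 13: 1 below i1, G2 above i2,
    assumed linear in between (phase noise matters mostly at high i)."""
    i1 = n // 4 if i1 is None else i1
    i2 = n // 2 if i2 is None else i2
    return np.clip(1.0 - (1.0 - g2) * (i - i1) / (i2 - i1), g2, 1.0)
```

The product gain_v1(W(i)) * gain_v2(i, N) then plays the role of the gain g_v(i) formed by the multiplier 101.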
- the complex spectrum Y of the synthesized signal is produced by the mixing module 95, which realizes the following mixing relation, for 0 ⁇ i ⁇ N:
- the time synthesis module 116 performs an overlap-add of frames modified with respect to those successively evaluated at the output of the module 115.
- the modification can be seen as two steps, illustrated respectively in FIGS. 14 and 15.
- the first step (FIG. 14) consists in multiplying each frame 2' delivered by the TFRI module 115 by a window 1/fA, the inverse of the analysis window fA used by the module 1 of the coder. The samples of the resulting frame 2'' are therefore weighted uniformly.
- each sample of the decoded audio signal x thus obtained is assigned a uniform overall weight equal to A. This overall weight comes from the contribution of a single frame if the sample has, in this frame, a rank i such that L ≤ i < N - L, and includes the summed contributions of two successive frames if 0 ≤ i < L or N - L ≤ i < N.
- FIG. 16 shows the appearance of the composite window fc in the case where the analysis window fA is a Hamming window and the synthesis window fs has the form given by the relations (19) to (21).
- the coder in FIG. 1 can increase the rate of formation and analysis of the frames, in order to transmit more quantization parameters to the decoder.
- a frame of N = 256 samples (32 ms) is formed every 20 ms.
- the notations cx_q[n-1] and cx_q[n] denote quantized cepstral vectors determined, for two successive frames of integer rank, by the quantization module 18 and/or by the quantization module 34. These vectors comprise for example four consecutive cepstral coefficients each; they could also include more cepstral coefficients.
- a module 120 performs an interpolation of these two cepstral vectors cx_q [n-1] and cx_q [n], in order to estimate an intermediate value cx_i [n-1/2].
- the interpolation performed by the module 120 can be a simple arithmetic mean of the vectors cx_q [n-1] and cx_q [n].
- the module 120 could apply a more sophisticated interpolation formula, for example polynomial, also relying on the cepstral vectors obtained for frames prior to the frame n-1.
- the interpolation takes account of the relative position of each interpolated frame.
- the coder uses the means described above to calculate the cepstral coefficients cx [n-1/2] relating to the frame of half-integer rank.
- these cepstral coefficients are those provided by the TFRI module 13 after post-liftering 15 (for example with the same post-liftering coefficients as for the previous frame n-1) and normalization 16.
- the cepstral coefficients cx [n-1/2] are those delivered by the TFRI module 33.
- a subtractor 121 forms the difference ecx[n-1/2] between the cepstral coefficients cx[n-1/2] calculated for the half-integer rank frame and the coefficients cx_i[n-1/2] estimated by interpolation.
- This difference is supplied to a quantization module 122 which addresses quantization indices icx [n-1/2] to the output multiplexer 6 of the coder.
- the module 122 operates for example by vector quantization of the interpolation errors ecx[n-1/2] successively determined for the half-integer rank frames.
- This quantization of the interpolation error can be carried out by the coder for each of the NCS + NCI cepstral coefficients used by the decoder, or only for some of them, typically those of the smallest orders.
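The encoder/decoder interpolation scheme of FIGS. 17 and 18 can be sketched as follows; the toy nearest-neighbor codebook stands in for the actual quantization dictionaries of modules 122 and 125.

```python
import numpy as np

def encode_half_frame(cx_prev, cx_cur, cx_half, codebook):
    """Encoder side (FIG. 17 sketch): interpolate the half-rank cepstral
    vector (module 120, arithmetic mean), form the interpolation error
    (subtractor 121) and quantize it against a codebook (module 122)."""
    cx_i = 0.5 * (cx_prev + cx_cur)              # module 120
    ecx = cx_half - cx_i                         # subtractor 121
    idx = int(np.argmin(np.linalg.norm(codebook - ecx, axis=1)))
    return idx, cx_i

def decode_half_frame(cx_prev, cx_cur, idx, codebook):
    """Decoder side (FIG. 18 sketch): same interpolation (module 124)
    plus the quantized error (modules 125-126)."""
    cx_i = 0.5 * (cx_prev + cx_cur)              # module 124
    return cx_i + codebook[idx]                  # adder 126
```

Because both sides compute the same interpolation, only the (small) error index needs to be transmitted for the half-rank frame, which is the point of the scheme.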
- the decoder functions essentially like the one described with reference to FIG. 8 to determine the signal frames of integer rank.
- An interpolation module 124, identical to the module 120 of the coder, estimates the intermediate coefficients cx_i[n-1/2] from the quantized coefficients cx_q[n-1] and cx_q[n] supplied by the module 47 and/or the module 48 from the indices icxs, icxi extracted from the bit stream Φ.
- a parameter extraction module 125 receives the quantization index icx[n-1/2] from the input demultiplexer 45 of the decoder, and deduces therefrom the quantized interpolation error ecx_q[n-1/2] by means of the same quantization dictionary as that used by the module 122 of the coder.
- An adder 126 sums the cepstral vectors cx_i[n-1/2] and ecx_q[n-1/2] in order to provide the cepstral coefficients cx[n-1/2] which will be used by the decoder (modules 51-57, 95, 96, 115 and/or modules 85-87, 92, 95, 96, 115) to form the interpolated frame of rank n-1/2.
- the decoder can also interpolate the other parameters F 0 , Emix used to synthesize the signal frames.
- the fundamental frequency F 0 can be interpolated linearly, either in the time domain, or (preferably) directly in the frequency domain.
- the interpolation should be carried out after denormalization and of course taking account of the time offsets between frames.
- the coder uses the cepstral vectors cx_q[n], cx_q[n-1], ..., cx_q[n-r] and cx_q[n-1/2] calculated for the last past frames (r > 1) to identify an optimal interpolator filter which, when fed with the quantized cepstral vectors cx_q[n-r], ..., cx_q[n] relating to frames of integer rank, delivers an interpolated cepstral vector cx_i[n-1/2] having a minimum distance from the vector cx[n-1/2] calculated for the last frame of half-integer rank.
- this interpolator filter 128 is present in the coder, and a subtractor 129 subtracts its output cx_i [n-1/2] from the calculated cepstral vector cx [n-1/2].
- a minimization module 130 determines the set of parameters {P} of the interpolator filter 128 for which the interpolation error ecx[n-1/2] delivered by the subtractor 129 has a minimum norm. This set of parameters {P} is addressed to a quantization module 131 which provides a corresponding quantization index iP to the output multiplexer 6 of the coder.
- From the quantization indices iP of the parameters {P} obtained from the bit stream Φ, the decoder reconstructs the interpolator filter 128 (up to quantization errors), and processes the cepstral vectors cx_q[n-r], ..., cx_q[n] in order to estimate the cepstral coefficients cx[n-1/2] used to synthesize the half-integer rank frames.
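A sketch of the parameter identification performed by the minimization module 130, assuming a linear interpolator filter whose weights are fitted by least squares over a batch of past frames; the patent only requires a minimum-norm interpolation error, so the fitting method and batch formulation are assumptions.

```python
import numpy as np

def fit_interpolator(history, targets):
    """Least-squares fit of the interpolator weights {P} (module 130
    sketch): find weights p such that sum_j p[j]*cx_q[n-j] best matches
    the half-rank vector cx[n-1/2], over a batch of past examples.

    history: (n_examples, r+1, dim) past integer-rank cepstral vectors
    targets: (n_examples, dim) corresponding half-rank cepstral vectors
    """
    n_ex, taps, dim = history.shape
    # each example contributes 'dim' equations in 'taps' unknowns
    a = history.transpose(0, 2, 1).reshape(n_ex * dim, taps)
    b = targets.reshape(n_ex * dim)
    p, *_ = np.linalg.lstsq(a, b, rcond=None)
    return p

def interpolate(p, frames):
    """Apply the interpolator filter 128 to the last r+1 quantized
    cepstral vectors (shape (r+1, dim))."""
    return np.tensordot(p, frames, axes=1)
```

The decoder, given the quantized weights, applies `interpolate` to the same integer-rank vectors to recover cx[n-1/2] up to quantization error.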
- the decoder can use a simple interpolation method (without transmission of parameters from the coder for the half-integer rank frames), an interpolation method taking account of a quantized interpolation error (according to FIGS. 17 and 18), or an interpolation method with an optimal interpolator filter (according to FIG. 19) to evaluate the half-integer rank frames in addition to the integer rank frames evaluated directly as explained with reference to FIGS. 8 to 13.
- the time synthesis module 116 can then combine all of these evaluated frames to form the synthesized signal x in the manner explained hereafter with reference to FIGS. 14, 21 and 22.
- the module 116 performs an overlap-add of frames modified with respect to those successively evaluated at the output of the module 115, and this modification can be seen as two steps, the first of which is identical to that previously described with reference to FIG. 14 (dividing the samples of the frame 2' by the analysis window fA).
- the synthesis window fs(i) gradually increases for i going from
- the synthesis window fs can be, over this interval, a Hamming window (as represented in FIG. 21) or a Hanning window.
- FIG. 21 shows the successive frames 2 "repositioned in time by the module 116.
- the hatching indicates the portions eliminated from the frames (synthesis window at 0). It can be seen that by performing the overlap-add of the samples of the successive frames, the property (25) ensures a homogeneous weighting of the samples of the synthesized signal.
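The two-step synthesis (divide each frame by the analysis window, apply the synthesis window, then sum with overlap) can be sketched as follows; the window values in the usage are chosen to satisfy the complementarity property (25), i.e. overlapping synthesis-window samples sum to 1.

```python
import numpy as np

def overlap_add(frames, f_a, f_s, hop):
    """Time synthesis (module 116 sketch): divide each frame by the
    analysis window f_a (FIG. 14), apply the synthesis window f_s,
    and overlap-add at the frame hop."""
    n = len(f_a)
    out = np.zeros(hop * (len(frames) - 1) + n)
    for k, frame in enumerate(frames):
        # uniform weighting (1/f_a), then synthesis windowing, then sum
        out[k * hop:k * hop + n] += (frame / f_a) * f_s
    return out
```

When f_s(i) + f_s(i + hop) = 1 over the overlap region, every sample of the interior of the output receives the same overall weight, which is the homogeneity described above.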
- the interpolated frames can be the subject of a reduced transmission of coding parameters, as described above, but this is not compulsory.
- This embodiment makes it possible to maintain a relatively large interval M between two analysis frames, and therefore to limit the required transmission rate, while limiting the discontinuities likely to appear because this interval is large relative to the time scales typical of the variations of the audio signal parameters, in particular the cepstral coefficients and the fundamental frequency.
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
AU65751/00A AU6575100A (en) | 1999-07-05 | 2000-07-04 | Methods and device for audio analysis and synthesis |
EP00953223A EP1194923B1 (fr) | 1999-07-05 | 2000-07-04 | Procedes et dispositifs d'analyse et de synthese audio |
DE60025615T DE60025615D1 (de) | 1999-07-05 | 2000-07-04 | Verfahren und system für audio analyse und synthese |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
FR9908638A FR2796194B1 (fr) | 1999-07-05 | 1999-07-05 | Procedes et dispositifs d'analyse et de synthese audio |
FR99/08638 | 1999-07-05 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2001003116A1 true WO2001003116A1 (fr) | 2001-01-11 |
Family
ID=9547707
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/FR2000/001904 WO2001003116A1 (fr) | 1999-07-05 | 2000-07-04 | Procedes et dispositifs d'analyse et de synthese audio |
Country Status (6)
Country | Link |
---|---|
EP (1) | EP1194923B1 (fr) |
AT (1) | ATE316284T1 (fr) |
AU (1) | AU6575100A (fr) |
DE (1) | DE60025615D1 (fr) |
FR (1) | FR2796194B1 (fr) |
WO (1) | WO2001003116A1 (fr) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP0893791A2 (fr) * | 1990-12-05 | 1999-01-27 | Digital Voice Systems, Inc. | Procédés de codage, amélioration et synthèse de la parole |
US5878388A (en) * | 1992-03-18 | 1999-03-02 | Sony Corporation | Voice analysis-synthesis method using noise having diffusion which varies with frequency band to modify predicted phases of transmitted pitch data blocks |
US5911130A (en) * | 1995-05-30 | 1999-06-08 | Victor Company Of Japan, Ltd. | Audio signal compression and decompression utilizing amplitude, frequency, and time information |
Non-Patent Citations (3)
Title |
---|
9TH INTERNATIONAL CZECH - SLOVAK SCIENTIFIC CONFERENCE. RADIOELEKTRONIKA 99. CONFERENCE PROCEEDINGS, PROCEEDINGS OF 9TH INTERNATIONAL CZECH - SLOVAK SCIENTIFIC CONFERENCE. RADIOELEKTRONIKA 99, BRNO, CZECH REPUBLIC, 27-28 APRIL 1999, 1999, Brno, Czech Republic, Brno Univ. Technol, Czech Republic, pages 186 - 189, ISBN: 80-214-1327-1 * |
AHMADI S ET AL: "New techniques for sinusoidal coding of speech at 2400 bps", CONFERENCE RECORD OF THIRTIETH ASILOMAR CONFERENCE ON SIGNALS, SYSTEMS AND COMPUTERS (CAT. NO.96CB36004), CONFERENCE RECORD OF THE THIRTIETH ASILOMAR CONFERENCE ON SIGNALS, SYSTEMS AND COMPUTERS, PACIFIC GROVE, CA, USA, 3-6 NOV. 1996, 1997, Los Alamitos, CA, USA, IEEE Comput. Soc. Press, USA, pages 770 - 774 vol.1, XP002138769, ISBN: 0-8186-7646-9 * |
DATABASE INSPEC [online] INSTITUTE OF ELECTRICAL ENGINEERS, STEVENAGE, GB; LAGLER A ET AL: "Real-time fixed-point DSP-implementation of spectral substraction algorithm for speech enhancement in noisy environment", XP002139930, Database accession no. 6604117 * |
Also Published As
Publication number | Publication date |
---|---|
DE60025615D1 (de) | 2006-04-06 |
AU6575100A (en) | 2001-01-22 |
FR2796194A1 (fr) | 2001-01-12 |
EP1194923A1 (fr) | 2002-04-10 |
EP1194923B1 (fr) | 2006-01-18 |
FR2796194B1 (fr) | 2002-05-03 |
ATE316284T1 (de) | 2006-02-15 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AK | Designated states |
Kind code of ref document: A1 Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CR CU CZ DE DK DM DZ EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG US UZ VN YU ZA ZW |
|
AL | Designated countries for regional patents |
Kind code of ref document: A1 Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG |
|
121 | Ep: the epo has been informed by wipo that ep was designated in this application | ||
DFPE | Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101) | ||
WWE | Wipo information: entry into national phase |
Ref document number: 2000953223 Country of ref document: EP |
|
WWE | Wipo information: entry into national phase |
Ref document number: 10019961 Country of ref document: US |
|
WWP | Wipo information: published in national office |
Ref document number: 2000953223 Country of ref document: EP |
|
REG | Reference to national code |
Ref country code: DE Ref legal event code: 8642 |
|
NENP | Non-entry into the national phase |
Ref country code: JP |
|
WWG | Wipo information: grant in national office |
Ref document number: 2000953223 Country of ref document: EP |