EP1979901B1 - Method and arrangements for audio signal coding - Google Patents
- Publication number
- EP1979901B1 (EP06706508.6A)
- Authority
- EP
- European Patent Office
- Prior art keywords
- audio signal
- audio
- fundamental period
- signal
- subband
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/02—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
- G10L19/0204—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders using subband decomposition
- G10L19/0208—Subband vocoders
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/038—Speech enhancement, e.g. noise reduction or echo cancellation using band spreading techniques
Definitions
- the invention relates to a method and arrangements for audio signal coding.
- the invention relates to a method and an audio signal decoder for forming an audio signal and an audio signal encoder.
- Efficient compression of audio signals is also an important consideration in the context of storage or archival of audio signals.
- Coding methods in which an audio signal synthesized by an audio synthesis filter is matched, time frame by time frame, to an audio signal to be transmitted by optimizing filter parameters prove to be particularly efficient. Such a procedure is often referred to as analysis-by-synthesis.
- the audio synthesis filter is excited by an excitation signal that is preferably also optimized. This filtering is often referred to as formant synthesis.
- LPC coefficients (LPC: Linear Predictive Coding) and/or parameters specifying a spectral and/or temporal envelope of the audio signal can be used as filter parameters.
- the optimized filter parameters and the parameters specifying the excitation signal are then transmitted to the receiver, time frame by time frame, in order to form there, by means of an audio signal decoder provided on the receiver side, a synthetic audio signal that is as similar as possible to the original audio signal with regard to the subjective auditory impression.
- Such an audio coding method is known from ITU-T Recommendation G.729.
- a real-time audio signal with a bandwidth of 4 kHz can be reduced to a transmission rate of 8 kbit/s.
- the achievable transmission bandwidth and audio synthesis quality depend significantly on the generation of a suitable excitation signal.
- a bandwidth-expanding excitation signal u_hb(k) in a high subband can be formed as a spectral copy of the narrow-band excitation signal u_nb(k). (Here and below, the index k denotes samples of the excitation signal or of other signals.)
- the copy can be formed by spectral translation or by spectral reflection of the narrow-band excitation signal u_nb(k).
- spectral translation or reflection distorts the spectrum of the excitation signal inharmonically and/or causes a significant, audible phase error in the spectrum.
- the document D1 discloses a voice bandwidth extender and a voice bandwidth extension method.
- the device comprises, inter alia, a demultiplexer which divides a received signal into multiplexed parameters such as voice information, i.e. an index.
- This index includes and outputs a gain code vector, an adaptive codebook delay index, information about a sound source signal and an index of a sound source codevector, and an index of a spectrum parameter.
- An adder takes a sound source signal whose frequency bandwidth has been expanded, adds it to a signal obtained by converting the reproduced speech signal to a sampling frequency that represents a higher frequency component, and outputs a voice signal with an extended frequency bandwidth.
- frequency components of the audio signal attributable to a first subband are formed by means of a subband decoder on the basis of supplied fundamental period values, each indicating a fundamental period of the audio signal.
- frequency components of the audio signal attributable to a second subband are formed by exciting an audio synthesis filter by means of an excitation signal specific to the second subband.
- a fundamental period parameter is derived from the fundamental period values by an excitation signal generator.
- pulses with a pulse shape dependent on the fundamental period parameter are formed by the excitation signal generator at a time interval determined by the fundamental period parameter and are mixed with a noise signal.
- frequency components of the audio signal attributable to a further, second subband can be synthesized on the basis of fundamental period values that have already been made available for a subband decoder specific to the first subband. Since no additional audio parameters are generally required for the generation of the noise signal, the generation of the excitation signal generally requires no additional transmission bandwidth.
- the audio quality of the audio signal can be considerably improved, in particular since a harmonic content determined by the fundamental period values can be reproduced in the second subband.
- the fundamental period parameter may specify the fundamental period of the audio signal to within a fraction of a first sampling interval associated with the subband decoder.
- the pulses can thus be spaced with an accuracy finer than the sampling interval of the subband decoder, whereby a harmonic spectrum of the audio signal in the second subband can be modeled more finely.
- the pulse shape of a respective pulse can be selected from different pulse forms stored in a look-up table, depending on a portion of the fundamental period parameter which is not integral in units of the first sampling interval. From the look-up table, very different pulse shapes can be retrieved in real-time by simple retrieval with low switching, processing or computational effort.
- the pulse shapes to be stored can be optimized in advance in terms of a lifelike audio playback. In fact, the cumulative effects or the cumulative impulse response of several filters, decimators and / or modulators can be calculated in advance and stored in the look-up table as a correspondingly shaped pulse.
- a decimator in this context is a converter which multiplies a sampling interval of a signal by a decimation factor m by rejecting all samples except for every m-th sample.
- a modulator is a filter that multiplies individual samples of a signal by predetermined individual factors and outputs the respective product.
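- The two building blocks defined above can be illustrated with a minimal sketch. The function names and the choice to keep the first sample of every group of m are assumptions made for illustration; the text does not fix which phase a decimator keeps.

```python
def decimate(samples, m):
    """Keep every m-th sample (here: starting with the first) and reject the rest.

    The sampling interval of the result is m times that of the input.
    """
    return samples[::m]


def modulate(samples, factors):
    """Multiply each sample by its own predetermined factor."""
    return [s * f for s, f in zip(samples, factors)]


signal = [1, 2, 3, 4, 5, 6, 7, 8]
decimated = decimate(signal, 2)

# Example factor sequence: alternating signs (+1, -1, +1, ...)
alternating = [(-1) ** k for k in range(len(signal))]
modulated = modulate(signal, alternating)
```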
- the time interval of the pulses can be determined by an integral part of the basic period parameter in units of the first sampling interval.
- the pulses can be formed from a predetermined pulse shape by samples having a second sampling interval which is smaller than the first sampling interval by a bandwidth expansion factor.
- the time interval of the pulses can then be determined, in units of the second sampling interval, by the fundamental period parameter multiplied by the bandwidth expansion factor.
- as the bandwidth expansion factor, it is preferable to select the inverse N of the fraction 1/N corresponding to the accuracy of the fundamental period parameter in units of the first sampling interval.
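- A short worked example of this relationship, using exact fractions. The values N = 3, the 8 kHz first sampling rate and the period of 40 1/3 samples are illustrative assumptions, not values prescribed at this point in the text.

```python
from fractions import Fraction

N = 3                                    # bandwidth expansion factor (assumed example value)
first_interval = Fraction(1, 8000)       # seconds; assumes an 8 kHz subband decoder
second_interval = first_interval / N     # finer sampling grid used to form the pulses

# Hypothetical fundamental period parameter with 1/N accuracy:
# 40 1/3 units of the first sampling interval.
lambda_p = Fraction(121, 3)

# Pulse spacing expressed in units of the second sampling interval.
spacing_fine = lambda_p * N
```

- With the accuracy 1/N and the expansion factor N chosen to match, the spacing on the finer grid is always a whole number of samples (here 121).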
- the pulses may be formed by a pulse shaping filter having filter coefficients specified at the second sampling interval.
- the pulses can be filtered, before or after admixing of the noise signal, by at least one high-pass, low-pass and/or band-pass filter and/or decimated by at least one decimator.
- the fundamental period parameter can be derived per time frame from one or more fundamental period values.
- the fundamental period parameter can be derived from fundamental period values of several time frames that are combined, preferably non-linearly, so as to compensate for fluctuations. In this way it can be avoided that fluctuations or jumps in the fundamental period values, which may result, for example, from measurements of the audio fundamental frequency corrupted by interfering noise, adversely affect the fundamental period parameter.
- a relative deviation of a current fundamental period value from a previous fundamental period value, or from a quantity derived therefrom, can be determined and attenuated as part of the derivation of the fundamental period parameter.
- a mixing ratio between the pulses and the noise signal can be determined by at least one mixing parameter.
- This mixing parameter can be derived per time frame from a level ratio, available in the subband decoder, between a tonal and an atonal audio signal component of the first subband.
- a level parameter relating to an overtone-to-noise ratio in the first subband can be used for forming the audio signal components in the second subband.
- the level ratio can be converted in such a way that, when the atonal audio signal component predominates, the tonal audio signal component is further lowered. Since, in the case of natural audio sources, an atonal audio signal component in higher frequency bands, in particular from 6 kHz, increasingly prevails, the quality of reproduction can generally be improved by such a reduction.
- FIG. 1 shows a schematic representation of an audio signal decoder which generates a synthetic audio signal SAS from a supplied data stream of coded audio data AD.
- the generation of the synthetic audio signal SAS is divided into different subbands.
- frequency components of the synthetic audio signal SAS attributable to a first, low subband are generated separately from frequency components of the synthetic audio signal SAS attributable to a second, high subband.
- the low subband is also referred to below as narrowband.
- the supplied audio data AD is decoded by a low-band decoder LBD specific to the low subband, i.e. a decoder whose bandwidth essentially covers only the low subband.
- a synthetic excitation signal u(k) is formed by a high-band excitation signal generator HBG on the basis of the side information g_FIX, g_LTP and λ_LTP extracted by the low-band decoder LBD.
- the variable k here and in the following denotes an index by which digital samples of the excitation signal or other signals are indexed.
- an audio signal encoder can also be realized in a simple manner.
- the synthesized audio signal SAS is to be forwarded to a comparison device (not shown), which compares the synthesized audio signal SAS with an audio signal to be encoded.
- the synthesized audio signal SAS is then adjusted to the audio signal to be encoded.
- the invention may be used to advantage for general audio coding, subband audio synthesis and artificial bandwidth expansion of audio signals.
- the latter can be interpreted as a special case of a subband audio synthesis in which information about a particular subband is used to reconstruct or estimate missing frequency components of another subband.
- the aforementioned applications are based on a suitably formed excitation signal u (k).
- the excitation signal u(k), which represents a spectral fine structure of an audio signal, can be shaped by the audio synthesis filter ASYN in different ways, e.g. by shaping its time and/or frequency response.
- the synthetic excitation signal u(k) is preferably generated so that an overtone-to-noise ratio, i.e. an energy or intensity ratio of the tonal and atonal components of the original audio signal, is reproduced as accurately as possible.
- a broadband noise component is generally added to the harmonics of the audio fundamental frequency F 0 .
- This noise component is often dominant at higher frequencies, in particular from 6 kHz.
- the excitation signal u(k) is generated as a subband signal sampled at a predetermined sampling rate of e.g. 16 kHz or 8 kHz.
- This subband signal u(k) represents the frequency components of the high subband of 4-8 kHz, by which the bandwidth of the narrowband audio signal NAS is to be extended.
- the narrowband audio signal NAS extends over a frequency range of 0-4 kHz and is sampled at a sampling rate of 8 kHz.
- the excitation signal u(k) thus formed excites the audio synthesis filter ASYN and is thereby shaped into the high-band audio signal HAS.
- the synthetic broadband audio signal SAS is finally generated by combining the shaped high-band audio signal HAS and the narrowband audio signal NAS at a higher sampling rate of e.g. 16 kHz.
- the generation of the excitation signal u(k) is based on an audio generation model in which tonal, i.e. voiced, sounds are excited by a sequence of pulses and atonal, i.e. unvoiced, sounds are excited by noise, preferably white noise.
- Various modifications are contemplated to allow for mixed forms of excitation that can provide an improved auditory impression.
- the generation of the tonal components of the excitation signal u(k) is based on two audio parameters of the audio generation model, namely the audio fundamental frequency F0 and the energy or intensity ratio between the tonal and atonal audio components in the low subband.
- the latter is often referred to as the overtone-to-noise ratio or "harmonics-to-noise ratio", HNR for short.
- the audio fundamental frequency F0 is also called the "fundamental speech frequency".
- Both audio parameters, F0 and the overtone-to-noise ratio, can be extracted at the receiver of a transmitted audio signal, preferably either directly from the low frequency band of the audio signal (e.g. in the case of a bandwidth extension) or from the low-band decoder of an underlying low-band audio codec (e.g. in the case of subband audio synthesis), where such audio parameters are usually available.
- the audio fundamental frequency F0 is often represented by a fundamental period value, given by the sampling rate divided by the audio fundamental frequency F0.
- the fundamental period value is often referred to as "pitch lag".
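- The relation between the fundamental frequency and the pitch lag can be written out as a small sketch. The function names and the example values (8 kHz sampling rate, 200 Hz fundamental) are illustrative assumptions.

```python
def pitch_lag(sampling_rate_hz, f0_hz):
    """Fundamental period in units of the sampling interval (may be non-integer)."""
    return sampling_rate_hz / f0_hz


def fundamental_frequency(sampling_rate_hz, lag):
    """Inverse relation: fundamental frequency in Hz for a given pitch lag."""
    return sampling_rate_hz / lag


lag = pitch_lag(8000, 200)                  # a 200 Hz voice sampled at 8 kHz
f0_back = fundamental_frequency(8000, lag)  # recovers the original frequency
```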
- the fundamental period value is an audio parameter generally transmitted by standard audio codecs, such as that of Recommendation G.729, for purposes of so-called "long-term prediction", LTP for short. If such a standard audio codec is used for the low subband, the audio fundamental frequency F0 may be determined or estimated from the LTP audio parameters provided by this audio codec.
- an LTP fundamental period value is transmitted with a temporal resolution, i.e. accuracy, that is a fraction 1/N of the sampling interval used by this audio codec.
- the LTP fundamental period value is provided with an accuracy of 1/3 of the sampling interval. In units of this sampling interval, the fundamental period value can thus also assume non-integer values.
- Such accuracy can be achieved by the relevant audio encoder, for example, by a sequence of so-called "open-loop" and "closed-loop" searches. The audio encoder attempts to find the fundamental period value at which the intensity or energy of an LTP residual signal is minimized.
- an LTP fundamental period value determined in this way can deviate from the fundamental period value corresponding to the actual audio fundamental frequency F0 of the tonal audio components, particularly in the case of strong background noise, and thus impair accurate reproduction of these tonal audio components.
- Typical deviations include period-doubling errors and period-halving errors. That is, the frequency corresponding to the deviating LTP fundamental period value is one half of, or twice, the actual audio fundamental frequency F0 of the tonal audio components.
- the function round maps its argument to the nearest integer.
- a moving average is formed over the post-processed fundamental period values λ_post of successive time frames for further smoothing.
- the moving average corresponds to a type of low-pass filtering.
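- A minimal sketch of such a smoothing step. The window length of three frames and the causal (past-frames-only) form are assumptions for illustration; the text does not state the averaging length.

```python
def moving_average(values, window):
    """Causal moving average: each output averages the current value and
    up to (window - 1) preceding values."""
    smoothed = []
    for i in range(len(values)):
        start = max(0, i - window + 1)
        chunk = values[start:i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


# Hypothetical per-frame period values with one period-doubling outlier.
lags = [40.0, 40.0, 80.0, 40.0, 40.0]
smoothed = moving_average(lags, 3)
```

- The outlier is spread over several frames and reduced in height, which is the low-pass behaviour mentioned above; it is not removed, so an averaging step of this kind complements rather than replaces a correction of the outlier itself.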
- tonal mixing parameters g_v and atonal mixing parameters g_uv for mixing the corresponding tonal and atonal components of the excitation signal u(k) in the high subband are derived per time frame from the subband-specific mixing parameters g_LTP and g_FIX of the low-band decoder LBD.
- the low-band decoder LBD is a so-called CELP decoder (CELP: Code-Excited Linear Prediction), which has a so-called adaptive or LTP codebook and a so-called fixed codebook.
- the intensity ratio between tonal and atonal signal components can be reconstructed from the mixing parameters g_LTP and g_FIX of the low-band decoder LBD.
- Both mixing parameters g_LTP and g_FIX can be extracted per time frame from the low-band decoder LBD.
- an instantaneous intensity ratio between the contributions of the adaptive and fixed codebooks, i.e. the overtone-to-noise ratio, can be determined by dividing the energy contributions of the adaptive and fixed codebooks.
- the mixing parameter g_LTP of a time frame indicates a gain for the adaptive codebook signals.
- the mixing parameter g_FIX of a time frame indicates a gain for the fixed codebook signals.
- This "Wiener" filtering further lowers a small overtone-to-noise ratio (atonal audio segment) while barely changing large values of the ratio (tonally dominated audio segment). Such a reduction better approximates natural audio signals.
- both mixing parameters g_v and g_uv generally have non-vanishing values at the same time.
- the above calculation rule ensures that the sum of the squares of the mixing parameters g_v and g_uv, i.e. a total energy of the mixed excitation signal u(k), is substantially constant.
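- The calculation rule itself is not reproduced in this text, so the following is only a sketch of one set of formulas with the properties described above: a ratio of codebook energies, a Wiener-style weighting that pushes small ratios lower while leaving large ones almost unchanged, and gains whose squares sum to 1. All function names, the exact formulas and the example numbers are assumptions.

```python
import math


def hnr_from_codebooks(g_ltp, g_fix, e_adaptive, e_fixed):
    """Assumed ratio of adaptive-codebook energy to fixed-codebook energy.

    e_adaptive and e_fixed are the energies of the unscaled codebook vectors;
    g_fix and e_fixed must be non-zero.
    """
    return (g_ltp ** 2 * e_adaptive) / (g_fix ** 2 * e_fixed)


def wiener_weight(hnr):
    """Assumed Wiener-style weighting: multiply the ratio by hnr / (1 + hnr)."""
    return hnr * hnr / (1.0 + hnr)


def mixing_gains(hnr):
    """Tonal and atonal gains with g_v**2 + g_uv**2 == 1 (constant total energy)."""
    g_v = math.sqrt(hnr / (1.0 + hnr))
    g_uv = math.sqrt(1.0 / (1.0 + hnr))
    return g_v, g_uv


hnr = hnr_from_codebooks(g_ltp=0.8, g_fix=0.4, e_adaptive=1.0, e_fixed=1.0)
g_v, g_uv = mixing_gains(wiener_weight(hnr))
```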
- the generation of the excitation signal u(k) on the basis of the audio parameters g_v, g_uv and λ_p derived from the low-band decoder LBD is explained in more detail below using the example of two embodiment variants of the excitation signal generator HBG.
- the following statements are of course readily generalizable to any values of N.
- a first embodiment of the excitation signal generator HBG is shown schematically in FIG. 2.
- the noise generator NOISE preferably generates white noise.
- the pulse generator PG1 in turn comprises a rectangular pulse generator SPG and a pulse shaping filter SF with a predetermined filter coefficient set p (k) of finite length. While the noise generator NOISE serves to generate the atonal components of the excitation signal u (k), the pulse generator PG1 contributes to the generation of the tonal components of the excitation signal u (k).
- the audio parameters g_v, g_uv and λ_p are continuously derived, time frame by time frame, from the audio parameters of the low-band decoder LBD, or are derived and adapted by means of a suitable audio parameter extraction block.
- the filter operations are designed for a fractional fundamental period parameter λ_p with an accuracy of 1/(2N), here equal to 1/6, in units of the sampling interval of the low-band decoder LBD, and for a target bandwidth corresponding to the bandwidth of the low-band decoder LBD.
- the pulse generator PG1 accordingly operates at a sampling rate of 6 × 8 kHz = 48 kHz.
- the rectangular pulse generator SPG generates individual rectangular pulses at a time interval given by 6·λ_p in units of the sampling interval 1/48000 s of the rectangular pulse generator SPG.
- the individual rectangular pulses have an amplitude of √(6·λ_p), such that the average energy of a long pulse sequence is substantially constant and equal to 1.
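- A minimal sketch of such a pulse generator on the 48 kHz grid. It assumes single-sample pulses and an amplitude equal to the square root of the pulse spacing, which is the value that makes the mean energy per sample equal to 1; the function name and the example period are illustrative.

```python
import math


def pulse_train(lambda_p, n_pulses, oversampling=6):
    """Pulses spaced oversampling * lambda_p samples apart on the fine grid.

    lambda_p is given in units of the low-band sampling interval with an
    accuracy of 1/oversampling, so the spacing is a whole number of samples.
    """
    spacing = round(oversampling * lambda_p)
    amplitude = math.sqrt(spacing)          # normalizes mean energy per sample to 1
    samples = [0.0] * (spacing * n_pulses)
    for i in range(n_pulses):
        samples[i * spacing] = amplitude
    return samples


train = pulse_train(lambda_p=40 + 1 / 6, n_pulses=10)
mean_energy = sum(x * x for x in train) / len(train)
```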
- the rectangular pulses generated by the rectangular pulse generator SPG are multiplied by the "tonal" mixing parameter g v and fed to the pulse shaping filter SF.
- the rectangular pulses are smeared to a certain extent by convolution or correlation with the filter coefficients p(k).
- This filtering can considerably reduce the so-called crest factor, i.e. the ratio of peak to average sample values, and considerably improve the audio quality of the synthesized audio signal SAS.
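- The effect on the crest factor can be checked numerically. The sketch below takes the crest factor as peak value divided by RMS value and uses four equal, unit-energy coefficients as a stand-in for p(k); both choices, and the circular convolution used to avoid edge effects, are assumptions for illustration.

```python
import math


def crest_factor(samples):
    """Peak absolute value divided by the RMS value."""
    peak = max(abs(x) for x in samples)
    rms = math.sqrt(sum(x * x for x in samples) / len(samples))
    return peak / rms


def circular_convolve(samples, coeffs):
    """Convolution with wrap-around, so a periodic input stays periodic."""
    n = len(samples)
    return [sum(c * samples[(i - j) % n] for j, c in enumerate(coeffs))
            for i in range(n)]


period = 8
pulses = [math.sqrt(period) if k % period == 0 else 0.0 for k in range(64)]
p = [0.5, 0.5, 0.5, 0.5]            # hypothetical unit-energy smoothing coefficients
smeared = circular_convolve(pulses, p)

crest_before = crest_factor(pulses)
crest_after = crest_factor(smeared)
```

- In this toy case the energy is unchanged while the peak is halved, so the crest factor drops from √8 to √2.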
- the rectangular pulses may advantageously be spectrally shaped by the pulse shaping filter SF.
- the pulse shaping filter SF may have a band-pass characteristic with a transition region around 4 kHz and a substantially uniform increase in attenuation toward higher and lower frequencies. In this way it can be achieved that higher frequencies of the excitation signal u(k) contain fewer harmonic components, so that the noise component increases with increasing frequency.
- An exemplary choice of the filter coefficients p(k) is shown schematically in FIGS. 3a and 3b. FIG. 3a shows the filter coefficients p(k) plotted against their sample index k, while FIG. 3b shows the energy spectrum of the filter coefficients p(k) plotted against frequency. For the target frequency range of the present exemplary embodiment, essentially only the spectral range of 4-8 kHz of the filter coefficients p(k) is relevant. This frequency range is indicated in FIG. 3b by a widened line.
- the rectangular pulses "blurred" by the pulse shaping filter SF are added to a noise signal generated by the noise generator NOISE and multiplied by the "atonal" mixing parameter g_uv, and the resultant sum signal is fed to the low-pass filter LP.
- the generated excitation signal u(k) contains the frequency components required for bandwidth expansion. However, these are present as a spectrum mirrored around the 4 kHz frequency. In order to invert the spectrum, the excitation signal u(k) can be modulated with modulation factors (-1)^k.
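- That modulation with the factors (-1)^k flips a sampled spectrum can be verified with a small discrete Fourier transform: a component at frequency f moves to fs/2 − f, where fs is the sampling rate. The sketch below uses a single test tone and a 32-point transform; the helper names are illustrative.

```python
import cmath
import math


def dft_magnitudes(samples):
    """Magnitudes of the discrete Fourier transform (direct O(n^2) evaluation)."""
    n = len(samples)
    return [abs(sum(x * cmath.exp(-2j * math.pi * b * k / n)
                    for k, x in enumerate(samples)))
            for b in range(n)]


def peak_bin(samples):
    """Index of the strongest bin between 0 and half the sampling rate."""
    mags = dft_magnitudes(samples)
    half = mags[: len(samples) // 2 + 1]
    return half.index(max(half))


n = 32
tone = [math.cos(2 * math.pi * 3 * k / n) for k in range(n)]   # energy in bin 3
mirrored = [x * (-1) ** k for k, x in enumerate(tone)]         # modulation by (-1)^k

bin_before = peak_bin(tone)
bin_after = peak_bin(mirrored)
```

- Bin 3 moves to bin 13, i.e. n/2 − 3, so low and high frequencies within the band swap places.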
- the tonal and atonal components of the excitation signal u (k) can be treated independently of each other.
- filtering and decimation operations for the tonal audio components can also be combined in a single processing block.
- the impulse response of all filtering, decimation and modulation operations provided in FIG. 2 for the tonal audio components can be calculated in advance and stored in a suitable form in a look-up table.
- A second embodiment of the excitation signal generator HBG configured in this way is illustrated schematically in FIG. 4 and is explained below.
- the pulse generator PG2 in turn comprises a pulse positioner PP and a look-up table LOOKUP, in which predetermined pulse shapes v j (k) are stored.
- the noise generator NOISE serves to generate the atonal components of the excitation signal u (k)
- the pulse generator PG2 contributes to the generation of the tonal components of the excitation signal u (k).
- the derivation of the audio parameters g_v, g_uv and λ_p has already been explained above.
- the fractional fundamental period parameter λ_p is given, as above, with an accuracy of 1/(2N), here equal to 1/6, in units of the sampling interval of the low-band decoder LBD.
- the impulse response of all filtering, decimation and modulation operations of FIG. 2 can be calculated in advance and stored in the form of specific pulse shapes v_j(k) in the look-up table LOOKUP.
- several pulse shapes v_j(k) are held in the look-up table LOOKUP.
- the number of pulse shapes v_j(k) to be provided is preferably given by the inverse of the accuracy of the fundamental period parameter λ_p, i.e. here by 2N.
- the index j runs from 0 to 2N-1.
- the look-up table LOOKUP is supplied with the fractional component λ_p − ⌊λ_p⌋ of the respective fundamental period parameter λ_p.
- the bracket ⌊ ⌋ denotes the integer part of a rational or real number.
- a pulse shape is selected from the stored pulse shapes v_j(k) and a correspondingly shaped pulse is output from the look-up table LOOKUP.
- λ_p − ⌊λ_p⌋ can assume the values 0, 1/6, 2/6, 3/6, 4/6 and 5/6.
- the pulse shape v_j(k) selected is the one whose index j corresponds to the numerator of the relevant fraction.
- Each of the stored pulse shapes v_j(k) corresponds to an impulse response of the chain shown in FIG. 2, consisting of the filters SF, LP, D3, HP and D2 (and optionally a modulator), for a particular fractional component λ_p − ⌊λ_p⌋ of the fundamental period parameter λ_p.
- the illustrated pulse shapes v_j(k) apply to a fractional resolution of λ_p of 1/6 (at a sampling rate of 8 kHz) and are plotted against their sample index k.
- The assignment of a respective pulse shape v_j(k) to the associated fractional component λ_p − ⌊λ_p⌋ can be taken from the legend of FIG. 5.
- the pulse output by the look-up table LOOKUP, which has a pulse shape selected on the basis of the fractional component λ_p − ⌊λ_p⌋, is multiplied by the "tonal" mixing parameter g_v and fed to the pulse positioner PP.
- the supplied pulses are positioned in time depending on the integer part ⌊λ_p⌋ of the fundamental period parameter λ_p.
- the pulses are output by the pulse positioner PP at a time interval which corresponds to the integer part ⌊λ_p⌋ of the fundamental period parameter λ_p.
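- The split of the fundamental period parameter into a look-up index and a pulse spacing can be sketched as follows. The function name is hypothetical, and the wrap-around for fractions that round up to a whole sample is an added safeguard for floating-point input, not something stated in the text.

```python
import math


def split_period(lambda_p, resolution=6):
    """Return (integer part, look-up index j) for a period with 1/resolution accuracy.

    The integer part gives the pulse spacing in low-band samples; j is the
    numerator of the fractional part and selects one of `resolution` pulse shapes.
    """
    integer_part = math.floor(lambda_p)
    j = round((lambda_p - integer_part) * resolution)
    if j == resolution:          # fractional part rounded up to a full sample
        integer_part += 1
        j = 0
    return integer_part, j


spacing, j = split_period(40 + 2 / 6)    # hypothetical period of 40 2/6 samples
```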
- the pulses may be modulated by inverting the sign of the pulse shapes v_j(k), or of the respective pulses, either for even values of ⌊λ_p⌋ or for odd values of ⌊λ_p⌋.
- the noise signal of the noise generator NOISE multiplied by the "atonal" mixing parameter g_uv is finally added to obtain the excitation signal u(k).
- the embodiment illustrated in FIG. 4 can generally be implemented with less effort than the embodiment variant illustrated in FIG. 2.
- by specifying suitable pulse shapes v_j(k), an excitation signal generator according to FIG. 4 can generate effectively the same excitation signals u(k) as an excitation signal generator according to FIG. 2. Since the output pulses are spaced relatively far apart (typically 20-134 sampling intervals), the computational effort for an excitation signal generator according to the invention as shown in FIG. 4 is relatively low.
- the invention can be implemented by means of a low-cost digital signal processor with relatively low memory and computing power requirements.
Claims (15)
- Procédé de formation d'un signal audio (SAS) à partir d'un flux de données audio (AD) codées, avec lequela) des composantes fréquentielles (NAS) du signal audio incluses dans une première sous-bande basse sont formées au moyen d'un décodeur de sous-bande (LBD) à l'aide de valeurs de période fondamentale (λLTP) contenues dans les données audio (AD) et indiquant chacune une période fondamentale du signal audio (SAS),b) des composantes fréquentielles (HAS) du signal audio incluses dans une seconde sous-bande haute sont formées par excitation d'un filtre de synthèse audio (ASYN) au moyen d'un signal d'excitation (u(k)) spécifique à la seconde sous-bande, etc) pour la génération du signal d'excitation (u(k)) par un générateur de signal d'excitation (HBG)- un paramètre de période fondamentale (λp) est déduit des valeurs de période fondamentale (λLTP) ainsi que- des impulsions ayant une forme d'impulsion dépendante du paramètre de période fondamentale (λp) sont formées dans un intervalle de temps défini par le paramètre de période fondamentale (λp) et mélangées à un signal de bruit.
- Procédé selon la revendication 1,
caractérisé
en ce qu'un premier intervalle d'échantillonnage spécifique à la première sous-bande est attribué au décodeur de sous-bande (LBD) et
en ce que le paramètre de période fondamentale (λp) indique la période fondamentale du signal audio (SAS) jusqu'à une fraction du premier intervalle d'échantillonnage. - Procédé selon la revendication 2,
caractérisé en ce que
la forme d'impulsion (vj(k)) d'une impulsion respective est sélectionnée en fonction d'une partie non entière (λp - └λp┘) du paramètre de période fondamentale (λp) selon des unités du premier intervalle d'échantillonnage parmi différentes formes d'impulsion (vj(k)) prédéfinies, enregistrées dans une table de consultation. - Procédé selon la revendication 2 ou 3,
caractérisé en ce que l'intervalle de temps des impulsions est défini par une partie entière (└λp┘) du paramètre de période fondamentale (λp) selon des unités du premier intervalle d'échantillonnage. - Procédé selon la revendication 2 ou 3,
characterized
in that the pulses are formed, from a predefined pulse shape, by sample values having a second sampling interval, the second sampling interval being smaller than the first sampling interval by a bandwidth extension factor (N), and
in that the time interval of the pulses is defined, in units of the second sampling interval, by the fundamental period parameter (λp) multiplied by the bandwidth extension factor (N).
- Method according to claim 5,
characterized in that
the pulses are formed by a pulse shaping filter (SF) having predefined filter coefficients (p(k)) at the second sampling interval.
- Method according to claim 5 or 6,
characterized in that,
before or after the noise signal is mixed in, the pulses are decimated by at least one decimator (D2, D3).
- Method according to any one of the preceding claims, characterized in that
the pulses are filtered, before or after the noise signal is mixed in, by at least one high-pass, low-pass or band-pass filter.
- Method according to any one of the preceding claims, characterized in that
the fundamental period parameter (λp) is derived, per time frame, from one or more fundamental period values (λLTP).
- Method according to any one of the preceding claims, characterized in that
the fundamental period parameter (λp) is derived from fundamental period values (λLTP) of several time frames, combined so as to compensate for fluctuations.
- Method according to any one of the preceding claims, characterized in that
a relative deviation (e) of a current fundamental period value (λLTP) from an earlier fundamental period value, or from a quantity (λpost) derived therefrom, is determined and is attenuated in the course of deriving the fundamental period parameter (λp).
- Method according to any one of the preceding claims, characterized in that
a mixing ratio between the pulses and the noise signal is defined by at least one mixing parameter (gv, guv), which is derived, per time frame, from a level ratio (γ), available in the subband decoder (LBD), between a tonal portion and an atonal portion of the audio signal of the first subband.
- Method according to claim 12,
characterized in that,
in deriving the mixing parameter (gv, guv), the level ratio (γ) is shifted such that, when the atonal portion of the audio signal predominates, the tonal portion of the audio signal is reduced.
- Audio signal decoder for forming an audio signal (SAS) from a stream of coded audio data (AD), comprising
  a) a subband decoder (LBD) for forming frequency components (NAS) of the audio signal lying in a first, lower subband using fundamental period values (λLTP) that are contained in the audio data (AD) and each indicate a fundamental period of the audio signal (SAS),
  b) an audio synthesis filter (ASYN), and
  c) an excitation signal generator (HBG) for generating an excitation signal (u(k)) for forming frequency components (HAS) of the audio signal lying in a second, higher subband by exciting the audio synthesis filter, the excitation signal generator (HBG) comprising:
    - a derivation device for deriving a fundamental period parameter (λp) from the fundamental period values (λLTP),
    - a noise generator (NOISE) for forming a noise signal,
    - a pulse generator (PG1, PG2) for forming pulses having a pulse shape dependent on the fundamental period parameter (λp) at a time interval defined by the fundamental period parameter (λp), and
    - a mixing device for mixing the pulses with the noise signal.
- Audio signal encoder comprising an audio signal decoder according to claim 14 and a comparison device for matching an audio signal (SAS) synthesized by the audio signal decoder to an audio signal to be coded, the synthesized audio signal (SAS) being matched to the audio signal to be coded by varying the audio data (AD), in particular the side information (GFIX, gLTP, λLTP) contained therein.
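The excitation generation recited in claims 1, 3, 4 and 12 can be sketched roughly as follows. This is a minimal illustration and not the patented implementation: the function names, the windowed-sinc pulse table, the Gaussian noise source and the energy-preserving mapping from γ to (gv, guv) are all assumptions introduced here. The claims themselves only state that the pulse shape is looked up from the non-integer part of λp, that the pulse spacing follows the integer part ⌊λp⌋, and that the mixing parameters are derived from the level ratio γ.

```python
import math
import random


def make_pulse_table(num_shapes: int = 4, half_len: int = 3) -> list[list[float]]:
    """Hypothetical lookup table v_j(k): one pulse per fractional offset j/num_shapes."""
    table = []
    for j in range(num_shapes):
        frac = j / num_shapes
        shape = []
        for k in range(-half_len, half_len + 1):
            t = k - frac  # prototype pulse delayed by the fractional offset
            sinc = 1.0 if t == 0 else math.sin(math.pi * t) / (math.pi * t)
            window = 0.5 * (1.0 + math.cos(math.pi * t / (half_len + 1)))
            shape.append(sinc * window)
        table.append(shape)
    return table


def mixing_gains(gamma: float) -> tuple[float, float]:
    """Assumed mapping from level ratio gamma to (g_v, g_uv) with g_v^2 + g_uv^2 = 1."""
    if gamma < 0:
        raise ValueError("level ratio must be non-negative")
    g_v = math.sqrt(gamma / (1.0 + gamma))   # weight of the pulse (tonal) part
    g_uv = math.sqrt(1.0 / (1.0 + gamma))    # weight of the noise (atonal) part
    return g_v, g_uv


def excitation(lambda_p: float, gamma: float, length: int,
               table: list[list[float]], seed: int = 0) -> list[float]:
    """Form pulses spaced by floor(lambda_p) and mix them with noise."""
    if lambda_p < 1:
        raise ValueError("fundamental period parameter must be at least 1 sample")
    rng = random.Random(seed)
    spacing = math.floor(lambda_p)           # integer part sets the pulse spacing
    frac = lambda_p - spacing                # non-integer part selects the shape
    index = min(int(frac * len(table)), len(table) - 1)
    shape = table[index]
    half_len = len(shape) // 2

    pulses = [0.0] * length
    for centre in range(0, length, spacing):
        for i, value in enumerate(shape):
            k = centre + i - half_len
            if 0 <= k < length:
                pulses[k] += value

    g_v, g_uv = mixing_gains(gamma)
    # Gaussian noise is an illustrative stand-in for the noise generator.
    return [g_v * p + g_uv * rng.gauss(0.0, 1.0) for p in pulses]
```

For instance, `excitation(40.5, 1.0, 160, make_pulse_table())` returns 160 samples with a pulse every 40 samples, each shaped by the table entry for a half-sample offset, and mixed with noise at equal gains. In this simplified sketch the fractional part only selects the shape and is not accumulated across pulses.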
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/EP2006/000812 WO2007087824A1 (fr) | 2006-01-31 | 2006-01-31 | Procede et dispositifs pour le codage de signaux audio |
Publications (2)
Publication Number | Publication Date |
---|---|
EP1979901A1 (fr) | 2008-10-15 |
EP1979901B1 (fr) | 2015-10-14 |
Family
ID=36616862
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP06706508.6A Active EP1979901B1 (fr) | 2006-01-31 | 2006-01-31 | Procede et dispositifs pour le codage de signaux audio |
Country Status (4)
Country | Link |
---|---|
US (1) | US8612216B2 (fr) |
EP (1) | EP1979901B1 (fr) |
CN (1) | CN101336451B (fr) |
WO (1) | WO2007087824A1 (fr) |
Families Citing this family (26)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP4972742B2 (ja) * | 2006-10-17 | 2012-07-11 | 国立大学法人九州工業大学 | 高域信号補間方法及び高域信号補間装置 |
US8639500B2 (en) * | 2006-11-17 | 2014-01-28 | Samsung Electronics Co., Ltd. | Method, medium, and apparatus with bandwidth extension encoding and/or decoding |
KR101379263B1 (ko) * | 2007-01-12 | 2014-03-28 | 삼성전자주식회사 | 대역폭 확장 복호화 방법 및 장치 |
WO2010028292A1 (fr) | 2008-09-06 | 2010-03-11 | Huawei Technologies Co., Ltd. | Prédiction de fréquence adaptative |
US8532998B2 (en) * | 2008-09-06 | 2013-09-10 | Huawei Technologies Co., Ltd. | Selective bandwidth extension for encoding/decoding audio/speech signal |
US8515747B2 (en) | 2008-09-06 | 2013-08-20 | Huawei Technologies Co., Ltd. | Spectrum harmonic/noise sharpness control |
US8577673B2 (en) | 2008-09-15 | 2013-11-05 | Huawei Technologies Co., Ltd. | CELP post-processing for music signals |
WO2010031003A1 (fr) | 2008-09-15 | 2010-03-18 | Huawei Technologies Co., Ltd. | Addition d'une seconde couche d'amélioration à une couche centrale basée sur une prédiction linéaire à excitation par code |
EP2346032B1 (fr) * | 2008-10-24 | 2014-05-07 | Mitsubishi Electric Corporation | Suppresseur de bruit et decodeur de parole |
CN101599272B (zh) * | 2008-12-30 | 2011-06-08 | 华为技术有限公司 | 基音搜索方法及装置 |
JP5552988B2 (ja) * | 2010-09-27 | 2014-07-16 | 富士通株式会社 | 音声帯域拡張装置および音声帯域拡張方法 |
US8924200B2 (en) * | 2010-10-15 | 2014-12-30 | Motorola Mobility Llc | Audio signal bandwidth extension in CELP-based speech coder |
US8868432B2 (en) * | 2010-10-15 | 2014-10-21 | Motorola Mobility Llc | Audio signal bandwidth extension in CELP-based speech coder |
KR20120046627A (ko) * | 2010-11-02 | 2012-05-10 | 삼성전자주식회사 | 화자 적응 방법 및 장치 |
CN108831501B (zh) | 2012-03-21 | 2023-01-10 | 三星电子株式会社 | 用于带宽扩展的高频编码/高频解码方法和设备 |
JP5998603B2 (ja) * | 2012-04-18 | 2016-09-28 | ソニー株式会社 | 音検出装置、音検出方法、音特徴量検出装置、音特徴量検出方法、音区間検出装置、音区間検出方法およびプログラム |
US9373337B2 (en) * | 2012-11-20 | 2016-06-21 | Dts, Inc. | Reconstruction of a high-frequency range in low-bitrate audio coding using predictive pattern analysis |
US8927847B2 (en) * | 2013-06-11 | 2015-01-06 | The Board Of Trustees Of The Leland Stanford Junior University | Glitch-free frequency modulation synthesis of sounds |
JP6385936B2 (ja) * | 2013-08-22 | 2018-09-05 | パナソニック インテレクチュアル プロパティ コーポレーション オブ アメリカ (Panasonic Intellectual Property Corporation of America) | 音声符号化装置およびその方法 |
US10083708B2 (en) * | 2013-10-11 | 2018-09-25 | Qualcomm Incorporated | Estimation of mixing factors to generate high-band excitation signal |
US10163447B2 (en) * | 2013-12-16 | 2018-12-25 | Qualcomm Incorporated | High-band signal modeling |
CN111710342B (zh) * | 2014-03-31 | 2024-04-16 | 弗朗霍弗应用研究促进协会 | 编码装置、解码装置、编码方法、解码方法及程序 |
US20170010733A1 (en) * | 2015-07-09 | 2017-01-12 | Microsoft Technology Licensing, Llc | User-identifying application programming interface (api) |
US10264116B2 (en) * | 2016-11-02 | 2019-04-16 | Nokia Technologies Oy | Virtual duplex operation |
CN109003621B (zh) * | 2018-09-06 | 2021-06-04 | 广州酷狗计算机科技有限公司 | 一种音频处理方法、装置及存储介质 |
WO2020157888A1 (fr) * | 2019-01-31 | 2020-08-06 | 三菱電機株式会社 | Dispositif d'extension de bande de fréquence, procédé d'extension de bande de fréquence et programme d'extension de bande de fréquence |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
DE69712537T2 (de) * | 1996-11-07 | 2002-08-29 | Matsushita Electric Industrial Co., Ltd. | Verfahren zur Erzeugung eines Vektorquantisierungs-Codebuchs |
US6377915B1 (en) * | 1999-03-17 | 2002-04-23 | Yrp Advanced Mobile Communication Systems Research Laboratories Co., Ltd. | Speech decoding using mix ratio table |
DE10041512B4 (de) | 2000-08-24 | 2005-05-04 | Infineon Technologies Ag | Verfahren und Vorrichtung zur künstlichen Erweiterung der Bandbreite von Sprachsignalen |
JP2003044098A (ja) * | 2001-07-26 | 2003-02-14 | Nec Corp | 音声帯域拡張装置及び音声帯域拡張方法 |
- 2006
  - 2006-01-31 WO PCT/EP2006/000812 patent/WO2007087824A1/fr active Application Filing
  - 2006-01-31 CN CN2006800521286A patent/CN101336451B/zh not active Expired - Fee Related
  - 2006-01-31 US US12/223,362 patent/US8612216B2/en active Active
  - 2006-01-31 EP EP06706508.6A patent/EP1979901B1/fr active Active
Also Published As
Publication number | Publication date |
---|---|
CN101336451B (zh) | 2012-09-05 |
EP1979901A1 (fr) | 2008-10-15 |
WO2007087824A1 (fr) | 2007-08-09 |
US20090024399A1 (en) | 2009-01-22 |
CN101336451A (zh) | 2008-12-31 |
US8612216B2 (en) | 2013-12-17 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
EP1979901B1 (fr) | Procede et dispositifs pour le codage de signaux audio | |
DE60012198T2 (de) | Kodierung der hüllkurve des spektrums mittels variabler zeit/frequenz-auflösung | |
DE60013785T2 (de) | VERBESSERTE SUBJEKTIVE QUALITäT VON SBR (SPECTRAL BAND REPLICATION)UND HFR (HIGH FREQUENCY RECONSTRUCTION) KODIERVERFAHREN DURCH ADDIEREN VON GRUNDRAUSCHEN UND BEGRENZUNG DER RAUSCHSUBSTITUTION | |
DE602004005846T2 (de) | Audiosignalgenerierung | |
DE69317958T2 (de) | Kodierer von Audiosignalen mit niedriger Verzögerung, unter Verwendung von Analyse-durch-Synthese-Techniken | |
DE69618422T2 (de) | Verfahren zur Sprachdekodierung und tragbares Endgerät | |
DE69910240T2 (de) | Vorrichtung und verfahren zur wiederherstellung des hochfrequenzanteils eines überabgetasteten synthetisierten breitbandsignals | |
DE69219718T2 (de) | Digitales Datenkodierungs-und Dekodierungsgerät mit hoher Wirksamkeit | |
DE102008015702B4 (de) | Vorrichtung und Verfahren zur Bandbreitenerweiterung eines Audiosignals | |
DE69916321T2 (de) | Kodierung eines verbesserungsmerkmals zur leistungsverbesserung in der kodierung von kommunikationssignalen | |
DE69518452T2 (de) | Verfahren für die Transformationskodierung akustischer Signale | |
DE69821089T2 (de) | Verbesserung von quellenkodierung unter verwendung von spektralbandreplikation | |
DE60202881T2 (de) | Wiederherstellung von hochfrequenzkomponenten | |
DE69634645T2 (de) | Verfahren und Vorrichtung zur Sprachkodierung | |
DE60310716T2 (de) | System für die audiokodierung mit füllung von spektralen lücken | |
DE60128121T2 (de) | Wahrnehmungsbezogen verbesserte aufbesserung kodierter akustischer signale | |
DE69621393T2 (de) | Quantisierung von Sprachsignalen in prädiktiven Kodiersystemen unter Verwendung von Modellen menschlichen Hörens | |
DE60103086T2 (de) | Verbesserung von quellcodierungssystemen durch adaptive transposition | |
DE69123500T2 (de) | 32 Kb/s codeangeregte prädiktive Codierung mit niedrigen Verzögerung für Breitband-Sprachsignal | |
EP1979899B1 (fr) | Procédé et dispositifs pour coder un signal audio | |
DE69328064T2 (de) | Zeit-Frequenzinterpolation mit Anwendung zur Sprachkodierung mit niedriger Rate | |
DE602005003358T2 (de) | Audiokodierung | |
DE60038279T2 (de) | Breitband Sprachkodierung mit parametrischer Kodierung des Hochfrequenzanteils | |
DE3710664A1 (de) | System zum uebertragen eines sprachsignals | |
EP1023777B1 (fr) | Procede et dispositif pour limiter un courant de donnees audio dont le debit binaire peut etre mis a l'echelle |
Legal Events
Code | Title | Description |
---|---|---|
PUAI | Public reference made under article 153(3) epc to a published international application that has entered the european phase | Free format text: ORIGINAL CODE: 0009012 |
17P | Request for examination filed | Effective date: 20080612 |
AK | Designated contracting states | Kind code of ref document: A1; Designated state(s): DE FR GB IT SE |
RIN1 | Information on inventor provided before grant (corrected) | Inventor name: TADDEI, HERVE; Inventor name: GARTNER, MARTIN; Inventor name: JAX, PETER; Inventor name: VARY, PETER; Inventor name: SCHANDL, STEFAN; Inventor name: GEISER, BERND |
RBV | Designated contracting states (corrected) | Designated state(s): DE FR GB IT SE |
DAX | Request for extension of the european patent (deleted) | |
17Q | First examination report despatched | Effective date: 20130926 |
RAP1 | Party data changed (applicant data changed or rights of an application transferred) | Owner name: UNIFY GMBH & CO. KG |
REG | Reference to a national code | Ref country code: DE; Ref legal event code: R079; Ref document number: 502006014585; Country of ref document: DE; Free format text: PREVIOUS MAIN CLASS: G10L0021020000; Ipc: G10L0019020000 |
GRAP | Despatch of communication of intention to grant a patent | Free format text: ORIGINAL CODE: EPIDOSNIGR1 |
RIC1 | Information provided on ipc code assigned before grant | Ipc: G10L 21/038 20130101ALI20150408BHEP; Ipc: G10L 19/02 20130101AFI20150408BHEP |
INTG | Intention to grant announced | Effective date: 20150512 |
GRAS | Grant fee paid | Free format text: ORIGINAL CODE: EPIDOSNIGR3 |
GRAA | (expected) grant | Free format text: ORIGINAL CODE: 0009210 |
AK | Designated contracting states | Kind code of ref document: B1; Designated state(s): DE FR GB IT SE |
REG | Reference to a national code | Ref country code: GB; Ref legal event code: FG4D; Free format text: NOT ENGLISH |
REG | Reference to a national code | Ref country code: DE; Ref legal event code: R096; Ref document number: 502006014585; Country of ref document: DE |
REG | Reference to a national code | Ref country code: FR; Ref legal event code: PLFP; Year of fee payment: 11 |
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] | Ref country code: IT; Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT; Effective date: 20151014 |
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] | Ref country code: SE; Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT; Effective date: 20151014 |
REG | Reference to a national code | Ref country code: DE; Ref legal event code: R097; Ref document number: 502006014585; Country of ref document: DE |
PLBE | No opposition filed within time limit | Free format text: ORIGINAL CODE: 0009261 |
STAA | Information on the status of an ep patent application or granted ep patent | Free format text: STATUS: NO OPPOSITION FILED WITHIN TIME LIMIT |
26N | No opposition filed | Effective date: 20160715 |
REG | Reference to a national code | Ref country code: DE; Ref legal event code: R082; Ref document number: 502006014585; Country of ref document: DE; Representative's name: SCHAAFHAUSEN PATENTANWAELTE PARTNERSCHAFTSGESE, DE. Ref country code: DE; Ref legal event code: R082; Ref document number: 502006014585; Country of ref document: DE; Representative's name: FRITZSCHE PATENTANWAELTE, DE. Ref country code: DE; Ref legal event code: R081; Ref document number: 502006014585; Country of ref document: DE; Owner name: UNIFY GMBH & CO. KG, DE; Free format text: FORMER OWNER: UNIFY GMBH & CO. KG, 81379 MUENCHEN, DE |
REG | Reference to a national code | Ref country code: FR; Ref legal event code: PLFP; Year of fee payment: 12 |
REG | Reference to a national code | Ref country code: FR; Ref legal event code: PLFP; Year of fee payment: 13 |
REG | Reference to a national code | Ref country code: DE; Ref legal event code: R082; Ref document number: 502006014585; Country of ref document: DE; Representative's name: SCHAAFHAUSEN PATENTANWAELTE PARTNERSCHAFTSGESE, DE |
PGFP | Annual fee paid to national office [announced via postgrant information from national office to epo] | Ref country code: DE; Payment date: 20240119; Year of fee payment: 19. Ref country code: GB; Payment date: 20240124; Year of fee payment: 19 |
PGFP | Annual fee paid to national office [announced via postgrant information from national office to epo] | Ref country code: FR; Payment date: 20240124; Year of fee payment: 19 |