EP1979901A1

EP1979901A1 - Method and arrangements for audio signal encoding

Info

Publication number: EP1979901A1
Application number: EP06706508A
Authority: EP
Inventors: Martin Gartner; Bernd Geiser; Peter Jax; Stefan Schandl; Herve Taddei; Peter Vary
Original assignee: Siemens Enterprise Communications GmbH and Co KG
Current assignee: Unify GmbH and Co KG
Priority date: 2006-01-31
Filing date: 2006-01-31
Publication date: 2008-10-15
Anticipated expiration: 2026-01-31
Also published as: EP1979901B1; CN101336451B; WO2007087824A1; US20090024399A1; CN101336451A; US8612216B2

Abstract

To form an audio signal (SAS), frequency components (NAS) of the audio signal which are allotted to a first subband are formed by means of a subband decoder (LBD) using supplied fundamental period values (λLTP) which respectively indicate a fundamental period for the audio signal. Frequency components (HAS) of the audio signal which are allotted to a second subband are formed by exciting an audio synthesis filter (ASYN) using an excitation signal (u(k)) which is specific to the second subband. To produce this excitation signal (u(k)), an excitation signal generator (HBG) derives a fundamental period parameter (λp) from the fundamental period values (λLTP). The fundamental period parameter (λp) is used by the excitation signal generator (HBG) to form pulses with a pulse shape which is dependent on the fundamental period parameter (λp) at an interval of time which is determined by the fundamental period parameter (λp) and to mix them with a noise signal.

Description

Methods and arrangements for audio signal coding

The invention relates to a method and arrangements for audio signal coding. In particular, the invention relates to a method and an audio signal decoder for forming an audio signal and an audio signal encoder.

In many contemporary communication systems, and in particular in mobile communication systems, only limited transmission bandwidth is available for real-time audio transmissions such as voice or music. In order to transmit as many audio channels as possible in real time over a bandwidth-limited transmission link, e.g. a radio network, it is therefore common to compress the audio signals to be transmitted by real-time or quasi-real-time capable audio coding methods and to decompress them after transmission. In the following, the term audio is understood to include speech in particular.

In such audio coding methods, the aim is usually to reduce the amount of data to be transmitted, and thus the transmission rate, as much as possible without unduly impairing the subjective auditory impression or, in the case of voice transmissions, the intelligibility.

An efficient compression of audio signals is also an essential aspect in connection with the storage or archiving of audio signals.

Coding methods in which an audio signal synthesized by an audio synthesis filter is matched, time frame by time frame, to an audio signal to be transmitted by optimizing filter parameters prove to be particularly efficient. Such a procedure is often referred to as analysis-by-synthesis. The audio synthesis filter is excited by an excitation signal, which is preferably also optimized. This filtering is often referred to as formant synthesis. For example, so-called LPC coefficients (LPC: Linear Predictive Coding) and/or parameters specifying a spectral and/or temporal envelope of the audio signal can be used as filter parameters. The optimized filter parameters and the parameters specifying the excitation signal are then transmitted to the receiver frame by frame, so that an audio signal decoder on the receiver side can form a synthetic audio signal that is as similar as possible to the original audio signal with regard to the subjective auditory impression.

Such an audio coding method is known from ITU-T Recommendation G.729. By means of the audio coding method described there, a real-time audio signal with a bandwidth of 4 kHz can be reduced to a transmission rate of 8 kbit/s.

In addition, there are current efforts to synthesize an audio signal to be transmitted with a higher bandwidth in order to improve the auditory impression. In the currently discussed G.729EV extension of Recommendation G.729, an attempt is made to extend the audio bandwidth from 4 kHz to 8 kHz.

The achievable transmission bandwidth and audio synthesis quality depend essentially on the generation of a suitable excitation signal.

In the case of a bandwidth extension in which an excitation signal u_nb(k) is already present in a low subband, for example in the frequency range from 50 Hz to 3.4 kHz, a bandwidth-extending excitation signal u_hb(k) in a high subband, e.g. in the frequency range of 3.4-7 kHz, can be formed as a spectral copy of the narrow-band excitation signal u_nb(k). (Here and below, the index k denotes samples of the excitation signal or of other signals.) The copy can be formed by spectral translation or by spectral mirroring of the narrow-band excitation signal u_nb(k). Such spectral translation or mirroring, however, distorts the spectrum of the excitation signal anharmonically and/or causes a significant, audible phase error in the spectrum. This leads to an audible loss of quality in the audio signal.

It is an object of the present invention to provide a method for forming an audio signal which allows the audio quality to be improved while the transmission bandwidth is not increased, or is increased only slightly. It is a further object of the invention to specify an audio signal decoder for carrying out the method, as well as an audio signal encoder.

This object is achieved by a method with the features of claim 1, by an audio signal decoder with the features of claim 14 and by an audio signal encoder with the features of claim 15.

In the method according to the invention for forming an audio signal, frequency components of the audio signal attributable to a first subband are formed by means of a subband decoder on the basis of supplied basic period values, each indicating a fundamental period of the audio signal. Frequency components of the audio signal attributable to a second subband are formed by exciting an audio synthesis filter by means of an excitation signal specific to the second subband. To generate the excitation signal specific to the second subband, a basic period parameter is derived from the basic period values by an excitation signal generator. Based on the basic period parameter, the excitation signal generator forms pulses having a pulse shape dependent on the basic period parameter at a time interval determined by the basic period parameter, and mixes them with a noise signal.

By means of the invention, frequency components of the audio signal attributable to a further, second subband can be synthesized on the basis of basic period values which are already available for a subband decoder specific to the first subband. Since no additional audio parameters are generally required for the generation of the noise signal, the generation of the excitation signal generally requires no additional transmission bandwidth. By adding the frequency components of the further, second subband, however, the audio quality of the audio signal can be considerably improved, in particular since a harmonic content determined by the basic period values can be reproduced in the second subband.

Advantageous embodiments and further developments of the invention are specified in the dependent claims.

According to an advantageous embodiment of the invention, the basic period parameter may specify the fundamental period of the audio signal to an accuracy of a fraction of a first sampling interval associated with the subband decoder. With a basic period parameter that is accurate to a fraction, preferably 1/N with integer N, of the first sampling interval, the pulses can be spaced with a higher accuracy than that of the subband decoder, so that a harmonic spectrum of the audio signal in the second subband can be modeled more finely.

Furthermore, the pulse shape of a respective pulse can be selected from different pulse shapes stored in a look-up table, depending on the non-integer part of the basic period parameter in units of the first sampling interval. Very different pulse shapes can be retrieved from the look-up table in real time by a simple access with little circuit, processing or computational effort. The pulse shapes to be stored can be optimized in advance with regard to lifelike audio reproduction. In particular, the cumulative effect, i.e. the cumulative impulse response, of several filters, decimators and/or modulators can be calculated in advance and stored in each case as a correspondingly shaped pulse in the look-up table. A decimator in this context is a converter which multiplies the sampling interval of a signal by a decimation factor m by discarding all samples except every m-th sample. A modulator is a filter that multiplies individual samples of a signal by predetermined individual factors and outputs the respective product.
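
The two building blocks defined above can be illustrated by a minimal Python sketch. The function names `decimate` and `modulate` are hypothetical and chosen only for this illustration; no anti-aliasing filter is included, since the definition above covers only the discarding of samples.

```python
def decimate(samples, m):
    """Keep every m-th sample (starting with the first) and discard the rest."""
    return samples[::m]


def modulate(samples, factors):
    """Multiply each sample by its own predetermined factor."""
    return [s * f for s, f in zip(samples, factors)]


signal = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
print(decimate(signal, 3))                      # [1.0, 4.0]
print(modulate(signal, [1, -1, 1, -1, 1, -1]))  # [1.0, -2.0, 3.0, -4.0, 5.0, -6.0]
```

Decimating by m = 3 triples the sampling interval; the alternating factors in the second call correspond to a modulation with (-1)^k.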

Furthermore, the time interval of the pulses can be determined by an integral part of the basic period parameter in units of the first sampling interval.
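
As a rough sketch of this split, the basic period parameter might be separated into an integer pulse spacing and a fractional index that selects a stored pulse shape. The function name `split_period` and the default resolution of 1/6 of a sampling interval are assumptions made for this illustration; the contents of the look-up table are not modeled.

```python
from fractions import Fraction


def split_period(lambda_p, resolution=6):
    """Split a basic period parameter into (integer spacing, fractional index).

    lambda_p is given in units of the first sampling interval and is assumed
    to be a multiple of 1/resolution; the fractional index lies in
    0 .. resolution - 1 and could serve as the key of a pulse-shape table.
    """
    ticks = round(Fraction(lambda_p) * resolution)
    return divmod(ticks, resolution)


print(split_period(Fraction(245, 6)))  # (40, 5)
print(split_period(40.5))              # (40, 3)
```

Working with an integer number of ticks avoids floating-point comparisons when the fractional part is used as a table index.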

According to a further advantageous embodiment of the invention, the pulses can be formed from a predetermined pulse shape, for example a rectangular pulse, by samples having a second sampling interval which is smaller than the first sampling interval by a bandwidth expansion factor. The time interval of the pulses, in units of the second sampling interval, can then be determined by the basic period parameter multiplied by the bandwidth expansion factor. As the bandwidth expansion factor, it is preferable to select the inverse N of the fraction 1/N that corresponds to the accuracy of the basic period parameter in units of the first sampling interval.

Preferably, the pulses can be formed by a pulse shaping filter with filter coefficients predetermined at the second sampling interval.

Furthermore, before or after the noise signal is admixed, the pulses can be filtered by at least one high-pass, low-pass and/or band-pass filter and/or decimated by at least one decimator.

According to a further advantageous embodiment of the invention, the basic period parameter can be derived time frame by time frame from one or more basic period values.

In particular, the basic period parameter can be derived by combining basic period values of several time frames in a fluctuation-compensating, preferably non-linear, manner. In this way, fluctuations or jumps in the basic period values, which may result, for example, from erroneous measurements of an audio fundamental frequency caused by interfering noise, can be prevented from adversely affecting the basic period parameter.

In this context, a relative deviation of a current base period value from a previous base period value or from a quantity derived therefrom can be determined and attenuated as part of the derivation of the basic period parameter.

According to a further advantageous embodiment of the invention, a mixing ratio between the pulses and the noise signal can be determined by at least one mixing parameter. This parameter can be derived time frame by time frame from a level ratio, available in the subband decoder, between a tonal and an atonal audio signal component of the first subband. In this way, a level parameter available in the subband decoder and relating to an overtone-to-noise ratio in the first subband can be used to form the audio signal components in the second subband.

Furthermore, in the context of the derivation of the mixing parameter, the level ratio can be converted such that, when the atonal audio signal component predominates, the tonal audio signal component is lowered further. Since, in the case of natural audio sources, the atonal audio signal component increasingly prevails in higher frequency bands, in particular above 6 kHz, the quality of reproduction can generally be improved by such a reduction.

Advantageous embodiments of the invention will be explained in more detail with reference to the drawing.

The figures show, each in a schematic representation:

FIG. 1 an audio signal decoder,

FIG. 2 a first embodiment of an excitation signal generator,

FIG. 3a filter coefficients of a pulse shaping filter,

FIG. 3b an energy spectrum of the filter coefficients,

FIG. 4 a second embodiment of an excitation signal generator, and

FIG. 5 previously calculated pulse shapes.

FIG. 1 shows a schematic illustration of an audio signal decoder which generates a synthetic audio signal SAS from a supplied data stream of coded audio data AD. The generation of the synthetic audio signal SAS is subdivided into different subbands. Thus, frequency components of the synthetic audio signal SAS attributable to a first, low subband are generated separately from frequency components of the synthetic audio signal SAS attributable to a second, high subband. In the following exemplary embodiments, it is assumed by way of example that the low subband comprises a frequency range f = 0-4 kHz and the high subband comprises a frequency range f = 4-8 kHz. The low subband is also referred to below as narrowband.

In the low subband, the supplied audio data AD is decoded by a low-band decoder LBD specific to the low subband, i.e. a decoder whose bandwidth essentially covers only the low subband. For this purpose, side information contained in the audio data AD and specific to the low subband is extracted, namely atonal mixing parameters g_FIX, tonal mixing parameters g_LTP and basic period values λ_LTP. The low-band decoder, for example a voice codec according to ITU-T Recommendation G.729, generates a narrow-band audio signal NAS in the frequency range f = 0-4 kHz with a sampling rate f_s = 8 kHz.

In the high subband, a synthetic excitation signal u(k) is formed by a high-band excitation signal generator HBG on the basis of the side information g_FIX, g_LTP and λ_LTP extracted by the low-band decoder LBD. Here and in the following, the variable k denotes an index by which digital samples of the excitation signal or of other signals are indexed. The excitation signal u(k) is supplied by the excitation signal generator HBG to an audio synthesis filter ASYN, which is thereby excited to generate a synthetic high-band audio signal HAS in the frequency range f = 4-8 kHz. The high-band audio signal HAS is combined with the narrow-band audio signal NAS to finally generate and output the broadband synthetic audio signal SAS in the frequency range f = 0-8 kHz.

By means of the audio signal decoder, an audio signal encoder can also be realized in a simple manner. For this purpose, the synthesized audio signal SAS is forwarded to a comparison device (not shown), which compares the synthesized audio signal SAS with an audio signal to be encoded. By varying the audio data AD, and in particular the side information g_FIX, g_LTP and λ_LTP, the synthesized audio signal SAS is then matched to the audio signal to be encoded.

The invention can advantageously be used for general audio coding, for subband audio synthesis as well as for artificial bandwidth expansion of audio signals. The latter can be interpreted as a special case of a subband audio synthesis in which information about a particular subband is used to reconstruct or estimate missing frequency components of another subband.

The aforementioned applications are based on a suitably formed excitation signal u(k). The excitation signal u(k), which represents a spectral fine structure of an audio signal, can be shaped by the audio synthesis filter ASYN in different ways, e.g. by shaping its time and/or frequency response. So that a synthetically formed excitation signal u(k) coincides as exactly as possible with an original excitation signal (not shown) used by a (subband) audio signal encoder, the synthetic excitation signal u(k) should preferably have the following properties:

- The synthetic excitation signal u(k) should generally have a flat spectrum. For atonal, i.e. unvoiced, sounds, the synthetic excitation signal u(k) can be formed from white noise.

- For tonal, i.e. voiced, sounds, the synthetic excitation signal u(k) should have harmonic signal components, i.e. spectral peaks at integer multiples of an audio fundamental frequency F_0.

In practice, however, purely tonal or purely atonal audio signals hardly ever occur. Instead, real audio signals usually contain a mixture of tonal and atonal components. The synthetic excitation signal u(k) should preferably be generated such that the overtone-to-noise ratio, i.e. the energy or intensity ratio of the tonal and atonal components of the original audio signal, is reproduced as exactly as possible.

During tonal sounds, a broadband noise component generally adds to the harmonics of the audio fundamental frequency F_0. This noise component often becomes dominant at higher frequencies, in particular above 6 kHz.

In the following, the formation of an excitation signal u(k) suitable for audio coding, subband audio synthesis and artificial bandwidth extension of audio signals is explained in more detail.

The excitation signal u(k) is generated as a subband signal sampled at a predetermined sampling rate of, e.g., 16 kHz or 8 kHz. This subband signal u(k) represents the frequency components of the high subband of 4-8 kHz, by which the bandwidth of the narrowband audio signal NAS is to be extended. The narrowband audio signal NAS extends over a frequency range of 0-4 kHz and is sampled at a sampling rate of 8 kHz.

The excitation signal u(k) thus formed excites the audio synthesis filter ASYN and is thereby shaped into the high-band audio signal HAS. The synthetic broadband audio signal SAS is finally generated, at a higher sampling rate of e.g. 16 kHz, by combining the shaped high-band audio signal HAS and the narrow-band audio signal NAS.

The formation of the excitation signal u(k) is based on an audio generation model in which tonal, i.e. voiced, sounds are excited by a sequence of pulses and atonal, i.e. unvoiced, sounds are excited by noise, preferably white noise. Various modifications are provided to allow for mixed excitations, which can lead to an improved auditory impression.

The generation of the tonal components of the excitation signal u(k) is based on two audio parameters of the audio generation model, namely the audio fundamental frequency F_0 and the energy or intensity ratio γ between the tonal and the atonal audio components in the low subband. The latter is often referred to as the overtone-to-noise ratio or "harmonics-to-noise ratio", HNR for short. The audio fundamental frequency F_0 is also called the "fundamental speech frequency".

Both audio parameters F_0 and γ can be extracted at the receiver of a transmitted audio signal; preferably (e.g., in the case of bandwidth extension) directly from the low frequency band of the audio signal or (e.g., in the case of subband audio synthesis) from the low-band decoder of an underlying low-band audio codec, where such audio parameters are typically available.

The audio fundamental frequency F_0 is often represented by a basic period value, given by the sampling rate divided by the audio fundamental frequency F_0. The basic period value is often referred to as the "pitch lag". It is an audio parameter that is generally transmitted in standard audio codecs, such as that of Recommendation G.729, for the purposes of so-called "long-term prediction", LTP for short. If such a standard audio codec is used for the low subband, the audio fundamental frequency F_0 can be determined or estimated from the LTP audio parameters provided by this audio codec.

For many standard audio codecs, such as that of Recommendation G.729, an LTP basic period value is transmitted with a temporal resolution, i.e. accuracy, that is a fraction 1/N of the sampling interval used by this audio codec. In an audio codec according to Recommendation G.729, the LTP basic period value is provided with an accuracy of 1/3 of the sampling interval. In units of this sampling interval, the basic period value can therefore also take non-integer values. Such accuracy can be achieved by the relevant audio encoder, for example, by a sequence of so-called "open-loop" and "closed-loop" searches. The audio encoder attempts to find the basic period value at which the intensity or energy of an LTP residual signal is minimized. However, an LTP basic period value determined in this way may deviate from the basic period value corresponding to the actual audio fundamental frequency F_0 of the tonal audio components, in particular in the case of strong background noise, and thus impair accurate reproduction of these tonal audio components. Typical deviations include period-doubling errors and period-halving errors. That is, the frequency corresponding to the deviating LTP basic period value is half or twice the actual audio fundamental frequency F_0 of the tonal audio components, respectively.
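
The relationship between fundamental frequency, basic period value and fractional resolution can be sketched in a few lines of Python. The helper names `pitch_lag` and `quantize_lag` and the example frequency of 210 Hz are assumptions chosen for illustration; the sampling rate of 8 kHz and N = 3 follow the G.729 example above.

```python
def pitch_lag(sampling_rate_hz, fundamental_hz):
    """Basic period value in sampling intervals: sampling rate divided by F_0."""
    return sampling_rate_hz / fundamental_hz


def quantize_lag(lag, n=3):
    """Round a lag to the nearest multiple of 1/n of a sampling interval."""
    return round(lag * n) / n


FS = 8000  # sampling rate of the low-band codec in Hz
lag = quantize_lag(pitch_lag(FS, 210.0))
print(lag, FS / lag)          # quantized lag and the frequency it represents
print(FS / (2 * lag))         # a period-doubling error halves that frequency
print(FS / (lag / 2))         # a period-halving error doubles it
```

The last two lines show why such errors are so disturbing in the high subband: the harmonics would be placed at half or twice their true spacing.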

When such LTP basic period values are used to synthesize the tonal audio components in the high subband, such large frequency deviations should be avoided. In order to minimize the effects of typical period-doubling and period-halving errors, the post-processing technique explained below can be used within the scope of the invention:

Let λ_LTP(μ) denote an LTP basic period value currently extracted from the low-band decoder LBD, where μ is the index of the respective processed time frame or subframe. The basic period value λ_LTP(μ) is given in units of the sampling interval of the low-band decoder LBD and can also assume non-integer values. From the ratio between the current basic period value λ_LTP(μ) and the filtered basic period value λ_post(μ-1) of the previous frame, an integer factor f is first calculated as

$$f = \operatorname{round}\!\left(\frac{\lambda_{LTP}(\mu)}{\lambda_{post}(\mu-1)}\right).$$

The function round maps its argument to the nearest integer.

The decision as to whether the current basic period value λ_LTP(μ) is to be modified is made depending on a relative error

$$e = \left| 1 - \frac{\lambda_{LTP}(\mu)}{f \cdot \lambda_{post}(\mu-1)} \right|.$$

If the relative error e is below a predetermined threshold ε of, e.g., 1/10, it is assumed that the current basic period value λ_LTP(μ) is the result of an incipient phase of period-doubling or period-multiplication errors. In such a case, the current basic period value λ_LTP(μ) is corrected or filtered by dividing it by the factor f, so that the filtered basic period values λ_post(μ) behave substantially steadily over a plurality of time frames μ. It proves advantageous to determine the filtered basic period value λ_post(μ) according to

$$\lambda_{post}(\mu) = \frac{1}{N}\,\operatorname{round}\!\left(\frac{N \cdot \lambda_{LTP}(\mu)}{f}\right).$$

By multiplying by the factor N, for example N = 3, in the argument of the round function, the resulting basic period value λ_post(μ) is again accurate to the fraction 1/N of the sampling interval of the low-band decoder LBD.

Finally, a moving average is formed over the basic period values λ_post(μ) for further smoothing. The moving average corresponds to a kind of low-pass filtering. With a moving average over, for example, two successive basic period values λ_post(μ), the result is a basic period parameter

$$\lambda_p(\mu) = \frac{1}{2}\left(\lambda_{post}(\mu-1) + \lambda_{post}(\mu)\right),$$

on the basis of which the excitation signal u(k) for the high subband is derived. Due to the averaging over two values, the basic period parameter λ_p(μ) has a resolution that is higher by a factor of two, corresponding to a fraction 1/(2N) of the sampling interval of the low-band decoder LBD.

By means of the non-linear filtering procedure explained above, most period-doubling errors or, more generally, period-multiplication errors can be avoided. This results in a considerable improvement in the reproduction quality.
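
The post-processing described above might be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the factor is taken as f = round(λ_LTP(μ) / λ_post(μ-1)), the relative error as e = |1 - λ_LTP(μ) / (f · λ_post(μ-1))|, and the basic period value is assumed to be passed through unchanged whenever no correction is triggered.

```python
import math


def nearest(x):
    """Round to the nearest integer (ties are rounded up)."""
    return math.floor(x + 0.5)


def postprocess_lag(lag_ltp, prev_post, n=3, eps=0.1):
    """Return the filtered basic period value for the current frame.

    lag_ltp   -- current LTP basic period value, in low-band sampling intervals
    prev_post -- filtered basic period value of the previous frame
    """
    f = nearest(lag_ltp / prev_post)
    if f < 1:
        return lag_ltp  # assumption: no correction if the ratio is below 0.5
    e = abs(1.0 - lag_ltp / (f * prev_post))
    if e < eps:
        # divide out the integer factor and keep an accuracy of 1/n
        return nearest(n * lag_ltp / f) / n
    return lag_ltp  # assumption: value is left unchanged otherwise


def period_parameter(prev_post, cur_post):
    """Moving average over two successive filtered basic period values."""
    return 0.5 * (prev_post + cur_post)


lags = [40.0, 121 / 3, 242 / 3, 41.0]  # the third frame contains a period doubling
post = lags[0]
for lag in lags[1:]:
    new_post = postprocess_lag(lag, post)
    print(round(new_post, 3), round(period_parameter(post, new_post), 3))
    post = new_post
```

In the example sequence, the doubled value of the third frame is divided by f = 2, so the filtered values stay close to 40 sampling intervals throughout.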

In the following it is explained how tonal mixing parameters g_v(μ) and atonal mixing parameters g_uv(μ), for mixing corresponding tonal and atonal components of the excitation signal u(k) in the high subband, are derived time frame by time frame from the subband-specific mixing parameters g_LTP(μ) and g_FIX(μ) of the low-band decoder LBD. It may be assumed here that the low-band decoder LBD is a so-called CELP decoder (CELP: Codebook Excited Linear Prediction), which has a so-called adaptive or LTP codebook and a so-called fixed codebook.

In real audio signals, tonal sounds almost never occur without contributions from atonal signal components. To estimate an energy or intensity ratio between tonal and atonal signal contributions, the model assumes that the adaptive codebook contributes only tonal components in the low subband and the fixed codebook only atonal components in the low subband. It is further assumed that these two contributions are mutually orthogonal.

On the basis of these assumptions, the intensity ratio between tonal and atonal signal components can be reconstructed from the mixing parameters g_LTP and g_FIX of the low-band decoder LBD. Both mixing parameters g_LTP and g_FIX can be extracted from the low-band decoder LBD time frame by time frame. For each time frame or subframe (indexed by μ), an instantaneous intensity ratio between the contributions of the adaptive and fixed codebooks, i.e. the overtone-to-noise ratio γ, can be determined by dividing the energy contributions of the adaptive and fixed codebooks.

While the mixing parameter g_LTP(μ) indicates a gain for the signals of the adaptive codebook, the mixing parameter g_FIX(μ) indicates a gain for the signals of the fixed codebook. If the codebook vectors output by the adaptive codebook are denoted by x_LTP(μ) and the codebook vectors output by the fixed codebook by x_FIX(μ), the overtone-to-noise ratio results as

$$\gamma(\mu) = \frac{g_{LTP}^2(\mu)\,\lVert x_{LTP}(\mu)\rVert^2}{g_{FIX}^2(\mu)\,\lVert x_{FIX}(\mu)\rVert^2}.$$

For better modeling of the atonal audio components in the high subband, the overtone-to-noise ratio γ derived from the low subband is converted by a kind of Wiener filter according to

$$\gamma_{post} = \gamma \cdot \frac{\gamma}{1+\gamma}.$$

This "Wiener" filtering further lowers a small γ (atonal audio segment) while barely changing large values of γ (tonally dominated audio segment). Such a reduction better approximates natural audio signals.

From the filtered overtone-to-noise ratio γ_post, the gain factors, i.e. the mixing parameters g_v and g_uv for the tonal and atonal components of the excitation signal u(k) in the high subband, can finally be determined as

$$g_v = \sqrt{\frac{\gamma_{post}}{1+\gamma_{post}}}, \qquad g_{uv} = \sqrt{\frac{1}{1+\gamma_{post}}}.$$

Since in practice purely tonal or purely atonal audio signals hardly ever occur, both mixing parameters g_v(μ) and g_uv(μ) usually have a non-vanishing value at the same time. The above calculation rule ensures that the sum of the squares of the mixing parameters g_v and g_uv, i.e. the total energy of the mixed excitation signal u(k), is substantially constant.
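
The chain from codebook gains to mixing parameters can be sketched in Python. The following is a sketch under assumptions, not a definitive implementation: the function names are hypothetical, the Wiener-type mapping is assumed to be γ_post = γ · γ / (1 + γ), the gains are assumed to be normalized so that g_v² + g_uv² = 1 with g_v² / g_uv² = γ_post, and the fixed-codebook energy is assumed to be non-zero.

```python
import math


def harmonics_to_noise_ratio(g_ltp, x_ltp, g_fix, x_fix):
    """Energy of the adaptive-codebook part divided by that of the fixed part."""
    e_ltp = g_ltp ** 2 * sum(v * v for v in x_ltp)
    e_fix = g_fix ** 2 * sum(v * v for v in x_fix)  # assumed to be non-zero
    return e_ltp / e_fix


def wiener(gamma):
    """Assumed Wiener-type mapping: lowers small ratios, barely changes large ones."""
    return gamma * gamma / (1.0 + gamma)


def mixing_gains(gamma_post):
    """Tonal gain g_v and atonal gain g_uv with g_v**2 + g_uv**2 == 1."""
    g_v = math.sqrt(gamma_post / (1.0 + gamma_post))
    g_uv = math.sqrt(1.0 / (1.0 + gamma_post))
    return g_v, g_uv


gamma = harmonics_to_noise_ratio(1.0, [1.0, 1.0], 0.5, [2.0, 0.0])
g_v, g_uv = mixing_gains(wiener(gamma))
print(gamma, g_v, g_uv, g_v ** 2 + g_uv ** 2)
```

With this normalization, raising the tonal gain automatically lowers the atonal gain, so the energy of the mixed excitation does not depend on the overtone-to-noise ratio.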

In the following, the generation of the excitation signal u(k) on the basis of the audio parameters g_v, g_uv and λ_p derived from the low-band decoder LBD is explained in more detail using two variants of the excitation signal generator HBG. For the sake of clarity, it is assumed here that the accuracy of the basic period values in units of the sampling interval of the low-band decoder LBD is given by 1/N with N = 3. The following statements can of course readily be generalized to any value of N.

A first embodiment of the excitation signal generator HBG is shown schematically in FIG. 2. The embodiment shown in FIG. 2 comprises a pulse generator PG1, a noise generator NOISE, a low-pass filter LP with cut-off frequency f_c = 8 kHz, a decimator D3 with decimation factor m = 3 (or generally m = N), a high-pass filter HP with cut-off frequency f_c = 4 kHz and a decimator D2 with decimation factor m = 2. The noise generator NOISE preferably generates white noise. The pulse generator PG1 in turn comprises a rectangular pulse generator SPG and a pulse shaping filter SF with a predetermined set of filter coefficients p(k) of finite length. While the noise generator NOISE serves to generate the atonal components of the excitation signal u(k), the pulse generator PG1 contributes to the generation of the tonal components of the excitation signal u(k).

The audio parameters g_v, g_uv and λ_p are taken in a continuous sequence from audio parameters of the low-band decoder LBD, or are derived and adapted by means of a suitable audio parameter extraction block. The filter operations are designed for a fractional basic period parameter λ_p with an accuracy of 1/(2N), here equal to 1/6, in units of the sampling interval of the low-band decoder LBD, and for a target bandwidth corresponding to the bandwidth of the low-band decoder LBD.

Since the low-band decoder LBD uses a sampling rate of 8 kHz in accordance with its bandwidth of 0-4 kHz, and audio components of 4-8 kHz, i.e. with a bandwidth of 4 kHz, are to be generated by means of the excitation signal u(k), a sampling rate of at least 8 kHz is to be provided for the pulse generator PG1. In accordance with the time resolution of the basic period parameter λ_p, however, a sampling rate of f_s = 2·N·8 kHz = 6·8 kHz = 48 kHz is provided both for the pulse generator PG1 and for the noise generator NOISE.

To generate the tonal component of the excitation signal, the basic period parameter λ_p is multiplied by the factor 2N = 6, and the product 6·λ_p is fed to the rectangular pulse generator SPG. As a result, the rectangular pulse generator SPG generates individual rectangular pulses at a time interval given by 6·λ_p in units of the sampling interval 1/48000 s of the rectangular pulse generator SPG. The individual rectangular pulses have an amplitude of √(6·λ_p), such that the average energy of a long pulse sequence is substantially constant and equal to 1.
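
The energy normalization can be checked with a short Python sketch. It assumes, for illustration only, that each rectangular pulse is a single sample at the 48 kHz rate and that the pulse spacing is 6·λ_p samples; the function names are hypothetical.

```python
import math


def pulse_train(lambda_p, num_samples, upsampling=6):
    """Pulses spaced upsampling * lambda_p samples apart at the increased rate.

    Each pulse is modeled as one sample (an assumption) whose amplitude is the
    square root of the spacing, so that the mean energy per sample is about 1.
    """
    spacing = round(upsampling * lambda_p)
    amplitude = math.sqrt(spacing)
    out = [0.0] * num_samples
    for k in range(0, num_samples, spacing):
        out[k] = amplitude
    return out


def mean_energy(x):
    return sum(v * v for v in x) / len(x)


train = pulse_train(40.5, 4860)  # spacing 243 samples, 20 full periods
print(mean_energy(train))
```

Because the amplitude grows with the square root of the spacing, a lower fundamental frequency (fewer pulses per second) yields the same average excitation energy.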

The rectangular pulses generated by the rectangular pulse generator SPG are multiplied by the "tonal" mixing parameter g_v and fed to the pulse shaping filter SF. In the pulse shaping filter SF, the rectangular pulses are, in a sense, temporally "smeared" by a convolution or correlation with the filter coefficients p(k). By means of this filtering, the so-called crest factor, i.e. the ratio of peak to average sample values, can be considerably reduced and the audio quality of the synthesized audio signal SAS can be considerably improved. In addition, the rectangular pulses can be spectrally shaped by the pulse shaping filter SF in an advantageous manner. Preferably, the pulse shaping filter SF may have a band-pass characteristic with a transition region around 4 kHz and a substantially uniform increase in attenuation in the direction of higher and lower frequencies. In this way it can be achieved that higher frequencies of the excitation signal u(k) have fewer harmonic components and thus the noise component increases with increasing frequency.
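
The effect of the "smearing" on the crest factor can be illustrated with a toy example in Python. The coefficients used here are hypothetical unit-energy values and are not the filter coefficients p(k) of FIG. 3a; the crest factor is taken as peak magnitude divided by RMS value, which is one common definition and an assumption of this sketch.

```python
import math


def convolve(x, h):
    """Full linear convolution of two sequences."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xv in enumerate(x):
        if xv != 0.0:
            for j, hv in enumerate(h):
                y[i + j] += xv * hv
    return y


def crest_factor(x):
    """Peak magnitude divided by the RMS value of the sequence."""
    rms = math.sqrt(sum(v * v for v in x) / len(x))
    return max(abs(v) for v in x) / rms


spacing = 8
pulses = [math.sqrt(spacing) if k % spacing == 0 else 0.0 for k in range(64)]
p = [0.5, 0.5, 0.5, 0.5]  # hypothetical unit-energy coefficients
smeared = convolve(pulses, p)
print(crest_factor(pulses), crest_factor(smeared))
```

Spreading each pulse over several samples keeps its energy but lowers its peak, which is what reduces the crest factor.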

An exemplary selection of the filter coefficients p (k) is shown schematically in FIGS. 3a and 3b. While FIG. 3a shows the filter coefficients p (k) plotted against its sample index k, in FIG. 3b the energy spectrum of the filter coefficients p (k) is plotted against the frequency. For the target frequency range relevant in the present exemplary embodiment, essentially only the spectral range of 4-8 kHz is relevant for the filter coefficients p (k). This frequency range is indicated in Figure 3b by a widened line.

As illustrated in FIG. 2, the rectangular pulses "smeared" by the pulse shaping filter SF are added to a noise signal generated by the noise generator NOISE and multiplied by the "atonal" mixing parameter g_uv, and the resultant sum signal is fed to the low-pass filter LP. Up to this process step, an increased sampling rate of f_s = 48 kHz is used. The remaining processing blocks shown in FIG. 2 serve to filter out the frequency ranges outside a target frequency range of 4-8 kHz and to generate the excitation signal u(k) in a representation of this target frequency range (with a sampling rate of f_s = 8 kHz).

For this purpose, the sum signal is first filtered by the low-pass filter LP, and the filtered signal is then converted by the decimator D3 from the 48 kHz sampling rate to a sampling rate of f_s = 16 kHz. The converted signal is then fed to the high-pass filter HP, which feeds the high-pass filtered signal to the decimator D2, which finally generates the excitation signal u(k) with the target sampling rate of f_s = 8 kHz from the supplied signal sampled at 16 kHz.

The generated excitation signal u(k) contains the frequency components required for bandwidth expansion. However, these are present as a spectrum mirrored around the frequency 4 kHz. In order to invert the spectrum, the excitation signal u(k) can be modulated with modulation factors (-1)^k.
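The spectral inversion by the factors (-1)^k can be checked numerically. Since (-1)^k = cos(πk), multiplying a sampled cosine of frequency f by (-1)^k yields a cosine of frequency f_s/2 - f. The following sketch verifies this for one tone; the chosen frequency, the signal length and the function name are illustrative assumptions.

```python
import math

def invert_spectrum(u):
    """Multiply sample k by (-1)**k, mapping a component at f to fs/2 - f."""
    return [-v if k % 2 else v for k, v in enumerate(u)]

fs = 8000.0   # target sampling rate of the excitation signal
f = 1000.0    # illustrative tone frequency
n = 64

u = [math.cos(2 * math.pi * f * k / fs) for k in range(n)]
expected = [math.cos(2 * math.pi * (fs / 2 - f) * k / fs) for k in range(n)]

inverted = invert_spectrum(u)
max_error = max(abs(a - b) for a, b in zip(inverted, expected))
```

The modulation only requires a sign change of every second sample, which is why it adds practically no computational cost.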

Since the components of the audio signal decoder according to FIG. 1 are substantially linear and time-invariant, the tonal and atonal components of the excitation signal u(k) can be treated independently of each other. Thus, the filtering and decimation operations provided for the tonal audio components in the embodiment variant according to FIG. 2 can also be combined in a single processing block. In fact, the impulse response of all the filtering, decimation and modulation operations provided in FIG. 2 for the tonal audio components can be calculated in advance and stored in a suitable form in a look-up table.

Such a second embodiment of the excitation signal generator HBG is illustrated schematically in FIG. 4 and is explained below. The embodiment shown in FIG. 4 has a pulse generator PG2 and a noise generator NOISE that preferably generates white noise. The pulse generator PG2 in turn comprises a pulse positioner PP and a look-up table LOOKUP in which predetermined pulse shapes V_j(k) are stored. While the noise generator NOISE serves to generate the atonal components of the excitation signal u(k), the pulse generator PG2 contributes to the generation of the tonal components of the excitation signal u(k). Both the noise generator NOISE and the pulse generator PG2 directly use the target sampling rate of f_s = 8 kHz.

The audio parameters g_v, g_uv and λ_p are supplied to the excitation signal generator frame by frame in a continuous sequence. The derivation of the audio parameters g_v, g_uv and λ_p has already been explained above. As above, the fractional fundamental period parameter λ_p is specified with an accuracy of 1/(2N), here 1/6, in units of the sampling interval of the low-band decoder LBD.

For the tonal components of the excitation signal u(k), the impulse response of all filtering, decimation and modulation operations illustrated in FIG. 2 can be calculated in advance and stored in the look-up table LOOKUP in the form of specific pulse shapes V_j(k). If, as in the present embodiment, non-integer fundamental period parameters λ_p are also to be taken into account, several pulse shapes V_j(k) are kept in the look-up table LOOKUP. The number of pulse shapes V_j(k) to be provided is preferably given by the inverse of the accuracy of the fundamental period parameter λ_p, i.e. here by 2N. The index j runs, for example, from 0 to 2N-1. In the present case, 6 pre-calculated pulse shapes V_j(k), j = 0, ..., 5 are to be stored in the look-up table LOOKUP.

During operation of the pulse generator PG2, the look-up table LOOKUP is supplied with the fractional part λ_p - ⌊λ_p⌋ of the respective fundamental period parameter λ_p. The brackets ⌊ ⌋ denote the integer part of a rational or real number. On the basis of the supplied fractional part λ_p - ⌊λ_p⌋, a pulse shape is selected from the stored pulse shapes V_j(k), and a correspondingly shaped pulse is output from the look-up table LOOKUP. In the present exemplary embodiment, λ_p - ⌊λ_p⌋ can take the values 0, 1/6, 2/6, 3/6, 4/6 and 5/6. Preferably, the pulse shape V_j(k) whose index j corresponds to the numerator of the relevant fraction is selected.
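The selection rule can be written down as a short sketch. It assumes that λ_p is available as an exact rational number on the 1/(2N) grid, with N = 3 as implied by the accuracy of 1/6; the function name split_period and the error handling are hypothetical.

```python
from fractions import Fraction
import math

N = 3                   # follows from the stated accuracy 1/(2N) = 1/6
NUM_SHAPES = 2 * N      # number of stored pulse shapes V_j(k)

def split_period(lambda_p):
    """Return (integer part, look-up index j) for a period on the 1/(2N) grid."""
    lam = Fraction(lambda_p)
    integer_part = math.floor(lam)
    frac = lam - integer_part          # one of 0, 1/6, ..., 5/6
    j = frac * NUM_SHAPES              # numerator of the fraction in sixths
    if j.denominator != 1:
        raise ValueError("lambda_p is not a multiple of 1/(2N)")
    return integer_part, int(j)
```

For example, `split_period(Fraction(125, 6))` yields the integer part 20 and the index j = 5, so the pulse shape V_5(k) would be read from the table.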

Each of the stored pulse shapes V_j(k) corresponds to an impulse response of the chain shown in FIG. 2, consisting of the filters SF, LP, D3, HP and D2 (and optionally a modulator), for a particular fractional part λ_p - ⌊λ_p⌋ of the fundamental period parameter λ_p.

FIG. 5 shows, by way of example, calculated pulse shapes V_j(k) for j = 0, ..., 5 in a schematic representation. The illustrated pulse shapes V_j(k) are calculated for a fractional resolution of λ_p of 1/6 (at a sampling rate of 8 kHz) and are plotted against their sample index k. The assignment of each pulse shape V_j(k) to the associated fractional part λ_p - ⌊λ_p⌋ can be taken from the legend of FIG. 5.

As illustrated in FIG. 4, the pulse output from the look-up table LOOKUP, whose pulse shape is selected on the basis of the fractional part λ_p - ⌊λ_p⌋, is multiplied by the "tonal" mixing parameter g_v and supplied to the pulse positioner PP. The pulse positioner PP positions the pulses in time depending on the integer part ⌊λ_p⌋ of the fundamental period parameter λ_p. The pulses are output by the pulse positioner PP at a time interval corresponding to the integer part ⌊λ_p⌋ of the fundamental period parameter λ_p. The pulses can be modulated by a corresponding choice of sign for the pulse shapes V_j(k), the respective pulse being inverted either for even values of ⌊λ_p⌋ or for odd values of ⌊λ_p⌋.

Finally, the noise signal of the noise generator NOISE, multiplied by the "atonal" mixing parameter g_uv, is added to the pulses output by the pulse positioner PP in order to obtain the excitation signal u(k).
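The interplay of look-up table, pulse positioner and noise generator can be sketched for a single frame with constant parameters. The sketch makes several simplifying assumptions: the pulse shapes are short placeholders rather than the precomputed impulse responses of FIG. 5, the noise is Gaussian, pulses are not carried over between frames, and the optional sign modulation is omitted. The function name and its signature are hypothetical.

```python
import random

def excitation_frame(lookup, integer_part, j, g_v, g_uv, frame_len, rng):
    """Pulses of shape lookup[j] every integer_part samples, plus scaled noise."""
    shape = lookup[j]
    # Atonal component: noise weighted with the mixing parameter g_uv.
    u = [g_uv * rng.gauss(0.0, 1.0) for _ in range(frame_len)]
    # Tonal component: pulses weighted with g_v, spaced by the integer part.
    for start in range(0, frame_len, integer_part):
        for k, v in enumerate(shape):
            if start + k < frame_len:
                u[start + k] += g_v * v
    return u

# Placeholder table with 2N = 6 shapes; real entries would be the precomputed
# impulse responses of the FIG. 2 chain, one per fractional part.
LOOKUP = [[1.0, 0.5, 0.25] for _ in range(6)]

frame = excitation_frame(LOOKUP, integer_part=20, j=5, g_v=0.8, g_uv=0.2,
                         frame_len=80, rng=random.Random(0))
```

Since only a few table samples are added per pulse and the pulses are widely spaced, the tonal component costs very few operations per frame in this form.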

The embodiment variant shown in FIG. 4 can generally be implemented with less effort than the embodiment variant shown in FIG. 2. In fact, with an excitation signal generator according to FIG. 4, by specifying suitable pulse shapes V_j(k), it is possible to generate effectively the same excitation signals u(k) as with an excitation signal generator according to FIG. 2. Since the output pulses are spaced relatively far apart (typically 20-134 sampling intervals), the computational effort for an excitation signal generator according to the invention as shown in FIG. 4 is relatively low. As a result, the invention can be implemented by means of a low-cost digital signal processor with relatively low memory and computing power requirements.

Claims

1. A method for forming an audio signal (SAS), wherein a) frequency components (NAS) of the audio signal attributable to a first subband are formed by means of a subband decoder (LBD) on the basis of supplied fundamental period values (λ_LTP), each indicating a fundamental period of the audio signal (SAS), b) frequency components (HAS) of the audio signal attributable to a second subband are formed by exciting an audio synthesis filter (ASYN) by means of an excitation signal (u(k)) specific to the second subband, and c) for generating the excitation signal (u(k)), by an excitation signal generator (HBG)

- a fundamental period parameter (λ_p) is derived from the fundamental period values (λ_LTP), and

- pulses with a pulse shape dependent on the fundamental period parameter (λ_p) are formed at a time interval determined by the fundamental period parameter (λ_p) and are mixed with a noise signal.

2. The method according to claim 1, characterized in that a first sampling interval specific to the first subband is assigned to the subband decoder (LBD), and that the fundamental period parameter (λ_p) specifies the fundamental period of the audio signal (SAS) with an accuracy of a fraction of the first sampling interval.

3. The method according to claim 2, characterized in that the pulse shape (V_j(k)) of a respective pulse is selected from different predetermined pulse shapes (V_j(k)) stored in a look-up table, depending on a component (λ_p - ⌊λ_p⌋) of the fundamental period parameter (λ_p) which is non-integer in units of the first sampling interval.

4. The method according to claim 2 or 3, characterized in that the time interval of the pulses is determined by a component (⌊λ_p⌋) of the fundamental period parameter (λ_p) which is an integer in units of the first sampling interval.

5. The method according to claim 2 or 3, characterized in that the pulses of a predetermined pulse shape are formed by samples having a second sampling interval, the second sampling interval being smaller than the first sampling interval by a bandwidth expansion factor (N), and the time interval of the pulses in units of the second sampling interval being determined by the fundamental period parameter (λ_p) multiplied by the bandwidth expansion factor (N).

6. The method according to claim 5, characterized in that the pulses are formed by a pulse shaping filter (SF) with filter coefficients (p(k)) predetermined at the second sampling interval.

7. The method according to claim 5 or 6, characterized in that the pulses are decimated before or after admixture of the noise signal by at least one decimator (D2, D3).

8. The method according to any one of the preceding claims, characterized in that the pulses are filtered before or after admixture of the noise signal by at least one high-pass, low-pass and/or band-pass filter.

9. The method according to any one of the preceding claims, characterized in that the fundamental period parameter (λ_p) is derived frame by frame from one or more fundamental period values (λ_LTP).

10. The method according to any one of the preceding claims, characterized in that the fundamental period parameter (λ_p) is derived by a fluctuation-compensating combination of fundamental period values (λ_LTP) of several time frames.

11. The method according to any one of the preceding claims, characterized in that a relative deviation (e) of a current fundamental period value (λ_LTP) from an earlier fundamental period value or from a variable (λ_post) derived therefrom is determined and is damped in the course of deriving the fundamental period parameter (λ_p).

12. The method according to any one of the preceding claims, characterized in that a mixing ratio between the pulses and the noise signal is determined by at least one mixing parameter (g_v, g_uv), which is derived frame by frame from a level ratio (γ), available in the subband decoder (LBD), between a tonal and an atonal audio signal component of the first subband.

13. The method according to claim 12, characterized in that, in the course of deriving the mixing parameter (g_v, g_uv), the level ratio (γ) is converted such that, when the atonal audio signal component predominates, the tonal audio signal component is lowered.

14. An audio signal decoder for forming an audio signal (SAS), comprising a) a subband decoder (LBD) for forming frequency components (NAS) of the audio signal attributable to a first subband on the basis of supplied fundamental period values (λ_LTP), b) an audio synthesis filter (ASYN), and c) an excitation signal generator (HBG) for generating an excitation signal (u(k)) for forming frequency components (HAS) of the audio signal attributable to a second subband by exciting the audio synthesis filter, the excitation signal generator (HBG) having

- a derivation device for deriving a fundamental period parameter (λ_p) from the fundamental period values (λ_LTP),

- a noise generator (NOISE) for forming a noise signal,

- a pulse generator (PG1, PG2) for forming pulses with a pulse shape dependent on the fundamental period parameter (λ_p) at a time interval determined by the fundamental period parameter (λ_p), and

- a mixer for mixing the pulses with the noise signal.

15. An audio signal encoder with an audio signal decoder according to claim 14 and with a comparison device for matching an audio signal formed by the audio signal decoder to an audio signal to be transmitted.