ES2604983T3 - Level adjustment in the time domain for decoding or encoding of audio signals - Google Patents


Publication number
ES2604983T3
ES2604983T3 (application ES14702195.0T)
Authority
ES
Spain
Prior art keywords
audio signal
representation
frequency band
level
time domain
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
ES14702195.0T
Other languages
Spanish (es)
Inventor
Stephan Schreiner
Arne Borsum
Matthias Neusinger
Manuel Jander
Markus Lohwasser
Bernhard Neugebauer
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fraunhofer Gesellschaft zur Forderung der Angewandten Forschung eV
Original Assignee
Fraunhofer Gesellschaft zur Forderung der Angewandten Forschung eV
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority to EP13151910
Priority to EP13151910.0A (patent EP2757558A1)
Application filed by Fraunhofer Gesellschaft zur Forderung der Angewandten Forschung eV
Priority to PCT/EP2014/050171 (patent WO2014111290A1)
Application granted
Publication of ES2604983T3
Legal status: Active
Anticipated expiration

Classifications

    • G — PHYSICS
    • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L — SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 — Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; coding or decoding of speech or audio signals using source filter models or psychoacoustic analysis
    • G10L19/005 — Correction of errors induced by the transmission channel, if related to the coding algorithm
    • G10L19/0017 — Lossless audio signal coding; perfect reconstruction of coded audio signal by transmission of coding error
    • G10L19/0018 — Speech coding using phonetic or linguistic decoding of the source; reconstruction using text-to-speech synthesis
    • G10L19/02 — Analysis-synthesis techniques using spectral analysis, e.g. transform vocoders or subband vocoders
    • G10L21/00 — Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 — Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 — Noise filtering
    • G10L21/0216 — Noise filtering characterised by the method used for estimating noise
    • G10L21/0224 — Processing in the time domain
    • G10L21/0232 — Processing in the frequency domain
    • G10L21/0316 — Speech enhancement by changing the amplitude
    • G10L21/0324 — Details of processing therefor
    • G10L21/0332 — Details of processing involving modification of waveforms
    • G10L21/034 — Automatic adjustment

Abstract

An audio signal decoder (100) configured to provide a decoded audio signal representation based on an encoded audio signal representation, the audio signal decoder comprising: a decoder pre-processing stage (110) configured to obtain a plurality of frequency band signals from the encoded audio signal representation; a clipping estimator (120) configured to analyze side information regarding a gain of the frequency band signals of the encoded audio signal representation as to whether the side information suggests potential clipping, in order to determine a current level change factor for the encoded audio signal representation, wherein, when the side information suggests potential clipping, the current level change factor causes the information of the plurality of frequency band signals to be shifted towards a less significant bit such that headroom is gained in at least one more significant bit; a level changer (130) configured to change the levels of the frequency band signals according to the current level change factor to obtain level-changed frequency band signals; a frequency-domain-to-time-domain converter (140) configured to convert the level-changed frequency band signals into a time domain representation; and a level change compensator (150) configured to act on the time domain representation to at least partially compensate for the level change applied by the level changer (130) to the level-changed frequency band signals, and to obtain a substantially compensated time domain representation.

Description


Level adjustment in the time domain for decoding or encoding of audio signals

DESCRIPTION

The present invention relates to coding, decoding, and processing of audio signals and, in particular, to adjusting the level of a signal that is to be converted from the frequency domain to the time domain (or from the time domain to the frequency domain) to the dynamic range of the corresponding frequency-to-time converter (or time-to-frequency converter). Some embodiments of the present invention relate to adjusting the level of a signal to be converted from frequency to time (or from time to frequency) to the dynamic range of a corresponding converter implemented in fixed-point or integer arithmetic. Further embodiments of the present invention relate to clipping prevention for spectrally decoded audio signals using level adjustment in the time domain in combination with side information.

Audio signal processing is becoming more and more important. The challenges increase as modern perceptual audio codecs are required to deliver satisfactory audio quality at increasingly low bit rates.

In current audio content production and delivery chains, the digitally available master content (a PCM stream, i.e. a pulse-code-modulated stream) is encoded, for example by a professional AAC (Advanced Audio Coding) encoder, on the content creation side. The resulting AAC bit stream is then made available for purchase, for example through an online digital media store. In rare cases, some of the decoded PCM samples are "clipped", meaning that two or more consecutive samples reach the maximum level that can be represented by the underlying bit resolution (for example 16 bits) of a uniformly quantized fixed-point representation (for example PCM) of the output waveform. This can lead to audible distortions (clicks or short distortions). Although an effort is typically made on the encoder side to avoid clipping, clipping may nevertheless occur on the decoder side for several reasons, such as different decoder implementations, rounding errors, transmission errors, etc. Assuming that the audio signal at the encoder input is below the clipping threshold, the reasons for clipping in a modern perceptual audio codec are manifold. First, the audio encoder applies quantization to the transmitted signal, which is available as a frequency decomposition of the input waveform, in order to reduce the transmission data rate. The quantization errors in the frequency domain result in small deviations of the signal amplitude and phase with respect to the original waveform. If the phase or amplitude errors add up constructively, the resulting amplitude in the time domain may temporarily be higher than in the original waveform. Second, parametric coding methods (for example spectral band replication, SBR) parameterize the signal power rather coarsely. Phase information is typically omitted. Consequently, the signal on the receiver side is regenerated with the correct power only, but without preservation of the waveform. Signals with an amplitude close to full scale are prone to clipping.
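The clipping criterion described above — two or more consecutive samples stuck at the maximum representable level — can be checked directly on decoded 16-bit PCM. The following is a minimal sketch; the function name and the run-length threshold are illustrative, not taken from the patent.

```python
import numpy as np

INT16_MAX = 32767
INT16_MIN = -32768

def count_clipped_runs(pcm: np.ndarray, min_run: int = 2) -> int:
    """Count runs of at least `min_run` consecutive samples pinned at
    full scale, the clipping symptom described in the text above."""
    at_limit = (pcm >= INT16_MAX) | (pcm <= INT16_MIN)
    runs = 0
    current = 0
    for flag in at_limit:
        if flag:
            current += 1
            if current == min_run:  # count each run once, when it reaches min_run
                runs += 1
        else:
            current = 0
    return runs

# A short test signal: one clipped plateau of three samples, plus one
# isolated full-scale sample that does not count as clipping.
signal = np.array([0, 1000, 32767, 32767, 32767, -500, 32767, 0], dtype=np.int16)
print(count_clipped_runs(signal))  # -> 1
```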

Modern audio coding systems offer the possibility of transmitting a loudness level parameter, giving decoders the ability to adjust the loudness for playback at unified levels. In general, this can lead to clipping if the audio signal is encoded at sufficiently high levels and the transmitted normalization gains imply an increase of the loudness level. In addition, the common practice in mastering audio content (especially music) is to push the audio signals to the maximum possible values, so that the audio signal clips when it is quantized sufficiently coarsely by the audio codec.

To avoid clipping of audio signals, so-called limiters are known as an appropriate tool for restricting audio levels. If an incoming audio signal exceeds a certain threshold, the limiter is activated and attenuates the audio signal such that the audio signal does not exceed a given level at the output. Unfortunately, sufficient headroom (in terms of dynamic range and/or bit resolution) is needed ahead of the limiter.
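The limiting behavior can be sketched in its simplest, memoryless form. Note that real limiters additionally apply attack/release smoothing and look-ahead; this toy version only shows the threshold rule described above, with an assumed threshold of 0.9 on a normalized scale.

```python
import numpy as np

def limit(samples: np.ndarray, threshold: float = 0.9) -> np.ndarray:
    """Memoryless limiter sketch: attenuate any sample whose magnitude
    exceeds the threshold so that the output never exceeds that level."""
    out = samples.astype(float)
    over = np.abs(out) > threshold
    out[over] = np.sign(out[over]) * threshold
    return out

x = np.array([0.2, 0.95, -1.2, 0.5])
y = limit(x)  # magnitudes are now capped at 0.9; in-range samples pass through
```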

In general, loudness normalization in the frequency domain is performed together with so-called "dynamic range control" (DRC). This makes it possible to smooth the loudness normalization transitions even if the normalization gain varies from frame to frame, due to the overlap of the filter bank.

In addition, due to coarse quantization or parametric description, any encoded audio signal may exhibit clipping if the original audio was mastered at levels close to the clipping threshold.

The technical publication "ISO/IEC MPEG-2 Advanced Audio Coding" by Bosi et al., Journal of the Audio Engineering Society, vol. 45, no. 10, October 1997, pages 789-811, describes the main features of the AAC system (ISO/IEC 13818-7). This technology combines the coding efficiency of a high-resolution filter bank, prediction techniques and Huffman coding with additional functionalities that aim to provide very high audio quality at a variety of data rates.

It is typically desirable to keep computational complexity, memory usage and power consumption as low as possible in highly efficient digital signal processing devices based on fixed-point arithmetic. For this reason, it is also desirable to keep the word length of the audio samples as low as possible. To provide headroom against potential clipping due to loudness normalization, a filter bank, which is typically part of an audio encoder or decoder, would have to be designed with a longer word length.

It is desirable to allow signal limiting without losing data accuracy and/or without the need for a longer word length for a decoder filter bank or an encoder filter bank. Alternatively or additionally, it would be desirable if the relevant dynamic range of the signal to be converted from frequency to time (or vice versa) could be determined continuously, in a frame-by-frame manner, for consecutive time sections or "frames" of the signal, such that the signal level can be adjusted so that the currently relevant dynamic range fits into the dynamic range provided by the converter (frequency-domain-to-time-domain converter or time-domain-to-frequency-domain converter). It would also be desirable to make such a level change, performed for the purpose of frequency-to-time or time-to-frequency conversion, substantially "transparent" to the other components of the decoder or encoder.

At least one of these possible desiderata and/or further desiderata is addressed by an audio signal decoder according to claim 1, an audio signal encoder according to claim 14, and a method for decoding an encoded audio signal representation according to claim 15.

An audio signal decoder is provided for providing a decoded audio signal representation based on an encoded audio signal representation. The audio signal decoder comprises a decoder pre-processing stage configured to obtain a plurality of frequency band signals from the encoded audio signal representation. The audio signal decoder further comprises a clipping estimator configured to analyze at least one of the encoded audio signal representation, the plurality of frequency band signals, and side information regarding a gain of the frequency band signals of the encoded audio signal representation, as to whether the encoded audio signal representation, the plurality of frequency band signals, and/or the side information suggest potential clipping, in order to determine a current level change factor for the encoded audio signal representation. When the side information suggests potential clipping, the current level change factor causes the information of the plurality of frequency band signals to be shifted towards a less significant bit, thus gaining headroom in at least one more significant bit. The audio signal decoder also comprises a level changer configured to change the levels of the frequency band signals according to the current level change factor to obtain level-changed frequency band signals. In addition, the audio signal decoder comprises a frequency-domain-to-time-domain converter configured to convert the level-changed frequency band signals into a time domain representation. The audio signal decoder further comprises a level change compensator configured to act on the time domain representation to at least partially compensate for the level change applied by the level changer to the level-changed frequency band signals, and to obtain a substantially compensated time domain representation.
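The decoder chain just described (clipping estimator, level changer, frequency-to-time converter, level change compensator) can be sketched end to end. All names here are illustrative; an inverse real FFT stands in for the actual synthesis filter bank, and a simple gain threshold stands in for the side-information analysis.

```python
import numpy as np

def decode_with_headroom(freq_bands: np.ndarray, gain_db: float) -> np.ndarray:
    """Sketch of the decoder chain above; hypothetical names throughout."""
    # Clipping estimator: side information (here a loudness gain in dB)
    # suggesting a level increase is taken as a hint of potential clipping.
    potential_clipping = gain_db > 0.0
    level_change = 0.5 if potential_clipping else 1.0  # one bit of headroom

    # Level changer: applied in the frequency domain, before conversion.
    shifted = freq_bands * level_change

    # Frequency-domain to time-domain conversion.
    time_rep = np.fft.irfft(shifted)

    # Level change compensator: undo the shift in the time domain.
    return time_rep / level_change

bands = np.fft.rfft(np.sin(np.linspace(0.0, 10.0, 64)))
out = decode_with_headroom(bands, gain_db=6.0)
# Because the transform is linear, the shift is transparent in exact
# arithmetic; in fixed point it trades one LSB of precision for one MSB
# of headroom inside the converter.
```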

Further embodiments of the present invention provide an audio signal encoder configured to provide an encoded audio signal representation based on a time domain representation of an input audio signal. The audio signal encoder comprises a clipping estimator configured to analyze the time domain representation of the input audio signal as to whether potential clipping is suggested, in order to determine a current level change factor for the input signal representation. When potential clipping is suggested, the current level change factor causes the time domain representation of the input audio signal to be shifted towards a less significant bit, thus gaining headroom in at least one more significant bit. The audio signal encoder further comprises a level changer configured to change a level of the time domain representation of the input audio signal according to the current level change factor to obtain a level-changed time domain representation. In addition, the audio signal encoder comprises a time-domain-to-frequency-domain converter configured to convert the level-changed time domain representation into a plurality of frequency band signals. The audio signal encoder also comprises a level change compensator configured to act on the plurality of frequency band signals to at least partially compensate for the level change applied by the level changer to the level-changed time domain representation, and to obtain a plurality of substantially compensated frequency band signals.

Further embodiments of the present invention provide a method for decoding an encoded audio signal representation to obtain a decoded audio signal representation. The method comprises pre-processing the encoded audio signal representation to obtain a plurality of frequency band signals. The method further comprises analyzing at least one of the encoded audio signal representation, the frequency band signals, and side information regarding a gain of the frequency band signals, as to whether potential clipping is suggested, in order to determine a current level change factor for the encoded audio signal representation. When potential clipping is suggested, the current level change factor causes the information of the plurality of frequency band signals to be shifted towards a less significant bit, thus gaining headroom in at least one more significant bit. In addition, the method comprises changing the levels of the frequency band signals according to the current level change factor to obtain level-changed frequency band signals. The method also comprises converting the level-changed frequency band signals from the frequency domain to the time domain to obtain a time domain representation. The method further comprises acting on the time domain representation to at least partially compensate for the level change applied to the level-changed frequency band signals, and to obtain a substantially compensated time domain representation.

In addition, a computer program is provided for implementing the methods described above when executed on a computer or signal processor.

Further embodiments provide an audio signal decoder for providing a decoded audio signal representation based on an encoded audio signal representation. The audio signal decoder comprises a decoder pre-processing stage configured to obtain a plurality of frequency band signals from the encoded audio signal representation. The audio signal decoder further comprises a clipping estimator configured to analyze at least one of the encoded audio signal representation, the plurality of frequency band signals, and side information regarding a gain of the frequency band signals of the encoded audio signal representation, in order to determine a current level change factor for the encoded audio signal representation. The audio signal decoder also comprises a level changer configured to change the levels of the frequency band signals according to the current level change factor to obtain level-changed frequency band signals. In addition, the audio signal decoder comprises a frequency-domain-to-time-domain converter configured to convert the level-changed frequency band signals into a time domain representation. The audio signal decoder further comprises a level change compensator configured to act on the time domain representation to at least partially compensate for the level change applied by the level changer to the level-changed frequency band signals, and to obtain a substantially compensated time domain representation.

Further embodiments of the present invention provide an audio signal encoder configured to provide an encoded audio signal representation based on a time domain representation of an input audio signal. The audio signal encoder comprises a clipping estimator configured to analyze the time domain representation of the input audio signal in order to determine a current level change factor for the input signal representation. The audio signal encoder further comprises a level changer configured to change a level of the time domain representation of the input audio signal according to the current level change factor to obtain a level-changed time domain representation. In addition, the audio signal encoder comprises a time-domain-to-frequency-domain converter configured to convert the level-changed time domain representation into a plurality of frequency band signals. The audio signal encoder also comprises a level change compensator configured to act on the plurality of frequency band signals to at least partially compensate for the level change applied by the level changer to the level-changed time domain representation, and to obtain a plurality of substantially compensated frequency band signals.

Further embodiments of the present invention provide a method for decoding an encoded audio signal representation to obtain a decoded audio signal representation. The method comprises pre-processing the encoded audio signal representation to obtain a plurality of frequency band signals. The method further comprises analyzing at least one of the encoded audio signal representation, the frequency band signals, and side information regarding a gain of the frequency band signals, in order to determine a current level change factor for the encoded audio signal representation. In addition, the method comprises changing the levels of the frequency band signals according to the current level change factor to obtain level-changed frequency band signals. The method also comprises converting the level-changed frequency band signals from the frequency domain to the time domain to obtain a time domain representation. The method further comprises acting on the time domain representation to at least partially compensate for the level change applied to the level-changed frequency band signals, and to obtain a substantially compensated time domain representation.


At least some of the embodiments are based on the insight that it is possible, without losing relevant information, to change the plurality of frequency band signals of a frequency domain representation by a certain level change factor during time intervals in which the general loudness level of the audio signal is relatively high. Instead, the relevant information is shifted into bits that would otherwise be likely to contain noise anyway. In this way, a frequency-domain-to-time-domain converter having a limited word length can be used even though the dynamic range of the frequency band signals may be larger than what is supported by the limited word length of the frequency-domain-to-time-domain converter. In other words, at least some embodiments of the present invention exploit the fact that the least significant bit or bits typically do not carry any relevant information while the audio signal is relatively loud, that is, while the relevant information is more likely to be contained in the most significant bit or bits. The level change applied to the level-changed frequency band signals may also have the benefit of reducing the probability of clipping within the time domain representation, where such clipping may result from a constructive overlap of one or more frequency band signals of the plurality of frequency band signals.
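The headroom trade described above can be made concrete on a 16-bit two's-complement sample. Plain Python integers are used here to emulate the fixed-point arithmetic; the function name is illustrative.

```python
def shift_for_headroom(sample: int, shift: int = 1) -> tuple[int, int]:
    """Illustrate the trade on one fixed-point sample: an arithmetic right
    shift discards the least significant bit(s) -- which for loud signals
    mostly contain noise -- while freeing the same number of most
    significant bits as headroom against overflow. The left shift models
    the later level change compensation."""
    shifted = sample >> shift    # level change: divide by 2**shift
    restored = shifted << shift  # compensation: multiply back
    return shifted, restored

loud = 0b0111_0000_0000_0101  # 28677, close to 16-bit full scale
shifted, restored = shift_for_headroom(loud)
print(shifted)          # -> 14338: at most 15 bits used, 1 bit of headroom
print(loud - restored)  # -> 1: only the least significant bit was lost
```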

These ideas and findings apply analogously to the audio signal encoder and to the method for encoding an original audio signal to obtain an encoded audio signal representation.

Next, the embodiments of the present invention are described in more detail with reference to the figures, in which:

Figure 1 illustrates an encoder according to the state of the art;

Figure 2 represents a decoder according to the state of the art;

Figure 3 illustrates another encoder according to the state of the art;

Figure 4 represents an additional decoder according to the state of the art;

Figure 5 shows a schematic block diagram of an audio signal decoder according to at least one embodiment;

Figure 6 shows a schematic block diagram of an audio signal decoder according to at least one additional embodiment;

Figure 7 shows a block diagram illustrating a concept of the proposed audio signal decoder and the proposed method for decoding an encoded audio signal representation according to the embodiments;

Figure 8 is a schematic visualization of the level change to gain space;

Figure 9 shows a schematic block diagram of a possible transition shape setting that may be a component of the audio signal decoder or encoder according to at least some embodiments;

Figure 10 represents an estimation unit according to a further embodiment comprising a prediction filter adjuster;

Figure 11 illustrates an apparatus for generating a return data flow;

Figure 12 illustrates an encoder according to the state of the art;

Figure 13 represents a decoder according to the state of the art;

Figure 14 illustrates another encoder according to the state of the art; and

Figure 15 shows a schematic block diagram of an audio signal encoder according to at least one embodiment; and

Figure 16 shows a schematic flow chart of a method for decoding the representation of encoded audio signal according to at least one embodiment.


Audio processing has advanced in many respects and has been the subject of several studies on how to efficiently encode and decode an audio signal. Efficient coding is provided, for example, by MPEG AAC (MPEG = Moving Picture Experts Group; AAC = Advanced Audio Coding). Some aspects of MPEG AAC are explained in more detail below as an introduction to audio coding and decoding. The description of MPEG AAC is to be understood only as an example, since the concepts described can be applied to other audio coding and decoding schemes as well.

According to MPEG AAC, the spectral values of an audio signal are encoded using scale factors, quantization and codebooks, in particular Huffman codebooks.

Before Huffman coding is carried out, the encoder groups the plurality of spectral coefficients to be encoded into different sections (the spectral coefficients having been obtained from upstream components, such as a filter bank, a psychoacoustic model, and a quantizer controlled by the psychoacoustic model with respect to quantization resolutions and quantization thresholds). For each section of spectral coefficients, the encoder chooses a Huffman codebook for Huffman coding. MPEG AAC provides eleven different spectrum Huffman codebooks for encoding the spectral data, from which the encoder selects the codebook that fits best for encoding the spectral coefficients of the section. The encoder provides a codebook identifier, identifying the codebook used for Huffman coding of the spectral coefficients of the section, to the decoder as side information.

On the decoder side, the decoder analyzes the received side information to determine which of the plurality of spectrum Huffman codebooks has been used to encode the spectral values of a section. The decoder performs the Huffman decoding based on the side information about the Huffman codebook used to encode the spectral coefficients of the section to be decoded.

After Huffman decoding, a plurality of quantized spectral values is obtained in the decoder. The decoder may then perform inverse quantization to reverse a non-uniform quantization that may have been carried out by the encoder. By this, inversely quantized spectral values are obtained in the decoder.

However, the inversely quantized spectral values may not yet be scaled. The derived unscaled spectral values have been grouped into scale factor bands, each scale factor band having a common scale factor. The scale factor for each scale factor band is available to the decoder as side information provided by the encoder. Using this information, the decoder multiplies the unscaled spectral values of a scale factor band by its scale factor. By this, the scaled spectral values are obtained.
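These two decoder steps — inverse (non-uniform) quantization followed by per-band scaling — can be sketched for one scale factor band. The x^(4/3) companding law and the 2^((sf − 100)/4) gain follow the AAC convention; treat the exact constants as illustrative here rather than normative.

```python
import numpy as np

SF_OFFSET = 100  # global scale factor offset as used in AAC (ISO/IEC 13818-7)

def dequantize_band(quantized: np.ndarray, scale_factor: int) -> np.ndarray:
    """Inverse quantization and scaling for one scale factor band."""
    # Inverse non-uniform quantization: |q|^(4/3), sign preserved.
    inv_quant = np.sign(quantized) * np.abs(quantized) ** (4.0 / 3.0)
    # Every coefficient in the band shares one scale factor (a gain value).
    gain = 2.0 ** (0.25 * (scale_factor - SF_OFFSET))
    return inv_quant * gain

band = np.array([3, -2, 0, 1])
out = dequantize_band(band, scale_factor=100)  # gain == 1 here: just |q|^(4/3)
```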

The coding and decoding of spectral values according to the state of the art is now explained with reference to Figures 1-4.

Figure 1 illustrates an encoder according to the state of the art. The encoder comprises a T/F (time-to-frequency) filter bank 10 for transforming an audio signal AS, which is to be encoded, from the time domain into the frequency domain to obtain a frequency-domain audio signal. The frequency-domain audio signal is fed into a scale factor unit 20 for determining the scale factors. The scale factor unit 20 is adapted to divide the spectral coefficients of the frequency-domain audio signal into several groups of spectral coefficients, called scale factor bands, which share a scale factor. A scale factor represents a gain value used to change the amplitude of all spectral coefficients in the respective scale factor band. The scale factor unit 20 is further adapted to generate and output the unscaled spectral coefficients of the frequency-domain audio signal.

In addition, the encoder in Figure 1 comprises a quantizer 30 for quantizing the unscaled spectral coefficients of the frequency-domain audio signal. The quantizer 30 may be a non-uniform quantizer.

After quantization, the unscaled quantized spectra of the audio signal are fed into a Huffman encoder 40 to undergo Huffman encoding. Huffman encoding is used to reduce the redundancy of the quantized spectrum of the audio signal. The plurality of unscaled quantized spectral coefficients is grouped into sections. While eleven possible codebooks are provided in MPEG AAC, all spectral coefficients of a section are encoded by the same Huffman codebook.

The encoder will choose one of the eleven possible Huffman codebooks that is particularly suited for encoding the spectral coefficients of the section. Thus, the encoder's selection of the Huffman codebook for a particular section depends on the spectral values of that particular section. The Huffman-encoded spectral coefficients can then be transmitted to the decoder together with additional information comprising, for example, information about the Huffman codebook that has been used to encode a section of spectral coefficients, a scale factor that has been used for a particular scale factor band, etc.

Two or four spectral coefficients are encoded by a codeword from the Huffman codebook used for Huffman encoding of the spectral coefficients of the section. The encoder transmits the codewords representing the encoded spectral coefficients to the decoder together with additional information that includes the length of a section as well as information about the Huffman codebook used to encode the spectral coefficients of the section.

In MPEG AAC, eleven spectrum Huffman codebooks are provided for encoding the spectral data of the audio signal. The different spectrum Huffman codebooks can be identified by their codebook index (a value between 1 and 11). The size of a Huffman codebook indicates how many spectral coefficients are encoded by a codeword from the considered Huffman codebook. In MPEG AAC, the size of a Huffman codebook is either 2 or 4, such that a codeword encodes either two or four spectral values of the audio signal.

However, the different Huffman codebooks also differ with respect to other properties. For example, the maximum absolute value of a spectral coefficient that can be encoded by a Huffman codebook differs from codebook to codebook and can be, for example, 1, 2, 4, 7, 12 or greater. Moreover, a considered Huffman codebook may be adapted to encode signed or unsigned values.

When using Huffman encoding, the spectral coefficients are encoded by codewords of different lengths. MPEG AAC provides two different Huffman codebooks having a maximum absolute value of 1, two different Huffman codebooks having a maximum absolute value of 2, two different Huffman codebooks having a maximum absolute value of 4, two different Huffman codebooks having a maximum absolute value of 7 and two different Huffman codebooks having a maximum absolute value of 12, wherein each Huffman codebook represents a different probability distribution function. The Huffman encoder will always choose the Huffman codebook that fits best for encoding the spectral coefficients.
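
The "best fit" selection can be pictured as a search over the admissible codebooks: every codebook whose maximum absolute value covers the section is a candidate, and the one producing the fewest bits wins. The sketch below uses invented per-value cost models in place of the real per-codeword Huffman length tables.

```python
# Hypothetical sketch of the codebook selection criterion: among all
# codebooks able to represent every coefficient of the section, choose
# the one that needs the fewest bits. The cost models below are toy
# stand-ins for the real MPEG AAC Huffman codeword length tables.

def choose_codebook(section, codebooks):
    """codebooks: (name, max_abs, cost_fn); returns (name, total_bits)."""
    best = None
    for name, max_abs, cost_fn in codebooks:
        if max(abs(v) for v in section) > max_abs:
            continue  # this codebook cannot encode the section at all
        bits = cost_fn(section)
        if best is None or bits < best[1]:
            best = (name, bits)
    return best

# Toy rule: books for small alphabets spend fewer bits per coefficient.
books = [
    ("book_max1", 1, lambda vs: 2 * len(vs)),
    ("book_max4", 4, lambda vs: 4 * len(vs)),
    ("book_max12", 12, lambda vs: 6 * len(vs)),
]
```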

Figure 2 illustrates a decoder according to the state of the art. The Huffman-encoded spectral values are received by a Huffman decoder 50. The Huffman decoder 50 also receives, as additional information, information about the Huffman codebook used for encoding the spectral values of each section of spectral values. The Huffman decoder 50 then performs Huffman decoding to obtain the unscaled quantized spectral values. The unscaled quantized spectral values are fed into an inverse quantizer 60. The inverse quantizer performs inverse quantization to obtain unscaled inversely quantized spectral values, which are fed into a scale modifier 70. The scale modifier 70 also receives the scale factors as additional information for each scale factor band. Based on the received scale factors, the scale modifier 70 scales the unscaled inversely quantized spectral values to obtain scaled inversely quantized spectral values. An F/T (frequency-to-time) filter bank 80 then transforms the scaled inversely quantized spectral values of the frequency domain audio signal from the frequency domain to the time domain to obtain sample values of a time domain audio signal.
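
The inverse quantization step of this decoder chain can be sketched as follows. The sign-preserving |q|**(4/3) characteristic is the non-uniform quantization law known from MPEG AAC; the surrounding names are illustrative.

```python
# Sketch of the decoder-side inverse quantization of Figure 2. The
# sign-preserving |q|**(4/3) characteristic is the non-uniform law
# known from MPEG AAC; function names are illustrative only.

def inverse_quantize(quantized):
    """Map quantized integers back to unscaled spectral values."""
    return [(-1.0 if q < 0 else 1.0) * abs(q) ** (4.0 / 3.0)
            for q in quantized]

unscaled = inverse_quantize([8, -1, 0])   # 8**(4/3) = 16
```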

Figure 3 illustrates an encoder according to the state of the art that differs from the encoder of Figure 1 in that the encoder of Figure 3 further comprises an encoder-side TNS unit 15 (TNS = temporal noise shaping). Temporal noise shaping can be used to control the temporal shape of the quantization noise by carrying out a filtering process with respect to portions of the spectral data of the audio signal. The encoder-side TNS unit 15 performs a linear predictive coding (LPC) calculation with respect to the spectral coefficients of the frequency domain audio signal to be encoded. Among others, the results of the LPC calculation are the reflection coefficients, also called PARCOR coefficients. Temporal noise shaping is not used if the prediction gain, which is also derived by the LPC calculation, does not exceed a certain threshold value. However, if the prediction gain is greater than the threshold value, temporal noise shaping is used. The encoder-side TNS unit removes all reflection coefficients that are smaller than a certain threshold value. The remaining reflection coefficients are converted into linear prediction coefficients and are used as noise shaping filter coefficients in the encoder. The encoder-side TNS unit then performs a filter operation on those spectral coefficients, for which TNS is used, to obtain processed spectral coefficients
of the audio signal. Additional information indicating the TNS information, for example, the reflection coefficients (PARCOR coefficients), is transmitted to the decoder.
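
The LPC-based TNS decision described above can be sketched as follows: a Levinson-Durbin recursion yields the reflection (PARCOR) coefficients and the prediction gain, TNS is only activated when the gain exceeds a threshold, and small reflection coefficients are discarded. The threshold values and names below are illustrative assumptions, not the values of the standard.

```python
# Illustrative sketch of the encoder-side TNS decision. Levinson-Durbin
# yields the reflection (PARCOR) coefficients and the prediction gain;
# TNS is used only if the gain exceeds a threshold, and reflection
# coefficients below a threshold are dropped. All threshold values and
# names are invented for this example.

def levinson_durbin(r, order):
    """r: autocorrelation r[0..order]; returns (parcor, prediction_gain)."""
    a = [0.0] * (order + 1)       # LPC coefficients a[1..m]
    err = r[0]                    # prediction error energy
    parcor = []
    for m in range(1, order + 1):
        acc = r[m] - sum(a[i] * r[m - i] for i in range(1, m))
        k = acc / err             # m-th reflection coefficient
        parcor.append(k)
        new_a = a[:]
        new_a[m] = k
        for i in range(1, m):
            new_a[i] = a[i] - k * a[m - i]
        a = new_a
        err *= 1.0 - k * k
    return parcor, r[0] / err

def tns_filter_coefficients(r, order, gain_threshold=1.4, k_threshold=0.1):
    """Return the retained reflection coefficients, or None if TNS is off."""
    parcor, gain = levinson_durbin(r, order)
    if gain <= gain_threshold:
        return None               # prediction gain too small: no TNS
    return [k for k in parcor if abs(k) >= k_threshold]
```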

Figure 4 illustrates a decoder according to the state of the art that differs from the decoder illustrated in Figure 2 in that the decoder of Figure 4 further comprises a decoder-side TNS unit 75. The decoder-side TNS unit receives the scaled inversely quantized spectra of the audio signal and also receives the TNS information, for example, information indicating the reflection coefficients (PARCOR coefficients). The decoder-side TNS unit 75 processes the inversely quantized spectra of the audio signal to obtain a processed inversely quantized spectrum of the audio signal.

Figure 5 shows a schematic block diagram of an audio signal decoder 100 according to at least one embodiment of the present invention. The audio signal decoder is configured to receive an encoded audio signal representation. Typically, the encoded audio signal representation is accompanied by additional information. The encoded audio signal representation together with the additional information may be provided in the form of a data stream that has been produced by, for example, a perceptual audio encoder. The audio signal decoder 100 is further configured to provide a decoded audio signal representation that may be identical to the signal labeled "substantially compensated time domain representation" in Figure 5, or derived from it using further processing.

The audio signal decoder 100 comprises a decoder preprocessing stage 110 that is configured to obtain a plurality of frequency band signals from the encoded audio signal representation. For example, the decoder preprocessing stage 110 may comprise a bitstream unpacker in case the encoded audio signal representation and the additional information are contained in a bitstream. Some audio coding standards may use time-varying resolutions, and also different resolutions for the plurality of frequency band signals, depending on the frequency range in which the encoded audio signal representation currently carries relevant information (high resolution) or irrelevant information (low resolution or no data at all). This means that a frequency band in which the encoded audio signal representation currently has a large amount of relevant information is typically encoded using a highly accurate resolution (i.e., using a relatively high number of bits) during that time interval, unlike a frequency band signal that temporarily carries no or only very little information. It may even happen for some of the frequency band signals that the bitstream temporarily contains no data or bits at all, because these frequency band signals do not contain any relevant information during the corresponding time interval. The bitstream provided to the decoder preprocessing stage 110 typically contains information (for example, as part of the additional information) indicating which frequency band signals of the plurality of frequency band signals contain data for the currently considered time interval or "frame" and the corresponding bit resolution.

The audio signal decoder 100 further comprises a clipping estimator 120 configured to analyze the additional information regarding a gain of the frequency band signals of the encoded audio signal representation in order to determine a current level change factor for the encoded audio signal representation. Some perceptual audio coding standards use individual scale factors for the different frequency band signals of the plurality of frequency band signals. The individual scale factors indicate, for each frequency band signal, its current amplitude range with respect to the other frequency band signals. For some embodiments of the present invention, an analysis of these scale factors allows an approximate evaluation of a maximum amplitude that can occur in the corresponding time domain representation after the plurality of frequency band signals has been converted from the frequency domain to the time domain. This information can then be used to determine whether, without any appropriate processing as proposed by the present invention, clipping would be likely to occur within the time domain representation for the considered time interval or "frame". The clipping estimator 120 is configured to determine a level change factor that changes all frequency band signals of the plurality of frequency band signals by an identical amount with respect to level (with respect to a signal amplitude or signal power, for example). The level change factor can be determined for each time interval (frame) individually, that is, the level change factor is variable over time. Typically, the clipping estimator 120 will attempt to adjust the levels of the plurality of frequency band signals by the change factor that is common to all frequency band signals in such a way that clipping is very unlikely to occur within the time domain representation, while at the same time maintaining a reasonable dynamic range for the frequency band signals.
As an example, consider a frame of the encoded audio signal representation in which a number of the scale factors are relatively high. The clipping estimator 120 can now consider the worst case scenario, that is, possible signal peaks within the plurality of
frequency band signals that overlap or add up constructively, resulting in a large amplitude within the time domain representation. The level change factor can now be determined as a number that brings this hypothetical peak within the time domain representation into a desired dynamic range, possibly with the additional consideration of a margin. At least according to some embodiments, the clipping estimator 120 does not need the encoded audio signal representation itself to evaluate a clipping probability within the time domain representation for the considered time interval or frame. The reason is that at least some perceptual audio coding standards choose the scale factors for the frequency band signals of the plurality of frequency band signals according to the largest amplitude that has to be encoded within a certain frequency band signal and the considered time interval. In other words, the highest level that can be represented by the bit resolution chosen for the frequency band signal at hand is very likely to occur at least once during the considered time interval or frame, given the properties of the coding scheme. By using this assumption, the clipping estimator 120 may focus on evaluating the additional information regarding the gain or gains of the frequency band signals (e.g., the mentioned scale factors and possibly additional parameters) to determine the current level change factor for the encoded audio signal representation and the considered time interval (frame).
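
A minimal sketch of such a worst-case estimate, assuming the same 2**(sf/4) gain law as before and an invented safety margin: the per-band peak amplitudes are summed as if they aligned fully constructively, and the level change factor is chosen so that this hypothetical peak stays inside the available range.

```python
# Minimal sketch of the worst-case clipping estimate performed by the
# clipping estimator 120. Per-band peaks are assumed to align fully
# constructively; the 2**(sf/4) gain law, the margin and all names are
# illustrative assumptions, not the normative computation.

def estimate_level_change(scale_factors, global_gain,
                          full_scale=1.0, margin_db=1.0):
    """Return the common level change factor for the current frame."""
    band_peaks = [global_gain * 2.0 ** (sf / 4.0) for sf in scale_factors]
    predicted_peak = sum(band_peaks)          # constructive worst case
    limit = full_scale * 10.0 ** (-margin_db / 20.0)
    if predicted_peak <= limit:
        return 1.0                            # no clipping expected
    return limit / predicted_peak             # attenuate all bands alike
```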

The audio signal decoder 100 further comprises a level changer 130 configured to change the levels of the frequency band signals according to the level change factor to obtain level-changed frequency band signals.

The audio signal decoder 100 further comprises a frequency domain to time domain converter 140 configured to convert the level-changed frequency band signals into a time domain representation. The frequency domain to time domain converter 140 may be an inverse filter bank, an inverse modified discrete cosine transform (inverse MDCT), or an inverse quadrature mirror filter (inverse QMF), to name a few. For some audio coding standards the frequency domain to time domain converter 140 can be configured to support windowed formation of consecutive frames, in which two frames overlap for, for example, 50% of their duration.

The time domain representation provided by the frequency domain to time domain converter 140 is provided to a level change compensator 150 that is configured to act on the time domain representation to at least partially compensate for a level change applied to the level-changed frequency band signals by the level changer 130, and to obtain a substantially compensated time domain representation. The level change compensator 150 also receives the level change factor from the clipping estimator 120, or a signal derived from the level change factor. The level changer 130 and the level change compensator 150 provide a gain adjustment of the level-changed frequency band signals and a compensating gain adjustment of the time domain representation, respectively, wherein said gain adjustment bridges the frequency domain to time domain converter 140. In this way, the level-changed frequency band signals and the time domain representation can be adjusted to a dynamic range provided by the frequency domain to time domain converter 140, which may be limited due to a fixed word length and/or a fixed point arithmetic implementation of the converter 140. In particular, the relevant dynamic range of the level-changed frequency band signals and of the corresponding time domain representation may be at relatively high signal power or amplitude values during relatively loud frames. On the contrary, the relevant dynamic range of the level-changed frequency band signals, and consequently also of the corresponding time domain representation, may be at relatively small signal power or amplitude values during relatively quiet frames. In the case of loud frames, the information contained in the lower bits of a binary representation of the level-changed frequency band signals can typically be considered negligible compared to the information contained within the upper bits.
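
The effect of bracketing the converter with such a gain adjustment can be illustrated as follows: a saturating stage stands in for the fixed word length filter bank, the signal is attenuated by g2 in front of it, and the attenuation is divided back out behind it in higher precision. All numbers and names are invented for the illustration.

```python
# Illustration of the gain adjustment bracketing the converter: a
# saturating stage stands in for the fixed word length filter bank 140,
# the level changer applies g2 in front of it, and the level change
# compensator divides it back out in higher precision afterwards.
# All numbers and names are invented for this example.

def saturate_q15(samples):
    """Model a 16-bit fixed point stage clipping to [-1, 32767/32768]."""
    lo, hi = -1.0, 32767.0 / 32768.0
    return [min(max(s, lo), hi) for s in samples]

def through_converter(samples, g2):
    shifted = [s * g2 for s in samples]       # level change before stage
    converted = saturate_q15(shifted)         # stands in for filter bank
    return [s / g2 for s in converted]        # compensation afterwards

# A peak of 1.5 would clip without the level change; with g2 = 0.5 the
# stage is passed without saturation and the peak is fully recovered.
```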
Typically, the level change factor is common to all frequency band signals, which makes it possible to compensate for the level change applied to the level-changed frequency band signals even downstream of the frequency domain to time domain converter 140. Unlike the proposed level change factor, which is determined by the audio signal decoder 100 itself, the so-called global gain parameter is contained within the bitstream that was produced by a remote audio signal encoder and provided to the audio signal decoder 100 as an input. In addition, the global gain is applied to the plurality of frequency band signals between the decoder preprocessing stage 110 and the frequency domain to time domain converter 140. Typically, the global gain is applied to the plurality of frequency band signals at substantially the same place within the signal processing chain as the scale factors for the different frequency band signals. This means that for a relatively loud frame the frequency band signals provided to the frequency domain to time domain converter 140 are already relatively loud, and may therefore cause clipping in the corresponding time domain representation, due to the plurality of signals from
the frequency bands not providing enough headroom in case the different frequency band signals add up constructively, thus leading to a relatively high signal amplitude within the time domain representation.

The proposed approach, as implemented for example by the audio signal decoder 100 illustrated schematically in Figure 5, allows signal limiting without losing data precision or using longer word lengths for the decoder filter banks (for example, the frequency domain to time domain converter 140).

To overcome the problem of the restricted word length of the filter banks, the loudness normalization, as a potential clipping source, can be moved to the time domain processing. This allows the filter bank 140 to be implemented with the original word length, or with a reduced word length compared to an implementation where loudness normalization is performed within the frequency domain processing. To achieve a smooth blending of gain values, a transition shape adjustment can be made, as will be explained later in the context of Figure 9.

In addition, the audio samples within the bitstream are generally quantized at a lower precision than the reconstructed audio signal. This leaves some headroom in the filter bank 140. The decoder 100 derives an estimate from the other bitstream parameters p (such as the global gain factor) and, in case clipping of the output signal is likely, a level change (g2) is applied to prevent clipping in the filter bank 140. This level change is signaled to the time domain processing for appropriate compensation by the level change compensator 150. If no clipping is estimated, the audio signal remains unchanged and hence the method causes no loss of precision.

The clipping estimator may also be configured to determine a clipping probability based on the additional information and/or to determine the current level change factor based on the clipping probability. Although the clipping probability only indicates a tendency, rather than a definite fact, it can provide useful information regarding the level change factor that can reasonably be applied to the plurality of frequency band signals for a given frame of the encoded audio signal representation. The determination of the clipping probability can be relatively simple in terms of complexity or computational effort, compared with the frequency domain to time domain conversion performed by the frequency domain to time domain converter 140.

The additional information may comprise at least one of a global gain factor for the plurality of frequency band signals and a plurality of scale factors. Each scale factor may correspond to one or more frequency band signals of the plurality of frequency band signals. The global gain factor and/or the plurality of scale factors already provide useful information regarding a loudness level of the current frame to be converted to the time domain by the converter 140.

According to at least some embodiments, the decoder preprocessing stage 110 can be configured to obtain the plurality of frequency band signals in the form of a plurality of successive frames. The clipping estimator 120 can be configured to determine the current level change factor for a current frame. In other words, the audio signal decoder 100 can be configured to dynamically determine varying level change factors for different frames of the encoded audio signal representation, for example depending on a varying loudness within the successive frames.

The decoded audio signal representation can be determined based on the substantially compensated time domain representation. For example, the audio signal decoder 100 may further comprise a time domain limiter downstream of the level change compensator 150. According to some embodiments, the level change compensator 150 may be part of such a time domain limiter.

According to further embodiments, the additional information regarding the gain of the frequency band signals may comprise a plurality of frequency-band-related gain factors.

The decoder preprocessing stage 110 may comprise an inverse quantizer configured to inversely quantize each frequency band signal using a frequency-band-specific quantization indicator of a plurality of frequency-band-specific quantization indicators. In particular, different frequency band signals may have been quantized using different quantization resolutions (or bit resolutions) by an audio signal encoder that has created the encoded audio signal representation and the corresponding additional information. The different frequency-band-specific quantization indicators can therefore provide information about an amplitude resolution for the various frequency band signals, depending on a required amplitude resolution for that particular frequency band signal previously determined by the audio signal encoder.

The plurality of frequency-band-specific quantization indicators may be part of the additional information provided to the decoder preprocessing stage 110 and may provide further information to be used by the clipping estimator 120 for determining the level change factor.

The clipping estimator 120 can also be configured to analyze the additional information with respect to whether the additional information suggests potential clipping within the time domain representation. Such a finding can then be interpreted to mean that the least significant bits (LSB) do not contain relevant information. In this case, the level change applied by the level changer 130 can shift the information towards the least significant bit so that, by freeing a more significant bit (MSB), some headroom is gained at the most significant bits, which may be needed by the time domain representation in case two or more of the frequency band signals add up constructively. This concept can also be extended to the n least significant bits and the n most significant bits.
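
On integer samples, this trade can be pictured directly: right-shifting by n bits discards the n LSBs (assumed to hold only quantization noise) and gains n bits of MSB headroom. The word length and sample values below are illustrative.

```python
# Illustration of trading noisy LSBs for MSB headroom, as described
# above: right-shifting by n bits discards the n least significant bits
# (assumed to carry only quantization noise) and frees n bits at the
# most significant end. Word length and sample values are invented.

def headroom_bits(sample, wordlength=16):
    """Unused leading bits of a non-negative fixed point sample."""
    return wordlength - sample.bit_length()

sample = 0b0111000000000000       # loud sample: only 1 bit of headroom
shifted = sample >> 2             # drop 2 LSBs, gain 2 bits of headroom
```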

The clipping estimator 120 can be configured to take a quantization noise into account. For example, in AAC decoding, both the "global gain" and the "scale factor band" values are used to normalize the audio sub-bands. As a consequence, the relevant information of each (spectral) value is shifted towards the MSBs, while the LSBs are neglected during quantization. After requantization in the decoder, typically only the LSBs contain noise. If the "global gain" and "scale factor band" values (p) suggest potential clipping after the reconstruction filter bank 140, it can reasonably be assumed that the LSBs do not contain information. With the proposed method, the decoder 100 then shifts the information into these bits to gain some headroom at the MSBs. This causes substantially no loss of information.

The proposed apparatus (audio signal decoder or encoder) and methods allow clipping prevention for audio decoders/encoders without using a higher-resolution filter bank to provide the required headroom. This is typically much less expensive in terms of memory requirements and computational complexity than providing/implementing a higher-resolution filter bank.

Figure 6 shows a schematic block diagram of an audio signal decoder 100 in accordance with further embodiments of the present invention. The audio signal decoder 100 comprises an inverse quantizer 210 (Q-1) that is configured to receive the encoded audio signal representation and typically also the additional information, or a portion of the additional information. In some embodiments, the inverse quantizer 210 may comprise a bitstream unpacker configured to unpack a bitstream containing the encoded audio signal representation and the additional information, for example in the form of data packets, wherein each data packet may correspond to a certain number of frames of the encoded audio signal representation. As explained above, within the encoded audio signal representation and within each frame, each frequency band can have its own individual quantization resolution. In this way, frequency bands that temporarily require a relatively precise quantization in order to correctly represent the portions of the audio signal within those frequency bands can be given such a precise quantization resolution. On the other hand, frequency bands that contain, during a given frame, no information or only a small amount of information can be quantized using a much coarser but still sufficient quantization, thus saving data bits. The inverse quantizer 210 can be configured to bring the different frequency bands, which have been quantized using time-varying and individual quantization resolutions, to a common quantization resolution. The common quantization resolution may be, for example, the resolution provided by a fixed point arithmetic representation that is used internally by the audio signal decoder 100 for calculations and processing. For example, the audio signal decoder 100 may internally use a 16-bit or 24-bit fixed point representation.
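
This alignment to a common word length can be sketched as a left-shift of each band's values by the difference between the common resolution and the band's own bit resolution. The band layout and the 24-bit common word length are illustrative assumptions.

```python
# Sketch of how the inverse quantizer 210 might bring bands quantized
# at individual bit resolutions onto one common fixed point grid: each
# value is left-aligned by the resolution difference. The band layout
# and the 24-bit common word length are illustrative assumptions.

def to_common_resolution(bands, band_bits, common_bits=24):
    """bands[i] was quantized with band_bits[i] bits; left-align all."""
    return [[v << (common_bits - bits) for v in values]
            for values, bits in zip(bands, band_bits)]

aligned = to_common_resolution([[1, 2], [1]], [8, 16])
```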
The additional information provided to the inverse quantizer 210 may contain information regarding the different quantization resolutions of the plurality of frequency band signals for each new frame. The inverse quantizer 210 can be considered a special case of the decoder preprocessing stage 110 depicted in Figure 5.

The clipping estimator 120 shown in Figure 6 is similar to the clipping estimator 120 of Figure 5.

The audio signal decoder 100 further comprises the level changer 230, which is connected to an output of the inverse quantizer 210. The level changer 230 also receives the additional information, or a part of the additional information, as well as the level change factor, which is determined by the clipping estimator 120 in a dynamic manner, that is, for each time interval or frame the level change factor can assume a different value. The level change factor is applied consistently to the plurality of frequency band signals using a plurality of multipliers or scale modification elements 231, 232 and 233. It may occur that some of the frequency band signals are relatively strong when leaving the inverse quantizer 210, possibly already using their respective MSBs. When these strong frequency band signals
are added within the frequency domain to time domain converter 140, an overflow may be observed within the time domain representation output by the frequency domain to time domain converter 140. The level change factor determined by the clipping estimator 120 and applied by the scale modification elements 231, 232, 233 makes it possible to selectively reduce (i.e., taking into account the current additional information) the levels of the frequency band signals such that an overflow of the time domain representation is less likely to occur. The level changer 230 further comprises a second plurality of multipliers or scale modification elements 236, 237, 238 configured to apply the frequency-band-specific scale factors to the corresponding frequency bands. The additional information may comprise M scale factors. The level changer 230 provides the plurality of level-changed frequency band signals to the frequency domain to time domain converter 140, which is configured to convert the level-changed frequency band signals into the time domain representation.

The audio signal decoder 100 of Figure 6 further comprises the level change compensator 150, comprising in the embodiment shown a further scale modifier or multiplier 250 and a reciprocal calculator 252. The reciprocal calculator 252 receives the level change factor and determines the reciprocal (1/x) of the level change factor. The reciprocal of the level change factor is forwarded to the further scale modification element 250, where it is multiplied by the time domain representation to produce the substantially compensated time domain representation. As an alternative to the multipliers or scale modification elements 231, 232, 233 and 250, it may also be possible to use addition/subtraction elements to apply the level change factor to the plurality of frequency band signals and to the time domain representation.

Optionally, the audio signal decoder 100 of Figure 6 further comprises a post-processing element 260 connected to an output of the level change compensator 150. For example, the post-processing element 260 may comprise a time domain limiter that has a fixed characteristic for reducing or eliminating any clipping that may still be present within the substantially compensated time domain representation, despite the provision of the level changer 230 and the level change compensator 150. An output of the optional post-processing element 260 provides the decoded audio signal representation. If the optional post-processing element 260 is not present, the decoded audio signal representation may be available at the output of the level change compensator 150.

Figure 7 shows a schematic block diagram of an audio signal decoder 100 in accordance with further possible embodiments of the present invention. A bitstream decoder/inverse quantizer 310 is configured to process an incoming bitstream and to derive the following information from it: the plurality of frequency band signals X1(f), the bitstream parameters p, and a global gain g1. The bitstream parameters p may comprise the scale factors for the frequency bands and/or the global gain g1.

The bitstream parameters p are provided to the clipping estimator 320, which derives the scale factor 1/g2 from the bitstream parameters p. The scale factor 1/g2 is fed to the level changer 330, which in the embodiment shown also implements a dynamic range control (DRC). The level changer 330 may also receive the bitstream parameters p, or a portion thereof, in order to apply the scale factors to the plurality of frequency band signals. The level changer 330 outputs the plurality of level-changed frequency band signals X2(f) to the inverse filter bank 340, which provides the conversion from the frequency domain to the time domain. At an output of the inverse filter bank 340, the time domain representation X3(t) to be supplied to the level change compensator 350 is provided. The level change compensator 350 is a multiplier or scale modifying element, as in the embodiment depicted in Figure 6. The level change compensator 350 is part of a time domain post-processing 360 for high precision processing which, for example, supports a longer word length than the inverse filter bank 340. For example, the inverse filter bank may have a word length of 16 bits and the high precision processing performed by the time domain post-processing may be performed using 20 bits. As another example, the word length of the inverse filter bank 340 may be 24 bits and the word length of the high precision processing may be 30 bits. In any case, the numbers of bits should not be considered as limiting the scope of this patent/patent application unless explicitly stated. The time domain post-processing 360 outputs the decoded audio signal representation X4(t).

The applied gain change g2 is fed to the limiter implementation 362 for compensation. The limiter 362 can be implemented with high precision.

If the clipping estimator 320 estimates that no clipping occurs, the audio samples remain substantially unchanged, that is, as if the level change and the level change compensation had not been performed.
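The scale-down / convert / scale-up chain around the inverse filter bank can be sketched as follows (a minimal illustration, not the actual implementation; the placeholder function stands in for the inverse filter bank 340, and all names are illustrative):

```python
def decode_with_headroom(band_signal, g2, inverse_filterbank):
    # Level changer 330: scale down by 1/g2 to create headroom
    # before the short-word-length filter bank.
    x2 = [x / g2 for x in band_signal]
    # Inverse filter bank 340: frequency-to-time conversion.
    x3 = inverse_filterbank(x2)
    # Level shifting compensator 350: restore the original level,
    # ideally at a longer word length (high precision processing).
    return [x * g2 for x in x3]
```

With floating point arithmetic the two scalings cancel exactly for any linear filter bank; the benefit only materializes in fixed-point implementations, where the scale-down avoids overflow inside the filter bank.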

The clipping estimator provides the reciprocal g2 of the level change factor 1/g2 to a combiner 328, where it is combined with the global gain g1 to produce a combined gain g3.

The audio signal decoder 100 further comprises a transition shape adjuster 370 that is configured to provide smooth transitions when the combined gain g3 changes abruptly from a frame preceding the current frame (or from the current frame to a subsequent frame). The transition shape adjuster 370 can be configured to chain the current level change factor and a subsequent level change factor to obtain a chained level change factor g4 for use by the level change compensator 350. To allow a smooth transition between changing gain factors, a transition shape adjustment has to be made. This tool creates a vector of gain factors g4(t) (one factor for each sample of the corresponding audio signal). To reproduce the same gain adjustment behavior that signal processing in the frequency domain would produce, the same transition windows W as in the filter bank 340 have to be used. A frame covers a plurality of samples. The combined gain factor g3 is typically constant for the duration of a frame. The transition window W is typically one frame long and provides a different window value for each sample within the frame (for example, the first half period of a cosine). Details related to a possible implementation of the transition shape adjustment are provided in Figure 9 and in the corresponding description below.
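The construction of the per-sample gain vector g4(t) can be sketched as follows (a hedged sketch: a sine/cosine half-window pair is assumed here as the transition window W, whereas a real decoder would take the window from the filter bank 340):

```python
import math

def chained_gain_vector(g3_prev, g3_curr, frame_len):
    # Fade the previous frame's gain out with the "second half" window
    # and fade the current frame's gain in with the "first half" window.
    g4 = []
    for n in range(frame_len):
        w_rise = math.sin(math.pi * n / (2 * frame_len))  # first half
        w_fall = math.cos(math.pi * n / (2 * frame_len))  # second half
        g4.append(g3_prev * w_fall + g3_curr * w_rise)
    return g4
```

At the frame start the vector equals the previous gain; by the frame end it has faded to (approximately) the current gain.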

Figure 8 schematically illustrates the effect of a level change applied to the plurality of frequency band signals. An audio signal (for example, each of the plurality of frequency band signals) can be represented using a 16-bit resolution, as symbolized by rectangle 402. Rectangle 404 schematically illustrates how the bits of the 16-bit resolution are used to represent the quantized sample within one of the frequency band signals provided by the decoder pre-processing stage 110. It can be seen that the quantized sample may occupy a certain number of bits starting from the most significant bit (MSB) down to a last bit used for the quantized sample. The remaining bits down to the least significant bit (LSB) contain only quantization noise. This can be explained by the fact that, for the current frame, the corresponding frequency band signal was represented within the bit stream by only a reduced number of bits (<16 bits). Even if the full 16-bit resolution was used within the bit stream for the current frame and the corresponding frequency band, the least significant bit typically contains a significant amount of quantization noise.

A rectangle 406 in Figure 8 schematically illustrates the result of the level change of the frequency band signal. Since the content of the least significant bit or bits can be expected to contain a considerable amount of quantization noise, the quantized sample can be shifted toward the less significant bits substantially without losing relevant information. This can be achieved by simply shifting the bits down ("right shifting"), or by actually recalculating the binary representation. In both cases, the level change factor can be stored for subsequent compensation of the applied level change (for example, by means of the level change compensator 150 or 350). The level change results in additional headroom in the most significant bit or bits.
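In integer arithmetic, the level change and its later compensation amount to simple bit shifts (an illustrative sketch; a real implementation would compensate at a longer word length so the shifted-out bits are not lost):

```python
def level_shift_down(sample: int, shift: int) -> int:
    # "Right shifting": move the quantized sample toward the LSB;
    # the discarded low bits mostly carry quantization noise.
    return sample >> shift

def level_shift_compensate(sample: int, shift: int) -> int:
    # Restore the original level after the filter bank
    # (e.g. in the level change compensator 150 or 350).
    return sample << shift
```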

Figure 9 schematically illustrates a possible implementation of the transition shape adjuster 370 shown in Figure 7. The transition shape adjuster 370 may comprise a memory 371 for a previous level change factor, a first window forming system 372 configured to generate a first plurality of windowed samples by applying a window shape to the current level change factor, a second window forming system 376 configured to generate a second plurality of windowed samples by applying a previous window shape to the previous level change factor provided by the memory 371, and a sample combiner 379 configured to mutually combine corresponding windowed samples of the first plurality of windowed samples and of the second plurality of windowed samples to obtain a plurality of combined samples. The first window forming system 372 comprises a window shape provider 373 and a multiplier 374. The second window forming system 376 comprises a previous window shape provider 377 and an additional multiplier 378. The multiplier 374 and the additional multiplier 378 output vectors over time. In the case of the first window forming system 372, each vector element corresponds to the multiplication of the current combined gain factor g3(t) (constant during the current frame) with the current window shape provided by the window shape provider 373. In the case of the second window forming system 376, each vector element corresponds to the multiplication of the previous combined gain factor g3(t-T) (constant during the previous frame) with the previous window shape provided by the previous window shape provider 377.

According to the embodiment illustrated schematically in Figure 9, the gain factor of the previous frame must be multiplied by the "second half" of the window of the filter bank 340, while the current gain factor is multiplied by the "first half" of the window sequence. These two vectors can be added to form a gain vector g4(t) to be multiplied element-wise with the audio signal X3(t) (see Figure 7).

The window shapes can be guided by the additional information w of the filter bank 340, if required.

The window shape and the previous window shape can also be used by the frequency domain to time domain converter 340, such that the same window shape and previous window shape are used both to convert the level-shifted frequency band signals to the time domain representation and to window the current level change factor and the previous level change factor.

The current level change factor may be valid for a current frame of the plurality of frequency band signals. The previous level change factor may be valid for a previous frame of the plurality of frequency band signals. The current frame and the previous frame may overlap, for example by 50%.

The transition shape adjuster 370 can be configured to combine the previous level change factor with a second portion of the previous window shape, resulting in a previous frame factor sequence. The transition shape adjuster 370 can also be configured to combine the current level change factor with a first portion of the current window shape, resulting in a current frame factor sequence. A chained level change factor sequence can then be determined based on the previous frame factor sequence and the current frame factor sequence.

The proposed approach is not necessarily limited to decoders: encoders, too, may have a gain adjustment or limiter in combination with a filter bank and can benefit from the proposed method.

Figure 10 illustrates how the decoder pre-processing stage 110 and the clipping estimator 120 are connected. The decoder pre-processing stage 110 corresponds to or comprises the code book determiner 1110. The clipping estimator 120 comprises an estimation unit 1120. The code book determiner 1110 is adapted to determine a code book from a plurality of code books as an identified code book, the audio signal having been encoded using the identified code book. The estimation unit 1120 is adapted to derive a level value, for example an energy value, an amplitude value or a loudness value, associated with the identified code book as a derived level value. In addition, the estimation unit 1120 is adapted to estimate a level estimate, for example an energy estimate, an amplitude estimate or a loudness estimate, of the audio signal using the derived level value. For example, the code book determiner 1110 can determine the code book that has been used by an encoder to encode the audio signal by receiving additional information transmitted along with the encoded audio signal. In particular, the additional information may comprise information that identifies the code book used to encode a considered section of the audio signal. Such information can be transmitted, for example, from the encoder to the decoder as a number that identifies the Huffman code book used to encode the considered section of the audio signal.

Figure 11 illustrates an estimation unit according to an embodiment. The estimation unit comprises a level value deriver 1210 and a scale modification unit 1220. The level value deriver is adapted to derive a level value associated with the identified code book, that is, the code book which was used by the encoder to encode the spectral data, by looking up the level value in a memory, requesting the level value from a local database, or requesting the level value associated with the identified code book from a remote computer. In one embodiment, the level value looked up or requested by the level value deriver can be an average level value indicating an average level of an unscaled spectral value encoded using the identified code book.

By this, the derived level value is not calculated from the actual spectral values; instead, an average level value is used that depends only on the code book employed. As explained above, the encoder is generally adapted to select, from a plurality of code books, the code book that is best suited to encode the respective spectral data of an audio signal section. Since code books differ, for example with respect to the absolute maximum values that they can encode, the average value encoded by a Huffman code book differs from code book to code book, and therefore the average level value of a spectral coefficient encoded by a particular code book also differs from code book to code book.

Therefore, according to one embodiment, an average level value for encoding a spectral coefficient of an audio signal using a particular Huffman code book can be determined for each Huffman code book and, for example, can be stored in a memory, a database or on a remote computer. The level value deriver then simply has to look up or request the level value associated with the identified code book that has been used to encode the spectral data, in order to obtain the derived level value associated with the identified code book.

However, it should be borne in mind that Huffman code books are often used to encode unscaled spectral values, as is the case for MPEG AAC. Then, however, the scaling has to be taken into account when a level estimate is made. Therefore, the estimation unit of Figure 11 also comprises a scale modification unit 1220. The scale modification unit is adapted to derive a scale factor related to the encoded audio signal, or to a portion of the encoded audio signal, as a derived scale factor. For example, with respect to a decoder, the scale modification unit 1220 will determine a scale factor for each scale factor band. For example, the scale modification unit 1220 may receive information about the scale factor of a scale factor band by receiving additional information transmitted from an encoder to the decoder. The scale modification unit 1220 is further adapted to determine a scale-modified level value based on the scale factor and the derived level value.

In one embodiment, where the derived level value is a derived energy value, the scale modification unit is adapted to apply the derived scale factor to the derived energy value to obtain a scale-modified level value by multiplying the derived energy value by the square of the derived scale factor.

In another embodiment, where the derived level value is a derived amplitude value, the scale modification unit is adapted to apply the derived scale factor to the derived amplitude value to obtain a scale-modified level value by multiplying the derived amplitude value by the derived scale factor.

In a further embodiment, in which the derived level value is a derived loudness value, the scale modification unit 1220 is adapted to apply the derived scale factor to the derived loudness value to obtain a scale-modified level value by multiplying the derived loudness value by the cube of the derived scale factor. There are alternative ways to calculate loudness, such as using an exponent of 3/2. In general, scale factors have to be transformed to the loudness domain when the derived level value is a loudness value.

These embodiments take into account that an energy value is determined based on the square of the spectral coefficients of an audio signal, that an amplitude value is determined based on the absolute values of the spectral coefficients of an audio signal, and that a loudness value is determined based on the spectral coefficients of an audio signal that has been transformed to the loudness domain.
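The three cases above can be summarized in one helper (a sketch; function and parameter names are illustrative, not from the patent):

```python
def scale_modified_level(derived_value: float, scale_factor: float, kind: str) -> float:
    # Energy scales with the square of the scale factor, amplitude
    # scales linearly, and loudness (as defined here) with the cube.
    exponent = {"energy": 2, "amplitude": 1, "loudness": 3}[kind]
    return derived_value * scale_factor ** exponent
```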

The estimation unit is adapted to estimate a level estimate of the audio signal using the scale-modified level value. In the embodiment of Figure 11, the estimation unit is adapted to output the scale-modified level value as the level estimate. In this case, no post-processing of the scale-modified level value is carried out. However, as illustrated in the embodiment of Figure 12, the estimation unit can also be adapted to carry out post-processing. Therefore, the estimation unit of Figure 12 comprises a post-processor 1230 for post-processing one or more scale-modified level values to estimate a level estimate. For example, the level estimate of the estimation unit can be determined by the post-processor 1230 by determining an average value of a plurality of scale-modified level values. This averaged value can be output by the estimation unit as the level estimate.

Compared to the presented embodiments, a state-of-the-art approach to estimate, for example, the energy of a scale factor band would be to perform Huffman decoding and inverse quantization for all spectral values and to calculate the energy by summing the squares of all inversely quantized spectral values.

In the proposed embodiments, however, this computationally complex state-of-the-art process is replaced by an estimate of the average level that depends only on the scale factor and the code book used, and not on the actual quantized values.

The embodiments of the present invention exploit the fact that a Huffman code book is designed to provide optimal coding according to dedicated statistics. This means that the code book has been designed according to the probability of the data, for example spectral lines in AAC-ELD (AAC-ELD = advanced audio coding - enhanced low delay). This process can be reversed to obtain the probability of the data according to the code book. The probability of each data entry (index) within a code book is given by the code word length. For example,

p(index) = 2^(-length(codeword))

where p(index) is the probability of a data entry (an index) within a code book.
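As a one-line sketch, the probability implied by a Huffman code word length is:

```python
def codeword_probability(length_bits: int) -> float:
    # A Huffman code assigns shorter code words to more probable
    # entries, so p(index) = 2^(-length(codeword)).
    return 2.0 ** (-length_bits)
```

For example, a 1-bit code word implies a probability of 0.5, a 3-bit code word a probability of 0.125.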

Based on this, the expected level can be pre-calculated and stored as follows: each index represents a sequence of integer values (x), for example spectral lines, where the length of the sequence depends on the dimension of the code book, for example 2 or 4 for AAC-ELD.

Figures 13a and 13b illustrate a method for generating a level value, for example an energy value, an amplitude value or a loudness value, associated with a code book according to an embodiment. The method comprises:

Determining a sequence of numerical values associated with a code word of the code book, for each code word of the code book (step 1310). As explained above, a code book encodes a sequence of numerical values, for example 2 or 4 numerical values, by a code word of the code book. The code book comprises a plurality of code words to encode a plurality of sequences of numerical values. The sequence of numerical values that is determined is the sequence of numerical values encoded by the considered code word of the code book. Step 1310 is carried out for each code word of the code book. For example, if the code book comprises 81 code words, 81 sequences of numerical values are determined in step 1310.

In step 1320, an inversely quantized sequence of numerical values is determined for each code word of the code book by applying an inverse quantizer to the numerical values of the sequence of numerical values of a code word, for each code word of the code book. As explained above, an encoder generally employs quantization when encoding the spectral values of the audio signal, for example non-uniform quantization. As a consequence, this quantization has to be reversed on the decoder side.

Then, in step 1330, a sequence of level values is determined for each code word in the code book.

If an energy value is to be generated as the level value of the code book, then a sequence of energy values is determined for each code word, and the square of each value of the inversely quantized sequence of numerical values is calculated for each code word of the code book.

If, however, an amplitude value is to be generated as the level value of the code book, then a sequence of amplitude values is determined for each code word, and the absolute value of each value of the inversely quantized sequence of numerical values is calculated for each code word of the code book.

If, though, a loudness value is to be generated as the level value of the code book, then a sequence of loudness values is determined for each code word, and the cube of each value of the inversely quantized sequence of numerical values is calculated for each code word of the code book. There are alternative ways to calculate loudness, such as using an exponent of 3/2. In general, the values of the inversely quantized sequence of numerical values have to be transformed to the loudness domain when a loudness value is to be generated as the level value of the code book.

Subsequently, in step 1340, a level sum value is calculated for each code word in the code book by adding the values of the level value sequence for each code word in the code book.

Then, in step 1350, a weighted probability level sum value is determined for each code word of the code book by multiplying the level sum value of a code word by a probability value associated with the code word, for each code word of the code book. By this, it is taken into account that some sequences of numerical values, for example sequences of spectral coefficients, will not appear as often as other sequences of spectral coefficients. The probability value associated with the code word takes this into account. Such a probability value can be derived from the code word length, since, when Huffman coding is used, the sequences that are most likely to appear are encoded using code words of shorter length, while sequences that are less likely to appear are encoded using code words of longer length.

In step 1360, an averaged weighted probability level sum value is determined for each code word of the code book by dividing the weighted probability level sum value of a code word by a dimension value associated with the code book, for each code word of the code book. A dimension value indicates the number of spectral values that are encoded by a code word of the code book. By this, an averaged weighted probability level sum value is determined that represents a (probability-weighted) level value for a spectral coefficient encoded by the code word.

Then, in step 1370, the level value of the code book is calculated by summing the averaged weighted probability level sum values of all code words.

It has to be noted that such a generation of a level value has to be done only once per code book. Once the level value of a code book has been determined, this value can simply be looked up and used, for example by an apparatus for level estimation in accordance with the embodiments described above.

Next, a method for generating an energy value associated with a code book according to an embodiment is presented. To estimate the expected energy value of the data encoded with the given code book, the following steps have to be performed only once for each code book index:

A) apply the inverse quantizer to the integer values of the sequence (for example, x^(4/3) for AAC-ELD).

B) calculate the energy by squaring each value of the sequence of A)

C) form the sum of the sequence of B)

D) multiply C) by the given probability of the index

E) divide by the code book dimension to obtain the expected energy per spectral line.

Finally, all the values calculated by E) must be added to obtain the expected energy of the complete codebook.
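Steps A) through E), plus the final summation, can be sketched as follows (an illustrative sketch: the code book is modeled as a list of (code word length in bits, integer value sequence) pairs, and the AAC-style inverse quantizer |x|^(4/3) is assumed; names are not from the patent):

```python
def expected_codebook_energy(codebook):
    """Expected energy per spectral line for a given code book."""
    total = 0.0
    for length_bits, sequence in codebook:
        # A) inverse quantizer (AAC-style |x|^(4/3))
        dequantized = [abs(x) ** (4.0 / 3.0) for x in sequence]
        # B) energy: square each value of the sequence
        energies = [v * v for v in dequantized]
        # C) sum over the sequence
        energy_sum = sum(energies)
        # D) weight by the probability of the index, p = 2^(-length)
        weighted = energy_sum * 2.0 ** (-length_bits)
        # E) divide by the code book dimension (sequence length)
        total += weighted / len(sequence)
    return total
```

The result can be stored in a table indexed by code book, so that at run time only a lookup is needed.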

After the output of these steps is stored in a table, the estimated energy values can simply be looked up based on the code book index, that is, depending on which code book is used. The actual spectral values do not have to undergo Huffman decoding for this estimate.

To estimate the total energy of the spectral data of a complete audio frame, the scale factor must be taken into account. The scale factor can be extracted from the bit stream without a significant amount of complexity. The scale factor may be modified before it is applied to the expected energy, for example the square of the scale factor used can be calculated. The expected energy is then multiplied by the square of the scale factor used.

In accordance with the embodiments described above, the spectral level for each scale factor band can be estimated without decoding the Huffman-coded spectral values. The level estimates can be used to identify streams with a low level, for example with low power, which are typically those that do not result in clipping. The complete decoding of such streams can therefore be omitted.

According to one embodiment, a level estimation apparatus further comprises a memory or a database having stored therein a plurality of code book level memory values, each indicating a level value that is associated with a code book, wherein each of the plurality of code books has a code book level memory value associated with it stored in the memory or database. In addition, the level value deriver is configured to derive the level value associated with the identified code book by retrieving the code book level memory value associated with the identified code book from the memory or from the database.

The level estimated according to the embodiments described above may vary if an additional processing step such as prediction, for example prediction filtering, is applied in the codec, for example TNS filtering (temporal noise shaping) for AAC-ELD. In this case, the prediction coefficients are transmitted within the bit stream, for example as PARCOR coefficients for TNS.

Figure 14 illustrates an embodiment in which the estimation unit further comprises a prediction filter adjuster 1240. The prediction filter adjuster is adapted to derive one or more prediction filter coefficients related to the encoded audio signal, or to a portion of the encoded audio signal, as derived prediction filter coefficients. In addition, the prediction filter adjuster is adapted to obtain a prediction-filter-adjusted level value based on the prediction filter coefficients and the derived level value. Furthermore, the estimation unit is adapted to estimate a level estimate of the audio signal using the prediction-filter-adjusted level value.

In one embodiment, the PARCOR coefficients for TNS are used as the prediction filter coefficients. The prediction gain of the filtering process can be determined from those coefficients in a very efficient way. With respect to TNS, the prediction gain can be calculated according to the formula: gain = 1 / prod(1 - parcor^2).

For example, if 3 PARCOR coefficients parcor1, parcor2 and parcor3 are to be taken into account, the gain is calculated according to the formula:

gain = 1 / ((1 - parcor1^2) (1 - parcor2^2) (1 - parcor3^2))

For n PARCOR coefficients parcor1, parcor2, ..., parcorn, the following formula applies:

gain = 1 / ((1 - parcor1^2) (1 - parcor2^2) ... (1 - parcorn^2))

This means that the amplification of the audio signal through the filtering can be estimated without applying the filtering operation itself.
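The prediction gain formula can be evaluated directly from the PARCOR coefficients (a minimal sketch of the formula above; no filtering is performed, and the function name is illustrative):

```python
def tns_prediction_gain(parcor_coefficients):
    # gain = 1 / prod(1 - parcor_i^2) over all PARCOR coefficients
    denominator = 1.0
    for k in parcor_coefficients:
        denominator *= 1.0 - k * k
    return 1.0 / denominator
```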

Figure 15 shows a schematic block diagram of an encoder 1500 that implements the proposed gain adjustment which "bypasses" the filter bank. The audio signal encoder 1500 is configured to provide an encoded audio signal representation based on a time domain representation of an input audio signal. The time domain representation can be, for example, a pulse code modulated (PCM) input audio signal.

The audio signal encoder comprises a clipping estimator 1520 configured to analyze the time domain representation of the input audio signal to determine a current level change factor for the input signal representation. The audio signal encoder further comprises a level changer 1530 configured to change a level of the time domain representation of the input audio signal according to the level change factor to obtain a level-shifted time domain representation. A time domain to frequency domain converter 1540 (for example, a filter bank, such as a quadrature mirror filter bank, a modified discrete cosine transform, etc.) is configured to convert the level-shifted time domain representation into a plurality of frequency band signals. The audio signal encoder 1500 also comprises a level change compensator 1550 configured to act on the plurality of frequency band signals to at least partially compensate for the level change applied to the time domain representation by the level changer 1530 and to obtain a plurality of substantially compensated frequency band signals.

The audio signal encoder 1500 may further comprise a bit/noise allocation, quantizer and coding component 1510 and a psychoacoustic model 1508. The psychoacoustic model 1508 determines time-variant masking thresholds (and/or individual-frame and individual-frequency-band quantization resolutions, and scale factors) based on the PCM input audio signal, to be used by the bit/noise allocation, quantizer and coding component 1510. Details related to a possible implementation of the psychoacoustic model and other aspects of perceptual audio coding can be found, for example, in the international standards ISO/IEC 11172-3 and ISO/IEC 13818-3. The bit/noise allocation, quantizer and coding component 1510 is configured to quantize the plurality of frequency band signals according to their individual-frame and individual-frequency-band quantization resolutions, and to provide these data to a bit stream formatter 1505 that outputs an encoded bit stream to be provided to one or more audio signal decoders. The bit/noise allocation, quantizer and coding component 1510 can be configured to determine additional information in addition to the plurality of quantized frequency band signals. This additional information can also be provided to the bit stream formatter 1505 for inclusion in the bit stream.

Figure 16 shows a schematic flow chart of a method for decoding an encoded audio signal representation to obtain a decoded audio signal representation. The method comprises a step 1602 of preprocessing the encoded audio signal representation to obtain a plurality of frequency band signals. In particular, the preprocessing may comprise unpacking a bit stream into data corresponding to successive frames, and re-quantizing (inverse quantization) the frequency-band-related data according to the frequency-band-specific quantization resolutions to obtain a plurality of frequency band signals.

In a step 1604 of the method for decoding, the additional information regarding a gain of the frequency band signals is analyzed to determine a current level change factor for the encoded audio signal representation. The gain related to the frequency band signals can be individual for each frequency band signal (for example, the scale factors known in some perceptual audio coding schemes, or similar parameters) or common to all frequency band signals (for example, the global gain known in some perceptual audio coding schemes). The analysis of the additional information allows information to be gathered about the loudness of the encoded audio signal during the frame at hand. The loudness, in turn, may indicate a tendency of the decoded audio signal representation to clip. The level change factor is typically determined as a value that avoids such clipping while retaining a relevant dynamic range and/or relevant information content of (all) the frequency band signals.

The method for decoding further comprises a step 1606 of changing the levels of the frequency band signals according to the level change factor. In case the frequency band signals are shifted to a lower level, the level change creates some headroom in the most significant bit or bits of a binary representation of the frequency band signals. This headroom may be needed when converting the plurality of frequency band signals from the frequency domain to the time domain to obtain a time domain representation, which is performed in a subsequent step 1608. In particular, the headroom reduces the risk that the time domain representation will clip if some of the frequency band signals are close to an upper limit related to their amplitude and/or power. As a consequence, the conversion from the frequency domain to the time domain can be performed using a relatively short word length.

The method for decoding also comprises a step 1609 of acting on the time domain representation to at least partially compensate for the level change applied to the level-shifted frequency band signals. A substantially compensated time domain representation is thereby obtained.

Accordingly, a method for decoding an encoded audio signal representation into a decoded audio signal representation comprises:

- preprocess the encoded audio signal representation to obtain a plurality of frequency band signals;

- analyze the additional information regarding a gain of the frequency band signals to determine a current level change factor for the encoded audio signal representation;

- change the levels of the frequency band signals according to the level change factor to obtain level-shifted frequency band signals;

- perform a conversion from the frequency domain to the time domain of the frequency band signals to a time domain representation; and

- act on the time domain representation to at least partially compensate for the level change applied to the level-shifted frequency band signals and to obtain a substantially compensated time domain representation.

According to additional aspects, analyzing the additional information may comprise: determining a probability of clipping based on the additional information, and determining the current level change factor based on the probability of clipping.

According to additional aspects, the additional information may comprise at least one of a global gain factor for the plurality of frequency band signals and a plurality of scale factors, each scale factor corresponding to a frequency band signal of the plurality of frequency band signals.

According to additional aspects, the preprocessing of the encoded audio signal representation may comprise obtaining the plurality of frequency band signals in the form of a plurality of successive frames, and analyzing the additional information may comprise determining the current level change factor for a current frame.

According to additional aspects, the decoded audio signal representation may be determined based on the substantially compensated time domain representation.

According to additional aspects, the method may further comprise: applying a time domain limiter characteristic downstream of the step of acting on the time domain representation to at least partially compensate for the level change.

According to additional aspects, the additional information regarding the gain of the frequency band signals may comprise a plurality of frequency-band-related gain factors.

In accordance with additional aspects, preprocessing the encoded audio signal may comprise re-quantizing each frequency band signal using a frequency-band-specific quantization indicator from a plurality of frequency-band-specific quantization indicators.
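For illustration, such a band-wise re-quantization might look as follows (the power-law expansion and the form of the band-specific gain are assumptions in the style of common transform codecs, not the patent's definition):

```python
import math

def dequantize_band(quantized, gain: float):
    """Expand quantized values with a power-law nonlinearity and apply a
    band-specific gain (this band's 'quantization indicator')."""
    return [math.copysign(abs(q) ** (4.0 / 3.0), q) * gain for q in quantized]

band = dequantize_band([8, -8, 0], 1.0)   # expands to roughly [16, -16, 0]
```

Each band would use its own `gain`, so loud and quiet bands are reconstructed at different scales.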

According to additional aspects, the method may further comprise performing a transition shape adjustment, the transition shape adjustment comprising: chaining the current level change factor and a subsequent level change factor to obtain a chained level change factor for use during the at least partial compensation of the level change.

According to additional aspects, the transition shape adjustment may further comprise:

- temporarily store a previous level change factor,

- generate a first plurality of windowed samples by applying a window shape to the current level change factor,

- generate a second plurality of windowed samples by applying a previous window shape to the previous level change factor provided by the step of temporarily storing the previous level change factor, and

- mutually combine corresponding windowed samples of the first plurality of windowed samples and of the second plurality of windowed samples to obtain a plurality of combined samples.
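The four steps above can be sketched as a per-sample cross-fade between the two factors (the complementary squared-cosine window is only an illustrative choice; the patent merely requires that the shapes match those used by the transform):

```python
import math

def chained_factors(prev_factor: float, curr_factor: float, n: int):
    """Window the previous factor with a fading-out shape and the current
    factor with the complementary fading-in shape, then combine per sample."""
    fade_out = [math.cos(0.5 * math.pi * (i + 0.5) / n) ** 2 for i in range(n)]
    fade_in = [1.0 - w for w in fade_out]           # complementary window
    return [prev_factor * o + curr_factor * i_
            for o, i_ in zip(fade_out, fade_in)]

seq = chained_factors(1.0, 0.5, 8)   # smooth transition from 1.0 towards 0.5
```

The resulting sequence starts near the previous factor and ends near the current one, avoiding an audible gain step at the frame boundary.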

According to additional aspects, the window shape and the previous window shape may also be used by the frequency domain to time domain conversion, so that the same window shape and previous window shape are used both for converting the level-shifted frequency band signals into the time domain representation and for windowing the current level change factor and the previous level change factor.

According to additional aspects, the current level change factor may be valid for a current frame of the plurality of frequency band signals, the previous level change factor may be valid for a previous frame of the plurality of frequency band signals, and the current frame and the previous frame may overlap. The transition shape adjustment may be configured

- to combine the previous level change factor with a second portion of the previous window shape, resulting in a previous frame factor sequence,

- to combine the current level change factor with a first portion of the current window shape, resulting in a current frame factor sequence, and

- to determine a chained level change factor sequence based on the previous frame factor sequence and the current frame factor sequence.

According to additional aspects, the additional information may be analyzed as to whether it suggests potential clipping within the time domain representation, which means that a least significant bit does not contain relevant information; in this case, the level change shifts the information toward the least significant bit, thereby releasing a more significant bit, so that some space is gained at the most significant bit.
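The bit-level argument can be made concrete (the example values are mine): if the least significant bit carries no relevant information, shifting the word right by one loses nothing, yet frees one more significant bit of headroom.

```python
sample = 0b0111_1111_1111_1110   # 16-bit word whose LSB is empty (32766)
shifted = sample >> 1            # level change: information moves toward the LSB
doubled = shifted * 2            # a later +6 dB no longer overflows the word

assert shifted << 1 == sample    # nothing was lost by the shift
assert doubled <= 32767          # the doubled value still fits a signed int16
```

Had the original `sample` been doubled directly, it would have exceeded the 16-bit range.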

According to additional aspects, a computer program may be provided to implement the method for decoding or the method for encoding when the computer program is executed on a computer or signal processor.

Although some aspects have been described in the context of an apparatus, it is clear that these aspects also represent a description of the corresponding method, where a block or device corresponds to a method step or a feature of a method step. Similarly, aspects described in the context of a method step also represent a description of a corresponding block or element or feature of a corresponding apparatus.

The inventive decomposed signal can be stored on a digital storage medium or can be transmitted over a transmission medium such as a wireless transmission medium or a wired transmission medium such as the internet.

Depending on certain implementation requirements, embodiments of the invention can be implemented in hardware or in software. The implementation can be performed using a digital storage medium, for example a floppy disk, a DVD, a CD, a ROM, a PROM, an EPROM, an EEPROM or a FLASH memory, having electronically readable control signals stored thereon, which cooperate (or are capable of cooperating) with a programmable computer system such that the respective method is performed.

Some embodiments according to the invention comprise a non-transitory data carrier having electronically readable control signals, which are capable of cooperating with a programmable computer system such that one of the methods described herein is performed.

In general, embodiments of the present invention can be implemented as a computer program product with a program code, the program code being operative to perform one of the methods when the computer program product runs on a computer. The program code may, for example, be stored on a machine-readable carrier.

Other embodiments comprise the computer program for performing one of the methods described herein, stored on a machine-readable carrier.

In other words, an embodiment of the inventive method is, therefore, a computer program having a program code for performing one of the methods described herein, when the computer program runs on a computer.

A further embodiment of the inventive method is, therefore, a data carrier (or a digital storage medium, or a computer-readable medium) comprising, recorded thereon, the computer program for performing one of the methods described herein.

A further embodiment of the inventive method is, therefore, a data stream or a sequence of signals representing the computer program for performing one of the methods described herein. The data stream or the signal sequence can be configured, for example, to be transferred via a data communication connection, for example via the internet.

A further embodiment comprises a processing means, for example a computer, or a programmable logic device, configured for or adapted to perform one of the methods described herein.

A further embodiment comprises a computer that has the computer program installed therein to perform one of the methods described herein.

In some embodiments, a programmable logic device (for example a field programmable gate array) can be used to perform some or all of the functionalities of the methods described herein. In some embodiments, a field programmable gate array may cooperate with a microprocessor to perform one of the methods described herein. In general, the methods are preferably performed by any hardware device.

The embodiments described above are merely illustrative of the principles of the present invention. It is understood that modifications and variations of the arrangements and details described herein will be apparent to others skilled in the art. It is intended, therefore, to be limited only by the scope of the appended patent claims and not by the specific details presented by way of description and explanation of the embodiments herein.

Claims (17)

    1. An audio signal decoder (100) configured to provide a decoded audio signal representation based on an encoded audio signal representation, the audio signal decoder comprising:
    a decoder pre-processing stage (110) configured to obtain a plurality of frequency band signals from the representation of the encoded audio signal;
a clipping estimator (120) configured to analyze additional information regarding a gain of the frequency band signals of the encoded audio signal representation as to whether the additional information suggests potential clipping, in order to determine a current level change factor for the encoded audio signal representation, wherein, when the additional information suggests potential clipping, the current level change factor causes the information of the plurality of frequency band signals to be shifted toward a less significant bit such that space is gained in at least one more significant bit;
a level changer (130) configured to change the levels of the frequency band signals according to the current level change factor to obtain level-shifted frequency band signals;
a frequency domain to time domain converter (140) configured to convert the level-shifted frequency band signals into a time domain representation; and
a level change compensator (150) configured to act on the time domain representation to at least partially compensate for a level change applied to the level-shifted frequency band signals by the level changer (130) and to obtain a substantially compensated time domain representation.
2. Audio signal decoder (100) according to claim 1, wherein the clipping estimator (120) is further configured to determine a clipping probability based on at least one of the additional information and the encoded audio signal representation, and to determine the current level change factor based on the clipping probability.
3. Audio signal decoder (100) according to claim 1 or 2, wherein the additional information comprises at least one of a global gain factor for the plurality of frequency band signals and a plurality of scale factors, each scale factor corresponding to a frequency band signal or a group of frequency band signals within the plurality of frequency band signals.
4. Audio signal decoder (100) according to any one of the preceding claims, wherein the decoder pre-processing stage (110) is configured to obtain the plurality of frequency band signals in the form of a plurality of successive frames, and wherein the clipping estimator (120) is configured to determine the current level change factor for a current frame.
5. Audio signal decoder (100) according to any one of the preceding claims, wherein the decoded audio signal representation is determined based on the substantially compensated time domain representation.
6. Audio signal decoder (100) according to any one of the preceding claims, further comprising a time domain limiter downstream of the level change compensator (150).
7. Audio signal decoder (100) according to any one of the preceding claims, wherein the additional information regarding the gain of the frequency band signals comprises a plurality of frequency-band-related gain factors.
8. Audio signal decoder (100) according to any one of the preceding claims, wherein the decoder pre-processing stage (110) comprises an inverse quantizer configured to re-quantize each frequency band signal using a frequency-band-specific quantization indicator from a plurality of frequency-band-specific quantization indicators.
9. Audio signal decoder (100) according to any one of the preceding claims, further comprising a transition shape adjuster configured to chain the current level change factor and a subsequent level change factor to obtain a chained level change factor for use by the level change compensator (150).
10. Audio signal decoder (100) according to claim 9, wherein the transition shape adjuster comprises a memory (371) for a previous level change factor, a first windowing unit (372) configured to generate a first plurality of windowed samples by applying a window shape to the current level change factor, a second windowing unit (376) configured to generate a second plurality of windowed samples by applying a previous window shape to the previous level change factor provided by the memory (371), and a sample combiner (379) configured to mutually combine corresponding windowed samples of the first plurality of windowed samples and of the second plurality of windowed samples to obtain a plurality of combined samples.
11. Audio signal decoder (100) according to claim 10,
wherein the current level change factor is valid for a current frame of the plurality of frequency band signals, wherein the previous level change factor is valid for a previous frame of the plurality of frequency band signals, and wherein the current frame and the previous frame overlap;
wherein the transition shape adjuster is configured
to combine the previous level change factor with a second portion of the previous window shape, resulting in a previous frame factor sequence,
to combine the current level change factor with a first portion of the current window shape, resulting in a current frame factor sequence, and
to determine a chained level change factor sequence based on the previous frame factor sequence and the current frame factor sequence.
12. Audio signal decoder (100) according to any one of the preceding claims, wherein the clipping estimator (120) is configured to analyze at least one of the encoded audio signal representation and the additional information as to whether at least one of the encoded audio signal representation and the additional information suggests potential clipping within the time domain representation, which means that a least significant bit does not contain relevant information; in this case, the level change applied by the level changer shifts the information toward the least significant bit, thereby releasing a more significant bit, so that some space is gained at the most significant bit.
13. Audio signal decoder (100) according to any one of the preceding claims, wherein the clipping estimator (120) comprises:
a codebook determiner (1110) for determining a codebook from a plurality of codebooks as an identified codebook, wherein the encoded audio signal representation has been encoded using the identified codebook, and
an estimation unit (1120) configured to derive a level value associated with the identified codebook as a derived level value and to determine an audio signal level estimate using the derived level value.
14. Audio signal encoder configured to provide an encoded audio signal representation based on a time domain representation of an input audio signal, the audio signal encoder comprising:
a clipping estimator configured to analyze the time domain representation of the input audio signal as to whether potential clipping is suggested, in order to determine a current level change factor for the input signal representation, wherein, when potential clipping is suggested, the current level change factor causes the time domain representation of the input audio signal to be shifted toward a less significant bit so that space is gained in at least one more significant bit;
a level changer configured to change a level of the time domain representation of the input audio signal according to the current level change factor to obtain a level-shifted time domain representation;
a time domain to frequency domain converter configured to convert the level-shifted time domain representation into a plurality of frequency band signals; and
a level change compensator configured to act on the plurality of frequency band signals to at least partially compensate for the level change applied to the time domain representation by the level changer and to obtain a plurality of substantially compensated frequency band signals.
15. Method for decoding an encoded audio signal representation and providing a corresponding decoded audio signal representation, the method comprising:
    preprocess the encoded audio signal representation to obtain a plurality of frequency band signals;
analyze additional information regarding a gain of the frequency band signals as to whether the additional information suggests potential clipping, in order to determine a current level change factor for the encoded audio signal representation, wherein, when the additional information suggests potential clipping, the current level change factor causes the information of the plurality of frequency band signals to be shifted toward a less significant bit so that space is gained in at least one more significant bit;
change the levels of the frequency band signals according to the level change factor to obtain level-shifted frequency band signals;
perform a frequency domain to time domain conversion of the frequency band signals into a time domain representation; and
act on the time domain representation to at least partially compensate for the level change applied to the level-shifted frequency band signals and to obtain a substantially compensated time domain representation.
16. Audio signal encoding method for providing an encoded audio signal representation based on a time domain representation of an input audio signal, the method comprising:
analyze the time domain representation of the input audio signal as to whether potential clipping is suggested, in order to determine a current level change factor for the input signal representation, wherein, when potential clipping is suggested, the current level change factor causes the time domain representation of the input audio signal to be shifted toward a less significant bit so that space is gained in at least one more significant bit;
change a level of the time domain representation of the input audio signal according to the current level change factor to obtain a level-shifted time domain representation;
convert the level-shifted time domain representation into a plurality of frequency band signals; and
act on the plurality of frequency band signals to at least partially compensate for the level change applied to the time domain representation by the level change and to obtain a plurality of substantially compensated frequency band signals.
17. Computer program adapted to instruct a computer to perform the method of claim 15 or 16.
ES14702195.0T 2013-01-18 2014-01-07 Level adjustment in the time domain for decoding or encoding of audio signals Active ES2604983T3 (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
EP13151910 2013-01-18
EP13151910.0A EP2757558A1 (en) 2013-01-18 2013-01-18 Time domain level adjustment for audio signal decoding or encoding
PCT/EP2014/050171 WO2014111290A1 (en) 2013-01-18 2014-01-07 Time domain level adjustment for audio signal decoding or encoding

Publications (1)

Publication Number Publication Date
ES2604983T3 true ES2604983T3 (en) 2017-03-10

Family

ID=47603376

Family Applications (1)

Application Number Title Priority Date Filing Date
ES14702195.0T Active ES2604983T3 (en) 2013-01-18 2014-01-07 Level adjustment in the time domain for decoding or encoding of audio signals

Country Status (11)

Country Link
US (1) US9830915B2 (en)
EP (2) EP2757558A1 (en)
JP (1) JP6184519B2 (en)
KR (2) KR101953648B1 (en)
CN (1) CN105210149B (en)
BR (1) BR112015017293A2 (en)
CA (1) CA2898005C (en)
ES (1) ES2604983T3 (en)
MX (1) MX346358B (en)
RU (1) RU2608878C1 (en)
WO (1) WO2014111290A1 (en)

Families Citing this family (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2262108B1 (en) 2004-10-26 2017-03-01 Dolby Laboratories Licensing Corporation Adjusting the perceived loudness and/or the perceived spectral balance of an audio signal
TWI447709B (en) 2010-02-11 2014-08-01 Dolby Lab Licensing Corp System and method for non-destructively normalizing loudness of audio signals within portable devices
CN103325380B (en) 2012-03-23 2017-09-12 杜比实验室特许公司 Gain for signal enhancing is post-processed
CN107403624A (en) 2012-05-18 2017-11-28 杜比实验室特许公司 System for maintaining the reversible dynamic range control information associated with parametric audio coders
EP2757558A1 (en) * 2013-01-18 2014-07-23 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Time domain level adjustment for audio signal decoding or encoding
PL2901449T3 (en) 2013-01-21 2018-05-30 Dolby Laboratories Licensing Corporation Audio encoder and decoder with program loudness and boundary metadata
KR102071860B1 (en) 2013-01-21 2020-01-31 돌비 레버러토리즈 라이쎈싱 코오포레이션 Optimizing loudness and dynamic range across different playback devices
WO2014128275A1 (en) 2013-02-21 2014-08-28 Dolby International Ab Methods for parametric multi-channel encoding
CN107093991A (en) 2013-03-26 2017-08-25 杜比实验室特许公司 Loudness method for normalizing and equipment based on target loudness
CN110083714A (en) 2013-04-05 2019-08-02 杜比实验室特许公司 Acquisition, recovery and the matching to the peculiar information from media file-based for autofile detection
CN108364657A (en) 2013-07-16 2018-08-03 华为技术有限公司 Handle the method and decoder of lost frames
US9521501B2 (en) 2013-09-12 2016-12-13 Dolby Laboratories Licensing Corporation Loudness adjustment for downmixed audio content
CN109785851A (en) 2013-09-12 2019-05-21 杜比实验室特许公司 Dynamic range control for various playback environments
KR20160090796A (en) * 2013-11-27 2016-08-01 마이크로칩 테크놀로지 인코포레이티드 Main clock high precision oscillator
CN105142067B (en) 2014-05-26 2020-01-07 杜比实验室特许公司 Audio signal loudness control
CN105225666B (en) 2014-06-25 2016-12-28 华为技术有限公司 The method and apparatus processing lost frames
CN107112023A (en) 2014-10-10 2017-08-29 杜比实验室特许公司 Based on the program loudness for sending unrelated expression
JP6699564B2 (en) * 2015-02-10 2020-05-27 ソニー株式会社 Transmission device, transmission method, reception device, and reception method
CN104795072A (en) * 2015-03-25 2015-07-22 无锡天脉聚源传媒科技有限公司 Method and device for coding audio data
CN105662706B (en) * 2016-01-07 2018-06-05 深圳大学 Enhance the artificial cochlea's signal processing method and system of time domain expression
US10331400B1 (en) * 2018-02-22 2019-06-25 Cirrus Logic, Inc. Methods and apparatus for soft clipping

Family Cites Families (25)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1996018996A1 (en) 1994-12-15 1996-06-20 British Telecommunications Public Limited Company Speech processing
US6280309B1 (en) 1995-10-19 2001-08-28 Norton Company Accessories and attachments for angle grinder
US5796842A (en) * 1996-06-07 1998-08-18 That Corporation BTSC encoder
US6289309B1 (en) * 1998-12-16 2001-09-11 Sarnoff Corporation Noise spectrum tracking for speech enhancement
JP3681105B2 (en) * 2000-02-24 2005-08-10 アルパイン株式会社 Data processing method
AU3385100A (en) * 2000-02-29 2001-09-12 Qualcomm Inc Closed-loop multimode mixed-domain linear prediction speech coder
US6651040B1 (en) * 2000-05-31 2003-11-18 International Business Machines Corporation Method for dynamic adjustment of audio input gain in a speech system
CA2359771A1 (en) * 2001-10-22 2003-04-22 Dspfactory Ltd. Low-resource real-time audio synthesis system and method
JP2003280691A (en) * 2002-03-19 2003-10-02 Sanyo Electric Co Ltd Voice processing method and voice processor
US20050004793A1 (en) * 2003-07-03 2005-01-06 Pasi Ojala Signal adaptation for higher band coding in a codec utilizing band split coding
DE10345995B4 (en) * 2003-10-02 2005-07-07 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Apparatus and method for processing a signal having a sequence of discrete values
US7751572B2 (en) * 2005-04-15 2010-07-06 Dolby International Ab Adaptive residual audio coding
BRPI0616624A2 (en) * 2005-09-30 2011-06-28 Matsushita Electric Ind Co Ltd Speech coding apparatus and speech coding method
DE102006022346B4 (en) * 2006-05-12 2008-02-28 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Information signal coding
CA2645913C (en) * 2007-02-14 2012-09-18 Lg Electronics Inc. Methods and apparatuses for encoding and decoding object-based audio signals
US9653088B2 (en) * 2007-06-13 2017-05-16 Qualcomm Incorporated Systems, methods, and apparatus for signal encoding using pitch-regularizing and non-pitch-regularizing coding
US8126578B2 (en) * 2007-09-26 2012-02-28 University Of Washington Clipped-waveform repair in acoustic signals using generalized linear prediction
US20100266142A1 (en) * 2007-12-11 2010-10-21 Nxp B.V. Prevention of audio signal clipping
CN101350199A (en) * 2008-07-29 2009-01-21 北京中星微电子有限公司 Audio encoder and audio encoding method
BRPI0919880B1 (en) * 2008-10-29 2020-03-03 Dolby International Ab Method and apparatus to protect against the signal ceifing of an audio sign derived from digital audio data and transcoder
US8346547B1 (en) * 2009-05-18 2013-01-01 Marvell International Ltd. Encoder quantization architecture for advanced audio coding
CN103250206B (en) * 2010-10-07 2015-07-15 弗朗霍夫应用科学研究促进协会 Apparatus and method for level estimation of coded audio frames in a bit stream domain
KR20200058593A (en) * 2011-07-01 2020-05-27 돌비 레버러토리즈 라이쎈싱 코오포레이션 System and method for adaptive audio signal generation, coding and rendering
AU2012351565B2 (en) * 2011-12-15 2015-09-03 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Apparatus, method and computer programm for avoiding clipping artefacts
EP2757558A1 (en) * 2013-01-18 2014-07-23 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Time domain level adjustment for audio signal decoding or encoding

Also Published As

Publication number Publication date
MX2015009171A (en) 2015-11-09
KR20150106929A (en) 2015-09-22
WO2014111290A1 (en) 2014-07-24
US9830915B2 (en) 2017-11-28
KR20170104661A (en) 2017-09-15
KR101953648B1 (en) 2019-05-23
EP2946384B1 (en) 2016-11-02
US20160019898A1 (en) 2016-01-21
MX346358B (en) 2017-03-15
EP2757558A1 (en) 2014-07-23
JP6184519B2 (en) 2017-08-23
CN105210149B (en) 2019-08-30
CA2898005C (en) 2018-08-14
RU2608878C1 (en) 2017-01-25
CA2898005A1 (en) 2014-07-24
BR112015017293A2 (en) 2018-05-15
EP2946384A1 (en) 2015-11-25
JP2016505168A (en) 2016-02-18
CN105210149A (en) 2015-12-30

Similar Documents

Publication Publication Date Title
US9715883B2 (en) Multi-mode audio codec and CELP coding adapted therefore
US10403293B2 (en) Apparatus for encoding and decoding of integrated speech and audio
US8938387B2 (en) Audio encoder and decoder
ES2701456T3 (en) Coding of multichannel audio signals using complex prediction and differential coding
US8275626B2 (en) Apparatus and a method for decoding an encoded audio signal
ES2709755T3 (en) Stereo decoding of complex prediction based on TCMD
KR100958144B1 (en) Audio Compression
US8548801B2 (en) Adaptive time/frequency-based audio encoding and decoding apparatuses and methods
EP1905000B1 (en) Selectively using multiple entropy models in adaptive coding and decoding
EP2255358B1 (en) Scalable speech and audio encoding using combinatorial encoding of mdct spectrum
US8583445B2 (en) Method and apparatus for processing a signal using a time-stretched band extension base signal
TWI466106B (en) Audio or video encoder, audio or video decoder and related methods for processing multi-channel audio or video signals using a variable prediction direction
TWI455113B (en) Audio signal decoder, audio signal encoder, method and computer program for providing a decoded audio signal representation and method and computer program for providing an encoded representation of an audio signal
KR101373004B1 (en) Apparatus and method for encoding and decoding high frequency signal
CA2609539C (en) Audio codec post-filter
RU2625444C2 (en) Audio processing system
JP5208901B2 (en) Method for encoding audio and music signals
EP1495464B1 (en) Device and method for encoding a time-discrete audio signal and device and method for decoding coded audio data
CA2691993C (en) Audio encoder for encoding an audio signal having an impulse-like portion and stationary portion, encoding methods, decoder, decoding method, and encoded audio signal
JP5243661B2 (en) Audio signal encoder, audio signal decoder, method for providing a coded representation of audio content, method for providing a decoded representation of audio content, and computer program for use in low-latency applications
RU2389085C2 (en) Method and device for introducing low-frequency emphasis when compressing sound based on acelp/tcx
RU2641224C2 (en) Adaptive band extension and device therefor
CN101903945B (en) Encoder, decoder, and encoding method
TWI469136B (en) Apparatus and method for processing a decoded audio signal in a spectral domain
KR101858466B1 (en) Coding generic audio signals at low bitrates and low delay