EP3544003B1 - Device and method of determining an estimated value

Info

Publication number: EP3544003B1
Application number: EP19167397.9A
Authority: EP
Inventors: Michael Schug; Johannes Hilpert; Stefan Geyersberger; Max Neuendorf
Original assignee: Fraunhofer Gesellschaft zur Forderung der Angewandten Forschung eV
Current assignee: Fraunhofer Gesellschaft zur Forderung der Angewandten Forschung eV
Priority date: 2004-03-01
Filing date: 2005-02-17
Publication date: 2020-12-23
Anticipated expiration: 2025-02-17
Also published as: JP2007525715A; BRPI0507815A; NO338917B1; EP2034473A2; CA2559354C; RU2337414C2; WO2005083680A1; DE102004009949A1; IL176978A0; ES2847237T3; PT2034473T; RU2006134638A; HK1093813A1; NO20064432L; KR100852482B1; CN1938758A; EP2034473A3; KR20060121978A; EP3544003A1; CN1938758B

Abstract

The device and method are used for a video or audio signal (100). A first step (102) provides levels for allowable interference (nb(b)) and the signal energy in a given frequency band (e(b)). These signals are processed in a second step (104) which receives a frequency band energy distribution signal (nl(b)) from a third step (106) and calculates an estimated value (pe).

Description

The present invention relates to coders for coding a signal comprising audio and/or video information, and more particularly to estimating the demand for information units needed to code this signal.

The known encoder is described below. An audio signal to be coded is fed in at an input 1000. It is first fed to a scaling stage 1002, in which so-called AAC gain control is performed to set the level of the audio signal. Side information from the scaling is fed to a bitstream formatter 1004, as shown by the arrow between block 1002 and block 1004. The scaled audio signal is then fed to an MDCT filter bank 1006. In the AAC coder, the filter bank implements a modified discrete cosine transform with 50% overlapping windows, the window length being determined by a block 1008.

Generally speaking, block 1008 ensures that transient signals are windowed with shorter windows and that more stationary signals are windowed with longer windows. Shorter windows give transient signals a higher time resolution (at the expense of frequency resolution), while longer windows give more stationary signals a higher frequency resolution (at the expense of time resolution). Longer windows tend to be preferred because they promise a greater coding gain. At the output of the filter bank 1006 there are temporally successive blocks of spectral values which, depending on the embodiment of the filter bank, can be MDCT coefficients, Fourier coefficients or sub-band signals. Each sub-band signal has a certain limited bandwidth that is determined by the corresponding sub-band channel in the filter bank 1006, and each sub-band signal has a certain number of sub-band samples.

The following describes, by way of example, the case in which the filter bank outputs temporally successive blocks of MDCT spectral coefficients which, generally speaking, represent successive short-term spectra of the audio signal to be coded at input 1000. A block of MDCT spectral values is then fed into a TNS processing block 1010, in which temporal noise shaping (TNS) takes place. The TNS technique is used to shape the temporal form of the quantization noise within each window of the transform. This is achieved by applying a filtering process to parts of the spectral data of each channel. The coding is performed on a per-window basis. In particular, the following steps are carried out to apply the TNS tool to a window of spectral data, i.e. to a block of spectral values.

First, a frequency range is selected for the TNS tool. A suitable choice is to cover a frequency range from 1.5 kHz up to the highest possible scale factor band with one filter. It should be noted that this frequency range depends on the sampling rate, as specified in the AAC standard (ISO/IEC 14496-3:2001(E)).

An LPC calculation (LPC = linear predictive coding) is then carried out on the spectral MDCT coefficients that lie in the selected target frequency range. For increased stability, coefficients corresponding to frequencies below 2.5 kHz are excluded from this process. Common LPC procedures known from speech processing can be used for the LPC calculation, for example the well-known Levinson-Durbin algorithm. The calculation is carried out for the maximum permitted order of the noise shaping filter.

The LPC calculation yields the expected prediction gain PG. It also yields the reflection coefficients, also called PARCOR coefficients.
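The LPC step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the encoder's actual implementation: the function names `autocorrelation` and `levinson_durbin` are hypothetical, the input is assumed to be the MDCT coefficients of the selected target range, and the sign convention of the reflection coefficients is one of several in use.

```python
def autocorrelation(x, max_lag):
    """Autocorrelation r[0..max_lag] of the sequence x (hypothetical helper)."""
    return [sum(x[i] * x[i + lag] for i in range(len(x) - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Levinson-Durbin recursion on autocorrelation values r[0..order].

    Returns (reflection_coefficients, prediction_gain), where the
    prediction gain is the ratio of input energy to residual energy.
    """
    a = [1.0] + [0.0] * order      # prediction polynomial, a[0] = 1
    err = r[0]                     # residual energy of the order-0 predictor
    refl = []
    for m in range(1, order + 1):
        if err <= 0.0:             # degenerate input, stop early
            break
        acc = r[m] + sum(a[i] * r[m - i] for i in range(1, m))
        k = -acc / err             # reflection (PARCOR) coefficient of stage m
        refl.append(k)
        new_a = a[:]
        for i in range(1, m):
            new_a[i] = a[i] + k * a[m - i]
        new_a[m] = k
        a = new_a
        err *= (1.0 - k * k)       # residual energy shrinks at every stage
    gain = r[0] / err if err > 0.0 else float("inf")
    return refl, gain
```

The returned gain can then be compared against a threshold to decide whether TNS is applied at all.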

If the prediction gain does not exceed a certain threshold, the TNS tool is not applied. In this case, control information is written into the bitstream so that a decoder knows that no TNS processing has been carried out.

If, however, the prediction gain exceeds the threshold, TNS processing is applied.

In a next step, the reflection coefficients are quantized. The order of the noise shaping filter used is determined by removing all reflection coefficients with an absolute value smaller than a threshold from the "tail" of the reflection coefficient array. The number of remaining reflection coefficients gives the order of the noise shaping filter. A suitable threshold is 0.1.

The remaining reflection coefficients are typically converted into linear prediction coefficients, a technique also known as the "step-up" procedure.
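The order selection and the step-up conversion can be sketched as follows. The function names `trim_tail` and `step_up` are hypothetical, the quantization of the reflection coefficients is omitted, and the sign convention is an assumption; only the 0.1 threshold comes from the text.

```python
def trim_tail(refl, threshold=0.1):
    """Drop reflection coefficients below the threshold from the tail only.

    Small coefficients in front of a larger one are kept; the length of
    the result is the order of the noise shaping filter.
    """
    order = len(refl)
    while order > 0 and abs(refl[order - 1]) < threshold:
        order -= 1
    return refl[:order]


def step_up(refl):
    """Convert reflection coefficients into prediction coefficients.

    Returns [1, a1, ..., aN] for the filter A(z) = 1 + a1 z^-1 + ... + aN z^-N.
    """
    a = [1.0]
    for k in refl:
        m = len(a)  # order of the polynomial after this stage
        a = [1.0] + [a[i] + k * a[m - i] for i in range(1, m)] + [k]
    return a
```

For example, `trim_tail([0.6, -0.3, 0.05, -0.02])` keeps two coefficients, so the filter order is 2.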

The calculated LPC coefficients are then used as encoder noise shaping filter coefficients, i.e. as prediction filter coefficients. This FIR filter is run over the specified target frequency range. An autoregressive filter is used in decoding, while a so-called moving-average filter is used in coding. Finally, the side information for the TNS tool is also fed to the bitstream formatter, as shown by the arrow between the TNS processing block 1010 and the bitstream formatter 1004 in Fig. 3.

The signal then passes through several optional tools not shown in Fig. 3, such as a long-term prediction tool, an intensity/coupling tool, a prediction tool and a noise substitution tool, until it finally reaches a mid/side encoder 1012. The mid/side encoder 1012 is active when the audio signal to be coded is a multi-channel signal, i.e. a stereo signal with a left channel and a right channel. Up to this point, i.e. upstream of block 1012 in Fig. 3, the left and right stereo channels have been processed separately from each other, i.e. scaled, transformed by the filter bank, subjected to TNS processing or not, and so on.

The mid/side encoder first checks whether mid/side coding makes sense, i.e. whether it yields any coding gain at all. Mid/side coding yields a coding gain when the left and right channels are rather similar. In that case the mid channel, i.e. the sum of the left and right channels, is almost equal to the left or the right channel, apart from the scaling by the factor 1/2, while the side channel has only very small values because it equals the difference between the left and right channels. Thus, when the left and right channels are approximately equal, the difference is approximately zero or comprises only very small values which, it is hoped, are quantized to zero in a subsequent quantizer 1014 and can therefore be transmitted very efficiently, since the quantizer 1014 is followed by an entropy coder 1016.
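A minimal sketch of the mid/side transform and of a possible usefulness check follows. The text does not state the decision criterion, so the energy-ratio test in `ms_is_worthwhile` and its `ratio` parameter are purely hypothetical; scaling both channels by 1/2 is also an assumption.

```python
def mid_side(left, right):
    """Mid = (L + R) / 2, side = (L - R) / 2, per spectral value."""
    mid = [(l + r) / 2.0 for l, r in zip(left, right)]
    side = [(l - r) / 2.0 for l, r in zip(left, right)]
    return mid, side


def left_right(mid, side):
    """Inverse transform: L = M + S, R = M - S."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right


def energy(x):
    return sum(v * v for v in x)


def ms_is_worthwhile(left, right, ratio=0.1):
    """Hypothetical heuristic: use M/S when one of the two transformed
    channels carries much less energy than the other."""
    mid, side = mid_side(left, right)
    e_mid, e_side = energy(mid), energy(side)
    return min(e_mid, e_side) < ratio * max(e_mid, e_side)
```

With nearly identical channels the side channel is close to zero and the check returns `True`; with unrelated channels mid and side carry similar energy and it returns `False`.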

A psychoacoustic model 1020 supplies the quantizer 1014 with a permitted disturbance per scale factor band. The quantizer works iteratively, i.e. an outer iteration loop is called first, which then calls an inner iteration loop. Generally speaking, a block of values at the input of the quantizer 1014 is first quantized, starting from initial quantizer step sizes. In particular, the inner loop quantizes the MDCT coefficients, consuming a certain number of bits. The outer loop calculates the distortion and the modified energy of the coefficients using the scale factor, and then calls the inner loop again. This process is iterated until a certain set of conditions is satisfied. For each iteration of the outer iteration loop, the signal is reconstructed in order to calculate the disturbance introduced by the quantization and to compare it with the permitted disturbance supplied by the psychoacoustic model 1020. Furthermore, the scale factors are increased by one step from iteration to iteration, namely for each iteration of the outer iteration loop.
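The interplay of the two loops can be illustrated with the following simplified sketch. Several details are assumptions and not taken from the text: the step size formula `2 ** ((gain - sf) / 4)`, the bit counter `count_bits` (a crude stand-in for the entropy coder), raising the scale factor only in bands that are still too noisy, and all function names.

```python
def quantize_band(values, step):
    return [int(round(v / step)) for v in values]


def count_bits(q):
    """Hypothetical bit estimate: zero costs nothing, larger values cost more."""
    return sum(0 if v == 0 else 2 * abs(v).bit_length() + 1 for v in q)


def two_loop_quantize(bands, allowed, max_bits, max_outer=32):
    """bands: spectral values per scale factor band; allowed: permitted
    disturbance (energy) per band. Returns (quantized, sf, gain, ok)."""
    sf = [0] * len(bands)   # per-band scale factors
    gain = 0                # global step size index
    for _ in range(max_outer):
        # Inner loop: coarsen the global step size until the bit budget holds.
        while True:
            steps = [2.0 ** ((gain - s) / 4.0) for s in sf]
            q = [quantize_band(b, st) for b, st in zip(bands, steps)]
            if sum(count_bits(qb) for qb in q) <= max_bits:
                break
            gain += 1
        # Outer loop: reconstruct and compare the introduced disturbance
        # with the permitted disturbance of each band.
        noisy = [i for i, (b, qb, st) in enumerate(zip(bands, q, steps))
                 if sum((v - w * st) ** 2 for v, w in zip(b, qb)) > allowed[i]]
        if not noisy:
            return q, sf, gain, True
        for i in noisy:
            sf[i] += 1      # finer quantization for this band next round
    return q, sf, gain, False
```

Every pass of the outer loop costs a full quantization and reconstruction, which is why a good initial estimate of the bit demand shortens the encoding time.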

Once a situation is reached in which the quantization disturbance introduced by the quantization is below the permitted disturbance determined by the psychoacoustic model, and the bit requirements are met at the same time, namely that a maximum bit rate is not exceeded, the iteration, i.e. the analysis-by-synthesis method, is terminated. The scale factors obtained are coded, as carried out in block 1014, and fed in coded form to the bitstream formatter 1004, as indicated by the arrow drawn between block 1014 and block 1004. The quantized values are then fed to the entropy coder 1016, which typically performs entropy coding using several Huffman code tables for different scale factor bands in order to convert the quantized values into a binary format. As is known, entropy coding in the form of Huffman coding relies on code tables that are created on the basis of expected signal statistics and in which frequently occurring values receive shorter code words than less frequently occurring values. The entropy-coded values are then also fed, as the actual main information, to the bitstream formatter 1004, which outputs the coded audio signal according to a specific bitstream syntax.

Data reduction of audio signals is by now a well-known technique that is the subject of a number of international standards (e.g. ISO/MPEG-1, MPEG-2 AAC, MPEG-4).

What the above-mentioned methods have in common is that the input signal is converted into a compact, data-reduced representation by a so-called encoder that exploits perception-related effects (psychoacoustics, psychooptics). For this purpose, a spectral analysis of the signal is usually carried out, and the corresponding signal components are quantized taking a perceptual model into account and then coded as compactly as possible as a so-called bitstream.

To estimate, before the actual quantization, how many bits a certain section of the signal to be coded will need, the so-called perceptual entropy (PE) can be used. The PE also provides a measure of how difficult it is for the encoder to code a particular signal or parts of it.

The quality of the estimate is determined by the deviation of the PE from the number of bits actually required.

Furthermore, the perceptual entropy, or any estimated value for the demand for information units needed to code a signal, can be used to estimate whether the signal is transient or stationary, since transient signals also require more bits for coding than more stationary signals. The estimate of a transient property of a signal is used, for example, to make a window length decision, as indicated by block 1008 in Fig. 3.

Fig. 6 shows the perceptual entropy calculated according to ISO/IEC IS 13818-7 (MPEG-2 Advanced Audio Coding (AAC)). The equation shown in Fig. 6 is used to calculate this perceptual entropy, i.e. a band-wise perceptual entropy. In this equation, the parameter pe stands for the perceptual entropy, width(b) stands for the number of spectral coefficients in the respective band b, and e(b) is the energy of the signal in this band. Finally, nb(b) is the corresponding masking threshold or, more generally, the permitted disturbance that can be introduced into the signal, for example by quantization, such that a human listener still hears no disturbance or only a negligible one.
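The equation of Fig. 6 is not reproduced in this text. The sketch below therefore assumes the commonly used band-wise form pe = Σ_b width(b) · log2(e(b) / nb(b)), built only from the symbols defined above; the function name `pe_bandwise` is hypothetical.

```python
import math


def pe_bandwise(width, e, nb):
    """Band-wise perceptual entropy under the assumed form
    pe = sum over b of width(b) * log2(e(b) / nb(b)).

    width: number of spectral coefficients per band
    e:     signal energy per band (must be > 0 here)
    nb:    permitted disturbance (masking threshold) per band (> 0)
    """
    return sum(w * math.log2(eb / nbb) for w, eb, nbb in zip(width, e, nb))
```

For two bands with widths 4 and 8, energies 16 and 8, and thresholds 1 and 2, this gives 4 · 4 + 8 · 2 = 32. Under this assumed form, a band whose energy lies below its threshold contributes a negative value.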

The bands can originate from the band division of the psychoacoustic model (block 1020 in Fig. 3), or they can be the so-called scale factor bands (scfb) used in the quantization. The psychoacoustic masking threshold is the energy value that the quantization error should not exceed.

The plot in Fig. 6 thus shows how well a perceptual entropy determined in this way works as an estimate of the number of bits required for coding. For this purpose, using the example of an AAC coder at different bit rates, the respective perceptual entropy was plotted for each individual block as a function of the bits consumed. The test piece used contains a typical mixture of music, speech and individual instruments.

Ideally, the points would cluster along a straight line through the origin. The spread of the point cloud, with its deviations from the ideal line, illustrates the inaccuracy of the estimate.

The disadvantage of the concept shown in Fig. 6 is therefore this deviation. If, for example, the value of the perceptual entropy is too large, the quantizer is signaled that more bits are needed than are actually required. As a result, the quantizer quantizes too finely, i.e. it does not exhaust the permitted disturbance, which reduces the coding gain. If, on the other hand, the value of the perceptual entropy is too small, the quantizer is signaled that fewer bits are needed to code the signal than are actually required. The quantizer then quantizes too coarsely, which would immediately lead to an audible disturbance in the signal unless countermeasures are taken. The countermeasures can consist of the quantizer needing one or more further iteration loops, which increases the computing time of the coder.

To improve the calculation of the perceptual entropy, a constant term such as 1.5 could be introduced into the logarithm expression, as shown in Fig. 7. This already gives a better result, i.e. a smaller deviation upwards and downwards. With a constant term in the logarithm expression, the case in which the perceptual entropy signals too optimistic a bit demand becomes less frequent. On the other hand, Fig. 7 clearly shows that a significantly too high number of bits is signaled, so that the quantizer will always quantize too finely. The bit demand is assumed to be greater than it actually is, which in turn results in a reduced coding gain. The constant in the logarithm expression is a rough estimate of the bits needed for the side information.
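The effect of the constant can be sketched as follows. The equation of Fig. 7 is not reproduced in this text, so the form pe = Σ_b width(b) · log2(e(b) / nb(b) + c) with c = 1.5 is an assumption, and the function name `pe_bandwise_const` is hypothetical.

```python
import math


def pe_bandwise_const(width, e, nb, c=1.5):
    """Band-wise perceptual entropy with a constant term inside the
    logarithm (assumed form): sum of width(b) * log2(e(b) / nb(b) + c)."""
    return sum(w * math.log2(eb / nbb + c) for w, eb, nbb in zip(width, e, nb))
```

With c = 1.5, a band whose energy is far below the threshold still contributes about width(b) · log2(1.5) ≈ 0.585 · width(b), which reflects that some bits are spent even on such bands.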

Inserting a term into the logarithm expression thus does improve the band-wise perceptual entropy shown in Fig. 6, because bands with a very small distance between energy and masking threshold are better accounted for: a certain number of bits is needed even to transmit spectral coefficients quantized to zero.

Another calculation of the perceptual entropy, which is however very computationally expensive, is shown in Fig. 8. Fig. 8 shows the case in which the perceptual entropy is calculated line by line. The disadvantage, however, is the higher computational effort of the line-by-line calculation. Here, spectral coefficients X(k) are used instead of the energy, where kOffset(b) denotes the first index of band b. Comparing Fig. 8 with Fig. 7, a reduction of the upward "excursions" is clearly visible in the range between 2000 and 3000 bits. The PE estimate will therefore be more accurate, i.e. not too pessimistic but closer to the optimum, so that the coding gain can increase compared with the calculation methods shown in Figs. 6 and 7, or the number of iterations in the quantizer is reduced.
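A line-by-line calculation can be sketched as follows. The equation of Fig. 8 is not reproduced in this text; the form below, in which each squared coefficient X(k)² is related to the per-line share nb(b) / width(b) of the band threshold and the constant 1.5 is kept, is an assumption. The name `pe_linewise` and the convention that `k_offset` has one extra entry marking the end of the last band are hypothetical.

```python
import math


def pe_linewise(X, k_offset, nb, c=1.5):
    """Line-wise perceptual entropy (assumed form):
    sum over bands b and lines k in b of
    log2(X(k)^2 / (nb(b) / width(b)) + c).

    X:        spectral coefficients
    k_offset: first index of each band, plus the end index of the last band
    nb:       permitted disturbance per band (> 0)
    """
    pe = 0.0
    for b in range(len(nb)):
        start, stop = k_offset[b], k_offset[b + 1]
        per_line = nb[b] / (stop - start)   # threshold share of one line
        for k in range(start, stop):
            pe += math.log2(X[k] * X[k] / per_line + c)
    return pe
```

Compared with the band-wise forms, this evaluates one logarithm per spectral line instead of one per band, which is the source of the additional computing time.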

A disadvantage of the line-by-line calculation of the perceptual entropy, however, is the computing time required to evaluate the equation shown in Fig. 8.

Such computing-time disadvantages do not necessarily matter when the encoder runs on a powerful PC or a powerful workstation. The situation is quite different, however, when the encoder is housed in a portable device, such as a UMTS mobile phone, which must be small and cheap, must have a low power requirement, and must also work quickly in order to enable the coding of an audio or video signal transmitted over the UMTS connection.

US 2002/103637 A1 discloses a concept for improving the performance of coding systems that employ high-frequency reconstruction methods. For this purpose, a coding difficulty or a measure of the workload of an encoder is calculated on the encoder side in order to control, depending on it, the crossover frequency, which determines up to which frequency a signal is coded with a source coder; the portion of the signal above the crossover frequency is coded by a high-frequency reconstruction method. As a measure of the difficulty of coding a signal, the perceptual entropy is calculated: a spectral value is squared and then weighted with a number equal to the number of lines in the current band divided by the psychoacoustic threshold for this band, and the logarithm of the result is then taken. Summing all such logarithms in a band gives the perceptual entropy in this band. Alternatively, a distortion energy can be calculated at the end of the source coding process by summing the distortion energy in each band and weighting it with a loudness curve.
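The prior-art measure described above can be written compactly as follows. This is only a transcription of the verbal description, not a formula quoted from US 2002/103637 A1; the symbols $n(b)$ for the number of lines in band $b$ and $\mathrm{thr}(b)$ for the psychoacoustic threshold are assumed names, and the base of the logarithm is left open because the text does not state it.

```latex
\mathrm{pe}(b) \;=\; \sum_{k \,\in\, \text{band } b} \log\!\left( X(k)^2 \cdot \frac{n(b)}{\mathrm{thr}(b)} \right)
```

Because the logarithm is evaluated once per spectral line, this corresponds to a line-by-line calculation.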

The object of the present invention is to provide an efficient and yet accurate concept for determining an estimated value for a requirement of information units for coding a signal.

This object is achieved by a device according to claim 1, a method according to claim 12, or a computer program according to claim 13.

The present invention is based on the finding that, for reasons of computing time, a frequency-band-wise calculation of the estimated value for a requirement of information units must be retained, but that, in order to obtain an accurate estimate, the distribution of the energy within the frequency band that is to be calculated band-wise must be taken into account.

In this way, the entropy coder following the quantizer is implicitly "drawn into" the determination of the estimated value for the requirement of information units. Entropy coding makes it possible to use fewer bits for transmitting smaller spectral values than for transmitting larger spectral values. The entropy coder is particularly efficient when spectral values quantized to zero can be transmitted. Since these typically occur most frequently, the code word for transmitting a spectral line quantized to zero is the shortest code word, and the code word for an increasingly large quantized spectral line becomes increasingly long. Moreover, for a particularly efficient way of transmitting a sequence of spectral values quantized to zero, run-length coding can even be used, with the result that, in the case of a run of zeros, on average not even a single bit is needed per spectral value quantized to zero.

It has been found that the band-wise perceptual entropy calculation used in the prior art to determine the estimated value for the requirement of information units completely ignores the mode of operation of the downstream entropy coder when the distribution of the energy in the frequency band deviates from a completely uniform distribution.

According to the invention, in order to reduce the inaccuracies of the band-wise calculation, account is therefore taken of how the energy is distributed within a band.

Depending on the implementation, the measure for the distribution of the energy in the frequency band can be determined on the basis of the actual amplitudes, or by estimating the frequency lines that are not quantized to zero by the quantizer. This measure, also referred to as "nl", where nl stands for "number of active lines", is preferred for reasons of computing-time efficiency. However, the number of spectral lines quantized to zero, or a finer subdivision, can also be taken into account, the estimate becoming more accurate the more information about the downstream entropy coder is taken into account. If the entropy coder is based on Huffman code tables, properties of these code tables can be integrated particularly well, since the code tables are not computed on-line from the signal statistics, so to speak, but are fixed anyway, independently of the actual signal.

Depending on computing-time constraints, however, for a particularly efficient calculation the measure for the distribution of the energy in the frequency band is obtained by determining the lines that still survive after quantization, i.e. the number of active lines.

The present invention is advantageous in that an estimated value for a requirement of information content is determined which is both more accurate and more efficient than in the prior art.

In addition, the present invention is scalable for different applications, since, depending on the desired accuracy of the estimated value, more and more properties of the entropy coder can be included in the estimate of the bit requirement, albeit at the cost of increased computing time.

Preferred embodiments of the present invention are explained in detail below with reference to the enclosed drawings. The figures show:

Fig. 1: a block diagram of the device according to the invention for determining an estimated value;
Fig. 2a: a preferred embodiment of the means for calculating a measure for the distribution of the energy in the frequency band;
Fig. 2b: a preferred embodiment of the means for calculating the estimated value for the requirement of bits;
Fig. 3: a block diagram of a known audio encoder;
Fig. 4: a schematic diagram explaining the influence of the energy distribution within a band on the determination of the estimated value;
Fig. 5: a diagram for the estimated value calculation according to the present invention;
Fig. 6: a diagram for the estimated value calculation according to ISO/IEC IS 13818-7 (AAC);
Fig. 7: a diagram for the estimated value calculation with a constant term;
Fig. 8: a diagram for the line-by-line estimated value calculation with a constant term.

The device according to the invention for determining an estimated value for a requirement of information units for coding a signal is described below with reference to Fig. 1. The signal, which can be an audio and/or a video signal, is fed in via an input 100. The signal is preferably already available as a spectral representation with spectral values. This is not strictly necessary, however, since some calculations can also be carried out on a time signal, for example by means of appropriate bandpass filtering.

The signal is fed to a means 102 for providing a measure of a permitted interference for a frequency band of the signal. The permitted interference can be determined, for example, by means of a psychoacoustic model, as explained with reference to Fig. 3 (block 1020). The means 102 is also operative to provide a measure of the energy of the signal in the frequency band. A prerequisite for a band-wise calculation is that a frequency band for which a permitted interference or a signal energy is specified contains at least two spectral lines of the spectral representation of the signal. In typical standardized audio encoders, the frequency band will preferably be a scale factor band, since the bit requirement estimate is needed directly by the quantizer in order to determine whether a quantization that has been carried out fulfills a bit criterion or not.

The means 102 is designed to supply both the permitted interference nb(b) and the signal energy e(b) of the signal in the band to a means 104 for calculating the estimated value for the requirement of bits.

According to the invention, the means 104 for calculating the estimated value for the requirement of bits is designed to take into account, in addition to the permitted interference and the signal energy, a measure nl(b) for a distribution of the energy in the frequency band, where the distribution of the energy in the frequency band deviates from a completely uniform distribution. The measure for the distribution of the energy is calculated in a means 106, which requires at least one band, namely the frequency band under consideration of the audio or video signal, either as a bandpass signal or directly as a sequence of spectral lines, in order, for example, to perform a spectral analysis of the band and thus obtain the measure for the distribution of the energy in the frequency band.

Of course, the audio or video signal can be fed to the means 106 as a time signal, in which case the means 106 performs band filtering and an analysis within the band. Alternatively, the audio or video signal fed to the means 106 can already be present in the frequency domain, for example as MDCT coefficients, or as a bandpass signal in a filter bank with a smaller number of bandpass filters than an MDCT filter bank.

In a preferred embodiment, the means 106 for calculating is designed to take into account the current magnitudes of spectral values in the frequency band when calculating the estimated value.

Furthermore, the means for calculating the measure for the distribution of the energy can be designed to determine, as the measure for the distribution of the energy, a number of spectral values whose magnitude is greater than or equal to a predetermined magnitude threshold, or whose magnitude is less than or equal to the magnitude threshold. The magnitude threshold is preferably an estimated quantizer step which, in a quantizer, causes values less than or equal to the quantizer step to be quantized to zero. In this case, the measure is the number of active lines, i.e. the number of lines that survive quantization and are thus not equal to zero.
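The threshold-based count described above can be sketched as follows. This is a minimal illustration and not the patent's implementation: the function name `count_active_lines` and the `threshold` parameter are hypothetical, and the threshold is simply assumed to be an estimate of the first quantizer step, so that values at or below it count as quantized to zero.

```python
def count_active_lines(spectrum, threshold):
    """Count the spectral values expected to survive quantization.

    Assumption: a value whose magnitude is less than or equal to
    `threshold` (an estimated quantizer step) is quantized to zero,
    so only magnitudes strictly above the threshold are "active".
    """
    return sum(1 for x in spectrum if abs(x) > threshold)
```

For a band such as `[0.2, -3.0, 0.5, 1.0]` and a threshold of `0.5`, only the two lines with magnitudes 3.0 and 1.0 count as active.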

Fig. 2a shows a preferred embodiment of the means 106 for calculating the measure for the distribution of the energy in the frequency band. In Fig. 2a, the measure for the distribution of the energy in the frequency band is denoted by nl(b). The form factor ffac(b) is already a measure for the distribution of the energy in the frequency band. As can be seen from block 106, the measure nl for the spectral distribution is obtained from the form factor ffac(b) by weighting with the fourth root of the signal energy e(b) divided by the bandwidth width(b), i.e. the number of lines in the scale factor band b. In this context, it should be noted that the form factor is itself an example of a quantity that indicates a measure for the distribution of the energy, whereas nl(b) is an example of a quantity that represents an estimate of the number of lines relevant for the quantization.

The form factor ffac(b) is calculated by taking the magnitude of each spectral line, taking the square root of that magnitude, and then summing the square roots of the magnitudes of the spectral lines in the band.
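The two quantities described above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the encoder's actual code: the function names are hypothetical, the band energy is assumed to be the sum of the squared spectral values, and "weighting" is read as division by the fourth root of e(b)/width(b), which is the reading that reproduces the values nl = 4 and nl = 1.41 given for Fig. 4a and Fig. 4b.

```python
import math


def form_factor(band):
    """ffac(b): sum of the square roots of the spectral magnitudes."""
    return sum(math.sqrt(abs(x)) for x in band)


def band_energy(band):
    """e(b): assumed here to be the sum of the squared spectral values."""
    return sum(x * x for x in band)


def active_lines_estimate(band):
    """nl(b): form factor divided by the 4th root of e(b) / width(b).

    width(b) is taken as the number of lines in the band. An all-zero
    band is given nl = 0 to avoid a division by zero (an assumption).
    """
    e = band_energy(band)
    if e == 0.0:
        return 0.0
    return form_factor(band) / (e / len(band)) ** 0.25
```

With these definitions, `active_lines_estimate([1.0, 1.0, 1.0, 1.0])` evaluates to 4 and `active_lines_estimate([2.0, 0.0, 0.0, 0.0])` to the square root of 2, although both bands carry the same energy.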

Fig. 2b shows a preferred embodiment of the means 104 for calculating the estimated value pe. Fig. 2b additionally introduces a case distinction: if the base-2 logarithm of the ratio of the energy to the permitted interference is greater than or equal to a constant factor c1, the upper alternative in block 104 is used, i.e. the measure nl for the spectral distribution is multiplied by the logarithm expression.

If, on the other hand, it is found that the base-2 logarithm of the ratio of the signal energy to the permitted interference is less than the value c1, the lower alternative in block 104 of Fig. 2b is used, which additionally has an additive constant c2 and a multiplicative constant c3, the latter being calculated from the constants c2 and c1.
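The case distinction can be sketched as follows. The numeric values of c1 and c2 and the exact form of the lower branch are not given in this passage, so the ones below are assumptions chosen for illustration: c3 is set to 1 - c2/c1, which is one way to derive it from c2 and c1 and makes the two branches agree at the switch point.

```python
import math

C1 = 3.0                # assumed value, i.e. log2(8); not stated in the text
C2 = math.log2(2.5)     # assumed value; not stated in the text
C3 = 1.0 - C2 / C1      # assumed relation: both branches coincide at C1


def band_pe(e, nb, nl):
    """Estimated bit requirement pe for one band.

    e  : signal energy e(b) in the band (expected > 0)
    nb : permitted interference nb(b) in the band (expected > 0)
    nl : measure nl(b) for the distribution of the energy
    """
    log_ratio = math.log2(e / nb)
    if log_ratio >= C1:
        # Upper alternative: nl multiplied by the logarithm expression.
        return nl * log_ratio
    # Lower alternative (assumed form): additive constant C2 and
    # multiplicative constant C3 applied to the logarithm expression.
    return nl * (C2 + C3 * log_ratio)
```

With this choice of c3, the estimate is continuous in the energy-to-interference ratio, so a small change in e(b) near the switch point does not cause a jump in pe.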

The concept according to the invention is illustrated below with reference to Fig. 4a and Fig. 4b. Fig. 4a shows a band containing four spectral lines that are all of equal size. The energy in this band is thus distributed uniformly over the band. In contrast, Fig. 4b shows a situation in which the energy in the band resides in one spectral line, while the other three spectral lines are equal to zero. The band shown in Fig. 4b could, for example, be present before quantization, or could be obtained after quantization if the spectral lines set to zero in Fig. 4b are smaller than the first quantizer step before quantization and are therefore set to zero by the quantizer, i.e. do not "survive".

The number of active lines in Fig. 4b is thus equal to 1, the parameter nl in Fig. 4b being calculated as the square root of 2. In contrast, the value nl, i.e. the measure for the spectral distribution of the energy, is calculated as 4 in Fig. 4a. This means that the spectral distribution of the energy is more uniform the larger the measure for the distribution of the spectral energy is.
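Both values can be checked by hand. The derivation below assumes that the relation in Fig. 2a reads $nl(b) = \mathrm{ffac}(b) / \left(e(b)/\mathrm{width}(b)\right)^{1/4}$, with $e(b)$ the sum of the squared spectral values; the amplitudes $a$ and $A$ are arbitrary positive values introduced only for this illustration.

```latex
% Fig. 4a: four equal lines of amplitude a, width(b) = 4
\mathrm{ffac} = 4\sqrt{a}, \qquad e = 4a^2, \qquad
nl = \frac{4\sqrt{a}}{\left(4a^2/4\right)^{1/4}} = \frac{4\sqrt{a}}{\sqrt{a}} = 4

% Fig. 4b: one line of amplitude A, three lines equal to zero, width(b) = 4
\mathrm{ffac} = \sqrt{A}, \qquad e = A^2, \qquad
nl = \frac{\sqrt{A}}{\left(A^2/4\right)^{1/4}} = \frac{\sqrt{A}\cdot 4^{1/4}}{\sqrt{A}} = \sqrt{2} \approx 1.41
```

In both cases the amplitude cancels, so nl depends only on how the energy is spread over the lines, not on the level of the signal.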

It should be noted that the band-wise calculation of the perceptual entropy according to the prior art does not detect any difference between the two cases. In particular, no difference is detected when the same energy is present in the two bands shown in Fig. 4a and Fig. 4b.

Obviously, however, the case shown in Fig. 4b, with only one relevant line, can be coded with fewer bits, since the three spectral lines set to zero can be transmitted very efficiently. Generally speaking, the simpler quantizability of the case shown in Fig. 4b is based on the fact that, after quantization and lossless coding, smaller values, and in particular values quantized to zero, require fewer bits for transmission.

According to the invention, account is thus taken of how the energy is distributed within the band. As has been explained, this is done by replacing the number of lines per band in the known equation (Fig. 6) with an estimate of the number of lines that are non-zero after quantization. This estimate is shown in Fig. 2a.
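The effect of this replacement can be illustrated with the two bands of Fig. 4. The sketch below makes several assumptions: the known band-wise equation is taken to be width(b) · log2(e(b)/nb(b)) without any constant term, nl(b) is computed as ffac(b) divided by the fourth root of e(b)/width(b), and the sample amplitudes and the permitted interference `NB` are made-up values chosen so that both bands carry the same energy. All function and variable names are hypothetical.

```python
import math


def energy(band):
    """e(b): assumed to be the sum of the squared spectral values."""
    return sum(x * x for x in band)


def nl(band):
    """Estimated number of active lines (assumed reading of Fig. 2a)."""
    ffac = sum(math.sqrt(abs(x)) for x in band)
    return ffac / (energy(band) / len(band)) ** 0.25


def pe_known(band, nb):
    """Known band-wise estimate: number of lines times the log expression."""
    return len(band) * math.log2(energy(band) / nb)


def pe_modified(band, nb):
    """Modified estimate: the number of lines is replaced by nl(b)."""
    return nl(band) * math.log2(energy(band) / nb)


FLAT = [1.0, 1.0, 1.0, 1.0]    # Fig. 4a: energy spread evenly, e = 4
PEAKED = [2.0, 0.0, 0.0, 0.0]  # Fig. 4b: energy in one line, e = 4
NB = 0.25                      # made-up permitted interference

for name, band in (("flat", FLAT), ("peaked", PEAKED)):
    print(name, pe_known(band, NB), round(pe_modified(band, NB), 2))
```

Under these assumptions the known estimate is identical for both bands, whereas the modified estimate is considerably lower for the peaked band, in line with the observation that such a band can be coded with fewer bits.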

It should also be noted that the form factor shown in Fig. 2a is also needed elsewhere in the encoder, for example within the quantization block 1014 for determining the quantization step size. If the form factor is already calculated elsewhere, it does not have to be recalculated for the bit estimate, so that the concept according to the invention for an improved estimate of the required bits manages with a minimum of additional computational effort.

As has already been explained, X(k) is the spectral coefficient to be quantized later, while the variable kOffset(b) denotes the first index in band b.

As can be seen from Fig. 4a and Fig. 4b, the spectrum in Fig. 4a yields a value nl = 4, while the spectrum in Fig. 4b yields a value of 1.41. The form factor thus provides a measure for characterizing the spectral fine structure within the band.

The new formula for calculating an improved band-wise perceptual entropy is thus based on multiplying the measure for the spectral distribution of the energy by the logarithm expression, in which the signal energy e(b) appears in the numerator and the permitted interference in the denominator. If required, a term can be inserted within the logarithm, as already shown in Fig. 7. This term can, for example, likewise be 1.5, but can also be equal to zero, as in the case shown in Fig. 2b; it can be determined empirically, for example.
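Written out, the relation described in this paragraph has roughly the following shape. The symbol $s$ for the optional term inside the logarithm is a name introduced here for illustration, and placing it as an additive term is an assumption based on the description of Fig. 7; the case distinction of Fig. 2b is not included.

```latex
\mathrm{pe} \;=\; \sum_{b} nl(b) \cdot \log_2\!\left( \frac{e(b)}{nb(b)} + s \right),
\qquad s = 1.5 \ \text{(as in Fig. 7)} \quad \text{or} \quad s = 0 \ \text{(as in Fig. 2b)}
```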

At this point, reference is again made to Fig. 5, which shows the perceptual entropy calculated according to the invention, plotted against the required bits. A higher accuracy of the estimate compared with the comparative examples in Figs. 6, 7 and 8 is clearly visible. Compared with the line-wise calculation, too, the modified band-wise calculation according to the invention performs at least equally well.

Depending on the circumstances, the method according to the invention may be implemented in hardware or in software. The implementation may be on a digital storage medium, in particular a floppy disk or CD, with electronically readable control signals that can interact with a programmable computer system such that the method is carried out. In general, the invention thus also consists in a computer program product with program code stored on a machine-readable carrier for carrying out the method according to the invention when the computer program product runs on a computer. In other words, the invention may thus be realized as a computer program with program code for carrying out the method when the computer program runs on a computer.

Claims

1. Apparatus for determining an estimate of a need for information units for encoding a signal having audio or video information, the signal having several frequency bands, comprising:
means (102) for providing a measure for an admissible interference for a frequency band of the signal, wherein the frequency band includes at least two spectral values of a spectral representation of the signal, and a measure for an energy of the signal in the frequency band;

characterized by

means (106) for calculating a measure for a distribution of the energy in the frequency band, wherein the distribution of the energy in the frequency band deviates from a completely uniform distribution; and

means (104) for calculating the estimate using the measure for the interference, the measure for the energy, and the measure for the distribution of the energy.
2. Apparatus of claim 1, wherein the means (106) for calculating is configured to take magnitudes of spectral values in the frequency band into account for calculating the measure for the distribution of the energy.
3. Apparatus of claim 1 or 2, wherein the means (106) for calculating the measure for the distribution of the energy is configured to determine, as a measure for the distribution of the energy, a number of spectral values the magnitudes of which are greater than or equal to a predetermined magnitude threshold, or the magnitudes of which are smaller than or equal to the magnitude threshold.
4. Apparatus of claim 3, wherein the magnitude threshold is an exact or estimated quantizer stage causing, in a quantizer, values smaller than or equal to the quantizer stage to be quantized to zero.
5. Apparatus of one of the preceding claims, wherein the means (106) for calculating is configured to calculate a form factor according to the following equation: $\mathrm{ffac}(b) = \sum_{k = \mathrm{kOffset}(b)}^{\mathrm{kOffset}(b+1) - 1} \sqrt{|X(k)|},$
wherein X(k) is a spectral value at a frequency index k, wherein kOffset is a first spectral value in a band b, and wherein ffac(b) is the form factor.
6. Apparatus of one of the preceding claims,
wherein the means (106) for calculating is configured to take a fourth root of a ratio between the energy in the frequency band and a width of the frequency band or number of spectral values within the frequency band into account.
7. Apparatus of one of the preceding claims,
wherein the means (106) for calculating is configured to calculate the measure for the distribution of the energy according to the following equations: $\mathrm{nl}(b) = \frac{\mathrm{ffac}(b)}{\left(\frac{e(b)}{\mathrm{width}(b)}\right)^{0.25}}$
$\mathrm{ffac}(b) = \sum_{k = \mathrm{kOffset}(b)}^{\mathrm{kOffset}(b+1) - 1} \sqrt{|X(k)|},$
wherein X(k) is a spectral value at a frequency index k, wherein kOffset is a first spectral value in a band b, wherein ffac(b) is a form factor, wherein nl(b) represents the measure for the distribution of the energy in the band b, wherein e(b) is a signal energy in the band b, and wherein width(b) is a width of the band.
8. Apparatus of one of the preceding claims,
wherein the means (104) for calculating the estimate is configured to use a quotient of the energy in the frequency band and the interference in the frequency band.
9. Apparatus of one of the preceding claims,
wherein the means (104) for calculating the estimate is configured to calculate the estimate using the following expression: $\mathrm{pe} = \sum_{b} \mathrm{nl}(b) \cdot \log_{2}\left(\frac{e(b)}{\mathrm{nb}(b)} + s\right)$
wherein pe is the estimate, wherein nl(b) represents the measure for the distribution of the energy in the band b, wherein e(b) is an energy of the signal in the band b, wherein nb(b) is the admissible interference in the band b, and wherein s is an additive term preferably equal to 1.5.
10. Apparatus of one of the preceding claims,
wherein the means (104) for calculating the estimate is configured to calculate the estimate according to the following equation: $\mathrm{pe} = \sum_{b} \mathrm{nl}(b) \cdot \log_{2}\left(\frac{e(b)}{\mathrm{nb}(b)} + s\right)$
wherein: $\mathrm{nl}(b) = \frac{\mathrm{ffac}(b)}{\left(\frac{e(b)}{\mathrm{width}(b)}\right)^{0.25}},$
and
wherein: $\mathrm{ffac}(b) = \sum_{k = \mathrm{kOffset}(b)}^{\mathrm{kOffset}(b+1) - 1} \sqrt{|X(k)|},$
wherein pe is the estimate, wherein nl(b) represents the measure for the distribution of the energy in the band b, wherein e(b) is an energy of the signal in the band b, wherein nb(b) is the admissible interference in the band b, wherein s is an additive term preferably equal to 1.5, wherein X(k) is a spectral value at a frequency index k, wherein kOffset is a first spectral value in a band b, wherein ffac(b) is a form factor, and wherein width(b) is a width of the band.
11. Apparatus of one of the preceding claims,
wherein the signal is given as a spectral representation with spectral values.
12. Method of determining an estimate of a need for information units for encoding a signal having audio or video information, the signal having several frequency bands, comprising the steps of:
providing (102) a measure for an admissible interference for a frequency band of the signal, wherein the frequency band includes at least two spectral values of a spectral representation of the signal, and a measure for an energy of the signal in the frequency band;

characterized by

calculating (106) a measure for a distribution of the energy in the frequency band, wherein the distribution of the energy in the frequency band deviates from a completely uniform distribution; and

calculating (104) the estimate using the measure for the interference, the measure for the energy, and the measure for the distribution of the energy.
13. Computer program with program code for performing the method of determining an estimate of a need for information units for encoding a signal of claim 12, when the program is executed on a computer.