WO2014135914A1

WO2014135914A1 - A method for inverting dynamic range compression of a digital audio signal

Info

Publication number: WO2014135914A1
Application number: PCT/IB2013/000595
Authority: WO
Inventors: Stanislaw GORLOW; Joshua D. REISS
Original assignee: Universite De Bordeaux 1; Queen Mary, University Of London; Institut Polytechnique De Bordeaux; Universite Bordeaux Segalen; Centre National De La Recherche Scientifique (Cnrs)
Priority date: 2013-03-04
Filing date: 2013-03-04
Publication date: 2014-09-12

Abstract

In the method of decompressing a compressed digital audio signal resulting from the compression of the dynamic range of an input signal, for each integer n representing a time instant, y(n) being the level of the compressed signal at instant n, automated means determine: z(n) = sgn[y(n)]·|z(n)| with |z(n)| = H^{-1}[|y(n)| | θ] if v(n) > 10^{L/20}, and |z(n)| = |y(n)| otherwise, where L is a threshold in dB, v(n) is a sound level or envelope of the input signal x(n), θ represents the compressor model parameters, and H represents the compressor, where H^{-1} is its inverse.

Description

A method for inverting dynamic range compression of a digital audio signal

The invention relates to a method for inverting dynamic range compression of a digital audio signal.

The prior art comprises the following references:

[1] D. Barchiesi and J. Reiss, "Reverse engineering of a mix," J. Audio Eng. Soc., vol. 58, pp. 563-576, 2010.

[2] T. Ogunfunmi, Adaptive nonlinear system identification: The Volterra and Wiener model approaches. 233 Spring Street, New York, NY 10013, USA: Springer Science+Business Media, LLC, 2007, ch. 3.

[3] Y. Avargel and I. Cohen, "Adaptive nonlinear system identification in the short-time Fourier transform domain," IEEE Trans. Signal Process., vol. 57, no. 10, pp. 3891-3904, Oct. 2009.

[4] "Modeling and identification of nonlinear systems in the short-time Fourier transform domain," IEEE Trans. Signal Process., vol. 58, no. 1, pp. 291-304, Jan. 2010.

[5] A. Gelb and W. E. Vander Velde, Multiple-input describing functions and nonlinear system design. New York: McGraw-Hill, 1968, ch. 1.

[6] P. W. J. M. Nuij, O. H. Bosgra, and M. Steinbuch, "Higher-order sinusoidal input describing functions for the analysis of non-linear systems with harmonic responses," Mech. Syst. Signal Process., vol. 20, pp. 1883-1904, 2006.

[7] B. Lachaise and L. Daudet, "Inverting dynamics compression with minimal side information," in Proc. DAFx, 2008, pp. 1-6.

[8] E. Vickers, "The loudness war: Background, speculation and recommendations," in AES Convention 129, Nov. 2010.

[9] Dolby Digital and Dolby Volume provide a comprehensive loudness solution, Dolby Laboratories, 2007.

[10] Broadcast loudness issues: The comprehensive Dolby approach, Dolby Laboratories, 2011.

[11] R. Jeffs, S. Holden, and D. Bohn, Dynamics processor - technology & application tips, Rane Corporation, 2005.

[12] U. Zölzer, DAFX: Digital audio effects, 2nd ed. The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, United Kingdom: John Wiley & Sons Ltd, 2011, ch. 4.

[13] J. C. Schmidt and J. C. Rutledge, "Multichannel dynamic range compression for music signals," in Proc. IEEE ICASSP, vol. 2, pp. 1013-1016.

[14] D. Giannoulis, M. Massberg, and J. D. Reiss, "Digital dynamic range compressor design - a tutorial and analysis," J. Audio Eng. Soc., vol. 60, pp. 399-408, 2012.

[15] M. M. Goodwin and C. Avendano, "Frequency-domain algorithms for audio signal enhancement based on transient modification," J. Audio Eng. Soc., vol. 54, pp. 827-840, 2006.

[16] M. Walsh, E. Stein, and J.-M. Jot, "Adaptive dynamics enhancement," in AES Convention 130, May 2011.

[17] M. Zaunschirm, J. D. Reiss, and A. Klapuri, "A sub-band approach to modification of musical transients," Comput. Music J., vol. 36, pp. 23-36, 2012.

Sound or audio engineering is an established discipline employed in many areas of everyday life, often without our taking notice of it, and few people know how the audio they hear was produced. If we take sound recording and reproduction or broadcasting as an example, we may imagine that a prerecorded signal from an acoustic source is altered by an audio engineer in such a way that it meets certain criteria when played back. The number of these criteria may be large and usually depends on the context. In general, this alteration of the input signal is a sequence of numerous forward transformations, the reversibility of which is of little or no interest. But what if one wished to do exactly this, that is, to reverse the transformation chain, and to do so in a systematic and repeatable manner?

The research objective of reverse audio engineering is twofold: to identify the transformation parameters given the input and the output signals, as in [1], and to regain the input signal that goes with the output signal given the transformation parameters. In both cases, an explicit signal model is mandatory. The latter case might seem trivial, but only if the applied transformation is linear and orthogonal and as such perfectly invertible. Yet the forward transform is often neither linear nor invertible. This is the case for dynamic range compression (DRC), which is commonly described by a dynamic nonlinear time-variant system. The classical linear time-invariant (LTI) system theory does not apply here, so a tailored solution to the problem at hand must be found instead.

At this point, we would also like to highlight the fact that neither Volterra nor Wiener model approaches [2]-[4] offer a solution, and neither do describing functions [5], [6]. These are useful tools when identifying a time-invariant or a slowly varying nonlinear system or analyzing the limit cycle behavior of a feedback system with a static nonlinearity.

A method to invert dynamics compression is described in [7], but it requires an instantaneous gain value to be transmitted for each sample of the compressed signal. To provide a means to control the data rate, the gain signal is subsampled and also entropy coded. Not relying on a gain model and thus being extremely generic, this approach is highly inefficient.

On the other hand, transmitting the uncompressed signal in conjunction with a few typical compression parameters like threshold, ratio, attack, and release would require a much smaller capacity and yield the best possible signal quality with regard to any conceivable measure. A more realistic scenario is when the uncompressed signal is not available on the consumer side. This is usually the case for studio music recordings and broadcast material. There, the listener is offered a signal that is meant to sound "good" to everyone. However, the loudness war [8] has resulted in over-compressed audio material. Over-compression makes a song lose artistic qualities such as excitement or liveliness and desensitizes the ear owing to the louder volume. Thus there is a need to restore the original signal's dynamic range and to experience audio free of compression.

In addition to the normalization of the program's loudness level, the Dolby solution [9], [10] also includes dynamic range expansion. The expansion parameters that help reproduce the original program's dynamic range are tuned on the broadcaster side and transmitted as metadata together with the broadcast signal. This is a very convenient solution for broadcasters, not least because the metadata is quite compact. Dynamic range expansion is yet another forward transformation rather than a true inversion.

Evidently, none of the previous approaches satisfy the reverse engineering objective as it was formulated earlier.

An object of the present invention, hence, is to invert dynamic range compression, which is a vital element not only in broadcasting but also in mastering.

The invention therefore provides for a method of decompressing a compressed digital audio signal resulting from the compression of an input signal, wherein, for each integer n representing a time instant, y(n) being the level of the compressed signal at instant n, automated means determine:

$$z(n) = \operatorname{sgn}[y(n)] \cdot |z(n)|$$

with

$$|z(n)| = \begin{cases} H^{-1}\big[\,|y(n)| \;\big|\; \theta\,\big] & \text{if } v(n) > 10^{L/20}, \\ |y(n)| & \text{otherwise,} \end{cases}$$

where

L is a threshold in dB,

v(n) is a sound level or envelope of the input signal x(n),

θ represents the compressor model parameters, and

H represents the compressor, where H^{-1} is its inverse.

The decompressor, which is represented by H^{-1}, is defined by the so-called "characteristic function" z_p(v). "Characteristic" because the function characterizes the nonlinear behavior of the decompressor represented by the model θ. (The compressor is defined by various parameters such as a threshold, a ratio, an attack, a release, a knee, etc.)

In one embodiment, the automated means determine:

$$v(n) = \sqrt[p]{\tilde{x}(n)}$$

with

$$\tilde{x}(n) = \beta \left[ \frac{|y(n)|}{\gamma + (1-\gamma)\, g(n-1)} \right]^p + (1-\beta)\, \tilde{x}(n-1),$$

where:

p defines the sound level detector's type, i.e. for an RMS detector p = 2 and for a peak detector p = 1.

β and γ are the smoothing factors that go with the model parameters τ_v and τ_g, which are the time constants of the level detector and the gain smoothing filter, the conversion being as follows:

$$\beta = 1 - \exp[-2.2/(f_s \tau_v)] \quad\text{and}\quad \gamma = 1 - \exp[-2.2/(f_s \tau_g)],$$

where f_s is the sampling frequency and, in the above equation, g(n-1) is the gain value obtained for the preceding sample.

Advantageously, the level detector as well as the gain smoothing filter can be in either the attack or release phase, wherein, if

$$\left[ \frac{|y(n)|}{(1-\gamma)\, g(n-1)} \right]^p > \tilde{x}(n-1),$$

the detector is assumed to be in attack, so that τ_v = τ_{v,attack}, otherwise τ_v = τ_{v,release}, and wherein, for the gain smoothing filter, the condition for attack is (β is now known)

$$\left\{ \beta \left[ \frac{|y(n)|}{g(n-1)} \right]^p + (1-\beta)\, \tilde{x}(n-1) \right\}^{1/p} > \left[ \frac{10^{SL/20}}{g(n-1)} \right]^{1/S},$$

S being the slope parameter derived from the compression ratio R according to

$$S = 1 - 1/R,$$

wherein, given that the condition holds true, τ_g = τ_{g,attack}, otherwise τ_g = τ_{g,release}.

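In one embodiment, if v(n) > 10^{L/20}, the current sample is decompressed in the following manner:

- First, we compute the root or zero-crossing v₀(n) of the characteristic function z_p(v), using v(n) as a starting point for an iterative search:

v₀(n) = CHARFZERO[v(n)]

- Once v₀(n) is obtained, the modulus of the decompressed sample is given by

$$|z(n)| = \sqrt[p]{\big[\tilde{x}(n) - (1-\beta)\,\tilde{x}(n-1)\big]/\beta} \quad\text{with}\quad \tilde{x}(n) = v_0^p(n)$$

- The corresponding gain value is

$$g(n) = |y(n)| / |z(n)|$$

- Otherwise, the modulus of the sample is computed as

$$|z(n)| = |y(n)| / g(n) \quad\text{with}\quad g(n) = \gamma + (1-\gamma)\, g(n-1)$$

- And $\tilde{x}$ is updated according to

$$\tilde{x}(n) = \beta\, |z(n)|^p + (1-\beta)\, \tilde{x}(n-1)$$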

Advantageously, when the model parameters θ are not known, the same method is applied to accentuate the shape of the signal y(n), and in that case the model parameters are tweaked in such a way that the desired effect is achieved.

The invention also provides for:

- a digital audio signal obtained by using the method of the invention;

- a method of making available on a telecommunication network a signal obtained by using the method of the invention in view of downloading it;

- a computer program comprising code instructions arranged for controlling the execution of a method of the invention when the program is performed on a computer;

- a method of making available on a telecommunication network a program according to the invention in view of downloading it; and

- a data storage medium comprising data representing a signal obtained by using the method of the invention and/or data representing a program according to the invention.

The invention also provides for a device for decompressing a compressed digital audio signal resulting from the compression of an initial signal wherein the device is arranged to perform the method of the invention.

Other characteristics and advantages of the invention will appear on reading the following description comprising an embodiment given as a non-limiting example, and referring to the attached drawings in which:

- Figure 1 shows a basic broadband compressor model ;

- Figure 2 shows the graphical illustration for the iterative search of the zero- crossing used in the invention ;

- Figure 3 shows an illustrative example using an RMS detector with τ_v set to 5 ms, a threshold of -20 dBFS (dashed line in the upper right corner), a compression ratio of 4:1, and τ_g set to 1.6 ms for attack and 17 ms for release, respectively;

- Figure 4 shows RMSE as a function of typical attack and release times using a peak (upper row) or an RMS amplitude detector (lower row). In the left column, the attack time of the envelope filter is varied while the release time is held constant. The right column shows the reverse case. The time constants of the gain filter are fixed at zero. In all four cases, threshold and ratio are fixed at -32 dBFS and 4:1, respectively;

- Figure 5 shows RMSE as a function of typical attack and release times using a peak (upper row) or an RMS amplitude detector (lower row). In the left column, the attack time of the gain filter is varied while the release time is held constant. The right column shows the reverse case. The time constants of the envelope filter are fixed at zero. In all four cases, threshold and ratio are fixed at -32 dBFS and 4:1, respectively; and

- Figure 6 shows RMSE as a function of threshold relative to the signal's average loudness level (left column) and compression ratio (right column) using a peak (upper row) or an RMS amplitude detector (lower row). The time constants are: τ_v = 5 ms, τ_{g,att} = 20 ms, and τ_{g,rel} = 1 s.

Hereafter, we show how a dynamic nonlinear time-variant operator, such as a dynamic range compressor, can be inverted using an explicit signal model. By knowing the model parameters that were used for compression, one is able to recover the original uncompressed signal from a "broadcast" signal with high numerical accuracy and very low computational complexity. A compressor-decompressor scheme is worked out and described in detail. The approach is evaluated on real-world audio material with great success.

In the following, we provide a brief introduction to dynamic range compression and present the compressor model upon which our considerations are based. The data model, the formulation of the problem, and the pursued approach are described next. The inversion is then discussed. Afterwards, we illustrate how an integral step of the inversion procedure, namely the search for the zero-crossing of a nonlinear function, can be solved in an iterative manner by means of linearization. Some other compressor features are then discussed. The complete algorithm is given in the form of pseudocode and its performance is evaluated for different compressor settings.

The digital audio signal considered here is an audio signal such as a piece of music.

Dynamic range compression

Dynamic range compression or simply "compression" is a sound processing technique that attenuates loud sounds and/or amplifies quiet sounds, which in consequence leads to a reduction of an audio signal's dynamic range. (In audio, dynamic range is the difference between the loudest and quietest sounds measured in decibel.) In what follows, we will use the word "compression" having "downward" compression in mind. (The invention is likewise applicable to upward compression.) Downward compression means attenuating sounds above a certain threshold while leaving sounds below the threshold unchanged. A sound engineer might use a compressor to reduce the dynamic range of source material for purposes of aesthetics, intelligibility, recording or broadcast limitations.

Figure 1 shows a basic broadband compressor model 2 (feed-forward). It illustrates the basic compressor model from [11, ch. 2] amended by a switchable RMS/peak detector in the side chain to make it compatible with the compressor/limiter model in [12, p. 106]. We will hereafter restrict our considerations to this basic model, although the invention is intended as a general approach. First, the input signal x(n) is split and a copy is sent to the side chain 4. The detector 6 then calculates the magnitude or level of the sidechain signal using the root mean square (RMS) or peak as a measure for how loud a sound is [12, p. 107]. The detector's temporal behavior is controlled by the attack and release parameters. In the comparator 8, the sound level is compared with the threshold level, and in case it exceeds the threshold a scale factor is calculated in calculator 10 which corresponds to the ratio of input level to output level. The knee parameter determines how quickly the compression ratio is reached. At the end of the side chain, the scale factor is fed to a smoothing filter 12 that yields the gain. The response of the filter is controlled by another set of attack and release parameters. Finally, the gain control 14 applies the smoothed gain to the input signal and adds a fixed amount of makeup gain to bring the output signal y(n) to a desired level.

Such a broadband compressor operates on the input signal's full bandwidth, treating all frequencies from zero through the highest frequency equally. A detailed overview of all sidechain controls of a basic gain computer is given in [11, ch. 3].

Data model, problem formulation and proposed solution

A. Data Model and Problem Formulation

The data model used is based upon the compressor from Figure 1. The following simplifications are additionally made: the knee parameter ("hard" knee) and the makeup gain (fixed at 0 dB) are ignored. The compressor is further deemed to be a single-input single-output (SISO) system, that is, both the input and the output are single-channel signals. What follows is a description of each block by means of a dedicated function.

The RMS/peak detector as well as the gain computer build upon a first-order (one-pole) lowpass filter. The sound level or envelope v(n) of the input signal x(n) at time instant n, n being an integer, is obtained by

$$\tilde{x}(n) = \beta\, |x(n)|^p + \bar{\beta}\, \tilde{x}(n-1) \quad\text{with } p \in \{1, 2\},\ \bar{\beta} = 1 - \beta,$$

$$v(n) = \sqrt[p]{\tilde{x}(n)}, \tag{1}$$

where p = 2 represents an RMS detector, and p = 1 a peak detector. The non-zero smoothing factor β, 0 < β ≤ 1, may take on different values, β_att or β_rel, depending on whether the detector is in the attack or release phase. The condition for the level detector to enter the attack phase and to choose β_att over β_rel is

$$|x(n)| > v(n-1). \tag{2}$$

A formula that converts a time constant τ into a smoothing factor is given in [12, p. 109]:

$$\beta = 1 - \exp[-2.2 / (f_s \cdot \tau_v)],$$

where f_s is the sampling frequency. The static nonlinearity in the gain computer is usually modeled in the logarithmic domain as a continuous piecewise linear function:

$$F(n) = \begin{cases} -S \cdot [V(n) - L] & \text{if } V(n) > L, \\ 0 & \text{otherwise,} \end{cases} \tag{3}$$

where S is the slope, V(n) = 20 log₁₀ v(n), and L is the threshold in decibels. The slope is further derived from the desired compression ratio R according to

S = 1 - 1/R (4)
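The time-constant conversion and the static curve (3)-(4) can be sketched in Python as follows. This is a minimal illustration rather than part of the original disclosure: the function names are hypothetical, f_s is assumed to be given in kHz and τ in ms (so that their product is a number of samples), and mapping a zero time constant to a smoothing factor of 1 is an assumption.

```python
import math

def smoothing_factor(tau_ms, fs_khz):
    """beta = 1 - exp(-2.2 / (fs * tau)), with fs in kHz and tau in ms."""
    if tau_ms <= 0.0:
        return 1.0  # assumption: a zero time constant means no smoothing
    return 1.0 - math.exp(-2.2 / (fs_khz * tau_ms))

def slope(ratio):
    """Slope S derived from the compression ratio R, as in (4)."""
    return 1.0 - 1.0 / ratio

def gain_db(level_db, threshold_db, ratio):
    """Static curve (3): gain in dB for an envelope level V(n) in dB."""
    if level_db > threshold_db:
        return -slope(ratio) * (level_db - threshold_db)
    return 0.0

beta = smoothing_factor(5.0, 44.1)   # 5 ms detector at 44.1 kHz
g_db = gain_db(-8.0, -20.0, 4.0)     # level 12 dB above a -20 dB threshold
```

With a 4:1 ratio, a level 12 dB above the threshold is attenuated by 9 dB, so the output sits 3 dB above the threshold, as the ratio demands.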

Equation (3) is equivalently expressed in the linear domain as

$$f(n) = \begin{cases} \kappa\, v^{-S}(n) & \text{if } v(n) > l, \\ 1 & \text{otherwise,} \end{cases} \tag{5}$$

where l = 10^{L/20}, κ = l^S, and f is the linear scale factor before filtering.
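As a quick check of the equivalence between the logarithmic form (3) and the linear form (5), the following sketch evaluates the linear scale factor and converts it back to decibels. The function name is hypothetical, and κ is taken as l^S, which follows from 10^{F/20} = 10^{SL/20} · v^{-S}.

```python
import math

def scale_factor(v, threshold_db, ratio):
    """Linear scale factor f as in (5), for a linear envelope value v."""
    l = 10.0 ** (threshold_db / 20.0)   # linear threshold
    s = 1.0 - 1.0 / ratio               # slope S from (4)
    kappa = l ** s
    return kappa * v ** (-s) if v > l else 1.0

v = 10.0 ** (-8.0 / 20.0)               # envelope at -8 dB
f = scale_factor(v, -20.0, 4.0)
f_db = 20.0 * math.log10(f)             # should agree with (3)
```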

The smoothed gain g is then calculated as the exponentially-weighted moving average,

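$$g(n) = \gamma f(n) + \bar{\gamma}\, g(n-1) \quad\text{with } \gamma \in \{\gamma_{att}, \gamma_{rel}\},\ \bar{\gamma} = 1 - \gamma, \tag{6}$$

where the decision for the gain computer to choose the attack smoothing factor γ_att instead of γ_rel is subject to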

f(n)<g(n-1) (7)

The output signal y(n) is finally obtained by multiplying the above gain with the input signal x(n):

y(n) = g(n) . x(n) (8)
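Equations (1) to (8) together define the forward compressor, which might be sketched in Python as below. This is an illustration under stated assumptions, not the authors' code: the function and argument names are hypothetical, the initial state (zero envelope, unit gain) is assumed, f_s is taken in kHz with time constants in ms, and a zero time constant is mapped to a smoothing factor of 1.

```python
import math

def smoothing_factor(tau_ms, fs_khz):
    return 1.0 if tau_ms <= 0 else 1.0 - math.exp(-2.2 / (fs_khz * tau_ms))

def compress(x, L_db, R, p, tv_att, tv_rel, tg_att, tg_rel, fs_khz=44.1):
    """Feed-forward compressor following (1)-(8); returns (y, gains)."""
    l = 10.0 ** (L_db / 20.0)
    S = 1.0 - 1.0 / R
    kappa = l ** S
    xt, v, g = 0.0, 0.0, 1.0              # assumed initial state
    y, gains = [], []
    for xn in x:
        a = abs(xn)
        # Detector attack/release toggle, condition (2)
        beta = smoothing_factor(tv_att if a > v else tv_rel, fs_khz)
        xt = beta * a ** p + (1.0 - beta) * xt        # (1)
        v = xt ** (1.0 / p)
        f = kappa * v ** (-S) if v > l else 1.0       # (5)
        # Gain filter attack/release toggle, condition (7)
        gamma = smoothing_factor(tg_att if f < g else tg_rel, fs_khz)
        g = gamma * f + (1.0 - gamma) * g             # (6)
        y.append(g * xn)                              # (8)
        gains.append(g)
    return y, gains

# Usage: a 440 Hz sine compressed with an RMS detector (p = 2)
x = [0.5 * math.sin(2.0 * math.pi * 440.0 * n / 44100.0) for n in range(2000)]
y, gains = compress(x, -20.0, 4.0, 2, 5.0, 5.0, 1.6, 17.0)
```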

Due to the fact that the gain g is strictly positive, 0 < g ≤ 1, it follows that

sgn(y) = sgn(x), (9)

where sgn is the signum or sign function. In consequence, it is convenient to factorize the input signal as a product of the sign and the modulus according to

$$x(n) = \operatorname{sgn}(x) \cdot |x(n)|, \tag{10}$$

with sgn(x) being known due to (9).

The problem to be solved can be formulated as follows: given the compressed signal y(n) and the model parameters θ, recover the modulus of the original signal |x(n)| from |y(n)| based on θ.

B. Solution

The output of the side chain, that is the gain of |x(n)|, given θ, $\tilde{x}(n-1)$, and g(n-1), may be written as

$$g(n) = G\big[\,|x(n)| \;\big|\; \theta, \tilde{x}(n-1), g(n-1)\big]. \tag{11}$$

In (11), G denotes a nonlinear dynamic operator that maps the modulus of the input signal |x(n)| onto a sequence of instantaneous gain values g(n) according to the compressor model represented by θ. The compressor shall be completely described by the model parameters listed below.

L The threshold in dB

R The compression ratio dBin : dBout

p The detector type (peak or RMS)

τ_{v,att} The attack time of the envelope filter in ms

τ_{v,rel} The release time of the envelope filter in ms

τ_{g,att} The attack time of the gain filter in ms

τ_{g,rel} The release time of the gain filter in ms.

Using (11), (8) can be solved for |x(n)| yielding

$$|x(n)| = G^{-1}\big[\,g(n) \;\big|\; \theta, \tilde{x}(n-1), g(n-1)\big]$$

subject to invertibility of G. In order to solve the above equation, one requires the knowledge of g(n), which is unavailable. However, since g is a function of |x|, we can express |y| as a function of one independent variable |x|, and in that manner we obtain an equation with a single unknown:

$$|y(n)| = H\big[\,|x(n)| \;\big|\; \theta, \tilde{x}(n-1), g(n-1)\big], \tag{12}$$

where H represents the entire compressor. If H is invertible, i.e. bijective for all n, |x(n)| can be obtained from |y(n)| by

$$|x(n)| = \begin{cases} H^{-1}\big[\,|y(n)| \;\big|\; \theta, \tilde{x}(n-1), g(n-1)\big] & \text{if } v(n) > l, \\ |y(n)| & \text{otherwise.} \end{cases} \tag{13}$$

And yet, since v(n) is unknown, the condition for applying decompression must be predicted from y(n), $\tilde{x}(n-1)$, and g(n-1), and so must the condition for toggling between the attack and release phases. Depending on the quality of the prediction, the recovered modulus |z(n)| may differ somewhat at transition points from the original modulus |x(n)|, so that in the end

$$z(n) \approx x(n). \tag{14}$$

In the next section, it is shown how such an inverse compressor or decompressor is derived.

Inversion of dynamic range compression

A. Characteristic function

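For simplicity, we choose the instantaneous envelope value v(n) instead of |x(n)| as the independent variable in (12). The relation between the two items is given by (1). From (5), (6) and (8), when v(n) > l,

$$|y(n)| = \big[\gamma \kappa\, v^{-S}(n) + \bar{\gamma}\, g(n-1)\big] \cdot |x(n)|. \tag{15}$$

From (1),

$$|x(n)|^p = \big[v^p(n) - \bar{\beta}\, \tilde{x}(n-1)\big] / \beta, \tag{16}$$

so that

$$|y(n)|^p = \big[\gamma \kappa\, v^{-S}(n) + \bar{\gamma}\, g(n-1)\big]^p \cdot \big[v^p(n) - \bar{\beta}\, \tilde{x}(n-1)\big] / \beta, \tag{17}$$

or equivalently (note that β ≠ 0 by definition)

$$\beta\, |y(n)|^p = \big[\gamma \kappa\, v^{-S}(n) + \bar{\gamma}\, g(n-1)\big]^p \cdot \big[v^p(n) - \bar{\beta}\, \tilde{x}(n-1)\big]. \tag{18}$$

Moreover, (18) has a unique solution if G and also H are invertible. Moving the expression on the left-hand side over to the right-hand side, we may define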

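$$z_p(v) = \big[\gamma \kappa\, v^{-S}(n) + \bar{\gamma}\, g(n-1)\big]^p \cdot \big[v^p(n) - \bar{\beta}\, \tilde{x}(n-1)\big] - \beta\, |y(n)|^p, \tag{19}$$

which shall be termed the characteristic function. The zero-crossing of z_p(v) hence represents the sought-after envelope value v(n). Once v(n) is found (see "Numerical solution of the characteristic function" below), the current values of $\tilde{x}$, |x| and g are updated as per

$$\tilde{x}(n) = v^p(n), \qquad |x(n)| = \sqrt[p]{\big[\tilde{x}(n) - \bar{\beta}\, \tilde{x}(n-1)\big]/\beta}, \qquad g(n) = |y(n)| / |x(n)|, \tag{20}$$

and the decompressed sample is then calculated as

$$z(n) = \operatorname{sgn}(y) \cdot |x(n)|. \tag{21}$$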
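To make the role of the characteristic function concrete, the sketch below runs one forward compressor step with arbitrary illustrative values and then checks that the true envelope value is a zero of z_p(v), after which the modulus follows from the envelope via (1). The function names and all numeric values are assumptions chosen for illustration only.

```python
def char_fn(v, y_abs, g_prev, xt_prev, beta, gamma, kappa, S, p):
    """Characteristic function z_p(v) as in (19)."""
    gain = gamma * kappa * v ** (-S) + (1.0 - gamma) * g_prev
    return gain ** p * (v ** p - (1.0 - beta) * xt_prev) - beta * y_abs ** p

def modulus_from_envelope(v, xt_prev, beta, p):
    """Solve (1) for |x(n)| given the envelope value v(n)."""
    return ((v ** p - (1.0 - beta) * xt_prev) / beta) ** (1.0 / p)

# One forward step with illustrative values (RMS detector, threshold l = 0.1)
p, beta, gamma, S = 2, 0.2, 0.3, 0.75
kappa = 0.1 ** S
xt_prev, g_prev, x = 0.04, 0.8, 0.5
v = (beta * abs(x) ** p + (1.0 - beta) * xt_prev) ** (1.0 / p)   # (1)
g = gamma * kappa * v ** (-S) + (1.0 - gamma) * g_prev           # (5), (6)
y = g * x                                                        # (8)

residual = char_fn(v, abs(y), g_prev, xt_prev, beta, gamma, kappa, S, p)
x_rec = modulus_from_envelope(v, xt_prev, beta, p)
```

The residual vanishes up to rounding, and the recovered modulus equals the original 0.5, which is the property the decompressor exploits.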

B. Attack-release phase toggle

1) Envelope smoothing

In case a peak detector is in use, β takes on two different values. The condition for the attack phase is then given by (2) and is equivalent to

$$|x(n)|^p > \tilde{x}(n-1). \tag{22}$$

Assuming that the past value of $\tilde{x}$ is known at time n, what we need to do is to express the unknown |x(n)| in terms of |y(n)|, such that the above inequality still holds true. If γ is rather small, γ ≤ 0.01 ≪ 1, or equivalently if τ_g is sufficiently large, τ_g ≥ 0.5 ms at 44.1-kHz sampling, the term γf(n) in (15) is negligible, so that (15) is approximated as

$$|y(n)| \approx \bar{\gamma}\, g(n-1) \cdot |x(n)|. \tag{23}$$

Solving (23) for |x(n)| and plugging the result into (22), we obtain

$$\left[ \frac{|y(n)|}{\bar{\gamma}\, g(n-1)} \right]^p > \tilde{x}(n-1). \tag{24}$$

If (24) is true, the detector is assumed to be in attack phase.

2) Gain smoothing

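Just like the peak detector, the gain smoothing filter may be in either attack or release phase. The necessary condition for the attack phase in (7) may also be formulated as

$$v(n) > \big[\kappa / g(n-1)\big]^{1/S} \quad\text{with } v(n) > l. \tag{25}$$

But since the current envelope value is unknown, we need to substitute v(n) in the above inequality by something that we know. With this in mind we rewrite (15) as

$$|y(n)| = \left[ \gamma\, \frac{f(n) - g(n-1)}{g(n-1)} + 1 \right] \cdot g(n-1) \cdot |x(n)|. \tag{26}$$

Provided that f(n) < g(n - 1), and due to the fact that 0 < γ ≤ 1, the expression in square brackets in (26) is smaller than one, and thus during attack

$$|y(n)| < g(n-1) \cdot |x(n)|. \tag{27}$$

Substituting |x(n)| by $\sqrt[p]{[v^p(n) - \bar{\beta}\, \tilde{x}(n-1)]/\beta}$ using (20), and solving (27) for v(n) results in

$$v(n) > \sqrt[p]{\beta \left[ \frac{|y(n)|}{g(n-1)} \right]^p + \bar{\beta}\, \tilde{x}(n-1)}. \tag{28}$$

If v(n) in (25) is substituted by the expression on the right-hand side of (28), (25) still holds true, so that the following sufficient condition is used to predict the attack phase of the gain filter:

$$\sqrt[p]{\beta \left[ \frac{|y(n)|}{g(n-1)} \right]^p + \bar{\beta}\, \tilde{x}(n-1)} > \left[ \frac{\kappa}{g(n-1)} \right]^{1/S}. \tag{29}$$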

Note that the values of all variables are known whenever (29) is evaluated.
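The two phase predictors could be coded as plain boolean tests, as in the following sketch. The function names are hypothetical. The detector test drops the γf(n) term of (15) as described for (24), and which γ is passed in at that point is left to the caller, which is an assumption of this sketch; the gain-filter test is the sufficient condition (29).

```python
def detector_in_attack(y_abs, g_prev, xt_prev, gamma, p):
    """Predicted detector attack phase: [|y| / ((1 - gamma) g(n-1))]^p > x~(n-1)."""
    return (y_abs / ((1.0 - gamma) * g_prev)) ** p > xt_prev

def gain_in_attack(y_abs, g_prev, xt_prev, beta, kappa, S, p):
    """Predicted gain-filter attack phase, condition (29)."""
    lhs = (beta * (y_abs / g_prev) ** p + (1.0 - beta) * xt_prev) ** (1.0 / p)
    return lhs > (kappa / g_prev) ** (1.0 / S)

kappa = 0.1 ** 0.75   # threshold l = 0.1, slope S = 0.75 (illustrative values)
loud_det = detector_in_attack(0.348, 0.8, 0.04, 0.3, 2)
quiet_det = detector_in_attack(0.001, 1.0, 0.04, 0.3, 2)
loud_gain = gain_in_attack(0.348, 0.8, 0.04, 0.2, kappa, 0.75, 2)
quiet_gain = gain_in_attack(0.001, 1.0, 0.0, 0.2, kappa, 0.75, 2)
```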

C. Envelope Predictor

An instantaneous estimate of the envelope value v(n) is required not only to predict when compression is active, formally v(n) > l according to (5), but also to initialize the iterative search algorithm described under "Numerical solution of the characteristic function". We resort once more to (15) and note that in the opposite case where v(n) ≤ l, f(n) = 1, and so

$$|x(n)| = \frac{|y(n)|}{\gamma + \bar{\gamma}\, g(n-1)}. \tag{30}$$

The sound level of the input signal at time n is therefore

$$v(n) = \sqrt[p]{\beta \left[ \frac{|y(n)|}{\gamma + \bar{\gamma}\, g(n-1)} \right]^p + \bar{\beta}\, \tilde{x}(n-1)}, \tag{31}$$

which must be greater than the threshold for compression to set in, whereas β and γ are selected based on (24) and (29), respectively.

Numerical solution of the characteristic function

An approximate solution to the characteristic function can be found, e.g., by means of linearization. The estimate from (31) may moreover serve as a starting point for an iterative search of an optimum:

$$v_{init} = v(n).$$

The criterion for optimality is further chosen as the deviation of the characteristic function from zero, initialized to

$$\Delta_{init} = |z_p(v_{init})|. \tag{32}$$

We may thereupon approximate (19) at a given point using the equation of a straight line, z = m·v + c, where m is the slope and c is the z-intercept. The zero-crossing is characterized by the equation

$$m \cdot v + c = 0,$$

as is shown in Figure 2, which illustrates the iterative search for the zero-crossing. The new and hopefully better estimate of the optimal v is hence found as

$$v_{i+1} = v_i - \frac{\Delta_i\, z_p(v_i)}{z_p(v_i + \Delta_i) - z_p(v_i)}.$$

If v_{i+1} is less optimal than v_i, the iteration is stopped and v_i is the final estimate. The iteration is also stopped if Δ_{i+1} is smaller than some ε. In the latter case, v_{i+1} has the optimal value with respect to the chosen criterion. Otherwise, v_i is set to v_{i+1} and Δ_i is set to Δ_{i+1} after every iteration step, and the procedure is repeated until v_{i+1} has converged to a more optimal value.
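The iterative search described above might look as follows in Python. It is a sketch under assumptions: the slope of the linearization is taken as a finite difference with step Δ_i = |z(v_i)|, the iteration cap `max_iter` is a safeguard that the text does not mention, and a step that brings no improvement is treated as "less optimal". The test function v² - 2 merely stands in for z_p(v).

```python
import math

def charfzero(z, v_init, eps=1e-12, max_iter=100):
    """Search a zero of z starting at v_init by repeated linearization."""
    v = v_init
    delta = abs(z(v))                     # optimality criterion
    for _ in range(max_iter):
        if delta < eps:
            break                         # close enough to zero
        denom = z(v + delta) - z(v)       # slope times delta
        if denom == 0.0:
            break                         # flat linearization, give up
        v_new = v - delta * z(v) / denom  # zero of the straight line
        delta_new = abs(z(v_new))
        if delta_new >= delta:
            break                         # new estimate is not better, keep v
        v, delta = v_new, delta_new
    return v

root = charfzero(lambda v: v * v - 2.0, 1.5)
```

Because the finite-difference step shrinks together with the residual, the iteration behaves much like Newton's method near the zero without requiring an analytic derivative.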

General remarks

A. Stereo linking

When dealing with stereo signals, one might want to apply the same amount of gain reduction to both channels to prevent image shifting. This is achieved through stereo linking. One way is to calculate the required amount of gain reduction for each channel independently and then apply the larger amount to both channels. The question that arises in this context is which of the two channels the gain was derived from. To resolve this ambiguity, one conceivable solution would be to signal which of the channels carries the applied gain. One could then decompress the marked sample and use its gain for the other channel. Although very simple to implement, this approach incurs an additional data rate of 44.1 kbps at 44.1-kHz sampling. A rate-efficient alternative that comes with a higher computational cost is realized in the following way: First, one decompresses both the left and the right channel independently and in so doing obtains two estimates z_l(n) and z_r(n), where subscript l denotes the left channel and subscript r the right channel. In a second step, one calculates the compressed values of z_l(n) and z_r(n) and selects the channel for which H[z(n)] = y(n) holds true. In a final step, one updates the remaining variables using the gain of the selected channel.

B. Lookahead

A compressor with a look-ahead function, i.e. with a delay in the main signal path as in [12, p. 106], uses past input samples to calculate the output sample. Since future input samples, which are unavailable, would then be required to invert the process, the inversion is rendered impossible. g(n) and x(n) must thus be in sync for the above approach to be applied.

C. Clipping and limiting

Another point worth mentioning is that "hard" clipping and "brick-wall" limiting are special cases of compression with at least the attack time set to zero and the compression ratio set to ∞:1. The static nonlinearity F, in that particular case, is a many-to-one mapping, which by definition is noninvertible.

The algorithm

The complete algorithm is divided into three parts each of them given as pseudocode further below. Algorithm 1 outlines the compressor that corresponds to the model described above. Algorithm 2 illustrates the decompressor, and the iterative search for the numerical solution of the characteristic function is finally summarized in Algorithm 3. The parameter fs represents the sampling frequency in kHz.

Algorithm 1 : The compressor

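function COMP(x_n, θ, fs)
    x̃_0 ← 0, g_0 ← 1
    for n ← 1, N do
        if |x_n| > v_{n-1} then
            β ← 1 - exp[-2.2/(fs·τ_{v,att})]
        else
            β ← 1 - exp[-2.2/(fs·τ_{v,rel})]
        end if
        x̃_n ← β·|x_n|^p + (1 - β)·x̃_{n-1}
        v_n ← (x̃_n)^{1/p}
        if v_n > l then
            f_n ← κ·v_n^{-S}
        else
            f_n ← 1
        end if
        if f_n < g_{n-1} then
            γ ← 1 - exp[-2.2/(fs·τ_{g,att})]
        else
            γ ← 1 - exp[-2.2/(fs·τ_{g,rel})]
        end if
        g_n ← γ·f_n + (1 - γ)·g_{n-1}
        y_n ← g_n·x_n
    end for
    return y_n
end function

Algorithm 2 : The decompressor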

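function DECOMP(y_n, θ, ε, fs)
    x̃_0 ← 0, g_0 ← 1
    for n ← 1, N do
        if condition (24) holds then
            β ← 1 - exp[-2.2/(fs·τ_{v,att})]
        else
            β ← 1 - exp[-2.2/(fs·τ_{v,rel})]
        end if
        if |y_n| > {[(κ/g_{n-1})^{p/S} - (1 - β)·x̃_{n-1}]/β}^{1/p}·g_{n-1} then
            γ ← 1 - exp[-2.2/(fs·τ_{g,att})]
        else
            γ ← 1 - exp[-2.2/(fs·τ_{g,rel})]
        end if
        v_n ← {β·[|y_n|/(γ + (1 - γ)·g_{n-1})]^p + (1 - β)·x̃_{n-1}}^{1/p}
        if v_n > l then
            v_n ← CHARFZERO(v_n, ε)
            x̃_n ← v_n^p
            |x_n| ← {[x̃_n - (1 - β)·x̃_{n-1}]/β}^{1/p}
            g_n ← |y_n|/|x_n|
        else
            g_n ← γ + (1 - γ)·g_{n-1}
            |x_n| ← |y_n|/g_n
            x̃_n ← β·|x_n|^p + (1 - β)·x̃_{n-1}
        end if
        x_n ← sgn(y_n)·|x_n|
    end for
    return x_n
end function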
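A compact Python round trip through the compressor and the decompressor is sketched below. It is not the authors' implementation, and several points are assumptions of this sketch: the function and dictionary key names are hypothetical, the initial state is zero envelope and unit gain, fs is in kHz with time constants in ms, the detector toggle uses (24) with the factor 1 - γ taken as 1 because γ is not yet selected at that point, and the gain at the root is evaluated from the model, γκv^{-S} + (1 - γ)g(n-1), with |x(n)| = |y(n)|/g(n), which agrees with (20) at an exact root but avoids a division by a near-zero modulus.

```python
import math

def sf(tau_ms, fs_khz):
    # Smoothing factor from a time constant; tau <= 0 -> 1 is an assumption.
    return 1.0 if tau_ms <= 0 else 1.0 - math.exp(-2.2 / (fs_khz * tau_ms))

def compress(x, th, fs=44.1):
    l = 10.0 ** (th["L"] / 20.0)
    S = 1.0 - 1.0 / th["R"]
    kappa, p = l ** S, th["p"]
    xt, v, g, y = 0.0, 0.0, 1.0, []
    for xn in x:
        a = abs(xn)
        beta = sf(th["tv_att"] if a > v else th["tv_rel"], fs)
        xt = beta * a ** p + (1.0 - beta) * xt
        v = xt ** (1.0 / p)
        f = kappa * v ** (-S) if v > l else 1.0
        gamma = sf(th["tg_att"] if f < g else th["tg_rel"], fs)
        g = gamma * f + (1.0 - gamma) * g
        y.append(g * xn)
    return y

def charfzero(z, v, eps):
    # Linearized zero search; stops when |z| < eps or no longer improves.
    delta = abs(z(v))
    for _ in range(50):
        if delta < eps:
            break
        denom = z(v + delta) - z(v)
        if denom == 0.0:
            break
        v_new = v - delta * z(v) / denom
        if v_new <= 0.0 or abs(z(v_new)) >= delta:
            break
        v, delta = v_new, abs(z(v_new))
    return v

def decompress(y, th, fs=44.1, eps=1e-12):
    l = 10.0 ** (th["L"] / 20.0)
    S = 1.0 - 1.0 / th["R"]
    kappa, p = l ** S, th["p"]
    xt, g, out = 0.0, 1.0, []
    for yn in y:
        a = abs(yn)
        # Detector toggle as in (24), with 1 - gamma taken as 1 (assumption)
        beta = sf(th["tv_att"] if (a / g) ** p > xt else th["tv_rel"], fs)
        bb = 1.0 - beta
        # Gain-filter toggle as in (29)
        lhs = (beta * (a / g) ** p + bb * xt) ** (1.0 / p)
        attack = lhs > (kappa / g) ** (1.0 / S)
        gamma = sf(th["tg_att"] if attack else th["tg_rel"], fs)
        gb = 1.0 - gamma
        # Envelope predictor (31): assumes f(n) = 1
        v = (beta * (a / (gamma + gb * g)) ** p + bb * xt) ** (1.0 / p)
        if v > l:
            def zp(u, a=a, g=g, xt=xt):
                gain = gamma * kappa * u ** (-S) + gb * g
                return gain ** p * (u ** p - bb * xt) - beta * a ** p
            v = charfzero(zp, v, eps)
            g = gamma * kappa * v ** (-S) + gb * g  # gain from the model at the root
            xt = v ** p
        else:
            g = gamma + gb * g
            xt = beta * (a / g) ** p + bb * xt
        out.append(math.copysign(a / g, yn))        # sign is preserved, see (9)
    return out

theta = {"L": -20.0, "R": 4.0, "p": 2, "tv_att": 5.0, "tv_rel": 5.0,
         "tg_att": 10.0, "tg_rel": 10.0}
x = [0.5 * math.sin(2.0 * math.pi * 440.0 * n / 44100.0) for n in range(2000)]
y = compress(x, theta)
z = decompress(y, theta)
max_err = max(abs(a - b) for a, b in zip(x, z))
```

The usage example deliberately sets attack and release times equal, so the outcome does not depend on the phase predictions. With unequal time constants, mispredicted phases at transition points can be expected to raise the error, as noted for (14).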

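Algorithm 3 : The iterative search of the zero-crossing

function CHARFZERO(v_n, ε)
    v_i ← v_n, Δ_i ← |z_p(v_i)|
    repeat
        v_{i+1} ← v_i - Δ_i·z_p(v_i) / [z_p(v_i + Δ_i) - z_p(v_i)]
        Δ_{i+1} ← |z_p(v_{i+1})|
        if Δ_{i+1} > Δ_i then return v_i
        v_i ← v_{i+1}, Δ_i ← Δ_{i+1}
    until Δ_i < ε
    return v_i
end function

Performance evaluation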

A. Performance metrics

To evaluate the inverse approach, the following quantities are measured: the root-mean-square error (RMSE), given in decibels relative to full scale (dBFS), the perceptual similarity between the original and decompressed signal, and the execution time of the decompressor relative to real time (RT). Furthermore, we present the percentage of compressed samples, the mean number of iterations until convergence per compressed sample, the error rate of the attack-release toggle for the gain smoothing filter, and finally the error rate of the envelope predictor. The perceptual similarity is assessed by PEMO-Q [13], [14] with PSMt as metric. The simulations are run in MATLAB on an Intel Core i5-520M CPU.
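One plausible way to compute an RMSE figure in dBFS is sketched below. The exact definition used for the reported numbers is not recoverable from the text, so this assumes a full scale of 1.0 and the usual 20·log10 of the root-mean-square sample error; the function name is hypothetical.

```python
import math

def rmse_dbfs(x, z):
    """RMSE between original x and decompressed z in dB re full scale 1.0."""
    mse = sum((a - b) ** 2 for a, b in zip(x, z)) / len(x)
    if mse == 0.0:
        return float("-inf")          # identical signals
    return 10.0 * math.log10(mse)     # equals 20*log10(sqrt(mse))

err = rmse_dbfs([0.0, 0.0, 0.0, 0.0], [0.1, 0.1, 0.1, 0.1])
```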

B. Computational results

Fig. 3 shows the inverse output signal z(n) for a synthetic input signal x(n). It is an illustrative example using an RMS amplitude detector with τ_v set to 5 ms, a threshold of -20 dBFS (dashed line in the upper right corner), a compression ratio of 4:1, and τ_g set to 1.6 ms for attack and 17 ms for release, respectively. The inverse signal is obtained from the compressed signal y(n) with an RMSE of -129 dBFS. It is visually indistinguishable from the original signal x(n). Due to the fact that the signal envelope is constant most of the time, the error is noticeable only around transition points, which are few.

The decompressor's performance is further evaluated for some commercial compressor presets. The audio material used consists of 12 items covering speech, sung voice, music, and jingles. All items are normalized to -16 LKFS [15]. The ε-value in the break condition of Algorithm 3 is set to 1·10^{-12}. A detailed overview of compressor settings and performance figures is given in Tables I-II. The presented results suggest that the decompressed signal is perceptually indistinguishable from the original: the PSMt-value is flawless. This was also confirmed by the authors through informal listening tests.

As can be seen from Table II, the largest inversion error is associated with setting E and the smallest with setting B. For all five settings, the error is larger when an RMS detector is in use. This is partly due to the fact that z_2(v) has a stronger curvature in comparison to z_1(v). By defining the distance in (40) as

Δ = ^ )Ι

it is possible to attain a smaller error for an RMS detector at the cost of a slightly longer runtime. In most cases, the envelope predictor works more reliably as compared to the toggle switch between attack and release. It can also be observed that the choice of time constants seems to have little impact on decompressor's accuracy. The major parameters that affect the decompressor's performance are L and R, while the threshold is evidently the predominant one: the RMSE strongly correlates with the threshold level.

TABLE I Selected compressor settings

TABLE II Performance figures obtained for various audio material (12 items)

Figs. 4-5 show the inversion error as a function of various time constants. These are in the range of typical attack and release times for a limiter (peak) or compressor (RMS) [12, pp. 109-110].

Figure 4 shows the RMSE as a function of typical attack and release times using a peak (upper row) or an RMS amplitude detector (lower row). In the left column, the attack time of the envelope filter is varied while the release time is held constant. The right column shows the reverse case. The time constants of the gain filter are fixed at zero. In all four cases, threshold and ratio are fixed at -32 dBFS and 4:1, respectively.

Figure 5 shows the RMSE as a function of typical attack and release times using a peak (upper row) or an RMS amplitude detector (lower row). In the left column, the attack time of the gain filter is varied while the release time is held constant. The right column shows the reverse case. The time constants of the envelope filter are fixed at zero. In all four cases, threshold and ratio are fixed at -32 dBFS and 4:1, respectively.

It can be observed that the inversion accuracy depends on the release time of the peak detector and not so much on its attack time, for both the envelope and the gain filter; see Figs. 4 (b) and 5 (b). For the envelope filter, all error curves exhibit a local dip around a release time of 0.5 s. The error increases steeply below that value but only moderately above it. In the proximity of 5 s, the error converges to -130 dBFS. With regard to the gain filter, the error behaves in the reverse manner: the curves in Fig. 5 (b) exhibit a local peak around 0.5 s with a value of -180 dBFS. It can further be observed in Fig. 4 (a) that the curve for $\tau_{v,\mathrm{rel}} = 1$ ms has a dip where $\tau_{v,\mathrm{att}}$ is close to 1 ms, i.e. where $|\beta_{\mathrm{att}} - \beta_{\mathrm{rel}}|$ is minimal. This is also true for Fig. 4 (c) and (d): the lowest error is where the attack and release times are identical. As a general rule, the error that is due to the attack-release switch is smaller for the gain filter in Fig. 5.
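The link between time constants and the quantity $|\beta_{\mathrm{att}} - \beta_{\mathrm{rel}}|$ can be illustrated with a short sketch. It assumes the conversion $\beta = 1 - \exp[-2.2/(f_s \tau)]$ used in this document, with $f_s$ in kHz and $\tau$ in ms; the function name, the sampling rate of 44.1 kHz, and the list of attack times are illustrative choices, and a time constant of zero is taken to mean "no smoothing" ($\beta = 1$).

```python
import math


def smoothing_factor(tau_ms: float, fs_khz: float) -> float:
    """Convert a time constant (ms) into a one-pole smoothing factor.

    Uses beta = 1 - exp(-2.2 / (f_s * tau)) with f_s in kHz and tau in ms.
    A time constant of zero is treated as "no smoothing" (beta = 1).
    """
    if tau_ms <= 0.0:
        return 1.0
    return 1.0 - math.exp(-2.2 / (fs_khz * tau_ms))


FS_KHZ = 44.1      # assumed sampling frequency
TAU_REL_MS = 1.0   # release time held constant, as for the curve discussed above

beta_rel = smoothing_factor(TAU_REL_MS, FS_KHZ)
gaps = {}
for tau_att_ms in (0.1, 0.5, 1.0, 2.0, 5.0):
    beta_att = smoothing_factor(tau_att_ms, FS_KHZ)
    gaps[tau_att_ms] = abs(beta_att - beta_rel)
    print(f"tau_att = {tau_att_ms:4.1f} ms   beta_att = {beta_att:.4f}   "
          f"|beta_att - beta_rel| = {gaps[tau_att_ms]:.4f}")
```

The gap between the two smoothing factors vanishes when attack and release times coincide, which is where the error curves are reported to dip.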

This program may be made available for download on a telecommunication network.

The digital audio signal obtained by using the method of the invention may be recorded on a data storage medium so as to obtain a data storage medium comprising data representing the signal.

This signal may also be made available for download on a telecommunication network.

The above mentioned medium could be a disc, a hard drive, a flash drive, a CD or a DVD for example.

Conclusion

The invention deals with the problem of finding an inverse to a nonlinear dynamic operator such as a digital compressor. In the proposed approach, we use an explicit signal model to solve the problem. To find the "dry" or uncompressed signal with high accuracy, it is sufficient to know the model parameters. The parameters can e.g. be sent together with the "wet" or compressed signal in the form of metadata as is the case with Dolby Volume and ReplayGain [16]. A new bitstream format is not mandatory, since many digital audio standards, like WAV or MP3, provide means to tag the audio content with "ancillary" data. With the help of the metadata, one can then reverse the compression applied after mixing or before broadcast. This allows the end user to have control over the amount of compression, which may be preferred because the sound engineer has no control over the playback environment or the listener's individual taste.
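As an illustration of how compact such metadata can be, the following sketch serializes one set of model parameters to a short text string and reads it back. The container class, its field names, and the choice of JSON are assumptions made for this example only; the document does not prescribe a metadata format. The values correspond to setting A of Table I with a peak detector.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class CompressorParams:
    """Hypothetical container for the model parameters (names are illustrative)."""
    threshold_db: float      # L, threshold in dBFS
    ratio: float             # R, compression ratio (dB_in : dB_out)
    detector: str            # "peak" or "rms"
    env_attack_ms: float     # attack time of the envelope filter
    env_release_ms: float    # release time of the envelope filter
    gain_attack_ms: float    # attack time of the gain filter
    gain_release_ms: float   # release time of the gain filter


def to_metadata(params: CompressorParams) -> str:
    """Serialize the parameters to a compact JSON string."""
    return json.dumps(asdict(params), sort_keys=True, separators=(",", ":"))


def from_metadata(text: str) -> CompressorParams:
    """Rebuild the parameter set from its JSON representation."""
    return CompressorParams(**json.loads(text))


setting_a = CompressorParams(-32.0, 3.0, "peak", 5.0, 5.0, 13.0, 435.0)
tag = to_metadata(setting_a)
restored = from_metadata(tag)
print(len(tag.encode("utf-8")), "bytes:", tag)
```

A string of this kind could be stored in whatever ancillary-data field the chosen audio format offers; how that field is written is outside the scope of the sketch.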

When the compressor parameters are unavailable, they can possibly be estimated from the compressed signal. This may thus be a direction for future work. Another direction would be to apply the approach to more sophisticated models that include a "soft" knee, parallel and multiband compression, or perform gain smoothing in the logarithmic domain, see [11], [12], [17], [18] and references therein.

The figures suggest that the decompressor is realtime capable which can pave the way for exciting new applications. One such application could be the restoration of dynamics in over-compressed audio or else the accentuation of transient components, see [19]-[21 ], by an adaptively tuned decompressor that has no prior knowledge of the compressor parameters.

We have shown how an inverse to a nonlinear dynamic operator such as a digital compressor can be derived.

The invention makes it possible to obtain an audio signal with negligible error, i.e. a signal that is perceptually indistinguishable from the original uncompressed signal in its "artistic" properties.

Obviously, numerous modifications to the compressor model can be made without leaving the scope of the invention. The invention may also be used with a step of estimation of the model parameters from the compressed signal, when the model parameters are unknown. It could also be used with more sophisticated models that include a soft knee, parallel and multiband compression, or perform gain smoothing in the logarithmic domain, see [11]-[14] and references therein.

The invention may also be used as an adaptive digital audio effect. Indeed, the decompressor may be used on a digital audio signal which was compressed using an unknown compressor different from the above described compressor, or which was not compressed. The parameters of the decompressor are then adapted to the input signal.

The search for the zero-crossing could also be carried out in other ways, using known root-finding functions.
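One such alternative is plain bisection, sketched below for a single sample. The characteristic function is written here as $\zeta_p(v) = [\gamma\kappa v^{-S} + \bar{\gamma} g(n-1)]^p \cdot [v^p - \bar{\beta}\tilde{x}(n-1)] - \beta |y(n)|^p$, which is what the compressor model yields when the output equation is solved for the envelope; this form, all numeric values, the function names, and the search bracket from the linear threshold up to full scale (1.0) are assumptions of the example. The smoothing factors are taken as already decided.

```python
def bisect_root(func, lo, hi, tol=1e-12, max_iter=200):
    """Find a root of func in [lo, hi] by bisection; requires a sign change."""
    f_lo, f_hi = func(lo), func(hi)
    if f_lo == 0.0:
        return lo
    if f_hi == 0.0:
        return hi
    if f_lo * f_hi > 0.0:
        raise ValueError("root is not bracketed")
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        f_mid = func(mid)
        if f_mid == 0.0 or (hi - lo) < tol:
            return mid
        if f_lo * f_mid < 0.0:
            hi = mid                 # root lies in the lower half
        else:
            lo, f_lo = mid, f_mid    # root lies in the upper half
    return 0.5 * (lo + hi)


# Illustrative model state (all values are assumptions for this example).
p = 2                          # RMS detector
beta, gamma = 0.3, 0.5         # smoothing factors, assumed already selected
l = 10.0 ** (-20.0 / 20.0)     # linear threshold for L = -20 dB
S = 1.0 - 1.0 / 4.0            # slope for a 4:1 ratio
kappa = l ** S
x_tilde_prev, g_prev = 0.04, 1.0

# Forward step for one input sample, to obtain a compressed value y.
x = 0.5
x_tilde = beta * abs(x) ** p + (1.0 - beta) * x_tilde_prev
v_true = x_tilde ** (1.0 / p)
f = kappa * v_true ** (-S) if v_true > l else 1.0
y = (gamma * f + (1.0 - gamma) * g_prev) * x


def zeta(v):
    """Characteristic function; its zero-crossing is the envelope value."""
    gain = gamma * kappa * v ** (-S) + (1.0 - gamma) * g_prev
    return gain ** p * (v ** p - (1.0 - beta) * x_tilde_prev) - beta * abs(y) ** p


v_hat = bisect_root(zeta, l, 1.0)
print(v_true, v_hat)
```

Bisection needs more function evaluations than a secant-type search but cannot diverge once the root is bracketed, which may be a reasonable trade-off when robustness matters more than speed.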

Another description of the invention is given in the following pages.

Model-Based Inversion of Dynamic Range Compression

Stanislaw Gorlow, Graduate Student Member, IEEE and Joshua D. Reiss, Member, IEEE

Abstract: In this work it is shown how a dynamic nonlinear time-variant operator, such as a dynamic range compressor, can be inverted using an explicit signal model. By knowing the model parameters that were used for compression one is able to recover the original uncompressed signal from a "broadcast" signal with high numerical accuracy and very low computational complexity. A compressor-decompressor scheme is worked out and described in detail. The approach is evaluated on real-world audio material with great success.

Index Terms: Dynamic range compression, inversion, model-based, reverse audio engineering.

I. INTRODUCTION

Sound or audio engineering is an established discipline employed in many areas that are part of our everyday life without us taking notice of it. But not many know how the audio was produced. If we take sound recording and reproduction or broadcasting as an example, we may imagine that a prerecorded signal from an acoustic source is altered by an audio engineer in such a way that it corresponds to certain criteria when played back. The number of these criteria may be large and usually depends on the context. In general, the said alteration of the input signal is a sequence of numerous forward transformations, the reversibility of which is of little or no interest. But what if one wished to do exactly this, that is to reverse the transformation chain, and what is more, in a systematic and repeatable manner?

The research objective of reverse audio engineering is twofold: to identify the transformation parameters given the input and the output signals, as in [1], and to regain the input signal that goes with the output signal given the transformation parameters. In both cases, an explicit signal model is mandatory. The latter case might seem trivial, but only if the applied transformation is linear and orthogonal and as such perfectly invertible. Yet the forward transform is often neither linear nor invertible. This is the case for dynamic range compression (DRC), which is commonly described by a dynamic nonlinear time-variant system. The classical linear time-invariant (LTI) system theory does not apply here, so a tailored solution to the problem at hand must be found instead. At this point, we also like to highlight the fact that neither Volterra nor Wiener model approaches [2]-[4] offer a solution, and neither do describing functions [5], [6]. These are useful tools when identifying a time-invariant or a slowly varying nonlinear system or analyzing the limit cycle behavior of a feedback system with a static nonlinearity.

A method to invert dynamics compression is described in [7], but it requires an instantaneous gain value to be transmitted for each sample of the compressed signal. To provide a means to control the data rate, the gain signal is subsampled and also entropy coded. This approach is highly inefficient as it does not rely on a gain model and is extremely generic.

On the other hand, transmitting the uncompressed signal in conjunction with a few typical compression parameters like threshold, ratio, attack, and release would require a much smaller capacity and yield the best possible signal quality with regard to any thinkable measure. A more realistic scenario is when the uncompressed signal is not available on the consumer side. This is usually the case for studio music recordings and broadcast material where the listener is offered a signal that is meant to sound "good" to everyone. However, the loudness war [8] has resulted in over-compressed audio material. Over-compression makes a song lose its artistic features like excitingness or liveliness and desensitizes the ear thanks to a louder volume. There is a need to restore the original signal's dynamic range and to experience audio free of compression.

In addition to the normalization of the program's loudness level, the Dolby solution [9], [10] also includes dynamic range expansion. The expansion parameters that help reproduce the original program's dynamic range are tuned on the broadcaster side and transmitted as metadata together with the broadcast signal. This is a very convenient solution for broadcasters, not least because the metadata is quite compact. Dynamic range expansion is yet another forward transformation rather than a true inversion.

Evidently, none of the previous approaches satisfy the reverse engineering objective of this work. The goal of the present work, hence, is to invert dynamic range compression, which is a vital element not only in broadcasting but also in mastering. The paper is organized as follows. Section II provides a brief introduction to dynamic range compression and presents the compressor model upon which our considerations are based. The data model, the formulation of the problem, and the pursued approach are described next in Section III. The inversion is discussed in detail in Section IV. Section V illustrates how an integral step of the inversion procedure, namely the search for the zero-crossing of a nonlinear function, can be solved in an iterative manner by means of linearization. Some other compressor features are discussed in Section VI. The complete algorithm is given in the form of pseudocode in Section VII and its performance is evaluated for different compressor settings in Section VIII. Conclusions are drawn in Section IX, where some directions for future work are mentioned.

This work was partially funded by the "Agence Nationale de la Recherche" within the scope of the DReaM project (ANR-09-CORD-006) as well as the laboratory with which the first author is affiliated (see below) as part of the "mobilite juniors" program.

S. Gorlow is with the Computer Science Research Laboratory of Bordeaux (LaBRI), CNRS, Bordeaux 1 University, 33405 Talence Cedex, France (e-mail: stanislaw.gorlow@labri.fr).

J. D. Reiss is with the Centre for Digital Music (C4DM), Queen Mary, University of London, London E1 4NS, UK (e-mail: josh.reiss@eecs.qmul.ac.uk).

II. DYNAMIC RANGE COMPRESSION

Dynamic range compression or simply "compression" is a sound processing technique that attenuates loud sounds and/or amplifies quiet sounds, which in consequence leads to a reduction of an audio signal's dynamic range. The latter is defined as the difference between the loudest and quietest sound measured in decibel. In the following, we will use the word "compression" having "downward" compression in mind, though the discussed approach is likewise applicable to "upward" compression. Downward compressing means attenuating sounds above a certain threshold while leaving sounds below the threshold unchanged. A sound engineer might use a compressor to reduce the dynamic range of source material for purposes of aesthetics, intelligibility, recording or broadcast limitations.

Fig. 1 illustrates the basic compressor model from [11, ch. 2] amended by a switchable RMS/peak detector in the side chain making it compatible with the compressor/limiter model from [12, p. 106]. We will hereafter restrict our considerations to this basic model, as the purpose of the present work is to demonstrate a general approach rather than a solution to a specific problem. First, the input signal is split and a copy is sent to the side chain. The detector then calculates the magnitude or level of the sidechain signal using the root mean square (RMS) or peak as a measure for how loud a sound is [12, p. 107]. The detector's temporal behavior is controlled by the attack and release parameters. The sound level is compared with the threshold level and, for the case it exceeds the threshold, a scale factor is calculated which corresponds to the ratio of input level to output level. The knee parameter determines how quick the compression ratio is reached. At the end of the side chain, the scale factor is fed to a smoothing filter that yields the gain. The response of the filter is controlled by another set of attack and release parameters. Finally, the gain control applies the smoothed gain to the input signal and adds a fixed amount of makeup gain to bring the output signal to a desired level. Such a broadband compressor operates on the input signal's full bandwidth, treating all frequencies from zero through the highest frequency equally. A detailed overview of all sidechain controls of a basic gain computer is given in [11, ch. 3], e.g.

III. DATA MODEL, PROBLEM FORMULATION, AND PROPOSED SOLUTION

A. Data Model and Problem Formulation

The employed data model is based on the compressor from Fig. 1. The following simplifications are additionally made: the knee parameter ("hard" knee) and the makeup gain (fixed at 0 dB) are ignored. The compressor is defined as a single-input single-output (SISO) system, that is both the input and the output are single-channel signals. What follows is a description of each block by means of a dedicated function.

The RMS/peak detector as well as the gain computer build upon a first-order (one-pole) lowpass filter. The sound level or envelope $v(n)$ of the input signal $x(n)$ is obtained by

$$\tilde{x}(n) = \beta\,|x(n)|^p + \bar{\beta}\,\tilde{x}(n-1), \qquad v(n) = \sqrt[p]{\tilde{x}(n)} \qquad \text{with } p \in \{1, 2\}, \tag{1}$$

where $p = 2$ represents an RMS detector, and $p = 1$ a peak detector. The non-zero smoothing factor $\beta$, $0 < \beta \le 1$, $\bar{\beta} = 1 - \beta$, may take on different values, $\beta_{\mathrm{att}}$ or $\beta_{\mathrm{rel}}$, depending on whether the detector is in the attack or release phase. The condition for the level detector to enter the attack phase and to choose $\beta_{\mathrm{att}}$ over $\beta_{\mathrm{rel}}$ is

$$|x(n)| > v(n-1). \tag{2}$$

A formula that converts a time constant $\tau$ into a smoothing factor is given in [12, p. 109], so e.g.

$$\beta = 1 - \exp[-2.2/(f_s \cdot \tau_v)],$$

where $f_s$ is the sampling frequency. The static nonlinearity in the gain computer is usually modeled in the logarithmic domain as a continuous piecewise linear function:

$$F(n) = \begin{cases} -S \cdot [V(n) - L] & \text{if } V(n) > L \\ 0 & \text{otherwise} \end{cases} \tag{3}$$

where $S$ is the slope, $V(n) = 20 \log_{10} v(n)$, and $L$ is the threshold in decibel. The slope is further derived from the desired compression ratio $R$ according to

$$S = 1 - \frac{1}{R}. \tag{4}$$

Equation (3) is equivalently expressed in the linear domain as

$$f(n) = \begin{cases} \kappa\, v^{-S}(n) & \text{if } v(n) > l \\ 1 & \text{otherwise} \end{cases} \tag{5}$$

where $l = 10^{L/20}$, $\kappa = l^S$, and $f$ is the linear scale factor before filtering. The smoothed gain $g$ is then calculated as the exponentially-weighted moving average,

$$g(n) = \gamma f(n) + \bar{\gamma}\, g(n-1) \qquad \text{with } \gamma \in \{\gamma_{\mathrm{att}}, \gamma_{\mathrm{rel}}\}, \tag{6}$$

where the decision for the gain computer to choose the attack smoothing factor $\gamma_{\mathrm{att}}$ instead of $\gamma_{\mathrm{rel}}$ is subject to

$$f(n) < g(n-1). \tag{7}$$

The output signal is finally obtained by multiplying the above gain with the input signal:

$$y(n) = g(n) \cdot x(n). \tag{8}$$

Due to the fact that the gain $g$ is strictly positive, $0 < g \le 1$, it follows that

$$\operatorname{sgn}(y) = \operatorname{sgn}(x), \tag{9}$$

where sgn is the signum or sign function. In consequence, it is convenient to factorize the input signal as a product of the sign and the modulus according to

$$x(n) = \operatorname{sgn}(x) \cdot |x(n)|. \tag{10}$$

Fig. 1. Basic broadband compressor model (feed forward). [Block diagram: the input x(n) passes through the gain control, where the gain g(n) and the makeup gain are applied, to the output y(n). The side chain consists of the RMS/peak detector, which yields v(n), and the gain computer with its compare, scale, and filter stages. Controls shown: attack, release, threshold, knee, ratio.]
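The model of equations (1)-(8) can be sketched in a few lines of Python. This is a minimal illustration under the stated simplifications (hard knee, no makeup gain), not a reference implementation: the function and argument names are chosen for this example, the sampling frequency is taken in kHz and the time constants in ms, the states start at $\tilde{x} = 0$ and $g = 1$, and a time constant of zero is interpreted as a smoothing factor of one.

```python
import math


def smoothing_factor(tau_ms, fs_khz):
    """beta (or gamma) = 1 - exp(-2.2 / (f_s * tau)); tau = 0 means no smoothing."""
    return 1.0 if tau_ms <= 0.0 else 1.0 - math.exp(-2.2 / (fs_khz * tau_ms))


def compress(x, threshold_db, ratio, p, tau_v_att, tau_v_rel,
             tau_g_att, tau_g_rel, fs_khz=44.1):
    """Feed-forward compressor following equations (1)-(8)."""
    l = 10.0 ** (threshold_db / 20.0)   # linear threshold
    S = 1.0 - 1.0 / ratio               # slope, eq. (4)
    kappa = l ** S
    x_tilde, v, g = 0.0, 0.0, 1.0       # filter states
    y = []
    for xn in x:
        # Envelope detector: attack if |x(n)| > v(n-1), eq. (2).
        beta = smoothing_factor(tau_v_att if abs(xn) > v else tau_v_rel, fs_khz)
        x_tilde = beta * abs(xn) ** p + (1.0 - beta) * x_tilde   # eq. (1)
        v = x_tilde ** (1.0 / p)
        # Static nonlinearity in the linear domain, eq. (5).
        f = kappa * v ** (-S) if v > l else 1.0
        # Gain smoothing: attack if f(n) < g(n-1), eq. (7).
        gamma = smoothing_factor(tau_g_att if f < g else tau_g_rel, fs_khz)
        g = gamma * f + (1.0 - gamma) * g                        # eq. (6)
        y.append(g * xn)                                         # eq. (8)
    return y


params = dict(threshold_db=-20.0, ratio=4.0, p=2, tau_v_att=5.0,
              tau_v_rel=5.0, tau_g_att=13.0, tau_g_rel=435.0)
quiet_in = [0.01, -0.02, 0.03]       # stays below the threshold
loud_in = [0.5, -0.5] * 22050        # one second well above the threshold
quiet_out = compress(quiet_in, **params)
loud_out = compress(loud_in, **params)
print(quiet_out, loud_out[-1])
```

Samples whose envelope stays below the threshold pass through unchanged, while for a stationary loud signal the gain settles at $(l/v)^S$; the sign of each sample is preserved, in line with (9).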

The problem at hand is formulated in the following manner: Given the compressed signal $y(n)$ and the model parameters

$$\theta = [L \;\; R \;\; p \;\; \beta_{\mathrm{att}} \;\; \beta_{\mathrm{rel}} \;\; \gamma_{\mathrm{att}} \;\; \gamma_{\mathrm{rel}}],$$

recover the modulus of the original signal $|x(n)|$ based on $\theta$. For a more intuitive use, the smoothing factors $\beta$ and $\gamma$ may be replaced by the time constants $\tau_v$ and $\tau_g$. The meaning of each parameter is listed below.

L          The threshold in dB
R          The compression ratio dB_in : dB_out
p          The detector type (RMS or peak)
τ_v,att    The attack time of the envelope filter in ms
τ_v,rel    The release time of the envelope filter in ms
τ_g,att    The attack time of the gain filter in ms
τ_g,rel    The release time of the gain filter in ms

B. Proposed Solution

The output of the side chain, that is the gain of $|x(n)|$, given $\theta$, $\tilde{x}(n-1)$, and $g(n-1)$, may be written as

[...] (11)

[...] However, since $g$ is a function of $|x|$, we can express $|y|$ as a function of one independent variable $|x|$, and in that manner we obtain an equation with a single unknown:

$$|y(n)| = H[\,|x(n)| \mid \theta, \tilde{x}(n-1), g(n-1)\,], \tag{12}$$

where $H$ represents the entire compressor. If $H$ is invertible, i.e. bijective for all $n$, $|x(n)|$ can be obtained from $|y(n)|$ by

$$|x(n)| = \begin{cases} H^{-1}[\,|y(n)| \mid \theta, \ldots\,] & \text{if } v(n) > l \\ |y(n)| & \text{otherwise.} \end{cases} \tag{13}$$

And yet, since $v(n)$ is unknown, the condition for applying decompression must be predicted from $y(n)$, $\tilde{x}(n-1)$, and $g(n-1)$, and therefore needs the condition for toggling between the attack and release phases. Depending on the quality of the prediction, the recovered modulus $|z(n)|$ may differ somewhat at transition points from the original modulus $|x(n)|$, so that in the end

$$x(n) \approx \operatorname{sgn}(y) \cdot |z(n)| = z(n). \tag{14}$$

In the next section it is shown how such an inverse compressor or decompressor is derived.

[...] invertible. Moving the expression on the left-hand side over to the right-hand side, we may define

$$\zeta_p(v) \triangleq [\gamma\kappa\, v^{-S}(n) + \bar{\gamma}\, g(n-1)]^p \cdot [v^p(n) - \bar{\beta}\,\tilde{x}(n-1)] - \beta\,|y(n)|^p, \tag{19}$$

which shall be termed the characteristic function. The root or zero-crossing of $\zeta_p(v)$ hence represents the sought-after envelope value $v(n)$. Once $v(n)$ is found (see Section V), the current values of $\tilde{x}$, $|x|$, and $g$ are updated as per

$$\tilde{x}(n) = v^p(n), \qquad |x(n)| = \sqrt[p]{[\tilde{x}(n) - \bar{\beta}\,\tilde{x}(n-1)]/\beta}, \qquad [...] \tag{20}$$

and the decompressed sample is then calculated as

$$x(n) = \operatorname{sgn}(y) \cdot |x(n)|. \tag{21}$$

B. Attack-Release Phase Toggle

1) Envelope Smoothing: In case a peak detector is in use, $\beta$ takes on two different values. The condition for the attack phase is then given by (2) and is equivalent to

[...]

$$v(n) > \left[\frac{\kappa}{g(n-1)}\right]^{1/S} \qquad \text{with } v(n) > l. \tag{25}$$

[...] than one, and thus during attack

[...]

Substituting $|x(n)|$ by its expression in terms of $v^p(n) - \bar{\beta}\,\tilde{x}(n-1)$, cf. (20), and solving (27) for $v(n)$ results in

[...] (28)

If $v(n)$ in (25) is substituted by the expression on the right-hand side of (28), (25) still holds true, so the following sufficient condition is used to predict the attack phase of the gain filter:

[...] (29)

Note that the values of all variables are known whenever (29) is evaluated.

C. Envelope Predictor

[...] (32)
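The role of the characteristic function (19) and of the updates (20)-(21) can be made concrete for a single sample. The sketch below compresses one sample with the forward model, then recovers it by locating the zero-crossing of $\zeta_p(v)$ with a secant-type iteration whose step size equals the current deviation from zero, in the spirit of Algorithm 3. Several things are assumptions of this example: the smoothing factors are taken as already decided (no attack-release prediction), the initial guess of 0.2 is arbitrary and stands in for the envelope predictor, and all names and numeric values are illustrative.

```python
import math


def charfzero(zeta, v_init, eps=1e-12, max_iter=50):
    """Secant-type search for the zero-crossing of zeta, starting at v_init."""
    v_n = v_init
    for _ in range(max_iter):
        z_n = zeta(v_n)
        delta = abs(z_n)                 # deviation from zero doubles as step size
        if delta < eps:
            break
        denom = zeta(v_n + delta) - z_n
        if denom == 0.0:
            break
        v_i = v_n - delta * z_n / denom  # zero-crossing of the local straight line
        if abs(zeta(v_i)) > delta:       # new estimate is worse: keep the old one
            break
        v_n = v_i
    return v_n


# Illustrative model state (values are assumptions for this example).
p = 1                          # peak detector
beta, gamma = 0.3, 0.5         # smoothing factors, assumed already selected
l = 10.0 ** (-20.0 / 20.0)     # linear threshold for L = -20 dB
S = 1.0 - 1.0 / 4.0            # slope for a 4:1 ratio
kappa = l ** S
x_tilde_prev, g_prev = 0.2, 1.0

# Forward step, eqs. (1), (5), (6), (8), for one input sample.
x = -0.5
x_tilde = beta * abs(x) ** p + (1.0 - beta) * x_tilde_prev
v_true = x_tilde ** (1.0 / p)
f = kappa * v_true ** (-S) if v_true > l else 1.0
y = (gamma * f + (1.0 - gamma) * g_prev) * x


def zeta(v):
    """Characteristic function (19) for the current sample."""
    gain = gamma * kappa * v ** (-S) + (1.0 - gamma) * g_prev
    return gain ** p * (v ** p - (1.0 - beta) * x_tilde_prev) - beta * abs(y) ** p


# Inverse step: find v(n), then update as in (20) and apply the sign as in (21).
v_hat = charfzero(zeta, v_init=0.2)
x_tilde_hat = v_hat ** p
x_mod = ((x_tilde_hat - (1.0 - beta) * x_tilde_prev) / beta) ** (1.0 / p)
x_hat = math.copysign(x_mod, y)
print(x, x_hat)
```

In a full decompressor the recovered $\tilde{x}(n)$ and the corresponding gain would be carried over as the states for the next sample.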

The normalized error is then

[...] (33)

[...]

whereas for $\gamma \to 0$, (37) converges to infinity:

[...] (39)

So, the error is smaller for large $\gamma$ or short $\tau_g$. The smallest possible error is for $\gamma = 1$, which then again depends on the current and the previous value of $f$. The error accumulates if $\gamma < 1$ with $N$. The difference between consecutive $f$-values is signal dependent. The signal envelope $v(n)$ fluctuates less and is thus smoother for smaller $\beta$ or longer $\tau_v$. $f(n)$ is also more stable when the compression ratio $R$ is low. For $R = 1$, $f(n)$ is perfectly constant. The threshold $L$ has a negative impact on error propagation. The lower $L$ the more the error depends on $N$, since more samples are compressed with different $f$-values. The RMS detector stabilizes the envelope more than the peak detector, which also reduces the error. Furthermore, since usually $\tau_{\mathrm{att}} < \tau_{\mathrm{rel}}$, the error due to $\beta$ is smaller during release whereas the error due to $\gamma$ is smaller during attack. Finally, the error is expected to be larger at transition points between quiet to loud signal passages.

The above error may cause a decision in favor of a wrong smoothing factor $\beta$ in (24), like $\beta_{\mathrm{att}}$ instead of $\beta_{\mathrm{rel}}$ e.g. The decision error from (24) then propagates to (29). Given that $\beta_{\mathrm{att}} > \beta_{\mathrm{rel}}$, the error due to (32) is accentuated by (24) with the consequence that (29) is less reliable than (24). The total error in (29) thus scales with $|\beta_{\mathrm{att}} - \beta_{\mathrm{rel}}|$. In regard to (31), reliability of the envelope's estimate is subject to validity of (24) and (29). A better estimate is obtained when the sound level detector and the gain filter are both in either the attack or release phase. Here too, the estimation error increases with $|\beta_{\mathrm{att}} - \beta_{\mathrm{rel}}|$ and also with $|\gamma_{\mathrm{att}} - \gamma_{\mathrm{rel}}|$.

The criterion for optimality is further chosen as the deviation of the characteristic function from zero, initialized to

$$\Delta_{\mathrm{init}} = |\zeta_p(v_{\mathrm{init}})|. \tag{40}$$

Thereupon, (19) may be approximated at a given point using the equation of a straight line, $\zeta = m \cdot v + c$, where $m$ is the slope and $c$ is the $\zeta$-intercept. The zero-crossing is characterized by the equation

$$\frac{\zeta_p(v_i + \Delta_i) - \zeta_p(v_i)}{\Delta_i}\,(v_{i+1} - v_i) + \zeta_p(v_i) = 0, \tag{41}$$

as shown in Fig. 2. The new estimate of the optimal $v$ is found as

$$v_{i+1} = v_i - \frac{\Delta_i \cdot \zeta_p(v_i)}{\zeta_p(v_i + \Delta_i) - \zeta_p(v_i)}. \tag{42}$$

If $v_{i+1}$ is less optimal than $v_i$, the iteration is stopped and $v_i$ is the final estimate. The iteration is also stopped if $\Delta_{i+1}$ is smaller than some $\epsilon$. In the latter case, $v_{i+1}$ has the optimal value with respect to the chosen criterion. Otherwise, $v_i$ is set to $v_{i+1}$ and $\Delta_i$ is set to $\Delta_{i+1}$ after every step and the procedure is repeated until $v_{i+1}$ has converged to a more optimal value. The proposed method is a special form of the secant method with a single initial value.

VI. GENERAL REMARKS

A. Stereo Linking

When dealing with stereo signals, one might want to apply the same amount of gain reduction to both channels to prevent image shifting. This is achieved through stereo linking. One way is to calculate the required amount of gain reduction for each channel independently and then apply the larger amount to both channels. The question which arises in this context is which of the two channels was the gain derived from. To give an answer resolving the dilemma of ambiguity, one solution would be to signal which of the channels carries the applied gain. One could then decompress the marked sample and use its gain for the other channel. Although very simple to implement, this approach provokes an additional data rate of 44.1 kbps at 44.1-kHz sampling. A rate-efficient alternative that comes with a higher computational cost is realized in the following way. First, one decompresses both the left and the right channel independently and in so doing one obtains two estimates $z_l(n)$ and $z_r(n)$, where subscript $l$ shall denote the left channel and subscript $r$ the right channel, respectively. In a second step, one calculates the compressed values of $z_l(n)$ and $z_r(n)$ and selects the channel for which $H[z(n)] = y(n)$ holds true. In a final step, one updates the remaining variables using the gain of the selected channel.

B. Lookahead

A compressor with a look-ahead function, i.e. with a delay in the main signal path as in [12, p. 106], uses past input samples as weighted output samples. Now that some future input samples are required to invert the process, which are unavailable, the inversion is rendered impossible. $g(n)$ and $x(n)$ must thus be in sync for the approach to be applied.

C. Clipping and Limiting

Another point worth mentioning is that "hard" clipping and "brick-wall" limiting are special cases of compression with the attack time set to zero and the compression ratio set to $\infty : 1$. The static nonlinearity $F$ in that particular case is a one-to-many mapping, which by definition is noninvertible.

VII. THE ALGORITHM

The complete algorithm is divided into three parts, each of them given as pseudocode below. Algorithm 1 outlines the compressor that corresponds to the model from Sections II-III. Algorithm 2 illustrates the decompressor described in Section IV, and the iterative search from Section V is finally summarized in Algorithm 3. The parameter $f_s$ represents the sampling frequency in kHz.

Algorithm 1 The compressor

function COMP(x_n, θ, f_s)
    x̃_n ← 0
    g_n ← 1
    for n ← 1, N do
        if |x_n| > v_n then        (v_n still holds v(n-1), cf. (2); initially 0)
            β ← 1 - exp[-2.2/(f_s · τ_v,att)]
        else
            β ← 1 - exp[-2.2/(f_s · τ_v,rel)]
        end if
        x̃_n ← β·|x_n|^p + β̄·x̃_n
        v_n ← (x̃_n)^(1/p)
        if v_n > l then
            f_n ← κ·v_n^(-S)
        else
            f_n ← 1
        end if
        if f_n < g_n then
            γ ← 1 - exp[-2.2/(f_s · τ_g,att)]
        else
            γ ← 1 - exp[-2.2/(f_s · τ_g,rel)]
        end if
        g_n ← γ·f_n + γ̄·g_n
        y_n ← g_n·x_n
    end for
    return y_n
end function

VIII. PERFORMANCE EVALUATION

A. Performance Metrics

To evaluate the inverse approach, the following quantities are measured: the root-mean-square error (RMSE),

$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{n=1}^{N} [z(n) - x(n)]^2} \tag{43}$$

given in decibel relative to full scale (dBFS), the perceptual similarity between the original and decompressed signal, and the execution time of the decompressor relative to real time (RT). Furthermore, we present the percentage of compressed samples, the mean number of iterations until convergence per compressed sample, the error rate of the attack-release toggle for the gain smoothing filter, and finally the error rate of the envelope predictor. The perceptual similarity is assessed by PEMO-Q [13], [14] with $\mathrm{PSM}_t$ as metric. The simulations are run in MATLAB on an Intel Core i5-520M CPU.

B. Computational Results

Fig. 3 shows the inverse output signal $z(n)$ for a synthetic input signal $x(n)$ using an RMS detector. The inverse signal is obtained from the compressed signal $y(n)$ with an error of -129 dBFS. It is visually indistinguishable from the original signal $x(n)$. Due to the fact that the signal envelope is constant most of the time, the error is noticeable only around transition points, which are few. The decompressor's performance is further evaluated for some commercial compressor presets. The used audio material consists of 12 items covering speech, sung voice, music, and jingles. All items are normalized to -16 LKFS [15]. The $\epsilon$-value in the break condition of Algorithm 3 is set to $1 \cdot 10^{-12}$. A detailed overview of compressor settings and performance figures is given in Tables I-II. The presented results suggest that the decompressed signal is perceptually indistinguishable from the original: the $\mathrm{PSM}_t$-value is flawless. This was also confirmed by the authors through informal listening tests.

As can be seen from Table II, the largest inversion error is associated with setting E and the smallest with setting B. For all five settings, the error is larger when an RMS detector is in use. This is partly due to the fact that $\zeta_2(v)$ has a stronger

[...] In most cases, the envelope predictor works more reliably compared to the toggle switch between attack and release. It can also be observed that the choice of time constants seems to have little impact on decompressor's accuracy. The major parameters that affect the decompressor's performance are $L$ and $R$, while the threshold is evidently the predominant one: the RMSE strongly correlates with the threshold level.

Fig. 3. An illustrative example using an RMS amplitude detector with $\tau_v$ set to 5 ms, a threshold of -20 dBFS (dashed line in the upper right corner), a compression ratio of 4:1, and $\tau_g$ set to 1.6 ms for attack and 17 ms for release, respectively. The RMSE is -129 dBFS.

TABLE I
SELECTED COMPRESSOR SETTINGS

Parameter                 Description                     A        B        C        D        E
L (dBFS)                  Threshold                   -32.0    -19.9    -24.4    -26.3    -38.0
R (dB_in : dB_out)        Ratio                       3.0:1    1.8:1    3.2:1    7.3:1    4.9:1
τ_v,att, τ_v,rel (ms)     Envelope attack, release      5.0      5.0      5.0      5.0      5.0
τ_g,att (ms)              Gain attack                  13.0     11.0      5.8      9.0     13.1
τ_g,rel (ms)              Gain release                  435       49      112      705      257

TABLE II
PERFORMANCE FIGURES OBTAINED FOR VARIOUS AUDIO MATERIAL (12 ITEMS)

                                     A              B              C              D              E
                                Peak    RMS    Peak    RMS    Peak    RMS    Peak    RMS    Peak    RMS
RMSE (dBFS)                    -74.4  -71.2   -97.2  -93.7   -81.0  -77.8   -76.3  -69.5   -63.2  -53.8
PSM_t (PEMO-Q)                  1.00   1.00    1.00   1.00    1.00   1.00    1.00   1.00    1.00   1.00
Execution time (RT)             0.54   0.53    0.40   0.44    0.47   0.49    0.48   0.50    0.54   0.54
Compression rate (%)            78.7   80.8    38.5   50.7    61.8   67.3    67.6   71.8    85.2   86.4
Iterations per sample (#)       1.04   1.02    1.00   1.01    1.07   1.06    1.05   1.03    1.09   1.04
Attack-release error rate (%)   0.05   0.09    0.01   0.01    0.02   0.04    0.01   0.03    0.14   0.51
State error rate (%)            0.02   0.03    0.01   0.01    0.01   0.02    0.02   0.03    0.03   0.05

Algorithm 2 The decompressor

function DECOMP(y_n, θ, ε, f_s)
    x̃_n ← 0
    g_n ← 1
    for n ← 1, N do
        if |y_n| > [...] then
            β ← 1 - exp[-2.2/(f_s · τ_v,att)]
        else
            β ← 1 - exp[-2.2/(f_s · τ_v,rel)]
        end if
        if [...] then
            γ ← 1 - exp[-2.2/(f_s · τ_g,att)]
        else
            [...]
        else
            g_n ← γ + γ̄·g_n
            |x_n| ← [...]
            x̃_n ← β·|x_n|^p + β̄·x̃_n
        end if
        x_n ← sgn(y_n)·|x_n|
    end for
    return x_n
end function

Algorithm 3 The iterative search for the zero-crossing

function CHARFZERO(v_n, ε)
    v_i ← v_n
    repeat
        Δ_i ← |ζ_p(v_i)|
        v_i ← v_i - Δ_i · ζ_p(v_i) / [ζ_p(v_i + Δ_i) - ζ_p(v_i)]
        if |ζ_p(v_i)| > Δ_i then
            return v_n
        end if
        v_n ← v_i
    until |ζ_p(v_n)| < ε
    return v_n
end function

Figs. 4-5 show the inversion error as a function of various time constants. These are in the range of typical attack and release times for a limiter (peak) or compressor (RMS) [12, pp. 109-110]. It can be observed that the inversion accuracy depends on the release time of the peak detector and not so much on its attack time, for both the envelope and the gain filter; see Figs. 4 (b) and 5 (b). For the envelope filter, all error curves exhibit a local dip around a release time of 0.5 s. The error increases steeply below that bound but moderately with larger values. In the proximity of 5 s, the error converges to -130 dBFS. With regard to the gain filter, the error behaves in a reverse manner. The curves in Fig. 5 (b) exhibit a local peak around 0.5 s with a value of -180 dBFS. It can further be observed in Fig. 4 (a) that the curve for $\tau_{v,\mathrm{rel}} = 1$ ms has a dip where $\tau_{v,\mathrm{att}}$ is close to 1 ms, i.e. where $|\beta_{\mathrm{att}} - \beta_{\mathrm{rel}}|$ is minimal. This is also true for Fig. 4 (c) and (d): the lowest error is where the attack and release times are identical. As a general rule, the error that is due to the attack-release switch is smaller for the gain filter in Fig. 5.

Looking at Fig. 6 one can see that the error decreases with threshold and increases with compression ratio. At a ratio of 10:1 and beyond, the RMSE scales almost exclusively with the threshold. The lower the threshold, the stronger the error propagates between decompressed samples, which leads to a larger RMSE value. The RMS detector further augments the error because it stabilizes the envelope $v(n)$ more than the peak detector. Clearly, the threshold level has the highest impact on the decompressor's accuracy.

IX. CONCLUSION AND OUTLOOK

This work examines the problem of finding an inverse to a nonlinear dynamic operator such as a digital compressor. The proposed approach is characterized by the fact that it uses an explicit signal model to solve the problem. To find the "dry" or uncompressed signal with high accuracy, it is sufficient to know the model parameters. The parameters can e.g. be sent together with the "wet" or compressed signal in the form of metadata as is the case with Dolby Volume and ReplayGain [16]. A new bitstream format is not mandatory, since many digital audio standards, like WAV or MP3, provide means to tag the audio content with "ancillary" data. With the help of the metadata, one can then reverse the compression applied after mixing or before broadcast. This allows the end user to have control over the amount of compression, which may be preferred because the sound engineer has no control over the playback environment or the listener's individual taste.

When the compressor parameters are unavailable, they can possibly be estimated from the compressed signal. This may thus be a direction for future work. Another direction would be to apply the approach to more sophisticated models that include a "soft" knee, parallel and multiband compression, or perform gain smoothing in the logarithmic domain, see [11], [12], [17], [18] and references therein.

In conclusion, we want to draw the reader's attention to the fact that the presented figures suggest that the decompressor is realtime capable which can pave the way for exciting new applications. One such application could be the restoration of

dynamics in over-compressed audio or else the accentuation of transient components, see [19]-[21], by an adaptively tuned decompressor that has no prior knowledge of the compressor parameters.

Fig. 4. RMSE as a function of typical attack and release times using a peak (upper row) or an RMS amplitude detector (lower row). In the left column, the attack time of the envelope filter is varied while the release time is held constant. The right column shows the reverse case. The time constants of the gain filter are fixed at zero. In all four cases, threshold and ratio are fixed at -32 dBFS and 4:1, respectively.

Fig. 5. RMSE as a function of typical attack and release times using a peak (upper row) or an RMS amplitude detector (lower row). In the left column, the attack time of the gain filter is varied while the release time is held constant. The right column shows the reverse case. The time constants of the envelope filter are fixed at zero. In all four cases, threshold and ratio are fixed at -32 dBFS and 4:1, respectively.

ACKNOWLEDGMENT

This work was carried out in part at the Centre for Digital Music (C4DM), Queen Mary, University of London.

REFERENCES

[11] R. Jeffs, S. Holden, and D. Bohn, Dynamics processor: technology & application tips, Rane Corporation, 2005.

[12] U. Zölzer, DAFX: Digital audio effects, 2nd ed. The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, United Kingdom: John Wiley & Sons Ltd, 2011, ch. 4.

[13] R. Huber and B. Kollmeier, "PEMO-Q: a new method for objective audio quality assessment using a model of auditory perception," IEEE Trans. Audio Speech Lang. Process., vol. 14, no. 6, pp. 1902-1911, Nov. 2006.

[14] HörTech gGmbH, "PEMO-Q," http://www.hoertech.de/web_en/produkte/pemo-q.shtml, version 1.3.

[15] ITU-R, Algorithms to measure audio programme loudness and true-peak audio level, Mar. 2011, rec. ITU-R BS.1770-2.

[16] Hydrogenaudio, "ReplayGain," http://wiki.hydrogenaudio.org/index.php?title=ReplayGain, Feb. 2013.

[17] J. C. Schmidt and J. C. Rutledge, "Multichannel dynamic range compression for music signals," in Proc. IEEE ICASSP, vol. 2, 1996, pp. 1013-1016.

[18] D. Giannoulis, M. Massberg, and J. D. Reiss, "Digital dynamic range compressor design: a tutorial and analysis," J. Audio Eng. Soc., vol. 60, pp. 399-408, 2012.

[19] M. M. Goodwin and C. Avendano, "Frequency-domain algorithms for audio signal enhancement based on transient modification," J. Audio Eng. Soc., vol. 54, pp. 827-840, 2006.

[1] D. Barchiesi and J. Reiss, "Reverse engineering of a mix," J. Audio Eng. Soc., vol. 58, pp. 563-576, 2010.

[2] T. Ogunfunmi, Adaptive nonlinear system identification: The Volterra and Wiener model approaches. 233 Spring Street, New York, NY 10013, USA: Springer Science+Business Media, LLC, 2007, ch. 3.

[3] Y. Avargel and I. Cohen, "Adaptive nonlinear system identification in the short-time Fourier transform domain," IEEE Trans. Signal Process., vol. 57, no. 10, pp. 3891-3904, Oct. 2009.

[4] Y. Avargel and I. Cohen, "Modeling and identification of nonlinear systems in the short-time Fourier transform domain," IEEE Trans. Signal Process., vol. 58,

[20] M. Walsh, E. Stein, and J.-M. Jot, "Adaptive dynamics enhancement," no. 1, pp. 291-304, Jan. 2010.

ia AES Convention 130, May 2011.

[5] A. Gelb and W. E. Vander Velde, Multiple-input describing functions

[21] M. Zaunschirm, J. D. Reiss, and A. Klapuri, "A sub-band approach to and nonlinear system design. New York: McGraw-Hill, 1968, ch. 1.

modification of musical transients," Comput. Music J., vol. 36, pp. 23-

[6] P. W. J. M. Nuij, O. H. Bosgra, and M. Steinbuch, "Higher-order

36, 2012.

sinusoidal input describing functions for the analysis of non-linear

systems with harmonic responses," Mech. Syst. Signal Process., vol. 20,

pp. 1883-1904, 2006.

[7] B. Lachaise and L. Daudet, "Inverting dynamics compression with

minimal side information," in Proc. DAFx, 2008, pp. 1-6.

[8] E. Vickers, "The loudness war: Background, speculation and recommendations," ia AES Convention 129, Nov. 2010.

[9] Dolby Digital and Dolby Volume provide a comprehensive loudness

solution, Dolby Laboratories, 2007.

[10] Broadcast loudness issues: The comprehensive Dolby approach, Dolby

Laboratories, 2011.

Claims


1. A method of decompressing a compressed digital audio signal resulting from the compression of an input signal using a compressor, wherein, for each integer n representing a time instant, y(n) being the level of the compressed signal at instant n, automated means determine:

$$z(n) = \operatorname{sgn}[y(n)] \cdot |z(n)|$$

with

$$|z(n)| = \begin{cases} H^{-1}\left[\,|y(n)| \mid \theta\,\right] & \text{if } v(n) > 10^{L/20}, \\ |y(n)| & \text{otherwise,} \end{cases}$$

where

L is a threshold in dB,

v(n) is a sound level or envelope of the input signal x(n),

θ represents the compressor model parameters, and

H represents the compressor, where H^{-1} is its inverse.
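As an illustration of the piecewise rule in claim 1, the following Python sketch applies it sample by sample. It is only a sketch under stated assumptions: the function name `decompress` and the callable `inverse_h` are hypothetical, `inverse_h` merely stands in for H^{-1}[. | θ], and the envelope v(n) is assumed to be supplied by the caller rather than estimated.

```python
from typing import Callable, List, Sequence


def decompress(
    y: Sequence[float],
    v: Sequence[float],
    threshold_db: float,
    inverse_h: Callable[[float], float],
) -> List[float]:
    """Apply z(n) = sgn[y(n)] * |z(n)| with the piecewise modulus of claim 1.

    `v` holds the envelope v(n), assumed known here. `inverse_h` is a
    hypothetical stand-in for the inverse compressor H^-1[. | theta].
    """
    threshold_lin = 10.0 ** (threshold_db / 20.0)  # 10^(L/20)
    z = []
    for y_n, v_n in zip(y, v):
        if v_n > threshold_lin:
            modulus = inverse_h(abs(y_n))  # above threshold: invert the compressor
        else:
            modulus = abs(y_n)  # at or below threshold: modulus is left unchanged
        sign = (y_n > 0) - (y_n < 0)  # sgn[y(n)], equal to 0 for y(n) = 0
        z.append(sign * modulus)
    return z
```

Passing the inverse as a callable keeps the threshold test and the sign handling separate from whichever compressor model is being inverted.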

2. A method according to the preceding claim wherein the automated means determine:

$$v(n) = \sqrt[p]{\tilde{x}(n)}$$

with

where :

β and γ are the smoothing factors that go with the model parameters τ_v and τ_g, which in turn are the time constants of the level detector and the gain smoothing filter, the conversion being as follows:

$$\beta = 1 - \exp\left[-2.2 / (f_s \tau_v)\right]$$

and

$$\gamma = 1 - \exp\left[-2.2 / (f_s \tau_g)\right]$$

where f_s is the sampling frequency and, in the above equation, g(n-1) is the gain value for the preceding sample, which was calculated as

[...]
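The conversion from a time constant to a smoothing factor can be sketched in a few lines of Python. The helper name `smoothing_factor`, the example time constants and the sampling rate are illustrative assumptions. The sketch also assumes that γ is obtained from τ_g in the same way as β is obtained from τ_v, and that a time constant of zero means no smoothing.

```python
import math


def smoothing_factor(tau_seconds: float, fs_hz: float) -> float:
    """Return 1 - exp(-2.2 / (f_s * tau)) for a time constant tau in seconds."""
    if tau_seconds < 0.0:
        raise ValueError("time constant must be non-negative")
    if tau_seconds == 0.0:
        # Assumption: a zero time constant disables smoothing, which matches
        # the limit of the formula as tau approaches zero from above.
        return 1.0
    return 1.0 - math.exp(-2.2 / (fs_hz * tau_seconds))


# Illustrative values only: 10 ms for the level detector, 100 ms for the gain filter.
beta = smoothing_factor(0.010, 44100.0)
gamma = smoothing_factor(0.100, 44100.0)
```

A longer time constant yields a smaller factor, so each new sample contributes less to the smoothed value.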

3. The method of the preceding claim wherein the level detector as well as the gain smoothing filter can be in either the attack or the release phase, wherein, if

$$[\ldots] > \tilde{x}(n-1)$$

the detector is assumed to be in attack, so that τ_v = τ_{v,attack}, otherwise τ_v = τ_{v,release}, and wherein, for the gain smoothing filter, the condition for attack is (β is now known)

[...]

S being the slope parameter derived from the compression ratio R according to

$$S = 1 - 1/R$$

wherein, given that the condition holds true, τ_g = τ_{g,attack}, otherwise τ_g = τ_{g,release}.
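For orientation, the slope S = 1 - 1/R and the linear threshold 10^(L/20) can be evaluated for the settings quoted in the captions of Figs. 4 and 5 (threshold -32 dBFS, ratio 4:1). The function names below are hypothetical, and the check that R is at least 1 is an assumption about a downward compressor rather than something stated in the claims.

```python
def slope_from_ratio(ratio: float) -> float:
    """Slope S = 1 - 1/R for a compression ratio R."""
    if ratio < 1.0:
        # Assumption: a compressor has R >= 1 (R = 1 means no compression).
        raise ValueError("compression ratio R must be at least 1")
    return 1.0 - 1.0 / ratio


def threshold_linear(threshold_db: float) -> float:
    """Linear threshold 10^(L/20) for a threshold L given in dB."""
    return 10.0 ** (threshold_db / 20.0)


slope = slope_from_ratio(4.0)            # R = 4:1 gives S = 0.75
threshold_lin = threshold_linear(-32.0)  # L = -32 dBFS gives roughly 0.0251
```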

4. The method of claim 2 wherein, if v(n) > 10^{L/20}, the current sample is decompressed in the following manner:

$$v_0(n) = \mathrm{CHARFZERO}[v(n)]$$

- Once v_0(n) is obtained, the modulus of the decompressed sample is given by

$$\tilde{x}(n) = v_0^p(n)$$

$$|z(n)| = \sqrt[p]{\frac{\tilde{x}(n) - (1 - \beta)\,\tilde{x}(n-1)}{\beta}}$$

- The corresponding gain value is

[...]

- Otherwise, the modulus of the sample is computed as

$$g(n) = \gamma + (1 - \gamma)\,g(n-1)$$

- And \tilde{x}(n) is updated according to

$$\tilde{x}(n) = \beta\,|z(n)|^p + (1 - \beta)\,\tilde{x}(n-1)$$
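The two envelope relations in claim 4 are inverses of one another, which the Python sketch below makes explicit. It covers only the envelope update, the recovery of the modulus from the envelope, and the gain recursion of the below-threshold branch. The root finder CHARFZERO is not reproduced, so the envelope value passed in is assumed to be already available. All function names are hypothetical, and the clamping of small negative radicands is an added numerical safeguard, not part of the claim.

```python
def envelope_update(z_abs: float, x_prev: float, beta: float, p: int) -> float:
    """Smoothed envelope: beta * |z(n)|^p + (1 - beta) * previous envelope."""
    return beta * z_abs ** p + (1.0 - beta) * x_prev


def modulus_from_envelope(x_now: float, x_prev: float, beta: float, p: int) -> float:
    """Modulus |z(n)|: p-th root of [x_now - (1 - beta) * x_prev] / beta."""
    radicand = (x_now - (1.0 - beta) * x_prev) / beta
    # Assumption: clamp tiny negative values caused by rounding before the root.
    return max(radicand, 0.0) ** (1.0 / p)


def gain_below_threshold(g_prev: float, gamma: float) -> float:
    """Gain recursion of the below-threshold branch: gamma + (1 - gamma) * g(n-1)."""
    return gamma + (1.0 - gamma) * g_prev


# Round trip with illustrative numbers: update the envelope, then recover |z(n)|.
x_now = envelope_update(0.3, 0.04, 0.2, 2)
recovered = modulus_from_envelope(x_now, 0.04, 0.2, 2)
```

With these numbers, the recovered modulus equals the starting value 0.3 up to rounding, as expected when one relation undoes the other.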

5. A method according to claim 1 wherein, when the model parameters θ are not known, the same method is applied to accentuate the shape of the signal y(n) and, in that case, the model parameters are tweaked in such a way that the desired effect is achieved.

6. A digital audio signal obtained by using the method of any of the preceding claims.

7. A method of making available on a telecommunication network a signal obtained by using the method of any of claims 1 to 5 in view of downloading it.

8. A computer program comprising code instructions arranged for controlling the execution of a method according to any of claims 1 to 5 when the program is performed on a computer.

9. A method of making available on a telecommunication network a program according to the preceding claim in view of downloading it.

10. A data storage medium (18) comprising data representing a signal obtained by using the method of any of claims 1 to 5 and/or data representing a program according to claim 8.

11. A device (16) for decompressing a compressed digital audio signal resulting from the compression of an initial signal wherein the device is arranged to perform a method according to any of claims 1 to 5.