WO2014135914A1 - A method for inverting dynamic range compression of a digital audio signal - Google Patents

Publication number
WO2014135914A1
WO2014135914A1 PCT/IB2013/000595 IB2013000595W WO2014135914A1 WO 2014135914 A1 WO2014135914 A1 WO 2014135914A1 IB 2013000595 W IB2013000595 W IB 2013000595W WO 2014135914 A1 WO2014135914 A1 WO 2014135914A1
Authority
WO
WIPO (PCT)
Prior art keywords
signal
gain
attack
compressor
compression
Prior art date
Application number
PCT/IB2013/000595
Other languages
French (fr)
Inventor
Stanislaw GORLOW
Joshua D. REISS
Original Assignee
Universite De Bordeaux 1
Queen Mary, University Of London
Institut Polytechnique De Bordeaux
Universite Bordeaux Segalen
Centre National De La Recherche Scientifique (Cnrs)
Priority date
Filing date
Publication date
Application filed by Universite De Bordeaux 1, Queen Mary, University Of London, Institut Polytechnique De Bordeaux, Universite Bordeaux Segalen, Centre National De La Recherche Scientifique (Cnrs)
Priority to PCT/IB2013/000595
Publication of WO2014135914A1

Classifications

    • H: ELECTRICITY
    • H03: ELECTRONIC CIRCUITRY
    • H03G: CONTROL OF AMPLIFICATION
    • H03G7/00: Volume compression or expansion in amplifiers
    • H03G7/007: Volume compression or expansion in amplifiers of digital or coded signals

Definitions

  • the invention relates to a method for inverting dynamic range compression of a digital audio signal.
  • Sound or audio engineering is an established discipline employed in many areas that are part of our everyday life without us taking notice of it. But not many know how the audio was produced. If we take sound recording and reproduction or broadcasting as an example, we may imagine that a prerecorded signal from an acoustic source is altered by an audio engineer in such a way that it corresponds to certain criteria when played back. The number of these criteria may be large and usually depends on the context. In general, the said alteration of the input signal is a sequence of numerous forward transformations, the reversibility of which is of little or no interest. But what if one wished to do exactly this, that is to reverse the transformation chain, and what is more, in a systematic and repeatable manner?
  • a method to invert dynamics compression is described in [7], but it requires an instantaneous gain value to be transmitted for each sample of the compressed signal.
  • the gain signal is subsampled and also entropy coded. Not relying on a gain model and thus being extremely generic, this approach is highly inefficient.
  • the Dolby solution [9], [10] also includes dynamic range expansion.
  • the expansion parameters that help reproduce the original program's dynamic range are tuned on the broadcaster side and transmitted as metadata together with the broadcast signal. This is a very convenient solution for broadcasters, not least because the metadata is quite compact. Dynamic range expansion is yet another forward transformation rather than a true inversion.
  • An object of the present invention is to invert dynamic range compression, which is a vital element not only in broadcasting but also in mastering.
  • the invention therefore provides for a method of decompressing a compressed digital audio signal resulting from the compression of an input signal, wherein, for each integer n representing a time instant, y(n) being the level of the compressed signal at instant n, automated means determine:
  • L is a threshold in dB
  • v(n) is a sound level or envelope of the input signal x(n)
  • θ represents the compressor model parameters
  • H represents the compressor, where $H^{-1}$ is its inverse.
  • the decompressor, which is represented by $H^{-1}$, is defined by the so-called "characteristic function" $\zeta_p(v)$. "Characteristic" because the function characterizes the nonlinear behavior of the decompressor represented by the model θ. (The compressor is defined by various parameters such as a threshold, a ratio, an attack, a release, a knee, etc.)
  • the automated means determine:
  • β and γ are the smoothing factors that go with the model parameters $\tau_v$ and $\tau_g$, which again are the time constants of the level detector and the gain smoothing filter, the conversion being as follows:
  • the level detector as well as the gain smoothing filter can be in either the attack or release phase, wherein, if g(n-1)
  • the current sample is decompressed in the following manner:
  • when the model parameters θ are not known, the same method is applied to accentuate the shape of the signal y(n) and in that case, the model parameters are tweaked in such a way that the desired effect is achieved.
  • the invention also provides for :
  • a data storage medium comprising data representing a signal obtained by using the method of the invention and/or data representing a program according to the invention.
  • the invention also provides for a device for decompressing a compressed digital audio signal resulting from the compression of an initial signal wherein the device is arranged to perform the method of the invention.
  • FIG. 1 shows a basic broadband compressor model
  • FIG. 2 shows the graphical illustration for the iterative search of the zero-crossing used in the invention;
  • FIG. 3 shows an illustrative example using an RMS detector with $\tau_v$ set to 5 ms, a threshold of -20 dBFS (dashed line in the upper right corner), a compression ratio of 4:1, and $\tau_g$ set to 1.6 ms for attack and 17 ms for release, respectively;
  • FIG. 4 shows RMSE as a function of typical attack and release times using a peak (upper row) or an RMS amplitude detector (lower row).
  • the attack time of the envelope filter is varied while the release time is held constant.
  • the right column shows the reverse case.
  • the time constants of the gain filter are fixed at zero.
  • threshold and ratio are fixed at -32 dBFS and 4:1, respectively;
  • FIG. 5 shows RMSE as a function of typical attack and release times using a peak (upper row) or an RMS amplitude detector (lower row).
  • the attack time of the gain filter is varied while the release time is held constant.
  • the right column shows the reverse case.
  • the time constants of the envelope filter are fixed at zero.
  • threshold and ratio are fixed at -32 dBFS and 4:1, respectively, and
  • FIG. 6 shows RMSE as a function of threshold relative to the signal's average loudness level (left column) and compression ratio (right column) using a peak (upper row) or an RMS amplitude detector (lower row).
  • a dynamic nonlinear time-variant operator such as a dynamic range compressor, can be inverted using an explicit signal model.
  • the audio digital signal which is considered here is an audio signal such as a piece of music.
  • Dynamic range compression or simply “compression” is a sound processing technique that attenuates loud sounds and/or amplifies quiet sounds, which in consequence leads to a reduction of an audio signal's dynamic range.
  • dynamic range is the difference between the loudest and quietest sounds measured in decibel.
  • compression means attenuating sounds above a certain threshold while leaving sounds below the threshold unchanged.
  • a sound engineer might use a compressor to reduce the dynamic range of source material for purposes of aesthetics, intelligibility, recording or broadcast limitations.
  • Figure 1 shows a basic broadband compressor model 2 (feed forward).
  • Figure 1 illustrates the basic compressor model from [11, ch. 2] amended by a switchable RMS/peak detector in the side chain to make it compatible with the compressor/limiter model in [12, p. 106].
  • the detector 6 calculates the magnitude or level of the sidechain signal using the root mean square (RMS) or peak as a measure for how loud a sound is. [12, p. 107]
  • the detector's temporal behavior is controlled by the attack and release parameters.
  • the sound level is compared with the threshold level, and in case it exceeds the threshold a scale factor is calculated in calculator 10 which corresponds to the ratio of input level to output level.
  • the knee parameter determines how quickly the compression ratio is reached.
  • the scale factor is fed to a smoothing filter 12 that yields the gain.
  • the response of the filter is controlled by another set of attack and release parameters.
  • the gain control 14 applies the smoothed gain to the input signal and adds a fixed amount of makeup gain to bring the output signal y(n) to a desired level.
  • Such a broadband compressor operates on the input signal's full bandwidth, treating all frequencies from zero through the highest frequency equally.
  • a detailed overview of all sidechain controls of a basic gain computer is given in [11, ch. 3].
  • the data model used is based upon the compressor from Figure 1.
  • the following simplifications are additionally made: the knee parameter ("hard" knee) and the makeup gain (fixed at 0 dB) are ignored.
  • the compressor is further deemed to be a single-input single-output (SISO) system, that is both the input and the output are single-channel signals. What follows is a description of each block by means of a dedicated function.
  • the RMS/peak detector as well as the gain computer build upon a first-order (one-pole) lowpass filter.
  • the sound level or envelope v(n) of the input signal x(n) at time instant n, n being an integer, is obtained by
  • $\tilde{x}(n) = \beta\,|x(n)|^p + (1-\beta)\,\tilde{x}(n-1)$ with $p \in \{1, 2\}$
  • Equation (3) is equivalently expressed in the linear domain as
  • the smoothed gain g is then calculated as the exponentially-weighted moving average
  • the output signal y(n) is finally obtained by multiplying the above gain with the input signal x(n):
  • the problem to be solved can be formulated as follows: given the compressed signal y(n) and the model parameters θ, recover the modulus of the original signal $|x(n)|$ from $|y(n)|$ based on θ.
  • the output of the side chain, that is the gain of $|x(n)|$, given θ, $\tilde{x}(n-1)$, and g(n-1), may be written as
  • G denotes a nonlinear dynamic operator that maps the modulus of the input signal $|x(n)|$ onto a sequence of instantaneous gain values g(n) according to the compressor model represented by θ.
  • the compressor shall be completely described by the model parameters listed below.
  • $\tau_{g,\mathrm{rel}}$: the release time of the gain filter in ms.
  • H represents the entire compressor. If H is invertible, i.e. bijective for all n, $|x(n)|$ can be obtained from $|y(n)|$ by
  • the condition for applying decompression must be predicted from y(n), $\tilde{x}(n-1)$, and g(n-1)
  • the recovered modulus $|z(n)|$ may differ somewhat at transition points from the original modulus $|x(n)|$, so that in the end
  • $\zeta_p(v) = \left[\gamma\kappa\,v^{-S}(n) + (1-\gamma)\,g(n-1)\right]^p \cdot \left[v^p(n) - (1-\beta)\,\tilde{x}(n-1)\right] - \beta\,|y(n)|^p$ (19), which shall be termed the characteristic function.
  • the zero-crossing of $\zeta_p(v)$ hence represents the sought-after envelope value v(n). Once v(n) is found (see Section V), the current values of $\tilde{x}$ and g are updated as per
  • the gain smoothing filter may be in either attack or release phase.
  • the necessary condition for the attack phase in (7) may also be formulated as
  • the criterion for optimality is further chosen as the deviation of the characteristic function from zero, initialized to
  • If $v_{i+1}$ is less optimal than $v_i$, the iteration is stopped and $v_i$ is the final estimate.
  • the iteration is also stopped if $\Delta_{i+1}$ is smaller than some ε. In the latter case, $v_{i+1}$ has the optimal value with respect to the chosen criterion. Otherwise, $v_i$ is set to $v_{i+1}$ and $\Delta_i$ is set to $\Delta_{i+1}$ after every iteration step and the procedure is repeated until $v_{i+1}$ has converged to a more optimal value.
  • a compressor with a look-ahead function i.e. with a delay in the main signal path as in [12, p. 106] uses past input samples to calculate the output sample. Now that future input samples are required to invert the process, which are unavailable, the inversion is rendered impossible. g(n) and x(n) must thus be in sync for the above approach to be applied.
  • Algorithm 1 outlines the compressor that corresponds to the model described above.
  • Algorithm 2 illustrates the decompressor, and the iterative search for the numerical solution of the characteristic function is finally summarized in Algorithm 3.
  • the parameter fs represents the sampling frequency in kHz.
  • Algorithm 3: The iterative search for the zero-crossing. function CHARFZERO($v_n$, ε): repeat ... $[\zeta_p(v_i + \Delta_i) - \zeta_p(v_i)]$ ... return $v_i$
  • the perceptual similarity is assessed by PEMO- Q [13], [14] with PSMt as metric. The simulations are run in MATLAB on an Intel Core i5-520M CPU.
  • Fig. 3 shows the inverse output signal z(n) for a synthetic input signal x(n) using an RMS detector. It is an illustrative example using an RMS amplitude detector with $\tau_v$ set to 5 ms, a threshold of -20 dBFS (dashed line in the upper right corner), a compression ratio of 4:1, and $\tau_g$ set to 1.6 ms for attack and 17 ms for release, respectively.
  • the RMSE is -129 dBFS.
  • the inverse signal is obtained from the compressed signal y(n) with an error of -129 dBFS. It is visually indistinguishable from the original signal x(n).
  • the decompressor's performance is further evaluated for some commercial compressor presets.
  • the used audio material consists of 12 items covering speech, sung voice, music, and jingles. All items are normalized to -16 LKFS [15].
  • the ε-value in the break condition of Algorithm 3 is set to $1 \cdot 10^{-12}$.
  • A detailed overview of compressor settings and performance figures is given in Tables I-II. The presented results suggest that the decompressed signal is perceptually indistinguishable from the original: the PSMt-value is flawless. This was also confirmed by the authors through informal listening tests.
  • Figs. 4-5 show the inversion error as a function of various time constants. These are in the range of typical attack and release times for a limiter (peak) or compressor (RMS) [12, pp. 109-110].
  • Figure 4 shows RMSE as a function of typical attack and release times using a peak (upper row) or an RMS amplitude detector (lower row).
  • the attack time of the envelope filter is varied while the release time is held constant.
  • the right column shows the reverse case.
  • the time constants of the gain filter are fixed at zero. In all four cases, threshold and ratio are fixed at -32 dBFS and 4:1, respectively.
  • Figure 5 shows RMSE as a function of typical attack and release times using a peak (upper row) or an RMS amplitude detector (lower row).
  • the attack time of the gain filter is varied while the release time is held constant.
  • the right column shows the reverse case.
  • the time constants of the envelope filter are fixed at zero.
  • threshold and ratio are fixed at -32 dBFS and 4:1, respectively. It can be observed that the inversion accuracy depends on the release time of the peak detector and not so much on its attack time for both the envelope and the gain filter, see Figs. 4, 5 (b).
  • the error increases steeply below that bound but moderately with larger values. In the proximity of 5 s, the error converges to -130 dBFS. With regard to the gain filter, the error behaves in a reverse manner.
  • This program may be made available on a telecommunication network in view of downloading it.
  • the digital audio signal obtained by using the method of the invention may be recorded on a data storage medium so as to obtain a data storage medium comprising data representing the signal.
  • This signal may also be made available on a telecommunication network in view of downloading it.
  • the above mentioned medium could be a disc, a hard drive, a flash drive, a CD or a DVD for example.
  • the invention deals with the problem of finding an inverse to a nonlinear dynamic operator such as a digital compressor.
  • we use an explicit signal model to solve the problem.
  • the parameters can e.g. be sent together with the "wet" or compressed signal in the form of metadata as is the case with Dolby Volume and ReplayGain [16].
  • a new bitstream format is not mandatory, since many digital audio standards, like WAV or MP3, provide means to tag the audio content with "ancillary" data.
  • With the help of the metadata one can then reverse the compression applied after mixing or before broadcast. This allows the end user to have control over the amount of compression, which may be preferred because the sound engineer has no control over the playback environment or the listener's individual taste.
  • the decompressor is real-time capable, which can pave the way for exciting new applications.
  • One such application could be the restoration of dynamics in over-compressed audio or else the accentuation of transient components, see [19]-[21], by an adaptively tuned decompressor that has no prior knowledge of the compressor parameters.
  • the invention makes it possible to obtain an audio signal with negligible errors, i.e. perceptually indistinguishable from the original uncompressed signal in its "artistic" properties.
  • the invention may also be used with a step of estimation of the model parameters from the compressed signal, when the model parameters are unknown. It could also be used with more sophisticated models that include a soft knee, parallel and multiband compression, or perform gain smoothing in the logarithmic domain, see [11]-[14] and references therein.
  • the invention may also be used as an adaptive digital audio effect.
  • the decompressor may be used on a digital audio signal which was compressed using an unknown compressor different from the above described compressor, or which was not compressed. The parameters of the decompressor are then adapted to the input signal.
  • Section V illustrates how an integral step of the inversion procedure, namely the search for the zero-crossing of a nonlinear function
  • (LaBRI), CNRS, Bordeaux 1 University, 33405 Talence Cedex, France (e-mail: stanislaw.gorlow@labri.fr).
  • β may take on different values, $\beta_{\mathrm{att}}$ or $\beta_{\mathrm{rel}}$, depending
  • smoothing factor $\gamma_{\mathrm{att}}$ instead of $\gamma_{\mathrm{rel}}$ is subject to
  • In the next section it is shown how such an inverse compressor or decompressor is derived.
  • the normalized error is then
  • level detector and the gain filter are both in either the attack or release phase.
  • the estimation error increases with e(n) (33) also with
  • Vi+l (42) is perfectly constant.
  • the threshold L has a negative impact on error propagation. The lower L, the more the error depends on N, since more samples are compressed with different $f$-values.
  • the RMS detector stabilizes the envelope more than the peak detector, which also reduces the error. Furthermore, since usually $\tau_{\mathrm{att}} < \tau_{\mathrm{rel}}$, the error due to β is smaller during release whereas the error due to γ is smaller during attack. Finally, the error is expected to be larger at transition points between quiet to loud signal passages.
  • If $v_{i+1}$ is less optimal than $v_i$, the iteration is stopped and $v_i$ is the final estimate. The iteration is also stopped if $\Delta_{i+1}$ is smaller than some ε. In the latter case, $v_{i+1}$ has the optimal value with respect to the chosen criterion. $v_i$ is set to $v_{i+1}$ and $\Delta_i$ is set to $\Delta_{i+1}$ after every step and the procedure is repeated until $v_{i+1}$ has converged to a more optimal value.
  • the proposed method is a special form of the secant method with a single initial value
  • the above error may cause a decision in favor of a wrong
  • Attack-release error rate (%) 0.05 0.09 0.01 0.01 0.02 0.04 0.01 0.03 0.14 0.51
  • the RMS detector further augments the error because it stabilizes the envelope v(n) more than the peak detector.
  • the threshold level has the highest impact on the decompressor's accuracy.
  • the proposed approach is characterized by the fact that it uses an explicit signal model to solve the problem. To find the "dry"
  • the envelope predictor works more ... compared to the toggle switch between attack and release. It can also be observed that the choice of time constants seems
  • ITU-R Algorithms to measure audio programme loudness and true-peak audio level, Mar. 2011, rec. ITU-R BS.1770-2.
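
The compressor model summarized in the definitions above (level detector, gain computer, gain smoothing filter, gain control) can be sketched in a few lines. This is only an illustration under stated assumptions, not the patent's Algorithm 1: the function name `compress`, the `(attack, release)` tuples, the constant `kappa = 10^(S*L/20)`, and the two attack/release toggles are assumptions chosen to be consistent with the hard-knee, 0 dB makeup-gain simplifications.

```python
def compress(x, L_db, R, beta=(1.0, 1.0), gamma=(1.0, 1.0), p=2):
    """Feed-forward compressor sketch: hard knee, no makeup gain, mono.

    beta and gamma are hypothetical (attack, release) smoothing-factor pairs
    for the level detector and the gain filter; p=2 is RMS, p=1 is peak.
    """
    l = 10.0 ** (L_db / 20.0)      # threshold on a linear scale
    S = 1.0 - 1.0 / R              # slope derived from the compression ratio
    kappa = l ** S                 # assumed constant so that f = 1 at v = l
    x_env, g = 0.0, 1.0            # detector state and previous gain
    y = []
    for xn in x:
        a = abs(xn)
        # Assumed toggle: the detector attacks while the level is rising.
        b = beta[0] if a ** p > x_env else beta[1]
        x_env = b * a ** p + (1.0 - b) * x_env
        v = x_env ** (1.0 / p)
        # Static gain: attenuate above the threshold, unity below it.
        f = kappa * v ** (-S) if v > l else 1.0
        # Assumed toggle: the gain filter attacks while the gain is falling.
        c = gamma[0] if f < g else gamma[1]
        g = c * f + (1.0 - c) * g
        y.append(g * xn)
    return y
```

With all smoothing factors at 1 (time constants of zero), a full-scale sample at 0 dBFS, a threshold of -20 dBFS and a ratio of 4:1 should come out at -20 + 20/4 = -15 dBFS, which is a quick sanity check for the static curve.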

Landscapes

  • Tone Control, Compression And Expansion, Limiting Amplitude (AREA)

Abstract

In the method of decompressing a compressed digital audio signal resulting from the compression of the dynamic range of an input signal, for each integer n representing a time instant, y(n) being the level of the compressed signal at instant n, automated means determine $z(n) = \operatorname{sgn}[y(n)] \cdot |z(n)|$ with $|z(n)| = H^{-1}[\,|y(n)| \mid \theta\,]$ if $v(n) > 10^{L/20}$ and $|z(n)| = |y(n)|$ otherwise, where L is a threshold in dB, v(n) is a sound level or envelope of the input signal x(n), θ represents the compressor model parameters, and H represents the compressor, where $H^{-1}$ is its inverse.

Description

A method for inverting dynamic range compression of a digital audio signal
The invention relates to a method for inverting dynamic range compression of a digital audio signal.
The prior art comprises the following references:
[1] D. Barchiesi and J. Reiss, "Reverse engineering of a mix," J. Audio Eng. Soc., vol. 58, pp. 563-576, 2010.
[2] T. Ogunfunmi, Adaptive nonlinear system identification: The Volterra and Wiener model approaches. 233 Spring Street, New York, NY 10013, USA: Springer Science+Business Media, LLC, 2007, ch. 3.
[3] Y. Avargel and I. Cohen, "Adaptive nonlinear system identification in the short-time Fourier transform domain," IEEE Trans. Signal Process., vol. 57, no. 10, pp. 3891-3904, Oct. 2009.
[4] "Modeling and identification of nonlinear systems in the short-time Fourier transform domain," IEEE Trans. Signal Process., vol. 58, no. 1, pp. 291-304, Jan. 2010.
[5] A. Gelb and W. E. Vander Velde, Multiple-input describing functions and nonlinear system design. New York: McGraw-Hill, 1968, ch. 1.
[6] P. W. J. M. Nuij, O. H. Bosgra, and M. Steinbuch, "Higher-order sinusoidal input describing functions for the analysis of non-linear systems with harmonic responses," Mech. Syst. Signal Process., vol. 20, pp. 1883-1904, 2006.
[7] B. Lachaise and L. Daudet, "Inverting dynamics compression with minimal side information," in Proc. DAFx, 2008, pp. 1-6.
[8] E. Vickers, "The loudness war: Background, speculation and recommendations," in AES Convention 129, Nov. 2010.
[9] Dolby Digital and Dolby Volume provide a comprehensive loudness solution, Dolby Laboratories, 2007.
[10] Broadcast loudness issues: The comprehensive Dolby approach, Dolby Laboratories, 2011.
[11] R. Jeffs, S. Holden, and D. Bohn, Dynamics processor - technology & application tips, Rane Corporation, 2005.
[12] U. Zölzer, DAFX: Digital audio effects, 2nd ed. The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, United Kingdom: John Wiley & Sons Ltd, 2011, ch. 4.
[13] J. C. Schmidt and J. C. Rutledge, "Multichannel dynamic range compression for music signals," in Proc. IEEE ICASSP, vol. 2, pp. 1013-1016.
[14] D. Giannoulis, M. Massberg, and J. D. Reiss, "Digital dynamic range compressor design - a tutorial and analysis," J. Audio Eng. Soc., vol. 60, pp. 399-408, 2012.
[15] M. M. Goodwin and C. Avendano, "Frequency-domain algorithms for audio signal enhancement based on transient modification," J. Audio Eng. Soc., vol. 54, pp. 827-840, 2006.
[16] M. Walsh, E. Stein, and J.-M. Jot, "Adaptive dynamics enhancement," in AES Convention 130, May 2011.
[17] M. Zaunschirm, J. D. Reiss, and A. Klapuri, "A sub-band approach to modification of musical transients," Comput. Music J., vol. 36, pp. 23-36, 2012.
Sound or audio engineering is an established discipline employed in many areas that are part of our everyday life without us taking notice of it. But not many know how the audio was produced. If we take sound recording and reproduction or broadcasting as an example, we may imagine that a prerecorded signal from an acoustic source is altered by an audio engineer in such a way that it corresponds to certain criteria when played back. The number of these criteria may be large and usually depends on the context. In general, the said alteration of the input signal is a sequence of numerous forward transformations, the reversibility of which is of little or no interest. But what if one wished to do exactly this, that is to reverse the transformation chain, and what is more, in a systematic and repeatable manner?
The research objective of reverse audio engineering is twofold: to identify the transformation parameters given the input and the output signals, as in [1 ], and to regain the input signal that goes with the output signal given the transformation parameters. In both cases, an explicit signal model is mandatory. The latter case might seem trivial, but only if the applied transformation is linear and orthogonal and as such perfectly invertible. Yet the forward transform is often neither linear nor has it an inverse. This is the case for dynamic range compression (DRC), which is commonly described by a dynamic nonlinear time-variant system. The classical linear time-invariant (LTI) system theory does not apply here, so a tailored solution to the problem at hand must be found instead.
At this point, we would also like to highlight the fact that neither Volterra nor Wiener model approaches [2]-[4] offer a solution, and neither do describing functions [5], [6]. These are useful tools when identifying a time-invariant or a slowly varying nonlinear system or analyzing the limit cycle behavior of a feedback system with a static nonlinearity.
A method to invert dynamics compression is described in [7], but it requires an instantaneous gain value to be transmitted for each sample of the compressed signal. To provide a means to control the data rate, the gain signal is subsampled and also entropy coded. Not relying on a gain model and thus being extremely generic, this approach is highly inefficient.
On the other hand, transmitting the uncompressed signal in conjunction with a few typical compression parameters like threshold, ratio, attack, and release would require a much smaller capacity and yield the best possible signal quality with regard to any conceivable measure. A more realistic scenario is when the uncompressed signal is not available on the consumer side. This is usually the case for studio music recordings and broadcast material. There, the listener is offered a signal that is meant to sound "good" to everyone. However, the loudness war [8] has resulted in over-compressed audio material. Over-compression makes a song lose artistic qualities such as excitement or liveliness and desensitizes the ear through a louder volume. Thus there is a need to restore the original signal's dynamic range and to experience audio free of compression.
In addition to the normalization of the program's loudness level, the Dolby solution [9], [10] also includes dynamic range expansion. The expansion parameters that help reproduce the original program's dynamic range are tuned on the broadcaster side and transmitted as metadata together with the broadcast signal. This is a very convenient solution for broadcasters, not least because the metadata is quite compact. Dynamic range expansion is yet another forward transformation rather than a true inversion.
Evidently, none of the previous approaches satisfy the reverse engineering objective as it was formulated earlier.
An object of the present invention, hence, is to invert dynamic range compression, which is a vital element not only in broadcasting but also in mastering.
The invention therefore provides for a method of decompressing a compressed digital audio signal resulting from the compression of an input signal, wherein, for each integer n representing a time instant, y(n) being the level of the compressed signal at instant n, automated means determine:
$$z(n) = \operatorname{sgn}[y(n)] \cdot |z(n)|$$
with
$$|z(n)| = \begin{cases} H^{-1}\left[\,|y(n)| \mid \theta\,\right] & \text{if } v(n) > 10^{L/20} \\ |y(n)| & \text{otherwise} \end{cases}$$
where
L is a threshold in dB, v(n) is a sound level or envelope of the input signal x(n),
θ represents the compressor model parameters, and
H represents the compressor, where $H^{-1}$ is its inverse.
The decompressor, which is represented by $H^{-1}$, is defined by the so-called "characteristic function" $\zeta_p(v)$. "Characteristic" because the function characterizes the nonlinear behavior of the decompressor represented by the model θ. (The compressor is defined by various parameters such as a threshold, a ratio, an attack, a release, a knee, etc.)
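
The per-sample rule above can be read as a small dispatch: convert the threshold from dB to a linear level, invert only when the envelope exceeds it, and restore the sign of y(n). The following sketch assumes a hypothetical callable `inverse_modulus` standing in for $H^{-1}$; neither that name nor `decompress_sample` comes from the patent.

```python
import math

def decompress_sample(y_n, v_n, L_db, inverse_modulus):
    """Apply z(n) = sgn[y(n)] * |z(n)| for one sample.

    inverse_modulus is a hypothetical stand-in for H^-1 acting on |y(n)|.
    """
    threshold = 10.0 ** (L_db / 20.0)   # L in dB mapped to a linear level
    if v_n > threshold:
        modulus = inverse_modulus(abs(y_n))
    else:
        modulus = abs(y_n)              # below threshold: sample is untouched
    return math.copysign(modulus, y_n)
```

For example, with L = -20 dB the linear threshold is 0.1, so a sample whose envelope is 0.01 passes through unchanged regardless of the inverse supplied.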
In one embodiment, the automated means determine:
$$v(n) = \sqrt[p]{\tilde{x}(n)}$$
with
Figure imgf000005_0001
where:
p defines the sound level detector's type, i.e. for an RMS detector p = 2 and for a peak detector p = 1.
β and γ are the smoothing factors that go with the model parameters $\tau_v$ and $\tau_g$, which again are the time constants of the level detector and the gain smoothing filter, the conversion being as follows:
Figure imgf000005_0002
where $f_s$ is the sampling frequency. In the above equation, g(n-1) is the gain value for the preceding sample, which was calculated as
Figure imgf000005_0003
Advantageously, the level detector as well as the gain smoothing filter can be in either the attack or the release phase. For the level detector, a condition on |y(n)|, g(n-1) and $\tilde{x}(n-1)$ is evaluated (inequality (24) of the detailed description); if it holds, the detector is assumed to be in attack, so that $\tau_v = \tau_{v,attack}$, otherwise $\tau_v = \tau_{v,release}$. For the gain smoothing filter, the condition for attack is (β is now known)
$\{\beta\,[|y(n)| / g(n-1)]^p + \bar{\beta}\,\tilde{x}(n-1)\}^{1/p} > [10^{SL/20} / g(n-1)]^{1/S}$,
S being the slope parameter derived from the compression ratio R according to
S = 1 - 1/R,
wherein, given that the condition holds true, $\tau_g = \tau_{g,attack}$, otherwise $\tau_g = \tau_{g,release}$.
In one embodiment, if $v(n) > 10^{L/20}$, the current sample is decompressed in the following manner:
- First, we compute the root or zero-crossing $v_0(n)$ of the characteristic function $z_p(v)$, using v(n) as a starting point for an iterative search:
$v_0(n) = \mathrm{CHARFZERO}[v(n)]$
- Once $v_0(n)$ is obtained, the modulus of the decompressed sample is given by
$\tilde{x}(n) = v_0^p(n)$
Figure imgf000006_0001
- The corresponding gain value is
Figure imgf000006_0002
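- Otherwise, the gain is first updated as
$g(n) = \gamma + (1 - \gamma)\,g(n-1)$,
the modulus of the sample is computed as
Figure imgf000006_0003
- and $\tilde{x}(n)$ is updated according to $\tilde{x}(n) = \beta\,|x(n)|^p + \bar{\beta}\,\tilde{x}(n-1)$.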
Advantageously, when the model parameters θ are not known, the same method is applied to accentuate the shape of the signal y(n), and in that case the model parameters are tweaked in such a way that the desired effect is achieved.
The invention also provides for:
- a digital audio signal obtained by using the method of the invention;
- a method of making available on a telecommunication network a signal obtained by using the method of the invention in view of downloading it;
- a computer program comprising code instructions arranged for controlling the execution of a method of the invention when the program is performed on a computer;
- a method of making available on a telecommunication network a program according to the invention in view of downloading it; and
- a data storage medium comprising data representing a signal obtained by using the method of the invention and/or data representing a program according to the invention.
The invention also provides for a device for decompressing a compressed digital audio signal resulting from the compression of an initial signal wherein the device is arranged to perform the method of the invention.
Other characteristics and advantages of the invention will appear on reading the following description comprising an embodiment given as a non-limiting example, and referring to the attached drawings in which:
- Figure 1 shows a basic broadband compressor model ;
- Figure 2 shows the graphical illustration for the iterative search of the zero-crossing used in the invention;
- Figure 3 shows an illustrative example using an RMS detector with $\tau_v$ set to 5 ms, a threshold of -20 dBFS (dashed line in the upper right corner), a compression ratio of 4:1, and $\tau_g$ set to 1.6 ms for attack and 17 ms for release, respectively;
- Figure 4 shows RMSE as a function of typical attack and release times using a peak (upper row) or an RMS amplitude detector (lower row). In the left column, the attack time of the envelope filter is varied while the release time is held constant. The right column shows the reverse case. The time constants of the gain filter are fixed at zero. In all four cases, threshold and ratio are fixed at -32 dBFS and 4:1, respectively;
- Figure 5 shows RMSE as a function of typical attack and release times using a peak (upper row) or an RMS amplitude detector (lower row). In the left column, the attack time of the gain filter is varied while the release time is held constant. The right column shows the reverse case. The time constants of the envelope filter are fixed at zero. In all four cases, threshold and ratio are fixed at -32 dBFS and 4:1, respectively; and
- Figure 6 shows RMSE as a function of threshold relative to the signal's average loudness level (left column) and compression ratio (right column) using a peak (upper row) or an RMS amplitude detector (lower row). The time constants are: $\tau_v$ = 5 ms, $\tau_{g,att}$ = 20 ms, and $\tau_{g,rel}$ = 1 s.
Hereafter, we show how a dynamic nonlinear time-variant operator, such as a dynamic range compressor, can be inverted using an explicit signal model. By knowing the model parameters that were used for compression, one is able to recover the original uncompressed signal from a "broadcast" signal with high numerical accuracy and very low computational complexity. A compressor-decompressor scheme is worked out and described in detail. The approach is evaluated on real-world audio material with great success.
In the following, we provide a brief introduction to dynamic range compression and present the compressor model upon which our considerations are based. The data model, the formulation of the problem, and the pursued approach are described next. The inversion is then discussed. Afterwards, we illustrate how an integral step of the inversion procedure, namely the search for the zero-crossing of a nonlinear function, can be solved in an iterative manner by means of linearization. Some other compressor features are then discussed. The complete algorithm is given in the form of pseudocode and its performance is evaluated for different compressor settings.
The digital audio signal considered here is an audio signal such as a piece of music.
Dynamic range compression
Dynamic range compression or simply "compression" is a sound processing technique that attenuates loud sounds and/or amplifies quiet sounds, which in consequence leads to a reduction of an audio signal's dynamic range. (In audio, dynamic range is the difference between the loudest and quietest sounds measured in decibel.) In what follows, we will use the word "compression" having "downward" compression in mind. (The invention is likewise applicable to upward compression.) Downward compression means attenuating sounds above a certain threshold while leaving sounds below the threshold unchanged. A sound engineer might use a compressor to reduce the dynamic range of source material for purposes of aesthetics, intelligibility, recording or broadcast limitations.
Figure 1 shows a basic broadband compressor model 2 (feed forward). It illustrates the basic compressor model from [11, ch. 2] amended by a switchable RMS/peak detector in the side chain to make it compatible with the compressor/limiter model in [12, p. 106]. We will hereafter restrict our considerations to this basic model, although the invention is intended as a general approach. First, the input signal x(n) is split and a copy is sent to the side chain 4. The detector 6 then calculates the magnitude or level of the sidechain signal using the root mean square (RMS) or peak as a measure for how loud a sound is [12, p. 107]. The detector's temporal behavior is controlled by the attack and release parameters. In the comparator 8, the sound level is compared with the threshold level, and in case it exceeds the threshold a scale factor is calculated in calculator 10 which corresponds to the ratio of input level to output level. The knee parameter determines how quickly the compression ratio is reached. At the end of the side chain, the scale factor is fed to a smoothing filter 12 that yields the gain. The response of the filter is controlled by another set of attack and release parameters. Finally, the gain control 14 applies the smoothed gain to the input signal and adds a fixed amount of makeup gain to bring the output signal y(n) to a desired level.
Such a broadband compressor operates on the input signal's full bandwidth, treating all frequencies from zero through the highest frequency equally. A detailed overview of all sidechain controls of a basic gain computer is given in [11, ch. 3].
Data model, problem formulation and proposed solution
A. Data Model and Problem Formulation
The data model used is based upon the compressor of Figure 1. The following simplifications are additionally made: the knee parameter ("hard" knee) and the makeup gain (fixed at 0 dB) are ignored. The compressor is further deemed to be a single-input single-output (SISO) system, that is, both the input and the output are single-channel signals. What follows is a description of each block by means of a dedicated function.
The RMS/peak detector as well as the gain computer build upon a first-order (one-pole) lowpass filter. The sound level or envelope v(n) of the input signal x(n) at time instant n, n being an integer, is obtained by
$\tilde{x}(n) = \beta\,|x(n)|^p + \bar{\beta}\,\tilde{x}(n-1)$ with $p \in \{1, 2\}$ and $\bar{\beta} = 1 - \beta$,
$v(n) = \sqrt[p]{\tilde{x}(n)}$, (1)
where p = 2 represents an RMS detector, and p = 1 a peak detector. The non-zero smoothing factor β, 0 < β ≤ 1, may take on different values, $\beta_{att}$ or $\beta_{rel}$, depending on whether the detector is in attack or release phase. The condition for the level detector to enter the attack phase and to choose $\beta_{att}$ over $\beta_{rel}$ is
|x(n) | > v(n - 1 ). (2)
A formula that converts a time constant τ into a smoothing factor is given in [12, p. 109]:
$\beta = 1 - \exp[-2.2 / (f_s \cdot \tau_v)]$,
where fs is the sampling frequency. The static nonlinearity in the gain computer is usually modeled in the logarithmic domain as a continuous piecewise linear function:
$F(n) = -S \cdot [V(n) - L]$ if $V(n) > L$,
$F(n) = 0$ otherwise, (3)
where S is the slope, $V(n) = 20 \log_{10} v(n)$, and L is the threshold in decibels. The slope is further derived from the desired compression ratio R according to
S = 1 - 1/R (4)
Equation (3) is equivalently expressed in the linear domain as
$f(n) = \kappa\,v^{-S}(n)$ if $v(n) > l$,
$f(n) = 1$ otherwise, (5)
where $l = 10^{L/20}$, $\kappa = l^S$, and f is the linear scale factor before filtering.
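The conversion formula and the static curve in (3)-(5) lend themselves to a short numerical check. The following Python sketch is illustrative only: the function names are hypothetical, fs is taken in kHz and τ in ms, and a zero time constant is assumed to mean "no smoothing".

```python
import math

def smoothing_factor(tau_ms: float, fs_khz: float) -> float:
    """1 - exp[-2.2 / (fs * tau)] with fs in kHz and tau in ms."""
    if tau_ms <= 0.0:
        return 1.0  # assumption: a zero time constant disables smoothing
    return 1.0 - math.exp(-2.2 / (fs_khz * tau_ms))

def scale_factor(v: float, threshold_db: float, ratio: float) -> float:
    """Linear-domain scale factor f of (5) for a hard knee."""
    l = 10.0 ** (threshold_db / 20.0)   # linear threshold
    s = 1.0 - 1.0 / ratio               # slope S from (4)
    kappa = l ** s
    return kappa * v ** (-s) if v > l else 1.0

# Example: threshold -20 dB, ratio 4:1, envelope value 0.5 (about -6 dB)
f_lin = scale_factor(0.5, threshold_db=-20.0, ratio=4.0)
f_db_linear_domain = 20.0 * math.log10(f_lin)
# The same gain computed in the logarithmic domain with (3)
f_db_log_domain = -(1.0 - 1.0 / 4.0) * (20.0 * math.log10(0.5) - (-20.0))
print(f_db_linear_domain, f_db_log_domain)
```

Both routes should give the same gain in dB, which is the equivalence between (3) and (5) stated above.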
The smoothed gain g is then calculated as the exponentially-weighted moving average,
$g(n) = \gamma\,f(n) + \bar{\gamma}\,g(n-1)$ with $\gamma \in \{\gamma_{att}, \gamma_{rel}\}$ and $\bar{\gamma} = 1 - \gamma$, (6)
where the decision for the gain computer to choose the attack smoothing factor $\gamma_{att}$ instead of $\gamma_{rel}$ is subject to
$f(n) < g(n-1)$. (7)
The output signal y(n) is finally obtained by multiplying the above gain with the input signal x(n):
y(n) = g(n) . x(n) (8)
Due to the fact that the gain g is strictly positive, 0 < g ≤ 1, it follows that
sgn(y) = sgn(x), (9)
where sgn is the signum or sign function. In consequence, it is convenient to factorize the input signal as a product of the sign and the modulus according to
$x(n) = \operatorname{sgn}[x(n)] \cdot |x(n)|$ (10)
with sgn(x) being known due to (9).
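The data model of equations (1) to (8) can be sketched as a short forward compressor. This is only an illustration under stated assumptions: the function name and argument names are hypothetical, time constants are in ms and fs in kHz, a zero time constant is taken to mean no smoothing, and the initial state (zero envelope, unit gain) is assumed rather than taken from the source.

```python
import math

def compress(x, L_db, R, p, tau_v_att, tau_v_rel, tau_g_att, tau_g_rel, fs_khz):
    """Feed-forward compressor following (1)-(8); hard knee, no makeup gain."""
    def sf(tau_ms):
        # smoothing factor; a zero time constant is assumed to mean no smoothing
        return 1.0 if tau_ms <= 0.0 else 1.0 - math.exp(-2.2 / (fs_khz * tau_ms))

    l = 10.0 ** (L_db / 20.0)      # linear threshold
    S = 1.0 - 1.0 / R              # slope (4)
    kappa = l ** S
    x_tilde, g = 0.0, 1.0          # assumed initial state
    y, gains = [], []
    for xn in x:
        a = abs(xn)
        # attack if |x(n)| > v(n-1), see (2)
        beta = sf(tau_v_att) if a > x_tilde ** (1.0 / p) else sf(tau_v_rel)
        x_tilde = beta * a ** p + (1.0 - beta) * x_tilde        # (1)
        v = x_tilde ** (1.0 / p)
        f = kappa * v ** (-S) if v > l else 1.0                 # (5)
        gamma = sf(tau_g_att) if f < g else sf(tau_g_rel)       # (7)
        g = gamma * f + (1.0 - gamma) * g                       # (6)
        y.append(g * xn)                                        # (8)
        gains.append(g)
    return y, gains

# A loud test tone (RMS detector, settings in the spirit of Figure 3)
x = [0.8 * math.sin(2.0 * math.pi * 0.01 * n) for n in range(2000)]
y, gains = compress(x, -20.0, 4.0, 2, 5.0, 5.0, 1.6, 17.0, 44.1)
```

Because the gain stays within (0, 1], the sign of every output sample matches that of the input sample, which is the property (9) relies on.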
The problem to be solved can be formulated as follows: given the compressed signal y(n) and the model parameters θ, recover the modulus of the original signal |x(n)| from |y(n)| based on θ.
B. Solution
The output of the side chain, that is the gain of |x(n)|, given θ, $\tilde{x}(n-1)$, and g(n-1), may be written as
$g(n) = G[\,|x(n)| \mid \theta, \tilde{x}(n-1), g(n-1)]$. (11)
In (11), G denotes a nonlinear dynamic operator that maps the modulus of the input signal |x(n)| onto a sequence of instantaneous gain values g(n) according to the compressor model represented by θ. The compressor shall be completely described by the model parameters listed below.
L The threshold in dB
R The compression ratio dBin : dBout
p The detector type (peak or RMS)
$\tau_{v,att}$ The attack time of the envelope filter in ms
$\tau_{v,rel}$ The release time of the envelope filter in ms
$\tau_{g,att}$ The attack time of the gain filter in ms
$\tau_{g,rel}$ The release time of the gain filter in ms.
Using (11), (8) can be solved for |x(n)|, yielding
Figure imgf000011_0001
subject to invertibility of G. In order to solve the above equation, one requires the knowledge of g(n), which is unavailable. However, since g is a function of |x(n)|, we can express |y(n)| as a function of one independent variable |x(n)|, and in that manner we obtain an equation with a single unknown:
Figure imgf000011_0002
where H represents the entire compressor. If H is invertible, i.e. bijective for all n, |x(n)| can be obtained from |y(n)| by
Figure imgf000011_0003
$|x(n)| = |y(n)|$ otherwise. (13)
And yet, since v(n) is unknown, the condition for applying decompression must be predicted from y(n), $\tilde{x}(n-1)$, and g(n-1), and so must the condition for toggling between the attack and release phases. Depending on the quality of the prediction, the recovered modulus |z(n)| may differ somewhat at transition points from the original modulus |x(n)|, so that in the end
Figure imgf000011_0004
In the next section, it is shown how such an inverse compressor or decompressor is derived.
Inversion of dynamic range compression
A. Characteristic function
For simplicity, we choose the instantaneous envelope value v(n) instead of |x(n)| as the independent variable in (12). The relation between the two items is given by (1). From (5), (6) and (8), when v(n) > l,
$|y(n)| = [\gamma\,\kappa\,v^{-S}(n) + \bar{\gamma}\,g(n-1)] \cdot |x(n)|$. (15)
From (1),
Figure imgf000012_0001
(17)
or equivalently (note that β ≠ 0 by definition)
Figure imgf000012_0002
Moreover, (18) has a unique solution if G and also H are invertible. Moving the expression on the left-hand side over to the right-hand side, we may define
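$z_p(v) = [\gamma\,\kappa\,v^{-S}(n) + \bar{\gamma}\,g(n-1)]^p \cdot [v^p(n) - \bar{\beta}\,\tilde{x}(n-1)] - \beta\,|y(n)|^p$, (19)
which shall be termed the characteristic function. The zero-crossing of $z_p(v)$ hence represents the sought-after envelope value v(n). Once v(n) is found (see the section on the numerical solution of the characteristic function), the current values of $\tilde{x}$, |x|, and g are updated as per
$\tilde{x}(n) = v^p(n)$,
$|x(n)| = \sqrt[p]{[\tilde{x}(n) - \bar{\beta}\,\tilde{x}(n-1)] / \beta}$, (20)
$g(n) = |y(n)| / |x(n)|$,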
and the decompressed sample is then calculated as
Figure imgf000012_0003
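As a plausibility check of (19) and (20), the sketch below runs one forward step of the compressor for a known |x(n)| and then verifies that the true envelope value is a zero of the characteristic function and that the update recovers |x(n)|. The function names and all numerical values are hypothetical and chosen for illustration only.

```python
def char_fn(v, y_abs, x_tilde_prev, g_prev, beta, gamma, kappa, S, p):
    """Characteristic function z_p(v) as in (19)."""
    gain = gamma * kappa * v ** (-S) + (1.0 - gamma) * g_prev
    return gain ** p * (v ** p - (1.0 - beta) * x_tilde_prev) - beta * y_abs ** p

def update(v, y_abs, x_tilde_prev, beta, p):
    """State update (20) once the envelope value v is known."""
    x_tilde = v ** p
    x_abs = ((x_tilde - (1.0 - beta) * x_tilde_prev) / beta) ** (1.0 / p)
    g = y_abs / x_abs
    return x_tilde, x_abs, g

# Hypothetical state and parameters (RMS detector, -20 dB threshold, 4:1)
p, beta, gamma = 2, 0.2, 0.3
l = 10.0 ** (-20.0 / 20.0)
S = 1.0 - 1.0 / 4.0
kappa = l ** S
x_tilde_prev, g_prev, x_abs_true = 0.04, 0.6, 0.5

# One forward step of the compressor, equations (1), (5), (6), (8)
x_tilde_true = beta * x_abs_true ** p + (1.0 - beta) * x_tilde_prev
v_true = x_tilde_true ** (1.0 / p)          # about 0.286, above the threshold
g_true = gamma * kappa * v_true ** (-S) + (1.0 - gamma) * g_prev
y_abs = g_true * x_abs_true

z_at_true_v = char_fn(v_true, y_abs, x_tilde_prev, g_prev, beta, gamma, kappa, S, p)
_, x_abs_rec, g_rec = update(v_true, y_abs, x_tilde_prev, beta, p)
```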
B. Attack-release phase toggle
1) Envelope smoothing
In case a peak detector is in use, β takes on two different values. The condition for the attack phase is then given by (2) and is equivalent to
$|x(n)|^p > \tilde{x}(n-1)$. (22)
Assuming that the past value of $\tilde{x}$ is known at time n, what we need to do is to express the unknown |x(n)| in terms of |y(n)|, such that the above inequality still holds true. If γ is rather small, γ ≤ 0.01 ≪ 1, or equivalently if $\tau_g$ is sufficiently large, $\tau_g$ ≥ 0.5 ms at 44.1-kHz sampling, the term γf(n) in (15) is negligible, so that (15) can be approximated as
Figure imgf000012_0004
Solving (23) for |x(n)| and plugging the result into (22), we obtain
Figure imgf000013_0001
If (24) is true, the detector is assumed to be in attack phase.
2) Gain smoothing
Just like the peak detector, the gain smoothing filter may be in either attack or release phase. The necessary condition for the attack phase in (7) may also be formulated as
$v(n) > [\kappa / g(n-1)]^{1/S}$ with $v(n) > l$. (25)
But since the current envelope value is unknown, we need to substitute v(n) in the above inequality by something that we know. With this in mind we rewrite (15) as
Figure imgf000013_0002
Provided that f(n) < g(n - 1), and due to the fact that 0 < γ ≤ 1, the expression in square brackets in (26) is smaller than one, and thus during attack
$|y(n)| < g(n-1) \cdot |x(n)|$. (27)
Substituting |x(n)| by $\{[v^p(n) - \bar{\beta}\,\tilde{x}(n-1)] / \beta\}^{1/p}$ using (20), and solving (27) for v(n) results in
$v(n) > \{\beta\,[|y(n)| / g(n-1)]^p + \bar{\beta}\,\tilde{x}(n-1)\}^{1/p}$. (28)
If v(n) in (25) is substituted by the expression on the right-hand side of (28), (25) still holds true, so that the following sufficient condition is used to predict the attack phase of the gain filter:
$\{\beta\,[|y(n)| / g(n-1)]^p + \bar{\beta}\,\tilde{x}(n-1)\}^{1/p} > [\kappa / g(n-1)]^{1/S}$. (29)
Note that the values of all variables are known whenever (29) is evaluated.
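The gain-filter toggle can be illustrated with a small sketch that evaluates the predictor from known quantities only and compares it with the true phase of a forward compressor step. The inequality is written out as derived from (25), (27) and (20); function names and numerical values are hypothetical, and the example uses a peak detector without envelope smoothing (β = 1) to keep the arithmetic transparent.

```python
def gain_filter_in_attack(y_abs, x_tilde_prev, g_prev, beta, kappa, S, p):
    """Predict the attack phase of the gain filter from known values only."""
    lhs = (beta * (y_abs / g_prev) ** p + (1.0 - beta) * x_tilde_prev) ** (1.0 / p)
    rhs = (kappa / g_prev) ** (1.0 / S)
    return lhs > rhs

def forward_sample(x_abs, x_tilde_prev, g_prev, beta, gamma_att, gamma_rel, l, kappa, S, p):
    """One compressor step: returns |y(n)| and the true phase, see (1), (5)-(8)."""
    v = (beta * x_abs ** p + (1.0 - beta) * x_tilde_prev) ** (1.0 / p)
    f = kappa * v ** (-S) if v > l else 1.0
    attack = f < g_prev
    gamma = gamma_att if attack else gamma_rel
    return (gamma * f + (1.0 - gamma) * g_prev) * x_abs, attack

l = 10.0 ** (-20.0 / 20.0)
S = 1.0 - 1.0 / 4.0
kappa = l ** S
p, beta, gamma_att, gamma_rel = 1, 1.0, 0.5, 0.1

# Case 1: loud sample after silence (previous gain 1.0), expected to be attack
y1, true1 = forward_sample(0.8, 0.0, 1.0, beta, gamma_att, gamma_rel, l, kappa, S, p)
pred1 = gain_filter_in_attack(y1, 0.0, 1.0, beta, kappa, S, p)

# Case 2: moderate sample while the gain is still low (0.2), expected to be release
y2, true2 = forward_sample(0.3, 0.0, 0.2, beta, gamma_att, gamma_rel, l, kappa, S, p)
pred2 = gain_filter_in_attack(y2, 0.0, 0.2, beta, kappa, S, p)
```

With envelope smoothing switched on (β < 1) the prediction can occasionally differ from the true phase near the toggle point, which is what the error rate of the attack-release toggle reported in the evaluation measures.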
C. Envelope Predictor
An instantaneous estimate of the envelope value v(n) is required not only to predict when compression is active, formally v(n) > l according to (5), but also to initialize the iterative search algorithm described in the next section. We resort once more to (15) and note that in the opposite case where v(n) ≤ l, f(n) = 1, and so
$|x(n)| = |y(n)| / [\gamma + \bar{\gamma}\,g(n-1)]$. (30)
The sound level of the input signal at time n is therefore
$v(n) = \{\beta\,[|y(n)| / (\gamma + \bar{\gamma}\,g(n-1))]^p + \bar{\beta}\,\tilde{x}(n-1)\}^{1/p}$, (31)
which must be greater than the threshold for compression to set in, whereas β and γ are selected based on (24) and (29), respectively.
Numerical solution of the characteristic function
An approximate solution to the characteristic function can be found, e.g., by means of linearization. The estimate from (31) may moreover serve as a starting point for an iterative search of an optimum:
Figure imgf000014_0001
The criterion for optimality is further chosen as the deviation of the characteristic function from zero, initialized to
$\Delta_{init} = |z_p(v_{init})|$. (32)
We may thereupon approximate (19) at a given point using the equation of a straight line, z = m.v + c, where m is the slope and c is the z-intercept. The zero-crossing is characterized by the equation
Figure imgf000014_0002
as is shown in figure 2. This figure shows the graphical illustration for the iterative search of the zero-crossing. The new and hopefully better estimate of the optimal v is hence found as
Figure imgf000014_0003
If $v_{i+1}$ is less optimal than $v_i$, the iteration is stopped and $v_i$ is the final estimate. The iteration is also stopped if $\Delta_{i+1}$ is smaller than some ε. In the latter case, $v_{i+1}$ has the optimal value with respect to the chosen criterion. Otherwise, $v_i$ is set to $v_{i+1}$ and $\Delta_i$ is set to $\Delta_{i+1}$ after every iteration step and the procedure is repeated until $v_{i+1}$ has converged to a more optimal value.
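The search described above can be sketched as follows. The sketch makes a few assumptions that are not spelled out in the text: the slope of the straight line is taken from a finite difference with step equal to the current deviation, an iteration cap and a positivity guard are added for safety, and all names and numerical values are hypothetical. The starting point is the envelope estimate obtained from the uncompressed-case relation between |x(n)| and |y(n)|.

```python
def envelope_estimate(y_abs, x_tilde_prev, g_prev, beta, gamma, p):
    """Starting point of the search: envelope predicted as if f(n) = 1."""
    x_abs = y_abs / (gamma + (1.0 - gamma) * g_prev)
    return (beta * x_abs ** p + (1.0 - beta) * x_tilde_prev) ** (1.0 / p)

def charfzero(z, v_init, eps=1e-12, max_iter=50):
    """Iterative search for the zero-crossing of z by repeated linearization."""
    v, delta = v_init, abs(z(v_init))
    for _ in range(max_iter):               # max_iter is an added safety guard
        if delta < eps:
            break                           # deviation from zero small enough
        m = (z(v + delta) - z(v)) / delta   # slope of the local straight line
        if m == 0.0:
            break
        v_next = v - z(v) / m               # zero-crossing of that line
        if v_next <= 0.0:
            break                           # guard: the envelope must stay positive
        delta_next = abs(z(v_next))
        if delta_next > delta:
            break                           # v_next is less optimal: keep v
        v, delta = v_next, delta_next
    return v

# Illustrative case: peak detector, no envelope smoothing, -20 dB threshold, 4:1
p, beta, gamma = 1, 1.0, 0.1
l = 10.0 ** (-20.0 / 20.0)
S = 0.75
kappa = l ** S
x_tilde_prev, g_prev, x_abs = 0.0, 1.0, 0.8
y_abs = (gamma * kappa * x_abs ** (-S) + (1.0 - gamma) * g_prev) * x_abs

def z(v):
    """Characteristic function for the state above."""
    gain = gamma * kappa * v ** (-S) + (1.0 - gamma) * g_prev
    return gain ** p * (v ** p - (1.0 - beta) * x_tilde_prev) - beta * y_abs ** p

v_start = envelope_estimate(y_abs, x_tilde_prev, g_prev, beta, gamma, p)
v_zero = charfzero(z, v_start)
```

In this example the starting value lies slightly below the true envelope value of 0.8, and a handful of iterations is enough to close the gap.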
General remarks
A. Stereo linking
When dealing with stereo signals, one might want to apply the same amount of gain reduction to both channels to prevent image shifting. This is achieved through stereo linking. One way is to calculate the required amount of gain reduction for each channel independently and then apply the larger amount to both channels. The question which arises in this context is from which of the two channels the gain was derived. One conceivable solution to this ambiguity would be to signal which of the channels carries the applied gain. One could then decompress the marked sample and use its gain for the other channel. Although very simple to implement, this approach incurs an additional data rate of 44.1 kbps at 44.1-kHz sampling. A rate-efficient alternative that comes with a higher computational cost is realized in the following way: First, one decompresses both the left and the right channel independently and in so doing one obtains two estimates $z_l(n)$ and $z_r(n)$, where subscript l shall denote the left channel and subscript r the right channel, respectively. In a second step, one calculates the compressed values of $z_l(n)$ and $z_r(n)$ and selects the channel for which H[z(n)] = y(n) holds true. In a final step, one updates the remaining variables using the gain of the selected channel.
B. Lookahead
A compressor with a look-ahead function, i.e. with a delay in the main signal path as in [12, p. 106], uses past input samples to calculate the output sample. Since future input samples, which are unavailable, would then be required to invert the process, the inversion is rendered impossible. g(n) and x(n) must thus be in sync for the above approach to be applied.
C. Clipping and limiting
Another point worth mentioning is that "hard" clipping and "brick-wall" limiting are special cases of compression with at least the attack time set to zero and the compression ratio set to ∞:1. The static input-output characteristic, in that particular case, is a many-to-one mapping, which by definition is noninvertible.
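The loss of invertibility in the limiting case can be seen directly from the static curve. In the sketch below, which uses hypothetical names and values, the output level is the envelope value multiplied by the scale factor of the hard-knee gain computer; with a ratio of ∞:1 the slope becomes S = 1 and every level above the threshold is mapped onto the threshold itself.

```python
def output_level(v, l, S):
    """Static output level v * f(v) of the hard-knee gain computer."""
    return l ** S * v ** (1.0 - S) if v > l else v

l = 10.0 ** (-20.0 / 20.0)          # threshold of -20 dB
levels_in = [0.2, 0.5, 0.9]

# Ratio 4:1 (S = 0.75): distinct input levels stay distinct
levels_out_4to1 = [output_level(v, l, 0.75) for v in levels_in]

# Limiter, ratio infinity:1 (S = 1): all levels collapse onto the threshold
levels_out_limiter = [output_level(v, l, 1.0) for v in levels_in]
```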
The algorithm
The complete algorithm is divided into three parts each of them given as pseudocode further below. Algorithm 1 outlines the compressor that corresponds to the model described above. Algorithm 2 illustrates the decompressor, and the iterative search for the numerical solution of the characteristic function is finally summarized in Algorithm 3. The parameter fs represents the sampling frequency in kHz.
Algorithm 1 : The compressor
function COMP(x_n, θ, f_s)
  x̃_n ← 0
  for n ← 1, N do
    Figure imgf000016_0001
    end if
    if v_n > l then
      f_n ← κ · v_n^(-S)
    else
      f_n ← 1
    end if
    if f_n < g_n then
      Figure imgf000016_0002
    else
      γ ← 1 - exp[-2.2 / (f_s · τ_g,rel)]
    end if
    g_n ← γ · f_n + (1 - γ) · g_n
    y_n ← g_n · x_n
  end for
  return y_n
end function
Algorithm 2 : The decompressor
function DECOMP(y_n, θ, ε, f_s)
  for n ← 1, N do
    Figure imgf000017_0001
    else
      β ← 1 - exp[-2.2 / (f_s · τ_v,att)]
    end if
    if |y_n| > {[(κ / g_n)^(p/S) - (1 - β) · x̃_n] / β}^(1/p) · g_n then
      γ ← 1 - exp[-2.2 / (f_s · τ_g,att)]
    else
      Figure imgf000017_0002
      Figure imgf000017_0003
  end for
  return x_n
end function
Algorithm 3 : The iterative search of the zero-crossing
function CHARFZERO(v_n, ε)
  repeat
    Figure imgf000017_0004
    [z_p(v_i + Δ_i) - z_p(v_i)]
    return v_i
    Figure imgf000017_0005
  return v_i
end function
Performance evaluation
A. Performance metrics
To evaluate the inverse approach, the following quantities are measured: the root-mean-square error (RMSE),
Figure imgf000018_0001
given in decibels relative to full scale (dBFS), the perceptual similarity between the original and decompressed signal, and the execution time of the decompressor relative to real time (RT). Furthermore, we present the percentage of compressed samples, the mean number of iterations until convergence per compressed sample, the error rate of the attack-release toggle for the gain smoothing filter, and finally the error rate of the envelope predictor. The perceptual similarity is assessed by PEMO-Q [13], [14] with PSMt as metric. The simulations are run in MATLAB on an Intel Core i5-520M CPU.
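For orientation, an RMSE figure in dBFS could be computed as in the sketch below. It assumes that full scale corresponds to an amplitude of 1.0 and that the error is averaged over all samples; the exact normalization used for the reported figures is not restated here, and the function name is hypothetical.

```python
import math

def rmse_dbfs(x, z):
    """Root-mean-square error between x and z in dB relative to full scale (1.0)."""
    mse = sum((a - b) ** 2 for a, b in zip(x, z)) / len(x)
    # 20*log10(sqrt(mse)) equals 10*log10(mse); a zero error maps to -infinity
    return 10.0 * math.log10(mse) if mse > 0.0 else float("-inf")

# A constant deviation of 0.1 corresponds to -20 dBFS
example = rmse_dbfs([0.0, 0.0, 0.0, 0.0], [0.1, -0.1, 0.1, -0.1])
```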
B. Computational results
Fig. 3 shows the inverse output signal z(n) for a synthetic input signal x(n). It is an illustrative example using an RMS amplitude detector with $\tau_v$ set to 5 ms, a threshold of -20 dBFS (dashed line in the upper right corner), a compression ratio of 4:1, and $\tau_g$ set to 1.6 ms for attack and 17 ms for release, respectively. The inverse signal is obtained from the compressed signal y(n) with an RMSE of -129 dBFS and is visually indistinguishable from the original signal x(n). Due to the fact that the signal envelope is constant most of the time, the error is noticeable only around transition points, which are few.
The decompressor's performance is further evaluated for some commercial compressor presets. The audio material used consists of 12 items covering speech, sung voice, music, and jingles. All items are normalized to -16 LKFS [15]. The ε-value in the break condition of Algorithm 3 is set to $1 \cdot 10^{-12}$. A detailed overview of compressor settings and performance figures is given in Tables I-II. The presented results suggest that the decompressed signal is perceptually indistinguishable from the original: the PSMt value is flawless. This was also confirmed by the authors through informal listening tests.
As can be seen from Table II, the largest inversion error is associated with setting E and the smallest with setting B. For all five settings, the error is larger when an RMS detector is in use. This is partly due to the fact that $z_2(v)$ has a stronger curvature in comparison to $z_1(v)$. By defining the distance in (40) as
Δ = ^ )Ι
it is possible to attain a smaller error for an RMS detector at the cost of a slightly longer runtime. In most cases, the envelope predictor works more reliably than the toggle switch between attack and release. It can also be observed that the choice of time constants seems to have little impact on the decompressor's accuracy. The major parameters that affect the decompressor's performance are L and R, while the threshold is evidently the predominant one: the RMSE strongly correlates with the threshold level.
TABLE I Selected compressor settings
Figure imgf000019_0001
TABLE II
PERFORMANCE FIGURES OBTAINED FOR VARIOUS AUDIO MATERIAL ( 12 ITEMS)
Figure imgf000019_0002
Figs. 4-5 show the inversion error as a function of various time constants. These are in the range of typical attack and release times for a limiter (peak) or compressor (RMS) [12, pp. 109-110]. Figure 4 shows RMSE as a function of typical attack and release times using a peak (upper row) or an RMS amplitude detector (lower row). In the left column, the attack time of the envelope filter is varied while the release time is held constant. The right column shows the reverse case. The time constants of the gain filter are fixed at zero. In all four cases, threshold and ratio are fixed at -32 dBFS and 4:1, respectively. Figure 5 shows RMSE as a function of typical attack and release times using a peak (upper row) or an RMS amplitude detector (lower row). In the left column, the attack time of the gain filter is varied while the release time is held constant. The right column shows the reverse case. The time constants of the envelope filter are fixed at zero. In all four cases, threshold and ratio are fixed at -32 dBFS and 4:1, respectively.
It can be observed that the inversion accuracy depends on the release time of the peak detector and not so much on its attack time for both the envelope and the gain filter, see Figs. 4, 5 (b). For the envelope filter, all error curves exhibit a local dip around a release time of 0.5 s. The error increases steeply below that bound but moderately with larger values. In the proximity of 5 s, the error converges to -130 dBFS. With regard to the gain filter, the error behaves in a reverse manner. The curves in Fig. 5 (b) exhibit a local peak around 0.5 s with a value of -180 dBFS. It can further be observed in Fig. 4 (a) that the curve for $\tau_{v,rel}$ = 1 ms has a dip where $\tau_{v,att}$ is close to 1 ms, i.e. where the modulus of $\beta_{att} - \beta_{rel}$ is minimal. This is also true for Fig. 4 (c) and (d): the lowest error is where the attack and release times are identical.
As a general rule, the error that is due to the attack-release switch is smaller for the gain filter in Fig. 5.
This program may be made available on a telecommunication network in view of downloading it.
The digital audio signal obtained by using the method of the invention may be recorded on a data storage medium so as to obtain a data storage medium comprising data representing the signal.
This signal may also be made available on a telecommunication network in view of downloading it.
The above mentioned medium could be a disc, a hard drive, a flash drive, a CD or a DVD for example.
Conclusion
The invention deals with the problem of finding an inverse to a nonlinear dynamic operator such as a digital compressor. In the proposed approach, we use an explicit signal model to solve the problem. To find the "dry" or uncompressed signal with high accuracy, it is sufficient to know the model parameters. The parameters can e.g. be sent together with the "wet" or compressed signal in the form of metadata as is the case with Dolby Volume and ReplayGain [16]. A new bitstream format is not mandatory, since many digital audio standards, like WAV or MP3, provide means to tag the audio content with "ancillary" data. With the help of the metadata, one can then reverse the compression applied after mixing or before broadcast. This allows the end user to have control over the amount of compression, which may be preferred because the sound engineer has no control over the playback environment or the listener's individual taste.
When the compressor parameters are unavailable, they can possibly be estimated from the compressed signal. This may thus be a direction for future work. Another direction would be to apply the approach to more sophisticated models that include a "soft" knee, parallel and multiband compression, or perform gain smoothing in the logarithmic domain, see [11], [12], [17], [18] and references therein.
The figures suggest that the decompressor is real-time capable, which can pave the way for exciting new applications. One such application could be the restoration of dynamics in over-compressed audio or else the accentuation of transient components, see [19]-[21], by an adaptively tuned decompressor that has no prior knowledge of the compressor parameters.
We have shown how an inverse to a nonlinear dynamic operator such as a digital compressor can be derived.
The invention makes it possible to obtain an audio signal with negligible errors, i.e. one that is perceptually indistinguishable from the original uncompressed signal in its "artistic" properties.
Obviously, numerous modifications to the compressor model can be made without leaving the scope of the invention. The invention may also be used with a step of estimation of the model parameters from the compressed signal, when the model parameters are unknown. It could also be used with more sophisticated models that include a soft knee, parallel and multiband compression, or perform gain smoothing in the logarithmic domain, see [11]-[14] and references therein.
The invention may also be used as an adaptive digital audio effect. Indeed, the decompressor may be used on a digital audio signal which was compressed using an unknown compressor different from the above described compressor, or which was not compressed. The parameters of the decompressor are then adapted to the input signal.
The search for the zero-crossing could also be carried out in other ways, using known functions.
Another description of the invention is given in the following pages.
Model-Based Inversion of Dynamic Range Compression
Stanislaw Gorlow, Graduate Student Member, IEEE and Joshua D. Reiss, Member, IEEE
This work was partially funded by the "Agence Nationale de la Recherche" within the scope of the DReaM project (ANR-09-CORD-006) as well as the laboratory with which the first author is affiliated (see below) as part of the "mobilite juniors" program.
S. Gorlow is with the Computer Science Research Laboratory of Bordeaux (LaBRI), CNRS, Bordeaux 1 University, 33405 Talence Cedex, France (e-mail: stanislaw.gorlow@labri.fr).
Abstract: In this work it is shown how a dynamic nonlinear time-variant operator, such as a dynamic range compressor, can be inverted using an explicit signal model. By knowing the model parameters that were used for compression one is able to recover the original uncompressed signal from a "broadcast" signal with high numerical accuracy and very low computational complexity. A compressor-decompressor scheme is worked out and described in detail. The approach is evaluated on real-world audio material with great success.
Index Terms: Dynamic range compression, inversion, model-based, reverse audio engineering.
I. INTRODUCTION
SOUND or audio engineering is an established discipline employed in many areas that are part of our everyday life without us taking notice of it. But not many know how the audio was produced. If we take sound recording and reproduction or broadcasting as an example, we may imagine that a prerecorded signal from an acoustic source is altered by an audio engineer in such a way that it corresponds to certain criteria when played back. The number of these criteria may be large and usually depends on the context. In general, the said alteration of the input signal is a sequence of numerous forward transformations, the reversibility of which is of little or no interest. But what if one wished to do exactly this, that is to reverse the transformation chain, and what is more, in a systematic and repeatable manner?
The research objective of reverse audio engineering is twofold: to identify the transformation parameters given the input and the output signals, as in [1], and to regain the input signal that goes with the output signal given the transformation parameters. In both cases, an explicit signal model is mandatory. The latter case might seem trivial, but only if the applied transformation is linear and orthogonal and as such perfectly invertible. Yet the forward transform is often neither linear nor invertible. This is the case for dynamic range compression (DRC), which is commonly described by a dynamic nonlinear time-variant system. The classical linear time-invariant (LTI) system theory does not apply here, so a tailored solution to the problem at hand must be found instead. At this point, we also like to highlight the fact that neither Volterra nor Wiener model approaches [2]-[4] offer a solution, and neither do describing functions [5], [6]. These are useful tools when identifying a time-invariant or a slowly varying nonlinear system or analyzing the limit cycle behavior of a feedback system with a static nonlinearity.
A method to invert dynamics compression is described in [7], but it requires an instantaneous gain value to be transmitted for each sample of the compressed signal. To provide a means to control the data rate, the gain signal is subsampled and also entropy coded. This approach is highly inefficient as it does not rely on a gain model and is extremely generic.
On the other hand, transmitting the uncompressed signal in conjunction with a few typical compression parameters like threshold, ratio, attack, and release would require a much smaller capacity and yield the best possible signal quality with regard to any thinkable measure. A more realistic scenario is when the uncompressed signal is not available on the consumer side. This is usually the case for studio music recordings and broadcast material where the listener is offered a signal that is meant to sound "good" to everyone. However, the loudness war [8] has resulted in over-compressed audio material. Over-compression makes a song lose its artistic features like excitingness or liveliness and desensitizes the ear thanks to a louder volume. There is a need to restore the original signal's dynamic range and to experience audio free of compression.
In addition to the normalization of the program's loudness level, the Dolby solution [9], [10] also includes dynamic range expansion. The expansion parameters that help reproduce the original program's dynamic range are tuned on the broadcaster side and transmitted as metadata together with the broadcast signal. This is a very convenient solution for broadcasters, not least because the metadata is quite compact. Dynamic range expansion is yet another forward transformation rather than a true inversion.
Evidently, none of the previous approaches satisfy the reverse engineering objective of this work. The goal of the present work, hence, is to invert dynamic range compression, which is a vital element not only in broadcasting but also in mastering. The paper is organized as follows. Section II provides a brief introduction to dynamic range compression and presents the compressor model upon which our considerations are based. The data model, the formulation of the problem, and the pursued approach are described next in Section III. The inversion is discussed in detail in Section IV. Section V illustrates how an integral step of the inversion procedure, namely the search for the zero-crossing of a non-
J. D. Reiss is with the Centre for Digital Music (C4DM), Queen Mary, Unilinear function, can be solved in an iterative manner by means versity of London, London El 4NS, UK (e-mail: josh.reiss@eecs.qmul.ac.uk). of linearization. Some other compressor features are discussed 2 in Section VI. The complete algorithm is given in the form of output are single-channel signals. What follows is a description pseudocode in Section VII and its performance is evaluated for of each block by means of a dedicated function.
different compressor settings in Section VIII. Conclusions are The RMS/peak detector as well as the gain computer build drawn in Section IX, where some directions for future work upon a first-order (one-pole) lowpass filter. The sound level or are mentioned. envelope v(n) of the input signal x(n) is obtained by
II. DYNAMIC RANGE COMPRESSION with p {1, 2}, (1)
Figure imgf000023_0001
Dynamic range compression or simply "compression" is
where p = 2 represents an RMS detector, and p = 1 a peak a sound processing technique that attenuates loud sounds
detector. The non-zero smoothing factor β, 0 < β 1, β = and/or amplifies quiet sounds, which in consequence leads to
1— β, may take on different values, 3att or /¾ei , depending a reduction of an audio signal's dynamic range. The latter
on whether the detector is in the attack or release phase. The is defined as the difference between the loudest and quietest
condition for the level detector to enter the attack phase and sound measured in decibel. In the following, we will use
to choose 3att over /3rei is
the word "compression" having "downward" compression in
mind, though the discussed approach is likewise applicable to
Figure imgf000023_0002
I > v(n— 1) . (2) "upward" compression. Downward compressing means atten¬
A formula that converts a time constant r into a smoothing uating sounds above a certain threshold while leaving sounds
factor is given in [12, p. 109], so e.g.
below the threshold unchanged. A sound engineer might use a
compressor to reduce the dynamic range of source material for β = 1 - exp [-2.2/(/s -
Figure imgf000023_0003
purposes of aesthetics, intelligibility, recording or broadcast
where /s is the sampling frequency. The static nonlinearity limitations.
in the gain computer is usually modeled in the logarithmic
Fig. 1 illustrates the basic compressor model from [11,
domain as a continuous ise linear function:
ch. 2] amended by a switchable RMS/peak detector in the side
chain making it compatible with the compressor/limiter model -S
Figure imgf000023_0004
- L] if V(n) > L from [12, p. 106]. We will hereafter restrict our considerations F(n) = (3)
0 otherwise to this basic model, as the purpose of the present work is to
demonstrate a general approach rather than a solution to a where S is the slope, V(n) = 20 log10 v(n), and L is the specific problem. First, the input signal is split and a copy threshold in decibel. The slope is further derived from the is sent to the side chain. The detector then calculates the desired compression ratio R according to
magnitude or level of the sidechain signal using the root
mean square (RMS) or peak as a measure for how loud a S = l - li- (4) sound is [12, p. 107]. The detector's temporal behavior is
Equation (3) is equivalently expressed in the linear domain as controlled by the attack and release parameters. The sound
level is compared with the threshold level and, for the case κυ~ (n) if v(n) > I
it exceeds the threshold, a scale factor is calculated which f (n) = (5)
1 otherwise
corresponds to the ratio of input level to output level. The
knee parameter determines how quick the compression ratio is where I = ΙΟ1-/20, κ = Is , and / is the linear scale factor reached. At the end of the side chain, the scale factor is fed to a before filtering. The smoothed gain g is then calculated as the smoothing filter that yields the gain. The response of the filter exponentially-weighted moving average,
is controlled by another set of attack and release parameters.
g(n) = lf in) + igin - 1) with 7 G {7att , 7rei } , (6) Finally, the gain control applies the smoothed gain to the input
signal and adds a fixed amount of makeup gain to bring the where the decision for the gain computer to choose the attack output signal to a desired level. Such a broadband compressor smoothing factor 7att instead of 7rei is subject to operates on the input signal's full bandwidth, treating all
f (n) < g(n - l) . (7) frequencies from zero through the highest frequency equally.
A detailed overview of all sidechain controls of a basic gain The output signal is finally obtained by multiplying the above computer is given in [11, ch. 3], e.g. gain with the input signal:
y(n) = g(n) x(n) . (8)
III. DATA MODEL, PROBLEM FORMULATION, AND
PROPOSED S OLUTION Due to the fact that the gain g is strictly positive, 0 < g 1, it follows that
A. Data Model and Problem Formulation sgn(y) = sgn(x), (9)
The employed data model is based on the compressor from
where sgn is the signum or sign function. In consequence, it Fig. 1. The following simplifications are additionally made: the
is convenient to factorize the input signal as a product of the knee parameter ("hard" knee) and the makeup gain (fixed at 0
sign and the modulus according to
dB) are ignored. The compressor is defined as a single-input
single-output (SISO) system, that is both the input and the x(n) \ (10) 3 makeup gain x(n) Broadband \ y(n) input output
Gain Control
Side Chain g(")
RMS/Peak v(n)
Compare Scale Filter
Detector attack release threshold , knee '/' fy////ys ratio '// attack *A release
Gain Computer
Fig. 1. Basic broadband compressor model (feed forward).
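The forward model of Fig. 1, with a hard knee and no makeup gain, can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the function names `smoothing_factor` and `compress` are hypothetical, the sampling frequency is taken in Hz and the time constants in seconds (so that their product is a number of samples), and the detector state starts at zero with unit gain.

```python
import math

def smoothing_factor(tau_s, fs_hz):
    # Time constant -> one-pole smoothing factor, 1 - exp[-2.2 / (fs * tau)].
    return 1.0 - math.exp(-2.2 / (fs_hz * tau_s))

def compress(x, fs_hz, L_db, R, p, tau_v_att, tau_v_rel, tau_g_att, tau_g_rel):
    """Feed-forward compressor sketch: hard knee, 0 dB makeup gain (assumed)."""
    l = 10.0 ** (L_db / 20.0)          # linear threshold
    S = 1.0 - 1.0 / R                  # slope from the compression ratio
    kappa = l ** S
    b_att = smoothing_factor(tau_v_att, fs_hz)
    b_rel = smoothing_factor(tau_v_rel, fs_hz)
    c_att = smoothing_factor(tau_g_att, fs_hz)
    c_rel = smoothing_factor(tau_g_rel, fs_hz)
    x_tilde, v, g = 0.0, 0.0, 1.0      # assumed initial state
    y = []
    for xn in x:
        # Level detector: attack if |x(n)| > v(n-1), else release.
        beta = b_att if abs(xn) > v else b_rel
        x_tilde = beta * abs(xn) ** p + (1.0 - beta) * x_tilde
        v = x_tilde ** (1.0 / p)
        # Static nonlinearity in the linear domain.
        f = kappa * v ** (-S) if v > l else 1.0
        # Gain smoothing: attack if f(n) < g(n-1), else release.
        gamma = c_att if f < g else c_rel
        g = gamma * f + (1.0 - gamma) * g
        y.append(g * xn)
    return y

# A loud constant input settles at the static gain kappa * |x|^(-S).
steady = compress([0.5] * 44100, 44100.0, -20.0, 4.0, 1, 0.005, 0.05, 0.005, 0.05)
# With R = 1 the slope is zero and the signal passes unchanged.
bypass = compress([0.3, -0.7, 0.2], 44100.0, -20.0, 1.0, 2, 0.005, 0.05, 0.005, 0.05)
```

With a ratio of 4 : 1 and a threshold of -20 dB, a constant input of 0.5 should settle at the static gain $\kappa \cdot 0.5^{-S}$, while a ratio of 1 : 1 leaves the signal untouched.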
The problem at hand is formulated in the following manner: Given the compressed signal $y(n)$ and the model parameters

$$\theta = [L \;\; R \;\; p \;\; \beta_{att} \;\; \beta_{rel} \;\; \gamma_{att} \;\; \gamma_{rel}],$$

recover the modulus of the original signal $|x(n)|$ from $|y(n)|$ based on $\theta$. For a more intuitive use, the smoothing factors $\beta$ and $\gamma$ may be replaced by the time constants $\tau_v$ and $\tau_g$. The meaning of each parameter is listed below.

$L$: The threshold in dB
$R$: The compression ratio dB_in : dB_out
$p$: The detector type (RMS or peak)
$\tau_{v,att}$: The attack time of the envelope filter in ms
$\tau_{v,rel}$: The release time of the envelope filter in ms
$\tau_{g,att}$: The attack time of the gain filter in ms
$\tau_{g,rel}$: The release time of the gain filter in ms

B. Proposed Solution

The output of the side chain, that is the gain of $|x(n)|$, given $\theta$, $\tilde{x}(n-1)$, and $g(n-1)$, may be written as

$$g(n) = G\left[\,|x(n)| \;\big|\; \theta, \tilde{x}(n-1), g(n-1)\right]. \qquad (11)$$

However, since $g$ is a function of $|x|$, we can express $|y|$ as a function of one independent variable $|x|$, and in that manner we obtain an equation with a single unknown:

$$|y(n)| = H\left[\,|x(n)| \;\big|\; \theta, \tilde{x}(n-1), g(n-1)\right], \qquad (12)$$

where $H$ represents the entire compressor. If $H$ is invertible, i.e. bijective for all $n$, $|x(n)|$ can be obtained from $|y(n)|$ by

$$|x(n)| = \begin{cases} H^{-1}\left[\,|y(n)| \;\big|\; \theta, \ldots\right] & \text{if } v(n) > l \\ |y(n)| & \text{otherwise.} \end{cases} \qquad (13)$$

And yet, since $v(n)$ is unknown, the condition for applying decompression must be predicted from $y(n)$, $\tilde{x}(n-1)$, and $g(n-1)$, and therefore needs the condition for toggling between the attack and release phases. Depending on the quality of the prediction, the recovered modulus $|z(n)|$ may differ somewhat at transition points from the original modulus $|x(n)|$, so that in the end

$$x(n) \approx \operatorname{sgn}(y) \cdot |z(n)| = z(n). \qquad (14)$$

In the next section it is shown how such an inverse compressor or decompressor is derived.
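Figure imgf000025_0001
invertible. Moving the expression on the left-hand side over to the right-hand side, we may define

$$\zeta_p(v) = \left[\gamma\kappa\, v^{-S}(n) + \bar{\gamma}\, g(n-1)\right]^p \cdot \left[v^p(n) - \bar{\beta}\,\tilde{x}(n-1)\right] - \beta\,|y(n)|^p, \qquad (19)$$

which shall be termed the characteristic function. The root or zero-crossing of $\zeta_p(v)$ hence represents the sought-after envelope value $v(n)$. Once $v(n)$ is found (see Section V), the current values of $\tilde{x}$, $|x|$, and $g$ are updated as per

$$\tilde{x}(n) = v^p(n), \qquad |x(n)| = \sqrt[p]{\left[\tilde{x}(n) - \bar{\beta}\,\tilde{x}(n-1)\right]/\beta}, \qquad g(n) = \frac{|y(n)|}{|x(n)|}, \qquad (20)$$

and the decompressed sample is then calculated as

$$x(n) = \operatorname{sgn}(y) \cdot |x(n)|. \qquad (21)$$

B. Attack-Release Phase Toggle

1) Envelope Smoothing: In case a peak detector is in use, $\beta$ takes on two different values. The condition for the attack phase is then given by (2) and is equivalent to [...]

[...]

$$v(n) > \left(\frac{\kappa}{g(n-1)}\right)^{1/S} \qquad \text{with } v(n) > l. \qquad (25)$$

[...] than one, and thus during attack

$$|x(n)| > \frac{|y(n)|}{g(n-1)}. \qquad (27)$$

Substituting $|x(n)|$ by $\sqrt[p]{\left[v^p(n) - \bar{\beta}\,\tilde{x}(n-1)\right]/\beta}$ and solving (27) for $v(n)$ results in

$$v(n) > \sqrt[p]{\beta \left(\frac{|y(n)|}{g(n-1)}\right)^p + \bar{\beta}\,\tilde{x}(n-1)}. \qquad (28)$$

If $v(n)$ in (25) is substituted by the expression on the right-hand side of (28), (25) still holds true, so the following sufficient condition is used to predict the attack phase of the gain filter:

$$\sqrt[p]{\beta \left(\frac{|y(n)|}{g(n-1)}\right)^p + \bar{\beta}\,\tilde{x}(n-1)} > \left(\frac{\kappa}{g(n-1)}\right)^{1/S}. \qquad (29)$$

Note that the values of all variables are known whenever (29) is evaluated.

C. Envelope Predictor

[...] (32)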
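As an illustration of the characteristic function (19) and the update (20), the sketch below compresses a single sample with known state, then recovers its modulus by searching for the zero-crossing of $\zeta_p(v)$. It is a hedged sketch, not the authors' code: the helper names `char_fn` and `find_root` are hypothetical, the root search follows the secant-type update with step size $\Delta_i = |\zeta_p(v_i)|$ that Section V and Algorithm 3 describe, with an added iteration cap and a zero-denominator guard, and the starting guess built from $|y(n)|/g(n-1)$ is an assumption made here for the example.

```python
def char_fn(v, y_abs, x_tilde_prev, g_prev, kappa, S, beta, gamma, p):
    # Characteristic function: zero where v equals the true envelope value.
    gain = gamma * kappa * v ** (-S) + (1.0 - gamma) * g_prev
    return gain ** p * (v ** p - (1.0 - beta) * x_tilde_prev) - beta * y_abs ** p

def find_root(zeta, v_init, eps=1e-12, max_iter=50):
    # Secant-type search with a single initial value (guards are additions).
    v_best = v_init
    v_i = v_init
    for _ in range(max_iter):
        delta = abs(zeta(v_i))
        if delta < eps:
            return v_i
        denom = zeta(v_i + delta) - zeta(v_i)
        if denom == 0.0:
            return v_best
        v_i = v_i - delta * zeta(v_i) / denom
        if abs(zeta(v_i)) > delta:      # new estimate is worse: keep the old one
            return v_best
        v_best = v_i
    return v_best

# Illustration values (assumed): peak detector, -20 dB threshold, ratio 4 : 1.
p, beta, gamma = 1, 0.5, 0.5
l, S = 0.1, 0.75
kappa = l ** S
x_tilde_prev, g_prev, x_true = 0.2, 0.8, 0.5

# Forward step of the compressor for one sample.
v_true = (beta * x_true ** p + (1.0 - beta) * x_tilde_prev) ** (1.0 / p)
g_true = gamma * kappa * v_true ** (-S) + (1.0 - gamma) * g_prev
y_abs = g_true * x_true

# Inverse step: starting guess from |y|/g(n-1), then the root search.
v_init = (beta * (y_abs / g_prev) ** p + (1.0 - beta) * x_tilde_prev) ** (1.0 / p)
zeta = lambda v: char_fn(v, y_abs, x_tilde_prev, g_prev, kappa, S, beta, gamma, p)
v_hat = find_root(zeta, v_init)
x_hat = ((v_hat ** p - (1.0 - beta) * x_tilde_prev) / beta) ** (1.0 / p)
g_hat = y_abs / x_hat
```

In this example the recovered modulus `x_hat` and gain `g_hat` should agree with the values used in the forward step up to the tolerance `eps`.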
The normalized error is then

Figure imgf000026_0001 (33)

Figure imgf000026_0002

whereas for $\gamma \to 0$, (37) converges to infinity:

$$\lim_{\gamma \to 0} |e(n)| \to \infty. \qquad (39)$$

So, the error is smaller for large $\gamma$ or short $\tau_g$. The smallest possible error is for $\gamma = 1$, which then again depends on the current and the previous value of $f$. The error accumulates if $\gamma < 1$ with $N$. The difference between consecutive $f$-values is signal dependent. The signal envelope $v(n)$ fluctuates less and is thus smoother for smaller $\beta$ or longer $\tau_v$. $f(n)$ is also more stable when the compression ratio $R$ is low. For $R = 1$, $f(n)$ is perfectly constant. The threshold $L$ has a negative impact on error propagation. The lower $L$ the more the error depends on $N$, since more samples are compressed with different $f$-values. The RMS detector stabilizes the envelope more than the peak detector, which also reduces the error. Furthermore, since usually $\tau_{att} < \tau_{rel}$, the error due to $\beta$ is smaller during release whereas the error due to $\gamma$ is smaller during attack. Finally, the error is expected to be larger at transition points between quiet to loud signal passages.

The above error may cause a decision in favor of a wrong smoothing factor $\beta$ in (24), like $\beta_{att}$ instead of $\beta_{rel}$ e.g. The decision error from (24) then propagates to (29). Given that $\beta_{att} > \beta_{rel}$, the error due to (32) is accentuated by (24) with the consequence that (29) is less reliable than (24). The total error in (29) thus scales with $|\beta_{att} - \beta_{rel}|$. In regard to (31), reliability of the envelope's estimate is subject to validity of (24) and (29). A better estimate is obtained when the sound level detector and the gain filter are both in either the attack or release phase. Here too, the estimation error increases with [...] also with $|\gamma_{att} - \gamma_{rel}|$.

[...]

The criterion for optimality is further chosen as the deviation of the characteristic function from zero, initialized to

$$\Delta_{init} = |\zeta_p(v_{init})|. \qquad (40)$$

Thereupon, (19) may be approximated at a given point using the equation of a straight line, $\zeta = m \cdot v + c$, where $m$ is the slope and $c$ is the $\zeta$-intercept. The zero-crossing is characterized by the equation

$$\frac{\zeta_p(v_i + \Delta_i) - \zeta_p(v_i)}{\Delta_i}\,(v - v_i) + \zeta_p(v_i) = 0 \qquad (41)$$

as shown in Fig. 2.

Figure imgf000026_0003

The new estimate of the optimal $v$ is found as

$$v_{i+1} = v_i - \frac{\Delta_i\, \zeta_p(v_i)}{\zeta_p(v_i + \Delta_i) - \zeta_p(v_i)}. \qquad (42)$$

If $v_{i+1}$ is less optimal than $v_i$, the iteration is stopped and $v_i$ is the final estimate. The iteration is also stopped if $\Delta_{i+1}$ is smaller than some $\epsilon$. In the latter case, $v_{i+1}$ has the optimal value with respect to the chosen criterion. Otherwise, $v_i$ is set to $v_{i+1}$ and $\Delta_i$ is set to $\Delta_{i+1}$ after every step and the procedure is repeated until $v_{i+1}$ has converged to a more optimal value. The proposed method is a special form of the secant method with a single initial value.

VI. GENERAL REMARKS

A. Stereo Linking

When dealing with stereo signals, one might want to apply the same amount of gain reduction to both channels to prevent image shifting. This is achieved through stereo linking. One way is to calculate the required amount of gain reduction for each channel independently and then apply the larger amount to both channels. The question which arises in this context is which of the two channels was the gain derived from. To give an answer resolving the dilemma of ambiguity, one solution would be to signal which of the channels carries the applied gain. One could then decompress the marked sample and use its gain for the other channel. Although very simple to implement, this approach provokes an additional data rate of 44.1 kbps at 44.1-kHz sampling. A rate-efficient alternative that comes with a higher computational cost is realized in the following way. First, one decompresses both the left and the right channel independently and in so doing one obtains two estimates $z_l(n)$ and $z_r(n)$, where subscript $l$ shall denote the left channel and subscript $r$ the right channel, respectively. In a second step, one calculates the compressed values of $z_l(n)$ and $z_r(n)$ and selects the channel for which $H[z(n)] = y(n)$ holds true. In a final step, one updates the remaining variables using the gain of the selected channel.

B. Lookahead

A compressor with a look-ahead function, i.e. with a delay in the main signal path as in [12, p. 106], uses past input samples as weighted output samples. Now that some future input samples, which are unavailable, are required to invert the process, the inversion is rendered impossible. $g(n)$ and $x(n)$ must thus be in sync for the approach to be applied.

C. Clipping and Limiting

Another point worth mentioning is that "hard" clipping and "brick-wall" limiting are special cases of compression with the attack time set to zero and the compression ratio set to $\infty : 1$. The static nonlinearity $F$ in that particular case is a one-to-many mapping, which by definition is noninvertible.

VII. THE ALGORITHM

The complete algorithm is divided into three parts, each of them given as pseudocode below. Algorithm 1 outlines the compressor that corresponds to the model from Sections II-III. Algorithm 2 illustrates the decompressor described in Section IV, and the iterative search from Section V is finally summarized in Algorithm 3. The parameter $f_s$ represents the sampling frequency in kHz.

Algorithm 1 The compressor
function COMP($x_n$, $\theta$, $f_s$)
    $\tilde{x}_n \leftarrow 0$
    $g_n \leftarrow 1$
    for $n \leftarrow 1, N$ do
        if $|x_n|^p > \tilde{x}_n$ then
            $\beta \leftarrow 1 - \exp[-2.2/(f_s \cdot \tau_{v,att})]$
        else
            $\beta \leftarrow 1 - \exp[-2.2/(f_s \cdot \tau_{v,rel})]$
        end if
        $\tilde{x}_n \leftarrow \beta\,|x_n|^p + \bar{\beta}\,\tilde{x}_n$
        $v_n \leftarrow \sqrt[p]{\tilde{x}_n}$
        if $v_n > l$ then
            $f_n \leftarrow \kappa\, v_n^{-S}$
        else
            $f_n \leftarrow 1$
        end if
        if $f_n < g_n$ then
            $\gamma \leftarrow 1 - \exp[-2.2/(f_s \cdot \tau_{g,att})]$
        else
            $\gamma \leftarrow 1 - \exp[-2.2/(f_s \cdot \tau_{g,rel})]$
        end if
        $g_n \leftarrow \gamma f_n + \bar{\gamma}\, g_n$
        $y_n \leftarrow g_n x_n$
    end for
    return $y_n$
end function

VIII. PERFORMANCE EVALUATION

A. Performance Metrics

To evaluate the inverse approach, the following quantities are measured: the root-mean-square error (RMSE),

$$\text{RMSE} = \sqrt{\frac{1}{N} \sum_{n=1}^{N} \left[z(n) - x(n)\right]^2} \qquad (43)$$

given in decibel relative to full scale (dBFS), the perceptual similarity between the original and decompressed signal, and the execution time of the decompressor relative to real time (RT). Furthermore, we present the percentage of compressed samples, the mean number of iterations until convergence per compressed sample, the error rate of the attack-release toggle for the gain smoothing filter, and finally the error rate of the envelope predictor. The perceptual similarity is assessed by PEMO-Q [13], [14] with $\text{PSM}_t$ as metric. The simulations are run in MATLAB on an Intel Core i5-520M CPU.

B. Computational Results

Fig. 3 shows the inverse output signal $z(n)$ for a synthetic input signal $x(n)$ using an RMS detector. The inverse signal is obtained from the compressed signal $y(n)$ with an error of -129 dBFS. It is visually indistinguishable from the original signal $x(n)$. Due to the fact that the signal envelope is constant most of the time, the error is noticeable only around transition points, which are few. The decompressor's performance is further evaluated for some commercial compressor presets. The used audio material consists of 12 items covering speech, sung voice, music, and jingles. All items are normalized to -16 LKFS [15]. The $\epsilon$-value in the break condition of Algorithm 3 is set to $1 \cdot 10^{-12}$. A detailed overview of compressor settings and performance figures is given in Tables I-II. The presented results suggest that the decompressed signal is perceptually indistinguishable from the original: the $\text{PSM}_t$-value is flawless. This was also confirmed by the authors through informal listening tests.

As can be seen from Table II, the largest inversion error is associated with setting E and the smallest with setting B. For all five settings, the error is larger when an RMS detector is in use. This is partly due to the fact that $\zeta_p(v)$ has a stronger [...]
Figure imgf000028_0001
Fig. 3. An illustrative example using an RMS amplitude detector with $\tau_v$ set to 5 ms, a threshold of -20 dBFS (dashed line in the upper right corner), a compression ratio of 4 : 1, and $\tau_g$ set to 1.6 ms for attack and 17 ms for release, respectively. The RMSE is -129 dBFS.

TABLE I
SELECTED COMPRESSOR SETTINGS

Parameter               Description                      A         B         C         D         E
$L$ (dBFS)              Threshold                      -32.0     -19.9     -24.4     -26.3     -38.0
$R$ (dB_in : dB_out)    Ratio                          3.0 : 1   1.8 : 1   3.2 : 1   7.3 : 1   4.9 : 1
$\tau_v$ (ms)           Envelope attack and release      5.0       5.0       5.0       5.0       5.0
$\tau_{g,att}$ (ms)     Gain attack                     13.0      11.0       5.8       9.0      13.1
$\tau_{g,rel}$ (ms)     Gain release                     435        49       112       705       257

TABLE II
PERFORMANCE FIGURES OBTAINED FOR VARIOUS AUDIO MATERIAL (12 ITEMS)

                                     A              B              C              D              E
                                Peak    RMS    Peak    RMS    Peak    RMS    Peak    RMS    Peak    RMS
RMSE (dBFS)                    -74.4  -71.2   -97.2  -93.7   -81.0  -77.8   -76.3  -69.5   -63.2  -53.8
PSMt (PEMO-Q)                   1.00   1.00    1.00   1.00    1.00   1.00    1.00   1.00    1.00   1.00
Execution time (RT)             0.54   0.53    0.40   0.44    0.47   0.49    0.48   0.50    0.54   0.54
Compression rate (%)            78.7   80.8    38.5   50.7    61.8   67.3    67.6   71.8    85.2   86.4
Iterations per sample (#)       1.04   1.02    1.00   1.01    1.07   1.06    1.05   1.03    1.09   1.04
Attack-release error rate (%)   0.05   0.09    0.01   0.01    0.02   0.04    0.01   0.03    0.14   0.51
State error rate (%)            0.02   0.03    0.01   0.01    0.01   0.02    0.02   0.03    0.03   0.05

Algorithm 2 The decompressor
function DECOMP($y_n$, $\theta$, $\epsilon$, $f_s$)
    $\tilde{x}_n \leftarrow 0$
    $g_n \leftarrow 1$
    for $n \leftarrow 1, N$ do
        if $|y_n| >$ [...] then
            $\beta \leftarrow 1 - \exp[-2.2/(f_s \cdot \tau_{v,att})]$
        else
            $\beta \leftarrow 1 - \exp[-2.2/(f_s \cdot \tau_{v,rel})]$
        end if
        if [...] then
            Figure imgf000029_0001
            $\gamma \leftarrow 1 - \exp[-2.2/(f_s \cdot \tau_{g,att})]$
        else
            Figure imgf000029_0002
        else
            $g_n \leftarrow \gamma + \bar{\gamma}\, g_n$
            $|z_n| \leftarrow |y_n| / g_n$
            $\tilde{x}_n \leftarrow \beta\,|z_n|^p + \bar{\beta}\,\tilde{x}_n$
        end if
        $z_n \leftarrow \operatorname{sgn}(y_n) \cdot |z_n|$
    end for
    return $z_n$
end function

Algorithm 3 The iterative search for the zero-crossing
function CHARFZERO($v_n$, $\epsilon$)
    $v_i \leftarrow v_n$
    repeat
        $\Delta_i \leftarrow |\zeta_p(v_i)|$
        $v_i \leftarrow v_i - \Delta_i \cdot \zeta_p(v_i) / [\zeta_p(v_i + \Delta_i) - \zeta_p(v_i)]$
        if $|\zeta_p(v_i)| > \Delta_i$ then
            return $v_n$
        end if
        $v_n \leftarrow v_i$
    until $|\zeta_p(v_n)| < \epsilon$
    return $v_n$
end function

Figure imgf000029_0004

In most cases, the envelope predictor works more reliably compared to the toggle switch between attack and release. It can also be observed that the choice of time constants seems to have little impact on decompressor's accuracy. The major parameters that affect the decompressor's performance are $L$ and $R$, while the threshold is evidently the predominant one: the RMSE strongly correlates with the threshold level.

Figs. 4-5 show the inversion error as a function of various time constants. These are in the range of typical attack and release times for a limiter (peak) or compressor (RMS) [12, pp. 109-110]. It can be observed that the inversion accuracy depends on the release time of the peak detector and not so much on its attack time for both the envelope and the gain filter, see Figs. 4, 5 (b). For the envelope filter, all error curves exhibit a local dip around a release time of 0.5 s. The error increases steeply below that bound but moderately with larger values. In the proximity of 5 s, the error converges to -130 dBFS. With regard to the gain filter, the error behaves in a reverse manner. The curves in Fig. 5 (b) exhibit a local peak around 0.5 s with a value of -180 dBFS. It can further be observed in Fig. 4 (a) that the curve for $\tau_{v,rel} = 1$ ms has a dip where $\tau_{v,att}$ is close to 1 ms, i.e. where $|\beta_{att} - \beta_{rel}|$ is minimal. This is also true for Fig. 4 (c) and (d): the lowest error is where the attack and release times are identical. As a general rule, the error that is due to the attack-release switch is smaller for the gain filter in Fig. 5.

Looking at Fig. 6 one can see that the error decreases with threshold and increases with compression ratio. At a ratio of 10 : 1 and beyond, the RMSE scales almost exclusively with the threshold. The lower the threshold, the stronger the error propagates between decompressed samples, which leads to a larger RMSE value. The RMS detector further augments the error because it stabilizes the envelope $v(n)$ more than the peak detector. Clearly, the threshold level has the highest impact on the decompressor's accuracy.

IX. CONCLUSION AND OUTLOOK

This work examines the problem of finding an inverse to a nonlinear dynamic operator such as a digital compressor. The proposed approach is characterized by the fact that it uses an explicit signal model to solve the problem. To find the "dry" or uncompressed signal with high accuracy, it is sufficient to know the model parameters. The parameters can e.g. be sent together with the "wet" or compressed signal in the form of metadata as is the case with Dolby Volume and ReplayGain [16]. A new bitstream format is not mandatory, since many digital audio standards, like WAV or MP3, provide means to tag the audio content with "ancillary" data. With the help of the metadata, one can then reverse the compression applied after mixing or before broadcast. This allows the end user to have control over the amount of compression, which may be preferred because the sound engineer has no control over the playback environment or the listener's individual taste.

When the compressor parameters are unavailable, they can possibly be estimated from the compressed signal. This may thus be a direction for future work. Another direction would be to apply the approach to more sophisticated models that include a "soft" knee, parallel and multiband compression, or perform gain smoothing in the logarithmic domain, see [11], [12], [17], [18] and references therein.

In conclusion, we want to draw the reader's attention to the fact that the presented figures suggest that the decompressor is realtime capable which can pave the way for exciting new applications. One such application could be the restoration of
dynamics in over-compressed audio or else the accentuation of transient components, see [19]-[21], by an adaptively tuned decompressor that has no prior knowledge of the compressor parameters.

Figure imgf000030_0001
Fig. 4. RMSE as a function of typical attack and release times using a peak (upper row) or an RMS amplitude detector (lower row). In the left column, the attack time of the envelope filter is varied while the release time is held constant. The right column shows the reverse case. The time constants of the gain filter are fixed at zero. In all four cases, threshold and ratio are fixed at -32 dBFS and 4 : 1, respectively.

Figure imgf000030_0002
Fig. 5. RMSE as a function of typical attack and release times using a peak (upper row) or an RMS amplitude detector (lower row). In the left column, the attack time of the gain filter is varied while the release time is held constant. The right column shows the reverse case. The time constants of the envelope filter are fixed at zero. In all four cases, threshold and ratio are fixed at -32 dBFS and 4 : 1, respectively.

Figure imgf000031_0001

ACKNOWLEDGMENT

This work was carried out in part at the Centre for Digital Music (C4DM), Queen Mary, University of London.

REFERENCES

[1] D. Barchiesi and J. Reiss, "Reverse engineering of a mix," J. Audio Eng. Soc., vol. 58, pp. 563-576, 2010.
[2] T. Ogunfunmi, Adaptive nonlinear system identification: The Volterra and Wiener model approaches. 233 Spring Street, New York, NY 10013, USA: Springer Science+Business Media, LLC, 2007, ch. 3.
[3] Y. Avargel and I. Cohen, "Adaptive nonlinear system identification in the short-time Fourier transform domain," IEEE Trans. Signal Process., vol. 57, no. 10, pp. 3891-3904, Oct. 2009.
[4] Y. Avargel and I. Cohen, "Modeling and identification of nonlinear systems in the short-time Fourier transform domain," IEEE Trans. Signal Process., vol. 58, no. 1, pp. 291-304, Jan. 2010.
[5] A. Gelb and W. E. Vander Velde, Multiple-input describing functions and nonlinear system design. New York: McGraw-Hill, 1968, ch. 1.
[6] P. W. J. M. Nuij, O. H. Bosgra, and M. Steinbuch, "Higher-order sinusoidal input describing functions for the analysis of non-linear systems with harmonic responses," Mech. Syst. Signal Process., vol. 20, pp. 1883-1904, 2006.
[7] B. Lachaise and L. Daudet, "Inverting dynamics compression with minimal side information," in Proc. DAFx, 2008, pp. 1-6.
[8] E. Vickers, "The loudness war: Background, speculation and recommendations," in AES Convention 129, Nov. 2010.
[9] Dolby Digital and Dolby Volume provide a comprehensive loudness solution, Dolby Laboratories, 2007.
[10] Broadcast loudness issues: The comprehensive Dolby approach, Dolby Laboratories, 2011.
[11] R. Jeffs, S. Holden, and D. Bohn, Dynamics processor - technology & application tips, Rane Corporation, 2005.
[12] U. Zölzer, DAFX: Digital audio effects, 2nd ed. The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, United Kingdom: John Wiley & Sons Ltd, 2011, ch. 4.
[13] R. Huber and B. Kollmeier, "PEMO-Q - a new method for objective audio quality assessment using a model of auditory perception," IEEE Trans. Audio Speech Lang. Process., vol. 14, no. 6, pp. 1902-1911, Nov. 2006.
[14] HörTech gGmbH, "PEMO-Q," http://www.hoertech.de/web_en/produkte/pemo-q.shtml, version 1.3.
[15] ITU-R, Algorithms to measure audio programme loudness and true-peak audio level, Mar. 2011, rec. ITU-R BS.1770-2.
[16] Hydrogenaudio, "ReplayGain," http://wiki.hydrogenaudio.org/index.php?title=ReplayGain, Feb. 2013.
[17] J. C. Schmidt and J. C. Rutledge, "Multichannel dynamic range compression for music signals," in Proc. IEEE ICASSP, vol. 2, 1996, pp. 1013-1016.
[18] D. Giannoulis, M. Massberg, and J. D. Reiss, "Digital dynamic range compressor design - a tutorial and analysis," J. Audio Eng. Soc., vol. 60, pp. 399-408, 2012.
[19] M. M. Goodwin and C. Avendano, "Frequency-domain algorithms for audio signal enhancement based on transient modification," J. Audio Eng. Soc., vol. 54, pp. 827-840, 2006.
[20] M. Walsh, E. Stein, and J.-M. Jot, "Adaptive dynamics enhancement," in AES Convention 130, May 2011.
[21] M. Zaunschirm, J. D. Reiss, and A. Klapuri, "A sub-band approach to modification of musical transients," Comput. Music J., vol. 36, pp. 23-36, 2012.

Claims

1. A method of decompressing a compressed digital audio signal resulting from the compression of an input signal using a compressor, wherein, for each integer n representing a time instant, y(n) being the level of the compressed signal at instant n, automated means determine:
Figure imgf000032_0001
with
Figure imgf000032_0002
$|z(n)| = |y(n)|$ otherwise
where
L is a threshold in dB,
v(n) is a sound level or envelope of the input signal x(n),
Θ represents the compressor model parameters, and
H represents the compressor, where $H^{-1}$ is its inverse.
2. A method according to the preceding claim wherein, the automated means determine:
v ( n)= x ( n)
with
Figure imgf000032_0003
where :
p defines the sound level detector's type, i.e. for an RMS detector p = 2 and for a peak detector p = 1.
β and γ are the smoothing factors that go with the model parameters $\tau_v$ and $\tau_g$, which again are the time constants of the level detector and the gain smoothing filter, the conversion being as follows:
$\beta = 1 - \exp[-2.2/(f_s \cdot \tau_v)]$
and
$\gamma = 1 - \exp[-2.2/(f_s \cdot \tau_g)]$
where $f_s$ is the sampling frequency, in the above equation, g(n-1) being the gain value for the preceding sample, which was calculated as
3. The method of the preceding claim wherein the level detector as well as the gain smoothing filter can be in either the attack or release phase, wherein, if
-} >x(n-l) the detector is assumed to be in attack, so that τν= xv aCk, otherwise τν= iv,re,ease. and wherein, for the gain smoothing filter, the condition for attack is (β is now known) p 1 fvS£/20 IIS
g{n-\) g n-\)
S being the slope parameter derived from the compression ratio R according to
$S = 1 - 1/R$,
wherein, given that the condition holds true, $\tau_g = \tau_{g,attack}$, otherwise $\tau_g = \tau_{g,release}$.
4. The method of claim 2 wherein, if $v(n) > 10^{L/20}$, the current sample is decompressed in the following manner:
- First, we compute the root or zero-crossing $v_0(n)$ of the characteristic function $\zeta_p(v)$, using $v(n)$ as a starting point for an iterative search:
$v_0 = \text{CHARFZERO}[v(n)]$
- Once $v_0(n)$ is obtained, the modulus of the decompressed sample is given by
$\tilde{x}(n) = v^p(n)$
$|z(n)| = \sqrt[p]{[\tilde{x}(n) - (1 - \beta)\,\tilde{x}(n-1)]/\beta}$
- The corresponding gain value is
Figure imgf000033_0001
- Otherwise, the modulus of the sample is computed as
$g(n) = \gamma + (1 - \gamma)\, g(n-1)$
Figure imgf000033_0002
- And $\tilde{x}(n)$ is updated according to
$\tilde{x}(n) = \beta\,|z(n)|^p + (1 - \beta)\,\tilde{x}(n-1)$
5. A method according to claim 1 wherein, when the model parameters Θ are not known, the same method is applied to accentuate the shape of the signal y(n) and in that case, the model parameters are tweaked in such a way that the desired effect is achieved.
6. A digital audio signal obtained by using the method of any of the preceding claims.
7. A method of making available on a telecommunication network a signal obtained by using the method of any of claims 1 to 5 in view of downloading it.
8. A computer program comprising code instructions arranged for controlling the execution of a method according to any of claims 1 to 5 when the program is performed on a computer.
9. A method of making available on a telecommunication network a program according to the preceding claim in view of downloading it.
10. A data storage medium (18) comprising data representing a signal obtained by using the method of any of claims 1 to 5 and/or data representing a program according to claim 8.
11. A device (16) for decompressing a compressed digital audio signal resulting from the compression of an initial signal wherein the device is arranged to perform a method according to any of claims 1 to 5.
PCT/IB2013/000595 2013-03-04 2013-03-04 A method for inverting dynamic range compression of a digital audio signal WO2014135914A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/IB2013/000595 WO2014135914A1 (en) 2013-03-04 2013-03-04 A method for inverting dynamic range compression of a digital audio signal

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/IB2013/000595 WO2014135914A1 (en) 2013-03-04 2013-03-04 A method for inverting dynamic range compression of a digital audio signal

Publications (1)

Publication Number Publication Date
WO2014135914A1 true WO2014135914A1 (en) 2014-09-12

Family

ID=48521365

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IB2013/000595 WO2014135914A1 (en) 2013-03-04 2013-03-04 A method for inverting dynamic range compression of a digital audio signal

Country Status (1)

Country Link
WO (1) WO2014135914A1 (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5375255A (en) * 1991-09-06 1994-12-20 U.S. Philips Corporation Radio receiver comprising analog dynamic compression and digital expansion
US6556685B1 (en) * 1998-11-06 2003-04-29 Harman Music Group Companding noise reduction system with simultaneous encode and decode

Non-Patent Citations (22)

* Cited by examiner, † Cited by third party
Title
"Broadcast loudness issues: The comprehensive Dolby approach", 2011, DOLBY LABORATORIES
"Dolby Digital and Dolby Volume provide a comprehensive loudness solution", 2007, DOLBY LABORATORIES
"Modeling and identification of nonlinear systems in the short-time Fourier transform domain", IEEE TRANS. SIGNAL PROCESS., vol. 58, no. 1, January 2010 (2010-01-01), pages 291 - 304
"PEMO-Q", HORTECH GGMBH
A. GELB; W. E. VANDER VELDE: "Multiple-input describing functions and nonlinear system design", 1968, MCGRAW-HILL
ALGORITHMS TO MEASURE AUDIO PROGRAMME LOUDNESS AND TRUE-PEAK AUDIO LEVEL, March 2011 (2011-03-01)
B. LACHAISE; L. DAUDET: "Inverting dynamics compression with minimal side information", PROC. DAFX, 2008, pages 1 - 6
D. BARCHIESI; J. REISS: "Reverse engineering of a mix", J. AUDIO ENG. SOC., vol. 58, 2010, pages 563 - 576, XP040567060
D. GIANNOULIS; M. MASSBERG; J. D. REISS: "Digital dynamic range compressor design - a tutorial and analysis", J. AUDIO ENG. SOC., vol. 60, 2012, pages 399 - 408
E. VICKERS: "The loudness war: Background, speculation and recommendations", AES CONVENTION, vol. 129, November 2010 (2010-11-01)
GIANNOULIS DIMITRIOS ET AL: "Digital Dynamic Range Compressor Design - A Tutorial and Analysis", JAES, AES, 60 EAST 42ND STREET, ROOM 2520 NEW YORK 10165-2520, USA, vol. 60, no. 6, 1 June 2012 (2012-06-01), pages 399 - 408, XP040574680 *
J. C. SCHMIDT; J. C. RUTLEDGE: "Multichannel dynamic range compression for music signals", PROC. IEEE ICASSP, vol. 2, 1996, pages 1013 - 1016, XP002150259, DOI: doi:10.1109/ICASSP.1996.543295
M. M. GOODWIN; C. AVENDANO: "Frequency-domain algorithms for audio signal enhancement based on transient modification", J. AUDIO ENG. SOC., vol. 54, 2006, pages 827 - 840, XP040507992
M. WALSH; E. STEIN; J.-M. JOT: "Adaptive dynamics enhancement", AES CONVENTION, vol. 130, May 2011 (2011-05-01)
M. ZAUNSCHIRM; J. D. REISS; A. KLAPURI: "A sub-band approach to modification of musical transients", COMPUT. MUSIC J., vol. 36, 2012, pages 23 - 36
P. W. J. M. NUIJ; O. H. BOSGRA; M. STEINBUCH: "Higher-order sinusoidal input describing functions for the analysis of non-linear systems with harmonic responses", MECH. SYST. SIGNAL PROCESS., vol. 20, 2006, pages 1883 - 1904, XP024930337, DOI: doi:10.1016/j.ymssp.2005.04.006
R. HUBER; B. KOLLMEIER: "PEMO-Q - a new method for objective audio quality assessment using a model of auditory perception", IEEE TRANS. AUDIO SPEECH LANG. PROCESS., vol. 14, no. 6, November 2006 (2006-11-01), pages 1902 - 1911
R. JEFFS; S. HOLDEN; D. BOHN: "Dynamics processor - technology & application tips", 2005, RANE CORPORATION
REPLAYGAIN, February 2013 (2013-02-01), Retrieved from the Internet <URL:http://wiki.hydrogenaudio.org/index.php?title=ReplayGain>
T. OGUNFUNMI: "Adaptive nonlinear system identification: The Volterra and Wiener model approaches", 2007, SPRINGER SCIENCE+BUSINESS MEDIA, LLC
U. ZOLZER: "DAFX: Digital audio effects", 2011, JOHN WILEY & SONS LTD
Y. AVARGEL; I. COHEN: "Adaptive nonlinear system identification in the short-time Fourier transform domain", IEEE TRANS. SIGNAL PROCESS., vol. 57, no. 10, October 2009 (2009-10-01), pages 3891 - 3904, XP011268530, DOI: doi:10.1109/TSP.2009.2021713

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
FR3031638A1 (en) * 2015-01-14 2016-07-15 Univ Bordeaux DECOMPRESSION METHOD AND CORRESPONDING DEVICE
FR3037752A1 (en) * 2015-06-17 2016-12-23 Univ Bordeaux METHODS FOR REDUCING CRETE FACTOR AND INVERTING OFDM SIGNAL, AND DEVICES THEREOF
WO2018177787A1 (en) * 2017-03-31 2018-10-04 Dolby International Ab Inversion of dynamic range control
CN110679083A (en) * 2017-03-31 2020-01-10 杜比国际公司 Dynamic range controlled inversion
US10924078B2 (en) 2017-03-31 2021-02-16 Dolby International Ab Inversion of dynamic range control
CN110679083B (en) * 2017-03-31 2023-11-17 杜比国际公司 Dynamic range control inversion

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 13725195

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 13725195

Country of ref document: EP

Kind code of ref document: A1