GB2433184A - Optimal quantiser for an audio signal - Google Patents

Optimal quantiser for an audio signal

Info

Publication number
GB2433184A
Authority
GB
United Kingdom
Prior art keywords
noise
audio signal
quantisation
frequency
frequencies
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
GB0625079A
Other versions
GB2433184B (en)
GB0625079D0 (en)
Inventor
Peter G Craven
Malcolm J Law
J Robert Stuart
Rhonda J Wilson
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
MERIDIAN LOSSLESS PACKING Ltd
Original Assignee
MERIDIAN LOSSLESS PACKING Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by MERIDIAN LOSSLESS PACKING Ltd
Priority to GB0625079A (GB2433184B)
Priority claimed from GB0407392A (GB2414646B)
Publication of GB0625079D0
Publication of GB2433184A
Application granted
Publication of GB2433184B
Anticipated expiration
Expired - Lifetime

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S1/00 Two-channel systems
    • H04S1/007 Two-channel systems in which the audio signals are in digital form
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04B TRANSMISSION
    • H04B15/00 Suppression or limitation of noise or interference

Landscapes

  • Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Multimedia (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

A method of quantising an audio signal is provided, in which quantisation noise is shaped differently at audible and ultrasonic frequencies. The quantisation noise at audible frequencies may be shaped in a given frequency band to be less than the intrinsic noise in that frequency band. Also provided is a method of processing an audio signal, including quantising and compressing the signal, wherein the quantisation noise at substantially ultrasonic frequencies does not exceed the spectral power level reached during the passages of peak data rate at those frequencies.

Description

<p>OPTIMAL QUANTISER FOR AN AUDIO SIGNAL</p>
<p>Background</p>
<p>It is increasingly common for audio intended for distribution on a digital medium such as DVD-Audio to be presented as a Linear Pulse Code Modulation (LPCM) stream with a wordwidth of 24 bits per sample. This represents a high data-rate and so lossless compression is employed to reduce the data-rate of the audio on the disc.</p>
<p>However, it is often the case that the wanted audio has been contaminated by noise that is considerably louder than the intrinsic noise floor of a 24-bit channel. For example, a digital transcription of an analogue tape may have a hiss level only twelve bits below peak level. In this case a 24-bit presentation is wasteful of data-rate, since the lossless compression is obliged to communicate the noise exactly and the noise is often responsible for more encoded data-rate in this circumstance than the desired audio content.</p>
<p>The encoded data rate could be reduced by employing lossy compression methods instead of lossless. These achieve great reductions in data-rate, but by discarding information that is inaudible according to standard psychoacoustic models.</p>
<p>At high quality levels one would prefer not to do this, which is the whole reason for the existence of lossless audio compression.</p>
<p>Another option is to reduce the excessive precision of the LPCM audio by fixed re-quantisation to a more realistic wordwidth, for example to 14, 15 or 16 bits in the case of original material with a noise level equivalent to 12 bits. The re-quantisation adds noise, but this is a far less controversial process since, by appropriate use of dither, the effect on the audio can be made practically equivalent to adding noise in the analogue domain. This process is time-invariant and does not rely on time-variant psychoacoustic models.</p>
<p>In the following, the term quantisation noise refers to the noise added by such a re-quantisation, as distinct from the intrinsic noise contained in the original audio signal.</p>
<p>A variant of fixed quantisation is quantisation with noise shaping (see, for example, Gerzon, M.A. and Craven, P.G., "Optimal Noise Shaping and Dither of Digital Signals", presented at Audio Eng. Soc. 87th Convention, New York, October 1989, AES preprint # 2822). The noise shaping allows quantisation noise to be traded between frequency bands, and in particular to be moved from frequencies where the ear is more sensitive to bands where it is less sensitive.</p>
<p>At sampling rates of 88.2kHz and higher, there is the opportunity to move great amounts of noise to frequencies above 20kHz, the conventional limit of human hearing.</p>
<p>Consequently, there is less noise introduced below 20kHz, an advantage that can be traded for a further reduction of the LPCM wordwidth. The quantisation noise is no longer white, but it is constant and uncorrelated with the audio signal, and this process has been seriously proposed as a means of economical transmission of LPCM audio with audiophile quality; see Stuart, J.R., "Coding for High-Resolution Audio Systems", J. Audio Eng. Soc., volume 52, number 3, page 117 (2004 March).</p>
<p>To be more precise, it is well known (cf. Gerzon, M.A. and Craven, P.G., "Optimal Noise Shaping and Dither of Digital Signals", presented at Audio Eng. Soc. 87th Convention, New York, October 1989, AES preprint # 2822) that the quantisation noise can be shaped, using a minimum-phase noise-shaper, to any desired spectral shape, subject to the normalisation constraint that the integral (with respect to frequency) of the logarithm of the shaped noise spectral density is equal to the corresponding integral of the logarithm of the unshaped noise density.</p>
<p>The unshaped noise density is directly related to the requantised wordwidth, reducing by 6dB for each 1 bit increase in wordwidth. Conversely, an increase of quantisation noise by 6dB at all frequencies allows the wordwidth to be reduced by one bit, whether or not noise-shaping is used. With noise-shaping, the tradeoff can be organised differently; for example, at 88kHz sampling one bit could be saved by allowing the quantisation noise to increase by 12dB at all frequencies above 20kHz, while keeping the noise unaltered below 20kHz.</p>
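<p>The arithmetic behind this tradeoff can be checked with a short sketch. It assumes that the normalisation constraint quoted above implies a wordwidth saving equal to the frequency-averaged noise increase (in dB) divided by about 6.02dB per bit, and it takes the sampling rate in the example to be 88.2kHz. The function name bits_saved and the band list format are illustrative only and do not come from the text.</p>

```python
import math

DB_PER_BIT = 20 * math.log10(2)  # about 6.02 dB per bit of wordwidth

def bits_saved(bands, nyquist_hz):
    """Estimate the wordwidth reduction from a piecewise-constant noise increase.

    bands: list of (low_hz, high_hz, noise_increase_db) covering 0..nyquist_hz.
    The increase in dB is averaged linearly over frequency (assumed model).
    """
    avg_db = sum((hi - lo) * db for lo, hi, db in bands) / nyquist_hz
    return avg_db / DB_PER_BIT

# Example from the text: +12 dB above 20 kHz, unchanged below,
# with an assumed sampling rate of 88.2 kHz (Nyquist 44.1 kHz).
saving = bits_saved([(0, 20000, 0.0), (20000, 44100, 12.0)], 44100)
print(round(saving, 2))  # a little over one bit
```

<p>Under these assumptions the 12dB increase over the upper 24.1kHz of the band averages to roughly 6.6dB, which is consistent with the saving of about one bit stated above.</p>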
<p>It will be seen from these examples that the greatest reduction of LPCM wordwidth is obtained if the noise-shaping is adjusted so that the quantisation noise density is as large as is acceptable at all frequencies.</p>
<p>In the following, we assume that the tradeoff described above is always applied, so that when we speak of increasing the quantisation noise at a particular frequency, we assume that the noise shaping and wordwidth (quantisation step size) are simultaneously adjusted so that the quantisation noise density is kept constant at other frequencies. The item of interest is the reduced wordwidth, which will directly result in a reduction of data-rate when lossless compression is used.</p>
<p>According to conventional psychoacoustic models, the ear is completely insensitive above 20kHz and one might be tempted to 'dump' very large amounts of noise above 20kHz. However, at present the audible significance of frequencies above 20kHz is not completely certain, and taking into account practical considerations it may be better to exercise restraint.</p>
<p>When lossless compression is used, another factor becomes significant. Typically a prediction filter is used, and the amount of compression then depends quite strongly on the spectrum of the input signal. If the signal has been re-quantised with heavy noise-shaping, the prediction may become much less accurate and the amount of compression will be less. This effect will tend to erode the advantage of the smaller wordwidth of the requantised signal. Choosing the level and spectrum of the quantisation noise can be done manually. However, achieving maximum data-rate reduction without audibly affecting the signal requires knowledge of the noise spectrum and level in the original signal and an understanding of the influence of the spectrum of the quantisation noise on the losslessly coded data-rate. These are not easily achieved.</p>
<p>Summary of the Invention</p>
<p>The invention in a first aspect provides a tool that estimates the intrinsic noise within an audio signal and quantises that signal to a wordwidth determined in dependence on the estimate of intrinsic noise.</p>
<p>Preferably, the quantisation wordwidth is determined so that the quantisation noise is lower than the intrinsic noise over the greater part of the audible frequency range.</p>
<p>Preferably, the intrinsic noise estimate is made as a function of frequency and the quantisation uses noise-shaping to allow the quantisation noise spectrum to follow the intrinsic noise spectrum more closely.</p>
<p>The invention in a second aspect provides a method to estimate the intrinsic noise in an audio signal, comprising analysis into frequency bands or time-varying spectra, and making an estimate of the noise based substantially on the lowest spectral levels observed in each frequency band.</p>
<p>One embodiment of the second aspect creates, at each frequency, a histogram of the observed spectral levels and estimates the noise on the basis of the histogram.</p>
<p>Another embodiment of the second aspect uses a method such as Maximum Likelihood to estimate the parameters in a statistical model of the spectral levels.</p>
<p>The invention in a third aspect provides a noise-shaped quantiser whose quantisation noise is determined at high and/or ultrasonic frequencies responsively to an analysis of the time-varying spectrum of its input signal at those frequencies.</p>
<p>In the context of lossless compression of the quantised signal, the quantisation noise is preferably minimised subject to a predetermined limit on peak data rate or on average compressed data rate.</p>
<p>One embodiment of the third aspect provides for a quantisation noise spectral density substantially equal to that of the signal during passages of peak data rate, or alternatively lower than that amount by a substantially fixed decibel offset. Another embodiment of the third aspect provides for the quantisation noise density substantially to follow a given percentile of the signal's histogram, as a function of frequency.</p>
<p>The invention in a fourth aspect provides a quantisation tool that uses a noise shaping curve determined from the input signal, so as to provide substantially the characteristics of the first aspect at low and middle audio frequencies, and the characteristics of the third aspect at high and/or ultrasonic frequencies, with an appropriately chosen transition between the two regimes.</p>
<p>Brief Description of the Drawings</p>
<p>Examples of the present invention will now be described in detail with reference to the accompanying drawings, in which:</p>
<p>Figure 1 shows the results of processing a section of audio in the manner described herein and plotting, at each frequency, the 1st, 10th, 50th, 90th and 99th percentiles of the distribution of smoothed spectral components. The horizontal axis is frequency in kilohertz (kHz) and the vertical axis is relative level in decibels (dB);</p>
<p>Figure 2 is akin to Figure 1 but with stationary white noise as input and using minimal smoothing of the spectral components; and</p>
<p>Figure 3 is akin to Figure 2 but with a greater degree of smoothing of the spectral components.</p>
<p>Detailed Description of the Invention</p>
<p>An Embodiment</p>
<p>One embodiment of the invention re-quantises the digital audio signal presented as input by:</p>
<p>a) Analysing the signal into a time-varying spectrum.</p>
<p>b) Smoothing the spectral components both across frequency and time.</p>
<p>c) Recording the smoothed spectral components.</p>
<p>d) Processing the recorded spectral components at each frequency to determine characteristics of the signal or noise floor and so determine an acceptable level to introduce quantisation noise.</p>
<p>e) Choosing a quantisation level and noise shaping filter such that the introduced quantisation noise is acceptable at all frequencies according to the computation at (d).</p>
<p>f) Quantising the audio at the chosen level with the computed noise-shaping filter.</p>
<p>We will now discuss these operations in more detail.</p>
<p>4.1 Analysing the signal into a time-varying spectrum</p>
<p>Analysis of the signal can be done by using, for example, a polyphase filter bank, or by splitting the signal into (probably overlapping) sections, multiplying each section by a windowing function and computing a Fast Fourier Transform (FFT) on each.</p>
<p>Since the output from each filter is narrow band, it can be sampled at considerably less than the original audio sampling rate without loss of useful information.</p>
<p>The audio under analysis is ideally the whole audio programme to be processed. However, it could alternatively be one or more representative sections of the audio chosen automatically or manually.</p>
<p>4.2 Smoothing the spectral components both across frequency and time</p>
<p>The (time-) sequence of FFT computations yields a sequence of complex spectral coefficients for each frequency. The individual coefficients are likely to have zero mean but we would like an estimate of the variance. Preferably, this is obtained by taking the square of the absolute value of each coefficient. Similarly in the case of a filter bank, our objective is to acquire statistics relating to how the signal variance in each frequency bin is distributed over time. The instantaneous power coming out of each filter in the filterbank provides a natural, though noisy, estimate of that variance.</p>
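<p>As a minimal sketch of the windowed-transform route, the following computes the squared magnitude of each spectral coefficient for one section of audio. A Hann window and a direct discrete Fourier transform are assumed here for brevity (a practical tool would use an FFT and overlapping sections); the function name frame_power_spectrum is illustrative only.</p>

```python
import cmath
import math

def frame_power_spectrum(frame):
    """Return |X[k]|^2 for k = 0..N/2 of one Hann-windowed section."""
    n = len(frame)
    # Periodic Hann window (an assumed choice of windowing function).
    windowed = [x * 0.5 * (1.0 - math.cos(2.0 * math.pi * i / n))
                for i, x in enumerate(frame)]
    powers = []
    for k in range(n // 2 + 1):
        coeff = sum(w * cmath.exp(-2j * math.pi * k * i / n)
                    for i, w in enumerate(windowed))
        powers.append(abs(coeff) ** 2)  # raw (noisy) variance estimate for bin k
    return powers

# A sine wave that completes exactly 4 cycles in a 32-sample section.
section = [math.sin(2.0 * math.pi * 4 * i / 32) for i in range(32)]
spectrum = frame_power_spectrum(section)
peak_bin = max(range(len(spectrum)), key=lambda k: spectrum[k])
```

<p>Repeating this for successive sections gives, for each frequency bin, the time sequence of raw power values to which the smoothing is then applied.</p>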
<p>The output from our filterbank may well have zero-crossings which could introduce a proportion of spuriously low valued readings. We smooth the variance estimates in time and/or frequency so as to reduce the noise of the estimates and to reject spuriously low values due to zero crossings. The data may be decimated if desired following the smoothing.</p>
<p>The amount of smoothing applied is limited by our desire to see through the gaps in the signal to the underlying background noise. To illustrate in the frequency domain, if the signal were a single note and its harmonics, we'd want to be able to observe background noise at frequencies between those harmonics, which would only be possible if the 'support' for frequency smoothing (that is, the number or extent of raw data that contribute to a smoothed value) is less than the separation between the harmonics. A support across 3 adjacent frequency bands can be a sensible choice, though others are possible including no frequency smoothing at all. In the time-domain, natural signals tend not to disappear instantly, but to decay gradually.</p>
<p>Exponential smoothing with a faster time constant than is typical of natural signals (for example 20ms) would achieve a reasonable degree of smoothing without undue risk of filling in gaps in the signal through which we'd like to observe noise.</p>
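<p>The time smoothing might be sketched as a one-pole recursive average of the power estimates in a single frequency bin. The mapping from time constant to smoothing coefficient and the function name smooth_power are assumptions made for illustration; the frame rate is whatever rate the spectral analysis delivers.</p>

```python
import math

def smooth_power(powers, frame_rate_hz, time_constant_s=0.020):
    """Exponentially smooth a time sequence of power estimates for one bin."""
    # Coefficient for a one-pole smoother with the given time constant.
    alpha = 1.0 - math.exp(-1.0 / (frame_rate_hz * time_constant_s))
    smoothed = []
    state = powers[0]  # start from the first observation
    for p in powers:
        state += alpha * (p - state)
        smoothed.append(state)
    return smoothed

# A single spuriously low reading (e.g. a zero crossing) barely dents the estimate.
raw = [1.0, 1.0, 0.0, 1.0, 1.0]
result = smooth_power(raw, frame_rate_hz=1000.0)
```

<p>With a 20ms time constant at an assumed 1000 estimates per second, the isolated zero reading lowers the smoothed value by only about 5%, whereas a genuine gap lasting several time constants would still let the estimate fall towards the noise floor.</p>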
<p>4.3 Recording the smoothed spectral components</p>
<p>A practical method of recording the smoothed spectral components is to quantise them according to level (preferably on an exponential scale, for example to the nearest quarter dB) and then 'bin' them according to frequency and level, i.e. maintain a two-dimensional array of counters. Effectively one is accumulating a histogram, separately for each frequency, of how many times each quantised value is observed.</p>
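<p>One possible sketch of this bookkeeping follows. It uses nested dictionaries of counters in place of a fixed two-dimensional array, which is an implementation choice made here for brevity; the names level_index, new_histogram and accumulate are hypothetical.</p>

```python
import math
from collections import defaultdict

STEP_DB = 0.25  # quantise levels to the nearest quarter dB

def level_index(power):
    """Map a positive power value to an integer index on a quarter-dB scale."""
    return round(10.0 * math.log10(power) / STEP_DB)

def new_histogram():
    """Counters indexed first by frequency bin, then by quantised level."""
    return defaultdict(lambda: defaultdict(int))

def accumulate(hist, frame):
    """Add one frame of smoothed power values (one value per frequency bin)."""
    for k, power in enumerate(frame):
        if power > 0.0:  # the log of zero is undefined; such readings are skipped here
            hist[k][level_index(power)] += 1

hist = new_histogram()
for frame in ([1.0, 0.1], [1.0, 0.01], [0.0, 0.1]):
    accumulate(hist, frame)
```

<p>After the three frames above, bin 0 holds two observations at 0dB and bin 1 holds two observations at -10dB and one at -20dB.</p>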
<p>4.4 Processing the recorded spectral components at each frequency to determine characteristics of the signal or noise floor and so determine an acceptable level to introduce quantisation noise</p>
<p>From the recorded (and smoothed) spectral components we can estimate two parameters that influence a determination of the acceptable level of quantisation noise at each frequency. These are firstly the level of intrinsic noise present in the signal at that frequency, and secondly the level of quantisation noise that could be introduced at that frequency without degrading the ability to losslessly compress the audio.</p>
<p>4.4.1 Intrinsic noise estimation</p>
<p>It is reasonable to suppose that, at each frequency, there is usually a signal present, but from time to time signal is absent in that frequency band so the intrinsic noise level is visible.</p>
<p>According to the invention, the tool estimates the level of intrinsic noise at each frequency. The simplest approach is to take a percentile criterion: for example, with a 1% criterion the tool determines the level that is above the smoothed spectral component for precisely 1% of the time, and takes that level as the estimated noise level. Repeated determination with different percentile criteria is extremely fast if spectral components have been binned according to frequency and level and counted to form a histogram at each frequency as suggested above.</p>
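<p>Reading a percentile off a per-frequency histogram amounts to a cumulative sum over the level counters. The sketch below assumes the histogram for one frequency is a mapping from quantised level index to count; the function name percentile_level is illustrative.</p>

```python
def percentile_level(counts, percent):
    """Return the lowest level index at or below which at least `percent`
    per cent of the recorded observations fall.

    counts: mapping of quantised level index -> number of observations,
            for a single frequency bin.
    """
    total = sum(counts.values())
    target = total * percent / 100.0
    running = 0
    for level in sorted(counts):
        running += counts[level]
        if running >= target:
            return level
    raise ValueError("empty histogram")

# Hypothetical histogram: level indices in quarter-dB units.
counts = {-400: 2, -396: 8, -200: 90}
noise_1pc = percentile_level(counts, 1)    # -400, i.e. -100 dB
level_10pc = percentile_level(counts, 10)  # -396, i.e. -99 dB
median = percentile_level(counts, 50)      # -200, i.e. -50 dB
```

<p>Because the histogram is already accumulated, trying a different criterion only repeats the short loop over level indices and does not require another pass through the audio.</p>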
<p>Figure 1 is an example plot with frequency as the independent variable, in which levels corresponding to each percentile criterion have been joined to form a percentile line. This has been done for the 1%, 10%, 50%, 90% and 99% criteria. It may be reasonable to take the bottom (1%) curve as representing the intrinsic noise of the source material. The effect of an antialias filter at 20kHz is clearly visible.</p>
<p>Such a simple approach suffers from some systematic biases though. The length of time for which we observe the noise is not known and is likely to be frequency dependent. For example, if at a particular frequency the noise floor is visible for 10% of the time, then the 1% global statistic will be the 10% statistic for the noise. If the noise floor is visible for 2% of the time, then the 1% global statistic will be the 50% statistic for the noise alone. Noise is a random process, not a constant value, and so its 10th percentile will read lower than its 50th percentile.</p>
<p>One way to reduce this systematic bias is to increase the smoothing applied to the spectral components. The effect of smoothing is illustrated by comparing figure 2 with figure 3. Both figures are percentile plots of stationary white noise. Figure 2 has minimal smoothing, and shows a 4dB difference between the level estimates (-84dB and -80dB approximately) obtained by applying 10% and 50% criteria respectively. In figure 3, which has greater smoothing, this difference is reduced to about 2.5dB.</p>
<p>A more sophisticated analysis might begin by rejecting histogram entries corresponding to signal levels that are clearly too loud to be noise. Renormalising the counts for the remaining entries and applying a percentile criterion to the renormalised counts is likely to produce a less biased estimate of the noise level. The analysis could also take into account the precalculable probability density function of the spectral components that result from applying the procedures described above to a signal consisting of noise.</p>
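<p>The rejection and renormalisation step could be sketched as follows. How the "clearly too loud" threshold is chosen is not specified in the text, so the parameter max_noise_level is a hypothetical input supplied by the caller, and the function name is illustrative.</p>

```python
def renormalised_percentile(counts, percent, max_noise_level):
    """Percentile of the histogram after discarding entries above a threshold.

    counts: mapping of quantised level index -> count for one frequency bin.
    max_noise_level: level index above which an entry is treated as signal
                     rather than noise (assumed to be supplied by the caller).
    """
    kept = {lvl: n for lvl, n in counts.items() if lvl <= max_noise_level}
    total = sum(kept.values())
    if total == 0:
        raise ValueError("no entries at or below the threshold")
    target = total * percent / 100.0  # percentile of the remaining counts only
    running = 0
    for level in sorted(kept):
        running += kept[level]
        if running >= target:
            return level

# Hypothetical data: noise visible in 10 of 100 observations.
counts = {-400: 2, -396: 8, -200: 90}
noise_median = renormalised_percentile(counts, 50, max_noise_level=-300)
```

<p>Here the 50% criterion is applied to the ten observations judged to be noise, so the estimate no longer depends on what fraction of the time the noise floor happened to be visible.</p>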
<p>Referring again to figure 1, the lowest percentile line, for 1%, rises substantially at low frequencies, for example below 1kHz. However, this probably does not indicate a rise in intrinsic electronic noise in this region: more likely it is an increase in the acoustic noise of the room in which the recording was made. Therefore it may be preferred not to allow the quantisation noise to rise in this frequency region, and in any case there is little lossless coding gain to be had from the quantisation noise being allowed to rise substantially over a small bandwidth. In this situation it can be sensible for the estimated low frequency noise floor to be extrapolated from the noise floor at mid frequencies. The same principle can be applied in other parts of the spectrum: any sharp peaks in the low percentile lines can be ignored.</p>
<p>4.4.2 Quantisation noise and lossless compression</p>
<p>At high frequencies, especially at ultrasonic frequencies, we may be tempted to allow more noise to be introduced regardless of the estimated noise floor of the original signal, on the grounds that the ear's reduced sensitivity at these frequencies will make it inaudible. There is also the example of Direct Stream Digital (DSD) encoding, which some listeners claim to be more engaging than LPCM, and it is hard to think of a plausible explanation other than that it is due to the large levels of ultrasonic noise introduced by DSD encoding.</p>
<p>However, we do not intend to introduce quantisation noise for the sake of it; our ultimate aim is to improve the efficiency of subsequent lossless compression. For this we need a model of the interaction between quantisation noise spectrum and losslessly-compressed data rate. The following statement is based on Meridian Lossless Packing (MLP), though it is likely to apply also to other lossless compression systems that rely on signal prediction followed by entropy coding, or on a frequency transform followed by entropy coding of the transform coefficients.</p>
<p>As long as the quantisation noise spectrum is lower than the audio signal's spectrum (including both desired content and intrinsic noise) at all frequencies, the reduction in losslessly coded data-rate resulting from requantisation is directly related to the average (taken linearly with frequency) level (measured logarithmically, e.g. in dB) of the spectrum of the introduced quantisation noise. Hence higher quantisation noise at any frequency reduces the losslessly coded data-rate. Once the quantisation noise exceeds the audio signal spectrum at any frequency, a further increase in quantisation noise at that frequency will have a smaller effect. The precise behaviour will depend on the lossless compression system in use, but it is adequate for our purposes to assume that further increases in the quantisation noise will have no effect on the instantaneous losslessly compressed data rate.</p>
<p>If the desire is to minimise the peak data rate of the compressed stream, the spectrum of the input signal at times corresponding to the data rate peaks could be taken as a guide to the upper limit beyond which further increases in the quantisation noise spectrum result in little or no further reduction of the data rate at these times.</p>
<p>If the desire is to minimise the average data rate, for example to increase the playing time on a disc or to minimise the size of a computer file, then the percentile plots provide a guide to the 'law of diminishing returns'. For example, if the quantisation noise spectrum equals the 50% percentile line at a particular frequency, the effect of a 1dB increase in noise on average data rate will be half that of a 1dB increase when the quantisation noise at that frequency is below the input spectrum noise throughout the program. If the quantisation noise is at the 80% line, then a 1dB increase in quantisation noise will result in only one-fifth the reduction of average data rate.</p>
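<p>Under the simplified model stated above (a further increase in noise has no effect at instants when the noise already exceeds the signal), the relative benefit of a small noise increase at one frequency is simply the fraction of time for which the signal lies above the noise. A sketch, with an illustrative function name and a hypothetical histogram:</p>

```python
def marginal_benefit(counts, noise_level):
    """Fraction of the full data-rate benefit obtained from a small increase
    in quantisation noise at one frequency.

    counts: mapping of quantised level index -> count for that frequency.
    noise_level: level index at which the quantisation noise currently sits.
    Assumes no benefit at instants when the signal is at or below the noise.
    """
    total = sum(counts.values())
    above = sum(n for lvl, n in counts.items() if lvl > noise_level)
    return above / total

# Hypothetical histogram with 50% of observations at level 0,
# 30% at level 10 and 20% at level 20.
counts = {0: 50, 10: 30, 20: 20}
at_median = marginal_benefit(counts, 0)     # noise on the 50% line
at_80_line = marginal_benefit(counts, 10)   # noise on the 80% line
below_all = marginal_benefit(counts, -5)    # noise below the signal throughout
```

<p>The three values come out as one half, one fifth and one respectively, matching the proportions given in the text.</p>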
<p>Allowing noise 6dB below the median level of the signal (the 50th percentile) may be considered a reasonable choice of noise level by this criterion.</p>
<p>4.5 Choosing a quantisation level and noise shaping filter such that the introduced quantisation noise is acceptable at all frequencies according to the computation at 4.4</p>
<p>It is reasonable to want the quantisation noise spectrum to be smooth, and to run parallel to and below the estimated noise floor at low to mid frequencies. How much below the estimated noise floor the quantisation noise floor is placed is a tradeoff between conservatism and desire for reduced losslessly coded data-rate. A value in the range of 6-18dB is sensible. For reference, additional quantisation noise 12dB below the original noise floor will increase the noise floor by a mere 1/4dB; 18dB below will increase it by 1/16dB.</p>
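<p>These reference figures follow from adding the powers of two uncorrelated noises, which is the assumption made in this short check (the function name floor_rise_db is illustrative):</p>

```python
import math

def floor_rise_db(offset_db):
    """Rise in total noise floor, in dB, when uncorrelated noise is added
    at `offset_db` below the existing noise floor (powers assumed to add)."""
    return 10.0 * math.log10(1.0 + 10.0 ** (-offset_db / 10.0))

for offset in (6, 12, 18):
    print(offset, round(floor_rise_db(offset), 3))
```

<p>An offset of 12dB gives a rise of roughly 0.27dB and 18dB gives roughly 0.07dB, in line with the quoted 1/4dB and 1/16dB; the least conservative offset in the suggested range, 6dB, raises the floor by about 1dB.</p>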
<p>At high frequencies (especially ultrasonic frequencies above 20kHz) the decreased sensitivity of the ear makes matching the estimated noise floor less important and we are happy for increased noise up to the level where the lossless compression returns degrade as calculated in section 4.4.2. It is sensible to provide a smooth transition between the two regimes.</p>
<p>If desired, user interaction could be employed at this stage, presenting the results from earlier stages (for example, percentile plots similar to figure 1) and inviting the user to choose by how much the quantisation noise should be below the estimated noise level. The user could also choose over what frequency range the design criteria change from those of section 4.4.1 to those of section 4.4.2, and the degree of noise introduced in the ultrasonic region (for example, by reference to a percentile).</p>
<p>Having computed the desired spectrum and level, a noise shaping filter to implement the desired spectral shape can be automatically calculated. A procedure to calculate a noise shaping filter from a desired target spectrum is given in Gerzon, M.A. and Craven, P.G., "Optimal Noise Shaping and Dither of Digital Signals", presented at Audio Eng. Soc. 87th Convention, New York, October 1989, AES preprint # 2822, though other design techniques (e.g. least squares computation) are simpler. The quantisation level can then be chosen so that the shaped quantisation noise lies at the desired level.</p>
<p>4.6 Quantising the audio at the chosen level with the computed noise-shaping filter</p>
<p>The mechanics of noise-shaped quantisation with dither are well known in the art; see for example Gerzon, M.A. and Craven, P.G., "Optimal Noise Shaping and Dither of Digital Signals", presented at Audio Eng. Soc. 87th Convention, New York, October 1989, AES preprint # 2822.</p>
<p>It will generally be necessary to use a second pass through the audio signal in order to implement the quantisation, unless the quantisation level and noise shaper can be determined adequately from inspection of a small portion near the start.</p>
<p>5. Alternative embodiment</p>
<p>An alternative embodiment follows the overall strategy described above, but relies more on a model-based fitting procedure and so deviates in respect of the steps described in paragraphs 4.2, 4.3 and 4.4.</p>
<p>The alternative embodiment requires a model for the spectral statistics of the input signal. Preferably, the input signal is modelled as intrinsic noise, for example stationary Gaussian noise with frequency-dependent variance to be determined, plus a nonstationary wanted signal whose parameters result in a near-zero spectral density for some of the time, thus allowing the noise to be distinguished. The model's parameters are then estimated by a method such as least-squares or Maximum Likelihood.</p>
<p>The input to the parameter estimation process could be the signal itself, though it will generally be more efficient to pre-process to obtain a time-varying spectrum as in paragraph 4.1. The smoothing with respect to time, discussed in paragraph 4.2, can be eliminated, since Maximum Likelihood is able to deal optimally with noisy estimates.</p>
<p>The model can be independent of frequency, and can be applied separately at each frequency, using only the data acquired in that frequency bin, smoothed with that of its neighbours. Alternatively, the model can include explicit frequency dependence of the parameters, in which case it is applied once with the data at all frequencies, and in this case pre-smoothing of the data with respect to frequency also becomes redundant.</p>
<p>Nonlinear parameter estimation generally proceeds iteratively, but given reasonable initial estimates it is generally possible to derive an update formula whose dependency on the data is by way of a small number of intermediate quantities, each of which is the sum over the data of a nonlinear function of the observables. A single iteration may suffice, in which case the intermediate quantities can be accumulated during a single pass through the audio data. Failing that, it may be possible to use the initial portion of the stream to refine the initial parameter estimates so that the majority of the stream needs to be scanned only once in order to produce estimates of the required accuracy.</p>
<p>Once the unknown parameters in the model have been determined, the model can be used as the basis for the determination of a quantisation level and spectral shape as described in section 4.5. In particular, the model should directly furnish an estimate of the level and spectrum of the intrinsic noise.</p>
<p>6. Multichannel aspects</p>
<p>The above description can be generalised to handle multichannel signals. The simplest generalisation is to treat all channels independently. An alternative is to combine the histograms from all channels so that the same quantisation and noise shaping will be applied to all channels.</p>
<p>However, if the noise on the original multiple channels is correlated, then quantisation noise introduced independently on each channel may be directionally unmasked. This is unlikely to be a problem very often, particularly with conservative choices of introduced quantisation noise. However, to address this issue, instead of using the power on each channel individually, one could compute the cross-correlation matrix between all channels, at each frequency and at each point in time. The cross-correlation matrix entries would then each be smoothed in the same way as described for the spectral coefficients in the single-channel case.</p>
<p>There are various approaches to processing the cross-correlation matrices.</p>
<p>One is to calculate the eigenvalues of each one and then to throw all the eigenvalues into the histogram computation described previously. Applying a low percentile criterion would lead to a noise estimate determined substantially by the smallest eigenvalue of each matrix, and would be conservative in terms of immunity to directional unmasking.</p>
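<p>For the two-channel case the eigenvalues have a closed form, which the following sketch uses. Restricting to two channels is an assumption made to keep the example short (more channels would need a general Hermitian eigenvalue routine), and the function and argument names are illustrative.</p>

```python
import math

def eigenvalues_2x2(p_ll, p_rr, p_lr):
    """Eigenvalues (smallest first) of the smoothed 2x2 cross-correlation matrix

        [[p_ll,        p_lr],
         [conj(p_lr),  p_rr]]

    where p_ll and p_rr are the real channel powers and p_lr is the
    (possibly complex) cross power at one frequency and time.
    """
    mean = 0.5 * (p_ll + p_rr)
    radius = math.sqrt((0.5 * (p_ll - p_rr)) ** 2 + abs(p_lr) ** 2)
    return mean - radius, mean + radius

# Identical signal on both channels: all power lies in one direction.
correlated = eigenvalues_2x2(1.0, 1.0, 1.0)
# Independent channels: the eigenvalues are just the channel powers.
independent = eigenvalues_2x2(1.0, 4.0, 0.0)
```

<p>In the fully correlated case the smaller eigenvalue is zero, indicating a direction in which the original contains no noise; feeding both eigenvalues into the histogram therefore pulls the low percentiles down, which is the conservative behaviour described above.</p>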
<p>Alternatively, a multichannel noise-shaped quantiser with matrix feedback could be adjusted so that the quantisation noise at each frequency and in each vector direction is less than the estimated noise inthe original signal in that direction. This may permit a larger quantisation step-size while also being proof against directional unmasking.</p>
<p>7. Time variance It is quite possible that the intrinsic noise level of the original signal may change, for example due to fades in or out or as the mix varies between different sources containing differing noise contributions.</p>
<p>An audiophile viewpoint would be that the introduced quantisation noise should be time-invariant, in which case the methods described above will tend to a conservative result of estimating the noise at each frequency as the quietest of the noises present at various times. One of the benefits of using a percentile criterion such as 1% to estimate noise is that it should reject spuriously low levels of noise during initial fade in/fade out. Cleaily the percentile criterion must be larger than the proportion of time occupied by such fades and artificial silences, for this to be effective.</p>
<p>Other approaches are possible, for example detecting changes in the input signal's statistics and either turning off the noise-shaped quantisation (so that, for example, "digital black" in leads to "digital black" out) or analysing the situations separately and providing a different noise level/spectrum for each. This is an area where explicit human intervention may be helpful.</p>
<p>Note that turning off quantisation for a small length of time will have a small effect on the overall average losslessly coded data-rate.</p>
<p>If implementing time-variant quantisation noise it is sensible to smooth the transitions between regimes.</p>
<p>This is easily done in the case of level. The quantisation level should be ramped in steps of one bit. Between those steps, additional noise should be introduced so that, whenever the noise introduced by quantisation changes abruptly by one bit, the additional noise is adjusted at the same moment and the total introduced noise (quantisation noise plus additional noise) varies smoothly.</p>
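<p>A minimal sketch of this split between quantisation noise and additional noise is given below. It assumes that the quantisation noise of a step of size s has power s²/12 (the usual uniform-quantiser model), that powers are expressed in units where the finest step is 1, and that the additional noise is uncorrelated with the quantisation noise so that powers add; the function name is hypothetical.</p>

```python
import math

def split_noise(target_power):
    """Split a target total noise power into a whole-bit quantiser step and
    the additional noise power needed to make up the difference.

    Returns (bits_removed, additional_noise_power).
    """
    if target_power <= 0.0:
        raise ValueError("target_power must be positive")
    # Largest whole-bit step whose own noise, step**2 / 12, fits under the target.
    bits = max(0, math.floor(0.5 * math.log2(12.0 * target_power)))
    step = 2.0 ** bits
    quant_power = step * step / 12.0
    # Below the noise of the finest step nothing can be added, so clamp at zero.
    return bits, max(0.0, target_power - quant_power)
```

<p>As the target power is ramped smoothly, the number of bits removed increases by one each time the target passes a factor of four (about 6 dB), and the additional noise power falls at that instant by the amount the quantisation noise rises, so the sum follows the target.</p>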
<p>Smoothly altering noise-shaping filters is more complicated, and techniques will depend on the architecture of the noise shaper. For some architectures it is perfectly practical (if somewhat computationally intensive) to interpolate the target spectrums into intermediate targets, compute interpolated noise shaping filters from each of those target spectrums and use them successively with the same state variables.</p>
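<p>The first stage of that procedure, forming the intermediate targets, might look like the sketch below. Interpolating linearly in decibels is an assumption made for the example, as is the function name; the design of a noise-shaping filter from each intermediate target depends on the noise-shaper architecture and is not shown.</p>

```python
def interpolate_targets_db(start_db, end_db, steps):
    """Return steps + 1 target spectra (lists of dB values per frequency bin)
    moving linearly in dB from start_db to end_db, both ends included."""
    if len(start_db) != len(end_db):
        raise ValueError("spectra must have the same number of bins")
    if steps < 1:
        raise ValueError("steps must be at least 1")
    targets = []
    for i in range(steps + 1):
        t = i / steps  # 0.0 at the old target, 1.0 at the new target
        targets.append([(1.0 - t) * a + t * b for a, b in zip(start_db, end_db)])
    return targets
```

<p>A filter would then be computed for each returned spectrum in turn and run on successive segments of the signal, carrying the state variables across from one filter to the next.</p>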

Claims (1)

  1. <p>1. A method of quantising an audio signal, comprising the steps
    of: analysing the audio signal into spectral components; quantising the audio signal; shaping the quantisation noise at substantially audible frequencies according to a first relationship to the spectral components; and, shaping the quantisation noise at substantially ultrasonic frequencies according to a second relationship to the spectral components.</p>
    <p>2. A method according to claim 1, wherein the spectral components include at least one of frequency bands and time-varying spectra.</p>
    <p>3. A method according to claim 1 or claim 2, further comprising the step of estimating the intrinsic noise in the audio signal, wherein the first relationship requires the quantisation noise in a given frequency band to be less than the estimate of intrinsic noise in that frequency band.</p>
    <p>4. A method according to claim 1 or claim 2, wherein the second relationship requires the spectral power level in a given frequency band to be less than or equal to the quantisation noise in that frequency band for a pre-determined proportion of the duration of the signal.</p>
    <p>5. A method according to claim 1 or claim 2, wherein the second relationship is such that the quantisation noise is shaped across a range of substantially ultrasonic frequencies to be less at each frequency in the range than the peak power level of the audio signal at that frequency over the duration of the audio signal.</p>
    <p>6. A method according to claim 1 or claim 2, wherein the second relationship is such that the quantisation noise is shaped across a range of substantially ultrasonic frequencies to be less at each frequency in the range than the 50% percentile of the spectral power level of the audio signal at that frequency over the duration of the audio signal.</p>
    <p>7. A method of processing an audio signal, comprising the steps of: analysing the audio signal to establish the location of the passages of peak data rate where the audio signal is least compressible; quantising the audio signal according to the method of claim 1 or claim 2, wherein the second relationship is such that the quantisation noise at substantially ultrasonic frequencies does not exceed the spectral power level reached during the passages of peak data rate at those frequencies; and, subsequently compressing the noise-shaped quantised audio signal.</p>
    <p>8. A method according to claim 7, wherein the quantised audio signal is losslessly compressed.</p>
    <p>9. A quantiser for quantising an audio signal, the quantiser adapted to perform the method of any of claims 1 to 6.</p>
    <p>10. A processor for processing an audio signal, the processor comprising an analyser, a quantiser and a compressor adapted to perform the method steps of claim 7 or claim 8.</p>
    <p>11. A computer program product for performing the method of any of claims 1 to 8.</p>
    <p>12. A data carrier comprising an audio signal quantised using the method of any of claims 1 to 8 or processed using the method of claim 7 or claim 8.</p>
GB0625079A 2004-03-31 2004-03-31 Optimal quantiser for an audio signal Expired - Lifetime GB2433184B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
GB0625079A GB2433184B (en) 2004-03-31 2004-03-31 Optimal quantiser for an audio signal

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
GB0407392A GB2414646B (en) 2004-03-31 2004-03-31 Optimal quantiser for an audio signal
GB0625079A GB2433184B (en) 2004-03-31 2004-03-31 Optimal quantiser for an audio signal

Publications (3)

Publication Number Publication Date
GB0625079D0 GB0625079D0 (en) 2007-01-24
GB2433184A true GB2433184A (en) 2007-06-13
GB2433184B GB2433184B (en) 2007-11-28

Family

ID=38645850

Family Applications (1)

Application Number Title Priority Date Filing Date
GB0625079A Expired - Lifetime GB2433184B (en) 2004-03-31 2004-03-31 Optimal quantiser for an audio signal

Country Status (1)

Country Link
GB (1) GB2433184B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5206884A (en) * 1990-10-25 1993-04-27 Comsat Transform domain quantization technique for adaptive predictive coding
EP0632597A2 (en) * 1993-06-29 1995-01-04 Sony Corporation Audio signal transmitting apparatus and the method thereof

Also Published As

Publication number Publication date
GB2433184B (en) 2007-11-28
GB0625079D0 (en) 2007-01-24

Similar Documents

Publication Publication Date Title
CN110379434B (en) Method for parametric multi-channel coding
US8812308B2 (en) Apparatus and method for modifying an input audio signal
EP3236586B1 (en) System for combining loudness measurements in a single playback mode
JP5511136B2 (en) Apparatus and method for generating a multi-channel synthesizer control signal and apparatus and method for multi-channel synthesis
US8194889B2 (en) Hybrid digital/analog loudness-compensating volume control
US7970144B1 (en) Extracting and modifying a panned source for enhancement and upmix of audio signals
US9443525B2 (en) Quality improvement techniques in an audio encoder
RU2520420C2 (en) Method and system for scaling suppression of weak signal with stronger signal in speech-related channels of multichannel audio signal
CN103262409B (en) The dynamic compensation of the unbalanced audio signal of frequency spectrum of the sensation for improving
JP4712799B2 (en) Multi-channel synthesizer and method for generating a multi-channel output signal
EP2486654B1 (en) Adaptive dynamic range enhancement of audio recordings
RU2596592C2 (en) Spatial audio processor and method of providing spatial parameters based on acoustic input signal
US8612237B2 (en) Method and apparatus for determining audio spatial quality
US6915255B2 (en) Apparatus, method, and computer program product for encoding audio signal
US20020022898A1 (en) Digital audio coding apparatus, method and computer readable medium
US6466912B1 (en) Perceptual coding of audio signals employing envelope uncertainty
EP1259956B1 (en) Method of and apparatus for converting an audio signal between data compression formats
US7725323B2 (en) Device and process for encoding audio data
EP2828853B1 (en) Method and system for bias corrected speech level determination
GB2414646A (en) Optimal quantiser for an audio signal
GB2433184A (en) Optimal quantiser for an audio signal
WO2007034375A2 (en) Determination of a distortion measure for audio encoding
KR20230084232A (en) Quantization of audio parameters
Uhle et al. A supervised learning approach to ambience extraction from mono recordings for blind upmixing
JP2000137497A (en) Device and method for encoding digital audio signal, and medium storing digital audio signal encoding program

Legal Events

Date Code Title Description
732E Amendments to the register in respect of changes of name or changes affecting rights (sect. 32/1977)

Free format text: REGISTERED BETWEEN 20130926 AND 20131002

PE20 Patent expired after termination of 20 years

Expiry date: 20240330