GB2269967A

GB2269967A - Audio data compression

Info

Publication number: GB2269967A
Application number: GB9217894A
Authority: GB
Inventors: Philip Cambridge; Martin Todd
Original assignee: Central Research Laboratories Ltd
Current assignee: Central Research Laboratories Ltd
Priority date: 1992-08-22
Filing date: 1992-08-22
Publication date: 1994-02-23
Also published as: GB9217894D0

Abstract

Audio data compression is achieved by filtering received audio data (2) into several frequency sub-bands (6(a)-(f)), each sub-band covering a fraction of the total audio bandwidth. The lowest range of sub-band frequencies (6(f)) is transform coded (10) to provide masking thresholds, and the remaining sub-bands (6(a)-(e)) are time-domain coded (8), also to provide masking thresholds. The masking thresholds are then used to perform psychoacoustic masking.

Description

AUDIO DATA COMPRESSION

The present invention relates to a method of audio data compression comprising: receiving audio data for compression, the audio data comprising components of differing frequencies; providing a plurality of frequency sub-bands, the frequency range of each sub-band being a fraction of the total audio bandwidth, which fractions have predetermined relationships therebetween; and filtering the components into the sub-bands in dependence upon correlation between the frequency of the components and the respective sub-band frequency range. The invention also concerns a method of decoding audio data compressed by such a method, and apparatus for carrying these methods into effect.

The aim of audio data compression systems is to reduce the amount of (generally digital) information necessary to describe or encode the audio signal, whilst still maintaining the perceived quality of the original audio signal. A compression system which introduces no audible distortion is said to offer "transparent" quality. Most high-quality coding systems exploit the property of the human auditory system known as psychoacoustic masking. The principle of such masking is that a loud sound will mask a quieter sound at a similar frequency, rendering the quieter sound inaudible. If the audio spectrum is split into frequency divisions known as "critical bands", a masking threshold can be calculated for each critical band such that signals with power below that threshold will be inaudible.

One method in accordance with the above is that described in European Patent Application No. EP 0,178,608. In this document a signal is split into four sub-bands. The highest frequency band is encoded using block companded pulse code modulation, while the remaining three sub-bands are encoded using adaptive differential pulse code modulation. This is because block companded pulse code modulation gives higher-quality encoding and would therefore ideally be used for all the sub-bands, yet its implementation requires a large memory capacity, and using it for all four sub-bands would require more memory than is available in many processor chips.

It is an aim of the present invention to at least alleviate the aforementioned shortcomings.

According to one aspect the invention provides a method of audio data compression comprising: receiving audio data for compression, the audio data comprising components of differing frequencies; providing a plurality of frequency sub-bands, the frequency range of each sub-band being a fraction of the total audio bandwidth, which fractions have a predetermined relationship therebetween; and filtering the components of the data into the sub-bands in dependence upon correlation between the frequency of the components and the respective sub-band frequency range; characterised in that the components within the sub-band covering the range of lowest frequencies are transform coded; the components within all other sub-bands are time-domain coded; and all the coded components are multiplexed to provide a signal representative of compressed audio data.

According to another aspect the invention provides apparatus for compression of audio data comprising: a receiver for receiving audio data for compression, which data comprises components of differing frequencies; and a filter for providing a plurality of frequency sub-bands, the frequency range of each sub-band being a fraction of the total audio bandwidth, which fractions have a predetermined relationship therebetween, and for filtering the components of the data into the sub-bands in dependence upon correlation between the frequency of the components and the respective sub-band frequency range; the apparatus including a transform coder for transform coding the components within the sub-band covering the range of lowest frequencies; time-domain coders for time-domain coding the components within all other sub-bands; and a multiplexer for multiplexing all the coded components to provide a signal representative of compressed audio data.

According to yet another aspect of the present invention, there is provided a method of expanding audio data that has been compressed in accordance with other provisions of the invention, comprising receiving compressed coded audio data, demultiplexing the compressed coded data to provide coded frequency sub-bands of the data, transform decoding the lowest range of sub-bands, time-domain decoding all remaining sub-bands, and combining all the decoded sub-bands to provide an expanded audio signal derived from the compressed audio data.

A further aspect of the present invention provides apparatus for expanding audio data that has been compressed in accordance with other provisions of the invention, comprising a receiver for receiving compressed coded audio data, a demultiplexer for demultiplexing the received compressed coded data to provide coded frequency sub-bands of the data, a transform decoder for transform decoding the lowest range of sub-bands, a time-domain decoder for time-domain decoding all remaining sub-bands, and a combiner for combining all the decoded sub-bands to provide an expanded audio signal derived from the compressed audio data.

The sub-band covering the lowest frequency range contains many narrow critical bands. Attempting to produce sub-bands with frequency ranges similar to the critical bands in this range would require filters of very high order (possibly as a consequence of many cascaded filters), and synthesis filters whose stability would be difficult to attain. By instead treating this sub-band as a single band and encoding the signals within it by transform coding, more efficient coding is possible than has hitherto been known. This is because transform coding produces sinusoidal distortions, which are tolerable at low frequencies, whereas time-domain coding produces noise-like (that is, wideband) distortions, which are tolerable at high frequencies.

Preferably the predetermined relationship is a binary relationship, such that the widths of the sub-bands differ from each other by factors of two, which enables the complexity of the circuitry necessary to perform the invention to be kept to a minimum.

Advantageously the fraction of the total audio bandwidth covered by the sub-band for transform coding is the lowest quarter of the bandwidth, and the fraction covered by the time-domain coding is the highest three-quarters of the total bandwidth.

The present invention will now be described, by way of example only, with reference to the accompanying drawings, of which: Figure 1 illustrates schematically a block diagram of an embodiment of the present invention; Figure 2 illustrates the relationship between critical bands and the sub-bands chosen for the present invention; Figure 3 illustrates graphically the technique of psychoacoustic masking; and Figure 4 illustrates schematically a block diagram of a decoder for use with the present invention.

Referring first to figure 1, it will be seen that audio data 2 for compression passes through a filter bank, such as quadrature mirror filter 4, and is split into six sub-bands 6(a)-(f). The frequency range of 6(a) contains higher frequencies than that of 6(b), which contains higher frequencies than that of 6(c), and so on. The range of lowest frequencies is 6(f).

Sub-bands 6(a)-6(e) are then all coded by time-domain coders 8, whilst sub-band 6(f) is coded by transform coder 10.

Finally, the coded audio data is passed to a multiplexer 12, where a signal 14 representative of compressed audio data derived from audio data 2 is produced.

Referring now also to figure 2, the filtering and choice of sub-bands will be explained in more detail.

The quadrature mirror filter 4 is chosen so as to provide six sub-bands 6(a)-(f). The bandwidth of each of these sub-bands is chosen to mirror, to an extent, the well-known set of "critical" bands derived by Zwicker. These critical bands, which number 26 over the 0-22 kHz spectrum, play a significant role in several auditory effects.

More modern data indicate slightly narrower bands than those specified by Zwicker, giving around 35 bands in the 0-22 kHz range.

The Zwicker critical bands can be seen on the left-hand side of figure 2 and the six sub-bands 6(a)-(f) on the right.

It can be seen that the quadrature mirror filter 4 works as follows. First, the audio data 2 is split into two sub-bands midway along the bandwidth. Each of these sub-bands is then split into two, again midway along its bandwidth. Next, for reasons explained below, the two highest-frequency sub-bands 16, 18 and the lowest-frequency sub-band 20 (each representing a quarter of the total audio bandwidth) are left untouched, while the remaining sub-band is split into two, and the lower half of this is split into two again. This results in a sub-band 22 whose width is an eighth of the total audio bandwidth and two sub-bands 24, 26, each a sixteenth of the total audio bandwidth.
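The successive halvings can be traced with a few lines of arithmetic. The sketch below is a minimal illustration, assuming the 22 kHz total bandwidth of figure 2 and ideal splitting at each midpoint, and assuming that the lower of the two eighths is the one split again, consistent with sub-band 26 lying directly above sub-band 20. The names `split` and `figure2_subbands` are hypothetical, and the sketch computes only nominal band edges, not the quadrature mirror filters themselves.

```python
from fractions import Fraction


def split(band):
    """Split a (low, high) band at its midpoint, as one two-band QMF stage would."""
    low, high = band
    mid = (low + high) / 2
    return (low, mid), (mid, high)


def figure2_subbands(total_bw_khz=22):
    """Nominal edges (kHz) of sub-bands 6(a)..6(f), highest band first."""
    full = (Fraction(0), Fraction(total_bw_khz))
    lower_half, upper_half = split(full)
    lowest_quarter, second_quarter = split(lower_half)   # lowest quarter: sub-band 20
    third_quarter, top_quarter = split(upper_half)       # left whole: sub-bands 16, 18
    lower_eighth, upper_eighth = split(second_quarter)   # upper eighth: sub-band 22
    sixteenth_lo, sixteenth_hi = split(lower_eighth)     # sub-bands 26 and 24
    return [top_quarter, third_quarter, upper_eighth,
            sixteenth_hi, sixteenth_lo, lowest_quarter]


if __name__ == "__main__":
    total = 22
    for label, (lo, hi) in zip("abcdef", figure2_subbands(total)):
        print(f"6({label}): {float(lo)}-{float(hi)} kHz, {(hi - lo) / total} of the total")
```

Because every stage halves a band, each width comes out as 1/4, 1/8 or 1/16 of the total, which is the binary relationship between sub-bands; exact fractions are used so that edges such as 6.875 kHz are not disturbed by rounding.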

Comparing the sub-bands on the right with the Zwicker critical bands, it can be seen that the similarity ends at sub-band 20, which lies below sub-band 26. Here it would be expected that sub-band 20 would itself be split into two, and so on, in order to reflect the critical bands of Zwicker. However, because the resulting bands would be so narrow and so densely concentrated that they would be difficult to separate with a sub-band technique, it has been decided to treat the entire lowest-frequency quarter of the total audio bandwidth as one single sub-band, 20, and to code it using transform coding. The transform coding provides a means of segmenting this band into the narrow frequency ranges.

The audio data 2 has components of differing frequencies. When the data 2 passes through filter 4, these components are filtered and allocated to a sub-band according to their frequency. As illustrated in figure 2, components in the frequency range 0-5.5 kHz are filtered to sub-band 6(f); components in the range 5.5-6.875 kHz to sub-band 6(e); components in the range 6.875-8.25 kHz to sub-band 6(d); components in the range 8.25-11 kHz to sub-band 6(c); components in the range 11-16.5 kHz to sub-band 6(b); and components in the range 16.5-22 kHz to sub-band 6(a).
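An idealised version of this allocation is a lookup on the band edges. The sketch assumes brick-wall filtering, with edges obtained by halving a 22 kHz bandwidth (so the boundary between 6(a) and 6(b) falls at 16.5 kHz, three quarters of 22 kHz). The names `UPPER_EDGES_KHZ`, `LABELS` and `subband_for` are hypothetical. A practical quadrature mirror filter has transition regions, so a component close to an edge appears partly in both neighbouring sub-bands.

```python
import bisect

# Upper edges (kHz) of sub-bands 6(f), 6(e), 6(d), 6(c), 6(b), 6(a),
# assuming a 22 kHz total bandwidth split as in figure 2.
UPPER_EDGES_KHZ = [5.5, 6.875, 8.25, 11.0, 16.5, 22.0]
LABELS = ["6(f)", "6(e)", "6(d)", "6(c)", "6(b)", "6(a)"]


def subband_for(freq_khz):
    """Label of the sub-band an ideal filter bank would route freq_khz to."""
    if not 0.0 <= freq_khz <= UPPER_EDGES_KHZ[-1]:
        raise ValueError(f"{freq_khz} kHz is outside the 0-22 kHz range")
    # bisect_left places a frequency lying exactly on an edge in the lower band.
    return LABELS[bisect.bisect_left(UPPER_EDGES_KHZ, freq_khz)]
```

For example, a 7 kHz component falls in 6(d) and a 20 kHz component in 6(a).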

Once the original audio data 2 has been split into six sub-bands 6(a)-(f), each sub-band may be coded independently. This coding is done in either the time domain or the transform domain, depending upon the frequency range of each sub-band, as detailed below.

The highest three-quarters of the total filtered audio data 2 bandwidth, sub-bands 16, 18, 22, 24 and 26, is coded using time-domain coding. In one known example of such a technique, each signal sample is characterised by a redundant (predictable) part and a non-redundant (information-carrying) part. The predicted value is subtracted from the sample value to leave a so-called "residual". This residual is quantized using a known quantization technique, such as uniform or non-uniform quantization. The quantizer scale is chosen such that the noise generated by quantization errors is within the calculated masking threshold. Prediction coefficients may be updated and coded for every frame of samples.

For the lowest quarter of the total filtered audio data 2 bandwidth, sub-band 20, a known coding technique called transform coding is employed. In this technique, the components of the signal within sub-band 20 are split into blocks of samples and each block is processed in turn. The processing produces a set of transform coefficients for each block by means of a transform algorithm. As with the time-domain coding, the transform coefficients are then quantized based on calculated masking thresholds.
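The block transform step for sub-band 20 can be sketched as below. This is a minimal illustration under stated assumptions: it uses a discrete Fourier transform evaluated directly in O(N^2) time for clarity, and a single uniform quantizer step for every coefficient, whereas in the scheme described the step would follow from the masking threshold of each critical band. All function names are hypothetical.

```python
import cmath
import math


def dft(block):
    """Direct discrete Fourier transform of one block of samples."""
    n = len(block)
    return [sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(block))
            for k in range(n)]


def idft(coeffs):
    """Inverse DFT, returning the real part of each reconstructed sample."""
    n = len(coeffs)
    return [sum(c * cmath.exp(2j * math.pi * k * t / n) for k, c in enumerate(coeffs)).real / n
            for t in range(n)]


def quantize(value, step):
    """Uniform mid-tread quantizer applied to the real and imaginary parts."""
    return complex(round(value.real / step) * step, round(value.imag / step) * step)


def transform_code(samples, block_size, step):
    """Split samples into blocks, transform each block and quantize its coefficients.

    Any trailing partial block is ignored in this sketch.
    """
    coded = []
    for start in range(0, len(samples) - block_size + 1, block_size):
        block = samples[start:start + block_size]
        coded.append([quantize(c, step) for c in dft(block)])
    return coded
```

Each quantized coefficient is in error by at most half a step in its real and imaginary parts, so the reconstructed samples stay within a fraction of a step of the input; a larger step, permitted where the masking threshold is high, trades that error for fewer bits.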

Masking thresholds are calculated for all critical bands, based on the signal energy within each critical band. The filtered audio data is split into blocks of samples, and for each block the critical band energies are given by the transform coefficients in the bottom quarter of the frequency range, and by the mean signal sample values within the other sub-bands.
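Grouping per-block energies into critical bands might look like the sketch below. Two assumptions are made: transform-bin powers are simply summed within each critical band, and the "mean signal sample value" for a time-domain sub-band is taken to be the mean-square sample value of the block. Both function names are hypothetical.

```python
import bisect


def critical_band_energies(bin_freqs_hz, bin_powers, band_edges_hz):
    """Sum transform-bin powers into half-open critical bands [edge_i, edge_i+1)."""
    energies = [0.0] * (len(band_edges_hz) - 1)
    for freq, power in zip(bin_freqs_hz, bin_powers):
        index = bisect.bisect_right(band_edges_hz, freq) - 1
        if 0 <= index < len(energies):   # bins outside the given edges are ignored
            energies[index] += power
    return energies


def subband_energy(block):
    """Mean-square value of one block of time-domain sub-band samples."""
    return sum(x * x for x in block) / len(block)
```

For a DFT of N samples at sample rate fs, bin k lies at k * fs / N and its power would be `abs(X[k]) ** 2`; those two lists are what `critical_band_energies` expects.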

Having calculated the total signal energy within each critical band, the masking thresholds for all critical bands can be calculated using one of a number of published techniques (e.g. Johnson, DCC). The masking threshold provides a maximum value for the quantization noise within each critical band.
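To illustrate how a threshold limits quantization noise, the sketch below uses the standard high-resolution approximation that a uniform quantizer with step q adds noise power of about q^2 / 12. It assumes the threshold is expressed as a linear noise power in the same units as the squared signal samples; the function names are hypothetical, and this is not presented as the specific rule used in the patent.

```python
import math


def max_step_for_threshold(threshold_power):
    """Largest uniform-quantizer step whose approximate noise power, step**2 / 12,
    does not exceed the masking threshold of the critical band."""
    return math.sqrt(12.0 * threshold_power)


def bits_needed(peak_amplitude, step):
    """Bits per sample for a mid-tread quantizer spanning [-peak, +peak] with this step."""
    levels = 2 * math.ceil(peak_amplitude / step) + 1
    return math.ceil(math.log2(levels))
```

A low threshold forces a fine step and therefore more bits, while a band with a high threshold can be coded with a coarse step and few bits; this is where the masking-based saving comes from.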

Referring once more to figure 1, it can be seen that time-domain coding is performed on the sub-bands 16, 18, 22, 24 and 26, which together cover the highest three-quarters of the total audio data 2 bandwidth. The coding is performed by linear predictors 8, which achieve predictive coding.
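A minimal closed-loop predictive (DPCM) coder for one frame of a time-domain sub-band is sketched below. It assumes a first-order predictor whose coefficient is estimated per frame as r(1)/r(0) from the frame autocorrelation, and a fixed quantizer step; the description does not fix the predictor order or the estimation method, and in the scheme described the step would be derived from the band's masking threshold. All function names are hypothetical.

```python
def predictor_coefficient(frame):
    """First-order predictor coefficient a = r(1) / r(0), estimated per frame."""
    r0 = sum(x * x for x in frame)
    r1 = sum(frame[i] * frame[i - 1] for i in range(1, len(frame)))
    return r1 / r0 if r0 else 0.0


def dpcm_encode(frame, a, step):
    """Quantize the residual between each sample and its prediction."""
    codes, prev = [], 0.0
    for x in frame:
        prediction = a * prev
        q = round((x - prediction) / step)   # integer code passed to the multiplexer
        codes.append(q)
        prev = prediction + q * step         # track the decoder's reconstruction
    return codes


def dpcm_decode(codes, a, step):
    """Rebuild samples by adding each dequantized residual to the prediction."""
    out, prev = [], 0.0
    for q in codes:
        prev = a * prev + q * step
        out.append(prev)
    return out
```

Because the encoder predicts from its own reconstructed samples rather than from the input, the decoder forms exactly the same predictions, and each reconstructed sample differs from the original by at most half a quantizer step. In a full coder the coefficient `a` would be coded and multiplexed with the residual codes for each frame.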

The concepts of psychoacoustic masking and of providing masking thresholds to achieve it are well known to those skilled in the art and hence will not be described in detail. Figure 3 illustrates graphically how such psychoacoustic masking is achieved, the heavy line running through the plotted points of frequency against amplitude representing the masking threshold at any particular frequency.

The outputs of all coders 8, 10 are multiplexed via multiplexer 12, which provides a signal 14 representative of the compressed audio data 2. The system described above can provide signals 14 with compression ratios of typically around 8:1, although it is not limited to this.
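For a sense of scale, the arithmetic below converts an 8:1 ratio into a bit rate. The input format is an assumption, since the description does not state it: 16-bit PCM sampled at 44.1 kHz, which is consistent with the roughly 22 kHz bandwidth of figure 2. The function name is hypothetical.

```python
def compressed_bitrate_kbps(sample_rate_hz, bits_per_sample, channels, ratio):
    """Bit rate (kbit/s) after compression, given the PCM input format and a ratio."""
    pcm_kbps = sample_rate_hz * bits_per_sample * channels / 1000
    return pcm_kbps / ratio


# Assumed input format: 16-bit PCM at 44.1 kHz.
print(compressed_bitrate_kbps(44_100, 16, 2, 8))  # stereo
print(compressed_bitrate_kbps(44_100, 16, 1, 8))  # mono
```

Under these assumptions, stereo PCM at about 1411 kbit/s falls to about 176 kbit/s, and mono PCM at about 706 kbit/s to about 88 kbit/s.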

In order to decode, or expand, the signal 14, a decoder circuit such as that illustrated in figure 4 is necessary.

In its simplest form, this circuit need only perform the inverse of the functions performed in figure 1. Hence signal 14 is demultiplexed via demultiplexer 28 to provide discrete signals separated by frequency band, which are then passed to time-domain decoders 30 or transform-domain decoder 32 in the same fashion as was used for the coding described above. The decoded signals are then combined in combiner 34 to provide a mimic 36 of the original audio data 2.
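The combiner has to undo the analysis filter bank. The sketch below illustrates this with the simplest quadrature mirror filter pair, the two-tap Haar filters, arranged in a tree of the same shape as figure 2 (three quarter-width leaves, one eighth and two sixteenths). The Haar pair is a stand-in chosen because it reconstructs perfectly in a few lines; practical QMF banks use much longer filters with sharper band edges. All function names are hypothetical.

```python
import math

SQRT_HALF = math.sqrt(0.5)


def analyse(x):
    """Two-band Haar split: decimated low-pass and high-pass outputs."""
    low = [(x[i] + x[i + 1]) * SQRT_HALF for i in range(0, len(x) - 1, 2)]
    high = [(x[i] - x[i + 1]) * SQRT_HALF for i in range(0, len(x) - 1, 2)]
    return low, high


def synthesise(low, high):
    """Inverse of analyse(): rebuild and interleave the even and odd samples."""
    out = []
    for lo, hi in zip(low, high):
        out.extend([(lo + hi) * SQRT_HALF, (lo - hi) * SQRT_HALF])
    return out


def encode_tree(x):
    """Six-leaf analysis tree with the shape of figure 2."""
    if len(x) % 16:
        raise ValueError("block length must be a multiple of 16")
    lo, hi = analyse(x)
    q_a, q_b = analyse(lo)     # two quarter-width branches of the lower half
    q_c, q_d = analyse(hi)     # two quarter-width branches of the upper half
    e_a, e_b = analyse(q_b)    # one quarter split into two eighths
    s_a, s_b = analyse(e_b)    # one eighth split into two sixteenths
    return [q_a, q_c, q_d, e_a, s_a, s_b]


def decode_tree(leaves):
    """Combiner: undo encode_tree() stage by stage, deepest stage first."""
    q_a, q_c, q_d, e_a, s_a, s_b = leaves
    e_b = synthesise(s_a, s_b)
    q_b = synthesise(e_a, e_b)
    return synthesise(synthesise(q_a, q_b), synthesise(q_c, q_d))
```

The tree is critically sampled: for a 64-sample block the six leaves hold 16 + 16 + 16 + 8 + 4 + 4 = 64 samples, as many as the input. Which leaf corresponds to which frequency range is not tracked here, because each decimated high-pass branch is spectrally inverted.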

It will be understood that in figure 2 the fractions shown in the right-hand sub-band structure are fractions of the total audio bandwidth. Furthermore, although in the above example the frequency ranges of the sub-bands have been chosen to have a binary relationship to each other, this is not necessary. Any suitable predetermined relationship will suffice, for example a factor of three or five. A binary relationship has been chosen solely for convenience in the choice of filters then needed within quadrature mirror filter 4.

Those skilled in the art will appreciate that, although in the above example a discrete Fourier transform was utilised to perform the transform coding, any suitable transform coding technique will suffice, such as one employing discrete cosine transforms and the like. Similarly, whilst linear predictors 8 achieved the time-domain coding, any suitable time-domain coding technique may be employed, such as pulse code modulation (with an adaptive estimation technique), code excited linear prediction, or multipulse coding.

It will be appreciated that the filter 4 acts as a receiver of incoming audio data 2, and that demultiplexer 28 acts as a receiver of incoming compressed coded audio data 14.

Claims

1. A method of audio data compression comprising: receiving audio data for compression, the audio data comprising components of differing frequencies; providing a plurality of frequency sub-bands, the frequency range of each sub-band being a fraction of the total audio bandwidth, which fractions have a predetermined relationship therebetween; and filtering the components of the data into the sub-bands in dependence upon correlation between the frequency of the components and the respective sub-band frequency range; characterised in that the components within the sub-band covering the range of lowest frequencies are transform coded; the components within all other sub-bands are time-domain coded; and all the coded components are multiplexed to provide a signal representative of compressed audio data.

2. A method according to claim 1 wherein the predetermined relationship is a binary relationship.

3. A method according to claim 1 or claim 2 wherein the fraction of the total audio bandwidth covered by the sub-band for transform coding is the lowest quarter of the bandwidth, and the fraction covered by the time-domain coding is the highest three-quarters of the total bandwidth.

4. A method according to claim 3 wherein the fraction of the total audio bandwidth covered by the sub-bands for time-domain coding is split into five sub-bands, two of which each cover one of the two highest quarters of the total audio bandwidth, one of which covers the next highest eighth of the total audio bandwidth, and two of which each cover one sixteenth of the total audio bandwidth at the lower end of the time-domain coded range.

5. A method according to any one of the preceding claims wherein both the transform and time-domain coding develop masking thresholds for each of their associated sub-bands, for enabling psychoacoustic masking of the coded components.

6. Apparatus for compression of audio data comprising: a receiver for receiving audio data for compression, which data comprises components of differing frequencies; and a filter for providing a plurality of frequency sub-bands, the frequency range of each sub-band being a fraction of the total audio bandwidth, which fractions have a predetermined relationship therebetween, and for filtering the components of the data into the sub-bands in dependence upon correlation between the frequency of the components and the respective sub-band frequency range; the apparatus including a transform coder for transform coding the components within the sub-band covering the range of lowest frequencies; time-domain coders for time-domain coding the components within all other sub-bands; and a multiplexer for multiplexing all the coded components to provide a signal representative of compressed audio data.

7. Apparatus according to claim 6 wherein the predetermined relationship is a binary relationship.

8. A method of expanding compressed audio data, the data having been compressed in accordance with any one of claims 1-5, the method comprising: receiving compressed coded audio data; demultiplexing the received compressed coded data to provide coded frequency sub-bands of the data; transform decoding the lowest range of sub-bands; time-domain decoding all remaining sub-bands; and combining all the decoded sub-bands to provide an expanded audio signal derived from the compressed audio data.

9. Apparatus for expanding compressed audio data, the data having been compressed in accordance with any one of claims 1-5, the apparatus comprising: a receiver for receiving compressed coded audio data; a demultiplexer for demultiplexing the received compressed coded data to provide coded frequency sub-bands of the data; a transform decoder for transform decoding the lowest range of sub-bands; a time-domain decoder for time-domain decoding all remaining sub-bands; and combiner means for combining all the decoded sub-bands to provide an expanded audio signal derived from the compressed audio data.

10. A method of audio data compression as substantially hereinbefore described and with reference to the accompanying drawings.

11. Apparatus for compression of audio data as substantially hereinbefore described and with reference to the accompanying drawings.