GB2423451A

GB2423451A - Inserting a watermark code into a digitally compressed audio or audio-visual signal or file

Info

Publication number: GB2423451A
Application number: GB0503257A
Authority: GB
Inventors: Ian Mcdonald Green
Original assignee: ISHCE Ltd
Current assignee: ISHCE Ltd
Priority date: 2005-02-16
Filing date: 2005-02-16
Publication date: 2006-08-23
Also published as: GB0503257D0

Abstract

A method of inserting a watermark code into a digitally compressed audio or audio-visual signal or file in which samples are stored in discrete frequency bands without decompression, the method comprising: unpacking an incoming digital bit stream having a given bit rate to provide samples having a first amplitude resolution and/or representation (1); selecting or deriving samples from a frequency band which is narrower than, and located wholly within, one of said discrete frequency bands (2); encoding watermark data into said selected samples by modifying the selected samples according to a scheme in dependence upon said first amplitude resolution and/or representation (3); and optionally re-packing the samples to give an output bit stream having the same given bit rate (4).

Description

A METHOD OF INSERTING A WATERMARK CODE INTO A DIGITALLY COMPRESSED AUDIO OR AUDIO-VISUAL SIGNAL OR FILE

[001] This invention relates to a method of inserting a watermark code into a digitally compressed audio or audio-visual signal or file. It also relates to apparatus for inserting a watermark code into such an audio or audio-visual signal or file.

[002] Methods of inserting a substantially inaudible identification code or watermark code into an audio signal such that it can be read by a suitable decoder are well known.

[003] One known method of watermarking a compressed audio bit-stream signal is to decompress the input bit-stream to PCM samples, watermark the signal in that form, and re-compress the result to an output bit-stream. This method has several disadvantages. Firstly, the decompression and recompression processes are computationally intensive, and therefore limit throughput. Secondly, it is known that the decompression and recompression cycle is very likely to degrade the audio quality of the signal. Finally, re-compression of watermarked PCM samples is likely to degrade the watermark, resulting in difficulties with retrieving the watermark downstream.

[004] It is an object of the present invention to provide a method which can mitigate one or more of the above disadvantages and/or provide additional advantages over known methods.

[005] According to a first aspect of the invention there is provided a method as specified in claims 1 - 11. According to a second aspect of the invention there is provided a computer readable medium as specified in claim 12. According to a third aspect of the invention there is provided apparatus as specified in claims 13 - 15.

[006] The method of the present invention applies particularly to compressed audio having samples stored in discrete frequency bands, and does not require significant decompression of the compressed signal. Additionally, as the method can preferably be carried out on the unpacked fine frequency components, the reconstruction of the coarse frequency bands, or the anti-aliasing matrix, or the conversion to baseband PCM are preferably not required. Thus the computing requirement is low when compared to a full decoder, and as a result the watermarking process can be very fast and efficient.

[007] An embodiment of the invention will now be described, by way of example only, with reference to the accompanying schematic Figure 1 which shows a block diagram of a method according to the present invention.

[008] The present invention relates particularly, though not exclusively, to analogue audio watermarking of compressed audio or audio-visual bitstreams. In this regard, the term compressed is used in this description to mean digital data compression techniques, used for example by the Moving Picture Experts Group (MPEG) video and audio standards. The invention is particularly applicable to schemes which store samples separated into frequency bands. Such schemes include MPEG layer 2 and 3, AC3 and AAC. A typical compression ratio would be 10:1. The present invention is expected to find particular application in the watermarking of pre-compressed data streams such as DVD files. Digital data compression of this kind should not be confused with analogue amplitude compression, which is a quite different technique.

[009] "Analogue audio watermarking" in the present case means a process which, although it may be applied to a digital signal or file, results in a watermark which survives when the audio signal is converted to an analogue signal, such as for example when sent to loudspeakers. Thus the watermark can be recovered from the analogue signal alone, using a suitable decoder.

[010] With 16 bit PCM input to the digital data compressor and a compression ratio of 10:1, it is evident that over a substantial part of the frequency spectrum a single data bit must on average represent each sample. MPEG layer 3 compression, for example, often represents samples by a three level code, which has possible values of -1, 0 and +1. The actual sample amplitude is then proportional to the code value multiplied by a scaling factor which varies between frequency bands, and which is also a function of time. Within these constraints it has proved difficult to establish how to modify the data to generate a watermark that is both inaudible and recoverable.
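The arithmetic behind this constraint can be sketched as follows. This is an illustrative calculation only: the names `bits_per_sample`, `tri_level_bits` and `sample_amplitude` are hypothetical, and the scale factor is treated as a plain multiplier, which simplifies the real MPEG layer 3 scaling.

```python
import math

PCM_BITS = 16            # input resolution stated in the text
COMPRESSION_RATIO = 10   # typical ratio stated in the text

# Average bit budget per sample after compression.
bits_per_sample = PCM_BITS / COMPRESSION_RATIO

# Information carried by one three-level (-1, 0, +1) code value.
tri_level_bits = math.log2(3)


def sample_amplitude(code: int, scale: float) -> float:
    """Return the amplitude of a three-level code under a given scale factor.

    Assumption: the scale factor acts as a simple linear multiplier.
    """
    if code not in (-1, 0, 1):
        raise ValueError("three-level code must be -1, 0 or +1")
    return code * scale
```

The average budget works out at 1.6 bits per sample, which is only just above the roughly 1.585 bits needed for a three-level value, so there is almost no spare resolution in which to hide a watermark.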

[011] A first embodiment of the present invention will now be described, based on MPEG layer 3 encoding/compression, and implementing a watermark analogous to that described in more detail in EP 0 245 037. As the MPEG layer 3 standard is very complex, a simplified description will be presented which should be combined with detailed study of the MPEG standard to construct an encoder. This standard is referred to as ISO/IEC 13818-3:1998, and is available from the world wide web at www.iso.org.

[012] In the MPEG layer 3 scheme, upper regions of the frequency spectrum are often coded using a three level scheme as described above. Lower frequencies are encoded as "big values" - namely a range of integers. The cross over frequency between these two alternative representations varies dynamically with time, such that a reliable watermark encoder would need to handle both representations. Accordingly, a separate description will be provided for such tri-level and big value representations. It should be understood that the watermark is likely to switch rapidly between these representations during a single watermark code insertion step. The amplitude resolution available using the "big values" representation can clearly be much greater than that possible with the tri-level representation.

[013] The data stream is packed using Huffman coding, and needs to be unpacked prior to watermark insertion. After unpacking, the resulting samples are found to be divided into 32 coarse frequency bands (called discrete frequency bands in this description), with a separate scaling factor for each discrete band. Within each of these discrete bands there is a fine frequency structure, which will be described further later. To avoid the difficulty of constructing a watermark from components of different coarse discrete frequency bands having different scaling factors, and because of the inability to make fine adjustments to the tri-level values, the watermarking process is preferably restricted to a single coarse discrete frequency band. This also helps to make the watermark less audible on playback.

[014] It is known that the edges of the coarse discrete frequency bands have aliasing components generated (for good reason) by the filtering that creates the bands. These aliasing components are substantially cancelled by equal and opposite aliasing in the decoder. However, if a watermark component were to be introduced between the encoder and the decoder, there would be no way to cancel the decoder aliasing. Therefore it is preferable that watermark insertion should occur away from the edges of the discrete frequency bands - very preferably near the centre of a discrete frequency band.

[015] The amplitude scaling factors mentioned previously are time dependent as well as changing between discrete frequency bands. It is thus desirable that the watermark encoder takes these changes of amplitude into account, and that the decoder is insensitive to amplitude variations with time. In the MPEG scheme, the unit of time is known as a granule, and is typically 13 ms.

[016] The watermark to be described for exemplary purposes is a Binary Phase Shift Keying (BPSK) modulation of a single fine frequency component (called a predetermined frequency band in the present description). The data present in the predetermined frequency band is discarded, and watermark code put in its place. This is in some ways equivalent to the notch filtering and code insertion described in EP 0 245 037. A suitable predetermined frequency band preferably lies in the region of 3kHz. For musical programme material it would preferably lie between adjacent semitones in the tonic scale.
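As a rough illustration of how a fine frequency component near 3kHz might be located and checked against the edges of its coarse band, consider the sketch below. The 44.1 kHz sample rate, the figure of 18 fine components per coarse band (the long-block case), the uniform spacing of components and the guard width are all assumptions made for this example and are not taken from the description; `locate_line` and `is_away_from_edges` are hypothetical helper names.

```python
SUBBANDS = 32             # coarse (discrete) bands, as stated in the text
LINES_PER_SUBBAND = 18    # assumption: fine components per coarse band
SAMPLE_RATE_HZ = 44_100   # assumption: not specified in the text


def locate_line(freq_hz: float, sample_rate_hz: int = SAMPLE_RATE_HZ):
    """Map a frequency to (fine line index, coarse band, offset within band).

    Assumption: fine components are uniformly spaced from 0 Hz to half the
    sample rate.
    """
    total_lines = SUBBANDS * LINES_PER_SUBBAND
    spacing_hz = (sample_rate_hz / 2) / total_lines
    line = int(freq_hz // spacing_hz)
    if not 0 <= line < total_lines:
        raise ValueError("frequency outside the coded spectrum")
    subband, offset = divmod(line, LINES_PER_SUBBAND)
    return line, subband, offset


def is_away_from_edges(offset: int, guard: int = 4) -> bool:
    """True if the line sits at least `guard` lines from both band edges.

    The guard width is a hypothetical parameter chosen for illustration.
    """
    return guard <= offset < LINES_PER_SUBBAND - guard


line, subband, offset = locate_line(3000.0)
```

Under these assumptions a 3kHz carrier falls on fine component 78, which is offset 6 within coarse band 4 and therefore clear of both band edges.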

[017] Although BPSK modulation is used in the following embodiment, other phase modulation or manipulation techniques may be employed as an alternative, whilst remaining within the scope of the invention.

[018] In the present embodiment the BPSK bit period is set to 2 granules, and every second granule value is set to zero. Thus amplitude and scaling factor changes from granule to granule do not affect the time of the zero crossings, and so the BPSK signal can be decoded without timing errors caused by amplitude changes.

[019] BPSK modulation consists of 180 degree phase inversion, which is equivalent to amplitude inversion. If the predetermined frequency band has data represented by tri-level components, the inversion is arranged by choosing the granule coefficient at the BPSK carrier frequency to be +1 or -1 as appropriate.

[020] A characteristic of the MPEG layer 3 implementation is the 90 degree phase shift of a single fine frequency carrier (predetermined frequency band) between granules. Thus the scheme of setting every other granule value to zero means that the remaining granule values are either in phase or 180 degrees out of phase with each other. Thus, by choosing the sign of the amplitude appropriately, any desired BPSK data sequence can be generated as a watermark code for insertion.
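A minimal sketch of the granule-by-granule coefficient sequence for the tri-level case might look like this. The function name `bpsk_granule_values` is hypothetical, and the sketch assumes that the first granule of each pair carries the value and that data bit 1 maps to +1. It does not apply any extra sign correction that a real encoder may need to account for the 90 degree carrier phase shift between granules.

```python
def bpsk_granule_values(bits):
    """Expand watermark data bits into one tri-level coefficient per granule.

    Each bit occupies two granules: a signed value followed by a zero.
    Assumptions: bit 1 -> +1, bit 0 -> -1, active granule comes first.
    """
    values = []
    for bit in bits:
        if bit not in (0, 1):
            raise ValueError("watermark bits must be 0 or 1")
        values.append(1 if bit else -1)  # active granule carries the sign
        values.append(0)                 # every second granule is zeroed
    return values


carrier_coefficients = bpsk_granule_values([1, 0, 1])
```

Because the only non-zero values are +1 and -1, the timing of the zero crossings does not depend on the scale factor in force for any particular granule.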

[021] The fine frequency channels in the MPEG layer 3 implementation have substantial overlap. Therefore in practice it is necessary for such an implementation to set to zero the components in the two adjacent fine frequency channels whenever the centre fine frequency channel contains watermark code data. The fine channel spacing is sufficiently narrow that the effect is inaudible, whilst ensuring that the BPSK data from the MPEG player can later be decoded without excessive programme breakthrough or interference from adjacent frequencies.

[022] To ensure the watermark code is inaudible, it is preferable to refrain from inserting watermark data if the granule has no energy in the coarse (discrete) frequency band in which the fine (predetermined) frequency band lies.
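The two rules above, zeroing the neighbouring fine channels and skipping silent granules, can be sketched for the tri-level case as follows. The function name `insert_tri_level` is hypothetical, and "no energy" is interpreted here as every coded sample in the coarse band being zero, which is an assumption.

```python
def insert_tri_level(band, centre, value):
    """Write one watermark value into a coarse band for a single granule.

    band:   tri-level samples (-1, 0, +1) of one coarse band
    centre: index of the fine channel that carries the watermark
    value:  -1, 0 or +1 as required by the modulation

    Returns (new_band, inserted). The input list is not modified.
    """
    if value not in (-1, 0, 1):
        raise ValueError("tri-level value must be -1, 0 or +1")
    if not 1 <= centre < len(band) - 1:
        raise ValueError("centre channel needs a neighbour on each side")
    if all(sample == 0 for sample in band):
        return list(band), False      # silent band: refrain from inserting
    out = list(band)
    out[centre - 1] = 0               # clear the lower neighbour
    out[centre] = value               # watermark replaces the original data
    out[centre + 1] = 0               # clear the upper neighbour
    return out, True
```

For example, `insert_tri_level([0, 1, -1, 1, 0], 2, 1)` returns `([0, 0, 1, 0, 0], True)`, while an all-zero band is returned unchanged with `False`.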

[023] A desirable feature of audio compression is that the compressed audio should not be audibly different from the uncompressed version. The difference between the two, in the part of the spectrum with tri-level (also known as tri-state) values, is related to all the frequency samples which are approximated by a tri-state value. Since the error due to MPEG approximation must often approach 0.5 on the tri-state scale, one would expect that altering a single fine frequency channel as described would be less audible than the totality of the MPEG approximation errors. It turns out in practice that modifying the tri-state values in the way described gives a substantially inaudible watermark.

[024] When "big-value" representations are used by the MPEG layer 3 implementation to encode the fine frequency components, the principle of watermarking is the same, but there is the added requirement to select an amplitude for the centre channel of the fine (predetermined) frequency band. This amplitude, which is a compromise between audibility (if too large) and lack of robustness (if too small) is chosen using the principle described in EP 0 245 037. Whereas EP 0 245 037 describes a frequency-dependent masking filter that processes time samples to derive a safe watermarking amplitude, the present invention derives essentially the same information by taking the lower frequency samples within a granule, scaling each sample according to the response of the desired masking filter, and computing the r.m.s. value of the resulting components, this value being proportional to the safe watermarking amplitude.

[025] Having thus derived the safe watermarking amplitude as a real number, the "big-value" of the centre channel is preferably set to the highest integer less than that amplitude, with sign appropriate to the modulation required. If the derived amplitude is less than 1, the watermark insertion is preferably aborted.
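The amplitude selection for the "big-value" case can be sketched as below. The function names, the `gain` constant of proportionality and the example filter weights are hypothetical. "Highest integer less than that amplitude" is interpreted as rounding down, and the sketch treats the big-value scale as linear, which the real MPEG layer 3 scale is not.

```python
import math


def safe_amplitude(lower_samples, mask_weights, gain=1.0):
    """Estimate a safe watermark amplitude from lower-frequency samples.

    Each sample is scaled by the masking filter response at its frequency,
    and the r.m.s. of the results is multiplied by `gain` (assumed constant).
    """
    if len(lower_samples) != len(mask_weights):
        raise ValueError("one mask weight is needed per sample")
    if not lower_samples:
        return 0.0
    weighted = [s * w for s, w in zip(lower_samples, mask_weights)]
    rms = math.sqrt(sum(x * x for x in weighted) / len(weighted))
    return gain * rms


def centre_big_value(amplitude, sign):
    """Return the signed integer for the centre channel, or None to abort."""
    if sign not in (-1, 1):
        raise ValueError("sign must be -1 or +1")
    if amplitude < 1:
        return None                   # too quiet to watermark safely
    return sign * math.floor(amplitude)


amplitude = safe_amplitude([4.0, 4.0], [1.0, 0.5])
```

With the illustrative inputs above the r.m.s. is the square root of 10, so `centre_big_value(amplitude, -1)` gives -3, whereas an amplitude of 0.9 yields `None` and the insertion is skipped.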

[026] One of the complexities of the MPEG layer 3 standard is that the "big values" scale is non-linear, and this must be taken into account when encoding.

[027] Since every other granule has a centre channel (predetermined frequency band) with a watermark component of zero, there is no need for a masking analysis on these granules. The same applies for times between watermark code insertions.

[028] Once the watermark has been inserted into the predetermined frequency band (using one of the two schemes described above, depending on amplitude resolution - i.e. whether a tri-level or big value representation is being employed) the bit stream must be re-packed in accordance with the compression standard being used. For MPEG layer 3, data packing involves Huffman coding, which is lossless, but which has a compression ratio which is variable and difficult to predict. Thus there is no way of predicting the exact length of the watermarked bit-stream after packing. However, the MPEG layer 3 bit-stream standard sometimes defines a constant bit rate, which implies that the watermarked bit-stream must sometimes have exactly the same number of bits as the original source bit-stream.

[029] In practice, one expects each watermarked and packed granule to have a few bits more or less than the original packed granule. Huffman coding is designed for optimal packing, using the shortest code for the most common sample value. As statistically the most common sample value is zero, this coding is particularly efficient when packing zeros. Since watermarking as described above sets many samples to zero, it is most likely that the watermarked data will have fewer bits than the original data in a given granule.

[030] To cope with such differences in bit count before and after watermark insertion, it is recommended that if the packed watermarked granule has fewer bits than the original, the Huffman output should be packed with 1s. These will be ignored by the MPEG layer 3 decoder. If on the other hand the watermarked packed granule has more bits than the original, it is recommended that the last Huffman multi-bit symbol should be removed until the bit count is less than or equal to the required number. Any spare bits are then packed with 1s as before. The net effect of removing symbols in this way is to remove very high frequency components in the frequency range above 16 kHz, and in a way which is not audible.
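The bit count adjustment can be sketched as a single pass over the re-packed Huffman symbols of one granule. Here the symbols are modelled as strings of '0' and '1' characters ordered from lowest to highest frequency; that representation, the function name `fit_to_budget` and the choice of 1 as the padding bit are assumptions made for illustration. A real bit-stream would also need its side information kept consistent, which is not shown.

```python
def fit_to_budget(codewords, budget_bits, pad_bit="1"):
    """Fit one granule's Huffman codewords into an exact bit budget.

    codewords:   list of bit strings, lowest frequency first
    budget_bits: number of bits the original packed granule occupied
    pad_bit:     filler bit for any spare space (assumed to be '1')
    """
    kept = list(codewords)
    used = sum(len(word) for word in kept)
    # Drop the last (highest-frequency) symbols until the data fits.
    while kept and used > budget_bits:
        used -= len(kept.pop())
    # Fill whatever space remains so the length matches exactly.
    return "".join(kept) + pad_bit * (budget_bits - used)
```

For instance, `fit_to_budget(["10", "110", "0"], 8)` pads the six data bits to give `"10110011"`, while `fit_to_budget(["10", "110", "0111"], 6)` drops the final symbol and pads by one bit to give `"101101"`.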

[031] The above adjustment of the bit count may be performed in a single pass, and is therefore efficient to compute.

[032] The resulting watermarks are analogue in nature in that the watermark survives if the digital signal is converted to an analogue signal - for example for routing to loudspeakers. It also survives analogue or digital compression/decompression cycles.

[033] The MPEG layer 3 standard implementation embodiment described above uses the fine frequency components available for selecting a narrower frequency band. If the method is to be used with MPEG layer 2, which does not have the fine frequency bands, the samples are derived using a narrowband filter rather than selection as described above.

[034] In summary, a method of inserting a watermark code into a digitally compressed audio or audio-visual signal or file in which samples are stored in discrete frequency bands has been described. The method comprises, as shown in Figure 1, unpacking an incoming digital bit stream having a given bit rate to provide samples having a first amplitude resolution and/or representation (block 1), selecting or deriving samples from a frequency band which is narrower than, and located wholly within, one of said discrete frequency bands (block 2), encoding watermark data into said selected samples by modifying the selected samples according to a scheme in dependence upon said first amplitude resolution (block 3), and (optionally) re-packing the samples to give an output bit stream having the same given bit rate (block 4). Watermark insertion can be achieved without significant decompression of the compressed signal or file.

[035] It is to be understood that means equivalent to those specified in the above embodiments may be substituted if such equivalent means achieve the same results. In addition, many minor and/or obvious modifications to the above embodiments can be made whilst still falling within the scope of the invention as specified in the following claims.

Claims

1. A method of inserting a watermark code into a digitally compressed audio or audio-visual signal or file in which samples are stored in discrete frequency bands, the method comprising:- a. unpacking an incoming bit stream having a given bit rate from the said audio or audio visual signal or file, to provide samples having a first amplitude resolution and/or representation, b. selecting or deriving samples from a predetermined frequency band which is narrower than, and located wholly within, one of said discrete frequency bands, c. encoding watermark data into said selected or derived samples by encoding said samples according to a scheme in dependence upon said first amplitude resolution and/or representation, and d. re-packing the samples to give an output bit stream.

2. A method as claimed in claim 1 in which in step d. the samples are re-packed to have the said given bit rate.

3. A method as claimed in claim 2 in which step d. includes discarding the highest frequency components of the bit stream when necessary to maintain the given bit rate.

4. A method as claimed in any preceding claim in which the selected or derived samples having a first amplitude resolution and/or representation are represented by tri-level values, and the encoding comprises a phase modulation of said predetermined frequency band or a part thereof.

5. A method as claimed in any preceding claim in which the selected or derived samples having a first amplitude resolution and/or representation are represented by "big values", and the encoding comprises a phase modulation of said predetermined frequency band or a part thereof.

6. A method as claimed in any preceding claim in which the encoding comprises a Binary Phase Shift Keying (BPSK) modulation of said predetermined frequency band or a part thereof.

7. A method as claimed in any preceding claim in which the first amplitude resolution (and/or the way in which the sample amplitude is represented) varies with time.

8. A method as claimed in any preceding claim in which the said audio signal or file has been compressed using one of the schemes in the group consisting of: MPEG layer 2; MPEG layer 3; AC3; and AAC.

9. A method as claimed in any preceding claim in which the predetermined frequency band is located sufficiently far from the edges of the said discrete frequency band that aliasing components are not present in the predetermined frequency band.

10. A method as claimed in any preceding claim in which the samples have a first and a second amplitude resolution and/or representation which are different, and in which said encoding of said samples is performed according to a different scheme for respective said amplitude resolutions or representations.

11. A method of inserting a watermark code into a digitally compressed audio or audio-visual signal or file in which samples are stored in discrete frequency bands, substantially as described herein.

12. A computer-readable medium storing a computer program comprising instructions which, when loaded into a computer and executed, performs a method as claimed in any one of claims 1 - 10.

13. Apparatus for inserting a watermark code into a digitally compressed audio or audio-visual signal or file in which samples are stored in discrete frequency bands, comprising an encoder having means for unpacking an incoming digital bit stream having a given bit rate from the said audio or audio visual signal or file, to provide samples having a first amplitude resolution and/or representation, means for selecting or deriving samples from a predetermined frequency band which is narrower than, and located wholly within, one of said discrete frequency bands, means for encoding watermark data into said selected or derived samples by encoding the said samples according to a scheme in dependence upon said first amplitude resolution, and means for re-packing the samples to give an output bit stream.

14. Apparatus as claimed in claim 12 in which the means for re-packing the samples provides an output bit stream having the said given bit rate.

15. A watermarking system comprising apparatus as claimed in claim 13 or 14, and a decoder being adapted or arranged for recovering the encoded watermark.