GB2419765A - Image watermarking by forming weighted code word coefficients in a down-sampled transform domain - Google Patents


Info

Publication number
GB2419765A
GB2419765A (application GB0424227A)
Authority
GB
United Kingdom
Prior art keywords
image
code word
marked
sampling
version
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
GB0424227A
Other versions
GB0424227D0 (en)
Inventor
Daniel Warren Tapson
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sony Europe Ltd
Original Assignee
Sony United Kingdom Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sony United Kingdom Ltd filed Critical Sony United Kingdom Ltd
Priority to GB0424227A
Publication of GB0424227D0
Publication of GB2419765A
Legal status: Withdrawn

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 1/00: General purpose image data processing
    • G06T 1/0021: Image watermarking
    • G06T 1/0028: Adaptive watermarking, e.g. Human Visual System [HVS]-based watermarking
    • G06T 2201/00: General purpose image data processing
    • G06T 2201/005: Image watermarking
    • G06T 2201/0052: Embedding of the watermark in the frequency domain
    • G06T 2201/0202: Image watermarking whereby the quality of watermarked images is measured; measuring quality or performance of watermarking methods; balancing between quality and robustness

Abstract

An encoding data processor generates a watermarked copy of an image by embedding a code word into a copy of the image. A generated code word comprises a plurality of code word coefficients and is formed into a transform domain representation corresponding to spatially and/or temporally down-sampling the image with respect to a sampling rate of the base band image. An image perception analyser 14.1 forms a down-sampled version of the image and generates weighting factors for scaling the code word coefficients. A strength adapter 24 combines the weighting factors with the code word coefficients, and an inverse transform processor 26 comprising an up-sampler is operable to convert the weighted transform domain code word coefficients into the base band domain. The inverse transformed code word is combined with the image in the base band domain. The encoder can be used for digital cinema applications. Payload data is extracted from the watermarked copy of the image W by comparing it with the original image I to recover an estimate of the code word and correlating the estimated code word with re-generated code words from the code word generator 4.

Description

ENCODING AND DETECTING APPARATUS
Field of Invention
The present invention relates to encoding data processing apparatus and methods for generating marked copies of images by introducing code words into copies of the images.
Correspondingly, the present invention also relates to detecting data processing apparatus and methods operable to detect code words, which may be present in marked images.
The present invention also relates to media terminals for generating marked copies of material items for reproduction. In some applications the media terminal may form part of a digital cinema projector.
Background of the Invention
Generally, a technique for embedding data in material such that the embedded data is perceptible or imperceptible in the material is referred to as watermarking. Code words are applied to copies of material items for the purpose of identifying the copy of the material item or for conveying data represented by the code words. In some applications, watermarking can therefore provide a facility for identifying a particular copy of the material.
A process in which information is embedded in material for the purpose of identifying a specific copy of the material is referred to as fingerprinting. A code word, which identifies the material, is combined with the material in such a way that, as far as possible, the code word is imperceptible in the material. As such, if the material is copied or used in a way which is inconsistent with the wishes of the owner, distributor or other rights holder of the material, the copy can be identified from the code word and appropriate action taken.
In co-pending UK patent application number 0327854.6 an encoding data processor is disclosed for application to, for example, digital cinema, in which payload data having two or more fields is represented by watermarking an image with one or more code words. Each value of a first field of a payload data word is associated with a first code word, and each value of a second or subsequent data field is represented by a second code word, which is selected in dependence upon the value of the first data field. As such, a detected code word can be used to identify one of a plurality of subsets into which the data words are divided, each data word in the subset having the same value in the first field. A second or subsequent field of the data word from the subset can then be identified by detecting a second code word from the material item.
The second code word identifies a value in the second data field as well as the value of the first data field.
The first and second fields can be assigned to different associated parameters, such as, for example, address parameters. The first field could therefore identify the country of distribution of the material, whereas the second field could identify a particular distribution outlet within that country. A watermarking system can thus be formed for identifying a point of distribution of copies of video material. However, for applications such as digital cinema, in which the number of samples per image frame is relatively high, it is desirable to reduce the amount of processing which is required to generate watermarked images.
Summary of Invention
According to the present invention there is provided an encoding data processing apparatus for generating a marked copy of an image by introducing at least one code word into a copy of the image. The apparatus comprises a code word generator, an image analyser, a strength adaptor, an inverse transform processor and a combiner. The code word generator is operable to generate the code word having a plurality of code word coefficients and to form the code word into a transform domain representation, the transform domain representation corresponding to at least one of spatially or temporally down sampling the image with respect to a base band sampling rate of the image. The image analyser is operable to receive the image in a base band form and to form a down-sampled version of the image, the down sampling being effected in correspondence with the down sampled representation of the code word formed by the code word generator with respect to the base band sampling rate of the image. The image analyser generates weighting factors for scaling the code word coefficients with respect to parts of the base band image with which the code word coefficients are to be combined. The strength adapter is operable to combine the weighting factors with the code word coefficients, and the inverse transform processor is operable to convert the transform domain code word coefficients scaled by the weighting factors into the base band domain by correspondingly up-sampling the code word at least one of spatially or temporally to form the base band domain version of the code word for combining with the image. The combiner is operable to combine the inverse transformed code word with the image in the base band domain, the code word being formed in the transform domain for combining with the image in the base band domain, and the weighting factors are formed from the image in the spatial domain.
An encoding data processing apparatus according to embodiments of the invention can provide an advantage with respect to the amount of processing which is required to form a marked copy of an image. Known encoders for marking a copy of an image, such as, for example, that disclosed in European patent EP 1324263 A, typically transform the image to be marked into the transform domain, combine the image with a code word, and convert the marked image back to the spatial domain by performing an inverse transform. The strength of the watermark code word is adapted with respect to the transform domain image. It is also known from European patent EP 1098522 to combine a watermark code word with an image in the spatial domain.
However, the known encoding arrangements perform a linear combination of the image and the code word, so that the marked image is formed from the original copy of the image. Embodiments of the present invention are arranged to form the watermark code word in the transform domain for combination with the image in the spatial domain. Furthermore, the weighting values for adapting the strength of the code word coefficients can, in some embodiments, be formed from the image in the spatial domain. A result of this arrangement is that the number of calculations required to form the marked image can be substantially reduced with respect to known techniques. This is in part because the encoding process does not combine the code word with the image in the transform domain, but maintains the version of the image to be marked in the spatial domain. As a result, processing of the image to determine the weighting factors does not have to be linear, in that it is not necessary to recover an original form of the image. A copy of the image can, therefore, be processed and discarded in order to calculate the weighting factors.
In some embodiments an analysis performed by the image analyser includes at least one of perceptual weighting, temporal masking and contrast masking. For example, the temporal masking forms weighting factors which increase the code word coefficients if a scene change occurs in the video media represented by the images. The contrast masking may include smoothness masking to adapt the weighting factor values calculated by the contrast masking function.
The transform domain may correspond to at least one of spatially or temporally down sampling the image with respect to a sampling rate of the base band image, for example, where the image is a video sequence of images such as for digital cinema applications. In such embodiments the inverse transform processor may be operable correspondingly to up-sample the code word at least one of spatially or temporally to form the base band domain version of the code word for combining with the image.
For applications to digital cinema, such embodiments are advantageous because a quantity of data in a digital cinema image is relatively high, which would require a relatively high processing rate to form a marked version of the image. Therefore, by forming the code word in the transform domain which includes a down-sampled representation with respect to the original image, an amount of processing which is required to form the code word is substantially reduced in accordance with the down sampling rate. Furthermore, all processing required to form the code word in the transform domain has a corresponding reduced amount of processing.
In some embodiments the image analyser is operable to form a down-sampled version of the image. The down-sampling may be effected in correspondence with a down-sampled representation of the code word formed by the code word generator with respect to a base band sampling rate of the image. The weighting factors are determined with respect to the down-sampled version of the image.
Accordingly, the amount of processing which is required to calculate the weighting factors is substantially reduced with respect to that which would be required if the weighting factors were calculated from the base band image. As explained above, performing a non-linear transform on the image in order to form the version from which the weighting factors are determined is possible because this processing is performed only to form the code word; the code word itself is combined with the image in the base band domain.
According to an aspect of the present invention there is provided a detecting data processing apparatus operable to determine payload data from a marked copy of an image. The image has been marked by forming at least one code word from a set of code words in a transform domain to represent the payload data, the transform domain including at least one of spatial or temporal down-sampling. The image is marked by performing an inverse transform of the code word, by up-sampling the code word to a sampling rate corresponding to the sampling rate of the image, and combining the code word with a copy of the image. The detecting apparatus comprises a recovery processor, a code word re-generator, a correlator and a control processor. The recovery processor is operable to recover an estimate of the code word by comparing the marked version of the image with an original version of the image. The code word re-generator is operable to re-generate code words from the set of code words, and the correlator is operable to correlate the estimated code word recovered from the marked image with a code word re-generated by the code word re-generator. The control processor is operable to control the code word re-generator and the correlator to correlate the re-generated code words with the estimated code word recovered from the marked image, and to determine that a code word has been detected if a result of the correlation from the correlator exceeds a predetermined threshold. The control processor is operable to recover the payload data in accordance with the detected code word. The recovery processor includes a down-sampling processor operable to form the estimated code word with a sampling rate which is down-sampled with respect to the base band versions of the marked image and the original version of the image.
In some embodiments the down-sampling processor may comprise first and second down-sampling processors. The first down-sampling processor forms a down-sampled version of the marked copy of the image. The second down-sampling processor forms a down-sampled version of the original image, and a comparator recovers the estimated code word by comparing the down-sampled version of the original image with the down-sampled version of the marked image.
A detecting data processing apparatus is provided with an advantage in that the processing required to determine a payload data word represented by the code word with which the material is marked is substantially reduced, because the processing is performed in a transform domain which is down-sampled with respect to the base band domain. Accordingly, processes such as correlation of the re-generated code words with the estimated code word are performed with a substantially reduced number of calculations with respect to the number which would be required if the calculations were performed at a rate corresponding to the base band image.
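The detection path just described (difference the marked and original images, down-sample the estimate, then correlate against re-generated candidate code words) can be sketched end to end. Everything concrete here is an assumption made for the sketch, not taken from the patent: the seeded pseudo-random generator `make_code_word`, the 64-payload search space, the pixel-repetition marking and the block-averaging recovery; only the compare/correlate/threshold structure follows the text.

```python
import numpy as np

rng = np.random.default_rng(seed=7)

def make_code_word(payload, shape):
    # Hypothetical re-generable code word: a pseudo-random +/-1 pattern
    # keyed by the payload value, so the detector can re-create it.
    g = np.random.default_rng(payload)
    return g.choice([-1.0, 1.0], shape)

shape = (16, 16)          # down-sampled (transform domain) code word size
DOWN = 4                  # spatial down-sampling factor per axis
original = rng.uniform(0, 255, (shape[0] * DOWN, shape[1] * DOWN))
true_payload = 42

# Marking: up-sample the code word (pixel repetition as a crude stand-in
# for the inverse transform processor) and add it in the base band domain.
up = np.kron(make_code_word(true_payload, shape), np.ones((DOWN, DOWN)))
marked = original + up

# Detection: difference the images, down-sample the estimate by block
# averaging, then correlate with each re-generated candidate code word.
diff = marked - original
est = diff.reshape(shape[0], DOWN, shape[1], DOWN).mean(axis=(1, 3))
scores = {p: float(np.sum(est * make_code_word(p, shape)))
          for p in range(64)}
detected = max(scores, key=scores.get)
```

Note that all the correlation work happens on the 16x16 estimate, not the 64x64 base band frames, which is the saving the paragraph above describes.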
Various further aspects and features of the present invention are defined in the appended claims. These aspects include a media terminal, a cinema projector, a method of generating a marked copy of an image and a method of detecting a payload in a marked image.
Brief Description of the Drawings
Embodiments of the present invention will now be described by way of example only with reference to the accompanying drawings, where like parts are provided with corresponding reference numerals, and in which:
Figure 1 is a schematic block diagram of a known encoding apparatus for combining an image with a code word to form a marked version of the image;
Figure 2 is a schematic block diagram of an encoding apparatus for combining an image with a code word according to the present technique;
Figure 3 is a schematic block diagram of an inverse transform processor forming part of the apparatus shown in Figure 2;
Figure 4 is a schematic block diagram of a perceptual image analyser which forms part of the encoding apparatus of Figure 2;
Figure 5 is a graphical representation of a relative contrast measure of a code word coefficient with respect to a relative contrast measure of an image block to which the code word coefficient is to be added;
Figure 6 is a schematic block diagram of a second example encoding apparatus for combining an image with a code word according to the present technique;
Figure 7 is a schematic block diagram of a detecting data processing apparatus for detecting payload data represented by code words present in marked copies of images;
Figure 8 is a flow diagram illustrating a process performed by an encoding data processing apparatus according to the present technique;
Figure 9 is a flow diagram illustrating an example process of inverse transforming a code word, which may be performed in the process represented in Figure 8;
Figure 10 is a flow diagram illustrating a process performed by a perceptual image analyser to determine weighting factors for strength adapting coefficients of the code word;
Figure 11 is a flow diagram illustrating possible further functions of the image analyser; and
Figure 12 is a flow diagram illustrating a process for detecting a payload data word from a marked image according to the present technique.
Description of Example Embodiments
General Watermarking Encoder
An example of a known encoding data processing apparatus, which is operable to generate watermarked images by combining or embedding a code word with the images, is shown in Figure 1. The known encoding data processing apparatus shown in Figure 1 is disclosed in European patent application EP 1324263 A. In Figure 1, images I are generated by a source 1 and fed to an encoder 2 which is arranged to combine them with payload data words P generated by a data word generator 4, so that at the output of the encoder 2 a marked copy W of the images I is formed. As shown in Figure 1, the encoder 2 includes a transform processor 6, a combiner 8 and an inverse transform processor 10. The encoder also includes a watermark code word generator 12, which generates a code word or code words X having a plurality of code word coefficients. Each code word represents a payload data word P, or a part of the data word P, generated by the data word generator 4. The transform processor 6 receives the source image I and performs a transform such as a Discrete Wavelet Transform (DWT) or a Discrete Cosine Transform (DCT) to convert the image into the transform domain. A perceptual image analyser 14 is arranged to receive a copy of the transform domain image I which is to be marked with the code word X and generates a weighting factor a for each part of the image with which the code word coefficients are to be combined. For example, each luminance component of the image pixels may be arranged to carry one or more of the code word coefficients. The weighting factors a are therefore adapted to the effect that the weight of the code word coefficients is made as high as possible, to increase the likelihood of detection of the code word, whilst as far as possible reducing the likelihood of the effect of the code word being perceived by the human eye when the code word is combined with the image I.
A strength adaptor 14 is therefore arranged to combine the weighting factors a with the code word coefficients. The weighted code word coefficients are then combined with the image in the transform domain by the combiner 8, and the marked copy of the image W is then formed by the inverse transform processor 10 by performing an inverse transform on the marked transform domain image V. In the following description the term "samples" will be used to refer to the discrete samples from which an image is comprised. The samples may be luminance samples of the image, which are otherwise produced from the image pixels. Therefore, where appropriate, the terms samples and pixels are interchangeable.
Video images are one example of material which can be protected by embedding a digital code word. Other examples of material which can be protected by embedding a code word include software programs, digital documents (optionally reproduced on paper or other media), music, audio signals and any other information-bearing signal.
Watermark Encoder
An encoding data processing apparatus, which operates in accordance with the present technique, is shown in Figure 2. Unlike the conventional arrangement shown in Figure 1, the encoding data processing apparatus shown in Figure 2 is arranged to combine the code word with the image to form the marked copy in the base band domain. The encoder 20 shown in Figure 2 includes a code word generator 22 which arranges the code word coefficients into a form corresponding to a transform domain representation of the image. Weighting factors are then generated by a perceptual analyser 14.1 in accordance with a relative ability of the image to carry the code word coefficients with a maximum strength whilst minimising the risk of the code word being perceivable when added to the image I. The weighting factors are received by a strength adaptor 24 and combined with the code word coefficients to form weighted code word coefficients. The weighted code word coefficients are then transformed into the base band domain by an inverse transform processor 26, which performs an inverse transform on the code word. The base band domain code word is then combined with the base band domain image by a combiner 28 to form the marked copy of the image W. According to one example of the present technique, the transform domain representation includes a temporally and/or spatially down-sampled representation with respect to a sampling rate of the base band domain image. The code word is therefore arranged in a form, or treated as if it were in a form, in which it had been spatially and/or temporally down-sampled with respect to the base band version. As such, the inverse transform processor is arranged to temporally and/or spatially up-sample the code word coefficients to form a base band version of the code word, in which form the code word is combined with the base band image I to form the marked copy of the image W.
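The Figure 2 signal path (form the code word in a down-sampled domain, weight it, up-sample it, add it to the base band image) can be sketched numerically. This is a toy sketch, not the patented implementation: the frame sizes, the random code word and weighting factors, and the pixel-repetition up-sampler standing in for the inverse transform processor 26 are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# Hypothetical sizes: a small stand-in for the 4096x2048 base band frame
# and its 256x128 down-sampled transform-domain representation.
BASE_H, BASE_W = 64, 64
DOWN = 4                                   # down-sampling factor per axis
dh, dw = BASE_H // DOWN, BASE_W // DOWN

image = rng.uniform(0, 255, (BASE_H, BASE_W))   # base band image I
code_word = rng.choice([-1.0, 1.0], (dh, dw))   # down-sampled code word X
weights = 0.5 + 0.5 * rng.random((dh, dw))      # alpha from the analyser

weighted = weights * code_word                  # strength adaptation, small arrays

# "Inverse transform" here is just spatial up-sampling by pixel repetition,
# a crude stand-in for zero insertion followed by low pass filtering.
up = np.kron(weighted, np.ones((DOWN, DOWN)))

marked = image + up                             # combine in the base band domain
```

The point of the arrangement is visible in the shapes: the multiplication `weights * code_word` happens on the small (dh, dw) arrays, and only the final addition touches the full-resolution frame.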
In some embodiments utilising the present technique, the transform domain representation of the code word may include a Discrete Cosine Transform (DCT), a Fourier Transform or a Discrete Wavelet Transform. For example, the code word could be formed as if in a DCT domain, so that the inverse transform processor 26 may be arranged to perform an inverse DCT on the code word coefficients before being spatially and/or temporally up-sampled. Accordingly, contributions from the codeword coefficients may be localised within certain preferred frequency bands of the image.
An example of an inverse transform processor 26 is shown in Figure 3 in more detail. As shown in Figure 3, the inverse transform processor 26 includes an inverse DCT transformer 30 which performs an inverse DCT on the down-sampled code word as formed into a DCT domain image. An upsampling processor 32 is then operable to spatially and/or temporally upsample the code word to provide a sampling rate, which corresponds to that of the base band domain image into which the code word is to be introduced.
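Where the code word is formed as if in the DCT domain, the inverse DCT step of the inverse transform processor 26 can be illustrated as below. The 8x8 block size, the choice of coefficient position and the orthonormal normalisation are assumptions for the sketch; the patent only specifies that an inverse DCT is performed before up-sampling.

```python
import numpy as np

def idct_matrix(N):
    # Orthonormal inverse DCT-II (i.e. DCT-III) basis matrix:
    # M[n, k] = sqrt(2/N) * c_k * cos(pi * (2n + 1) * k / (2N)), c_0 = 1/sqrt(2)
    n = np.arange(N)
    M = np.cos(np.pi * (2 * n[:, None] + 1) * n[None, :] / (2 * N))
    M[:, 0] /= np.sqrt(2)
    return M * np.sqrt(2.0 / N)

# Place a single low-frequency code word coefficient in an 8x8 DCT block,
# then inverse-transform it to the (still down-sampled) spatial domain.
block = np.zeros((8, 8))
block[0, 1] = 1.0                 # one hypothetical code word coefficient
M = idct_matrix(8)
spatial = M @ block @ M.T         # separable 2-D inverse DCT
```

Because the transform is orthonormal, a coefficient at a single (row, column) frequency position yields a spatial-domain contribution confined to that frequency band, which is the property the text relies on to localise the code word in preferred bands of the image.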
In one embodiment the up-sampling processor 32 may include an up-sampler 33 and a low pass filter 34. The up-sampler 33 is arranged to introduce zero valued sub-samples into the down-sampled spatial domain representation of the code word to form a version of the representation of the code word having a sampling rate corresponding to the sampling rate of the base band image. The low pass filter 34 then filters this version to form the up-sampled version of the representation of the code word for combining with the image in the spatial domain. The filter used in the temporal dimension may be different from that used in the spatial dimensions. In some embodiments the temporal domain filter may be an infinite impulse response filter.
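A one-dimensional sketch of the up-sampler 33 and low pass filter 34 described above. The box-car filter is a deliberately crude stand-in chosen so the result is easy to verify; the patent leaves the actual filter design open (and suggests an IIR filter for the temporal dimension).

```python
import numpy as np

def upsample_1d(x, factor):
    # Insert factor-1 zero-valued sub-samples between the original samples...
    up = np.zeros(len(x) * factor)
    up[::factor] = x
    # ...then low-pass filter. A length-`factor` box-car is used purely as
    # an illustrative low-pass stand-in; it turns the zero-stuffed signal
    # into a sample-and-hold interpolation.
    kernel = np.ones(factor)
    return np.convolve(up, kernel, mode="full")[:len(up)]

x = np.array([1.0, -1.0, 1.0])    # three down-sampled code word samples
y = upsample_1d(x, 4)             # base band rate version
```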
Using an infinite impulse response filter provides an advantage, because infinite impulse response filters typically have substantially fewer tap delay elements, thereby reducing the delay to the image being filtered as compared with an equivalent finite impulse response filter. If a finite impulse response filter were used, then the image would have to be delayed and stored, which represents a substantial increase in cost and complexity as well as delay in the temporal dimension, particularly, for example for an image generated for digital cinema, in which the sampling rate of the image is relatively high.
A disadvantage of infinite impulse response filters is that they are known to introduce a non-linear phase distortion in the signal being filtered. However, as explained, embodiments of the present invention do not require a reversible transform of the image to be formed, because the code word is combined with the original copy of the image in the spatial domain. As such, infinite impulse response filters can be used to form the up-sampled code word. Advantageously, the up-sampled version of the code word is filtered in one direction, for example a forward time direction, because, as will be explained shortly, a down-sampler may include a low pass filter which is also an infinite impulse response filter. Passing the down-sampled image through an infinite impulse response filter in the reverse direction to the up-sampler, such as a backward time direction, can reduce the phase distortion caused by filtering on the recovered code word. This is because the phase distortion introduced when up-sampling is substantially offset by an opposite phase distortion in the down-sampler.
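The phase-cancellation argument can be demonstrated with a single-pole IIR filter: filtering an impulse forward in time gives a one-sided, phase-distorted response, while filtering the result again in the reverse direction gives a symmetric, effectively zero-phase overall response. The coefficient a = 0.5 is an arbitrary choice for the demonstration, not a value from the patent.

```python
import numpy as np

def iir_lowpass(x, a=0.5):
    # Single-pole IIR low pass: y[n] = (1 - a) * x[n] + a * y[n - 1]
    y = np.zeros_like(x, dtype=float)
    acc = 0.0
    for n, v in enumerate(x):
        acc = (1 - a) * v + a * acc
        y[n] = acc
    return y

x = np.zeros(64)
x[32] = 1.0                               # impulse at sample 32

forward = iir_lowpass(x)                  # forward-time pass (e.g. up-sampler)
# Filtering the result in the reverse direction (as the down-sampler might)
# yields a response symmetric about the impulse, i.e. zero net phase.
both = iir_lowpass(forward[::-1])[::-1]
```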
Embodiments which utilise the present technique provide an advantage with respect to conventional arrangements in that generation and strength adaptation of the code word coefficients are performed at a lower rate and lower bandwidth with respect to the base band image. For an example where the image represents frames of high definition television pictures or digital cinema images, in which an image frame comprises 4096x2048 pixels (approximately 8x10^6 pixels), the code words for combining with the base band image can be formed in the transform domain as 256x128 pixel frames. Correspondingly, the weighting factors generated by the perceptual analyser 14.1 can be 256x128 factors per frame. The strength adapter 24 can therefore perform the combination of the code word coefficients and the weighting factors at a relatively low rate, requiring, for example, only 256x128 multiplications as opposed to the 4096x2048 multiplications which would be required if the code word coefficients were combined with the image in the base band domain. As explained above, conventional arrangements such as that shown in Figure 1 combine the code word with the image in the transform domain. As such, the transform processor 6 would be required to perform a transform, such as a DCT, on an image frame comprising 4096x2048 pixels, and then a corresponding inverse transform. As can be appreciated, this would represent a significant amount of processing power in terms of the number of calculations per second. Embodiments utilising the present technique therefore reduce the amount of processing required and can be implemented using a substantially reduced number of computations per second, because the transform domain includes a down-sampled representation, thereby allowing a less expensive implementation.
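The processing saving quoted above is straightforward to check: forming and weighting the code word at 256x128 rather than 4096x2048 reduces the per-frame multiplication count by a factor of 256.

```python
base = 4096 * 2048    # pixels per base band digital cinema frame
down = 256 * 128      # code word coefficients per down-sampled transform frame
ratio = base // down  # reduction factor for the strength-adaptation multiplies
```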
Image Perception Analyser
As explained above, an image perception analyser 14 is arranged to calculate the weighting factors for adapting the code word coefficients in accordance with an ability of the image, or parts thereof, to mask the visual presence of contributions from the code word coefficients. The effect of the code word should therefore, as far as possible, be substantially imperceptible in the image. An example of an image perception analyser utilising the present technique is shown in Figure 4. In Figure 4 the signal representation of the image is received from the source by a spatial domain down-sampler 40, which spatially down-samples the received images. According to one example in which the images are generated according to the digital cinema standard, the source images comprise 4096x2048 pixels, each pixel having 12 bit samples. For this example the spatial down-sampler may down-sample the images to 256x128 pixels, each having 8 bit samples. The down-sampling may be different from the down-sampled representation in which the code word is formed for combining with the image. The spatially down-sampled images are then received by an image divider 42, which is arranged to divide the down-sampled image into 4x4 image blocks. The down-sampled image is then received from the image divider 42 by at least a contrast masking engine 44 and a smoothness compensation engine 46.
The contrast masking engine 44 includes a DCT transformer 48, which is arranged to transform 4x4 blocks of pixels into the DCT transform domain. A coefficient contrast comparator 50 then receives the transform domain image. The contrast comparator compares the DC value of the DCT coefficients with the AC values of the DCT coefficients within each 4x4 pixel block to form a relative contrast measure for the DCT coefficient concerned. From a predetermined relationship between the relative contrast measure which would be produced by the code word coefficient and the relative contrast measure of the image block which would mask that code word coefficient, a maximum allowable contrast is determined for the DCT coefficient concerned. A weighting factor is then calculated by scaling the maximum allowable contrast measure with the value of the DCT coefficient concerned. This process is explained in more detail in the following paragraphs.
The contrast masking engine 44 applies a technique referred to as contrast masking, in which a target signal T, which in this case is the watermark code word, is hidden by another signal M, referred to as the masker, which in this case is the image. According to the present technique, the image signal is tested to determine a relative ability of the image to mask the watermark code word coefficient or coefficients. The masking ability is assessed in accordance with frequency sensitivity, in that human visual sensitivity differs for different spatial frequencies and orientations. The contrast masking ability is greatest when the masking signal and the target signal are spatially coincident and are of similar frequencies. For this reason, the masking of each watermark code word coefficient is considered with respect to the corresponding DCT coefficients of the source image.
The contrast of an AC coefficient u, v in an image block b of the source image S is defined as the ratio of that coefficient to the DC value:

C_{b,u,v} = D_{b,u,v} / D_{b,0,0}

where D_{b,u,v} are the u, v coefficients of the block b of the DCT transform of the source image. The human visual sensitivity threshold of a target of contrast CT in the presence of a masking image of contrast CM is modelled using a predetermined perception function. An example of such a perception function is illustrated in Figure 5. As can be seen for the example of Figure 5, in a flat region of the function 50, for small values of contrast of the masking image CM the maximum value of the target contrast CT below which the watermark code word coefficients can be added is constant. According to the function illustrated in Figure 5, above a threshold T the log contrast of the target rises linearly with respect to the log of the value of the masking contrast CM. The value of the threshold is different for each of the fifteen AC coefficients, higher spatial frequencies having higher threshold values. The values of the threshold T and an exponent value P illustrated in Figure 5 are determined empirically. Applying the function illustrated in Figure 5, a maximum allowable watermark coefficient value which can be applied for a given data block D_{b,u,v} is defined by the equation below:

W_{b,u,v} = T_{u,v} · D_{b,0,0} · max(1, [D_{b,u,v} / (T_{u,v} · D_{b,0,0})]^P)

Accordingly, the weighting value is calculated for each image data block D_{b,u,v} to ensure that the coefficient value does not exceed the maximum allowable value calculated above. Effectively, therefore, the weighting factor is calculated by determining the maximum allowable contrast caused by the code word coefficient, determined from the ratio of the AC coefficient to the DC coefficient in the DCT domain. The weighting factor is then calculated by scaling the maximum allowable contrast with the DC value of the DCT domain coefficients of the image block.
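Under the stated assumptions (a threshold T and exponent P empirically determined for the coefficient position concerned), the maximum allowable watermark coefficient computation may be sketched as:

```python
def max_allowable(D_uv, D_00, T, P):
    """Maximum allowable watermark coefficient for one AC position,
    following the perception function of Figure 5: constant below the
    threshold T, rising with exponent P above it."""
    contrast = abs(D_uv) / D_00                 # masker contrast C = D_uv / D_00
    return T * D_00 * max(1.0, (contrast / T) ** P)
```

In the flat region of the function (contrast at or below T) the allowance is simply T times the DC value; above it the allowance grows with exponent P. The numeric values of T and P passed in any call are purely illustrative; the patent states only that they are determined empirically.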
As mentioned above, the spatially down-sampled image is also received from the image divider 42 by the smoothness compensation engine 46. The smoothness compensation engine 46 is provided to compensate for some limitations of the contrast masking engine 44. The contrast-masking engine 44 operates well for smoothly varying signals. However, edges in images can place energy in many of the coefficients of a DCT block and can lead to an overestimate of the maximum allowable watermark code word coefficient. For this reason the smoothness compensation engine 46 calculates a correction factor for each DCT block, which varies between zero and one [0, 1], to the effect of reducing the weighting value in the presence of smooth areas and edges. The smoothness compensation engine 46 includes a neighbour contrast calculator, which is arranged to determine a smoothness factor of a block by counting the number of smooth pixels in the block. A pixel is determined to be smooth if a sufficient number of its nearest neighbours are within a neighbourhood threshold of its own value. The smoothness factor of a block is then calculated as a function of the proportion of smooth pixels in the block. The neighbourhood threshold of pixel values, which is used to define whether a pixel is determined as being smooth, and the relationship between the smoothness factor and the compensation factor are determined empirically from the effect of combining code word coefficients with image blocks having certain smoothness factors, as judged by the human eye.
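A sketch of the neighbour contrast calculation is given below. The 4-connected neighbourhood and the requirement that all neighbours lie within the threshold are assumptions: the patent states only that "a sufficient number" of nearest neighbours must lie within a neighbourhood threshold, the exact rule being determined empirically.

```python
def smoothness_factor(block, threshold):
    """Fraction of pixels whose 4-connected neighbours all lie within
    `threshold` of the pixel's own value (assumed 'smooth' rule)."""
    h, w = len(block), len(block[0])
    smooth = 0
    for y in range(h):
        for x in range(w):
            neighbours = [block[ny][nx]
                          for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
                          if 0 <= ny < h and 0 <= nx < w]
            if all(abs(n - block[y][x]) <= threshold for n in neighbours):
                smooth += 1
    return smooth / (h * w)
```

A uniform block yields a factor of 1.0, while a block dominated by strong edges yields a factor near 0.0, which the compensation function would map to a reduced weighting value.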
Once the compensation factors have been calculated by the smoothness compensation engine 46, they are received by a weighting factor combiner 58 and combined with the weighting factors generated by the contrast masking engine 44. The weighting factors are then fed to the strength adapter 24 as shown in Figure 2.
As shown in Figure 4, in some embodiments the perceptual analyser 14.1 may also include a perceptual weighting engine 60 and a temporal masking engine 62. The perceptual weighting engine 60 is arranged to generate an adjustment to the weighting factors generated by the contrast masking engine 44 and the smoothness compensation engine 46 in accordance with a function based on the human visual system. Thus, according to an empirically determined function derived from human perception, adjustment factors are generated based on the source image in the spatial domain and combined with the weighting values generated by the contrast masking engine 44 and the smoothness compensation engine 46 by the combiner 58. Similarly, the temporal masking engine 62 is arranged to generate an adjustment to the weighting factors generated by the contrast masking engine 44 and the smoothness compensation engine 46 in accordance with temporal changes to the video images. Thus, for example, at moments of scene change in the image, humans are less sensitive to noise and/or visual imperfections in the image. Accordingly, watermark code word coefficients are introduced into video images after scene changes in the video images. Scene changes are therefore detected in the video source images in the spatial domain and the resulting adjustments are combined with the weighting values generated by the contrast masking engine 44 and the smoothness compensation engine 46 by the combiner 58.
More Detailed Encoder Example Figure 6 provides a more detailed example of an encoding data processing apparatus which is arranged to embed a watermark code word into a copy of an information material item such as video images. The encoding data processing apparatus shown in Figure 6 includes a payload data word generator 4.1 which is arranged to operate in accordance with co-pending UK patent application number 0327854.6 to generate a code word having at least two parts or layers to represent two fields of a payload data word. The code words are arranged to represent the two fields in a hierarchical manner with the effect that the first code word identifies the first data field, whereas the second code word identifies a value of the second data field conditioned on the first data field. As such a detected code word can be used to identify one of a plurality of sub-sets into which the data words are divided, each data word in the sub-set having the same value in the first field. A second or subsequent field of the data word from the sub-set can then be identified by detecting a second code word from the material item. The second code word identifies a value in the second data field as well as the value of the first data field. The arrangement of the data fields in association with the watermark code words uniquely identifies a layered hierarchy of data field values, which can be used to convey different types of information.
As shown in Figure 6, the code word generator 4.1 includes a first layer data word generator 60 and a second layer data word generator 62. The first and second data word generators 60, 62 are arranged to generate respective data words which may in some applications be utilised to embed an identifier having a first part ID1 and a second part ID2. As explained in co-pending UK patent application number 0327854.6, two code words may be generated to represent payload data having at least two parts or layers. The code words are arranged to represent the two fields in a hierarchical manner with the effect that the first code word identifies the first data field, whereas the second code word identifies a value of the second data field conditioned on the first data field. The payload data may be an identifier, the first data field providing a first identifier ID1 which is represented by a first code word, whereas the first and second parts ID1 + ID2 of the identifier may be represented by a second code word. As shown in Figure 6, the first layer data word generator 60 generates the first layer data word ID1 of 20 bits, whereas the second layer data word generator 62 generates the second layer data word ID1 + ID2 of 40 bits. The first layer data word represents the first part of the identifier ID1, whereas the second layer data word represents the first and second parts of the identifier ID1 + ID2.
In a digital cinema application the first part of the identifier ID1 can represent the projector identifier whereas the second part of the identifier ID2 can represent a time, date or location at which the cinema film was displayed.
The data word generator 4.1 of the encoding data processor shown in Figure 6 also includes a key generator 64, which generates keys for generating the code words to represent the first and second data word layers. The key generator 64 generates two keys for the first and second layers, which are fed to an encryption circuit 66 forming part of a code word generator 22.1. The keys may be provided from an interface 67.1.
The encryption circuit 66 receives first and second private encryption keys from a second interface 67.2 and encrypts the first and second data word layers in accordance with a private key encryption process to form first and second encrypted data word parts. In the example of Figure 6, the first and second encrypted data words each comprise 128 bits. The first and second encrypted data words are received by a pseudo random number generator 68. The pseudo random number generator 68 also receives first and second keys from the key generator 64 which also comprise 128 bits.
The pseudo random number generator uses the first and second encrypted data word layers as seeds to generate code word coefficients for each of two code words to represent the first and second data word layers. The first and second code words are then received by a Gaussian mapping processor 70 which is arranged to redistribute the coefficients of the first and second code words according to a normal distribution.
The Gaussian distributed code words are then combined by a combiner 72 to form a composite code word 74. A vector representing both data word layers is thereby formed and passed from the code word generator 22.1 to the strength adapter 24.1.
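The generation pipeline of Figure 6 (encrypt the payload, seed a pseudo random number generator, map the output to a normal distribution, combine the layers) may be sketched as follows. SHA-256 hashing stands in here for the private-key encryption circuit 66, and all names, key values and lengths are illustrative assumptions rather than details taken from the specification.

```python
import hashlib
import random

def generate_code_word(payload, key, length):
    """Hash the payload with a key to form a seed (a stand-in for the
    encryption stage), then draw normally distributed code word
    coefficients, as by the pseudo random number generator 68 and
    Gaussian mapping processor 70."""
    seed = hashlib.sha256(key + payload).digest()   # 'encrypted' seed
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(length)]

layer1 = generate_code_word(b"ID1", b"key-1", 1024)        # first layer
layer2 = generate_code_word(b"ID1+ID2", b"key-2", 1024)    # second layer
composite = [a + b for a, b in zip(layer1, layer2)]        # combiner 72
```

The same payload and key always reproduce the same code word, which is what allows the detector to re-generate candidate code words later.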
In Figure 6, a simplified version 14.2 of the perceptual analyser shown in Figure 4 is arranged to generate weighting factors for combination with the code word coefficients as explained for the perceptual analyser 14.1 of Figure 4. However, the perceptual analyser 14.2 of Figure 6 is shown as only including a spatial and temporal down-sampler 76, which is arranged to receive a copy of the base band source images and to form a low resolution version of the source images by spatial and temporal down-sampling. The low resolution image is then transformed into the DCT domain by a DCT transform processor 78 and fed to a strength adaptor and combiner 80.
The strength adaptor and combiner 80 adjusts the weight of the transform domain code word coefficients received from the code word generator 22.1. For the example shown in Figure 6, the code word coefficients are adjusted with respect to the AC values of the DCT coefficients relative to the DC value, as explained with reference to the contrast masking engine 44 of the image perceptual analyser 14.1 of Figure 4. In other embodiments the image perception analyser 14.2 may include one or all of the smoothness compensation engine 46, the perceptual weighting engine 60 or the temporal masking engine 62.
The perceptually weighted code word coefficients are then formed into a DCT domain representation of the source image, into which the code word is to be embedded.
The code word coefficients are then received by an inverse transformer 26.1 which operates as described above to perform an inverse DCT transform on the spatially and temporally down-sampled image, to convert the image into the spatial domain. The spatial domain down-sampled image is then spatially and temporally up-sampled to the base band domain. The marked copy of the images W is then formed by a combiner 28.2, which is arranged to add the spatial domain coefficients to the original images I to form fingerprinted frames. The fingerprinted frames may then be, for example, projected, such as in a digital cinema.
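The final up-sample-and-combine stage may be sketched as below, assuming nearest-neighbour up-sampling; the patent requires only that the up-sampling match the earlier decimation, not a particular interpolation method.

```python
def up_sample(small, factor):
    """Nearest-neighbour spatial up-sampling of the inverse-transformed
    code word back to the base band sampling grid."""
    out = []
    for row in small:
        wide = [v for v in row for _ in range(factor)]
        for _ in range(factor):
            out.append(list(wide))
    return out

# W = I + w : add the up-sampled spatial-domain code word w to the
# original image I to form the marked (fingerprinted) frame W.
w = up_sample([[1.0, -1.0]], 2)          # 1x2 code word -> 2x4 base band
I = [[10.0] * 4, [10.0] * 4]
W = [[i + c for i, c in zip(ri, rc)] for ri, rc in zip(I, w)]
```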
Detecting Processor A detecting apparatus, which is arranged to detect code words and to recover a payload data word if present in the material item, is shown in Figure 7. In Figure 7 an offending version of the watermarked image W' and a copy of the original image I are received in a recovery processor 90. Within the recovery processor 90, the marked image is received at a first down-sampling processor 100. The copy of the original image I is fed to a second down-sampling processor 102. The first and second down-sampling processors 100, 102 each spatially and temporally down-sample the received images W', I to form reduced resolution versions. The reduced resolution versions correspond to the down-sampled version with respect to which the code word was formed in the transform domain, when embedded in the image I in the encoding data processor.
As shown in Figure 7, the first and second down-sampling processors may each include a down-sampler 100.1, 102.1 and a low pass filter 100.2, 102.2. The down-samplers 100.1, 102.1 are arranged to decimate the watermarked image W' and the original version of the image I respectively, to the effect that the sampling rate of the marked image W' and the original version of the image I are the same as the sampling rate of the transform domain representation in which the code word was formed. The down-sampling is effected both temporally and spatially in correspondence with the spatial and temporal down-sampled version with respect to which the code word was formed. The down-sampled versions of the marked and the original image are then low-pass filtered with the low-pass filters 100.2, 102.2 to reduce aliasing errors.
As explained above with reference to the encoder, the low-pass filters 100.2, 102.2 may include both temporal and spatial low pass filters. In some embodiments the temporal filter may be implemented as an infinite impulse response filter, so that a smaller number of delay elements is required with respect to the number which would be required for a finite impulse response filter to provide an equivalent cut-off performance. As mentioned above, utilising an infinite impulse response filter is advantageous for example with digital cinema applications because the relatively large number of samples for each image would result in a relatively high delay and a large amount of storage for a corresponding finite impulse response filter. As indicated above, in order to reduce an amount of phase distortion caused by the infinite impulse response filter, the down-sampled images are passed through the filter in a direction which is opposite to that in which the marked image was passed to effect up-sampling in the encoder. This is because reversing the direction of filtering has an effect of introducing an opposite phase distortion into the down-sampled marked image. It is therefore more likely that the code word estimate will accurately represent the original code word samples.
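The reversed-direction filtering can be illustrated with a one-pole IIR low-pass, a stand-in for the actual encoder and decoder filters, whose design the patent leaves open:

```python
def one_pole_lowpass(samples, a):
    """y[n] = (1 - a) * x[n] + a * y[n-1]: a simple causal IIR
    temporal low-pass with a single delay element."""
    out, y = [], 0.0
    for x in samples:
        y = (1.0 - a) * x + a * y
        out.append(y)
    return out

def filter_reversed(samples, a):
    """Run the same recursion over the samples in the opposite
    temporal direction, then restore the original ordering."""
    return one_pole_lowpass(samples[::-1], a)[::-1]
```

Running the recursion in the opposite direction introduces the opposite phase shift, which is the effect exploited above: filtering the down-sampled marked image in the reverse direction at the detector tends to cancel the phase distortion introduced by the forward pass in the encoder.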
The recovery processor 90 is arranged to process the marked image and the original image and to form an estimate of a code word which may have been embedded in the marked image. For the example shown in Figure 7, the recovery processor 90 also includes a registration processor 104, a comparator 106 and a transform processor 108.
The offending version of the image W' may have been produced by photographing or otherwise reproducing a part of the watermarked image W. As such, in order to improve the likelihood of detecting the identification code word, the registration processor 104 is arranged to receive the down-sampled version of the image I' and the suspected marked copy W'' and to substantially align the offending image with the original version of the image. One example of a process for registering a received image with the original version of the image is provided in European patent application number 1 324 263 A. The purpose of this alignment is to provide a correspondence between the down-sampled original image samples I' and the corresponding samples of the down-sampled watermarked image W'' to which the code word coefficients have been added, thereby increasing the likelihood of correctly detecting a code word, or reducing false negative detections.
The registered image W'' is received by the comparator 106 also forming part of the recovery processor 90. The comparator 106 also receives a copy of the down-sampled original image I' and proceeds to subtract the samples of the original image I' from the registered watermarked image W''. Since the watermark code word was embedded into the image in the spatial domain, there is no requirement to transform the image into the frequency domain to recover an estimate of the watermark code word V'. The estimate of the code word V' in the spatial domain is then fed to the transform processor 108, which performs a DCT on the reduced resolution samples to form an estimated code word X'.
The output of the transform processor 108 therefore provides an estimate of the coefficients of the code word which is to be identified. The recovered code word X' is then fed to a first input of a correlator 110. The correlator 110 also receives on a second input re-generated code words produced by a code word generator 112.
The code word generator 112 operates to reproduce code words under the control of a control processor 114. The control processor 114 therefore has access to a data store 116, which stores seeds and keys for generating the watermark code words. The control processor 114 controls the correlator 110 and the code word re-generator to correlate, with the estimated code word, each of the code words in the set of possible code words, which may have been embedded in the image. If a result of the correlation exceeds a predetermined threshold then the control processor 114 determines that the code word was present in the estimated code word and accordingly the corresponding payload data word layer or identifier ID1, 1D2 is considered to have been detected.
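The correlation test performed by the correlator 110 and control processor 114 may be sketched as follows. The normalised correlation metric, the threshold value and the code word length are illustrative assumptions; the patent states only that a correlation result exceeding a predetermined threshold indicates detection.

```python
import random

def correlate(x, y):
    """Normalised correlation between the recovered estimate and one
    re-generated candidate code word."""
    num = sum(a * b for a, b in zip(x, y))
    den = (sum(a * a for a in x) * sum(b * b for b in y)) ** 0.5
    return num / den if den else 0.0

def detect(estimate, candidates, threshold):
    """Indices of candidate code words whose correlation with the
    estimate exceeds the threshold (the control processor 114's test)."""
    return [i for i, c in enumerate(candidates)
            if correlate(estimate, c) > threshold]

# Illustrative set-up: three candidate code words; the estimate is a
# noisy copy of candidate 1, as recovered from a marked image.
rng = random.Random(42)
codes = [[rng.gauss(0.0, 1.0) for _ in range(512)] for _ in range(3)]
estimate = [c + rng.gauss(0.0, 0.5) for c in codes[1]]
```

Only the embedded code word correlates strongly with the estimate; unrelated candidates produce correlations near zero, which is why a fixed threshold separates them reliably.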
According to the present technique the detecting data processor illustrated in Figure 7 is arranged to identify the payload data word having first and second layer parts to recover the identification data ID1, ID2. An advantage is provided by the decoder according to the present technique in that most of the calculations and processing performed by the detecting data processor are performed on a down-sampled version of samples corresponding to the original base band image from which the estimated code word is recovered. That is, the DCT transform and the correlation of the re-generated code words with the estimated version of the code word are performed on a down-sampled version of the recovered samples of the code word. As can be appreciated, for the example application of digital cinema, the source images may comprise 4096x2048 pixels, each pixel having 12 bit samples. For this example the down-sampling may reduce the size of the images to be decoded to, for example, 256x128 pixels, each having 8 bit samples. Accordingly, a substantial reduction in the amount of processing which is required to detect and recover an estimate of the code word is provided.
Summary of Operation
Encoding Process A flow diagram illustrating process steps involved in encoding an image to form a marked copy of the image is provided in Figure 9. The operations performed by the encoding process illustrated in Figure 9 are summarised as follows:- S1: The code word is generated for combining with the image in order to generate a marked copy of the image. As illustrated in Figure 6 this may involve several steps of generating a key as well as generating the payload data including parts which represent different layers. The payload data may then be encrypted to form a seed which is used by a pseudo random number generator with a key to generate the code word coefficients for combining with the image. As illustrated in Figure 6 the code word coefficients may also be processed in accordance with a Gaussian map to redistribute the vector values according to a normal distribution.
S2: A copy of the original image is also received by an image analyser.
S4: Weighting factors are then generated for each coefficient of the code word with respect to a part of the image to which the coefficient is to be added.
S6: The weighting factors are then combined with the code word coefficients to form strength adapted code word coefficients.
S8: The weighted code word coefficients are then converted from the transform domain to a base band domain corresponding to the domain of the base band image in its original form.
S10: The code word is then combined with the image in the base band domain to form a marked copy of the image.
As already explained above, in one example the transform domain is the DCT domain in combination with a down-sampled representation with respect to the sampling rate of the image signal. Thus the transform domain corresponds to a temporally or spatially down-sampled version of the original image sampling rate. As such, an example of the process step S8 for converting the transform domain code word into the base band domain is represented in Figure 10. The steps in Figure 10 are summarised as follows:- S12: The code word is converted from the transform domain, which may be referred to as the frequency domain, into the spatial domain by performing an appropriate transform. For example, the inverse DCT may be performed to convert the code word formed in the DCT domain into the spatial domain.
S14: The inverse DCT transformed code word is then converted into a base band form by up-sampling the samples of the spatial domain code word temporally and/or spatially in correspondence with a sampling rate of the image.
Calculating Weighting Factors To summarise the operation of the perceptual image analyser to generate the weighting factors for adapting the strength of the code word coefficients, a flow diagram is provided in Figure 11. The process steps illustrated by the flow diagram in Figure 11 are summarised as follows:- S20: A copy of the image is received in base band form within the image analyser.
S22: The image is spatially down-sampled to reduce the number of samples of the image. Optionally the image may also be temporally down-sampled.
S24: The down-sampled image is then divided into image blocks, which for example could be blocks of 4x4 pixels. The image blocks are then fed to two separate functions represented in Figure 11 by process steps on the left-hand side and process steps on the right-hand side.
S26: A DCT is performed on the image blocks.
S28: For each image block, a relative contrast measure is calculated by comparing each AC coefficient value with the value of the DC coefficient for the DCT domain image.
S30: For each AC coefficient of the DCT domain code word, a maximum allowable contrast is determined from a predetermined relationship between the contrast of the image block and the contrast which would be produced by the code word coefficients in the spatial domain. The relative contrast measure for the image block is therefore used to identify, from this relationship, a maximum allowable contrast which would be caused by the code word coefficient in the spatial domain.
S32: For each AC coefficient, a weighting value is calculated for each image data block D_{b,u,v} to ensure that the coefficient value does not exceed the maximum allowable value calculated in steps S28 and S30. Effectively, therefore, the weighting factor is calculated by scaling the maximum allowable contrast with the DC value of the DCT domain coefficients for that image block.
S36: The weighting factor values are received and combined or compensated with compensation factors or other weighting values calculated by alternative functions.
S38: As already explained, although the contrast masking provided by steps S26 to S34 generates weighting factor values which are proportional to the contrast value of the image, in some examples where there is a significant change in contrast within the image, weighting factor values can be calculated which do not have the desired effect of hiding or masking the code word coefficients. Accordingly, a smoothness compensation function is provided to compensate the weighting factor values calculated in accordance with the masking function. To this end, the image blocks are received and for each block the number of pixels in the block which are determined to be smooth is counted to generate a smoothness factor for the block. A pixel is determined to be smooth if a difference between the value of the pixel and those of its neighbours does not exceed a predetermined threshold.
S40: For each block a smoothness factor is calculated in accordance with a proportion of smooth pixels in the block.
S42: For each block a compensation function is calculated from the smoothness factor, by comparing the smoothness factor with a predetermined relationship. The predetermined relationship is determined by experimentation with respect to the human eye's perception. Thus, for higher smoothness factors the predetermined relationship has an effect of reducing the weighting factor in order to compensate for images which include edges which would otherwise cause an over-estimation of the strength of the code word coefficients.
S36: As indicated above, once the weighting factor values have been calculated, the compensation factors are combined with the weighting factors to generate compensated weighting factors.
As indicated in Figure 11 at step S24, further processing functions are possible in order to adjust the weighting factor values or indeed to generate further compensation values which may be combined with the weighting factor values to adjust these values. Examples of such further functions are illustrated in Figure 12, which corresponds to the perceptual weighting engine 60 and the temporal masking engine 62 illustrated in Figure 4. In Figure 12 the process steps on the left-hand side correspond to the perceptual weighting function whereas the process steps on the right-hand side correspond to the temporal masking. The process steps illustrated in Figure 12 are summarised as follows:- S60: The image blocks are received in a perceptual weighting function which applies a function based on the human visual system, which may be for example heuristically calculated.
S62: Compensation factors are established for each block in accordance with the perceptual weighting function.
S64: The temporal masking function receives images and detects from the images whether a scene change has occurred.
S66: In accordance with whether a scene change has occurred compensation factors are generated to set the weighting factors to zero if a scene change has occurred. This is because imperfections in an image are more likely to be noticeable by the human eye after a scene change.
As illustrated in Figure 12 the further compensation factors generated by the flow diagram of Figure 12 are combined in step S36 with the weighting factor values already generated to produce overall adjusted weighting factors values.
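The temporal masking test may be sketched as below, assuming a simple mean-absolute-difference detector between consecutive down-sampled frames; the patent does not specify how scene changes are detected, only that the weighting factors are set to zero when one occurs.

```python
def scene_change_compensation(prev_frame, frame, threshold):
    """Return 0.0 (suppress the watermark) when the mean absolute
    difference between consecutive frames exceeds the threshold,
    indicating a probable scene change; otherwise return 1.0."""
    diff = sum(abs(a - b) for a, b in zip(prev_frame, frame)) / len(frame)
    return 0.0 if diff > threshold else 1.0
```

The returned factor would be combined multiplicatively with the other weighting values at step S36, zeroing the code word strength for frames at a scene change.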
Detecting Process A flow diagram illustrating a process performed in detecting a code word from which payload data can be recovered is illustrated in Figure 13. The steps illustrated in Figure 13 are summarised as follows:- S80: The marked image from which the payload data is to be recovered is received and spatially and/or temporally down-sampled in correspondence with the temporally and/or spatially down-sampled domain in which the code word was formed.
S82: A copy of the original image is received and correspondingly spatially and/or temporally down-sampled, the down-sampling corresponding to the down-sampling performed on the marked image.
S84: A registration process is performed with the effect that the downsampled marked image is aligned with the down-sample copy of the original image.
Alignment is effected so that, as far as possible, the samples to which code word coefficients were added in the marked copy of the image correspond to the samples of the down-sampled copy of the original image received at the detector. A result of the alignment should therefore increase the likelihood of correctly recovering the code word and detecting the payload data.
S86: An estimate of the code word is recovered by subtracting the down-sampled original image from the aligned down-sampled marked image. At this point the samples are still in the spatial domain.
S88: A DCT is performed on the spatial domain estimate to form an estimate of the code word. As will be appreciated, the DCT is one example of the transform which could be used.
S90: Code words from the set of possible code words are regenerated.
S92: The regenerated code words are correlated with the estimated code word recovered from the marked image.
S94: One or more code words are detected if a result of the correlation exceeds a predetermined threshold.
S96: The payload data is determined from the code words which are detected by the correlation results.
Applications The encoding image processing apparatus which is arranged to produce the watermarked images shown in Figure 2 may be incorporated into a variety of products for different scenarios in which embodiments of the present invention find application.
For example, the encoding image processing apparatus may be connected to a web site or web server from which the watermarked images may be downloaded. Before downloading a copy of the image, a unique code word is introduced into the downloaded image, which can be used to detect the recipient of the downloaded image at some later point in time.
In another application the encoding image processor forms part of a digital cinema projector in which the identification code word is added during projection of the image at, for example, a cinema. Thus, the code word is arranged to identify the projector and the cinema at which the images are being reproduced. Accordingly, the identification code word can be identified within a pirate copy produced from the images projected by the cinema projector in order to identify the projector and the cinema from which pirate copies were produced. Correspondingly, a watermarked image may be reproduced as a photograph or printout in which a reproduction or copy may be made and distributed.
For applications to digital cinema, the code word may be introduced into the cinema images to be projected by a media terminal. According to the digital cinema standard the cinema film is received in JPEG2000 encoded form and decoded to reproduce the cinema images to be projected. Signals representing the cinema image are therefore fed to the media terminal in which the code words are introduced before the watermarked images are projected onto a cinema display screen.
In addition to the above-mentioned applications of the encoding data processing apparatus of the watermarking system to a cinema projector and to a web server, other applications are envisaged. For example, a receiver/decoder is envisaged in which received signals are watermarked by introducing code words upon receipt of the signals from a communicating device. For example, a set top box is typically arranged to receive television and video signals from a "head-end" broadcast or multi-cast device. As will be appreciated, in this application the encoding data processing apparatus forms part of the set top box and is arranged to introduce watermark code words into the video signals as the signals are received and decoded. In one example embodiment, the watermark code word is arranged to uniquely identify the set top box which receives and decodes the video signals.
Various further aspects and features of the present invention are defined in the appended claims. Various modifications can be made to the embodiments herein before described without departing from the scope of the present invention.

Claims (28)

1. An encoding data processing apparatus for generating a marked copy of an image by introducing at least one code word into a copy of the image, the apparatus comprising a code word generator operable to generate the code word having a plurality of code word coefficients and to form the code word into a transform domain representation, the transform domain representation corresponding to at least one of spatially or temporally down sampling the image with respect to a sampling rate of the base band image, an image analyser operable to receive the image in a base band form to form a down-sampled version of the image, the down sampling being effected in correspondence with the down sampled representation of the code word formed by the code word generator with respect to a base band sampling rate of the image, and to generate weighting factors for scaling the code word coefficients with respect to parts of the base band image with which the code word coefficients are to be combined, a strength adapter operable to combine the weighting factors with the code word coefficients, an inverse transform processor operable to convert the transform domain code word coefficients scaled by the weighting factors into the base band domain by correspondingly up-sampling the code word at least one of spatially or temporally to form the base band domain version of the code word for combining with the image, and a combiner operable to combine the inverse transformed code word with the image in the base band domain, the code word being formed in the transform domain for combining with the image in the base band domain, and the weighting factors being formed from the image in the spatial domain.
2. An encoding data processing apparatus as claimed in Claim 1, wherein the image analyser includes a contrast masking engine operable to generate the weighting factor values in proportion with a relative contrast of the down-sampled image in a region in which the code word coefficients to be adapted by the weighting factor values are to be combined.
3. An encoding data processing apparatus as claimed in Claim 1 or 2, wherein the inverse transform processor comprises an inverse transform processor operable to perform an inverse transform on the down-sampled version of the code word coefficients to form a down-sampled spatial domain representation of the code word, and an up-sampling processor operable to convert the sampling rate of the down- sampled spatial domain representation of the code word to a sampling rate corresponding to the sampling rate of the base band image, the combiner combining the code word with the image in the spatial domain.
4. An encoding data processing apparatus as claimed in Claim 3, wherein the inverse transform is an inverse Discrete Cosine Transform.
5. An encoding data processing apparatus as claimed in Claim 2, 3 or 4, wherein the up-sampling processor includes an up-sampler operable to introduce zero valued sub-samples into the down- sampled representation of the code word to form a version of the representation of the code word having a sampling rate corresponding to the sampling rate of the base band image, and a filter operable to filter the version of the representation of the code word to form the up-sampled version of the representation of the code word for combining with the image in the spatial domain, wherein the filter is an infinite impulse response filter.
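The up-sampling processor of claim 5 — zero insertion to reach the base band sampling rate, followed by a recursive (infinite impulse response) filter to interpolate the inserted zeros — can be sketched in one dimension. The first-order smoother used here is an assumed filter, not the one the claims specify:

```python
import numpy as np

def upsample_zero_stuff(x, factor=2):
    # Raise the sampling rate by introducing zero-valued sub-samples
    # between the original samples, as in claim 5.
    y = np.zeros(len(x) * factor)
    y[::factor] = x
    return y

def iir_lowpass(x, a=0.5):
    # A first-order recursive smoother, y[n] = a*x[n] + (1-a)*y[n-1],
    # standing in for the claimed infinite impulse response filter.
    y = np.empty(len(x))
    acc = 0.0
    for n, v in enumerate(x):
        acc = a * v + (1.0 - a) * acc
        y[n] = acc
    return y
```

A recursive filter of this kind needs only one multiply-accumulate per output sample, which is one plausible motivation for preferring an IIR interpolator over a long finite impulse response kernel when up-sampling temporally across many frames.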
6. An encoding data processing apparatus as claimed in Claim 5, wherein the up-sampler is operable to up-sample the code word in the temporal domain and the infinite impulse response filter is applied to filter the temporally up-sampled representation of the code word.
7. A cinema projector including an encoding data processing apparatus according to any of Claims 1 to 6, wherein the encoding data processing apparatus is operable to receive image signals before reproduction, and to introduce code words into the image signals before reproduction.
8. A media terminal including an encoding data processing apparatus according to any of Claims 1 to 6, wherein the encoding data processing apparatus is operable to receive image signals, and to introduce code words into the image signals.
9. A detecting data processing apparatus operable to determine payload data from a marked copy of an image, the image having been marked by forming at least one code word from a set of code words in a transform domain to represent the payload data, the transform domain including at least one of spatially or temporally down-sampling, and inverse transforming the code word by up-sampling the code word to a sampling rate corresponding to the sampling rate of the image and combining the code word with a copy of the image, the detecting apparatus comprising a recovery processor operable to recover an estimate of the code word by comparing the marked version of the image with an original version of the image, a code word re-generator operable to re-generate code words from the set of code words, a correlator operable to correlate the estimated code word recovered from the marked image with a code word re-generated by the code word re-generator, and a control processor operable to control the code word re-generator to re-generate code words from the set, to control the correlator to correlate the re-generated code words with the estimated code word recovered from the marked image, to determine whether a code word has been detected if a result of the correlation from the correlator exceeds a predetermined threshold, and to recover the payload data in accordance with the detected code word, wherein the recovery processor includes a down sampling processor operable to form a down-sampled representation of the estimated code word from the base band versions of the marked image and the original version of the image.
10. A detecting data processing apparatus as claimed in Claim 9, wherein the down-sampling processor comprises a first down sampling processor operable to receive the marked copy of the image and to down-sample, at least one of spatially or temporally, the marked image in correspondence with the down-sampling of the transform domain in which the code word was formed to generate the marked version of the image, a second down sampling processor operable to receive a copy of the original image and to down-sample, at least one of spatially or temporally, the original, in correspondence with the down-sampling applied to the marked copy of the image, a comparator operable to recover the down-sampled version of the estimated code word by comparing the down-sampled version of the original image with the down-sampled version of the marked image.
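The recovery path of claims 9 to 11 — down-sample both the marked and the original image, subtract to isolate the mark, then forward transform the residual — can be sketched as follows. The 2x2 block averaging here stands in for the claimed low-pass filter plus sub-sampler, and the DCT helper is an assumed orthonormal construction:

```python
import numpy as np

def down2(img):
    # 2x down-sampling by 2x2 block averaging, standing in for the
    # low-pass filter and sub-sampler of the down sampling processors.
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def dct_matrix(n):
    # Orthonormal DCT-II basis (assumed helper).
    k = np.arange(n)
    m = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    m[0, :] *= np.sqrt(1.0 / n)
    m[1:, :] *= np.sqrt(2.0 / n)
    return m

def estimate_code_word(marked, original):
    # Comparator: subtract the down-sampled original from the
    # down-sampled marked copy to isolate the embedded mark.
    residual = down2(marked) - down2(original)
    # Forward DCT of the residual yields the transform-domain
    # estimate of the code word (claim 11's transform processor).
    Dh, Dw = dct_matrix(residual.shape[0]), dct_matrix(residual.shape[1])
    return Dh @ residual @ Dw.T
```

Because the comparison happens after down-sampling, the detector works at the same reduced rate at which the code word was formed, so the transform of the residual is directly comparable with re-generated code words.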
11. A detecting data processing apparatus as claimed in Claim 10, wherein the transform domain in which the code word is formed corresponds to a discrete cosine transform, the code word formed being inverse discrete cosine transformed before being combined with the copy of the image to form the marked image, wherein the recovery processor includes a transform processor operable to receive samples resulting from the comparison of the marked image with the original image and to discrete cosine transform the resulting samples to form the estimated code word.
12. A detecting data processing apparatus as claimed in Claim 10 or 11, wherein the recovery processor includes a registration processor operable to substantially align the down-sampled version of the marked image with the down-sampled version of the copy of the original image to the effect that samples of the marked copy correspond substantially to samples of the original image, so that a probability of correctly detecting the code word present in the marked copy of the material is substantially increased.
13. A detecting data processing apparatus as claimed in any of Claims 9 to 12, wherein the first down sampling processor includes a first filter operable to low pass filter the marked image, and a first sub-sampler operable to form a sub-sampled version of the filtered marked image by selecting samples of the filtered marked copy of the image to reduce the sampling rate, in correspondence with the down-sampling of the transform domain in which the code word was formed, and wherein the second down sampling processor includes a second filter operable to low pass filter the copy of the original image, and a second sub-sampler operable to form a sub-sampled version of the filtered copy of the original image by selecting samples of the filtered copy of the original image to reduce the sampling rate, in correspondence with the down-sampling of the marked version of the image, and wherein at least one of the first and second filters is an infinite impulse response filter.
14. A detecting data processing apparatus as claimed in Claim 13, wherein the infinite impulse response filter is applied to filter the representation of the code word temporally, the sub-sampler being operable to sub-sample the code word in the temporal domain.
15. A detecting data processing apparatus as claimed in Claim 14, wherein the marked image was formed by up-sampling the transform domain code word and passing the up-sampled code word through an infinite impulse response filter, and the sub-sampled code word is passed through the filter in a direction which is temporally opposite to a direction in which the upsampled code word was passed through the infinite impulse response filter to form the marked image.
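Claim 15's temporally reversed filtering can be illustrated with a pair of one-dimensional passes: the embedder applies a causal IIR recursion over time, and the detector runs the same recursion in the opposite temporal direction. Both functions below are illustrative assumptions (a first-order filter with an assumed coefficient), not the claimed filter:

```python
import numpy as np

def iir_forward(x, a=0.5):
    # Causal pass, y[n] = a*x[n] + (1-a)*y[n-1]: the direction assumed
    # for temporal up-sampling when the mark was embedded.
    y = np.empty(len(x))
    acc = 0.0
    for n in range(len(x)):
        acc = a * x[n] + (1.0 - a) * acc
        y[n] = acc
    return y

def iir_backward(x, a=0.5):
    # Claim-15 style detection pass: run the same recursion in the
    # temporally opposite direction by reversing, filtering, reversing.
    return iir_forward(x[::-1], a)[::-1]
```

Running one causal and one anti-causal pass gives the cascade a zero-phase overall response, so the detector's filtered samples remain temporally aligned with the embedder's rather than accumulating the filter's group delay twice in the same direction.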
16. A method of generating a marked copy of an image by introducing at least one code word into a copy of the image, the method comprising generating the code word, the code word being formed according to a representation corresponding to an at least one of spatially or temporally down-sampled version of the image with respect to a sampling rate of a base band image, receiving the image in a base band form, forming a down-sampled version of the image, the down sampling being effected in correspondence with the down sampled representation of the code word with respect to the base band sampling rate of the image, and generating weighting values for scaling the code word coefficients with respect to the down sampled version of the image in the spatial domain, combining the weighting values with the code word coefficients, and converting the transform domain code word coefficients scaled by the weighting values into the base band domain by correspondingly up-sampling the code word at least one of spatially or temporally to form the base band domain version of the code word for combining with the image, and combining the image with the inverse transformed code word in the base band domain, wherein the forming the code word includes forming the code word in the transform domain for combining with the image in the base band domain, and the generating the weighting factors comprises forming the weighting factors from the image in the spatial domain.
17. A method as claimed in Claim 16 wherein, the generating the weighting factor values includes generating the weighting factor values in proportion with a relative contrast of the down-sampled image in a region in which the code word coefficients to be adapted by the weighting factor values are to be combined.
18. A method as claimed in Claim 17, wherein the transform domain includes a discrete cosine transform representation, and the converting the transform domain code word coefficients comprises performing an inverse discrete cosine transform on the down-sampled version of the code word coefficients to form a down-sampled spatial domain representation of the code word, and then correspondingly up-sampling the code word to form the base band domain version of the code word.
19. A method as claimed in Claim 16, 17 or 18, wherein the up-sampling includes introducing zero valued sub-samples into the down-sampled spatial domain representation of the code word to form a version of the representation of the code word having a sampling rate corresponding to the sampling rate of the base band image, and filtering the version of the representation of the code word with an infinite impulse response filter to form the up-sampled version of the representation of the code word for combining with the image in the spatial domain.
20. A method of determining payload data from a marked copy of an image, the image having been marked by forming at least one code word from a set of code words in a transform domain to represent the payload data, the transform domain including at least one of spatially or temporally downsampling, and inverse transforming the code word by up-sampling the code word to a sampling rate corresponding to the sampling rate of the image and combining the code word with a copy of the image, the method comprising recovering an estimate of the code word by comparing the marked version of the image with an original version of the image, re- generating code words from the set of code words, correlating the estimated code word recovered from the marked image with a code word re- generated by the code word re-generator, and controlling the re- generation of code words from the set and the correlation of the re- generated code words with the estimated code word recovered from the marked image, to determine whether a code word has been detected if a result of the correlation exceeds a predetermined threshold, and to recover the payload data in accordance with the detected code word, wherein the recovering the estimate of the code word includes forming a down-sampled representation of the estimated code word from the base band versions of the marked image and the original version of the image.
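The detection loop of claim 20 — re-generate each candidate code word from the set, correlate it with the estimate recovered from the marked image, and declare a detection when the correlation exceeds a predetermined threshold — can be sketched as follows. The pseudo-random Gaussian code word construction and the threshold value are assumptions for illustration:

```python
import numpy as np

def regenerate(seed, size):
    # Re-generate a candidate code word from its seed; a seeded
    # pseudo-random Gaussian sequence is an assumed construction.
    return np.random.default_rng(seed).standard_normal(size)

def detect(estimate, seeds, threshold=0.3):
    # Normalise the estimated code word recovered from the marked image.
    est = estimate.ravel()
    est = est / (np.linalg.norm(est) + 1e-12)
    best = None
    for seed in seeds:
        cw = regenerate(seed, est.size)
        # Normalised correlation between estimate and candidate.
        score = float(est @ (cw / np.linalg.norm(cw)))
        # A detection is declared only when the correlation exceeds
        # the predetermined threshold; keep the strongest match.
        if score > threshold and (best is None or score > best[1]):
            best = (seed, score)
    return best  # (seed identifying the payload, correlation) or None
```

The seed that survives the threshold identifies which code word from the set was embedded, and hence the payload data it represents.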
21. A method as claimed in Claim 20, wherein the forming the down-sampled representation comprises receiving the marked copy of the image, down-sampling, at least one of spatially or temporally, the marked image in correspondence with the down-sampling of the transform domain in which the code word was formed to generate the marked version of the image, receiving a copy of the original image, correspondingly down-sampling, at least one of spatially or temporally, the original image, and recovering the down-sampled version of the estimated code word by comparing the down-sampled version of the original image with the down-sampled version of the marked image.
22. A method as claimed in Claim 21, wherein the transform domain in which the code word is formed corresponds to a discrete cosine transform, the code word formed being inverse discrete cosine transformed before being combined with the copy of the image to form the marked image, wherein the recovering the estimate of the code word includes receiving samples resulting from the comparison of the marked image with the original image and performing a discrete cosine transform on the resulting samples to form the estimated code word.
23. A method as claimed in any of Claims 20 to 22, wherein the down-sampling the marked image in correspondence with the down-sampling of the transform domain in which the code word was formed includes low pass filtering the marked image with an infinite impulse response filter, and forming a sub-sampled version of the filtered marked image by selecting samples of the filtered marked copy of the image to reduce the sampling rate, at least one of spatially or temporally, in correspondence with the down-sampling of the transform domain in which the code word was formed, and the correspondingly down-sampling, at least one of spatially or temporally, the original image includes low pass filtering the copy of the original image, forming a sub-sampled version of the filtered copy of the original image by selecting samples of the filtered copy of the original image to reduce the sampling rate, at least one of spatially or temporally, in correspondence with the down-sampling of the marked version of the image.
24. Computer software having program code, which when loaded onto a data processor, causes the data processor to perform the method according to any of Claims 16 to 23.
25. A data bearing medium bearing computer program code, which when loaded onto a data processor performs the method according to any of claims 16 to 23.
26. A data bearing medium according to claim 25, wherein the medium is a storage medium.
27. An encoding data processing apparatus or a detecting data processing apparatus as herein before described with reference to the accompanying drawings.
28. A method of generating a marked copy of a material item and a method of determining a data word represented by a plurality of code words substantially as herein before described with reference to the accompanying drawings.
Amended claims have been filed as follows
1. An encoding data processing apparatus for generating a marked copy of an image by introducing at least one code word into a copy of the image, the apparatus comprising a code word generator operable to generate the code word having a plurality of code word coefficients and to form the code word into a transform domain representation, the transform domain representation corresponding to at least one of spatially or temporally down sampling the image with respect to a sampling rate of the base band image, an image analyser operable to receive the image in a base band form to form a down-sampled version of the image, the down sampling being effected in correspondence with the down sampled representation of the code word formed by the code word generator with respect to a base band sampling rate of the image, and to generate weighting factors for scaling the code word coefficients with respect to parts of the base band image with which the code word coefficients are to be combined, a strength adapter operable to combine the weighting factors with the code word coefficients, an inverse transform processor operable to convert the transform domain code word coefficients scaled by the weighting factors into the base band domain by correspondingly up-sampling the code word at least one of spatially or temporally to form the base band domain version of the code word for combining with the image, and a combiner operable to combine the inverse transformed code word with the image in the base band domain, the code word being formed in the transform domain for combining with the image in the base band domain.
2. An encoding data processing apparatus as claimed in Claim 1, wherein the image analyser includes a contrast masking engine operable to generate the weighting factor values in proportion with a relative contrast of the down-sampled image in a region in which the code word coefficients to be adapted by the weighting factor values are to be combined.
3. An encoding data processing apparatus as claimed in Claim 1 or 2, wherein the inverse transform processor comprises an inverse transform processor operable to perform an inverse transform on the down-sampled version of the code word coefficients to form a down-sampled spatial domain representation of the code word, and an up-sampling processor operable to convert the sampling rate of the down- sampled spatial domain representation of the code word to a sampling rate corresponding to the sampling rate of the base band image, the combiner combining the code word with the image in the spatial domain.
4. An encoding data processing apparatus as claimed in Claim 3, wherein the inverse transform is an inverse Discrete Cosine Transform.
5. An encoding data processing apparatus as claimed in Claim 2, 3 or 4, wherein the up-sampling processor includes an up-sampler operable to introduce zero valued sub-samples into the down- sampled representation of the code word to form a version of the representation of the code word having a sampling rate corresponding to the sampling rate of the base band image, and a filter operable to filter the version of the representation of the code word to form the up-sampled version of the representation of the code word for combining with the image in the spatial domain, wherein the filter is an infinite impulse response filter.
6. An encoding data processing apparatus as claimed in Claim 5, wherein the up-sampler is operable to up-sample the code word in the temporal domain and the infinite impulse response filter is applied to filter the temporally up-sampled representation of the code word.
7. A cinema projector including an encoding data processing apparatus according to any of Claims 1 to 6, wherein the encoding data processing apparatus is operable to receive image signals before reproduction, and to introduce code words into the image signals before reproduction.
8. A media terminal including an encoding data processing apparatus according to any of Claims 1 to 6, wherein the encoding data processing apparatus is operable to receive image signals, and to introduce code words into the image signals.
9. A detecting data processing apparatus operable to determine payload data from a marked copy of an image, the image having been marked by forming at least one code word from a set of code words in a transform domain to represent the payload data, the transform domain including at least one of spatially or temporally down-sampling, and inverse transforming the code word by up-sampling the code word to a sampling rate corresponding to the sampling rate of the image and combining the code word with a copy of the image, the detecting apparatus comprising a recovery processor operable to recover an estimate of the code word by comparing the marked version of the image with an original version of the image, a code word re-generator operable to re-generate code words from the set of code words, a correlator operable to correlate the estimated code word recovered from the marked image with a code word re-generated by the code word re-generator, and a control processor operable to control the code word re-generator to re-generate code words from the set, to control the correlator to correlate the re-generated code words with the estimated code word recovered from the marked image, to determine whether a code word has been detected if a result of the correlation from the correlator exceeds a predetermined threshold, and to recover the payload data in accordance with the detected code word, wherein the recovery processor includes a down sampling processor operable to form a down-sampled representation of the estimated code word from the base band versions of the marked image and the original version of the image.
10. A detecting data processing apparatus as claimed in Claim 9, wherein the down-sampling processor comprises a first down sampling processor operable to receive the marked copy of the image and to down-sample, at least one of spatially or temporally, the marked image in correspondence with the down-sampling of the transform domain in which the code word was formed to generate the marked version of the image, a second down sampling processor operable to receive a copy of the original image and to down-sample, at least one of spatially or temporally, the original, in correspondence with the down-sampling applied to the marked copy of the image, a comparator operable to recover the down-sampled version of the estimated code word by comparing the down-sampled version of the original image with the down-sampled version of the marked image.
11. A detecting data processing apparatus as claimed in Claim 10, wherein the transform domain in which the code word is formed corresponds to a discrete cosine transform, the code word formed being inverse discrete cosine transformed before being combined with the copy of the image to form the marked image, wherein the recovery processor includes a transform processor operable to receive samples resulting from the comparison of the marked image with the original image and to discrete cosine transform the resulting samples to form the estimated code word.
12. A detecting data processing apparatus as claimed in Claim 10 or 11, wherein the recovery processor includes a registration processor operable to substantially align the down-sampled version of the marked image with the down-sampled version of the copy of the original image to the effect that samples of the marked copy correspond substantially
to samples of the original image so that a probability of correctly detecting the code word present in the marked copy of the material is substantially increased.
13. A detecting data processing apparatus as claimed in any of Claims 9 to 12, wherein the first down sampling processor includes a first filter operable to low pass filter the marked image, and a first sub-sampler operable to form a sub-sampled version of the filtered marked image by selecting samples of the filtered marked copy of the image to reduce the sampling rate, in correspondence with the down-sampling of the transform domain in which the code word was formed, and wherein the second down sampling processor includes a second filter operable to low pass filter the copy of the original image, and a second sub-sampler operable to form a sub-sampled version of the filtered copy of the original image by selecting samples of the filtered copy of the original image to reduce the sampling rate, in correspondence with the down-sampling of the marked version of the image, and wherein at least one of the first and second filters is an infinite impulse response filter.
14. A detecting data processing apparatus as claimed in Claim 13, wherein the infinite impulse response filter is applied to filter the representation of the code word temporally, the sub-sampler being operable to sub-sample the code word in the temporal domain.
15. A detecting data processing apparatus as claimed in Claim 14, wherein the marked image was formed by up-sampling the transform domain code word and passing the up-sampled code word through an infinite impulse response filter, and the sub-sampled code word is passed through the filter in a direction which is temporally opposite to a direction in which the upsampled code word was passed through the infinite impulse response filter to form the marked image.
16. A method of generating a marked copy of an image by introducing at least one code word into a copy of the image, the method comprising generating the code word, the code word being formed according to a representation corresponding to an at least one of spatially or temporally down-sampled version of the image with respect to a sampling rate of a base band image, receiving the image in a base band form, forming a down-sampled version of the image, the down sampling being effected in correspondence with the down sampled representation of the code word with respect to the base band sampling rate of the image, and generating weighting values for scaling the code word coefficients with respect to the down sampled version of the image in the spatial domain, combining the weighting values with the code word coefficients, and converting the transform domain code word coefficients scaled by the weighting values into the base band domain by correspondingly up-sampling the code word at least one of spatially or temporally to form the base band domain version of the code word for combining with the image, and combining the image with the inverse transformed code word in the base band domain, wherein the forming the code word includes forming the code word in the transform domain for combining with the image in the base band domain.
17. A method as claimed in Claim 16 wherein, the generating the weighting factor values includes generating the weighting factor values in proportion with a relative contrast of the down-sampled image in a region in which the code word coefficients to be adapted by the weighting factor values are to be combined.
18. A method as claimed in Claim 17, wherein the transform domain includes a discrete cosine transform representation, and the converting the transform domain code word coefficients comprises performing an inverse discrete cosine transform on the down-sampled version of the code word coefficients to form a down-sampled spatial domain representation of the code word, and then correspondingly up-sampling the code word to form the base band domain version of the code word.
19. A method as claimed in Claim 16, 17 or 18, wherein the up-sampling includes introducing zero valued sub-samples into the down-sampled spatial domain representation of the code word to form a version of the representation of the code word having a sampling rate corresponding to the sampling rate of the base band image, and filtering the version of the representation of the code word with an infinite impulse response filter to form the up-sampled version of the representation of the code word for combining with the image in the spatial domain.
20. A method of determining payload data from a marked copy of an image, the image having been marked by forming at least one code word from a set of code words in a transform domain to represent the payload data, the transform domain including at least one of spatially or temporally downsampling, and inverse transforming the code word by up-sampling the code word to a sampling rate corresponding to the sampling rate of the image and combining the code word with a copy of the image, the method comprising recovering an estimate of the code word by comparing the marked version of the image with an original version of the image, re- generating code words from the set of code words, correlating the estimated code word recovered from the marked image with a code word re- generated by the code word re-generator, and controlling the re- generation of code words from the set and the correlation of the re- generated code words with the estimated code word recovered from the marked image, to determine whether a code word has been detected if a result of the correlation exceeds a predetermined threshold, and to recover the payload data in accordance with the detected code word, wherein the recovering the estimate of the code word includes forming a down-sampled representation of the estimated code word from the base band versions of the marked image and the original version of the image.
21. A method as claimed in Claim 20, wherein the forming the down-sampled representation comprises receiving the marked copy of the image, down-sampling, at least one of spatially or temporally, the marked image in correspondence with the down-sampling of the transform domain in which the code word was formed to generate the marked version of the image, receiving a copy of the original image, correspondingly down-sampling, at least one of spatially or temporally, the original image, and recovering the down-sampled version of the estimated code word by comparing the down-sampled version of the original image with the down-sampled version of the marked image.
22. A method as claimed in Claim 21, wherein the transform domain in which the code word is formed corresponds to a discrete cosine transform, the code word formed being inverse discrete cosine transformed before being combined with the copy of the image to form the marked image, wherein the recovering the estimate of the code word includes receiving samples resulting from the comparison of the marked image with the original image and performing a discrete cosine transform on the resulting samples to form the estimated code word.
23. A method as claimed in any of Claims 20 to 22, wherein the down-sampling the marked image in correspondence with the down-sampling of the transform domain in which the code word was formed includes low pass filtering the marked image with an infinite impulse response filter, and forming a sub-sampled version of the filtered marked image by selecting samples of the filtered marked copy of the image to reduce the sampling rate, at least one of spatially or temporally, in correspondence with the down-sampling of the transform domain in which the code word was formed, and the correspondingly down-sampling, at least one of spatially or temporally, the original image includes low pass filtering the copy of the original image, forming a sub-sampled version of the filtered copy of the original image by selecting samples of the filtered copy of the original image to reduce the sampling rate, at least one of spatially or temporally, in correspondence with the down-sampling of the marked version of the image.
24. Computer software having program code which, when loaded onto a data processor, causes the data processor to perform the method according to any one of claims 16 to 23.
25. A data bearing medium bearing computer program code which, when loaded onto a data processor, causes the data processor to perform the method according to any one of claims 16 to 23.
26. A data bearing medium according to claim 25, wherein the medium is a storage medium.
27. An encoding data processing apparatus or a detecting data processing apparatus as hereinbefore described with reference to the accompanying drawings.
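Claims 21 to 23 together describe a detect-by-comparison pipeline: down-sample both the marked image and the original image in the same way, take their difference, and forward-transform the difference to estimate the embedded code word. The following Python sketch illustrates that pipeline under simplifying assumptions that are not taken from the patent: a 2x2 block average stands in for the claimed infinite impulse response low-pass filter, nearest-neighbour repetition is an assumed up-sampling step at the embedder, and a single scalar `alpha` stands in for the weighted code word coefficients of the title. All function and variable names are illustrative.

```python
import numpy as np

def dct_matrix(n):
    # Orthonormal DCT-II basis matrix: dct_matrix(n) @ dct_matrix(n).T == I
    k = np.arange(n)
    C = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    C[0] /= np.sqrt(2.0)
    return C

def down2(img):
    # Stand-in for the claimed low-pass filter plus sub-sampling:
    # average each 2x2 block, halving the spatial sampling rate
    return 0.25 * (img[0::2, 0::2] + img[0::2, 1::2] +
                   img[1::2, 0::2] + img[1::2, 1::2])

def up2(img):
    # Nearest-neighbour up-sampling; an exact right-inverse of down2
    return np.repeat(np.repeat(img, 2, axis=0), 2, axis=1)

rng = np.random.default_rng(0)
N = 8                                       # size of the down-sampled domain
C = dct_matrix(N)
original = rng.uniform(0, 255, size=(2 * N, 2 * N))

# Embedding: weighted code word coefficients formed in the down-sampled
# DCT domain, inverse transformed, then combined with the full-size image.
code_word = rng.choice([-1.0, 1.0], size=(N, N))
alpha = 2.0                                 # illustrative perceptual weight
w_spatial = C.T @ (alpha * code_word) @ C   # inverse DCT of the coefficients
marked = original + up2(w_spatial)

# Detection (claims 21-22): down-sample both copies, compare, forward DCT.
difference = down2(marked) - down2(original)
estimated = (C @ difference @ C.T) / alpha

print(np.allclose(estimated, code_word))
```

Recovery is exact in this idealised sketch because `down2` undoes `up2` and the DCT matrices are orthonormal; in the claimed method the marked copy would typically have suffered compression or other processing, so the forward transform yields only an estimate of the code word.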
GB0424227A 2004-11-01 2004-11-01 Image watermarking by forming weighted code word coefficients in a down-sampled transform domain Withdrawn GB2419765A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
GB0424227A GB2419765A (en) 2004-11-01 2004-11-01 Image watermarking by forming weighted code word coefficients in a down-sampled transform domain

Publications (2)

Publication Number Publication Date
GB0424227D0 (en) 2004-12-01
GB2419765A true GB2419765A (en) 2006-05-03

Family

ID=33515908

Family Applications (1)

Application Number Title Priority Date Filing Date
GB0424227A Withdrawn GB2419765A (en) 2004-11-01 2004-11-01 Image watermarking by forming weighted code word coefficients in a down-sampled transform domain

Country Status (1)

Country Link
GB (1) GB2419765A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6674876B1 (en) * 2000-09-14 2004-01-06 Digimarc Corporation Watermarking in the time-frequency domain
US20040136565A1 (en) * 2000-06-19 2004-07-15 Hannigan Brett T. Perceptual modeling of media signals based on local contrast and directional edges


Similar Documents

Publication Publication Date Title
US7792322B2 (en) Encoding apparatus and method
Podilchuk et al. Image-adaptive watermarking using visual models
US6505223B1 (en) Watermark detection
Potdar et al. A survey of digital image watermarking techniques
Swanson et al. Multiresolution scene-based video watermarking using perceptual models
Petitcolas et al. Evaluation of copyright marking systems
Menendez-Ortiz et al. A survey on reversible watermarking for multimedia content: A robustness overview
US7197164B2 (en) Time-varying video watermark
US7844052B2 (en) Encoding and detecting apparatus
US20080226125A1 (en) Method of Embedding Data in an Information Signal
US20020150247A1 (en) Watermark embedding
EP1022895A2 (en) Apparatus for and method of embedding and extracting digital information, and medium having program for carrying out the method recorded thereon
NZ507789A (en) Method and apparatus for selective block processing
Muharemagic et al. Survey of watermarking techniques and applications
Emek et al. A cascade DWT-DCT based digital watermarking scheme
Tao et al. Robust digital image watermarking in curvelet domain
JP4212422B2 (en) Data encoding apparatus, data encoding method, data detection apparatus, and data detection method
US7194108B2 (en) Data processing apparatus and method
GB2419765A (en) Image watermarking by forming weighted code word coefficients in a down-sampled transform domain
EP1286305A2 (en) Data processing apparatus
Betancourth Fragile watermarking scheme for image authentication
Guo Digital image watermarking for ownership verification
JP2004248054A (en) Method for enhancing blind detection of non-shift watermark with filtering and/or downscaled data as object
Cao An investigation of digital image watermarking technique based on RDWT
Jia et al. A reversible data hiding scheme using even-odd embedding method

Legal Events

Date Code Title Description
WAP Application withdrawn, taken to be withdrawn or refused ** after publication under section 16(1)