WO2007085632A1

WO2007085632A1 - Method of watermarking digital data

Info

Publication number: WO2007085632A1
Application number: PCT/EP2007/050732
Authority: WO
Inventors: Philippe Nguyen; Séverine Baudry
Original assignee: Thomson Licensing
Priority date: 2006-01-27
Filing date: 2007-01-25
Publication date: 2007-08-02
Also published as: FR2896938A1

Abstract

The invention relates to a digital watermarking method comprising the steps of: decomposition of the video into blocks (N, N+1, N+2); and calculation (E1, E2), for each block of the said video, of a digital signature (S). According to the invention, the method furthermore comprises a step of insertion into a second block (N+1) of said signature (S) relating to a first block (N), before the calculation of said signature (S) of said second block (N+1).

Description

METHOD OF WATERMARKING DIGITAL DATA

The invention relates to a method of watermarking digital data.

The invention relates to the general field of the watermarking of digital data. More precisely, it relates to integrity monitoring of digital data.

Integrity is generally understood as strict integrity. In electronic notary applications for example, one wants to be certain that a document (a will for example) is strictly identical to its original version. Any modification, even tiny, of this document (change of a word or even of a single character) can have a strong impact on the semantics. Moreover, such documents are not required to evolve or to be transformed for storage cost or transport reasons: the size of a text document - a few thousand bytes - does not in fact require compression efforts. Any cut in the document is also synonymous with violation of integrity, and is therefore not considered to be acceptable. Strict integrity is also necessary when exchanging executable code, so that each party can be sure of the origin of the code, and in particular guard against possible viruses. Here again, the small size of the data and their particular structure - in particular, no possibility of compression with loss - imply that a strict integrity mechanism is entirely suitable.

Solutions exist in the state of the art which make it possible to guarantee the strict integrity of a document. Cryptographic hash functions in particular make it possible to obtain, on the basis of any document, a digest (a set of data of much smaller size than the original document) with beneficial properties. In particular, any change, even tiny (a single bit for example), of the source document leads to a totally different digest.

The probability of having two source documents with the same digest is very low; moreover, there is no "simple" procedure (that is to say of much lower complexity than that of exhaustive exploration) for generating, on the basis of a digest, a source document corresponding to this digest. It is therefore very difficult to fabricate a falsified document corresponding to a given digest; moreover, even if one manages to do so, this document will in all probability be very different from the original document (if the latter is a text, the falsified document will doubtless consist of an unintelligible series of characters).
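The sensitivity of a cryptographic digest to a single-bit change can be illustrated with a short Python sketch. SHA-256 from the standard hashlib module is used here purely as an example of a strict hash function; the sample text and the helper names digest_hex and flip_bit are illustrative assumptions.

```python
import hashlib


def digest_hex(data: bytes) -> str:
    """Return the SHA-256 digest of `data` as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of `data` in which exactly one bit is inverted."""
    modified = bytearray(data)
    modified[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(modified)


original = b"I leave my house to my daughter."
altered = flip_bit(original, 0)  # a single-bit modification of the document

# The two digests share no visible structure although the inputs differ by one bit.
print(digest_hex(original))
print(digest_hex(altered))
```

Comparing the two printed strings shows why such a digest is suitable for strict integrity and unsuitable for content that legitimately undergoes lossy processing.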

Flexible integrity

The mechanisms of strict integrity are entirely suitable for data such as text, programs, executables, etc. On the other hand, they are hardly suitable for data such as images, video or audio. Specifically, data of this type are often very voluminous and are accordingly generally compressed. Moreover, they lend themselves particularly well to compression with loss, since they are ultimately intended for a human user (viewer or listener), whose brain can process only a very limited amount of information.

The eye, or the ear, is therefore hardly if at all sensitive to a large number of components of the audio or video signals. It is for example well known that the very high audio frequencies are hardly perceptible: the frequency bandwidth can be reduced without altering the subjective quality of the sound. Isolated modifications of small amplitude of an image are likewise very rarely perceived. Compression with loss utilizes these properties, by removing the signal or by strongly altering (for example by quantization) all the components to which the subjective receiver is hardly sensitive.

The signal obtained after compression with loss is therefore different from the original signal. Moreover, there may be a large number of different compressed documents for one and the same original document, according to the type of compression algorithm used, the parametrization, or the desired perceptual quality. Other alterations are also common. They hardly modify the semantics of the video, but strongly alter the content of the data. Such is the case in particular with geometric transformations of images (slight cropping, change of scale or of aspect ratio) and with changes of sampling frequency for sound. All these transformations are unfortunately common in the life of an audiovisual content, since they are often rendered indispensable by operational contingencies (limited storage room, reduced bandwidth etc.). An end-to-end strict integrity protection mechanism, applied indiscriminately, therefore has every chance of being totally unsuitable in this context.

One seeks rather to preserve the semantics of the content. We note henceforth that the concept of semantics is extremely difficult to define in a formal manner. Specifically, it involves not only the mechanisms mentioned above of selectivity of vision or of hearing, which are still poorly understood at the present time, but also the high level analysis carried out by the brain to extract the "sense" of the scene.

For example, the attention of the subject will be focused on the players in a scene rather than on the background. An extraction of the semantics therefore requires prior recognition of objects, then an analysis of the relations between these objects in the given scene, operations which are beyond the reach of current scientific knowledge.

A method making it possible to calculate a digest of an image (or of a video, or of a sound) is called a visual hash function or flexible signature. Contrary to the cryptographic hash functions, such a function can (this is at the very least what is hoped for at its conception) give an identical digest for two different images, with the proviso that the latter are sufficiently "close" from the perceptual point of view; hence the qualifier "flexible" or sometimes "robust", as opposed to the qualifier "strict".

On the other hand, the digest will have to be different as soon as the image undergoes an alteration of its semantics; for example the addition of a player or of an object to a scene, the modification of text images, etc. One immediately realizes that the difficulty of formally defining the semantics of an image will render the conception of visual hash functions tricky. One may even doubt the well-posed character of the problem. It is nevertheless possible to define a locality criterion, which can be considered to be valid in the large majority of cases: a contingent alteration (due to compression for example) will give rise to modifications of the signal of small amplitude, but distributed in a relatively uniform manner over the whole of the image. On the other hand, an alteration of the semantics will result in a strong but localized modification of the data. One therefore seeks "threshold-based" hash functions, which tolerate modifications of less than a certain value but react to overly strong localized variations.
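The locality criterion can be made concrete with a minimal Python sketch. It is not a visual hash function: it compares a received signal directly with a reference, only to show how a per-block threshold separates small, uniformly distributed changes from a strong localized one. The one-dimensional signal, the mean-absolute-difference measure, the block size and the threshold value are all illustrative assumptions.

```python
def flag_altered_blocks(reference, received, block_size, threshold):
    """Return the indices of blocks whose mean absolute difference exceeds `threshold`."""
    flagged = []
    for start in range(0, len(reference), block_size):
        ref_block = reference[start:start + block_size]
        rec_block = received[start:start + block_size]
        mean_abs_diff = sum(abs(a - b) for a, b in zip(ref_block, rec_block)) / len(ref_block)
        if mean_abs_diff > threshold:
            flagged.append(start // block_size)
    return flagged


reference = [100] * 16

# Contingent alteration: small amplitude, spread over the whole signal.
compressed = [v + (1 if i % 2 else -1) for i, v in enumerate(reference)]

# Semantic alteration: strong but confined to one block (samples 8..11).
tampered = list(compressed)
tampered[8:12] = [200, 200, 200, 200]

print(flag_altered_blocks(reference, compressed, block_size=4, threshold=5.0))
print(flag_altered_blocks(reference, tampered, block_size=4, threshold=5.0))
```

With these values the uniformly perturbed signal raises no flag, while the localized modification is attributed to the third block.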

The digest of the image must accompany the latter throughout its life, so that the integrity can be verified at an arbitrary step of the transmission and storage chain. Unfortunately, the steps of processing, carriage and archiving of the images are often multiple and complex, and implement formats or equipment in which it is not always possible to associate image and digest. It is therefore particularly beneficial that the digest can be conveyed with the image, in a transparent manner and whatever data representation standard is used.

Watermarking techniques are particularly indicated here: they make it possible to transmit information by visually imperceptible modification of the carrier data. Unlike meta-data, watermarking is therefore persistent even after format conversion of the carrier image. If robust watermarking algorithms are used, the buried information can be played back even if the image has been altered, for example compressed. On the other hand, if the image is too strongly modified, the watermark is then erased. One thus foresees two possible, nonexclusive, uses of watermarking:

• Robust and persistent transport of the digest, jointly with the image

• Detection of strong alterations of the image, based on the readability of the watermark itself.

The watermarking algorithm and its parametrization are carefully chosen so as to meet the constraints of integrity protection. The image modifications due to the watermarking must not be too strong, so as not to alter the integrity. The watermarking must be readable at least for alterations corresponding to the acceptance limits in respect of integrity. To implement the locality criterion, it is beneficial to have separable watermarking algorithms, making it possible to code distinct information in distinct zones of the image.

Various digital watermarking techniques suitable for monitoring the integrity of data exist:

• Fragile procedures which detect all the modifications, if only the modification of a bit. These procedures are very accurate for locating the alterations.

• Procedures termed semi-fragile, whose objective is not only to tolerate certain processing operations (especially JPEG compression) but also to detect substitutions or additions of objects in the image. Unlike the fragile procedures above, they locate alterations only coarsely. Their main weakness at present is that they may not operate correctly and sometimes generate false alarms (that is to say, they indicate "imaginary" alterations for some images).

• Hybrid procedures, which try to combine the advantages of the above procedures; a double marking is in fact used.

• Finally, procedures with autocorrection capacity, which propose to restore the modified image.

For the particular type of applications of watermarking that is integrity monitoring, substitutive techniques are favoured. They in fact allow the insertion of a message which can be extracted during the integrity verification process.

The fragile techniques generally utilize a summary, calculated on the image data not affected by the insertion phenomenon. It may involve the result of a "checksum", or a hash function on the planes of high-order bits, followed by an insertion into the low-order bits.
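A fragile scheme of this kind can be sketched in Python as follows. This is an illustrative sketch and not a scheme taken from the cited prior art: it assumes 8-bit samples, uses SHA-256 as the hash function, treats the seven high-order bits of every sample as the protected data, and writes the first bits of the digest into the low-order bits. The function names are hypothetical.

```python
import hashlib


def _digest_bits(samples, n_bits):
    """Hash the 7 high-order bits of every sample; return the first n_bits of the digest."""
    high_order = bytes(s & 0xFE for s in samples)
    digest = hashlib.sha256(high_order).digest()
    return [(digest[i // 8] >> (7 - i % 8)) & 1 for i in range(n_bits)]


def embed(samples):
    """Write the digest of the high-order bits into the low-order bits."""
    n_bits = min(len(samples), 256)
    bits = _digest_bits(samples, n_bits)
    marked = list(samples)
    for i, bit in enumerate(bits):
        marked[i] = (marked[i] & 0xFE) | bit
    return marked


def verify(samples):
    """Recompute the digest and compare it with the bits read from the low-order bits."""
    n_bits = min(len(samples), 256)
    bits = _digest_bits(samples, n_bits)
    return all((samples[i] & 1) == bits[i] for i in range(n_bits))


image = [(37 * i) % 256 for i in range(64)]   # 64 hypothetical 8-bit samples
marked = embed(image)
print(verify(marked))                          # the marked image is accepted

tampered = list(marked)
tampered[40] ^= 0x10                           # alter one high-order bit
print(verify(tampered))                        # the alteration is detected
```

Because the digest is computed only on bits that the insertion leaves untouched, the insertion does not invalidate the digest it carries; each sample changes by at most one grey level.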

For the semi-fragile procedures, the process of extracting the summary is more pertinent and deals with the image attributes not having to be altered. Other authors exploit relations between DCT coefficients of distinct blocks; relations a priori invariant under JPEG compression.

A second solution consists in introducing alterations into the image, and in verifying the presence thereof on detection.

Finally, the procedures of reversible watermarking must be mentioned. In these approaches, a summary of the image in its entirety is first calculated; it is then inserted into the low-order bits (LSB) of the image, the original low-order bits having been compressed to make room for it. In fact, the signature hidden in the low-order bits consists of the summary and of the compressed low-order bits. This procedure is very obviously fragile.

The invention proposes to transport a signature of the video by watermarking the video itself. A problem of the "deadly embrace" type may then arise: the watermarking modifies the video, which may thereafter give rise to a different digest from that of the original video.

The document by C. Rey and J.-L. Dugelay entitled "Blind Detection of Malicious Alterations On Still Images Using Robust Watermarks", published in Secure Images and Image Authentication, IEE Electronics Communications, April 10, 2000 - London, proposes a procedure for solving this problem by an iterative approach: the digest of the watermarked image is calculated and reinserted into the image, and the method is iterated until convergence. However, this procedure poses problems of calculation time, stability and visibility of the watermark; in particular, convergence is not guaranteed. For the same reason, this procedure cannot be used with a strict hash function: an image may turn out not to be protectable, with no remedy available.

The following approach is proposed here: the video is split into blocks.

These blocks can be groups of images or portions of images; it is possible to borrow the structures defined in the standards such as MPEG or JPEG (GOP - Group of Pictures; 16x16 macroblocks or 8x8 blocks) to simplify the operations on compressed streams and to favour interoperability.

For this purpose, the invention proposes a digital watermarking method comprising the steps of:

- decomposition of the video into blocks,

- calculation for each block of the said video of a digital signature.

According to the invention the method furthermore comprises a step of insertion into a second block of the said signature relating to a first block, before the calculation of the said signature of the said second block.

Advantageously, during the step of calculation of the said digital signature, a hash function and an asymmetric encipherment function are applied to each block.

According to a preferred embodiment, the said blocks comprise a series of sub-blocks representative of a frequency decomposition of the said video.

Preferably, the said first and second blocks are temporally and spatially adjacent blocks.

In an advantageous manner, the said signature being a series of bits, the said signature relating to a first block is inserted into the second block, by inserting into each of the sub-blocks of the said second block one of the said bits of the said signature of the said first block.

Preferably, during the insertion of the said bits into the said sub-blocks, the order of coefficients of the said sub-block is modified. Advantageously, the coefficients to be modified are chosen so that the absolute value of the difference of their absolute values is less than a predetermined threshold.

The invention also relates to a computer program product; according to the invention the program comprises program code instructions for the execution of the steps of the method according to the invention when the said program is executed on a computer.

"Computer program product" is understood to mean a computer program carrier, which can consist not only of a storage space containing the program, such as a diskette or a cassette, but also of a signal, such as an electrical or optical signal.

The invention will be better understood and illustrated by means of wholly nonlimiting advantageous exemplary embodiments and modes of implementation, with reference to the appended figures in which:

- Figure 1 represents a block diagram illustrating the modifications performed on the blocks during the encryption operation;

- Figure 2 represents a cryptography algorithm (digital signature) used in the preferred embodiment of the invention.

The invention is described within the framework of data coded in accordance with the MPEG-4 AVC video coding standard such as described in the document ISO/IEC 14496-10 (entitled "information technology - coding of audio-visual objects - part 10: advanced video coding").

In accordance with the conventional video compression standards, such as MPEG-2, MPEG-4, and H.264, the images of a sequence of images can be of intra type (I image), i.e. coded without reference to the other images of the sequence, or of inter type (i.e. P and B images), i.e. coded by being predicted on the basis of other images of the sequence. The images are generally divided into macroblocks themselves divided into disjoint pixel blocks of size N pixels by P pixels, called NxP blocks. These macroblocks are themselves coded according to an intra or inter coding mode. More precisely all the macroblocks in an I image are coded according to the intra mode while the macroblocks in a P image can be coded according to an inter or intra mode. The possibly predicted macroblocks are thereafter transformed block by block using a transform, for example a discrete cosine transform referenced DCT or else a Hadamard transform. The blocks thus transformed are quantized then coded generally using variable-length codes.

In the particular case of the MPEG-2 standard the macroblocks of size 16 by 16 pixels are divided into 8x8 blocks themselves transformed with an 8x8 DCT into transformed 8x8 blocks. In the case of H.264, the macroblocks of intra type relating to the luminance component can be coded according to the 4x4 intra mode or according to the 16x16 intra mode.

An intra macroblock coded according to the 4x4 intra mode is divided into 16 disjoint 4x4 blocks. Each 4x4 block is spatially predicted with respect to certain neighbouring blocks situated in a causal neighbourhood, i.e. with each 4x4 block is associated a 4x4 prediction block generated on the basis of the said neighbouring blocks. 4x4 blocks of residuals are generated by subtracting from each of the 4x4 blocks the associated 4x4 prediction block. The 16 blocks of residuals thus generated are transformed by a 4x4 integer H transform which approximates a 4x4 DCT.

An intra macroblock coded according to the 16x16 intra mode is spatially predicted with respect to certain neighbouring macroblocks situated in a causal neighbourhood, i.e. a 16x16 prediction block is generated on the basis of the said neighbouring macroblocks. A macroblock of residuals is generated by subtracting from the intra macroblock the associated prediction macroblock. This macroblock of residuals is divided into 16 disjoint 4x4 blocks which are transformed by the H transform. The 16 low-frequency coefficients (called DC coefficients) thus obtained are in their turn transformed by a 4x4 Hadamard transform. Subsequently in the document, the transform T which is applied to a macroblock designates a 4x4 H transform applied to each of the 4x4 blocks of the macroblock if the macroblock is coded in 4x4 intra mode, and a 4x4 H transform applied to each of the 4x4 blocks of the macroblock followed by a Hadamard transform applied to the DC coefficients if the macroblock is coded in 16x16 intra mode.

We introduce here the concept of ST-block. The term designates a concatenation in space and over time of coded blocks (in MPEG-4 for example) or, equally, of blocks of pixels. An ST-block is therefore a particular subset in space and over time of the video. It can correspond to several blocks such as described previously, or to a whole image. It corresponds to the unit of splitting of the video.
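As an illustration of the 4x4 integer H transform referred to above, the following Python sketch applies the forward core transform matrix of H.264 to a 4x4 block of residuals as Y = C X C^T. The scaling that the standard folds into quantization is deliberately omitted, and the helper names are illustrative assumptions.

```python
# Forward core matrix of the H.264 4x4 integer transform (scaling omitted).
C = [
    [1,  1,  1,  1],
    [2,  1, -1, -2],
    [1, -1, -1,  1],
    [1, -2,  2, -1],
]


def matmul(a, b):
    """Multiply two 4x4 matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transpose(m):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(row) for row in zip(*m)]


def forward_transform(block):
    """Apply Y = C * X * C^T to a 4x4 block of residuals."""
    return matmul(matmul(C, block), transpose(C))


flat_block = [[1, 1, 1, 1] for _ in range(4)]
coefficients = forward_transform(flat_block)

# A constant block has all of its energy in the DC coefficient (top-left entry).
print(coefficients)
```

The constant block yields a single non-zero coefficient in the top-left position, which is the DC coefficient that the 16x16 intra mode subsequently passes through the Hadamard transform.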

The blocks N, N+1, N+2 represented in Figure 1 each represent an ST-block.

A cryptography procedure subsequently described with reference to Figure 2, comprising a hash function and an asymmetric encipherment, is applied to the ST-block N.

On completion of the application of this cryptography procedure, data for authentication of N are obtained. These authentication data are then inserted into the following block N+1 of the video to which the cryptography procedure will also be applied so as to obtain therefrom authentication data that are inserted into the following block N+2 and so on and so forth.

This advantageously makes it possible not to modify the blocks after calculation of the hash. Otherwise, the block would no longer be intact on arrival. The following block is admittedly modified but in an imperceptible manner, owing to the properties of the watermark. There is therefore no violation of the semantic integrity of the content.

Furthermore, this procedure makes it possible also to monitor or to verify the integrity of the chaining together of the blocks, that is to say the temporal integrity of the sequence. Assume that we remove the series [N; N+K] of blocks. The signature extracted from the image N+K+1 does not then correspond to the signature of the image N-1, except if the images N+K+1 and N-1 were originally identical; however, this case is extremely improbable because of the intrinsic variability of the sensors (thermal noise etc.).
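The chaining of blocks, and the way it reveals both an altered block and a removed series of blocks, can be sketched in Python. Several simplifications are assumed and are not part of the described method: blocks are lists of 8-bit samples, the authentication data are the first 32 bits of a SHA-256 digest (standing in for the hash followed by asymmetric encipherment), and insertion into the low-order bits stands in for a robust watermark. All names are hypothetical.

```python
import hashlib

SIG_BITS = 32  # hypothetical length of the authentication data, in bits


def authentication_bits(block):
    """Stand-in for steps E1/E2: first SIG_BITS bits of SHA-256 over the block."""
    digest = hashlib.sha256(bytes(block)).digest()
    return [(digest[i // 8] >> (7 - i % 8)) & 1 for i in range(SIG_BITS)]


def embed_bits(block, bits):
    """Stand-in for the watermark: write the bits into the first low-order bits."""
    marked = list(block)
    for i, bit in enumerate(bits):
        marked[i] = (marked[i] & 0xFE) | bit
    return marked


def extract_bits(block):
    return [sample & 1 for sample in block[:SIG_BITS]]


def protect(blocks):
    """Insert the data of block N into block N+1 before block N+1 is itself hashed."""
    protected = [list(blocks[0])]
    for block in blocks[1:]:
        bits = authentication_bits(protected[-1])  # block N is final at this point
        protected.append(embed_bits(block, bits))
    return protected


def broken_links(blocks):
    """Return every index N for which block N does not match the data carried by N+1."""
    return [n for n in range(len(blocks) - 1)
            if authentication_bits(blocks[n]) != extract_bits(blocks[n + 1])]


video = [[(7 * i + 13 * n) % 256 for i in range(64)] for n in range(6)]
protected = protect(video)
print(broken_links(protected))            # intact sequence

tampered = [list(b) for b in protected]
tampered[2][50] ^= 0x40                   # alter block 2
print(broken_links(tampered))

cut = protected[:2] + protected[4:]       # remove the series of blocks 2..3
print(broken_links(cut))
```

Each block is hashed only after it has received the data of its predecessor and is never modified afterwards, which is why a single pass over the stream is sufficient.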

This mechanism has the benefit of being able to be implemented on the fly, therefore with a significant memory gain and a very small processing latency. A low-cost real-time implementation is therefore conceivable, this not being the case with the iterative or global approaches of the state of the art. This low complexity is particularly beneficial in a video surveillance application since, to ensure a maximal safety level, the algorithm must run as close as possible to the sensor.

Moreover, block splitting is particularly well suited to the locality criterion defining flexible integrity. The algorithm in fact makes it possible to determine which block or blocks have been altered. This information can be particularly beneficial for knowing if crucial zones of the video have been altered, in which case one may suspect a fraudulent modification, or if the alteration pertains to insignificant zones of the scene.

Note that it is necessary to dimension the size of the blocks as a function of a compromise between safety, processing complexity and location of the artefacts.

Figure 2 represents a cryptography algorithm used in the preferred embodiment of the invention.

This cryptography algorithm comprises first of all a hash function, step E1 , followed by a digital signature calculation function, step E2. The hash function used makes it possible to obtain a condensed version of the ST-block on which a digital signature is then calculated.

The hash function used is a function of "SHA-1" type. The SHA-1 function takes as input the coefficients of the macroblock and produces as output a digest of the block presented as input.

In other embodiments, it is also possible to take a function of SHA-256, SHA-384, SHA-512, MD5, Whirlpool or TIGER type.
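Step E1 can be sketched with the standard hashlib module of Python. The text does not specify how the coefficients are serialized before hashing; packing them as 32-bit signed big-endian integers is an assumption made here, as is the function name block_digest. Whirlpool and TIGER are not among the algorithms that hashlib guarantees to provide, so they are not used in the sketch.

```python
import hashlib
import struct


def block_digest(coefficients, algorithm="sha1"):
    """Serialize integer coefficients (32-bit signed, big-endian) and hash them."""
    payload = struct.pack(f">{len(coefficients)}i", *coefficients)
    return hashlib.new(algorithm, payload).digest()


coefficients = [52, -7, 3, 0, -1, 0, 0, 2]   # hypothetical transform coefficients

sha1_digest = block_digest(coefficients)
sha256_digest = block_digest(coefficients, "sha256")

print(sha1_digest.hex())     # 160-bit digest
print(sha256_digest.hex())   # 256-bit digest
```

Changing the algorithm argument is enough to move from SHA-1 to one of the other listed functions supported by the local hashlib build; the digest length changes accordingly.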

To this digest of the ST-block N is applied a signature algorithm during step E2. The signature used is the ElGamal algorithm. In other embodiments, it is also possible to take a signature function of RSA type.

It is possible to derive a digital signature algorithm from any asymmetric encipherment algorithm in the following manner: the decipherment function D() is applied (using the private key K_s) to the document to be signed m, or preferably to its digest H(m), to obtain the signature S = D(H(m)).

In the asymmetric crypto systems, use is made of an encipherment key (called the public key K_p) different from the decipherment key (called the private key or secret key K_s). The public key is generated from the private key: it is easy to generate K_p from K_s, but extremely difficult (in terms of calculation time and resources) to generate K_s from K_p. The person desiring to encipher or sign his messages chooses a secret key, for example by drawing a random number, then generates the public key from this secret key. He keeps K_s strictly hidden, and discloses K_p.

For example, for the ElGamal signature system, the secret key K_s is a number x chosen in a random manner. We thereafter calculate y = g^x mod p on the basis of a prime number p and of a number g. The public key K_p then consists of the numbers (y, p, g). The properties of the calculations on the integers modulo p mean that it is extremely difficult to retrieve x from (y, p, g). The signature of the message m is a pair (r, s) such that:

g^m = y^r r^s mod p

To verify the signature, one looks to see whether the equality g^m = y^r r^s mod p holds.
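The relation g^m = y^r r^s mod p can be checked numerically with a toy Python sketch of the textbook ElGamal signature, in which r = g^k mod p for a random k coprime with p-1 and s = (m - x r) k^(-1) mod (p-1). The parameters p = 467 and g = 2 are illustrative assumptions and are far too small for real use; the message m is taken directly as an integer, whereas in practice it would be the digest H(m).

```python
import secrets
from math import gcd

P = 467   # toy prime modulus (assumption, far too small for real use)
G = 2     # toy base


def generate_keys():
    """Draw the secret key x at random and derive the public value y = g^x mod p."""
    x = secrets.randbelow(P - 3) + 2          # x in 2..P-2
    return x, pow(G, x, P)


def sign(m, x):
    """Return a pair (r, s) such that g^m = y^r * r^s mod p."""
    while True:
        k = secrets.randbelow(P - 3) + 2
        if gcd(k, P - 1) != 1:                # k must be invertible modulo p-1
            continue
        r = pow(G, k, P)
        s = ((m - x * r) * pow(k, -1, P - 1)) % (P - 1)
        if s != 0:
            return r, s


def verify(m, r, s, y):
    """Check the equality g^m = y^r * r^s mod p."""
    if not (0 < r < P and 0 < s < P - 1):
        return False
    return pow(G, m, P) == (pow(y, r, P) * pow(r, s, P)) % P


x, y = generate_keys()
r, s = sign(100, x)
print(verify(100, r, s, y))   # signature of m = 100 is accepted
print(verify(101, r, s, y))   # the same pair does not validate another message
```

The modular inverse is computed with the three-argument form of pow, which accepts a negative exponent from Python 3.8 onwards.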

A digital signature for the block is then obtained as output.

This digital signature is then inserted into the following block N+1 using known watermarking techniques. Before the insertion of the signature by watermarking, the digital signature is encoded using an error corrector code, of BCH type for example.
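The role of the error corrector code can be illustrated in Python. The text names a BCH code without fixing its parameters; the sketch below substitutes a Hamming(7,4) code, which corrects one erroneous bit per 7-bit word and belongs to the same family of cyclic codes in its simplest form. It is a stand-in chosen for brevity, not the code of the embodiment, and the function names are hypothetical.

```python
def hamming74_encode(data):
    """Encode 4 data bits into a 7-bit word laid out as [p1, p2, d1, p3, d2, d3, d4]."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p3 = d2 ^ d3 ^ d4
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(word):
    """Correct up to one flipped bit and return the 4 data bits."""
    w = list(word)
    s1 = w[0] ^ w[2] ^ w[4] ^ w[6]
    s2 = w[1] ^ w[2] ^ w[5] ^ w[6]
    s3 = w[3] ^ w[4] ^ w[5] ^ w[6]
    error_position = s1 + 2 * s2 + 4 * s3     # 0 means no error detected
    if error_position:
        w[error_position - 1] ^= 1
    return [w[2], w[4], w[5], w[6]]


signature_bits = [1, 0, 1, 1]                 # four bits of a hypothetical signature
codeword = hamming74_encode(signature_bits)

damaged = list(codeword)
damaged[5] ^= 1                               # one bit misread at watermark extraction
print(hamming74_decode(damaged) == signature_bits)
```

Encoding the signature before insertion means that a few watermark bits misread after, for example, compression do not by themselves cause the signature check to fail.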

A possible watermarking technique consists in splitting the ST-block N+1 into a set of k DCT blocks (k being at least equal to the size of the digital signature), and in applying the following watermarking algorithm.

A known watermarking method, for example applied in the DCT transformation space to the transformed 8x8 blocks denoted $B^{DCT}_{8\times 8}$ of an image to be watermarked, consists in possibly modifying for each block $B^{DCT}_{8\times 8}$ the order relation existing between the absolute values of two of its DCT coefficients, denoted $T_1$ and $T_2$.

The signature S of the ST-block N is a series of j bits $\{b_1, b_2, \ldots, b_j\}$. The ST-block N+1 comprises a set of k DCT blocks $B^{DCT}_{8\times 8}$ ($k \geq j$). Into each of these blocks is inserted one of the bits $b_i$ of the signature of the ST-block N. This insertion is carried out by the application of the watermarking algorithm below.

In general, these two coefficients are selected for a given block with the aid of a secret key. The bit $b_i$ of the signature associated with a block $B^{DCT}_{8\times 8}$ is inserted into this block by modifying the order relation existing between the absolute values of the two coefficients $T_1$ and $T_2$. In order to control the visibility of the watermark, the coefficients of a block are modified only if the following relation holds:

$$\big|\,|T_1| - |T_2|\,\big| < Th$$

where Th is a parametrizable threshold.

The coefficients $T_1$ and $T_2$ are modified so that the following order relation holds:

$$|T_1'| = |T_2'| + d \cdot B_i \qquad (1)$$

where:

- $T_1'$ and $T_2'$ are the modified coefficients $T_1$ and $T_2$,

- d is a marking parameter called the marking distance, and

- $B_i$ is a coefficient whose value is defined as follows: $B_i = 1$ if $b_i = 0$ and $B_i = -1$ if $b_i = 1$.

$T_1$ and $T_2$ can be modified in diverse ways so as to ensure the relation defined previously. Let us define $e_1 = T_1' - T_1$ and $e_2 = T_2' - T_2$; the values of $e_1$ and $e_2$ are defined for each block $B^{DCT}_{8\times 8}$ in the following manner:

$$e_1 = -T_1 + \mathrm{sign}(T_1) \cdot \big(f_1(T_1, T_2) + d_1\big)$$

$$e_2 = -T_2 + \mathrm{sign}(T_2) \cdot \big(f_2(T_1, T_2) + d_2\big)$$

The choice of the function $f_i$ is free; it is possible for example to choose $f_1(T_1, T_2) = f_2(T_1, T_2) = |T_2|$. For example, in the case where $b_i = 0$, let us choose $d_2 = -|T_2|$ and $d_1 = -|T_2| + d$; then the order relation (1) does indeed hold. In the case where $b_i = 1$, let us choose $d_1 = -|T_2|$ and $d_2 = -|T_2| - d$; then the order relation (1) also holds. The values of d and of Th vary as a function of the application and in particular as a function of the risk of piracy. Specifically, the higher the value of d, the more robust the watermarking but the more visible it is. Thus to preserve good visual quality of the sequence of images, the marking strength must be limited.
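The insertion of one bit into a pair of coefficients can be sketched in Python. The sketch follows one reading of the formulas above, namely the order relation |T1'| = |T2'| + d·Bi with Bi = 1 for a bit 0 and Bi = -1 for a bit 1, the threshold test ||T1| - |T2|| < Th, and the example choice f1 = f2 = |T2|. The extraction rule (reading the bit from the order of the two magnitudes), the convention sign(0) = +1 and all function names are assumptions; the selection of the two coefficients by a secret key is omitted.

```python
def sign(value):
    """Sign of a coefficient; zero is treated as positive (assumption)."""
    return -1.0 if value < 0 else 1.0


def embed_bit(t1, t2, bit, d, th):
    """Return the modified pair (T1', T2'), or None if the block is left unmarked."""
    if abs(abs(t1) - abs(t2)) >= th:
        return None                      # modifying this block would be too visible
    f = abs(t2)                          # example choice f1 = f2 = |T2|
    if bit == 0:
        d1, d2 = -abs(t2) + d, -abs(t2)
    else:
        d1, d2 = -abs(t2), -abs(t2) - d
    return sign(t1) * (f + d1), sign(t2) * (f + d2)


def extract_bit(t1, t2):
    """Read the bit back from the order relation between the two magnitudes."""
    return 0 if abs(t1) > abs(t2) else 1


D_MARK, TH = 4.0, 5.0                    # hypothetical marking distance and threshold

pair0 = embed_bit(12.0, -10.0, 0, D_MARK, TH)
pair1 = embed_bit(12.0, -10.0, 1, D_MARK, TH)
print(pair0, extract_bit(*pair0))
print(pair1, extract_bit(*pair1))
print(embed_bit(50.0, 3.0, 1, D_MARK, TH))   # magnitudes too far apart: not marked
```

With this choice of f1 and f2, one of the two coefficients is set to zero and the other takes magnitude d, so a larger d widens the gap the detector relies on at the cost of a larger change to the block.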

Once the signature S has been inserted into block N+1, the signature of block N+1 is calculated as indicated previously for block N.

The last block of the video is therefore not protected, since it is not possible to watermark its signature in the following block, which does not exist. However, if one works on the fly on a video stream, for example if one wants to certify the integrity of a video acquired by a surveillance camera which runs continuously, there is never a last block and therefore protection is always guaranteed. On a bounded video, it is necessary to take care to dimension the size of the blocks so that a loss of integrity on the last block is not significant. Note that, when the entirety of the video is available, there is no need to proceed in either temporal or spatial order: it is possible to reorder the blocks of the video in any way whatsoever (block n+1 will not then necessarily be situated after block n). It is then possible to choose a block which is insignificant in respect of the semantics of the video as being the end block.

To verify the integrity of the document m', its digest H(m') is calculated by applying the hash function. Thereafter, the asymmetric encipherment function C() is applied (using the public key K_p) to the signature S:

C(S) = C(D(H(m))) = H(m)

One thus obtains the digest of the initial document H(m), that is compared with the digest H(m') of the suspicious document m': if the two are equal, then the integrity of the document is verified.
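The identity C(S) = C(D(H(m))) = H(m) can be checked numerically with a toy RSA key in Python, RSA being the alternative signature function mentioned for step E2. The key (n = 61 × 53 = 3233, public exponent 17, private exponent 2753) is a textbook example and far too small for real use; reducing the SHA-1 digest modulo n so that it fits this key is a further simplification. All names are illustrative assumptions.

```python
import hashlib

# Textbook toy RSA key (assumption, far too small for real use): n = 61 * 53.
N = 3233
E = 17      # public exponent, part of K_p
D = 2753    # private exponent, part of K_s


def digest(document: bytes) -> int:
    """H(m): SHA-1 digest reduced modulo N so that it fits the toy key."""
    return int.from_bytes(hashlib.sha1(document).digest(), "big") % N


def sign(document: bytes) -> int:
    """S = D(H(m)): apply the private-key operation to the digest."""
    return pow(digest(document), D, N)


def recover_digest(signature: int) -> int:
    """C(S): apply the public-key operation to the signature."""
    return pow(signature, E, N)


def verify(document: bytes, signature: int) -> bool:
    """Compare C(S) with the digest H(m') of the document under test."""
    return recover_digest(signature) == digest(document)


original = b"coefficients of ST-block N"
signature = sign(original)
print(recover_digest(signature) == digest(original))   # C(S) equals H(m)
print(verify(original, signature))
```

Only the public key is needed for the comparison, so any party along the transmission chain can perform the check without being able to forge a signature.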

Claims

1. Digital watermarking method comprising the steps of:

- decomposition of the video into blocks (N, N+1, N+2),

- calculation (E1, E2) for each block of the said video of a digital signature (S),

characterized in that it furthermore comprises a step of insertion into a second block (N+1) of the said signature (S) relating to a first block (N), before the calculation of the said signature (S) of the said second block (N+1).

2. Digital watermarking method according to Claim 1 characterized in that during the step of calculation of the said digital signature (E1, E2), a hash function and an asymmetric encipherment function are applied to each block.

3. Method according to one of the preceding claims characterized in that the said blocks (N, N+1, N+2) comprise a series of sub-blocks ($B^{DCT}_{8\times 8}$) representative of a frequency decomposition of the said video.

4. Method according to one of the preceding claims characterized in that the said first (N) and second (N+1) blocks are temporally and spatially adjacent blocks.

5. Method according to Claims 3 and 4 characterized in that, the said signature (S) being a series of bits (bi), the said signature (S) relating to a first block (N) is inserted into the second block (N+1), by inserting into each of the sub-blocks ($B^{DCT}_{8\times 8}$) of the said second block (N+1) one of the said bits (bi) of the said signature (S) of the said first block (N).

6. Method according to Claim 5 characterized in that, during the insertion of the said bits (bi) into the said sub-blocks ($B^{DCT}_{8\times 8}$), the order of coefficients ($T_1$, $T_2$) of the said sub-block ($B^{DCT}_{8\times 8}$) is modified.

7. Method according to Claim 6 characterized in that the coefficients ($T_1$, $T_2$) to be modified are chosen so that the absolute value of the difference of their absolute values is less than a predetermined threshold (Th).

8. Method according to Claim 3 characterized in that the said sub-blocks ($B^{DCT}_{8\times 8}$) are representative of a decomposition by Fourier transformation (DCT) of the said video.