WO2009156973A1

WO2009156973A1 - Fingerprinting method and system

Info

Publication number: WO2009156973A1
Application number: PCT/IB2009/053226
Authority: WO
Inventors: Zhongxuan Liu; Shiguo Lian; Ronggang Wang; Zhen Ren
Original assignee: France Telecom
Priority date: 2008-06-27
Filing date: 2009-06-25
Publication date: 2009-12-30

Abstract

A method for embedding a fingerprint in a media content, said fingerprint comprising elements belonging to an alphabet of symbols, said method comprising the acts of: - receiving a number of space-desynchronized streams of data derived from the media content, - dividing each space-desynchronized stream into a plurality of segments following an identical segmentation scheme, - selecting sequentially consecutive segments from the number of space-desynchronized streams so that said consecutive selected segments form a space-desynchronized version of the media content, - embedding into the selected segments watermarks corresponding to said elements of the fingerprint.

Description

FINGERPRINTING METHOD AND SYSTEM

Field of the Invention

The present invention relates in general to a method for watermarking media content and more specifically for securely fingerprinting media data.

Background of the Invention

Media content, for example a video, is intended to be distributed, but only users paying for the content should be allowed to access it. In order to trace the users distributing the content illegally, user identification information, or fingerprint, is imperceptibly inserted or embedded into the media content. This fingerprint is unique for each user and allows the identification of the user. As far as fingerprinting is concerned, the goal of hackers is to distribute the content illegally without being traced by removing, at least partially, the fingerprint from the media content. One of the most common types of attack is called Linear Combination Collusion Attack (LCCA) and consists in combining several media copies of the same content (for instance, several copies of the same video) and extracting an averaged video so that the fingerprint is at least partially altered and hence prevents any identification.

A fingerprint is composed of a series of elements from an alphabet. Usually, these elements are 0 and 1 if the binary alphabet is used. Fingerprinting usually consists in embedding a plurality of watermarks into a media content, wherein each watermark corresponds to one or several elements of the fingerprint. But such a raw method for fingerprinting directly into the media content does not resist LCCA.

In order to limit LCCA, a technique called time desynchronization may be used. It consists in suppressing randomly different frames in each video copy. The number of suppressed frames needs to remain relatively small, usually around 10% of the total number of frames, in order to avoid degrading the quality of the video too much, i.e. in a perceptible manner for the human eye. This time desynchronization technique is efficient, but is still not secure enough against some hackers' attacks such as resynchronization attacks. A resynchronization attack consists in finding the most similar frame in two copies of a given media content in order to replace one of them (e.g. with the one of the other copy) and compensate for the suppressed frames.
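As a minimal sketch of the time desynchronization described above, the following Python snippet removes about 10% of the frames at random from each copy. The function name `drop_random_frames`, the `seed` parameter and the use of integers as stand-ins for frames are assumptions made for illustration only; the source does not specify how frames are chosen.

```python
import random

def drop_random_frames(frames, fraction=0.10, seed=None):
    """Return a copy of `frames` with about `fraction` of them removed at random.

    Hypothetical helper: the seed stands in for whatever per-copy
    randomness a real system would use.
    """
    rng = random.Random(seed)
    n_drop = round(len(frames) * fraction)
    dropped = set(rng.sample(range(len(frames)), n_drop))
    # Keep the surviving frames in their original order.
    return [f for i, f in enumerate(frames) if i not in dropped]

original = list(range(100))                   # frame indices stand in for frames
copy_a = drop_random_frames(original, seed=1)
copy_b = drop_random_frames(original, seed=2)
```

Because each copy loses a different set of frames, frame `i` of one copy no longer lines up with frame `i` of another, which is what hinders a naive frame-by-frame average.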

In order to limit LCCA and resynchronization attacks, a technique called space desynchronization may be used. It consists in modifying frames (e.g. shifting, rotating, translating...) from one copy of the media content to another so that superimposing several copies of the same media content will make it unreadable.
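The effect of space desynchronization on an averaging attack can be illustrated with a toy example. The sketch below assumes a frame is a small 2-D list of pixel values and uses a one-pixel translation as the spatial transformation; `translate_frame` and `average_frames` are hypothetical helpers, not functions named in the source.

```python
def translate_frame(frame, dx, dy, fill=0):
    """Shift a 2-D frame (list of rows) by dx columns and dy rows, padding with `fill`."""
    height, width = len(frame), len(frame[0])
    shifted = [[fill] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            ny, nx = y + dy, x + dx
            if 0 <= ny < height and 0 <= nx < width:
                shifted[ny][nx] = frame[y][x]
    return shifted

def average_frames(a, b):
    """Pixel-wise average of two frames, as a colluder superimposing copies would compute."""
    return [[(pa + pb) / 2 for pa, pb in zip(ra, rb)] for ra, rb in zip(a, b)]

frame = [[0, 0, 0, 0],
         [0, 9, 9, 0],
         [0, 9, 9, 0],
         [0, 0, 0, 0]]
copy_1 = translate_frame(frame, dx=0, dy=0)   # untouched copy
copy_2 = translate_frame(frame, dx=1, dy=0)   # shifted one pixel to the right
colluded = average_frames(copy_1, copy_2)     # edges of the square are now smeared
```

In the averaged frame, the sharp 2x2 square becomes a blurred 3-pixel-wide shape (row 1 reads 0, 4.5, 9, 4.5), a small-scale version of the degradation that superimposing desynchronized copies causes.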

This space desynchronization technique is efficient, but is still not secure enough against some hackers' attacks, as known methods are only suitable for raw (uncompressed) videos. Known space-desynchronization methods do not consider video compression during transmission, and secure transmission of the desynchronized videos has not been considered.

Broadcasting is a one-way distribution or transmission of media content to a plurality of users. When applied to broadcast, known fingerprinting methods suffer from low collusion resilience, i.e. they are not very secure: broadcasting distributes media content to a plurality of users using a one-way distribution scheme, so the media content is sent regardless of the identity of the users, provided they are allowed to receive said broadcast media content. Distributing a media content to anonymous users and fingerprinting said media content for each user are two conflicting requirements that need to be reconciled to enhance the security of existing broadcasting solutions.

Today there is a need for a secure broadcasting media content solution that can be easily implemented on the existing communication infrastructures, overcoming the drawbacks of the prior art.

Summary of Invention

It is an object of the present system to overcome disadvantages and/or make improvement over the prior art.

To that end, the invention proposes a method according to claim 1. The invention also relates to a fingerprinting device according to claim 4. The invention also relates to an emitting device according to claim 7. The invention also relates to an emitting device according to claim 8. The invention also relates to a system according to claim 9.

The invention also relates to a system according to claim 10.

The invention also relates to a computer program according to claim 11.

Brief Description of the Drawings

Embodiments of the present invention will now be described solely by way of example and only with reference to the accompanying drawings, where like parts are provided with corresponding reference numerals, and in which:

Figure 1 schematically illustrates the system according to an embodiment of the present invention;

Figure 2 schematically illustrates a process of time points selection according to an embodiment of the present invention;

Figure 3 schematically illustrates a process of stream selection according to an embodiment of the present invention;

Figure 4 schematically illustrates examples of streams used in the method according to an embodiment of the present invention;

Figure 5 schematically illustrates the system according to an embodiment of the present invention;

Figure 6 schematically illustrates the method according to an embodiment of the present invention.

Description of Preferred Embodiments

The method according to the invention is a media content fingerprinting method robust to collusion and resynchronization attacks and is particularly adapted to transmissions like broadcasting.

The method according to the invention allows both distributing a media content to a plurality of anonymous users (i.e. broadcasting) and fingerprinting said media content for each user. In the following example, the media content is a video, but the present teachings may be transposed by the man skilled in the art to any other type of media content, such as audio.

Figure 5 describes the system according to the invention. An emitting device or emitter or transmitter 500 is linked to at least one receiving device or receiver 520 through a network or link 510. The emitting device 500 and the receiving device 520 may be located on the same device or on distinct devices connected through the network (wireless or wireline) or link 510. Streams corresponding to a media content are broadcasted or multicasted from an emitting device to a plurality of receiving devices 520 (corresponding to a plurality of users). A receiver may be for instance a Personal Computer of a user whereas an emitter may be for instance a broadcasting studio.

The method according to the invention may be implemented entirely on the receiver 520 or split between the receiver 520 and the emitter 500.

Figure 1 describes an exemplary embodiment of the method according to the invention. The video or media content is captured in an act 100 and processed in an act 110. This processing act 110 includes three steps:

- lengthening, which is done by inserting some frames; this step is optional and allows time-desynchronization in order to increase security;

- space-desynchronization, which is done in a two-step process. Firstly, several identical copies (or streams) of the same (or original) media content are made. Secondly, said copies are desynchronized with respect to each other by applying a spatial transformation to the media content (e.g. for a video frame, a translation or a rotation of the image with respect to the original video frame of the media content). In Figure 1, two space-desynchronized streams are derived from the original media content as an example;

- compression using compression standards such as e.g. MPEG2 or H.264 (the compression parameters for the different copies, or streams, such as e.g. the I/P/B frames configuration, should be the same).

The lengthened, space-desynchronized and compressed resulting streams 120 and 125 may be encrypted respectively in optional acts 130 and 135 controlled by key sequence 1 (141) and key sequence 2 (142), which are combined with the user's ID 143 into a user's key in an act 140.

A set of exchangeable time points 115 is defined and sent to the receiver. These exchangeable time points define segments in the desynchronized streams, following the same scheme for each desynchronized stream. An exchangeable time point is a point of the media content at which the receiver may switch from one desynchronized stream to another. For example, with MPEG2, exchangeable time points may be random access points; with H.264, exchangeable time points may be random access points, or SP/SI frames may additionally be sent. To diminish the artifacts caused by switching between space-desynchronized copies, exchangeable time points are ideally points in the video where the space-desynchronized streams/copies are similar, or time points of fast-moving frames, so as to minimize the visual impact of switching. Generation of exchangeable time points is further described with reference to Figure 2.
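To make the role of the exchangeable time points concrete, here is a minimal sketch of how a shared set of time points could cut every stream into segments following the same scheme. It assumes the time points are given as one 0/1 flag per frame (the representation the description uses for Figure 2) and that frames are plain labels; `split_at_time_points` is a hypothetical name.

```python
def split_at_time_points(frames, flags):
    """Split `frames` into segments; a new segment starts at every frame flagged 1."""
    if len(frames) != len(flags):
        raise ValueError("one flag per frame is required")
    segments = []
    for frame, flag in zip(frames, flags):
        if flag == 1 or not segments:
            segments.append([])          # open a new segment
        segments[-1].append(frame)
    return segments

flags = [0, 0, 1, 0, 1, 0, 0, 1]                 # switching allowed at frames 2, 4 and 7
stream_1 = [f"s1-f{i}" for i in range(8)]
stream_2 = [f"s2-f{i}" for i in range(8)]

segments_1 = split_at_time_points(stream_1, flags)
segments_2 = split_at_time_points(stream_2, flags)
```

Since both streams are cut with the same flags, segment `k` of one stream covers exactly the same frame positions as segment `k` of the other, so a receiver can take either one without breaking continuity.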

In the method according to the invention, exchangeable time points 115 are sent by the emitter to the receiver(s). Streams (optionally encrypted) 130 and 135 are also sent by the emitter to the receiver(s). The streams 130 and 135 and the exchangeable time points 115 may be sent together or separately. As they are unique for each receiver, user identifications (IDs) 143 may be sent (e.g. unicasted) to each receiver. To increase security, encryption may be used. In this optional case, the user ID 143 is combined with key sequences 141 and 142 (one for each stream) in order to form a User's Key 140. The User's Key 140 is then sent (e.g. unicasted) to the receiver.

A new stream 150 is formed by selecting segments in the different desynchronized streams according to the exchangeable time points (and optionally keys), in order to further embed watermarks corresponding to the elements of the fingerprint. In an optional act 160 (used to increase security when lengthening was used in act 110), the stream 150 is shortened to its original length (time-desynchronization); this act is also controlled by the key when the streams are encrypted. If encrypted, the resulting stream is decrypted, and watermarks corresponding to the elements of the fingerprint are embedded in an act 170. The resulting stream is then decompressed in an act 180 for further display in an act 199. Acts 150, 160 and 170 may be implemented in a single process illustrated in Figures 3 and 4.

Figure 6 describes an exemplary embodiment of the method according to the invention allowing embedding a fingerprint in a media content, said fingerprint comprising elements belonging to an alphabet of symbols, said method comprising the acts of:

- receiving a number of space-desynchronized streams of data derived from the media content in an act 610,

- dividing each space-desynchronized stream into a plurality of segments following an identical segmentation scheme in an act 620,

- selecting sequentially consecutive segments from the number of space-desynchronized streams so that said consecutive selected segments form a space-desynchronized version of the media content in an act 630,

- embedding into the selected segments watermarks corresponding to said elements of the fingerprint in an act 640.

Different segments from different streams are assembled to obtain a continuous and readable version of the original (i.e. before space-desynchronization) media content. Watermarks corresponding to symbols of the fingerprint to embed are then embedded into some of these segments forming a new readable version of the original media content.
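The four acts above can be sketched end to end as follows. This is only an illustration under stated assumptions: frames are string labels, the per-segment stream choice `choices` is supplied by the caller, and "embedding a watermark" is modelled by attaching the fingerprint symbol to the first frame of a segment. None of these representations, nor the names `split_at_time_points` and `fingerprint_stream`, come from the source.

```python
def split_at_time_points(frames, flags):
    """Act 620: cut a stream into segments; a new segment starts at each frame flagged 1."""
    segments = []
    for frame, flag in zip(frames, flags):
        if flag == 1 or not segments:
            segments.append([])
        segments[-1].append(frame)
    return segments

def fingerprint_stream(streams, flags, choices, fingerprint):
    """Return a list of (frame, watermark_symbol_or_None) pairs."""
    # Act 610: `streams` are the received space-desynchronized streams.
    segmented = [split_at_time_points(s, flags) for s in streams]
    n_segments = len(segmented[0])
    if len(choices) != n_segments or len(fingerprint) > n_segments:
        raise ValueError("need one stream choice per segment and enough segments")
    output = []
    for k in range(n_segments):
        segment = segmented[choices[k]][k]            # act 630: pick segment k from one stream
        symbol = fingerprint[k] if k < len(fingerprint) else None
        for j, frame in enumerate(segment):
            # Act 640 (toy model): mark the first frame of the segment with the symbol.
            output.append((frame, symbol if j == 0 else None))
    return output

flags = [0, 0, 1, 0, 1, 0, 0, 1]
stream_a = [f"A{i}" for i in range(8)]
stream_b = [f"B{i}" for i in range(8)]
result = fingerprint_stream([stream_a, stream_b], flags, choices=[0, 1, 1, 0], fingerprint="010")
labels = [frame for frame, _ in result]
# labels == ['A0', 'A1', 'B2', 'B3', 'B4', 'B5', 'B6', 'A7']
```

The output covers every frame position exactly once, so it plays back as a continuous version of the content even though its frames come from different desynchronized streams.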

Figure 2 describes an embodiment of the selection of (probable) exchangeable time (or switch) points. In a preliminary act 200, a raw or compressed video 210 is captured. The video content is analyzed in an act 220 to find shot cuts and background moving frames 230, which are suitable for video interchange. Then a sequence 240 composed of '0' and '1' values, one for each frame, is computed: '0' means that the corresponding frame cannot be exchanged (i.e. switching between streams is not allowed on this frame) and '1' means that the corresponding frame may be exchanged (i.e. switching between streams is allowed on this frame). The sequence may be compressed, e.g. by entropy coding, and may be coded 250 for output 299, e.g. with the SP/SI frames code 260.
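A minimal sketch of the sequence computation follows. It assumes the video analysis has already produced the indices of shot cuts and background moving frames (the analysis itself is not shown), and it uses run-length encoding as a simple stand-in for the entropy coding mentioned above; both function names are hypothetical.

```python
from itertools import groupby

def exchangeable_flags(n_frames, shot_cuts, moving_frames):
    """Build the per-frame 0/1 sequence: 1 where switching between streams is allowed."""
    allowed = set(shot_cuts) | set(moving_frames)
    return [1 if i in allowed else 0 for i in range(n_frames)]

def run_length_encode(flags):
    """Compress the sequence as (bit, run_length) pairs (stand-in for entropy coding)."""
    return [(bit, len(list(run))) for bit, run in groupby(flags)]

flags = exchangeable_flags(10, shot_cuts=[3, 8], moving_frames=[4])
encoded = run_length_encode(flags)
# flags   == [0, 0, 0, 1, 1, 0, 0, 0, 1, 0]
# encoded == [(0, 3), (1, 2), (0, 3), (1, 1), (0, 1)]
```

Because switch points tend to be sparse, the sequence is dominated by long runs of 0, which is why a compact coding of it is worthwhile before transmission.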

Figure 3 describes an exemplary embodiment of exchangeable time points generation and use. (Probable) exchangeable time points 300 and a random sequence comprising '0s' and '1s' (optionally derived from key 310 by applying a mathematical function, e.g. a random function) may be used to generate practical exchangeable time points 330. Then stream 1, 340, and stream 2, 350, may be used to form a new stream, or output, 399, through desynchronized streams selection 360 according to the exchangeable time points 330 and (optionally) additional information such as SP/SI frames information for H.264 coding. For example, the method according to the invention allows switching from one desynchronized stream to another on an exchangeable time point 300 or on a practical exchangeable time point 330 obtained by combining the exchangeable time points 300 with a random sequence.
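One way to picture this generation and selection is sketched below. Two points are assumptions, since the source does not fix them: the "combination" of probable time points and random sequence is taken to be a bitwise AND, and the random sequence is derived from an integer key by seeding Python's `random.Random`. The switching policy (move to the next stream at every practical time point) is likewise chosen only for illustration.

```python
import random

def practical_time_points(probable, key):
    """Combine probable time points with a key-derived 0/1 sequence (assumed: bitwise AND)."""
    rng = random.Random(key)                      # key-derived pseudo-random sequence
    mask = [rng.randint(0, 1) for _ in probable]
    return [p & m for p, m in zip(probable, mask)]

def switch_streams(streams, time_points):
    """Form the output by moving to the next stream at every frame flagged 1."""
    current = 0
    output = []
    for i, flag in enumerate(time_points):
        if flag == 1:
            current = (current + 1) % len(streams)
        output.append(streams[current][i])
    return output

probable = [0, 1, 1, 0, 1, 0, 1, 1]
practical = practical_time_points(probable, key=1234)

stream_1 = [f"A{i}" for i in range(6)]
stream_2 = [f"B{i}" for i in range(6)]
output = switch_streams([stream_1, stream_2], [0, 0, 1, 0, 1, 0])
# output == ['A0', 'A1', 'B2', 'B3', 'A4', 'A5']
```

With an AND, a practical time point can only occur where switching was already allowed, so the key changes where each user's copy switches without ever forcing a switch at an unsuitable frame.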

Figure 4 illustrates an exemplary embodiment of the method according to the invention wherein decryption, space and time desynchronization, and fingerprint embedding are combined on frames of a media content (e.g. a video). In the method according to the invention, at least two space-desynchronized streams of a given media content may be used. In this exemplary embodiment, two space-desynchronized streams (stream 1 and stream 2) derived from the same media content are used. These two streams are encrypted, space- and time-desynchronized, then a fingerprint is embedded. A resulting stream is obtained by combining different frames from the two space-desynchronized streams. In this illustration, (media) streams are sequences of frames, and the media stream of the original media content has been lengthened (frames have been added, e.g. by duplication) before space-desynchronization: e.g. in Figure 4, frames 10', 12' and 16'. This lengthening act is optional in the method according to the invention and allows keeping more frames if some of them are later eliminated in an optional act when embedding watermarks corresponding to the elements of the fingerprint.

Encryption key sequences Key1 and Key2 are respectively used to encrypt stream 1 and stream 2. Frames for which it is possible to switch from one stream to another are indicated with a bit 1 in the sequence called Exchangeable Time Points. Frames for which it is not possible to switch from one stream to another are indicated with a bit 0 in the sequence called Exchangeable Time Points.

The sequence called Desynchronizing Codes is used to indicate the number of the stream to switch to on an exchangeable time point: the Desynchronizing Codes correspond to the code identifying the stream of origination of the frames for the new resulting stream when switching between streams on exchangeable time points.

The Skipped Frames sequence indicates the frames which will be eventually skipped for time desynchronization of the new resulting stream (this optional act allows increasing security and is usually carried out by the receiver). A 1 indicates that the corresponding frame will be skipped.

A fingerprint comprises n elements chosen in an alphabet of m symbols.

The Fingerprint Codes sequence indicates frames to keep or to eliminate. Indeed, for each element of the fingerprint to embed, elements of the fingerprint alphabet are embedded in some frames of the streams. More particularly, watermarks corresponding to elements of the fingerprint are embedded in the same frames of each stream. For example, watermarks corresponding to a 0 and a 1 are embedded in a pair of frames for as many elements as there are in the fingerprint. For instance in Figure 4, the fingerprint to embed is 010, hence, 3 pairs of watermarks corresponding to bits 0 and 1 are embedded into frames 10/10', 12/12' and 16/16'. The Fingerprint Code sequence indicates the frame to keep in each pair of frames so that the remaining frame of the pair is embedded with the watermark corresponding to an element of the fingerprint to embed into the media. For instance in Figure 4:

- frame 10 is embedded with a watermark corresponding to bit 0,

- frame 10' is embedded with a watermark corresponding to bit 1,

- frame 12 is embedded with a watermark corresponding to bit 0,

- frame 12' is embedded with a watermark corresponding to bit 1,

- frame 16 is embedded with a watermark corresponding to bit 0,

- frame 16' is embedded with a watermark corresponding to bit 1.

So, as the fingerprint to embed is 010, then, in the Fingerprint Codes sequence:

- a code 0 corresponding to the frames with the embedded watermark to eliminate (here frames 10', 12 and 16') will be assigned,

- a code 2 corresponding to the frames with the embedded watermark to keep (here frames 10, 12' and 16) will be assigned,

- a code 1 corresponding to the frames with no embedded watermark will be assigned.
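The code assignment for the fingerprint 010 can be reproduced with a short sketch. The frame pairs and the 0/1/2 codes follow the Figure 4 example described above; the surrounding frame numbers (9, 11, 13, and so on) and the function name `fingerprint_codes` are made up for illustration. The final line applies the rule stated for the case without time-desynchronization (frames with code 0 are eliminated, frames with code 1 or 2 are kept).

```python
def fingerprint_codes(frame_ids, pairs, fingerprint):
    """Assign code 2 (keep), 0 (eliminate) or 1 (no watermark) to every frame.

    `pairs` lists (frame_carrying_bit_0, frame_carrying_bit_1) for each
    fingerprint element, in order.
    """
    if len(pairs) != len(fingerprint):
        raise ValueError("one frame pair per fingerprint element is required")
    codes = {fid: 1 for fid in frame_ids}          # default: no embedded watermark
    for (bit0_frame, bit1_frame), element in zip(pairs, fingerprint):
        if element == "0":
            keep, drop = bit0_frame, bit1_frame
        else:
            keep, drop = bit1_frame, bit0_frame
        codes[keep] = 2
        codes[drop] = 0
    return codes

frame_ids = ["9", "10", "10'", "11", "12", "12'", "13", "14", "15", "16", "16'", "17"]
pairs = [("10", "10'"), ("12", "12'"), ("16", "16'")]
codes = fingerprint_codes(frame_ids, pairs, "010")

# Without time-desynchronization: drop code 0, keep codes 1 and 2.
kept = [fid for fid in frame_ids if codes[fid] != 0]
# kept == ['9', '10', '11', "12'", '13', '14', '15', '16', '17']
```

The kept frames 10, 12' and 16 carry the watermarks for bits 0, 1 and 0 respectively, so the surviving stream encodes the fingerprint 010.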

KeyR is the resulting key sequence obtained by switching between the different streams using the Exchangeable Time Points and Desynchronizing Codes sequences. KeyR is sent to the receiver.

EV1 and EV2 are the two copies/streams sent to the receiver.

DV is the video decrypted for decompression and optionally time-desynchronized using the Skipped Frames sequence.

If time-desynchronization is not used, the frames weighted with a 0 in the Fingerprint Codes sequence are eliminated and the frames weighted with a 1 or a 2 in the Fingerprint Codes sequence are kept.

When optional time-desynchronization is used, the sum of the Skipped Frames sequence and the Fingerprint Codes sequence is calculated; all the frames weighted with at least 2 are kept and the others (weighted with 0 or 1) are eliminated.

In the exemplary embodiment illustrated in Figure 4, the method according to the invention is applied to encryption keys which will further allow frames selection.

More generally, the method according to the invention may be applied equally to key sequences, to space-desynchronized streams or to encrypted space-desynchronized streams, mutatis mutandis. The method according to the invention is used for embedding information allowing user identification, i.e. a fingerprint, into the media content, for applications such as DVDs. In the prior art, the code length depends directly on the number of copies N. In the method according to the invention, the code length may decrease dramatically, as different copies of the media content are distinguished not only by a fingerprint, but also because the copies of different users differ from each other owing to:

- the switching between streams using different exchangeable time points (particularly if a random sequence is used as described in Figure 3),

- the space-desynchronization between the streams,

- the optional elimination of frames/time-desynchronization.

The method according to the invention may be particularly suitable for applications such as DVDs. The method according to the invention may be implemented, for example, during the media making process (DVD creation, DVD copy...), before the media opening or reading, or for other applications such as Video on Demand or program broadcasting, multicasting or unicasting, on the receiver or split between the receiver and the emitter.

Media content such as e.g. a video is usually compressed using a compression algorithm according to a standard such as e.g. Moving Picture Experts Group 2 (MPEG2), in which said video is compressed into a plurality of frames.

The media content, e.g. a video, is typically divided or compressed into a series of elements such as e.g. I-frames, B-frames and P-frames used in compression protocols such as e.g. MPEG2. The fingerprint is composed of elements belonging to an alphabet. Usually, the alphabet is a binary alphabet comprising 0 and 1 bits as elements.

In one exemplary embodiment of the method according to the invention, the elements of the fingerprint alphabet correspond to different space-desynchronized copies or streams of the media content. A selection of segments is performed in each stream corresponding to each element of the fingerprint to embed. The segments are consecutive, so that a new copy of the media content is made from the different streams which correspond to the different fingerprint elements. Exchangeable time points for the selection and sequencing of space-desynchronized copies, together with smooth space-desynchronization parameters, are used so that the resulting media content has little to no visual degradation. The content may be broadcasted or multicasted to users. If the media is encrypted before fingerprint embedding, the user's encryption key is unicasted to the user in order to save bandwidth and increase security. Several encrypted lengthened media with different space-desynchronization exchangeable time points are multicasted to users. In every stream, space-desynchronization parameters may change smoothly between pairs of exchangeable time points. When used as time points, Prediction S-frames/Intra S-frames (SP/SI frames) are transmitted to users by multicast or unicast (SP and SI frames are picture types introduced in the H.264 video coding standard; they allow drift-free bitstream switching and may also be used for error resilience and random access). At the receiver side, a special stream is formed by switching between the multiple streams. Finally, the resulting stream is decrypted using time points (=SP/SI frames?) after time down-sampling.

In an optional exemplary embodiment of the method according to the invention, time desynchronization and encryption/decryption are combined with the method according to the invention in order to enhance the security. The security of the system increases and no extra computation is added to the reading system. The method according to the invention is particularly suitable for broadcasting transmissions, but may also be implemented for multicasting or unicasting transmissions.

Some of the acts of the method according to the invention may be performed by the emitting device; some of the acts of the method according to the invention may be performed by the receiving device.

For example, in one particular illustration of an embodiment of the method according to the invention, the emitting device (used for space-desynchronizing a media content) may be operable to:

- derive a number of space-desynchronized streams of data from the media content,

- divide each space-desynchronized stream into a plurality of segments following an identical segmentation scheme,

- distribute the plurality of segments to at least one receiving device.

For example in another particular illustration of embodiment of the method according to the invention, the emitting device (used for space-desynchronizing a media content), may be operable to:

- derive a number of space-desynchronized streams of data from the media content,

- distribute the number of space-desynchronized streams in association with a segmentation scheme to at least one receiving device for subsequent segmentation by said receiving device of each one of said space-desynchronized streams into a plurality of segments following the same received segmentation scheme.

The fingerprinting device (for embedding a fingerprint in a media content, said fingerprint comprising elements belonging to an alphabet of symbols, said media content being divided in a number of space-desynchronized streams of data derived from said media content, each of said space-desynchronized streams being further divided into a plurality of segments following an identical segmentation scheme) may be operable to:

- select sequentially consecutive segments from the number of space-desynchronized streams so that said consecutive selected segments form a space-desynchronized version of the media content,

- embed into the selected segments watermarks corresponding to said elements of the fingerprint.

The fingerprinting device may be further operable to receive the segmentation scheme in the form of a plurality of points delimitating data segments in each stream derived from the media content and to divide each of said space-desynchronized streams into a plurality of segments following said segmentation scheme. The fingerprinting device, wherein the media content and consequently the plurality of segments comprise a plurality of frames, may be further operable to embed a watermark into at least one frame of the segment.

In a particular exemplary embodiment of the method according to the invention, the different space-desynchronized streams may each correspond to a different symbol of the fingerprint alphabet. In this case, the method according to the invention comprises the acts of:

- deriving the media content into a number of space-desynchronized streams of data equal to the number of symbols present in the fingerprint, thereby associating one symbol present in said fingerprint per space-desynchronized stream,

- dividing each space-desynchronized stream into a plurality of segments following an identical segmentation scheme,

- selecting sequentially consecutive segments from the space-desynchronized streams so that the corresponding associated symbols match the elements of the fingerprint,

- embedding into each selected segment a watermark corresponding to said element of the fingerprint.

In this particular exemplary embodiment, the method according to the invention may comprise a preliminary act of receiving the segmentation scheme in the form of a plurality of points delimitating data segments in each stream derived from the media content. Moreover, the media content and consequently the plurality of segments may comprise a plurality of frames, and the act of embedding a watermark further comprises the act of embedding the watermark into at least one frame of the segment.
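The symbol-per-stream embodiment described above can be sketched as follows, assuming a binary alphabet, one already-segmented stream per symbol, and string labels in place of frames. The dictionary layout and the name `select_by_fingerprint` are illustrative assumptions, and the watermark embedding itself is left out.

```python
def select_by_fingerprint(segmented_streams, fingerprint):
    """For each fingerprint element, take segment k from the stream tied to that symbol.

    `segmented_streams` maps each alphabet symbol to its list of segments;
    all streams are assumed to follow the same segmentation scheme.
    """
    n_segments = len(next(iter(segmented_streams.values())))
    if len(fingerprint) > n_segments:
        raise ValueError("fingerprint is longer than the number of segments")
    return [segmented_streams[element][k] for k, element in enumerate(fingerprint)]

segmented_streams = {
    "0": [["a0", "a1"], ["a2"], ["a3", "a4"]],   # stream associated with symbol 0
    "1": [["b0", "b1"], ["b2"], ["b3", "b4"]],   # stream associated with symbol 1
}
selected = select_by_fingerprint(segmented_streams, "010")
# selected == [['a0', 'a1'], ['b2'], ['a3', 'a4']]
```

In this variant the sequence of source streams mirrors the fingerprint itself, since segment `k` is drawn from the stream associated with the `k`-th element.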

Claims

1. A method for embedding a fingerprint in a media content, said fingerprint comprising elements belonging to an alphabet of symbols, said method comprising the acts of:

- receiving a number of space-desynchronized streams of data derived from the media content,

- dividing each space-desynchronized stream into a plurality of segments following an identical segmentation scheme,

- selecting sequentially consecutive segments from the number of space-desynchronized streams so that said consecutive selected segments form a space-desynchronized version of the media content,

- embedding into the selected segments watermarks corresponding to said elements of the fingerprint.

2. A method according to claim 1, said method comprising a preliminary act of receiving the segmentation scheme in the form of a plurality of points delimitating data segments in each stream derived from the media content.

3. A method according to any of the preceding claims, wherein the media content and consequently the plurality of segments comprise a plurality of frames, and wherein the act of embedding a watermark further comprises the act of embedding the watermark into at least one frame of the segment.

4. A fingerprinting device for embedding a fingerprint in a media content, said fingerprint comprising elements belonging to an alphabet of symbols, said media content being divided in a number of space-desynchronized streams of data derived from said media content, each of said space-desynchronized stream being further divided into a plurality of segments following an identical segmentation scheme, said fingerprinting device being operable to:

- select sequentially consecutive segments from the number of space-desynchronized streams so that said consecutive selected segments form a space-desynchronized version of the media content,

- embed into the selected segments watermarks corresponding to said elements of the fingerprint.

5. A fingerprinting device according to claim 4, said fingerprinting device being operable to receive the segmentation scheme in the form of a plurality of points delimitating data segments in each stream derived from the media content, said fingerprinting device being further operable to divide each of said space-desynchronized stream into a plurality of segments following said segmentation scheme.

6. A fingerprinting device according to any of the preceding claims 4 and 5, wherein the media content and consequently the plurality of segments comprise a plurality of frames, said fingerprinting device being further operable to embed the watermark into at least one frame of the segment.

7. An emitting device for space-desynchronizing a media content, said emitting device being operable to:

- divide each space-desynchronized stream into a plurality of segments following an identical segmentation scheme,

- distribute the plurality of segments to at least one receiving device.

8. An emitting device for space-desynchronizing a media content, said emitting device being operable to:

- derive a number of space-desynchronized streams of data from the media content,

- distribute the number of space-desynchronized streams in association with a segmentation scheme to at least one receiving device for subsequent segmentation by said receiving device of each one of said space-desynchronized streams into a plurality of segments following the same received segmentation scheme.

9. A system for embedding a fingerprint in a media content, said fingerprint comprising elements belonging to an alphabet of symbols, said system comprising:

- an emitting device for space-desynchronizing the media content, said emitting device being operable to:

- distribute the plurality of segments to at least one fingerprinting device;

- a fingerprinting device for embedding a fingerprint in the media content, said fingerprinting device being operable to:

- select sequentially consecutive segments from the received number of space-desynchronized streams so that said consecutive selected segments form a space-desynchronized version of the media content,

10. A system for embedding a fingerprint in a media content, said fingerprint comprising elements belonging to an alphabet of symbols, said system comprising:

- an emitting device for space-desynchronizing a media content, said emitting device being operable to:

- derive a number of space-desynchronized streams of data from the media content,

- distribute the number of space-desynchronized streams in association with a segmentation scheme to at least one fingerprinting device;

- a fingerprinting device for embedding a fingerprint in the media content, said fingerprinting device being operable to:

- divide each received space-desynchronized stream into a plurality of segments following an identical segmentation scheme corresponding to the received segmentation scheme,

- select sequentially consecutive segments from the received number of space-desynchronized streams so that said consecutive selected segments form a space-desynchronized version of the media content,

11. A computer program providing computer executable instructions stored on a computer readable medium, which when loaded on to a data processor causes the data processor to perform a method for embedding a fingerprint into a media content according to claims 1 to 3.