CN103491456A

CN103491456A - SIFT (Scale-Invariant Feature Transform)-based geometric-attack-resistant robust video watermarking method

Info

Publication number: CN103491456A
Application number: CN201310474525.4A
Authority: CN
Inventors: 李清广; 陈真勇; 徐周川; 王美娟; 熊璋
Original assignee: RESEARCH INSTITUTE OF BEIHANG UNIVERSITY IN SHENZHEN
Current assignee: Shenzhen Air Technology Co., Ltd.
Priority date: 2013-10-11
Filing date: 2013-10-11
Publication date: 2014-01-01
Anticipated expiration: 2033-10-11
Also published as: CN103491456B

Abstract

The invention relates to an SIFT (Scale-Invariant Feature Transform)-based geometric-attack-resistant robust video watermarking method. A watermark embedding process comprises the following steps: (1) representing video frame images by Ycbcr, carrying out one-layer two-dimensional DWT (Discrete Wavelet Transform) on Y components of the images, and selecting harr wavelets as wavelet bases, thus obtaining low-frequency components after DWT; (2) taking n frames as a group, carrying out SIFT feature point extraction on the low-frequency components obtained in the step (1) of a first frame of each group, and recording the positions of obtained feature points; (3) adding a variable decided by watermark information to a pixel value of each feature point; (4) carrying out inverse DWT so as to obtain watermark embedded Y components, and transforming to an RGB (Red/Green/Blue) space from the Ycbcr, thereby completing watermark embedding. A watermark extracting process comprises the steps of carrying out steps (1) and (2) which are the same as the steps (1) and (2) of the embedding process, then, predicting pixel values of feature points according to surrounding pixels so as to obtain predicted difference values, and comparing the predicted difference values with a set threshold value, thereby obtaining embedded watermark information. The SIFT-based geometric-attack-resistant robust video watermarking method has the advantages that the robustness of video watermarking methods to geometric attacks is improved, the distortion is little, and the processing is simple.

Description

A kind of robust video watermark process of the resist geometric attacks based on SIFT

Technical field

The present invention relates to a kind of robust video digital watermark method, particularly a kind of based on SIFT(yardstick invariant features conversion, Scale-invariant feature transform) the anti-geometry attack robust video watermark process.

Background technology

As everyone knows, digital multimedia has been substituted traditional analog form, by the development of Internet communication technology, digital multimedia can conveniently copy and propagate, and the problem of piracy and the copyright dispute that cause thus also become day by day serious social concern.According to RIAA, estimate, whole world every year, American film employer's organization estimated because of the pirate economic loss caused up to 5,000,000,000 dollars, and piracy makes the annual income of American film industry reduce 2,500,000,000 dollars.

Traditional copyright protection means are cipher authentication techniques; cryptographic technique can guarantee the information security in the communication process of copyright from sender to recipient; once, but copyright is decrypted, cryptographic technique just no longer has protective effect, the bootlegger is copy propagation arbitrarily.

In order to make up the deficiency of cryptographic technique in the copyright protection field, a kind of new copyright protection means---digital watermark technology has obtained paying close attention to and development.Digital watermarking can be embedded in Digital Media (image, sound, document, video) the relevant information of representative works copyright is non; it can prove the ownership of authorship to its works; and as the evidence of identifying, prosecution is illegally encroached right; the integrality of copyright be can guarantee by the determination and analysis to watermark simultaneously, thereby intellectual property protection and the false proof effective means of digital multimedia become.

Video is as the widely used multimedia form of people; play an important role in daily life; the importance of its copyright protection is self-evident; robustness is the essential condition that video watermark need to be satisfied; and be the research emphasis of video watermark for the robustness of geometric attack always; the present invention is on the basis of the existing various video watermark process of resisting geometric attack of abundant research; in conjunction with the advanced technology in the fields such as computer vision, signal processing, being highly resistant to of the novelty proposed be take the robust video watermark process of geometric attack as main various attack.

Summary of the invention

The technical problem to be solved in the present invention is: overcome the deficiencies in the prior art, provide a kind of robust video watermark process such as common attacks such as geometric attacks of resisting of function admirable, to protect better the copyright of digital video.

The technical solution adopted for the present invention to solve the technical problems: a kind of robust video watermark process of the geometric attack resisted based on SIFT, as shown in Figure 1, 2, comprise watermark embedded part and watermark extracting part, two parts all be take sequence of frames of video as operand, by after the sequence segmentation, every section video is as a basic operating unit;

The watermark embed process of described watermark embedded part is:

(1.1) color space conversion, the frame of video that at first will be meaned by the RGB color space is converted to the frame of video meaned by the Ycbcr color space, extracts the Y component, i.e. luminance component;

(1.2) luminance component of frame of video is carried out to one deck two dimension DWT conversion, extract the low frequency component;

(1.3) in frame of video every n frame as one section, n is positive integer, concrete value is determined by watermark information size and frame of video quantity, every section first frame is converted to the low frequency component obtained through one deck two dimension DWT and carry out the extraction of SIFT characteristic value, the characteristic point position that record obtains;

(1.4) at described characteristic point position place, watermarked in the following manner:

p_{t}^{'} = p_{i} + a \times w_{t}

be the pixel value of i section characteristic point position place after watermarked

be i section characteristic point position place original pixel value, a is the multiplication factor that value is positive integer, and w represents watermark information, and value is 1 and-1, with the form of array be expressed as [1 ,-1,1 ,-1 ...], size is n;

(1.5) luminance component is carried out to one deck two dimension DWT inverse transformation, color space is converted to the RGB color space by Ycbcr, preserve video;

The watermark extraction process of described watermark extracting part is:

(2.1) color space conversion, the frame of video that at first will be meaned by the RGB color space is converted to the frame of video meaned by the Ycbcr color space, extracts the Y component, i.e. luminance component;

(2.2) one deck two dimension DWT conversion is carried out in the luminance component of frame of video, the wavelet basis of choosing is the haar small echo, extracts low frequency component;

(2.3) in frame of video, every n frame is as one section, and n is positive integer, and value when this value must embed with watermark is identical, and the low frequency component that every section first frame is obtained through one deck two dimension DWT conversion carries out the extraction of SIFT characteristic value, the characteristic point position that record obtains;

(2.4) the pixel prediction error of calculated characteristics point position, the fallout predictor adopted is SGAP, the computing formula of SGAP is:

x^{'} = \frac{x_{n} + x_{w}}{2} + \frac{x_{ne} - x_{nw}}{4}

Wherein x ' is pixel predictors, x _nfor predict pixel top pixel value, x _wfor left pixel value, x _nefor upper right corner pixel value, x _nwfor the top left corner pixel value;

Predicated error is defined as the difference of pixel value and its predicted value, each section is calculated its consensus forecast difference, this section all predicated error is sued for peace then divided by the sum of all pixels at characteristic point position place, is designated as e, set threshold value threshold > 0, extract result and be:

In described step (1.4), the value of α is 5≤a≤15.

The value of described step (2.4) threshold is 1≤threshold≤5.

It is the haar small echo that wavelet basis is chosen in the two dimension of one deck described in described step (1.2) DWT conversion, extracts low frequency component.

The principle of the invention: because the video in reality is substantially all colored, generally can mean with the RGB model, in the RGB model, energy disperses to be unfavorable for processing, processing procedure and raising robustness for method for simplifying, at first consideration is converted to the Ycbcr model by the RGB model, then unification is that luminance component is processed to the Y component, can convert back the RGB model according to concrete storage format after the watermark EO preserves again, after obtaining the Y component, again it is carried out to the 2D-DWT conversion, get the low frequency coefficient operation, the wavelet basis that wavelet transformation is chosen here is the Haar small echo.

But the beared information amount to the staged operation of video and video is closely related, and each Duan Ke represents 1 bit information, in actual mechanical process, can consider setting according to the frame number of video and the amount of information of required embedding, suppose that every n frame is one section, the numbering of section is designated as i, and the i value is 1,2 ..., n.

Next the first frame of every section is carried out to the SIFT conversion, the position of recording feature point, complete SIFT algorithm roughly has four steps: 1, the metric space extreme value detects.Detect the potential point of interest for yardstick and invariable rotary at metric space by the gaussian derivative function.2, the key point location.On point of interest location, determine position and the yardstick of key point.3, orientation determination.Gradient direction based on image local, give each key point assign direction.4, key point is described.The gradient of measurement image part in the field of each key point, finally express by a characteristic vector.In the present invention, the position of only having characteristic point of application, can omit most of irrelevant operation like this, only need to utilize part SIFT algorithm to get final product.

The embedding of watermark or extract for be the pixel value at characteristic point position place, computing formula during embedding is as follows:

p_{t}^{'} = p_{i} + a \times w_{t}

be the pixel value of i section characteristic point position place after watermarked,

be i section characteristic point position place original pixel value, a is the multiplication factor that value is positive integer, and w represents watermark information, and value is 1 and-1, with the form of array can be expressed as [1 ,-1,1 ,-1 ...], size is n.

At first the pixel prediction error that needs calculated characteristics point position during extraction, the fallout predictor adopted is SGAP, the computing formula of SGAP is:

x^{'} = \frac{x_{n} + x_{w}}{2} + \frac{x_{ne} - x_{nw}}{4}

Wherein x ' is pixel predictors, x _nfor predict pixel top pixel value, x _wfor left pixel value, x _nefor upper right corner pixel value, x _nwfor the top left corner pixel value.

The advantage that the present invention compared with prior art had is:

(1) the present invention can resist geometric attack better, such as rotation, shearing, convergent-divergent, upset etc.;

(2) distortion of the present invention is little, and due to only watermarked at the characteristic point position place in processing procedure of the present invention, and the ratio that the characteristic point number is occupied in the frame of video sum of all pixels is very little, so can not produce very large distortion;

(3) but watermark capacity self adaptation of the present invention adjustment, if the watermark information amount need embedded is greatly can suitably increase segments, if less reduce segments;

(4) the present invention realizes simply, and the process that embeds and extract is clear and definite, and computation complexity is little;

(5) the designed watermark of the present invention is blind watermark, that is to say and not need original video and original embed watermark information when watermark extracting.

The accompanying drawing explanation

Fig. 1 is overall framework figure of the present invention;

Fig. 2 is specific implementation flow chart of the present invention;

Fig. 3 is watermark embed process flow chart of the present invention;

Fig. 4 is watermark extraction process flow chart of the present invention.

Embodiment

As shown in Fig. 1,4, the present invention includes watermark embedding and watermark extracting two parts.

Wherein the watermark embed process flow chart is shown in Fig. 3, and its step is as follows:

Step 1: for every frame video data, at first be transformed under the Ycbcr color space, extract the Y component, if original video is the yuv form, can directly extract the Y component.

Step 2: the Y component is made to two-dimensional discrete wavelet conversion, and wavelet basis is the Haar small echo, extracts low frequency component.

Step 3: at first according to watermark information size and video frame number by the video equitable subsection, suppose that watermark information bit number is m, the video frame number is n, can select to be less than the digital k of the integer rounded under n/m, then every k frame is as one section, conveniently process during for information extraction and reach the related requests such as content-control, k can be made as to fixed value, such as 30.

Step 4: the first frame to every section video is done the SIFT feature extraction, and the characteristic point position that record extracts, in each frame basis position definite by first frame of every section video, is done operation as follows:

p_{t}^{'} = p_{i} + a \times w_{t}

Step 5: the frame of video after operation is carried out to two-dimentional DWT inverse transformation, color space is converted to the RGB color space by Ycbcr, preserve frame of video.

The watermark extraction process flow chart is shown in Fig. 4, and its step is as follows:

Step 1: color space conversion, the frame of video at first meaned by the RGB color space is converted to the frame of video meaned by the Ycbcr color space, extracts the Y component, i.e. luminance component;

Step 2: the luminance component to frame of video carries out one deck two dimension DWT conversion, and the wavelet basis of choosing is the haar small echo, extracts low frequency component;

Step 3: every n frame is as one section, and n is positive integer, and value when this value must embed with watermark is identical, and the low frequency component that every section first frame is obtained through above-mentioned conversion carries out the extraction of SIFT characteristic value, the characteristic point position that record obtains;

Step 4: the pixel prediction error of calculated characteristics point position, the fallout predictor adopted is SGAP, the computing formula of SGAP is:

x^{'} = \frac{x_{n} + x_{w}}{2} + \frac{x_{ne} - x_{nw}}{4}

Predicated error is defined as the difference of pixel value and its predicted value, and each section is calculated its consensus forecast difference, to this section all predicated error summation, then divided by the sum of all pixels at characteristic point position place, be designated as e, set threshold value threshold 0, extract result and be:

In a word, the present invention has improved the robustness of video watermark process to geometric attack, and distortion is little, processes simple.

Non-elaborated part of the present invention belongs to techniques well known.

The above; be only part embodiment of the present invention, but protection scope of the present invention is not limited to this, in the technical scope that any those skilled in the art disclose in the present invention; the variation that can expect easily or replacement, within all should being encompassed in protection scope of the present invention.

Claims

1. the robust video watermark process of the resist geometric attacks based on SIFT, it is characterized in that: comprise watermark embedded part and watermark extracting part, two parts all be take sequence of frames of video as operand, and by after the sequence segmentation, every section video is as a basic operating unit;

The watermark embed process of described watermark embedded part is:

p_{t}^{'} = p_{i} + a \times w_{t}

be the pixel value of i section characteristic point position place after watermarked, be i section characteristic point position place original pixel value, a is the multiplication factor that value is positive integer, and w represents watermark information, and value is 1 and-1, with the form of array be expressed as [1 ,-1,1 ,-1 ...], size is n;

The watermark extraction process of described watermark extracting part is:

x^{'} = \frac{x_{n} + x_{w}}{2} + \frac{x_{ne} - x_{nw}}{4}

2. the robust video watermark process of the resist geometric attacks based on SIFT according to claim 1 is characterized in that: in described step (1.4), the value of α is 5≤a≤15.

3. the robust video watermark process of the resist geometric attacks based on SIFT according to claim 1, it is characterized in that: the value of described step (2.4) threshold is 1≤threshold≤5.

4. the robust video watermark process of the resist geometric attacks based on SIFT according to claim 1 is characterized in that: it is the haar small echo that wavelet basis is chosen in the two dimension of one deck described in described step (1.2) DWT conversion, extracts low frequency component.