CN115170378A - Video digital watermark embedding and extracting method and system based on deep learning


Info

Publication number
CN115170378A
CN115170378A
Authority
CN
China
Prior art keywords
watermark
video
extracting
img
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210661160.5A
Other languages
Chinese (zh)
Inventor
王晗
张志伟
胡海
崔凯元
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Forestry University
Original Assignee
Beijing Forestry University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Forestry University
Priority to CN202210661160.5A
Publication of CN115170378A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 1/00 - General purpose image data processing
    • G06T 1/0021 - Image watermarking
    • G06T 1/005 - Robust watermarking, e.g. average attack or collusion attack resistant
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 21/00 - Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F 21/10 - Protecting distributed programs or content, e.g. vending or licensing of copyrighted material; Digital rights management [DRM]
    • G06F 21/16 - Program or content traceability, e.g. by watermarking
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 - Machine learning
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 - Image analysis
    • G06T 7/30 - Determination of transform parameters for the alignment of images, i.e. image registration
    • G06T 7/33 - Determination of transform parameters for the alignment of images, i.e. image registration using feature-based methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 - Image analysis
    • G06T 7/30 - Determination of transform parameters for the alignment of images, i.e. image registration
    • G06T 7/37 - Determination of transform parameters for the alignment of images, i.e. image registration using transform domain methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 - Arrangements for image or video recognition or understanding
    • G06V 10/40 - Extraction of image or video features
    • G06V 10/46 - Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V 10/462 - Salient features, e.g. scale invariant feature transforms [SIFT]
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 - Scenes; Scene-specific elements
    • G06V 20/40 - Scenes; Scene-specific elements in video content
    • G06V 20/41 - Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 - Scenes; Scene-specific elements
    • G06V 20/40 - Scenes; Scene-specific elements in video content
    • G06V 20/46 - Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 - Image acquisition modality
    • G06T 2207/10024 - Color image
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 - Special algorithmic details
    • G06T 2207/20048 - Transform domain processing
    • G06T 2207/20064 - Wavelet transform [DWT]
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 - Special algorithmic details
    • G06T 2207/20081 - Training; Learning
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 - Special algorithmic details
    • G06T 2207/20084 - Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Technology Law (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Medical Informatics (AREA)
  • Image Processing (AREA)
  • Editing Of Facsimile Originals (AREA)

Abstract

The invention relates to a video digital watermark embedding and extracting method and system based on deep learning. The method comprises the following steps: S1: constructing a training set of images using public videos; S2: inputting the training set into a video digital watermark embedding and extracting network for training to obtain a trained model, wherein the video digital watermark embedding and extracting network comprises a watermark embedding network, an image transformation module and a watermark extraction network; S3: extracting key frames of the video to be watermarked, inputting the key frames together with the watermark into the trained video digital watermark embedding network, outputting the watermarked key frames, and putting them back into the video; S4: extracting frames to be detected from the video containing the digital watermark, correcting them, inputting them into the trained video digital watermark extraction network, and extracting the watermark. The invention provides a traceable video digital watermark embedding and extracting method with strong robustness, contributing to digital video leakage tracing and video intellectual property protection in the new media environment.

Description

Video digital watermark embedding and extracting method and system based on deep learning
Technical Field
The invention relates to the field of digital watermarks, in particular to a video digital watermark embedding and extracting method and system based on deep learning.
Background
With the development of computer and network technologies, multimedia products are becoming digital, and digital audio-visual products have entered people's lives. Although digitization makes multimedia information easier to edit, produce, store and transmit, and improves the quality of audio-visual products, it also brings new copyright problems. For example, unlimited copying of a highly valued work without the consent of its owner results in considerable economic loss to the producer and content provider. Moreover, because digital content is so easy to manipulate, video information is extremely easy to tamper with, which seriously threatens the integrity of the original work. Information with special significance, such as information related to judicial litigation or government agencies, is particularly subject to malicious attack and falsification. The negative effects of these characteristics of digital technology have become a major obstacle to the healthy and sustained development of the information industry.
Therefore, copyright protection for digital products is becoming increasingly important. It is often assumed that copyright protection can be achieved by encryption: a multimedia data file is first encrypted into a ciphertext and then distributed, so that an attacker intercepting it during network transmission cannot obtain confidential information from the ciphertext, achieving the goals of copyright protection and information security. On the one hand, however, the unintelligibility of the encrypted file hinders the dissemination of the multimedia information; on the other hand, encrypted multimedia information easily attracts the curiosity and attention of attackers and may be cracked, and once the encrypted file is cracked its content becomes completely transparent. Cryptography has long been regarded as the primary means of information security in communications research and applications, and this has only begun to change in recent years. Most existing copyright protection systems adopt cryptographic authentication techniques (such as the security password of a DVD optical disc), but encryption alone cannot completely solve the copyright protection problem: it only protects the data during transmission from sender to receiver. Once the information is received and decrypted, the document is the same as any ordinary document and is no longer protected, so encryption cannot prevent piracy. How to protect the copyright of digital products and maintain data security has therefore become an urgent problem to be solved.
Disclosure of Invention
In order to solve the technical problem, the invention provides a video digital watermark embedding and extracting method and system based on deep learning.
The technical solution of the invention is as follows: a video digital watermark embedding and extracting method based on deep learning comprises the following steps:
step S1: extracting a preset number of video frames from the public video and cutting to obtain an input image; generating a random binary string as watermark information data, and constructing a training set by the input image and the watermark information data;
step S2: inputting the images and the watermark information data in the training set together into a video digital watermark embedding and extracting network for training, to obtain a trained video digital watermark embedding and extracting network; wherein the video digital watermark embedding and extracting network comprises: a video digital watermark embedding network, used for embedding the watermark W into the input image Img to obtain the watermarked image Img_encoded; an image transformation module, used for applying attack transformations to Img_encoded to obtain Img'_encoded; and a video digital watermark extraction network, used for extracting the watermark W' from Img'_encoded; loss functions are constructed between Img_encoded and Img and between W' and W, and the network parameters are updated until the trained video digital watermark embedding and extracting network is obtained;
step S3: extracting key frames of the video to be watermarked, inputting the key frames together with the watermark containing user information into the trained video digital watermark embedding network, outputting the watermarked key frames, and putting the watermarked key frames back into the video to be watermarked;
step S4: extracting frames to be detected from the watermarked video, inputting them into the trained video digital watermark extraction network, extracting the watermark and obtaining the user information in the watermark.
Compared with the prior art, the invention has the following advantages:
the invention discloses a video digital watermark embedding and extracting method based on deep learning, which is realized by a video digital watermark algorithm based on DWT (Discrete Wavelet Transform) domain and deep learning and an image registration method based on SIFT (scale invariant feature Transform) characteristics.
Drawings
Fig. 1 is a flowchart of a video digital watermark embedding and extracting method based on deep learning in an embodiment of the present invention;
FIG. 2 is a schematic diagram of a video digital watermark embedding network according to an embodiment of the present invention;
fig. 3 is a schematic diagram of a network structure for extracting video digital watermarks in an embodiment of the present invention;
fig. 4 is a block diagram of a video digital watermark embedding and extracting system based on deep learning according to an embodiment of the present invention.
Detailed Description
The invention provides a video digital watermark embedding and extracting method based on deep learning, which is used for embedding and extracting watermarks of traceable sources for videos and making contributions to digital video leakage tracing and video intellectual property protection in a new media environment.
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings.
For a better understanding of the embodiments of the present invention, the Discrete Wavelet Transform (DWT) used in the embodiments of the present invention is explained:
discrete Wavelet Transform (DWT) can represent information of an image in a frequency domain and a time domain at the same time, and is a convenient image processing mode. The wavelet transform is a breakthrough to the fourier transform and the short-time fourier transform, and the change is that finite length wavelet which can be attenuated is used to replace infinite length trigonometric function base, and in numerical analysis and functional analysis, the discrete wavelet transform is any wavelet transform which performs discrete sampling on the wavelet. One key advantage of discrete wavelet transforms over fourier transforms, as with other wavelet transforms, is the ability to time resolve, i.e., the discrete wavelet transform captures frequency and location information based on time. In computer vision systems, discretization of the input variables is required for ease of storage and computation, which requires discretization of the wavelet, i.e., DWT. The DWT is obtained by discretizing the scale factor and the shifting factor in the wavelet transform.
The DWT is defined by the following transform formulas:

W_φ(j_0, k) = (1/√M) · Σ_n f(n) · φ_{j_0,k}(n)

W_ψ(j, k) = (1/√M) · Σ_n f(n) · ψ_{j,k}(n),  j ≥ j_0

where W_φ are the approximation coefficients and W_ψ are the wavelet detail coefficients; φ(t) is the scale function, ψ(t) is the wavelet function, and the summation range is n = 0, 1, 2, ..., M - 1. The scale and wavelet bases are

φ_{j,k}(t) = 2^{j/2} · φ(2^j · t - k)

ψ_{j,k}(t) = 2^{j/2} · ψ(2^j · t - k),  j, k ≥ 0

where j represents the scale (dilation) of the wavelet function in the frequency domain and k represents the translation of the function in the time domain.
A two-dimensional image subjected to the DWT is decomposed into four different components: the horizontal component LH (low frequency, high frequency), the vertical component HL (high frequency, low frequency), the diagonal component HH (high frequency, high frequency) and the low-frequency component LL (low frequency, low frequency), with the low-frequency component concentrated in the upper-left corner; the DWT can be applied to these components again in a recursive manner. The image information contained in the low-frequency component is the most important, and the quality of the image suffers greatly if it is disturbed too much; the high-frequency components carry the details of the image, and image quality is not affected much if they are discarded during image compression.
The inverse transform (IDWT) is given by:

f(n) = (1/√M) · Σ_k W_φ(j_0, k) · φ_{j_0,k}(n) + (1/√M) · Σ_{j=j_0}^{∞} Σ_k W_ψ(j, k) · ψ_{j,k}(n)

where j_0 is usually set to 0, M is a power of 2, and the summation range is k = 0, 1, 2, ..., 2^j - 1.
Analysis of DWT-transformed images shows that when an image is attacked, the low-frequency region is generally not affected much and retains almost complete information, so DWT-based watermarks usually write their content into the low-frequency region to withstand possible attacks. The drawback is that the quality of the image may be affected considerably.
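For illustration only, a minimal sketch of this decomposition using the PyWavelets library follows; the Haar wavelet and the 8 × 8 block size are assumptions, not requirements of the patent:

```python
# Illustrative only: single-level 2-D DWT and its inverse with PyWavelets.
import numpy as np
import pywt

block = np.random.rand(8, 8)                    # stand-in for an 8x8 Y-channel block

# LL is the low-frequency (approximation) band; LH, HL, HH are the detail bands.
# For an 8x8 input each band is 4x4, matching the sub-block sizes used below.
LL, (LH, HL, HH) = pywt.dwt2(block, 'haar')

# Perfect reconstruction via the inverse DWT (IDWT).
reconstructed = pywt.idwt2((LL, (LH, HL, HH)), 'haar')
assert np.allclose(block, reconstructed)
```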
Example one
As shown in fig. 1, a video digital watermark embedding and extracting method based on deep learning according to an embodiment of the present invention includes the following steps:
step S1: extracting a preset number of video frames from the public video and cutting to obtain an input image; generating a random binary string as watermark information data, and constructing a training set by the input image and the watermark information data;
step S2: inputting the images and the watermark information data in the training set together into a video digital watermark embedding and extracting network for training, to obtain a trained video digital watermark embedding and extracting network. The video digital watermark embedding and extracting network comprises: a video digital watermark embedding network, used for embedding the watermark W into the input image Img to obtain the watermarked image Img_encoded; an image transformation module, used for applying attack transformations to Img_encoded to obtain Img'_encoded; and a video digital watermark extraction network, used for extracting the watermark W' from Img'_encoded. Loss functions are constructed between Img_encoded and Img and between W' and W, and the network parameters are updated until the trained video digital watermark embedding and extracting network is obtained.
step S3: extracting key frames of the video to be watermarked, inputting the key frames together with the watermark containing user information into the trained video digital watermark embedding network, outputting the watermarked key frames, and putting the watermarked key frames back into the video to be watermarked.
step S4: extracting frames to be detected from the watermarked video, inputting them into the trained video digital watermark extraction network, extracting the watermark and obtaining the user information in the watermark.
In one embodiment, step S1, extracting a preset number of video frames from public videos and cropping them to obtain input images and construct the training set, specifically comprises:
extracting a certain number of video frames from public videos as key frames, cropping the key frames to a preset size and normalizing them to obtain the preprocessed key frames, i.e., the input images; meanwhile, generating random binary strings containing user information as watermark information data, and constructing the training set from the input images and the watermark information data.
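A minimal sketch of constructing one training sample is given below; the 128 × 128 crop size and the watermark length of 32 bits are assumptions (the patent leaves both as presets), and make_sample is an illustrative helper name:

```python
# Illustrative construction of one (input image, watermark) training pair.
import numpy as np

def make_sample(frame_bgr, crop=128, n_bits=32, rng=None):
    if rng is None:
        rng = np.random.default_rng()
    h, w = frame_bgr.shape[:2]
    y = int(rng.integers(0, h - crop + 1))
    x = int(rng.integers(0, w - crop + 1))
    img = frame_bgr[y:y + crop, x:x + crop].astype(np.float32) / 255.0  # cropped, normalised frame
    bits = rng.integers(0, 2, size=n_bits).astype(np.float32)           # random binary watermark W
    return img, bits
```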
In one embodiment, the video digital watermark embedding network in step S2 embeds the watermark W into the input image Img to obtain the watermarked image Img_encoded, which specifically comprises the following steps:
step S201: acquiring the watermark information data and converting it into a watermark array W ∈ {0,1}^N, where N is the length of W; if the watermark array is shorter than the preset length N, it is padded with zeros; the first bit of W is a flag bit used to identify whether the extracted watermark is correct;
step S202: acquiring an image Img from the training set, selecting a region of preset size from the center of Img, converting the region from the RGB color space to the YCbCr color space, and extracting the Y-component matrix H of the region, whose size is h × w; partitioning the component matrix H into a set of 8 × 8 sub-blocks B_y(i), i = 1, ..., N, N = (h × w)/(8 × 8); performing the DWT on B_y(i) to obtain the transformed sub-block set B_dwt(i);
As shown in fig. 2, W(i) denotes the i-th binary watermark bit in the watermark array W; after expansion, W(i) becomes a 1 × 4 × 4 data block. B_y(i) denotes a Y-channel data block of the YCbCr color space, whose size is 1 × 8 × 8; after the DWT, B_y(i) becomes a 4 × 4 × 4 data block;
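A minimal sketch of step S202 is shown below; the OpenCV colour conversion, the Haar wavelet and the 128 × 128 central region are assumptions (the patent does not fix them), and blocks_to_dwt is an illustrative helper name:

```python
# Illustrative sketch of step S202: centre crop, Y extraction, 8x8 blocking, DWT.
import cv2
import numpy as np
import pywt

def blocks_to_dwt(img_bgr, region=128):
    h0, w0 = img_bgr.shape[:2]
    y0, x0 = (h0 - region) // 2, (w0 - region) // 2
    crop = img_bgr[y0:y0 + region, x0:x0 + region]       # central region of preset size
    ycbcr = cv2.cvtColor(crop, cv2.COLOR_BGR2YCrCb)       # OpenCV stores channels as Y, Cr, Cb
    Y = ycbcr[:, :, 0].astype(np.float32)                 # Y-component matrix H (h x w)
    subblocks = []
    for r in range(0, region, 8):
        for c in range(0, region, 8):
            block = Y[r:r + 8, c:c + 8]                   # B_y(i), 1x8x8
            LL, (LH, HL, HH) = pywt.dwt2(block, 'haar')   # four 4x4 bands
            subblocks.append(np.stack([LL, LH, HL, HH]))  # B_dwt(i), 4x4x4
    return np.stack(subblocks)                            # N x 4 x 4 x 4
```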
step S203: the two-dimensional watermark array W and the transformed sub-block set B_dwt(i) are input into the video digital watermark embedding network, and after the operation of the watermark embedding convolution module, the watermarked transformed sub-block set B'_dwt(i) is output. As shown in fig. 2, the watermark embedding convolution module comprises 5 two-dimensional convolutional layers, with a batch normalization layer and a ReLU activation layer used between the two-dimensional convolutional layers; the output channel depths of the 5 two-dimensional convolutional layers are 16 and 4, respectively;
The 1 × 4 × 4 data block and the 4 × 4 × 4 data block obtained in step S202 are input into the video digital watermark embedding network together and spliced along the first dimension into a 5 × 4 × 4 data block, which then passes through the 5 two-dimensional convolutional layers to generate a 4 × 4 × 4 data block B'_dwt(i).
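A minimal PyTorch sketch of this embedding module follows; the 3 × 3 kernel size and the use of depth 16 for all intermediate layers are assumptions (the text only states the depths 16 and 4):

```python
# Minimal sketch of the watermark-embedding convolution module (assumed details).
import torch
import torch.nn as nn

class WatermarkEmbedder(nn.Module):
    def __init__(self, hidden=16):
        super().__init__()
        chans = [5, hidden, hidden, hidden, hidden, 4]       # 5 conv layers in total
        layers = []
        for cin, cout in zip(chans[:-1], chans[1:]):
            layers.append(nn.Conv2d(cin, cout, kernel_size=3, padding=1))
            if cout != 4:                                    # BN + ReLU between conv layers
                layers += [nn.BatchNorm2d(cout), nn.ReLU(inplace=True)]
        self.net = nn.Sequential(*layers)

    def forward(self, b_dwt, w_bit):
        # b_dwt: (N, 4, 4, 4) transformed sub-blocks; w_bit: (N,) watermark bits
        w_plane = w_bit.view(-1, 1, 1, 1).expand(-1, 1, 4, 4)   # W(i) expanded to 1x4x4
        x = torch.cat([w_plane, b_dwt], dim=1)                  # spliced into 5x4x4
        return self.net(x)                                      # B'_dwt(i): 4x4x4
```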
Step S204: for the coded transform subblocksCollection B' dwt (i) Each sub-block in the color space is respectively subjected to inverse DWT conversion to obtain a Y 'component containing the watermark, and the Y' component is combined into a YCbCr color space and then converted into an RGB color space; obtaining an image Img containing a watermark encoded
Step S205: calculating Img according to equation (1) encoded And Loss value of Img Loss img
Loss img =LPIPS(Img,Img encoded ) (1)。
In one embodiment, the image transformation module in step S2, which applies transformation enhancement to Img_encoded to obtain Img'_encoded, specifically comprises:
inputting Img_encoded into the image transformation module, which adds random noise, Gaussian blur, JPEG image compression or brightness changes to Img_encoded, obtaining the transformation-enhanced watermarked image Img'_encoded.
In order to improve the robustness of the video digital watermark extraction network, attacks such as random noise, Gaussian blur, JPEG image compression or brightness changes are applied to the watermarked image during training. When the trained video digital watermark embedding and extracting network is used, the image transformation module is not needed.
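A minimal sketch of such an attack layer is given below; the noise level, blur kernel, JPEG quality and brightness range are assumptions, as the patent names the attack types but not their strengths:

```python
# Illustrative training-time attack layer (assumed parameter ranges).
import io
import random
import torch
from PIL import Image
from torchvision import transforms

def random_attack(img_pil):
    choice = random.choice(['noise', 'blur', 'jpeg', 'brightness'])
    if choice == 'noise':
        t = transforms.ToTensor()(img_pil)
        t = (t + 0.02 * torch.randn_like(t)).clamp(0, 1)       # additive random noise
        return transforms.ToPILImage()(t)
    if choice == 'blur':
        return transforms.GaussianBlur(kernel_size=5, sigma=1.0)(img_pil)
    if choice == 'jpeg':
        buf = io.BytesIO()
        img_pil.save(buf, format='JPEG', quality=50)           # JPEG re-compression
        buf.seek(0)
        return Image.open(buf).convert('RGB')
    return transforms.ColorJitter(brightness=0.3)(img_pil)     # brightness change
```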
In one embodiment, the video digital watermark extraction network in step S2, used for extracting the watermark W' from Img'_encoded, specifically comprises:
step S211: converting Img'_encoded from the RGB color space to the YCbCr color space and extracting its Y-component matrix H, whose size is h × w; partitioning the component matrix H into a set of 8 × 8 sub-blocks B_y(i), i = 1, ..., N, N = (h × w)/(8 × 8); performing the DWT on B_y(i) to obtain the transformed sub-block set B_dwt(i);
As shown in fig. 3, B_y(i) denotes a Y-channel data block of the YCbCr color space, whose size is 1 × 8 × 8; after the DWT, B_y(i) becomes a 4 × 4 × 4 data block B_dwt(i);
step S212: the transformed sub-block set B_dwt(i) is fed into the video digital watermark extraction network, and after the operation of the watermark extraction convolution module, the watermark array W' is output. As shown in fig. 3, the watermark extraction convolution module comprises 4 two-dimensional convolutional layers, 1 average pooling layer and 1 fully connected layer, with a batch normalization layer and a ReLU activation layer used between the 4 two-dimensional convolutional layers; the output channel depths of the 4 two-dimensional convolutional layers are 16, 16 and 1, respectively;
The 4 × 4 × 4 data block B_dwt(i) obtained in step S211 is input into the video digital watermark extraction network, where the 4 two-dimensional convolutions generate a 1 × 4 × 4 data block B'_dwt(i); W'(i), of size 1, is then generated by global average pooling (mean pooling) and a fully connected layer;
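A minimal PyTorch sketch of the extraction module follows; the 3 × 3 kernels, the use of depth 16 for all intermediate layers and the sigmoid on the output are assumptions not fixed by the text:

```python
# Minimal sketch of the watermark-extraction convolution module (assumed details).
import torch
import torch.nn as nn

class WatermarkExtractor(nn.Module):
    def __init__(self, hidden=16):
        super().__init__()
        chans = [4, hidden, hidden, hidden, 1]           # 4 conv layers in total
        layers = []
        for cin, cout in zip(chans[:-1], chans[1:]):
            layers.append(nn.Conv2d(cin, cout, kernel_size=3, padding=1))
            if cout != 1:                                # BN + ReLU between conv layers
                layers += [nn.BatchNorm2d(cout), nn.ReLU(inplace=True)]
        self.net = nn.Sequential(*layers)
        self.pool = nn.AdaptiveAvgPool2d(1)              # global average pooling
        self.fc = nn.Linear(1, 1)                        # fully connected layer

    def forward(self, b_dwt):
        # b_dwt: (N, 4, 4, 4) sub-blocks from the possibly attacked frame
        x = self.net(b_dwt)                              # B'_dwt(i): (N, 1, 4, 4)
        x = self.pool(x).flatten(1)                      # (N, 1)
        return torch.sigmoid(self.fc(x)).squeeze(1)      # W'(i) in (0, 1)
```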
step S213: the mean square error loss Loss_msg between W' and W is calculated according to equation (2):
Loss_msg = (1/N) · Σ_{i=1}^{N} (W'(i) - W(i))^2    (2)
step S214: the total loss function shown in equation (3) is constructed, and the parameters of the video digital watermark embedding network and the video digital watermark extraction network are updated by back-propagation:
Loss_total = γ_img · Loss_img + γ_msg · Loss_msg    (3)
where γ_img and γ_msg are the weights of Loss_img and Loss_msg, respectively.
The parameters of the video digital watermark embedding network and the video digital watermark extraction network are continuously updated and optimized according to the total loss function until the trained video digital watermark embedding and extracting network is obtained.
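A minimal sketch of how the loss terms of equations (1)-(3) can be wired together in PyTorch; the lpips package (as the LPIPS distance of equation (1)) and the equal weights γ_img = γ_msg = 1 are assumptions:

```python
# Illustrative combination of the three loss terms during training.
import lpips
import torch.nn.functional as F

lpips_fn = lpips.LPIPS(net='alex')   # perceptual distance; expects inputs scaled to [-1, 1]
gamma_img, gamma_msg = 1.0, 1.0      # assumed loss weights

def total_loss(img, img_encoded, w, w_pred):
    loss_img = lpips_fn(img, img_encoded).mean()         # equation (1)
    loss_msg = F.mse_loss(w_pred, w)                     # equation (2)
    return gamma_img * loss_img + gamma_msg * loss_msg   # equation (3)

# During training, total_loss(...).backward() followed by an optimizer step
# updates the embedding and extraction networks jointly.
```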
In one embodiment, step S3, extracting key frames of the video to be watermarked, inputting the key frames and the watermark into the trained video digital watermark embedding network, outputting the watermarked key frames and putting them back into the video, specifically comprises:
step S301: according to the frame extraction rule, one frame is extracted from the video to be watermarked every predetermined number of frames as a key frame F_origin, and the watermark character string to be embedded is converted into a binary watermark array W;
step S302: F_origin and W are input into the trained video digital watermark embedding network to obtain the watermarked video frame F_water;
step S303: according to the frame extraction rule, the watermarked video frame F_water is put back into the video to be watermarked.
In one embodiment, step S4, extracting frames to be detected from the watermarked video, inputting them into the trained video digital watermark extraction network and extracting the watermark, specifically comprises:
step S401: acquiring the first 200 frames of the watermarked video as the frames F to be detected;
step S402: inputting F into the trained video digital watermark extraction network; when the flag bit of an extracted watermark sequence matches the embedded watermark flag bit, the watermark extraction operation ends and the process goes to step S406; if no matching watermark is extracted after all of F has been processed, the process goes to step S403 for deep video watermark extraction;
step S403: each video frame to be detected is compared with and corrected against all key frames F_origin;
step S404: image similarity is compared on the basis of SIFT features, matching is performed with the K-nearest-neighbour algorithm, and the frame to be detected is aligned with the correction frame through rotation and transformation using a homography matrix; the homography relation is shown in equation (4):
[x_1 y_1 1]^T = H · [x_2 y_2 1]^T,  H = [[h_00, h_01, h_02], [h_10, h_11, h_12], [h_20, h_21, h_22]]    (4)
where [x_1 y_1 1]^T and [x_2 y_2 1]^T are the homogeneous coordinates of the video frame to be detected and of the correction frame, respectively. With h_22 set to 1, the homography matrix has 8 unknown parameters; each pair of corresponding pixel points yields 2 equations (one for x and one for y), so four point pairs are needed to solve for the homography matrix H. Qualified pixel points are selected as inlier points by the random sample consensus (RANSAC) algorithm;
step S405: each video frame to be detected is compared with the correction frames F_origin; if the number of inlier points is less than or equal to 25% of all feature points, the current correction frame is skipped; if it is greater than 25%, the matched correction frame is stored temporarily. After the video frame to be detected has been compared with all correction frames, the frames before and after it are aligned with the correction frames in turn according to an inside-out matching strategy, and the watermark is extracted through steps S401 and S402; if no watermark is extracted, the process moves to the next video frame to be detected and the above steps are repeated until the watermark is extracted or all frames to be detected have been examined;
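A minimal OpenCV sketch of this SIFT-based correction is given below; the Lowe ratio of 0.75 and the RANSAC reprojection threshold of 5.0 are assumptions, the 25% inlier threshold follows the text (applied here to the good matches rather than all feature points), and align_to_key_frame is an illustrative helper name:

```python
# Illustrative SIFT + KNN matching + RANSAC homography correction (steps S404/S405).
import cv2
import numpy as np

def align_to_key_frame(frame, key_frame, inlier_ratio=0.25):
    sift = cv2.SIFT_create()
    k1, d1 = sift.detectAndCompute(frame, None)
    k2, d2 = sift.detectAndCompute(key_frame, None)
    matcher = cv2.BFMatcher()
    good = [m for m, n in matcher.knnMatch(d1, d2, k=2) if m.distance < 0.75 * n.distance]
    if len(good) < 4:                                           # four point pairs needed for H
        return None
    src = np.float32([k1[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([k2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    H, mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)     # RANSAC selects inliers
    if H is None or mask.sum() <= inlier_ratio * len(good):     # skip weak matches
        return None
    h, w = key_frame.shape[:2]
    return cv2.warpPerspective(frame, H, (w, h))                # corrected (aligned) frame
```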
step S406: the user information contained in the watermark is extracted.
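A minimal sketch of the frame-by-frame extraction loop of steps S401 and S402 follows; extract_bits is a hypothetical helper wrapping steps S211 and S212, and the flag-bit value is an assumption:

```python
# Illustrative extraction loop over the first frames of the watermarked video.
import cv2

FLAG_BIT = 1  # assumed value of the embedded flag bit

def extract_from_video(path, extract_bits, max_frames=200):
    cap = cv2.VideoCapture(path)
    for _ in range(max_frames):
        ok, frame = cap.read()
        if not ok:
            break
        bits = extract_bits(frame)           # W' for this frame, thresholded to 0/1
        if bits and bits[0] == FLAG_BIT:     # flag bit matches: watermark found
            cap.release()
            return bits
    cap.release()
    return None                              # fall back to SIFT correction (steps S403-S405)
```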
The invention discloses a video digital watermark embedding and extracting method based on deep learning, realized by a video digital watermarking algorithm that combines the DWT (Discrete Wavelet Transform) domain with deep learning, together with an image registration method based on SIFT (Scale-Invariant Feature Transform) features.
Example two
As shown in fig. 4, an embodiment of the present invention provides a deep learning-based video digital watermark embedding and extracting system, which includes the following modules:
Training set construction module 1: extracting a preset number of video frames from public videos and cropping them to obtain input images; generating random binary strings as watermark information data, and constructing the training set from the input images and the watermark information data;
Network training module 2: inputting the images and the watermark information data in the training set into the video digital watermark embedding and extracting network for training, to obtain the trained video digital watermark embedding and extracting network. The video digital watermark embedding and extracting network comprises: a video digital watermark embedding network, used for embedding the watermark W into the input image Img to obtain the watermarked image Img_encoded; an image transformation module, used for applying attack transformations to Img_encoded to obtain Img'_encoded; and a video digital watermark extraction network, used for extracting the watermark W' from Img'_encoded. Loss functions are constructed between Img_encoded and Img and between W' and W, and the network parameters are updated until the trained video digital watermark embedding and extracting network is obtained;
video digital watermark embedding module 3: extracting key frames of a video to be embedded with the watermark, inputting the key frames and the watermark containing user information into a trained video digital watermark embedding network, outputting the key frames embedded with the watermark, and putting the key frames embedded with the watermark back into the video to be embedded with the watermark;
the video digital watermark extraction module 4: and extracting the frame to be detected of the video containing the watermark, inputting the trained video digital watermark extraction network, extracting the watermark and acquiring the user information in the watermark.
The above examples are provided only for the purpose of describing the present invention, and are not intended to limit the scope of the present invention. The scope of the invention is defined by the appended claims. Various equivalent substitutions and modifications can be made without departing from the spirit and principles of the invention, and are intended to be within the scope of the invention.

Claims (7)

1. A video digital watermark embedding and extracting method based on deep learning, characterized by comprising the following steps:
step S1: extracting a preset number of video frames from public videos and cropping them to obtain input images; generating random binary strings containing user information as watermark information data, and constructing a training set from the input images and the watermark information data;
step S2: inputting the images and the watermark information data in the training set together into a video digital watermark embedding and extracting network for training, to obtain a trained video digital watermark embedding and extracting network; wherein the video digital watermark embedding and extracting network comprises: a video digital watermark embedding network, used for embedding the watermark W into the input image Img to obtain the watermarked image Img_encoded; an image transformation module, used for applying attack transformations to Img_encoded to obtain Img'_encoded; and a video digital watermark extraction network, used for extracting the watermark W' from Img'_encoded; loss functions are constructed between Img_encoded and Img and between W' and W, and the network parameters are updated until the trained video digital watermark embedding and extracting network is obtained;
step S3: extracting key frames of the video to be watermarked, inputting the key frames and the watermark data containing user information into the trained video digital watermark embedding network, outputting the watermarked key frames, and putting the watermarked key frames back into the video to be watermarked;
step S4: extracting frames to be detected from the watermarked video, inputting them into the trained video digital watermark extraction network, extracting the watermark and acquiring the user information in the watermark.
2. The deep learning-based video digital watermark embedding and extracting method of claim 1, wherein in step S2 the video digital watermark embedding network embeds the watermark W into the input image Img to obtain the watermarked image Img_encoded, which specifically comprises:
step S201: acquiring the watermark information data and converting it into a watermark array W ∈ {0,1}^N, where N is the length of W; if the watermark array is shorter than the preset length N, it is padded with zeros; the first bit of W is a flag bit used to identify whether the extracted watermark is correct;
step S202: acquiring an image Img from the training set, selecting a region of preset size from the center of Img, converting the region from the RGB color space to the YCbCr color space, and extracting the Y-component matrix H of the region, whose size is h × w; partitioning the component matrix H into a set of 8 × 8 sub-blocks B(i), i = 1, ..., N, N = (h × w)/(8 × 8); performing the DWT on B(i) to obtain the transformed sub-block set B_dwt(i);
step S203: inputting the two-dimensional watermark array W and the transformed sub-block set B_dwt(i) into the video digital watermark embedding network, and outputting the watermarked transformed sub-block set B'_dwt(i) after the operation of the watermark embedding convolution module, wherein the watermark embedding convolution module comprises 5 two-dimensional convolutional layers, with a batch normalization layer and a ReLU activation layer used between the two-dimensional convolutional layers;
step S204: performing the inverse DWT on each sub-block of the encoded transformed sub-block set B'_dwt(i) to obtain the watermarked Y' component, recombining the Y' component into the YCbCr color space and then converting it back into the RGB color space, obtaining the watermarked image Img_encoded;
step S205: calculating the loss value Loss_img between Img_encoded and Img according to equation (1):
Loss_img = LPIPS(Img, Img_encoded)    (1).
3. The deep learning-based video digital watermark embedding and extracting method of claim 2, wherein the image transformation module in step S2, used for applying transformation attacks to Img_encoded to obtain Img'_encoded, specifically comprises:
inputting Img_encoded into the image transformation module, which adds random noise, Gaussian blur, JPEG image compression or brightness changes to Img_encoded, obtaining the transformation-enhanced watermarked image Img'_encoded.
4. The deep learning-based video digital watermark embedding and extracting method of claim 3, wherein the video digital watermark extraction network in step S2, used for extracting the watermark W' from Img'_encoded, specifically comprises:
step S211: converting Img'_encoded from the RGB color space to the YCbCr color space and extracting its Y-component matrix H, whose size is h × w; partitioning the component matrix H into a set of 8 × 8 sub-blocks B(i), i = 1, ..., N, N = (h × w)/(8 × 8); performing the DWT on B(i) to obtain the transformed sub-block set B_dwt(i);
step S212: feeding the transformed sub-block set B_dwt(i) into the video digital watermark extraction network and outputting the watermark array W' after the operation of the watermark extraction convolution module, wherein the watermark extraction convolution module comprises 4 two-dimensional convolutional layers, 1 average pooling layer and 1 fully connected layer, with a batch normalization layer and a ReLU activation layer used between the two-dimensional convolutional layers;
step S213: calculating the mean square error loss Loss_msg between W' and W according to equation (2):
Loss_msg = (1/N) · Σ_{i=1}^{N} (W'(i) - W(i))^2    (2)
step S214: constructing the total loss function shown in equation (3) and updating the parameters of the video digital watermark embedding network and the video digital watermark extraction network by back-propagation:
Loss_total = γ_img · Loss_img + γ_msg · Loss_msg    (3)
where γ_img and γ_msg are the weights of Loss_img and Loss_msg, respectively.
5. The deep learning-based video digital watermark embedding and extracting method of claim 4, wherein step S3, extracting key frames of the video to be watermarked, inputting the key frames and the watermark into the trained video digital watermark embedding network, outputting the watermarked key frames and putting them back into the video to be watermarked, specifically comprises:
step S301: according to the frame extraction rule, extracting one frame from the video to be watermarked every predetermined number of frames as a key frame F_origin, and converting the watermark character string to be embedded into a binary watermark array W;
specifically, one frame is extracted as a key frame F_origin for every 120 frames of the video to be watermarked, and the watermark character string to be embedded, containing the user information, is converted into a binary watermark array W;
step S302: inputting F_origin and W into the trained video digital watermark embedding network to obtain the watermarked video frame F_water;
step S303: putting the watermarked video frame F_water back into the video to be watermarked according to the frame extraction rule;
specifically, the watermarked video frame F_water is put back once every 120 frames of the video to be watermarked.
6. The deep learning-based video digital watermark embedding and extracting method of claim 4, wherein step S4, extracting frames to be detected from the watermarked video, inputting them into the trained video digital watermark extraction network and extracting the watermark, specifically comprises:
step S401: acquiring the first 200 frames of the watermarked video as the frames F to be detected;
step S402: inputting F into the trained video digital watermark extraction network; when the flag bit of an extracted watermark sequence matches the embedded watermark flag bit, the watermark extraction operation ends and the process goes to step S406; if no matching watermark is extracted after all of F has been processed, the process goes to step S403 for deep video watermark extraction;
step S403: comparing and correcting each video frame to be detected against all key frames F_origin;
step S404: comparing image similarity on the basis of SIFT features, matching with the K-nearest-neighbour algorithm, and aligning the frame to be detected with the correction frame through rotation and transformation using a homography matrix; the homography relation is shown in equation (4):
[x_1 y_1 1]^T = H · [x_2 y_2 1]^T,  H = [[h_00, h_01, h_02], [h_10, h_11, h_12], [h_20, h_21, h_22]]    (4)
where [x_1 y_1 1]^T and [x_2 y_2 1]^T are the homogeneous coordinates of the video frame to be detected and of the correction frame, respectively; with h_22 set to 1, the homography matrix has 8 unknown parameters, and each pair of corresponding pixel points yields 2 equations (one for x and one for y), so four point pairs are needed to solve for the homography matrix H; qualified pixel points are selected as inlier points by the random sample consensus algorithm;
step S405: comparing each video frame to be detected with the correction frames F_origin; if the number of inlier points is less than or equal to 25% of all feature points, skipping the current correction frame; if it is greater than 25%, temporarily storing the matched correction frame; after the video frame to be detected has been compared with all correction frames, aligning the frames before and after it with the correction frames in turn according to an inside-out matching strategy and extracting the watermark through steps S401 and S402; if no watermark is extracted, moving to the next video frame to be detected and repeating the above steps until the watermark is extracted or all frames to be detected have been examined;
step S406: extracting the user information contained in the watermark.
7. A video digital watermark embedding and extracting system based on deep learning, characterized by comprising:
a training set construction module: extracting a preset number of video frames from public videos and cropping them to obtain input images; generating random binary strings as watermark information data, and constructing a training set from the input images and the watermark information data;
a network training module: inputting the images and the watermark information data in the training set together into a video digital watermark embedding and extracting network for training, to obtain a trained video digital watermark embedding and extracting network; wherein the video digital watermark embedding and extracting network comprises: a video digital watermark embedding network, used for embedding the watermark W into the input image Img to obtain the watermarked image Img_encoded; an image transformation attack module, used for applying transformation attacks to Img_encoded to obtain Img'_encoded; and a video digital watermark extraction network, used for extracting the watermark W' from Img'_encoded; loss functions are constructed between Img_encoded and Img and between W' and W, and the network parameters are updated until the trained video digital watermark embedding and extracting network is obtained;
a video digital watermark embedding module: extracting key frames of the video to be watermarked, inputting the key frames and the watermark binary string containing user information into the trained video digital watermark embedding network, outputting the watermarked key frames, and putting the watermarked key frames back into the video to be watermarked;
a video digital watermark extraction module: extracting frames to be detected from the watermarked video, inputting them into the trained video digital watermark extraction network, extracting the watermark and acquiring the user information in the watermark.
CN202210661160.5A 2022-06-13 2022-06-13 Video digital watermark embedding and extracting method and system based on deep learning Pending CN115170378A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210661160.5A CN115170378A (en) 2022-06-13 2022-06-13 Video digital watermark embedding and extracting method and system based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210661160.5A CN115170378A (en) 2022-06-13 2022-06-13 Video digital watermark embedding and extracting method and system based on deep learning

Publications (1)

Publication Number Publication Date
CN115170378A true CN115170378A (en) 2022-10-11

Family

ID=83485030

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210661160.5A Pending CN115170378A (en) 2022-06-13 2022-06-13 Video digital watermark embedding and extracting method and system based on deep learning

Country Status (1)

Country Link
CN (1) CN115170378A (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111311472A (en) * 2020-01-15 2020-06-19 中国科学技术大学 Property right protection method for image processing model and image processing algorithm
CN111445378A (en) * 2020-04-21 2020-07-24 焦点科技股份有限公司 Neural network-based image blind watermark embedding and detecting method and system
CN114255151A (en) * 2020-09-25 2022-03-29 浙江工商大学 High-resolution image robust digital watermarking method based on key point detection and deep learning
CN113613073A (en) * 2021-08-04 2021-11-05 北京林业大学 End-to-end video digital watermarking system and method

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117094872A (en) * 2023-10-20 2023-11-21 中科亿海微电子科技(苏州)有限公司 FPGA processing system and method for digital watermarking
CN117094872B (en) * 2023-10-20 2023-12-26 中科亿海微电子科技(苏州)有限公司 FPGA processing system and method for digital watermarking
CN117395474A (en) * 2023-12-12 2024-01-12 法序(厦门)信息科技有限公司 Locally stored tamper-resistant video evidence obtaining and storing method and system
CN117395474B (en) * 2023-12-12 2024-02-27 法序(厦门)信息科技有限公司 Locally stored tamper-resistant video evidence obtaining and storing method and system
CN117974414A (en) * 2024-03-28 2024-05-03 中国人民解放军国防科技大学 Digital watermark signature verification method, device and equipment based on converged news material
CN117974414B (en) * 2024-03-28 2024-06-07 中国人民解放军国防科技大学 Digital watermark signature verification method, device and equipment based on converged news material

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication (application publication date: 20221011)