CN114727113B - Method and device for robust video watermarking in real-time scene

Method and device for robust video watermarking in real-time scene

Info

Publication number
CN114727113B
Authority
CN
China
Prior art keywords
watermark
video
encoder
network
frames
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210632330.7A
Other languages
Chinese (zh)
Other versions
CN114727113A (en)
Inventor
柯泽辉
吴庆耀
白剑
黄海亮
梁瑛玮
张海林
鲁和平
李长杰
陈焕然
李乐
王浩
洪行健
冷冬
丁一
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yifang Information Technology Co ltd
Original Assignee
Guangzhou Easefun Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Easefun Information Technology Co ltd filed Critical Guangzhou Easefun Information Technology Co ltd
Priority to CN202210632330.7A priority Critical patent/CN114727113B/en
Publication of CN114727113A publication Critical patent/CN114727113A/en
Application granted granted Critical
Publication of CN114727113B publication Critical patent/CN114727113B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/46Embedding additional information in the video signal during the compression process
    • H04N19/467Embedding additional information in the video signal during the compression process characterised by the embedded information being invisible, e.g. watermarking
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/70Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

The invention provides a method and a device for robust video watermarking in a real-time scene. The method comprises the following steps: constructing a deep neural network for training watermark embedding, the deep neural network comprising an encoder-decoder module; training the deep neural network so that the encoder-decoder module can losslessly embed and extract the watermark without video compression interference; training the deep neural network so that the encoder-decoder module can losslessly embed and extract the watermark under video compression interference; and sampling the original video to obtain continuous frames, then inputting the continuous frames and the watermark to be added into the encoder network to generate a watermarked video. The device uses the method. The invention guarantees the soundness and reliability of each training stage of the deep neural network, realizes robust video watermarking in a real-time scene, and solves the prior-art problem of deep neural networks in modeling compressed video.

Description

Method and device for robust video watermarking in real-time scene
Technical Field
The invention relates to the technical field of video, in particular to a method and a device for robust video watermarking in a real-time scene, and is particularly suitable for invisible video watermark embedding in a real-time environment.
Background
The purpose of video watermarking is to hide information in a video stream, by a method that is difficult to remove or tamper with, without degrading the quality of the video content. Application scenarios include video copyright protection, video fingerprint tracking and the like.
Conventional video watermark embedding methods fall roughly into three schemes according to the domain of embedding: the spatial domain (embedding in the original uncompressed video), the transform domain (embedding during video encoding and decoding), and the compressed domain (embedding in the compressed video stream). Beyond these conventional methods, the application of deep learning to watermarking has also drawn attention in recent years. At present, deep learning watermark embedding is mainly applied to images, in models such as HiDDeN and RedMark.
Conventional methods still perform well when the watermarked video is not attacked; however, classical watermarking techniques based on algorithms such as the discrete cosine transform (DCT) or the discrete wavelet transform are less resistant to video processing operations such as cropping and scaling. If a leaked video undergoes any of these geometric transformations, the watermark may be corrupted.
Existing deep-learning-based watermarking methods are strongly robust to such geometric transformations, but most of them are applied to images; for video, they have two shortcomings:
1. Video transmission inevitably compresses the original data, and this non-differentiable noise makes the deep neural network difficult to model during training, which is the greatest obstacle to migrating image methods to video;
2. Real-time performance is required: applying a watermark to video often must be done in real time, and the huge computation of deep neural networks makes existing deep learning models unable to meet this requirement.
Disclosure of Invention
Aiming at the above defects of the prior art, the invention provides a method and a device for robust video watermarking in a real-time scene, overcoming the difficulties that a deep neural network can hardly model compressed video and that its huge computation prevents it from meeting real-time requirements.
The technical scheme of the invention is realized as follows: a method for robust video watermarking in a real-time scene comprises the following steps:
constructing a deep neural network for training watermark embedding, the deep neural network comprising an encoder-decoder module;
training the deep neural network so that an encoder-decoder module can complete lossless embedding and extraction of the watermark without video compression interference;
training the deep neural network to enable an encoder-decoder module to complete lossless embedding and extraction of the watermark under the condition of video compression interference;
sampling and extracting an original video to obtain continuous frames, and inputting the continuous frames and a watermark to be added into an encoder network to generate a watermark video.
In one embodiment, the step of training the deep neural network so that the encoder-decoder module can perform lossless embedding and extraction of the watermark without video compression interference includes:
extracting an individual frame of a training set video, randomly generating a watermark, and inputting the individual frame and the watermark into an encoder network together to obtain a watermark video frame;
inputting the watermarked video frame to the decoder network both directly and after scrambling by a differentiable noise layer, to obtain two output decoded watermarks;
calculating the information loss of the two decoded watermarks and the original input watermark, and reversely transmitting the loss to the encoder-decoder module;
and repeating the steps until the encoder-decoder module can complete the lossless embedding and extraction of the watermark without video compression interference.
In one embodiment, the step of training the deep neural network so that the encoder-decoder module can perform lossless embedding and extraction of the watermark in the presence of video compression interference includes:
randomly extracting an individual frame of a training set video, randomly generating a watermark, and inputting the individual frame and the watermark into an encoder network together to obtain a watermark video frame;
inputting the watermarked video frame to the decoder network in three ways: directly, after scrambling by the differentiable noise layer, and after scrambling by video compression, to obtain three output decoded watermarks;
calculating the information loss between the three decoded watermarks and the original input watermark, back-propagating the loss of the input scrambled by the differentiable noise layer to the encoder-decoder module, and back-propagating the loss of the input scrambled by video compression to the decoder network only;
and repeating the steps until the encoder-decoder module can complete the lossless embedding and extraction of the watermark under the condition of video compression interference.
In one embodiment, the step of sampling and extracting the original video to obtain continuous frames, inputting the continuous frames and the watermark to be added into an encoder network, and generating the watermarked video includes:
acquiring the frame number N of an original video;
setting an embedding interval K and a continuous embedding frame number B;
sampling and extracting the original video according to a rule of extracting B frames at every K frames to obtain continuous frames;
and inputting the continuous frames and the watermark to be added into an encoder network to generate a watermark video.
In one embodiment, the method for sampling and extracting the original video to obtain the consecutive frames includes:
acquiring the resolution H multiplied by W of an original video;
setting a scale factor k;
calculating the resolution R = h × w of the corner cut block from the scale factor, wherein:
h = H × k, h being the height resolution of the corner cut block;
w = W × k, w being the width resolution of the corner cut block;
respectively extracting four corner blocks at the upper left corner, the upper right corner, the lower left corner and the lower right corner of the same original video frame;
extracting each corner cut block of the continuous video frame, and inputting the corner cut blocks and the watermark to be added into an encoder network to obtain the watermark corner cut blocks of the continuous video frame;
and covering the watermark corner cut block to the corresponding position of the corresponding original video frame to generate a watermark video.
In one embodiment, the training set video is obtained from the Hollywood2 data set, and constructing the deep neural network for training watermark embedding, the deep neural network comprising an encoder-decoder module, specifically includes:
designing and constructing a neural network model including an encoder-decoder module, using a MobileNet backbone.
The invention also provides a device for robust video watermarking in a real-time scene, which comprises:
a construction module for constructing a deep neural network for training watermark embedding, the deep neural network comprising an encoder-decoder module;
a first training module for training the deep neural network so that the encoder-decoder module can complete lossless embedding and extraction of the watermark without video compression interference;
the second training module is used for training the deep neural network so that the encoder-decoder module can complete lossless embedding and extraction of the watermark under the condition of video compression interference;
and the generating module is used for sampling and extracting the original video to obtain continuous frames, and inputting the continuous frames and the watermark to be added into an encoder network to generate the watermark video.
In one embodiment, the first training module is specifically configured to:
extracting an individual frame of a training set video, randomly generating a watermark, and inputting the individual frame and the watermark into an encoder network together to obtain a watermark video frame;
inputting the watermarked video frame to the decoder network both directly and after scrambling by a differentiable noise layer, to obtain two output decoded watermarks;
calculating the information loss of the two decoded watermarks and the original input watermark, and reversely transmitting the loss to the encoder-decoder module;
repeating the steps until the encoder-decoder module can complete the lossless embedding and extraction of the watermark without video compression interference;
the second training module is specifically configured to:
randomly extracting an individual frame of a training set video, randomly generating a watermark, and inputting the individual frame and the watermark into an encoder network together to obtain a watermark video frame;
inputting the watermarked video frame to the decoder network in three ways: directly, after scrambling by the differentiable noise layer, and after scrambling by video compression, to obtain three output decoded watermarks;
calculating the information loss between the three decoded watermarks and the original input watermark, back-propagating the loss of the input scrambled by the differentiable noise layer to the encoder-decoder module, and back-propagating the loss of the input scrambled by video compression to the decoder network only;
repeating the steps until the encoder-decoder module can complete the lossless embedding and extraction of the watermark under the condition of video compression interference;
the generation module is specifically configured to:
acquiring the frame number N of an original video;
setting an embedding interval K and a continuous embedding frame number B;
sampling and extracting the original video according to a rule of extracting B frames at every K frames to obtain continuous frames;
inputting the continuous frames and the watermark to be added into an encoder network to generate a watermark video;
or the generating module is specifically configured to:
acquiring the resolution H multiplied by W of an original video;
setting a scale factor k;
calculating the resolution R = h × w of the corner cut block from the scale factor, wherein:
h = H × k, h being the height resolution of the corner cut block;
w = W × k, w being the width resolution of the corner cut block;
respectively extracting four corner blocks at the upper left corner, the upper right corner, the lower left corner and the lower right corner of the same original video frame;
extracting each corner cut block of the continuous video frame, and inputting the corner cut blocks and the watermark to be added into an encoder network to obtain the watermark corner cut blocks of the continuous video frame;
and covering the watermark corner cut block to the corresponding position of the corresponding original video frame to generate a watermark video.
The invention also provides a computer device comprising a memory, a processor, and a computer program stored on the memory and runnable on the processor, wherein the processor, when executing the computer program, implements the method for robust video watermarking in a real-time scene.
The present invention also provides a computer storage medium having a computer program stored thereon, where the computer program is executed by a processor to implement the method for robust video watermarking in a real-time scenario.
The embodiment of the invention constructs the encoder network and the decoder network modularly within a deep learning network, and adds common video noise simulation in a noise simulation layer. The deep neural network is trained in two stages: first so that the encoder-decoder module can losslessly embed and extract the watermark without video compression interference, and then, building on that stage, so that it can do so under video compression interference. This guarantees the soundness and reliability of each training stage of the deep neural network, realizes robust video watermarking in a real-time scene, and solves the prior-art problem of deep neural networks in modeling compressed video. Meanwhile, the original video is sampled to obtain continuous frames, which are input to the encoder network together with the watermark to be added to generate the watermarked video, reducing the computation and meeting real-time requirements.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
Fig. 1 is a flowchart of a robust video watermarking method in a real-time scenario according to a first embodiment of the present invention;
FIG. 2 is a flow chart of a preferred embodiment of S14 in a first example of the present invention;
FIG. 3 is a schematic diagram of a scheme for extracting consecutive frames for embedding according to a preferred embodiment of S14 in a first example of the present invention;
FIG. 4 is a flowchart of a method for robust video watermarking in a real-time scenario according to a second embodiment of the present invention;
FIG. 5 is a diagram illustrating a deep neural network training process according to a second embodiment of the present invention;
fig. 6 is a schematic diagram of the scheme of embedding the watermark using corner cut blocks in S210-S214 according to the second embodiment of the present invention;
fig. 7 is a block diagram of a robust video watermarking apparatus in a real-time scenario according to a third embodiment of the present invention;
fig. 8 is a schematic view of the internal structure of a computer according to still another embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described clearly and completely with reference to the accompanying drawings in the embodiments of the present invention, and it is to be understood that the described embodiments are only a part of the embodiments of the present invention, and not all embodiments, and well-known modules, units and their connections, links, communications or operations with each other are not shown or described in detail. Furthermore, the described features, architectures, or functions may be combined in any manner in one or more embodiments. It should be understood by those skilled in the art that the following embodiments are illustrative only and are not intended to limit the scope of the present invention. It will also be readily understood that the modules or units or processes of the embodiments as described herein and illustrated in the figures may be combined and designed in a wide variety of different configurations. All other embodiments, which can be obtained by a person skilled in the art without inventive step based on the embodiments of the present invention, are within the scope of protection of the present invention.
The first embodiment:
referring to fig. 1 to fig. 3, an embodiment of the present invention discloses a method for robust video watermarking in a real-time scenario, including:
s11, constructing a deep neural network for training watermark embedding, wherein the deep neural network comprises an encoder-decoder module.
In the present embodiment, a neural network model including an encoder-decoder module is designed and constructed using a MobileNet backbone. The encoder-decoder module includes an encoder network and a decoder network: the encoder network generates a watermarked video frame from an input video frame and a watermark, and the decoder network decodes the watermarked video frame to obtain the decoded watermark.
And S12, training the deep neural network so that the encoder-decoder module can complete lossless embedding and extraction of the watermark without video compression interference.
In this step, through repeated training, the encoder-decoder module learns lossless embedding and extraction of the watermark in uncompressed video. Specifically, the network may be trained by adding, on top of common image perturbation noise, perturbations including but not limited to random video cropping and video combination.
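As a rough illustration, the perturbations described above (additive image noise plus a random crop) can be sketched as follows. The crop ratio, noise level, and function name are hypothetical, and a real noise layer would operate on autograd tensors rather than NumPy arrays:

```python
import numpy as np

rng = np.random.default_rng(0)

def noise_layer(frame, crop_ratio=0.9, sigma=0.02):
    """Stand-in for the differentiable noise layer: additive Gaussian
    noise followed by a random crop. Both operations have well-defined
    gradients with respect to the input pixels, so losses computed
    downstream can be back-propagated to the encoder."""
    noisy = frame + rng.normal(0.0, sigma, size=frame.shape)
    h, w = frame.shape[:2]
    ch, cw = int(h * crop_ratio), int(w * crop_ratio)
    top = int(rng.integers(0, h - ch + 1))
    left = int(rng.integers(0, w - cw + 1))
    return noisy[top:top + ch, left:left + cw]

watermarked_frame = np.zeros((64, 64, 3))
perturbed = noise_layer(watermarked_frame)
```

Video combination (mixing frames from different clips) would be one more perturbation of the same kind, applied before the decoder in the same way.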
And S13, training the deep neural network so that the encoder-decoder module can complete lossless embedding and extraction of the watermark under the condition of video compression interference.
On the basis of S12, after the neural network completes the lossless embedding extraction training of the uncompressed video watermark, the lossless embedding extraction training is further carried out on the compressed video watermark, so that the rationality and reliability of each training stage of the deep neural network are guaranteed, and the method for robust video watermark in a real-time scene is realized. Compression methods include, but are not limited to, MPEG compression.
S14, sampling and extracting the original video to obtain continuous frames, and inputting the continuous frames and the watermark to be added into an encoder network to generate a watermark video.
After the training of the neural network is completed in S11-S13, the network can be used to embed and extract watermarks. Based on the real-time requirement of watermark embedding and the perturbation of original frames by inter-frame video compression, the invention proposes extracting continuous frames at intervals for embedding. S14 further includes S141-S144, in which:
s141, acquiring the frame number N of the original video.
As an example and not by way of limitation, this step assumes that the number of frames of the original video is 100000 frames.
S142, an embedding interval K and a continuous embedding frame number B are set.
As an example and not by way of limitation, this step assumes K = 500 and B = 10.
S143, sampling and extracting the original video according to the rule of extracting the B frames at every K frames to obtain continuous frames.
Corresponding to the specific example of S141-S142, this step embeds the watermark every 500 frames, 10 consecutive frames at a time, and all watermark-embedded frames are regarded as the continuous frames.
And S144, inputting the continuous frames and the watermark to be added into an encoder network to generate a watermark video.
S141-S144 greatly reduce the computation of the neural network by sampling the original video at intervals for watermark embedding, and can meet the real-time requirement of watermark embedding.
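The interval-sampling rule of S141-S144 can be sketched as follows. The text does not state whether the first embedding burst starts at frame 0, so this sketch assumes it does; the function name is illustrative:

```python
def embedded_frame_indices(n_frames, interval_k, burst_b):
    """Indices of frames selected for watermark embedding: every
    `interval_k` frames, embed into `burst_b` consecutive frames."""
    indices = []
    for start in range(0, n_frames, interval_k):
        indices.extend(range(start, min(start + burst_b, n_frames)))
    return indices

# Concrete numbers from S141-S142: N = 100000, K = 500, B = 10.
idx = embedded_frame_indices(100000, 500, 10)
```

With these parameters only 2000 of the 100000 frames (2%) pass through the encoder network, which is where the computational saving comes from.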
In this embodiment, since watermark extraction has no real-time requirement, the watermark can be extracted across frames according to the originally set parameters N, K and B, or frame by frame.
With the scheme of S141-S144 of extracting continuous frames for watermark embedding, cross-frame extraction can be performed with reference to the originally set parameters N, K and B, achieving a watermark that is invisible yet detectable. This makes the video watermark more concealed, effectively prevents malicious removal by a third party, and serves purposes such as video copyright protection and video fingerprint tracking.
Second embodiment:
referring to fig. 4 to fig. 6, an embodiment of the present invention discloses another method for robust video watermarking in a real-time scenario. The method comprises the following steps:
s201, constructing a deep neural network for training watermark embedding, wherein the deep neural network comprises an encoder-decoder module.
S201 is the same as the corresponding step of the first embodiment, and is not described again here.
S202, extracting the single frame of the training set video, randomly generating the watermark, and inputting the single frame and the watermark into an encoder network together to obtain the watermark video frame.
As a preferred but non-limiting scheme, the training set video is obtained from the Hollywood2 data set and batch_size is set to 12. This step randomly extracts training video frames and randomly generates watermarks of random bit counts, which are input to the encoder network together; this preserves the randomness and generality of neural network training and guarantees the reliability of subsequent watermark embedding and extraction.
S203, the watermark video frame is scrambled by the direct input and the differentiable noise layer and then input to a decoder network to obtain two output decoding watermarks.
The scrambling performed by the differentiable noise layer in this embodiment may add, on top of common image perturbation noise, perturbations including but not limited to random video cropping and video combination. After the decoder network output of this step, the noise-free watermark and the differentiable-noise watermark are obtained.
S204, calculating the information loss of the two decoded watermarks and the original input watermark, and reversely transmitting the loss to the encoder-decoder module.
This step compares the original input watermark with the noise-free watermark and the differentiable-noise watermark obtained in S203, calculates the loss functions, and back-propagates them to the encoder-decoder module for correction.
And S205, repeating the steps S202-S204 until the encoder-decoder module can complete the lossless embedding and extraction of the watermark without video compression interference.
When the similarity of both the noise-free watermark and the differentiable-noise watermark to the original input watermark meets a threshold, the encoder-decoder module is determined to be capable of lossless embedding and extraction of the watermark without video compression interference.
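The stopping check above can be sketched as a bit-accuracy comparison. The 0.99 threshold, the binary watermark format, and the function name are hypothetical, since the text does not specify them:

```python
import numpy as np

def watermark_recovered(original_bits, decoded_bits, threshold=0.99):
    """Fraction of watermark bits recovered exactly after rounding the
    decoder's soft outputs; training stops once every decoded watermark
    clears the (hypothetical) threshold."""
    original = np.asarray(original_bits)
    decoded = np.round(np.asarray(decoded_bits, dtype=float))
    bit_accuracy = float(np.mean(original == decoded))
    return bit_accuracy >= threshold

original = np.array([1, 0, 1, 1, 0, 0, 1, 0])
noise_free_ok = watermark_recovered(
    original, [0.9, 0.1, 0.8, 1.0, 0.2, 0.1, 0.7, 0.0])
```

The same check would be applied to the differentiable-noise watermark (and, in S209, to the non-differentiable-noise watermark as well).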
As an improvement, but not a limitation, of the invention, S206-S209 below enable the encoder-decoder module to losslessly embed and extract the watermark under video compression interference (a non-differentiable computation).
S206, randomly extracting the single frame of the training set video, randomly generating the watermark, and inputting the single frame and the watermark into an encoder network together to obtain the watermark video frame.
This step is the same as S202, except that the encoder-decoder module at this time can already complete the lossless embedding and extraction of the watermark without video compression interference.
S207, the watermarked video frame is input to the decoder network in three ways: directly, after scrambling by the differentiable noise layer, and after scrambling by video compression, obtaining three output decoded watermarks.
The video compression scrambling in this step includes but is not limited to MPEG compression. After the decoder network output of this step, the noise-free watermark, the differentiable-noise watermark and the non-differentiable-noise watermark are obtained.
S208, calculating the information loss between the three decoded watermarks and the original input watermark, back-propagating the loss of the input scrambled by the differentiable noise layer to the encoder-decoder module, and back-propagating the loss of the input scrambled by video compression to the decoder network only.
In this step, the original input watermark is compared with the noise-free watermark, the differentiable-noise watermark and the non-differentiable-noise watermark obtained in S207, the loss functions are calculated, and the losses are back-propagated as above for correction.
S209, repeating steps S206-S208 until the encoder-decoder module can complete lossless embedding and extraction of the watermark under video compression interference.
When the similarity between each of the noise-free watermark, the differentiable-noise watermark and the non-differentiable-noise watermark and the original input watermark is judged to meet the threshold value, the encoder-decoder module is determined to be capable of lossless embedding and extraction of the watermark under video compression interference.
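The loss routing of S207-S208 can be sketched as follows: because the video compression path is non-differentiable, its loss is propagated back only to the decoder, while the differentiable paths update the whole encoder-decoder module. This is a minimal illustration assuming an MSE message loss; the function and field names are hypothetical, not taken from the embodiment.

```python
import numpy as np

def route_losses(dec_clean, dec_diff_noise, dec_video_noise, target):
    """Per-path watermark losses of S208 (MSE is an illustrative choice).
    The differentiable paths update encoder and decoder; the loss of the
    non-differentiable video-compression path reaches the decoder only."""
    mse = lambda a, b: float(np.mean((a - b) ** 2))
    return {
        "clean":       {"loss": mse(dec_clean, target),       "updates": ("encoder", "decoder")},
        "diff_noise":  {"loss": mse(dec_diff_noise, target),  "updates": ("encoder", "decoder")},
        "video_noise": {"loss": mse(dec_video_noise, target), "updates": ("decoder",)},
    }
```

In a deep-learning framework the same routing is typically achieved by detaching the compression path from the computation graph before the decoder, so gradients cannot flow back into the encoder.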
Based on the requirements of real-time watermark embedding and watermark extraction accuracy, the invention further provides a multi-corner-cut embedding method for real-time embedding, comprising S210-S215, wherein:
S210, acquiring the resolution H × W of the original video.
As an example and not by way of limitation, this step assumes that the resolution of the original video is 1920 x 1080.
S211, setting a scale factor k, and calculating the resolution R = h × w of the corner cut block from the scale factor, wherein:
h = H × k, h being the height resolution of the corner cut block;
w = W × k, w being the width resolution of the corner cut block.
As an example and not by way of limitation, this step assumes that k = 0.05, giving a corner cut block resolution R = 96 × 54.
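The corner cut block size calculation of S211 can be sketched as follows; truncation to whole pixels is an assumption, since the text's example values are exact and no rounding mode is stated.

```python
def corner_block_resolution(H, W, k):
    """Corner cut block resolution R = h x w, with h = H * k and w = W * k (S211).
    Truncation to whole pixels is an assumption."""
    return int(H * k), int(W * k)

# With the example values from the text: a 1920 x 1080 video and k = 0.05
h, w = corner_block_resolution(1920, 1080, 0.05)  # (96, 54)
```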
S212, four corner cutting blocks are respectively extracted from the upper left corner, the upper right corner, the lower left corner and the lower right corner of the same original video frame.
In this step, the S212 operation may be performed on all frames of the original video, extracting four corner cut blocks from each video frame.
And S213, extracting each corner cut block of the continuous video frame, and inputting the corner cut blocks and the watermark to be added into an encoder network to obtain the watermark corner cut blocks of the continuous video frame.
In S213, watermarked images of the corner cut blocks of the original video frames are obtained; for the same video frame, watermarked images are obtained at the four corresponding corners.
And S214, covering the watermark corner cut block to the corresponding position of the corresponding original video frame to generate a watermark video.
In this step, the watermark of the watermarked video also exists in an invisible but detectable form, and this embodiment does not limit the specific manner of the overlay.
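Extracting the four corner cut blocks (S212) and covering the watermarked blocks back onto their positions (S214) can be sketched with array slicing. This is an illustrative sketch using row/column indexing on a frame array, not the embodiment's actual implementation; the function names are hypothetical.

```python
import numpy as np

def corner_slices(height, width, h, w):
    """Row/column slices of the four corner cut blocks:
    top-left, top-right, bottom-left, bottom-right (S212)."""
    return [
        (slice(0, h), slice(0, w)),
        (slice(0, h), slice(width - w, width)),
        (slice(height - h, height), slice(0, w)),
        (slice(height - h, height), slice(width - w, width)),
    ]

def overlay_corners(frame, watermarked_blocks, h, w):
    """Cover each watermarked corner cut block back onto its position (S214)."""
    out = frame.copy()
    height, width = frame.shape[:2]
    for (rs, cs), blk in zip(corner_slices(height, width, h, w), watermarked_blocks):
        out[rs, cs] = blk
    return out
```

Only the four small blocks pass through the encoder network, which is what reduces the computation compared with encoding the full frame.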
S215, when extracting the watermark, calculating the corner cut positions from H, W and k, and taking the mode of the extraction results of the 4 corner cut blocks to obtain a more accurate extracted watermark.
Corresponding to the corner-cut embedding scheme of S210-S214, the corner cut positions can be calculated from H, W and k at extraction time, and taking the mode of the extraction results of the 4 corner cut blocks yields a more accurate extracted watermark.
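Taking the mode of the 4 corner-block extraction results (S215) amounts to a per-bit majority vote over the decoded watermark bits. The tie-breaking rule below (a 2-2 tie resolves to 1) is an assumption, since with four voters the text does not specify one.

```python
import numpy as np

def mode_of_corner_watermarks(bits_per_corner):
    """Per-bit majority vote ('mode') over the 4 corner extraction results (S215).
    Resolving a 2-2 tie toward 1 is an assumption."""
    votes = np.stack(bits_per_corner)          # shape (num_corners, num_bits)
    return (2 * votes.sum(axis=0) >= votes.shape[0]).astype(int)
```

This redundancy is what lets the watermark survive even when one corner is damaged, for example by cropping or an overlaid logo.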
This embodiment discloses another neural network that simultaneously realizes embedding and extraction of differentiable and non-differentiable watermarks, together with its training mode. It improves the robustness of the model to common video noise types and ensures that the watermark embedded into the video is invisible yet detectable while preserving timeliness. Meanwhile, by embedding watermarks in real time into the cut corners of the video frames, the requirements of real-time embedding and extraction accuracy can be met while reducing the amount of computation of the neural network.
It should be noted that those skilled in the art may choose neural network models of different depths and architectures according to the real-time/accuracy/robustness trade-off, and different parameters of this embodiment may be adjusted to make different real-time trade-offs.
The third embodiment:
Referring to fig. 7, the present invention further provides an apparatus 100 for robust video watermarking in a real-time scene, including a constructing module 110, a first training module 120, a second training module 130, and a generating module 140, wherein:
a constructing module 110 connected to the first training module 120, for constructing a deep neural network for training watermark embedding, the deep neural network including an encoder-decoder module;
the first training module 120 is connected with the second training module 130 and is used for training the deep neural network so that the encoder-decoder module can complete the lossless embedding and extraction of the watermark without video compression interference;
a second training module 130, connected to the generating module 140, for training the deep neural network so that the encoder-decoder module can perform lossless embedding and extraction of the watermark in the presence of video compression interference;
the generating module 140 is configured to sample and extract an original video to obtain consecutive frames, and input the consecutive frames and a watermark to be added to an encoder network to generate a watermarked video.
As a preferred solution, but not limited thereto, the first training module 120 is specifically configured to:
extracting an individual frame of a training set video, randomly generating a watermark, and inputting the individual frame and the watermark into an encoder network together to obtain a watermark video frame;
respectively scrambling watermark video frames through a direct input layer and a differentiable noise layer and then inputting the scrambled watermark video frames into a decoder network to obtain two output decoding watermarks;
calculating the information loss of the two decoded watermarks and the original input watermark, and reversely transmitting the loss to the encoder-decoder module;
repeating the steps until the encoder-decoder module can complete the lossless embedding and extraction of the watermark without video compression interference;
the second training module 130 is specifically configured to:
randomly extracting an individual frame of a training set video, randomly generating a watermark, and inputting the individual frame and the watermark into an encoder network together to obtain a watermark video frame;
inputting the watermark video frames into the decoder network respectively via direct input, via scrambling by the differentiable noise layer, and via video compression scrambling, to obtain three output decoded watermarks;
calculating the information loss of the three decoding watermarks and the original input watermark, reversely transmitting the input loss after the differentiable noise layer is disturbed to an encoder-decoder module, and reversely transmitting the input loss after the video compression is disturbed to a decoder network;
repeating the steps until the encoder-decoder module can complete the lossless embedding and extraction of the watermark under the condition of video compression interference;
the generating module 140 is specifically configured to:
acquiring the frame number N of an original video;
setting an embedding interval K and a continuous embedding frame number B;
sampling and extracting the original video according to a rule of extracting B frames at every K frames to obtain continuous frames;
inputting the continuous frames and the watermark to be added into an encoder network to generate a watermark video;
or the generating module 140 is specifically configured to:
acquiring the resolution H multiplied by W of an original video;
setting a scale factor k;
calculating the resolution R = h × w of the corner cut block from the scale factor, wherein:
h = H × k, h being the height resolution of the corner cut block;
w = W × k, w being the width resolution of the corner cut block;
respectively extracting four corner blocks at the upper left corner, the upper right corner, the lower left corner and the lower right corner of the same original video frame;
extracting each corner cut block of the continuous video frame, and inputting the corner cut blocks and the watermark to be added into an encoder network to obtain the watermark corner cut blocks of the continuous video frame;
and covering the watermark corner cut block to the corresponding position of the corresponding original video frame to generate a watermark video.
The modules of this embodiment correspond one-to-one to the steps of the two method embodiments, and details are not repeated here.
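The frame-sampling rule of the generating module 140 ("extract B frames at every K frames") can be sketched as follows. Taking the B frames at the start of each window of K frames (with B ≤ K) is one interpretation of the rule; the exact phase of the sampling is not specified by the text.

```python
def sample_frames(N, K, B):
    """Frame indices under the rule 'extract B frames at every K frames'.
    Taking the B frames at the start of each window of K frames is an
    interpretation assumption (B <= K)."""
    indices = []
    for start in range(0, N, K):
        indices.extend(range(start, min(start + B, N)))
    return indices
```

Only the sampled frames are passed through the encoder network, so larger K (or smaller B) trades watermark density for lower computation.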
The embodiment of the invention constructs the encoder network and the decoder network in a deep learning network in a modularized manner, and adds common video noise simulation in a noise simulation layer. The deep neural network is trained in stages so that the encoder-decoder module first completes lossless embedding and extraction of the watermark without video compression interference, and this stage then serves as the basis for completing lossless embedding and extraction under video compression interference, ensuring the rationality and reliability of each training stage. This realizes a method for robust video watermarking in a real-time scene and solves the problems of the prior art in modeling compressed video with a deep neural network. Meanwhile, the original video is sampled to obtain continuous frames, and the continuous frames and the watermark to be added are input into the encoder network to generate the watermark video, which reduces the amount of calculation and meets the real-time requirement.
It is obvious to those skilled in the art that, for convenience and simplicity of description, the above division of each functional module is only used for illustration, and in practical applications, the above function distribution may be performed by different functional modules as needed, that is, the internal structure of the device is divided into different functional modules to perform all or part of the above described functions. For the specific working processes of the system, the apparatus and the unit described above, reference may be made to the corresponding processes in the foregoing method embodiments, and details are not described here again.
Embodiments of the present invention further provide a computer storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the method for robust video watermarking in a real-time scenario as in the foregoing embodiments.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above may be implemented by a computer program, which may be stored in a non-volatile computer-readable storage medium, and when executed, may include the processes of the embodiments of the method for robust video watermarking in the real-time scenarios described above. Any reference to memory, storage, database or other medium used in the embodiments provided herein can include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
Alternatively, the integrated unit of the present invention may be stored in a computer-readable storage medium if it is implemented in the form of a software functional module and sold or used as a separate product. Based on such understanding, the technical solutions of the embodiments of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a terminal, or a network device) to execute all or part of the methods of the embodiments of the present invention. And the aforementioned storage medium includes: a removable storage device, a RAM, a ROM, a magnetic or optical disk, or various other media that can store program code.
Corresponding to the computer storage medium, in an embodiment, there is also provided a computer device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the computer program to implement the method for robust video watermarking in a real-time scenario as in the embodiments described above.
The computer device may be a terminal, and its internal structure diagram may be as shown in fig. 8. The computer device comprises a processor, a memory, a network interface, a display screen and an input device which are connected through a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement a method for robust video watermarking in a real-time scenario. The display screen of the computer equipment can be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer equipment can be a touch layer covered on the display screen, a key, a track ball or a touch pad arranged on a shell of the computer equipment, an external keyboard, a touch pad or a mouse and the like.
The embodiment of the invention constructs the encoder network and the decoder network in a deep learning network in a modularized manner, and adds common video noise simulation in a noise simulation layer. The deep neural network is trained in stages so that the encoder-decoder module first completes lossless embedding and extraction of the watermark without video compression interference, which then serves as the basis for completing lossless embedding and extraction under video compression interference, thereby ensuring the rationality and reliability of each training stage, realizing the method for robust video watermarking in a real-time scene, and solving the problems of the prior art in modeling compressed video with a deep neural network. Meanwhile, the original video is sampled to obtain continuous frames, and the continuous frames and the watermark to be added are input into the encoder network to generate the watermark video, reducing the amount of calculation and meeting the real-time requirement.
For the sake of brevity, not all possible combinations of the technical features in the above embodiments are described; nevertheless, as long as a combination of these technical features involves no contradiction, it should be considered within the scope of the present disclosure.
The above examples show only some embodiments of the present invention, and while their description is specific and detailed, it should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the inventive concept, and these fall within the scope of the present invention. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (6)

1. A method for robust video watermarking in a real-time scene is characterized by comprising the following steps:
constructing a deep neural network for training watermark embedding, the deep neural network comprising an encoder-decoder module;
training the deep neural network so that an encoder-decoder module can complete lossless embedding and extraction of the watermark without video compression interference;
training the deep neural network to enable an encoder-decoder module to complete lossless embedding and extraction of the watermark under the condition of video compression interference;
sampling and extracting an original video to obtain continuous frames, and inputting the continuous frames and a watermark to be added into an encoder network to generate a watermark video;
the step of training the deep neural network so that the encoder-decoder module can complete the lossless embedding extraction of the watermark without video compression interference comprises the following steps:
extracting an individual frame of a training set video, randomly generating a watermark, and inputting the individual frame and the watermark into an encoder network together to obtain a watermark video frame;
respectively scrambling watermark video frames through a direct input layer and a differentiable noise layer and then inputting the scrambled watermark video frames into a decoder network to obtain two output decoding watermarks;
calculating the information loss of the two decoded watermarks and the original input watermark, and reversely transmitting the loss to the encoder-decoder module;
repeating the steps until the encoder-decoder module can finish the lossless embedding and extraction of the watermark without video compression interference;
the step of training the deep neural network so that an encoder-decoder module can complete lossless embedding and extraction of the watermark under the interference of video compression comprises the following steps:
when the encoder-decoder module can finish the lossless embedding and extraction of the watermark under the condition of no video compression interference, randomly extracting the single frame of the training set video, randomly generating the watermark, and inputting the single frame and the watermark into an encoder network together to obtain a watermark video frame;
inputting the watermark video frames into the decoder network respectively via direct input, via scrambling by the differentiable noise layer, and via video compression scrambling, to obtain three output decoded watermarks;
calculating the information loss of the three decoding watermarks and the original input watermark, reversely transmitting the input loss after the differentiable noise layer is disturbed to an encoder-decoder module, and reversely transmitting the input loss after the video compression is disturbed to a decoder network;
repeating the steps until the encoder-decoder module can complete the lossless embedding and extraction of the watermark under the condition of video compression interference;
the step of sampling and extracting the original video to obtain continuous frames, inputting the continuous frames and the watermark to be added into an encoder network, and generating the watermark video comprises the following steps:
acquiring the resolution H multiplied by W of an original video;
setting a scale factor k;
calculating the resolution R = h × w of the corner cut block from the scale factor, wherein:
h = H × k, h being the height resolution of the corner cut block;
w = W × k, w being the width resolution of the corner cut block;
respectively extracting four corner cutting blocks at the upper left corner, the upper right corner, the lower left corner and the lower right corner of the same original video frame;
extracting each corner cut block of the continuous video frame, and inputting the corner cut blocks and the watermark to be added into an encoder network to obtain the watermark corner cut blocks of the continuous video frame;
and covering the watermark corner cut block to the corresponding position of the corresponding original video frame to generate a watermark video, wherein the watermark exists in an invisible but detectable form.
2. The method of claim 1, wherein the step of sampling and extracting the original video to obtain consecutive frames, inputting the consecutive frames and the watermark to be added into an encoder network, and generating the watermarked video comprises:
acquiring the frame number N of an original video;
setting an embedding interval K and a continuous embedding frame number B;
sampling and extracting the original video according to a rule of extracting B frames at every K frames to obtain continuous frames;
and inputting the continuous frames and the watermark to be added into an encoder network to generate a watermark video.
3. The method as claimed in claim 2, wherein the training set video is obtained from the Hollywood2 data set, and the step of constructing a deep neural network for training watermark embedding, the deep neural network including an encoder-decoder module, is specifically:
designing and constructing a neural network model including an encoder-decoder module using MobileNet as the backbone.
4. An apparatus for robust video watermarking in a real-time scenario, comprising:
a construction module for constructing a deep neural network for training watermark embedding, the deep neural network comprising an encoder-decoder module;
a first training module, for training the deep neural network so that the encoder-decoder module can complete the lossless embedding and extraction of the watermark without video compression interference;
the second training module is used for training the deep neural network so that the encoder-decoder module can complete lossless embedding and extraction of the watermark under the condition of video compression interference;
the generating module is used for sampling and extracting an original video to obtain continuous frames, inputting the continuous frames and the watermark to be added into an encoder network and generating a watermark video;
the first training module is specifically configured to:
extracting an individual frame of a training set video, randomly generating a watermark, and inputting the individual frame and the watermark into an encoder network together to obtain a watermark video frame;
respectively scrambling watermark video frames through a direct input layer and a differentiable noise layer and then inputting the scrambled watermark video frames into a decoder network to obtain two output decoding watermarks;
calculating the information loss of the two decoded watermarks and the original input watermark, and reversely transmitting the loss to the encoder-decoder module;
repeating the steps until the encoder-decoder module can complete the lossless embedding and extraction of the watermark without video compression interference;
the second training module is specifically configured to:
when the encoder-decoder module can finish the lossless embedding and extraction of the watermark under the condition of no video compression interference, randomly extracting the single frame of the training set video, randomly generating the watermark, and inputting the single frame and the watermark into an encoder network together to obtain a watermark video frame;
inputting the watermark video frames into the decoder network respectively via direct input, via scrambling by the differentiable noise layer, and via video compression scrambling, to obtain three output decoded watermarks;
calculating the information loss of the three decoding watermarks and the original input watermark, reversely transmitting the input loss after the differentiable noise layer is disturbed to an encoder-decoder module, and reversely transmitting the input loss after the video compression is disturbed to a decoder network;
repeating the steps until the encoder-decoder module can complete the lossless embedding and extraction of the watermark under the interference of video compression;
the generation module is specifically configured to:
acquiring the frame number N of an original video;
setting an embedding interval K and a continuous embedding frame number B;
sampling and extracting the original video according to a rule of extracting B frames at every K frames to obtain continuous frames;
inputting the continuous frames and the watermark to be added into an encoder network to generate a watermark video;
or the generating module is specifically configured to:
acquiring the resolution H multiplied by W of an original video;
setting a scale factor k;
calculating the resolution R = h × w of the corner cut block from the scale factor, wherein:
h = H × k, h being the height resolution of the corner cut block;
w = W × k, w being the width resolution of the corner cut block;
respectively extracting four corner cutting blocks at the upper left corner, the upper right corner, the lower left corner and the lower right corner of the same original video frame;
extracting each corner cut block of the continuous video frame, and inputting the corner cut blocks and the watermark to be added into an encoder network to obtain the watermark corner cut blocks of the continuous video frame;
and covering the watermark corner cut block to the corresponding position of the corresponding original video frame to generate a watermark video, wherein the watermark exists in an invisible but detectable form.
5. A computer device comprising a memory, a processor and a computer program stored on the memory and running on the processor, characterized in that the processor when executing the computer program implements the method for robust video watermarking in real-time scenarios as claimed in any one of claims 1 to 3.
6. A computer storage medium having a computer program stored thereon, which program, when being executed by a processor, is adapted to carry out the method for robust video watermarking in real-time scenarios according to any of the claims 1 to 3.
CN202210632330.7A 2022-06-07 2022-06-07 Method and device for robust video watermarking in real-time scene Active CN114727113B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210632330.7A CN114727113B (en) 2022-06-07 2022-06-07 Method and device for robust video watermarking in real-time scene

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210632330.7A CN114727113B (en) 2022-06-07 2022-06-07 Method and device for robust video watermarking in real-time scene

Publications (2)

Publication Number Publication Date
CN114727113A CN114727113A (en) 2022-07-08
CN114727113B true CN114727113B (en) 2022-10-11

Family

ID=82233064

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210632330.7A Active CN114727113B (en) 2022-06-07 2022-06-07 Method and device for robust video watermarking in real-time scene

Country Status (1)

Country Link
CN (1) CN114727113B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111491170A (en) * 2019-01-26 2020-08-04 华为技术有限公司 Method for embedding watermark and watermark embedding device
CN111954086A (en) * 2020-08-19 2020-11-17 浙江无极互联科技有限公司 Invisible video copyright watermarking algorithm
CN114549273A (en) * 2022-02-28 2022-05-27 中山大学 Self-adaptive robust watermark embedding method and system based on deep neural network

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20220335560A1 (en) * 2019-12-05 2022-10-20 Google Llc Watermark-Based Image Reconstruction

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111491170A (en) * 2019-01-26 2020-08-04 华为技术有限公司 Method for embedding watermark and watermark embedding device
CN111954086A (en) * 2020-08-19 2020-11-17 浙江无极互联科技有限公司 Invisible video copyright watermarking algorithm
CN114549273A (en) * 2022-02-28 2022-05-27 中山大学 Self-adaptive robust watermark embedding method and system based on deep neural network

Also Published As

Publication number Publication date
CN114727113A (en) 2022-07-08

Similar Documents

Publication Publication Date Title
Jia et al. Mbrs: Enhancing robustness of dnn-based watermarking by mini-batch of real and simulated jpeg compression
US7058200B2 (en) Method for the prior monitoring of the detectability of a watermarking signal
CN115660931A (en) Robust watermarking method based on Transformer and denoising diffusion model
US20230008085A1 (en) Method for embedding watermark in video data and apparatus, method for extracting watermark in video data and apparatus, device, and storage medium
Chen et al. JSNet: a simulation network of JPEG lossy compression and restoration for robust image watermarking against JPEG attack
CN116233445B (en) Video encoding and decoding processing method and device, computer equipment and storage medium
Chang et al. An effective image self-recovery based fragile watermarking using self-adaptive weight-based compressed AMBTC
Neekhara et al. FaceSigns: semi-fragile neural watermarks for media authentication and countering deepfakes
CN115482142A (en) Dark watermark adding method, extracting method, system, storage medium and terminal
Wei et al. A robust image watermarking approach using cycle variational autoencoder
Wang et al. Digital video steganalysis by subtractive prediction error adjacency matrix
Liu et al. Adaptive feature calculation and diagonal mapping for successive recovery of tampered regions
Hu et al. A robust and secure blind color image watermarking scheme based on contourlet transform and Schur decomposition
CN114727113B (en) Method and device for robust video watermarking in real-time scene
Zhang et al. A robust and high-efficiency blind watermarking method for color images in the spatial domain
Maity et al. Genetic algorithms for optimality of data hiding in digital images
Chen et al. Learning iterative neural optimizers for image steganography
JP4945541B2 (en) Digital watermark embedding detection method using degraded host signal
CN115829819A (en) Neural network-based image robust reversible information hiding method, device and medium
Soualmi et al. A blind watermarking approach based on hybrid Imperialistic Competitive Algorithm and SURF points for color Images’ authentication
Butora et al. Side-informed steganography for jpeg images by modeling decompressed images
CN114900701A (en) Video digital watermark embedding and extracting method and system based on deep learning
CN114493971A (en) Media data conversion model training and digital watermark embedding method and device
CN117156152A (en) Model training method, encoding method, decoding method and equipment
Yang et al. A novel semi-fragile watermarking technique for image authentication

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
CP03 Change of name, title or address
CP03 Change of name, title or address

Address after: Room 402, No. 66, North Street, University Town Center, Panyu District, Guangzhou City, Guangdong Province, 510006

Patentee after: Yifang Information Technology Co.,Ltd.

Address before: 510006 room 402, No. 66 (innovation building), North Central Street, University City, Panyu District, Guangzhou, Guangdong Province

Patentee before: GUANGZHOU EASEFUN INFORMATION TECHNOLOGY Co.,Ltd.