CN114549270A - Anti-shooting monitoring video watermarking method combining depth robust watermarking and template synchronization - Google Patents

Anti-shooting monitoring video watermarking method combining depth robust watermarking and template synchronization

Info

Publication number
CN114549270A
CN114549270A
Authority
CN
China
Prior art keywords
image
watermark
video
watermarking
robust
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210109380.7A
Other languages
Chinese (zh)
Inventor
孙一言
倪江群
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sun Yat Sen University
Original Assignee
Sun Yat Sen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sun Yat Sen University filed Critical Sun Yat Sen University
Priority to CN202210109380.7A priority Critical patent/CN114549270A/en
Publication of CN114549270A publication Critical patent/CN114549270A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00General purpose image data processing
    • G06T1/0021Image watermarking
    • G06T1/005Robust watermarking, e.g. average attack or collusion attack resistant
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00Details of television systems
    • H04N5/222Studio circuitry; Studio devices; Studio equipment
    • H04N5/262Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects ; Cameras specially adapted for the electronic generation of special effects

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Image Processing (AREA)
  • Editing Of Facsimile Originals (AREA)

Abstract

Addressing the limitations of the prior art, the invention provides an anti-shooting surveillance video watermarking method combining deep robust watermarking and template synchronization. The method exploits the fact that part of the background content in a surveillance video is usually essentially unchanged, and selects part of the background image as the watermark carrier. The deep robust watermark network used can embed into the carrier image a robust watermark that is invisible to the naked eye and resistant to camera capture. Meanwhile, by increasing the watermark embedding cost in simple-texture regions, the network is guided to embed the watermark information in complex-texture regions as far as possible, overcoming StegaStamp's drawback of leaving obvious embedding traces in smooth image regions and significantly improving the visual quality of the watermark.

Description

Anti-shooting monitoring video watermarking method combining depth robust watermarking and template synchronization
Technical Field
The invention relates to the technical field of multimedia content security, in particular to camera-capture traceability for surveillance video, and more particularly to an anti-shooting surveillance video watermarking method combining deep robust watermarking and template synchronization.
Background
Most existing surveillance systems display source information on the monitoring screen as visible watermarks: for example, time information shown at the upper-left corner of the screen is overlaid directly on the surveillance video content as plain text in white or other colored fonts. When someone uses a camera or similar device to record the content shown on the monitoring screen, the watermark information is captured along with it, so the recorded video can be traced back to its source from that information. Although such watermarks are easy to generate, they unavoidably degrade the visual quality of the whole surveillance video; moreover, because the watermark information is visible to the naked eye, a pirate can easily destroy it by post-processing or similar means, making the video difficult to trace.
A Chinese patent published on 2016.12.07, "A robust video watermarking method resisting geometric attacks based on SIFT", provides a robust video watermarking scheme. Although existing robust video watermarking methods of this kind can resist slight geometric attacks to a certain extent, during camera recapture the shooting angle and the size of the video content are arbitrary, and severe angle changes occur easily. In such scenarios these methods are prone to synchronization-position errors and missed synchronization during watermark synchronization; the prior art therefore does not effectively solve the watermark synchronization problem and has certain limitations.
In the field of image watermarking there is a deep-learning-based blind watermarking method resistant to printing and shooting, StegaStamp; however, that method does not adaptively select the watermark embedding position, so when the model embeds information in smooth regions of an image it leaves relatively obvious traces, seriously degrading the visual quality of the watermarked image. In addition, the method handles only image watermarks and cannot be applied to video watermarking.
Disclosure of Invention
Aiming at the limitations of the prior art, the invention provides an anti-shooting surveillance video watermarking method combining deep robust watermarking and template synchronization. The technical scheme adopted by the invention is as follows:
An anti-shooting surveillance video watermarking method combining deep robust watermarking and template synchronization, characterized in that watermark information is embedded into a surveillance video through the following steps:
s11, selecting a rectangular area with relatively fixed background image content from the background area in the monitoring video as a template area for embedding the watermark;
s12, extracting a video frame of the monitoring video in a streaming buffer area, and acquiring an intercepted image from the video frame as a carrier image according to the template area;
s13, acquiring the equipment number information and the current timestamp of the monitoring video, and coding the equipment number information and the current timestamp into a binary bit sequence; splicing the binary bit sequence with a corresponding CRC (cyclic redundancy check) code and a BCH (broadcast channel) error correcting code to generate watermark information;
s14, inputting the carrier image and the watermark information into a preset depth robust watermark network, and generating a watermark image through an encoder in the depth robust watermark network; the depth robust watermark network is obtained by training a deep learning framework which is based on Stegasamp and combines an image block perception similar loss function and a YUV space differential loss function based on an image texture template;
and S15, replacing the image of the video frame in the template area with the watermark image.
Compared with the prior art, the method exploits the fact that part of the background content in a surveillance video is usually essentially unchanged, and selects part of the background image as the watermark carrier; the deep robust watermark network used can embed into the carrier image a robust watermark that is invisible to the naked eye and resistant to camera capture; meanwhile, by increasing the watermark embedding cost in simple-texture regions, the network is guided to embed the watermark information in complex-texture regions as far as possible, overcoming StegaStamp's drawback of leaving obvious embedding traces in smooth image regions and significantly improving the visual quality of the watermark.
As a preferred solution, the image block perceptual similarity loss function is expressed by the following formula:
L_P = LPIPS(I_o, I_w);
where I_o denotes the carrier image and I_w denotes the watermark image.
As a preferred solution, the YUV space difference loss function based on the image texture template is expressed by the following formula:
L_T = Σ_{c∈{Y,U,V}} Σ_{i,j} (1 - I_t(i, j)) · (I_w(i, j, c) - I_o(i, j, c))^2;
where Y, U, V denote the components of the carrier image I_o and the watermark image I_w after conversion to YUV space, I_t denotes the normalized texture template, and I(i, j, c) denotes the pixel value of image I at abscissa i, ordinate j in channel c.
As a preferred solution, the overall loss function of the deep learning framework is expressed by the following formula:
L = λ_P·L_P + λ_T·L_T + λ_C·L_C + λ_M·L_M;
where each λ denotes a loss weight, L_P denotes the image block perceptual similarity loss function, L_T denotes the YUV space difference loss function based on the image texture template, L_C denotes the discriminator loss function, and L_M denotes the cross-entropy loss function.
Further, the discriminator loss function L_C is expressed by the following formula:
L_C = D(I_o) - D(I_w);
where D(·) denotes the discriminator network, I_o denotes the carrier image, and I_w denotes the watermark image;
the cross entropy loss function LMExpressed by the following formula:
Figure BDA0003494629930000031
wherein, Bi(i 1, 2.., 64) represents a bit sequence input to the network during the training process; mi(i 1, 2.., 64) represents the bit sequence output by the network during the training process.
Preferably, the deep learning framework introduces distortion transformation including perspective transformation and/or blur and/or noise and/or color change and/or illumination and/or JPEG compression at a distortion layer during training.
As a preferred scheme, for a copied video obtained by copying a monitoring video embedded with the watermark information, the watermark information is extracted from the copied video by the following steps:
s21, calculating sift characteristic points of the copied video by extracting video frames of the copied video;
s22, obtaining the sift characteristic points of the carrier image, and matching the sift characteristic points with the sift characteristic points of the rephotograph video;
s23, solving a homography matrix H according to the matching result of the step S22, and carrying out perspective transformation on the video frame of the copied video according to the homography matrix H;
s24, cutting the perspective transformation result of the step S23 to the same size as the carrier image;
and S25, inputting the cutting result of the step S24 into the deep robust watermark network, and obtaining watermark information through a decoder in the deep robust watermark network.
The present invention also provides the following:
a shot-resistant surveillance video watermarking system combining depth robust watermarking and template synchronization comprises a watermarking information embedding module used for embedding watermarking information into surveillance video, wherein the watermarking information embedding module comprises a template area selecting unit, a carrier image intercepting unit, a watermarking information generating unit, a watermarking image generating unit and a video image replacing unit, and the method comprises the following steps:
the template area selection unit is used for selecting a rectangular area with relatively fixed background image content from a background area in the monitoring video as a template area for embedding the watermark;
the carrier image intercepting unit is used for extracting a video frame of the monitoring video in a streaming buffer area and acquiring an intercepted image from the video frame as a carrier image according to the template area;
the watermark information generating unit is used for acquiring the equipment number information and the current timestamp of the monitoring video and encoding them into a binary bit sequence, and for splicing the binary bit sequence with a corresponding CRC (cyclic redundancy check) code and a BCH (Bose-Chaudhuri-Hocquenghem) error-correcting code to generate the watermark information;
the watermark image generating unit is used for inputting the carrier image and the watermark information into a preset deep robust watermark network and generating a watermark image through an encoder in the deep robust watermark network; the deep robust watermark network is obtained by training a StegaStamp-based deep learning framework combining an image block perceptual similarity loss function with a YUV space difference loss function based on an image texture template;
the video image replacing unit is used for replacing the image of the video frame in the template area by the watermark image.
A storage medium having stored thereon a computer program which, when executed by a processor, carries out the steps of the aforementioned anti-shot surveillance video watermarking method in combination with depth robust watermarking and template synchronization.
A computer device comprising a storage medium, a processor and a computer program stored in the storage medium and executable by the processor, the computer program when executed by the processor implementing the steps of the aforementioned anti-shot surveillance video watermarking method in combination with depth robust watermarking and template synchronization.
Drawings
Fig. 1 is a schematic flowchart of a process of embedding watermark information in a surveillance video by using a shot-resistant surveillance video watermarking method combining depth robust watermarking and template synchronization according to embodiment 1 of the present invention;
fig. 2 is an example of watermark information in embodiment 1 of the present invention;
fig. 3 is an example of the operation of the watermarking method according to embodiment 1 of the present invention;
fig. 4 is a schematic structural diagram of a deep robust watermark network provided in embodiment 1 of the present invention;
fig. 5 is a schematic diagram of an encoder structure of a deep robust watermark network provided in embodiment 1 of the present invention;
fig. 6 is a schematic diagram of a decoder structure of a deep robust watermarking network according to embodiment 1 of the present invention;
fig. 7 is a schematic flowchart of a process of extracting watermark information from a copied video by using a shot-resistant surveillance video watermarking method combining depth robust watermarking and template synchronization according to embodiment 1 of the present invention;
fig. 8 is a schematic diagram of a shot-resistant surveillance video watermarking system combining depth robust watermarking and template synchronization according to embodiment 2 of the present invention.
Detailed Description
The drawings are for illustrative purposes only and are not to be construed as limiting the patent.
it should be understood that the embodiments described are only a few embodiments of the present application, and not all embodiments. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present application without any creative effort belong to the protection scope of the embodiments in the present application.
The terminology used in the embodiments of the present application is for the purpose of describing particular embodiments only and is not intended to be limiting of the embodiments of the present application. As used in the examples of this application and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the application, as detailed in the appended claims. In the description of the present application, it is to be understood that the terms "first," "second," "third," and the like are used solely to distinguish one from another and are not necessarily used to describe a particular order or sequence, nor are they to be construed as indicating or implying relative importance. The specific meaning of the above terms in the present application can be understood by those of ordinary skill in the art as appropriate.
Further, in the description of the present application, "a plurality" means two or more unless otherwise specified. "and/or" describes the association relationship of the associated objects, meaning that there may be three relationships, e.g., a and/or B, which may mean: a exists alone, A and B exist simultaneously, and B exists alone. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship. The invention is further illustrated below with reference to the figures and examples.
In order to solve the limitation of the prior art, the present embodiment provides a technical solution, and the technical solution of the present invention is further described below with reference to the accompanying drawings and embodiments.
Example 1
Referring to fig. 1, a method for watermarking an anti-shot surveillance video by combining a depth robust watermark and template synchronization includes the following steps:
s11, selecting a rectangular area with relatively fixed background image content from the background area in the monitoring video as a template area for embedding the watermark;
s12, extracting a video frame of the monitoring video in a streaming buffer area, and acquiring an intercepted image from the video frame as a carrier image according to the template area;
s13, acquiring the equipment number information and the current timestamp of the monitoring video, and coding the equipment number information and the current timestamp into a binary bit sequence; splicing the binary bit sequence with a corresponding CRC (cyclic redundancy check) code and a BCH (broadcast channel) error correcting code to generate watermark information;
s14, inputting the carrier image and the watermark information into a preset depth robust watermark network, and generating a watermark image through an encoder in the depth robust watermark network; the depth robust watermark network is obtained by training a deep learning framework which is based on Stegasamp and combines an image block perception similar loss function and a YUV space differential loss function based on an image texture template;
and S15, replacing the image of the video frame in the template area with the watermark image.
Compared with the prior art, the method exploits the fact that part of the background content in a surveillance video is usually essentially unchanged, and selects part of the background image as the watermark carrier; the deep robust watermark network used can embed into the carrier image a robust watermark that is invisible to the naked eye and resistant to camera capture; meanwhile, by increasing the watermark embedding cost in simple-texture regions, the network is guided to embed the watermark information in complex-texture regions as far as possible, overcoming StegaStamp's drawback of leaving obvious embedding traces in smooth image regions and significantly improving the visual quality of the watermark.
Specifically, in the art, a video watermarking method refers to a video watermark processing method, which may include both the process of embedding watermark information into a video and the process of extracting watermark information from a video.
According to the scheme provided by this embodiment, when the watermark information is embedded into the video, the generated watermark image resists camera capture and transcoding and is imperceptible to the naked eye.
The scheme provided by this embodiment rests on three considerations. First, after the watermark is embedded into the video, the video is displayed on a monitoring screen, and the watermark undergoes distortions of varying degrees during camera recapture. Second, considering that in a surveillance video scene the content of part of the background image is almost unchanged, that part of the background image is used as the watermark synchronization template to fix the watermark embedding position. Finally, to improve the visual quality of the watermark, this embodiment builds on StegaStamp: by extracting the complex-texture region of the image as a template, the embedding cost of complex-texture regions is reduced while that of smooth regions is increased, guiding the network to embed the watermark information in complex-texture regions as far as possible, so as to improve the visual quality of the watermarked video.
More specifically, referring to fig. 2, the watermark information generated in step S13 may take the form of a 64-bit binary watermark information sequence. The first 32 bits store valid information comprising the timestamp and the equipment number: the year, month, day, hour and minute occupy 6, 4, 5, 5 and 6 bits respectively, and the equipment number occupies 6 bits. The middle 8 bits store a CRC check code computed from the valid information. The final 24 bits are supervisory information, storing a 24-bit BCH error-correction code computed from the valid information; this BCH code can correct at most 3 erroneous bits.
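As an illustrative sketch of this payload layout (not part of the patent text), the following Python fragment packs the timestamp and equipment number into the 32 valid bits and appends the two check fields; the field order, the CRC-8 polynomial, and the zero-filled BCH placeholder are assumptions made for demonstration only:

```python
def to_bits(value, width):
    """Big-endian bit list of `value` in `width` bits."""
    return [(value >> i) & 1 for i in range(width - 1, -1, -1)]

def crc8(bits, poly=0x07):
    """Bitwise CRC-8; the polynomial x^8 + x^2 + x + 1 (0x07) is an assumption,
    since the patent does not specify which CRC-8 variant is used."""
    crc = 0
    for b in bits:
        fb = ((crc >> 7) & 1) ^ b   # feedback bit
        crc = (crc << 1) & 0xFF
        if fb:
            crc ^= poly
    return to_bits(crc, 8)

def build_payload(year, month, day, hour, minute, device_id):
    """64-bit watermark payload: 32 info bits + 8 CRC bits + 24 BCH parity bits."""
    info = (to_bits(year % 64, 6) + to_bits(month, 4) + to_bits(day, 5) +
            to_bits(hour, 5) + to_bits(minute, 6) + to_bits(device_id, 6))
    check = crc8(info)
    # Placeholder: in the patent these 24 bits come from a BCH encoder able to
    # correct up to 3 bit errors; a real implementation would use a BCH library.
    bch_parity = [0] * 24
    return info + check + bch_parity  # 64 bits total
```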
In order to facilitate the processing of the carrier image by the deep robust watermarking network, as a preferred embodiment, the carrier image may be scaled to a size suitable for network processing before being input into the deep robust watermarking network; accordingly, if the size of the watermark image is different from the original size of the carrier image, the watermark image may be scaled to the same size as the original size of the carrier image, and then step S15 is performed.
Referring to fig. 3, in this example a rectangular area with almost unchanged content is first selected from the background image of a high-definition surveillance video as the watermark carrier; for example, with the fixed coordinate (0, 0) of the video frame as the upper-left corner of the rectangular area, a 3-channel RGB image of size 300 × 300 is cropped. The carrier image is scaled to the 512 × 512 size expected by the network and then input, together with the 64-bit binary watermark information sequence, into the encoder of the deep robust watermark network; after computation by the convolutional neural network, a watermark image containing the watermark information is output. The watermark image is scaled back to the 300 × 300 size of the original rectangular area, and the scaled watermark image replaces the template-area image in the original video frame, yielding a video frame containing the watermark information. The same embedding operation is performed on every frame in the surveillance video buffer, and the watermarked video frames are finally displayed on the monitoring screen through the buffer, producing a continuous watermarked surveillance video stream.
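A minimal sketch of this per-frame pipeline, assuming OpenCV and an `encoder` callable standing in for the trained network (both the helper name and the defaults are illustrative):

```python
import cv2

def embed_frame(frame, bits, encoder, top_left=(0, 0), size=300, net_size=512):
    """Embed watermark bits into the fixed template region of one video frame."""
    x, y = top_left
    carrier = frame[y:y + size, x:x + size]             # crop 300x300 template region
    net_in = cv2.resize(carrier, (net_size, net_size))  # scale to network input size
    wm = encoder(net_in, bits)                          # 512x512 watermarked image
    wm = cv2.resize(wm, (size, size))                   # scale back to template size
    out = frame.copy()
    out[y:y + size, x:x + size] = wm                    # replace the template region
    return out

# The same call is applied to every frame in the surveillance video buffer.
```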
Furthermore, the deep robust watermark network mainly comprises an encoder, a decoder and a discriminator; the network structure is shown in fig. 4. The input of the encoder is a 512 × 512 3-channel RGB carrier image together with a 64-bit binary information sequence, and the output is a residual image of the same size as the input; the residual image is the difference between the watermark image and the carrier image, and represents the amplitude by which each corresponding pixel of the carrier image must be modified in order to embed the watermark information. To ensure that the watermark image can still be decoded accurately after being displayed on a monitoring screen and photographed, a distortion layer simulating that process is designed and applied after the watermark image is obtained. After passing through the simulated distortion functions, the watermark image is input into the decoder, whose output is a 64-bit binary information sequence. In the other branch, the carrier image and the watermark image are input into the discriminator as a positive and a negative sample respectively, and the discrimination loss guides the encoder to generate more realistic watermark images.
The specific network structure of the encoder is shown in fig. 5; it adopts a U-Net-like structure, taking a 512 × 512 RGB image and a 64-bit binary information sequence as input and outputting a 512 × 512 residual image. First, to turn the one-dimensional 64-bit sequence into two dimensions, the sequence is expanded by a fully connected layer into a vector of length 12288, which is reshaped by folding into a three-channel image of size 64 × 64, upsampled by nearest-neighbor interpolation to the same size as the input image, and concatenated with the input image as the input of the encoder network body. The body of the network comprises a downsampling part and an upsampling part. The network first repeatedly downsamples the feature map with 3 × 3 convolutions, reducing its size while increasing the number of channels in order to extract abstract image features. It then applies a series of nearest-neighbor-interpolation upsampling operations to enlarge the feature map; after each upsampling, the feature map of the same size from the downsampling path is concatenated, and a stride-1 convolution is applied to the concatenation, outputting a feature map of the same size. In this process the feature map keeps growing while the number of channels keeps shrinking, until a residual image of the same size as the original image is output. Finally, the residual image is added to the original image to obtain the watermark image containing the watermark information.
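The following PyTorch sketch mirrors this description at reduced depth (three downsampling stages instead of the full configuration); the channel counts are illustrative assumptions, not the patent's exact architecture:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WatermarkEncoder(nn.Module):
    """U-Net-style encoder sketch: message bits become a 64x64x3 image, are
    upsampled and concatenated with the carrier, and a residual is predicted."""
    def __init__(self, n_bits=64):
        super().__init__()
        self.fc = nn.Linear(n_bits, 64 * 64 * 3)  # 64 bits -> vector of length 12288
        self.down1 = nn.Conv2d(6, 32, 3, stride=2, padding=1)
        self.down2 = nn.Conv2d(32, 64, 3, stride=2, padding=1)
        self.down3 = nn.Conv2d(64, 128, 3, stride=2, padding=1)
        self.up2 = nn.Conv2d(128 + 64, 64, 3, padding=1)   # stride-1 conv after concat
        self.up1 = nn.Conv2d(64 + 32, 32, 3, padding=1)
        self.out = nn.Conv2d(32 + 6, 3, 3, padding=1)      # residual image

    def forward(self, image, bits):
        # image: (B, 3, 512, 512) in [0, 1]; bits: (B, 64)
        msg = self.fc(bits).view(-1, 3, 64, 64)                    # fold to 64x64x3
        msg = F.interpolate(msg, size=image.shape[-2:], mode='nearest')
        x0 = torch.cat([image, msg], dim=1)                        # (B, 6, 512, 512)
        x1 = F.relu(self.down1(x0))                                # (B, 32, 256, 256)
        x2 = F.relu(self.down2(x1))                                # (B, 64, 128, 128)
        x3 = F.relu(self.down3(x2))                                # (B, 128, 64, 64)
        u2 = F.relu(self.up2(torch.cat([F.interpolate(x3, scale_factor=2.0), x2], 1)))
        u1 = F.relu(self.up1(torch.cat([F.interpolate(u2, scale_factor=2.0), x1], 1)))
        residual = self.out(torch.cat([F.interpolate(u1, scale_factor=2.0), x0], 1))
        return image + residual            # watermark image = carrier + residual
```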
The decoder structure is shown in fig. 6: a three-channel 512 × 512 RGB watermark image containing the watermark information is input, downsampling is performed repeatedly with stride-2 convolution kernels while the number of feature-map channels keeps growing so as to extract image features, and a fully connected layer finally outputs a 64-bit binary information sequence. The structure of the discriminator is similar to that of the decoder: it also downsamples with stride-2 convolution kernels while increasing the number of feature-map channels to extract image features, but its final output is a vector of length 2, representing the predicted probabilities of real and fake respectively.
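A matching decoder (and, with a different head, the discriminator) can be sketched in the same style; the depth and channel counts are again illustrative assumptions:

```python
import torch.nn as nn
import torch.nn.functional as F

class WatermarkDecoder(nn.Module):
    """Stride-2 convolutions shrink the feature map while channels grow;
    a fully connected head emits 64 bit logits (sigmoid -> bit probabilities)."""
    def __init__(self, n_bits=64):
        super().__init__()
        chans = [3, 32, 64, 128, 256]
        self.convs = nn.ModuleList(
            nn.Conv2d(chans[i], chans[i + 1], 3, stride=2, padding=1)
            for i in range(len(chans) - 1))
        self.fc = nn.Linear(256 * 32 * 32, n_bits)

    def forward(self, x):                   # x: (B, 3, 512, 512)
        for conv in self.convs:
            x = F.relu(conv(x))             # 512 -> 256 -> 128 -> 64 -> 32
        return self.fc(x.flatten(1))        # 64 bit logits

class Discriminator(WatermarkDecoder):
    """Same downsampling body, but the head outputs a length-2 real/fake score."""
    def __init__(self):
        super().__init__(n_bits=2)
```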
In particular, images suffer distortions of varying degrees in the process of being displayed on a monitor screen and photographed, and the goal of the network is to train a decoder that can still decode the correct information from images distorted to a certain degree. The distortion therefore has to be simulated by functions during training; to this end, six differentiable distortion transformations are introduced in the distortion layer: perspective transformation, blur, noise, color change, illumination, and JPEG compression.
Therefore, as a preferred embodiment, the deep learning framework introduces distortion transformation including perspective transformation and/or blur and/or noise and/or color change and/or illumination and/or JPEG compression at the distortion layer during training.
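A minimal differentiable distortion-layer sketch follows; for brevity it implements only color/illumination jitter, blur, and noise, while the perspective and differentiable-JPEG transforms named above are omitted, and all parameter values are assumptions:

```python
import torch
import torch.nn.functional as F

def distortion_layer(img, noise_std=0.02, brightness=0.1):
    """img: (B, 3, H, W) in [0, 1]. All operations are differentiable so the
    encoder/decoder gradients can flow through the simulated channel."""
    b, c = img.shape[:2]
    # color change: random per-channel gain; illumination: global offset
    gain = 1 + (torch.rand(b, c, 1, 1, device=img.device) - 0.5) * 0.2
    offset = (torch.rand(b, 1, 1, 1, device=img.device) - 0.5) * 2 * brightness
    img = img * gain + offset
    # blur: fixed 5x5 binomial kernel applied as a depthwise convolution
    k = torch.tensor([1., 4., 6., 4., 1.], device=img.device)
    kernel = (k[:, None] * k[None, :] / 256.0).expand(c, 1, 5, 5)
    img = F.conv2d(img, kernel, padding=2, groups=c)
    # noise: additive Gaussian
    img = img + noise_std * torch.randn_like(img)
    return img.clamp(0, 1)
```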
The main goal of the whole network architecture is to make the embedded watermark imperceptible to the naked eye while keeping the decoding accuracy high and robust to image distortion.
In order to make the watermark imperceptible to the naked eye, this embodiment uses two loss functions to limit the difference between the watermark image and the carrier image. The first is the image block perceptual similarity (LPIPS) loss, an image-similarity measure learned by a network: the more visually similar the two images are, the smaller the loss.
As a preferred embodiment, the image block perceptual similarity loss function is expressed by the following formula:
L_P = LPIPS(I_o, I_w);
where I_o denotes the carrier image and I_w denotes the watermark image.
The second is a YUV space difference loss function based on an image texture template. If watermark information is embedded in a smooth region of an image, that region shows obvious watermark traces, so a greater embedding cost must be assigned to smooth image regions. Therefore, when the network processes an image it first extracts an edge image with the Canny operator and then applies a morphological dilation to it, obtaining a texture template I_t that is bright in complex-texture regions and almost zero in smooth regions, normalized to between 0 and 1. The carrier image I_o and the watermark image I_w are converted from RGB space to YUV space, and the loss L_T is computed as a sum of pixel-by-pixel losses. As a preferred embodiment, the YUV space difference loss function based on the image texture template is expressed by the following formula:
L_T = Σ_{c∈{Y,U,V}} Σ_{i,j} (1 - I_t(i, j)) · (I_w(i, j, c) - I_o(i, j, c))^2;
where Y, U, V denote the components of the carrier image I_o and the watermark image I_w after conversion to YUV space, I_t denotes the normalized texture template, and I(i, j, c) denotes the pixel value of image I at abscissa i, ordinate j in channel c.
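As an illustrative OpenCV/NumPy sketch, the texture template and the texture-weighted YUV loss described above could be computed as follows; the Canny thresholds, the dilation kernel size, and the exact (1 - I_t) squared-error weighting are assumptions:

```python
import cv2
import numpy as np

def texture_template(img_bgr, ksize=5):
    """I_t: Canny edge map, morphologically dilated, normalized to [0, 1];
    bright in complex-texture regions, near zero in smooth regions."""
    gray = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 100, 200)                       # thresholds assumed
    dilated = cv2.dilate(edges, np.ones((ksize, ksize), np.uint8))
    return dilated.astype(np.float32) / 255.0

def yuv_texture_loss(carrier_bgr, watermarked_bgr):
    """Pixel-wise YUV squared error weighted by (1 - I_t), so that modifying
    smooth regions costs more than modifying complex-texture regions."""
    i_t = texture_template(carrier_bgr)
    yuv_o = cv2.cvtColor(carrier_bgr, cv2.COLOR_BGR2YUV).astype(np.float32)
    yuv_w = cv2.cvtColor(watermarked_bgr, cv2.COLOR_BGR2YUV).astype(np.float32)
    diff = (yuv_w - yuv_o) ** 2                             # per pixel, per channel
    return float(((1.0 - i_t)[..., None] * diff).sum())
```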
In order to achieve high decoding accuracy, erroneous decoding results are penalized, so cross entropy can be used as the loss function: the higher the bit error rate, the larger the loss, and the training objective is to minimize this loss function. In addition, there is a discriminator loss, which can take the form of the Wasserstein GAN loss function.
Thus, as a preferred embodiment, the overall loss function of the deep learning framework is expressed by the following formula:
L = λ_P·L_P + λ_T·L_T + λ_C·L_C + λ_M·L_M;
where each λ denotes a loss weight, L_P denotes the image block perceptual similarity loss function, L_T denotes the YUV space difference loss function based on the image texture template, L_C denotes the discriminator loss function, and L_M denotes the cross-entropy loss function.
Further, the discriminator loss function L_C is expressed by the following formula:
L_C = D(I_o) - D(I_w);
where D(·) denotes the discriminator network, I_o denotes the carrier image, and I_w denotes the watermark image;
the cross entropy loss function LMExpressed by the following formula:
Figure BDA0003494629930000101
wherein, Bi(i 1, 2.., 64) represents a bit sequence input to the network during the training process; mi(i 1, 2.., 64) represents the bit sequence output by the network during the training process.
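Put together, the training objective can be sketched as below (PyTorch); the λ weights are illustrative placeholders, not values given in the patent:

```python
import torch.nn.functional as F

def message_loss(bit_logits, bits):
    """L_M: binary cross-entropy between the decoder's bit logits and the
    64-bit sequence fed to the encoder (higher error rate -> larger loss)."""
    return F.binary_cross_entropy_with_logits(bit_logits, bits.float())

def critic_loss(d_carrier, d_watermark):
    """L_C = D(I_o) - D(I_w), the Wasserstein-style discriminator term."""
    return (d_carrier - d_watermark).mean()

def total_loss(l_p, l_t, l_c, l_m, lam=(1.0, 1.0, 0.1, 1.0)):
    """L = lam_P*L_P + lam_T*L_T + lam_C*L_C + lam_M*L_M (weights assumed)."""
    return lam[0] * l_p + lam[1] * l_t + lam[2] * l_c + lam[3] * l_m
```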
Specifically, the network is trained on the MIRFLICKR natural image dataset, which contains 25,000 natural images; the model converges after 200,000 training iterations on a single graphics card. With a 2080 Ti graphics card, encoding a single picture takes 10 milliseconds, which meets the requirement of real-time encoding.
As a preferred embodiment, referring to fig. 7, for a copied video obtained by copying a monitored video embedded with the watermark information, the watermark information is extracted from the copied video by the following steps:
s21, calculating sift characteristic points of the copied video by extracting video frames of the copied video;
s22, obtaining the sift characteristic points of the carrier image, and matching the sift characteristic points with the sift characteristic points of the rephotograph video;
s23, solving a homography matrix H according to the matching result of the step S22, and carrying out perspective transformation on the video frame of the copied video according to the homography matrix H;
s24, cutting the perspective transformation result of the step S23 to the same size as the carrier image;
and S25, inputting the cutting result of the step S24 into the deep robust watermark network, and obtaining watermark information through a decoder in the deep robust watermark network.
Specifically, when the surveillance video is recaptured, the image content of the watermark template is almost unchanged, so SIFT feature matching is used to register the recaptured frame against the template (i.e. the carrier image) used when the watermark was embedded, achieving fast and accurate watermark synchronization; a sketch of this registration step is given below.
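Assuming OpenCV, the registration-and-rectification steps S21 to S24 could be sketched as follows; the ratio-test threshold and the RANSAC reprojection tolerance are common defaults, not values mandated by the patent:

```python
import cv2
import numpy as np

def rectify_shot_frame(shot_frame, template, net_size=512):
    """Register a recaptured frame against the carrier template via SIFT +
    RANSAC homography, then warp and crop it for the decoder (steps S21-S24)."""
    sift = cv2.SIFT_create()
    kp_t, des_t = sift.detectAndCompute(template, None)     # carrier template
    kp_s, des_s = sift.detectAndCompute(shot_frame, None)   # recaptured frame
    matches = cv2.BFMatcher().knnMatch(des_t, des_s, k=2)
    good = [m for m, n in matches if m.distance < 0.75 * n.distance]  # ratio test
    src = np.float32([kp_s[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([kp_t[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)    # shot -> template coords
    h, w = template.shape[:2]
    aligned = cv2.warpPerspective(shot_frame, H, (w, h))    # perspective correction
    return cv2.resize(aligned, (net_size, net_size))        # decoder input size
```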
In a specific example, still referring to fig. 3, after the watermark information is extracted from the copied video, it is corrected using its BCH supervisory bits and its validity is verified against the CRC check code; if valid, the timestamp and equipment-number information can be extracted successfully, thereby achieving traceability of the surveillance video.
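A minimal sketch of this validity check, reusing the crc8 helper and field layout from the payload sketch in step S13 above, and assuming BCH error correction has already been applied to the 64 decoded bits:

```python
def verify_payload(bits):
    """Return the decoded fields if the CRC over the 32 info bits matches the
    stored 8-bit check; otherwise the extraction is treated as invalid."""
    info, check = bits[:32], bits[32:40]
    if crc8(info) != check:
        return None                      # CRC mismatch: watermark not trusted
    def val(seg):                        # big-endian bits -> integer
        return int(''.join(map(str, seg)), 2)
    return {'year': val(info[0:6]), 'month': val(info[6:10]),
            'day': val(info[10:15]), 'hour': val(info[15:20]),
            'minute': val(info[20:26]), 'device_id': val(info[26:32])}
```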
Example 2
An anti-shooting surveillance video watermarking system combining deep robust watermarking and template synchronization is shown in fig. 8; it comprises a watermark information embedding module 1 for embedding watermark information into a surveillance video, the watermark information embedding module 1 comprising a template area selecting unit 11, a carrier image intercepting unit 12, a watermark information generating unit 13, a watermark image generating unit 14 and a video image replacing unit 15, wherein:
the template area selecting unit 11 is configured to select a rectangular area with relatively fixed background image content from a background area in the surveillance video as a template area in which the watermark is embedded;
the carrier image capturing unit 12 is configured to extract a video frame of the monitoring video in a streaming buffer, and obtain a captured image from the video frame as a carrier image according to the template area;
the watermark information generating unit 13 is configured to obtain the device number information and the current timestamp of the monitoring video, and encode the device number information and the current timestamp into a binary bit sequence; splicing the binary bit sequence with a corresponding CRC (cyclic redundancy check) code and a BCH (broadcast channel) error correcting code to generate watermark information;
the watermark image generating unit 14 is configured to input the carrier image and the watermark information into a preset depth robust watermark network, and generate a watermark image through an encoder in the depth robust watermark network; the depth robust watermark network is obtained by training a deep learning framework which is based on Stegasamp and combines an image block perception similar loss function and a YUV space differential loss function based on an image texture template;
the video image replacing unit 15 is configured to replace the image of the video frame in the template area with the watermark image.
As a preferred embodiment, the system further comprises a watermark information extraction module 2 for extracting watermark information from the copied video, the copied video being obtained by recording the surveillance video embedded with the watermark information; the watermark information extraction module 2 comprises a feature point calculation unit 21, a feature point matching unit 22, a perspective transformation unit 23, a cropping unit 24 and a decoding unit 25, wherein:
the feature point calculation unit 21 is configured to extract video frames from the copied video and calculate their SIFT feature points;
the feature point matching unit 22 is configured to obtain the SIFT feature points of the carrier image and match them with the SIFT feature points of the copied video;
the perspective transformation unit 23 is configured to solve a homography matrix H according to the matching result of the feature point matching unit 22, and to perform a perspective transformation on the video frames of the copied video according to the homography matrix H;
the cropping unit 24 is configured to crop the perspective-transformation result of the perspective transformation unit 23 to the same size as the carrier image;
the decoding unit 25 is configured to input the cropping result of the cropping unit 24 into the deep robust watermark network and obtain the watermark information through a decoder in the deep robust watermark network.
Example 3
A storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the anti-shot surveillance video watermarking method incorporating depth robust watermarking and template synchronization of embodiment 1.
Example 4
A computer device comprising a storage medium, a processor, and a computer program stored in the storage medium and executable by the processor, the computer program when executed by the processor implementing the steps of the anti-shot surveillance video watermarking method incorporating depth robust watermarking and template synchronization of embodiment 1.
It should be understood that the above-described embodiments of the present invention are merely examples for clearly illustrating the present invention and are not intended to limit its embodiments. Other variations and modifications will be apparent to persons skilled in the art in light of the above description; it is neither necessary nor possible to exhaust all embodiments here. Any modification, equivalent replacement or improvement made within the spirit and principle of the present invention shall be included in the protection scope of the claims of the present invention.

Claims (10)

1. A shooting-resistant surveillance video watermarking method combining depth robust watermarking and template synchronization is characterized in that watermarking information is embedded into a surveillance video through the following steps:
s11, selecting a rectangular area with relatively fixed background image content from the background area in the monitoring video as a template area for embedding the watermark;
s12, extracting a video frame of the monitoring video in a streaming buffer area, and acquiring an intercepted image from the video frame as a carrier image according to the template area;
s13, acquiring the equipment number information and the current timestamp of the monitoring video, and coding the equipment number information and the current timestamp into a binary bit sequence; splicing the binary bit sequence with a corresponding CRC (cyclic redundancy check) code and a BCH (broadcast channel) error correcting code to generate watermark information;
s14, inputting the carrier image and the watermark information into a preset depth robust watermark network, and generating a watermark image through an encoder in the depth robust watermark network; the depth robust watermark network is obtained by training a deep learning framework which is based on Stegasamp and combines an image block perception similar loss function and a YUV space differential loss function based on an image texture template;
and S15, replacing the image of the video frame in the template area with the watermark image.
2. The method for anti-shot surveillance video watermarking in combination with depth robust watermarking and template synchronization as recited in claim 1, wherein the image block perceptual similarity loss function is expressed by the following formula:
L_P = LPIPS(I_o, I_w);
where I_o denotes the carrier image and I_w denotes the watermark image.
3. The method for anti-shot surveillance video watermarking combining depth robust watermarking and template synchronization according to claim 1, wherein the YUV space difference loss function based on the image texture template is expressed by the following formula:
L_T = Σ_{c∈{Y,U,V}} Σ_{i,j} (1 - I_t(i, j)) · (I_w(i, j, c) - I_o(i, j, c))^2;
where Y, U, V denote the components of the carrier image I_o and the watermark image I_w after conversion to YUV space, I_t denotes the normalized texture template, and I(i, j, c) denotes the pixel value of image I at abscissa i, ordinate j in channel c.
4. The method for anti-shot surveillance video watermarking in combination with deep robust watermarking and template synchronization as claimed in claim 1, wherein the overall loss function of the deep learning framework is expressed by the following formula:
L = λ_P·L_P + λ_T·L_T + λ_C·L_C + λ_M·L_M;
where each λ denotes a loss weight, L_P denotes the image block perceptual similarity loss function, L_T denotes the YUV space difference loss function based on the image texture template, L_C denotes the discriminator loss function, and L_M denotes the cross-entropy loss function.
5. The method of anti-shot surveillance video watermarking with depth robust watermarking and template synchronization as claimed in claim 4, wherein the discriminator loss function L_C is expressed by the following formula:
L_C = D(I_o) - D(I_w);
where D(·) denotes the discriminator network, I_o denotes the carrier image, and I_w denotes the watermark image;
and the cross-entropy loss function L_M is expressed by the following formula:
L_M = -(1/64) Σ_{i=1}^{64} [ B_i log M_i + (1 - B_i) log(1 - M_i) ];
where B_i (i = 1, 2, ..., 64) denotes the bit sequence input to the network during training, and M_i (i = 1, 2, ..., 64) denotes the bit sequence output by the network during training.
6. The method for anti-shot surveillance video watermarking in combination with depth robust watermarking and template synchronization as claimed in claim 1, wherein the deep learning framework introduces distortion transformations including perspective transformation and/or blur and/or noise and/or color change and/or illumination and/or JPEG compression at a distortion layer during training.
7. The anti-shooting surveillance video watermarking method combining depth robust watermarking and template synchronization according to claim 1, wherein for a copied video obtained by copying the surveillance video embedded with the watermark information, the watermark information is extracted from the copied video by the following steps:
s21, calculating sift characteristic points of the copied video by extracting video frames of the copied video;
s22, obtaining the sift characteristic points of the carrier image, and matching the sift characteristic points with the sift characteristic points of the rephotograph video;
s23, solving a homography matrix H according to the matching result of the step S22, and carrying out perspective transformation on the video frame of the copied video according to the homography matrix H;
s24, cutting the perspective transformation result of the step S23 to the same size as the carrier image;
and S25, inputting the cutting result of the step S24 into the deep robust watermark network, and obtaining watermark information through a decoder in the deep robust watermark network.
8. An anti-shooting surveillance video watermarking system combining depth robust watermarking and template synchronization, characterized by comprising a watermark information embedding module (1) for embedding watermark information into a surveillance video, the watermark information embedding module (1) comprising a template area selecting unit (11), a carrier image intercepting unit (12), a watermark information generating unit (13), a watermark image generating unit (14) and a video image replacing unit (15), wherein:
the template area selecting unit (11) is used for selecting a rectangular area with relatively fixed background image content from a background area in a monitoring video as a template area for embedding a watermark;
the carrier image intercepting unit (12) is used for extracting a video frame of the monitoring video in a stream buffer area, and acquiring an intercepted image from the video frame as a carrier image according to the template area;
the watermark information generating unit (13) is used for acquiring the equipment number information and the current timestamp of the monitoring video and encoding them into a binary bit sequence, and for splicing the binary bit sequence with a corresponding CRC (cyclic redundancy check) code and a BCH (Bose-Chaudhuri-Hocquenghem) error-correcting code to generate the watermark information;
the watermark image generating unit (14) is used for inputting the carrier image and the watermark information into a preset depth robust watermark network and generating a watermark image through an encoder in the depth robust watermark network; the depth robust watermark network is obtained by training a StegaStamp-based deep learning framework combining an image block perceptual similarity loss function with a YUV space difference loss function based on an image texture template;
the video image replacing unit (15) is used for replacing the image of the video frame in the template area with the watermark image.
9. A storage medium having a computer program stored thereon, characterized in that: the computer program when being executed by a processor realizes the steps of the anti-shot surveillance video watermarking method in combination with depth robust watermarking and template synchronization according to any of the claims 1 to 7.
10. A computer device, characterized by: comprising a storage medium, a processor and a computer program stored in said storage medium and executable by said processor, said computer program when executed by the processor implementing the steps of the anti-shot surveillance video watermarking method incorporating depth robust watermarking and template synchronization according to any of the claims 1 to 7.
CN202210109380.7A 2022-01-28 2022-01-28 Anti-shooting monitoring video watermarking method combining depth robust watermarking and template synchronization Pending CN114549270A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210109380.7A CN114549270A (en) 2022-01-28 2022-01-28 Anti-shooting monitoring video watermarking method combining depth robust watermarking and template synchronization

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210109380.7A CN114549270A (en) 2022-01-28 2022-01-28 Anti-shooting monitoring video watermarking method combining depth robust watermarking and template synchronization

Publications (1)

Publication Number Publication Date
CN114549270A true CN114549270A (en) 2022-05-27

Family

ID=81672983

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210109380.7A Pending CN114549270A (en) 2022-01-28 2022-01-28 Anti-shooting monitoring video watermarking method combining depth robust watermarking and template synchronization

Country Status (1)

Country Link
CN (1) CN114549270A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117095411A (en) * 2023-10-16 2023-11-21 青岛文达通科技股份有限公司 Detection method and system based on image fault recognition
CN117095411B (en) * 2023-10-16 2024-01-23 青岛文达通科技股份有限公司 Detection method and system based on image fault recognition

Similar Documents

Publication Publication Date Title
US10176545B2 (en) Signal encoding to reduce perceptibility of changes over time
Zhang et al. Robust invisible video watermarking with attention
CN111028308B (en) Steganography and reading method for information in image
Jia et al. RIHOOP: Robust invisible hyperlinks in offline and online photographs
CN107888925B (en) A kind of embedding grammar and detection method of digital video hiding information
Pramila et al. Toward an interactive poster using digital watermarking and a mobile phone camera
EP3477578B1 (en) Watermark embedding and extracting method for protecting documents
EP1952338A1 (en) Animated image code, apparatus for generating/decoding animated image code, and method thereof
US9270846B2 (en) Content encoded luminosity modulation
CN111161181A (en) Image data enhancement method, model training method, device and storage medium
CN112911341B (en) Image processing method, decoder network training method, device, equipment and medium
CN113095992A (en) Novel bar code screenshot steganography traceability combined algorithm
CN116152173A (en) Image tampering detection positioning method and device
CN115482142A (en) Dark watermark adding method, extracting method, system, storage medium and terminal
Yang et al. Language universal font watermarking with multiple cross-media robustness
CN102737240A (en) Method of analyzing digital document images
US10664940B2 (en) Signal encoding to reduce perceptibility of changes over time
JP7539998B2 (en) Zoom Agnostic Watermark Extraction
CN114549270A (en) Anti-shooting monitoring video watermarking method combining depth robust watermarking and template synchronization
CN114066709A (en) Screen-shot-resistant robust watermarking system and algorithm based on deep learning
Qin et al. Print-camera resistant image watermarking with deep noise simulation and constrained learning
CN114037596A (en) End-to-end image steganography method capable of resisting physical transmission deformation
CN114207659A (en) Light field messaging
Yakushev et al. Docmarking: Real-Time Screen-Cam Robust Document Image Watermarking
CN117597702A (en) Scaling-independent watermark extraction

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination