CN114549270A - Anti-shooting monitoring video watermarking method combining depth robust watermarking and template synchronization - Google Patents

Anti-shooting monitoring video watermarking method combining depth robust watermarking and template synchronization

Info

Publication number
CN114549270A
CN114549270A
Authority
CN
China
Prior art keywords
image
watermark
video
watermarking
robust
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210109380.7A
Other languages
Chinese (zh)
Inventor
孙一言
倪江群
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sun Yat Sen University
Original Assignee
Sun Yat Sen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sun Yat Sen University filed Critical Sun Yat Sen University
Priority to CN202210109380.7A priority Critical patent/CN114549270A/en
Publication of CN114549270A publication Critical patent/CN114549270A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00General purpose image data processing
    • G06T1/0021Image watermarking
    • G06T1/005Robust watermarking, e.g. average attack or collusion attack resistant
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00Details of television systems
    • H04N5/222Studio circuitry; Studio devices; Studio equipment
    • H04N5/262Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects ; Cameras specially adapted for the electronic generation of special effects

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Image Processing (AREA)
  • Editing Of Facsimile Originals (AREA)

Abstract

Addressing the limitations of the prior art, the invention provides an anti-shooting surveillance video watermarking method combining deep robust watermarking and template synchronization. The method exploits the fact that part of the background content in a surveillance video is usually essentially unchanged, and selects part of the background image as the watermark carrier. The deep robust watermark network used can embed into the carrier image a robust watermark that is invisible to the naked eye and resistant to camera capture. Meanwhile, by increasing the watermark embedding cost in simple-texture regions, the network is guided to embed the watermark information in complex-texture regions as far as possible, overcoming StegaStamp's drawback of leaving obvious embedding traces in smooth image regions and significantly improving the visual quality of the watermark.

Description

Anti-shooting monitoring video watermarking method combining depth robust watermarking and template synchronization
Technical Field
The invention relates to the technical field of multimedia content security, in particular to camera-capture traceability for surveillance video, and more particularly to an anti-shooting surveillance video watermarking method combining deep robust watermarking and template synchronization.
Background
Most existing surveillance systems display source information on the monitoring screen as visible watermarks: for example, time information shown at the upper-left corner of the screen is overlaid directly on the surveillance video content as plain text in white or other colored fonts. When someone uses a camera or similar device to record the content shown on the monitoring screen, the watermark information is captured along with it, so the recorded video can be traced back to its source from that information. Although such watermarks are easy to generate, they unavoidably degrade the visual quality of the whole surveillance video; moreover, because the watermark information is visible to the naked eye, a pirate can easily destroy it by post-processing or similar means, making the video difficult to trace.
A Chinese patent published on 2016.12.07, "A robust video watermarking method resisting geometric attacks based on SIFT", provides a robust video watermarking scheme. Although existing robust video watermarking methods of this kind can resist slight geometric attacks to a certain extent, during camera recapture the shooting angle and the size of the video content are arbitrary, and severe angle changes occur easily. In such scenarios these methods are prone to synchronization-position errors and missed synchronization during watermark synchronization; the prior art therefore does not effectively solve the watermark synchronization problem and has certain limitations.
In the field of image watermarking there is a deep-learning-based blind watermarking method resistant to printing and shooting, StegaStamp; however, that method does not adaptively select the watermark embedding position, so when the model embeds information in smooth regions of an image it leaves relatively obvious traces, seriously degrading the visual quality of the watermarked image. In addition, the method handles only image watermarks and cannot be applied to video watermarking.
Disclosure of Invention
Aiming at the limitations of the prior art, the invention provides an anti-shooting surveillance video watermarking method combining deep robust watermarking and template synchronization. The technical scheme adopted by the invention is as follows:
An anti-shooting surveillance video watermarking method combining deep robust watermarking and template synchronization, characterized in that watermark information is embedded into a surveillance video through the following steps:
s11, selecting a rectangular area with relatively fixed background image content from the background area in the monitoring video as a template area for embedding the watermark;
s12, extracting a video frame of the monitoring video in a streaming buffer area, and acquiring an intercepted image from the video frame as a carrier image according to the template area;
s13, acquiring the equipment number information and the current timestamp of the monitoring video, and coding the equipment number information and the current timestamp into a binary bit sequence; splicing the binary bit sequence with a corresponding CRC (cyclic redundancy check) code and a BCH (broadcast channel) error correcting code to generate watermark information;
s14, inputting the carrier image and the watermark information into a preset depth robust watermark network, and generating a watermark image through an encoder in the depth robust watermark network; the depth robust watermark network is obtained by training a deep learning framework which is based on Stegasamp and combines an image block perception similar loss function and a YUV space differential loss function based on an image texture template;
and S15, replacing the image of the video frame in the template area with the watermark image.
Compared with the prior art, the method exploits the fact that part of the background content in a surveillance video is usually essentially unchanged, and selects part of the background image as the watermark carrier; the deep robust watermark network used can embed into the carrier image a robust watermark that is invisible to the naked eye and resistant to camera capture; meanwhile, by increasing the watermark embedding cost in simple-texture regions, the network is guided to embed the watermark information in complex-texture regions as far as possible, overcoming StegaStamp's drawback of leaving obvious embedding traces in smooth image regions and significantly improving the visual quality of the watermark.
As a preferred solution, the image block perceptual similarity loss function is expressed by the following formula:
L_P = LPIPS(I_o, I_w);
where I_o denotes the carrier image and I_w denotes the watermark image.
As a preferred solution, the YUV space difference loss function based on the image texture template is expressed by the following formula:
L_T = Σ_{c∈{Y,U,V}} Σ_{i,j} (1 - I_t(i, j)) · (I_w(i, j, c) - I_o(i, j, c))^2;
where Y, U, V denote the components of the carrier image I_o and the watermark image I_w after conversion to YUV space, I_t denotes the normalized texture template, and I(i, j, c) denotes the pixel value of image I at abscissa i, ordinate j in channel c.
As a preferred solution, the overall loss function of the deep learning framework is expressed by the following formula:
L = λ_P·L_P + λ_T·L_T + λ_C·L_C + λ_M·L_M;
where each λ denotes a loss weight, L_P denotes the image block perceptual similarity loss function, L_T denotes the YUV space difference loss function based on the image texture template, L_C denotes the discriminator loss function, and L_M denotes the cross-entropy loss function.
Further, the discriminator loss function L_C is expressed by the following formula:
L_C = D(I_o) - D(I_w);
where D(·) denotes the discriminator network, I_o denotes the carrier image, and I_w denotes the watermark image;
the cross entropy loss function LMExpressed by the following formula:
Figure BDA0003494629930000031
wherein, Bi(i 1, 2.., 64) represents a bit sequence input to the network during the training process; mi(i 1, 2.., 64) represents the bit sequence output by the network during the training process.
Preferably, the deep learning framework introduces distortion transformation including perspective transformation and/or blur and/or noise and/or color change and/or illumination and/or JPEG compression at a distortion layer during training.
As a preferred scheme, for a copied video obtained by copying a monitoring video embedded with the watermark information, the watermark information is extracted from the copied video by the following steps:
s21, calculating sift characteristic points of the copied video by extracting video frames of the copied video;
s22, obtaining the sift characteristic points of the carrier image, and matching the sift characteristic points with the sift characteristic points of the rephotograph video;
s23, solving a homography matrix H according to the matching result of the step S22, and carrying out perspective transformation on the video frame of the copied video according to the homography matrix H;
s24, cutting the perspective transformation result of the step S23 to the same size as the carrier image;
and S25, inputting the cutting result of the step S24 into the deep robust watermark network, and obtaining watermark information through a decoder in the deep robust watermark network.
The present invention also provides the following:
a shot-resistant surveillance video watermarking system combining depth robust watermarking and template synchronization comprises a watermarking information embedding module used for embedding watermarking information into surveillance video, wherein the watermarking information embedding module comprises a template area selecting unit, a carrier image intercepting unit, a watermarking information generating unit, a watermarking image generating unit and a video image replacing unit, and the method comprises the following steps:
the template area selection unit is used for selecting a rectangular area with relatively fixed background image content from a background area in the monitoring video as a template area for embedding the watermark;
the carrier image intercepting unit is used for extracting a video frame of the monitoring video in a streaming buffer area and acquiring an intercepted image from the video frame as a carrier image according to the template area;
the watermark information generating unit is used for acquiring the equipment number information and the current timestamp of the monitoring video and encoding them into a binary bit sequence, and for splicing the binary bit sequence with a corresponding CRC (cyclic redundancy check) code and a BCH (Bose-Chaudhuri-Hocquenghem) error-correcting code to generate the watermark information;
the watermark image generating unit is used for inputting the carrier image and the watermark information into a preset deep robust watermark network and generating a watermark image through an encoder in the deep robust watermark network; the deep robust watermark network is obtained by training a StegaStamp-based deep learning framework combining an image block perceptual similarity loss function with a YUV space difference loss function based on an image texture template;
the video image replacing unit is used for replacing the image of the video frame in the template area by the watermark image.
A storage medium having stored thereon a computer program which, when executed by a processor, carries out the steps of the aforementioned anti-shot surveillance video watermarking method in combination with depth robust watermarking and template synchronization.
A computer device comprising a storage medium, a processor and a computer program stored in the storage medium and executable by the processor, the computer program when executed by the processor implementing the steps of the aforementioned anti-shot surveillance video watermarking method in combination with depth robust watermarking and template synchronization.
Drawings
Fig. 1 is a schematic flowchart of a process of embedding watermark information in a surveillance video by using a shot-resistant surveillance video watermarking method combining depth robust watermarking and template synchronization according to embodiment 1 of the present invention;
fig. 2 is an example of watermark information in embodiment 1 of the present invention;
fig. 3 is an example of the operation of the watermarking method according to embodiment 1 of the present invention;
fig. 4 is a schematic structural diagram of a deep robust watermark network provided in embodiment 1 of the present invention;
fig. 5 is a schematic diagram of an encoder structure of a deep robust watermark network provided in embodiment 1 of the present invention;
fig. 6 is a schematic diagram of a decoder structure of a deep robust watermarking network according to embodiment 1 of the present invention;
fig. 7 is a schematic flowchart of a process of extracting watermark information from a copied video by using a shot-resistant surveillance video watermarking method combining depth robust watermarking and template synchronization according to embodiment 1 of the present invention;
fig. 8 is a schematic diagram of a shot-resistant surveillance video watermarking system combining depth robust watermarking and template synchronization according to embodiment 2 of the present invention.
Detailed Description
The drawings are for illustrative purposes only and are not to be construed as limiting the patent.
it should be understood that the embodiments described are only a few embodiments of the present application, and not all embodiments. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present application without any creative effort belong to the protection scope of the embodiments in the present application.
The terminology used in the embodiments of the present application is for the purpose of describing particular embodiments only and is not intended to be limiting of the embodiments of the present application. As used in the examples of this application and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the application, as detailed in the appended claims. In the description of the present application, it is to be understood that the terms "first," "second," "third," and the like are used solely to distinguish one from another and are not necessarily used to describe a particular order or sequence, nor are they to be construed as indicating or implying relative importance. The specific meaning of the above terms in the present application can be understood by those of ordinary skill in the art as appropriate.
Further, in the description of the present application, "a plurality" means two or more unless otherwise specified. "and/or" describes the association relationship of the associated objects, meaning that there may be three relationships, e.g., a and/or B, which may mean: a exists alone, A and B exist simultaneously, and B exists alone. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship. The invention is further illustrated below with reference to the figures and examples.
In order to solve the limitation of the prior art, the present embodiment provides a technical solution, and the technical solution of the present invention is further described below with reference to the accompanying drawings and embodiments.
Example 1
Referring to fig. 1, a method for watermarking an anti-shot surveillance video by combining a depth robust watermark and template synchronization includes the following steps:
s11, selecting a rectangular area with relatively fixed background image content from the background area in the monitoring video as a template area for embedding the watermark;
s12, extracting a video frame of the monitoring video in a streaming buffer area, and acquiring an intercepted image from the video frame as a carrier image according to the template area;
s13, acquiring the equipment number information and the current timestamp of the monitoring video, and coding the equipment number information and the current timestamp into a binary bit sequence; splicing the binary bit sequence with a corresponding CRC (cyclic redundancy check) code and a BCH (broadcast channel) error correcting code to generate watermark information;
s14, inputting the carrier image and the watermark information into a preset depth robust watermark network, and generating a watermark image through an encoder in the depth robust watermark network; the depth robust watermark network is obtained by training a deep learning framework which is based on Stegasamp and combines an image block perception similar loss function and a YUV space differential loss function based on an image texture template;
and S15, replacing the image of the video frame in the template area with the watermark image.
Compared with the prior art, the method exploits the fact that part of the background content in a surveillance video is usually essentially unchanged, and selects part of the background image as the watermark carrier; the deep robust watermark network used can embed into the carrier image a robust watermark that is invisible to the naked eye and resistant to camera capture; meanwhile, by increasing the watermark embedding cost in simple-texture regions, the network is guided to embed the watermark information in complex-texture regions as far as possible, overcoming StegaStamp's drawback of leaving obvious embedding traces in smooth image regions and significantly improving the visual quality of the watermark.
Specifically, in the art, a video watermarking method refers to a video watermark processing method, which may include both the process of embedding watermark information into a video and the process of extracting watermark information from a video.
According to the scheme provided by this embodiment, when the watermark information is embedded into the video, the generated watermark image resists camera capture and transcoding and is imperceptible to the naked eye.
The scheme provided by this embodiment rests on three considerations. First, after the watermark is embedded into the video, the video is displayed on a monitoring screen, and the watermark undergoes distortions of varying degrees during camera recapture. Second, considering that in a surveillance video scene the content of part of the background image is almost unchanged, that part of the background image is used as the watermark synchronization template to fix the watermark embedding position. Finally, to improve the visual quality of the watermark, this embodiment builds on StegaStamp: by extracting the complex-texture region of the image as a template, the embedding cost of complex-texture regions is reduced while that of smooth regions is increased, guiding the network to embed the watermark information in complex-texture regions as far as possible, so as to improve the visual quality of the watermarked video.
More specifically, referring to fig. 2, the watermark information generated in step S13 may take the form of a 64-bit binary watermark information sequence. The first 32 bits store valid information comprising the timestamp and the equipment number: the year, month, day, hour and minute occupy 6, 4, 5, 5 and 6 bits respectively, and the equipment number occupies 6 bits. The middle 8 bits store a CRC check code computed from the valid information. The final 24 bits are supervisory information, storing a 24-bit BCH error-correction code computed from the valid information; this BCH code can correct at most 3 erroneous bits.
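As an illustrative sketch of this payload layout (not part of the patent text), the following Python fragment packs the timestamp and equipment number into the 32 valid bits and appends the two check fields; the field order, the CRC-8 polynomial, and the zero-filled BCH placeholder are assumptions made for demonstration only:

```python
def to_bits(value, width):
    """Big-endian bit list of `value` in `width` bits."""
    return [(value >> i) & 1 for i in range(width - 1, -1, -1)]

def crc8(bits, poly=0x07):
    """Bitwise CRC-8; the polynomial x^8 + x^2 + x + 1 (0x07) is an assumption,
    since the patent does not specify which CRC-8 variant is used."""
    crc = 0
    for b in bits:
        fb = ((crc >> 7) & 1) ^ b   # feedback bit
        crc = (crc << 1) & 0xFF
        if fb:
            crc ^= poly
    return to_bits(crc, 8)

def build_payload(year, month, day, hour, minute, device_id):
    """64-bit watermark payload: 32 info bits + 8 CRC bits + 24 BCH parity bits."""
    info = (to_bits(year % 64, 6) + to_bits(month, 4) + to_bits(day, 5) +
            to_bits(hour, 5) + to_bits(minute, 6) + to_bits(device_id, 6))
    check = crc8(info)
    # Placeholder: in the patent these 24 bits come from a BCH encoder able to
    # correct up to 3 bit errors; a real implementation would use a BCH library.
    bch_parity = [0] * 24
    return info + check + bch_parity  # 64 bits total
```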
In order to facilitate the processing of the carrier image by the deep robust watermarking network, as a preferred embodiment, the carrier image may be scaled to a size suitable for network processing before being input into the deep robust watermarking network; accordingly, if the size of the watermark image is different from the original size of the carrier image, the watermark image may be scaled to the same size as the original size of the carrier image, and then step S15 is performed.
Referring to fig. 3, in this example a rectangular area with almost unchanged content is first selected from the background image of a high-definition surveillance video as the watermark carrier; for example, with the fixed coordinate (0, 0) of the video frame as the upper-left corner of the rectangular area, a 3-channel RGB image of size 300 × 300 is cropped. The carrier image is scaled to the 512 × 512 size expected by the network and then input, together with the 64-bit binary watermark information sequence, into the encoder of the deep robust watermark network; after computation by the convolutional neural network, a watermark image containing the watermark information is output. The watermark image is scaled back to the 300 × 300 size of the original rectangular area, and the scaled watermark image replaces the template-area image in the original video frame, yielding a video frame containing the watermark information. The same embedding operation is performed on every frame in the surveillance video buffer, and the watermarked video frames are finally displayed on the monitoring screen through the buffer, producing a continuous watermarked surveillance video stream.
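A minimal sketch of this per-frame pipeline, assuming OpenCV and an `encoder` callable standing in for the trained network (both the helper name and the defaults are illustrative):

```python
import cv2

def embed_frame(frame, bits, encoder, top_left=(0, 0), size=300, net_size=512):
    """Embed watermark bits into the fixed template region of one video frame."""
    x, y = top_left
    carrier = frame[y:y + size, x:x + size]             # crop 300x300 template region
    net_in = cv2.resize(carrier, (net_size, net_size))  # scale to network input size
    wm = encoder(net_in, bits)                          # 512x512 watermarked image
    wm = cv2.resize(wm, (size, size))                   # scale back to template size
    out = frame.copy()
    out[y:y + size, x:x + size] = wm                    # replace the template region
    return out

# The same call is applied to every frame in the surveillance video buffer.
```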
Furthermore, the deep robust watermark network mainly comprises an encoder, a decoder and a discriminator; the network structure is shown in fig. 4. The input of the encoder is a 512 × 512 3-channel RGB carrier image together with a 64-bit binary information sequence, and the output is a residual image of the same size as the input; the residual image is the difference between the watermark image and the carrier image, and represents the amplitude by which each corresponding pixel of the carrier image must be modified in order to embed the watermark information. To ensure that the watermark image can still be decoded accurately after being displayed on a monitoring screen and photographed, a distortion layer simulating that process is designed and applied after the watermark image is obtained. After passing through the simulated distortion functions, the watermark image is input into the decoder, whose output is a 64-bit binary information sequence. In the other branch, the carrier image and the watermark image are input into the discriminator as a positive and a negative sample respectively, and the discrimination loss guides the encoder to generate more realistic watermark images.
The specific network structure of the encoder is shown in fig. 5; it adopts a U-Net-like structure, taking a 512 × 512 RGB image and a 64-bit binary information sequence as input and outputting a 512 × 512 residual image. First, to turn the one-dimensional 64-bit sequence into two dimensions, the sequence is expanded by a fully connected layer into a vector of length 12288, which is reshaped by folding into a three-channel image of size 64 × 64, upsampled by nearest-neighbor interpolation to the same size as the input image, and concatenated with the input image as the input of the encoder network body. The body of the network comprises a downsampling part and an upsampling part. The network first repeatedly downsamples the feature map with 3 × 3 convolutions, reducing its size while increasing the number of channels in order to extract abstract image features. It then applies a series of nearest-neighbor-interpolation upsampling operations to enlarge the feature map; after each upsampling, the feature map of the same size from the downsampling path is concatenated, and a stride-1 convolution is applied to the concatenation, outputting a feature map of the same size. In this process the feature map keeps growing while the number of channels keeps shrinking, until a residual image of the same size as the original image is output. Finally, the residual image is added to the original image to obtain the watermark image containing the watermark information.
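The following PyTorch sketch mirrors this description at reduced depth (three downsampling stages instead of the full configuration); the channel counts are illustrative assumptions, not the patent's exact architecture:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WatermarkEncoder(nn.Module):
    """U-Net-style encoder sketch: message bits become a 64x64x3 image, are
    upsampled and concatenated with the carrier, and a residual is predicted."""
    def __init__(self, n_bits=64):
        super().__init__()
        self.fc = nn.Linear(n_bits, 64 * 64 * 3)  # 64 bits -> vector of length 12288
        self.down1 = nn.Conv2d(6, 32, 3, stride=2, padding=1)
        self.down2 = nn.Conv2d(32, 64, 3, stride=2, padding=1)
        self.down3 = nn.Conv2d(64, 128, 3, stride=2, padding=1)
        self.up2 = nn.Conv2d(128 + 64, 64, 3, padding=1)   # stride-1 conv after concat
        self.up1 = nn.Conv2d(64 + 32, 32, 3, padding=1)
        self.out = nn.Conv2d(32 + 6, 3, 3, padding=1)      # residual image

    def forward(self, image, bits):
        # image: (B, 3, 512, 512) in [0, 1]; bits: (B, 64)
        msg = self.fc(bits).view(-1, 3, 64, 64)                    # fold to 64x64x3
        msg = F.interpolate(msg, size=image.shape[-2:], mode='nearest')
        x0 = torch.cat([image, msg], dim=1)                        # (B, 6, 512, 512)
        x1 = F.relu(self.down1(x0))                                # (B, 32, 256, 256)
        x2 = F.relu(self.down2(x1))                                # (B, 64, 128, 128)
        x3 = F.relu(self.down3(x2))                                # (B, 128, 64, 64)
        u2 = F.relu(self.up2(torch.cat([F.interpolate(x3, scale_factor=2.0), x2], 1)))
        u1 = F.relu(self.up1(torch.cat([F.interpolate(u2, scale_factor=2.0), x1], 1)))
        residual = self.out(torch.cat([F.interpolate(u1, scale_factor=2.0), x0], 1))
        return image + residual            # watermark image = carrier + residual
```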
The decoder structure is shown in fig. 6: a three-channel 512 × 512 RGB watermark image containing the watermark information is input, downsampling is performed repeatedly with stride-2 convolution kernels while the number of feature-map channels keeps growing so as to extract image features, and a fully connected layer finally outputs a 64-bit binary information sequence. The structure of the discriminator is similar to that of the decoder: it also downsamples with stride-2 convolution kernels while increasing the number of feature-map channels to extract image features, but its final output is a vector of length 2, representing the predicted probabilities of real and fake respectively.
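A matching decoder (and, with a different head, the discriminator) can be sketched in the same style; the depth and channel counts are again illustrative assumptions:

```python
import torch.nn as nn
import torch.nn.functional as F

class WatermarkDecoder(nn.Module):
    """Stride-2 convolutions shrink the feature map while channels grow;
    a fully connected head emits 64 bit logits (sigmoid -> bit probabilities)."""
    def __init__(self, n_bits=64):
        super().__init__()
        chans = [3, 32, 64, 128, 256]
        self.convs = nn.ModuleList(
            nn.Conv2d(chans[i], chans[i + 1], 3, stride=2, padding=1)
            for i in range(len(chans) - 1))
        self.fc = nn.Linear(256 * 32 * 32, n_bits)

    def forward(self, x):                   # x: (B, 3, 512, 512)
        for conv in self.convs:
            x = F.relu(conv(x))             # 512 -> 256 -> 128 -> 64 -> 32
        return self.fc(x.flatten(1))        # 64 bit logits

class Discriminator(WatermarkDecoder):
    """Same downsampling body, but the head outputs a length-2 real/fake score."""
    def __init__(self):
        super().__init__(n_bits=2)
```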
In particular, images suffer distortions of varying degrees in the process of being displayed on a monitor screen and photographed, and the goal of the network is to train a decoder that can still decode the correct information from images distorted to a certain degree. The distortion therefore has to be simulated by functions during training; to this end, six differentiable distortion transformations are introduced in the distortion layer: perspective transformation, blur, noise, color change, illumination, and JPEG compression.
Therefore, as a preferred embodiment, the deep learning framework introduces distortion transformation including perspective transformation and/or blur and/or noise and/or color change and/or illumination and/or JPEG compression at the distortion layer during training.
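A minimal differentiable distortion-layer sketch follows; for brevity it implements only color/illumination jitter, blur, and noise, while the perspective and differentiable-JPEG transforms named above are omitted, and all parameter values are assumptions:

```python
import torch
import torch.nn.functional as F

def distortion_layer(img, noise_std=0.02, brightness=0.1):
    """img: (B, 3, H, W) in [0, 1]. All operations are differentiable so the
    encoder/decoder gradients can flow through the simulated channel."""
    b, c = img.shape[:2]
    # color change: random per-channel gain; illumination: global offset
    gain = 1 + (torch.rand(b, c, 1, 1, device=img.device) - 0.5) * 0.2
    offset = (torch.rand(b, 1, 1, 1, device=img.device) - 0.5) * 2 * brightness
    img = img * gain + offset
    # blur: fixed 5x5 binomial kernel applied as a depthwise convolution
    k = torch.tensor([1., 4., 6., 4., 1.], device=img.device)
    kernel = (k[:, None] * k[None, :] / 256.0).expand(c, 1, 5, 5)
    img = F.conv2d(img, kernel, padding=2, groups=c)
    # noise: additive Gaussian
    img = img + noise_std * torch.randn_like(img)
    return img.clamp(0, 1)
```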
The main goal of the whole network architecture is to make the embedded watermark imperceptible to the naked eye while keeping the decoding accuracy high and robust to image distortion.
In order to make the watermark imperceptible to the naked eye, this embodiment uses two loss functions to limit the difference between the watermark image and the carrier image. The first is the image block perceptual similarity (LPIPS) loss, an image-similarity measure learned by a network: the more visually similar the two images are, the smaller the loss.
As a preferred embodiment, the image block perceptual similarity loss function is expressed by the following formula:
L_P = LPIPS(I_o, I_w);
where I_o denotes the carrier image and I_w denotes the watermark image.
The second is a YUV space difference loss function based on an image texture template. If watermark information is embedded in a smooth region of an image, that region shows obvious watermark traces, so a greater embedding cost must be assigned to smooth image regions. Therefore, when the network processes an image it first extracts an edge image with the Canny operator and then applies a morphological dilation to it, obtaining a texture template I_t that is bright in complex-texture regions and almost zero in smooth regions, normalized to between 0 and 1. The carrier image I_o and the watermark image I_w are converted from RGB space to YUV space, and the loss L_T is computed as a sum of pixel-by-pixel losses. As a preferred embodiment, the YUV space difference loss function based on the image texture template is expressed by the following formula:
L_T = Σ_{c∈{Y,U,V}} Σ_{i,j} (1 - I_t(i, j)) · (I_w(i, j, c) - I_o(i, j, c))^2;
where Y, U, V denote the components of the carrier image I_o and the watermark image I_w after conversion to YUV space, I_t denotes the normalized texture template, and I(i, j, c) denotes the pixel value of image I at abscissa i, ordinate j in channel c.
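As an illustrative OpenCV/NumPy sketch, the texture template and the texture-weighted YUV loss described above could be computed as follows; the Canny thresholds, the dilation kernel size, and the exact (1 - I_t) squared-error weighting are assumptions:

```python
import cv2
import numpy as np

def texture_template(img_bgr, ksize=5):
    """I_t: Canny edge map, morphologically dilated, normalized to [0, 1];
    bright in complex-texture regions, near zero in smooth regions."""
    gray = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 100, 200)                       # thresholds assumed
    dilated = cv2.dilate(edges, np.ones((ksize, ksize), np.uint8))
    return dilated.astype(np.float32) / 255.0

def yuv_texture_loss(carrier_bgr, watermarked_bgr):
    """Pixel-wise YUV squared error weighted by (1 - I_t), so that modifying
    smooth regions costs more than modifying complex-texture regions."""
    i_t = texture_template(carrier_bgr)
    yuv_o = cv2.cvtColor(carrier_bgr, cv2.COLOR_BGR2YUV).astype(np.float32)
    yuv_w = cv2.cvtColor(watermarked_bgr, cv2.COLOR_BGR2YUV).astype(np.float32)
    diff = (yuv_w - yuv_o) ** 2                             # per pixel, per channel
    return float(((1.0 - i_t)[..., None] * diff).sum())
```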
In order to achieve high decoding accuracy, erroneous decoding results are penalized, so cross entropy can be used as the loss function: the higher the bit error rate, the larger the loss, and the training objective is to minimize this loss function. In addition, there is a discriminator loss, which can take the form of the Wasserstein GAN loss function.
Thus, as a preferred embodiment, the overall loss function of the deep learning framework is expressed by the following formula:
L = λ_P·L_P + λ_T·L_T + λ_C·L_C + λ_M·L_M;
where each λ denotes a loss weight, L_P denotes the image block perceptual similarity loss function, L_T denotes the YUV space difference loss function based on the image texture template, L_C denotes the discriminator loss function, and L_M denotes the cross-entropy loss function.
Further, the discriminator loss function L_C is expressed by the following formula:
L_C = D(I_o) - D(I_w);
where D(·) denotes the discriminator network, I_o denotes the carrier image, and I_w denotes the watermark image;
the cross entropy loss function LMExpressed by the following formula:
Figure BDA0003494629930000101
wherein, Bi(i 1, 2.., 64) represents a bit sequence input to the network during the training process; mi(i 1, 2.., 64) represents the bit sequence output by the network during the training process.
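Put together, the training objective can be sketched as below (PyTorch); the λ weights are illustrative placeholders, not values given in the patent:

```python
import torch.nn.functional as F

def message_loss(bit_logits, bits):
    """L_M: binary cross-entropy between the decoder's bit logits and the
    64-bit sequence fed to the encoder (higher error rate -> larger loss)."""
    return F.binary_cross_entropy_with_logits(bit_logits, bits.float())

def critic_loss(d_carrier, d_watermark):
    """L_C = D(I_o) - D(I_w), the Wasserstein-style discriminator term."""
    return (d_carrier - d_watermark).mean()

def total_loss(l_p, l_t, l_c, l_m, lam=(1.0, 1.0, 0.1, 1.0)):
    """L = lam_P*L_P + lam_T*L_T + lam_C*L_C + lam_M*L_M (weights assumed)."""
    return lam[0] * l_p + lam[1] * l_t + lam[2] * l_c + lam[3] * l_m
```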
Specifically, the network is trained on the MIRFLICKR natural image dataset, which contains 25,000 natural images; the model converges after 200,000 training iterations on a single graphics card. With a 2080 Ti graphics card, encoding a single picture takes 10 milliseconds, which meets the requirement of real-time encoding.
As a preferred embodiment, referring to fig. 7, for a copied video obtained by copying a monitored video embedded with the watermark information, the watermark information is extracted from the copied video by the following steps:
s21, calculating sift characteristic points of the copied video by extracting video frames of the copied video;
s22, obtaining the sift characteristic points of the carrier image, and matching the sift characteristic points with the sift characteristic points of the rephotograph video;
s23, solving a homography matrix H according to the matching result of the step S22, and carrying out perspective transformation on the video frame of the copied video according to the homography matrix H;
s24, cutting the perspective transformation result of the step S23 to the same size as the carrier image;
and S25, inputting the cutting result of the step S24 into the deep robust watermark network, and obtaining watermark information through a decoder in the deep robust watermark network.
Specifically, when the surveillance video is recaptured, the image content of the watermark template is almost unchanged, so SIFT feature matching is used to register the recaptured frame against the template (i.e. the carrier image) used when the watermark was embedded, achieving fast and accurate watermark synchronization; a sketch of this registration step is given below.
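Assuming OpenCV, the registration-and-rectification steps S21 to S24 could be sketched as follows; the ratio-test threshold and the RANSAC reprojection tolerance are common defaults, not values mandated by the patent:

```python
import cv2
import numpy as np

def rectify_shot_frame(shot_frame, template, net_size=512):
    """Register a recaptured frame against the carrier template via SIFT +
    RANSAC homography, then warp and crop it for the decoder (steps S21-S24)."""
    sift = cv2.SIFT_create()
    kp_t, des_t = sift.detectAndCompute(template, None)     # carrier template
    kp_s, des_s = sift.detectAndCompute(shot_frame, None)   # recaptured frame
    matches = cv2.BFMatcher().knnMatch(des_t, des_s, k=2)
    good = [m for m, n in matches if m.distance < 0.75 * n.distance]  # ratio test
    src = np.float32([kp_s[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([kp_t[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)    # shot -> template coords
    h, w = template.shape[:2]
    aligned = cv2.warpPerspective(shot_frame, H, (w, h))    # perspective correction
    return cv2.resize(aligned, (net_size, net_size))        # decoder input size
```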
In a specific example, still referring to fig. 3, after the watermark information is extracted from the copied video, it is corrected using its BCH supervisory bits and its validity is verified against the CRC check code; if valid, the timestamp and equipment-number information can be extracted successfully, thereby achieving traceability of the surveillance video.
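A minimal sketch of this validity check, reusing the crc8 helper and field layout from the payload sketch in step S13 above, and assuming BCH error correction has already been applied to the 64 decoded bits:

```python
def verify_payload(bits):
    """Return the decoded fields if the CRC over the 32 info bits matches the
    stored 8-bit check; otherwise the extraction is treated as invalid."""
    info, check = bits[:32], bits[32:40]
    if crc8(info) != check:
        return None                      # CRC mismatch: watermark not trusted
    def val(seg):                        # big-endian bits -> integer
        return int(''.join(map(str, seg)), 2)
    return {'year': val(info[0:6]), 'month': val(info[6:10]),
            'day': val(info[10:15]), 'hour': val(info[15:20]),
            'minute': val(info[20:26]), 'device_id': val(info[26:32])}
```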
Example 2
An anti-shooting surveillance video watermarking system combining deep robust watermarking and template synchronization is shown in fig. 8; it comprises a watermark information embedding module 1 for embedding watermark information into a surveillance video, the watermark information embedding module 1 comprising a template area selecting unit 11, a carrier image intercepting unit 12, a watermark information generating unit 13, a watermark image generating unit 14 and a video image replacing unit 15, wherein:
the template area selecting unit 11 is configured to select a rectangular area with relatively fixed background image content from a background area in the surveillance video as a template area in which the watermark is embedded;
the carrier image capturing unit 12 is configured to extract a video frame of the monitoring video in a streaming buffer, and obtain a captured image from the video frame as a carrier image according to the template area;
the watermark information generating unit 13 is configured to obtain the device number information and the current timestamp of the monitoring video, and encode the device number information and the current timestamp into a binary bit sequence; splicing the binary bit sequence with a corresponding CRC (cyclic redundancy check) code and a BCH (broadcast channel) error correcting code to generate watermark information;
the watermark image generating unit 14 is configured to input the carrier image and the watermark information into a preset depth robust watermark network, and generate a watermark image through an encoder in the depth robust watermark network; the depth robust watermark network is obtained by training a deep learning framework which is based on Stegasamp and combines an image block perception similar loss function and a YUV space differential loss function based on an image texture template;
the video image replacing unit 15 is configured to replace the image of the video frame in the template area with the watermark image.
As a preferred embodiment, the system further comprises a watermark information extraction module 2 for extracting watermark information from the copied video, the copied video being obtained by recording the surveillance video embedded with the watermark information; the watermark information extraction module 2 comprises a feature point calculation unit 21, a feature point matching unit 22, a perspective transformation unit 23, a cropping unit 24 and a decoding unit 25, wherein:
the feature point calculation unit 21 is configured to extract video frames from the copied video and calculate their SIFT feature points;
the feature point matching unit 22 is configured to obtain the SIFT feature points of the carrier image and match them with the SIFT feature points of the copied video;
the perspective transformation unit 23 is configured to solve a homography matrix H according to the matching result of the feature point matching unit 22, and to perform a perspective transformation on the video frames of the copied video according to the homography matrix H;
the cropping unit 24 is configured to crop the perspective-transformation result of the perspective transformation unit 23 to the same size as the carrier image;
the decoding unit 25 is configured to input the cropping result of the cropping unit 24 into the deep robust watermark network and obtain the watermark information through a decoder in the deep robust watermark network.
Example 3
A storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the anti-shot surveillance video watermarking method incorporating depth robust watermarking and template synchronization of embodiment 1.
Example 4
A computer device comprising a storage medium, a processor, and a computer program stored in the storage medium and executable by the processor, the computer program when executed by the processor implementing the steps of the anti-shot surveillance video watermarking method incorporating depth robust watermarking and template synchronization of embodiment 1.
It should be understood that the above-described embodiments of the present invention are merely examples for clearly illustrating the present invention and are not intended to limit its embodiments. Other variations and modifications will be apparent to persons skilled in the art in light of the above description; it is neither necessary nor possible to exhaust all embodiments here. Any modification, equivalent replacement or improvement made within the spirit and principle of the present invention shall be included in the protection scope of the claims of the present invention.

Claims (10)

1. A shooting-resistant surveillance video watermarking method combining depth robust watermarking and template synchronization is characterized in that watermarking information is embedded into a surveillance video through the following steps:
s11, selecting a rectangular area with relatively fixed background image content from the background area in the monitoring video as a template area for embedding the watermark;
s12, extracting a video frame of the monitoring video in a streaming buffer area, and acquiring an intercepted image from the video frame as a carrier image according to the template area;
s13, acquiring the equipment number information and the current timestamp of the monitoring video, and coding the equipment number information and the current timestamp into a binary bit sequence; splicing the binary bit sequence with a corresponding CRC (cyclic redundancy check) code and a BCH (broadcast channel) error correcting code to generate watermark information;
s14, inputting the carrier image and the watermark information into a preset depth robust watermark network, and generating a watermark image through an encoder in the depth robust watermark network; the depth robust watermark network is obtained by training a deep learning framework which is based on Stegasamp and combines an image block perception similar loss function and a YUV space differential loss function based on an image texture template;
and S15, replacing the image of the video frame in the template area with the watermark image.
2. The method for anti-shot surveillance video watermarking in combination with depth robust watermarking and template synchronization as recited in claim 1, wherein the image block perceptual similarity loss function is expressed by the following formula:
L_P = LPIPS(I_o, I_w);
where I_o denotes the carrier image and I_w denotes the watermark image.
3. The method for anti-shot surveillance video watermarking combining depth robust watermarking and template synchronization according to claim 1, wherein the YUV space difference loss function based on the image texture template is expressed by the following formula:
L_T = Σ_{c∈{Y,U,V}} Σ_{i,j} (1 - I_t(i, j)) · (I_w(i, j, c) - I_o(i, j, c))^2;
where Y, U, V denote the components of the carrier image I_o and the watermark image I_w after conversion to YUV space, I_t denotes the normalized texture template, and I(i, j, c) denotes the pixel value of image I at abscissa i, ordinate j in channel c.
4. The method for anti-shot surveillance video watermarking in combination with deep robust watermarking and template synchronization as claimed in claim 1, wherein the overall loss function of the deep learning framework is expressed by the following formula:
L = λ_P·L_P + λ_T·L_T + λ_C·L_C + λ_M·L_M;
where each λ denotes a loss weight, L_P denotes the image block perceptual similarity loss function, L_T denotes the YUV space difference loss function based on the image texture template, L_C denotes the discriminator loss function, and L_M denotes the cross-entropy loss function.
5. The method of anti-shot surveillance video watermarking with depth robust watermarking and template synchronization as claimed in claim 4, wherein the discriminator loss function L_C is expressed by the following formula:
L_C = D(I_o) - D(I_w);
where D(·) denotes the discriminator network, I_o denotes the carrier image, and I_w denotes the watermark image;
and the cross-entropy loss function L_M is expressed by the following formula:
L_M = -(1/64) Σ_{i=1}^{64} [ B_i log M_i + (1 - B_i) log(1 - M_i) ];
where B_i (i = 1, 2, ..., 64) denotes the bit sequence input to the network during training, and M_i (i = 1, 2, ..., 64) denotes the bit sequence output by the network during training.
6. The method for anti-shot surveillance video watermarking in combination with depth robust watermarking and template synchronization as claimed in claim 1, wherein the deep learning framework introduces distortion transformations including perspective transformation and/or blur and/or noise and/or color change and/or illumination and/or JPEG compression at a distortion layer during training.
7. The anti-shooting surveillance video watermarking method combining depth robust watermarking and template synchronization according to claim 1, wherein for a copied video obtained by copying the surveillance video embedded with the watermark information, the watermark information is extracted from the copied video by the following steps:
s21, calculating sift characteristic points of the copied video by extracting video frames of the copied video;
s22, obtaining the sift characteristic points of the carrier image, and matching the sift characteristic points with the sift characteristic points of the rephotograph video;
s23, solving a homography matrix H according to the matching result of the step S22, and carrying out perspective transformation on the video frame of the copied video according to the homography matrix H;
s24, cutting the perspective transformation result of the step S23 to the same size as the carrier image;
and S25, inputting the cutting result of the step S24 into the deep robust watermark network, and obtaining watermark information through a decoder in the deep robust watermark network.
8. An anti-shooting surveillance video watermarking system combining depth robust watermarking and template synchronization, characterized by comprising a watermark information embedding module (1) for embedding watermark information into a surveillance video, the watermark information embedding module (1) comprising a template area selecting unit (11), a carrier image intercepting unit (12), a watermark information generating unit (13), a watermark image generating unit (14) and a video image replacing unit (15), wherein:
the template area selecting unit (11) is used for selecting a rectangular area with relatively fixed background image content from a background area in a monitoring video as a template area for embedding a watermark;
the carrier image intercepting unit (12) is used for extracting a video frame of the monitoring video in a stream buffer area, and acquiring an intercepted image from the video frame as a carrier image according to the template area;
the watermark information generating unit (13) is used for acquiring the equipment number information and the current timestamp of the monitoring video and encoding them into a binary bit sequence, and for splicing the binary bit sequence with a corresponding CRC (cyclic redundancy check) code and a BCH (Bose-Chaudhuri-Hocquenghem) error-correcting code to generate the watermark information;
the watermark image generating unit (14) is used for inputting the carrier image and the watermark information into a preset depth robust watermark network and generating a watermark image through an encoder in the depth robust watermark network; the depth robust watermark network is obtained by training a StegaStamp-based deep learning framework combining an image block perceptual similarity loss function with a YUV space difference loss function based on an image texture template;
the video image replacing unit (15) is used for replacing the image of the video frame in the template area with the watermark image.
9. A storage medium having a computer program stored thereon, characterized in that: the computer program when being executed by a processor realizes the steps of the anti-shot surveillance video watermarking method in combination with depth robust watermarking and template synchronization according to any of the claims 1 to 7.
10. A computer device, characterized by: comprising a storage medium, a processor and a computer program stored in said storage medium and executable by said processor, said computer program when executed by the processor implementing the steps of the anti-shot surveillance video watermarking method incorporating depth robust watermarking and template synchronization according to any of the claims 1 to 7.
CN202210109380.7A 2022-01-28 2022-01-28 Anti-shooting monitoring video watermarking method combining depth robust watermarking and template synchronization Pending CN114549270A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210109380.7A CN114549270A (en) 2022-01-28 2022-01-28 Anti-shooting monitoring video watermarking method combining depth robust watermarking and template synchronization

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210109380.7A CN114549270A (en) 2022-01-28 2022-01-28 Anti-shooting monitoring video watermarking method combining depth robust watermarking and template synchronization

Publications (1)

Publication Number Publication Date
CN114549270A true CN114549270A (en) 2022-05-27

Family

ID=81672983

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210109380.7A Pending CN114549270A (en) 2022-01-28 2022-01-28 Anti-shooting monitoring video watermarking method combining depth robust watermarking and template synchronization

Country Status (1)

Country Link
CN (1) CN114549270A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117095411A (en) * 2023-10-16 2023-11-21 青岛文达通科技股份有限公司 Detection method and system based on image fault recognition
CN117095411B (en) * 2023-10-16 2024-01-23 青岛文达通科技股份有限公司 Detection method and system based on image fault recognition

Similar Documents

Publication Publication Date Title
US10176545B2 (en) Signal encoding to reduce perceptibility of changes over time
Zhang et al. Robust invisible video watermarking with attention
CN111028308B (en) Steganography and reading method for information in image
Jia et al. RIHOOP: Robust invisible hyperlinks in offline and online photographs
CN107888925B (en) A kind of embedding grammar and detection method of digital video hiding information
Pramila et al. Toward an interactive poster using digital watermarking and a mobile phone camera
EP3477578B1 (en) Watermark embedding and extracting method for protecting documents
EP1952338A1 (en) Animated image code, apparatus for generating/decoding animated image code, and method thereof
US9270846B2 (en) Content encoded luminosity modulation
CN111161181A (en) Image data enhancement method, model training method, device and storage medium
CN112911341B (en) Image processing method, decoder network training method, device, equipment and medium
CN113095992A (en) Novel bar code screenshot steganography traceability combined algorithm
CN116152173A (en) Image tampering detection positioning method and device
CN115482142A (en) Dark watermark adding method, extracting method, system, storage medium and terminal
Yang et al. Language universal font watermarking with multiple cross-media robustness
CN102737240A (en) Method of analyzing digital document images
US10664940B2 (en) Signal encoding to reduce perceptibility of changes over time
JP7539998B2 (en) Zoom Agnostic Watermark Extraction
CN114549270A (en) Anti-shooting monitoring video watermarking method combining depth robust watermarking and template synchronization
CN114066709A (en) Screen-shot-resistant robust watermarking system and algorithm based on deep learning
Qin et al. Print-camera resistant image watermarking with deep noise simulation and constrained learning
CN114037596A (en) End-to-end image steganography method capable of resisting physical transmission deformation
CN114207659A (en) Light field messaging
Yakushev et al. Docmarking: Real-Time Screen-Cam Robust Document Image Watermarking
CN117597702A (en) Scaling-independent watermark extraction

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination