CN117131520A - Two-stage image privacy protection method and system based on dynamic mask and generation recovery


Info

Publication number
CN117131520A
Authority
CN
China
Prior art keywords
image
mask
generator
activation
convolution
Prior art date
Legal status (assumed; not a legal conclusion)
Pending
Application number
CN202310969216.8A
Other languages
Chinese (zh)
Inventor
Teng Xuyang
Fang Shichao
Wang Zinan
Chen Han
Qiu Zhaoyang
Bi Meihua
Current Assignee
Hangzhou Dianzi University
Original Assignee
Hangzhou Dianzi University
Priority date
Filing date
Publication date
Application filed by Hangzhou Dianzi University filed Critical Hangzhou Dianzi University
Priority to CN202310969216.8A
Publication of CN117131520A
Legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/602 Providing cryptographic facilities or services
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245 Protecting personal data, e.g. for financial or medical purposes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioethics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Computer Hardware Design (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Computer Security & Cryptography (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Medical Informatics (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Databases & Information Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a two-stage image privacy protection method and system based on dynamic masking and generative recovery. The method comprises the following steps: S1, establish a data set ImgDataset, preprocess it, label the category of each target, and divide the data set proportionally into a training set D_tr, a verification set D_v, and a test set D_te; perform classification training on the training set D_tr. S2, determine the important attention regions of the images, generate corresponding binary masks, and mask the original images with them. S3, input the mask images into a generator for training, and encrypt the trained parameters. S4, transmit the mask image and the parameter ciphertext obtained in step S3 separately to a receiving end; the receiving end decrypts the parameter ciphertext, loads the network weight parameters into a generator with the same network structure as the transmitting end, and the generator repairs the mask image. By encrypting the trained model parameters instead of massive image data, the invention avoids encryption operations on the images themselves, reduces computational cost, and improves the encryption and decryption rate.

Description

Two-stage image privacy protection method and system based on dynamic mask and generation recovery
Technical Field
The invention belongs to the technical field of deep-learning-based image privacy protection, and particularly relates to a two-stage image privacy protection method and system based on dynamic masking and generative restoration for the image transmission process.
Background
With the advent and widespread use of digital products such as digital cameras and smartphones, digital images have become the fastest-growing form of multimedia information and are widely applied in fields such as engineering, national defense, medicine, and scientific experiments. Image information security has therefore become an important concern.
Existing image encryption methods can be divided into two types: full-image encryption and selective encryption. The former encrypts the whole image and achieves a good encryption effect, but its computational complexity is high and cannot keep up with rapidly expanding image data. The latter trades security off against computational complexity: a salient-object detection algorithm identifies the important region of the image, and only that region is encrypted, which reduces some computational cost.
Although both kinds of methods achieve a certain encryption effect, neither can encrypt whole images at scale while guaranteeing encryption quality. Moreover, salient-object data sets are scarce, so salient-object detection algorithms cannot be applied widely across scenes, and the practicality of extracting key image regions with such algorithms is limited. In the age of informatization and intelligence, both types of methods must encrypt the images themselves; digital image data is voluminous and highly redundant, image encryption is costly, the capacity for manually processing massive data is limited, and processing images with intelligent systems is gradually becoming the new trend.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention provides a two-stage image privacy protection method and system based on dynamic masking and generative recovery. In the transmitting stage, the invention provides an adaptive masking method that dynamically generates a mask, automatically and adaptively occludes the sensitive region of an image with it, and hides the effective information of the image; meanwhile, the mask image is input to a generator for training, the parameters trained by the generator are encrypted into ciphertext, and finally the mask image and the ciphertext are sent to the receiving end separately. In the receiving stage, the ciphertext of the network parameter set is first decrypted; the decrypted plaintext parameters are then loaded into a generator, which performs generative recovery on the mask image; finally, the image is processed. If the image is acquired by an illegal interception end during transmission, a recognition model cannot locate or detect anything in it. In addition, the invention converts image encryption into encryption of the generator's training parameters, effectively improving the encryption and decryption rate and reducing computational cost. The invention further designs an adaptive, region-aware class activation loss function that improves the quality of the generated image.
In order to achieve the aim of the invention, the invention adopts the following technical scheme:
The two-stage image privacy protection method based on dynamic masking and generative restoration comprises the following steps:
S1, establish a data set ImgDataset, preprocess it, mark each target with a category label, and divide the data set proportionally into a training set D_tr, a verification set D_v, and a test set D_te; perform classification training on the training set D_tr.
S2, adaptively judging the important attention area of the image, generating a corresponding binary mask, and masking the original image by using the mask.
S3, inputting the mask image into a generator for training. Encrypting the trained network weight parameters.
And S4, respectively transmitting the mask image and the parameter ciphertext obtained in the step S3 to a receiving end, decrypting the parameter ciphertext by the receiving end, loading the network weight parameter into a generator with the same network structure as the transmitting end in the receiving end, and repairing the mask image by the generator.
Preferably, step S1 specifically includes uniformly scaling ImgDataset to 256×256, converting the image data into tensors and normalizing them, then inputting them to ResNet18 for training, recording the average accuracy and cross-entropy loss on D_v, and saving the model parameters with the best validation performance.
Preferably, step S2 comprises the following steps:
S21, input D_tr into the trained ResNet18 network to obtain a one-dimensional vector of length C, and obtain the prediction result y and its corresponding class label c through a softmax classifier.
S22, extract the feature map F output by the last convolutional layer of ResNet18, which contains 512 channels (f_i denotes the i-th channel of F), each of size 8×8. Enlarge f_i to 256×256 by bilinear interpolation to obtain f_i′, multiply f_i′ with the original image, and feed the result to the trained ResNet18 again; through the softmax classifier, extract the prediction result y_i corresponding to the class label c. As shown in formula (1), the difference between y_i and y evaluates the contribution of the i-th channel of the feature map F to class c, and negative values are zeroed by a ReLU function to obtain the score class activation map (Score-CAM, Score Class Activation Mapping) CAM_0. As shown in formula (2), CAM_0 is min-max (dispersion) normalized to obtain the class activation map CAM.
S23, set a threshold γ (0.5 < γ < 1); pixels of the class activation map CAM with values greater than or equal to γ are regarded as the important region of the original image.
S24, construct an all-zero 256×256 mask, find the pixel with the maximum value among those greater than γ in the class activation map, obtain its position coordinates, set that pixel's value to 0, and set the pixel at the same position in the mask to 1.
S25, repeat step S24 until all pixel values in the CAM are smaller than γ, obtaining a binary mask whose pixels are either 0 or 1; 1 corresponds to an important pixel in the original image, and 0 the opposite.
S26, fill the pixels of the original image at the positions where the binary mask equals 1 with an integer from 0 to 255, finally obtaining the corresponding mask image.
Preferably, step S3 includes the following steps:
S31, take the mask image I_mask from S2 and the original image I_gt, and input them to the generator; the generator's encoder determines the mask region of the image and extracts the image's semantic information.
S32, a decoder in the generator upsamples the extracted semantic information to generate a preliminary restoration result image.
S33, the discriminator receives the preliminary result image I_re output by the generator, computes the adversarial loss against the original image, and returns it to the generator.
S34, encrypt the parameters of the trained generator based on the Pell sequence and an elliptic curve.
Specifically, in S31, the encoder adopts an eight-group structure: the first group is a single convolution; the second to seventh groups are each an activation layer, a convolution, and a batch normalization layer; the eighth group is an activation layer and a convolution. The numbers of convolution kernels of the first, second, and third groups increase from 3 to 64, 64 to 128, and 128 to 256 respectively, and the fourth to eighth groups each have 512 kernels. All convolutions in the eight groups have kernel size 4, stride 2, and padding 1, and the activation functions of the second to eighth groups are all LeakyReLU. The encoder finally outputs 512 feature maps of size 1×1.
Specifically, in S32, the decoder deconvolves the extracted semantic features and adopts a nine-group structure: the first to seventh groups are each an activation layer, a deconvolution layer, and a batch normalization layer; the eighth group is an activation layer and a deconvolution layer; the ninth group is a single activation layer. The semantic feature maps output by the encoder's first to eighth groups (front to back) are stacked with those of the decoder's eighth to first groups (back to front), so each decoder layer receives both the features output by the preceding decoder layer and the corresponding encoder features. The first to fourth groups of deconvolutions each have 512 kernels, and the fifth to eighth groups reduce from 1024 to 256, 512 to 128, 256 to 64, and 128 to 3 respectively. All deconvolutions in the first eight groups have kernel size 4, stride 2, and padding 1 with ReLU activations, and the ninth group's activation function is Tanh. The decoder finally outputs a 256×256 preliminary result map with 3 channels.
Specifically, in S33, the Markov discriminator mainly consists of four convolutional layers, three LeakyReLU activation layers, and one Sigmoid activation layer. The first three convolutions have kernel size 4, stride 2, and padding 1, each followed by a LeakyReLU with negative slope 0.2 and the in-place option enabled; the last convolutional layer has kernel size 4, stride 1, and padding 1 and uses a Sigmoid to activate the neurons. Local discrimination on the preliminary result image ensures effective recovery of the low-frequency local texture structure. The Markov discriminator receives the preliminary result image and discriminates it against its original complete image to obtain the adversarial loss L_adv.
In formula (3), G denotes the generator, D denotes the Markov discriminator, E(·) denotes the expected value of the distribution function, p_data(I_gt) denotes the distribution of real samples, p_data(I_mask) denotes the distribution of mask images, D(I_gt) denotes the probability the Markov discriminator predicts for I_gt, and D(I_mask) the probability it predicts for I_mask.
While the preliminary result image and its original complete image are input to the discriminator, the pixel-level loss L_re between them is also calculated:
L_re = ||I_re - I_gt||_1    (4)
The preliminary result map and its original complete image are also input to a VGG16 feature extractor, which calculates the feature-level perceptual loss L_p and style loss L_s between the images.
In formula (5), N denotes the number of VGG16 convolutional layers used, so here N = 5; φ_j denotes the j-th convolutional layer of VGG16; C_j H_j W_j denotes the size of the feature map output by the j-th convolutional layer, where C, H, and W are the channel count, height, and width of that feature map. The Gram matrix in formula (6) is computed as in formula (7), where m and n index the input feature map F in the channel dimension and p indexes F in the spatial dimension.
In addition to the above losses, in order to improve the quality of the generated image, the present invention designs a region-aware class activation loss function L_CAM.
In formula (8), CAM(I_gt) denotes the class activation map of I_gt, CAM(I_re) denotes the class activation map of I_re, and CHW denotes the number of pixels of the class activation map.
The pixel-level loss, perceptual loss, style loss, and class activation loss are returned to train the generator together with the adversarial loss output by the discriminator. The weights of these loss functions are 1, 0.01, 0.1, and 0.002 respectively, while the class activation loss carries a dynamic weight based on n, the number of pixels in the mask region of I_mask. This is because the CAM focuses on the key region of the image; with adaptive masking the number of masked pixels differs from image to image, so a dynamic weighting parameter is more conducive to parameter optimization.
Specifically, the encryption process in S34 is as follows: first, the plaintext is diffused over the symbol set using cyclic shifts, yielding a meaningless diffused text; second, each element of the diffused text is encoded into a real number using a Pell sequence and a binary sequence, hiding the diffused-text elements; finally, the encoded diffused text is obfuscated with a permutation generated on an elliptic curve.
Preferably, step S4 includes the following steps:
S41, the sending end sends the parameter ciphertext and the mask image separately to the receiving end.
S42, the receiving end receives the mask image and the encrypted parameter ciphertext from the sending end, and decrypts the ciphertext to obtain the parameter plaintext.
S43, the parameters are loaded into the generator, which generatively recovers the image; the recovered image is input to a depth recognition model, which recognizes and locates targets in it.
The invention also discloses a two-stage image privacy protection system based on dynamic masking and generative recovery, which executes the above method and comprises the following modules:
Data set establishing module: establishes a data set ImgDataset, preprocesses it, marks each target with a category label, divides the data set proportionally into a training set D_tr, a verification set D_v, and a test set D_te, and performs classification training on the training set D_tr;
Masking module: adaptively determines the important attention region of the image, generates a corresponding binary mask, and masks the original image with it;
Encryption module: inputs the mask image into a generator for training, and encrypts the trained network weight parameters;
Repair module: transmits the mask image and the parameter ciphertext separately to the receiving end, where the parameter ciphertext is decrypted and the parameters are loaded into a generator with the same network structure as the transmitting end, which then repairs the mask image.
Compared with the prior art, the invention has the following beneficial effects:
(1) By encrypting the trained model parameters, the invention avoids encryption operations on massive image data, reduces computational cost, and improves the encryption and decryption rate, and the whole framework can run end to end;
(2) The invention proposes an adaptive masking method based on the class activation map, which automatically and adaptively masks the important information in an image, as well as an adaptive, region-aware class activation loss function, which improves the quality of the restored image and benefits recognition and localization by intelligent recognition equipment;
(3) The technical scheme of the invention offers high security. Because a generator restores images it was not trained on poorly, even if the mask image is intercepted, the interception end cannot restore it with its own restoration model. Furthermore, if the key leaks, the interception end still does not know the network structure and cannot repair the image. Only when the mask image, the network parameters, and the key are all intercepted can the interception end obtain the image information.
Drawings
FIG. 1 is a block diagram of a two-stage image privacy preserving method based on dynamic masking and generation restoration in accordance with a preferred embodiment of the present invention;
FIG. 2 is an example of restoration of an image the generator was trained on;
FIG. 3 is an example of restoration of an image the generator was not trained on;
FIG. 4 is a block diagram of a two-stage image privacy preserving system based on dynamic masking and generation restoration in accordance with a preferred embodiment of the present invention.
Detailed Description
The preferred embodiments of the present invention will be described in detail with reference to the accompanying drawings.
As shown in figs. 1-3, the embodiment of the invention discloses a two-stage image privacy protection method based on dynamic masking and generative restoration, which comprises the following steps:
S1, establish a data set ImgDataset, preprocess it, label the category of each target, and divide it in the ratio 7:2:1 into a training set D_tr, a verification set D_v, and a test set D_te; then use ResNet18 to perform classification training on D_tr.
S2, adaptively determine the important attention region of the image using Score-CAM (Score Class Activation Mapping), generate a corresponding binary mask by pixel replacement, and mask the original image with it.
S3, feed the mask image into a generator for training, and encrypt the trained parameters based on the Pell sequence and an elliptic curve.
S4, transmit the mask image and the parameter ciphertext obtained in step S3 separately to the receiving end; the receiving end decrypts the parameter ciphertext, loads the network weight parameters into a generator with the same network structure as the transmitting end, and the generator repairs the mask image.
The steps of this embodiment will be described in detail.
In step S1 of this embodiment, ImgDataset is uniformly scaled to 256×256, the image data is converted into tensors and normalized, and the result is input to ResNet18 for training; the average accuracy and cross-entropy loss on D_v are recorded, and the model parameters with the best validation performance are saved.
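As an illustration, a minimal PyTorch sketch of this preprocessing and classification-training step could look as follows; the dataset paths, batch size, optimizer, learning rate, epoch count, and normalization statistics are assumptions, since the patent does not specify them:

```python
import torch
import torch.nn as nn
from torchvision import datasets, models, transforms

# Scale to 256x256, convert to tensors, and normalize (ImageNet mean/std assumed).
preprocess = transforms.Compose([
    transforms.Resize((256, 256)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

train_set = datasets.ImageFolder("ImgDataset/train", preprocess)  # D_tr (path assumed)
val_set = datasets.ImageFolder("ImgDataset/val", preprocess)      # D_v (path assumed)
train_loader = torch.utils.data.DataLoader(train_set, batch_size=32, shuffle=True)
val_loader = torch.utils.data.DataLoader(val_set, batch_size=32)

model = models.resnet18(num_classes=len(train_set.classes))
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # optimizer/lr assumed

best_acc = 0.0
for epoch in range(30):  # epoch count assumed
    model.train()
    for x, y in train_loader:
        optimizer.zero_grad()
        criterion(model(x), y).backward()
        optimizer.step()
    # Record average accuracy and cross-entropy loss on D_v; keep the best weights.
    model.eval()
    correct, total, val_loss = 0, 0, 0.0
    with torch.no_grad():
        for x, y in val_loader:
            out = model(x)
            val_loss += criterion(out, y).item() * y.size(0)
            correct += (out.argmax(1) == y).sum().item()
            total += y.size(0)
    acc = correct / total
    if acc > best_acc:
        best_acc = acc
        torch.save(model.state_dict(), "resnet18_best.pth")
```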
Step S2 of the present embodiment includes the following steps:
S21, input D_tr into the trained ResNet18 network to obtain a one-dimensional vector of length C, and obtain the prediction result y and its corresponding class label c through a softmax classifier.
S22, extract the feature map F output by the last convolutional layer of ResNet18, which contains 512 channels (f_i denotes the i-th channel of F), each of size 8×8. Enlarge f_i to 256×256 by bilinear interpolation to obtain f_i′, multiply f_i′ with the original image, and feed the result to the trained ResNet18 again; through the softmax classifier, extract the prediction result y_i corresponding to the class label c. As shown in formula (1), the difference between y_i and y evaluates the contribution of the i-th channel of the feature map F to the class label c, and negative values are zeroed by a ReLU function to obtain CAM_0. As shown in formula (2), CAM_0 is min-max (dispersion) normalized to obtain the class activation map CAM.
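Formulas (1) and (2) appear only as images in the published text. Under the Score-CAM description above, one plausible reconstruction is the following (the exact placement of the ReLU is an assumption):

```latex
% Reconstruction, not the verbatim patent formulas:
\mathrm{CAM}_0 = \sum_{i=1}^{512} \mathrm{ReLU}(y_i - y)\, f_i' \tag{1}

\mathrm{CAM} = \frac{\mathrm{CAM}_0 - \min(\mathrm{CAM}_0)}{\max(\mathrm{CAM}_0) - \min(\mathrm{CAM}_0)} \tag{2}
```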
S23, set a threshold γ (0.5 < γ < 1); pixels of the class activation map CAM with values greater than or equal to γ are regarded as the important region of the original image.
S24, construct an all-zero 256×256 mask, find the pixel with the maximum value among those greater than γ in the class activation map, obtain its position coordinates, set that pixel's value to 0, and set the pixel at the same position in the mask to 1.
S25, repeat step S24 until all pixel values in the CAM are smaller than γ, obtaining a binary mask whose pixels are either 0 or 1; 1 corresponds to an important pixel in the original image, and 0 the opposite.
S26, fill the pixels of the original image at the positions where the binary mask equals 1 with an integer from 0 to 255, finally obtaining the corresponding mask image.
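A sketch of steps S23-S26 in Python, assuming the class activation map from S22 has already been computed; the threshold value 0.7 and the fill integer are illustrative choices within the ranges the patent allows:

```python
import torch

def dynamic_mask(cam, image, gamma=0.7, fill=127):
    """cam: (256, 256) class activation map in [0, 1]; image: (3, 256, 256) float image in [0, 1].
    Implements S23-S26; the loop mirrors S24-S25 and is equivalent to thresholding cam >= gamma."""
    cam = cam.clone()
    mask = torch.zeros_like(cam)              # all-zero 256x256 mask (S24)
    while True:
        idx = int(torch.argmax(cam))          # pixel with the current maximum value
        r, c = divmod(idx, cam.shape[1])
        if cam[r, c] < gamma:                 # stop once every value falls below gamma (S25)
            break
        cam[r, c] = 0.0                       # zero it in the CAM
        mask[r, c] = 1.0                      # set the same position in the mask to 1
    masked = image.clone()
    masked[:, mask.bool()] = fill / 255.0     # fill with an integer in [0, 255], rescaled (S26)
    return mask, masked
```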
Step S3 of the present embodiment includes the following steps:
S31, take the mask image I_mask from S2 and the original image I_gt, and input them to the generator; the generator's encoder determines the mask region of the image and extracts the image's semantic information.
S32, a decoder in the generator upsamples the extracted semantic information to generate a preliminary restoration result image.
S33, the discriminator receives the preliminary result image I_re output by the generator, computes the adversarial loss against the original image, and returns it to the generator.
S34, encrypt the parameters of the trained generator based on the Pell sequence and an elliptic curve.
More specifically, in step S31, the encoder adopts an eight-group structure: the first group is a single convolution; the second to seventh groups are each an activation layer, a convolution, and a batch normalization layer; the eighth group is an activation layer and a convolution. The numbers of convolution kernels of the first, second, and third groups increase from 3 to 64, 64 to 128, and 128 to 256 respectively, and the fourth to eighth groups each have 512 kernels. All convolutions in the eight groups have kernel size 4, stride 2, and padding 1, and the activation functions of the second to eighth groups are all LeakyReLU. The encoder finally outputs 512 feature maps of size 1×1.
In step S32, the decoder deconvolves the extracted semantic features and adopts a nine-group structure: the first to seventh groups are each an activation layer, a deconvolution layer, and a batch normalization layer; the eighth group is an activation layer and a deconvolution layer; the ninth group is a single activation layer. The semantic feature maps output by the encoder's first to eighth groups (front to back) are stacked with those of the decoder's eighth to first groups (back to front), so each decoder layer receives both the features output by the preceding decoder layer and the corresponding encoder features. The first to fourth groups of deconvolutions each have 512 kernels, and the fifth to eighth groups reduce from 1024 to 256, 512 to 128, 256 to 64, and 128 to 3 respectively. All deconvolutions in the first eight groups have kernel size 4, stride 2, and padding 1 with ReLU activations, and the ninth group's activation function is Tanh. The decoder finally outputs a 256×256 preliminary result map with 3 channels.
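A matching decoder sketch with the skip connections described above; channel-wise concatenation is assumed as the "stacking" operation, which is consistent with the stated 1024-channel input of the fifth group:

```python
import torch
import torch.nn as nn

def up(in_ch, out_ch, last_bn=True):
    """One decoder group: ReLU + ConvTranspose(kernel 4, stride 2, padding 1) + [BatchNorm]."""
    layers = [nn.ReLU(inplace=True),
              nn.ConvTranspose2d(in_ch, out_ch, kernel_size=4, stride=2, padding=1)]
    if last_bn:
        layers.append(nn.BatchNorm2d(out_ch))
    return nn.Sequential(*layers)

decoder = nn.ModuleList([
    up(512, 512), up(1024, 512), up(1024, 512), up(1024, 512),  # groups 1-4: 512 kernels
    up(1024, 256), up(512, 128), up(256, 64),                    # groups 5-7
    up(128, 3, last_bn=False),                                   # group 8: no batch norm
])
tanh = nn.Tanh()                                                 # group 9: activation only

def generator_forward(x):
    # Keep each encoder group's output for the skip connections
    # (encoder comes from the previous sketch).
    feats = []
    for enc in encoder:
        x = enc(x)
        feats.append(x)
    # Decoder group k consumes the previous decoder output concatenated with
    # the encoder feature from the mirrored depth (back to front).
    x = decoder[0](feats[-1])
    for k in range(1, 8):
        x = decoder[k](torch.cat([x, feats[-1 - k]], dim=1))
    return tanh(x)  # 3x256x256 preliminary result map
```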
In step S33, the Markov discriminator mainly consists of four convolutional layers, three LeakyReLU activation layers, and one Sigmoid activation layer. The first three convolutions have kernel size 4, stride 2, and padding 1, each followed by a LeakyReLU with negative slope 0.2 and the in-place option enabled; the last convolutional layer has kernel size 4, stride 1, and padding 1 and uses a Sigmoid to activate the neurons. Local discrimination on the preliminary result image ensures effective recovery of the low-frequency local texture structure. The Markov discriminator receives the preliminary result image and discriminates it against its original complete image to obtain the adversarial loss L_adv.
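Formula (3) is an image in the published text; the standard adversarial objective consistent with the symbol definitions that follow would be (a reconstruction, not the verbatim formula):

```latex
L_{adv} = \mathbb{E}_{I_{gt}\sim p_{data}(I_{gt})}\!\left[\log D(I_{gt})\right]
        + \mathbb{E}_{I_{mask}\sim p_{data}(I_{mask})}\!\left[\log\!\left(1 - D(G(I_{mask}))\right)\right] \tag{3}
```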
In formula (3), G denotes the generator, D denotes the Markov discriminator, E(·) denotes the expected value of the distribution function, p_data(I_gt) denotes the distribution of real samples, p_data(I_mask) denotes the distribution of mask images, D(I_gt) denotes the probability the Markov discriminator predicts for I_gt, and D(I_mask) the probability it predicts for I_mask.
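The four-layer Markov (PatchGAN-style) discriminator can be sketched as below; the input channel count and the intermediate widths 64/128/256 are assumptions, since the patent fixes only the kernel sizes, strides, paddings, and activations:

```python
import torch.nn as nn

discriminator = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=4, stride=2, padding=1),
    nn.LeakyReLU(0.2, inplace=True),   # negative slope 0.2, executed in place
    nn.Conv2d(64, 128, kernel_size=4, stride=2, padding=1),
    nn.LeakyReLU(0.2, inplace=True),
    nn.Conv2d(128, 256, kernel_size=4, stride=2, padding=1),
    nn.LeakyReLU(0.2, inplace=True),
    nn.Conv2d(256, 1, kernel_size=4, stride=1, padding=1),  # last layer: stride 1
    nn.Sigmoid(),                      # per-patch real/fake probability map
)
```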
While the preliminary result image and its original complete image are input to the discriminator, the pixel-level loss L_re between them is also calculated:
L_re = ||I_re - I_gt||_1    (4)
The preliminary result map and its original complete image are also input to a VGG16 feature extractor, which calculates the feature-level perceptual loss L_p and style loss L_s between the images.
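Formulas (5)-(7) are images in the published text; one standard form of the perceptual and style losses consistent with the definitions that follow would be (a reconstruction under that assumption, including the placement of the normalization):

```latex
L_p = \sum_{j=1}^{N} \frac{1}{C_j H_j W_j}\,
      \big\|\phi_j(I_{re}) - \phi_j(I_{gt})\big\|_1 \tag{5}

L_s = \sum_{j=1}^{N}
      \big\|\mathrm{Gram}(\phi_j(I_{re})) - \mathrm{Gram}(\phi_j(I_{gt}))\big\|_1 \tag{6}

\mathrm{Gram}(F)_{m,n} = \frac{1}{C_j H_j W_j}\sum_{p} F_{m,p}\, F_{n,p} \tag{7}
```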
In formula (5), N denotes the number of VGG16 convolutional layers used, so here N = 5; φ_j denotes the j-th convolutional layer of VGG16; C_j H_j W_j denotes the size of the feature map output by the j-th convolutional layer, where C, H, and W are the channel count, height, and width of that feature map. The Gram matrix in formula (6) is computed as in formula (7), where m and n index the input feature map F in the channel dimension and p indexes F in the spatial dimension.
In addition to the above losses, in order to improve the quality of the generated image, the present invention designs a region-aware class activation loss function L_CAM.
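Formula (8) is likewise an image in the published text; an L1 form consistent with the symbol definitions that follow would be:

```latex
L_{CAM} = \frac{1}{CHW}\,\big\|\mathrm{CAM}(I_{gt}) - \mathrm{CAM}(I_{re})\big\|_1 \tag{8}
```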
In formula (8), CAM(I_gt) denotes the class activation map of I_gt, CAM(I_re) denotes the class activation map of I_re, and CHW denotes the number of pixels of the class activation map.
The pixel-level loss, perceptual loss, style loss, and class activation loss are returned to train the generator together with the adversarial loss output by the discriminator. The weights of these loss functions are 1, 0.01, 0.1, and 0.002 respectively, while the class activation loss carries a dynamic weight based on n, the number of pixels in the mask region of I_mask. This is because the CAM focuses on the key region of the image; with adaptive masking the number of masked pixels differs from image to image, so a dynamic weighting parameter is more conducive to parameter optimization.
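A sketch of the combined generator objective; mapping the four fixed weights to the first four losses in the order listed, and the form n/(CHW) for the dynamic class-activation weight, are both assumptions, since the published text renders the dynamic weight as an image:

```python
def generator_loss(l_re, l_p, l_s, l_adv, l_cam, n_mask_pixels, chw):
    """Weighted sum of the five losses (weight assignment assumed from the listing order)."""
    w_cam = n_mask_pixels / chw  # dynamic weight: more masked pixels -> larger weight (assumed form)
    return 1.0 * l_re + 0.01 * l_p + 0.1 * l_s + 0.002 * l_adv + w_cam * l_cam
```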
The encryption method in step S34 is as follows: first, the plaintext is diffused over the symbol set using cyclic shifts, yielding a meaningless diffused text; second, each element of the diffused text is encoded into a real number using a Pell sequence and a binary sequence, hiding the diffused-text elements; finally, the encoded diffused text is obfuscated with a permutation generated on an elliptic curve.
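The patent does not publish the concrete parameters of this scheme; the fragment below only illustrates the flavor of the first two stages (a Pell recurrence and a keyed cyclic shift over a byte-valued symbol set). Every concrete choice here is an assumption, and stage three, the elliptic-curve permutation, is omitted:

```python
def pell(k):
    """k-th Pell number: P(0) = 0, P(1) = 1, P(k) = 2*P(k-1) + P(k-2)."""
    a, b = 0, 1
    for _ in range(k):
        a, b = b, 2 * b + a
    return a

def diffuse(symbols, key):
    """Stage 1 (illustrative): keyed cyclic shift over a 256-symbol set."""
    return [(s + key[i % len(key)]) % 256 for i, s in enumerate(symbols)]

def encode(diffused, bits):
    """Stage 2 (illustrative): hide each diffused element inside a real number
    built from a Pell number and a binary sequence."""
    return [pell(1 + (s % 16)) + bits[i % len(bits)] * 0.5 + s / 257.0
            for i, s in enumerate(diffused)]
```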
Step S4 of the present embodiment specifically includes the following steps:
S41, the sending end sends the parameter ciphertext and the mask image separately to the receiving end.
S42, the receiving end receives the mask image and the encrypted parameter ciphertext from the sending end, and decrypts the ciphertext to obtain the parameter plaintext.
S43, the parameters are loaded into the generator, which generatively recovers the image; the recovered image is input to a depth recognition model, which recognizes and locates targets in it.
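The receiving-end flow of S42-S43 maps naturally onto loading a state dict; decrypt_fn below is a hypothetical placeholder for the Pell/elliptic-curve decryption, not an API defined by the patent, and the generator is assumed to be an nn.Module with the sender's architecture:

```python
import torch

def receive_and_repair(param_ciphertext, key, mask_image, generator, recognizer, decrypt_fn):
    """Receiving-end flow of S42-S43 (sketch)."""
    state_dict = decrypt_fn(param_ciphertext, key)     # S42: ciphertext -> plaintext parameters
    generator.load_state_dict(state_dict)              # same network structure as the sender
    generator.eval()
    with torch.no_grad():
        restored = generator(mask_image.unsqueeze(0))  # S43: generative restoration
        result = recognizer(restored)                  # recognition and localization
    return restored, result
```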
As shown in figs. 1 and 4, this embodiment discloses a two-stage image privacy protection system based on dynamic masking and generative restoration, which executes the above method and comprises the following modules:
Data set establishing module: establishes a data set ImgDataset, preprocesses it, marks each target with a category label, divides the data set proportionally into a training set D_tr, a verification set D_v, and a test set D_te, and performs classification training on the training set D_tr;
Masking module: adaptively determines the important attention region of the image, generates a corresponding binary mask, and masks the original image with it;
Encryption module: inputs the mask image into a generator for training, and encrypts the trained network weight parameters;
Repair module: transmits the mask image and the parameter ciphertext separately to the receiving end, where the parameter ciphertext is decrypted and the parameters are loaded into a generator with the same network structure as the transmitting end, which then repairs the mask image.
For other content in this embodiment, reference may be made to the above-described method embodiments.
The foregoing description is only of the preferred embodiments of the invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, while the invention has been described in connection with the above embodiments, the invention is not limited to the embodiments, but may be embodied in many other equivalent forms without departing from the spirit or scope of the invention, which is set forth in the following claims.

Claims (10)

1. The two-stage image privacy protection method based on dynamic masking and generation recovery is characterized by comprising the following steps:
S1, establishing a data set ImgDataset, preprocessing it, marking each target with a category label, and dividing it proportionally into a training set D_tr, a verification set D_v, and a test set D_te; performing classification training on the training set D_tr;
s2, adaptively judging important attention areas of the image, generating a corresponding binary mask, and masking an original image by using the mask;
s3, inputting the mask image into a generator for training, and encrypting the trained network weight parameters;
and S4, respectively transmitting the mask image and the parameter ciphertext obtained in the step S3 to a receiving end, decrypting the parameter ciphertext by the receiving end, loading the network weight parameter into a generator with the same network structure as the transmitting end in the receiving end, and repairing the mask image by the generator.
2. The two-stage image privacy preserving method based on dynamic masking and generating restoration as claimed in claim 1, wherein step S1 specifically includes uniformly scaling the data set ImgDataset to 256×256, converting the image data into tensors and normalizing them, inputting them to ResNet18 for training, recording the average accuracy and cross-entropy loss on the verification set D_v, and saving the model parameters with the best validation performance.
3. The two-stage image privacy preserving method based on dynamic masking and generating restoration as claimed in claim 1, wherein step S2 specifically comprises the steps of:
S21, inputting the training set D_tr into the trained ResNet18 network to obtain a one-dimensional vector of length C, and obtaining the prediction result y and its corresponding class label c through a softmax classifier;
S22, extracting the feature map F output by the last convolutional layer of ResNet18, which contains 512 channels, f_i being the i-th channel of F, each channel of size 8×8; enlarging f_i to 256×256 by bilinear interpolation to obtain f_i′; multiplying f_i′ with the original image, feeding the result to the trained ResNet18 again, and extracting through the softmax classifier the prediction result y_i corresponding to class c; as shown in formula (1), evaluating the contribution of the i-th channel of F to the class label c by the difference between y_i and y, and zeroing negative values through a ReLU function to obtain the score class activation map CAM_0; as shown in formula (2), performing dispersion (min-max) normalization on CAM_0 to obtain the class activation map CAM;
S23, setting a threshold γ with 0.5 < γ < 1; pixels of the class activation map CAM with values greater than or equal to γ are regarded as the important region of the original image;
S24, constructing an all-zero 256×256 mask, finding the pixel with the maximum value among those greater than γ in the class activation map, obtaining its position coordinates, setting that pixel's value to 0, and setting the pixel at the same position in the mask to 1;
S25, repeating step S24 until all pixel values in the CAM are smaller than γ, obtaining a binary mask whose pixels are either 0 or 1, where 1 corresponds to an important pixel in the original image and 0 the opposite;
S26, filling the pixels of the original image at positions where the binary mask equals 1 with an integer from 0 to 255 to obtain the corresponding mask image.
4. The two-stage image privacy preserving method based on dynamic masking and generating restoration as claimed in claim 3, wherein step S3 comprises the steps of:
S31, taking the mask image I_mask from step S2 and the original image I_gt and inputting them to the generator, whose encoder determines the mask region of the image and extracts the semantic information of the image;
S32, a decoder in the generator upsampling the extracted semantic information to generate a preliminary restoration result image;
S33, the discriminator receiving the preliminary result image I_re output by the generator, computing the adversarial loss against the original image, and returning it to the generator;
S34, encrypting the parameters of the trained generator based on the Pell sequence and an elliptic curve.
5. The two-stage image privacy preserving method based on dynamic masking and generating restoration as claimed in claim 4, wherein in step S31 the encoder adopts an eight-group structure: the first group is a single convolution; the second to seventh groups are each an activation layer, a convolution, and a batch normalization layer; the eighth group is an activation layer and a convolution; the numbers of convolution kernels of the first, second, and third groups increase from 3 to 64, 64 to 128, and 128 to 256 respectively, and the fourth to eighth groups each have 512 kernels; all convolutions in the eight groups have kernel size 4, stride 2, and padding 1, and the activation functions of the second to eighth groups are all LeakyReLU; the encoder finally outputs 512 feature maps of size 1×1.
6. The two-stage image privacy preserving method based on dynamic masking and generating restoration as claimed in claim 5, wherein in step S32 the decoder deconvolves the extracted semantic features and adopts a nine-group structure: the first to seventh groups are each an activation layer, a deconvolution layer, and a batch normalization layer; the eighth group is an activation layer and a deconvolution layer; the ninth group is a single activation layer; the semantic feature maps output by the encoder's first to eighth groups (front to back) are stacked with those of the decoder's eighth to first groups (back to front); the first to fourth groups of deconvolutions each have 512 kernels, and the fifth to eighth groups reduce from 1024 to 256, 512 to 128, 256 to 64, and 128 to 3 respectively; all deconvolutions in the first eight groups have kernel size 4, stride 2, and padding 1 with ReLU activations, and the ninth group's activation function is Tanh; the decoder finally outputs a 256×256 preliminary result map with 3 channels.
7. The two-stage image privacy preserving method based on dynamic masking and generating restoration as claimed in claim 6, wherein in step S33 the discriminator mainly consists of four convolutional layers, three LeakyReLU activation layers, and one Sigmoid activation layer; the first three convolutions have kernel size 4, stride 2, and padding 1, each followed by a LeakyReLU with negative slope 0.2 and the in-place option enabled; the last convolutional layer has kernel size 4, stride 1, and padding 1 and uses a Sigmoid to activate the neurons; the discriminator receives the preliminary result image and discriminates it against its original complete image to obtain the adversarial loss L_adv;
In formula (3), G denotes the generator, D denotes the Markov discriminator, E(·) denotes the expected value of the distribution function, p_data(I_gt) denotes the distribution of real samples, p_data(I_mask) denotes the distribution of mask images, D(I_gt) denotes the probability the Markov discriminator predicts for I_gt, and D(I_mask) the probability it predicts for I_mask;
while the preliminary result image and its original complete image are input to the discriminator, the pixel-level loss L_re between them is also calculated:
L_re = ||I_re - I_gt||_1    (4)
The preliminary result map and its original complete image are input to a VGG16 feature extractor, which calculates the feature-level perceptual loss L_p and style loss L_s between the images;
in formula (5), N denotes the number of VGG16 convolutional layers used, so here N = 5; φ_j denotes the j-th convolutional layer of VGG16; C_j H_j W_j denotes the size of the feature map output by the j-th convolutional layer, where C, H, and W are the channel count, height, and width of that feature map; the Gram matrix in formula (6) is computed as in formula (7), where m and n index the input feature map F in the channel dimension and p indexes F in the spatial dimension;
a region-aware class activation loss function L_CAM is designed;
in formula (8), CAM(I_gt) denotes the class activation map of I_gt, CAM(I_re) denotes the class activation map of I_re, and CHW denotes the number of pixels of the class activation map;
the pixel-level loss, perceptual loss, style loss, and class activation loss are returned to train the generator together with the adversarial loss output by the discriminator; the weights of these loss functions are 1, 0.01, 0.1, and 0.002 respectively, while the class activation loss carries a dynamic weight based on n, the number of pixels in the mask region of I_mask.
8. The two-stage image privacy preserving method based on dynamic masking and generating restoration as claimed in claim 7, wherein the encryption in step S34 comprises: first, diffusing the plaintext over the symbol set using cyclic shifts, thereby obtaining a meaningless diffused text; second, encoding each element of the diffused text into a real number using a Pell sequence and a binary sequence, thereby hiding the diffused-text elements; and finally, obfuscating the encoded diffused text with a permutation generated on an elliptic curve.
9. The two-stage image privacy preserving method based on dynamic masking and generating restoration as claimed in any one of claims 1-4, wherein step S4 comprises the steps of:
S41, the sending end sending the parameter ciphertext and the mask image separately to the receiving end;
S42, the receiving end receiving the mask image and the encrypted parameter ciphertext from the sending end, and decrypting the ciphertext to obtain the parameter plaintext;
S43, loading the parameters into the generator, the generator generatively recovering the image, inputting the recovered image into a depth recognition model, and the model recognizing and locating targets in the image.
10. The two-stage image privacy protection system based on dynamic masking and generation restoration is characterized by comprising the following modules:
Data set establishing module: establishing a data set ImgDataset, preprocessing it, marking each target with a category label, dividing the data set proportionally into a training set D_tr, a verification set D_v, and a test set D_te, and performing classification training on the training set D_tr;
Masking module: adaptively determining the important attention region of the image, generating a corresponding binary mask, and masking the original image with it;
Encryption module: inputting the mask image into a generator for training, and encrypting the trained network weight parameters;
Repair module: transmitting the mask image and the parameter ciphertext separately to the receiving end, the receiving end decrypting the parameter ciphertext, loading the network weight parameters into a generator with the same network structure as the transmitting end, and the generator repairing the mask image.
CN202310969216.8A 2023-08-02 2023-08-02 Two-stage image privacy protection method and system based on dynamic mask and generation recovery Pending CN117131520A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310969216.8A CN117131520A (en) 2023-08-02 2023-08-02 Two-stage image privacy protection method and system based on dynamic mask and generation recovery

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310969216.8A CN117131520A (en) 2023-08-02 2023-08-02 Two-stage image privacy protection method and system based on dynamic mask and generation recovery

Publications (1)

Publication Number Publication Date
CN117131520A true CN117131520A (en) 2023-11-28

Family

ID=88851881

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310969216.8A Pending CN117131520A (en) 2023-08-02 2023-08-02 Two-stage image privacy protection method and system based on dynamic mask and generation recovery

Country Status (1)

Country Link
CN (1) CN117131520A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117952206A (en) * 2024-03-27 2024-04-30 江南大学 Knowledge graph link prediction method
CN117952206B (en) * 2024-03-27 2024-05-31 江南大学 Knowledge graph link prediction method


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination