CN111598762A - Generative robust image steganography method - Google Patents
- Publication number: CN111598762A (application CN202010315084.3A)
- Authority: CN (China)
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06T1/0021 - Image data processing: image watermarking
- G06T1/005 - Robust watermarking, e.g. resistant to average or collusion attacks
- G06N3/045 - Neural networks: combinations of networks
- G06N3/088 - Neural network learning methods: non-supervised learning, e.g. competitive learning
Abstract
The invention provides a generative robust image steganography method comprising the following steps: constructing an image data set and preprocessing it; constructing and initializing a deep learning network architecture; training the architecture with a joint-fine-tuning method to obtain a network architecture model; and using the model to generate a secret-carrying pseudo-image and carry out covert communication, completing the image steganography process. The method uses the generative adversarial network StyleGAN to integrate the embedding of secret information into the image generation process, constructing a generative image steganography framework that carries a larger secret payload and has a degree of robustness. The resulting method offers larger embedding capacity, good generated-image quality, strong statistical undetectability of the stego images, and high practicability, and addresses the poor image quality, low embedding capacity, and low information extraction accuracy of existing generative image steganography.
Description
Technical Field
The invention relates to the technical field of image steganography in multimedia security, and in particular to a generative robust image steganography method.
Background
Image steganography embeds secret information into a carrier (cover) image by some algorithm to produce a stego image, which is transmitted over a public channel; the receiver recovers the secret information with a corresponding extraction method. A well-made stego image leaves no obvious visual or statistical trace, so a third party cannot tell whether it carries hidden information. According to how the carrier image is used, image steganography algorithms fall into two classes: carrier modification and carrier generation. The former, currently the common approach, makes small modifications to the pixel values or transform-domain coefficients of a cover image to embed information. The latter is generative image steganography: an algorithm constructs a bidirectional mapping between secret information and an image distribution, so the sender can map secret information to a realistic pseudo-natural image and the receiver can map the image back to the secret information, with no cover image and no modification-based embedding required.
Traditional image steganography usually means carrier-modification steganography; such algorithms include UED [1] and UNIWARD [2]. Steganalysis of these schemes is typically cast as a two-class classification problem (cover images versus stego images) solved with feature extraction and machine learning. In recent years deep learning has achieved important results in steganalysis: convolutional-neural-network detectors such as YeNet and SRNet have been proposed in succession, and content-adaptive image steganography schemes are now under serious threat of detection.
In fact, modifying the carrier inevitably distorts the image, whereas generative image steganography avoids modification altogether and is in theory a safer and more reliable approach. But since a complex image distribution can hardly be modeled completely, a bidirectional mapping between secret information and the image distribution is difficult to obtain at high payload. Among existing generative image steganography methods, some construct a mapping between secret information and image semantics via bag-of-words models or image hashing; their capacity is very low and they require a huge natural-image database. Others map segments of the secret information to a class of image textures through carefully designed invertible mathematical functions, and are likewise limited in capacity.
The generative adversarial network (GAN) is an unsupervised learning framework that excels at tasks such as image generation and has become an important branch of deep learning. A GAN consists of a generator and a discriminator: the generator tries to produce realistic pseudo-natural images that fool the discriminator, while the discriminator tries to distinguish real images from generated ones; the two learn against each other, driving both toward their goals. This adversarial relationship mirrors the contest between the steganographer and the steganalyst. Applying GANs to image steganography lets the network learn a steganographic scheme autonomously, avoiding the heuristic design of traditional steganography. Based on this idea, several GAN-based steganography methods have been proposed. Existing GAN-based methods are limited, however, and all focus on information embedding within the carrier-modification framework. By the target of the adversarial training, they fall into three categories:
First, use a GAN to generate cover images favorable to steganography, with the embedding itself still done by a traditional algorithm, as in SGAN [3] and SSGAN [4]. Taking SGAN as an example, it comprises a generator G and two discriminators D and S. G takes random noise as input and outputs a cover image; G and D are adversarially trained on natural images and generated covers so that the generated covers approach natural images; a traditional steganographic algorithm then embeds into the generated cover to obtain a stego image; and G and S are adversarially trained on cover and stego images to improve the security of the stego images.
Second, use a GAN to generate a pixel-wise modification-probability map and design a neural network to simulate STC-coded embedding from traditional steganography, as in ASDL-GAN [5] and UT-GAN [6]. ASDL-GAN is the pioneering work in this direction. Its network comprises a generator G and a discriminator D: G takes a cover image as input and outputs the corresponding modification-probability map; that map is turned into a stego image via the TES (ternary embedding simulator) network proposed in the same paper; and G and D are adversarially trained on cover and stego images. UT-GAN refines the network design of ASDL-GAN and further improves security; it is currently the steganographic scheme with the best anti-detection performance under the carrier-modification framework.
Third, use a GAN to embed secret information directly into the cover image and output a stego image. Here the embedding is not driven by pixel modification probabilities as in traditional steganography; the whole carrier-modification task is handed to the adversarial network, but a matching extractor must be trained to recover the information. Schemes of this kind include HayesGAN and SteganoGAN, whose networks consist of a generator G, a discriminator D, and an extractor S. HayesGAN [7] was the first: its generator takes 100-400 bits of secret information and a 32 x 32 cover image as input, outputs the corresponding stego image, and is adversarially trained against D, while the extractor S takes the stego image as input and outputs the recovered secret information. SteganoGAN [8] further refines the generator and extractor and achieves larger-capacity image steganography.
For carrier generation, research on applying GANs is still shallow. The existing representative is the DCGAN-based generative image steganography of [9], which maps uniformly distributed secret information to a normally distributed latent vector and uses DCGAN (deep convolutional GAN) to generate a secret-carrying pseudo face image from it. This approach has three problems. First, the generated pseudo-images have low resolution and severe distortion, because DCGAN's generative capability is limited. Second, the embedding capacity is low (below 0.01 bpp) and the payload is restricted, because the information is encoded through the latent vector. Third, the consistency of the generator and extractor during training is not considered, so the information extraction accuracy is relatively low.
Disclosure of Invention
The invention provides a generative robust image steganography method to overcome the drawbacks of the existing DCGAN-based generative steganography: low resolution of the generated pseudo-images, limited payload, and low information extraction accuracy.
In order to solve the technical problems, the technical scheme of the invention is as follows:
A generative robust image steganography method, comprising the following steps:
s1: constructing an image data set, and preprocessing the image data set;
s2: constructing and initializing a deep learning network architecture;
s3: training the deep learning network architecture with a joint-fine-tuning method to obtain a network architecture model;
s4: generating a secret-carrying pseudo-image with the network architecture model and carrying out covert communication, completing the generative robust image steganography process.
In this scheme, the generative adversarial network StyleGAN is used to integrate the embedding of secret information into the image generation process, constructing a generative image steganography framework that carries a larger payload and has a degree of robustness. The resulting method offers larger embedding capacity, good generated-image quality, strong statistical undetectability of the stego images, and high practicability, and addresses the poor image quality, low embedding capacity, and low information extraction accuracy of existing generative image steganography.
Wherein, the step S1 specifically includes:
selecting the CelebA-HQ face data set as the training set, which contains 30000 high-quality 1024 x 1024 face images, and mirror-flipping them to augment the set to 60000 images;
the images are then scaled to 128 x 128 with MATLAB's imresize function and saved in pgm format, completing the preprocessing of the image data.
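The preprocessing described above can be sketched in Python. This is an illustrative numpy version: the patent uses MATLAB's imresize (bicubic by default) and saves to pgm, whereas the sketch downscales by simple 8 x 8 block averaging, so only the output shapes match, not the interpolation.

```python
import numpy as np

def mirror_augment(images):
    """Double a stack of images (N, H, W) by appending horizontally flipped copies."""
    flipped = images[:, :, ::-1]
    return np.concatenate([images, flipped], axis=0)

def downscale_block_mean(img, factor):
    """Downscale one image by averaging non-overlapping factor x factor blocks.
    Stand-in for MATLAB's imresize: the output size matches, the interpolation does not."""
    h, w = img.shape
    return img.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

# CelebA-HQ: 1024 x 1024 faces, mirrored 30000 -> 60000, scaled to 128 x 128
face = np.random.default_rng(0).random((1024, 1024))
small = downscale_block_mean(face, 8)   # shape (128, 128)
```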
Wherein, the step S2 specifically includes:
constructing a deep learning network architecture based on the StyleGAN network, the architecture comprising a modulation module, an encoding module, a generator, a discriminator, an extractor and a noising module; wherein:
the modulation module modulates the secret information onto a randomly generated noise map to produce a secret-carrying noise map;
the encoding module applies a nonlinear transformation to the secret-carrying noise map to suit the robust image steganography task, distributing the secret information into robust image features;
the generator takes a random latent vector and the encoded noise map containing the secret information as input and generates a secret-carrying pseudo-image;
the discriminator takes a secret-carrying pseudo-image or a real image as input and outputs a discrimination score, serving as the generator's adversary in training;
the extractor takes the secret-carrying pseudo-image as input, extracts the secret information from it and outputs it;
the noising module is arranged between the generator and the extractor and applies channel attacks to the secret-carrying pseudo-image during simulated transmission, so that its robustness can be trained and improved.
In the above scheme, the deep learning network architecture achieves its goals as follows:
First, the image generation capability and architectural features of StyleGAN are fully exploited to generate secret-carrying pseudo-natural images of higher resolution that are clearer and more vivid.
Second, the embedding mode is changed: instead of encoding the information in the latent vector, it is modulated onto a noise map in StyleGAN's noise-injection path. The secret information is thus expressed in details that barely affect the content of the pseudo-image and have high imperceptibility, giving a stronger carrying capacity.
Third, a customized densely connected network is designed as the message extractor to recover large amounts of secret information more effectively, and a joint-fine-tuning training method is proposed to train it fully, further raising the extraction accuracy and securing the effective capacity of the pseudo-image.
Finally, to handle channel attacks in practical use, the encoding module applies a nonlinear transformation to the secret-carrying noise map to suit the robust steganography task, and a channel-noise simulation module is inserted between the generator and the extractor so that robustness is trained deliberately, improving the stego image's resistance to channel attacks.
Wherein, in step S2, the secret-carrying noise map is obtained by modulating a 64 x 64-bit secret message onto a random noise map of 64 x 64 resolution; to blend the secret information into the image generation process while minimizing its influence, an information modulation mode that leaves the noise distribution unchanged is adopted:
the binary-coded message modulates the sign of the input noise, with 1 and 0 representing positive and negative noise respectively, expressed by the following formula:
N' = ABS(N) * (2M - 1)
where N is the input noise, N' the modulated noise, M the binary-coded secret information, and ABS(.) the absolute-value operator; assuming the binary message bits 0 and 1 are equiprobable and the input noise is sampled from a normal distribution, the sign-modulated secret-carrying noise also follows a normal distribution.
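A minimal numpy sketch of the sign-modulation rule N' = ABS(N) * (2M - 1). The sign_demodulate helper is only a conceptual receiver-side check introduced here for illustration; in the patent the receiver recovers the bits from the generated image with a trained extractor, not from the noise map.

```python
import numpy as np

def modulate(noise, bits):
    """N' = ABS(N) * (2M - 1): bit 1 makes the noise positive, bit 0 negative.
    Magnitudes are untouched, so symmetric (e.g. standard normal) noise keeps
    its distribution when the bits are equiprobable."""
    return np.abs(noise) * (2 * bits - 1)

def sign_demodulate(noise_mod):
    """Conceptual check: read the bits back off the signs of the modulated noise."""
    return (noise_mod > 0).astype(int)
```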
In step S2, the encoding module is a customized residual convolutional neural network comprising two convolutional layers and two residual blocks; a residual block is a structure that adds a shortcut path across two convolutional layers. All convolutional layers use 3 x 3 kernels with stride 1, and every layer except the last uses a leaky rectified linear unit (Leaky ReLU) activation to avoid information loss. The transformed secret-carrying noise map output by the encoding module keeps the same resolution but better fits the distribution of the pseudo-image, so the secret information is distributed into robust image features.
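A toy single-channel version of the residual block (3 x 3 same-padding convolution, Leaky ReLU, second convolution, identity shortcut) shows why the module preserves resolution. The real encoding module is multi-channel with learned kernels; the function names here are illustrative.

```python
import numpy as np

def conv3x3(x, w):
    """Zero-padded 'same' 3 x 3 convolution (cross-correlation, as in deep
    learning frameworks) of a single-channel map x with kernel w."""
    h, wd = x.shape
    xp = np.pad(x, 1)
    out = np.zeros_like(x)
    for i in range(h):
        for j in range(wd):
            out[i, j] = np.sum(xp[i:i + 3, j:j + 3] * w)
    return out

def leaky_relu(x, slope=0.2):
    return np.where(x >= 0.0, x, slope * x)

def residual_block(x, w1, w2):
    """conv -> Leaky ReLU -> conv, plus an identity shortcut; the output
    resolution equals the input resolution, as the encoding module requires."""
    return x + conv3x3(leaky_relu(conv3x3(x, w1)), w2)
```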
In step S2, the generator and the discriminator are both progressive convolutional neural networks divided into 6 levels: 4 x 4, 8 x 8, 16 x 16, ..., 128 x 128. The generator takes the latent vector, several random noise maps, and the secret-carrying noise map as input and outputs the secret-carrying pseudo face image. The latent vector is the input of the mapping module and determines the image content; the random noise maps feed the noise-injection module of each level (hence 6 resolutions, from 4 x 4 up to 128 x 128) and affect fine details and characteristics at different scales. The discriminator, the generator's adversary in training, takes a secret-carrying pseudo-image or a real image as input and outputs a discrimination score; with the extra modules removed, its structure mirrors the generator.
Wherein, in step S2, the extractor is a densely connected network that takes a 128 x 128 x 3 stego image as input and outputs the 64 x 64-bit secret message. It comprises convolutional layers with 5 x 5 kernels and a 2 x 2 average pooling layer; the output of each earlier convolutional layer is shortcut-connected to the outputs of later layers to improve the information flow of the network, and average pooling performs the downsampling at the end of the network to reduce the risk of information loss.
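The dense-connection idea (each layer sees shortcuts from all earlier outputs) can be illustrated with a 1-D toy, using plain matrix multiplies in place of the 5 x 5 convolutions; this sketches the connectivity pattern only, not the extractor itself.

```python
import numpy as np

def dense_forward(x, layers):
    """Dense-connection pattern: each layer receives the concatenation of the
    original input and all previous layer outputs. 'layers' is a list of weight
    matrices; a linear map + ReLU stands in for each convolutional layer."""
    feats = [x]
    for W in layers:
        inp = np.concatenate(feats)           # shortcut every earlier output
        feats.append(np.maximum(W @ inp, 0))  # toy layer
    return feats[-1]
```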
In step S2, the noising module is inserted between the generator and the extractor and applies two channel attacks, JPEG compression and scaling, during simulated transmission, so that the robustness of the stego image can be trained and improved. The module has three modes: lossless transmission, scaling, and JPEG compression; in each iteration one mode is chosen at random as the current channel state to simulate an attack on the stego image output by the generator.
Lossless transmission is the simplest mode: the secret-carrying pseudo-image is passed to the extractor untouched, modeling a channel with no attack. The scaling mode rescales the secret-carrying pseudo-image with a coefficient sampled uniformly from (0.5, 1) each time, so the image resolution drops accordingly. JPEG compression divides the image into 8 x 8 blocks, computes the discrete cosine transform of each block, and quantizes the resulting frequency coefficients with varying coarseness. Because the rounding in the quantization step is not differentiable, JPEG compression cannot be used directly in gradient-based optimization; its quantization step must be replaced by a differentiable approximation, specifically:
Round_approx(x) = Round(x) + (x - Round(x))^3
where Round_approx(.) is the differentiable approximate rounding function, Round(.) is the rounding function, and x is the DCT coefficient to be quantized. In the worst case the differentiable approximation differs numerically from true JPEG rounding by 0.125.
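The approximate rounding function is simple to implement and check. Its worst-case deviation from true rounding, 0.125, occurs where the fractional part is 0.5, and it leaves integers fixed; its nonzero derivative 3(x - Round(x))^2 away from half-integers is what lets gradients pass through the simulated JPEG quantization.

```python
import numpy as np

def round_approx(x):
    """Differentiable stand-in for rounding: Round(x) + (x - Round(x))**3."""
    r = np.round(x)
    return r + (x - r) ** 3
```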
Wherein, in step S3, the joint-fine-tuning method comprises many iterations, each divided into three stages:
The first stage trains the discriminator; its target is to minimize the Wasserstein loss, the difference between the discrimination scores of the generated secret-carrying pseudo-images and the real images, which measures the distance between the pseudo-image distribution and the real-image distribution:
L1 = E[D(G(Z, M))] - E[D(X)]
where L1 is the objective function of this stage, D(.) is the discriminator network, G(.) is the generator network, Z, M and X are the latent vector, the secret information and a real image respectively, and E denotes the mathematical expectation.
The second stage jointly trains the generator and the extractor; its objective function combines an adversarial loss and a decoding loss:
L2 = -D(G(Z, M)) + CrossEntropy(M', M), with M' = E(G(Z, M))
where L2 is the objective function of this stage, CrossEntropy(x, y) computes the cross entropy between x and y, M' is the secret information recovered by the extractor, E(.) is the extractor network, and the other symbols are as above. The adversarial loss is the discrimination score obtained by feeding the generated secret-carrying pseudo-image to the discriminator; minimizing it improves the verisimilitude of the pseudo-image. The decoding loss is defined as the binary cross entropy between the secret information and the extracted information; minimizing it improves the extraction accuracy. When the generator and the extractor are trained jointly with this combined objective, the gradient produced by the decoding loss flows through the extractor into the generator, adjusting the parameters of both networks and keeping the two consistent, while the gradient flow of the adversarial loss keeps the pseudo-images produced by the generator realistic;
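The stage-two objective can be sketched numerically with plain arrays standing in for network outputs. The relative weight lam between the adversarial and decoding terms is an assumption introduced here for illustration; the patent states the objective as a sum of the two parts without specifying a weight.

```python
import numpy as np

def binary_cross_entropy(p, m, eps=1e-12):
    """Mean binary cross entropy between extractor bit probabilities p and bits m."""
    p = np.clip(p, eps, 1 - eps)
    return -np.mean(m * np.log(p) + (1 - m) * np.log(1 - p))

def joint_loss(d_scores_fake, p_extracted, m, lam=1.0):
    """Stage-2 sketch: negated mean critic score on the stego images (adversarial
    term, WGAN-style) plus lam * decoding loss. lam is an assumed balance weight."""
    adv = -np.mean(d_scores_fake)
    dec = binary_cross_entropy(p_extracted, m)
    return adv + lam * dec
```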
The third stage fine-tunes the extractor; the target is to minimize the decoding loss after the secret-carrying pseudo-image is rounded and stored. After the joint training of the generator and the extractor, the extractor is further refined with rounded secret-carrying pseudo-images, specifically:
the network parameters of the generator are fixed at this stage; a batch of secret-carrying pseudo-images is generated, their values are mapped to the range 0-255 and rounded, the rounded values are mapped back to the range 0-1, and the images are fed to the extractor for training to minimize the decoding loss:
L3 = CrossEntropy(M'', M), with M'' = E(Round(G(Z, M)))
where L3 is the objective function of this stage, M'' is the secret information recovered by the extractor at this stage, Round(.) is the rounding function, and the other symbols are as above.
Unlike the previous stage, only the extractor is trained here, not the generator, so no gradient flows into the generator;
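The round-and-store step that the third stage trains against amounts to 8-bit quantization of the [0, 1] pseudo-image, which is easy to reproduce:

```python
import numpy as np

def round_and_store(img):
    """Map a [0, 1] float image to 8-bit integer pixel values and back,
    reproducing the save/reload a real stego image undergoes in transit."""
    return np.round(img * 255.0) / 255.0
```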
The network is trained iteratively in this way, and the whole training process terminates when neither the image quality nor the secret-information extraction accuracy improves any further; the training of the deep learning network architecture is then complete and the network architecture model is obtained.
Wherein, step S4 specifically comprises:
in actual use of the network architecture model, the sender holds the generator and the receiver holds the extractor. In one round of communication, the sender first modulates a 64 x 64-bit secret message onto a secret-carrying noise map of 64 x 64 resolution, passes it through the encoding module to obtain the transformed secret-carrying noise map, and substitutes it for the input noise of the penultimate noise-injection module in the generator. A random 512-bit latent vector, serving as the prior of the pseudo-image, is then fed to the generator to produce a secret-carrying pseudo-image of 128 x 128 resolution. The pseudo-image reaches the receiver over a public channel, and the receiver decodes it directly with the matching extractor to recover the 64 x 64 bits of secret information it contains, completing one round of covert communication.
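The payload bookkeeping implied by step S4, with the figures taken from the description above: a 64 x 64-bit message in one 128 x 128 generated image is 4096 bits, i.e. 0.25 bits per pixel, well above the sub-0.01 bpp reported for the DCGAN-based scheme in the background section.

```python
# Payload implied by the scheme: a 64 x 64 binary message inside one
# 128 x 128 generated image (figures from the description above).
message_bits = 64 * 64            # 4096 bits per stego image
pixels = 128 * 128                # 16384 pixels
bpp = message_bits / pixels       # 0.25 bits per pixel
```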
Compared with the prior art, the technical scheme of the invention has the beneficial effects that:
the method for steganography of the generated robust image provided by the invention integrates the embedding process of the secret information into the generation process of the image by utilizing the generation countermeasure network StyleGAN, constructs a generated image steganography framework which can bear the secret information with larger capacity and has certain robustness, so that the obtained generated image steganography method has the advantages of larger embedded capacity, good quality of the generated image, strong undetectable secret image statistics, high practicability and the like, and solves the problems of poor quality of the secret image generated by the existing generated image steganography, low embedded capacity, low information extraction accuracy and the like.
Drawings
FIG. 1 is a schematic flow diagram of the process of the present invention;
FIG. 2 is a schematic diagram of a generative robust image steganography network architecture based on a StyleGAN generative countermeasure network construction;
FIG. 3 is a schematic diagram of a network structure (left) and a residual block structure (right) of an encoding module;
fig. 4 is a schematic diagram of a densely connected extractor.
Detailed Description
The drawings are for illustrative purposes only and are not to be construed as limiting the patent;
for the purpose of better illustrating the embodiments, certain features of the drawings may be omitted, enlarged or reduced, and do not represent the size of an actual product;
it will be understood by those skilled in the art that certain well-known structures in the drawings and descriptions thereof may be omitted.
The technical solution of the present invention is further described below with reference to the accompanying drawings and examples.
Example 1
As shown in fig. 1, a method for generating robust image steganography includes the following steps:
s1: constructing an image data set, and preprocessing the image data set;
s2: constructing and initializing a deep learning network architecture;
s3: training a deep learning network architecture by adopting a combined-fine tuning method to obtain a network architecture model;
s4: and generating a secret-carrying pseudo-graph by using a network architecture model, carrying out secret communication, and completing the steganography process of the generated robust image.
In the specific implementation process, the generative adversarial network StyleGAN is used to integrate the embedding of secret information into the image generation process, constructing a generative image steganography framework that carries a larger secret payload and possesses a degree of robustness. The resulting method offers a larger embedding capacity, good generated-image quality, strong statistical undetectability of the secret-carrying image, and high practicality, and it addresses the poor secret-carrying image quality, low embedding capacity, and low information extraction accuracy of existing generative image steganography.
More specifically, the step S1 specifically includes:
the CelebA-HQ face data set is selected as the training set. The data set contains 30000 high-quality 1024 × 1024 face images, which are mirror-augmented to 60000 images. These are then scaled to 128 × 128 using the imresize function of MATLAB (default parameters) and saved in pgm format.
More specifically, as shown in fig. 2, the step S2 specifically includes:
constructing a deep learning network architecture based on a StyleGAN network, wherein the deep learning network architecture comprises a modulation module, a coding module, a generator, a discriminator, an extractor and a noise adding module; wherein:
the modulation module modulates the secret information into a noise map generated randomly to generate a secret-carrying noise map;
the coding module carries out a nonlinear transformation on the secret-carrying noise map so as to adapt it to the robust image steganography task and distribute the secret information into robust image features;
the generator takes the random hidden vector and the noise map containing the secret information obtained from the encoding module as input to generate a secret-carrying pseudo map;
the discriminator takes the secret-carrying pseudo-graph or the real image as input to generate a discrimination score as an object of the generator for countertraining;
the extractor takes the secret-carrying pseudo-graph as input, extracts the secret information from the secret-carrying pseudo-graph and outputs the secret information;
the noise adding module is arranged between the generator and the extractor and simulates channel attacks on the secret-carrying pseudo-graph during transmission, so as to train and improve its robustness.
In a specific implementation process, the coding module performs a nonlinear transformation on the secret-carrying noise map to adapt it to the robust image steganography task. The secret-carrying noise map is obtained by modulating 64 × 64 bits of secret information onto a random noise map of 64 × 64 resolution. In order to integrate the secret information into the image generation process while minimizing its influence, the invention adopts an information modulation mode that does not change the noise distribution: the sign of the input noise is modulated by the binary-coded information, with 1 and 0 representing positive and negative noise respectively, as expressed by the following formula.
N′=ABS(N)*(2M-1)
Where N represents the input noise, N′ represents the modulated noise, M represents the binary-coded secret information, and ABS(·) is the absolute-value operator. Assuming the binary-coded information takes the values 0 and 1 with equal probability, and the input noise is sampled from a normal distribution, the sign-modulated secret-carrying noise also follows a normal distribution. The coding module is a customized residual convolutional neural network, as shown in fig. 3, comprising two convolutional layers and two residual blocks. A residual block is a structure in which a shortcut path is added between two convolutional layers, as shown on the right of the figure. The kernel size of all convolutional layers is set to 3 × 3 with a stride of 1, and the activation function of every layer except the last is a leaky rectified linear unit (Leaky ReLU) to avoid information loss. The transformed secret-carrying noise map obtained from the coding module has unchanged resolution, but better matches the distribution of the pseudo-graph, so that the secret information is distributed into robust image features.
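The sign-modulation rule N′ = ABS(N) · (2M − 1) can be sketched in plain Python (a toy flattened version of the 64 × 64 map; the function names are ours, not the patent's):

```python
import random

def modulate(noise, bits):
    """Sign-modulate secret bits onto a noise map: N' = |N| * (2M - 1).
    A bit of 1 makes the noise value positive, a bit of 0 makes it
    negative; the magnitude (and hence, for equiprobable bits and
    Gaussian noise, the normal distribution) is unchanged."""
    return [abs(n) * (2 * m - 1) for n, m in zip(noise, bits)]

def demodulate(mod_noise):
    """Recover the bits from the signs of the modulated noise."""
    return [1 if n > 0 else 0 for n in mod_noise]

random.seed(0)
noise = [random.gauss(0.0, 1.0) for _ in range(64 * 64)]  # flattened 64x64 map
bits  = [random.randint(0, 1) for _ in range(64 * 64)]    # secret message

carrier = modulate(noise, bits)
assert demodulate(carrier) == bits                            # bits recoverable
assert all(abs(c) == abs(n) for c, n in zip(carrier, noise))  # magnitudes kept
```

In the actual scheme the modulated map then passes through the coding module before entering the generator; this sketch only covers the modulation step itself.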
In the implementation process, the generator and the discriminator are both progressive convolutional neural network structures similar to the StyleGAN paper [10], divided into 6 levels of 4 × 4, 8 × 8, 16 × 16, …, 128 × 128. The generator takes the latent vector, several random noise maps, and the noise map containing the secret information as input, and outputs the secret-carrying pseudo face image. The latent vector is the input of the mapping module and determines the image content; the random noise is the input of the noise adding module at each level, and accordingly comprises 6 noise maps of different resolutions (4 × 4, 8 × 8, 16 × 16, …, 128 × 128) that affect subtle variations and features at different levels of the image. Low-level features affect the image content and are highly robust, while high-level feature maps have high resolution and large carrying capacity; balancing capacity against robustness, the secret information is modulated into the noise map of the 64 × 64-level noise adding module, giving a payload of 0.25 bpp. Meanwhile, to endow the secret-carrying pseudo-graph with stronger resistance to channel attacks, the noise map obtained by transforming the modulated noise map through the coding module is used as the input of that noise adding module. The discriminator serves as the generator's adversary during training, taking the pseudo face image or a real image as input and outputting a discrimination score; its network structure mirrors that of the generator with the additional modules removed.
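The level and payload arithmetic described above can be checked in a few lines of Python (a sketch; the level list and indexing are our reading of the text, not code from the patent):

```python
# Resolutions of the 6 noise-injection levels of the 128x128 generator:
# each level doubles the previous one, starting from 4x4.
levels = [4 * 2 ** i for i in range(6)]
assert levels == [4, 8, 16, 32, 64, 128]

# The secret-carrying noise map replaces the input of the 64x64-level
# (i.e. penultimate) noise adding module.
secret_level = levels[-2]
assert secret_level == 64

# Payload: 64*64 embedded bits spread over a 128x128 output image,
# counted per pixel (not per colour channel), gives 0.25 bpp.
bpp = (secret_level ** 2) / (levels[-1] ** 2)
assert bpp == 0.25
```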
In a specific implementation, as shown in fig. 4, the extractor is a densely connected network that takes a 128 × 128 × 3 secret-carrying image as input and outputs 64 × 64 bits of secret information. The densely connected network comprises 5 × 5 convolutional layers and a 2 × 2 average pooling layer; the output of each earlier convolutional layer is shortcut-connected to the output of later convolutional layers to improve information flow through the network, and average pooling is used for downsampling at the end of the network to reduce the risk of information loss. The dense connection structure strengthens forward information flow during prediction and backward gradient propagation during training, endowing the extractor with a stronger information extraction capability and improving extraction accuracy.
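The channel bookkeeping that dense connectivity implies can be sketched as follows (a sketch only; the growth rate of 32 is illustrative, since the patent does not give the exact layer widths):

```python
def dense_block_channels(c0, growth, layers):
    """Channel bookkeeping of a densely connected block: every layer
    receives the concatenation of all previous feature maps, so its
    input width grows linearly while each layer contributes `growth`
    new channels to the running stack."""
    widths = []
    c = c0
    for _ in range(layers):
        widths.append(c)   # input channels seen by this layer
        c += growth        # its output is concatenated onto the stack
    return widths

# e.g. an RGB secret-carrying image (3 channels) through 5 layers, growth 32
assert dense_block_channels(3, 32, 5) == [3, 35, 67, 99, 131]
```

This is why dense connections improve gradient flow: every layer has a direct path back to the input and to every earlier layer.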
In the specific implementation process, a noise adding module is inserted between the generator and the extractor to simulate the two channel attacks most common during transmission, JPEG compression and scaling, in order to train and improve the robustness of the secret-carrying image. The channel noise simulation module has three modes: lossless transmission, scaling, and JPEG compression; in each iteration one of the three is randomly selected as the current channel state to simulate an attack on the secret-carrying image output by the generator. Lossless transmission is the simplest mode: the secret-carrying pseudo-graph is passed to the extractor without any processing, representing an image that is not attacked during transmission. The scaling mode rescales the secret-carrying pseudo-graph with a coefficient sampled randomly from (0.5, 1) each time, so the image resolution is correspondingly reduced after passing through this mode. JPEG compression is a more complex transform: it first divides the image into 8 × 8 regions, computes a Discrete Cosine Transform (DCT) within each region, and then quantizes the resulting frequency-domain coefficients with varying coarseness. Because the quantization step is not differentiable, JPEG compression cannot be used directly in gradient-based optimization and must be given a differentiable approximation. The approximation adopted by the invention directly modifies the rounding in the JPEG quantization step, as shown in the following formula; at worst, the approximated rounding differs numerically from true rounding by 0.125.
Round_approx(x) = Round(x) + (x − Round(x))³
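The modified rounding can be sketched and checked in a few lines of Python (Python's built-in `round`, which uses banker's rounding, stands in for the formula's Round):

```python
def round_approx(x):
    """Surrogate for rounding in the simulated JPEG quantizer:
    Round(x) + (x - Round(x))**3.  Since |x - Round(x)| <= 0.5, the
    value never deviates from true rounding by more than 0.5**3 = 0.125,
    and the cubic term gives a usable gradient 3*(x - Round(x))**2
    (treating Round as locally constant in backprop)."""
    return round(x) + (x - round(x)) ** 3

# the worst-case gap of 0.125 is attained halfway between two integers
xs = [i / 1000.0 for i in range(-3000, 3001)]
assert max(abs(round_approx(x) - round(x)) for x in xs) <= 0.125 + 1e-12
```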
The above is the network architecture adopted by the present invention.
More specifically, the present invention provides a combined-fine tuning training method to overcome the tension between image quality and information extraction accuracy. The method greatly improves the extraction accuracy of the secret information while preserving the quality of the secret-carrying pseudo-graph. Each iteration of combined-fine tuning is divided into three stages:
the first stage is the training of the discriminator, whose goal is to minimize the Wasserstein loss, defined as the difference between the discrimination score of the real image and that of the secret-carrying pseudo-graph; it represents a distance measure between the distribution of the pseudo-graphs and the distribution of the real images.
the second stage is the joint training of the generator and the extractor, whose objective function consists of an adversarial loss and a decoding loss. The adversarial loss is the discrimination score obtained by feeding the generated secret-carrying pseudo-graph into the discriminator network; minimizing it improves the realism of the secret-carrying pseudo-graph. The decoding loss is defined as the binary cross entropy between the secret information and the extracted information; minimizing it improves the extraction accuracy of the secret information. The weights of the two were experimentally determined to be 1 and 20, respectively. When the generator and the extractor are jointly trained with this two-part objective, the gradient produced by the decoding loss propagates through the extractor into the generator, adjusting the network parameters of both and keeping them consistent, while the gradient flow of the adversarial loss helps the generator produce realistic pseudo-graphs.
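With the weights stated above (1 for the adversarial term, 20 for the decoding term), the stage-two objective can be sketched in plain Python (function names are ours; the adversarial score is a stand-in scalar, not a real discriminator output):

```python
import math

LAMBDA_DEC = 20.0  # weight of the decoding loss (adversarial weight is 1)

def bce(targets, probs, eps=1e-12):
    """Binary cross entropy between secret bits and extractor outputs."""
    return -sum(t * math.log(p + eps) + (1 - t) * math.log(1 - p + eps)
                for t, p in zip(targets, probs)) / len(targets)

def joint_loss(adv_score, secret_bits, extracted_probs):
    """Stage-two objective: adversarial loss + 20 * decoding loss."""
    return adv_score + LAMBDA_DEC * bce(secret_bits, extracted_probs)

# a near-perfect extractor drives the decoding term toward zero,
# leaving essentially only the adversarial term
assert abs(joint_loss(0.3, [1, 0], [1.0 - 1e-12, 1e-12]) - 0.3) < 1e-6
```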
The third stage is fine-tuning of the extractor, whose goal is to minimize the decoding loss after the secret-carrying pseudo-graph has been rounded and saved. Because the generator outputs the secret-carrying pseudo-graph as floating-point values, it must be rounded and saved in actual use, which sharply reduces the extractor's accuracy; the cause is the mismatch between the pseudo-graph distribution at training time and at test time. To eliminate this effect, the invention further fine-tunes the extractor with rounded pseudo-graphs after the generator and extractor have been jointly trained. Specifically, at this stage the generator's network parameters are fixed; a batch of secret-carrying pseudo-graphs is generated, their values are mapped to the range 0-255 and rounded, then mapped back to the range 0-1 and fed to the extractor, which is trained to minimize the decoding loss, as shown in the following formula. Note that, unlike the previous stage, this stage trains only the extractor, not the generator, so no gradient flows back into the generator.
L3=CrossEntropy(M",M),M"=E(Round(G(Z,M)))
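The round-and-remap step of this stage can be sketched as follows (a toy scalar version; the function name is ours):

```python
def quantize_like_saving(x):
    """Mimic saving the generator's floating-point output as an 8-bit
    image: map [0, 1] -> [0, 255], round to an integer, map back."""
    return round(x * 255) / 255.0

pixel = 0.3                          # raw generator output
saved = quantize_like_saving(pixel)  # what the extractor sees in practice
# the saved value sits exactly on the 8-bit grid ...
assert abs(saved * 255 - round(saved * 255)) < 1e-9
# ... and generally differs from the raw output, which is exactly the
# train/test distribution mismatch the fine-tuning stage removes
assert saved != pixel
```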
The network is trained iteratively in this way for a number of rounds, and the whole training process terminates when neither the image quality nor the secret-information extraction accuracy improves any further. Experiments show that the combined-fine tuning training method achieves higher information extraction accuracy and larger effective capacity than other methods, and the generated secret-carrying pseudo-graph is visually difficult to distinguish from a real image.
The above is the combined-fine tuning type training method adopted by the invention aiming at the contradiction between the image quality and the information extraction accuracy.
Example 2
More specifically, the prior art has the following technical drawbacks:
(1) Conventional carrier-modification image steganography algorithms, namely content-adaptive image steganography, face a serious security challenge from advanced steganalyzers based on convolutional neural networks. Such methods must heuristically design a distortion cost function, and factors such as the designer's own limitations and algorithmic complexity make comprehensive and accurate modeling difficult, so content-adaptive image steganography is easily detected by CNN-based steganalyzers.
(2) Carrier-modification image steganography based on generative adversarial networks is still under intensive study; at present only UT-GAN achieves slightly better security than traditional algorithms, yet it remains easy to detect with existing advanced steganalysis methods under high payload, let alone other schemes such as SGAN, SSGAN, Hayes GAN, and SteganoGAN. Since carrier-modification steganography must tamper with a natural image, it inevitably destroys the correlation among image pixels; the more it tampers, the more severe the distortion and the more easily it is captured by a steganalyzer.
(3) Conventional carrier-generation image steganography methods all suffer from an extremely low embedding capacity (below 0.005 bpp), because accurately modeling a complex image distribution is nearly impossible and it is difficult to establish a bidirectional mapping between the secret information and the image distribution under high payload.
(4) Generative adversarial networks learn image distributions well and are naturally suited to generative image steganography. However, existing DCGAN-based generative image steganography has three problems: first, the generated pseudo-graph has low resolution and severe distortion, because the DCGAN network has limited capability; second, the embedding capacity is low (below 0.01 bpp) and the payload is limited, because the information is encoded through the latent vector; third, the consistency between the generator and the extractor during training is not considered, so the information extraction accuracy is relatively low.
(5) The image steganography method does not consider the problem of practical use, has no robustness, and is easy to fail after being attacked by common channels such as compression, scaling and the like.
In response to the drawbacks of the prior art, the present invention, building on embodiment 1, uses the StyleGAN [10] network to design a robust image steganography framework with larger capacity. StyleGAN is an improved progressive generative adversarial network that alters the generator structure mainly by adding extra modules, achieving unsupervised generation of images with strong controllability. The extra modules comprise a mapping network for the latent vector, a style module, and a noise adding module. The mapping module is a neural network of several fully connected layers that maps the latent vector into an intermediate vector, providing a learnable path for feature disentanglement; the style module is a learnable affine transformation that converts the intermediate vector into per-layer style control variables and applies them to the feature maps through an adaptive normalization operation to achieve style control; the noise adding module is an extra module that increases image detail and randomness, adding scaled random noise to each layer's feature map before the style transformation so that the generated images are more lifelike.
The invention utilizes more advanced generation countermeasure network StyleGAN to integrate the embedding process of the secret information into the generation process of the image, and constructs a generation type image steganography framework which can bear the secret information with larger capacity and has certain robustness. The architecture is mainly realized by the following means:
firstly, the image generation capability of StyleGAN and the characteristics of the network are fully exploited to generate secret-carrying pseudo natural images with higher resolution that are clearer and more lifelike;
secondly, the embedding mode of the secret information is changed: the information is no longer encoded by the latent vector but is modulated onto a noise map in a noise adding module of StyleGAN. The secret information is thus expressed in details that barely affect the content of the pseudo-graph and have higher imperceptibility, giving a stronger secret-carrying capability.
Thirdly, a customized dense connection type network is designed to be used as a message extractor to extract a large amount of secret information more effectively, and a combined-fine tuning type training method is provided to fully train the message extractor, so that the accuracy of information extraction is further improved, and the effective capacity of a pseudo-graph is ensured.
Finally, considering the problem of channel attack in practical use, a corresponding coding module is designed to carry out nonlinear transformation on the secret-carrying noise image so as to adapt to a robust image steganography task, and a channel noise simulation module is added between a generator and an extractor, so that the robustness of the network is trained in a targeted manner, and the channel attack resistance of the secret-carrying image is improved.
By the means, the generated image steganography obtained by the method has the advantages of large embedding capacity, good generated image quality, strong undetectable secret-carrying image statistics, high practicability and the like, and solves the problems of poor secret-carrying image quality, low embedding capacity, low information extraction accuracy and the like generated by the conventional generated image steganography.
In a specific implementation process, a generating robust image steganography method is provided, which specifically comprises the following steps:
first, the data set is constructed. The invention selects the CelebA-HQ face data set as the training set. The data set contains 30000 high-quality 1024 × 1024 face images, which are mirror-augmented to 60000 images, then scaled to 128 × 128 using the imresize function of MATLAB (default parameters) and saved in pgm format.
Second, the network architecture is trained and tested. The invention implements the coding module, generator, discriminator, extractor, and channel noise simulation module with the PyTorch deep learning framework, and trains the network with the Adam gradient descent algorithm on an NVIDIA 1080Ti GPU platform; the algorithm's hyperparameters β are set to (0.0, 0.99), the learning rate to 0.001, and the batch size to 16. All convolutional kernels of the network are initialized with Gaussian initialization, and the equalized learning rate technique is used to ensure benign competition among the generator, discriminator, and extractor. After initialization, iterative training proceeds with the progressive training technique on top of the aforementioned combined-fine tuning method: the network starts at a resolution of 4 × 4 and trains while gradually increasing the image resolution until the generator reaches the target resolution of 128 × 128. It should be noted that the extractor joins the learning only at the last stage (resolution 128 × 128). The whole network architecture is trained for more than 300000 iterations in total, and the whole training process terminates once the image quality (evaluated by the FID score) and the extraction accuracy no longer improve.
Third, the trained network architecture is used to generate the secret-carrying pseudo-graph and conduct covert communication. After training, in actual use the sending end holds the generator and the receiving end holds the extractor. Taking one communication round as an example: the sending end first modulates 64 × 64 bits of secret information onto a noise map of 64 × 64 resolution, transforms it through the coding module, and replaces the input noise of the penultimate noise adding module in the StyleGAN generator with the resulting secret-carrying noise map. Then a 512-dimensional random latent vector, drawn from the prior distribution, is input into the generator, yielding a secret-carrying pseudo face image of 128 × 128 resolution. The secret-carrying image reaches the receiving end through the public channel, where the corresponding extractor decodes the pseudo face image directly to obtain the 64 × 64 bits of secret information contained within, accomplishing one round of covert communication.
In the specific implementation process, compared with the prior art, the method has the following advantages:
firstly, the generated secret-carrying image has higher resolution and better quality, mainly owing to the powerful image generation capability of StyleGAN. In addition, the invention adopts an information modulation mode that does not change the distribution of the noise map, so the image generation capability of StyleGAN is fully exercised and the pseudo-graph remains undistorted even under a large payload.
Secondly, the statistical undetectable property is stronger. The present invention modulates information onto a noise map in the noise addition module of the StyleGAN. Thus, the secret information is expressed in the details with little influence on the content of the pseudo-graph and higher concealment, and has stronger statistical undetectable property.
And thirdly, the carrying capacity is higher. The invention obtains the loading capacity of 0.25bpp by modulating the secret information into the noise map, designs a customized dense connection type network as a message extractor to more effectively extract a large amount of secret information, and simultaneously, the proposed combined-fine tuning type training method greatly improves the accuracy of information extraction and ensures the effective capacity of robust steganography.
Fourthly, the practicability is high, and the secret-carrying pseudo graph has the robust characteristic of resisting channel attack. The invention designs a coding module to carry out nonlinear transformation on the secret-carrying noise image so as to adapt to a robust steganography task, and adds a channel noise simulation module between a generator and an extractor, thereby training the robustness of the network in a targeted manner and improving the channel attack resistance of the secret-carrying image.
It should be understood that the above-described embodiments of the present invention are merely examples for clearly illustrating the invention and are not intended to limit its embodiments. Other variations and modifications will be apparent to persons skilled in the art in light of the above description; it is neither necessary nor possible to enumerate all embodiments exhaustively here. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the claims of the present invention.
[1] Guo L, Ni J, Shi Y Q, et al. Uniform Embedding for Efficient JPEG Steganography[J]. IEEE Transactions on Information Forensics and Security, 2014, 9(5): 814-825.
[2] Holub V, Fridrich J, Denemark T, et al. Universal distortion function for steganography in an arbitrary domain[J]. EURASIP Journal on Information Security, 2014, 2014(1): 1-13.
[3] Volkhonskiy D, Nazarov I, Borisenko B, et al. Steganographic Generative Adversarial Networks[J]. arXiv: Multimedia, 2017.
[4] Shi H, Dong J, Wang W, et al. SSGAN: Secure Steganography Based on Generative Adversarial Networks[C]// Advances in Multimedia Information Processing - PCM 2017. Cham: Springer International Publishing, 2018: 534-544.
[5] Tang W, Tan S, Li B, et al. Automatic Steganographic Distortion Learning Using a Generative Adversarial Network[J]. IEEE Signal Processing Letters, 2017, 24(10): 1547-1551.
[6] Yang J, Ruan D, Huang J, et al. An embedding cost learning framework using GAN[J]. IEEE Transactions on Information Forensics and Security, 2019, 15: 839-851.
[7] Hayes J, Danezis G. Generating steganographic images via adversarial training[M]// Guyon I, Luxburg U V, Bengio S, et al. (eds.). Advances in Neural Information Processing Systems 30. Curran Associates, Inc., 2017: 1954-1963.
[8] Zhang K A, Cuesta-Infante A, Xu L, et al. SteganoGAN: high capacity image steganography with GANs[J]. arXiv preprint arXiv:1901.03892, 2019.
[9] Hu D, Wang L, Jiang W, et al. A Novel Image Steganography Method via Deep Convolutional Generative Adversarial Networks[J]. IEEE Access, 2018: 38303-38314.
[10] Karras T, Laine S, Aila T, et al. A Style-Based Generator Architecture for Generative Adversarial Networks[C]. Computer Vision and Pattern Recognition, 2019: 4401-4410.
Claims (10)
1. A method of generating robust image steganography, comprising the steps of:
s1: constructing an image data set, and preprocessing the image data set;
s2: constructing and initializing a deep learning network architecture;
s3: training a deep learning network architecture by adopting a combined-fine tuning method to obtain a network architecture model;
s4: and generating a secret-carrying pseudo-graph by using a network architecture model, carrying out secret communication, and completing the steganography process of the generated robust image.
2. The method for generating robust image steganography according to claim 1, wherein the step S1 is specifically as follows:
selecting a CelebA-HQ face data set as a training set, wherein the data set comprises 30000 high-quality face images of 1024 multiplied by 1024, and carrying out mirror image amplification on the face images to obtain 60000 face images;
the pre-processing of the image data is further accomplished by scaling to a size of 128 x 128 using the imresize function of MATLAB, saved in pgm format.
3. The method for generating robust image steganography according to claim 1, wherein the step S2 is specifically as follows:
constructing a deep learning network architecture based on a StyleGAN network, wherein the deep learning network architecture comprises a modulation module, a coding module, a generator, a discriminator, an extractor and a noise adding module; wherein:
the modulation module modulates the secret information into a noise map generated randomly to generate a secret-carrying noise map;
the coding module carries out a nonlinear transformation on the secret-carrying noise map so as to adapt it to the robust image steganography task and distribute the secret information into robust image features;
the generator takes the random hidden vector and the noise map containing the secret information obtained from the encoding module as input to generate a secret-carrying pseudo map;
the discriminator takes the secret-carrying pseudo-graph or the real image as input to generate a discrimination score as an object of the generator for countertraining;
the extractor takes the secret-carrying pseudo-graph as input, extracts the secret information from the secret-carrying pseudo-graph and outputs the secret information;
the noise adding module is arranged between the generator and the extractor and simulates channel attacks on the secret-carrying pseudo-graph during transmission, so as to train and improve its robustness.
4. A method for generating robust image steganography as claimed in claim 3 wherein in said step S2, said secret-carrying noise map is obtained by modulating secret information of 64 x 64 bits to a random noise map of 64 x 64 resolution; in order to integrate the secret information into the image generation process and minimize the influence caused by the secret information, an information modulation mode without changing the noise distribution is adopted:
the sign of the input noise is modulated by binary coded information, 1 and 0 respectively represent the positive and negative of the noise, and the sign is represented by the following formula:
N′=ABS(N)*(2M-1)
wherein N represents the input noise, N' represents the modulated noise, M represents the binary-coded secret information, and ABS(·) is the absolute-value operator; assuming the binary-coded information takes the values 0 and 1 with equal probability, and the input noise is sampled from a normal distribution, the sign-modulated secret-carrying noise also follows a normal distribution.
5. A method as claimed in claim 3, wherein in step S2 the coding module is a customized residual convolutional neural network comprising two convolutional layers and two residual blocks; a residual block adds a shortcut connection around two convolutional layers; all convolutional kernels are 3 × 3 with stride 1, and every layer except the last uses a leaky rectified linear unit as its activation function to avoid information loss; the transformed secret-carrying noise map produced by the coding module keeps the same resolution but better matches the distribution of the pseudo-graph, so that the secret information is spread into robust image features.
6. A generative robust image steganography method as claimed in claim 5, wherein in step S2 the generator and the discriminator both use a progressive convolutional neural network structure divided into 6 levels of resolutions 4 × 4, 8 × 8, 16 × 16, …, 128 × 128; the generator takes the hidden vector, several random noise maps, and the noise map containing the secret information as input and outputs the secret-carrying pseudo face image; the hidden vector is the input of the mapping module and determines the image content; random noise is fed to the noise-adding module of each level, so there are noise maps at the 6 resolutions 4 × 4, 8 × 8, 16 × 16, …, 128 × 128, which govern subtle variations and features at different levels of the image; the discriminator serves as the generator's opponent in adversarial training, takes the secret-carrying pseudo-graph or a real image as input and outputs a discrimination score; with the extra modules removed, its structure mirrors that of the generator.
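A quick consistency check of the level schedule in plain Python (the resolutions double from 4 × 4 to 128 × 128 over 6 levels; per claim 10, the 64 × 64 secret-carrying noise map replaces the noise input of the penultimate level, whose resolution matches it exactly):

```python
# Progressive levels of the generator/discriminator: 6 levels,
# doubling in resolution from 4x4 up to 128x128.
levels = [4 * 2 ** i for i in range(6)]
assert levels == [4, 8, 16, 32, 64, 128]

# The 64x64 secret-carrying noise map fits the penultimate level's
# noise-adding module (claim 10).
penultimate = levels[-2]
assert penultimate == 64
```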
7. A generative robust image steganography method as claimed in claim 6, wherein in said step S2 the extractor is a densely connected network that takes a 128 × 128 × 3 secret-carrying image as input and outputs 64 × 64 bits of secret information; the densely connected network comprises 5 × 5 convolutional layers and a 2 × 2 average-pooling layer; the output of each earlier convolutional layer is shortcut-connected to the output of later convolutional layers to improve information flow through the network, and the average-pooling layer performs downsampling at the end of the network to reduce the risk of information loss.
8. The method as claimed in claim 7, wherein in step S2 the noise-adding module is placed between the generator and the extractor and applies two channel attacks, JPEG compression and scaling, during simulated transmission to train and improve the robustness of the secret-carrying image; the noise-adding module has three modes, namely lossless transmission, scaling, and JPEG compression, and in each iteration one of them is chosen at random as the current channel state to simulate an attack on the secret-carrying image output by the generator;
lossless transmission is the simplest mode: the secret-carrying pseudo-graph is passed to the extractor without any processing, modelling a transmission in which the image is not attacked; the scaling mode applies a scale transformation to the secret-carrying pseudo-graph, with the coefficient sampled randomly from (0.5, 1) each time, so the image resolution is reduced accordingly; JPEG compression divides the image into 8 × 8 blocks, computes the DCT (discrete cosine transform) within each block, and quantizes the resulting frequency-domain coefficients with varying coarseness; because the quantization step is non-differentiable, JPEG compression cannot be used directly in gradient-based optimization and must be replaced by a differentiable approximation; the rounding inside the JPEG quantization step is therefore modified as follows:
Round_approx(x) = Round(x) + (x − Round(x))³
where Round_approx(·) denotes the differentiable approximate rounding function, Round(·) denotes the rounding function, and x is the DCT coefficient to be quantized; in the worst case, the differentiable approximation differs numerically from true JPEG rounding by 0.125.
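A plain-Python sketch of this approximated rounding (half-up rounding is assumed for Round(·)); the final check confirms the worst-case deviation of 0.125 stated above:

```python
import math

def round_half_up(x):
    """Half-up rounding to the nearest integer, standing in for Round(.)."""
    return math.floor(x + 0.5)

def round_approx(x):
    """Differentiable approximation Round(x) + (x - Round(x))**3; treating
    the inner Round as constant, the slope 3*(x - Round(x))**2 lets
    gradients flow through the JPEG quantizer."""
    r = round_half_up(x)
    return r + (x - r) ** 3

# The deviation |x - Round(x)|**3 peaks when x lies exactly between two
# integers, i.e. |x - Round(x)| = 0.5, giving 0.5**3 = 0.125.
worst = max(abs(round_approx(x / 100.0) - round_half_up(x / 100.0))
            for x in range(-300, 301))
assert worst <= 0.125
```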
9. A method for generating robust image steganography as claimed in claim 3, wherein in said step S3 the joint fine-tuning method comprises a number of iterations, each divided into three stages:
the first stage trains the discriminator; its goal is to minimize the Wasserstein loss, i.e. the difference between the discrimination score of the secret-carrying pseudo-graph and that of the real image, which serves as a distance measure between the pseudo-graph distribution and the real-image distribution:

L1 = E[D(G(Z, M))] − E[D(X)]

where L1 is the objective function of this stage, D(·) is the discriminator network, G(·) is the generator network, Z, M, and X are the hidden vector, the secret information, and the real image respectively, and E denotes mathematical expectation;
the second stage jointly trains the generator and the extractor; its objective function consists of an adversarial loss and a decoding loss, specifically:

L2 = −E[D(G(Z, M))] + CrossEntropy(M′, M), where M′ = E(G(Z, M))

where L2 is the objective function of this stage, CrossEntropy(x, y) denotes the cross entropy between x and y, M′ is the secret information extracted by the extractor, E(·) is the extractor network, and the remaining symbols are as defined above; the adversarial loss is the discrimination score obtained by feeding the generated secret-carrying pseudo-graph into the discriminator, and minimizing it makes the pseudo-graph more realistic; the decoding loss is defined as the binary cross entropy between the secret information and the extracted information, and minimizing it improves the extraction accuracy of the secret information; when the generator and the extractor are trained jointly with this two-part objective, the gradient produced by the decoding loss propagates through the extractor into the generator, adjusting the parameters of both networks and keeping them consistent, while the gradient flow of the adversarial loss keeps the pseudo-graphs produced by the generator realistic;
the third stage fine-tunes the extractor; the objective is to minimize the decoding loss after the secret-carrying pseudo-graph has been rounded and saved; after the joint training of the generator and the extractor, the extractor is further refined on rounded secret-carrying pseudo-graphs, specifically:
the generator's network parameters are fixed at this stage; a batch of secret-carrying pseudo-graphs is generated, their values are mapped to the range 0-255 and rounded, the rounded image values are mapped back to the range 0-1, and the result is fed to the extractor, which is trained to minimize the decoding loss, specifically:
L3 = CrossEntropy(M″, M), where M″ = E(Round(G(Z, M)))
where L3 is the objective function of this stage, M″ denotes the secret information extracted by the extractor at this stage, Round(·) is the rounding function, and the remaining symbols are as defined above;
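The round-and-remap preprocessing of this stage can be sketched in plain Python (a toy sketch on a flat list of pixel values in [0, 1]):

```python
def quantize_01(pixels):
    """Stage-3 preprocessing: map [0, 1] pixel values to 0..255, round to
    integers (as saving the image to disk would), then map back to [0, 1]."""
    rounded = [min(255, max(0, int(p * 255.0 + 0.5))) for p in pixels]
    return [r / 255.0 for r in rounded]

pixels = [0.0, 0.123, 0.5, 0.999, 1.0]
q = quantize_01(pixels)
# Quantization perturbs each value by at most half a grey level (0.5/255),
# which is the rounding effect the extractor is fine-tuned to tolerate.
assert all(abs(a - b) <= 0.5 / 255.0 + 1e-12 for a, b in zip(pixels, q))
assert q[0] == 0.0 and q[-1] == 1.0
```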
unlike the previous stage, this stage trains only the extractor and no longer trains the generator, so no gradient flows back into the generator;
the network is trained iteratively in this way for a number of rounds; the whole training process terminates when image quality and secret-information extraction accuracy no longer improve, completing the training of the deep-learning network architecture and yielding the network architecture model.
10. The method for generating robust image steganography according to claim 5, wherein step S4 is specifically as follows:
when the network architecture model is deployed, the sending end holds the generator and the receiving end holds the extractor; in one communication, the sending end first modulates 64 × 64 bits of secret information into a secret-carrying noise map of 64 × 64 resolution, passes it through the coding module to obtain the transformed secret-carrying noise map, and substitutes it for the input noise of the penultimate noise-adding module in the generator; a random 512-dimensional hidden vector is then fed to the generator as the prior distribution of the pseudo-graph, producing a secret-carrying pseudo-graph of 128 × 128 resolution; the secret-carrying pseudo-graph reaches the receiving end over a public channel; the receiving end decodes it directly with the corresponding extractor to recover the 64 × 64 bits of secret information it contains, completing one covert communication.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010315084.3A CN111598762B (en) | 2020-04-21 | 2020-04-21 | Generating type robust image steganography method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111598762A true CN111598762A (en) | 2020-08-28 |
CN111598762B CN111598762B (en) | 2023-09-19 |
Cited By (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112132737A (en) * | 2020-10-12 | 2020-12-25 | 中国人民武装警察部队工程大学 | Reference-free generated image robust steganography method |
CN112132738A (en) * | 2020-10-12 | 2020-12-25 | 中国人民武装警察部队工程大学 | Image robust steganography method with reference generation |
CN112243005A (en) * | 2020-10-14 | 2021-01-19 | 合肥工业大学 | Secure non-embedded steganography method based on generation of countermeasure network |
CN112435155A (en) * | 2020-10-16 | 2021-03-02 | 中国人民武装警察部队工程大学 | Controllable image steganography method for generating countermeasure network based on conditions |
CN112465687A (en) * | 2020-11-17 | 2021-03-09 | 北京航空航天大学 | Image processing method and device |
CN112492313A (en) * | 2020-11-22 | 2021-03-12 | 复旦大学 | Picture transmission system based on generation countermeasure network |
CN112529758A (en) * | 2020-12-18 | 2021-03-19 | 海南大学 | Color image steganography method based on convolutional neural network |
CN113326531A (en) * | 2021-06-29 | 2021-08-31 | 湖南汇视威智能科技有限公司 | Robust efficient distributed face image steganography method |
CN113538202A (en) * | 2021-08-05 | 2021-10-22 | 齐鲁工业大学 | Image steganography method and system based on generative steganography confrontation |
CN113902671A (en) * | 2021-08-31 | 2022-01-07 | 北京影谱科技股份有限公司 | Image steganography method and system based on random texture |
US20220122311A1 (en) * | 2020-10-21 | 2022-04-21 | Samsung Electronics Co., Ltd. | 3d texturing via a rendering loss |
CN114782697A (en) * | 2022-04-29 | 2022-07-22 | 四川大学 | Adaptive steganography detection method for confrontation sub-field |
CN114827379A (en) * | 2022-04-27 | 2022-07-29 | 四川大学 | Carrier image enhancement method based on generative network |
CN114998083A (en) * | 2022-05-30 | 2022-09-02 | 杭州电子科技大学上虞科学与工程研究院有限公司 | Carrier-free secret sharing method for generating image by using AI (artificial intelligence) |
CN115731089A (en) * | 2022-12-16 | 2023-03-03 | 中国人民解放军61660部队 | Component energy-based double-task image steganography method |
CN116456037A (en) * | 2023-06-16 | 2023-07-18 | 南京信息工程大学 | Diffusion model-based generated image steganography method |
CN117240979A (en) * | 2023-11-15 | 2023-12-15 | 清华大学 | Face image pre-protection method and device based on robust training |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109587372A (en) * | 2018-12-11 | 2019-04-05 | 北京邮电大学 | A kind of invisible image latent writing art based on generation confrontation network |
CN109934761A (en) * | 2019-01-31 | 2019-06-25 | 中山大学 | Jpeg image steganalysis method based on convolutional neural networks |
WO2020029356A1 (en) * | 2018-08-08 | 2020-02-13 | 杰创智能科技股份有限公司 | Method employing generative adversarial network for predicting face change |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||