Disclosure of Invention
Aiming at the above challenges, the invention provides a digital media protected text steganography method based on a variational autoencoder, which aims to increase the length of the steganographic text and generate diversified steganographic text, and which converts the secret information into a secret bit stream and embeds it into carrier text generated by a network model. The method specifically comprises the following steps:
preprocessing a text, including extracting global keywords and group keywords of a training text, dividing a long text into a plurality of short sequences, wherein each short sequence corresponds to one group of group keywords, and the global keywords are a union of all the groups of keywords;
constructing a neural network model consisting of an encoding network, a Gaussian sampling network and a decoding network, and vectorizing the text;
respectively acquiring the features of the global keywords and of the long sequence by using the coding network, and fusing the two to obtain a fused global feature representation;
performing Gaussian sampling on the global feature representation in the coding network;
decoding the sampling result of the Gaussian sampling by using a decoding network to obtain the conditional probability distribution of the text;
and selecting the K words with the maximum conditional probability, encoding the K words with a Huffman code, and selecting the word whose Huffman code corresponds to the secret bit stream to be embedded, thereby completing the steganography of the text.
Further, obtaining the global keyword features, namely extracting the context features between the words of the text, combines a bidirectional gated recurrent unit with an attention mechanism and comprises the following steps:
acquiring the forward hidden state and the backward hidden state of the text by using the forward and backward gated recurrent units of the bidirectional gated recurrent unit;
concatenating the states acquired by the forward and backward gated recurrent units and feeding the result into the attention layer;
and, in the attention layer, computing the matching score of the hidden-layer output of the bidirectional gated recurrent unit at each time step with the whole text representation vector as a proportion of the total score, and obtaining the output of the attention layer through a linear transformation.
Further, processing the input vector with the bidirectional gated recurrent unit comprises:
z'_t = σ(W_z' · [h_{t-1}, x_t] + b_z')
r_t = σ(W_r · [h_{t-1}, x_t] + b_r)
n_t = tanh(W_n · [r_t * h_{t-1}, x_t])
h_t = (1 - z'_t) * h_{t-1} + z'_t * n_t
wherein z'_t is the update gate; W_z' is the training weight of the update gate; h_{t-1} is the hidden state at the previous time step; x_t is the input vector of the bidirectional gated recurrent unit at the t-th moment; b_z' is the bias of the update gate; r_t is the reset gate; W_r is the training weight of the reset gate; b_r is the bias of the reset gate; n_t is the candidate activation; W_n is the weight of the candidate activation; h_t is the hidden-layer output at time t; σ(x) is the Sigmoid activation function; h'_t is the hidden-layer output state; h→_t is the forward hidden state and h←_t is the backward hidden state, where h→_t denotes the forward hidden-layer result of h_t and h←_t denotes the backward hidden-layer result of h_t.
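By way of a non-limiting illustration, one step of the gated recurrent unit defined by the above formulas can be sketched as follows (Python/NumPy; the function name and the weight shapes are illustrative assumptions, not part of the invention):

import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x_t, h_prev, W_z, b_z, W_r, b_r, W_n):
    """One GRU step following the update gate z'_t, reset gate r_t,
    candidate n_t and output h_t above. W_* have shape (hidden, hidden + input)."""
    hx = np.concatenate([h_prev, x_t])                      # [h_{t-1}, x_t]
    z = sigmoid(W_z @ hx + b_z)                             # update gate z'_t
    r = sigmoid(W_r @ hx + b_r)                             # reset gate r_t
    n = np.tanh(W_n @ np.concatenate([r * h_prev, x_t]))    # candidate activation n_t
    return (1.0 - z) * h_prev + z * n                       # hidden output h_t

A bidirectional unit runs one such cell over the sequence forward and another backward, and concatenates the two hidden states h→_t and h←_t at each time step.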
Further, the processing of the hidden layer output state by the attention layer includes:
u_t = tanh(W_attn · h'_t + b_attn)
s_t = Σ_t a_t · h_t
output_t = W_o · s_t + b_o
wherein W_attn and b_attn are respectively the weight and the bias of the attention layer; u_attn denotes a randomly initialized attention matrix; a_t is the attention weight of time step t, obtained as the proportion of the matching score of u_t against u_attn to the total score over all time steps; W_o and b_o are respectively the weight coefficient and the bias of the output layer.
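A minimal sketch of this attention computation, assuming (as described above) that the weights a_t are obtained by normalizing the matching scores of each u_t against the randomly initialized u_attn into proportions of the total score:

import numpy as np

def attention_layer(H, W_attn, b_attn, u_attn, W_o, b_o):
    """H: array of shape (T, d) holding the BiGRU hidden states h'_t.
    Returns the fused attention output of the layer."""
    U = np.tanh(H @ W_attn.T + b_attn)        # u_t = tanh(W_attn h'_t + b_attn)
    scores = U @ u_attn                       # matching score of each time step
    a = np.exp(scores - scores.max())
    a = a / a.sum()                           # a_t: each score as a proportion of the total
    s = (a[:, None] * H).sum(axis=0)          # s = sum_t a_t h_t
    return W_o @ s + b_o                      # output = W_o s + b_o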
Furthermore, when the features of the long sequence are extracted, the vector representation of each short sentence is obtained first, and then the correlation features h_t^s between the short sentences are obtained through the bidirectional GRU, where h_t^s denotes the hidden-layer feature of the t-th short sentence s_t.
Further, the Gaussian sampling of the global features in the coding network covers both a model training stage and a generation stage that operates on real-time data: in the training stage, the approximate posterior distribution of the global keyword hidden variable is obtained by sampling the global keyword vectors together with the long-sequence vectors; when real-time data are processed, the approximate prior distribution is obtained by sampling the global keyword vectors alone.
Further, the decoding of the sampling result by the decoding network includes:
decoding the group keyword hidden variables, namely sampling the global hidden variables z to obtain the group keyword hidden variables generated by each clause;
performing feature fusion on the group keyword hidden variables obtained by decoding and the global hidden variables to obtain local hidden variables for guiding the generation of the current group clauses;
and performing characteristic decoding on the local hidden variables of each group to complete the conditional probability prediction of the words in each group clause.
Further, the group key hidden variable decoding process includes:
each layer of the neural network model has a group of keywords and a group of clauses under the constraint of that group of keywords, and each group of keywords is a subset of the text vector input to the neural network;
obtaining the global keyword hidden variable z by sampling the text vector input to the neural network; acquiring each group of keywords by sampling the text vector and z;
acquiring the prior distribution or the posterior distribution through Gaussian sampling, and selecting the group of keywords corresponding to each layer;
at each time step t, the keyword decoder takes the text vector of the neural network, the global keyword hidden variable z and the groups before time t as input, calculates the probability of each input item at time t, and takes the items whose probability exceeds a threshold as the group keyword hidden variable of the group at time t.
Further, feature decoding is performed on the local hidden variables of each group; the decoding aims to map the sampled feature codes into the conditional probabilities of the corresponding words. In the feature decoding, GRU_s denotes the GRU unit of the sentence decoder; h_g is the last hidden state of GRU_g after encoding the keyword result g; z_t^s is the local hidden variable of each short sentence; W_s is the weight of the initial hidden-layer vector and b_s is its bias; h→ and h← denote the feature vectors of the forward and backward word-to-word context.
Further, the process of performing secret information embedding includes:
generating one word at each time step, the t_w-th time step generating the t_w-th word;
using the conditional probabilities of the first t_w - 1 generated words, sorting the probability of each word being selected at the t_w-th time step, based on the prior or the posterior probability, in descending order;
selecting the top 2^n words and encoding the conditional probabilities of these 2^n words with Huffman coding;
according to the secret information bit stream B = b_1, b_2, ..., b_o to be embedded, selecting the corresponding word as the carrier word of the secret information to complete the steganography of the text;
wherein b_o denotes the o-th bit of the secret information bit stream, o is the length of the bit stream, and n is the embedding capacity of the secret information.
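By way of a non-limiting sketch, the Huffman-coding-based word selection can be illustrated as follows (Python; the function names are hypothetical, and the candidate probabilities would in practice come from the conditional probabilities predicted by the decoding network):

import heapq

def build_huffman(candidates):
    """candidates: list of (word, probability) for the top 2^n words.
    Returns a prefix code {word: bit string} built from the probabilities."""
    heap = [(p, i, [w]) for i, (w, p) in enumerate(candidates)]
    heapq.heapify(heap)
    codes = {w: "" for w, _ in candidates}
    if len(heap) == 1:                        # degenerate case: a single candidate
        codes[candidates[0][0]] = "0"
        return codes
    counter = len(heap)
    while len(heap) > 1:
        p1, _, ws1 = heapq.heappop(heap)
        p2, _, ws2 = heapq.heappop(heap)
        for w in ws1:
            codes[w] = "0" + codes[w]
        for w in ws2:
            codes[w] = "1" + codes[w]
        heapq.heappush(heap, (p1 + p2, counter, ws1 + ws2))
        counter += 1
    return codes

def embed_step(candidates, bits, pos):
    """Select the candidate whose Huffman code is a prefix of the remaining secret bits;
    the selected word therefore carries len(code) secret bits."""
    codes = build_huffman(candidates)
    for word, code in codes.items():
        if bits.startswith(code, pos):
            return word, pos + len(code)
    # remaining bits are shorter than every code: fall back to the most probable word
    return max(candidates, key=lambda wp: wp[1])[0], len(bits)

At extraction time the receiver rebuilds the same candidate list and Huffman code and reads off the code of the word actually present in the steganographic text.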
The method can generate long and diverse steganographic texts, so that the steganographic text can carry more secret information while achieving visual, statistical and semantic indistinguishability between natural language and the steganographic text.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The invention provides a digital media protected text steganography method based on a variational autoencoder, which comprises the following steps:
Preprocessing a text, including extracting global keywords and group keywords of a training text, dividing a long text into a plurality of short sequences, wherein each short sequence corresponds to one group of group keywords, and the global keywords are a union of all the groups of keywords;
constructing a neural network model consisting of an encoding network, a Gaussian sampling network and a decoding network, and vectorizing the text;
respectively acquiring the characteristics of the global keywords and the long sequences by using a coding network, and fusing the characteristics of the global keywords and the long sequences to acquire global characteristic representation;
performing Gaussian sampling on the global feature representation in the coding network;
decoding the sampling result of the Gaussian sampling by using a decoding network to obtain the conditional probability distribution of the text;
and selecting the K words with the maximum conditional probability, and selecting the word corresponding to the secret bit stream by using Huffman coding, thereby completing the steganography of the text.
Example 1
In this embodiment, as shown in fig. 1, the digital media protected text steganography method based on a variational autoencoder takes the global keywords, the long sequence and the secret information to be hidden as input, and outputs the steganographic text embedded with the secret information after passing through the steganographic text generation model. The method specifically comprises the following four steps:
s1: and (4) preprocessing data. The global keywords and the group keywords of the training text need to be extracted from the data preprocessing part, a long sequence needs to be divided into a plurality of short sequences, each short sequence corresponds to one group of group keywords, and the global keywords are a union of all the groups of keywords. To embed the secret information in the plain text, the secret information also needs to be encoded into a bitstream.
S2: text context relevance is extracted. Firstly, text is required to be processed into a feature representation which can be identified by a computer, and a vectorization tool is required to be used for vectorizing the text; secondly, not only the context characteristics between text words, but also the context characteristics between each short sentence and each short sentence are required to be obtained; and finally, coding the acquired features by using a model.
S3: and (5) establishing a model. The model establishment is mainly divided into three parts: coding network, Gaussian sampling and decoding network. The coding network is mainly used for fusing the global keywords and the long sequences to obtain the global feature representation of the global keywords; the Gaussian sampling is mainly used for carrying out Gaussian sampling on global features in the coding network, and the sampling result obeys isotropic Gaussian distribution; the decoding network mainly decodes the sampling result and then completes the generation of the text according to the conditional probability distribution of the text.
S4: embedding and extracting secret information: embedding is mainly to embed the secret information by using the bit stream of the secret information, using word conditional probability calculated by a model, selecting a word corresponding to the secret bit stream by selecting a plurality of maximum conditional probabilities and using Huffman coding; the extraction process uses the same model as the embedding, and uses Huffman coding according to the predicted conditional probability to reversely obtain the text bit stream.
Example 2
This embodiment further illustrates the data preprocessing of step S1 in embodiment 1, and the step mainly includes the following 4 steps:
s11: and (5) sequence word segmentation. Generally, keywords cannot be directly obtained from the original sequence, words without word property influence on the accuracy of the keywords, and stop words are removed from the word segmentation result.
S12: and acquiring global keywords and group keywords. The group keywords are mainly used for local control of text generation content, and mature keyword extraction tools are generally used for obtaining keywords of texts; the global keywords are the union of the group keywords.
S13: and (5) dividing a long sequence. The use of long sequences for text generation often results in loss of relevant contextual features in the latter half of the text and may result in uncontrolled text generation. Therefore, it is necessary to divide the long text into a plurality of short sequences, and then through global and local control of the short sequences, it is ensured that the context characteristics of the text are not lost and the text content is controllable.
S14: the secret information is converted into a bit stream. Generally, due to the redundancy of text, secret information cannot be directly carried, and a sophisticated cryptography tool or a binary conversion tool is required to convert the secret information into a binary bit stream.
In natural language processing, the quality of a text generation model, especially for long text generation, is generally influenced by various factors, such as the word vector model, word-to-word correlations and short-sentence-to-short-sentence correlations. Based on this, the invention obtains text features of different dimensions from the word vector model and from the word-word and short-sentence-short-sentence correlations. This embodiment further illustrates the extraction of the text context correlation features of step S2 in embodiment 1, which mainly includes the following 3 steps:
s21: and vectorizing the text.
The input x = {x_1, x_2, ..., x_M} is the set of keywords or topics used to guide the generation of the steganographic text. We vectorize the input x using the mature vectorization algorithm Word2Vec to obtain the representation k = {k_1, k_2, ..., k_M}.
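A brief sketch of this vectorization step using the gensim implementation of Word2Vec (gensim 4.x API assumed; the corpus, dimension and helper name are illustrative):

from gensim.models import Word2Vec

def vectorize_keywords(corpus, keywords, dim=128):
    """corpus: list of tokenized training sentences, e.g. [["keyword", "one", ...], ...].
    Trains Word2Vec and maps the input keywords x to their vector representation k."""
    model = Word2Vec(sentences=corpus, vector_size=dim, window=5, min_count=1, sg=1)
    return [model.wv[w] for w in keywords if w in model.wv]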
S22: word-to-word correlation features
To better capture the relational representation of the text in semantic space, we combine bidirectional gated recurrent units (BiGRU) and an attention mechanism to extract the text semantic features. The input x is encoded with a bidirectional GRU, so that x can be represented by the concatenation of the last hidden-layer states of the forward GRU and the backward GRU, i.e. each k_i obtains a context-aware representation. For the encoding process of the group clauses, in order to better extract the representation of the text in the semantic space, BiGRU and the attention mechanism are used for encoding both between clauses and between words, as shown in fig. 2. A gated recurrent unit (GRU) can be expressed by the following formulas:
z'_t = σ(W_z' · [h_{t-1}, x_t] + b_z')    (1)
r_t = σ(W_r · [h_{t-1}, x_t] + b_r)    (2)
n_t = tanh(W_n · [r_t * h_{t-1}, x_t])    (3)
h_t = (1 - z'_t) * h_{t-1} + z'_t * n_t    (4)
wherein x_t is the input vector at time t, h_t is the output of time step t, [h_{t-1}, x_t] denotes matrix concatenation, * denotes element-wise multiplication between vectors, and σ denotes the sigmoid function.
For the BiGRU at time t, the hidden-layer output state h'_t consists of the forward hidden state h→_t and the backward hidden state h←_t, namely h'_t = [h→_t, h←_t].
the output result h 'of the BiGRU layer is { h'1,h'2,...,h′rSend to the attention layer, where r represents the input sequence length. At the attention level, the semantic vector representation of the target word in the context is enhanced mainly by using an attention mechanism, which can make our model focus on more critical parts and ignore irrelevant parts. The input of the Attention layer is the input of the previous layer passing throughThe idea of BiGRU computing the processed output vector h' is to weight the matching score of the hidden layer output for each BiGRU time step with the entire text representation vector to the overall score. The weight coefficient of the Attention layer is specifically calculated by the following formulas:
ut=tanh(Wattnh′t+battn) (7)
st=∑tatht (9)
wherein, W
attnAnd b
attnWeights and bias values for Attention; u. of
attnIndicating a randomly initialized attention matrix, a
tLike the softmax function, it is essential to calculate all
u
attnThe score of the results is proportional. After the attention probability distribution value at each time is obtained, a feature vector s including text information is calculated. The output vector obtained by the Attention layer can be further obtained by linear transformation:
outputt=Wost+bo (10)
wherein WoAnd boAs the weighting coefficients and bias values for the output layer. Finally, the code vector output of each moment is obtainedattn={output1,output2,...,outputr}。
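Purely as an illustrative sketch (PyTorch; layer sizes and the module name are assumptions), the BiGRU plus attention encoding of S22 can be put together as follows:

import torch
import torch.nn as nn

class BiGRUAttentionEncoder(nn.Module):
    """Encodes a vectorized sequence with a BiGRU and the attention layer of S22."""
    def __init__(self, emb_dim=128, hidden=256):
        super().__init__()
        self.bigru = nn.GRU(emb_dim, hidden, bidirectional=True, batch_first=True)
        self.attn = nn.Linear(2 * hidden, 2 * hidden)         # W_attn, b_attn
        self.u_attn = nn.Parameter(torch.randn(2 * hidden))   # randomly initialized u_attn
        self.out = nn.Linear(2 * hidden, 2 * hidden)          # W_o, b_o

    def forward(self, k):                        # k: (batch, r, emb_dim) word vectors
        h, _ = self.bigru(k)                     # h'_t = [forward; backward] states
        u = torch.tanh(self.attn(h))             # u_t = tanh(W_attn h'_t + b_attn)
        a = torch.softmax(u @ self.u_attn, dim=1)    # a_t: normalized matching scores
        s = (a.unsqueeze(-1) * h).sum(dim=1)     # s = sum_t a_t h'_t
        return self.out(s)                       # output = W_o s + b_o

The same module shape can serve both the global keyword features of S311 and the long-sentence context features of S312.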
S23: clause-to-clause correlation features
For the sentence-level generation task, we need to consider not only the group keywords g_t of the t-th layer but also the states s_{t'<t} of the preceding sentences when generating the current sentence. In addition, generation must be controlled jointly with the global hidden variable: the global variable z and the local variable z_t^s together guide the generation of the hierarchical sentences. For each layer of sentences s_t, its sentence representation h_t^s is obtained.
Example 3
The present embodiment further illustrates the neural network model constructed in the invention, which mainly includes three stages: the coding network, the Gaussian sampling and the decoding network. The first stage, the coding network, mainly fuses the global keywords and the long sequence to obtain the global feature representation; the second stage, the Gaussian sampling, mainly performs Gaussian sampling on the global features in the coding network, the sampling result obeying an isotropic Gaussian distribution; the third stage, the decoding network, mainly decodes the sampling result and then realizes the steganography of the text according to the conditional probability distribution of the text in combination with a corresponding coding algorithm. The coding network, the Gaussian sampling and the decoding network specifically comprise the following steps:
s31: coding network
The coding network is divided into three steps: first, the vectorized global keywords are passed through a neural network to obtain the corresponding feature representation; second, the vectorized long-sequence representation is passed through a neural network to obtain the corresponding feature representation; third, the keyword features and the long-sequence features are fused.
S311: global keyword features
The vector representation k of the input x is encoded with a bidirectional GRU, so that x can be represented by the concatenation of the last hidden-layer states of the forward GRU and the backward GRU, i.e. each k_i obtains a context-aware representation. The expressions of the GRU are as shown in (1)(2)(3)(4) of S22 above.
S312: long sentence contextual features
For the encoding process of the long text, in order to better extract the representation of the text in the semantic space, we encode between clauses and between words using BiGRU and the attention mechanism, as shown in fig. 2. The expressions of the GRU are as shown in (1)(2)(3)(4) of S22 above. Then, the output result h' = {h'_1, h'_2, ..., h'_r} of the BiGRU layer is sent to the attention layer, where r denotes the input sequence length. At the attention layer, we mainly use the attention mechanism to enhance the semantic vector representation of the target word in its context. The input of the attention layer is the output vector h' of the previous layer after the BiGRU computation; the idea is to take the matching score of the hidden-layer output of each BiGRU time step with the whole text representation vector as a proportion of the total score. The weight coefficients of the attention layer can be expressed with reference to expressions (7), (8) and (9) of S22. Finally, the output vector s obtained by the attention layer is linearly transformed to obtain the result of equation (10) of S22.
S313: feature fusion
The feature fusion in the training stage mainly fuses the keyword features and the long sequence features, and the main purpose is to obtain global hidden variables and local hidden variables for controlling the overall generated content of the text. The fusion process is mainly to fuse the results of the calculations of S311 and S312. As shown in the following formula:
h_k = [enc(x), output_attn]    (12)
wherein h_k denotes the global feature representation, enc(x) denotes the keyword feature, and output_attn denotes the long-text feature after the attention layer; here h_k is obtained by concatenating enc(x) and output_attn.
In the text test generation stage, only the keyword group x of the text needs to be input to obtain the corresponding enc(x); in this case h_k = enc(x).
S32: gaussian sampling
In the model, the global keyword hidden variable z is obtained by sampling the h_k calculated in S313, and the diversity of reasonable planning can be simulated through z. The sampling comprises two stages, namely the training stage and the test generation stage.
S321: training phase
An approximate posterior distribution is obtained by sampling x and y; this process can be represented by the following equation:
q_θ'(z | x, y) = N(μ_z', σ_z'^2 · I)
wherein q_θ' denotes the sampled posterior distribution, which obeys an isotropic Gaussian distribution; μ_z' and σ_z' are the mean and the standard deviation of the Gaussian distribution; x and y denote the input global keywords and the long-sequence text respectively, and (x, y) corresponds to the h_k of the training stage mentioned in S313; z is the global keyword hidden variable obtained by sampling.
S322: test generation phase
An approximate prior distribution is obtained by sampling x; this process can be represented by the following equation:
p_θ(z | x) = N(μ_z, σ_z^2 · I)
wherein p_θ denotes the sampled prior distribution, which also obeys an isotropic Gaussian distribution; μ_z and σ_z are the mean and the standard deviation of the Gaussian distribution; x denotes the global keywords input in the test generation stage and corresponds to the h_k of the test generation stage mentioned in S313; z is the global keyword hidden variable sampled in this process.
During training, sentences and labels are provided together, ensuring that the labels and the sentences are correlated; during testing (real-time data processing), as long as the label is present, a sentence is generated from the label and the secret information.
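A minimal sketch of the Gaussian sampling, assuming the standard reparameterization trick of variational autoencoders (the linear layers producing the mean and log-variance, and the module name, are illustrative assumptions):

import torch
import torch.nn as nn

class GaussianSampler(nn.Module):
    """Maps a feature h_k to the mean/log-variance of an isotropic Gaussian and samples z."""
    def __init__(self, feat_dim, z_dim):
        super().__init__()
        self.mu = nn.Linear(feat_dim, z_dim)
        self.logvar = nn.Linear(feat_dim, z_dim)

    def forward(self, h_k):
        mu, logvar = self.mu(h_k), self.logvar(h_k)
        # reparameterization trick: z = mu + sigma * eps with eps ~ N(0, I)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
        return z, mu, logvar

# training stage:   h_k = [enc(x), output_attn] feeds the posterior sampler q_theta'
# generation stage: h_k = enc(x)                feeds the prior sampler p_theta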
S33: decoding network
The decoding network part mainly comprises three stages: the first stage is group keyword hidden variable decoding, and group hidden variables generated by each clause, namely group keyword hidden variables, are obtained by sampling global hidden variables z; the second stage is to perform feature fusion on the group keyword hidden variables obtained by decoding and the global hidden variables to obtain local hidden variables for guiding the generation of the current group clauses; and the third stage is local hidden variable characteristic decoding, wherein the characteristic decoding is carried out on the local hidden variables of each group, so that the conditional probability prediction of the words in each group clause is completed.
S331: group key latent variable decoding
In the invention, the generation diversity of the text is controlled in a fine-grained way mainly through the keywords of each hierarchy level. Each layer has a group of keywords and a group of clauses under the constraint of that group of keywords. Each group of keywords is in turn a subset of the input x, so that the generation of long text can be decomposed into hierarchical group-clause generation subtasks. In the model, the overall keyword hidden variable z is obtained by sampling the input x, and the diversity of reasonable planning is simulated through z. Each group of keywords is obtained by sampling x and z, wherein g = {g_1, ..., g_N} is the sequence of all groups, each g_t is a subset of the input x, and N denotes the length of the group sequence, which is also the number of group clauses or the number of hierarchy levels.
For the group clauses constrained at each time step t, i.e. s_t, we use a group keyword decoder (a GRU) to compute g_t. In the invention, the group keywords corresponding to each layer need to be selected through probability values. At each time step t, the keyword decoder takes the input x, the global hidden variable z and the groups before time t as input, calculates the probability of each input item at time t, and takes the items whose probability exceeds a threshold as the group keyword hidden variable of the group at time t; here the score of each input item k_i is computed from its vector encoded by the BiGRU and attention mechanism and from the hidden state of the keyword decoder, and W_p and b_p are the weights and bias values of the keyword decoder.
So that the decoder can consciously judge whether an input item k_i has already been selected, the group g_t selected at time t is fed back to the keyword encoder. The group keyword hierarchy constraint process ends when the probability of ending at the next time step exceeds a threshold.
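A rough, non-limiting sketch of the threshold-based group keyword selection (PyTorch; the exact scoring function with W_p and b_p is not reproduced above, so the form below is an assumption):

import torch
import torch.nn as nn

class GroupKeywordDecoder(nn.Module):
    """At each time step t, selects the input items whose probability exceeds a threshold."""
    def __init__(self, enc_dim, z_dim, hidden, threshold=0.5):
        super().__init__()
        self.gru = nn.GRUCell(enc_dim + z_dim, hidden)
        self.score = nn.Linear(hidden + enc_dim, 1)     # assumed form of W_p, b_p
        self.threshold = threshold

    def step(self, enc_items, z, prev_group, h):
        """enc_items: (M, enc_dim) input items k_i encoded by BiGRU and attention;
        prev_group: (enc_dim,) summary of the groups chosen before time t;
        h: (1, hidden) decoder state. Returns (selected indices, new state)."""
        h = self.gru(torch.cat([prev_group, z]).unsqueeze(0), h)
        scores = self.score(torch.cat([h.expand(enc_items.size(0), -1), enc_items], dim=1))
        probs = torch.sigmoid(scores).squeeze(-1)       # probability of each input item at t
        return (probs > self.threshold).nonzero(as_tuple=True)[0], h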
S332: Latent variable feature fusion
After the group hidden variables are calculated, they need to be fused with the global hidden variable, so that the generated steganographic text is related both to the global keywords and to the group keywords. The fused local hidden variables can be used to calculate the context representation between the clauses within the group clauses, wherein s_{t'<t} denotes the relationship of the current sentence s_t to its first t-1 short sentences.
S333: local latent variable feature decoding
The local hidden variable feature decoding is mainly used to generate each group clause, and mainly corresponds to sentence decoding. For each layer of sentences s_t, its sentence representation h_t^s and its local variable z_t^s are obtained, and z_t^s is assumed to follow an isotropic Gaussian distribution. During inference, at time step t the sentence decoder samples z_t^s from the prior distribution; during training, it samples z_t^s from the approximate posterior distribution. The sentence representation h_t^s is computed by the GRU unit GRU_s of the sentence decoder, taking as input the last hidden-layer state of the word decoder when decoding the words of the previous sentence s_{t-1}. We encode the input x, the global keyword hidden variable z and the group keywords g to initialize the hidden state, where h_g is the last hidden state of GRU_g after encoding the keyword result g. For word-level generation, one GRU is still used as the word decoder.
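As a small illustrative sketch (the exact functional form of the initialization with W_s and b_s is not reproduced in the text above, so a tanh-projected concatenation is assumed):

import torch
import torch.nn as nn

class SentenceDecoderInit(nn.Module):
    """Builds the initial hidden state of the sentence decoder GRU_s from the keyword
    encoding h_g and the local hidden variable z_t_s."""
    def __init__(self, g_dim, z_dim, hidden):
        super().__init__()
        self.proj = nn.Linear(g_dim + z_dim, hidden)    # plays the role of W_s, b_s

    def forward(self, h_g, z_t_s):
        return torch.tanh(self.proj(torch.cat([h_g, z_t_s], dim=-1)))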
Example 4
In this embodiment, completing the steganography of the text according to the neural network model output includes the following steps:
s41: embedding of secret information
Statistics-based text features typically obey a conditional probability distribution, in which each word in a sentence has some contextual relevance to the other words of the context, i.e. the probability of the i-th word w_i is conditioned on the other words w_1, ..., w_{T_w} of the sentence. We sort the conditional probabilities of the next word at time step t_w, computed from the first t_w - 1 generated words, in descending order, then select the top 2^n words according to the number n of secret information bits to be embedded, and use Huffman coding to encode the conditional probabilities of these 2^n words. Finally, according to the secret information bit stream B = b_1, b_2, ..., b_o to be embedded, the corresponding word is selected as the carrier word of the secret information to complete information hiding, thereby realizing text steganography based on the variational autoencoder. Fig. 3 shows the selection process of a word at a certain level. For the first word of each level of the hierarchical word decoding structure, the selection is made through the contextual relevance of the hierarchical sentence decoding and the group keywords, and the currently selected word is determined by fun_B, the secret information embedding function (Huffman coding), through which the words to be generated are automatically selected by the secret information.
In FIG. 3, when the first word "I" has been selected, the 2^n words with the highest conditional probability are selected using the GRU unit, and one of these 2^n words is selected as the next word through Huffman coding, where n is the embedding capacity of the secret information, representing how many secret bits are embedded in one word, and is set empirically by a person skilled in the art.
S42: extraction of secret information
When the steganographic text carrying the secret information is obtained, the secret information can be extracted using the same steganographic model as used for embedding. With the input global keywords and group keywords unchanged, the same method is used to sort the conditional probabilities of the words to be generated in descending order, and the embedded secret bits are recovered one by one according to the word order of the steganographic text until the text generation process ends. In this way, all of the embedded secret information can be completely recovered.
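A non-limiting sketch of the extraction side (Python), reusing the build_huffman helper shown for the embedding step; candidate_lists stands for the per-position top-2^n candidates reproduced by running the identical model with the same keywords:

def extract_bits(stego_words, candidate_lists):
    """stego_words: the words of the received steganographic text, in generation order.
    Returns the recovered secret bit stream."""
    bits = []
    for word, candidates in zip(stego_words, candidate_lists):
        codes = build_huffman(candidates)     # same Huffman code as at embedding time
        bits.append(codes.get(word, ""))      # the code of the observed word = embedded bits
    return "".join(bits)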
Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.