Disclosure of Invention
Aiming at the above challenges, the invention provides a digital media protected text steganography method based on a variational autoencoder, which aims to increase the length of the steganographic text and generate diversified steganographic text, and which converts the secret information into a secret bit stream and embeds it into carrier text generated by a network model. The method specifically comprises the following steps:
preprocessing a text, including extracting global keywords and group keywords of a training text, dividing a long text into a plurality of short sequences, wherein each short sequence corresponds to one group of group keywords, and the global keywords are a union of all the groups of keywords;
constructing a neural network model consisting of an encoding network, a Gaussian sampling network and a decoding network, and vectorizing the text;
respectively acquiring the features of the global keywords and of the long sequence by using the coding network, and fusing the two to obtain a fused global feature representation;
performing Gaussian sampling on the global feature representation in the coding network;
decoding the sampling result of the Gaussian sampling by using a decoding network to obtain the conditional probability distribution of the text;
and selecting the K words with the maximum conditional probability, encoding the K words with a Huffman code, and selecting the word whose Huffman code corresponds to the secret bit stream to be embedded, thereby completing the steganography of the text.
Further, obtaining the global keyword features, namely extracting the context features between the words of the text, combines a bidirectional gated recurrent unit with an attention mechanism and comprises the following steps:
acquiring the forward hidden state and the backward hidden state of the text by using the forward and backward gated recurrent units of the bidirectional gated recurrent unit;
concatenating the states acquired by the forward and backward gated recurrent units and feeding the result into the attention layer;
and, in the attention layer, computing the matching score of the hidden-layer output of the bidirectional gated recurrent unit at each time step with the whole text representation vector as a proportion of the total score, and obtaining the output of the attention layer through a linear transformation.
Further, processing the input vector with the bidirectional gated recurrent unit comprises:
z'_t = σ(W_z' · [h_{t-1}, x_t] + b_z')
r_t = σ(W_r · [h_{t-1}, x_t] + b_r)
n_t = tanh(W_n · [r_t * h_{t-1}, x_t])
h_t = (1 - z'_t) * h_{t-1} + z'_t * n_t
wherein z'_t is the update gate; W_z' is the training weight of the update gate; h_{t-1} is the hidden state at the previous time step; x_t is the input vector of the bidirectional gated recurrent unit at the t-th moment; b_z' is the bias of the update gate; r_t is the reset gate; W_r is the training weight of the reset gate; b_r is the bias of the reset gate; n_t is the candidate activation; W_n is the weight of the candidate activation; h_t is the hidden-layer output at time t; σ(x) is the Sigmoid activation function; h'_t is the hidden-layer output state; h→_t is the forward hidden state and h←_t is the backward hidden state, where h→_t denotes the forward hidden-layer result of h_t and h←_t denotes the backward hidden-layer result of h_t.
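By way of a non-limiting illustration, one step of the gated recurrent unit defined by the above formulas can be sketched as follows (Python/NumPy; the function name and the weight shapes are illustrative assumptions, not part of the invention):

import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x_t, h_prev, W_z, b_z, W_r, b_r, W_n):
    """One GRU step following the update gate z'_t, reset gate r_t,
    candidate n_t and output h_t above. W_* have shape (hidden, hidden + input)."""
    hx = np.concatenate([h_prev, x_t])                      # [h_{t-1}, x_t]
    z = sigmoid(W_z @ hx + b_z)                             # update gate z'_t
    r = sigmoid(W_r @ hx + b_r)                             # reset gate r_t
    n = np.tanh(W_n @ np.concatenate([r * h_prev, x_t]))    # candidate activation n_t
    return (1.0 - z) * h_prev + z * n                       # hidden output h_t

A bidirectional unit runs one such cell over the sequence forward and another backward, and concatenates the two hidden states h→_t and h←_t at each time step.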
Further, the processing of the hidden layer output state by the attention layer includes:
u_t = tanh(W_attn · h'_t + b_attn)
s_t = Σ_t a_t · h_t
output_t = W_o · s_t + b_o
wherein W_attn and b_attn are respectively the weight and the bias of the attention layer; u_attn denotes a randomly initialized attention matrix; a_t is the attention weight of time step t, obtained as the proportion of the matching score of u_t against u_attn to the total score over all time steps; W_o and b_o are respectively the weight coefficient and the bias of the output layer.
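A minimal sketch of this attention computation, assuming (as described above) that the weights a_t are obtained by normalizing the matching scores of each u_t against the randomly initialized u_attn into proportions of the total score:

import numpy as np

def attention_layer(H, W_attn, b_attn, u_attn, W_o, b_o):
    """H: array of shape (T, d) holding the BiGRU hidden states h'_t.
    Returns the fused attention output of the layer."""
    U = np.tanh(H @ W_attn.T + b_attn)        # u_t = tanh(W_attn h'_t + b_attn)
    scores = U @ u_attn                       # matching score of each time step
    a = np.exp(scores - scores.max())
    a = a / a.sum()                           # a_t: each score as a proportion of the total
    s = (a[:, None] * H).sum(axis=0)          # s = sum_t a_t h_t
    return W_o @ s + b_o                      # output = W_o s + b_o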
Furthermore, when the features of the long sequence are extracted, the vector representation of each short sentence is obtained first, and then the correlation features h_t^s between the short sentences are obtained through the bidirectional GRU, where h_t^s denotes the hidden-layer feature of the t-th short sentence s_t.
Further, the Gaussian sampling of the global features in the coding network covers both a model training stage and a generation stage that operates on real-time data: in the training stage, the approximate posterior distribution of the global keyword hidden variable is obtained by sampling the global keyword vectors together with the long-sequence vectors; when real-time data are processed, the approximate prior distribution is obtained by sampling the global keyword vectors alone.
Further, the decoding of the sampling result by the decoding network includes:
decoding the group keyword hidden variables, namely sampling the global hidden variables z to obtain the group keyword hidden variables generated by each clause;
performing feature fusion on the group keyword hidden variables obtained by decoding and the global hidden variables to obtain local hidden variables for guiding the generation of the current group clauses;
and performing characteristic decoding on the local hidden variables of each group to complete the conditional probability prediction of the words in each group clause.
Further, the group key hidden variable decoding process includes:
each layer of the neural network model has a group of keywords and a group of clauses under the constraint of that group of keywords, and each group of keywords is a subset of the text vector input to the neural network;
obtaining the global keyword hidden variable z by sampling the text vector input to the neural network; acquiring each group of keywords by sampling the text vector and z;
acquiring the prior distribution or the posterior distribution through Gaussian sampling, and selecting the group of keywords corresponding to each layer;
at each time step t, the keyword decoder takes the text vector of the neural network, the global keyword hidden variable z and the groups before time t as input, calculates the probability of each input item at time t, and takes the items whose probability exceeds a threshold as the group keyword hidden variable of the group at time t.
Further, feature decoding is performed on the local hidden variables of each group; the decoding aims to map the sampled feature codes into the conditional probabilities of the corresponding words. In the feature decoding, GRU_s denotes the GRU unit of the sentence decoder; h_g is the last hidden state of GRU_g after encoding the keyword result g; z_t^s is the local hidden variable of each short sentence; W_s is the weight of the initial hidden-layer vector and b_s is its bias; h→ and h← denote the feature vectors of the forward and backward word-to-word context.
Further, the process of performing secret information embedding includes:
generating one word at each time step, the t_w-th time step generating the t_w-th word;
using the conditional probabilities of the first t_w - 1 generated words, sorting the probability of each word being selected at the t_w-th time step, based on the prior or the posterior probability, in descending order;
selecting the top 2^n words and encoding the conditional probabilities of these 2^n words with Huffman coding;
according to the secret information bit stream B = b_1, b_2, ..., b_o to be embedded, selecting the corresponding word as the carrier word of the secret information to complete the steganography of the text;
wherein b_o denotes the o-th bit of the secret information bit stream, o is the length of the bit stream, and n is the embedding capacity of the secret information.
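By way of a non-limiting sketch, the Huffman-coding-based word selection can be illustrated as follows (Python; the function names are hypothetical, and the candidate probabilities would in practice come from the conditional probabilities predicted by the decoding network):

import heapq

def build_huffman(candidates):
    """candidates: list of (word, probability) for the top 2^n words.
    Returns a prefix code {word: bit string} built from the probabilities."""
    heap = [(p, i, [w]) for i, (w, p) in enumerate(candidates)]
    heapq.heapify(heap)
    codes = {w: "" for w, _ in candidates}
    if len(heap) == 1:                        # degenerate case: a single candidate
        codes[candidates[0][0]] = "0"
        return codes
    counter = len(heap)
    while len(heap) > 1:
        p1, _, ws1 = heapq.heappop(heap)
        p2, _, ws2 = heapq.heappop(heap)
        for w in ws1:
            codes[w] = "0" + codes[w]
        for w in ws2:
            codes[w] = "1" + codes[w]
        heapq.heappush(heap, (p1 + p2, counter, ws1 + ws2))
        counter += 1
    return codes

def embed_step(candidates, bits, pos):
    """Select the candidate whose Huffman code is a prefix of the remaining secret bits;
    the selected word therefore carries len(code) secret bits."""
    codes = build_huffman(candidates)
    for word, code in codes.items():
        if bits.startswith(code, pos):
            return word, pos + len(code)
    # remaining bits are shorter than every code: fall back to the most probable word
    return max(candidates, key=lambda wp: wp[1])[0], len(bits)

At extraction time the receiver rebuilds the same candidate list and Huffman code and reads off the code of the word actually present in the steganographic text.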
The method can generate long and diverse steganographic texts, so that the steganographic text can carry more secret information while achieving visual, statistical and semantic indistinguishability between natural language and the steganographic text.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The invention provides a digital media protected text steganography method based on a variational autoencoder, which comprises the following steps:
Preprocessing a text, including extracting global keywords and group keywords of a training text, dividing a long text into a plurality of short sequences, wherein each short sequence corresponds to one group of group keywords, and the global keywords are a union of all the groups of keywords;
constructing a neural network model consisting of an encoding network, a Gaussian sampling network and a decoding network, and vectorizing the text;
respectively acquiring the characteristics of the global keywords and the long sequences by using a coding network, and fusing the characteristics of the global keywords and the long sequences to acquire global characteristic representation;
performing Gaussian sampling on the global feature representation in the coding network;
decoding the sampling result of the Gaussian sampling by using a decoding network to obtain the conditional probability distribution of the text;
and selecting the K words with the maximum conditional probability, and selecting the word corresponding to the secret bit stream by using Huffman coding, thereby completing the steganography of the text.
Example 1
In this embodiment, as shown in fig. 1, the digital media protected text steganography method based on a variational autoencoder takes the global keywords, the long sequence and the secret information to be hidden as input, and outputs the steganographic text embedded with the secret information after passing through the steganographic text generation model. The method specifically comprises the following four steps:
s1: and (4) preprocessing data. The global keywords and the group keywords of the training text need to be extracted from the data preprocessing part, a long sequence needs to be divided into a plurality of short sequences, each short sequence corresponds to one group of group keywords, and the global keywords are a union of all the groups of keywords. To embed the secret information in the plain text, the secret information also needs to be encoded into a bitstream.
S2: text context relevance is extracted. Firstly, text is required to be processed into a feature representation which can be identified by a computer, and a vectorization tool is required to be used for vectorizing the text; secondly, not only the context characteristics between text words, but also the context characteristics between each short sentence and each short sentence are required to be obtained; and finally, coding the acquired features by using a model.
S3: and (5) establishing a model. The model establishment is mainly divided into three parts: coding network, Gaussian sampling and decoding network. The coding network is mainly used for fusing the global keywords and the long sequences to obtain the global feature representation of the global keywords; the Gaussian sampling is mainly used for carrying out Gaussian sampling on global features in the coding network, and the sampling result obeys isotropic Gaussian distribution; the decoding network mainly decodes the sampling result and then completes the generation of the text according to the conditional probability distribution of the text.
S4: embedding and extracting secret information: embedding is mainly to embed the secret information by using the bit stream of the secret information, using word conditional probability calculated by a model, selecting a word corresponding to the secret bit stream by selecting a plurality of maximum conditional probabilities and using Huffman coding; the extraction process uses the same model as the embedding, and uses Huffman coding according to the predicted conditional probability to reversely obtain the text bit stream.
Example 2
This embodiment further illustrates the data preprocessing of step S1 in embodiment 1, and the step mainly includes the following 4 steps:
s11: and (5) sequence word segmentation. Generally, keywords cannot be directly obtained from the original sequence, words without word property influence on the accuracy of the keywords, and stop words are removed from the word segmentation result.
S12: and acquiring global keywords and group keywords. The group keywords are mainly used for local control of text generation content, and mature keyword extraction tools are generally used for obtaining keywords of texts; the global keywords are the union of the group keywords.
S13: and (5) dividing a long sequence. The use of long sequences for text generation often results in loss of relevant contextual features in the latter half of the text and may result in uncontrolled text generation. Therefore, it is necessary to divide the long text into a plurality of short sequences, and then through global and local control of the short sequences, it is ensured that the context characteristics of the text are not lost and the text content is controllable.
S14: the secret information is converted into a bit stream. Generally, due to the redundancy of text, secret information cannot be directly carried, and a sophisticated cryptography tool or a binary conversion tool is required to convert the secret information into a binary bit stream.
In natural language processing, the quality of a text generation model, especially for long text generation, is generally influenced by various factors, such as the word vector model, word-to-word correlations and short-sentence-to-short-sentence correlations. Based on this, the invention obtains text features of different dimensions from the word vector model and from the word-word and short-sentence-short-sentence correlations. This embodiment further illustrates the extraction of the text context correlation features of step S2 in embodiment 1, which mainly includes the following 3 steps:
s21: and vectorizing the text.
The input x = {x_1, x_2, ..., x_M} is the set of keywords or topics used to guide the generation of the steganographic text. We vectorize the input x using the mature vectorization algorithm Word2Vec to obtain the representation k = {k_1, k_2, ..., k_M}.
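A brief sketch of this vectorization step using the gensim implementation of Word2Vec (gensim 4.x API assumed; the corpus, dimension and helper name are illustrative):

from gensim.models import Word2Vec

def vectorize_keywords(corpus, keywords, dim=128):
    """corpus: list of tokenized training sentences, e.g. [["keyword", "one", ...], ...].
    Trains Word2Vec and maps the input keywords x to their vector representation k."""
    model = Word2Vec(sentences=corpus, vector_size=dim, window=5, min_count=1, sg=1)
    return [model.wv[w] for w in keywords if w in model.wv]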
S22: word-to-word correlation features
To better capture the relational representation of the text in semantic space, we combine bidirectional gated recurrent units (BiGRU) and an attention mechanism to extract the text semantic features. The input x is encoded with a bidirectional GRU, so that x can be represented by the concatenation of the last hidden-layer states of the forward GRU and the backward GRU, i.e. each k_i obtains a context-aware representation. For the encoding process of the group clauses, in order to better extract the representation of the text in the semantic space, BiGRU and the attention mechanism are used for encoding both between clauses and between words, as shown in fig. 2. A gated recurrent unit (GRU) can be expressed by the following formulas:
z'_t = σ(W_z' · [h_{t-1}, x_t] + b_z')    (1)
r_t = σ(W_r · [h_{t-1}, x_t] + b_r)    (2)
n_t = tanh(W_n · [r_t * h_{t-1}, x_t])    (3)
h_t = (1 - z'_t) * h_{t-1} + z'_t * n_t    (4)
wherein x_t is the input vector at time t, h_t is the output of time step t, [h_{t-1}, x_t] denotes matrix concatenation, * denotes element-wise multiplication between vectors, and σ denotes the sigmoid function.
For the BiGRU at time t, the hidden-layer output state h'_t consists of the forward hidden state h→_t and the backward hidden state h←_t, namely h'_t = [h→_t, h←_t].
the output result h 'of the BiGRU layer is { h'1,h'2,...,h′rSend to the attention layer, where r represents the input sequence length. At the attention level, the semantic vector representation of the target word in the context is enhanced mainly by using an attention mechanism, which can make our model focus on more critical parts and ignore irrelevant parts. The input of the Attention layer is the input of the previous layer passing throughThe idea of BiGRU computing the processed output vector h' is to weight the matching score of the hidden layer output for each BiGRU time step with the entire text representation vector to the overall score. The weight coefficient of the Attention layer is specifically calculated by the following formulas:
ut=tanh(Wattnh′t+battn) (7)
st=∑tatht (9)
wherein, W
attnAnd b
attnWeights and bias values for Attention; u. of
attnIndicating a randomly initialized attention matrix, a
tLike the softmax function, it is essential to calculate all
u
attnThe score of the results is proportional. After the attention probability distribution value at each time is obtained, a feature vector s including text information is calculated. The output vector obtained by the Attention layer can be further obtained by linear transformation:
outputt=Wost+bo (10)
wherein WoAnd boAs the weighting coefficients and bias values for the output layer. Finally, the code vector output of each moment is obtainedattn={output1,output2,...,outputr}。
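Purely as an illustrative sketch (PyTorch; layer sizes and the module name are assumptions), the BiGRU plus attention encoding of S22 can be put together as follows:

import torch
import torch.nn as nn

class BiGRUAttentionEncoder(nn.Module):
    """Encodes a vectorized sequence with a BiGRU and the attention layer of S22."""
    def __init__(self, emb_dim=128, hidden=256):
        super().__init__()
        self.bigru = nn.GRU(emb_dim, hidden, bidirectional=True, batch_first=True)
        self.attn = nn.Linear(2 * hidden, 2 * hidden)         # W_attn, b_attn
        self.u_attn = nn.Parameter(torch.randn(2 * hidden))   # randomly initialized u_attn
        self.out = nn.Linear(2 * hidden, 2 * hidden)          # W_o, b_o

    def forward(self, k):                        # k: (batch, r, emb_dim) word vectors
        h, _ = self.bigru(k)                     # h'_t = [forward; backward] states
        u = torch.tanh(self.attn(h))             # u_t = tanh(W_attn h'_t + b_attn)
        a = torch.softmax(u @ self.u_attn, dim=1)    # a_t: normalized matching scores
        s = (a.unsqueeze(-1) * h).sum(dim=1)     # s = sum_t a_t h'_t
        return self.out(s)                       # output = W_o s + b_o

The same module shape can serve both the global keyword features of S311 and the long-sentence context features of S312.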
S23: clause-to-clause correlation features
For the sentence-level generation task, we need to consider not only the group keywords g_t of the t-th layer but also the states s_{t'<t} of the preceding sentences when generating the current sentence. In addition, generation must be controlled jointly with the global hidden variable: the global variable z and the local variable z_t^s together guide the generation of the hierarchical sentences. For each layer of sentences s_t, its sentence representation h_t^s is obtained.
Example 3
The present embodiment further illustrates the neural network model constructed in the invention, which mainly includes three stages: the coding network, the Gaussian sampling and the decoding network. The first stage, the coding network, mainly fuses the global keywords and the long sequence to obtain the global feature representation; the second stage, the Gaussian sampling, mainly performs Gaussian sampling on the global features in the coding network, the sampling result obeying an isotropic Gaussian distribution; the third stage, the decoding network, mainly decodes the sampling result and then realizes the steganography of the text according to the conditional probability distribution of the text in combination with a corresponding coding algorithm. The coding network, the Gaussian sampling and the decoding network specifically comprise the following steps:
s31: coding network
The coding network is divided into three steps: first, the vectorized global keywords are passed through a neural network to obtain the corresponding feature representation; second, the vectorized long-sequence representation is passed through a neural network to obtain the corresponding feature representation; third, the keyword features and the long-sequence features are fused.
S311: global keyword features
The vector representation k of the input x is encoded with a bidirectional GRU, so that x can be represented by the concatenation of the last hidden-layer states of the forward GRU and the backward GRU, i.e. each k_i obtains a context-aware representation. The expressions of the GRU are as shown in (1)(2)(3)(4) of S22 above.
S312: long sentence contextual features
For the encoding process of the long text, in order to better extract the representation of the text in the semantic space, we encode between clauses and between words using BiGRU and the attention mechanism, as shown in fig. 2. The expressions of the GRU are as shown in (1)(2)(3)(4) of S22 above. Then, the output result h' = {h'_1, h'_2, ..., h'_r} of the BiGRU layer is sent to the attention layer, where r denotes the input sequence length. At the attention layer, we mainly use the attention mechanism to enhance the semantic vector representation of the target word in its context. The input of the attention layer is the output vector h' of the previous layer after the BiGRU computation; the idea is to take the matching score of the hidden-layer output of each BiGRU time step with the whole text representation vector as a proportion of the total score. The weight coefficients of the attention layer can be expressed with reference to expressions (7), (8) and (9) of S22. Finally, the output vector s obtained by the attention layer is linearly transformed to obtain the result of equation (10) of S22.
S313: feature fusion
The feature fusion in the training stage mainly fuses the keyword features and the long sequence features, and the main purpose is to obtain global hidden variables and local hidden variables for controlling the overall generated content of the text. The fusion process is mainly to fuse the results of the calculations of S311 and S312. As shown in the following formula:
h_k = [enc(x), output_attn]    (12)
wherein h_k denotes the global feature representation, enc(x) denotes the keyword feature, and output_attn denotes the long-text feature after the attention layer; here h_k is obtained by concatenating enc(x) and output_attn.
In the text test generation stage, only the keyword group x of the text needs to be input to obtain the corresponding enc(x); in this case h_k = enc(x).
S32: gaussian sampling
In the model, the global keyword hidden variable z is obtained by sampling the h_k calculated in S313, and the diversity of reasonable planning can be simulated through z. The sampling comprises two stages, namely the training stage and the test generation stage.
S321: training phase
An approximate posterior distribution is obtained by sampling x and y; this process can be represented by the following equation:
q_θ'(z | x, y) = N(μ_z', σ_z'^2 · I)
wherein q_θ' denotes the sampled posterior distribution, which obeys an isotropic Gaussian distribution; μ_z' and σ_z' are the mean and the standard deviation of the Gaussian distribution; x and y denote the input global keywords and the long-sequence text respectively, and (x, y) corresponds to the h_k of the training stage mentioned in S313; z is the global keyword hidden variable obtained by sampling.
S322: test generation phase
An approximate prior distribution is obtained by sampling x; this process can be represented by the following equation:
p_θ(z | x) = N(μ_z, σ_z^2 · I)
wherein p_θ denotes the sampled prior distribution, which also obeys an isotropic Gaussian distribution; μ_z and σ_z are the mean and the standard deviation of the Gaussian distribution; x denotes the global keywords input in the test generation stage and corresponds to the h_k of the test generation stage mentioned in S313; z is the global keyword hidden variable sampled in this process.
During training, sentences and labels are provided together, ensuring that the labels and the sentences are correlated; during testing (real-time data processing), as long as the label is present, a sentence is generated from the label and the secret information.
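A minimal sketch of the Gaussian sampling, assuming the standard reparameterization trick of variational autoencoders (the linear layers producing the mean and log-variance, and the module name, are illustrative assumptions):

import torch
import torch.nn as nn

class GaussianSampler(nn.Module):
    """Maps a feature h_k to the mean/log-variance of an isotropic Gaussian and samples z."""
    def __init__(self, feat_dim, z_dim):
        super().__init__()
        self.mu = nn.Linear(feat_dim, z_dim)
        self.logvar = nn.Linear(feat_dim, z_dim)

    def forward(self, h_k):
        mu, logvar = self.mu(h_k), self.logvar(h_k)
        # reparameterization trick: z = mu + sigma * eps with eps ~ N(0, I)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
        return z, mu, logvar

# training stage:   h_k = [enc(x), output_attn] feeds the posterior sampler q_theta'
# generation stage: h_k = enc(x)                feeds the prior sampler p_theta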
S33: decoding network
The decoding network part mainly comprises three stages: the first stage is group keyword hidden variable decoding, and group hidden variables generated by each clause, namely group keyword hidden variables, are obtained by sampling global hidden variables z; the second stage is to perform feature fusion on the group keyword hidden variables obtained by decoding and the global hidden variables to obtain local hidden variables for guiding the generation of the current group clauses; and the third stage is local hidden variable characteristic decoding, wherein the characteristic decoding is carried out on the local hidden variables of each group, so that the conditional probability prediction of the words in each group clause is completed.
S331: group key latent variable decoding
In the invention, the generation diversity of the text is controlled in a fine-grained way mainly through the keywords of each hierarchy level. Each layer has a group of keywords and a group of clauses under the constraint of that group of keywords. Each group of keywords is in turn a subset of the input x, so that the generation of long text can be decomposed into hierarchical group-clause generation subtasks. In the model, the overall keyword hidden variable z is obtained by sampling the input x, and the diversity of reasonable planning is simulated through z. Each group of keywords is obtained by sampling x and z, wherein g = {g_1, ..., g_N} is the sequence of all groups, each g_t is a subset of the input x, and N denotes the length of the group sequence, which is also the number of group clauses or the number of hierarchy levels.
For the group clauses constrained at each time step t, i.e. s_t, we use a group keyword decoder (a GRU) to compute g_t. In the invention, the group keywords corresponding to each layer need to be selected through probability values. At each time step t, the keyword decoder takes the input x, the global hidden variable z and the groups before time t as input, calculates the probability of each input item at time t, and takes the items whose probability exceeds a threshold as the group keyword hidden variable of the group at time t; here the score of each input item k_i is computed from its vector encoded by the BiGRU and attention mechanism and from the hidden state of the keyword decoder, and W_p and b_p are the weights and bias values of the keyword decoder.
So that the decoder can consciously judge whether an input item k_i has already been selected, the group g_t selected at time t is fed back to the keyword encoder. The group keyword hierarchy constraint process ends when the probability of ending at the next time step exceeds a threshold.
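A rough, non-limiting sketch of the threshold-based group keyword selection (PyTorch; the exact scoring function with W_p and b_p is not reproduced above, so the form below is an assumption):

import torch
import torch.nn as nn

class GroupKeywordDecoder(nn.Module):
    """At each time step t, selects the input items whose probability exceeds a threshold."""
    def __init__(self, enc_dim, z_dim, hidden, threshold=0.5):
        super().__init__()
        self.gru = nn.GRUCell(enc_dim + z_dim, hidden)
        self.score = nn.Linear(hidden + enc_dim, 1)     # assumed form of W_p, b_p
        self.threshold = threshold

    def step(self, enc_items, z, prev_group, h):
        """enc_items: (M, enc_dim) input items k_i encoded by BiGRU and attention;
        prev_group: (enc_dim,) summary of the groups chosen before time t;
        h: (1, hidden) decoder state. Returns (selected indices, new state)."""
        h = self.gru(torch.cat([prev_group, z]).unsqueeze(0), h)
        scores = self.score(torch.cat([h.expand(enc_items.size(0), -1), enc_items], dim=1))
        probs = torch.sigmoid(scores).squeeze(-1)       # probability of each input item at t
        return (probs > self.threshold).nonzero(as_tuple=True)[0], h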
S332: Latent variable feature fusion
After the group hidden variables are calculated, they need to be fused with the global hidden variable, so that the generated steganographic text is related both to the global keywords and to the group keywords. The fused local hidden variables can be used to calculate the context representation between the clauses within the group clauses, wherein s_{t'<t} denotes the relationship of the current sentence s_t to its first t-1 short sentences.
S333: local latent variable feature decoding
The local hidden variable feature decoding is mainly used to generate each group clause, and mainly corresponds to sentence decoding. For each layer of sentences s_t, its sentence representation h_t^s and its local variable z_t^s are obtained, and z_t^s is assumed to follow an isotropic Gaussian distribution. During inference, at time step t the sentence decoder samples z_t^s from the prior distribution; during training, it samples z_t^s from the approximate posterior distribution. The sentence representation h_t^s is computed by the GRU unit GRU_s of the sentence decoder, taking as input the last hidden-layer state of the word decoder when decoding the words of the previous sentence s_{t-1}. We encode the input x, the global keyword hidden variable z and the group keywords g to initialize the hidden state, where h_g is the last hidden state of GRU_g after encoding the keyword result g. For word-level generation, one GRU is still used as the word decoder.
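As a small illustrative sketch (the exact functional form of the initialization with W_s and b_s is not reproduced in the text above, so a tanh-projected concatenation is assumed):

import torch
import torch.nn as nn

class SentenceDecoderInit(nn.Module):
    """Builds the initial hidden state of the sentence decoder GRU_s from the keyword
    encoding h_g and the local hidden variable z_t_s."""
    def __init__(self, g_dim, z_dim, hidden):
        super().__init__()
        self.proj = nn.Linear(g_dim + z_dim, hidden)    # plays the role of W_s, b_s

    def forward(self, h_g, z_t_s):
        return torch.tanh(self.proj(torch.cat([h_g, z_t_s], dim=-1)))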
Example 4
In this embodiment, completing the steganography of the text according to the neural network model output includes the following steps:
s41: embedding of secret information
Statistics-based text features typically obey a conditional probability distribution, in which each word in a sentence has some contextual relevance to the other words of the context, i.e. the probability of the i-th word w_i is conditioned on the other words w_1, ..., w_{T_w} of the sentence. We sort the conditional probabilities of the next word at time step t_w, computed from the first t_w - 1 generated words, in descending order, then select the top 2^n words according to the number n of secret information bits to be embedded, and use Huffman coding to encode the conditional probabilities of these 2^n words. Finally, according to the secret information bit stream B = b_1, b_2, ..., b_o to be embedded, the corresponding word is selected as the carrier word of the secret information to complete information hiding, thereby realizing text steganography based on the variational autoencoder. Fig. 3 shows the selection process of a word at a certain level. For the first word of each level of the hierarchical word decoding structure, the selection is made through the contextual relevance of the hierarchical sentence decoding and the group keywords, and the currently selected word is determined by fun_B, the secret information embedding function (Huffman coding), through which the words to be generated are automatically selected by the secret information.
In FIG. 3, when the first word "I" has been selected, the 2^n words with the highest conditional probability are selected using the GRU unit, and one of these 2^n words is selected as the next word through Huffman coding, where n is the embedding capacity of the secret information, representing how many secret bits are embedded in one word, and is set empirically by a person skilled in the art.
S42: extraction of secret information
When the steganographic text carrying the secret information is obtained, the secret information can be extracted using the same steganographic model as used for embedding. With the input global keywords and group keywords unchanged, the same method is used to sort the conditional probabilities of the words to be generated in descending order, and the embedded secret bits are recovered one by one according to the word order of the steganographic text until the text generation process ends. In this way, all of the embedded secret information can be completely recovered.
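A non-limiting sketch of the extraction side (Python), reusing the build_huffman helper shown for the embedding step; candidate_lists stands for the per-position top-2^n candidates reproduced by running the identical model with the same keywords:

def extract_bits(stego_words, candidate_lists):
    """stego_words: the words of the received steganographic text, in generation order.
    Returns the recovered secret bit stream."""
    bits = []
    for word, candidates in zip(stego_words, candidate_lists):
        codes = build_huffman(candidates)     # same Huffman code as at embedding time
        bits.append(codes.get(word, ""))      # the code of the observed word = embedded bits
    return "".join(bits)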
Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.