CN115952528B

CN115952528B - Multi-scale combined text steganography method and system

Info

Publication number: CN115952528B
Application number: CN202310240044.0A
Authority: CN
Inventors: 付章杰; 丁长浩; 卢俊杰
Original assignee: Nanjing University of Information Science and Technology
Current assignee: Nanjing University of Information Science and Technology
Priority date: 2023-03-14
Filing date: 2023-03-14
Publication date: 2023-05-16
Anticipated expiration: 2043-03-14
Also published as: CN115952528A

Abstract

The invention discloses a multi-scale joint text steganography method and a system, wherein the method comprises the following steps: acquiring a text sequence and secret information; inputting the text sequence into a pre-constructed generation and replacement joint model, and obtaining the generation probability distribution of each word; performing steganography operation on the text sequence according to the secret information and the generated probability distribution to obtain a first steganography text and a steganography record; determining non-steganographic words in a text sequence according to the steganographic records, inputting the text sequence into a pre-constructed generated replacement joint model, and obtaining the replacement probability distribution of each non-steganographic word; performing steganography operation on the non-steganography words according to the secret information and the replacement probability distribution, and obtaining a second steganography text; generating a joint steganography text according to the first steganography text and the second steganography text; the method and the device can solve the technical problems of low quality and low embedding rate of the steganographic text in the traditional text steganographic algorithm.

Description

Multi-scale combined text steganography method and system

Technical Field

The invention relates to a multi-scale combined text steganography method and system, and belongs to the technical field of information hiding.

Background

Text steganography is a method for embedding secret information in text and performing secure transmission, and is mainly used for realizing secret communication. The most important difference between text steganography and cryptography is the existence of the hidden information itself rather than the content of the information. Text steganography therefore has unique advantages in protecting information security. However, the conventional text steganography algorithm has the problems of low steganography text quality, low embedding rate and the like.

Disclosure of Invention

The invention aims to overcome the defects in the prior art, provides a multi-scale combined text steganography method and a system, and solves the technical problems of low steganography text quality and low embedding rate in the traditional text steganography algorithm.

In order to achieve the above purpose, the invention is realized by adopting the following technical scheme:

in a first aspect, the present invention provides a multi-scale joint text steganography method, including:

acquiring a text sequence and secret information;

inputting the text sequence into a pre-constructed generation and replacement joint model, and obtaining the generation probability distribution of each word;

performing steganography operation on the text sequence according to the secret information and the generated probability distribution to obtain a first steganography text and a steganography record;

determining non-steganographic words in a text sequence according to the steganographic records, inputting the text sequence into a pre-constructed generated replacement joint model, and obtaining the replacement probability distribution of each non-steganographic word;

performing steganography operation on the non-steganography words according to the secret information and the replacement probability distribution, and obtaining a second steganography text;

a joint steganographic text is generated from the first steganographic text and the second steganographic text.

Optionally, the construction process for generating the alternative joint model includes:

acquiring a preset number of text data;

preprocessing text data, and constructing a sample set based on the preprocessed text data;

dividing a sample set into a training set and a verification set according to a preset proportion;

generating a replacement joint model based on PyTorch construction, wherein the generation of the replacement joint model comprises generation of a model and replacement of the model;

and performing iterative training on the generated and replaced joint model by using a training set, after iterative training, verifying the generated and replaced joint model after iterative training by using a verification set, and after verification, keeping and outputting the generated and replaced joint model with the minimum loss.

Optionally, the preprocessing includes:

dividing the text data, reserving words in the division result and generating word sequences;

taking the first n-1 bits of the word sequence as a sample, taking the last n-1 bits of the word sequence as a label, and n being the total number of bits of the word sequence;

if the number of bits of the sample or the label is smaller than a preset bit number threshold value N, filling the tail of the corresponding sample or label by filling a symbol to enable the number of bits of the sample or the label to be equal to the preset bit number threshold value N;

if the number of bits of the sample or the label is larger than the preset number of bits threshold N, the tail part of the corresponding sample or label is cut off to make the number of bits equal to the preset number of bits threshold N.

Optionally, the iterative training of generating the alternative joint model using the training set includes:

inputting samples in the training set to generate a replacement joint model, and acquiring the generation probability distribution output by the generation model and the replacement probability distribution output by the replacement model;

calculating loss according to the generated probability distribution prediction result and the replacement probability distribution prediction result which are respectively used as the input of the cross entropy loss function with the label

And->

For loss->

And->

Sum acquisition loss->

；

To loss of

Performing back propagation to obtain a parameter gradient for generating a replacement joint model, and performing parameter optimization by using an Adam optimizer;

carrying out iteration by taking the generated and replaced joint model after parameter optimization into the step of iterative training until loss

And (5) converging, and outputting a trained generation replacement joint model.

Optionally, the generating the probability distribution of the model output includes:

extracting time sequence relation feature vectors of words in the sample one by using LSTM and forming a time sequence relation feature matrix

；

Calculating the relation weight of each word in the sample on time sequence characteristics through a multi-head self-attention mechanism and reflecting the relation weight to be an attention moment array

：

；

In the method, in the process of the invention,

for attention head->

Output feature vector, ">

For the total number of attention head>

Attention head->

A parameter matrix corresponding to query, key, value vector, +.>

For the attention parameter matrix, +.>

，

For the dimension of the timing relation feature vector, +.>

For the connection operation +.>

Is a sigmoid function;

the time sequence relation characteristic matrix

And attention matrix->

Multiplying to obtain time characteristic matrix of each time step>

：

；

Mapping each word in the sample to a high-dimensional semantic space through a word embedding layer to obtain a word embedding vector of each word;

constructing a graph structure

And embedding the word embedding vectors of all words in the sample as graph structures

Is->

，/>

The number of words in the sample;

extracting spatial relationships of all words in a sample by a sliding window algorithm to build a graph structure

Edge set of (i.e.)>

，/>

The number of edges;

using GAT from graph structure

Extracting the spatial relation feature vector of each node, calculating the spatial feature by a multi-head self-attention mechanism, and reflecting the spatial feature to be attention coefficient +.>

：/>

；

In the method, in the process of the invention,

for node->

To node->

Attention coefficient of>

For node->

Is>

For node->

Node->

And node->

Spatial relation feature vector, +_>

A linear transformation weight matrix for each node, < +.>

Is a weight vector, ++>

To activate the function +.>

To splice the two vectors;

attention coefficient

Multiplying the spatial relation feature vector of the node by the spatial relation feature vector of the node, and updating the spatial relation feature vector of the node through a multi-head self-attention mechanism to generate a spatial feature matrix +.>

：

；

In the method, in the process of the invention,

for attention head->

A corresponding weight matrix;

matrix the time characteristics

And spatial feature matrix->

Feature fusion is carried out through the first full-connection layer and the activation function to obtain a fusion feature matrix +.>

：

；

In the method, in the process of the invention,

a parameter matrix for the first full connection layer;

will fuse the feature matrix

Predictive generation by means of the second fully connected layer and the activation function, output generation probability distribution +.>

：

；

In the method, in the process of the invention,

for the parameter matrix of the second fully connected layer, < >>

Is the first bias parameter.

Optionally, the substitution probability distribution output by the substitution model includes:

randomly selecting a plurality of words from the sample to replace the words with symbols representing the mask, so as to obtain a sample with the mask symbols;

mapping the mask symbol samples to a high-dimensional semantic space through an embedded vector layer of BERT to obtain feature mapping vectors of words

：

；

In the method, in the process of the invention,

for the masked symbol samples, +.>

In order to embed the vector layer(s),

mapping feature vectors

Predictive generation by means of the third fully connected layer and the activation function, outputting a set of alternative probability distributions +.>

：

；

In the method, in the process of the invention,

for the parameter matrix of the third fully connected layer, < >>

For the second bias parameter, +.>

Is a sigmoid function;

will replace the probability distribution set

The probability distribution of the words with mask symbols is taken as output.

Optionally, the steganographically operating the text sequence according to the secret information and the generated probability distribution includes:

aiming at each word in the text sequence, the generation probability distribution is arranged in descending order according to the generation probability;

after being arranged before

The generation probabilities are taken out as generation candidate pools, < ->

A maximum number of bits embedded for a preset word;

calculating the ratio of the first and second generation probabilities in the generation candidate pool:

if the ratio is greater than the preset ratio threshold

Will thenFirstly, generating a probability corresponding word as the output of the word in the text sequence, and recording that the word in the text sequence is not hidden;

if the ratio is less than or equal to a preset ratio threshold

Constructing a Huffman tree according to the generation probabilities in the generation candidate pool, and acquiring a coding set of each generation probability according to the Huffman tree;

converting the secret information into a binary bit stream and initializing a value s=1;

when the codes in the code set are the same as the s bits before the binary bit stream, outputting the word corresponding to the probability of the codes as the word in the text sequence, and recording the word in the text sequence as steganography; when there is no code in the code set that is the same as the s bits before the binary bit stream, let s=s+1 and repeat the current step until s is greater than the total number of bits of the binary bit stream.

Optionally, the performing steganographic operation on the non-steganographic word according to the secret information and the replacement probability distribution includes:

for each non-steganographic word, arranging the replacement probability distribution in descending order according to the size of the replacement probability;

after being arranged before

The individual substitution probabilities are taken out as substitution candidate pools, < >>

A maximum number of bits embedded for a preset word;

constructing a Huffman tree according to the replacement probability in the replacement candidate pool, and acquiring a coding set of each replacement probability according to the Huffman tree;

when the codes existing in the code set are the same as the s bits before the binary bit stream, taking the word corresponding to the substitution probability corresponding to the codes as the output of the non-steganographic word, and recording the non-steganographic word as steganographic; when there is no code in the code set that is the same as the s bits before the binary bit stream, let s=s+1 and repeat the current step until s is greater than the total number of bits of the binary bit stream.

In a second aspect, the present invention provides a multi-scale joint text steganography system comprising:

the information acquisition module is used for acquiring the text sequence and the secret information;

the generation module is used for inputting the text sequence into a pre-constructed generation and replacement joint model and obtaining the generation probability distribution of each word;

the first steganography module is used for carrying out steganography operation on the text sequence according to the secret information and the generated probability distribution, and obtaining a first steganography text and a steganography record;

the replacement module is used for determining non-steganographic words in the text sequence according to the steganographic records, inputting the text sequence into a pre-constructed generated replacement joint model, and obtaining the replacement probability distribution of each non-steganographic word;

the second steganography module is used for carrying out steganography operation on the non-steganography words according to the secret information and the replacement probability distribution, and obtaining a second steganography text;

and the joint steganography module is used for generating joint steganography text according to the first steganography text and the second steganography text.

In a third aspect, the present invention provides a secret information extraction method based on the above-mentioned multi-scale joint text steganography method, including:

acquiring a joint steganography text;

inputting the joint steganography text into a pre-constructed generation and replacement joint model, and obtaining the generation probability distribution of each word;

carrying out extraction operation according to the generated probability distribution and the joint steganography text to obtain a first extraction text and an extraction record;

determining unextracted words in the joint steganography text according to the extraction records, inputting the joint steganography text into a pre-constructed generated replacement joint model, and obtaining replacement probability distribution of each unextracted word;

extracting according to the replacement probability distribution and the joint steganography text to obtain a second extracted text;

generating secret information according to the first extracted text and the second extracted text;

the secret information extracting operation according to the generated probability distribution and the joint steganography text comprises the following steps:

aiming at each word in the joint steganography text, the generation probability distribution is arranged in descending order according to the generation probability;

after being arranged before

The generation probabilities are taken out as generation candidate pools, < ->

A maximum number of bits embedded for a preset word;

if the ratio is greater than the preset ratio threshold

The first generation probability corresponding word is used as the output of the word in the joint steganography text, and the word in the joint steganography text is recorded as not extracted;

if the ratio is less than or equal to a preset ratio threshold

when the codes in the code set have the codes corresponding to the words which are the same as the words in the joint steganography text, the codes corresponding to the words are used as the output of the words in the joint steganography text, and the words in the joint steganography text are recorded as the extraction;

the extracting operation according to the replacement probability distribution and the joint steganography text comprises the following steps:

for each unextracted word, arranging the replacement probability distribution according to the descending order of the replacement probability;

after being arranged before

A maximum number of bits embedded for a preset word;

when the codes in the code set are the same as the words in the joint steganography text, the codes are used as the output of the non-extracted words, and the non-extracted words are recorded as extraction.

Compared with the prior art, the invention has the beneficial effects that:

according to the multi-scale combined text steganography method and system, the generated model and the generated and replaced combined model of the replaced model are constructed, so that the feature consistency of the generated model and the replaced model is guaranteed; the generated replacement joint model is applied to the steganography process, so that the text redundancy is utilized to the maximum extent, the steganography embedding (steganography) rate is improved, and the steganography text quality is ensured.

Drawings

Fig. 1 is a flowchart of a multi-scale joint text steganography method according to an embodiment of the present invention.

Fig. 2 is a flowchart of a secret information extraction method of a multi-scale joint text steganography method according to a third embodiment of the present invention.

Detailed Description

The invention is further described below with reference to the accompanying drawings. The following examples are only for more clearly illustrating the technical aspects of the present invention, and are not intended to limit the scope of the present invention.

Embodiment one:

as shown in fig. 1, an embodiment of the present invention provides a multi-scale joint text steganography method, including the following steps:

s1, acquiring a text sequence and secret information;

s2, inputting the text sequence into a pre-constructed generation and replacement joint model, and obtaining the generation probability distribution of each word;

s3, performing steganography operation on the text sequence according to the secret information and the generated probability distribution, and obtaining a first steganography text and a steganography record;

s4, determining non-steganographic words in the text sequence according to the steganographic records, inputting the text sequence into a pre-constructed generated replacement joint model, and obtaining replacement probability distribution of each non-steganographic word;

s5, performing steganography operation on the non-steganography words according to the secret information and the replacement probability distribution, and obtaining a second steganography text;

s6, generating a joint steganography text according to the first steganography text and the second steganography text.

1. The construction process for generating the replacement joint model comprises the following steps:

11. acquiring a preset number of text data; for example, 30 ten thousand pieces of text data are obtained from an OPUS dataset website;

12. preprocessing text data, and constructing a sample set based on the preprocessed text data;

wherein the preprocessing comprises the following steps:

121. dividing the text data, reserving words in the division result and generating word sequences;

122. taking the first n-1 bits of the word sequence as a sample, taking the last n-1 bits of the word sequence as a label, and n being the total number of bits of the word sequence;

123. if the number of bits of the sample or the label is smaller than a preset bit number threshold value N, filling the tail of the corresponding sample or label by filling a symbol to enable the number of bits of the sample or the label to be equal to the preset bit number threshold value N;

124. if the number of bits of the sample or the label is larger than a preset number of bits threshold N, cutting off words at the tail of the corresponding sample or label to enable the number of bits to be equal to the preset number of bits threshold N;

13. dividing a sample set into a training set and a verification set according to a preset proportion; the ratio is usually set to be 8:2, all the words appearing in the training set are required to be counted and the word frequency is calculated, the words with the word frequency meeting the preset word frequency threshold value are added into a dictionary, and the length of the dictionary is the length of the model output probability distribution; words in the dictionary are in one-to-one correspondence with probabilities in the probability distribution;

14. generating a replacement joint model based on PyTorch building, wherein the generation of the replacement joint model comprises a generation model and a replacement model, the generation model is used for outputting a generation probability distribution, and the replacement model is used for outputting a replacement probability distribution;

15. iterative training is carried out on the generated and replaced joint model by using a training set, after iterative training, the generated and replaced joint model after iterative training is verified by using a verification set, and after verification, loss is reserved

The minimum generation replaces the joint model and outputs.

Wherein iteratively training the generation of the alternative joint model using the training set comprises:

151. inputting samples in the training set to generate a replacement joint model, and acquiring the generation probability distribution output by the generation model and the replacement probability distribution output by the replacement model;

152. calculating loss according to the generated probability distribution prediction result and the replacement probability distribution prediction result which are respectively used as the input of the cross entropy loss function with the label

And->

For loss->

And->

Sum acquisition loss

；

153. To loss of

154. carrying out iteration by taking the generated replacement joint model after parameter optimization into the step of iterative training (namely returning to step 151) until loss

In step 151, generating a probability distribution of model outputs includes:

(1.1) extracting time sequence relation characteristic vectors of words in the sample one by using LSTM, and forming a time sequence relation characteristic matrix

；

(1.2) calculating the relation weight of each word in the sample on the time sequence characteristic through a multi-head self-attention mechanism, and reflecting the relation weight into an attention moment array

：

；

In the method, in the process of the invention,

for attention head->

Output feature vector, ">

For the total number of attention head>

Attention head->

A parameter matrix corresponding to query, key, value vector, +.>

For the attention parameter matrix, +.>

，

For the dimension of the timing relation feature vector, +.>

For the connection operation +.>

Is a sigmoid function;

(1.3) matrix the time sequence relation characteristic

And attention matrix->

Multiplying to obtain time characteristic matrix of each time step

：

；

(2.1) mapping each word in the sample to a high-dimensional semantic space through a word embedding layer to obtain a word embedding vector of each word;

(2.2) building a graph Structure

Is->

，/>

The number of words in the sample;

(2.3) extracting spatial relationships of all words in the sample by sliding window algorithm to build the graph structure

Edge set of (i.e.)>

，/>

The number of edges;

(2.4) use of GAT from graph Structure

：

；

In the method, in the process of the invention,

for node->

To node->

Attention coefficient of>

For node->

Is>

For node->

Node->

And node->

Spatial relation feature vector, +_>

A linear transformation weight matrix for each node, < +.>

Is a weight vector, ++>

To activate the function +.>

To splice the two vectors;

(2.5) attention coefficient

：/>

；

In the method, in the process of the invention,

for attention head->

A corresponding weight matrix;

(3.1) time-feature matrix

And spatial feature matrix->

：

；

In the method, in the process of the invention,

a parameter matrix for the first full connection layer;

(3.2) fusing the feature matrices

：

；

In the method, in the process of the invention,

for the parameter matrix of the second fully connected layer, < >>

Is the first bias parameter.

In step 151, the replacement probability distribution of the replacement model output includes:

(1.1) randomly selecting a plurality of words from the sample to replace the words with symbols representing the mask, so as to obtain a sample with the mask symbols;

(1.2) mapping the masked symbol samples to a high-dimensional semantic space through an embedded vector layer of BERT to obtain feature mapping vectors of words

：

；

In the method, in the process of the invention,

for the masked symbol samples, +.>

In order to embed the vector layer(s),

(1.3) mapping the feature to the vector

：

；

In the method, in the process of the invention,

for the parameter matrix of the third fully connected layer, < >>

For the second bias parameter, +.>

Is a sigmoid function;

(1.4) replacing the probability distribution set

The probability distribution of the words with mask symbols is taken as output.

2. Steganographically operating a sequence of text based on the secret information and the generated probability distribution includes:

2.1, aiming at each word in the text sequence, arranging the generation probability distribution according to the generation probability size descending order;

2.2 before alignment

The generation probabilities are taken out as generation candidate pools, < ->

A maximum number of bits embedded for a preset word;

2.3, calculating the ratio of the first generation probability to the second generation probability in the generation candidate pool:

2.4 if the ratio is greater than the preset ratio threshold

Outputting the word corresponding to the first generation probability as a word in the text sequence, and recording that the word in the text sequence is not hidden;

2.5, if the ratio is less than or equal to the preset ratio threshold

2.6, converting the secret information into a binary bit stream, and initializing a value s=1;

2.7, when the codes exist in the code set and are the same as the s bits before the binary bit stream, outputting the word corresponding to the probability generated by the codes as the word in the text sequence, and recording the word in the text sequence as steganography; when there is no code in the code set that is the same as the s bits before the binary bit stream, let s=s+1 and repeat the current step until s is greater than the total number of bits of the binary bit stream.

3. Steganographically operating the non-steganographic word based on the secret information and the replacement probability distribution includes:

3.1, aiming at each non-steganographic word, arranging the replacement probability distribution in descending order according to the size of the replacement probability;

3.2 before alignment

A maximum number of bits embedded for a preset word;

3.3, constructing a Huffman tree according to the replacement probability in the replacement candidate pool, and acquiring a coding set of each replacement probability according to the Huffman tree;

3.4, converting the secret information into a binary bit stream, and initializing a numerical value s=1;

3.5, when the codes exist in the code set and are the same as the s bits before the binary bit stream, outputting the word corresponding to the substitution probability corresponding to the codes as an un-steganographic word, and recording the un-steganographic word as steganographic; when there is no code in the code set that is the same as the s bits before the binary bit stream, let s=s+1 and repeat the current step until s is greater than the total number of bits of the binary bit stream.

Embodiment two:

the embodiment of the invention provides a multi-scale joint text steganography system, which comprises the following components:

And (3) implementation:

as shown in fig. 2, according to a first embodiment, the present invention provides a secret information extraction method of a multi-scale joint text steganography method, including:

s11, acquiring a joint steganography text;

s12, inputting the joint steganography text into a pre-constructed generation and replacement joint model, and obtaining the generation probability distribution of each word;

s13, carrying out extraction operation according to the generated probability distribution and the joint steganography text, and obtaining a first extraction text and an extraction record;

s14, determining unextracted words in the joint steganography text according to the extraction records, inputting the joint steganography text into a pre-constructed generated replacement joint model, and obtaining replacement probability distribution of each unextracted word;

s15, extracting according to the replacement probability distribution and the joint steganography text to obtain a second extracted text;

s16, generating secret information according to the first extracted text and the second extracted text;

(1) Aiming at each word in the joint steganography text, the generation probability distribution is arranged in descending order according to the generation probability;

(2) After being arranged before

The generation probabilities are taken out as generation candidate pools, < ->

A maximum number of bits embedded for a preset word;

(3) Calculating the ratio of the first and second generation probabilities in the generation candidate pool:

(4) If the ratio is greater than the preset ratio threshold

(5) If the ratio is smaller than or equal to the preset ratio threshold value

(6) When the codes in the code set have the codes corresponding to the words which are the same as the words in the joint steganography text, the codes corresponding to the words are used as the output of the words in the joint steganography text, and the words in the joint steganography text are recorded as the extraction;

wherein, the extracting operation according to the replacement probability distribution and the joint steganography text comprises the following steps:

(1) Arranging the replacement probability distribution according to the descending order of the replacement probability for each unextracted word;

(2) After being arranged before

A maximum number of bits embedded for a preset word;

(3) Constructing a Huffman tree according to the replacement probability in the replacement candidate pool, and acquiring a coding set of each replacement probability according to the Huffman tree;

(4) When the codes in the code set are the same as the words in the joint steganography text, the codes are output as the non-extracted words, and the non-extracted words are recorded as the extraction.

It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

The foregoing is merely a preferred embodiment of the present invention, and it should be noted that modifications and variations could be made by those skilled in the art without departing from the technical principles of the present invention, and such modifications and variations should also be regarded as being within the scope of the invention.

Claims

1. A method of multi-scale joint text steganography, comprising:

acquiring a text sequence and secret information;

2. The method of claim 1, wherein the creating the alternative joint model comprises:

acquiring a preset number of text data;

3. A multi-scale joint text steganography method as recited in claim 2, wherein the preprocessing comprises:

4. A multi-scale joint text steganography method as recited in claim 2, wherein iteratively training the generation of the alternative joint model using a training set comprises:

And->

For loss->

And->

Sum acquisition loss->

；

To loss of

5. The method of claim 4, wherein generating the probability distribution of model output comprises:

；

：

；

In the method, in the process of the invention,

for attention head->

Output feature vector, ">

For the total number of attention head>

Attention head->

A parameter matrix corresponding to query, key, value vector, +.>

For the attention parameter matrix, +.>

，/>

For the dimension of the timing relation feature vector, +.>

For the connection operation +.>

Is a sigmoid function;

the time sequence relation characteristic matrix

And attention matrix->

Multiplying to obtain time characteristic matrix of each time step>

：

；

constructing a graph structure

And embedding the word embedding vectors of all words in the sample as a graph structure +.>

Is->

，/>

The number of words in the sample;

Edge set of (i.e.)>

，/>

The number of edges;

using GAT from graph structure

：

；

In the method, in the process of the invention,

for node->

To node->

Attention coefficient of>

For node->

Is>

For node->

Node->

And node->

Spatial relation feature vector, +_>

A linear transformation weight matrix for each node, < +.>

As a weight vector of the weight vector,

to activate the function +.>

To splice the two vectors;

attention coefficient

：

；

In the method, in the process of the invention,

for attention head->

A corresponding weight matrix;

matrix the time characteristics

And spatial feature matrix->

：/>

；

In the method, in the process of the invention,

a parameter matrix for the first full connection layer;

will fuse the feature matrix

：

；

In the method, in the process of the invention,

for the parameter matrix of the second fully connected layer, < >>

Is the first bias parameter.

6. The method of claim 4, wherein the substitution probability distribution of the substitution model output comprises:

：

；

In the method, in the process of the invention,

for the masked symbol samples, +.>

In order to embed the vector layer(s),

mapping feature vectors

Predictive generation by a third full connection layer and an activation function, outputting a replacement probability distribution set

：

；

In the method, in the process of the invention,

for the parameter matrix of the third fully connected layer, < >>

For the second bias parameter, +.>

Is a sigmoid function;

will replace the probability distribution set

The probability distribution of the words with mask symbols is taken as output.

7. A multi-scale joint text steganography method as recited in claim 1, wherein steganographically operating a sequence of texts based on secret information and a generated probability distribution comprises:

after being arranged before

The generation probabilities are taken out as generation candidate pools, < ->

A maximum number of bits embedded for a preset word;

if the ratio is greater than the preset ratio threshold

The first generation probability corresponding word is used as the output of the word in the text sequence, and the word in the text sequence is recorded as non-steganography;

if the ratio is less than or equal to a preset ratio threshold

8. A multi-scale joint text steganography method as recited in claim 1, wherein steganographically operating on non-steganographically words based on secret information and a replacement probability distribution comprises:

after being arranged before

A maximum number of bits embedded for a preset word;

9. A multi-scale joint text steganography system, comprising:

10. A secret information extraction method based on a multi-scale joint text steganography method according to any one of claims 1-8, characterized by comprising:

acquiring a joint steganography text;

after being arranged before

The generation probabilities are taken out as generation candidate pools, < ->

A maximum number of bits embedded for a preset word;

if the ratio is greater than the preset ratio threshold

if the ratio is less than or equal to a preset ratio threshold

Constructing a Huffman tree according to the generation probabilities in the generation candidate pool, and acquiring a coding set of each generation probability according to the Huffman tree; />

after being arranged before

A maximum number of bits embedded for a preset word;