CN117131203B

CN117131203B - Knowledge graph-based text generation steganography method, related method and device

Info

Publication number: CN117131203B
Application number: CN202311021705.7A
Authority: CN
Inventors: 李亚敏; 许虹虹; 吴笑民; 张俊
Original assignee: Hubei University
Current assignee: Hubei University
Priority date: 2023-08-14
Filing date: 2023-08-14
Publication date: 2024-03-22
Anticipated expiration: 2043-08-14
Also published as: CN117131203A

Abstract

The invention discloses a knowledge-graph-based text generation steganography method, a related method and a device. The method comprises: acquiring all triples of a knowledge graph matched to a predetermined topic, and encoding the triples into a graph structure through graph encoding; determining graph structure information from the graph structure; determining the hidden state of each node of each triple according to the graph structure information; obtaining the predicate node of each triple and its conditional probability from the hidden states of the triple's nodes; selecting a preset number of candidate triples from all triples according to the conditional probabilities of their predicate nodes, and encoding the candidate triples to obtain encoded triples; matching the secret information stream against the encoded triples and adding the matched triple to an intermediate plan; repeating these steps until a complete intermediate plan is obtained; and generating a steganographic text with a steganographic text generation model according to the intermediate plan and the graph structure information.

Description

Knowledge graph-based text generation steganography method, related method and device

Technical Field

The invention relates to the field of text steganography, and in particular to a knowledge-graph-based text generation steganography method and to a related method and device.

Background

With the rapid development of information technology, human society has entered the era of networked big data. While this new technological revolution brings convenience to work and life, security problems are becoming increasingly prominent. In recent years, threats to the structural security, data security and information content security of existing information networks have occurred frequently, especially in critical national infrastructure sectors that depend heavily on informatization, such as banking, transportation, commerce, healthcare, communications and electric power.

Big data has become a new stage in the development of information technology, and the knowledge graph is one of the important ways of storing big data. In the era of big data and digitalization, the drive to strengthen network information security is pushing the development of information security. Information hiding (also called steganography) is one of the key technologies of information security and has become a focus of the field in recent years, as network services increasingly rely on multimedia. Steganography embeds secret information in digital media such as audio, video and images without degrading the quality of the carrier, so that a third party is unaware that any secret information is present. It is therefore widely used in military information support, covert communication, privacy protection, copyright protection and similar fields. Among possible carriers, text has attracted many researchers, because natural language is an information carrier widely used in daily communication; compared with images or audio, text is more compact and occupies less storage. Research on linguistic steganography therefore has important research value and practical significance.

Current text generation steganography methods are mainly implemented with automatic text generation techniques from natural language processing (NLP). Most existing research results, in China and abroad, combine neural-network-based text generation models with conditional probability coding methods. After continuous optimization, these methods can generate high-quality steganographic texts, which improves the concealment of the methods and, to a certain extent, increases the embedding capacity for secret information.

Disclosure of Invention

To achieve better information hiding, the inventors provide a knowledge-graph-based text generation steganography method, a related method and a related device.

In a first aspect, an embodiment of the present application provides a knowledge-graph-based text generation steganography method, the method comprising:

acquiring all triples of a knowledge graph matched to a predetermined topic, and encoding the triples into a graph structure through graph encoding; each triple comprises a subject node, an object node and a predicate node;

determining the hidden state of each node of each triple according to the graph structure information; obtaining the predicate node of each triple and its conditional probability according to the hidden states of the triple's nodes;

selecting a preset number of candidate triples from all triples according to the conditional probabilities of their predicate nodes, and encoding the candidate triples to obtain encoded triples;

matching the secret information stream against the preset number of encoded triples, and adding the matched triple to an intermediate plan;

repeating the steps of selecting and encoding candidate triples from the triples other than those already matched, matching them against the secret information stream, and adding each newly matched triple to the intermediate plan, until all triples have been added to the intermediate plan;

and generating a steganographic text based on a steganographic text generation model according to the intermediate plan and the graph structure information.

In one or some optional implementations of the embodiments of the present application, acquiring all triples of the knowledge graph matched to a predetermined topic and encoding the triples into a graph structure through graph encoding comprises:

constructing directed edges from the three nodes of each triple, the directed edges comprising directed edges between adjacent nodes and a self-loop edge on each node;

and constructing a graph structure according to the three nodes of all the triples and the directed edges.

In one or some optional implementations of the embodiments of the present application, the determining, according to the graph structure, graph structure information and a hidden state of each node of each triplet includes:

encoding the graph structure information, and iteratively obtaining the hidden states of the nodes of each triple by using the following formula 1:

$$h_v^{(t+1)} = \rho\Big(\sum_{r \in R} \sum_{u \in \mathcal{N}_v^r} \frac{1}{c_{v,r}} \big(W_r h_u^{(t)} + b_r\big)\Big) \qquad (1)$$

where $h_v^{(t)}$ is the hidden state of node v at time step t, ρ is the activation function, R is the set of all possible edge types, $\mathcal{N}_v^r$ is the set of in-neighbors of node v for edge type r, $c_{v,r}$ is a normalization term, and $W_r$ and $b_r$ are the parameters of edge type r.

In one or some optional implementations of the embodiments of the present application, obtaining the predicate node of each triple and its conditional probability according to the hidden states of the triple's nodes comprises:

obtaining the predicate node of each triple from the hidden states of its nodes;

calculating the conditional probability of the predicate node based on the following formula 2:

$$P(r_i) = \mathrm{softmax}\big(h_{r_i}^{\top} W \bar{h}_R\big) \qquad (2)$$

where $r_i$ is a predicate node; softmax is the activation function; $h_{r_i}$ is the final hidden state of predicate node $r_i$; W is a parameter of the graph edge types; R is the set of all possible edge types; and $\bar{h}_R$ is the mean of the hidden states over the set R, i.e. the average pooling of the hidden states of all predicate nodes.

In one or some optional implementations of the embodiments of the present application, selecting a preset number of candidate triples from all triples according to the conditional probabilities of their predicate nodes and encoding them to obtain encoded triples comprises:

sorting the triples in descending order of the conditional probabilities of their predicate nodes, and selecting the first preset number of triples as candidate triples;

sorting the candidate triples in descending order of the conditional probabilities of their predicate nodes, and assigning codes to the sorted candidate triples.

In one or some optional implementations of the embodiments of the present application, matching the secret information stream against the preset number of encoded triples and adding the matched triple to an intermediate plan comprises:

determining the current secret information code from the number of candidate triples and the secret information stream;

selecting the candidate triple that corresponds to the current secret information code, and appending the matched triple to the intermediate plan in order.

In one or some optional implementations of the embodiments of the present application, repeating the steps of selecting and encoding candidate triples from the triples other than those already matched, matching them against the secret information stream, and adding each newly matched triple to the intermediate plan until all triples have been added to the intermediate plan comprises:

selecting a preset number of new candidate triples from all triples not yet visited according to the conditional probabilities of their predicate nodes, and encoding them to obtain new encoded triples;

matching the secret information stream against the preset number of new encoded triples, and adding the newly matched triple to the intermediate plan.

In a second aspect, an embodiment of the present application provides a knowledge-graph-based method for extracting secret information from a steganographic text, the method comprising:

acquiring entities and relations in the steganographic text;

matching the entities and relations in the steganographic text with the entities and relations in a knowledge graph dataset to obtain the knowledge graph corresponding to the steganographic text; obtaining all triples of that knowledge graph, and encoding the triples into a graph structure through graph encoding;

determining the hidden state of each node of each triple according to the graph structure information;

obtaining the predicate node of each triple and its conditional probability according to the hidden states of the triple's nodes;

performing keyword matching between the steganographic text and the preset number of encoded triples, obtaining the code of the matched triple, and adding the matched triple to an intermediate plan;

repeating the steps of selecting and encoding candidate triples from the triples other than those already matched, performing keyword matching against the steganographic text, and adding each newly matched triple to the intermediate plan, until all triples have been added to the intermediate plan;

and connecting the obtained codes in sequence to obtain the secret information stream.
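The extraction steps above can be illustrated with a minimal Python sketch. It rests on several assumptions that the text does not state. The plan order is taken as already recovered from the steganographic text by keyword matching. Formula 3 is read as L = floor(log2 N) bits and S = 2^L candidates. A fixed dictionary of scores stands in for the predicate conditional probabilities, which the method itself recomputes with its encoder after every step. The names `capacity` and `extract_bits` are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Triple = Tuple[str, str, str]


def capacity(n_remaining: int) -> Tuple[int, int]:
    """Assumed reading of formula 3: L = floor(log2 N) bits, S = 2**L candidates."""
    bits = n_remaining.bit_length() - 1  # exact floor(log2 N) for N >= 1
    return bits, 1 << bits


def extract_bits(plan: Sequence[Triple], score: Dict[Triple, float]) -> str:
    """Recover the secret bit stream from the order of triples in a recovered plan.

    `score` is a hypothetical, fixed stand-in for the predicate conditional
    probabilities; the embedder must have used the same ranking.
    """
    remaining: List[Triple] = list(plan)  # every triple ends up in the plan
    codes: List[str] = []
    for triple in plan:
        n_bits, size = capacity(len(remaining))
        # Rebuild the candidate set the embedder saw: the top-S unvisited triples.
        candidates = sorted(remaining, key=lambda t: -score[t])[:size]
        index = candidates.index(triple)  # ValueError if the plan is inconsistent
        if n_bits:
            codes.append(format(index, f"0{n_bits}b"))  # leaf index -> L-bit code
        remaining.remove(triple)
    # Concatenate the codes in order to obtain the secret information stream.
    return "".join(codes)


A, B, C, D, E = [(f"e{i}", f"p{i}", f"o{i}") for i in range(5)]
print(extract_bits([C, B, D, A, E], {A: 0.5, B: 0.4, C: 0.3, D: 0.2, E: 0.1}))  # 100110
```

Receiver and sender must derive identical rankings at every step. That is why the sketch rebuilds the candidate set from the unvisited triples before reading each code.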

In a third aspect, an embodiment of the present application provides a knowledge-graph-based text generation steganography apparatus, the apparatus comprising:

a graph encoding module, configured to acquire all triples of a knowledge graph matched to a predetermined topic and to encode the triples into a graph structure through graph encoding, each triple comprising a subject node, an object node and a predicate node;

a structure information acquisition module, configured to determine graph structure information according to the graph structure;

an intermediate plan generation module, configured to determine the hidden state of each node of each triple according to the graph structure information; obtain the predicate node of each triple and its conditional probability according to the hidden states of the triple's nodes; select a preset number of candidate triples from all triples according to the conditional probabilities of their predicate nodes, and encode the candidate triples to obtain encoded triples; match the secret information stream against the preset number of encoded triples, and add the matched triple to an intermediate plan; and repeat the steps of selecting and encoding candidate triples from the triples other than those already matched, matching them against the secret information stream, and adding each newly matched triple to the intermediate plan, until all triples have been added to the intermediate plan;

and a steganographic text generation module, configured to generate a steganographic text with a steganographic text generation model according to the intermediate plan and the graph structure information.

In a fourth aspect, an embodiment of the present application provides a knowledge-graph-based text secret information extraction device, the device comprising:

a knowledge graph matching module, configured to acquire the entities and relations in the steganographic text,

and to match the entities and relations in the steganographic text with the entities and relations in a knowledge graph dataset to obtain the knowledge graph corresponding to the steganographic text;

a graph encoding module, configured to obtain all triples of the knowledge graph and to encode the triples into a graph structure through graph encoding;

an intermediate plan generation module, configured to determine the hidden state of each node of each triple according to the graph structure information; obtain the predicate node of each triple and its conditional probability according to the hidden states of the triple's nodes; select a preset number of candidate triples from all triples according to the conditional probabilities of their predicate nodes, and encode the candidate triples to obtain encoded triples; perform keyword matching between the steganographic text and the preset number of encoded triples, obtain the code of the matched triple, and add the matched triple to an intermediate plan; and repeat the steps of selecting and encoding candidate triples from the triples other than those already matched, performing keyword matching against the steganographic text, and adding each newly matched triple to the intermediate plan, until all triples have been added to the intermediate plan;

and a secret information acquisition module, configured to concatenate the obtained codes in order to obtain the secret information stream.

In a fifth aspect, an embodiment of the present application provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the knowledge-graph-based text generation steganography method described above and/or the knowledge-graph-based method for extracting secret information from steganographic text described above.

In a sixth aspect, an embodiment of the present application provides a computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the knowledge-graph-based text generation steganography method described above and/or the knowledge-graph-based method for extracting secret information from steganographic text described above.

In a seventh aspect, an embodiment of the present application provides a computer program product comprising instructions which, when run on a computer device, cause the computer device to perform the knowledge-graph-based text generation steganography method described above and/or the knowledge-graph-based method for extracting secret information from steganographic text described above.

In an eighth aspect, an embodiment of the present application provides a chip comprising a processor and a communication interface coupled to the processor, the processor being configured to execute a computer program or instructions to implement the knowledge-graph-based text generation steganography method described above and/or the knowledge-graph-based method for extracting secret information from steganographic text described above.

The technical solutions provided by the embodiments of the present application have at least the following beneficial effects:

In the knowledge-graph-based text generation steganography method, a knowledge graph on a single topic is selected. A knowledge graph can integrate data from multiple sources and express it in a structured, organized form. As a result, text generated from it is consistent, logically structured, accurate and relevant to the given topic or domain, so the generated content is relevant, coherent and controllable. Based on the graph structure information, the hidden states of the nodes of each triple are obtained and the conditional probabilities of the predicate nodes are calculated. Codes are assigned according to the ranking of these conditional probabilities, and the secret information is hidden in the generated intermediate plan. Embedding the secret information therefore does not destroy the statistical distribution characteristics of the original text carrier, and the text reads more naturally. The text generation process itself is not interfered with; only the order in which the triples are generated is adjusted, which reduces the statistical difference between normal text and steganographic text. Moreover, because the secret information stream is hidden using the graph structure information of the knowledge graph, the generated steganographic text has better resistance to steganalysis.

Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims thereof as well as the appended drawings.

The technical scheme of the invention is further described in detail through the drawings and the embodiments.

Drawings

The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate the invention and, together with the embodiments of the invention, serve to explain it. In the drawings:

FIG. 1 is a schematic diagram of the steps of a knowledge-graph-based text generation steganography method according to an embodiment of the present application;

FIG. 2 is a schematic diagram of graph encoding according to an embodiment of the present application;

FIG. 3 is a schematic diagram of the intermediate plan generation process according to an embodiment of the present application;

FIG. 4 is a schematic diagram of the secret information generation and extraction process according to an embodiment of the present application;

FIG. 5 is a schematic diagram of intermediate results of the steganographic text generation process according to an embodiment of the present application;

FIG. 6 is a diagram of steganalysis-resistance experiments on the knowledge-graph-based text generation steganography method according to an embodiment of the present application;

FIG. 7 is a schematic diagram of the steps of a knowledge-graph-based text secret information extraction method according to an embodiment of the present application;

FIG. 8 is a schematic structural diagram of a knowledge-graph-based text generation steganography device according to an embodiment of the present application;

FIG. 9 is a schematic structural diagram of a knowledge-graph-based text secret information extraction device according to an embodiment of the present application.

Detailed Description

In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system configurations, techniques, etc. in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.

It should be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.

It should also be understood that the term "and/or" as used in this specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes such combinations.

As used in this specification and the appended claims, the term "if" may be interpreted, depending on the context, as "when", "once", "in response to determining" or "in response to detecting". Similarly, the phrase "if it is determined" or "if a [described condition or event] is detected" may be interpreted, depending on the context, as "upon determining", "in response to determining", "upon detecting a [described condition or event]" or "in response to detecting a [described condition or event]".

In addition, in the description of the present application and the appended claims, the terms "first," "second," "third," and the like are used merely to distinguish between descriptions and are not to be construed as indicating or implying relative importance.

Reference in the specification to "one embodiment" or "some embodiments" or the like means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the application. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," and the like in the specification are not necessarily all referring to the same embodiment, but mean "one or more but not all embodiments" unless expressly specified otherwise. The terms "comprising," "including," "having," and variations thereof mean "including but not limited to," unless expressly specified otherwise.

It should be understood that the sequence numbers of the steps in the following embodiments do not imply an order of execution; the execution order of each process should be determined by its function and internal logic and should not limit the implementation of the embodiments of the present application in any way.

In order to illustrate the technical solution of the present application, the following description is made by specific examples.

The inventors found that, in the prior art, generation-based text steganography is mainly implemented with automatic text generation techniques from natural language processing. First, a carefully designed model learns a statistical language model of the original text carrier from a large natural-text corpus. Then automatic text generation is used to encode the conditional probability distribution of the candidate output words at each step of generation. Finally, according to the secret information to be embedded, the model outputs a steganographic text that contains the secret information while approximating the statistical distribution of the original text carrier. Consequently, embedding the secret information inevitably disturbs the statistical distribution characteristics of the original text carrier. The problem is therefore how to improve the relevance and coherence of the steganographic text content, so that the generated content is controllable and has a definite topic and complete semantics, and the concealment of generation-based text steganography is improved. On this basis, after further research, the inventors arrived at the present invention and provide a knowledge-graph-based text generation steganography method, a related method and a related device.

Example 1

An embodiment of the present application provides a knowledge-graph-based text steganography method. As shown in FIG. 1, the method comprises the following steps:

S101: Acquire all triples of the knowledge graph matched to a predetermined topic, and encode the triples into a graph structure through graph encoding. Each triple comprises a subject node, an object node and a predicate node.

In step S101, each triple consists of two entity nodes and one relation node: the two entity nodes are the subject node and the object node, and the relation node is the predicate node. The nodes are ordered from the subject node through the predicate node to the object node.

Step S101 specifically includes the following:

In the embodiments of the present application, the inventors found through analysis that traditional text generation steganography methods usually embed secret information in text by means of coding and encryption techniques, or by neural-network-based text generation models with conditional probability coding, so that the hidden content is random, unintelligible binary data. The inventors therefore adopt a knowledge-graph-based text steganography method, in which the rich semantic information of the knowledge graph allows the hidden information to be mapped to entities, relations and the like. By selecting a suitable knowledge graph, the hidden information can be semantically matched with the carrier text, so that the generated steganographic text has specific semantics and context and its content is controllable.

In this application, a knowledge graph is selected first. A knowledge graph on the desired topic may be chosen according to the domain of the information to be hidden; when the hidden content is unintelligible binary data, a knowledge graph may be chosen at random.

After the knowledge graph for hiding the information is determined, its triples, also called RDF triples, are obtained. A triple has the form Subject-Predicate-Object and contains two entities, the subject (s) and the object (o), and one relation, the predicate (p). Formally, each triple is treated as three nodes: s, p and o.

In the embodiments of the present application, the uniqueness of each triple must be guaranteed, so the relations are used to distinguish different triples: two different relations attached to the same entity are treated as different nodes.

Referring to FIG. 2, after the triples of the knowledge graph are determined, directed edges are constructed from the three nodes of each triple. Each triple contributes four directed edges connecting its nodes: s→p, p→s, o→p and p→o. These edges allow information to be exchanged between any pair of neighbors. In addition, each node n has a special self-loop n→n, which lets information flow between adjacent iterations during feature aggregation.
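The edge construction above can be sketched in a few lines of Python. This is an illustrative sketch only, built on three assumptions not stated in the text. Entity nodes with the same label are shared across triples. Each predicate node is made unique by suffixing the index of its triple. Edges are stored as hypothetical `(source, target, type)` tuples.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]
Edge = Tuple[int, int, str]  # (source node id, target node id, edge type)


def build_graph(triples: List[Triple]) -> Tuple[List[str], List[Edge]]:
    """Turn (s, p, o) triples into node labels plus typed directed edges."""
    nodes: List[str] = []
    ids: Dict[str, int] = {}

    def node_id(label: str) -> int:
        # Entity nodes with the same label are shared across triples (assumption).
        if label not in ids:
            ids[label] = len(nodes)
            nodes.append(label)
        return ids[label]

    edges: List[Edge] = []
    for k, (s, p, o) in enumerate(triples):
        s_id = node_id(s)
        # Suffixing the triple index keeps every predicate node distinct
        # (assumption), so each relation node identifies exactly one triple.
        p_id = node_id(f"{p}#{k}")
        o_id = node_id(o)
        edges += [(s_id, p_id, "s->p"), (p_id, s_id, "p->s"),
                  (o_id, p_id, "o->p"), (p_id, o_id, "p->o")]
    # A self-loop n->n on every node carries its own state between iterations.
    edges += [(n, n, "self") for n in range(len(nodes))]
    return nodes, edges


nodes, edges = build_graph([("Hubei_University", "locatedIn", "Wuhan"),
                            ("Wuhan", "capitalOf", "Hubei")])
print(nodes)       # ['Hubei_University', 'locatedIn#0', 'Wuhan', 'capitalOf#1', 'Hubei']
print(len(edges))  # 13: four edges per triple plus five self-loops
```

Keeping the edge type on every edge matters for the next step: a relational graph encoder learns separate parameters per edge type.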

In the embodiments of the present application, the graph structure is constructed from the s, p and o nodes of all triples and all the directed edges; that is, the graph structure consists of these nodes and directed edges.

S102: Determine the hidden state of each node of each triple according to the graph structure information.

In the embodiments of the present application, after the graph structure has been constructed from the triples of the knowledge graph, the graph structure information used in step S102 may be obtained with a relational graph convolutional network (R-GCN) encoder. Specifically, each node in the graph structure is assigned an initial feature representation and hidden state, and information is propagated and hidden states are updated by stacking several graph convolution layers. Each graph convolution layer updates the hidden state of a node by aggregating the hidden states of its neighboring nodes. Every node thus updates its hidden state with information from its neighbors, and the graph structure information is obtained progressively. After propagation and updating through several GCN layers, the hidden states of the nodes contain the graph structure information of the whole knowledge graph.

In the embodiments of the present application, the graph is encoded with the R-GCN encoder, and the hidden state of each node v is learned iteratively based on the following formula 1:

$$h_v^{(t+1)} = \rho\Big(\sum_{r \in R} \sum_{u \in \mathcal{N}_v^r} \frac{1}{c_{v,r}} \big(W_r h_u^{(t)} + b_r\big)\Big) \qquad (1)$$

where $h_v^{(t)}$ is the hidden state of node v at time step t, ρ is the activation function, R is the set of all possible edge types, and $\mathcal{N}_v^r$ is the set of in-neighbors of node v for edge type r; each edge type r is treated as a separate relation in the R-GCN encoder. The in-neighbors of node v for edge type r are the nodes connected to v by an edge of type r; in other words, $\mathcal{N}_v^r$ contains every node linked to v by an edge of type r. $c_{v,r}$ is a normalization term, and $W_r$ and $b_r$ are the parameters of edge type r.
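One update of formula 1 can be sketched in plain Python as below. This is a sketch under stated assumptions, not the patent's implementation. It takes ρ to be ReLU and uses c_{v,r} = |N_v^r|, the usual R-GCN normalization. It places b_r inside the normalized sum, stores edges as hypothetical `(source, target, type)` tuples, and uses hypothetical function names. A practical encoder would use a tensor library and stack several such layers.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]
Edge = Tuple[int, int, str]  # (source u, target v, edge type r)


def matvec(w: Matrix, x: Vector) -> Vector:
    """Dense matrix-vector product W x."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def rgcn_layer(h: List[Vector], edges: List[Edge],
               weights: Dict[str, Matrix], biases: Dict[str, Vector]) -> List[Vector]:
    """One update h^(t) -> h^(t+1) in the shape of formula 1."""
    # N_v^r: in-neighbours of each target node v, grouped by edge type r.
    in_nbrs: Dict[Tuple[int, str], List[int]] = defaultdict(list)
    for u, v, r in edges:
        in_nbrs[(v, r)].append(u)

    out_dim = len(next(iter(biases.values())))
    out = [[0.0] * out_dim for _ in h]
    for (v, r), nbrs in in_nbrs.items():
        c = len(nbrs)  # normalisation term c_{v,r} = |N_v^r| (assumption)
        for u in nbrs:
            msg = matvec(weights[r], h[u])
            for i in range(out_dim):
                out[v][i] += (msg[i] + biases[r][i]) / c
    # rho = ReLU (assumption; the text only says "activation function").
    return [[max(0.0, x) for x in row] for row in out]


h = [[1.0], [3.0]]
edges = [(0, 1, "a"), (0, 0, "self"), (1, 1, "self")]
print(rgcn_layer(h, edges, {"a": [[2.0]], "self": [[1.0]]},
                 {"a": [0.0], "self": [0.0]}))  # [[1.0], [5.0]]
```

Calling `rgcn_layer` repeatedly on its own output, with one weight set per layer, gives the stacked propagation described in step S102.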

S103: Obtain the predicate node of each triple and its conditional probability according to the hidden states of the triple's nodes.

Step S103 specifically includes the following:

In the embodiment of the present application, the hidden state of each node is obtained with the R-GCN encoder described in step S102. The hidden state of a predicate node differs from those of the subject and object nodes: two additional binary bits are concatenated to its input feature x_t. One bit indicates whether the predicate has already been visited, and the other points to the most recently visited predicate. After encoding, the final hidden state of each predicate is taken as its representation, and the other node features of the triple are combined with the hidden state of the predicate node and fed into a softmax function for probability normalization; softmax converts the resulting vector into a probability distribution. The conditional probability of each predicate node r_i is computed with Equation 2, in which softmax is the activation function, h_r is the average pooling of the hidden states of all predicate nodes, W is a parameter of the graph edge type, R is the set of all possible edge types, and the remaining term is the mean of the hidden states over all edge-type sets.
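The two indicator bits and the softmax normalization described above can be sketched as follows. This is a minimal illustration, not the patent's code: the raw predicate scores passed to `softmax` are taken as given inputs rather than computed by Equation 2, and `predicate_input` is a hypothetical helper name.

```python
import math


def predicate_input(x_t, visited, is_last):
    """Append the two binary indicator bits to a predicate node's input feature x_t.

    visited : has this predicate already been added to the intermediate plan?
    is_last : is this the most recently visited predicate?
    """
    return list(x_t) + [1.0 if visited else 0.0, 1.0 if is_last else 0.0]


def softmax(scores):
    """Numerically stable softmax: turn raw predicate scores into probabilities."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]
```

Subtracting the maximum score before exponentiating leaves the distribution unchanged but avoids overflow for large scores.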

S104: selecting a preset number of candidate triples from all triples according to the conditional probability of the predicate node of each triple, and encoding the candidate triples to obtain encoded triples.

Step S104, selecting and encoding a preset number of candidate triples from all triples according to the conditional probability of the predicate node of each triple to obtain encoded triples, specifically comprises:

In the embodiment of the present application, fig. 3 is a schematic diagram of intermediate plan generation; the candidate selection and encoding steps correspond to the transition from the original graph to Plan 1. Specifically, the conditional probabilities of the predicate nodes obtained in step S103 are sorted in descending order, the preset number of triples whose predicate nodes have the highest conditional probabilities are selected as candidate triples, and the candidate triples are encoded according to a perfect binary tree.
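The ranking and perfect-binary-tree coding can be sketched as below. This is a minimal sketch assuming S = 2^L, as in the worked examples with N = 5 and N = 4; the code length L is passed in rather than derived from Formula 3, and `encode_candidates` is a hypothetical name.

```python
def encode_candidates(triples, probs, code_len):
    """Keep the 2**code_len triples with the highest predicate probability and
    label them with the leaf codes of a perfect binary tree, read left to right."""
    ranked = sorted(zip(triples, probs), key=lambda tp: tp[1], reverse=True)
    size = 2 ** code_len
    if len(ranked) < size:
        raise ValueError("fewer triples than leaves in the perfect binary tree")
    # leaf i (counting from the left) of a tree of depth code_len carries
    # the binary representation of i, zero-padded to code_len bits
    return {format(i, f"0{code_len}b"): triple
            for i, (triple, _) in enumerate(ranked[:size])}
```

For the N = 5, L = 2 case, this keeps the four highest-ranked triples and labels them "00", "01", "10" and "11" in descending order of probability; the lowest-ranked triple receives no code at this step.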

In this embodiment of the present application, both the total number of current candidate triples and the length of the secret information stream to be embedded at the current step are determined by the total number of triples not yet added to the intermediate plan, as given by Formula 3:

where N is the total number of triples not yet added to the intermediate plan, L is the length of the secret information stream to be embedded by the current triple, and S is the total number of candidate triples.

S105: matching the secret information stream against the preset number of encoded triples, and adding the matched triple to an intermediate plan.

Step S105, matching the secret information stream against the preset number of encoded triples and adding the matched triple to the intermediate plan, comprises:

In the embodiment of the application, to better combine information from the knowledge-graph structure and from the text, the invention generates a content plan for text realization, that is, a sequence of triples identified by their relations, called the intermediate plan. The intermediate plan is a long information sequence. Learning the intermediate plan of a knowledge graph is a step-by-step process and can be regarded as a sequential decision process. The generation flow is shown in fig. 3: the upper blue horizontal box holds the secret information stream, the lower blue vertical box is the intermediate plan, and the sequential decision process runs from left to right.

In the embodiment of the application, the length of the secret information stream to be embedded at the current step is first calculated from the number of candidate triples, and the current secret code is taken from the secret information stream according to that length. The current secret code is then matched against the encoded candidate triples, and the single matching candidate triple is appended to the intermediate plan.

S106: repeating the steps of selecting candidate triples from the triples other than the matched triples, encoding them, matching them against the secret information stream, and adding the newly matched triple to the intermediate plan, until all triples are added to the intermediate plan.

Step S106, repeating the steps of selecting candidate triples from the triples other than the matched triples, encoding them, matching the secret information stream, and adding the newly matched triple to the intermediate plan until all triples are added, comprises:

In the embodiment of the application, the predicate of a triple that has been added to the intermediate plan receives an indicator feature, so that the next time the conditional probability distribution is computed, the visited triple receives a very small probability. This ensures that the triple is not selected again and that intermediate plan generation can proceed. The operations of steps S103 to S105 are then repeated on the remaining unvisited triples: the secret information stream controls the selection of the next triple, and the selected triple is added to the intermediate plan. The loop continues until only one triple remains, which joins the plan automatically; at that point all triples have been visited.
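The loop of steps S103 to S106 can be sketched end to end as follows. This is a minimal sketch under loud assumptions: `score` is a hypothetical fixed predicate-probability function standing in for re-running the R-GCN at every step; `code_length` is a hypothetical stand-in for Formula 3 (the largest L with 2^L ≤ N), which reproduces the N = 5 and N = 4 worked examples but not the N = 26 case; and zero-padding of an exhausted bit stream is also an assumption.

```python
def code_length(n_remaining):
    """Hypothetical stand-in for Formula 3: the largest L with 2**L <= N."""
    return n_remaining.bit_length() - 1


def embed(triples, score, bits):
    """Order `triples` so that the order itself carries the bit string `bits`.

    score : hypothetical function giving each triple's predicate probability
    Returns the intermediate plan and the number of secret bits consumed.
    """
    remaining = list(triples)
    plan, pos = [], 0
    while len(remaining) > 1:
        L = code_length(len(remaining))
        # top 2**L triples by probability, i.e. the leaves "0..0" to "1..1"
        candidates = sorted(remaining, key=score, reverse=True)[: 2 ** L]
        chunk = bits[pos:pos + L].ljust(L, "0")  # assumption: pad once the stream runs out
        pos += L
        chosen = candidates[int(chunk, 2)]
        plan.append(chosen)
        remaining.remove(chosen)  # a visited triple is never offered again
    plan.extend(remaining)  # the last triple joins without carrying bits
    return plan, min(pos, len(bits))
```

With five triples and the bit string "101001", the loop consumes the codes "10", "10", "0" and "1" in turn, the same sequence as the fig. 3 worked example later in this section.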

S107: generating steganographic text with a steganographic text generation model according to the intermediate plan and the graph structure information.

In a specific embodiment, referring to FIG. 4, the steganographic text generation model may include a first graph neural network, a second graph neural network, an LSTM plan encoder, and an LSTM decoder.

In step S107, converting the intermediate plan into the steganographic text with the steganographic text generation model comprises:

inputting the graph structure into the first graph neural network for coding to obtain the graph structure information;

inputting the graph structure into the second graph neural network for reordering and embedding the secret information stream to obtain the intermediate plan;

inputting the intermediate plan into the LSTM plan encoder to obtain a serialized intermediate plan;

and inputting the serialized intermediate plan and the graph structure information into the LSTM decoder for decoding to obtain the steganographic text.

In the embodiment of the present application, the steganographic text generation model, which may also be called the dual graph-text encoding Stego model (GCN-LSTM-Stego), introduces the graph structure into the text generation model and into conditional probability coding. Its structure is shown in fig. 4: the upper half shows steganographic text generation, and the lower half shows secret information stream extraction. During text steganography, a first R-GCN encoder (the first graph neural network) captures the graph structure information of the input knowledge graph, and a second R-GCN encoder (the second graph neural network) serializes and reorders the nodes of the graph. During ordering, the secret information bit stream controls the order, so that secret information is embedded into the graph structure information to generate the intermediate plan; the intermediate plan and the graph structure information are thus two representations of the same knowledge graph.

In the embodiment of the application, once the intermediate plan is obtained, that is, once the input order of all triples has been determined, each triple is combined with its corresponding words to complete the intermediate plan. To help the LSTM plan encoder capture the semantic role of each entity and predicate, special markers are added as separators before the subject, predicate, and object.

In the embodiment of the application, after the intermediate plan and the graph structure information are obtained, the LSTM plan encoder, which captures the sequential information of the intermediate plan, encodes the plan and aligns the information between input and output to obtain the serialized intermediate plan. Finally, using the contextual representations from the encoders, decoding is performed by an LSTM decoder with attention and copy mechanisms.

As a specific example of the embodiment of the present application, the intermediate results of the steganographic text flow produced by the steganographic text generation model are shown in fig. 5. Reference Text is the original text corresponding to the knowledge graph; the model input also includes the knowledge graph, from which all of its triples can be obtained, as follows:

Alpena county regional airport|location|Maple ridge township,alpena county,michigan;

Alpena county regional airport|runway length|1533.0;

Alpena county regional airport|city served|alpena,michigan;

Alpena county regional airport|elevation above the sea level|210;

Alpena county regional airport|runway name|7/25;
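The triple listing above uses a subject|predicate|object layout, one triple per line. A minimal parser for that layout might look like this; the function name is hypothetical, and it assumes each line has at least two "|" separators (anything after the second one is kept as part of the object).

```python
def parse_triples(text):
    """Parse 'subject|predicate|object;' lines into (s, p, o) tuples."""
    triples = []
    for line in text.splitlines():
        # drop surrounding spaces and a trailing ASCII or full-width semicolon
        line = line.strip().rstrip(";；").strip()
        if not line:
            continue
        # maxsplit=2 keeps any further '|' inside the object; fewer than
        # two separators raises ValueError on unpacking
        subject, predicate, obj = (part.strip() for part in line.split("|", 2))
        triples.append((subject, predicate, obj))
    return triples
```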

In a specific embodiment, after the triples of the knowledge graph are determined, directed edges are constructed from the three nodes of each triple: each triple contributes four directed edges connecting its nodes, s→p, p→s, o→p and p→o. These edges allow information to be exchanged between any pair of neighbors. In addition, each node n has a special self-loop n→n so that information flows between successive iterations during feature aggregation. The graph structure is built from the s, p and o nodes and all directed edges. The R-GCN encoder is then used to obtain the graph structure information: each node in the graph structure is assigned an initial feature representation and a hidden state, and information propagation and hidden-state updates are carried out by stacking several graph convolution layers. Each graph convolution layer updates a node's hidden state by aggregating the hidden states of its neighbors, so the graph structure information is built up layer by layer. The predicate node and conditional probability of each triple are then obtained from the hidden states of its nodes.
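The edge construction described above can be sketched as follows. This is a hypothetical builder, not the patent's code, and it rests on one labeled assumption: entity nodes are shared by name across triples, while every triple gets its own predicate node (so each triple has a distinct predicate whose probability can be ranked). The edge-type strings are illustrative.

```python
def build_graph(triples):
    """Return (nodes, edges) with edges as (source, edge_type, target)."""
    nodes, edges = [], []

    def add(node):
        if node not in nodes:
            nodes.append(node)

    for k, (s, p, o) in enumerate(triples):
        # assumption: entities shared by name, one predicate node per triple
        s_node, p_node, o_node = ("ent", s), ("pred", k, p), ("ent", o)
        for n in (s_node, p_node, o_node):
            add(n)
        # the four directed edges s->p, p->s, o->p, p->o of each triple
        edges += [
            (s_node, "s->p", p_node), (p_node, "p->s", s_node),
            (o_node, "o->p", p_node), (p_node, "p->o", o_node),
        ]
    # one self-loop n->n per node, so a node keeps its own state between layers
    edges += [(n, "self", n) for n in nodes]
    return nodes, edges
```

For the five Alpena triples, this yields one shared subject node, five predicate nodes, and five object nodes.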

In a specific embodiment, referring to the original knowledge graph in fig. 3, the triples in solid black boxes are unselected. The original graph has 5 unselected triples, so N = 5, which gives L = 2 and S = 4: the current triple can embed a 2-bit binary code, and there are 4 candidate triples. As shown in Plan 1 of fig. 3, the candidate triples (in dotted black boxes) are selected by ranking the conditional probabilities of the predicate nodes of the 5 triples.

The selected candidate triples are then encoded based on a perfect binary tree so that every code has length 2. Only the leaf nodes of the perfect binary tree carry codes; the non-leaf nodes serve only to build the tree structure. With 4 candidate triples and a code length of 2, there are 2^2 = 4 codes. Following the structure of the perfect binary tree, the leaf nodes from left to right carry the codes "00", "01", "10" and "11", and these are assigned to the 4 candidate triples in descending order of predicate-node conditional probability, giving each candidate triple a unique code.

The length of the secret information stream to be embedded at this step is L = 2. According to the current embedding progress of the secret information stream, the secret code to be embedded is "10", so the candidate triple coded "10" is matched among the encoded candidates and added to the intermediate plan, as shown in Plan 1 of fig. 3.

If more triples remain outside the intermediate plan, for example N = 26, the current triple must embed a secret information stream of length L = 5 while the total number of candidate triples is S = 10. A 5-bit code has 2^5 = 32 possible values but there are only 10 candidate triples, so the current secret code may match none of them. In that case, the first four bits are taken for further matching, and this is repeated until a match is found.

In a specific embodiment, referring to FIG. 3, after the candidate triple coded "10" is added to the intermediate plan in Plan 1, that triple no longer appears in the subsequent diagrams; it is not deleted, but the conditional probability of its predicate node is minimized. In Plan 2 of fig. 3, 4 triples remain unselected, so N = 4, which again gives L = 2 and S = 4: the current triple can embed a 2-bit binary code, and there are 4 candidate triples. The candidates are encoded based on a perfect binary tree as before: with 4 candidates and a code length of 2 there are 2^2 = 4 codes, and the leaf codes "00", "01", "10" and "11", read from left to right, are assigned to the 4 candidate triples in descending order of predicate-node conditional probability. Here L = 2, and according to the current embedding progress the secret code to be embedded is "10", so the candidate triple coded "10" is matched and added to the intermediate plan. Continuing in the same way, the candidate triple coded "0" and then the candidate triple coded "1" are added to the intermediate plan. Only the last triple then remains; it joins the intermediate plan automatically without embedding any further secret information, and the final intermediate plan is obtained.

After the intermediate plan is obtained with the R-GCN encoder, the special markers "S|", "P|" and "O|" are added as separators before the subject, predicate and object to help the LSTM plan encoder capture the semantic role of each entity and predicate. Finally, decoding is performed by the LSTM decoder.
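Linearizing an ordered plan with these separators can be sketched as below. The marker strings come from the text; the function name and the choice to join entity words with single spaces are assumptions about tokenization.

```python
def linearize_plan(plan):
    """Flatten an ordered list of (subject, predicate, object) triples into the
    marked-up sequence fed to the LSTM plan encoder."""
    tokens = []
    for s, p, o in plan:
        tokens += ["S|", s, "P|", p, "O|", o]
    return " ".join(tokens)
```

For example, the triple (Alpena county regional airport, runway name, 7/25) becomes "S| Alpena county regional airport P| runway name O| 7/25".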

In one embodiment, after the LSTM decoder decodes the steganographic text, the text is translated into the target language using the OpenNMT toolkit, an open-source neural machine translation system built on Torch and released by Harvard NLP. It is optimized for fast training and efficient GPU and memory use, and is easy to use. Because the generation task is relatively complex, as a specific implementation of this embodiment, the dimension of the input and hidden states in graph encoding may be set to 256. The two R-GCN encoders have the same structure, each with two GCN layers of hidden size 100, and the activation function is ReLU. The training objective is optimized with the Adam optimizer at a learning rate of 0.001, with early stopping on the development set. The LSTM plan encoder is a 2-layer bidirectional LSTM with the same dimension settings as the R-GCN encoder to facilitate information fusion.
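The settings listed above can be collected in one configuration object, which is convenient when reproducing the setup. This is an illustrative sketch only: the field names are hypothetical and do not correspond to OpenNMT options, and `plan_lstm_dim = 256` is an interpretation of "same dimension settings as the R-GCN encoder".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StegoModelConfig:
    """Hyperparameters stated in this embodiment; field names are illustrative."""
    hidden_dim: int = 256             # dimension of the input and hidden states
    rgcn_num_layers: int = 2          # GCN layers in each of the two identical R-GCN encoders
    rgcn_layer_size: int = 100        # hidden size of each GCN layer
    activation: str = "relu"
    optimizer: str = "adam"
    learning_rate: float = 0.001
    early_stop_on_dev: bool = True    # early stopping based on the development set
    plan_lstm_layers: int = 2         # depth of the LSTM plan encoder
    plan_lstm_bidirectional: bool = True
    plan_lstm_dim: int = 256          # assumption: "same dimension settings" read as hidden_dim
```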

The model provided by the invention achieves better semantic expression by introducing graph structure information into the text generation process through R-GCN. Using one R-GCN encoder to capture the structural information of the input knowledge graph and another R-GCN encoder to serialize and reorder the graph nodes makes the generated text more natural. Because the text generation process itself is not interfered with and only the order in which triples are generated is adjusted, the difference in statistical distribution between normal text and steganographic text is reduced, and the method also performs better in resisting steganalysis.

Steganalysis is the process of determining, from observed data, whether secret information is present, with the embedding algorithm either known or unknown. It is the natural adversary of information hiding and an important measure of whether a steganographic method is secure. The resistance of the method of this embodiment to steganalysis was compared with that of two prior-art text steganography methods, RNN-Stega and VAE-Stega, and the results are shown in fig. 6. Across the four metrics of accuracy, precision, recall and F1-score, the steganalysis results on the steganographic text generated by the model of this application are the lowest compared with RNN-Stega and VAE-Stega. The steganographic text produced by this model is therefore harder to detect, and its resistance to steganalysis is clearly superior to that of the steganographic text generated by RNN-Stega and VAE-Stega.

Example two

Based on the same inventive concept, the embodiment of the application also provides a method for extracting text secret information based on a knowledge graph, and referring to fig. 7, the method comprises the following steps:

S201: acquiring entities and relations in the steganographic text;

S202: matching the entities and relations in the steganographic text with the entities and relations in the knowledge graph data set to obtain the knowledge graph corresponding to the steganographic text;

S203: obtaining all triples according to the knowledge graph, and constructing a graph structure from the triples through graph coding;

S204: determining the hidden state of each node of each triple according to the graph structure information;

S205: obtaining the predicate node and conditional probability of each triple according to the hidden states of its nodes;

S206: selecting a preset number of candidate triples from all triples according to the conditional probability of the predicate node of each triple, and encoding the candidate triples to obtain encoded triples;

S207: performing keyword matching between the steganographic text and the preset number of encoded triples, obtaining the code of the matched triple, and adding the matched triple to an intermediate plan;

S208: repeating the steps of selecting candidate triples from the triples other than the matched triples, encoding them, performing keyword matching with the steganographic text, and adding each newly matched triple to the intermediate plan, until all triples are added to the intermediate plan;

S209: concatenating the obtained codes in order to obtain the secret information stream.
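Steps S205 to S209 can be sketched as the inverse of the embedding procedure of S103 to S106. This is a minimal sketch assuming keyword matching (S207) has already recovered the order in which the triples appear in the steganographic text; `score` is a hypothetical predicate-probability function that must equal the one used during embedding, and `code_length` is the same hypothetical stand-in for Formula 3 (largest L with 2^L ≤ N).

```python
def code_length(n_remaining):
    """Hypothetical stand-in for Formula 3: the largest L with 2**L <= N."""
    return n_remaining.bit_length() - 1


def extract(plan, score):
    """Recover the secret bits from the order of triples found in the stego text.

    plan  : triples in the order they were matched in the text (assumed given)
    score : the same hypothetical predicate-probability function used to embed
    """
    remaining = list(plan)
    bits = []
    for chosen in plan[:-1]:  # the final triple carries no bits
        L = code_length(len(remaining))
        candidates = sorted(remaining, key=score, reverse=True)[: 2 ** L]
        # the matched triple's leaf position is its code;
        # ValueError here means the order is inconsistent with the ranking
        idx = candidates.index(chosen)
        bits.append(format(idx, f"0{L}b"))
        remaining.remove(chosen)
    return "".join(bits)
```

Because the receiver recomputes the same ranking at every step, the position of each matched triple among the candidates reproduces its perfect-binary-tree code, and concatenating those codes gives the secret information stream (S209).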

Example III

Based on the same inventive concept, the embodiment of the present application further provides a text generating steganography device based on a knowledge graph, referring to fig. 8, the device includes:

the graph coding module 101 is configured to acquire all triples of the knowledge graph matched according to a predetermined topic, and to construct a graph structure from the triples through graph coding; the triples comprise subject nodes, object nodes and predicate nodes;

a structure information obtaining module 102, configured to determine, according to the graph structure, graph structure information and a hidden state of each node of each triplet;

an intermediate plan generating module 103, configured to obtain predicate nodes and conditional probabilities of corresponding triples according to the hidden states of the nodes of each triplet; selecting a preset number of candidate triples from all triples according to the conditional probability of the predicate node of each triplet, and coding the candidate triples to obtain coded triples; matching the secret information stream with the preset number of the coded triples, and adding the matched triples into an intermediate plan; repeating the steps of selecting candidate triples from the triples except the matched triples to encode and match the secret information stream, and adding the newly matched triples to the intermediate plan until all triples are added to the intermediate plan;

The steganographic text generation module 104 is configured to generate steganographic text according to the intermediate plan and the graph structure information based on a steganographic text generation model.

Example IV

Based on the same inventive concept, the embodiment of the application further provides a knowledge-graph-based text secret information extraction device; referring to fig. 9, the device includes:

the knowledge graph matching module 201 is configured to obtain entities and relationships in the steganographic text;

the graph coding module 202 is configured to obtain all triples according to the knowledge graph, and to construct a graph structure from the triples through graph coding;

the structure information obtaining module 203 is configured to determine, according to the graph structure, graph structure information and a hidden state of each node of each triplet;

an intermediate plan generating module 204, configured to obtain predicate nodes and conditional probabilities of corresponding triples according to the hidden states of the nodes of each triplet; selecting a preset number of candidate triples from all triples according to the conditional probability of the predicate node of each triplet, and coding the candidate triples to obtain coded triples; keyword matching is carried out according to the steganographic text and the preset number of the coded triples, the codes of the matched triples are obtained, and the matched triples are added into an intermediate plan; repeating the steps of selecting candidate triples from the triples except the matched triples to encode and match keywords with the steganographic text, and adding new matched triples to the intermediate plan until all triples are added to the intermediate plan;

The secret information obtaining module 205 is configured to sequentially connect the obtained codes to obtain a secret information stream.

Example five

Based on the same inventive concept, the embodiments of the present application further provide a computer-readable storage medium having stored thereon a computer program that, when executed by a processor, implements the knowledge-graph-based text generation steganography method described in the first embodiment and/or the knowledge-graph-based text secret information extraction method described in the second embodiment.

Example six

Based on the same inventive concept, the embodiments of the present application further provide a computer device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor implements the knowledge-graph-based text generation steganography method described in the first embodiment and/or the knowledge-graph-based text secret information extraction method described in the second embodiment when executing the computer program.

Example seven

Based on the same inventive concept, the embodiments of the present application further provide a computer program product comprising instructions, which when run on a computer device, cause the computer device to perform the knowledge-graph based text generation steganography method described in the above embodiment one and/or the knowledge-graph based text secret information extraction method described in the above embodiment two.

Example eight

Based on the same inventive concept, the embodiments of the present application further provide a chip, where the chip includes a processor and a communication interface, where the communication interface is coupled to the processor, and the processor is configured to execute a computer program or instructions to implement the knowledge-graph-based text generation steganography method described in the first embodiment and/or the knowledge-graph-based text secret information extraction method described in the second embodiment.

It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, magnetic disk storage, optical storage, and the like) having computer-usable program code embodied therein.

The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

It will be apparent to those skilled in the art that various modifications and variations can be made to the present invention without departing from the spirit or scope of the invention. Thus, it is intended that the present invention also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.

Claims

1. A knowledge-graph-based text generation steganography method, comprising:

2. The knowledge-graph-based text generation steganography method of claim 1, wherein the obtaining all triples of the knowledge-graph that match according to a predetermined topic, building the triples into a graph structure by graph coding, includes:

3. The knowledge-graph-based text generation steganography method of claim 2, wherein determining graph structure information and hidden states of nodes of each triplet based on the graph structure comprises:

4. The knowledge-graph-based text generation steganography method of claim 3, wherein the obtaining predicate nodes of the corresponding triples and conditional probabilities from hidden states of nodes of each triplet includes:

wherein r_i is a predicate node; softmax is the activation function; h_r is the average pooling of the hidden states of all predicate nodes; W is a parameter of the graph edge type; R is the set of all possible edge types; and the remaining term is the mean of the hidden states over all edge-type sets.

5. The knowledge-graph-based text generation steganography method of claim 4, wherein the selecting and encoding a predetermined number of candidate triples from the all triples according to the conditional probability of the predicate node of each triplet, to obtain the encoded triples, includes:

6. The knowledge-graph-based text generation steganography method of claim 5, wherein matching the secret information stream with the preset number of encoded triples and adding the matched triple to an intermediate plan comprises:

7. The knowledge-graph-based text generation steganography method of claim 1, wherein repeatedly performing the steps of selecting candidate triples from the triples other than the matched triples, encoding them, matching the secret information stream, and adding the newly matched triple to the intermediate plan comprises:

8. A knowledge-graph-based text secret information extraction method, comprising the following steps:

acquiring entities and relations in the steganographic text;

Obtaining all triples according to the knowledge graph, and constructing a graph structure by the triples through graph coding;

9. A knowledge-graph-based text generation steganography device, comprising:

the structure information acquisition module is used for determining the graph structure information and the hidden state of each node of each triplet according to the graph structure;

the intermediate plan generating module is used for obtaining predicate nodes and conditional probability of the corresponding triples according to the hidden states of the nodes of each triplet; selecting a preset number of candidate triples from all triples according to the conditional probability of the predicate node of each triplet, and coding the candidate triples to obtain coded triples; matching the secret information stream with the preset number of the coded triples, and adding the matched triples into an intermediate plan; repeating the steps of selecting candidate triples from the triples except the matched triples to encode and match the secret information stream, and adding the newly matched triples to the intermediate plan until all triples are added to the intermediate plan;

10. A knowledge-graph-based text secret information extraction device, comprising:

the knowledge graph matching module is used for acquiring entities and relations in the steganographic text;

the intermediate plan generating module is used for obtaining predicate nodes and conditional probability of the corresponding triples according to the hidden states of the nodes of each triplet; selecting a preset number of candidate triples from all triples according to the conditional probability of the predicate node of each triplet, and coding the candidate triples to obtain coded triples; keyword matching is carried out according to the steganographic text and the preset number of the coded triples, the codes of the matched triples are obtained, and the matched triples are added into an intermediate plan; repeating the steps of selecting candidate triples from the triples except the matched triples to encode and match keywords with the steganographic text, and adding new matched triples to the intermediate plan until all triples are added to the intermediate plan;