CN111666575B - Text carrier-free information hiding method based on word element coding - Google Patents

Text carrier-free information hiding method based on word element coding

Publication number
CN111666575B
Authority
CN
China
Prior art keywords
node
text
lemma
information
word element
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010295993.5A
Other languages
Chinese (zh)
Other versions
CN111666575A (en)
Inventor
王晓梅
张维
张晨旭
吴亚男
安鑫
陈兴强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Information Engineering University of PLA Strategic Support Force
Original Assignee
Information Engineering University of PLA Strategic Support Force
Application filed by Information Engineering University of PLA Strategic Support Force
Priority to CN202010295993.5A
Publication of CN111666575A
Application granted
Publication of CN111666575B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/12 Use of codes for handling textual entities
    • G06F40/126 Character encoding
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31 Indexing; Data structures therefor; Storage structures
    • G06F16/316 Indexing structures
    • G06F16/322 Trees
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/602 Providing cryptographic facilities or services
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Hardware Design (AREA)
  • Bioethics (AREA)
  • Storage Device Security (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The invention discloses a text carrier-free information hiding method based on lemma coding, which uses text as the carrier and achieves covert transmission of secret information through lemma coding. The specific steps are: establishing a dynamically updated text library and normalizing the format of the texts through preprocessing; obtaining the lemma sequence of each text with a word segmentation module, forming a lemma index file, and constructing a lemma node tree from the lemma index file; arranging the adjacent child nodes of each non-leaf lemma node in descending order of transition probability and encoding the adjacent paths to those child nodes; constructing an isomorphic text set for the source path of each lemma node; and having the sender retrieve the text corresponding to the secret information and send it to the receiver, who extracts the secret information through the corresponding inverse transformation. Compared with existing text carrier-free information hiding techniques, the method resists existing steganography detection techniques while significantly improving the embedding capacity, which greatly broadens the application scenarios of carrier-free information hiding.

Description

Text carrier-free information hiding method based on word element coding
Technical Field
The invention belongs to the technical field of information security, and particularly relates to a text carrier-free information hiding method based on word element coding.
Background
The development of network and communication technology has greatly advanced productivity and has become an indispensable pillar of social development. Because of the openness of the internet, data security risks are increasingly complex, and the concealment and security of communication activities urgently need to be strengthened.
Information hiding technology embeds preprocessed secret information into a selected carrier without affecting the normal function of the digital carrier, and the information is conveyed by transmitting that carrier. Compared with encryption, information hiding is better at removing the perceptibility of the secret information. In practice, however, conventional information hiding inevitably modifies the carrier at some granularity, which changes the statistical characteristics of the carrier and makes it hard to withstand targeted steganography detection attacks. Against this background, the concept of carrier-free information hiding has attracted the attention of researchers. A carrier-free information hiding method is driven by the secret information: it directly retrieves and transmits natural texts that meet the requirements, and the receiver extracts the secret information according to agreed rules. Because it does not modify the carrier, a carrier-free method can resist existing steganography detection means. Carrier-free information hiding can therefore achieve covert transmission of key data, offers clear advantages in concealment and detection resistance, and has further promoted the rapid development of information hiding technology.
Research on carrier-free information hiding that takes text as its object mainly includes the following. Document 1 (jihong yong, chapter j, sun star. Text carrierless information hiding scheme [C]// national information hiding and multimedia information security academic convention. 2016.), based on a single keyword, cuts the secret information into keyword form, generates a positioning tag from user identity information, and retrieves and sends a natural text containing the combination of tag and keyword; the receiver extracts the secret information according to the tag. Document 2 (Zhou Z, Mu Y, Zhao N, et al. Coverless Information Hiding Method Based on Multi-keywords [J]. 2016.) hides the number of keywords in a text through the part of speech of a word and eliminates tag ambiguity during extraction by screening and reassigning tags, with a slight increase in hiding capacity. Document 3 (Zhang J, Wang L, Lin H. Coverless Text Information Hiding Method Based on the Rank Map [J].) converts the secret information into common words of the text set using a word conversion protocol and guides the positioning of the common words with word-rank tags, thereby embedding and extracting the secret information. Document 4 (Zhang J, Huang H, Wang L, et al. Coverless text information hiding method using the frequent words hash [J]. International Journal of Network Security, 2017, 19(6): 1016-1023.) defines a common-word distance for a text, selects the positioning tag for the converted secret information through the common-word distance and the word-rank tag positioning protocol, and directly retrieves a text containing the converted secret information and the corresponding word-rank tag as the steganographic carrier. Document 5 (Xianyi C, Sun C. Text coverless information hiding based on compound and selection of words [J]. Soft Computing, 2018.) uses the parity of the Unicode encoding of Chinese characters as the tag and commonly used compound words as keywords, further improving the hiding success rate and hiding capacity. Document 6 (Xianyi C, Sun C. Text coverless information hiding based on compound and selection of words [J]. Soft Computing, 2018.) uses word2vec to obtain words similar to the keyword as replacements when a search mismatch occurs, which significantly improves the hiding success rate.
These results can all be classified as carrier-free information hiding methods based on a tag model. In such methods the secret information, or its converted form, exists only in a specific part of the text (such as a specific keyword). In other words, only specific positions of the carrier text convey secret information, while the main body merely preserves the normal semantics and complete structure of the text and does not represent any specific information. Each text hides only 1 to 2.87 Chinese characters on average, so the hiding capacity is very limited. In addition, because tags and keywords must be combined, hiding success depends closely on the size of the text library and its vocabulary coverage; rare keywords often cannot be matched, which lowers the hiding success rate.
In view of this, the invention realizes text carrier-free information hiding based on lemma coding. Compared with the existing research results, the method has a stable hiding success rate and significantly improves the hiding capacity of the carrier text. Because the method does not alter the natural text, it can resist existing steganography detection means and offers good concealment and security.
Disclosure of Invention
To address the unstable hiding success rate and low hiding capacity of existing text carrier-free hiding methods, the invention provides a text carrier-free information hiding method based on lemma coding that significantly improves both.
To achieve this purpose, the invention adopts the following technical scheme:
A text carrier-free information hiding method based on lemma coding comprises the following steps:
Step 1: establishing a dynamically updated text library C, and preprocessing each text in the text library C;
Step 2: sequentially reading the preprocessed text contents, extracting the lemma information, and constructing a lemma node tree G from the extracted lemma information;
Step 3: traversing the lemma node tree G, arranging the adjacent child nodes of every non-leaf lemma node in descending order of transition probability, and encoding the adjacent paths of the lemma node;
Step 4: traversing the lemma node tree G, and constructing an isomorphic text set for the source path of each lemma node;
Step 5: encrypting the secret information, determining the source path of a lemma node according to the lemma node tree G and the encrypted bit stream, and selecting a secret-carrying text from the corresponding isomorphic text set and sending it;
Step 6: receiving the secret-carrying text, extracting its lemma information, recovering the encrypted bit stream from the lemma information according to the lemma node tree G, and extracting the secret information through the corresponding inverse transformation.
Further, step 1 comprises:
Step 1.1: removing the stop words and non-Chinese characters from each text in the text library C;
Step 1.2: screening the texts in the text library C by text length, and removing texts whose length deviates from the preset value.
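The preprocessing of step 1 can be sketched roughly as follows. This is a minimal illustration, not the invention's implementation: the stop-word list, the use of the basic CJK Unified Ideographs range to define "Chinese characters", and the `preset`/`tolerance` parameters are all assumptions made for the example.

```python
import re

# Hypothetical stop-word list; in practice both parties would agree on one.
STOP_WORDS = {"的", "了", "是"}

# Matches every character outside the basic CJK Unified Ideographs block
# (an assumed definition of "non-Chinese characters").
NON_CHINESE = re.compile(r"[^\u4e00-\u9fff]")


def strip_non_chinese(text):
    """Step 1.1 (part): drop every character that is not a Chinese character."""
    return NON_CHINESE.sub("", text)


def remove_stop_words(lemmas, stop_words=STOP_WORDS):
    """Step 1.1 (part): drop stop words from an already segmented text."""
    return [w for w in lemmas if w not in stop_words]


def filter_by_length(texts, preset, tolerance):
    """Step 1.2: keep texts whose length is within `tolerance` of `preset`."""
    return [t for t in texts if abs(len(t) - preset) <= tolerance]
```

Stop words are removed here from a segmented lemma list because stop words are usually whole words; a character-level filter would also be possible if the agreed list contains only single characters.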
Further, step 2 comprises:
Step 2.1: sequentially reading the preprocessed text contents, extracting the lemma content, position index, and available text link for each text, and storing them to form the lemma index file;
Step 2.2: querying the lemma index file obtained in step 2.1, aggregating the lemmas that have position index 1 and the same content into one node, which serves as a first-layer lemma node of the lemma node tree G, and storing each node with the structure: lemma node identifier, parent node identifier, position index, lemma content, and available text link set;
Step 2.3: let $V_i$ be the set of layer-$i$ lemma nodes of the lemma node tree G and $v_{i,j}$ the $j$-th lemma node of layer $i$. Let $i = 2$. For each $v_{i-1,j} \in V_{i-1}$, read the text content of the available text link set of $v_{i-1,j}$ and aggregate the lemmas in those texts that have position index $i$ and the same content into one node, which becomes a child node of $v_{i-1,j}$; continue until all lemma nodes in $V_{i-1}$ have been processed, which yields the layer-$i$ lemma nodes of the lemma node tree;
Step 2.4: let $i = i + 1$ and repeat steps 2.3 and 2.4 until the whole lemma index file has been processed, which yields the lemma node tree G of the text library C.
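The tree construction of step 2 can be sketched as below. This is an illustrative sketch under stated assumptions: the `LemmaNode` class and `build_lemma_tree` function are hypothetical names, the input is assumed to be already segmented, and the tree is filled text by text rather than layer by layer, which should produce the same nodes as the layer-wise procedure described above.

```python
from dataclasses import dataclass, field


@dataclass
class LemmaNode:
    node_id: int
    parent_id: int          # -1 marks a first-layer node (no parent)
    position: int           # 1-based position index of the lemma
    content: str
    texts: set = field(default_factory=set)       # available text link set
    children: dict = field(default_factory=dict)  # lemma content -> LemmaNode


def build_lemma_tree(corpus):
    """corpus: dict mapping a text link to its list of lemmas.

    Returns (first_layer, nodes): first_layer maps lemma content to the
    first-layer nodes; nodes lists every node in creation order.
    """
    first_layer, nodes = {}, []
    for link, lemmas in corpus.items():
        level, parent_id = first_layer, -1
        for position, lemma in enumerate(lemmas, start=1):
            node = level.get(lemma)
            if node is None:
                # Same position and same content under the same parent
                # are aggregated into one node.
                node = LemmaNode(len(nodes), parent_id, position, lemma)
                level[lemma] = node
                nodes.append(node)
            node.texts.add(link)
            level, parent_id = node.children, node.node_id
    return first_layer, nodes
```

Each node carries the five fields named in step 2.2; the `children` dictionary is an extra convenience for traversal and is not part of the described storage structure.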
Further, step 3 comprises:
Step 3.1: sequentially importing each non-leaf lemma node and arranging its adjacent child nodes in descending order of transition probability; the transition probability of a lemma node is
$P(S_j \mid S_i) = \dfrac{T_j}{\sum T}$
where $S_j$ is an adjacent child node of $S_i$, $T_j$ is the number of available text links of lemma node $S_j$, and $\sum T$ is the total number of available text links over all adjacent child nodes of lemma node $S_i$;
Step 3.2: obtaining the number $n$ of adjacent paths of each non-leaf lemma node; if $n \geq 2$, the adjacent paths of the lemma node are encoded with $m = \lfloor \log_2 n \rfloor$ bits; if $n < 2$, the node is skipped.
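Step 3 can be sketched for a single node as follows. Two points are assumptions not stated in the text: codes are assigned in rank order (the most probable child gets the all-zero code), and ties in probability are broken by child name so that sender and receiver derive the same ordering. The function name `encode_adjacent_paths` is hypothetical.

```python
def encode_adjacent_paths(child_text_counts):
    """child_text_counts: dict mapping a child node name to T_j, the
    number of available text links of that child.

    Returns (ordered, codes): the children in descending transition
    probability, and a dict child -> m-bit code for the first 2**m children.
    """
    total = sum(child_text_counts.values())
    # P(S_j | S_i) = T_j / sum(T); ties broken by name (an assumption).
    ordered = sorted(child_text_counts,
                     key=lambda c: (-child_text_counts[c] / total, c))
    n = len(ordered)
    if n < 2:
        return ordered, {}            # node is skipped: nothing to encode
    m = n.bit_length() - 1            # exact integer floor(log2(n))
    codes = {child: format(rank, f"0{m}b")
             for rank, child in enumerate(ordered[: 2 ** m])}
    return ordered, codes
```

With five children, $m = 2$ and only the four most probable children receive a code; the fifth adjacent path stays uncoded, which is the situation step 4 handles.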
Further, step 4 comprises:
Step 4.1: sequentially importing each lemma node; if the lemma node is a leaf node, its available text link set is used as the isomorphic text set of the source path of the lemma node; if the lemma node is a non-leaf node, judging whether it has an adjacent path that is not encoded;
Step 4.2: if such a path exists, the available text link set of the child node on the uncoded adjacent path is used as the isomorphic text set of the lemma node, and the uncoded adjacent path and its subsequent lemma nodes are deleted from the lemma node tree G;
Step 4.3: if no such path exists, part of the texts in the available text link set of the lemma node are selected as the isomorphic text set of the source path of the lemma node, and the corresponding texts are deleted from the available text link sets of its child nodes.
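One possible reading of step 4 is sketched below. It assumes each node is a dictionary whose children have already been split into `coded` and `uncoded` lists by step 3, removes reserved texts from all descendants (not only direct children), and uses a hypothetical selection rule for step 4.3 (the `pick` lexicographically smallest text links). None of these choices is prescribed by the text.

```python
def _remove_texts(node, texts):
    """Delete the given text links from every descendant of `node`."""
    for child in node["coded"] + node["uncoded"]:
        child["texts"] -= texts
        _remove_texts(child, texts)


def build_isomorphic_sets(node, pick=1):
    """node: dict with keys 'texts' (set of text links), 'coded' and
    'uncoded' (lists of child nodes on coded / uncoded adjacent paths).
    Stores the isomorphic text set of each remaining node under 'iso'.
    """
    if not node["coded"] and not node["uncoded"]:
        # Step 4.1: a leaf node uses its own available text link set.
        node["iso"] = set(node["texts"])
    elif node["uncoded"]:
        # Step 4.2: texts reached through uncoded paths identify this node,
        # and the uncoded subtrees are pruned from the tree.
        node["iso"] = set().union(*(c["texts"] for c in node["uncoded"]))
        node["uncoded"] = []
    else:
        # Step 4.3: reserve part of the node's texts (hypothetical rule)
        # and remove them below so extraction stays unambiguous.
        chosen = set(sorted(node["texts"])[:pick])
        node["iso"] = chosen
        _remove_texts(node, chosen)
    for child in node["coded"]:
        build_isomorphic_sets(child, pick)
```

A practical rule for step 4.3 would also need to ensure that no child is left with an empty text set, which this sketch does not check.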
Further, step 5 comprises:
Step 5.1: the secret information M is encrypted and encapsulated in an information frame structure; the total length of the information frame does not exceed the hiding capacity upper limit N of the lemma node tree G,
$N = \sum_{i=1}^{n} B_i$
where $n$ denotes the number of layers of the lemma node tree G and $B_i$ denotes the highest number of encoding bits on layer $i$;
Step 5.2: querying the lemma node tree G, determining the source path of a lemma node according to the information frame, selecting a secret-carrying text from the isomorphic text set of that path, and sending the secret-carrying text.
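The capacity bound of step 5.1 can be checked with a few lines. The input format (one list of code lengths per layer) and the names `hiding_capacity` and `fits` are assumptions made for this sketch.

```python
def hiding_capacity(bits_per_layer):
    """bits_per_layer[i] lists the code lengths m of the encoded nodes on
    layer i + 1. Returns N, the sum over layers of the largest code length."""
    return sum(max(layer, default=0) for layer in bits_per_layer)


def fits(frame_bits, bits_per_layer):
    """True if the information frame does not exceed the upper limit N."""
    return len(frame_bits) <= hiding_capacity(bits_per_layer)
```

Because N takes the largest code length on each layer, it is an upper limit: a particular source path may carry fewer bits if it passes through nodes with fewer adjacent paths.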
Further, step 6 comprises:
Step 6.1: receiving the secret-carrying text, extracting its lemma information, converting the lemma information into an encrypted bit stream according to the lemma node tree G, and extracting the encrypted information segment and the check code through the information frame structure corresponding to the bit stream; the check code is used to judge whether the received secret-carrying text has been tampered with; if so, the channel is changed and the text is retransmitted; if not, the next step is executed;
Step 6.2: combining the encrypted information segments in the order in which the secret-carrying texts were received, and decrypting them to extract the original secret information.
Compared with the prior art, the invention has the following beneficial effects:
1) The invention significantly improves the hiding capacity of the secret-carrying text and avoids anomalies such as the dense transmission of a large number of texts caused by too small a hiding capacity, which makes the transmission more concealed and more secure;
2) The method does not modify the carrier text; the transmitted secret-carrying text has a complete semantic structure, normal statistical characteristics, and good readability, and can resist the various existing steganography detection means;
3) The invention integrates encryption and check coding, can resist potential tampering attacks during transmission, and has a certain robustness;
4) The choice of text set, word segmentation mode, coding mode, and construction rule for the isomorphic text sets all directly affect the mapping between texts and the source paths of lemma nodes, so an unauthorized party can hardly recover the secret information and data security is effectively guaranteed.
Drawings
Fig. 1 is a basic flowchart of the text carrier-free information hiding method based on lemma coding according to an embodiment of the present invention;
Fig. 2 is an example diagram of a lemma node tree of the text carrier-free information hiding method based on lemma coding according to an embodiment of the present invention;
Fig. 3 is an example diagram of an encoded lemma node tree of the text carrier-free information hiding method based on lemma coding according to an embodiment of the present invention.
Detailed Description
In a natural language processing model based on statistical features, a given sentence $S = (W_1 W_2 W_3 \ldots W_n)$ has the occurrence probability $P(S) = P(W_1)\,P(W_2 \mid W_1)\,P(W_3 \mid W_1 W_2) \cdots P(W_n \mid W_1 W_2 W_3 \ldots W_{n-1})$. The formula shows that the probabilities of the individual lemmas are not independent: the probability of generating the $n$-th word is determined by the preceding $n-1$ words $(W_1, W_2, \ldots, W_{n-1})$. The model reveals the dependency between lemmas and provides a new mapping space for carrier-free information hiding.
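As a small worked instance of this factorization, take a three-lemma sentence. The conditional probabilities used here are hypothetical values chosen only to illustrate the arithmetic; they are not figures from the invention.

```latex
\begin{aligned}
P(W_1 W_2 W_3) &= P(W_1)\,P(W_2 \mid W_1)\,P(W_3 \mid W_1 W_2) \\
               &= 0.1 \times 0.5 \times 0.2 = 0.01
\end{aligned}
```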
After each text of the designated text library is preprocessed (stop-word removal, word segmentation, and so on) and its lemma set is obtained, two observations follow: (1) owing to factors such as subject, emotion, and writing style, the lemma sets of texts differ markedly even when the texts have similar content; (2) the connection relations between different lemmas differ considerably, and any lemma $W_n$ is often closely related to only a specific range of lemmas (for example, within compound words). On this basis, the invention uses the transition process between lemma nodes to represent information. To explain the basic idea and implementation details of the invention, the relevant terms are defined as follows:
Lemma: a basic element that forms a sentence, including single words, expressions, compound phrases, and the like;
Lemma node tree: a tree G = (V, E) built from the lemma nodes and the connection relations between them, where the set V stores the information of the lemma nodes and the set E stores the connection relations between the lemma nodes;
Leaf node: for a given lemma node tree G, if lemma node $S_i$ has no child nodes, $S_i$ is a leaf node of the lemma node tree G;
Source path: let $S_i$ and $S_j$ be the start node and end node of a path p; if lemma node $S_i$ is the root node of the lemma node tree, the path p is called the source path of lemma node $S_j$;
Adjacent path: let $S_i$ and $S_j$ be the start node and end node of a path p; if lemma node $S_j$ is an adjacent child node of $S_i$, the path p is called an adjacent path of lemma node $S_i$;
Isomorphic text set: if a text can identify the source path of lemma node S, the text is called an isomorphic text of the source path of lemma node S, and the set of such texts is the isomorphic text set.
The invention is further illustrated by the following example in conjunction with the accompanying drawings:
Example 1
As shown in Fig. 1, a text carrier-free information hiding method based on lemma coding includes:
Step 101: establishing a dynamically updated text library C, and preprocessing each text in the text library C;
Step 102: sequentially reading the preprocessed text contents, extracting the lemma information, and constructing a lemma node tree G from the extracted lemma information;
Step 103: traversing the lemma node tree G, arranging the adjacent child nodes of every non-leaf lemma node in descending order of transition probability, and encoding the adjacent paths of the lemma node;
Step 104: traversing the lemma node tree G, and constructing an isomorphic text set for the source path of each lemma node;
Step 105: encrypting the secret information, determining the source path of a lemma node according to the lemma node tree G and the encrypted bit stream, and selecting a secret-carrying text from the corresponding isomorphic text set and sending it;
Step 106: receiving the secret-carrying text, extracting its lemma information, recovering the encrypted bit stream from the lemma information according to the lemma node tree G, and extracting the secret information through the corresponding inverse transformation.
Further, step 101 includes:
Step 101.1: removing the stop words and non-Chinese characters from each text in the text library C;
Step 101.2: screening the texts in the text library C by text length and removing texts whose length deviates significantly from the preset value. The preset value can be chosen by the user according to the communication scenario, so that texts of suitable length are used for transmission (for example, short texts for conveying secret information in instant messaging).
Further, step 102 includes:
Step 102.1: sequentially reading the content of each preprocessed text, using a word segmentation module to extract the lemma content, position index, and available text link for each text, and storing them to form the lemma index file shown in Table 1; this step gives every lemma of every text a unique identifier. As one practicable option, this embodiment uses the jieba word segmenter, a Python module that offers several segmentation modes;
Table 1: example of the lemma index file structure
Position index | Lemma content | Available text link
Step 102.2: querying the lemma index file obtained in step 102.1, aggregating the lemmas that have position index 1 and the same content into one node, which serves as a first-layer lemma node of the lemma node tree G, and storing each node in the structure shown in Table 2;
Table 2: example of the lemma node storage structure
Lemma node identifier | Parent node identifier | Position index | Lemma content | Available text link set
Step 102.3: let $V_i$ be the set of layer-$i$ lemma nodes of the lemma node tree G and $v_{i,j}$ the $j$-th lemma node of layer $i$. Let $i = 2$. For each $v_{i-1,j} \in V_{i-1}$, read the text content of the available text link set of $v_{i-1,j}$ and aggregate the lemmas in those texts that have position index $i$ and the same content into one node, which becomes a child node of $v_{i-1,j}$; continue until all lemma nodes in $V_{i-1}$ have been processed, which yields the layer-$i$ lemma nodes of the lemma node tree;
Step 102.4: let $i = i + 1$ and repeat steps 102.3 and 102.4 until the whole lemma index file has been processed, which yields the lemma node tree G of the text library C.
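The lemma index file of Table 1 and the first-layer aggregation of step 102.2 can be sketched as follows. The function names are hypothetical, and the segmenter is passed in as a parameter: the default `str.split` is only a stand-in for whitespace-separated test input, while with jieba installed its `jieba.lcut` function could be supplied instead.

```python
def build_lemma_index(texts, segment=str.split):
    """texts: dict mapping an available text link to the preprocessed text.
    segment: callable that turns a text into a list of lemmas.

    Returns rows of (position index, lemma content, available text link),
    following the column order of Table 1.
    """
    rows = []
    for link, text in texts.items():
        for position, lemma in enumerate(segment(text), start=1):
            rows.append((position, lemma, link))
    return rows


def first_layer_nodes(rows):
    """Step 102.2: aggregate lemmas with position index 1 and equal content.
    Returns a dict lemma content -> available text link set."""
    nodes = {}
    for position, lemma, link in rows:
        if position == 1:
            nodes.setdefault(lemma, set()).add(link)
    return nodes
```

Deeper layers follow the same pattern, restricted each time to the texts in the parent node's available text link set.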
Further, step 103 includes:
Step 103.1: sequentially importing each non-leaf lemma node and arranging its adjacent child nodes in descending order of transition probability; the transition probability of a lemma node is
$P(S_j \mid S_i) = \dfrac{T_j}{\sum T}$
where $S_j$ is an adjacent child node of $S_i$, $T_j$ is the number of available text links of lemma node $S_j$, and $\sum T$ is the total number of available text links over all adjacent child nodes of lemma node $S_i$;
Step 103.2: obtaining the number $n$ of adjacent paths of each non-leaf lemma node; if $n \geq 2$, the adjacent paths of the lemma node are encoded with $m = \lfloor \log_2 n \rfloor$ bits; if $n < 2$, the node is skipped.
Further, step 104 includes:
Step 104.1: sequentially importing each lemma node; if the lemma node is a leaf node, its available text link set is used as the isomorphic text set of the source path of the lemma node; if the lemma node is a non-leaf node, judging whether it has an adjacent path that is not encoded;
Step 104.2: if such a path exists, the available text link set of the child node on the uncoded adjacent path is used as the isomorphic text set of the lemma node, and the uncoded adjacent path and its subsequent lemma nodes are deleted from the lemma node tree G;
Step 104.3: if no such path exists, part of the texts in the available text link set of the lemma node are selected as the isomorphic text set of the source path of the lemma node, and the corresponding texts are deleted from the available text link sets of its child nodes.
Further, step 105 comprises:
Step 105.1: the sender encrypts the secret information M with an encryption algorithm and a key K and encapsulates it in the information frame structure shown in Table 3; the total length of the information frame does not exceed the hiding capacity upper limit N of the lemma node tree G,
$N = \sum_{i=1}^{n} B_i$
where $n$ denotes the number of layers of the lemma node tree G and $B_i$ denotes the highest number of encoding bits on layer $i$;
Table 3: example of the information frame structure
Length information | Encrypted information segment | Check code
It should be noted that, before the next step is executed, the sender needs to screen the information frames so that identical information frames do not cause the same text to be sent repeatedly, which further improves the concealment of the communication;
Step 105.2: querying the lemma node tree G, determining the source path of a lemma node according to the information frame, selecting a secret-carrying text from the isomorphic text set of that path, and sending the secret-carrying text and the key K.
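The information frame of Table 3 can be sketched as a bit string with three fields. The field widths and the check code below are assumptions made for illustration: the text does not fix them, and a CRC-32 truncated to 4 bits is used here only because `zlib.crc32` is readily available. Encryption is outside this sketch; `segment` stands for an already encrypted bit string.

```python
import zlib

LENGTH_BITS = 4   # hypothetical width of the length field
CHECK_BITS = 4    # hypothetical width of the check code


def _check_code(segment):
    # CRC-32 of the ASCII bit string, truncated to CHECK_BITS (an assumption).
    value = zlib.crc32(segment.encode("ascii")) % (1 << CHECK_BITS)
    return format(value, f"0{CHECK_BITS}b")


def build_frame(segment):
    """Pack length information, encrypted segment, and check code."""
    if len(segment) >= 1 << LENGTH_BITS:
        raise ValueError("segment too long for the length field")
    return format(len(segment), f"0{LENGTH_BITS}b") + segment + _check_code(segment)


def parse_frame(frame):
    """Recover the encrypted segment, verifying the check code first."""
    length = int(frame[:LENGTH_BITS], 2)
    start, end = LENGTH_BITS, LENGTH_BITS + length
    segment, check = frame[start:end], frame[end:end + CHECK_BITS]
    if check != _check_code(segment):
        raise ValueError("check code mismatch: text may have been tampered with")
    return segment
```

A 4-bit check code detects only a fraction of random corruptions; a real deployment would weigh a longer check code against the limited hiding capacity of each text.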
Further, step 106 includes:
Step 106.1: the receiver receives the secret-carrying text, extracts its lemma information with the word segmentation module, converts the lemma information into an encrypted bit stream according to the lemma node tree G, and extracts the encrypted information segment and the check code according to the length information in the header of the frame structure corresponding to the bit stream; the check code is used to judge whether the received secret-carrying text has been tampered with; if so, the channel is changed and the text is retransmitted; otherwise the next step is executed;
Step 106.2: after the correct encrypted information is obtained, all encrypted information segments are combined in the order in which the secret-carrying texts were received, and the original secret information is extracted through inverse transformations such as decryption.
As one practicable example, for convenience of description and without loss of generality, assume that the information frame to be transmitted is "101001" and that part of the structure of the lemma node tree G is as shown in Fig. 2. Hiding and extraction of the information can then be achieved through the following steps:
(1) Encoding
As shown in Fig. 2, the lemma node tree G is a typical multi-way tree. If lemma node $S_i$ has $n$ adjacent paths, those paths can stably embed $m = \lfloor \log_2 n \rfloor$ bits of binary information. The encoded lemma node tree G is shown in Fig. 3.
(2) Constructing isomorphic text collections
Isomorphic texts establish the mapping between texts and the source paths of lemma nodes and avoid ambiguity during extraction. If a lemma node is a leaf node, its available text link set is used as the isomorphic text set of the source path of the lemma node; if it is a non-leaf node, the way the isomorphic text set is constructed depends on whether the lemma node has an adjacent path that is not encoded.
Taking Fig. 3 as an example, lemma node $S_3$ has 5 child nodes in total, of which the adjacent path $S_3 \to S_9$ is not encoded and cannot be used to represent secret information. To make full use of the text library, the available text link set of lemma node $S_9$ can be used as the isomorphic text set of lemma node $S_3$. The adjacent paths of lemma node $S_7$ are all encoded, so a certain number of texts can be selected from the available text link set of the lemma node according to an agreed rule to serve as its isomorphic text set. For example, the n shortest texts may be taken from the child node whose available text link set contains the most texts, or the shortest text may be taken from the available text link set of each child node. To avoid ambiguity during extraction, the selected texts must be deleted from the available text link sets of the child nodes.
(3) Information hiding
The information frame to be transmitted is "1010001". As can be seen from fig. 3, the corresponding source path is $S_0 \to S_3 \to S_7 \to S_{11}$, and a text is selected from the corresponding isomorphic text set for transmission.
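A minimal sketch of this lookup is given below, assuming the tree is stored as nested dictionaries and that each node reads as many bits as the width of its children's codes. The helper names `make_node` and `hide_frame`, the sample texts, and the per-edge codes "10", "10" and "001" are hypothetical: they were chosen only so that they concatenate to the frame "1010001", since fig. 3 is not reproduced here.

```python
import random


def make_node(name, isomorphic=(), children=None):
    """children maps a bit code to the child node reached by that code."""
    return {"name": name, "isomorphic": set(isomorphic),
            "children": children or {}}


# Only the branch used by the example is modelled.
s11 = make_node("S11", {"text for S11"})
s7 = make_node("S7", {"text for S7"}, {"001": s11})
s3 = make_node("S3", {"text for S3"}, {"10": s7})
s0 = make_node("S0", (), {"10": s3})


def hide_frame(root, frame, rng=random):
    """Follow the coded adjacent paths selected by the bits of the frame.

    Returns the source path and one text from the isomorphic text set
    of the node where the frame is exhausted.
    """
    current, path, pos = root, [root["name"]], 0
    while pos < len(frame):
        children = current["children"]
        if not children:
            raise ValueError("frame is longer than this branch can encode")
        width = len(next(iter(children)))  # all codes of a node share a width
        code = frame[pos:pos + width]
        if code not in children:
            raise ValueError(f"no adjacent path coded {code!r} under {current['name']}")
        current = children[code]
        path.append(current["name"])
        pos += width
    if not current["isomorphic"]:
        raise ValueError("no isomorphic text available for this source path")
    return path, rng.choice(sorted(current["isomorphic"]))


path, text = hide_frame(s0, "1010001")
print(" -> ".join(path), "|", text)
```

Any text in the final isomorphic text set would serve equally well, so the random choice here is only one possible selection rule.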
(4) Information extraction
The receiver obtains the lemma sequence of the secret-carrying text through the word segmentation module as ($S_0$, $S_3$, $S_7$, $S_{11}$, …). Querying the isomorphic text sets shows that the secret-carrying text is an isomorphic text of the source path of the lemma node $S_{11}$; the information frame "1010001" is then obtained according to the coding rule, and the secret information is extracted through the corresponding inverse transformation.
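The extraction walk can be sketched in the same hedged way. The receiver follows the lemma sequence down the tree, collects the code of each adjacent path, and stops at the node whose isomorphic text set contains the received text. The names `make_node` and `extract_frame`, the trailing lemma "S15", and the per-edge codes are hypothetical, and the lemma sequence passed in is assumed to start after the root $S_0$.

```python
def make_node(name, isomorphic=(), children=None):
    """children maps a bit code to the child node reached by that code."""
    return {"name": name, "isomorphic": set(isomorphic),
            "children": children or {}}


# Same hypothetical branch and codes as assumed on the sending side.
s11 = make_node("S11", {"text for S11"})
s7 = make_node("S7", {"text for S7"}, {"001": s11})
s3 = make_node("S3", {"text for S3"}, {"10": s7})
s0 = make_node("S0", (), {"10": s3})


def extract_frame(root, lemmas, text):
    """Recover the information frame carried by a received text.

    lemmas is the lemma sequence of the text after the root; the walk
    stops at the node whose isomorphic text set contains the text.
    """
    current, bits = root, []
    for lemma in lemmas:
        if text in current["isomorphic"]:
            break  # the source path ends here; later lemmas carry no bits
        step = next(((code, child)
                     for code, child in current["children"].items()
                     if child["name"] == lemma), None)
        if step is None:
            raise ValueError(f"lemma {lemma!r} is not a coded child of {current['name']}")
        code, current = step
        bits.append(code)
    if text not in current["isomorphic"]:
        raise ValueError("text is not an isomorphic text of any node on this path")
    return "".join(bits)


frame = extract_frame(s0, ["S3", "S7", "S11", "S15"], "text for S11")
print(frame)
```

The isomorphic text set query is what tells the receiver where to stop: the same lemma prefix with the text registered under $S_7$ would yield a shorter frame.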
The above describes only preferred embodiments of the present invention. It should be noted that those skilled in the art can make various modifications and improvements without departing from the principle of the present invention, and such modifications and improvements shall also fall within the protection scope of the present invention.

Claims (4)

1. A text carrier-free information hiding method based on word element coding, characterized by comprising the following steps:
step 1: establishing a dynamically updated text library C, and preprocessing each text in the text library C;
step 2: sequentially reading the preprocessed text contents, extracting the lemma information, and constructing a lemma node tree G according to the extracted lemma information; the lemmas (word elements) are the basic elements forming sentences and comprise single characters, words and compound phrases; the step 2 comprises the following steps:
step 2.1: sequentially reading the preprocessed text contents, extracting the lemma content, position index and available text link of each lemma of the texts, and storing them to form lemma index files;
step 2.2: querying the lemma index files obtained in the step 2.1, aggregating lemmas with a position index of 1 and the same content into the same node as a first-layer lemma node of the lemma node tree G, and storing each node according to the structure of lemma node identifier, parent node identifier, position index, lemma content and available text link set;
step 2.3: let $V_i$ be the set of layer-i lemma nodes of the lemma node tree G and $v_{i,j}$ be the j-th lemma node in layer i of the lemma node tree G; let i = 2; for each $v_{i-1,j} \in V_{i-1}$, reading the text content of the available text link set of $v_{i-1,j}$, and aggregating the lemmas in those texts whose position index is i and whose content is the same into the same node as a child node of $v_{i-1,j}$, until all lemma nodes in the set $V_{i-1}$ have been processed, thereby obtaining the layer-i lemma nodes of the lemma node tree;
step 2.4: setting i = i + 1 and repeating the step 2.3 and the step 2.4 until all lemma index files have been processed, thereby obtaining the lemma node tree G of the text library C;
step 3: traversing the lemma node tree G, arranging the adjacent child nodes of each non-leaf lemma node in descending order of transition probability, and encoding the adjacent paths of the lemma node; let $S_i$ and $S_j$ be the start node and the end node of a path p, respectively; if the lemma node $S_j$ is an adjacent child node of $S_i$, the path p is called an adjacent path of the lemma node $S_i$; the step 3 comprises the following steps:
step 3.1: sequentially importing each non-leaf lemma node, and arranging its adjacent child nodes in descending order of transition probability; the transition probability from the lemma node $S_i$ to its adjacent child node $S_j$ is
$$P(S_j \mid S_i) = \frac{T_j}{\sum T}$$
wherein $S_j$ is an adjacent child node of $S_i$, $T_j$ represents the number of available text links of the lemma node $S_j$, and $\sum T$ represents the sum of the numbers of available text links of all adjacent child nodes of the lemma node $S_i$;
step 3.2: acquiring the number n of adjacent paths of each non-leaf lemma node; if $n \geq 2$, encoding the adjacent paths of the lemma node, the number of coding bits being $m = \lfloor \log_2 n \rfloor$; if $n < 2$, skipping the node;
step 4: traversing the lemma node tree G, and constructing an isomorphic text set for the source path of each lemma node; let $S_i$ and $S_j$ be the start node and the end node of a path p, respectively; if the lemma node $S_i$ is the root node of the lemma node tree, the path p is called the source path of the lemma node $S_j$; if a text can unambiguously represent the source path of a lemma node S, the text is called an isomorphic text of the source path of the lemma node S, and the set of such texts is the isomorphic text set; the step 4 comprises the following steps:
step 4.1: sequentially importing each lemma node, and if the lemma node is a leaf node, taking an available text link set corresponding to the lemma node as an isomorphic text set of a lemma node source path; if the lemma node is a non-leaf node, judging whether the lemma node has an adjacent path which is not coded;
step 4.2: if yes, the available text link set of the child node corresponding to the uncoded adjacent path is used as an isomorphic text set of the lemma node, and the uncoded adjacent path and the subsequent lemma node are deleted from the lemma node tree G;
step 4.3: if not, selecting part of the texts from the available text link set of the lemma node as the isomorphic text set of the source path of the lemma node, and deleting the corresponding texts from the available text link sets of its child nodes;
step 5: encrypting the secret information, determining the source path of a lemma node according to the lemma node tree G and the encrypted bit stream, and selecting a secret-carrying text from the corresponding isomorphic text set and sending it;
step 6: receiving the secret-carrying text, extracting the lemma information of the secret-carrying text, extracting the encrypted bit stream from the lemma information according to the lemma node tree G, and extracting the secret information through the corresponding inverse transformation.
2. The text carrier-free information hiding method based on word element coding according to claim 1, wherein the step 1 comprises:
step 1.1: removing stop words and non-Chinese characters in each text in the text library C;
step 1.2: screening each text in the text library C according to text length, and removing texts whose length deviates from the preset value.
3. The method as claimed in claim 1, wherein the step 5 comprises:
step 5.1: encrypting the secret information M and packaging it in an information frame structure, the total length of the information frame not exceeding the upper limit N of the hiding capacity of the lemma node tree G,
$$N = \sum_{i=1}^{n} B_i$$
wherein n denotes the number of layers of the lemma node tree G, and $B_i$ represents the highest number of coding bits in layer i;
step 5.2: querying the lemma node tree G, determining the source path of a lemma node according to the information frame, selecting a secret-carrying text from the isomorphic text set of that path, and sending the secret-carrying text.
4. The text carrier-free information hiding method based on word element coding according to claim 3, wherein the step 6 comprises:
step 6.1: receiving a secret-carrying text, extracting the lemma information of the secret-carrying text, converting the lemma information into an encrypted bit stream according to the lemma node tree G, extracting the encrypted information segment and the check code from the bit stream according to the information frame structure, and judging by means of the check code whether the received secret-carrying text has been tampered with; if so, switching to another channel for retransmission; if not, executing the next step;
step 6.2: combining the encrypted information segments in the order in which the secret-carrying texts were received, and decrypting them to extract the original secret information.
CN202010295993.5A 2020-04-15 2020-04-15 Text carrier-free information hiding method based on word element coding Active CN111666575B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010295993.5A CN111666575B (en) 2020-04-15 2020-04-15 Text carrier-free information hiding method based on word element coding

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010295993.5A CN111666575B (en) 2020-04-15 2020-04-15 Text carrier-free information hiding method based on word element coding

Publications (2)

Publication Number Publication Date
CN111666575A CN111666575A (en) 2020-09-15
CN111666575B true CN111666575B (en) 2022-11-18

Family

ID=72382726

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010295993.5A Active CN111666575B (en) 2020-04-15 2020-04-15 Text carrier-free information hiding method based on word element coding

Country Status (1)

Country Link
CN (1) CN111666575B (en)

Also Published As

Publication number Publication date
CN111666575A (en) 2020-09-15

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant