CN111859897A - Text steganalysis method based on dynamic routing capsule network - Google Patents

Publication number
CN111859897A
CN111859897A CN201911004852.7A CN201911004852A CN111859897A CN 111859897 A CN111859897 A CN 111859897A CN 201911004852 A CN201911004852 A CN 201911004852A CN 111859897 A CN111859897 A CN 111859897A
Authority
CN
China
Prior art keywords
text
dynamic routing
steganography
capsule
capsule network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911004852.7A
Other languages
Chinese (zh)
Inventor
不公告发明人
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenyang University of Technology
Original Assignee
Shenyang University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenyang University of Technology
Priority to CN201911004852.7A
Publication of CN111859897A
Current legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • G06N3/084: Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a text steganalysis method based on a dynamic routing capsule network. The method extracts the latent semantic features of a text and detects the subtle differences between ordinary text and steganographic text. Unlike traditional text steganalysis methods, its novelty lies in using a capsule network with a dynamic routing mechanism to perform steganalysis on generated steganographic text. The dynamic routing mechanism adaptively adjusts the strength of the connections between capsule layers, which maintains detection accuracy at high embedding rates and greatly improves detection accuracy at low embedding rates.

Description

Text steganalysis method based on dynamic routing capsule network
Technical Field
The invention relates to the fields of information hiding, big data, deep learning, natural language processing and the like, in particular to a text steganalysis method based on a dynamic routing capsule network.
Background
Today's society is in the "big data era". Big data is a new type of capability: analyzing massive amounts of data to obtain products or services of great value. In the big data era, more data can be analyzed, and in some cases all the data related to a specific phenomenon can be processed. Text is the most frequently used means of everyday communication, and the value of the information it contains is beyond doubt. With the rapid development of Internet technology and mobile social platforms, the problem of guaranteeing data security has come to the fore.
The idea of network security management is "strict prevention, blocking and physical isolation". Shannon summarized the three main information security systems in cyberspace as the encryption system, the privacy system and the concealment system. Encryption systems lack security because the ciphertext is easily noticed. Once an attacker obtains the personal account and password of a privacy system, that system loses its security guarantee. The concealment system differs from both: it embeds the secret information into a carrier for public transmission, producing data that looks no different from ordinary data, which ensures the imperceptibility and security of the secret information. The publicly transmitted carrier can be a picture, audio, video, text and so on. Text is the most widely used information carrier in daily life and has a high degree of information coding. Because text has low redundancy, however, hiding secret information in text is very challenging.
Steganography and steganalysis stand in the relation of spear and shield within the information hiding system: they constrain each other and evolve together. Steganalysis is mainly used to detect whether data transmitted on a public channel contains secret information. Text steganalysis techniques fall into two categories: one analyzes the statistical properties of the text; the other analyzes semantic relationships in the text using deep learning. Steganalysis based on statistical properties looks for differences between steganographic and natural texts by computing statistics of text structure, text appearance and the like, in order to decide whether a text contains secret information. This approach has difficulty detecting steganographic text whose semantic content has been modified. Text steganalysis methods that analyze semantic relationships with deep learning were therefore developed to achieve high-accuracy steganalysis.
With the rapid development of natural language processing, many techniques for modeling text as a sequence have emerged. Word2vec, based on the language-model assumption that a word can be inferred from its context, proposes the CBOW representation method. Word vectors trained with word2vec have a lower dimension than one-hot vectors and carry richer semantic information. FIG. 1 shows the CBOW model used herein. As shown in FIG. 1, CBOW calculates the probability of a word's occurrence from the C consecutive words before and after it. The model uses a one-layer neural network to map sparse one-hot word vectors into dense 300-dimensional vectors.
Deep learning, also called deep neural networks, is a method of representation learning on data. Deep learning imitates the neural structure of the human brain to interpret data such as images, audio, video and text. It works by building a machine learning model with multiple hidden layers and learning features from massive training data, thereby improving the accuracy of classification or prediction. Deep learning emphasizes the depth of the model structure, that is, the number of layers from the input layer to the output layer: the more layers, the deeper the model. It also emphasizes feature learning, transforming the original features of a sample into a new feature space through layer-by-layer feature extraction. Compared with manually extracted features, deep learning saves labor and extracts the intrinsic information of the data.
The capsule network is a novel neural network model in deep learning that uses capsules in place of neurons; it works by encapsulating all the state features detected by a capsule in vector form. The capsule network addresses long-standing problems of the convolutional neural network: too few structural levels, severe information loss in the pooling layer, and the inability to recognize rotated or distorted images. A capsule network based on dynamic routing can adaptively strengthen or weaken connections, that is, it has top-down feedback. This preserves much of the detail in the feature information of the data, such as the position information of text semantic features, and greatly improves the accuracy of text classification. The capsule network model is shown in FIG. 2.
A capsule is a group of neurons that learns to detect a particular target within a given region and outputs a vector whose length represents the estimated probability that the target is present. If the input to a capsule changes slightly, its output changes correspondingly; capsules are therefore equivariant. A simple capsule network consists of three parts: a convolutional layer, a main capsule layer and a digital capsule layer. The convolutional layer of the capsule network is the same as that of a convolutional neural network; it extracts n-gram features at different positions of a sentence through convolution filters. The semantic features of the sentence extracted by the convolutional layer can be expressed as
$c_i = f(X_{i:j} \cdot W + b_0)$
where $X_{i:j}$ is the vectorized representation of the input-layer text data, $c_i$ is the extracted feature, the convolution kernel is represented as
[equation image BSA0000192899980000031 not reproduced]
$h$ is the convolution kernel height, $b_0$ is the bias term, and $f$ is the nonlinear activation function.
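The convolution step can be illustrated with a minimal NumPy sketch. It assumes a single kernel W of height h whose width equals the word vector dimension, and it assumes ReLU for the activation f; neither choice is stated in the source. The function name `conv_ngram_features` is hypothetical.

```python
import numpy as np

def conv_ngram_features(X, W, b0):
    """Slide one kernel of height h over the sentence matrix X (m x n).

    Each window X[i:i+h] plays the role of X_{i:j}; the result is the
    feature list [c_1, c_2, ...] for this kernel.
    """
    m, _ = X.shape
    h = W.shape[0]
    feats = []
    for i in range(m - h + 1):
        window = X[i:i + h]                    # h consecutive word vectors
        feats.append(np.sum(window * W) + b0)  # X_{i:j} . W + b_0
    # f is assumed to be ReLU here; the source only calls it "nonlinear".
    return np.maximum(np.array(feats), 0.0)

# Toy example: 5 words, 3-dimensional word vectors, bigram kernel (h = 2).
X = np.ones((5, 3))
W = np.ones((2, 3))
c = conv_ngram_features(X, W, b0=-1.0)
```

Each entry of `c` corresponds to one n-gram position in the sentence, so a kernel of height h yields m - h + 1 features.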
The main capsule layer replaces the scalar output of each neuron in the convolutional layer with a vector output, i.e., a capsule, which essentially reflects the semantic representation of the text. Replacing the scalar output of the convolution kernel with a vector output preserves the instantiated features of the text, which can be expressed as
$cap_i = g(C_i \cdot W + b_1)$
where $cap_i$ is an instantiated feature, $C_i$ is a feature set extracted by the convolutional layer, $b_1$ is the bias term, and $g$ is the squash activation function specific to capsule networks, which compresses the length of the capsule.
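As an illustration of the squash activation g, the sketch below uses the form commonly given in the capsule network literature, scaling a vector s by ||s||^2 / (1 + ||s||^2) while keeping its direction. The source does not spell out the formula, so this form, the small `eps` term, and the helper name `primary_capsule` are assumptions.

```python
import numpy as np

def squash(s, eps=1e-9):
    """Shrink vector length into [0, 1) while preserving direction."""
    sq_norm = np.sum(s * s, axis=-1, keepdims=True)
    # eps avoids division by zero for an all-zero vector (an added safeguard).
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)

def primary_capsule(C_i, W, b1):
    """cap_i = g(C_i . W + b_1), with g = squash."""
    return squash(C_i @ W + b1)

long_vec = squash(np.array([3.0, 4.0]))    # length 5 before squashing
short_vec = squash(np.array([0.01, 0.0]))  # length 0.01 before squashing
```

A long input ends up with length close to 1 and a short input with length close to 0, which is what lets the capsule length be read as a probability.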
The digital capsule layer uses the output vectors of the main capsule layer for parameter propagation and dynamic routing updates, and finally outputs a class probability vector based on the vector lengths.
The capsule network has good feature extraction, representation and semantic understanding capabilities for sequence signals. Text can be regarded as a sequence signal, so the deep semantic features of text can be learned automatically by a capsule network and the slight differences between natural text and steganographic text can be found. Existing deep-learning steganalysis techniques discard infrequently occurring information while extracting features, so the subtle differences between natural text and steganographic text with a low embedding rate are lost. The invention provides a text steganalysis method based on a dynamic routing capsule network, which retains more useful information through the dynamic routing mechanism between capsule layers. Compared with prior methods, the accuracy of distinguishing natural text from steganographic text is greatly improved.
It follows that using a dynamic routing capsule network for text steganalysis overcomes the problems of existing methods and improves detection accuracy.
Disclosure of Invention
The invention discloses a text steganalysis method based on a dynamic routing capsule network. The method extracts the latent semantic features of a text and detects the subtle differences between ordinary text and steganographic text. Unlike traditional text steganalysis methods, its novelty lies in using a capsule network with a dynamic routing mechanism to perform steganalysis on generated steganographic text. The dynamic routing mechanism adaptively adjusts the strength of the connections between capsule layers, which maintains detection accuracy at high embedding rates and greatly improves detection accuracy at low embedding rates. To achieve this, the method comprises the following steps:
(1) constructing a text data set as the training set using T-Steg released by Z. Yang;
(2) preprocessing the data: the English natural text is converted to all lowercase, and only letters and numbers are retained;
(3) labeling human-written natural text as 0 and steganographic text as 1;
(4) training word2vec on a natural-text data set from Twitter;
(5) vectorizing the texts in the training set using the word vectors trained in step (4);
(6) modeling the vectorized text by constructing a capsule network model, and optimizing the model's performance through the back-propagation algorithm;
(7) measuring the loss value of the model and adjusting the model training parameters according to the loss value;
(8) repeating steps (6) to (7) until the parameters and performance of the neural network model are stable;
(9) inputting a test set composed of natural text and steganographic text, and outputting the test result 0/1.
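Steps (2) and (3) can be sketched in a few lines of Python. The sketch assumes that whitespace is kept as the token separator and that every other character outside a-z and 0-9 is replaced by a space; the source only says that letters and numbers are retained. The names `preprocess` and `build_dataset` are hypothetical.

```python
import re

NATURAL, STEGO = 0, 1  # labels from step (3)

def preprocess(text):
    """Lowercase the text and keep only letters and digits (step 2)."""
    text = text.lower()
    # Assumption: anything that is not a-z, 0-9 or whitespace becomes a space.
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    return text.split()

def build_dataset(natural_texts, stego_texts):
    """Return (tokens, label) pairs for both classes."""
    data = [(preprocess(t), NATURAL) for t in natural_texts]
    data += [(preprocess(t), STEGO) for t in stego_texts]
    return data

tokens = preprocess("Hello, World! It's 2019.")
dataset = build_dataset(["Good morning!"], ["Hidden msg #1"])
```

Splitting on the inserted spaces means a contraction such as "It's" becomes two tokens; a different tokenization policy would be equally consistent with the description.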
To ensure the accuracy of text steganalysis, the method uses a dynamic routing capsule network to extract the high-dimensional semantic features of natural text and steganographic text separately, and decides whether a text contains secret information by analyzing the slight differences between the two sets of features. The model comprises two main modules, described in detail below: a text representation module and a text steganalysis module. The text representation module uses Word2Vec vectorization to preprocess the data set and represents each text as a dense matrix whose length is the maximum sentence length and whose width is the word vector dimension. The text steganalysis module uses a dynamic-routing capsule network to model the vectorized text and analyzes the semantic features of natural text and steganographic text, improving detection accuracy.
Word2Vec-based text representation
Word2Vec trains the words in a text with CBOW, i.e., it predicts the center word Y given its context. CBOW is a neural network with only one layer. Let the input be the one-hot vectors of the context words, $X = (x_1, x_2, \ldots, x_n)$, where $x_i$ is the one-hot vector of a word. The overall training process can be expressed as
[equation image BSA0000192899980000051 not reproduced]
where $W$ denotes the weights, $H$ is the one-dimensional column vector obtained from the hidden layer, $f$ is the softmax function, and $O$ is the word vector output by the CBOW model. The error between $O$ and the center word $Y$ to be predicted is computed, and $W_1$ and $W_2$ are adjusted to keep reducing the error, finally giving the trained model.
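Because the equation image for the training process is not available, the following is only a sketch of the conventional CBOW forward pass. It assumes that H is the average of the rows of W1 selected by the context words (equivalent to multiplying the one-hot vectors by W1 and averaging) and that O is the softmax of H·W2. The function name `cbow_forward`, the index-based input and the matrix shapes are illustrative assumptions.

```python
import numpy as np

def softmax(z):
    z = z - np.max(z)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def cbow_forward(context_ids, W1, W2):
    """W1: (vocab, dim) input weights, W2: (dim, vocab) output weights."""
    H = W1[context_ids].mean(axis=0)  # hidden vector from the context words
    O = softmax(H @ W2)               # distribution over the vocabulary
    return H, O

rng = np.random.default_rng(0)
vocab, dim = 10, 4
W1 = rng.normal(size=(vocab, dim))
W2 = rng.normal(size=(dim, vocab))
H, O = cbow_forward([1, 2, 4, 5], W1, W2)  # context around center word 3
loss = -np.log(O[3])                       # error against the center word Y
```

Training would adjust W1 and W2 to reduce `loss`; after training, the rows of W1 are typically used as the word vectors.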
CBOW word vector training is performed for each sentence in the text training set, which may be expressed as
[equation image BSA0000192899980000052 not reproduced]
Each sentence S can be represented by a matrix $S \in R^{m \times n}$, where the t-th row represents the t-th word in sentence S, m is the sentence length, and n is the dimension of the word vector.
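The sentence matrix can be built as in the sketch below. Zero-padding up to a maximum sentence length and mapping out-of-vocabulary words to zero rows are assumptions made for the illustration; the function name `sentence_to_matrix` and the plain dictionary of word vectors are hypothetical stand-ins for a trained word2vec model.

```python
import numpy as np

def sentence_to_matrix(tokens, vectors, max_len, dim):
    """Return an (max_len x dim) matrix whose t-th row is the t-th word vector."""
    S = np.zeros((max_len, dim))
    for t, tok in enumerate(tokens[:max_len]):  # truncate overly long sentences
        if tok in vectors:                      # unknown words stay as zero rows
            S[t] = vectors[tok]
    return S

vectors = {"a": np.ones(3), "b": 2 * np.ones(3)}
S = sentence_to_matrix(["a", "zzz", "b"], vectors, max_len=5, dim=3)
```

Padding every sentence to the same length gives the network a fixed-size input regardless of how many words the original text contains.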
Text steganalysis based on capsule network
The invention uses a capsule network with a dynamic routing mechanism for text steganalysis. The network comprises a convolutional layer, a main capsule layer and a digital capsule layer.
As mentioned above, the capsule network extracts the semantic features of the text through the convolutional layer. The feature set formed after the convolution kernel slides from the beginning to the end of the sentence is
$C = [c_1, c_2, \ldots, c_m]$
The main capsule layer performs a sliding-window convolution over the features provided by the convolutional layer to preserve the instantiated features of the text. The resulting instantiation set is
$Cap = [cap_1, cap_2, \ldots, cap_m]$
A dynamic routing mechanism is used when passing from the main capsule layer to the fully connected capsule layer. The dynamic routing mechanism serves mainly to connect capsule layers: it builds a nonlinear mapping iteratively and changes the connection strengths through dynamic routing, as shown in FIG. 3. For a capsule, the input $u_i$ and the output $v_j$ are vectors, and the transformation matrix $W_{ij}$ holds the weights between the two layers. The predicted output
[equation image BSA0000192899980000061 not reproduced]
is given by
[equation image BSA0000192899980000062 not reproduced]
Iterative dynamic routing can be expressed as
[equation image BSA0000192899980000063 not reproduced]
[equation image BSA0000192899980000064 not reproduced]
where $v_j$ is the output vector of the capsule, with length in (0, 1); $c_{ij}$ is a coupling coefficient computed by the iterative dynamic routing process, with $\sum_j c_{ij} = 1$; and $s_j$ is an intermediate variable. The activation function applied to $s_j$ is squash rather than ReLU. Squash compresses short vectors to nearly zero length and long vectors to nearly unit length, which reduces time overhead and saves resources. The dynamic routing algorithm is shown in Algorithm 1.
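For orientation, the routing quantities named above are usually written as follows in the capsule network literature (the formulation introduced by Sabour, Frosst and Hinton). These standard forms are given as an assumption; the patent's own equation images may differ in notation, and the routing logits $b_{ij}$ are a symbol introduced here.

```latex
\hat{u}_{j|i} = W_{ij}\, u_i
\qquad
c_{ij} = \frac{\exp(b_{ij})}{\sum_k \exp(b_{ik})}
\qquad
s_j = \sum_i c_{ij}\, \hat{u}_{j|i}
\qquad
v_j = \frac{\lVert s_j \rVert^2}{1 + \lVert s_j \rVert^2}\,
      \frac{s_j}{\lVert s_j \rVert}
```

The softmax over $j$ is what enforces $\sum_j c_{ij} = 1$, and the last expression is the squash function, which keeps the length of $v_j$ strictly below 1.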
Dynamic routing cannot completely replace back propagation for updating parameters: the transformation matrix $W_{ij}$ still has to be optimized by back propagation to improve the performance of the capsule network. The loss function is minimized through iterative optimization of the network, giving the language model best suited to semantic feature extraction. The method defines the loss function of the whole network as the cross-entropy loss:
[equation image BSA0000192899980000071 not reproduced]
Here the symbol in
[equation image BSA0000192899980000072 not reproduced]
denotes the predicted probability that the label of the current sample is 1, and $y$ denotes the true value.
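A cross-entropy loss for the two-class case described here can be sketched as below. The sketch assumes the usual binary form, -[y log p + (1 - y) log(1 - p)], averaged over samples, where p is the predicted probability of label 1; the exact expression in the patent's equation image is not available. The clipping constant `eps` is an added safeguard, not part of the source.

```python
import numpy as np

def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    """Mean binary cross-entropy between labels (0/1) and predicted P(label = 1)."""
    y_true = np.asarray(y_true, dtype=float)
    # Clip probabilities so that log(0) never occurs.
    y_prob = np.clip(np.asarray(y_prob, dtype=float), eps, 1.0 - eps)
    losses = -(y_true * np.log(y_prob) + (1.0 - y_true) * np.log(1.0 - y_prob))
    return float(np.mean(losses))

uncertain = binary_cross_entropy([1, 0], [0.5, 0.5])
confident = binary_cross_entropy([1, 0], [0.99, 0.01])
```

The loss is smaller when the predicted probabilities agree more strongly with the true labels, which is the signal back propagation uses to update $W_{ij}$.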
Algorithm 1: dynamic routing algorithm
[algorithm image BSA0000192899980000073 not reproduced]
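Since the algorithm image is not reproduced, the following NumPy sketch shows routing-by-agreement as it is commonly described in the capsule network literature. It is an assumption about the general shape of Algorithm 1, not a transcription of it. The input `u_hat` holds the predicted outputs for every pair of input capsule i and output capsule j; the iteration count of 3 and the names `dynamic_routing` and `u_hat` are illustrative.

```python
import numpy as np

def squash(s, eps=1e-9):
    sq = np.sum(s * s, axis=-1, keepdims=True)
    return (sq / (1.0 + sq)) * s / np.sqrt(sq + eps)

def dynamic_routing(u_hat, num_iters=3):
    """u_hat: (num_in, num_out, dim) predicted outputs; returns (v, c)."""
    num_in, num_out, _ = u_hat.shape
    b = np.zeros((num_in, num_out))            # routing logits start at zero
    for _ in range(num_iters):
        e = np.exp(b - b.max(axis=1, keepdims=True))
        c = e / e.sum(axis=1, keepdims=True)   # softmax over output capsules
        s = np.einsum("ij,ijd->jd", c, u_hat)  # weighted sum of predictions
        v = squash(s)                          # output capsule vectors
        # Raise the logit where a prediction agrees with the output (dot product).
        b = b + np.einsum("ijd,jd->ij", u_hat, v)
    return v, c

rng = np.random.default_rng(0)
u_hat = rng.normal(size=(6, 2, 4))     # 6 input capsules, 2 classes, dim 4
v, c = dynamic_routing(u_hat)
lengths = np.linalg.norm(v, axis=-1)   # capsule length acts as class score
predicted_label = int(np.argmax(lengths))
```

With two output capsules, one per class, the longer capsule gives the 0/1 decision. The coupling coefficients `c` are not learned by gradient descent; only the transformation matrices that produce `u_hat` are.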
Drawings
FIG. 1 is a block diagram of the CBOW model used in the present invention.
FIG. 2 is a schematic diagram of the capsule network used in the present invention.
FIG. 3 is a schematic diagram of the inter-capsule-layer connections used in the capsule network of the present invention.

Claims (2)

1. A text steganalysis method based on a dynamic routing capsule network, comprising the following steps:
(1) constructing a text data set as the training set using T-Steg released by Z. Yang;
(2) preprocessing the data: the English natural text is converted to all lowercase, and only letters and numbers are retained;
(3) labeling human-written natural text as 0 and steganographic text as 1;
(4) training word2vec on a natural-text data set from Twitter;
(5) vectorizing the texts in the training set using the word vectors trained in step (4);
(6) modeling the vectorized text by constructing a capsule network model, and optimizing the model's performance through the back-propagation algorithm;
(7) measuring the loss value of the model and adjusting the model training parameters according to the loss value;
(8) repeating steps (6) to (7) until the parameters and performance of the neural network model are stable;
(9) inputting a test set composed of natural text and steganographic text, and outputting the test result 0/1.
2. The text steganalysis method based on a dynamic routing capsule network as claimed in claim 1, wherein, as described in steps (4), (5), (6) and (7), dynamic routing is used to adaptively adjust the connection strength of the features, so that steganalysis is effectively realized and the accuracy of detecting steganographic text at low embedding rates is improved.
CN201911004852.7A 2019-10-16 2019-10-16 Text steganalysis method based on dynamic routing capsule network Pending CN111859897A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911004852.7A CN111859897A (en) 2019-10-16 2019-10-16 Text steganalysis method based on dynamic routing capsule network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911004852.7A CN111859897A (en) 2019-10-16 2019-10-16 Text steganalysis method based on dynamic routing capsule network

Publications (1)

Publication Number Publication Date
CN111859897A true CN111859897A (en) 2020-10-30

Family

ID=72970561

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911004852.7A Pending CN111859897A (en) 2019-10-16 2019-10-16 Text steganalysis method based on dynamic routing capsule network

Country Status (1)

Country Link
CN (1) CN111859897A (en)

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102893327A (en) * 2010-03-19 2013-01-23 数字标记公司 Intuitive computing methods and systems
EP3346411A1 (en) * 2017-01-10 2018-07-11 Crowdstrike, Inc. Computational modeling and classification of data streams
CN108280480A (en) * 2018-01-25 2018-07-13 武汉大学 A kind of hidden image vector safety evaluation method based on residual error symbiosis probability
CN108923922A (en) * 2018-07-26 2018-11-30 北京工商大学 A kind of text steganography method based on generation confrontation network
CN109492416A (en) * 2019-01-07 2019-03-19 南京信息工程大学 A kind of guard method of big data image and system based on safety zone
CN109754002A (en) * 2018-12-24 2019-05-14 上海大学 A kind of steganalysis hybrid integrated method based on deep learning
CN109784082A (en) * 2019-02-21 2019-05-21 中国科学技术大学 A kind of picture and text correlation robust steganography method and system based on pdf document
CN109817233A (en) * 2019-01-25 2019-05-28 清华大学 Voice flow steganalysis method and system based on level attention network model
CN109815496A (en) * 2019-01-22 2019-05-28 清华大学 Based on capacity adaptive shortening mechanism carrier production text steganography method and device
CN109859091A (en) * 2018-12-24 2019-06-07 中国人民解放军国防科技大学 Image steganography detection method based on Gabor filtering and convolutional neural network
CN110084734A (en) * 2019-04-25 2019-08-02 南京信息工程大学 A kind of big data ownership guard method being locally generated confrontation network based on object
CN110110318A (en) * 2019-01-22 2019-08-09 清华大学 Text Stego-detection method and system based on Recognition with Recurrent Neural Network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20201030