A covert communication detection method for multilink arrival-sequence coding
Technical field
The present invention relates to network and information security technology, and in particular to a covert communication detection method for multilink arrival-sequence coding.
Background technology
Network steganography, as a covert communication technique, uses legitimate data streams as carriers to transmit secret information over a network. Governments, enterprises and individuals can use covert network channels to communicate secretly and to transmit important information safely. At the same time, network steganography can also be exploited by illegal organizations and individuals to transmit harmful information and threaten public security. Detecting the presence of network steganography and preventing harm is therefore a crucial task. Steganography detection, as an important technique in the field of network security, has attracted wide attention from researchers and has already produced many research results.
Covert communication based on multilink arrival-sequence coding is a new covert communication method. In 2015, Wojciech Frączek and Krzysztof Szczypiorski proposed the concept of stegblock. First, a number of objects are chosen and each object is assigned a distinct identifier; these identifiers are known to both the sender and the recipient of the hidden data. The objects are then combined into a sequence (i.e. a data block), and the value of the data block carries the hidden information.
An example of stegblock is as follows. Among the four objects {1, 2, 3, 4}, a data block that begins with 1 and ends with 3 constitutes the stego-key, as shown in Fig. 1. The sender transmits the objects to the recipient so that they form successive data blocks according to the stego-key rule. The first data block consists of the three objects {1, 4, 3}, because this is the shortest object sequence that satisfies the rule and contains all objects from the start of the transmission. Subsequent data blocks are defined in the same way. In the communication example of Fig. 1 it is further assumed that the last bit of the number of objects in a data block determines the value of that block; that is, the value of the first data block is '1' (it consists of 3 objects) and the value of the second data block is '0' (it consists of 6 objects). Concatenating the values of the data blocks yields the secret information, so the hidden data transmitted in this example is '1010'. To recover the hidden data, the recipient must know the stego-key of the communication.
Covert communication based on multilink arrival-sequence coding applies the stegblock concept: the MPTCP protocol is used to establish multilink communication between two hosts, and each link is given a distinct identifier. Arrival-sequence covert communication over MPTCP works as follows:
1. An MPTCP connection is established between the two hosts, i.e. the sender and the recipient of the hidden information. Each TCP flow is treated as an object and given a distinct identifier, so the number of identifiers equals the number of TCP flows.
2. Data blocks are generated according to the steganographic key.
3. Packet transmission is controlled so that packets are sent on the TCP flows selected by the order of the identifiers in the data block.
Suppose five different TCP flows exist between the two hosts, with identifiers 1, 2, 3, 4, 5, and the key of the covert communication is 1 and 3. The packet arrival sequence {1, 3, 4} then represents '1', and {5, 3, 4, 1} represents '0'.
The main advantage of this steganographic method is that it ensures stable transmission of the hidden data. In this method, the recipient determines the order of the objects from the TSN (transmission sequence number) rather than from the time at which each object is received. The TSN is the sequence number of a data chunk; it is incremented as chunks are sent and corresponds to a specific chunk in the data stream. When data is delayed or lost, the sender can retransmit it, so the hidden data is received perfectly without depending on the reception time of each object.
Because stegblock is a new concept, the prior art discloses no method for detecting this kind of covert communication.
Summary of the invention
Object of the invention: in view of the above defect of the prior art, the present invention aims to provide a covert communication detection method for multilink arrival-sequence coding.
Technical solution: a covert communication detection method for multilink arrival-sequence coding comprises building a model library and performing detection with the model library. Building the model library comprises the following steps:
(1) Set up a data capture device: establish a multilink connection based on MPTCP between two hosts, capture the packets exchanged between the hosts with the data capture device, and use a filter to extract the arrival sequence of the packets;
(2) Set up a character mapper: build a character mapper according to the stegblock principle and convert the captured packet arrival sequence into an identifier sequence, which is a one-dimensional array;
(3) Set up a window divider: the window divider splits the identifier sequence into windows of size $w$, giving $\lfloor K/w \rfloor$ windows in total, where $K$ denotes the length of the identifier sequence; each window is split into sub-intervals of size $L$, so one window contains $w/L$ sub-intervals. Suppose $N$ flows exist between the hosts and $f_N$ is the number of packets of a given flow within one sub-interval; compute the proportion of each TCP flow in each sub-interval, $P_{Ni} = f_N / L$, $i = 1, 2, 3, \ldots, w/L$;
(4) Set up an entropy calculator: the entropy calculator computes the information entropy of the identifier sequence of each TCP flow in a window, $H_N = -\sum_{i=1}^{w/L} P_{Ni} \log P_{Ni}$;
(5) Set up a Euclidean distance calculator: the Euclidean distance calculator treats the $N$ information entropies of the $N$ TCP flows in each window as a point in an $N$-dimensional space and computes the Euclidean distance $O_j$ from each point to the origin, $j = 1, 2, 3, \ldots, \lfloor K/w \rfloor$. The Euclidean distance of each point is $O_j = \sqrt{\sum_{n=1}^{N} x_n^2}$, where $x_n$ is the information entropy $H_N$ of the $n$-th flow obtained in step (4);
(6) Train normal-data models for different numbers of links and set the detection thresholds: for each number of links between the two hosts, repeat steps (1)-(5) to obtain the normal-data model for that number of links; analyze each model to obtain the mean $M_+$ and the variance $V_+$ of its data and the detection threshold $Th_+ = M_+ + \alpha V_+$, where $\alpha$ is a user-defined constant used to adjust the detection threshold;
(7) Build the model library: store the detection thresholds obtained in step (6) for the different numbers of links in the model library.
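Steps (3)-(5) can be illustrated with a minimal Python sketch. It is not the patented implementation: the function name `window_distances`, the use of a base-2 logarithm, and the convention that a zero proportion contributes nothing to the entropy are all assumptions made for illustration.

```python
import math


def window_distances(ids, flow_ids, w, L):
    """Return one Euclidean distance per full window of size w.

    ids      -- one-dimensional list of flow identifiers (step 2 output)
    flow_ids -- the N identifiers in use, e.g. [1, 2, 3, 4]
    w, L     -- window size and sub-interval size (w divisible by L)
    """
    distances = []
    for j in range(len(ids) // w):            # incomplete tail is dropped
        window = ids[j * w:(j + 1) * w]
        entropies = []
        for flow in flow_ids:
            h = 0.0
            for i in range(w // L):
                interval = window[i * L:(i + 1) * L]
                p = interval.count(flow) / L  # proportion P_Ni = f_N / L
                if p > 0:                     # assumption: 0 * log 0 = 0
                    h -= p * math.log2(p)     # assumption: base-2 logarithm
            entropies.append(h)
        # The N entropies form a point; take its distance to the origin.
        distances.append(math.sqrt(sum(x * x for x in entropies)))
    return distances


# Perfectly round-robin traffic over 4 flows, 2000 packets in total.
print(window_distances([1, 2, 3, 4] * 500, [1, 2, 3, 4], w=1000, L=100))
# prints [10.0, 10.0]
```

In this toy input every flow holds a quarter of each sub-interval, so each flow contributes 0.5 per sub-interval, 5.0 per window, and the four entropies give a distance of 10.0.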
Further, performing detection with the model library specifically comprises the following steps:
(A) Determine the number of links of the data under test: if the number of links is 1, no multilink communication exists between the two hosts and the procedure ends; otherwise multilink communication exists, and the corresponding model and detection threshold $Th_+$ are retrieved from the model library according to the number of links;
(B) Process the data under test and compute the Euclidean distance: use the character mapper to convert the captured packet arrival sequence into an identifier sequence, which is a one-dimensional array; use the window divider to split the array into windows of size $w$; use the entropy calculator to compute the entropy of every identifier in each window, so that $N$ different identifiers yield $N$ entropies; treat the $N$ information entropies of the $N$ TCP flows in each window as a point in an $N$-dimensional space and compute the Euclidean distance $O_s$ from each point to the origin, $s = 1, 2, 3, \ldots, \lfloor K/w \rfloor$, where $K$ denotes the length of the identifier sequence. The Euclidean distance of each point is $O_s = \sqrt{\sum_{n=1}^{N} x_n^2}$, where $x_n$ is the entropy;
(C) Determine the attribute of the data under test: compare $O_s$ with the corresponding detection threshold $Th_+$ in the model library; if $O_s$ is greater than $Th_+$, the data under test is stego data (data carrying hidden information), otherwise it is normal data.
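The decision logic of steps (A) and (C) can be sketched as follows. The function name `classify_windows`, the dictionary layout of the model library (number of links mapped to threshold), and the numeric threshold are hypothetical choices for illustration only.

```python
def classify_windows(distances, n_links, model_library):
    """Label each window distance as 'stego' or 'normal'.

    distances     -- Euclidean distances O_s, one per window (step B)
    n_links       -- number of links observed in the data under test
    model_library -- hypothetical dict: number of links -> threshold Th+
    Returns None when there is no multilink communication (step A).
    """
    if n_links < 2:
        return None                      # single link: procedure ends
    threshold = model_library[n_links]   # step A: look up Th+ by link count
    # Step C: a distance above the threshold marks stego data.
    return ["stego" if d > threshold else "normal" for d in distances]


# Hypothetical threshold for 4 links; real values come from training.
library = {4: 10.5}
print(classify_windows([9.8, 10.7], 4, library))
# prints ['normal', 'stego']
```

A per-window label is returned here; how window labels are aggregated into a verdict for the whole stream is not specified in the text and is left open.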
Further, the data capture device in step (1) is wireshark, and the filter is wireshark's built-in filter.
Further, the number of links of each host in step (6) is not less than two.
Further, the threshold $Th_+$ in step (A) is $Th_+ = M_+ + \alpha V_+$, where $M_+$ is the mean and $V_+$ the variance of the data for the given number of links, and $\alpha$ is a user-defined constant.
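The threshold formula can be sketched in a few lines of Python. The text does not say whether the variance is the population or the sample variance, nor exactly which data enter the mean; this sketch assumes the data are the per-window Euclidean distances of normal traffic and uses the population variance. The name `detection_threshold` is hypothetical.

```python
from statistics import mean, pvariance


def detection_threshold(normal_distances, alpha):
    """Compute Th+ = M+ + alpha * V+ from normal-traffic distances.

    normal_distances -- per-window Euclidean distances of normal data
                        (assumed to be the data the model is built from)
    alpha            -- user-defined constant that tunes the threshold
    """
    m_plus = mean(normal_distances)       # mean M+
    v_plus = pvariance(normal_distances)  # assumption: population variance
    return m_plus + alpha * v_plus


# Toy distances; a model library would store one threshold per link count.
model_library = {4: detection_threshold([9.0, 10.0, 11.0], alpha=2.0)}
```

A larger $\alpha$ raises the threshold, which lowers false alarms on normal traffic at the cost of missing weaker deviations.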
Further, the entropy in step (B) is the information entropy $H_N = -\sum_{i=1}^{w/L} P_{Ni} \log P_{Ni}$, where $P_{Ni}$ is the proportion of each TCP flow in the sub-intervals produced by the window divider, $i = 1, 2, 3, \ldots, w/L$.
Beneficial effect: the present invention proposes a detection method for a new steganographic mode, "covert communication with multilink arrival-sequence coding". The packet arrival sequences of the data under test and of normal data are treated as random sequences, and the entropy of the sequence is computed separately for each TCP flow identifier. If N TCP flows exist when the two hosts communicate, N entropies are obtained; these N entropies are treated as a point in an N-dimensional space, and the Euclidean distance from this point to the origin is computed. By comparing the Euclidean distances of the two kinds of data, the method determines whether the data stream under test is a stego data stream. Combining the information entropy of the traffic with the Euclidean distance improves detection performance and yields reliable detection results.
Brief description of the drawings
Fig. 1 is an example diagram of the stegblock principle.
Fig. 2 is the flow chart of normal communication model training.
Fig. 3 is the flow chart of actual detection.
Fig. 4 shows the experimental results for observation window w=1000.
Detailed description of the invention
The technical solution of the present invention is described in detail below through a preferred embodiment and with reference to the accompanying drawings.
Fig. 1 is an example diagram of the stegblock principle; the coding scheme of multilink arrival-sequence covert communication is based on the stegblock principle.
Suppose four TCP flows exist between the two hosts, with identifiers 1, 2, 3, 4, and the sender uses 1 and 3 as the key, a key pair delimiting one data block. The TCP flow identifiers of the first packets to arrive are {1, 4, 3}, representing '1'. After the first data block ends, the next one, i.e. the second data block, begins: the second data block is {2, 1, 4, 1, 2, 3}, representing '0'; the third data block is {3, 4, 3, 2, 1}, representing '1'; and the fourth data block is {3, 1}, representing '0'.
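The mapping from data blocks to bits in this example can be reproduced with a short sketch. It assumes, as in the background example, that the value of a block is the last bit of its object count; the function name `block_values` is hypothetical, and the sketch takes the blocks as already delimited rather than deriving them from the key.

```python
def block_values(blocks):
    """Return the bit string carried by a list of data blocks.

    Assumption: a block's value is the last bit (parity) of the number
    of objects it contains, so 3 objects -> '1' and 6 objects -> '0'.
    """
    return "".join(str(len(block) % 2) for block in blocks)


# The four data blocks listed in the example above.
blocks = [[1, 4, 3], [2, 1, 4, 1, 2, 3], [3, 4, 3, 2, 1], [3, 1]]
print(block_values(blocks))
# prints 1010
```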
Given this stegblock-based coding scheme, the detection method only needs to compute the entropy of each TCP flow identifier (i.e. 1, 2, 3, 4) within a window and then process these entropies to determine whether the window contains hidden information.
Fig. 2 is the flow chart of normal communication model training. As shown in the figure, a covert communication detection method for multilink arrival-sequence coding comprises building a model library and performing detection with the model library. Building the model library specifically comprises the following steps:
(1) Set up a data capture device: establish a multilink connection based on MPTCP between two hosts and capture the packets exchanged between the hosts with the data capture device. In this embodiment, wireshark serves as the data capture device and captures the normal multilink communication data between the two hosts; wireshark's built-in filter is then used to extract the arrival sequence of the packets.
(2) Set up a character mapper: build a character mapper according to the stegblock principle and assign a distinct numeric code (i.e. an identifier) to each flow; for example, the flow from 192.xxx.xxx.11 to 192.xxx.xxx.1 is given identifier 1 and the flow from 192.xxx.xxx.12 to 192.xxx.xxx.1 is given identifier 2. The captured packet arrival sequence is then converted into an identifier sequence. In this embodiment, the captured packet arrival sequence is converted into the packet arrival sequence of the individual TCP flows; this sequence is a one-dimensional array made up of the flow identifiers. When four TCP flows exist between the two hosts, the four flows are represented by 1, 2, 3, 4 respectively, and the identifier array consists of the values 1, 2, 3, 4.
(3) Set up a window divider: the window divider splits the identifier sequence into windows of size $w$, giving $\lfloor K/w \rfloor$ windows in total, where $K$ denotes the length of the identifier sequence; each window is split into sub-intervals of size $L$, so one window contains $w/L$ sub-intervals. Suppose $N$ flows exist between the hosts and $f_N$ is the number of packets of a given flow within one sub-interval; compute the proportion of each TCP flow in each sub-interval, $P_{Ni} = f_N / L$, $i = 1, 2, 3, \ldots, w/L$. This embodiment takes the first 49000 entries of the array (i.e. 49000 packets) and, with a window size of w=1000, divides them into 49 detection windows.
Each detection window is split again with L=100 into 10 sub-intervals of equal size. For each sub-interval, the probability of occurrence of each TCP flow identifier is computed. For the flow with identifier 1 (flow No. 1), the probability of occurrence in the first sub-interval is $p_1$, in the second sub-interval $p_2$, and so on up to $p_{10}$ in the tenth sub-interval.
(4) Set up an entropy calculator: the entropy calculator computes the information entropy of the identifier sequence of each TCP flow in a window, $H_N = -\sum_{i=1}^{w/L} P_{Ni} \log P_{Ni}$. This embodiment uses this formula to compute the entropy of flow No. 1 in the window; the entropies of flows No. 2, 3 and 4 in the window are computed in the same way. Since a detection window contains four TCP flows, four entropies are computed, one per TCP flow identifier.
(5) Set up a Euclidean distance calculator: the Euclidean distance calculator treats the $N$ information entropies of the $N$ TCP flows in each window as a point in an $N$-dimensional space and computes the Euclidean distance $O_j$ from each point to the origin, with one index $j = 1, 2, 3, \ldots$ per window. The Euclidean distance of each point is $O_j = \sqrt{\sum_{n=1}^{N} x_n^2}$, where $x_n$ is the information entropy $H_N$ of the $n$-th flow obtained in step (4). In this embodiment, the four entropies are treated as a point in a 4-dimensional space and the Euclidean distance from this point to the origin is computed, which yields the Euclidean distance values of the 49 windows.
(6) Train normal-data models for different numbers of links and set the detection thresholds: for each number of links between the two hosts (the number of links of each host being not less than two, e.g. 2, 3, 4, 5), repeat steps (1)-(5) to obtain the normal-data model for that number of links; analyze each model to obtain the mean $M_+$ and the variance $V_+$ of its data and the detection threshold $Th_+ = M_+ + \alpha V_+$, where $\alpha$ is a user-defined constant used to adjust the detection threshold. In this embodiment, once the Euclidean distance values of the 49 windows have been obtained in step (5), the data model of normal communication over 4 links can be built and analyzed.
(7) Build the model library: store the detection thresholds obtained in step (6) for the different numbers of links in the model library.
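The character mapper of step (2) can be sketched as a simple table lookup. The function name `map_to_identifiers`, the representation of a packet as a (source, destination) address pair, and the concrete addresses are hypothetical; the addresses below are documentation placeholders and are not the elided addresses of the embodiment.

```python
def map_to_identifiers(packets, flow_table):
    """Convert a packet arrival sequence into a flow-identifier array.

    packets    -- (source, destination) pairs in order of arrival
    flow_table -- hypothetical dict: (source, destination) -> identifier
    """
    return [flow_table[(src, dst)] for src, dst in packets]


# Placeholder addresses chosen only for illustration.
flow_table = {
    ("192.0.2.11", "192.0.2.1"): 1,   # flow No. 1
    ("192.0.2.12", "192.0.2.1"): 2,   # flow No. 2
}
arrivals = [
    ("192.0.2.11", "192.0.2.1"),
    ("192.0.2.12", "192.0.2.1"),
    ("192.0.2.12", "192.0.2.1"),
    ("192.0.2.11", "192.0.2.1"),
]
print(map_to_identifiers(arrivals, flow_table))
# prints [1, 2, 2, 1]
```

The resulting one-dimensional array is the input that the window divider of step (3) operates on.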
As shown in Fig. 3, performing detection with the model library specifically comprises the following steps:
(A) Determine the number of links of the data under test: if the number of links is 1, no multilink communication exists between the two hosts and the procedure ends; otherwise multilink communication exists, and the corresponding model and detection threshold $Th_+$ are retrieved from the model library according to the number of links, with $Th_+ = M_+ + \alpha V_+$, where $M_+$ is the mean and $V_+$ the variance of the data for the given number of links, and $\alpha$ is a user-defined constant. In this embodiment, the data under test is determined to come from a 4-link communication environment, so the model and detection threshold for normal 4-link communication are retrieved from the model library.
(B) Process the data under test and compute the Euclidean distance: use the character mapper to convert the captured packet arrival sequence into an identifier sequence, which is a one-dimensional array. That is, the packet arrival sequence of the data stream under test is extracted, each flow is given a distinct identifier, and the arrival sequence is converted into an array that can be processed. In this embodiment, the character mapper converts the captured packet arrival sequence into the packet arrival sequence of the individual TCP flows; this sequence is made up of the flow identifiers. When four TCP flows exist between the two hosts, the four flows are represented by 1, 2, 3, 4 respectively, and the identifier array consists of the values 1, 2, 3, 4.
Use the window divider to split the array into windows of size $w$: the first 49000 entries of the array (i.e. 49000 packets) are taken and, with a window size of w=1000, divided into 49 detection windows. Each detection window is split again with L=100 into 10 sub-intervals of equal size. For each sub-interval, the probability of occurrence of each TCP flow identifier is computed. For the flow with identifier 1 (flow No. 1), the probability of occurrence in the first sub-interval is $p_1$, in the second sub-interval $p_2$, and so on up to $p_{10}$ in the tenth sub-interval;
Use the entropy calculator to compute the entropy of every identifier in each window, so that $N$ different identifiers (i.e. TCP flows) yield $N$ entropies; treat the $N$ information entropies of the $N$ TCP flows in each window as a point in an $N$-dimensional space and compute the Euclidean distance $O_s$ from each point to the origin, with one index $s = 1, 2, 3, \ldots$ per window. The Euclidean distance of each point is $O_s = \sqrt{\sum_{n=1}^{N} x_n^2}$, where $x_n$ is the entropy, i.e. the information entropy $H_N = -\sum_{i=1}^{w/L} P_{Ni} \log P_{Ni}$, and $P_{Ni}$ is the proportion of each TCP flow in the sub-intervals produced by the window divider, $i = 1, 2, 3, \ldots, w/L$. This embodiment uses the entropy calculator to compute the entropy of flow No. 1 in the window; the entropies of flows No. 2, 3 and 4 in the window are computed in the same way. Since a detection window contains four TCP flows, four entropies are computed. The Euclidean distance calculator then treats the four entropies as a point in a 4-dimensional space and computes the Euclidean distance from this point to the origin.
(C) Determine the attribute of the data under test: compare $O_s$ with the corresponding detection threshold $Th_+$ in the model library; if $O_s$ is greater than $Th_+$, the data under test is stego data (data carrying hidden information), otherwise it is normal data.
Fig. 4 shows the experimental results for observation window w=1000; the solid line is normal data and the dashed line is stego data. As the figure shows, the method proposed in the present invention clearly distinguishes normal data from stego data in covert communication with multilink arrival-sequence coding.
The above is only a preferred embodiment of the present invention. It should be noted that a person of ordinary skill in the art may make improvements and modifications without departing from the principles of the present invention, and such improvements and modifications shall also be regarded as falling within the scope of protection of the present invention.