CN104135385A - Method of application classification in Tor anonymous communication flow - Google Patents
- Publication number
- CN104135385A (application CN201410370944.8A)
- Authority
- CN
- China
- Prior art keywords
- tor
- anonymous communication
- state
- probability
- communication flow
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Landscapes
- Data Exchanges In Wide-Area Networks (AREA)
Abstract
The invention discloses a method for classifying the applications carried in Tor anonymous communication traffic. It mainly addresses the problem of obtaining the upper-layer application type from Tor anonymous traffic and involves techniques such as feature selection, sample preprocessing and traffic modeling. The method comprises the following steps: first, a flow burst segment is defined on the basis of Tor's packet scheduling mechanism, and the volume and direction of each flow burst segment are used as classification features; second, the data samples are preprocessed with a K-means clustering algorithm and a multiple sequence alignment algorithm, which address over-fitting and inconsistent sample lengths through value symbolization and gap insertion; finally, Profile hidden Markov models are built separately for the uplink and downlink Tor traffic of each application, and a heuristic algorithm is provided to construct these models quickly. During classification, the features of the flow to be classified are fed into the Profile hidden Markov models of the different applications, the probabilities under the uplink and downlink models are computed, and the upper-layer application type contained in the flow is decided by the maximum joint probability.
Description
Technical field
The present invention is a method for classifying the applications in Tor anonymous communication traffic. It uses techniques such as feature selection, sample preprocessing and traffic modeling, and relates in particular to anonymous communication and traffic analysis in the field of network security.
Background technology
With the rapid development of the Internet and the widespread use of the mobile Internet, networks have become part of every aspect of daily life. At the same time, the security and privacy problems brought by network services have received increasing attention. To protect users' private information, researchers have designed a variety of anonymous communication schemes, such as the onion routing protocol, and have built practical anonymous communication systems on them, such as Tor, JAP and I2P. However, the widespread use of anonymous communication systems also poses a major challenge to network supervision. Through these systems, users can circumvent existing network access control policies to reach illegal Internet resources, leak confidential information and launch anonymous attacks. Because anonymous traffic is encrypted, effective supervision requires in-depth research into techniques for identifying and analyzing it. On the one hand, such research can help regulate users' network behavior and combat network crime carried out through anonymous communication systems; on the other hand, as research on identifying and analyzing anonymous traffic deepens, it can reveal weaknesses in existing anonymous communication protocols and their implementations, so that more robust protocols and systems can be designed to give users better privacy protection.
Anonymous communication technology was first proposed by Chaum in 1981. It hides the identities of the communicating parties and their relationship by inserting one or more intermediate nodes (Mix nodes) on the communication path between sender and recipient. When sending data, the user first determines the Mix nodes on the forwarding path and the recipient's address, then encrypts the data and address information layer by layer with the public key of each Mix node on the path, forming an "onion", which is sent to the first Mix node on the path. On receiving the onion, that Mix node decrypts it to obtain the next-hop address and sends the decrypted onion to the next hop; the other nodes do the same in turn until the original data reaches the recipient. Return data travels in the reverse order: the recipient sends it to the Mix node directly connected to it (the last Mix node on the forward path), each Mix node on the path encrypts the data with its own private key and forwards it backwards, and finally the user performs repeated decryption to recover the content.
The abuse of anonymous communication systems poses a serious threat to network security. For example, in 2007 the German government successively arrested the operators of several Tor exit nodes, who were in fact scapegoats for crimes such as illegally browsing pornographic material. When an anonymous offender uses the Tor network to obtain child pornography, the corresponding traffic is first sent to a Tor exit node, which then forwards the data back to the offender through the Tor network. The IP addresses in the traffic lead only to these exit nodes, and the real criminal remains unknown. In addition, botnets have begun to use the Tor network to hide their command and control (C&C) servers: each bot communicates with the C&C server through Tor, which hides the C&C server's real identity and its association with the bots and makes botnets harder to detect. More seriously, some popular attack tools, such as the DoS tool torshammer aimed at Web servers and the SQL injection tool sqlmap, provide configuration options that route attack traffic through the Tor network to evade detection and tracing. Anonymous communication systems, originally intended to protect users' private information, are thus being abused and threaten network security. To prevent anonymous network crime and maintain network security, it is therefore necessary to classify the upper-layer application types contained in Tor anonymous traffic so that the network behavior of anonymous users can be determined.
Existing research on classifying applications in Tor anonymous traffic aims to improve the overall performance of the Tor network: classification is performed by Tor nodes from the protocol-layer information they can observe, rather than by extracting features from network flows.
Summary of the invention
Technical problem: the goal of application classification is to obtain the upper-layer application type hidden in anonymous traffic. That is, for an anonymous flow f, the attacker must determine which of the upper-layer application types $T_1, T_2, \ldots, T_n$ it contains, and thereby infer which anonymous network activity the target user Alice is carrying out, such as anonymous web browsing or anonymous P2P downloading. Formally, the application classification problem for anonymous traffic can be stated as follows: given all possible application types $T_1, T_2, \ldots, T_n$, how can the anonymous flow f be mapped to one or several of them? To address this problem, the present invention defines the network flow burst segment on the basis of Tor's packet scheduling mechanism, uses segment volume and direction as classification features, proposes an application classification method based on the Profile hidden Markov model (Profile HMM), and preprocesses the sample data to facilitate building the final model.
Technical scheme:
To solve the above technical problem, the present invention, based on an in-depth analysis of how the Tor anonymous communication system is implemented, proposes a method for classifying applications in Tor anonymous traffic. The method is as follows:
A method for classifying applications in Tor anonymous traffic, comprising the steps of:
1) obtaining the classification features of Tor anonymous traffic;
2) preprocessing the sample data;
3) building traffic models of Tor anonymous traffic;
4) classifying by combining the probabilities computed by the different models.
In step 1), consecutive packets whose data length (excluding header fields) is greater than 0 and that lie between packets travelling in the opposite direction are defined as a flow burst segment. The segment volume (Segment Volume) is defined as the sum of the lengths of all packets in the flow burst segment, and the segment direction is defined as the direction of the packets in the segment, either inbound or outbound. The selected features are the volume and direction of each flow burst segment.
In step 2), the K-means clustering algorithm is used to cluster the flow burst segment volumes, the final number of clusters is determined by the cluster validity, and the values are then symbolized. A multiple sequence alignment algorithm is then applied to the symbolized samples, inserting gaps so that all samples have the same length, which allows a general model to be built.
In step 3), a heuristic method is used to build the Profile hidden Markov model of Tor anonymous traffic. If letters occupy more than half of a column, the column is a Match state; otherwise it is an Insert state. Delete states are determined by the gaps in the columns corresponding to Match states: the more gaps, the higher the probability of moving from the preceding state to that Delete state. If the column of a Match state contains N letters and gaps in total, of which n are gaps, the probability of moving from the preceding state to the Delete state is (n+1)/(N+1). The transition probability from state i to state j is the number of transitions from i to j divided by the total number of transitions out of i, and the output probability of character a at the i-th Match state is the number of times that state outputs a divided by the total number of characters it outputs.
In step 4), a single parameter α, with 0 ≤ α ≤ 1, combines the probabilities produced by the uplink and downlink traffic models, and the upper-layer application type contained in the Tor anonymous traffic is decided by the maximum joint probability.
For the problem of classifying applications in Tor anonymous traffic, the present invention uses Tor's packet scheduling mechanism to define the concept of a flow burst segment and takes the volume and direction of flow burst segments as classification features; preprocesses the data samples with the K-means clustering algorithm and a multiple sequence alignment algorithm, solving over-fitting and inconsistent sample lengths through value symbolization and gap insertion; and uses Profile HMMs to model the uplink and downlink Tor traffic of different applications separately, with a heuristic algorithm for building the Profile hidden Markov models quickly.
Beneficial effects: the invention achieves good classification results, runs fast and adds little network load (it only needs to observe traffic passively), so it can effectively classify the applications in Tor anonymous traffic in large-scale network environments.
Brief description of the drawings
Fig. 1 is an architecture diagram of the Tor anonymous communication system of the present invention;
Fig. 2 is a diagram of the data scheduling policy of a Tor anonymous node of the present invention;
Fig. 3 is a schematic diagram of the Profile HMM in the present invention;
Fig. 4 is a flow chart of a specific embodiment of the present invention.
Embodiment
This method mainly addresses the problem of obtaining the upper-layer application type from Tor anonymous traffic and involves techniques such as feature selection, sample preprocessing and traffic modeling. First, the method uses Tor's packet scheduling mechanism to define the concept of a flow burst segment and takes the volume and direction of flow burst segments as classification features. Then, the data samples are preprocessed with the K-means clustering algorithm and a multiple sequence alignment algorithm, solving over-fitting and inconsistent sample lengths through value symbolization and gap insertion. Finally, Profile HMMs are used to model the uplink and downlink Tor traffic of different applications separately, and a heuristic algorithm is proposed to build the Profile hidden Markov models quickly. During classification, the features of the flow to be classified are fed into the Profile hidden Markov models of the different applications, the probabilities under the uplink and downlink models are computed, and the upper-layer application type contained in the flow is decided by the maximum joint probability.
The present invention is described in further detail below with reference to the accompanying drawings.
1. Obtaining the classification features of Tor anonymous traffic
The Tor anonymous communication system uses libevent events to schedule the processing of data in its input buffers and output buffers, which amounts to implicit round-robin (Round Robin) scheduling. When a Tor node receives cell data from a TLS/Socks interface, the cells are stored in the corresponding input buffer (Input Buffer). Tor then applies a round-robin policy to decrypt or encrypt the cells in the input buffers, depending on the direction of the link.
The round-robin policy works as follows: the node first processes cells in the first input buffer; after handling a certain number of cells it moves to the second input buffer, and so on up to the last input buffer, after which it starts again from the first, repeating the cycle. Once processed, each cell is placed in the corresponding output buffer (Output Buffer). The output buffer queue is scheduled with the same round-robin policy as the input buffer queue, and the cells in the different buffers are sent to the network through the TLS/Socks interface.
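The round-robin policy above can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not Tor's code: each input buffer is modeled as a `deque` of cells, and the hypothetical `quantum` parameter stands in for "a certain number of cells" handled per buffer per turn; libevent callbacks, cell encryption and output buffers are not modeled.

```python
from collections import deque


def round_robin(buffers, quantum):
    """Drain per-connection input buffers in round-robin order.

    buffers: list of deques of cells; quantum (>= 1): hypothetical cap on the
    number of cells taken from one buffer before moving to the next.
    Returns the cells in the order they would be processed.
    """
    order = []
    while any(buffers):  # an empty deque is falsy
        for buf in buffers:
            # take at most `quantum` cells from this buffer, then move on
            for _ in range(min(quantum, len(buf))):
                order.append(buf.popleft())
    return order


bufs = [deque(["a1", "a2", "a3"]), deque(["b1"])]
print(round_robin(bufs, quantum=2))  # ['a1', 'a2', 'b1', 'a3']
```

Because each buffer is drained in runs of up to `quantum` cells, cells of the same connection tend to leave the node back to back, which is the behavior the flow burst segment definition relies on.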
Based on this scheduling policy, the present invention defines a flow burst segment (FBS, Flow Burst Segmentation) as a run of consecutive packets with data length greater than 0 (excluding header fields) that lies between packets travelling in the opposite direction. Suppose c1, c2, s1, s2, s3, s4, c3, c4 are the packets exchanged between a client and a server, all with length greater than 0, where ci denotes a packet sent by the client, si denotes a response packet returned by the server, and i is a natural number. By the above definition, {c1, c2}, {s1, s2, s3, s4} and {c3, c4} are three different flow burst segments. Naturally, the segment volume (Segment Volume) is defined as the sum of the lengths of all packets in the flow burst segment, and the segment direction is the direction of the packets in the segment, either inbound or outbound.
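A minimal sketch of the segmentation, assuming each packet is given as a hypothetical `(direction, payload_len)` pair observed at the client side, with `'out'` for client-to-server and `'in'` for server-to-client, and `payload_len` already excluding header fields. Packets with zero payload are skipped, so they neither start nor split a segment.

```python
def burst_segments(packets):
    """Group packets into flow burst segments (FBS).

    packets: (direction, payload_len) pairs in arrival order, where direction
    is 'out' (client to server) or 'in' (server to client) and payload_len
    excludes header fields. Returns (direction, volume) pairs, one per FBS.
    """
    segments = []
    for direction, length in packets:
        if length <= 0:
            continue  # header-only packets (e.g. bare ACKs) are ignored
        if segments and segments[-1][0] == direction:
            # same direction as the open segment: add to its volume
            segments[-1] = (direction, segments[-1][1] + length)
        else:
            # direction changed: start a new burst segment
            segments.append((direction, length))
    return segments


# c1, c2, s1..s4, c3, c4 from the example above, with illustrative lengths
trace = [("out", 100), ("out", 50), ("in", 500), ("in", 500),
         ("in", 500), ("in", 200), ("out", 80), ("out", 20)]
print(burst_segments(trace))  # [('out', 150), ('in', 1700), ('out', 100)]
```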
2. Sample data preprocessing
The present invention uses the K-means clustering algorithm to symbolize flow burst segment volumes, converting numerical values into letters. This narrows the range of values the volume can take and makes the resulting model more general.
The detailed symbolization procedure is as follows (shown in Table 1):
First, determine a suitable number of clusters $k_f$. Increase the number of clusters k from 2 to 26; for each k compute the intra-cluster distance $\mathrm{intra}_k$ and the inter-cluster distance $\mathrm{inter}_k$, then the validity $v_k = \mathrm{intra}_k / \mathrm{inter}_k$. The k with the smallest $v_k$ is taken as the number of clusters, which gives the resulting traffic model better generalization.
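The text does not spell out how $\mathrm{intra}_k$ and $\mathrm{inter}_k$ are measured. The sketch below assumes definitions in the style of the Ray-Turi validity index: $\mathrm{intra}_k$ is the mean squared distance from each volume to its cluster centre, and $\mathrm{inter}_k$ is the smallest squared distance between two centres. It uses plain one-dimensional Lloyd's k-means with deterministic quantile initialization; `kmeans_1d`, `validity` and `choose_k` are hypothetical names. The upper bound of 26 presumably matches the 26 letters A to Z available for symbolization.

```python
def kmeans_1d(values, k, iters=100):
    """Lloyd's k-means on scalars, initialised at evenly spaced quantiles."""
    xs = sorted(values)
    centers = [xs[(2 * i + 1) * len(xs) // (2 * k)] for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for x in xs:
            nearest = min(range(k), key=lambda j: abs(x - centers[j]))
            clusters[nearest].append(x)
        # new centre = cluster mean; an empty cluster keeps its old centre
        new = [sum(c) / len(c) if c else centers[j] for j, c in enumerate(clusters)]
        if new == centers:
            break
        centers = new
    return centers, clusters


def validity(values, k):
    """v_k = intra_k / inter_k (assumed Ray-Turi style definitions)."""
    centers, clusters = kmeans_1d(values, k)
    # intra_k: mean squared distance of each value to its cluster centre
    intra = sum((x - centers[j]) ** 2
                for j, c in enumerate(clusters) for x in c) / len(values)
    # inter_k: smallest squared distance between two cluster centres
    inter = min((a - b) ** 2 for i, a in enumerate(centers) for b in centers[i + 1:])
    return intra / inter if inter > 0 else float("inf")


def choose_k(values, k_min=2, k_max=26):
    """Return the k in [k_min, k_max] with the smallest validity v_k."""
    candidates = range(k_min, min(k_max, len(set(values))) + 1)
    return min(candidates, key=lambda k: validity(values, k))


# illustrative volumes forming three well-separated groups
vals = [10 + i for i in range(20)] + [500 + i for i in range(20)] + [1400 + i for i in range(20)]
print(choose_k(vals))  # 3
```

On real data the candidate range should stay well below the number of distinct volumes: as k approaches that number, $\mathrm{intra}_k$ tends to zero and trivially minimizes $v_k$.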
After $k_f$ is determined, for each type of application, first run K-means clustering on all flow burst segment volumes collected in its training stage, and represent all values in the same cluster by the same letter. (For example, cluster 1 covers the range 16 to 676, and every value in this range is represented by the letter A.) After clustering, symbolize all training samples: determine the cluster to which each volume belongs and replace the value with that cluster's symbol, giving a symbolized feature vector (of the form <A, A, B, B, D, C, C, ...>).
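Once the cluster centres are known, symbolization is a nearest-centre lookup. In this sketch the letters are assigned in increasing order of centre value (A for the smallest volumes), which is an assumption; the text only requires that all values in one cluster share a letter. The centre values in the example are illustrative, and `make_symbolizer` is a hypothetical helper.

```python
import string


def make_symbolizer(centers):
    """Build a function that maps burst volumes to cluster letters.

    Letters are assigned in increasing order of centre value (assumption):
    'A' for the smallest centre, 'B' for the next, and so on (k <= 26).
    """
    ordered = sorted(centers)
    letters = string.ascii_uppercase[: len(ordered)]

    def symbolize(volumes):
        # each volume takes the letter of its nearest cluster centre
        return [letters[min(range(len(ordered)), key=lambda j: abs(v - ordered[j]))]
                for v in volumes]

    return symbolize


# illustrative centres, e.g. from K-means on one application's training volumes
symbolize = make_symbolizer([3000.0, 300.0, 1200.0])
print(symbolize([150, 620, 1100, 2900, 310]))  # ['A', 'A', 'B', 'C', 'A']
```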
The present invention uses a multiple sequence alignment algorithm to align the feature vectors so that different feature vectors of the same application type have the same length.
The basic idea of multiple sequence alignment is to insert gaps (Gap) so that the feature vectors have the same length while keeping the number of inserted gaps, i.e. the cost, as small as possible. Because exact alignment of many sequences requires a large amount of computing time and memory, the present invention mainly considers the progressive alignment algorithm based on sequence length, whose progressive alignment is completed in the following three steps:
Step 1: Align the sequences pairwise and compute the distance between every pair of sequences to obtain a distance matrix. Each pairwise alignment is performed by a dynamic programming algorithm: the similarity score of the two sequences is computed iteratively and stored in a score matrix, which is then traced back to find the optimal alignment.
Step 2: Compute a guide tree from the distance matrix. In this step, a guide tree (Guide Tree) is built from the distance matrix obtained in Step 1; it represents the order in which sequences are aligned in the subsequent multiple sequence alignment.
Step 3: Following the branching order of the guide tree, progressively align newly added sequences. This step also completes the multiple alignment by progressive alignment. Sequences are aligned in order from the leaf nodes to the root of the guide tree: the most closely related pair of sequences is aligned first, then neighbouring sequences are introduced step by step and the alignment is rebuilt, until all sequences have been added.
As in Step 1, the alignment between sequences is still performed by dynamic programming, but in Step 3 a sequence may be aligned with a group (Profile), and one group may be aligned with another. All sequences are divided into groups according to distance, so the final alignment is completed by aligning the different groups with one another.
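The three steps can be sketched as a small ClustalW-style progressive aligner. This is an illustration under stated assumptions, not the method's exact algorithm: scores are +1 for identical letters, -1 for a mismatch or a letter against a gap, 0 for gap against gap, with a linear gap penalty of -1; a column of one group is scored against a column of another by the average pairwise score (so a single sequence is just a group of one row); the pairwise distance is one minus the fraction of identical aligned letters; and the guide tree is built by UPGMA-style average linkage, merging and aligning the closest pair of groups at each step. All function names are hypothetical.

```python
GAP = "-"


def pair_score(a, b):
    """+1 for identical letters, 0 for gap/gap, -1 otherwise (assumed scores)."""
    if a == GAP and b == GAP:
        return 0.0
    return 1.0 if a == b else -1.0


def align_groups(g1, g2, gap=-1.0):
    """Needleman-Wunsch over columns: align two groups of equal-length rows."""
    n, m = len(g1[0]), len(g2[0])
    col1 = [[row[i] for row in g1] for i in range(n)]
    col2 = [[row[j] for row in g2] for j in range(m)]

    def s(i, j):  # average pairwise score of column i of g1 vs column j of g2
        return sum(pair_score(a, b) for a in col1[i] for b in col2[j]) / (len(g1) * len(g2))

    F = [[0.0] * (m + 1) for _ in range(n + 1)]  # score matrix
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + s(i - 1, j - 1),
                          F[i - 1][j] + gap, F[i][j - 1] + gap)
    # trace back from the bottom-right corner, emitting one merged column per step
    out1, out2 = [[] for _ in g1], [[] for _ in g2]
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + s(i - 1, j - 1):
            c1, c2 = col1[i - 1], col2[j - 1]
            i, j = i - 1, j - 1
        elif i > 0 and (j == 0 or F[i][j] == F[i - 1][j] + gap):
            c1, c2 = col1[i - 1], [GAP] * len(g2)
            i -= 1
        else:
            c1, c2 = [GAP] * len(g1), col2[j - 1]
            j -= 1
        for row, ch in zip(out1 + out2, c1 + c2):
            row.append(ch)
    return ["".join(reversed(row)) for row in out1 + out2]


def distance(a, b):
    """1 - fraction of aligned columns holding identical letters."""
    x, y = align_groups([a], [b])
    same = sum(1 for p, q in zip(x, y) if p == q != GAP)
    return 1.0 - same / len(x)


def progressive_align(seqs):
    """UPGMA-style guide tree; groups are aligned in the order they merge."""
    n = len(seqs)
    D = [[distance(seqs[i], seqs[j]) if i != j else 0.0 for j in range(n)] for i in range(n)]
    # each cluster: (guide-tree node, member indices, aligned rows)
    clusters = [(i, [i], [seqs[i]]) for i in range(n)]

    def linkage(p, q):  # average distance between members of two clusters
        return sum(D[i][j] for i in p[1] for j in q[1]) / (len(p[1]) * len(q[1]))

    while len(clusters) > 1:
        a, b = min(((x, y) for x in range(len(clusters)) for y in range(x + 1, len(clusters))),
                   key=lambda xy: linkage(clusters[xy[0]], clusters[xy[1]]))
        p, q = clusters[a], clusters[b]
        merged = ((p[0], q[0]), p[1] + q[1], align_groups(p[2], q[2]))
        clusters = [c for idx, c in enumerate(clusters) if idx not in (a, b)] + [merged]
    tree, members, rows = clusters[0]
    by_index = dict(zip(members, rows))
    return tree, [by_index[i] for i in range(n)]


tree, rows = progressive_align(["AABBC", "AABC", "ABBC"])
print(tree)
print(rows)
```

Production alignment tools use substitution matrices and affine gap costs; the simple scores here only show how pairwise alignment, the guide tree and group-to-group alignment fit together.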
Table 1 gives the pseudocode for symbolizing the sample data:
3. Building the traffic model of Tor anonymous traffic
A Profile hidden Markov model consists of three kinds of states: Match, Insert and Delete. To determine the Match, Insert and Delete states, the learning samples are first arranged into a sample matrix with one feature vector per row. Note that these feature vectors have already gone through sample preprocessing: they consist of letters and gaps (written as a dash "-") and all have the same length. Each column of the matrix corresponds to a Match or Insert state of the Profile HMM. The present invention uses the following heuristic to decide which state a column corresponds to: if letters occupy more than half of the column, it is a Match state; otherwise it is an Insert state. Delete states are determined by the gaps in the columns corresponding to Match states.
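A minimal sketch of the column heuristic, together with the Delete-state rule given in the technical scheme above: a Match column holding N letters and gaps, n of them gaps, gives probability (n+1)/(N+1) of moving from the preceding state into its Delete state. Rows are aligned, symbolized feature vectors with `-` for gaps; `column_states` and `delete_probability` are hypothetical names.

```python
GAP = "-"


def column_states(rows):
    """Label each column 'M' (Match) if letters fill more than half of it, else 'I'."""
    states = []
    for col in zip(*rows):
        letters = sum(1 for ch in col if ch != GAP)
        states.append("M" if letters > len(rows) / 2 else "I")
    return states


def delete_probability(rows, states):
    """For each Match column with N letters and gaps, n of them gaps: (n + 1) / (N + 1)."""
    probs = []
    for col, state in zip(zip(*rows), states):
        if state == "M":
            probs.append((col.count(GAP) + 1) / (len(col) + 1))
    return probs


rows = ["AB-C", "A--C", "ABBC"]  # aligned, symbolized feature vectors
states = column_states(rows)
print(states)                            # ['M', 'M', 'I', 'M']
print(delete_probability(rows, states))  # [0.25, 0.5, 0.25]
```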
Once the Match, Insert and Delete states are determined, the transition probabilities between states and the output probabilities of the Match states must be computed. A Delete state produces no output, so it needs no output probability. An Insert state produces random output: its observable character set consists of the characters that appear in the samples, and its output distribution is uniform, so each observable character is produced with probability 1/C, where C is the size of the character set.
To compute the transition probabilities and the Match-state output probabilities, first count the transitions between states and the number of occurrences of each character in the column corresponding to each Match state. The transition probability from state i to state j is the number of transitions from i to j divided by the total number of transitions out of i, and the probability that the i-th Match state outputs character a is the number of times that state outputs a divided by the total number of characters it outputs.
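The counting procedure can be sketched as follows. Each aligned row is walked through the columns to obtain its state path: a letter in a Match column visits M_k, a gap in a Match column visits D_k, a letter in an Insert column visits I_k (k being the most recent Match column), and a gap in an Insert column visits no state. Each path starts at a Begin state and ends at an End state, encoded here as ('M', 0) and ('M', L + 1); this encoding, the name `estimate` and the absence of pseudocounts are assumptions of the sketch. The Insert output probability is the uniform 1/C described above.

```python
from collections import Counter, defaultdict

GAP = "-"


def estimate(rows, states):
    """Count-based transition/emission estimates for a profile HMM.

    rows: aligned feature vectors; states: 'M'/'I' label per column.
    States are encoded as ('M', k), ('I', k), ('D', k), with ('M', 0) as
    Begin and ('M', L + 1) as End. Returns (trans, match_emit, insert_p).
    """
    L = states.count("M")
    trans_counts = defaultdict(Counter)
    emit_counts = defaultdict(Counter)
    alphabet = {ch for row in rows for ch in row if ch != GAP}
    for row in rows:
        k, prev = 0, ("M", 0)
        for ch, state in zip(row, states):
            if state == "M":
                k += 1
                if ch == GAP:
                    cur = ("D", k)  # gap in a Match column: Delete state
                else:
                    cur = ("M", k)
                    emit_counts[k][ch] += 1
            elif ch != GAP:
                cur = ("I", k)  # letter in an Insert column
            else:
                continue  # gap in an Insert column: no state visited
            trans_counts[prev][cur] += 1
            prev = cur
        trans_counts[prev][("M", L + 1)] += 1  # every path ends in End
    # transition i -> j = count(i -> j) / total transitions out of i
    trans = {(i, j): c / sum(outs.values())
             for i, outs in trans_counts.items() for j, c in outs.items()}
    # Match output = count(a at state k) / total outputs of state k
    match_emit = {k: {a: c / sum(cnt.values()) for a, c in cnt.items()}
                  for k, cnt in emit_counts.items()}
    return trans, match_emit, 1.0 / len(alphabet)  # Insert output: uniform 1/C


rows = ["AB-C", "A--C", "ABBC"]
trans, match_emit, insert_p = estimate(rows, ["M", "M", "I", "M"])
print(trans[(("M", 1), ("M", 2))])  # 0.6666666666666666
```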
4. Classification by combining the probabilities computed by different models
The application classification procedure is as follows:
Step 1: From flow f, extract the flow burst segment volumes in the inbound and outbound directions, obtaining feature vectors $V_I$ and $V_E$.
Step 2: Symbolize $V_I$ and $V_E$ using the clustering information obtained in the training stage. Denote the symbolized feature vectors by $S_I$ and $S_E$ respectively.
Step 3: For the inbound Profile hidden Markov model of each application j ($j = 1, \ldots, N$), compute the probability that $S_I$ is produced by the model, denoted $p_j^I$.
Step 4: Similarly to Step 3, for the outbound Profile hidden Markov model of each application j, compute the probability that $S_E$ is produced by the model, denoted $p_j^E$.
Step 5: For each application j, compute the joint probability $p_j$ by combining $p_j^I$ and $p_j^E$ with the weight α, where 0 ≤ α ≤ 1 adjusts the relative contributions of the inbound and outbound flows to classification so as to obtain the best classification result.
Step 6: Select the largest joint probability, $p_m = \max\{p_1, p_2, \ldots, p_N\}$, that is, $m = \arg\max_{1 \le j \le N} p_j$.
The application type of flow f is determined to be the m-th application type in the training set.
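Steps 3 to 6 can be sketched with the standard forward algorithm for profile HMMs (as described by Durbin et al. in Biological Sequence Analysis), which sums over all Match/Insert/Delete paths to give the probability that a model produces a symbol sequence. Two points are assumptions rather than statements of the method: each model is passed as a hypothetical `(L, trans, match_emit, insert_p)` tuple, with ('M', 0) as Begin and ('M', L + 1) as End; and the joint probability is taken to be the linear combination p_j = α·p_j^I + (1 - α)·p_j^E, because the text above does not give the exact combining formula.

```python
def forward(seq, L, trans, match_emit, insert_p):
    """P(seq | profile HMM) by the forward algorithm over Match/Insert/Delete paths.

    trans: {(state, state): prob} with states ('M', k), ('I', k), ('D', k),
    ('M', 0) = Begin, ('M', L + 1) = End; missing keys mean probability 0.
    match_emit: {k: {symbol: prob}}; insert_p: uniform Insert output 1/C.
    """
    def a(s, t):
        return trans.get((s, t), 0.0)

    n = len(seq)
    fM = [[0.0] * (n + 1) for _ in range(L + 1)]
    fI = [[0.0] * (n + 1) for _ in range(L + 1)]
    fD = [[0.0] * (n + 1) for _ in range(L + 1)]
    fM[0][0] = 1.0  # start in Begin having emitted nothing
    tables = (("M", fM), ("I", fI), ("D", fD))
    for i in range(n + 1):
        for k in range(L + 1):
            if i > 0:
                # Insert k emits seq[i-1] and stays in column k
                fI[k][i] = insert_p * sum(f[k][i - 1] * a((s, k), ("I", k)) for s, f in tables)
            if k > 0:
                if i > 0:
                    # Match k emits seq[i-1], entered from column k-1
                    e = match_emit.get(k, {}).get(seq[i - 1], 0.0)
                    fM[k][i] = e * sum(f[k - 1][i - 1] * a((s, k - 1), ("M", k)) for s, f in tables)
                # Delete k is silent: same position i, entered from column k-1
                fD[k][i] = sum(f[k - 1][i] * a((s, k - 1), ("D", k)) for s, f in tables)
    end = ("M", L + 1)
    return sum(f[L][n] * a((s, L), end) for s, f in tables)


def classify(s_in, s_out, models_in, models_out, alpha=0.5):
    """Return (m, joint): index of the most probable application and all p_j.

    Assumed combination: p_j = alpha * P(s_in | inbound j) + (1 - alpha) * P(s_out | outbound j).
    """
    joint = [alpha * forward(s_in, *mi) + (1 - alpha) * forward(s_out, *mo)
             for mi, mo in zip(models_in, models_out)]
    m = max(range(len(joint)), key=joint.__getitem__)
    return m, joint


def toy(sym):
    """Hypothetical one-column model that always emits `sym`."""
    begin, end = ("M", 0), ("M", 2)
    return 1, {(begin, ("M", 1)): 1.0, (("M", 1), end): 1.0}, {1: {sym: 1.0}}, 0.5


models = [toy("A"), toy("B")]
print(classify("B", "B", models, models))  # (1, [0.0, 1.0])
```

For realistic feature vectors these products underflow quickly; a practical version would work with log probabilities or rescale at each position.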
The present invention may also have various other embodiments. Without departing from the spirit and essence of the present invention, those of ordinary skill in the art can make various corresponding changes and modifications according to the present invention, and all such changes and modifications shall fall within the scope of protection of the claims appended hereto.
Claims (5)
1. A method for classifying applications in Tor anonymous communication traffic, characterized by comprising the steps of:
1) using Tor's packet scheduling mechanism to define the concept of a flow burst segment, and taking the volume and direction of flow burst segments as classification features;
2) preprocessing the data samples of the classification features based on a K-means clustering algorithm and a multiple sequence alignment algorithm, solving the problems of over-fitting and inconsistent sample lengths through value symbolization and gap insertion; after preprocessing, the values are symbolized, the samples consist of letters and gaps, and all have the same length;
3) using Profile HMMs to model the uplink and downlink Tor anonymous traffic of different applications separately;
4) finally, feeding the features of the traffic to be classified into the Profile hidden Markov models of the different applications, computing the probabilities under the uplink and downlink traffic models, and deciding the upper-layer application type contained in the Tor anonymous traffic to be classified by the maximum joint probability;
wherein in step 3), a heuristic method is used to build the Profile hidden Markov model of Tor anonymous traffic, as follows:
the Profile hidden Markov model of Tor anonymous traffic consists of three kinds of states: Match, Insert and Delete; to determine the Match, Insert and Delete states, the learning samples are first arranged into a sample matrix with one feature vector of the classification features per row;
each column of the sample matrix corresponds to a Match or Insert state of the Profile hidden Markov model; the state of each column is determined as follows: if letters occupy more than half of the column, it is a Match state; otherwise it is an Insert state; Delete states are determined by the gaps in the columns corresponding to Match states, and the more gaps there are, the higher the probability of moving from the preceding state to that Delete state;
since a Delete state produces no output, its output probability need not be computed; an Insert state produces random output, its observable character set consists of the characters appearing in the samples, and its output distribution is uniform, so each observable character is produced with probability 1/C, where C is the size of the character set; therefore, once the Match, Insert and Delete states are determined, only the transition probabilities between states and the output probabilities of the Match states need to be computed, as follows:
first, count the transitions between states and the number of occurrences of each character in the column corresponding to each Match state;
the transition probability is computed as the number of transitions from state i to state j divided by the total number of transitions out of state i;
the output probability is computed as the number of times the i-th Match state outputs character a divided by the total number of characters output by that state.
2. The method for classifying applications in Tor anonymous communication traffic according to claim 1, characterized in that in step 1), consecutive packets in a network flow whose data length, excluding header fields, is greater than 0 and that lie between packets travelling in the opposite direction are defined as a flow burst segment; the volume of a flow burst segment is defined as the sum of the lengths of all packets in the segment, and the direction of a flow burst segment is defined as the direction of the packets in the segment, either inbound or outbound; and the classification features are the volume and direction of flow burst segments.
3. The method for classifying applications in Tor anonymous communication traffic according to claim 2, characterized in that in step 1),
the Tor anonymous communication system uses libevent events to schedule the processing of data in the input buffers and output buffers, which is expressed as implicit round-robin scheduling; when a Tor node receives cell data from a TLS/Socks interface, the cells are stored in the corresponding input buffer; Tor applies a round-robin policy to decrypt or encrypt the cells in the input buffers;
the round-robin policy is as follows:
first process the cells in the first input buffer; after handling a certain number of cells, process the cells in the second input buffer, and so on up to the last input buffer; then return to the first input buffer and repeat the cycle;
once processed, each cell in an input buffer is placed in the corresponding output buffer; the output buffer queue is scheduled with the same round-robin policy as the input buffer queue, and the cells in the different buffers are sent to the network through the TLS/Socks interface;
for flow burst segments, suppose c1, c2, s1, s2, s3, s4, c3, c4 are the packets exchanged between a client and a server, all with length greater than 0, where ci denotes a packet sent by the client, si denotes a response packet returned by the server, and i is a natural number; then {c1, c2}, {s1, s2, s3, s4} and {c3, c4} are three different flow burst segments; the segment volume is defined as the sum of the lengths of all packets in the flow burst segment, and the segment direction is the direction of the packets in the segment, either inbound or outbound.
4. The method for application classification of Tor anonymous communication traffic according to claim 1, characterized in that, in step 2), the K-means clustering algorithm is used to symbolize the flow-burst volumes, converting numerical values into letter symbols. Symbolizing the samples comprises:
First, determine a suitable number of clusters $k_f$: increase the number of clusters $k$ from 2 to 26; for each $k$, compute the intra-cluster distance $\mathrm{intra}_k$ and the inter-cluster distance $\mathrm{inter}_k$, then compute the validity $v_k = \mathrm{intra}_k / \mathrm{inter}_k$, and take the $k$ with the minimum $v_k$ as the number of clusters.
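The cluster-number search above might be sketched as below. The claim does not define $\mathrm{intra}_k$ and $\mathrm{inter}_k$ precisely. This sketch therefore assumes the usual Ray-Turi definitions: the mean squared distance of each value to its cluster centre, and the minimum squared distance between two centres. It also assumes plain one-dimensional Lloyd's k-means with deterministic quantile initialisation. Finally, it stops the search one short of the number of distinct values, since at that k $\mathrm{intra}_k$ can reach zero and trivially minimize $v_k$. All function names are hypothetical.

```python
from typing import List, Tuple


def assign(values: List[float], centers: List[float]) -> List[int]:
    """Index of the nearest centre for every value (ties go to the lower index)."""
    return [min(range(len(centers)), key=lambda j: abs(v - centers[j])) for v in values]


def kmeans_1d(values: List[float], k: int, iters: int = 100) -> Tuple[List[float], List[int]]:
    """Lloyd's k-means on scalars (k >= 2), initialised at evenly spaced order statistics."""
    data = sorted(values)
    centers = [float(data[i * (len(data) - 1) // (k - 1)]) for i in range(k)]
    labels = assign(values, centers)
    for _ in range(iters):
        # Move each centre to the mean of its members; empty clusters keep their centre.
        new_centers = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            new_centers.append(sum(members) / len(members) if members else centers[j])
        centers = new_centers
        new_labels = assign(values, centers)
        if new_labels == labels:
            break
        labels = new_labels
    return centers, labels


def validity(values: List[float], centers: List[float], labels: List[int]) -> float:
    """Ray-Turi style v_k = intra_k / inter_k (smaller is better)."""
    intra = sum((v - centers[lab]) ** 2 for v, lab in zip(values, labels)) / len(values)
    inter = min((a - b) ** 2 for i, a in enumerate(centers) for b in centers[i + 1:])
    return intra / inter if inter > 0 else float("inf")


def choose_k(values: List[float], k_min: int = 2, k_max: int = 26) -> int:
    """Return the k in [k_min, k_max] with the smallest validity v_k."""
    # Stop one short of the number of distinct values: at that k every value can
    # become its own cluster, intra_k drops to 0 and v_k would be trivially minimal.
    upper = min(k_max, len(set(values)) - 1)
    best_k, best_v = k_min, float("inf")
    for k in range(k_min, upper + 1):
        centers, labels = kmeans_1d(values, k)
        v = validity(values, centers, labels)
        if v < best_v:
            best_k, best_v = k, v
    return best_k
```

On burst volumes that form three well-separated groups (around 100, 1000 and 5000 bytes), the search settles on k = 3.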
After determining $k_f$, for each application type, K-means clustering is first applied to all flow-burst volumes collected during its training stage, and all values within the same cluster are represented by the same letter. After clustering, all training samples are symbolized: the cluster containing each volume is determined, and the numerical value is replaced by that cluster's symbol. This yields the symbolized feature vector of classification features.
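Once the cluster centres are known, symbolization reduces to a nearest-centre lookup. A minimal sketch follows. It assumes clusters are labelled 'A', 'B', ... in ascending order of centre value, since the claim does not fix the letter assignment. The upper bound of 26 clusters matches the 26 letters of the Latin alphabet. The function name is hypothetical.

```python
import string
from typing import List


def symbolize(volumes: List[float], centers: List[float]) -> str:
    """Replace each burst volume by the letter of its nearest cluster centre.

    Centres are ranked in ascending order, so 'A' is the smallest-volume cluster.
    At most 26 centres are supported (one per uppercase letter).
    """
    order = sorted(range(len(centers)), key=lambda j: centers[j])
    letter_of = {j: string.ascii_uppercase[rank] for rank, j in enumerate(order)}
    return "".join(
        letter_of[min(range(len(centers)), key=lambda j: abs(v - centers[j]))]
        for v in volumes
    )
```

For example, with centres 5000, 120 and 1500, the volume sequence 1400, 150, 100, 5100 becomes "BAAC".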
Multiple sequence alignment is then applied to the feature vectors so that the feature vectors of the same application type all have the same length. Specifically, a progressive alignment algorithm based on sequence length is used; the progressive alignment is completed in the following three steps:
A: Align the sequences pairwise and compute the distance between every pair of sequences, yielding a distance matrix. Each pairwise alignment is computed by dynamic programming: the similarity score of the two sequences is computed iteratively and stored in a score matrix, and the optimal alignment is then recovered by tracing back through this score matrix.
B: Build a guide tree from the distance matrix obtained in step A; the guide tree determines the order in which sequences are aligned in the subsequent multiple sequence alignment.
C: Following the branching order of the guide tree, progressively align newly added sequences. Moving through the guide tree from the leaf nodes towards the root, the most closely related pair of sequences is aligned first; progressively more distant sequences are then introduced and the alignment is rebuilt, until all sequences have been added. As in step A, the alignments are computed by dynamic programming, but in step C they can also be sequence-to-group and group-to-group: all sequences are divided into groups according to distance, so the final alignment requires aligning the different groups with each other.
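Steps A to C can be approximated by the compact sketch below. It illustrates the technique under stated assumptions and is not the patent's own algorithm. Three choices are assumed:

- the scoring scheme (match +1, mismatch -1, gap -2);
- the distance, taken as the fraction of aligned columns that are not identical letters;
- the guide tree, built implicitly by average-linkage (UPGMA-style) merging of the two closest clusters.

Group-to-group alignment scores each pair of columns by the average of the pairwise letter scores, a common profile-alignment choice. All names are hypothetical.

```python
from typing import List, Tuple

GAP = "-"
MATCH, MISMATCH, GAP_PENALTY = 1, -1, -2  # assumed scoring scheme


def pair_score(a: str, b: str) -> float:
    if a == GAP and b == GAP:
        return 0
    if a == GAP or b == GAP:
        return GAP_PENALTY
    return MATCH if a == b else MISMATCH


def column_score(x: Tuple[str, ...], y: Tuple[str, ...]) -> float:
    """Average pairwise score between a column of one group and a column of the other."""
    return sum(pair_score(a, b) for a in x for b in y) / (len(x) * len(y))


def align_groups(g1: List[str], g2: List[str]) -> Tuple[List[str], float]:
    """Needleman-Wunsch over columns; a single sequence is a group of size 1."""
    c1, c2 = list(zip(*g1)), list(zip(*g2))  # columns of each group
    n, m = len(c1), len(c2)
    gap1, gap2 = (GAP,) * len(g1), (GAP,) * len(g2)
    F = [[0.0] * (m + 1) for _ in range(n + 1)]  # score matrix
    for i in range(1, n + 1):
        F[i][0] = F[i - 1][0] + column_score(c1[i - 1], gap2)
    for j in range(1, m + 1):
        F[0][j] = F[0][j - 1] + column_score(gap1, c2[j - 1])
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + column_score(c1[i - 1], c2[j - 1]),
                          F[i - 1][j] + column_score(c1[i - 1], gap2),
                          F[i][j - 1] + column_score(gap1, c2[j - 1]))
    # Trace back from the bottom-right corner to recover the aligned columns.
    cols, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + column_score(c1[i - 1], c2[j - 1]):
            cols.append(c1[i - 1] + c2[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and F[i][j] == F[i - 1][j] + column_score(c1[i - 1], gap2):
            cols.append(c1[i - 1] + gap2)
            i -= 1
        else:
            cols.append(gap1 + c2[j - 1])
            j -= 1
    cols.reverse()
    return ["".join(col[r] for col in cols) for r in range(len(g1) + len(g2))], F[n][m]


def distance(a: str, b: str) -> float:
    """Step A: fraction of aligned columns that are not identical letters."""
    aligned, _ = align_groups([a], [b])
    same = sum(x == y != GAP for x, y in zip(*aligned))
    return 1 - same / len(aligned[0]) if aligned[0] else 0.0


def progressive_align(seqs: List[str]) -> List[str]:
    d = {(i, j): distance(seqs[i], seqs[j])
         for i in range(len(seqs)) for j in range(i + 1, len(seqs))}
    clusters = [([i], [s]) for i, s in enumerate(seqs)]  # (member indices, aligned rows)

    def linkage(p: int, q: int) -> float:
        mp, mq = clusters[p][0], clusters[q][0]
        return sum(d[min(a, b), max(a, b)] for a in mp for b in mq) / (len(mp) * len(mq))

    while len(clusters) > 1:
        # Steps B/C: merge the closest pair of clusters (leaves towards the root).
        best = min(((p, q) for p in range(len(clusters)) for q in range(p + 1, len(clusters))),
                   key=lambda pq: linkage(*pq))
        (m1, r1), (m2, r2) = clusters[best[0]], clusters[best[1]]
        rows, _ = align_groups(r1, r2)
        clusters = [c for k, c in enumerate(clusters) if k not in best] + [(m1 + m2, rows)]
    members, rows = clusters[0]
    return [row for _, row in sorted(zip(members, rows))]  # restore input order
```

Every returned row has the same length, and deleting the '-' gap symbols recovers the input sequences.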
5. The method for application classification of Tor anonymous communication traffic according to claim 1, characterized in that, in step 4), the application classification procedure is:
4.1: From flow f, extract the flow-burst volumes in the downstream and upstream directions, yielding feature vectors denoted $V_i$ and $V_e$ respectively;
4.2: Using the clustering information obtained in the training stage, symbolize $V_i$ and $V_e$; denote the symbolized feature vectors $S_i$ and $S_e$ respectively;
4.3: For the Profile hidden Markov models built from the flow-burst volume vectors in the downstream direction (one model per application type), compute for each model the probability that $S_i$ is generated by that model;
4.4: For the Profile hidden Markov models built from the flow-burst volume vectors in the upstream direction (one model per application type), compute for each model the probability that $S_e$ is generated by that model;
4.5: Calculate the joint probability of the two directions for each application type, where $0 \le \alpha \le 1$ is a weight that adjusts the relative contributions of the inbound and outbound flows to classification, so as to obtain the best classification result;
4.6: Select the largest joint probability, $p_m = \max\{p_1, p_2, \ldots, p_N\}$; the application type of flow f is determined to be the m-th application type in the training set.
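Steps 4.3 to 4.6 can be sketched as follows, under several assumptions. First, a generic discrete-emission HMM scored with the scaled forward algorithm stands in for a Profile HMM. A true Profile HMM adds a left-to-right match/insert/delete topology with silent delete states, which this sketch omits. Second, the joint-probability formula of step 4.5 is not reproduced in this text, so the combination is assumed to be a weighted sum of log-likelihoods, α·log P(S_i) + (1 - α)·log P(S_e). The class and function names are hypothetical, and the returned index is 0-based whereas m in the claim is 1-based.

```python
import math
from typing import Dict, List, Sequence


class DiscreteHMM:
    """Minimal discrete-emission HMM (hypothetical stand-in for a Profile HMM)."""

    def __init__(self, start: List[float], trans: List[List[float]], emit: List[Dict[str, float]]):
        self.start, self.trans, self.emit = start, trans, emit

    def log_likelihood(self, seq: Sequence[str]) -> float:
        """Scaled forward algorithm: returns log P(seq | model)."""
        if not seq:
            return 0.0
        n = len(self.start)
        alpha = [self.start[s] * self.emit[s].get(seq[0], 0.0) for s in range(n)]
        log_p = 0.0
        for t in range(len(seq)):
            if t > 0:
                alpha = [sum(alpha[r] * self.trans[r][s] for r in range(n))
                         * self.emit[s].get(seq[t], 0.0) for s in range(n)]
            scale = sum(alpha)
            if scale == 0.0:
                return float("-inf")  # sequence impossible under this model
            log_p += math.log(scale)
            alpha = [a / scale for a in alpha]  # rescale to avoid underflow
        return log_p


def joint_score(ll_in: float, ll_out: float, alpha: float) -> float:
    """Assumed step 4.5: weighted sum of the two directions' log-likelihoods."""
    score = 0.0
    if alpha > 0:  # a zero weight drops the term entirely (avoids 0 * -inf)
        score += alpha * ll_in
    if alpha < 1:
        score += (1 - alpha) * ll_out
    return score


def classify(s_in: str, s_out: str, models_in: List[DiscreteHMM],
             models_out: List[DiscreteHMM], alpha: float = 0.5) -> int:
    """Step 4.6: 0-based index of the application with the largest joint score."""
    scores = [joint_score(mi.log_likelihood(s_in), mo.log_likelihood(s_out), alpha)
              for mi, mo in zip(models_in, models_out)]
    return max(range(len(scores)), key=scores.__getitem__)
```

Working in log space keeps long burst sequences from underflowing. With α = 1 only the downstream models matter; with α = 0 only the upstream ones do.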
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410370944.8A CN104135385B (en) | 2014-07-30 | 2014-07-30 | Method of application classification in Tor anonymous communication flow |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104135385A (en) | 2014-11-05 |
CN104135385B CN104135385B (en) | 2017-05-24 |
Family ID: 51807914
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410370944.8A Expired - Fee Related CN104135385B (en) | 2014-07-30 | 2014-07-30 | Method of application classification in Tor anonymous communication flow |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104135385B (en) |
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104361123A (en) * | 2014-12-03 | 2015-02-18 | 中国科学技术大学 | Individual behavior data anonymization method and system |
CN104702465A (en) * | 2015-02-09 | 2015-06-10 | 桂林电子科技大学 | Parallel network flow classification method |
CN109194657A (en) * | 2018-09-11 | 2019-01-11 | 北京理工大学 | A kind of encrypting web traffic characteristic extracting method based on accumulation data packet length |
CN109728977A (en) * | 2019-01-14 | 2019-05-07 | 电子科技大学 | JAP anonymity flow rate testing methods and system |
CN109951444A (en) * | 2019-01-29 | 2019-06-28 | 中国科学院信息工程研究所 | A kind of encryption Anonymizing networks method for recognizing flux |
CN110113338A (en) * | 2019-05-08 | 2019-08-09 | 北京理工大学 | A kind of encryption traffic characteristic extracting method based on Fusion Features |
CN110363023A (en) * | 2019-06-20 | 2019-10-22 | 广东工业大学 | A kind of Anonymizing networks source tracing method based on PHMM |
CN112866369A (en) * | 2021-01-12 | 2021-05-28 | 北京工业大学 | Anonymous P2P network anonymity degree evaluation method based on hidden Markov model |
CN113037709A (en) * | 2021-02-02 | 2021-06-25 | 厦门大学 | Webpage fingerprint monitoring method for multi-label browsing of anonymous network |
CN114422210A (en) * | 2021-12-30 | 2022-04-29 | 中国人民解放军战略支援部队信息工程大学 | Anonymous network passive flow analysis and evaluation method and system based on AnoA theory |
CN114500396A (en) * | 2022-02-09 | 2022-05-13 | 江苏大学 | MFD chromatographic characteristic extraction method and system for distinguishing anonymous Tor application flow |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20030225903A1 (en) * | 2002-06-04 | 2003-12-04 | Sandeep Lodha | Controlling the flow of packets within a network node utilizing random early detection |
US20040095934A1 (en) * | 2002-11-18 | 2004-05-20 | Cosine Communications, Inc. | System and method for hardware accelerated packet multicast in a virtual routing system |
CN101252541A (en) * | 2008-04-09 | 2008-08-27 | 中国科学院计算技术研究所 | Method for establishing network flow classified model and corresponding system thereof |
CN102664881A (en) * | 2012-04-13 | 2012-09-12 | 东南大学 | Method for positioning hidden service under hypertext transfer protocol 1.1 |
Non-Patent Citations (1)
Title |
---|
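何高峰 (He Gaofeng) et al.: "Tor匿名通信流量在线识别方法" (Online identification of Tor anonymous communication traffic), 《软件学报》 (Journal of Software) * |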
Cited By (20)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104361123B (en) * | 2014-12-03 | 2017-11-03 | 中国科学技术大学 | A kind of personal behavior data anonymous method and system |
CN104361123A (en) * | 2014-12-03 | 2015-02-18 | 中国科学技术大学 | Individual behavior data anonymization method and system |
CN104702465A (en) * | 2015-02-09 | 2015-06-10 | 桂林电子科技大学 | Parallel network flow classification method |
CN104702465B (en) * | 2015-02-09 | 2017-10-10 | 桂林电子科技大学 | A kind of parallel network flow sorting technique |
CN109194657B (en) * | 2018-09-11 | 2020-05-12 | 北京理工大学 | Webpage encryption traffic characteristic extraction method based on accumulated data packet length |
CN109194657A (en) * | 2018-09-11 | 2019-01-11 | 北京理工大学 | A kind of encrypting web traffic characteristic extracting method based on accumulation data packet length |
CN109728977A (en) * | 2019-01-14 | 2019-05-07 | 电子科技大学 | JAP anonymity flow rate testing methods and system |
CN109951444B (en) * | 2019-01-29 | 2020-05-22 | 中国科学院信息工程研究所 | Encrypted anonymous network traffic identification method |
CN109951444A (en) * | 2019-01-29 | 2019-06-28 | 中国科学院信息工程研究所 | A kind of encryption Anonymizing networks method for recognizing flux |
CN110113338A (en) * | 2019-05-08 | 2019-08-09 | 北京理工大学 | A kind of encryption traffic characteristic extracting method based on Fusion Features |
CN110113338B (en) * | 2019-05-08 | 2020-06-26 | 北京理工大学 | Encrypted flow characteristic extraction method based on characteristic fusion |
CN110363023A (en) * | 2019-06-20 | 2019-10-22 | 广东工业大学 | A kind of Anonymizing networks source tracing method based on PHMM |
CN110363023B (en) * | 2019-06-20 | 2023-03-21 | 广东工业大学 | Anonymous network tracing method based on PHMM |
CN112866369A (en) * | 2021-01-12 | 2021-05-28 | 北京工业大学 | Anonymous P2P network anonymity degree evaluation method based on hidden Markov model |
CN112866369B (en) * | 2021-01-12 | 2023-07-25 | 北京工业大学 | Anonymous P2P network anonymity degree assessment method based on hidden Markov model |
CN113037709A (en) * | 2021-02-02 | 2021-06-25 | 厦门大学 | Webpage fingerprint monitoring method for multi-label browsing of anonymous network |
CN113037709B (en) * | 2021-02-02 | 2022-03-29 | 厦门大学 | Webpage fingerprint monitoring method for multi-label browsing of anonymous network |
CN114422210A (en) * | 2021-12-30 | 2022-04-29 | 中国人民解放军战略支援部队信息工程大学 | Anonymous network passive flow analysis and evaluation method and system based on AnoA theory |
CN114500396A (en) * | 2022-02-09 | 2022-05-13 | 江苏大学 | MFD chromatographic characteristic extraction method and system for distinguishing anonymous Tor application flow |
CN114500396B (en) * | 2022-02-09 | 2024-04-16 | 江苏大学 | MFD chromatographic feature extraction method and system for distinguishing anonymous Tor application flow |
Also Published As
Publication number | Publication date |
---|---|
CN104135385B (en) | 2017-05-24 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN104135385A (en) | Method of application classification in Tor anonymous communication flow | |
Nie et al. | Intrusion detection for secure social internet of things based on collaborative edge computing: a generative adversarial network-based approach | |
CN103078897B (en) | A kind of system realizing Web service fine grit classification and management | |
Wang et al. | Multilevel identification and classification analysis of Tor on mobile and PC platforms | |
Qin et al. | Federated learning-based network intrusion detection with a feature selection approach | |
CN104601596B (en) | Data-privacy guard method in a kind of Classification Data Mining system | |
CN105871832A (en) | Network application encrypted traffic recognition method and device based on protocol attributes | |
Haddadi et al. | Botnet behaviour analysis using ip flows: with http filters using classifiers | |
Niu et al. | A heuristic statistical testing based approach for encrypted network traffic identification | |
Lingyu et al. | A hierarchical classification approach for tor anonymous traffic | |
Chen et al. | Sequential message characterization for early classification of encrypted internet traffic | |
Lin et al. | ERID: A deep learning-based approach towards efficient real-time intrusion detection for IoT | |
CN112507336A (en) | Server-side malicious program detection method based on code characteristics and flow behaviors | |
Lee et al. | A machine learning approach to predicting block cipher security | |
CN109858510A (en) | A kind of detection method for http protocol ETag value covert communications | |
Liu et al. | Spatial‐Temporal Feature with Dual‐Attention Mechanism for Encrypted Malicious Traffic Detection | |
De Souza et al. | A distinguishing attack with a neural network | |
Zheng et al. | Detecting malicious tls network traffic based on communication channel features | |
Deng et al. | Identifying tor anonymous traffic based on gravitational clustering analysis | |
CN106534144A (en) | Network covert channel construction method based on Web application directory tree | |
CN111371727A (en) | Detection method for NTP protocol covert communication | |
Li et al. | Prism: Real-Time Privacy Protection Against Temporal Network Traffic Analyzers | |
CN111835720B (en) | VPN flow WEB fingerprint identification method based on feature enhancement | |
Zhou et al. | IoT unbalanced traffic classification system based on Focal_Attention_LSTM | |
Zolotukhin et al. | On application-layer DDoS attack detection in high-speed encrypted networks |
Legal Events
Code | Title | Description |
---|---|---|
C06 | Publication | |
PB01 | Publication | |
C10 | Entry into substantive examination | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20170524 |