CN104135385A - Method of application classification in Tor anonymous communication flow - Google Patents

Method of application classification in Tor anonymous communication flow

Info

Publication number
CN104135385A
Authority
CN
China
Prior art keywords
tor
anonymous communication
state
probability
communication flow
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201410370944.8A
Other languages
Chinese (zh)
Other versions
CN104135385B (en)
Inventor
蒋平
许勇
赵琛
史明文
汪兆斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
NANJING PUBLIC SECURITY BUREAU
Original Assignee
NANJING PUBLIC SECURITY BUREAU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by NANJING PUBLIC SECURITY BUREAU
Priority to CN201410370944.8A
Publication of CN104135385A
Application granted
Publication of CN104135385B
Legal status: Active

Landscapes

  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention discloses a method for classifying the applications carried in Tor anonymous communication traffic. It addresses the problem of obtaining upper-layer application type information from Tor traffic and draws on feature selection, sample preprocessing and traffic modeling. The method comprises the following steps. First, the concept of a flow burst segment is defined from Tor's packet scheduling mechanism, and the volume and direction of each flow burst segment serve as classification features. Second, the data samples are preprocessed with a K-means clustering algorithm and a multiple sequence alignment algorithm; symbolizing the values and inserting gaps resolves over-fitting and inconsistent sample lengths. Finally, the uplink and downlink Tor traffic of each application is modeled separately with a profile hidden Markov model (Profile HMM), and a heuristic algorithm is provided for building the Profile HMM quickly. During classification, the features of the traffic to be classified are fed into the Profile HMMs of the different applications, the probabilities under the uplink and downlink traffic models are computed, and the upper-layer application type contained in the traffic is decided by the maximum joint probability.

Description

Method for application classification of Tor anonymous communication traffic
Technical field
The present invention is a method for classifying the applications carried in Tor anonymous communication traffic. It uses feature selection, sample preprocessing and traffic modeling, and belongs to the field of anonymous communication and traffic analysis within network security.
Background art
With the rapid development and wide adoption of the Internet and the mobile Internet, networks have become part of every aspect of daily life. At the same time, the security and privacy problems that network services bring have drawn increasing attention. To protect users' private information, researchers have designed a variety of anonymous communication schemes, such as the onion routing protocol, and have built practical anonymous communication systems on top of them, such as Tor, JAP and I2P. The wide use of anonymous communication systems, however, poses a major challenge to network supervision. Through an anonymous communication system, users can bypass existing network access control policies to obtain illegal Internet resources, leak confidential information, and mount anonymous attacks. Because anonymous communication traffic is encrypted, supervising it effectively requires in-depth study of techniques for identifying and analyzing it. On the one hand, such techniques help regulate users' network behavior and combat and deter network crime carried out over anonymous communication systems. On the other hand, as research on identifying and analyzing anonymous traffic deepens, it can reveal weaknesses in the design and implementation of existing anonymous communication protocols and systems, so that more robust protocols and implementations can be designed and network users can be given better privacy protection.
Anonymous communication was first proposed by Chaum in 1981. The technique hides user identities and communication relationships by inserting one or more intermediate nodes (Mix nodes) on the communication path between sender and recipient. To send data, the user first determines the Mix nodes on the forwarding path and the recipient's address, then encrypts the data and address information layer by layer with the public key of each Mix node on the path to form an "onion", and sends this onion to the first Mix node on the path. On receiving the onion, that Mix node decrypts it to obtain the next-hop address and sends the decrypted onion to the next-hop node; the remaining nodes do the same in turn until the original data is finally forwarded to the recipient. Return data travels in the reverse order: the recipient returns the data to the Mix node directly connected to it (the last Mix node on the forwarding path), each Mix node on the path then encrypts the data with its own private key and forwards it in the reverse direction, and the user finally performs repeated decryption to recover the communication content.
Abuse of anonymous communication systems poses a serious threat to network security. For example, in 2007 the German government arrested several operators of Tor exit nodes in succession, although these operators were in fact scapegoats for network crimes such as illegally browsing pornographic material. When an anonymous offender uses the Tor network to obtain child pornography, the corresponding traffic is first sent to a Tor exit node, and the exit node then relays the related data back to the offender through the Tor network. The IP address information in the traffic can be traced only as far as these Tor exit nodes, and the real offender cannot be identified. In addition, botnets have begun to use the Tor anonymous communication network to conceal their command-and-control (C&C) servers. Each bot communicates with the C&C server through Tor, which hides both the true identity of the C&C server and its association with the bots and makes botnets harder to detect. More seriously, some popular network attack tools, such as torshammer (a DoS attack tool against Web servers) and sqlmap (an SQL injection tool), provide configuration options that route attack traffic through the Tor anonymous network so that it evades detection and tracing. Anonymous communication systems that were originally meant to protect users' private information are now being abused by attackers. Therefore, to deter anonymous network crime and maintain network security, it is necessary to classify the upper-layer application types contained in Tor anonymous communication traffic so that the network behavior of anonymous users can be determined.
Existing research on application classification of Tor traffic aims at improving the overall performance of the Tor network. The classification is therefore performed by Tor nodes on the basis of the protocol-layer information they can observe, rather than on features extracted from the network flow.
Summary of the invention
Technical problem: the goal of application classification is to obtain the upper-layer application type hidden in anonymous communication traffic. That is, for an anonymous communication flow f, the attacker needs to determine which of the upper-layer application types $T_1, T_2, \ldots, T_n$ it contains, and can thereby infer which anonymous network activity the target user Alice is carrying out, such as anonymous web browsing or anonymous P2P downloading. Specifically, the application classification problem can be stated as follows: given all possible application types $T_1, T_2, \ldots, T_n$, how can the anonymous communication flow f be mapped to one or several of them? To address this technical problem, the present invention defines the flow burst segment according to Tor's packet scheduling mechanism, takes segment volume and direction as classification features, proposes an application classification method based on the profile hidden Markov model (Profile HMM), and preprocesses the sample data to facilitate building the final model.
Technical solution:
To solve the above technical problem, the present invention proposes a Tor anonymous communication traffic application classification method based on an in-depth analysis of how the Tor anonymous communication system is implemented. The method is as follows:
A method for application classification of Tor anonymous communication traffic, comprising the steps of:
1) obtaining the classification features of Tor anonymous communication traffic;
2) preprocessing the sample data;
3) building the traffic model of Tor anonymous communication traffic;
4) classifying by combining the probabilities computed by the different models.
In step 1), consecutive packets in a network flow that have a data length greater than 0 (header fields excluded) and lie between packets travelling in the opposite direction are defined as a flow burst segment. The sum of the lengths of all packets in a flow burst segment is defined as the segment volume, and the direction of the packets in the segment, inbound or outbound, is defined as the segment direction. The selected features are the volume and direction of each flow burst segment.
In step 2), the K-means clustering algorithm is used to cluster the flow burst segment volumes, the final number of clusters is chosen according to cluster validity, and the numeric values are then symbolized. A multiple sequence alignment algorithm is then applied to the symbolized sample data, inserting gaps so that all samples have the same length, which makes the model-building procedure generally applicable.
In step 3), a heuristic is used to build the Profile hidden Markov model of Tor anonymous communication traffic. If letters occupy more than half of a column, the column is a Match state; otherwise it is an Insert state. A Delete state is determined by the gaps in the column of the corresponding Match state: the more gaps there are, the larger the probability of transferring from the previous state to this Delete state. If the column of a Match state contains N letters and gaps in total, of which n are gaps, the probability of transferring from the preceding state to the Delete state is (n+1)/(N+1). The transition probability is computed as the number of transitions from state i to state j divided by the total number of transitions out of state i, and the output probability is computed as the number of times the i-th Match state outputs character a divided by the total number of characters output by that state.
In step 4), a single parameter α, with a value between 0 and 1, combines the probabilities produced by the uplink and downlink traffic models. The upper-layer application type contained in the Tor anonymous communication traffic is decided by the maximum joint probability.
To address the application classification problem for Tor anonymous communication traffic, the present invention uses Tor's packet scheduling mechanism to define the flow burst segment and takes the volume and direction of flow burst segments as classification features. It preprocesses the data samples with the K-means clustering algorithm and a multiple sequence alignment algorithm, resolving over-fitting and inconsistent sample lengths by symbolizing the values and inserting gaps. It models the uplink and downlink Tor traffic of each application separately with a Profile HMM, and proposes a heuristic algorithm for building the Profile HMM quickly.
Beneficial effects: the present invention classifies well, runs fast, and adds little network load (it only observes network traffic passively), so it can classify the applications in Tor anonymous communication traffic effectively in large-scale network environments.
Brief description of the drawings
Fig. 1 is an architecture diagram of the Tor anonymous communication system;
Fig. 2 is a diagram of the data scheduling policy of a Tor node;
Fig. 3 is a schematic diagram of the Profile HMM used in the present invention;
Fig. 4 is a flow chart of an embodiment of the present invention.
Embodiment
This method addresses the problem of obtaining upper-layer application type information from Tor anonymous communication traffic, and involves feature selection, sample preprocessing and traffic modeling. The method first uses Tor's packet scheduling mechanism to define the flow burst segment, and takes the volume and direction of flow burst segments as classification features. It then preprocesses the data samples with the K-means clustering algorithm and a multiple sequence alignment algorithm, resolving over-fitting and inconsistent sample lengths by symbolizing the values and inserting gaps. Finally, it models the uplink and downlink Tor traffic of each application separately with a Profile HMM, and proposes a heuristic algorithm for building the Profile HMM quickly. During classification, the features of the traffic to be classified are fed into the Profile HMMs of the different applications, the probabilities under the uplink and downlink traffic models are computed, and the upper-layer application type contained in the traffic is decided by the maximum joint probability.
The present invention is described in further detail below with reference to the accompanying drawings.
1. Obtaining the classification features of Tor anonymous communication traffic
The Tor anonymous communication system uses libevent events to schedule the processing of data in its input and output buffers, which amounts to an implicit round-robin schedule. When a Tor node receives cell data from the TLS/SOCKS interface, it stores the data in the corresponding input buffer. Depending on the direction of the link, Tor decrypts or encrypts the cells in the input buffers under a round-robin scheduling policy.
The round-robin policy works as follows. Tor first processes cells in the first input buffer; after handling a certain number of cells, it moves on to the cells in the second input buffer, and so on until the last input buffer. It then starts again from the first input buffer and repeats the cycle. Once a cell from an input buffer has been processed, it is stored in the corresponding output buffer. The output buffer queue is handled like the input buffer queue: Tor again schedules with a round-robin policy and sends the cells in the different buffers to the network through the TLS/SOCKS interface.
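The round-robin policy described above can be illustrated with a minimal Python sketch. This is not Tor's code: the function name `round_robin`, the `quantum` parameter (the "certain number of cells" handled per visit) and the buffer contents are all hypothetical, chosen only to show the visiting order.

```python
from collections import deque

def round_robin(buffers, quantum):
    """Drain cells from each buffer in turn, at most `quantum` cells per visit.

    Returns the processing order as (buffer_index, cell) pairs.
    `quantum` is an assumed stand-in for the per-visit cell limit.
    """
    queues = [deque(b) for b in buffers]
    order = []
    while any(queues):                      # stop once every buffer is empty
        for idx, queue in enumerate(queues):
            for _ in range(min(quantum, len(queue))):
                order.append((idx, queue.popleft()))
    return order

# Three hypothetical input buffers holding labelled cells.
order = round_robin([["a1", "a2", "a3"], ["b1"], ["c1", "c2"]], quantum=2)
print(order)
```

Because each buffer is served in short turns, cells belonging to one direction of a flow tend to leave the node in bursts, which is the behavior the flow burst segment feature relies on.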
Based on the scheduling policy of the Tor anonymous communication system, the present invention defines a flow burst segment (FBS, Flow Burst Segmentation) as a run of consecutive packets with data length greater than 0 (header fields excluded) that lies between packets travelling in the opposite direction. Suppose c1, c2, s1, s2, s3, s4, c3, c4 are packets exchanged between a client and a server, each with length greater than 0, where ci denotes a packet sent by the client, si denotes a response packet returned by the server, and i is a natural number. By the definition above, {c1, c2}, {s1, s2, s3, s4} and {c3, c4} are three distinct flow burst segments. The segment volume is naturally defined as the sum of the lengths of all packets in the flow burst segment, and the segment direction as the direction of the packets in the segment, inbound or outbound.
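The definition above can be sketched in a few lines of Python. The representation of a packet as a `(direction, payload_length)` pair, the direction labels `"out"` and `"in"`, and the packet sizes are assumptions made for illustration. The sketch also assumes that packets with no payload (for example bare acknowledgements) are simply ignored and do not split a burst, a detail the text does not settle.

```python
def flow_burst_segments(packets):
    """Group consecutive same-direction packets into (direction, volume) segments.

    `packets` is an iterable of (direction, payload_length) pairs, where
    payload_length excludes header fields. Zero-length packets are skipped.
    """
    segments = []
    for direction, length in packets:
        if length <= 0:
            continue                        # no payload: not part of any burst
        if segments and segments[-1][0] == direction:
            # Same direction as the current burst: add to its volume.
            segments[-1] = (direction, segments[-1][1] + length)
        else:
            # Direction changed: a new burst segment starts here.
            segments.append((direction, length))
    return segments

# c1, c2, s1..s4, c3, c4 from the example, with made-up payload sizes
# and one zero-length client packet in the middle of the server burst.
packets = [("out", 586), ("out", 586),
           ("in", 1448), ("in", 1448), ("out", 0), ("in", 1448), ("in", 512),
           ("out", 586), ("out", 100)]
segments = flow_burst_segments(packets)
print(segments)
```

The three resulting segments correspond to {c1, c2}, {s1, s2, s3, s4} and {c3, c4}; splitting the list by direction afterwards gives the inbound and outbound volume sequences used as features.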
2. Sample data preprocessing
The present invention uses the K-means clustering algorithm to symbolize the flow burst segment volumes, converting numeric values into letter symbols. This narrows the range of values that segment volumes can take and makes the resulting model more general.
The sample symbolization procedure is as follows (see Table 1):
First, a suitable number of clusters $k_f$ is determined. The number of clusters k is increased from 2 to 26; for each k the intra-cluster distance $intra_k$ and the inter-cluster distance $inter_k$ are computed, followed by the validity $v_k = intra_k / inter_k$. The k with the smallest $v_k$ is taken as the number of clusters, so that the resulting traffic model generalizes better.
Once $k_f$ is determined, for each type of application, all flow burst segment volumes collected in the training stage are first clustered with K-means, and all values in the same cluster are then represented by the same letter. (For example, if cluster 1 covers the range 16 to 676, every value in that range is represented by the letter A.) After clustering, all training samples are symbolized: the cluster of each volume is determined and the value is replaced with that cluster's symbol, giving a symbolized feature vector (of the form <A, A, B, B, D, C, C, ...>).
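The selection of the cluster count and the symbolization step can be sketched as follows. The text does not spell out how $intra_k$ and $inter_k$ are measured, so this sketch assumes one common choice: $intra_k$ is the mean squared distance of each value to its nearest center, and $inter_k$ is the smallest squared distance between two centers. The one-dimensional K-means routine, its deterministic initialization, all function names and the sample volumes are likewise illustrative assumptions.

```python
import string

def kmeans_1d(values, k, iters=50):
    """Lloyd's algorithm on scalars, started from evenly spaced order statistics."""
    data = sorted(values)
    centers = [data[(2 * i + 1) * len(data) // (2 * k)] for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in data:
            nearest = min(range(k), key=lambda c: abs(v - centers[c]))
            groups[nearest].append(v)
        updated = [sum(g) / len(g) if g else centers[i] for i, g in enumerate(groups)]
        if updated == centers:              # assignments are stable
            break
        centers = updated
    return sorted(centers)

def validity(values, centers):
    """v_k = intra_k / inter_k under the assumed definitions; smaller is better."""
    intra = sum(min((v - c) ** 2 for c in centers) for v in values) / len(values)
    inter = min((a - b) ** 2 for i, a in enumerate(centers) for b in centers[i + 1:])
    return intra / inter if inter > 0 else float("inf")

def choose_centers(values, k_max=26):
    """Try k = 2..k_max and keep the clustering with the smallest validity."""
    k_max = min(k_max, len(set(values)))
    candidates = [kmeans_1d(values, k) for k in range(2, k_max + 1)]
    return min(candidates, key=lambda c: validity(values, c))

def symbolize(volumes, centers):
    """Replace each volume with the letter of its nearest cluster center."""
    letters = string.ascii_uppercase
    return [letters[min(range(len(centers)), key=lambda c: abs(v - centers[c]))]
            for v in volumes]

# Made-up training volumes forming three clumps; k_max is kept small here
# because the toy data set has only nine points.
training = [500, 520, 540, 3000, 3050, 3100, 9000, 9100, 9200]
centers = choose_centers(training, k_max=4)
symbols = symbolize([510, 3040, 9150, 530], centers)
print(centers, symbols)
```

The upper bound of 26 matches the number of letters available as symbols. On very small data sets the validity ratio degenerates when k approaches the number of distinct values, which is why the demonstration caps `k_max`.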
The present invention uses a multiple sequence alignment algorithm to align the feature vectors so that all feature vectors of the same application type have the same length.
The basic idea of multiple sequence alignment is to make the feature vectors equal in length by adding gaps while keeping the number of added gaps, that is, the cost, to a minimum. Because exact alignment of the sequences requires a large amount of computing time and memory, the present invention mainly considers a progressive alignment algorithm based on sequence length, which is carried out in the following three steps:
Step 1: align the sequences pairwise, compute the distance between every pair of sequences, and obtain a distance matrix. Pairwise alignment is done with dynamic programming: the similarity score of the two sequences is computed iteratively and stored in a score matrix, and the optimal alignment is then found by tracing back through this matrix.
Step 2: compute a guide tree from the distance matrix obtained in step 1. The guide tree specifies the order in which sequences are aligned in the subsequent multiple sequence alignment.
Step 3: progressively align newly added sequences in the order of the branches of the guide tree. Following the guide tree from the leaf nodes to the root, the most closely related pair of sequences is aligned first, and neighbouring sequences are then introduced one at a time, rebuilding the alignment each time, until all sequences have been added.
As in step 1, alignment is still done with dynamic programming, but step 3 may also involve aligning a sequence with a group (profile) or a group with a group. All sequences are divided into groups according to distance, so the different groups must be aligned with one another to obtain the final alignment.
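The pairwise dynamic-programming alignment of step 1 can be sketched with the classic Needleman-Wunsch recurrence. The scoring values (match +1, mismatch -1, gap -1) and the function name `align_pair` are assumptions, since the text does not give a scoring scheme. The guide tree of step 2 and the profile alignments of step 3 are not shown.

```python
def align_pair(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align two symbol strings; return (aligned_a, aligned_b, score)."""
    n, m = len(a), len(b)

    def sub(i, j):
        # Score for aligning a[i-1] with b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]

# Two symbolized feature vectors of different length (made-up values).
aligned_a, aligned_b, best = align_pair("AABBDCC", "AABDCC")
print(aligned_a, aligned_b, best)
```

A distance for the matrix in step 1 could then be derived from the alignment, for example as the fraction of mismatched or gapped columns; that choice is also an assumption rather than something the text specifies.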
Table 1 gives the pseudocode of the sample data symbolization algorithm.
3. Building the traffic model of Tor anonymous communication traffic
A Profile hidden Markov model consists of three kinds of states: Match, Insert and Delete. To determine these states, the training samples are first arranged into a sample matrix in which each feature vector is one row. Note that the feature vectors have already been preprocessed at this point: they consist of letters and gaps (written as a hyphen "-") and all have the same length. Each column of the matrix is then examined; each column corresponds to a Match or Insert state in the Profile HMM. The present invention uses the following heuristic to determine the state of each column: if letters occupy more than half of the column, it is a Match state; otherwise it is an Insert state. Delete states are determined by the gaps in the columns that correspond to Match states.
After the Match, Insert and Delete states are determined, the transition probabilities between states and the output probabilities of the Match states must still be computed. A Delete state produces no output, so no output probability needs to be computed for it. An Insert state produces random output: its observable character set consists of the characters that occur in the samples, and its output distribution is uniform, so every observable character is produced with probability 1/C, where C is the size of the character set.
To compute the transition probabilities and the Match-state output probabilities, the number of transitions between states and the number of occurrences of each character in the column of each Match state are counted first. The transition probability is the number of transitions from state i to state j divided by the total number of transitions out of state i, and the output probability is the number of times the i-th Match state outputs character a divided by the total number of characters output by that state.
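The column heuristic and the output probabilities can be sketched on a small aligned sample matrix. The function names and the four aligned rows are made up for illustration, and the full transition-count bookkeeping between states is left out. The Delete-entry probability uses the (n+1)/(N+1) rule given in the summary of the invention.

```python
from collections import Counter

def classify_columns(alignment):
    """Label each column 'M' (Match) if letters fill more than half of it, else 'I'."""
    rows = len(alignment)
    return ["M" if sum(ch != "-" for ch in col) > rows / 2 else "I"
            for col in zip(*alignment)]

def match_emissions(alignment, states):
    """Output probabilities of each Match state: character count / letters in column."""
    emissions = []
    for col, state in zip(zip(*alignment), states):
        if state != "M":
            continue
        counts = Counter(ch for ch in col if ch != "-")
        total = sum(counts.values())
        emissions.append({ch: c / total for ch, c in counts.items()})
    return emissions

def delete_probability(column):
    """(n + 1) / (N + 1): n gaps among the N entries of a Match column."""
    return (column.count("-") + 1) / (len(column) + 1)

# Four aligned, symbolized training samples (rows) of equal length.
alignment = ["AAB-C",
             "AABDC",
             "A-B-C",
             "AAB-D"]
states = classify_columns(alignment)
emissions = match_emissions(alignment, states)

# Insert states emit uniformly over the characters seen in the samples.
alphabet = {ch for row in alignment for ch in row if ch != "-"}
insert_emission = 1 / len(alphabet)

second_column = [row[1] for row in alignment]
p_delete = delete_probability(second_column)
print(states, emissions, insert_emission, p_delete)
```

In this example the fourth column holds a single letter among four rows, so it becomes an Insert state, and the one gap in the second column gives a Delete-entry probability of 2/5.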
4. Classifying by combining the probabilities computed by the different models
The application classification procedure is:
Step 1: extract the flow burst segment volumes in the inbound and outbound directions from flow f, obtaining the feature vectors $V_I$ and $V_E$.
Step 2: symbolize $V_I$ and $V_E$ using the clustering information obtained in the training stage. Denote the symbolized feature vectors by $S_I$ and $S_E$.
Step 3: for each inbound Profile hidden Markov model $M_i^I$, compute the probability that $S_I$ is produced by the model. Denote this probability by $p_i^I = \Pr(S_I \mid M_i^I)$, $i = 1, \ldots, N$.
Step 4: as in step 3, for each outbound Profile hidden Markov model $M_i^E$, compute the probability that $S_E$ is produced by the model. Denote this probability by $p_i^E = \Pr(S_E \mid M_i^E)$, $i = 1, \ldots, N$.
Step 5: compute the joint probability
$p_i = \alpha p_i^I + (1 - \alpha) p_i^E$, $i = 1, \ldots, N$,
where $0 \le \alpha \le 1$ adjusts the relative contributions of the inbound and outbound flows to the classification so as to reach the best classification result.
Step 6: select the largest joint probability
$p_m = \max\{p_1, p_2, \ldots, p_N\}$, $m = \arg\max_i p_i$.
The application type of flow f is determined to be the m-th application type in the training set.
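Steps 5 and 6 reduce to a weighted sum followed by an argmax, as the short sketch below shows. It assumes the per-model probabilities from steps 3 and 4 are already available; the application labels and probability values are invented for illustration and are not measured results.

```python
def classify(p_in, p_out, alpha=0.5):
    """Combine inbound and outbound model probabilities and pick the best application.

    p_in[app]  plays the role of p_i^I, p_out[app] the role of p_i^E.
    Returns (best_app, joint) where joint[app] = alpha*p_in + (1-alpha)*p_out.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    joint = {app: alpha * p_in[app] + (1 - alpha) * p_out[app] for app in p_in}
    best_app = max(joint, key=joint.get)    # index m of the largest joint probability
    return best_app, joint

# Hypothetical probabilities for three application models.
p_in = {"web": 0.02, "p2p": 0.01, "ftp": 0.005}
p_out = {"web": 0.01, "p2p": 0.04, "ftp": 0.002}

balanced, joint = classify(p_in, p_out, alpha=0.5)
inbound_heavy, _ = classify(p_in, p_out, alpha=0.9)
print(balanced, inbound_heavy)
```

With these numbers the decision changes from one application to another as α moves from 0.5 to 0.9, which illustrates why α is tuned to balance the two directions. In practice sequence probabilities under an HMM are tiny, so an implementation would likely need to guard against floating-point underflow when forming the sum.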
The present invention may also have many other embodiments. Without departing from the spirit and essence of the present invention, those of ordinary skill in the art can make various corresponding changes and modifications according to the present invention, and all such changes and modifications shall fall within the scope of protection of the appended claims.

Claims (5)

1. A method for application classification of Tor anonymous communication traffic, characterized by comprising the steps of:
1) using Tor's packet scheduling mechanism to define the flow burst segment, and taking the volume and direction of flow burst segments as classification features;
2) preprocessing the data samples of the classification features with the K-means clustering algorithm and a multiple sequence alignment algorithm, resolving over-fitting and inconsistent sample lengths by symbolizing the values and inserting gaps; after preprocessing, the values are symbolized, and the samples consist of letters and gaps and have the same length;
3) modeling the uplink and downlink Tor anonymous communication traffic of each application separately with a Profile HMM;
4) finally, feeding the features of the network traffic to be classified into the Profile hidden Markov models of the different applications, computing the probabilities under the uplink and downlink traffic models respectively, and deciding the upper-layer application type contained in the Tor anonymous communication traffic to be classified by the maximum joint probability;
wherein in step 3), a heuristic is used to build the Profile hidden Markov model of Tor anonymous communication traffic, as follows:
the Profile hidden Markov model of Tor anonymous communication traffic consists of three kinds of states, Match, Insert and Delete; to determine these states, the training samples are first arranged into a sample matrix in which each feature vector of the classification features is one row;
each column of the sample matrix is examined, each column corresponding to a Match or Insert state in the Profile hidden Markov model; the state of each column is determined as follows: if letters occupy more than half of the column, it is a Match state, and if letters do not occupy more than half of the column, it is an Insert state; a Delete state is determined by the gaps in the column of the corresponding Match state, and the more gaps there are, the larger the probability of transferring from the previous state to this Delete state;
because a Delete state produces no output, no output probability needs to be computed for it; an Insert state produces random output, its observable character set consists of the characters that occur in the samples, and its output distribution is uniform, so every observable character is produced with probability 1/C, where C is the size of the character set; therefore, after the Match, Insert and Delete states are determined, only the transition probabilities between states and the output probabilities of the Match states are computed, as follows:
first, the number of transitions between states and the number of occurrences of each character in the column of each Match state are counted;
the transition probability is computed as the number of transitions from state i to state j divided by the total number of transitions out of state i;
the output probability is computed as the number of times the i-th Match state outputs character a divided by the total number of characters output by that state.
2. The method for application classification of Tor anonymous communication traffic according to claim 1, characterized in that in step 1), consecutive packets in a network flow that have a data length greater than 0, header fields excluded, and lie between packets travelling in the opposite direction are defined as a flow burst segment; the volume of a flow burst segment is defined as the sum of the lengths of all packets in that segment, and the direction of a flow burst segment is defined as the direction of the packets in the segment, inbound or outbound; the classification features are chosen to be the volume and direction of the flow burst segments.
3. The method for application classification of Tor anonymous communication traffic according to claim 2, characterized in that in step 1),
the Tor anonymous communication system uses libevent events to schedule the processing of data in its input and output buffers, which amounts to an implicit round-robin schedule; when a Tor node receives cell data from the TLS/SOCKS interface, it stores the data in the corresponding input buffer; Tor decrypts or encrypts the cells in the input buffers under a round-robin scheduling policy;
the round-robin scheduling policy is:
first process cells in the first input buffer; after handling a certain number of cells, process the cells in the second input buffer, and so on until the last input buffer; then return to the first input buffer and repeat the cycle;
once a cell from an input buffer has been processed, it is stored in the corresponding output buffer; the output buffer queue is handled like the input buffer queue: Tor again schedules with a round-robin policy and sends the cells in the different buffers to the network through the TLS/SOCKS interface;
for flow burst segments, suppose c1, c2, s1, s2, s3, s4, c3, c4 are packets exchanged between a client and a server, each with length greater than 0, where ci denotes a packet sent by the client, si denotes a response packet returned by the server, and i is a natural number; then {c1, c2}, {s1, s2, s3, s4} and {c3, c4} are three distinct flow burst segments; the segment volume is defined as the sum of the lengths of all packets in the flow burst segment, and the segment direction as the direction of the packets in the segment, inbound or outbound.
4. the method for Tor anonymous communication flow application classification according to claim 1, it is characterized in that described step 2) in, adopt K-means clustering algorithm to carry out symbolism to the bulking value of stream bursts section, be converted into letter character by numerical value, the semiosis of sample comprises:
First determine suitable number of clusters k f: number of clusters k is incremented to 26 from 2, calculates all kinds of middle distance intra kwith between class distance inter k, then calculate validity v k=intra k/ inter k, and with minimum v kbe worth corresponding k as number of clusters;
Determine number of clusters k fafter, for all types of application, all stream bursts section bulking values that first its training stage gathered carry out K-mean cluster, then the numerical value in each class are all used to same letter representation; Complete after cluster, all training samples are carried out to symbolism, determine the class at each bulking value place, then replace numerical value with such corresponding symbol, obtain the characteristic vector of the characteristic of division after symbolism;
A multiple sequence alignment algorithm is used to align the feature vectors so that the different feature vectors of the same type of application have the same length; specifically, a progressive alignment algorithm based on sequence length is used, and the progressive alignment is completed in the following three steps:
a: Align the sequences pairwise, calculate the distance between each pair of sequences, and thereby obtain a distance matrix; the pairwise alignment is performed by a dynamic programming algorithm, which iteratively calculates the similarity score of two sequences, stores it in a score matrix, and then traces back through the score matrix to find the optimal alignment;
b: Build a guide tree from the distance matrix obtained in step a; the guide tree represents the order in which the sequences are to be aligned in the subsequent multiple sequence alignment;
c: Following the order of the branches in the guide tree, progressively align the newly added sequences; in this step the multiple sequence alignment is completed by progressive alignment; the sequences are aligned in the guide-tree order from the leaf nodes to the root node: the most closely related pair of sequences is aligned first, and the next closest sequences are then introduced step by step, with the alignment rebuilt each time, until all sequences have been added; as in step a, the alignment between sequences is still performed by a dynamic programming algorithm, but step c may involve alignments between a sequence and a group and between a group and a group; all sequences are divided into several groups according to distance, so the different groups must be aligned with one another to complete the final sequence alignment.
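The pairwise alignment in step a can be illustrated with a standard global alignment by dynamic programming (Needleman-Wunsch). The sketch covers only the pairwise case, not the guide tree or the group alignments of steps b and c; the scoring values and the name `align_pair` are assumptions, since the claim does not give them.

```python
def align_pair(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align two symbol strings; '-' marks an inserted gap.

    Returns (aligned_a, aligned_b, score).
    """
    def sub(x, y):
        return match if x == y else mismatch

    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    # Fill the score matrix.
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if (i > 0 and j > 0
                and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1])):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[-1][-1]

aligned_a, aligned_b, total = align_pair("ABCA", "ABA")
print(aligned_a, aligned_b, total)
```

Here the shorter symbol sequence receives one gap so that both sequences end up with the same length, which is the property the claim requires before the Profile hidden Markov model is built.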
5. the method for Tor anonymous communication flow application classification according to claim 1, is characterized in that described step 4) in, application class flow process is:
4.1: from stream f, extract the stream bursts section bulking value on descending and up both direction, obtain characteristic vector and be designated as respectively V iand V e;
4.2: according to the clustering information obtaining in the training stage, to V iand V ecarry out symbolism; Characteristic vector after note symbolism is respectively S iand S e;
4.3: the Profile hidden Markov model of the vector correspondence of the bulking value of the stream bursts section on down direction is to each model calculate S iby model the probability producing, note probability is p i I = Pr ( S I | M i I ) , i = 1 , . . . , N ;
4.4: the Profile hidden Markov model of the vector correspondence of the bulking value of the stream bursts section on up direction is to each model calculate S eby model the probability producing, note probability is p i E = Pr ( S E | M i E ) , i = 1 , . . . , N ;
4.5: calculate joint probability
p i = &alpha;p i I + ( 1 - &alpha; ) p i E , i = 1 , . . . , N .
Wherein, 0≤α≤1, α becomes a mandarin and goes out the difference contribution of stream to classification for regulating, to reach optimum classification results;
4.6: the joint probability of selective value maximum
p m=argmax{p 1,p 2,...,p N}
The application type that flows f is defined as m application type in training set.
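Steps 4.5 and 4.6 amount to a weighted sum followed by an argmax, as the short Python sketch below shows. The per-model probabilities are made-up inputs standing in for the Profile hidden Markov model scores of steps 4.3 and 4.4, which the sketch does not compute; the function name `classify` and the value of `alpha` are assumptions.

```python
def classify(p_in, p_out, alpha=0.5):
    """Combine per-application probabilities and pick the best application.

    p_in[i]  stands for p_i^I = Pr(S_I | M_i^I)  (downlink model i)
    p_out[i] stands for p_i^E = Pr(S_E | M_i^E)  (uplink model i)
    Returns (m, joint), where m is the 0-based index of the largest
    joint probability p_i = alpha * p_i^I + (1 - alpha) * p_i^E.
    """
    if len(p_in) != len(p_out):
        raise ValueError("both directions need one probability per application")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    joint = [alpha * pi + (1 - alpha) * pe for pi, pe in zip(p_in, p_out)]
    m = max(range(len(joint)), key=joint.__getitem__)
    return m, joint

# Hypothetical scores for N = 3 application types.
p_in = [0.25, 0.5, 0.125]
p_out = [0.5, 0.25, 0.125]
m, joint = classify(p_in, p_out, alpha=0.75)
print(m, joint)
```

Note that the index returned here is 0-based, whereas the claim numbers the application types from 1. Sequence probabilities under a hidden Markov model are usually very small, so a practical implementation would probably need scaled or log-domain scores before applying the weighted sum.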
CN201410370944.8A 2014-07-30 2014-07-30 Method of application classification in Tor anonymous communication flow Active CN104135385B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410370944.8A CN104135385B (en) 2014-07-30 2014-07-30 Method of application classification in Tor anonymous communication flow

Publications (2)

Publication Number Publication Date
CN104135385A true CN104135385A (en) 2014-11-05
CN104135385B CN104135385B (en) 2017-05-24

Family

ID=51807914

Country Status (1)

Country Link
CN (1) CN104135385B (en)

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant