CN112235254A - Rapid identification method for Tor network bridge in high-speed backbone network - Google Patents

Info

Publication number
CN112235254A
Authority
CN
China
Prior art keywords
counter
data
packets
tor
sent
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011003470.5A
Other languages
Chinese (zh)
Other versions
CN112235254B (en)
Inventor
吴桦
郭树一
程光
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southeast University
Original Assignee
Southeast University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southeast University filed Critical Southeast University
Priority to CN202011003470.5A priority Critical patent/CN112235254B/en
Publication of CN112235254A publication Critical patent/CN112235254A/en
Application granted granted Critical
Publication of CN112235254B publication Critical patent/CN112235254B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/18 Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 Reducing energy consumption in communication networks
    • Y02D30/50 Reducing energy consumption in communication networks in wire-line communication networks, e.g. low power modes or reduced link rate

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Computer Security & Cryptography (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Analysis (AREA)
  • Computer Hardware Design (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Operations Research (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Algebra (AREA)
  • Artificial Intelligence (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention provides a method for rapidly identifying Tor bridges in a high-speed backbone network, comprising the following steps: selecting features usable for Tor bridge identification in a high-speed backbone network and constructing a small-scale traffic training set for model training; sampling data packets in the high-speed backbone network and using a multiple Count Bloom Filter algorithm to compile packet records and extract feature values; and identifying and classifying the records of the sampled packets with the trained model to obtain a bridge list. The invention can identify Tor bridges in a backbone network quickly and accurately, provide a bridge list to network administrators, and improve the efficiency of network management. Because most of the selected features are ratio features, they can be extracted from sampled, incomplete flow data and used for identification and classification, which also reduces the storage consumed by the features.

Description

Rapid identification method for Tor network bridge in high-speed backbone network
Technical Field
The invention belongs to the technical field of cyberspace security and relates to a method for rapidly identifying Tor bridges in a high-speed backbone network.
Background
As the cyberspace security situation grows more severe, supervision of cyberspace is becoming stricter. To evade surveillance, more and more lawbreakers conduct illegal activities through the darknet. Tor, the second-generation onion router and the most widely used darknet technology, is the first choice of most of them because it is highly concealed and easy to operate. Identifying darknet use has therefore become a research hotspot in the network security field.
Tor is the most widely used of the darknet technologies. To ensure anonymity and resist tracking, a host accessing the network through Tor first requests three onion routers with public addresses from the directory server, establishes a communication link through them, and encrypts the transmission with TLS. On this basis, Tor also introduces bridges and obfuscation protocols: the host first connects to a bridge relay whose address is not public, and the communication link is then established from the bridge. As a result, the onion routers in the link cannot obtain the host's source address, which makes network supervision even more difficult.
In recent years, research at home and abroad on identifying darknet use has concentrated on traffic identification, mainly with machine learning methods. This work has essentially centered on improving feature selection and machine learning algorithms, and the selected features achieve good identification results on complete flow data. However, existing methods have the following main problems: (1) they are studied on complete flow data sets, and the selected features are suitable only for complete flow data; (2) to improve metrics such as identification accuracy, existing research selects a large number of features, which consume substantial resources to extract and store; (3) identification based on complete flow data is difficult to carry out at the traffic scale of a high-speed backbone network. Because of these problems, existing methods cannot rapidly identify Tor bridges in a high-speed backbone network environment.
Therefore, to rapidly identify Tor bridges in a high-speed backbone network environment, the invention samples traffic at a high-speed backbone router and performs feature selection to choose identification features that remain applicable to records of sampled packets; to improve the efficiency of computing and storing the features, a multiple Count Bloom Filter algorithm is used to compile the packet records and process the features.
Disclosure of Invention
For Tor bridges that may exist in a high-speed backbone network, the invention first performs feature selection on the traffic between the host and the bridge, choosing identification features that remain applicable to records of sampled packets. It then samples traffic at a high-speed backbone router and, to improve the efficiency of computing and storing the features, uses a multiple Count Bloom Filter algorithm to count packets and calculate feature values. Finally, it uses a random forest algorithm to identify bridges.
In order to achieve the purpose, the invention provides the following technical scheme:
a rapid identification method for Tor bridges in a high-speed backbone network comprises the following steps:
(1) collecting and storing Tor flow data and normal flow data used for model training;
(2) extracting, from the raw data, features usable for identifying and classifying complete flow data; performing feature selection to retain the features that remain usable for identifying and classifying records; then extracting training data from the raw data and training a machine learning model;
(3) sampling flow data at a high-speed backbone router, and then processing the sampled data packets with a multiple Count Bloom Filter algorithm to obtain records;
(4) inputting the records obtained in step (3) into the model trained in step (2) to identify bridges.
Further, the step (1) specifically includes the following substeps:
(1.1) installing Tor Browser software at a host end, and selecting to use a network bridge to establish a communication link;
(1.2) starting an application to start Tor flow data acquisition;
(1.3) performing network access using the Tor Browser;
(1.4) stopping collecting after the webpage is loaded, and storing the currently collected Tor flow data file between the host and the network bridge;
(1.5) starting an application to start common flow data acquisition;
(1.6) operating with common applications;
(1.7) stopping collecting after the operation is finished, and storing the currently collected common flow data file;
and (1.8) repeating the operations (1.2) to (1.7) until a sufficient amount of flow data is collected.
Further, the step (2) specifically includes the following sub-steps:
(2.1) first, extracting features and training models using the complete flow data collected in step (1), and selecting the random forest algorithm for its high accuracy;
(2.2) during feature selection, evaluating feature importance with the Gini-index-based method of the random forest algorithm, where the Gini index is calculated as:
$$GI = \sum_{k=1}^{K} p_k (1 - p_k) = 1 - \sum_{k=1}^{K} p_k^2$$
where K is the number of classes, k indexes the classes, and $p_k$ is the sample weight (proportion) of class k;
the importance of feature $X_j$ at node m, i.e., the change in the Gini index before and after node m branches, is then:
$$\mathrm{importance}_m(X_j) = GI_m - GI_l - GI_r$$
where $GI_m$ is the Gini index of the node before branching, and $GI_l$ and $GI_r$ are the Gini indexes of the two new nodes after branching;
(2.3) selecting suitable available features by jointly considering feature importance and usability in the records;
(2.4) taking the flow data collected in step (1) as raw data, extracting training data from it according to the preceding feature engineering, and training a model with the random forest algorithm.
Further, the suitable available features in step (2.3) are shown in the following table:
Feature | Meaning
F1 | Whether more than half of the packets carry timestamps
F2 | Ratio of non-empty packets sent by the client to the total number of packets sent by the client
F3 | Ratio of non-empty packets sent by the server to the total number of packets sent by the server
F4 | Ratio of empty packets sent by the client to non-empty packets sent by the server
F5 | Ratio of empty packets sent by the server to non-empty packets sent by the client
F6 | Ratio of non-empty packets sent by the client to the total number of data packets
F7 | Ratio of non-empty packets sent by the server to the total number of data packets
F8 | Proportion of PSH packets sent by the client in the total number of data packets
F9 | Proportion of PSH packets sent by the server in the total number of data packets
F10 | Proportion of packets of length 0-50 sent by the client in the total number of data packets
F11 | Proportion of packets of length 50-200 sent by the client in the total number of data packets
F12 | Proportion of packets of length greater than 1200 sent by the client in the total number of data packets
F13 | Proportion of packets of length 50-200 sent by the server in the total number of data packets
F14 | Proportion of packets of length greater than 1200 sent by the server in the total number of data packets
Further, the step (3) specifically includes the following sub-steps:
(3.1) setting a data packet sampling proportion at a high-speed backbone network route for carrying out flow sampling;
and (3.2) processing the sampled data packet by using an MCBF algorithm to obtain a statistical result.
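Step (3.1) can be illustrated with a minimal sketch of ratio-based packet sampling. The source does not state whether sampling is systematic (every N-th packet) or random, so both variants are shown as assumptions; the function names and the use of plain integers as stand-in packets are hypothetical.

```python
import random


def sample_systematic(packets, ratio):
    """Yield every `ratio`-th packet (1-in-N systematic sampling)."""
    if ratio < 1:
        raise ValueError("ratio must be >= 1")
    for index, packet in enumerate(packets):
        if index % ratio == 0:
            yield packet


def sample_random(packets, ratio, seed=None):
    """Yield each packet independently with probability 1/ratio."""
    if ratio < 1:
        raise ValueError("ratio must be >= 1")
    rng = random.Random(seed)
    for packet in packets:
        if rng.random() < 1.0 / ratio:
            yield packet


# Stand-in "packets": integers 0..1279, sampled at a 128:1 ratio.
kept = list(sample_systematic(range(1280), 128))
```

Systematic sampling keeps a fixed fraction of the stream with no per-packet random draw, while random sampling avoids aliasing with periodic traffic; either way only the retained packets are passed on to the MCBF stage.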
Further, the step (3.2) specifically includes the following sub-steps:
(3.2.1) for each sampled data packet, using the {source IP address, port number} and the {destination IP address, port number} of the packet in turn as the input of the hash functions, each of the two inputs yielding several outputs that map to corresponding positions in the MCBF;
(3.2.2) each mapped position holding a 12-byte data structure that stores the feature-related information of the packets; if the packet satisfies the condition associated with a counter, incrementing that counter in the data structure by 1, and otherwise leaving it unchanged;
(3.2.3) when the set threshold θ is reached, extracting the stored information and then calculating the feature values;
(3.2.4) calculating the feature statistics of the record from the extracted information.
In the step (3.2.2), the information to be stored is shown in the following table:
Figure BDA0002695109630000041
Further, in step (3.2.3), the information stored at the mapped position that records the smallest number of client-sent packets is taken as the extracted information.
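The update and extraction steps above can be sketched as a small counting structure. This is an illustration under stated assumptions, not the patented implementation: the class name `MCBF`, the table size, the number of hash functions, and the use of SHA-256 are all hypothetical, and the 12-byte structure is assumed to hold twelve one-byte counters (Counter 1 at index 0) that saturate at 255. The caller decides which counters a given packet satisfies.

```python
import hashlib

NUM_COUNTERS = 12  # assumption: 12 bytes = twelve 1-byte counters


class MCBF:
    def __init__(self, size=1024, num_hashes=3):
        self.size = size
        self.num_hashes = num_hashes
        # One 12-counter cell per position.
        self.cells = [[0] * NUM_COUNTERS for _ in range(size)]

    def _positions(self, ip, port):
        """Map an {IP address, port} pair to num_hashes positions."""
        key = f"{ip}:{port}".encode()
        positions = []
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + key).digest()
            positions.append(int.from_bytes(digest[:8], "big") % self.size)
        return positions

    def add(self, ip, port, counter_indexes):
        """Increment the given counters at every position of the key."""
        # set() avoids double-counting when two hashes share a position.
        for pos in set(self._positions(ip, port)):
            for idx in counter_indexes:
                if self.cells[pos][idx] < 255:  # saturate a 1-byte counter
                    self.cells[pos][idx] += 1

    def query(self, ip, port, key_counter=0):
        """Return the cell whose key counter is smallest.

        Collisions can only add to a counter, so the smallest value
        is the least inflated estimate.
        """
        candidates = [self.cells[p] for p in self._positions(ip, port)]
        return list(min(candidates, key=lambda cell: cell[key_counter]))


mcbf = MCBF(size=64, num_hashes=3)
for _ in range(5):
    mcbf.add("192.0.2.10", 443, [0, 1])  # e.g. client packet, non-empty
mcbf.add("192.0.2.10", 443, [0])         # e.g. client packet, empty
record = mcbf.query("192.0.2.10", 443)
```

In a full pipeline the caller would invoke `add` once for the source pair and once for the destination pair of each sampled packet, choosing direction-dependent counters, and would call `query` once the key counter reaches the threshold θ.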
In step (3.2.4), the correspondence between the information stored at each position and the features is shown in the following table:
Feature | Calculation
F1 | 1 if Counter 12 > θ/2, otherwise 0
F2 | Counter 2/Counter 1
F3 | Counter 4/Counter 3
F4 | (Counter 1-Counter 2)/Counter 4
F5 | (Counter 3-Counter 4)/Counter 2
F6 | Counter 2/(Counter 1+Counter 3)
F7 | Counter 4/(Counter 1+Counter 3)
F8 | Counter 5/(Counter 1+Counter 3)
F9 | Counter 6/(Counter 1+Counter 3)
F10 | Counter 7/(Counter 1+Counter 3)
F11 | Counter 8/(Counter 1+Counter 3)
F12 | Counter 9/(Counter 1+Counter 3)
F13 | Counter 10/(Counter 1+Counter 3)
F14 | Counter 11/(Counter 1+Counter 3)
The value of F1 is determined by Counter 12 and the threshold θ: if the value in Counter 12 is greater than θ/2, F1 for that record is marked as 1.
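The counter-to-feature arithmetic in the table can be written out directly. The sketch below assumes the twelve counters arrive as a list with Counter 1 at index 0; the function names are hypothetical, and returning 0.0 when a denominator is zero is an assumption, since the source does not say how that case is handled.

```python
def safe_div(numerator, denominator):
    """Divide, returning 0.0 for a zero denominator (an assumption)."""
    return numerator / denominator if denominator else 0.0


def features_from_counters(counters, theta):
    """Compute F1..F14 from Counter 1..Counter 12 and the threshold theta."""
    if len(counters) != 12:
        raise ValueError("expected 12 counters")
    c1, c2, c3, c4, c5, c6, c7, c8, c9, c10, c11, c12 = counters
    total = c1 + c3  # denominator shared by F6..F14
    return {
        "F1": 1 if c12 > theta / 2 else 0,
        "F2": safe_div(c2, c1),
        "F3": safe_div(c4, c3),
        "F4": safe_div(c1 - c2, c4),
        "F5": safe_div(c3 - c4, c2),
        "F6": safe_div(c2, total),
        "F7": safe_div(c4, total),
        "F8": safe_div(c5, total),
        "F9": safe_div(c6, total),
        "F10": safe_div(c7, total),
        "F11": safe_div(c8, total),
        "F12": safe_div(c9, total),
        "F13": safe_div(c10, total),
        "F14": safe_div(c11, total),
    }


# Made-up counter values for illustration only, with theta = 100.
example = features_from_counters(
    [100, 60, 80, 40, 30, 20, 50, 10, 5, 8, 12, 70], theta=100
)
```

Because every feature except F1 is a ratio of counters, sampling scales numerator and denominator together, which is why such features remain usable on sampled traffic.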
Compared with the prior art, the invention has the following advantages and beneficial effects:
(1) the invention can quickly and accurately identify the Tor network bridge in the backbone network, provide a network bridge list for a network manager and effectively improve the efficiency of network management.
(2) The selected features are mostly proportional features, and can be extracted from the sampled incomplete flow data for identification and classification, so that the storage consumption of the features is reduced.
(3) The invention uses multiple Count Bloom Filter algorithm for statistical processing of the sampled data packet in the high-speed backbone network, thereby improving the efficiency of data packet processing.
Drawings
FIG. 1 is a framework of the method of the present invention for rapidly identifying a Tor bridge in a high speed backbone network.
Fig. 2 shows the accuracy of different machine learning algorithm models when performing complete flow data identification and classification.
FIG. 3 shows the accuracy of the trained model.
FIG. 4 is a diagram of the multiple Count Bloom Filter algorithm.
Fig. 5 shows the prediction result metrics under different threshold conditions with the sampling ratio fixed at 64:1.
Detailed Description
The technical solutions provided by the present invention will be described in detail below with reference to specific examples, and it should be understood that the following specific embodiments are only illustrative of the present invention and are not intended to limit the scope of the present invention.
The invention provides a method for rapidly identifying Tor bridges in a high-speed backbone network. The identification framework, shown in FIG. 1, has three parts. The first part constructs the training data set: features usable for Tor bridge identification in a high-speed backbone network are extracted, a small-scale traffic training set is built, and a machine learning model is trained on it. The second part operates in the high-speed backbone network: data packets are sampled, and a multiple Count Bloom Filter algorithm compiles records from the sampled packets and calculates their feature values. The third part identifies bridges and outputs a bridge list: the trained machine learning model identifies and classifies the records of the sampled packets, predicts bridges, and records them in a bridge list. The result of sampling and multiple Count Bloom Filter processing in the second part is called a record; each record contains a server IP address, a port, and the associated feature values.
Specifically, the method of the invention comprises the following steps:
(1) Collecting and storing the Tor flow data and normal flow data used for model training.
The specific process of the step is as follows:
(1.1) installing Tor Browser software at a host end, and selecting to use a network bridge to establish a communication link;
(1.2) starting a Wireshark flow acquisition application to start Tor flow data acquisition;
(1.3) performing network access using the Tor Browser;
(1.4) stopping collection after the webpage is loaded, and saving the currently collected Tor flow data file (.pcap) between the host and the bridge;
(1.5) starting the Wireshark traffic capture application to begin collecting ordinary flow data;
(1.6) operating common applications, including but not limited to web access and chat;
(1.7) stopping collection after the operation is finished, and saving the currently collected ordinary flow data file (.pcap);
(1.8) repeating operations (1.2) to (1.7) until a total of approximately 10,000 flows have been collected.
(2) Extracting, from the raw data, features usable for identifying and classifying complete flow data; performing feature selection to retain the features that remain usable for identifying and classifying records; then extracting training data from the raw data and training a machine learning model.
The specific process in this step is as follows:
(2.1) First, features are extracted and models are trained on the complete flow data collected in step (1). As shown in FIG. 2, metrics such as the accuracy of random forest, k-nearest neighbors, and naive Bayes models are compared, and the random forest algorithm, which has the highest accuracy, is selected.
(2.2) During feature selection, feature importance is evaluated with the Gini-index-based method of the random forest algorithm. The Gini index is calculated as:
$$GI = \sum_{k=1}^{K} p_k (1 - p_k) = 1 - \sum_{k=1}^{K} p_k^2$$
where K is the number of classes, k indexes the classes, and $p_k$ is the sample weight (proportion) of class k.
The importance of feature $X_j$ at node m, i.e., the change in the Gini index before and after node m branches, is then:
$$\mathrm{importance}_m(X_j) = GI_m - GI_l - GI_r$$
where $GI_m$ is the Gini index of the node before branching, and $GI_l$ and $GI_r$ are the Gini indexes of the two new nodes after branching.
(2.3) The features finally selected, after weighing their importance scores against their availability in the records, are shown in Table 1:
Table 1. Available features (rendered as images in the source: BDA0002695109630000063, BDA0002695109630000071)
(2.4) The flow data collected in step (1) is taken as raw data. Feature extraction and selection are completed in steps (2.1) and (2.2), and the available features are finally determined in step (2.3). Training data is then extracted from the raw data according to these features, and a model is trained with the random forest algorithm. The model accuracy is shown in FIG. 3, where category 1 represents ordinary traffic and category 0 represents Tor traffic.
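The Gini-based importance used in step (2.2) can be illustrated with a short sketch. It assumes the standard Gini impurity, 1 minus the sum of squared class proportions, and computes the change at a node as the parent's index minus those of the two children, as the text describes. The function names are hypothetical; note that many library implementations additionally weight each child by its share of the samples, which this sketch does not do.

```python
from collections import Counter


def gini_index(labels):
    """Gini index of a node: 1 - sum over classes of p_k squared."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_decrease(parent, left, right):
    """Change in the Gini index before and after a node branches."""
    return gini_index(parent) - gini_index(left) - gini_index(right)


# A perfectly separating split on a balanced two-class node:
# category 0 = Tor traffic, category 1 = ordinary traffic.
parent_labels = [0, 0, 1, 1]
decrease = gini_decrease(parent_labels, [0, 0], [1, 1])
```

Summing this decrease over every node where a feature is used for splitting, across all trees, gives a per-feature score that can be ranked alongside the feature's availability in sampled records.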
(3) Sampling flow data at a high-speed backbone router, retaining packets according to the sampling ratio, and processing the sampled packets with the multiple Count Bloom Filter algorithm to obtain records.
The specific process is as follows:
(3.1) Acquiring a verification data set consisting of two parts: traffic generated by accessing the Tor network through the same bridge, and traffic data collected by the Japanese MAWI working group from 00:00 to 00:15 in the early morning of April 9, 2019. The verification data set is sampled at a ratio of 128:1;
(3.2) processing the sampled data packet by using a multiple Count Bloom Filter algorithm (MCBF for short) to obtain a statistical result, wherein the algorithm structure is shown in fig. 4, and the specific process is as follows:
(3.2.1) For each sampled data packet, the {source IP address, port number} and the {destination IP address, port number} of the packet are used in turn as the input of the hash functions; each of the two inputs yields several outputs that map to corresponding positions in the MCBF;
(3.2.2) Each mapped position holds a 12-byte data structure that stores the feature-related information of the packets; the information to be stored is shown in Table 2;
Table 2. Stored information (rendered as images in the source: BDA0002695109630000072, BDA0002695109630000081)
If the packet satisfies the condition associated with a counter, that counter in the data structure is incremented by 1; otherwise the data structure is left unchanged;
(3.2.3) When the set threshold is reached, namely when the number of data packets sent by the client reaches 100, the stored information is extracted and the feature values are calculated. Because hash collisions may introduce errors when the number of data packets is large, the information stored at the mapped position that records the smallest number of client-sent packets is taken as the extracted information;
(3.2.4) The feature statistics of the record are calculated from the extracted information; the correspondence between the information stored at each position and the features is shown in Table 3.
Table 3. Correspondence between features and stored information
Feature | Calculation
F1 | 1 if Counter 12 > θ/2, otherwise 0
F2 | Counter 2/Counter 1
F3 | Counter 4/Counter 3
F4 | (Counter 1-Counter 2)/Counter 4
F5 | (Counter 3-Counter 4)/Counter 2
F6 | Counter 2/(Counter 1+Counter 3)
F7 | Counter 4/(Counter 1+Counter 3)
F8 | Counter 5/(Counter 1+Counter 3)
F9 | Counter 6/(Counter 1+Counter 3)
F10 | Counter 7/(Counter 1+Counter 3)
F11 | Counter 8/(Counter 1+Counter 3)
F12 | Counter 9/(Counter 1+Counter 3)
F13 | Counter 10/(Counter 1+Counter 3)
F14 | Counter 11/(Counter 1+Counter 3)
Partial statistics are shown in Table 4. When the value in Counter 12 is greater than half the threshold, i.e., 50, F1 is marked as 1.
Table 4. Partial statistical results (rendered as images in the source: BDA0002695109630000082, BDA0002695109630000091)
(4) The model trained in step (2) is used to classify the records, identify bridges, and output a bridge list. Partial identification results are shown in Table 5, where category 0 indicates that the server is identified as a Tor bridge and category 1 indicates that the server is identified as a normal server.
Table 5. Partial identification results (rendered as images in the source: BDA0002695109630000092, BDA0002695109630000101)
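The final step, turning classified records into a bridge list, can be sketched as follows. Each record is assumed to be a (server IP, port, feature vector) tuple, and `predict` stands in for the trained random forest; the toy predictor below is purely hypothetical and is not the trained model. Category 0 marks a Tor bridge and category 1 a normal server, as stated above.

```python
def build_bridge_list(records, predict):
    """Return the unique (ip, port) pairs whose record is classified as 0."""
    bridges = []
    seen = set()
    for ip, port, features in records:
        if predict(features) == 0 and (ip, port) not in seen:
            seen.add((ip, port))
            bridges.append((ip, port))
    return bridges


def toy_predict(features):
    """Hypothetical stand-in classifier: flags records whose F1 equals 1."""
    return 0 if features[0] == 1 else 1


# Made-up records for illustration only (documentation IP ranges).
records = [
    ("192.0.2.10", 443, [1, 0.6, 0.5]),
    ("198.51.100.7", 80, [0, 0.9, 0.8]),
    ("192.0.2.10", 443, [1, 0.7, 0.4]),  # same endpoint seen again
]
bridge_list = build_bridge_list(records, toy_predict)
```

Deduplicating on the (IP, port) pair keeps the output a list of endpoints rather than a list of records, which matches the bridge list handed to network administrators.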
To verify the accuracy of the invention under different sampling ratios and thresholds, the results of experiments with different thresholds at a sampling ratio fixed at 64:1 are shown in FIG. 5.
The technical means disclosed in the invention scheme are not limited to the technical means disclosed in the above embodiments, but also include the technical scheme formed by any combination of the above technical features. It should be noted that those skilled in the art can make various improvements and modifications without departing from the principle of the present invention, and such improvements and modifications are also considered to be within the scope of the present invention.

Claims (9)

1. A rapid identification method for a Tor bridge in a high-speed backbone network is characterized by comprising the following steps:
(1) collecting and storing Tor flow data and normal flow data used for model training;
(2) extracting, from the raw data, features usable for identifying and classifying complete flow data; performing feature selection to retain the features that remain usable for identifying and classifying records; then extracting training data from the raw data and training a machine learning model;
(3) sampling flow data at a high-speed backbone router, and then processing the sampled data packets with a multiple Count Bloom Filter algorithm to obtain records;
(4) inputting the records obtained in step (3) into the model trained in step (2) to identify bridges.
2. The method for rapidly identifying Tor bridges in a high-speed backbone network according to claim 1, wherein said step (1) comprises the following sub-steps:
(1.1) installing Tor Browser software at a host end, and selecting to use a network bridge to establish a communication link;
(1.2) starting an application to start Tor flow data acquisition;
(1.3) performing network access using the Tor Browser;
(1.4) stopping collecting after the webpage is loaded, and storing the currently collected Tor flow data file between the host and the network bridge;
(1.5) starting an application to start common flow data acquisition;
(1.6) operating with common applications;
(1.7) stopping collecting after the operation is finished, and storing the currently collected common flow data file;
and (1.8) repeating the operations (1.2) to (1.7) until a sufficient amount of flow data is collected.
3. The method for rapidly identifying Tor bridges in a high-speed backbone network according to claim 1, wherein said step (2) comprises the following sub-steps:
(2.1) first, extracting features and training models using the complete flow data collected in step (1), and selecting the random forest algorithm for its high accuracy;
(2.2) during feature selection, evaluating feature importance with the Gini-index-based method of the random forest algorithm, where the Gini index is calculated as:
$$GI = \sum_{k=1}^{K} p_k (1 - p_k) = 1 - \sum_{k=1}^{K} p_k^2$$
where K is the number of classes, k indexes the classes, and $p_k$ is the sample weight (proportion) of class k;
the importance of feature $X_j$ at node m, i.e., the change in the Gini index before and after node m branches, is then:
$$\mathrm{importance}_m(X_j) = GI_m - GI_l - GI_r$$
where $GI_m$ is the Gini index of the node before branching, and $GI_l$ and $GI_r$ are the Gini indexes of the two new nodes after branching;
(2.3) selecting suitable available features by jointly considering feature importance and usability in the records;
(2.4) taking the flow data collected in step (1) as raw data, extracting training data from it according to the preceding feature engineering, and training a model with the random forest algorithm.
4. The method for rapid identification of Tor bridges in a high speed backbone network according to claim 3, wherein the suitable available characteristics in step (2.3) are shown in the following table:
Feature | Meaning
F1 | Whether more than half of the packets carry timestamps
F2 | Ratio of non-empty packets sent by the client to the total number of packets sent by the client
F3 | Ratio of non-empty packets sent by the server to the total number of packets sent by the server
F4 | Ratio of empty packets sent by the client to non-empty packets sent by the server
F5 | Ratio of empty packets sent by the server to non-empty packets sent by the client
F6 | Ratio of non-empty packets sent by the client to the total number of data packets
F7 | Ratio of non-empty packets sent by the server to the total number of data packets
F8 | Proportion of PSH packets sent by the client in the total number of data packets
F9 | Proportion of PSH packets sent by the server in the total number of data packets
F10 | Proportion of packets of length 0-50 sent by the client in the total number of data packets
F11 | Proportion of packets of length 50-200 sent by the client in the total number of data packets
F12 | Proportion of packets of length greater than 1200 sent by the client in the total number of data packets
F13 | Proportion of packets of length 50-200 sent by the server in the total number of data packets
F14 | Proportion of packets of length greater than 1200 sent by the server in the total number of data packets
5. The method for rapidly identifying Tor bridges in a high-speed backbone network according to claim 1, wherein said step (3) comprises the following sub-steps:
(3.1) setting a data packet sampling proportion at a high-speed backbone network router and sampling the traffic accordingly;
and (3.2) processing the sampled data packets by using the MCBF algorithm to obtain a statistical result.
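Step (3.1) fixes only a sampling proportion and leaves the mechanism open. A minimal sketch, assuming systematic 1-in-N sampling (one possible choice, not stated in the claim); `sample_one_in_n` is a hypothetical helper name.

```python
from itertools import islice

def sample_one_in_n(packets, n):
    """Keep every n-th packet of a stream, i.e. a sampling proportion of 1/n."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    return islice(packets, n - 1, None, n)

# Packets are stood in for by their sequence numbers 1..20.
print(list(sample_one_in_n(range(1, 21), 4)))  # [4, 8, 12, 16, 20]
```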
6. The method for rapidly identifying Tor bridges in a high-speed backbone network according to claim 5, wherein said step (3.2) comprises the following sub-steps:
(3.2.1) for each sampled data packet, taking the {source IP address, port number} and the {destination IP address, port number} of the data packet in turn as the input of a hash function, each of the two inputs yielding a plurality of outputs that map to corresponding positions of the MCBF;
(3.2.2) each mapped position holds a 12-byte data structure for storing the feature-related information of the data packets; if the data packet satisfies the corresponding condition, adding 1 to the corresponding position in the data structure, and otherwise leaving it unchanged;
(3.2.3) when the set threshold θ is reached, extracting the stored information, and then calculating the feature values;
and (3.2.4) calculating on the extracted information to obtain the feature statistics of the record.
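Sub-steps (3.2.1) and (3.2.2) can be sketched as follows. Several points are assumptions rather than the patent's specification: the counter layout is inferred from the feature tables (the table of claim 7 is only available as an image), each endpoint is treated as a candidate server so that a packet counts as client-sent for its destination and server-sent for its source, the length bucket boundaries are one reading of the ranges, and the SHA-256 based hashing, `NUM_CELLS` and `NUM_HASHES` are illustrative choices.

```python
import hashlib

NUM_CELLS = 4096   # number of MCBF positions (assumed)
NUM_HASHES = 3     # outputs per {IP address, port} input (assumed)

def positions(ip, port):
    """Map an {IP address, port} pair to a set of MCBF positions."""
    key = f"{ip}:{port}".encode()
    return {
        int.from_bytes(hashlib.sha256(bytes([i]) + key).digest()[:8], "big") % NUM_CELLS
        for i in range(NUM_HASHES)
    }

def matched_counters(pkt, sent_by_server):
    """0-based indexes of the counters one packet increments (assumed layout)."""
    length = pkt["length"]
    hits = [2 if sent_by_server else 0]            # Counter 3 / Counter 1: packets sent
    if length > 0:
        hits.append(3 if sent_by_server else 1)    # Counter 4 / Counter 2: non-empty
    if pkt["psh"]:
        hits.append(5 if sent_by_server else 4)    # Counter 6 / Counter 5: PSH set
    if not sent_by_server and length <= 50:
        hits.append(6)                             # Counter 7: client, length 0-50
    if 50 < length <= 200:
        hits.append(9 if sent_by_server else 7)    # Counter 10 / Counter 8: length 50-200
    if length > 1200:
        hits.append(10 if sent_by_server else 8)   # Counter 11 / Counter 9: length > 1200
    if pkt["has_timestamp"]:
        hits.append(11)                            # Counter 12: packet carries a timestamp
    return hits

def update(mcbf, pkt):
    """Record one sampled packet under both of its endpoints."""
    endpoints = (
        (pkt["dst_ip"], pkt["dst_port"], False),   # destination as server: client-sent
        (pkt["src_ip"], pkt["src_port"], True),    # source as server: server-sent
    )
    for ip, port, sent_by_server in endpoints:
        hits = matched_counters(pkt, sent_by_server)
        for pos in positions(ip, port):
            for idx in hits:
                mcbf[pos][idx] += 1

mcbf = [[0] * 12 for _ in range(NUM_CELLS)]
pkt = {"src_ip": "192.0.2.10", "src_port": 50000, "dst_ip": "198.51.100.7",
       "dst_port": 443, "length": 120, "psh": True, "has_timestamp": True}
update(mcbf, pkt)
```

Python integers are used for the counters here for simplicity; the claim's 12-byte structure suggests much narrower counters, which a real implementation would need to guard against overflow.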
7. The method for fast identification of Tor bridges in a high speed backbone network according to claim 6, wherein in said step (3.2.2), the information needed to be stored is as shown in the following table:
Figure FDA0002695109620000031
8. The method for rapidly identifying Tor bridges in a high-speed backbone network according to claim 6, wherein in said step (3.2.3), among the mapped positions, the information stored in the position that records the smallest number of packets sent by the client is used as the extracted information.
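The selection rule of claim 8 can be sketched as below. Because collisions can only add to a position's counts, the position with the smallest client packet count is plausibly the least contaminated by other endpoints. The sketch assumes that the first counter of each position holds the number of packets sent by the client (inferred from F2 = Counter 2/Counter 1); `extract_record` and the sample values are hypothetical.

```python
def extract_record(cells):
    """Return the cell whose first counter (client packets, assumed) is smallest."""
    if not cells:
        raise ValueError("no mapped cells")
    return min(cells, key=lambda cell: cell[0])

# Three positions mapped from one {IP address, port}; the first is inflated by a collision.
cells = [
    [9, 5, 8, 6, 4, 5, 3, 1, 2, 1, 4, 7],
    [6, 4, 6, 5, 3, 4, 2, 1, 1, 1, 3, 5],
    [7, 4, 6, 5, 3, 4, 2, 1, 1, 1, 3, 5],
]
print(extract_record(cells)[0])  # 6
```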
9. The method of claim 6, wherein in step (3.2.4), the correspondence between the information stored in each position and the calculated features is shown in the following table:
Feature  Calculation method
F1       1 if the value in Counter 12 is greater than θ/2, otherwise 0
F2       Counter 2 / Counter 1
F3       Counter 4 / Counter 3
F4       (Counter 1 - Counter 2) / Counter 4
F5       (Counter 3 - Counter 4) / Counter 2
F6       Counter 2 / (Counter 1 + Counter 3)
F7       Counter 4 / (Counter 1 + Counter 3)
F8       Counter 5 / (Counter 1 + Counter 3)
F9       Counter 6 / (Counter 1 + Counter 3)
F10      Counter 7 / (Counter 1 + Counter 3)
F11      Counter 8 / (Counter 1 + Counter 3)
F12      Counter 9 / (Counter 1 + Counter 3)
F13      Counter 10 / (Counter 1 + Counter 3)
F14      Counter 11 / (Counter 1 + Counter 3)
The value of F1 is determined by Counter 12 and the threshold θ: if the value in Counter 12 is greater than θ/2, F1 for that record is marked as 1, and otherwise as 0.
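The calculation table of claim 9 translates directly into code. In this sketch the function name, the list-based counter layout and the handling of a zero denominator (the ratio is set to 0.0) are assumptions; the formulas themselves follow the table.

```python
def features_from_counters(c, theta):
    """Compute F1..F14 from 12 counts, where c[0] is Counter 1 and c[11] is Counter 12."""
    if len(c) != 12:
        raise ValueError("expected 12 counters")

    def ratio(num, den):
        # Assumption: a ratio with a zero denominator is reported as 0.0.
        return num / den if den else 0.0

    total = c[0] + c[2]  # Counter 1 + Counter 3
    feats = {
        "F1": 1 if c[11] > theta / 2 else 0,
        "F2": ratio(c[1], c[0]),
        "F3": ratio(c[3], c[2]),
        "F4": ratio(c[0] - c[1], c[3]),
        "F5": ratio(c[2] - c[3], c[1]),
        "F6": ratio(c[1], total),
        "F7": ratio(c[3], total),
    }
    # F8..F14 are Counter 5..Counter 11, each divided by Counter 1 + Counter 3.
    for offset in range(7):
        feats[f"F{8 + offset}"] = ratio(c[4 + offset], total)
    return feats

counters = [10, 8, 10, 5, 4, 5, 2, 4, 1, 3, 2, 12]  # made-up values
feats = features_from_counters(counters, theta=20)
print(feats["F1"], feats["F2"], feats["F5"])  # 1 0.8 0.625
```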
CN202011003470.5A 2020-09-22 2020-09-22 Rapid identification method for Tor network bridge in high-speed backbone network Active CN112235254B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011003470.5A CN112235254B (en) 2020-09-22 2020-09-22 Rapid identification method for Tor network bridge in high-speed backbone network

Publications (2)

Publication Number Publication Date
CN112235254A (en) 2021-01-15
CN112235254B (en) 2023-03-24

Family

ID=74107316

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011003470.5A Active CN112235254B (en) 2020-09-22 2020-09-22 Rapid identification method for Tor network bridge in high-speed backbone network

Country Status (1)

Country Link
CN (1) CN112235254B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113283498A (en) * 2021-05-21 2021-08-20 东南大学 VPN flow rapid identification method facing high-speed network
CN115002045A (en) * 2022-07-19 2022-09-02 中国电子科技集团公司第三十研究所 Twin network-based dark website session identification method and system

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108768883A (en) * 2018-05-18 2018-11-06 新华三信息安全技术有限公司 A kind of network flow identification method and device
CN109873793A (en) * 2017-12-04 2019-06-11 北京明朝万达科技股份有限公司 A kind of darknet discovery, source tracing method and system based on sample flow analysis
CN109936578A (en) * 2019-03-21 2019-06-25 西安电子科技大学 The detection method of HTTPS tunnel traffic in a kind of network-oriented
CN109951444A (en) * 2019-01-29 2019-06-28 中国科学院信息工程研究所 A kind of encryption Anonymizing networks method for recognizing flux
CN110460502A (en) * 2019-09-10 2019-11-15 西安电子科技大学 Application rs traffic recognition methods under VPN based on distribution characteristics random forest
CN110519298A (en) * 2019-09-19 2019-11-29 北京丁牛科技有限公司 A kind of Tor method for recognizing flux and device based on machine learning

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
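ZHENZHEN CAI et al.: "isAnon: Flow-Based Anonymity Network Traffic Identification Using Extreme Gradient Boosting", 2019 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN) *
潘逸涵 et al.: "Tor traffic identification method based on deep learning", Communications Technology (《通信技术》) *
马陈城 et al.: "Website fingerprinting attack method based on burst feature analysis with deep neural networks", Journal of Computer Research and Development (《计算机研究与发展》) *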

Also Published As

Publication number Publication date
CN112235254B (en) 2023-03-24

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant