CN114024748B - Efficient Ethereum traffic identification method combining active node library and machine learning - Google Patents
Efficient Ethereum traffic identification method combining active node library and machine learning
- Publication number
- CN114024748B (application CN202111302612.2A)
- Authority
- CN
- China
- Prior art keywords
- ethereum
- traffic
- active node
- node library
- information
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/14—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
- H04L63/1408—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L47/00—Traffic control in data switching networks
- H04L47/10—Flow control; Congestion control
- H04L47/24—Traffic characterised by specific attributes, e.g. priority or QoS
- H04L47/2441—Traffic characterised by specific attributes, e.g. priority or QoS relying on flow classification, e.g. using integrated services [IntServ]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L47/00—Traffic control in data switching networks
- H04L47/10—Flow control; Congestion control
- H04L47/24—Traffic characterised by specific attributes, e.g. priority or QoS
- H04L47/2483—Traffic characterised by specific attributes, e.g. priority or QoS involving identification of individual flows
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/02—Network architectures or network communication protocols for network security for separating internal from external traffic, e.g. firewalls
- H04L63/0227—Filtering policies
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/14—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
- H04L63/1433—Vulnerability analysis
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L69/00—Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
- H04L69/16—Implementation or adaptation of Internet protocol [IP], of transmission control protocol [TCP] or of user datagram protocol [UDP]
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D30/00—Reducing energy consumption in communication networks
- Y02D30/50—Reducing energy consumption in communication networks in wire-line communication networks, e.g. low power modes or reduced link rate
Landscapes
- Engineering & Computer Science (AREA)
- Computer Security & Cryptography (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- General Engineering & Computer Science (AREA)
- Computing Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Computer Hardware Design (AREA)
- Evolutionary Computation (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Software Systems (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- Medical Informatics (AREA)
- Mathematical Physics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Evolutionary Biology (AREA)
- Data Exchanges In Wide-Area Networks (AREA)
Abstract
The invention provides an efficient Ethereum traffic identification method combining an active node library and machine learning. The method is divided into four parts. The first part is the construction of the active node library. The second part is the training of the identification model. The third part compares different machine learning algorithms and selects, as the identification model, the trained model best suited to classification. The fourth part is Ethereum traffic identification: traffic is screened by the active node library, divided into TCP and UDP traffic, and input into the identification model for identification, and the node information in the Ethereum active node library is updated according to the identification result. The invention can effectively identify the Ethereum traffic present in the current network, with an identification accuracy of 99%, which makes it convenient for network administrators to monitor Ethereum network traffic.
Description
Technical Field
The invention belongs to the technical field of cyberspace security and relates to an efficient Ethereum traffic identification method combining an active node library and machine learning.
Background
A blockchain is a distributed ledger technology maintained jointly by multiple parties, which uses cryptography to secure transmission and access. It keeps the data in the ledger consistently stored, hard to tamper with, and non-repudiable. Blockchain technology provides a new solution to trust, security, and efficiency problems on the Internet, and brings new opportunities and challenges to industries such as finance.
Since blockchain technology was first proposed by Satoshi Nakamoto, blockchain applications, including cryptocurrencies such as Bitcoin and Ethereum, have developed rapidly. According to statistics from the China Electronic Information Industry Development Institute, the domestic blockchain industry reached 4.85 billion yuan in 2020, a growth rate of 48.5 percent over the previous year. With the rapid growth of the industry, potential security supervision problems in blockchains have also been exposed. First, blockchain digital currencies provide a safe and stable payment channel for crimes such as money laundering and ransomware, and have greatly promoted the growth of the dark web and the underground economy; second, blockchain digital currencies make cross-border money transfers simpler, which affects the stability of national financial markets; finally, because blockchains are decentralized and tamper-resistant, they are often used to store and spread sensitive information, which seriously harms the health of the network ecosystem. The abuse of blockchains not only jeopardizes national security and social stability, but also poses great threats and challenges to network security supervision.
As a representative blockchain application, Bitcoin implements application development with a scripting engine. This limits Bitcoin to the expressive power of its scripting language and makes complex contract development hard to support, so its capabilities are greatly restricted. Ethereum abstracts the blockchain system into a transaction-based state machine built on the Ethereum Virtual Machine (EVM), and uses a Turing-complete programming language to support recording arbitrary information and executing arbitrary functions. In the 23rd Global Public Chain Technology Evaluation Index issued by the China Electronic Information Industry Development Institute, Ethereum ranked first in applicability among 37 public chains. Compared with other blockchain implementations such as Bitcoin, Ethereum better supports the development of distributed blockchain applications and has higher research value.
However, blockchain supervision technology lags behind the rapid development of the blockchain industry. Existing research on blockchain security mostly explores the technology itself, such as attack modes, design vulnerabilities, and application directions, and lacks analysis of blockchain security at the level of network traffic supervision. Ethereum is the most applicable blockchain platform and will develop further as blockchain technology matures. Measuring and analyzing Ethereum network traffic and exploring Ethereum security supervision schemes is therefore of great significance for the security of Ethereum networks and of blockchain networks in general.
Therefore, the invention collects the Ethereum traffic in a network by constructing a library of the active Ethereum nodes in that network. The traffic is then divided into TCP and UDP traffic, each with its own identification features. A random forest algorithm is used to distinguish Ethereum traffic from normal traffic.
Disclosure of Invention
In order to monitor Ethereum effectively and identify Ethereum traffic, the invention provides an efficient Ethereum traffic identification method combining an active node library and machine learning, aimed at the concealment of Ethereum traffic characteristics. Based on the inherent "small world" characteristic of Ethereum, the method first initializes an active node library from an Ethereum core node library and then builds up the active node library, which contains the Ethereum nodes currently in an active state. Next, features are extracted separately for the UDP-based Ethereum node discovery process and the TCP-based Ethereum data transmission process, and Ethereum traffic is identified by a machine learning method. Finally, using the selected features and the trained model, traffic is filtered through the active node library and input into the identification model to complete the identification of Ethereum traffic. To achieve the above purpose, the invention provides the following technical solution:
An efficient Ethereum traffic identification method combining an active node library and machine learning, comprising the following steps:
(1) Based on the assumption that the total number of Ethereum nodes in the supervised area tends to converge, the active node library stores the Ethereum node information of the current area. Ethereum core node information is collected, the active node library is initialized, and Ethereum traffic is acquired.
(2) Traffic features are selected separately for Ethereum UDP traffic and TCP traffic, and additional features derived from the characteristics of the Ethereum node discovery protocol (NDP) and RLPx are extracted as a supplement.
(3) Ethereum traffic is accurately identified by a machine learning method, and a data set is constructed to test and evaluate the resulting model.
(4) Based on the constructed active node library and the trained identification model, the input traffic is identified and the active node library is updated accordingly.
Further, step (1) specifically includes the following sub-steps:
(1.1) acquiring all currently public Ethereum core node information with a web crawler and storing it in a core node library in the form of IP addresses;
(1.2) initializing the active node library from the collected core node library information;
(1.3) dynamically updating the known node information using the nodes in the active node library, so as to obtain the Ethereum node information of the whole supervised area;
(1.4) setting an expiration time and removing Ethereum nodes that have been inactive for a long time from the active node library, which keeps the library current and improves the efficiency of traffic screening;
(1.5) modifying NodeFinder, a tool for detecting Ethereum nodes, so that it communicates with the detected Ethereum nodes;
(1.7) capturing Ethereum traffic on the intermediate router.
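Sub-steps (1.2) to (1.4) can be sketched as a small in-memory structure. This is a minimal illustration rather than the patented implementation: the class name `ActiveNodeLibrary`, its methods, and the `ttl_seconds` parameter are assumptions introduced here. The text only states that nodes are stored by IP address and removed after an expiration time.

```python
import time


class ActiveNodeLibrary:
    """Hypothetical in-memory active node library keyed by IP address."""

    def __init__(self, core_node_ips, ttl_seconds, now=None):
        # ttl_seconds stands for the expiration time of step (1.4); its value is an assumption.
        self.ttl_seconds = ttl_seconds
        self.last_seen = {}
        start = time.time() if now is None else now
        # Step (1.2): seed the library with the crawled core node addresses.
        for ip in core_node_ips:
            self.last_seen[ip] = start

    def touch(self, ip, now=None):
        """Step (1.3): add a newly discovered node or refresh a known one."""
        self.last_seen[ip] = time.time() if now is None else now

    def expire(self, now=None):
        """Step (1.4): drop nodes not seen within ttl_seconds and return their IPs."""
        current = time.time() if now is None else now
        stale = [ip for ip, seen in self.last_seen.items()
                 if current - seen > self.ttl_seconds]
        for ip in stale:
            del self.last_seen[ip]
        return stale

    def __contains__(self, ip):
        return ip in self.last_seen

    def __len__(self):
        return len(self.last_seen)
```

The optional `now` argument exists only so that the expiry logic can be exercised with fixed timestamps; in normal use the wall clock is read.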
Further, step (2) specifically includes the following sub-steps:
(2.1) using mutual information as the measure of feature relevance, selecting the 10 features with the highest mutual information values for Ethereum UDP traffic and for TCP traffic respectively;
(2.2) analyzing the packet structure of Ethereum UDP traffic to obtain UDP traffic features;
(2.3) analyzing the encryption handshake (ENCHANDSHAKE) procedure of the RLPx protocol to obtain Ethereum TCP traffic features.
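The mutual information ranking of sub-step (2.1) can be sketched as follows. This is an illustrative sketch under stated assumptions: feature values are treated as discrete (continuous statistics would first need binning), and the function names `mutual_information` and `top_k_features` are hypothetical, not taken from the patent.

```python
import math
from collections import Counter


def mutual_information(values, labels):
    """Mutual information (in bits) between one discrete feature and the class label."""
    n = len(values)
    joint = Counter(zip(values, labels))
    px = Counter(values)
    py = Counter(labels)
    mi = 0.0
    for (x, y), count in joint.items():
        # p(x, y) / (p(x) * p(y)) simplifies to count * n / (px[x] * py[y]).
        mi += (count / n) * math.log2(count * n / (px[x] * py[y]))
    return mi


def top_k_features(rows, labels, names, k=10):
    """Rank features by mutual information with the label and keep the k highest."""
    scores = {
        name: mutual_information([row[i] for row in rows], labels)
        for i, name in enumerate(names)
    }
    return sorted(names, key=lambda name: scores[name], reverse=True)[:k]
```

Here `rows` holds one tuple of feature values per flow and `labels` marks each flow as Ethereum or background traffic; the ranking is run once for UDP flows and once for TCP flows.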
Further, the selected features of Ethereum TCP and UDP traffic are shown in Table 1 and Table 2.
Further, step (3) specifically includes the following sub-steps:
(3.1) combining the acquired Ethereum traffic with the various application traffic in the public data set VPN-nonVPN to form the data set ETI required by the experiment;
(3.2) dividing the data set into a training set and a test set at a ratio of 8:2, and evaluating the method on multiple metrics using four machine learning algorithms: support vector machine, random forest, logistic regression, and K-nearest neighbor.
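The 8:2 split and the multi-metric evaluation of sub-step (3.2) can be sketched with the standard library alone. The sketch makes several assumptions: only K-nearest neighbor, one of the four named algorithms, is shown; each sample is a `(features, label)` pair with label 1 for Ethereum traffic; and recall and F1 are included as plausible metrics, since the text names only accuracy and precision.

```python
import random


def split_8_2(samples, seed=0):
    """Shuffle the samples and split them into 80% training and 20% test data."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * 0.8)
    return shuffled[:cut], shuffled[cut:]


def knn_predict(train, features, k=3):
    """Majority vote among the k nearest training samples (squared Euclidean distance)."""
    nearest = sorted(
        train,
        key=lambda sample: sum((a - b) ** 2 for a, b in zip(sample[0], features)),
    )[:k]
    votes = sum(label for _, label in nearest)
    return 1 if votes * 2 > len(nearest) else 0


def evaluate(y_true, y_pred):
    """Accuracy, precision, recall and F1 for the positive class (label 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
```

Running the same `evaluate` function over the predictions of each candidate model gives a like-for-like comparison from which the best model can be chosen.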
Further, step (4) specifically includes the following sub-steps:
(4.1) screening unidentified traffic through the active node library, inputting it into the identification model, and outputting whether the traffic is Ethereum traffic;
(4.2) updating the node information of the Ethereum active node library according to the identification result.
Compared with the prior art, the invention has the following advantages and beneficial effects:
(1) The invention can effectively identify the Ethereum traffic present in the current network, with an identification accuracy of 99%, which makes it convenient for network administrators to monitor Ethereum network traffic.
(2) The invention separates TCP and UDP traffic and analyzes the packet structure of each, obtaining the features best suited to classification; combined with feature selection based on mutual information, this effectively improves identification accuracy.
(3) The invention constructs an Ethereum active node library and uses it to screen out potential Ethereum traffic. Compared with traffic detection without active node library filtering, metrics such as detection accuracy and precision improve by 3% on average, and detecting Ethereum traffic for the same number of samples takes less than 50% of the time.
(4) Screening traffic through the Ethereum active node library effectively avoids any negative impact on identification performance.
Drawings
FIG. 1 is a schematic diagram of an experimental environment setup;
FIG. 2 is a schematic diagram of an identification framework;
FIG. 3 shows the performance of different machine learning algorithms on various metrics before and after screening with the active node library, for the identification of UDP flows;
FIG. 4 shows the performance of different machine learning algorithms on various metrics before and after screening with the active node library, for the identification of TCP flows;
FIG. 5 is a schematic diagram of identification time consumption, where (a) shows UDP traffic identification efficiency and (b) shows TCP traffic identification efficiency.
Detailed Description
The technical scheme provided by the present invention will be described in detail with reference to the following specific examples, and it should be understood that the following specific examples are only for illustrating the present invention and are not intended to limit the scope of the present invention.
Example 1: The invention provides an efficient Ethereum traffic identification method combining an active node library and machine learning. The identification framework is shown in FIG. 2 and is divided into four parts. The first part is the construction of the active node library: the library is initialized from the core node library, which stores information on the core nodes that keep Ethereum running stably, and is then built up through operations such as searching for, adding, and deleting active nodes. A traffic collection unit is then deployed based on the active node library to collect Ethereum traffic. The second part is the training of the identification model: Ethereum traffic is divided into TCP and UDP traffic, the packet structure of each is analyzed, the identification features best suited to classification are obtained and verified on real data, and features are screened using mutual information as the measure. After feature selection, the previously collected Ethereum traffic and background traffic are used as the data set and divided into a training set and a test set. The third part compares different machine learning algorithms and selects, as the identification model, the trained model best suited to classification. The fourth part is Ethereum traffic identification: traffic is screened by the active node library, divided into TCP and UDP traffic, and input into the identification model for identification.
Specifically, the Ethereum traffic identification method comprises the following steps:
(1) Constructing the active node library and building an experimental environment to collect the relevant Ethereum traffic.
The specific process of this step is as follows:
(1.1) acquiring all currently public Ethereum core node information with a web crawler and storing it in a core node library in the form of IP addresses;
(1.2) initializing the active node library with the information from the core node library, continuously searching for currently existing active Ethereum nodes, and dynamically updating the node information of the active node library;
(1.3) setting an expiration time for each active node and removing Ethereum nodes that have been inactive for a long time, which keeps the active node library current;
(1.4) modifying NodeFinder, a tool for detecting Ethereum nodes, so that it communicates with the detected Ethereum nodes;
(1.5) capturing Ethereum traffic on the intermediate router with Wireshark;
(1.6) using the various application traffic in the public data set VPN-nonVPN as background traffic.
(2) Dividing the original Ethereum traffic into TCP and UDP traffic, analyzing the packet structure of both to extract features usable for traffic identification and classification, performing feature selection with mutual information, and retaining the features useful for identification and classification.
The specific process of this step is as follows:
(2.1) dividing the original Ethereum traffic into TCP and UDP traffic;
(2.2) Screening features with the mutual information measure, starting from the 80 common traffic statistical features proposed by Draper et al., and selecting the ten features with the highest mutual information for TCP flows and for UDP flows respectively.
(2.3) Analyzing the packet structure of Ethereum UDP traffic shows that the packet lengths in a UDP flow follow a strict order and that each packet type has its own stable length distribution. The lengths of the first eight packets in the UDP flow are therefore extracted as features.
(2.3) The names and meanings of the 18 extracted features of Ethereum UDP traffic are shown in Table 3.
(2.4) Analyzing the interaction procedure of Ethereum TCP traffic shows that an Ethereum TCP flow contains many packets with equal payload lengths. The payloads of the packets carrying the header are typically a combination of 32 B, 1 B, and 12 B. The features are therefore the average payload length of the two packets in the encryption handshake phase and the proportion of packets with payload lengths of 32 B, 1 B, and 12 B among all packets.
(2.5) The names and meanings of the 12 extracted features of Ethereum TCP traffic are shown in Table 4.
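The supplementary features of steps (2.3) and (2.4) can be sketched as below. The input layout is an assumption: each flow is given as a list of packet (or payload) lengths in arrival order. Zero-padding short UDP flows, treating the first two payload-carrying TCP packets as the encryption handshake, and all function names are likewise assumptions made for illustration.

```python
HANDSHAKE_PACKETS = 2          # two packets of the encryption handshake phase
MARKER_LENGTHS = {32, 1, 12}   # payload lengths in bytes named in step (2.4)


def first_packet_lengths(packet_lengths, count=8, pad_value=0):
    """Lengths of the first `count` packets of a UDP flow, padded if the flow is shorter."""
    head = list(packet_lengths[:count])
    return head + [pad_value] * (count - len(head))


def udp_feature_vector(selected_statistics, packet_lengths):
    """Join the statistics chosen by mutual information with the eight length features."""
    return list(selected_statistics) + first_packet_lengths(packet_lengths)


def tcp_payload_features(payload_lengths):
    """Return (mean handshake payload length, share of packets with 32/1/12-byte payloads)."""
    if not payload_lengths:
        return 0.0, 0.0
    handshake = payload_lengths[:HANDSHAKE_PACKETS]
    mean_handshake = sum(handshake) / len(handshake)
    marked = sum(1 for length in payload_lengths if length in MARKER_LENGTHS)
    return mean_handshake, marked / len(payload_lengths)
```

With ten selected statistics, `udp_feature_vector` yields the 18 UDP features, and appending the two values from `tcp_payload_features` to ten TCP statistics yields the 12 TCP features.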
(3) After feature selection, the previously collected Ethereum traffic and background traffic are used as the data set and divided into a training set and a test set. Different machine learning algorithms are then compared, and the trained model best suited to classification is selected as the identification model.
The specific process of this step is as follows:
(3.1) Constructing an Ethereum traffic data set from the data collected in step (1) and dividing it into a training set and a test set at a ratio of 8:2. The random forest algorithm, which has the highest accuracy, is selected by comparing metrics such as the accuracy of models including random forest, K-nearest neighbor, and naive Bayes. The identification results before and after screening the traffic with the active node library are also compared: accuracy after screening is 3% higher than without screening, and the detailed results are shown in FIG. 3 and FIG. 4.
(3.2) Evaluating the time consumption of Ethereum traffic identification with the combined active node library and machine learning method against detection with the traditional method. The combined method reduces time consumption by more than 50% compared with the traditional detection method, and the detailed results are shown in FIG. 5.
(4) Traffic is screened by the active node library, divided into TCP and UDP traffic, and input into the identification model for identification, and the Ethereum active node library is updated according to the identification result.
The specific steps are as follows:
(4.1) Extracting the source and destination IP addresses of the traffic to be detected and checking whether the active node library contains either address.
(4.2) If an IP address is contained, dividing the traffic into TCP and UDP traffic, extracting the relevant features, and inputting them into the corresponding identification model obtained in step (3) for identification.
(4.3) If the traffic is identified as Ethereum traffic and its source or destination IP address is not yet in the active node library, adding that IP address to the active node library as a new active node.
(4.4) Setting an active time for each active node and deleting from the active node library any node that does not respond within that time.
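The screening and update loop of step (4) can be sketched as a single function. This is a hedged illustration, not the patented implementation: the flow dictionary layout, the `models` mapping, the `extract_features` callback, and the use of a plain `set` for the node library are all assumptions. The expiry of step (4.4) is left out for brevity.

```python
def identify_flow(flow, active_nodes, models, extract_features):
    """Screen one flow against the node library, classify it, and update the library.

    flow: dict with "src_ip", "dst_ip" and "protocol" ("TCP" or "UDP") keys (hypothetical layout).
    active_nodes: set of IP addresses standing in for the active node library.
    models: dict mapping a protocol name to a callable that returns True for Ethereum traffic.
    extract_features: callable turning a flow into the feature vector its model expects.
    """
    endpoints = (flow["src_ip"], flow["dst_ip"])
    # Step (4.1): flows with no endpoint in the library are skipped without classification.
    if not any(ip in active_nodes for ip in endpoints):
        return False
    # Step (4.2): pick the model by transport protocol and classify the extracted features.
    classify = models[flow["protocol"]]
    is_ethereum = bool(classify(extract_features(flow)))
    # Step (4.3): both endpoints of a confirmed Ethereum flow are recorded as active nodes.
    if is_ethereum:
        active_nodes.update(endpoints)
    return is_ethereum
```

Because unmatched flows return before feature extraction, the classifier only runs on the screened subset, which is where the time saving of the screening step would come from.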
The technical means disclosed by the invention are not limited to those disclosed in the embodiment, and also include technical solutions formed by any combination of the above technical features. It should be noted that modifications and adaptations may occur to one skilled in the art without departing from the principles of the invention, and these are intended to fall within the scope of the invention.
Claims (1)
1. The efficient Ethernet traffic identification method combining active node library and machine learning is characterized by comprising the following steps:
(1) Based on the assumption that the total number of the Ethernet nodes in the monitoring area tends to be converged, storing the Ethernet node information in the current area by using the active node library, and collecting the Ethernet core node information to initialize the active node library so as to acquire the Ethernet traffic;
(2) Respectively selecting flow characteristics of the Ethernet UDP flow and the TCP flow, and extracting corresponding flow characteristics as supplement aiming at the characteristics of the Ethernet NDP protocol and RLPx;
(3) The accurate identification of the Ethernet flow is realized by a machine learning method, a data set is constructed, and the obtained model is tested and evaluated;
(4) Based on the constructed active node library and the acquired identification model, the Ethernet traffic input is identified,
Step (1) collects the information of the Ethernet core nodes to initialize the active node library, and obtains the Ethernet flow; the method specifically comprises the following substeps:
(1.1) acquiring all the currently disclosed Ethernet core node information through a web crawler, and storing the information in a core node library in the form of an IP address;
(1.2) initializing an active node library according to the collected information of the core node library;
(1.3) dynamically updating the known node information through the information of the nodes in the active node library, so as to obtain the information of the Ethernet nodes in the whole supervision area;
(1.4) setting an expiration time, eliminating the inactive Ethernet nodes in the long-time active node library,
(1.5) Modifying the means NodeFinder for detecting an ethernet node to communicate with the detected ethernet node;
(1.7) capturing ethernet traffic on the intermediate router;
Wherein, the step (2) specifically comprises the following sub-steps:
(2.1) analyzing the Ethernet UDP flow data packet structure to obtain UDP flow characteristics;
(2.2) analyzing the encryption handshake process of RPLx protocols to obtain the characteristics of the Ethernet TCP flow;
(2.3) reflecting the correlation of the features according to the mutual information, respectively selecting the first 10 features with the highest mutual information value in the Ethernet UDP flow and the TCP flow, and adding the related features obtained in the steps (2.1) and (2.2) as the last selected feature;
wherein, the step (3) specifically comprises the following sub-steps:
(3.1) combining the acquired Ethernet traffic with various application traffic in the public data set VPN-nonVPN to form a data set required by an experiment;
(3.2) the dataset was written with 8:2 into training and test sets, four machine learning algorithms are used: the method comprises the steps of evaluating a used method from a plurality of indexes by a support vector machine, a random forest, logistic regression and K nearest neighbor;
the step (4) specifically comprises the following sub-steps:
(4.1) screening unidentified traffic through the active node library, inputting the screened traffic into the identification model, and outputting whether the traffic is Ethernet traffic;
(4.2) updating the node information of the Ethernet active node library according to the identification result.
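Sub-steps (1.2), (1.4), (4.1), and (4.2) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and does not come from the patent: the class `ActiveNodeLibrary`, the functions `screen_flow` and `identify`, the `ttl_seconds` parameter, and the reading of "screening" as keeping only flows with at least one endpoint in the library. The `classify` argument stands in for whichever trained identification model is used.

```python
import time
from typing import Callable, Dict, Iterable, Optional, Sequence, Tuple

Flow = Tuple[str, str, Sequence[float]]  # (source IP, destination IP, features)


class ActiveNodeLibrary:
    """Maps a node IP to the time it was last seen active (hypothetical helper)."""

    def __init__(self, core_node_ips: Iterable[str], ttl_seconds: float,
                 now: Optional[float] = None) -> None:
        # Sub-step (1.2): initialise from the core node library.
        start = time.time() if now is None else now
        self.ttl_seconds = ttl_seconds
        self.last_seen: Dict[str, float] = {ip: start for ip in core_node_ips}

    def touch(self, ip: str, now: Optional[float] = None) -> None:
        # Sub-steps (1.3) and (4.2): record that a node was seen active.
        self.last_seen[ip] = time.time() if now is None else now

    def expire(self, now: Optional[float] = None) -> None:
        # Sub-step (1.4): drop nodes whose last activity is older than the TTL.
        current = time.time() if now is None else now
        self.last_seen = {ip: seen for ip, seen in self.last_seen.items()
                          if current - seen <= self.ttl_seconds}

    def __contains__(self, ip: str) -> bool:
        return ip in self.last_seen


def screen_flow(library: ActiveNodeLibrary, src_ip: str, dst_ip: str) -> bool:
    # Assumed screening rule: at least one endpoint is a known active node.
    return src_ip in library or dst_ip in library


def identify(library: ActiveNodeLibrary, flow: Flow,
             classify: Callable[[Sequence[float]], bool],
             now: Optional[float] = None) -> bool:
    """Sub-steps (4.1) and (4.2): screen, classify, then refresh the library."""
    src_ip, dst_ip, features = flow
    if not screen_flow(library, src_ip, dst_ip):
        return False  # screened out before reaching the model
    is_target = classify(features)
    if is_target:
        # Both endpoints of a positively identified flow are marked active.
        library.touch(src_ip, now)
        library.touch(dst_ip, now)
    return is_target
```

Under these assumptions the model only sees flows that touch a known node, and each positive result refreshes both endpoints, so the library keeps growing around nodes that are still active while `expire` prunes the stale ones.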
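The mutual-information ranking in sub-step (2.3) can be sketched as follows. This is a plain-Python illustration for discrete feature values only; the function names `mutual_information` and `top_k_features` are hypothetical, and the patent does not say which estimator or discretization is used, so continuous flow features would need binning or a different estimator first.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def mutual_information(xs: Sequence[Hashable], ys: Sequence[Hashable]) -> float:
    """I(X;Y) in bits for two equally long sequences of discrete values."""
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in count_xy.items():
        # p(x,y) * log2(p(x,y) / (p(x) * p(y))), written in terms of counts.
        mi += (c / n) * math.log2(c * n / (count_x[x] * count_y[y]))
    return mi


def top_k_features(columns: Dict[str, Sequence[Hashable]],
                   labels: Sequence[Hashable], k: int = 10) -> List[str]:
    """Return the names of the k features with the highest mutual information."""
    scores = {name: mutual_information(values, labels)
              for name, values in columns.items()}
    return sorted(scores, key=lambda name: scores[name], reverse=True)[:k]
```

Calling `top_k_features` once on the UDP flow features and once on the TCP flow features, each with `k=10`, mirrors the "respectively" in sub-step (2.3); the protocol-derived features from (2.1) and (2.2) would then be appended to each result.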
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111302612.2A CN114024748B (en) | 2021-11-04 | 2021-11-04 | Efficient Ethernet traffic identification method combining active node library and machine learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114024748A (en) | 2022-02-08 |
CN114024748B (en) | 2024-04-30 |
Family ID=80061397
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111302612.2A Active CN114024748B (en) | 2021-11-04 | 2021-11-04 | Efficient Ethernet traffic identification method combining active node library and machine learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114024748B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115442291B (en) * | 2022-08-19 | 2024-09-03 | 南京理工大学 | Active network topology sensing method for Ethernet |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102315974A (en) * | 2011-10-17 | 2012-01-11 | 北京邮电大学 | Stratification characteristic analysis-based method and apparatus thereof for on-line identification for TCP, UDP flows |
CN111082995A (en) * | 2019-12-25 | 2020-04-28 | 中国科学院信息工程研究所 | Ethernet workshop network behavior analysis method, corresponding storage medium and electronic device |
CN111865823A (en) * | 2020-06-24 | 2020-10-30 | 东南大学 | Light-weight Ether house encrypted flow identification method |
CN112910918A (en) * | 2021-02-26 | 2021-06-04 | 南方电网科学研究院有限责任公司 | Industrial control network DDoS attack traffic detection method and device based on random forest |
CN113344562A (en) * | 2021-08-09 | 2021-09-03 | 四川大学 | Method and device for detecting Etheng phishing accounts based on deep neural network |
CN113469275A (en) * | 2021-07-21 | 2021-10-01 | 东南大学 | Refined classification method for ether house behavior traffic |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20200311583A1 (en) * | 2019-04-01 | 2020-10-01 | Hewlett Packard Enterprise Development Lp | System and methods for fault tolerance in decentralized model building for machine learning using blockchain |
Non-Patent Citations (1)
Title |
---|
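Ethereum encrypted traffic identification method based on an active node library; Hu Xiaoyan et al.; Cyberspace Security; 2020-08-25; Vol. 11 (No. 8); pp. 34-39 * |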
Also Published As
Publication number | Publication date |
---|---|
CN114024748A (en) | 2022-02-08 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||