CN115001739A - Random forest based transverse worm attack detection method - Google Patents

Info

Publication number
CN115001739A
Authority
CN
China
Prior art keywords
random forest
source
current connection
model
connections
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210409078.3A
Other languages
Chinese (zh)
Inventor
杨瑞瑞
徐砚
李立
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Electronic Technology Cyber Security Co Ltd
Original Assignee
China Electronic Technology Cyber Security Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2022-04-19
Filing date
2022-04-19
Publication date
2022-09-02
Application filed by China Electronic Technology Cyber Security Co Ltd
Priority to CN202210409078.3A
Publication of CN115001739A

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1441 Countermeasures against malicious traffic
    • H04L63/145 Countermeasures against malicious traffic the attack involving the propagation of malware through the network, e.g. viruses, trojans or worms
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 Reducing energy consumption in communication networks
    • Y02D30/50 Reducing energy consumption in communication networks in wire-line communication networks, e.g. low power modes or reduced link rate

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Virology (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention discloses a random forest based transverse worm attack detection method, which comprises the following steps: S1, generating a random forest model; S2, testing the random forest model; S3, storing the trained random forest model, analyzing network traffic features in real time, and loading the random forest model to predict the result, giving an alarm if the prediction is a transverse worm attack and discarding the traffic data if the prediction is normal traffic. The method can predict transverse worm attack traffic in an industrial control network in real time and provide a corresponding solution according to the alarm result.

Description

Random forest based transverse worm attack detection method
Technical Field
The invention relates to the technical field of worm attack detection, in particular to a transverse worm attack detection method based on random forest.
Background
With the rapid development of computer network technology, the Internet has become deeply embedded in everyday applications, and threats to computer system security and network security are increasingly serious. A worm is an intelligent, automated intrusion technique: it runs without any intervention by the computer user, scans the network for node hosts with system vulnerabilities and attacks them, and propagates from one node to another through a local area network or the Internet.
Worms spread quickly, cover a wide area, and are highly destructive. A worm can consume most of the system resources of an infected host and damage the target system, and it can saturate network bandwidth, causing severe congestion or even paralysis of the whole network. How to detect network worms in real time, so that timely measures can be taken to prevent and suppress them, is therefore an important issue in network security research.
Disclosure of Invention
To address the defects in the prior art, the invention provides a random forest based transverse worm attack detection method that solves the problem of detecting network worms in real time.
To achieve this purpose, the invention adopts the following technical scheme: a random forest based transverse worm attack detection method, comprising the following steps:
S1, generating a random forest model;
S2, testing the random forest model;
S3, storing the trained random forest model, analyzing network traffic features in real time, loading the random forest model to predict the result, giving an alarm if the prediction is a transverse worm attack, and discarding the traffic data if the prediction is normal traffic.
Further: the specific steps of step S1 are:
S11, downloading MS-SQL Slammer worm attack network traffic data as negative samples, and using normal network traffic data from an industrial control environment as positive samples;
S12, extracting multi-dimensional features of the network traffic data from the positive and negative samples;
S13, training a random forest of 50 trees, the final classification result being the category that receives the most votes across all decision trees in the random forest.
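The voting rule in step S13 can be illustrated with a minimal sketch. The per-tree outputs below are hypothetical values invented for illustration, and the labels "worm" and "normal" are assumed names for the two categories; only the forest size of 50 comes from the text.

```python
from collections import Counter

N_TREES = 50  # forest size stated in step S13


def majority_vote(tree_predictions):
    """Return the class that receives the most votes among the trees."""
    votes = Counter(tree_predictions)
    label, _count = votes.most_common(1)[0]
    return label


# Hypothetical per-tree outputs for one connection record:
# 38 trees vote "worm" and 12 trees vote "normal".
example_votes = ["worm"] * 38 + ["normal"] * 12
assert len(example_votes) == N_TREES
print(majority_vote(example_votes))  # worm
```

With an even number of trees a 25:25 tie is possible; in this sketch `Counter.most_common` then returns the label that was seen first, which is a property of the sketch and not something the patent specifies.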
Further: the multi-dimensional features in step S12 include the traffic quintuple (source IP, source port, destination IP, destination port, protocol type) and connection counts relative to the current connection: the number of connections having the same source IP as the current connection, the number of connections having the same source IP and the same destination port as the current connection, and the number of connections whose destination IP is the source IP of the current connection.
Further: the multi-dimensional features are computed as statistics over the traffic connections within a 10 s window.
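As a rough sketch of how such windowed connection counts could be computed, the following assumes a hypothetical record layout (`Conn`) and hypothetical feature names; neither appears in the patent. The 10 s window comes from the text, and UDP port 1434 is the port the Slammer worm is known to target.

```python
from collections import namedtuple

# Hypothetical record layout: timestamp in seconds plus the traffic quintuple.
Conn = namedtuple("Conn", "ts src_ip src_port dst_ip dst_port proto")

WINDOW_SECONDS = 10  # statistics window stated in the text


def window_features(current, history):
    """Count earlier connections within the window that relate to `current`."""
    recent = [c for c in history if 0 <= current.ts - c.ts <= WINDOW_SECONDS]
    return {
        # connections with the same source IP as the current connection
        "same_src": sum(c.src_ip == current.src_ip for c in recent),
        # connections with the same source IP and the same destination port
        "same_src_dst_port": sum(
            c.src_ip == current.src_ip and c.dst_port == current.dst_port
            for c in recent
        ),
        # connections whose destination IP is the current source IP
        "dst_is_current_src": sum(c.dst_ip == current.src_ip for c in recent),
    }


history = [
    Conn(0.0, "10.0.0.5", 4000, "10.0.0.9", 1434, "udp"),   # older than 10 s
    Conn(3.0, "10.0.0.5", 4001, "10.0.0.10", 1434, "udp"),
    Conn(5.0, "10.0.0.5", 4002, "10.0.0.11", 1434, "udp"),
    Conn(6.0, "10.0.0.5", 4003, "10.0.0.12", 80, "tcp"),
    Conn(8.0, "10.0.0.7", 5000, "10.0.0.5", 502, "tcp"),
]
current = Conn(12.0, "10.0.0.5", 4004, "10.0.0.13", 1434, "udp")
features = window_features(current, history)
print(features)
```

A host that scans many destinations on one port within the window produces large `same_src` and `same_src_dst_port` counts, which is the kind of signal these statistics are meant to expose.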
Further: the classification result in the step S13 includes the transverse worm attack traffic data and the normal traffic data.
Further: the random forest model uses the Gini index as the criterion for splitting nodes.
Further: the formula for splitting a node is:
[Formula image: Gini index i(N), defined in terms of W_i and n_i]
Δi = i(N_L) - i(N_R)
In the above formulas, i(N) is the Gini index, N is the node before splitting, N_L is the left node after splitting, N_R is the right node after splitting, W_i is the class weight of the samples of each class, n_i is the number of samples of each class in the node, and Δi is the impurity reduction.
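To make the roles of W_i and n_i concrete, here is a minimal sketch of a class-weighted Gini index and the resulting impurity reduction. It uses the common CART form, in which the size-weighted impurities of the two child nodes are subtracted from the impurity of the parent; this is an assumption and may differ from the exact expression in the patent's formula image. The function names and the sample counts are hypothetical.

```python
def node_mass(counts, weights):
    """Total weighted sample mass of a node: sum of W_i * n_i over classes."""
    return sum(weights[c] * n for c, n in counts.items())


def weighted_gini(counts, weights):
    """Gini index 1 - sum_i p_i^2, with p_i proportional to W_i * n_i."""
    total = node_mass(counts, weights)
    if total == 0:
        return 0.0
    return 1.0 - sum((weights[c] * n / total) ** 2 for c, n in counts.items())


def impurity_reduction(parent, left, right, weights):
    """Parent impurity minus the mass-weighted impurity of both children."""
    total = node_mass(parent, weights)
    return (
        weighted_gini(parent, weights)
        - node_mass(left, weights) / total * weighted_gini(left, weights)
        - node_mass(right, weights) / total * weighted_gini(right, weights)
    )


weights = {"worm": 1.0, "normal": 1.0}      # assumed equal class weights
parent = {"worm": 50, "normal": 50}          # hypothetical sample counts
left = {"worm": 45, "normal": 5}
right = {"worm": 5, "normal": 45}
reduction = impurity_reduction(parent, left, right, weights)
print(round(reduction, 4))
```

In this example the parent node is evenly mixed (Gini index 0.5) and each child is 90% pure (Gini index 0.18), so the split removes most of the impurity; a split that left both children evenly mixed would give a reduction of zero.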
Further: the indexes tested in step S2 include the detection rate and the false alarm rate, calculated as follows:
[Formula image: detection rate]
[Formula image: false alarm rate]
The higher the detection rate and the lower the false alarm rate, the better the model.
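Since the two formula images are not available, the sketch below assumes the conventional definitions: the detection rate is the fraction of attack records that are flagged, and the false alarm rate is the fraction of normal records that are wrongly flagged. Here "detected" refers to the worm class, regardless of which class the training data labels positive or negative. The counts are hypothetical and are not results from the patent.

```python
def detection_rate(true_alarms, missed_attacks):
    """Fraction of attack records that the model flags as attacks."""
    return true_alarms / (true_alarms + missed_attacks)


def false_alarm_rate(false_alarms, correct_normals):
    """Fraction of normal records that the model wrongly flags as attacks."""
    return false_alarms / (false_alarms + correct_normals)


# Hypothetical confusion counts for illustration only.
dr = detection_rate(true_alarms=990, missed_attacks=10)
far = false_alarm_rate(false_alarms=5, correct_normals=995)
print(dr, far)
```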
The invention has the following beneficial effects: network data is collected from a mirror port, subjected to deep protocol parsing and data preprocessing to generate formatted data, and transmitted to a message queue; a pre-trained random forest classification model is then loaded to predict on the traffic data in real time, and an alarm is given if the prediction is a worm attack. The method can predict transverse worm attack traffic in an industrial control network in real time and provide a corresponding solution according to the alarm result.
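The queue-driven flow described above might look roughly like the following sketch. The in-process `queue.Queue` stands in for whatever message queue a deployment uses, and `toy_predict`, its threshold, and the record fields are hypothetical placeholders for the loaded random forest model and the formatted data.

```python
import queue


def run_detector(messages, predict):
    """Drain the queue: alarm on worm predictions, discard normal traffic."""
    alarms = []
    while True:
        try:
            record = messages.get_nowait()
        except queue.Empty:
            break
        if predict(record["features"]) == "worm":
            alarms.append(record["src_ip"])  # raise an alarm for this source
        # Records predicted as normal are discarded without further action.
    return alarms


def toy_predict(features):
    # Hypothetical stand-in for the loaded random forest model.
    return "worm" if features["same_src_dst_port"] >= 100 else "normal"


messages = queue.Queue()
messages.put({"src_ip": "10.0.0.5", "features": {"same_src_dst_port": 250}})
messages.put({"src_ip": "10.0.0.7", "features": {"same_src_dst_port": 2}})
alarms = run_detector(messages, toy_predict)
print(alarms)  # ['10.0.0.5']
```

Decoupling capture from prediction through a queue lets the parsing stage and the model stage run at different rates, which is one plausible reason for the design described in the text.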
Drawings
Fig. 1 is a working principle diagram of the present invention.
Detailed Description
The following description of specific embodiments is provided to help those skilled in the art understand the invention. It should be understood, however, that the invention is not limited to the scope of these embodiments: to those skilled in the art, various changes are apparent so long as they fall within the spirit and scope of the invention as defined in the appended claims, and everything created using the inventive concept is protected.
As shown in fig. 1, a method for detecting a lateral worm attack based on a random forest includes the following steps:
S1, generating a random forest model through a model training module;
(1) Training data source: MS-SQL Slammer worm attack network traffic data is downloaded as negative samples, and normal network traffic data from an industrial control environment is used as positive samples;
(2) Feature extraction: multi-dimensional features are extracted from the network traffic data, comprising the following:
1) the traffic quintuple: source IP, source port, destination IP, destination port, protocol type;
2) connection counts relative to the current connection: the number of connections having the same source IP as the current connection, the number of connections having the same source IP and the same destination port as the current connection, and the number of connections whose destination IP is the source IP of the current connection. Generally, these traffic features are computed as statistics over the connections within a 10 s window.
(3) Model training: a random forest of 50 trees is trained, and the final classification result is the category that receives the most votes across all decision trees in the random forest (there are two categories: transverse worm attack traffic data and normal traffic data);
Node splitting: the random forest model uses the Gini index as the criterion for splitting nodes, with the following formulas:
[Formula image: Gini index i(N), defined in terms of W_i and n_i]
Δi = i(N_L) - i(N_R)
In the above formulas, i(N) is the Gini index, N is the node before splitting, N_L is the left node after splitting, N_R is the right node after splitting, W_i is the class weight of the samples of each class, n_i is the number of samples of each class in the node, and Δi is the impurity reduction.
S2, testing the random forest model through a model verification module;
Model verification module: when the model is tested, two indexes are mainly used for evaluation, the detection rate and the false alarm rate:
[Formula image: detection rate]
[Formula image: false alarm rate]
The higher the detection rate and the lower the false alarm rate, the better the model.
S3, storing the trained random forest model through the model prediction module, analyzing network traffic characteristics in real time, loading the model to predict the result, giving an alarm if the predicted result belongs to a transverse worm attack, and discarding the traffic data if the predicted result is normal traffic.
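The store-then-load step in S3 can be sketched with Python's standard `pickle` module. The model here is a deliberately tiny stand-in (a list of threshold rules that vote), not a real random forest; the file name, feature names, and thresholds are all hypothetical.

```python
import os
import pickle
import tempfile
from collections import Counter

# Hypothetical stand-in for a trained forest: each "tree" is one threshold rule.
model = [
    {"feature": "same_src", "threshold": 50},
    {"feature": "same_src_dst_port", "threshold": 30},
    {"feature": "dst_is_current_src", "threshold": 20},
]


def save_model(trained, path):
    """Persist the trained model to disk."""
    with open(path, "wb") as fh:
        pickle.dump(trained, fh)


def load_model(path):
    """Load a previously stored model for prediction."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


def predict(loaded, features):
    """Majority vote of the rules: 'worm' if a count reaches its threshold."""
    votes = Counter(
        "worm" if features[rule["feature"]] >= rule["threshold"] else "normal"
        for rule in loaded
    )
    return votes.most_common(1)[0][0]


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "forest.pkl")  # hypothetical file name
    save_model(model, path)
    restored = load_model(path)

scan = {"same_src": 120, "same_src_dst_port": 110, "dst_is_current_src": 0}
quiet = {"same_src": 2, "same_src_dst_port": 1, "dst_is_current_src": 0}
print(predict(restored, scan), predict(restored, quiet))  # worm normal
```

Because unpickling can execute arbitrary code, a stored model file should only be loaded from a trusted location.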
The invention parses, cleans, extracts features from, and preprocesses captured data packets in real time, and identifies transverse worm attack behavior with a random forest model. The method has the following characteristics:
1) Feature extraction: the model can extract multi-dimensional features targeting worm attacks from the traffic data;
2) Feature processing: the model can handle high-dimensional data; features only need to be extracted, not selected (the model selects features randomly, according to feature importance), and accuracy can still be maintained if some features are missing;
3) Model training: training is fast, and because the decision trees in the random forest are independent of one another, parallelization is easy to implement;
4) Real-time alarming: traffic data can be predicted in real time, and the whole process from network traffic preprocessing to alarm completes within 10 s;
5) Accurate alarming: the method can accurately predict the transverse worm attack class and give a detailed solution.
The experimental cases on the MS-SQL Slammer dataset are as follows:
The experimental data is based on the MS-SQL Slammer worm data set provided by Robert Beverly, with a training set to test set ratio of 7:3.
Training set: normal data: 707390; worm attack data: 742448
Test set: normal data: 303168; worm attack data: 318192
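The reported set sizes can be checked against the 7:3 ratio with a few lines of integer arithmetic. The sketch assumes the split was applied separately to each class and that the training share was rounded down; both are assumptions, although they reproduce the reported counts exactly.

```python
def split_counts(total, train_parts=7, all_parts=10):
    """Split `total` records 7:3, rounding the training share down."""
    train = total * train_parts // all_parts
    return train, total - train


normal_total = 707390 + 303168  # all normal records reported above
worm_total = 742448 + 318192    # all worm attack records reported above

print(split_counts(normal_total))  # (707390, 303168)
print(split_counts(worm_total))    # (742448, 318192)
```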
1) Comparison of the random forest and Naive Bayes models
[Table image: comparison results for random forest and Naive Bayes]
2) Comparison of the random forest and SVM models
[Table image: comparison results for random forest and SVM]
Compared with the two other algorithms, the random forest achieves a higher detection rate and a lower false alarm rate in identifying transverse worm attacks, so it can be used as a method for identifying transverse worms in industrial control networks.

Claims (8)

1. A random forest based transverse worm attack detection method is characterized by comprising the following steps:
S1, generating a random forest model;
S2, testing the random forest model;
S3, storing the trained random forest model, analyzing network traffic features in real time, loading the random forest model to predict the result, giving an alarm if the prediction is a transverse worm attack, and discarding the traffic data if the prediction is normal traffic.
2. The method for detecting lateral worm attacks based on random forests as recited in claim 1, wherein the step S1 comprises the following steps:
S11, downloading MS-SQL Slammer worm attack network traffic data as negative samples, and using normal network traffic data from an industrial control environment as positive samples;
S12, extracting multi-dimensional features of the network traffic data from the positive and negative samples;
S13, training a random forest of 50 trees, the final classification result being the category that receives the most votes across all decision trees in the random forest.
3. The method as claimed in claim 2, wherein the multi-dimensional features in step S12 comprise the traffic quintuple (source IP, source port, destination IP, destination port, protocol type) and connection counts relative to the current connection: the number of connections having the same source IP as the current connection, the number of connections having the same source IP and the same destination port as the current connection, and the number of connections whose destination IP is the source IP of the current connection.
4. The method as claimed in claim 2, wherein the multi-dimensional features are computed as statistics over the traffic connections within a 10 s window.
5. The method for detecting lateral worm attacks based on random forest as claimed in claim 2, wherein the classification result in step S13 includes lateral worm attack traffic data and normal traffic data.
6. The method as claimed in claim 2, wherein the random forest model uses the Gini index as the criterion for splitting nodes.
7. The method of detecting a random forest-based lateral worm attack as recited in claim 6, wherein the formula of the segmentation nodes is:
[Formula image: Gini index i(N), defined in terms of W_i and n_i]
Δi = i(N_L) - i(N_R)
In the above formulas, i(N) is the Gini index, N is the node before splitting, N_L is the left node after splitting, N_R is the right node after splitting, W_i is the class weight of the samples of each class, n_i is the number of samples of each class in the node, and Δi is the impurity reduction.
8. The method for detecting transverse worm attacks based on random forests as recited in claim 1, wherein the indexes tested in step S2 comprise a detection rate and a false alarm rate, and the calculation formula is as follows:
[Formula image: detection rate]
[Formula image: false alarm rate]
The higher the detection rate and the lower the false alarm rate, the better the model.
CN202210409078.3A 2022-04-19 2022-04-19 Random forest based transverse worm attack detection method Pending CN115001739A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210409078.3A CN115001739A (en) 2022-04-19 2022-04-19 Random forest based transverse worm attack detection method

Publications (1)

Publication Number Publication Date
CN115001739A true CN115001739A (en) 2022-09-02

Family

ID=83023761

Country Status (1)

Country Link
CN (1) CN115001739A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110691073A (en) * 2019-09-19 2020-01-14 中国电子科技网络信息安全有限公司 Industrial control network brute force cracking flow detection method based on random forest
WO2020233259A1 (en) * 2019-07-12 2020-11-26 之江实验室 Multi-center mode random forest algorithm-based feature importance sorting system
CN112910918A (en) * 2021-02-26 2021-06-04 南方电网科学研究院有限责任公司 Industrial control network DDoS attack traffic detection method and device based on random forest
CN112953948A (en) * 2021-02-26 2021-06-11 南方电网科学研究院有限责任公司 Real-time network transverse worm attack flow detection method and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
祝鹏程;方勇;黄诚;刘强;: "基于TF-IDF和随机森林算法的Web攻击流量检测方法研究", 信息安全研究, no. 11, 5 November 2018 (2018-11-05) *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination