CN115001739A

CN115001739A - Random forest based transverse worm attack detection method

Info

Publication number: CN115001739A
Application number: CN202210409078.3A
Authority: CN
Inventors: 杨瑞瑞; 徐砚; 李立
Original assignee: China Electronic Technology Cyber Security Co Ltd
Current assignee: China Electronic Technology Cyber Security Co Ltd
Priority date: 2022-04-19
Filing date: 2022-04-19
Publication date: 2022-09-02

Abstract

The invention discloses a random forest based transverse worm attack detection method comprising the following steps: S1, generating a random forest model; S2, testing the random forest model; S3, storing the trained random forest model, analyzing network traffic features in real time, loading the random forest model to predict the result, raising an alarm if the prediction is a transverse worm attack, and discarding the traffic data if the prediction is normal traffic. The method can predict transverse worm attack traffic in an industrial control network in real time and provide a corresponding solution according to the alarm result.

Description

Random forest based transverse worm attack detection method

Technical Field

The invention relates to the technical field of worm attack detection, in particular to a transverse worm attack detection method based on random forest.

Background

With the rapid development of computer network technology, Internet applications have become pervasive, and threats to computer system security and network security are increasingly serious. A worm is an intelligent, automated intrusion technique: it runs without intervention from the computer user, scans for and attacks hosts on the network that have system vulnerabilities, and propagates from one node to another through a local area network or the Internet.

Worms spread quickly, cover a wide area, and are highly destructive. They can occupy most of the system resources of an infected host and damage the target system, and they can seize network bandwidth, causing severe network congestion or even paralysis of the whole network. Therefore, how to detect network worms in real time, so that timely measures can be taken to prevent and suppress them, is an important issue in network security research.

Disclosure of Invention

To address the defects in the prior art, the invention provides a random forest based transverse worm attack detection method, which solves the problem of how to detect network worms in real time.

In order to achieve the purpose of the invention, the invention adopts the technical scheme that: a random forest based transverse worm attack detection method comprises the following steps:

S1, generating a random forest model;

S2, testing the random forest model;

S3, storing the trained random forest model, analyzing network traffic features in real time, loading the random forest model to predict the result, raising an alarm if the prediction is a transverse worm attack, and discarding the traffic data if the prediction is normal traffic.

Further: the specific steps of step S1 are:

S11, downloading MS-SQL Slammer worm attack network traffic data as negative samples, and using normal network traffic data from the industrial control environment as positive samples;

S12, extracting multi-dimensional features of the network traffic data from the positive samples and the negative samples;

S13, training a random forest of 50 trees, the final classification result being the class that receives the most votes across all decision trees in the random forest.
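
The voting rule of step S13 can be illustrated with a minimal sketch. It assumes that each of the 50 trees has already produced a label for one connection record; the label strings and the function name `majority_vote` are hypothetical and are not taken from the invention.

```python
from collections import Counter

WORM = "transverse_worm_attack"   # hypothetical label name
NORMAL = "normal_traffic"         # hypothetical label name


def majority_vote(tree_predictions):
    """Return the class that received the most votes from the trees."""
    if not tree_predictions:
        raise ValueError("at least one tree prediction is required")
    counts = Counter(tree_predictions)
    # most_common(1) returns [(label, votes)] for the top-voted class
    label, _votes = counts.most_common(1)[0]
    return label


# 50 trees: 32 vote for worm traffic, 18 vote for normal traffic
votes = [WORM] * 32 + [NORMAL] * 18
result = majority_vote(votes)
```

With an even number of trees a 25:25 tie is possible; this sketch resolves it in favour of the label encountered first, which is an arbitrary choice and not something the description specifies.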

Further: the multi-dimensional features in step S12 include the traffic quintuple (source ip, source port, destination ip, destination port, protocol type) and connection-count statistics, including the number of connections having the same source ip as the current connection, the number of connections having the same source ip and the same destination port as the current connection, and the number of connections whose destination ip is the source ip of the current connection.

Further: the multi-dimensional features are computed from traffic connection features within 10 s.

Further: the classification result in the step S13 includes the transverse worm attack traffic data and the normal traffic data.

Further: the random forest model uses the Gini index as the criterion for splitting nodes.

Further: the formula of the segmentation node is as follows:

$\Delta i = i(N) - i(N_L) - i(N_R)$

In the above formula, $i(N)$ is the Gini index, $N$ is the node before splitting, $N_L$ is the left node after splitting, $N_R$ is the right node after splitting, $W_i$ is the class weight of each class of samples, $n_i$ is the number of samples of each class in a node, and $\Delta i$ is the impurity reduction.

Further: the indexes tested in step S2 include the detection rate and the false alarm rate, which are calculated as follows:

The higher the detection rate and the lower the false alarm rate, the better the model performs.

The invention has the following beneficial effects: network data is collected from a mirror port, deep protocol parsing and data preprocessing are performed to generate formatted data, the formatted data is sent to a message queue, and a pre-trained random forest classification model is loaded to predict the traffic data in real time; an alarm is raised if the prediction is a worm attack. The method can predict transverse worm attack traffic in an industrial control network in real time and provide a corresponding solution according to the alarm result.
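
The data flow described above (formatted data placed on a message queue, then consumed by a prediction step that raises alarms) can be sketched as follows. This is only an illustration under stated assumptions: the packet dictionary layout, the in-process `queue.Queue` standing in for the message queue, and the threshold rule in `stub_model` are all hypothetical, and `stub_model` is not a trained random forest.

```python
import json
import queue


def preprocess(raw_packet):
    """Turn a parsed packet (hypothetical dict layout) into formatted data."""
    return json.dumps({"src_ip": raw_packet["src"],
                       "dst_port": raw_packet["dport"],
                       "same_src": raw_packet["same_src"]})


def stub_model(record):
    """Stand-in for the pre-trained model; the threshold is invented."""
    return "worm" if record["same_src"] > 100 else "normal"


def consume(message_queue, model):
    """Drain the queue, alarm on worm predictions, discard normal records."""
    alarms = []
    while True:
        try:
            message = message_queue.get_nowait()
        except queue.Empty:
            break
        record = json.loads(message)
        if model(record) == "worm":
            alarms.append(record["src_ip"])  # alarm: remember the source host
        # records predicted as normal are simply dropped
    return alarms


mq = queue.Queue()  # in-process stand-in for the message queue
for pkt in ({"src": "10.0.0.5", "dport": 1434, "same_src": 250},
            {"src": "10.0.0.8", "dport": 502, "same_src": 2}):
    mq.put(preprocess(pkt))
alarms = consume(mq, stub_model)
```

Decoupling capture from prediction through a queue lets the two stages run at different speeds; a real deployment would presumably use a networked message broker rather than an in-process queue.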

Drawings

Fig. 1 is a working principle diagram of the present invention.

Detailed Description

The following description of embodiments of the invention is provided to help those skilled in the art understand the invention. It should be understood, however, that the invention is not limited to the scope of these embodiments. To those skilled in the art, various changes are apparent as long as they fall within the spirit and scope of the invention as defined and determined by the appended claims, and everything created using the inventive concept is protected.

As shown in fig. 1, a method for detecting a lateral worm attack based on a random forest includes the following steps:

S1, generating a random forest model through a model training module;

(1) Training data source: MS-SQL Slammer worm attack network traffic data is downloaded as negative samples, and normal network traffic data from the industrial control environment is used as positive samples;

(2) feature extraction: extracting multi-dimensional features in the network flow data, wherein the multi-dimensional features comprise the following features:

1) flow quintuple: source ip, source port, destination ip, destination port, protocol type;

2) connection-count statistics, including the number of connections having the same source ip as the current connection, the number of connections having the same source ip and the same destination port as the current connection, and the number of connections whose destination ip is the source ip of the current connection. Generally, the multi-dimensional traffic features are computed from traffic connection features within 10 s.
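
The connection-count statistics can be sketched as a count over the connections seen in the preceding 10 s. The record layout `Conn`, the function name `window_features`, the feature names, and the sample addresses are assumptions made for illustration only; the sketch covers three counts (same source ip, same source ip and destination port, and connections targeting the current source ip).

```python
from collections import namedtuple

# Hypothetical record layout: timestamp in seconds plus the traffic quintuple
Conn = namedtuple("Conn", "ts src_ip src_port dst_ip dst_port proto")

WINDOW_SECONDS = 10  # statistics window stated in the description


def window_features(current, history):
    """Count connections related to `current` within the preceding 10 s."""
    recent = [c for c in history
              if 0 <= current.ts - c.ts <= WINDOW_SECONDS]
    same_src = sum(1 for c in recent if c.src_ip == current.src_ip)
    same_src_dport = sum(1 for c in recent
                         if c.src_ip == current.src_ip
                         and c.dst_port == current.dst_port)
    # connections that targeted the current connection's source host
    src_as_dst = sum(1 for c in recent if c.dst_ip == current.src_ip)
    return {"same_src": same_src,
            "same_src_dport": same_src_dport,
            "src_as_dst": src_as_dst}


history = [
    Conn(0.0, "10.0.0.5", 4000, "10.0.0.9", 1434, "udp"),   # outside window
    Conn(12.0, "10.0.0.5", 4001, "10.0.0.7", 1434, "udp"),
    Conn(13.0, "10.0.0.5", 4002, "10.0.0.8", 1434, "udp"),
    Conn(14.0, "10.0.0.2", 5000, "10.0.0.5", 1434, "udp"),
    Conn(15.0, "10.0.0.5", 4003, "10.0.0.6", 80, "tcp"),
]
current = Conn(16.0, "10.0.0.5", 4004, "10.0.0.3", 1434, "udp")
features = window_features(current, history)
```

A host that suddenly opens many connections to the same destination port within the window produces large `same_src` and `same_src_dport` values, which is the kind of scanning pattern these counts are meant to expose.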

(3) Model training: 50 trees are trained in the random forest, and the final classification result is the class that receives the most votes across all decision trees in the random forest (there are two classes: transverse worm attack traffic data and normal traffic data);

Node splitting: the random forest model uses the Gini index (gini) as the criterion for splitting nodes, with the following formula:

$\Delta i = i(N) - i(N_L) - i(N_R)$

In the above formula, $i(N)$ is the Gini index, $N$ is the node before splitting, $N_L$ is the left node after splitting, $N_R$ is the right node after splitting, $W_i$ is the class weight of each class of samples, $n_i$ is the number of samples of each class in a node, and $\Delta i$ is the impurity reduction.
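
The split criterion can be made concrete with a short sketch that computes a Gini index from per-class sample counts and the impurity reduction of a candidate split. It follows the standard CART convention, in which each child node's impurity is weighted by its share of the parent's samples; that weighting, and the omission of class weights, are assumptions of the sketch, and the exact formula of the invention may differ.

```python
def gini(counts):
    """Gini index of a node given its per-class sample counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return 1.0 - sum((n / total) ** 2 for n in counts)


def impurity_reduction(parent, left, right):
    """Impurity reduction of splitting `parent` into `left` and `right`.

    Children are weighted by their share of the parent's samples
    (standard CART convention; an assumption of this sketch).
    """
    n = sum(parent)
    n_left, n_right = sum(left), sum(right)
    return (gini(parent)
            - (n_left / n) * gini(left)
            - (n_right / n) * gini(right))


# Counts are [normal, worm]. A split that separates the classes perfectly:
parent = [50, 50]
perfect = impurity_reduction(parent, [50, 0], [0, 50])
# A split that leaves both children as mixed as the parent:
useless = impurity_reduction(parent, [25, 25], [25, 25])
```

A tree-building procedure would evaluate this quantity for each candidate split and keep the split with the largest reduction.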

S2, testing the random forest model through the model verification module;

Model verification module: when testing the model, two indexes are mainly used for evaluation, the detection rate and the false alarm rate:

The higher the detection rate and the lower the false alarm rate, the better the model performs.
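
A minimal sketch of the two indexes follows. It assumes the commonly used definitions, namely detection rate as the share of attack records predicted as attack and false alarm rate as the share of normal records predicted as attack; the label encoding and function names are hypothetical, and the exact formulas of the invention may differ.

```python
ATTACK, NORMAL = 1, 0  # hypothetical label encoding


def confusion_counts(y_true, y_pred):
    """Return (tp, fn, fp, tn) with ATTACK treated as the detected class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == ATTACK and p == ATTACK)
    fn = sum(1 for t, p in pairs if t == ATTACK and p == NORMAL)
    fp = sum(1 for t, p in pairs if t == NORMAL and p == ATTACK)
    tn = sum(1 for t, p in pairs if t == NORMAL and p == NORMAL)
    return tp, fn, fp, tn


def detection_rate(y_true, y_pred):
    """Share of attack records that were predicted as attack."""
    tp, fn, _fp, _tn = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if (tp + fn) else 0.0


def false_alarm_rate(y_true, y_pred):
    """Share of normal records that were predicted as attack."""
    _tp, _fn, fp, tn = confusion_counts(y_true, y_pred)
    return fp / (fp + tn) if (fp + tn) else 0.0


y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
dr = detection_rate(y_true, y_pred)     # 3 of 4 attacks detected
far = false_alarm_rate(y_true, y_pred)  # 1 of 6 normal records flagged
```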

S3, storing the trained random forest model through the model prediction module, analyzing network traffic characteristics in real time, loading the model to predict the result, giving an alarm if the predicted result belongs to a transverse worm attack, and discarding the traffic data if the predicted result is normal traffic.

The invention parses, cleans, extracts features from, and preprocesses captured data packets in real time, and identifies transverse worm attack behavior with a random forest model. The method has the following characteristics:

1) feature extraction: the model can extract multi-dimensional features aiming at worm attack from the traffic data;

2) feature processing: the model can handle high-dimensional data; features only need to be extracted and do not need to be selected (the model selects features randomly according to feature importance), and accuracy can still be maintained if some of the features are missing;

3) model training: the model training speed is fast, and because the decision trees in the random forest are independent from one another, parallelization is easy to realize;

4) real-time alarming: traffic data can be predicted in real time, and the process from network traffic preprocessing to alarm is completed within 10 s;

5) accurate alarming: the method can accurately predict the transverse worm attack class and give a detailed solution.

The experimental cases on the MS-SQL Slammer dataset are as follows:

The experimental data is based on the MS-SQL Slammer worm data set provided by Robert Beverly, and the ratio of training set to test set is 7:3.

Training set: normal data: 707390; worm attack data: 742448

Test set: normal data: 303168; worm attack data: 318192
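
The stated 7:3 split can be checked arithmetically from the reported record counts alone. The variable and function names below are chosen for illustration.

```python
train = {"normal": 707390, "worm": 742448}
test = {"normal": 303168, "worm": 318192}


def train_share(train_count, test_count):
    """Fraction of a class's records that fall in the training set."""
    return train_count / (train_count + test_count)


# Per-class share of records assigned to the training set
shares = {cls: train_share(train[cls], test[cls]) for cls in train}
total_train = sum(train.values())
total_test = sum(test.values())
```

Both classes come out at a training share of about 0.7, so the split appears to have been applied per class, which keeps the class balance the same in the training and test sets.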

1) Random forest and Naive Bayes algorithm model comparison

2) Random forest and SVM algorithm model comparison

Compared with the two other algorithms, the random forest achieves a higher detection rate and a lower false alarm rate in identifying transverse worm attacks, and can therefore be used as a method for identifying transverse worms in industrial control networks.

Claims

1. A random forest based transverse worm attack detection method is characterized by comprising the following steps:

S1, generating a random forest model;

S2, testing the random forest model;

2. The method for detecting lateral worm attacks based on random forests as recited in claim 1, wherein the step S1 comprises the following steps:

S11, downloading MS-SQL Slammer worm attack network traffic data as negative samples, and using normal network traffic data from the industrial control environment as positive samples;

3. The method as claimed in claim 2, wherein the multi-dimensional features in step S12 comprise the traffic quintuple (source ip, source port, destination ip, destination port, protocol type) and connection-count statistics, including the number of connections having the same source ip as the current connection, the number of connections having the same source ip and the same destination port as the current connection, and the number of connections whose destination ip is the source ip of the current connection.

4. The method as claimed in claim 2, wherein the multi-dimensional features are computed from traffic connection features within 10 s.

5. The method for detecting lateral worm attacks based on random forest as claimed in claim 2, wherein the classification result in step S13 includes lateral worm attack traffic data and normal traffic data.

6. The method as claimed in claim 2, wherein the random forest model uses the Gini index as the criterion for splitting nodes.

7. The method of detecting a random forest-based lateral worm attack as recited in claim 6, wherein the formula of the segmentation nodes is:

$\Delta i = i(N) - i(N_L) - i(N_R)$

8. The method for detecting transverse worm attacks based on random forests as recited in claim 1, wherein the indexes tested in step S2 comprise a detection rate and a false alarm rate, and the calculation formula is as follows:

the higher the detection rate and the lower the false alarm rate, the better the model effect.