CN113422766A

CN113422766A - Network system security risk assessment method under DDoS attack

Info

Publication number: CN113422766A
Application number: CN202110680333.3A
Authority: CN
Inventors: 赵小林; 徐浩; 彭辉; 薛静锋; 单纯; 王琪瑶; 赵斌
Original assignee: Beijing Institute of Technology BIT
Current assignee: Beijing Institute of Technology BIT
Priority date: 2021-06-18
Filing date: 2021-06-18
Publication date: 2021-09-21
Anticipated expiration: 2041-06-18
Also published as: CN113422766B

Abstract

The invention provides a method for assessing the security risk of a network system under DDoS attack. The method optimizes DDoS attack detection, improves detection efficiency and accuracy, and calculates the security risk value of a network system under DDoS attack in a principled way. Exploiting the symmetric positive definite property of the processed covariance matrix, the method describes the feature space of the transformed covariance matrices as a symmetric positive definite manifold, uses the Riemannian mean on the manifold in the secure state as the security measurement baseline, and uses the Riemannian distance to that baseline as the real-time network security risk value. A behavioral risk accumulation coefficient based on attack identification and the Riemannian metric can be used to analyze changes in the risk value caused by persistent attacks. Combining attack identification with the Riemannian metric also improves the fault tolerance of the model and better represents changes in network behavior risk.

Description

Network system security risk assessment method under DDoS attack

Technical Field

The invention relates to the technical field of network security, in particular to a network system security risk assessment method under DDoS attack.

Background

DDoS attacks are among the most prevalent network attacks today. In a DDoS attack, the attacker aims to exhaust a server's resources by sending it a large number of legitimate requests or by occupying its resources for a long time, resulting in denial of service. Moreover, because the attack is launched from distributed clients, its source is difficult to trace. On the surface such attacks consist of legitimate network requests, which makes it harder to detect them, assess the risk, and take precautions. Attack detection and risk calculation for DDoS attacks are therefore essential.

Security risk assessment based on an assessment index system always involves considerable human intervention and is therefore highly subjective. For example, the DDoS effect assessment method based on a BP neural network (CN108900513B) evaluates the security level by establishing a DDoS attack effect evaluation index system and applying a BP neural network. The method considers both the attacker's perspective and the victim's perspective, but attacker-side data cannot be acquired in a real environment, and the evaluation grades are set largely by hand, so the method is strongly subjective.

Security risk assessment methods based on logical-reasoning attack graphs and attack-defense games involve no human factors, but they suffer from state explosion: the computational complexity grows exponentially with the number of nodes.

Therefore, existing risk assessment methods cannot achieve both high DDoS attack detection accuracy and high computational efficiency, and they have difficulty quantifying attack strength.

Disclosure of Invention

In view of this, the invention provides a network system security risk assessment method under DDoS attack, which can optimize a DDoS attack detection method, improve attack detection efficiency and accuracy, and simultaneously scientifically and effectively calculate a network system security risk value under DDoS attack.

In order to achieve the above object, the method for evaluating the security risk of the network system under DDoS attack of the present invention comprises the following steps:

acquiring flow characteristic data;

aiming at the flow characteristic data, performing characteristic selection by adopting an index screening algorithm based on a recursive characteristic elimination method, and selecting d indexes with highest importance to obtain the flow data represented by the d characteristics;

describing the flow characteristics of the flow data represented by the d characteristics within a period of time by using a covariance matrix to obtain a network traffic covariance matrix; wherein a new label is assigned to the security state represented by the network traffic covariance matrix, the new label being determined by voting, so that the label occurring most often among the traffic records is used as the label of the covariance matrix data;

carrying out DDoS attack identification on the network traffic covariance matrix using an extreme tree (ExtraTrees);

adding a positive perturbation to the diagonal of the network traffic covariance matrix, converting the network traffic covariance matrix into a symmetric positive definite matrix, and taking the symmetric positive definite manifold thus spanned as the feature space of network security features;

in the feature space of the network security features, using the Riemannian mean in the secure state as the security measurement baseline, and using the Riemannian distance to the security measurement baseline as the real-time network security risk value;

calculating a comprehensive risk value; wherein the comprehensive risk value at time t is $R_t = (1+\eta_t)\,\mathrm{Risk}_t$, where $\mathrm{Risk}_t$ is the real-time risk value at time t;

wherein $\mathrm{Risk}_i$ denotes the real-time risk value at time i, i = s, …, t-1, and $\eta_t$ denotes the risk accumulation coefficient, which starts from the detection of an attack at time s and continues until time t.

Before DDoS attack identification is carried out on the network traffic covariance matrix using the extreme tree (ExtraTrees), the covariance matrix is first reduced in dimension: exploiting its symmetry, the matrix is reduced from d × d dimensions to d(d+1)/2 dimensions.

Before feature selection is carried out by adopting an index screening algorithm based on a recursive feature elimination method, data cleaning and data dimensionless processing are carried out on the flow feature data.

Wherein, the data dimensionless processing method is mean value standardization.

The extreme tree (ExtraTrees) is used as follows:

assuming that the extreme tree consists of K decision trees and that the feature space F of the sample set X consists of n features, each decision tree is constructed using the sample set X through the following sub-steps:

step 41, randomly selecting m features to form a feature space, wherein m is less than n;

step 42, randomly selecting a feature to split on when splitting a node;

step 43, dividing the data set into different subsets according to the selected feature so as to obtain the best classification effect;

step 44, recursively repeating steps 42-43 K times to complete the construction of the decision tree;

and step 45, for an unknown sample, obtaining a classification label from each decision tree, and obtaining the final classification result from these labels by voting.

Wherein the Riemannian distance from a state point $x_i$ to the network security metric baseline is:

$$\mathrm{Risk}_i = \delta\left(x_i, \bar{X}\right)$$

where $\delta$ denotes the function computing the Riemannian geodesic distance on the SPD manifold, and $\bar{X}$ is the Riemannian mean representing the security metric baseline.

Beneficial effects:

To address the low identification accuracy and mediocre efficiency of DDoS attack detection, the invention provides a covariance-matrix-based method for transforming the network traffic feature space. The covariance matrix maps the basic features of network traffic into a higher-dimensional feature space, which reduces the incidental influence of any single traffic record, enriches the feature space, and represents the network security state by the overall characteristics of the traffic, so that DDoS attacks are identified with high accuracy.

To address the problem of quantitatively evaluating network security risk, the invention exploits the symmetric positive definite property of the processed covariance matrix to describe the feature space of the transformed covariance matrices as a symmetric positive definite manifold, which is a Riemannian manifold with a Lie group structure. The resulting quantitative evaluation of network security risk is accurate and computationally efficient.

To address the strong persistence of DDoS attacks, during which the security risk keeps increasing, the invention provides a behavioral risk accumulation coefficient based on attack identification and the Riemannian metric, which can be used to analyze the change in risk value caused by a persistent attack. Combining attack identification with the Riemannian metric also improves the fault tolerance of the model and better represents changes in network behavior risk.

Drawings

FIG. 1 is a schematic overall flow chart of the present invention.

Detailed Description

The invention is described in detail below by way of example with reference to the accompanying drawings.

The invention describes the network security state with a covariance matrix, which characterizes the security state of the network system objectively and accurately and thereby improves DDoS attack identification. Network security risk is calculated from Riemannian distances on the symmetric positive definite manifold, and an accumulated risk term accounts for the risk brought by a sustained DDoS attack, so that the security risk value of a network system under DDoS attack is calculated in a principled way. The invention thus optimizes DDoS attack detection, improves detection efficiency and accuracy across varied network attack and defense states, and calculates the network system security risk value under DDoS attack.

The overall flow of the invention is shown in FIG. 1 and comprises the following steps:

step 1, acquiring flow characteristic data, and then preprocessing the flow characteristic data to obtain processed data;

the specific way of acquiring the flow characteristic data is as follows:

collecting network traffic data: a packet capture tool (such as Wireshark) captures the traffic flowing through the system under evaluation, and the packets are parsed to obtain network traffic feature data;

the flow characteristic data is preprocessed in two steps, data cleaning and data dimensionless processing, as follows:

The first step is data cleaning, which mainly removes "dirty" data that may adversely affect data processing. Common types of dirty data and their handling are listed in Table 1.

TABLE 1 common dirty data types and processing modes

In addition, data cleaning can also perform a preliminary feature selection. If most values of a feature are illogical, the index is judged to have been collected incompletely and cannot be used for data analysis, so the feature is deleted. If the way the data was collected makes some feature directly reflect the data label, that feature column must also be deleted. For example, when attack data is collected and every attack targets one specific IP address, classifying directly by IP yields very high accuracy, but the result is in fact overfitted and does not generalize to ordinary data.

The second step is data dimensionless processing.

Because different features have different dimensions (units), the values of some features may vary over a large range while others vary over a small one, so that features with large variation dominate the whole analysis and degrade the accuracy of data processing. The influence of dimension must therefore be reduced as far as possible by non-dimensionalization. Common non-dimensionalization methods include min-max normalization, z-score normalization, mean normalization, and maximum-absolute-value normalization.

The invention selects mean normalization: taking each feature's mean as the reference, every value of the feature is expressed as a multiple of that mean, which unifies the dimensions. Mean normalization preserves the distribution and variation characteristics of each feature to the greatest extent while removing dimensions.
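As a minimal sketch of mean normalization, assuming the traffic records are rows of numeric features, each value is divided by the mean of its feature column. The function name `mean_normalize`, the handling of zero-mean features, and the sample values are illustrative assumptions, not part of the original disclosure.

```python
def mean_normalize(rows):
    """Express every value as a multiple of the mean of its feature (column)."""
    n = len(rows)
    d = len(rows[0])
    means = [sum(row[k] for row in rows) / n for k in range(d)]
    # Assumption: a feature whose mean is zero is left unscaled
    # to avoid a division by zero.
    return [[row[k] / means[k] if means[k] != 0 else row[k] for k in range(d)]
            for row in rows]


# Two features on very different scales (hypothetical values).
flows = [[1.0, 1000.0], [3.0, 3000.0]]
normalized = mean_normalize(flows)
```

After normalization both columns vary over the same relative range, so neither dominates later processing merely because of its unit.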

Step 2, performing feature selection on the processed data using an index screening algorithm based on recursive feature elimination, and selecting the d most important indexes to obtain flow data represented by the d features.

Specifically, the index screening algorithm based on recursive feature elimination repeatedly builds a classification model on different feature combinations: in each round, the feature with the smallest weight is selected and eliminated, the model is retrained on the remaining features, and the process iterates until the number of remaining features reaches the desired number. The algorithm places no requirement on the specific classifier used; tree classifiers such as a decision tree or a random forest can be used, as can classification models such as logistic regression or SVM.

Assume that the feature space F of the sample set X consists of n features and that the goal is to select m features for subsequent processing. First, a classifier is constructed with the n features in F as splitting features, the weight of each feature is calculated, and the feature with the smallest weight is deleted from F. Then the features remaining in the feature space are taken as the splitting features, and the minimum-weight feature is deleted iteratively until m features remain.
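The elimination loop described above can be sketched as follows. Because the text leaves the classifier open, this sketch assumes a least-squares linear model on standardized features as a stand-in and uses the absolute coefficient as the feature weight. The function name `rfe_select` and the toy data are hypothetical.

```python
import numpy as np


def rfe_select(X, y, m):
    """Return the column indices of the m features kept by recursive elimination."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    remaining = list(range(X.shape[1]))
    while len(remaining) > m:
        Xs = X[:, remaining]
        # Standardize so coefficient magnitudes are comparable across features.
        std = Xs.std(axis=0)
        std[std == 0] = 1.0
        Xs = (Xs - Xs.mean(axis=0)) / std
        # Stand-in classifier (assumption): linear least squares on the labels.
        coef, *_ = np.linalg.lstsq(Xs, y - y.mean(), rcond=None)
        # Delete the feature with the smallest weight, then refit on the rest.
        weakest = int(np.argmin(np.abs(coef)))
        del remaining[weakest]
    return remaining


# Feature 0 separates the classes; feature 1 is constant; feature 2 is unrelated.
X = [[0, 7, 1], [1, 7, 0], [0, 7, 0], [1, 7, 1],
     [4, 7, 1], [5, 7, 0], [4, 7, 0], [5, 7, 1]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
kept = rfe_select(X, y, 1)
```

Refitting after each deletion is what distinguishes recursive elimination from a one-shot ranking: the weights of the surviving features are recomputed without the deleted one.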

Step 3, describing the flow characteristics of the flow data characterized by the d features over a period of time by using a covariance matrix.

Because DDoS attacks are large in scale and long in duration, there is no need to know whether a single traffic record belongs to a DDoS attack; what matters is whether a DDoS attack is under way during a given period. Moreover, an automated DDoS attack usually produces traffic with certain characteristics, which may be determined by the relative relationships among the various traffic features. The invention therefore uses a covariance matrix to describe the traffic characteristics over a period of time, as follows:

For flow data characterized by d features, the d features form a feature space $X_d = \{x_1, x_2, \ldots, x_d\}$. Assume that n traffic records are used to construct one covariance matrix, and let $x_{ik}$ denote the value of feature $x_k$ in record i. The covariance between any two features can be calculated using the following equation:

$$\operatorname{cov}(x_j, x_k) = \frac{1}{n-1}\sum_{i=1}^{n}\left(x_{ij}-\bar{x}_j\right)\left(x_{ik}-\bar{x}_k\right), \qquad \bar{x}_k = \frac{1}{n}\sum_{i=1}^{n} x_{ik}$$

The covariance matrix of the set of traffic data is a d × d matrix whose diagonal holds the variance of each feature and whose entry in row j, column k is the covariance of features $x_j$ and $x_k$:

$$\Sigma = \begin{bmatrix} \operatorname{cov}(x_1,x_1) & \cdots & \operatorname{cov}(x_1,x_d) \\ \vdots & \ddots & \vdots \\ \operatorname{cov}(x_d,x_1) & \cdots & \operatorname{cov}(x_d,x_d) \end{bmatrix}$$

Clearly, the diagonal elements of the covariance matrix $\Sigma$ are the variances of the individual variables, reflecting the degree of dispersion of each variable, while the off-diagonal elements are the covariances between different variables, reflecting the overall error and correlation between any two variables.

For n records of d-dimensional data, the covariance matrix is a d × d matrix whose entries reflect the correlation between pairs of features, so the matrix captures the correlation structure among the indexes in that set of data. This shifts the focus from how each individual feature reflects an attack to how the correlations between features characterize the attack.
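A short sketch of building the covariance matrix for one time window, assuming the rows of the window are traffic records and the columns are the d selected features. It relies on `numpy.cov`, whose default normalization is n - 1; that normalization, the function name `window_covariance`, and the sample values are assumptions of this sketch.

```python
import numpy as np


def window_covariance(window):
    """Covariance matrix of one window: rows are records, columns are features."""
    window = np.asarray(window, dtype=float)
    # rowvar=False treats each column as one variable (feature).
    return np.cov(window, rowvar=False)


# Four records with three features; the second feature is twice the first.
window = [[1.0, 2.0, 0.0],
          [2.0, 4.0, 1.0],
          [3.0, 6.0, 0.0],
          [4.0, 8.0, 1.0]]
sigma = window_covariance(window)
```

The resulting 3 × 3 matrix is symmetric, with the variances on the diagonal and the pairwise covariances off the diagonal.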

A new label is assigned to the security state characterized by the generated covariance matrix. The invention determines the new label by voting, using the label that occurs most often among the traffic records as the label of the covariance matrix data:

$$y = \arg\max_{c} \sum_{i=1}^{k} I\left(y_i = c\right)$$

where k is the number of data records, $y_i$ is the label of the i-th record, and $I(\cdot)$ is the indicator function.

The purpose of this formula is to find the class c that receives the most votes and use it as the label y of the new data.
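The voting rule can be sketched in a few lines. The function name `window_label` and the label strings are hypothetical, and ties are broken in favor of the label seen first, which is an assumption of this sketch rather than something the text specifies.

```python
from collections import Counter


def window_label(flow_labels):
    """Return the label that occurs most often among the records of one window."""
    return Counter(flow_labels).most_common(1)[0][0]


label = window_label(["benign", "ddos", "ddos", "benign", "ddos"])
```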

The invention uses the covariance matrix to express the network security state, mapping the basic features of network traffic to a higher dimension. The covariance features among the basic indexes reflect the network security state as a whole and enrich the feature space, which improves the efficiency and accuracy of DDoS attack identification.

Step 4, DDoS attack identification is carried out, including dimension reduction and attack identification;

the dimensionality reduction is specifically:

the covariance matrix is a symmetric positive semidefinite matrix. Because the elements of a symmetric matrix are mirrored about the diagonal, the covariance matrix can be reduced from d × d dimensions to d(d+1)/2 dimensions by keeping only the diagonal and the elements on one side of it.
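A minimal sketch of this reduction, assuming it keeps the diagonal and the upper triangle of the symmetric matrix, which yields d(d+1)/2 values. The function name `vectorize_symmetric` and the example matrix are illustrative.

```python
import numpy as np


def vectorize_symmetric(sigma):
    """Flatten a symmetric d x d matrix into its d(d+1)/2 unique entries."""
    sigma = np.asarray(sigma, dtype=float)
    rows, cols = np.triu_indices(sigma.shape[0])  # diagonal and upper triangle
    return sigma[rows, cols]


sigma = np.array([[4.0, 1.0, 2.0],
                  [1.0, 5.0, 3.0],
                  [2.0, 3.0, 6.0]])
features = vectorize_symmetric(sigma)
```

For d = 3 the nine matrix entries collapse to six features, read row by row from the upper triangle.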

The attack recognition is specifically as follows:

DDoS attack identification is carried out using extremely randomized trees (ExtraTrees), referred to here as the extreme tree. ExtraTrees applies the idea of ensemble learning, combining multiple decision trees to improve on the identification accuracy of a single decision tree. The extreme tree is used as follows:

suppose the extreme tree consists of K decision trees and the feature space F of the sample set X consists of n features.

Each decision tree is constructed using the sample set X through the following sub-steps:

step 41, randomly selecting m features to form a feature space, wherein m is less than n;

step 42, randomly selecting a feature to split on when splitting a node;

step 43, dividing the data set into different subsets according to the selected feature so as to obtain the best classification effect;

step 44, recursively repeating steps 42-43 to complete the construction of the decision tree;

step 45, for an unknown sample, obtaining a classification label from each decision tree, and obtaining the final classification result from these labels by voting.
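The sub-steps above can be sketched in plain Python as a small ensemble of extremely randomized trees. This is a simplified illustration, not the patented implementation: it samples the m candidate features at every node, draws one random cut point per candidate, and scores splits with Gini impurity, all of which are assumptions of the sketch. All function names and the toy data are hypothetical.

```python
import random
from collections import Counter


def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def build_tree(rows, labels, m, rng):
    """Grow one tree; inner nodes are tuples, leaves are class labels."""
    if len(set(labels)) == 1:
        return labels[0]
    # Only features that still vary inside this node can split it.
    usable = [f for f in range(len(rows[0]))
              if min(r[f] for r in rows) < max(r[f] for r in rows)]
    if not usable:
        return Counter(labels).most_common(1)[0][0]
    best = None
    for f in rng.sample(usable, min(m, len(usable))):   # random feature subset
        lo = min(r[f] for r in rows)
        hi = max(r[f] for r in rows)
        thr = lo + (hi - lo) * rng.random()              # random cut point
        if thr >= hi:                                    # guard against round-off
            thr = lo
        left = [i for i, r in enumerate(rows) if r[f] <= thr]
        right = [i for i, r in enumerate(rows) if r[f] > thr]
        score = (len(left) * gini([labels[i] for i in left])
                 + len(right) * gini([labels[i] for i in right]))
        if best is None or score < best[0]:              # keep the best split
            best = (score, f, thr, left, right)
    _, f, thr, left, right = best
    # Recurse on both subsets until every leaf is pure.
    return (f, thr,
            build_tree([rows[i] for i in left], [labels[i] for i in left], m, rng),
            build_tree([rows[i] for i in right], [labels[i] for i in right], m, rng))


def predict_tree(node, row):
    while isinstance(node, tuple):
        f, thr, left, right = node
        node = left if row[f] <= thr else right
    return node


def fit_forest(rows, labels, n_trees, m, seed=0):
    rng = random.Random(seed)
    return [build_tree(rows, labels, m, rng) for _ in range(n_trees)]


def predict_forest(forest, row):
    # Majority vote over the labels returned by the individual trees.
    votes = Counter(predict_tree(tree, row) for tree in forest)
    return votes.most_common(1)[0][0]


rows = [[0.1, 5.0], [0.2, 4.0], [0.3, 6.0], [0.9, 5.5], [1.0, 4.5], [1.1, 6.5]]
labels = ["benign", "benign", "benign", "ddos", "ddos", "ddos"]
forest = fit_forest(rows, labels, n_trees=5, m=1)
predictions = [predict_forest(forest, r) for r in rows]
```

In practice the rows would be the reduced covariance features of each time window and the labels the voted window labels.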

Step 5, adding a positive perturbation ε to the diagonal of the network traffic covariance matrix, converting the covariance matrix into a symmetric positive definite matrix while keeping the influence on the data as small as possible, and taking the symmetric positive definite manifold thus spanned as the feature space of network security features.

Proof that the covariance matrix can be converted into a symmetric positive definite matrix by adding a positive perturbation:

Let there be n random variables $X_i$ ($1 \le i \le n$), with mathematical expectation $\mu_i = E(X_i)$ for each random variable.

The covariance matrix is then:

$$\Sigma = \left[c_{ij}\right]_{n \times n}, \qquad c_{ij} = E\left[(X_i-\mu_i)(X_j-\mu_j)\right]$$

A positive perturbation $\varepsilon > 0$ is now added to the diagonal elements of the covariance matrix, generating a new covariance matrix $\Sigma' = \Sigma + \varepsilon I$. The perturbation takes a small value so that the properties of the covariance matrix of the original data are affected as little as possible.

To prove that $\Sigma'$ is positive definite, it suffices to show that $y^T \Sigma' y > 0$ for any non-zero vector $y = [y_1, y_2, \ldots, y_n]^T$.

Consider first the product $y^T \Sigma$. The result of this calculation is a 1 × n matrix, whose k-th element is:

$$\left(y^T \Sigma\right)_k = \sum_{i=1}^{n} y_i\, c_{ik} = E\left[\sum_{i=1}^{n} y_i (X_i-\mu_i)(X_k-\mu_k)\right]$$

Thus

$$y^T \Sigma y = \sum_{k=1}^{n}\sum_{i=1}^{n} y_i\, y_k\, c_{ik} = E\left[\left(\sum_{i=1}^{n} y_i (X_i-\mu_i)\right)^2\right]$$

Let the random variable

$$Z = \sum_{i=1}^{n} y_i (X_i-\mu_i)$$

Then

$$y^T \Sigma y = E\left[Z^2\right] \ge 0$$

Clearly, $y^T \Sigma y$ is non-negative, and for any non-zero vector y the positive perturbation ε gives $y^T \Sigma' y = y^T \Sigma y + \varepsilon\, y^T y > 0$; that is, $\Sigma'$ is positive definite.
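A numerical check of this argument, as a sketch: a rank-deficient covariance matrix has a zero eigenvalue, and adding ε to the diagonal shifts every eigenvalue up by ε. The function name `make_spd` and the value of `eps` are illustrative assumptions; the text only requires ε to be small and positive.

```python
import numpy as np


def make_spd(sigma, eps=1e-6):
    """Add eps to the diagonal of a positive semidefinite matrix."""
    sigma = np.asarray(sigma, dtype=float)
    return sigma + eps * np.eye(sigma.shape[0])


# Singular covariance matrix: the second feature is exactly twice the first.
sigma = np.array([[1.0, 2.0],
                  [2.0, 4.0]])
sigma_spd = make_spd(sigma, eps=1e-3)

min_eig_before = np.linalg.eigvalsh(sigma).min()      # numerically zero
min_eig_after = np.linalg.eigvalsh(sigma_spd).min()   # shifted up by eps
```

Because the smallest eigenvalue becomes strictly positive, matrix logarithms and inverses used by the Riemannian distances are well defined on the perturbed matrix.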

The basis of this step is as follows:

The network traffic covariance matrix is symmetric and positive semidefinite, and it can be turned into a symmetric positive definite matrix by suitable processing. The covariance matrix is therefore converted into a symmetric positive definite matrix by applying a positive perturbation, and the symmetric positive definite manifold thus spanned is used as the feature space of network security features. The symmetric positive definite manifold is a Riemannian manifold with a Lie group structure formed by the d × d symmetric positive definite matrices, referred to as the SPD manifold. On the SPD manifold, the Riemannian distance between two points depends on the Riemannian metric chosen. There are four common ways of calculating the Riemannian distance:

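The affine-invariant metric is

$$\delta_R(X, Y) = \left\| \log\left(X^{-1/2}\, Y\, X^{-1/2}\right) \right\|_F$$

where $\|\cdot\|_F$ is the matrix norm (here the Frobenius norm);

the log-Euclidean metric is

$$\delta_L(X, Y) = \left\| \log X - \log Y \right\|_F$$

the Stein divergence is

$$\delta_S^2(X, Y) = \log\det\left(\frac{X+Y}{2}\right) - \frac{1}{2}\log\det(XY)$$

where $\det(\cdot)$ is the determinant of the matrix;

the Jeffrey divergence is

$$\delta_J^2(X, Y) = \frac{1}{2}\operatorname{Tr}\left(X^{-1}Y\right) + \frac{1}{2}\operatorname{Tr}\left(Y^{-1}X\right) - d$$

where $\operatorname{Tr}(\cdot)$ is the trace of the matrix;

and X and Y are two symmetric positive definite matrices.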
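The four measures can be sketched with NumPy as follows. The formulas used are the commonly cited forms of the affine-invariant metric, the log-Euclidean metric, the Stein divergence, and the Jeffrey divergence; treating them as the exact forms intended here is an assumption. The function names are hypothetical, and matrix functions are computed through an eigendecomposition, which is valid for symmetric positive definite inputs.

```python
import numpy as np


def _apply_spectral(X, fn):
    """Apply fn to the eigenvalues of a symmetric matrix X."""
    w, V = np.linalg.eigh(X)
    return (V * fn(w)) @ V.T


def airm_distance(X, Y):
    """Affine-invariant distance: || log(X^-1/2 Y X^-1/2) ||_F."""
    inv_root = _apply_spectral(X, lambda w: w ** -0.5)
    M = inv_root @ Y @ inv_root
    M = (M + M.T) / 2  # symmetrize against round-off
    return np.linalg.norm(_apply_spectral(M, np.log), "fro")


def log_euclidean_distance(X, Y):
    """Log-Euclidean distance: || log X - log Y ||_F."""
    return np.linalg.norm(_apply_spectral(X, np.log) - _apply_spectral(Y, np.log), "fro")


def stein_divergence(X, Y):
    """log det((X + Y) / 2) - 0.5 * log det(X Y)."""
    _, ld_mid = np.linalg.slogdet((X + Y) / 2)
    _, ld_x = np.linalg.slogdet(X)
    _, ld_y = np.linalg.slogdet(Y)
    return ld_mid - 0.5 * (ld_x + ld_y)


def jeffrey_divergence(X, Y):
    """0.5 * Tr(X^-1 Y) + 0.5 * Tr(Y^-1 X) - d."""
    d = X.shape[0]
    return (0.5 * np.trace(np.linalg.solve(X, Y))
            + 0.5 * np.trace(np.linalg.solve(Y, X)) - d)


X = np.eye(2)
Y = np.e * np.eye(2)
d_airm = airm_distance(X, Y)
d_log = log_euclidean_distance(X, Y)
d_stein = stein_divergence(X, Y)
d_jeffrey = jeffrey_divergence(X, Y)
```

The two metrics need matrix logarithms, while the two divergences need only determinants, traces, and linear solves, which makes them cheaper for large d.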

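Correspondingly, four Riemannian means on the symmetric positive definite manifold can be obtained, where m is the number of points $X_1, \ldots, X_m$ used to calculate the Riemannian mean.

For the affine-invariant metric, the Riemannian mean can be obtained by iterating the formula:

$$\bar{X}_{t+1} = \bar{X}_t^{1/2} \exp\left(\frac{1}{m}\sum_{i=1}^{m}\log\left(\bar{X}_t^{-1/2}\, X_i\, \bar{X}_t^{-1/2}\right)\right)\bar{X}_t^{1/2}$$

For the log-Euclidean metric, the Riemannian mean can be calculated by the formula:

$$\bar{X} = \exp\left(\frac{1}{m}\sum_{i=1}^{m}\log X_i\right)$$

For the Stein divergence, the Riemannian mean can be obtained by iterating the formula:

$$\bar{X}_{t+1} = \left[\frac{1}{m}\sum_{i=1}^{m}\left(\frac{X_i + \bar{X}_t}{2}\right)^{-1}\right]^{-1}$$

For the Jeffrey divergence, the Riemannian mean can be calculated by the formula:

$$\bar{X} = P^{-1/2}\left(P^{1/2}\, Q\, P^{1/2}\right)^{1/2} P^{-1/2}$$

where $P = \sum_{i=1}^{m} X_i^{-1}$ and $Q = \sum_{i=1}^{m} X_i$.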
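As a sketch of two of these means, the closed-form log-Euclidean mean and the iterative affine-invariant mean can be computed as below. The iteration is the standard fixed-point scheme for the affine-invariant mean; starting it from the arithmetic mean, the iteration cap, and the tolerance are assumptions of this sketch, as are the function names.

```python
import numpy as np


def _apply_spectral(X, fn):
    """Apply fn to the eigenvalues of a symmetric matrix X."""
    w, V = np.linalg.eigh(X)
    return (V * fn(w)) @ V.T


def log_euclidean_mean(mats):
    """exp of the arithmetic mean of the matrix logarithms."""
    logs = [_apply_spectral(X, np.log) for X in mats]
    return _apply_spectral(sum(logs) / len(logs), np.exp)


def affine_invariant_mean(mats, iters=50, tol=1e-12):
    """Fixed-point iteration for the affine-invariant Riemannian mean."""
    mean = sum(mats) / len(mats)  # assumption: start from the arithmetic mean
    for _ in range(iters):
        root = _apply_spectral(mean, np.sqrt)
        inv_root = _apply_spectral(mean, lambda w: w ** -0.5)
        # Average of the points mapped into the tangent space at the current mean.
        tangent = sum(_apply_spectral(inv_root @ X @ inv_root, np.log)
                      for X in mats) / len(mats)
        mean = root @ _apply_spectral(tangent, np.exp) @ root
        if np.linalg.norm(tangent) < tol:
            break
    return mean


mats = [np.diag([1.0, 4.0]), np.diag([4.0, 1.0])]
mean_log = log_euclidean_mean(mats)
mean_airm = affine_invariant_mean(mats)
```

For these two commuting matrices both means reduce to the element-wise geometric mean, diag(2, 2), whereas the arithmetic mean would be diag(2.5, 2.5).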

In the manifold space constructed from network states, network behavior changes the network state and thereby generates a behavior path. When a network attack occurs, it affects the network state; in the network state manifold space the state moves from a normal state to an abnormal one, and the differential-geometric representation of this change is a network behavior path. The magnitude of the behavioral risk produced by the attack can be calculated from the distance, in the network state manifold space, between the resulting state and the normal state.

The invention therefore exploits the symmetric positive definite property of the processed covariance matrix to describe the feature space of the transformed covariance matrices as a symmetric positive definite manifold, which is a Riemannian manifold with a Lie group structure. Each network state is a point on this manifold, and a transition from one network state to another is a homeomorphic mapping from point to point on the manifold. The sequence of state transitions caused by a sequence of behaviors forms a behavior path on the manifold; the distance between the state points at which the behavior starts and ends measures the effect of the network behavior, and its magnitude reflects the magnitude of the risk of that behavior. Riemannian distance on the symmetric positive definite manifold can thus be used to measure the distance between different network states, and since that distance reflects the magnitude of the network risk value, the Riemannian distance can be used to calculate network security risk.

Step 6, in the feature space of network security features obtained in step 5, using the Riemannian mean in the secure state as the security measurement baseline, and using the Riemannian distance to that baseline as the real-time network security risk value. For a state point $x_i$, the risk value is calculated as follows:

The Riemannian mean of the network traffic covariance matrices in the secure state is used as the baseline of the network security risk metric, and the distance between the covariance matrix of the state point and the baseline covariance matrix is taken as the network security risk value at that moment. The Riemannian distance from state point $x_i$ to the network security metric baseline is:

$$\mathrm{Risk}_i = \delta\left(x_i, \bar{X}\right)$$

where $\delta$ denotes the function computing the Riemannian geodesic distance on the SPD manifold, and $\bar{X}$ is the Riemannian mean representing the security metric baseline.
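A compact sketch of this step, assuming the log-Euclidean metric is chosen for both the baseline and the distance (any of the four measures could be substituted). The function names `security_baseline` and `risk_value` and the example matrices are hypothetical.

```python
import numpy as np


def _spd_log(X):
    """Matrix logarithm of a symmetric positive definite matrix."""
    w, V = np.linalg.eigh(X)
    return (V * np.log(w)) @ V.T


def security_baseline(safe_mats):
    """Log-domain Riemannian mean of the secure-state covariance matrices."""
    return sum(_spd_log(X) for X in safe_mats) / len(safe_mats)


def risk_value(state, baseline_log):
    """Log-Euclidean distance from a state point to the baseline."""
    return np.linalg.norm(_spd_log(state) - baseline_log, "fro")


# Secure-state windows (already perturbed to be positive definite).
safe = [np.diag([1.0, 4.0]), np.diag([4.0, 1.0])]
baseline_log = security_baseline(safe)

risk_normal = risk_value(np.diag([2.0, 2.0]), baseline_log)
risk_attack = risk_value(np.diag([2.0 * np.e, 2.0]), baseline_log)
```

Keeping the baseline in the log domain avoids recomputing its logarithm for every incoming window; a state on the baseline scores zero, and the score grows as the covariance structure drifts away from it.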

To calculate the real-time network security risk value, a network security measurement baseline must first be determined. Commonly used security measurement methods feed network security features into a data model that outputs a security risk value, and a network security reference value is then set manually from experience or from secure-state data; this reference value is in effect the baseline of the network security measurement. The invention provides a new way of representing the security measurement baseline, defining it as a point in the network state manifold space.

Step 7, considering that the security risk of a DDoS attack is cumulative and keeps accumulating while the attack continues, the invention uses a risk accumulation coefficient η to characterize this accumulation. The risk accumulation coefficient $\eta_t$ at time t is calculated from the real-time risk values $\mathrm{Risk}_i$ at times i = s, …, t-1, and represents the risk accumulated from the detection of an attack at time s until time t.

The comprehensive risk value at time t is $R_t = (1+\eta_t)\,\mathrm{Risk}_t$, where $\mathrm{Risk}_t$ is the real-time risk value at time t.
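A small worked example of the comprehensive risk formula. The risk accumulation coefficient is taken as an input here, and the numeric values are purely illustrative; the function name `comprehensive_risk` is hypothetical.

```python
def comprehensive_risk(risk_t, eta_t):
    """Comprehensive risk R_t = (1 + eta_t) * Risk_t."""
    return (1.0 + eta_t) * risk_t


# Same real-time risk, without and with accumulated risk from an ongoing attack.
r_no_accumulation = comprehensive_risk(2.0, 0.0)
r_with_accumulation = comprehensive_risk(2.0, 0.25)
```

With no accumulated risk the comprehensive value equals the real-time value; a positive coefficient scales it up in proportion to the risk accumulated since the attack was first detected.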

In summary, the above description is only a preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims

1. A network system security risk assessment method under DDoS attack is characterized by comprising the following steps:

acquiring flow characteristic data;

wherein

2. The method of claim 1, characterized in that, before DDoS attack identification is carried out on the network traffic covariance matrix using the extreme tree (ExtraTrees), the network traffic covariance matrix is first reduced in dimension, from d × d dimensions to d(d+1)/2 dimensions.

3. The method of claim 1 or 2, wherein the flow characteristic data is subjected to data cleaning and data dimensionless processing before feature selection using an index screening algorithm based on a recursive feature elimination method.

4. The method of claim 3, wherein the data dimensionless processing method is mean normalization.

5. The method of claim 1, 2 or 4, wherein the extreme tree (ExtraTrees) is used in the following way:

step 42, randomly selecting a feature to split on when splitting a node;

6. The method of claim 3, wherein the extreme tree (ExtraTrees) is used in the following way:

step 42, randomly selecting a feature to split on when splitting a node;

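7. The method according to claim 1, 2, 4 or 6, characterized in that the Riemannian distance from a state point $x_i$ to the network security metric baseline is:

$$\mathrm{Risk}_i = \delta\left(x_i, \bar{X}\right)$$

where $\delta$ denotes the function computing the Riemannian geodesic distance on the SPD manifold, and $\bar{X}$ is the Riemannian mean representing the security metric baseline.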

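8. The method of claim 5, wherein the Riemannian distance from a state point $x_i$ to the network security metric baseline is:

$$\mathrm{Risk}_i = \delta\left(x_i, \bar{X}\right)$$

where $\delta$ denotes the function computing the Riemannian geodesic distance on the SPD manifold, and $\bar{X}$ is the Riemannian mean representing the security metric baseline.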