CN109951468B

CN109951468B - Network attack detection method and system based on F value optimization

Info

Publication number: CN109951468B
Application number: CN201910183415.XA
Authority: CN
Inventors: 高跃; 王楠; 赵曦滨; 万海
Original assignee: Tsinghua University
Current assignee: Tsinghua University
Priority date: 2019-03-12
Filing date: 2019-03-12
Publication date: 2020-08-28
Anticipated expiration: 2039-03-12
Also published as: CN109951468A

Abstract

The application discloses a network attack detection method and system based on F value optimization. The method comprises the following steps: step 1, calculating the misclassification cost values corresponding to received network data according to an F value calculation model and generating a cost value matrix, wherein the network data comprises labeled network data and unlabeled network data; step 2, constructing a hypergraph corresponding to the network data; step 3, calculating predicted class labels for the network data according to the cost value matrix and the hypergraph; and step 4, detecting the unlabeled network data according to the predicted class labels. By using the F value evaluation index to optimize the misclassification cost values, the technical scheme improves the detection rate on imbalanced data and the accuracy and reliability of network anomaly detection.

Description

Network attack detection method and system based on F value optimization

Technical Field

The application relates to the technical field of network anomaly detection, and in particular to a network attack detection method and a network attack detection system based on F value optimization.

Background

With the rapid development of network technology, network attacks occur frequently, and as data traffic keeps growing it becomes increasingly important to detect the abnormal traffic it contains efficiently and accurately. Network traffic follows numerous protocol types and contains many different types of data, and these data are severely imbalanced. It is therefore essential to balance the detection rate and the accuracy on imbalanced data, to improve the system's detection rate for the different types of abnormal network data, and to detect abnormal data efficiently and accurately. Current anomaly detection methods mainly aim to improve detection accuracy rather than to reduce the overall cost of detection.

In the prior art, the main challenges of network traffic anomaly detection are as follows:

1) different types of data in the traffic are severely imbalanced, which makes it difficult to improve the detection rate for all types of data at the same time;

2) it is difficult to establish high-order associations between flows and to mine the complex associations among the data.

Disclosure of Invention

The purpose of this application is to optimize the misclassification cost values using the F value, an evaluation index that performs better than accuracy on imbalanced data, in place of accuracy. Using the F value in this way improves the detection rate on imbalanced data as far as possible and improves the accuracy and reliability of network anomaly detection.

The technical scheme of the first aspect of the application provides a network attack detection method based on F value optimization, comprising the following steps: step 1, calculating the misclassification cost values corresponding to received network data according to an F value calculation model and generating a cost value matrix, wherein the network data comprises labeled network data and unlabeled network data, and the misclassification cost value is calculated as follows:

where F_β is the binary-classification F value formula, mcF_β is the multi-class F value formula, r is a distribution parameter, and β is an adjustment parameter;

step 2, constructing a hypergraph corresponding to the network data; step 3, calculating predicted class labels for the network data according to the cost value matrix and the hypergraph; and step 4, detecting the unlabeled network data according to the predicted class labels.

In any of the above technical solutions, the distribution parameter r is further taken sequentially from the uniformly spaced values [0.2, 0.4, 0.6, 0.8], and the adjustment parameter β is 1.

In any of the above technical solutions, step 3 further specifically comprises: step 31, performing a Laplacian regularization transformation on the hypergraph to generate a type matrix; step 32, constructing a cost-sensitive hypergraph learning model from the type matrix and the cost value matrix; and step 33, calculating the predicted class label matrix F with the cost-sensitive hypergraph learning model.

In any of the above technical solutions, step 4 further specifically comprises: step 41, evaluating the predicted class labels on the labeled network data to generate detection scores; step 42, selecting the predicted class labels with the highest detection score and recording them as the abnormal data detection model; and step 43, detecting the unlabeled network data with the abnormal data detection model.

The technical scheme of the second aspect of the application provides a network attack detection system based on F value optimization, comprising a generating unit, a constructing unit, a calculating unit and a detecting unit. The generating unit is configured to calculate, according to the F value calculation model, the misclassification cost values corresponding to received network data and to generate a cost value matrix, wherein the network data comprises labeled network data and unlabeled network data, and the misclassification cost value is calculated as follows:

the constructing unit is configured to construct a hypergraph corresponding to the network data; the calculating unit is configured to calculate predicted class labels for the network data according to the cost value matrix and the hypergraph; and the detecting unit is configured to detect the unlabeled network data according to the predicted class labels.

In any of the above technical solutions, the calculating unit further specifically comprises a generating module, a constructing module and a calculating module: the generating module is configured to perform a Laplacian regularization transformation on the hypergraph to generate a type matrix; the constructing module is configured to construct a cost-sensitive hypergraph learning model from the type matrix and the cost value matrix; and the calculating module is configured to calculate the predicted class label matrix F with the cost-sensitive hypergraph learning model.

In any of the above technical solutions, the detecting unit further specifically comprises a scoring module, a marking module and a detection module: the scoring module is configured to evaluate the predicted class labels on the labeled network data to generate detection scores; the marking module is configured to select the predicted class labels with the highest detection score and record them as the abnormal data detection model; and the detection module is configured to detect the unlabeled network data with the abnormal data detection model.

The beneficial effects of this application are as follows. The misclassification cost values of the received network data are calculated with the F value model and optimized with the F value evaluation index, which avoids imbalanced detection rates across different types of network data and optimizes the detection rate on imbalanced data. In addition, constructing a hypergraph over the received network data captures the associations among the data and improves the accuracy of the predicted class labels; detecting the network data according to these labels improves the accuracy and reliability of abnormal network data detection.

Drawings

The advantages of the above and/or additional aspects of the present application will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:

FIG. 1 is a schematic flow chart of a network attack detection method based on F value optimization according to an embodiment of the present application;

FIG. 2 is a schematic block diagram of a network attack detection system based on F value optimization according to an embodiment of the present application.

Detailed Description

In order that the above objects, features and advantages of the present application can be more clearly understood, the present application will be described in further detail with reference to the accompanying drawings and detailed description. It should be noted that the embodiments and features of the embodiments of the present application may be combined with each other without conflict.

In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present application, however, the present application may be practiced in other ways than those described herein, and therefore the scope of the present application is not limited by the specific embodiments disclosed below.

The first embodiment is as follows:

As shown in FIG. 1, the present embodiment provides a network attack detection method based on F value optimization, which comprises:

step 1, calculating the misclassification cost values corresponding to received network data according to an F value calculation model and generating a cost value matrix, wherein the network data comprises labeled network data and unlabeled network data, and the misclassification cost value is calculated as follows:

for binary classification,

for multi-class classification,

where F_β is the cost formula for binary classification, mcF_β is the cost formula for multi-class classification, β is the adjustment parameter, and r is a distribution parameter taken in turn from [0.2, 0.4, 0.6, 0.8].

For binary-class network data, when the received network data belongs to the first class (class1), the corresponding misclassification cost value is calculated as:

when the received network data belongs to the second class (class2), the corresponding misclassification cost value is calculated as:

For multi-class network data, the corresponding calculation formula is selected according to the class of the received network data; this is not described in detail here.

Preferably, the distribution parameter r takes, in turn, the uniformly spaced values 0.2, 0.4, 0.6 and 0.8, and the adjustment parameter β is 1.

Specifically, during network data detection, any item of network data may belong to a binary or a multi-class classification problem; accordingly, the F value calculation model includes both a binary F value formula and a multi-class F value formula.

Let P_k be the marginal probability of class-k network data, FP_k(h) the probability that data of other classes is misclassified into class k, and FN_k(h) the probability that class-k data is misclassified into other classes. The corresponding error profile e(h) can then be expressed as:

e(h) = (FN₁(h), FP₁(h), …, FN_k(h), FP_k(h), …, FN_L(h), FP_L(h)),

that is, the (2k−1)-th entry e_{2k−1} of the error profile e(h) is FN_k(h) and the 2k-th entry e_{2k} is FP_k(h), where h is the classifier and L is the number of classes.
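As a minimal sketch (assuming integer class labels and empirical frequencies in place of probabilities), the error profile e(h) can be estimated from a labeled sample, and the binary F_β can be written in terms of it. The binary expression used below, F_β = (1+β²)(P₁ − e₁) / ((1+β²)P₁ − e₁ + e₂), follows from substituting TP = P₁ − FN₁ into the usual F_β definition; it is an assumption that this matches the omitted binary formula. The function names are hypothetical.

```python
from collections import Counter


def error_profile(y_true, y_pred, classes):
    """Estimate e(h) = (FN_1, FP_1, ..., FN_L, FP_L) from a labeled sample.

    FN_k: share of all samples whose true class is k but were predicted otherwise.
    FP_k: share of all samples predicted as k whose true class is different.
    """
    n = len(y_true)
    fn, fp = Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t != p:
            fn[t] += 1  # a class-t sample was missed
            fp[p] += 1  # a sample was wrongly assigned to class p
    profile = []
    for k in classes:  # entry 2k-1 is FN_k, entry 2k is FP_k
        profile.extend([fn[k] / n, fp[k] / n])
    return profile


def f_beta_binary(p1, e, beta=1.0):
    """Binary F_beta from the error profile: e[0] = FN_1 (e_1), e[1] = FP_1 (e_2)."""
    b2 = beta ** 2
    numerator = (1 + b2) * (p1 - e[0])
    denominator = (1 + b2) * p1 - e[0] + e[1]
    return numerator / denominator if denominator > 0 else 0.0


# Example: class 1 is the class of interest, class 0 the other class.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
e = error_profile(y_true, y_pred, classes=[1, 0])
p1 = y_true.count(1) / len(y_true)
f1 = f_beta_binary(p1, e)  # TP=2, FP=1, FN=1, so F1 = 2/3
```

The closed form agrees with the usual precision/recall definition: here precision and recall are both 2/3, so F1 is 2/3.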

When the network data is binary data, the corresponding F value calculation formula is as follows:

when the network data is the multi-classification data, the corresponding F value calculation formula is as follows:

Depending on whether the network data is binary or multi-class, the corresponding formulas are selected to obtain the F value and the misclassification cost value of the network data. The F value takes values in [0, 1], so a series of uniformly spaced distribution parameters r_i is defined over this range, for example [0.2, 0.4, 0.6, 0.8]. For each distribution parameter r_i, the corresponding misclassification cost values are calculated with the misclassification cost formula, and a cost value matrix γ ∈ R^{n×n} is generated, where γ is a diagonal matrix and n is the total number of network data items.
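The cost formulas themselves are not reproduced in this text. One way to derive binary costs, assumed here rather than taken from the patent, uses the fact that for the binary F_β expression F_β = (1+β²)(P₁ − e₁) / ((1+β²)P₁ − e₁ + e₂), F_β(e) ≥ r holds exactly when (1+β²−r)·e₁ + r·e₂ ≤ (1+β²)(1−r)·P₁. A missed class1 sample then costs 1+β²−r and a class2 sample wrongly assigned to class1 costs r. The sketch builds one diagonal γ per r_i on that basis; the label encoding (1, 2, None) and `unlabeled_cost` are hypothetical choices.

```python
R_GRID = [0.2, 0.4, 0.6, 0.8]  # uniformly spaced distribution parameters r_i


def binary_costs(r, beta=1.0):
    """Assumed costs: (missed class1 sample, class2 sample assigned to class1)."""
    return 1 + beta ** 2 - r, r


def cost_matrix(labels, r, beta=1.0, unlabeled_cost=0.0):
    """Diagonal n x n cost matrix gamma for one distribution parameter r.

    labels: 1 for class1, 2 for class2, None for unlabeled (hypothetical encoding).
    """
    cost_class1, cost_class2 = binary_costs(r, beta)
    diag = [
        cost_class1 if y == 1 else cost_class2 if y == 2 else unlabeled_cost
        for y in labels
    ]
    n = len(labels)
    return [[diag[i] if i == j else 0.0 for j in range(n)] for i in range(n)]


labels = [1, 2, 2, None]
gammas = {r: cost_matrix(labels, r) for r in R_GRID}  # one gamma per r_i
```

With β = 1 and r = 0.2, class1 samples are weighted 1.8 and class2 samples 0.2, so a small r strongly favours recall on class1; sweeping r over the grid trades this off.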

For binary-class network data, the cost value matrix γ generated for the distribution parameter r_i is:

For multi-class network data, the cost value matrix γ generated for the distribution parameter r_i is:

step 2, constructing a hypergraph corresponding to the network data according to the network data;

Specifically, the hypergraph is constructed with a star expansion method and described as G = (V, E, W): each item of received network data is a vertex in V, the connection relation associated with each item forms a hyperedge in E, and the weight of each hyperedge is given by W. The connectivity of the hypergraph is described by an incidence matrix H, calculated as:

where v_central is the central vertex of hyperedge e_p; the average of the distances between vertices in the hypergraph also appears in the formula; d(v_i, v_central) is the distance between vertex v_i on hyperedge e_p and the central vertex v_central; and a is an adjustment parameter, set to a = 0.05 in this embodiment.
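A sketch of star-expansion construction, assuming each vertex serves as the central vertex of one hyperedge that also contains its k nearest neighbours, and assuming a Gaussian-style entry exp(−d(v_i, v_central)² / (a·d_mean)²), where d_mean is the mean pairwise distance. The exact H formula is not reproduced in this text, so both the kernel form and the neighbour count `k` are assumptions; only the ingredients (v_central, the mean distance, d(v_i, v_central), a) come from the description. Requires NumPy.

```python
import numpy as np


def build_incidence(X, k=3, a=0.05):
    """Star-expansion incidence matrix H (n vertices x n hyperedges).

    Hyperedge e_p is centred on vertex v_p and contains v_p plus its k nearest
    neighbours. Assumed entry: exp(-d(v_i, v_central)^2 / (a * d_mean)^2).
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # Pairwise Euclidean distances between all vertices.
    dist = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    # Average distance between distinct vertices in the hypergraph.
    d_mean = dist[~np.eye(n, dtype=bool)].mean()
    H = np.zeros((n, n))
    for p in range(n):
        members = np.argsort(dist[p])[: k + 1]  # v_p (distance 0) and k neighbours
        H[members, p] = np.exp(-dist[members, p] ** 2 / (a * d_mean) ** 2)
    return H


X = [[0, 0], [1, 0], [0, 1], [5, 5], [6, 5]]
H = build_incidence(X, k=2, a=1.0)
```

Under this assumed kernel, a = 0.05 as in the embodiment shrinks neighbour entries sharply unless neighbours are very close to the centre, so `a` and `k` would likely need tuning together.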

Step 3, calculating a prediction class mark corresponding to the network data according to the cost value matrix and the hypergraph;

Step 3 specifically comprises the following steps:

step 31, performing a Laplacian regularization transformation on the hypergraph to generate a type matrix F(v_i, m);

Specifically, the Laplacian regularization transformation is applied to the hypergraph using the following formulas:

δ(e) = ∑_{v∈V} h(v, e),

d(v) = ∑_{e∈E} w(e) h(v, e),

where w(e) is the weight of hyperedge e; F(v_i, m) is the type matrix of node v_i, indicating whether the network data represented by v_i belongs to the m-th class (F(v_i, m) = 1 if it does and 0 otherwise); δ(e) is the degree of hyperedge e, and the degrees of all hyperedges form the diagonal matrix D_e; d(v) is the degree of node v, and the degrees of all nodes form the diagonal matrix D_v. Let:

Then the Laplacian regularization term can be written as:

Ω = F(v_i, m)^T Δ F(v_i, m).
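The definition of Δ is not reproduced in this text. It is assumed here to be the normalized hypergraph Laplacian of Zhou et al., Δ = I − D_v^{-1/2} H W D_e^{-1} H^T D_v^{-1/2}, which is built from exactly the degrees δ(e) and d(v) defined above. The sketch uses unit hyperedge weights by default and evaluates the regularizer as the scalar trace(F^T Δ F); the function names are hypothetical.

```python
import numpy as np


def hypergraph_laplacian(H, w=None):
    """Assumed normalized Laplacian: I - Dv^-1/2 H W De^-1 H^T Dv^-1/2."""
    H = np.asarray(H, dtype=float)
    n_vertices, n_edges = H.shape
    w = np.ones(n_edges) if w is None else np.asarray(w, dtype=float)
    delta_e = H.sum(axis=0)  # delta(e) = sum_v h(v, e)
    d_v = H @ w              # d(v) = sum_e w(e) h(v, e)
    dv_inv_sqrt = np.diag(1.0 / np.sqrt(d_v))
    theta = dv_inv_sqrt @ H @ np.diag(w / delta_e) @ H.T @ dv_inv_sqrt
    return np.eye(n_vertices) - theta


def smoothness(F, laplacian):
    """Scalar regularizer trace(F^T Delta F); small when linked vertices agree."""
    F = np.asarray(F, dtype=float)
    return float(np.trace(F.T @ laplacian @ F))


# Three vertices, two hyperedges: e_1 = {v_1, v_2}, e_2 = {v_2, v_3}.
H = np.array([[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
delta = hypergraph_laplacian(H)
```

A useful sanity check: Δ maps the vector of square-rooted vertex degrees to zero, so a labelling proportional to sqrt(d(v)) has zero smoothness cost, while labels that disagree across a shared hyperedge are penalized.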

step 32, constructing a cost-sensitive hypergraph learning model according to the type matrix and the cost value matrix;

step 33, calculating the predicted class label matrix F according to the cost-sensitive hypergraph learning model.

Specifically, the Laplacian regularization yields the type matrix F(v_i, m) of the hypergraph, and a cost-sensitive hypergraph learning model is constructed from F(v_i, m) and the cost value matrix γ, with the following objective:

where Y is the known label matrix of the labeled network data, of dimension n × m, n being the total number of network data items and m the number of classes. In Y, each labeled item has 1 in the position of its class and 0 in the other m − 1 positions, and each unlabeled item has 0 in all positions. The remaining term is the regularizer used to optimize the hypergraph structure, γ, μ and λ are adjustment parameters, and N_e is the amount of data.
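The construction of Y described above can be sketched as follows, assuming class indices 0..m−1 and `None` marking unlabeled data (both hypothetical conventions):

```python
def label_matrix(labels, num_classes):
    """Known label matrix Y (n x m).

    labels: class index 0..m-1 for labeled data, None for unlabeled data.
    Labeled rows are one-hot; unlabeled rows are all zeros.
    """
    Y = []
    for y in labels:
        row = [0] * num_classes
        if y is not None:
            row[y] = 1  # 1 in the position of the known class, 0 elsewhere
        Y.append(row)
    return Y


Y = label_matrix([0, 2, None], num_classes=3)
```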

Because the objective function is convex, it can be minimized with an alternating optimization strategy.

First, W is fixed and w is optimized; the objective can be written as:

Taking the partial derivative with respect to w gives:

Second, w is fixed and W is optimized; the objective can be written as:

Taking the partial derivative with respect to W gives:

In the formula, the remaining matrix term is an identity matrix.

The two steps are repeated alternately, decreasing the objective value at each iteration, until the predicted class label matrix F is obtained:

F = Xw.
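Because the full objective (with X, w, W, λ and the hypergraph regularizer) is not reproduced here, the following is only a simplified stand-in, not the patented model. With the hyperedge weights held fixed and no projection X, minimizing trace(F^T Δ F) + μ·trace((F − Y)^T γ (F − Y)) has the closed form (Δ + μγ)F = μγY, obtained by setting the gradient to zero. It illustrates how the diagonal cost matrix γ reweights the fit to the known labels. The name `cost_sensitive_labels` and the parameter `mu` are hypothetical, and Δ + μγ must be nonsingular (for example, every connected component contains a vertex with positive cost).

```python
import numpy as np


def cost_sensitive_labels(laplacian, gamma, Y, mu=1.0):
    """Simplified stand-in: argmin_F tr(F^T L F) + mu * tr((F - Y)^T Gamma (F - Y)).

    The gradient 2 L F + 2 mu Gamma (F - Y) vanishes when
    (L + mu Gamma) F = mu Gamma Y, which is solved directly here.
    """
    laplacian = np.asarray(laplacian, dtype=float)
    gamma = np.asarray(gamma, dtype=float)
    Y = np.asarray(Y, dtype=float)
    return np.linalg.solve(laplacian + mu * gamma, mu * gamma @ Y)


# Toy example: a 3-vertex chain Laplacian, vertex 3 unlabeled (zero cost).
L = np.array([[1.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 1.0]])
gamma = np.diag([1.8, 0.2, 0.0])
Y = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
F = cost_sensitive_labels(L, gamma, Y)
predicted = F.argmax(axis=1)
```

In this toy example every vertex ends up assigned to the first class: vertex 2 carries a low cost (0.2), so the smoothness term outweighs its own label, which is the kind of trade-off that sweeping r over the grid exposes.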

Step 4, detecting the unlabeled network data according to the predicted class labels.

Step 4 specifically comprises the following steps:

step 41, evaluating the predicted class labels on the labeled network data to generate detection scores;

step 42, selecting the predicted class labels with the highest detection score and recording them as the abnormal data detection model;

step 43, detecting the unlabeled network data with the abnormal data detection model.

Specifically, the multiple cost value matrices γ yield multiple predicted class label matrices F, which are evaluated on the labeled portion of the received network data: the labeled data are classified according to each F, the results are compared with the known label matrix Y to produce a detection score, and the predicted class labels are ranked by detection score. The predicted class labels with the highest score are selected and recorded as the abnormal data detection model, which is then used to classify the unlabeled portion of the received network data and determine whether it is network attack data.
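The detection score is not specified in the text. The sketch below assumes it is the binary F1 score of the positive (attack) class on the labeled data, with each row of F converted to a class by argmax, and selects the distribution parameter r whose F scores best. All names are hypothetical.

```python
def argmax_rows(F):
    """Predicted class index for each row of a score matrix."""
    return [max(range(len(row)), key=row.__getitem__) for row in F]


def f1_binary(y_true, y_pred, positive=1):
    """Assumed detection score: F1 of the positive (attack) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def select_model(candidates, labeled_idx, y_labeled, positive=1):
    """candidates maps r -> F (n x m scores); returns (best r, its score, all scores)."""
    scores = {}
    for r, F in candidates.items():
        pred = argmax_rows([F[i] for i in labeled_idx])
        scores[r] = f1_binary(y_labeled, pred, positive)
    best_r = max(scores, key=scores.get)
    return best_r, scores[best_r], scores
```

The returned `scores` dictionary also supports the preferred variant below, where several top-ranked candidates are fused rather than only the best one kept.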

Preferably, step 4 specifically comprises:

step 401, evaluating the predicted class labels on the labeled network data to generate detection scores;

step 402, selecting a preset number of predicted class labels in descending order of detection score, fusing them with a fusion algorithm, and recording the fusion result as the abnormal data detection model;

step 403, detecting the unlabeled network data with the abnormal data detection model.
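The fusion algorithm in step 402 is not specified. As one assumed option, the sketch averages the score matrices of the top-k candidates (k being the preset number); the function name is hypothetical.

```python
def fuse_top_k(candidates, scores, k):
    """Average the score matrices of the k highest-scoring candidates.

    candidates: key -> F (list of rows); scores: key -> detection score.
    Plain averaging is an assumed fusion rule.
    """
    top = sorted(scores, key=scores.get, reverse=True)[:k]
    first = candidates[top[0]]
    n, m = len(first), len(first[0])
    fused = [[0.0] * m for _ in range(n)]
    for key in top:
        for i in range(n):
            for j in range(m):
                fused[i][j] += candidates[key][i][j] / len(top)
    return top, fused
```

The fused matrix can then be converted to class labels row by row (for example by argmax) to classify the unlabeled data; weighting each candidate by its detection score would be an equally plausible alternative.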

The second embodiment is as follows:

As shown in FIG. 2, the present embodiment provides a network attack detection system 100 based on F value optimization, which comprises a generating unit 101, a constructing unit 102, a calculating unit 103 and a detecting unit 104. The generating unit 101 is configured to calculate, according to the F value calculation model, the misclassification cost values corresponding to received network data and to generate a cost value matrix, wherein the network data comprises labeled network data and unlabeled network data, and the misclassification cost value is calculated as:

For multi-class network data, the formula is selected according to the class of the received network data and is not described in detail here.

e(h) = (FN₁(h), FP₁(h), …, FN_k(h), FP_k(h), …, FN_L(h), FP_L(h)),

Depending on whether the network data is binary or multi-class, the corresponding formulas are selected to obtain the F value and the misclassification cost value of the network data. The F value takes values in [0, 1], so a series of uniformly spaced distribution parameters r_i is defined over this range, for example [0.2, 0.4, 0.6, 0.8]. For each distribution parameter r_i, the corresponding misclassification cost values are calculated with the misclassification cost formula, and a cost value matrix γ ∈ R^{n×n} is generated, where γ is a diagonal matrix and n is the total number of network data items.

The constructing unit 102 is configured to construct a hypergraph corresponding to the network data;

where v_central is the central vertex of the hyperedge, and the average of the distances between vertices in the hypergraph also appears in the formula.

The calculating unit 103 is configured to calculate a prediction class label corresponding to the network data according to the cost value matrix and the hypergraph;

Further, the calculating unit 103 specifically comprises a generating module, a constructing module and a calculating module; the generating module is configured to perform a Laplacian regularization transformation on the hypergraph to generate a type matrix;

δ(e) = ∑_{v∈V} h(v, e),

d(v) = ∑_{e∈E} w(e) h(v, e),

Ω = F(v_i, m)^T Δ F(v_i, m).

the constructing module is configured to construct a cost-sensitive hypergraph learning model from the type matrix and the cost value matrix; and the calculating module is configured to calculate the predicted class label matrix F with the cost-sensitive hypergraph learning model.

Specifically, the Laplacian regularization yields the type matrix F(v_i, m) of the hypergraph, and a cost-sensitive hypergraph learning model is constructed from F(v_i, m) and the cost value matrix γ, with the following objective:

First, W is fixed and w is optimized; the objective can be written as:

Taking the partial derivative with respect to w gives:

Second, w is fixed and W is optimized; the objective can be written as:

Taking the partial derivative with respect to W gives:

In the formula, the remaining matrix term is an identity matrix.

F = Xw.

The detecting unit 104 is configured to detect the unlabeled network data according to the predicted class labels.

Further, the detecting unit 104 specifically comprises a scoring module, a marking module and a detection module: the scoring module is configured to evaluate the predicted class labels on the labeled network data to generate detection scores; the marking module is configured to select the predicted class labels with the highest detection score and record them as the abnormal data detection model; and the detection module is configured to detect the unlabeled network data with the abnormal data detection model.

Preferably, the detecting unit 104 specifically comprises a score generation module, a fusion module and an anomaly detection module: the score generation module is configured to evaluate the predicted class labels on the labeled network data to generate detection scores; the fusion module is configured to select a preset number of predicted class labels in descending order of detection score, fuse them with a fusion algorithm, and record the fusion result as the abnormal data detection model; and the anomaly detection module is configured to detect the unlabeled network data with the abnormal data detection model.

The technical scheme of the present application has been described in detail above with reference to the accompanying drawings. The present application provides a network attack detection method and system based on F value optimization, wherein the method comprises: step 1, calculating the misclassification cost values corresponding to received network data according to an F value calculation model and generating a cost value matrix, wherein the network data comprises labeled network data and unlabeled network data; step 2, constructing a hypergraph corresponding to the network data; step 3, calculating predicted class labels for the network data according to the cost value matrix and the hypergraph; and step 4, detecting the unlabeled network data according to the predicted class labels. By using the F value evaluation index to optimize the misclassification cost values, the technical scheme improves the detection rate on imbalanced data and the accuracy and reliability of network anomaly detection.

The steps in the present application may be reordered, combined, or removed according to actual requirements.

The units in the system may be merged, split, or removed according to actual requirements.

Although the present application has been disclosed in detail with reference to the accompanying drawings, it is to be understood that such description is merely illustrative and does not restrict the application. The scope of the present application is defined by the appended claims and may include various modifications, adaptations, and equivalents without departing from the scope and spirit of the application.

Claims

1. A network attack detection method based on F value optimization is characterized by comprising the following steps:

step 1, calculating the misclassification cost values corresponding to received network data according to an F value calculation model, wherein the F value calculation model comprises a binary F value formula and a multi-class F value formula, and the binary F value formula is:

the multi-class F value formula is:

where P₁ is the marginal probability of class-1 network data, and e₁, e₂ and e_{2k−1} are respectively the 1st, 2nd and (2k−1)-th entries of the error profile e(h);

the network data comprises labeled network data and unlabeled network data, and the misclassification cost value is calculated as:

where F_β is the binary F value formula, mcF_β is the multi-class F value formula, r is a uniformly distributed distribution parameter, and β is an adjustment parameter;

for each uniformly distributed distribution parameter r, the corresponding misclassification cost values are calculated with the misclassification cost formula, and a cost value matrix γ ∈ R^{n×n} is generated, where γ is a diagonal matrix, n is the total number of network data items, and γ is calculated as:

binary-class network data:

multi-class network data:

step 2, constructing a hypergraph corresponding to the network data, wherein the connectivity of the hypergraph is described by an incidence matrix H calculated as:

where v_central is the central vertex of hyperedge e_p; the average of the distances between vertices in the hypergraph also appears in the formula; d(v_i, v_central) is the distance between vertex v_i on hyperedge e_p and the central vertex v_central; and a is an adjustment parameter;

2. The method according to claim 1, wherein the distribution parameter r takes, in turn, the uniformly spaced values [0.2, 0.4, 0.6, 0.8], and the adjustment parameter β is 1.

3. The method for detecting network attack based on F value optimization according to claim 1, wherein the step 3 specifically includes:

step 31, according to the hypergraph, performing Laplace regularization transformation to generate a type matrix;

4. The method for detecting network attack based on F value optimization according to claim 1, wherein the step 4 specifically includes:

5. A network attack detection system based on F value optimization, characterized by comprising a generating unit, a constructing unit, a calculating unit and a detecting unit;

the generating unit is configured to calculate the misclassification cost values corresponding to received network data according to an F value calculation model, wherein the F value calculation model comprises a binary F value formula and a multi-class F value formula, and the binary F value formula is:

where F_β is the binary F value formula, mcF_β is the multi-class F value formula, r is a uniformly distributed distribution parameter, and β is an adjustment parameter;

binary-class network data:

multi-class network data:

the constructing unit is configured to construct a hypergraph corresponding to the network data, wherein the connectivity of the hypergraph is described by an incidence matrix H calculated as:

where v_central is the central vertex of the hyperedge;

the calculating unit is configured to calculate predicted class labels for the network data according to the cost value matrix and the hypergraph;

the detecting unit is configured to detect the unlabeled network data according to the predicted class labels.

6. The system according to claim 5, wherein the distribution parameter r takes, in turn, the uniformly spaced values [0.2, 0.4, 0.6, 0.8], and the adjustment parameter β is 1.

7. The system according to claim 5, wherein the calculating unit specifically comprises a generating module, a constructing module and a calculating module;

the generating module is configured to perform a Laplacian regularization transformation on the hypergraph to generate a type matrix;

the constructing module is configured to construct a cost-sensitive hypergraph learning model from the type matrix and the cost value matrix;

the calculating module is configured to calculate the predicted class label matrix F with the cost-sensitive hypergraph learning model.

8. The system according to claim 5, wherein the detecting unit specifically comprises a scoring module, a marking module and a detection module;

the scoring module is configured to evaluate the predicted class labels on the labeled network data to generate detection scores;

the marking module is configured to select the predicted class labels with the highest detection score and record them as the abnormal data detection model;

the detection module is configured to detect the unlabeled network data with the abnormal data detection model.