CN109951468B - Network attack detection method and system based on F value optimization - Google Patents


Publication number
CN109951468B
Authority
CN
China
Prior art keywords
network data
value
matrix
hypergraph
calculation formula
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910183415.XA
Other languages
Chinese (zh)
Other versions
CN109951468A (en)
Inventor
高跃
王楠
赵曦滨
万海
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Original Assignee
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University
Priority to CN201910183415.XA
Publication of CN109951468A
Application granted
Publication of CN109951468B


Abstract

The application discloses a network attack detection method and system based on F value optimization, wherein the method comprises the following steps: step 1, calculating a misclassification cost value corresponding to received network data according to an F value calculation model, and generating a cost value matrix, wherein the network data comprises marked network data and unmarked network data; step 2, constructing a hypergraph corresponding to the network data according to the network data; step 3, calculating a prediction class mark corresponding to the network data according to the cost value matrix and the hypergraph; and step 4, detecting the unmarked network data in the network data according to the prediction class marks. According to this technical scheme, the detection rate on unbalanced data is improved, the F value evaluation index is used to optimize the misclassification cost value, and the accuracy and reliability of network abnormal data detection are improved.

Description

Network attack detection method and system based on F value optimization
Technical Field
The application relates to the technical field of network anomaly detection, in particular to a network attack detection method based on F value optimization and a network attack detection system based on F value optimization.
Background
With the rapid development of network technology, network attacks occur frequently, and as data traffic grows, detecting the abnormal traffic it contains efficiently and accurately becomes increasingly important. Network traffic follows numerous protocol types and contains a large amount of data of different types, and these types are severely unbalanced. It is therefore very important to balance the detection rate and the accuracy on unbalanced data and to improve the system's detection rate for the different kinds of abnormal network data. Current anomaly detection methods mainly aim to improve detection accuracy rather than to reduce the overall cost of detection.
In the prior art, the main challenges of network traffic anomaly detection are as follows:
1) different types of data in the data flow are severely unbalanced, so it is difficult to improve the detection rate of all types of data simultaneously;
2) it is difficult to establish high-order associations between flows and to mine the complex associations among the data.
Disclosure of Invention
The purpose of this application is to optimize the misclassification cost value using the F value metric, which measures detection performance on unbalanced data better than accuracy does, so as to maximize the detection rate on unbalanced data and to improve the accuracy and reliability of network abnormal data detection.
The technical scheme of the first aspect of the application is as follows: a network attack detection method based on F value optimization is provided, and the method comprises the following steps: step 1, calculating a misclassification cost value corresponding to received network data according to an F value calculation model, and generating a cost value matrix, wherein the network data comprises marked network data and unmarked network data, and the calculation formula of the misclassification cost value is as follows:
Figure BDA0001992050140000021
Figure BDA0001992050140000022
where $F_\beta$ is the F value calculation formula for binary classification, $mcF_\beta$ is the F value calculation formula for multi-class classification, r is a distribution parameter, and β is an adjustment parameter;
step 2, constructing a hypergraph corresponding to the network data according to the network data; step 3, calculating a prediction class mark corresponding to the network data according to the cost value matrix and the hypergraph; and step 4, detecting the unmarked network data in the network data according to the prediction class marks.
In any of the above technical solutions, further, the distribution parameter r is a uniform distribution parameter sequentially selected from [0.2,0.4,0.6,0.8], and the value of the adjustment parameter β is 1.
In any one of the above technical solutions, further, step 3 specifically includes: step 31, performing a Laplacian regularization transformation on the hypergraph to generate a type matrix; step 32, constructing a cost-sensitive hypergraph learning model according to the type matrix and the cost value matrix; and step 33, calculating a prediction class mark F according to the cost-sensitive hypergraph learning model.
In any one of the above technical solutions, further, step 4 specifically includes: step 41, evaluating the prediction class mark against the marked network data in the network data to generate a detection score; step 42, selecting the prediction class mark with the highest detection score and recording it as an abnormal data detection model; and step 43, detecting the unmarked network data in the network data according to the abnormal data detection model.
The technical scheme of the second aspect of the application is as follows: a network attack detection system based on F value optimization is provided, which comprises: a generating unit, a constructing unit, a calculating unit and a detecting unit; the generating unit is used for calculating a misclassification cost value corresponding to the received network data according to the F value calculation model and generating a cost value matrix, wherein the network data comprises marked network data and unmarked network data, and the calculation formula of the misclassification cost value is as follows:
Figure BDA0001992050140000023
Figure BDA0001992050140000031
where $F_\beta$ is the F value calculation formula for binary classification, $mcF_\beta$ is the F value calculation formula for multi-class classification, r is a distribution parameter, and β is an adjustment parameter;
the construction unit is used for constructing a hypergraph corresponding to the network data according to the network data; the calculation unit is used for calculating a prediction class mark corresponding to the network data according to the cost value matrix and the hypergraph; the detection unit is used for detecting the unmarked network data in the network data according to the prediction class mark.
In any of the above technical solutions, further, the distribution parameter r is a uniform distribution parameter sequentially selected from [0.2,0.4,0.6,0.8], and the value of the adjustment parameter β is 1.
In any one of the above technical solutions, further, the calculating unit specifically includes: a generating module, a constructing module and a calculating module; the generating module is used for performing a Laplacian regularization transformation on the hypergraph to generate a type matrix; the constructing module is used for constructing a cost-sensitive hypergraph learning model according to the type matrix and the cost value matrix; and the calculating module is used for calculating the prediction class mark F according to the cost-sensitive hypergraph learning model.
In any one of the above technical solutions, further, the detecting unit specifically includes: a scoring module, a marking module and a detection module; the scoring module is used for evaluating the prediction class mark against the marked network data in the network data to generate a detection score; the marking module is used for selecting the prediction class mark with the highest detection score and recording it as an abnormal data detection model; the detection module is used for detecting the unmarked network data in the network data according to the abnormal data detection model.
The beneficial effect of this application is: by utilizing the F value model, the misclassification cost value of the received network data is calculated, and the F value evaluation index is utilized to optimize the misclassification cost value, so that the problem of unbalanced detection rate of different types of network data is avoided, and the detection rate of unbalanced data is optimized. And then, by constructing a hypergraph of the received network data, the relevance among the network data is optimized, the accuracy of the prediction class mark of the network data is improved, the network data is detected according to the prediction class mark, and the accuracy and the reliability of the detection of the abnormal network data are improved.
Drawings
The advantages of the above and/or additional aspects of the present application will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1 is a schematic flow chart diagram of a network attack detection method based on F-value optimization according to one embodiment of the present application;
fig. 2 is a schematic block diagram of a network attack detection system based on F-value optimization according to an embodiment of the present application.
Detailed Description
In order that the above objects, features and advantages of the present application can be more clearly understood, the present application will be described in further detail with reference to the accompanying drawings and detailed description. It should be noted that the embodiments and features of the embodiments of the present application may be combined with each other without conflict.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present application, however, the present application may be practiced in other ways than those described herein, and therefore the scope of the present application is not limited by the specific embodiments disclosed below.
Embodiment one:
As shown in fig. 1, the present embodiment provides a network attack detection method based on F value optimization, which includes:
Step 1, calculating a misclassification cost value corresponding to received network data according to an F value calculation model, and generating a cost value matrix, wherein the network data comprises marked network data and unmarked network data, and the calculation formula of the misclassification cost value is as follows:
Figure BDA0001992050140000041
for binary classification,
Figure BDA0001992050140000042
for multi-class classification,
where $F_\beta$ is the cost value calculation formula for binary classification, $mcF_\beta$ is the cost value calculation formula for multi-class classification, β is the adjustment parameter, and r is a distribution parameter selected in turn from the sequence [0.2, 0.4, 0.6, 0.8].
For binary-classification network data, when the received network data belongs to the first class (class1), the misclassification cost value is calculated as:
Figure BDA0001992050140000051
When the received network data belongs to the second class (class2), the misclassification cost value is calculated as:
Figure BDA0001992050140000052
For multi-class network data, the calculation formula
Figure BDA0001992050140000053
is selected according to the class of the received network data, which is not described in detail here.
Preferably, the distribution parameter r is a uniform distribution parameter sequentially selected from [0.2,0.4,0.6,0.8], and the value of the adjustment parameter β is 1.
Specifically, in the network data detection process, any piece of network data may belong to a binary classification problem or to a multi-class classification problem; accordingly, the F value calculation model includes a binary F value calculation formula and a multi-class F value calculation formula.
For network data of class k, let its marginal probability be $P_k$, the probability of data being misclassified into class k be $FP_k(h)$, and the probability of class-k data being misclassified into other classes be $FN_k(h)$. The corresponding misclassification vector e(h) can then be expressed as:
$e(h) = (FN_1(h), FP_1(h), \ldots, FN_k(h), FP_k(h), \ldots, FN_L(h), FP_L(h))$,
i.e. the entry $e_{2k-1}$ of the misclassification vector e(h) is $FN_k(h)$ and the entry $e_{2k}$ is $FP_k(h)$, where h is the classifier.
When the network data is binary data, the corresponding F value calculation formula is as follows:
Figure BDA0001992050140000054
when the network data is the multi-classification data, the corresponding F value calculation formula is as follows:
Figure BDA0001992050140000055
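Because the formula images are not reproduced in this text, the following is only a minimal sketch of how a binary F value can be computed from the quantities defined above. It assumes the standard $F_\beta$ definition rewritten in terms of the marginal probability P of the positive class and the error probabilities FN and FP (true positives equal P minus FN); the function name `f_beta` and the example numbers are illustrative and do not come from the patent.

```python
def f_beta(p_pos, fn, fp, beta=1.0):
    """Binary F_beta from the positive-class marginal probability and error rates.

    Assumption: this is the textbook F_beta, with true positives = p_pos - fn.
    p_pos: marginal probability of the positive class (P in the text)
    fn:    probability mass of positives classified as negative (FN)
    fp:    probability mass of negatives classified as positive (FP)
    """
    b2 = beta ** 2
    denom = (1.0 + b2) * p_pos - fn + fp
    if denom <= 0:
        return 0.0
    return (1.0 + b2) * (p_pos - fn) / denom


# Illustrative numbers: 10% of traffic is attack traffic, 2% of all traffic
# is missed attacks, and 3% of all traffic is false alarms.
score = f_beta(p_pos=0.10, fn=0.02, fp=0.03, beta=1.0)
print(round(score, 4))
```

With β = 1, as preferred in the text, this reduces to the harmonic mean of precision and recall, which is why it is less dominated by the majority class than plain accuracy.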
Depending on whether the network data are binary or multi-class, the corresponding calculation formulas are selected to obtain the F value and the misclassification cost value of the network data. The value range of the F value is set to [0, 1], and a series of uniformly spaced distribution parameters $r_i$ is defined over this range, e.g. [0.2, 0.4, 0.6, 0.8]. For each distribution parameter $r_i$, the corresponding misclassification cost value can be calculated with the misclassification cost formula, generating a cost value matrix $\gamma \in \mathbb{R}^{n \times n}$, where γ is a diagonal matrix and n is the total number of network data.
For binary-classification network data, the cost value matrix γ generated according to the distribution parameter $r_i$ is:
Figure BDA0001992050140000061
For multi-class network data, the cost value matrix γ generated according to the distribution parameter $r_i$ is:
Figure BDA0001992050140000062
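The construction of one diagonal cost value matrix per distribution parameter can be sketched as follows. The patent's cost formulas are given only as images, so the concrete costs used here are an assumption borrowed from the cost-sensitive F-measure literature: $1 + \beta^2 - r$ for the first class, r for the second class, and 0 for unmarked data. The function name `cost_matrix` and the label encoding are likewise hypothetical.

```python
import numpy as np


def cost_matrix(labels, r, beta=1.0):
    """Diagonal n x n cost value matrix for one distribution parameter r.

    labels: 1 = first class, 0 = second class, -1 = unmarked (assumed encoding).
    Assumed costs (not taken from the patent figures):
      first class -> 1 + beta**2 - r, second class -> r, unmarked -> 0.
    """
    labels = np.asarray(labels)
    costs = np.zeros(labels.shape[0], dtype=float)
    costs[labels == 1] = 1.0 + beta ** 2 - r
    costs[labels == 0] = r
    return np.diag(costs)


# One cost value matrix per distribution parameter, as described in the text.
r_grid = [0.2, 0.4, 0.6, 0.8]
labels = [1, 0, 0, -1, 0]
gammas = [cost_matrix(labels, r) for r in r_grid]
print(np.diag(gammas[0]))
```

Each matrix in `gammas` would yield its own prediction class mark in step 3, and step 4 then chooses among them.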
Step 2, constructing a hypergraph corresponding to the network data according to the network data;
Specifically, a hypergraph is constructed using the star expansion method. The hypergraph structure can be described as G = (V, E, W), where the received network data are taken as the vertices V of the hypergraph, the connection relation corresponding to each piece of network data is a hyperedge E, and the weight of each hyperedge is W. The connection relation of the hypergraph is described by an H matrix, calculated as follows:
Figure BDA0001992050140000063
where $v_{central}$ is the central point of the hypergraph,
Figure BDA0001992050140000064
is the average of the distances between the points in the hypergraph,
$d(v_i, v_{central})$ is the distance between a point $v_i$ on hyperedge $e_p$ and the central point $v_{central}$, and a is an adjustment parameter; in the present embodiment, a = 0.05.
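A minimal sketch of building such an H matrix is shown below. Since the formula image is missing, two things are assumed here and are not confirmed by the text: each sample acts as the central point of one hyperedge that also contains its k nearest neighbours, and the entry for a member point is $\exp(-d(v_i, v_{central})^2 / (a \cdot \bar{d}^2))$, where $\bar{d}$ is the average pairwise distance. The function name `incidence_matrix` and the parameter `k` are hypothetical.

```python
import numpy as np


def incidence_matrix(X, k=2, a=0.05):
    """Build an n x n soft incidence matrix H (rows: points, columns: hyperedges).

    Assumptions (not from the patent figures): hyperedge p is centred on
    sample p and contains its k nearest neighbours; the entry is
    exp(-d(v_i, v_central)^2 / (a * d_mean^2)) for members and 0 otherwise.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))          # pairwise Euclidean distances
    d_mean = dist[np.triu_indices(n, k=1)].mean()     # average distance between points
    H = np.zeros((n, n))
    for p in range(n):
        members = np.argsort(dist[:, p])[:k + 1]      # centre plus k nearest points
        H[members, p] = np.exp(-dist[members, p] ** 2 / (a * d_mean ** 2))
    return H


X = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
H = incidence_matrix(X, k=2, a=0.05)
print(H.shape)
```

A small value of a, such as the 0.05 used in the embodiment, makes the entries decay quickly with distance, so only points very close to the centre contribute strongly to a hyperedge.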
Step 3, calculating a prediction class mark corresponding to the network data according to the cost value matrix and the hypergraph;
Step 3 specifically includes:
step 31, performing a Laplacian regularization transformation on the hypergraph to generate a type matrix $F(v_i, m)$;
Specifically, the hypergraph is subjected to laplacian regularization transformation, and the corresponding calculation formula is as follows:
Figure BDA0001992050140000071
$\delta(e) = \sum_{v \in V} h(v, e)$,
$d(v) = \sum_{e \in E} w(e)\, h(v, e)$,
where $w(e)$ is the weight of hyperedge e; $F(v_i, m)$ is the type matrix of node $v_i$, indicating whether the network data represented by node $v_i$ belongs to the m-th class: $F(v_i, m) = 1$ means that node $v_i$ belongs to the m-th class, and 0 means that it does not; $\delta(e)$ is the degree of hyperedge e, and the degrees of all hyperedges form a diagonal matrix $D_e$; $d(v)$ is the degree of node v, and the degrees of all nodes form a diagonal matrix $D_v$. Define:
Figure BDA0001992050140000072
Therefore, the calculation formula corresponding to the Laplacian regularization transform can be written as:
$\Omega = F(v_i, m)^{T} \Delta F(v_i, m)$.
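The degree definitions and the regularizer above can be sketched in code. The definition of Δ is given only as an image, so this sketch assumes the commonly used normalized hypergraph Laplacian $\Delta = I - D_v^{-1/2} H W D_e^{-1} H^{T} D_v^{-1/2}$; taking the trace of $F^{T} \Delta F$ to obtain a scalar is also an assumption, and the function names are illustrative.

```python
import numpy as np


def hypergraph_laplacian(H, w=None):
    """Normalized hypergraph Laplacian (assumed form, see lead-in).

    H: incidence matrix with rows as nodes and columns as hyperedges.
    w: hyperedge weights w(e); defaults to all ones.
    Every node and hyperedge is assumed to have a non-zero degree.
    """
    H = np.asarray(H, dtype=float)
    n_v, n_e = H.shape
    w = np.ones(n_e) if w is None else np.asarray(w, dtype=float)
    d_e = H.sum(axis=0)                  # delta(e) = sum_v h(v, e)
    d_v = H @ w                          # d(v) = sum_e w(e) h(v, e)
    dv_inv_sqrt = 1.0 / np.sqrt(d_v)
    theta = (dv_inv_sqrt[:, None] * H) @ np.diag(w / d_e) @ (H.T * dv_inv_sqrt[None, :])
    return np.eye(n_v) - theta


def smoothness(F, laplacian):
    """Scalar regularizer trace(F^T Delta F); small when linked nodes share classes."""
    return float(np.trace(F.T @ laplacian @ F))


H = np.array([[1, 0], [1, 1], [0, 1]], dtype=float)   # 3 nodes, 2 hyperedges
L = hypergraph_laplacian(H)
F = np.array([[1, 0], [1, 0], [0, 1]], dtype=float)   # one-hot type matrix
print(smoothness(F, L))
```

Under this assumed form the Laplacian is symmetric and positive semi-definite, so the regularizer is never negative and can be minimized together with a label-fitting term.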
step 32, constructing a cost-sensitive hypergraph learning model according to the type matrix and the cost value matrix;
step 33, calculating the prediction class mark F according to the cost-sensitive hypergraph learning model.
Specifically, the type matrix $F(v_i, m)$ corresponding to the hypergraph is obtained by Laplacian regularization. A cost-sensitive hypergraph learning model is then constructed from the type matrix $F(v_i, m)$ and the cost value matrix γ, with the following calculation formula:
Figure BDA0001992050140000073
wherein Y is a known label value matrix of the marked network data in the network data, the dimension of the known label value matrix is n x m, n is the total number of the network data, m is the number of all categories, for the known label value matrix Y, the corresponding category is marked with 1, other m-1 positions are marked with 0, for the unmarked network data, all bits are marked with 0,
Figure BDA0001992050140000074
is the regularization term for optimizing the hypergraph structure, γ, μ and λ are adjustment parameters, and $N_e$ is the amount of data.
In optimizing the calculation formula of the cost-sensitive hypergraph learning model, because the optimization problem is convex, it can be solved with an alternating optimization strategy.
First, fix W and optimize w; the formula can be written as:
Figure BDA0001992050140000081
Taking the partial derivative with respect to w gives:
Figure BDA0001992050140000082
Second, fix w and optimize W; the formula can be written as:
Figure BDA0001992050140000083
Taking the partial derivative with respect to W gives:
Figure BDA0001992050140000084
where
Figure BDA0001992050140000085
is an identity matrix.
The two steps are iterated repeatedly to reduce the value of the objective function, and the optimization yields the corresponding prediction class mark F:
F = Xw.
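The step that fixes W and solves for w can be illustrated with a closed-form solve. The patent's objective is given only as images, so the sketch below substitutes an assumed stand-in objective, $\mathrm{tr}((Xw)^{T} \Delta Xw) + \lambda \lVert C (Xw - Y) \rVert_F^2 + \mu \lVert w \rVert_F^2$, where C denotes the diagonal cost value matrix and X a feature matrix of the network data. The objective, the function name `solve_w`, and the parameter values are hypothetical; the update of the hyperedge weights W is not shown.

```python
import numpy as np


def solve_w(X, Y, laplacian, cost, lam=1.0, mu=0.1):
    """Closed-form w for the assumed objective (see lead-in), with W held fixed.

    Setting the gradient to zero gives
      (X^T L X + lam * X^T C^T C X + mu * I) w = lam * X^T C^T C Y.
    The Laplacian is assumed symmetric.
    """
    d = X.shape[1]
    c2 = cost.T @ cost
    A = X.T @ laplacian @ X + lam * (X.T @ c2 @ X) + mu * np.eye(d)
    B = lam * (X.T @ c2 @ Y)
    return np.linalg.solve(A, B)


rng = np.random.default_rng(0)
n, d, m = 8, 3, 2
X = rng.normal(size=(n, d))                    # hypothetical feature matrix
Y = np.zeros((n, m))
Y[:4, 0] = 1.0                                 # four marked samples of class 0
Y[4:6, 1] = 1.0                                # two marked samples of class 1
cost = np.diag([1.8] * 4 + [0.2] * 2 + [0.0] * 2)   # unmarked samples get cost 0
lap = 0.5 * np.eye(n)                          # placeholder symmetric PSD Laplacian
w = solve_w(X, Y, lap, cost)
F_pred = X @ w                                 # prediction class mark F = Xw
print(F_pred.argmax(axis=1))
```

In a full alternating scheme this solve would be interleaved with an update of W until the objective stops decreasing.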
Step 4, detecting the unmarked network data in the network data according to the prediction class mark.
Step 4 specifically includes:
step 41, evaluating the prediction class mark against the marked network data in the network data to generate a detection score;
step 42, selecting the prediction class mark with the highest detection score and recording it as an abnormal data detection model;
step 43, detecting the unmarked network data in the network data according to the abnormal data detection model.
Specifically, the multiple cost value matrices γ obtained above yield multiple prediction class marks F. Each prediction class mark F is evaluated on the marked network data in the received network data: the predictions for the marked network data are compared with the known label value matrix Y to generate a corresponding detection score. The prediction class marks F are then sorted by detection score, and the one with the highest score is selected and recorded as the abnormal data detection model. The unmarked network data in the received network data are detected with the selected abnormal data detection model to determine whether they are network attack data.
Preferably, step 4 specifically includes:
step 401, evaluating the prediction class mark against the marked network data in the network data to generate a detection score;
step 402, selecting a preset number of prediction class marks in descending order of detection score, fusing the selected prediction class marks with a fusion algorithm, and recording the fusion result as an abnormal data detection model;
step 403, detecting the unmarked network data in the network data according to the abnormal data detection model.
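The selection and fusion procedure of step 4 can be sketched as follows. The text does not specify the detection score or the fusion algorithm, so this sketch assumes macro-averaged F1 on the marked data as the score and simple averaging of the selected prediction class marks as the fusion; both choices and all names are illustrative.

```python
import numpy as np


def macro_f1(y_true, y_pred, n_classes):
    """Macro-averaged F1, used here as an assumed detection score."""
    scores = []
    for c in range(n_classes):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return float(np.mean(scores))


def select_and_fuse(candidates, labeled_idx, y_labeled, top_k=1):
    """Score each candidate F on marked data, then fuse the top_k by averaging.

    top_k=1 corresponds to steps 41 to 43; top_k>1 to steps 401 to 403.
    """
    y_labeled = np.asarray(y_labeled)
    n_classes = candidates[0].shape[1]
    scores = [macro_f1(y_labeled, F[labeled_idx].argmax(axis=1), n_classes)
              for F in candidates]
    order = np.argsort(scores)[::-1][:top_k]           # best scores first
    fused = np.mean([candidates[i] for i in order], axis=0)
    return fused.argmax(axis=1), scores


F_good = np.array([[0.9, 0.1], [0.2, 0.8], [0.7, 0.3], [0.4, 0.6]])
F_bad = np.array([[0.1, 0.9], [0.8, 0.2], [0.6, 0.4], [0.3, 0.7]])
preds, scores = select_and_fuse([F_good, F_bad], labeled_idx=[0, 1],
                                y_labeled=[0, 1], top_k=1)
print(preds, scores)
```

Rows 0 and 1 play the role of marked data here, and rows 2 and 3 are the unmarked data whose classes are read off the selected model.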
Embodiment two:
As shown in fig. 2, the present embodiment provides a network attack detection system 100 based on F value optimization, which includes: a generating unit 101, a constructing unit 102, a calculating unit 103 and a detecting unit 104; the generating unit 101 is configured to calculate, according to the F value calculation model, a misclassification cost value corresponding to the received network data, and generate a cost value matrix, where the network data includes marked network data and unmarked network data, and the calculation formula of the misclassification cost value is:
Figure BDA0001992050140000091
Figure BDA0001992050140000092
where $F_\beta$ is the F value calculation formula for binary classification, $mcF_\beta$ is the F value calculation formula for multi-class classification, r is a distribution parameter, and β is an adjustment parameter;
For binary-classification network data, when the received network data belongs to the first class (class1), the misclassification cost value is calculated as:
Figure BDA0001992050140000101
When the received network data belongs to the second class (class2), the misclassification cost value is calculated as:
Figure BDA0001992050140000102
For multi-class network data, the calculation formula
Figure BDA0001992050140000103
is selected according to the class of the received network data, which is not described in detail here.
Preferably, the distribution parameter r is a uniform distribution parameter sequentially selected from [0.2,0.4,0.6,0.8], and the value of the adjustment parameter β is 1.
Specifically, in the network data detection process, any piece of network data may belong to a binary classification problem or to a multi-class classification problem; accordingly, the F value calculation model includes a binary F value calculation formula and a multi-class F value calculation formula.
For network data of class k, let its marginal probability be $P_k$, the probability of data being misclassified into class k be $FP_k(h)$, and the probability of class-k data being misclassified into other classes be $FN_k(h)$. The corresponding misclassification vector e(h) can then be expressed as:
$e(h) = (FN_1(h), FP_1(h), \ldots, FN_k(h), FP_k(h), \ldots, FN_L(h), FP_L(h))$,
i.e. the entry $e_{2k-1}$ of the misclassification vector e(h) is $FN_k(h)$ and the entry $e_{2k}$ is $FP_k(h)$, where h is the classifier.
When the network data is binary data, the corresponding F value calculation formula is as follows:
Figure BDA0001992050140000104
when the network data is the multi-classification data, the corresponding F value calculation formula is as follows:
Figure BDA0001992050140000105
Depending on whether the network data are binary or multi-class, the corresponding calculation formulas are selected to obtain the F value and the misclassification cost value of the network data. The value range of the F value is set to [0, 1], and a series of uniformly spaced distribution parameters $r_i$ is defined over this range, e.g. [0.2, 0.4, 0.6, 0.8]. For each distribution parameter $r_i$, the corresponding misclassification cost value can be calculated with the misclassification cost formula, generating a cost value matrix $\gamma \in \mathbb{R}^{n \times n}$, where γ is a diagonal matrix and n is the total number of network data.
For binary-classification network data, the cost value matrix γ generated according to the distribution parameter $r_i$ is:
Figure BDA0001992050140000111
For multi-class network data, the cost value matrix γ generated according to the distribution parameter $r_i$ is:
Figure BDA0001992050140000112
The constructing unit 102 is configured to construct a hypergraph corresponding to the network data according to the network data;
Specifically, a hypergraph is constructed using the star expansion method. The hypergraph structure can be described as G = (V, E, W), where the received network data are taken as the vertices V of the hypergraph, the connection relation corresponding to each piece of network data is a hyperedge E, and the weight of each hyperedge is W. The connection relation of the hypergraph is described by an H matrix, calculated as follows:
Figure BDA0001992050140000113
where $v_{central}$ is the central point of the hypergraph,
Figure BDA0001992050140000114
is the average of the distances between the points in the hypergraph,
$d(v_i, v_{central})$ is the distance between a point $v_i$ on hyperedge $e_p$ and the central point $v_{central}$, and a is an adjustment parameter; in the present embodiment, a = 0.05.
The calculating unit 103 is configured to calculate a prediction class label corresponding to the network data according to the cost value matrix and the hypergraph;
Further, the calculating unit 103 specifically includes: a generating module, a constructing module and a calculating module; the generating module is used for performing a Laplacian regularization transformation on the hypergraph to generate a type matrix;
Specifically, the hypergraph is subjected to a Laplacian regularization transformation, and the corresponding calculation formula is as follows:
Figure BDA0001992050140000115
$\delta(e) = \sum_{v \in V} h(v, e)$,
$d(v) = \sum_{e \in E} w(e)\, h(v, e)$,
where $w(e)$ is the weight of hyperedge e; $F(v_i, m)$ is the type matrix of node $v_i$, indicating whether the network data represented by node $v_i$ belongs to the m-th class: $F(v_i, m) = 1$ means that node $v_i$ belongs to the m-th class, and 0 means that it does not; $\delta(e)$ is the degree of hyperedge e, and the degrees of all hyperedges form a diagonal matrix $D_e$; $d(v)$ is the degree of node v, and the degrees of all nodes form a diagonal matrix $D_v$. Define:
Figure BDA0001992050140000121
Therefore, the calculation formula corresponding to the Laplacian regularization transform can be written as:
$\Omega = F(v_i, m)^{T} \Delta F(v_i, m)$.
The constructing module is used for constructing a cost-sensitive hypergraph learning model according to the type matrix and the cost value matrix; and the calculating module is used for calculating the prediction class mark F according to the cost-sensitive hypergraph learning model.
Specifically, the type matrix $F(v_i, m)$ corresponding to the hypergraph is obtained by Laplacian regularization. A cost-sensitive hypergraph learning model is then constructed from the type matrix $F(v_i, m)$ and the cost value matrix γ, with the following calculation formula:
Figure BDA0001992050140000122
wherein Y is a known label value matrix of the marked network data in the network data, the dimension of the known label value matrix is n x m, n is the total number of the network data, m is the number of all categories, for the known label value matrix Y, the corresponding category is marked with 1, other m-1 positions are marked with 0, for the unmarked network data, all bits are marked with 0,
Figure BDA0001992050140000123
is the regularization term for optimizing the hypergraph structure, γ, μ and λ are adjustment parameters, and $N_e$ is the amount of data.
In optimizing the calculation formula of the cost-sensitive hypergraph learning model, because the optimization problem is convex, it can be solved with an alternating optimization strategy.
First, fix W and optimize w; the formula can be written as:
Figure BDA0001992050140000124
Taking the partial derivative with respect to w gives:
Figure BDA0001992050140000125
Second, fix w and optimize W; the formula can be written as:
Figure BDA0001992050140000131
Taking the partial derivative with respect to W gives:
Figure BDA0001992050140000132
where
Figure BDA0001992050140000133
is an identity matrix.
The two steps are iterated repeatedly to reduce the value of the objective function, and the optimization yields the corresponding prediction class mark F:
F = Xw.
The detecting unit 104 is configured to detect the unmarked network data in the network data according to the prediction class mark.
Further, the detecting unit 104 specifically includes: a scoring module, a marking module and a detection module; the scoring module is used for evaluating the prediction class mark against the marked network data in the network data to generate a detection score; the marking module is used for selecting the prediction class mark with the highest detection score and recording it as an abnormal data detection model; the detection module is used for detecting the unmarked network data in the network data according to the abnormal data detection model.
Specifically, the multiple cost value matrices γ obtained above yield multiple prediction class marks F. Each prediction class mark F is evaluated on the marked network data in the received network data: the predictions for the marked network data are compared with the known label value matrix Y to generate a corresponding detection score. The prediction class marks F are then sorted by detection score, and the one with the highest score is selected and recorded as the abnormal data detection model. The unmarked network data in the received network data are detected with the selected abnormal data detection model to determine whether they are network attack data.
Preferably, the detecting unit 104 specifically includes: the system comprises a score generation module, a fusion module and an abnormality detection module; the score generation module is used for detecting the prediction type mark according to the marked network data in the network data to generate a detection score; the fusion module is used for selecting the prediction class labels with the same number as the preset number according to the sequence of the detection scores from large to small, fusing the selected prediction class labels by adopting a fusion algorithm, and recording the fusion result as an abnormal data detection model; and the anomaly detection module is used for detecting the unmarked network data in the network data according to the anomaly data detection model.
The technical scheme of the present application has been described in detail above with reference to the accompanying drawings. The present application provides a network attack detection method and system based on F value optimization, wherein the method includes: step 1, calculating a misclassification cost value corresponding to received network data according to an F value calculation model, and generating a cost value matrix, wherein the network data comprises marked network data and unmarked network data; step 2, constructing a hypergraph corresponding to the network data according to the network data; step 3, calculating a prediction class mark corresponding to the network data according to the cost value matrix and the hypergraph; and step 4, detecting the unmarked network data in the network data according to the prediction class mark. This technical scheme improves the detection rate on imbalanced data, uses the F value evaluation index to optimize the misclassification cost value, and improves the accuracy and reliability of network abnormal data detection.
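Step 2 can be made concrete with a small sketch of an H matrix. The actual formula is given only as an image in the claims, so everything here is an assumption: each sample is taken as the central point of one hyperedge containing its k nearest neighbours, the distance is Euclidean, and the entry for a point on the hyperedge is exp(-d(v_i, v_central) / (a * d_mean)), where d_mean is the mean pairwise distance and a is the adjusting parameter. The name `build_incidence` and the toy points are hypothetical.

```python
import math
from typing import List, Sequence


def euclidean(p: Sequence[float], q: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))


def build_incidence(
    points: List[Sequence[float]], k: int = 1, a: float = 1.0
) -> List[List[float]]:
    """Return H with rows = vertices and columns = hyperedges.

    Hyperedge c is centred on vertex c and holds its k nearest neighbours.
    The exponential weighting is an assumed form, not the patent's formula.
    """
    n = len(points)
    dist = [[euclidean(p, q) for q in points] for p in points]
    # Mean distance over all distinct pairs of points.
    pairs = [dist[i][j] for i in range(n) for j in range(i + 1, n)]
    d_mean = sum(pairs) / len(pairs)
    H = [[0.0] * n for _ in range(n)]
    for c in range(n):
        # The centre itself (distance 0) plus its k nearest neighbours.
        members = sorted(range(n), key=lambda i: dist[i][c])[: k + 1]
        for i in members:
            H[i][c] = math.exp(-dist[i][c] / (a * d_mean))
    return H


# Toy data (assumed): two well-separated pairs of 2-D feature vectors.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
H = build_incidence(points, k=1, a=1.0)
for row in H:
    print([round(v, 3) for v in row])
```

Under these assumptions the central point of each hyperedge gets weight 1, points on the hyperedge get a weight in (0, 1) that decays with distance, and points outside the hyperedge get 0.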
The steps in the present application may be reordered, combined, and omitted according to actual requirements.
The units in the device can be merged, divided and deleted according to actual requirements.
Although the present application has been disclosed in detail with reference to the accompanying drawings, it is to be understood that this description is merely illustrative and does not limit the present application. The scope of the present application is defined by the appended claims and may include various modifications, adaptations, and equivalents of the invention without departing from the scope and spirit of the application.

Claims (8)

1. A network attack detection method based on F value optimization is characterized by comprising the following steps:
step 1, calculating the misclassification cost value corresponding to the received network data according to an F value calculation model, wherein the F value calculation model comprises a two-classification F value calculation formula and a multi-classification F value calculation formula, and the two-classification F value calculation formula is as follows:
Figure FDA0002557559550000011
the calculation formula of the F value of the multi-classification is as follows:
Figure FDA0002557559550000012
in the formula, $P_1$ is the marginal probability of network data of class 1, and $e_1$, $e_2$ and $e_{2k-1}$ are respectively the 1st, 2nd and (2k-1)-th entries of the error allocation $e(h)$,
the network data comprises marked network data and unmarked network data, and the calculation formula of the misclassification cost value is as follows:
Figure FDA0002557559550000013
Figure FDA0002557559550000014
in the formula, $F_\beta$ is the two-classification F value calculation formula, $mcF_\beta$ is the multi-classification F value calculation formula, r is a uniformly distributed distribution parameter, and β is an adjusting parameter,
for each uniformly distributed distribution parameter r, calculating a corresponding misclassification cost value by using the calculation formula of the misclassification cost value, and generating a cost value matrix $\gamma \in \mathbb{R}^{n \times n}$, wherein the cost value matrix γ is a diagonal matrix, n is the total number of the network data, and the calculation formula of the cost value matrix γ is as follows:
for two-class network data:
Figure FDA0002557559550000015
for multi-class network data:
Figure FDA0002557559550000021
step 2, constructing a hypergraph corresponding to the network data according to the network data, wherein the connection relation of the hypergraph is described by an H matrix, and the calculation formula of the H matrix is as follows:
Figure FDA0002557559550000022
in the formula, $v_{central}$ is the central point of the hypergraph,
Figure FDA0002557559550000023
is the mean value of the distances between the points in the hypergraph, $d(v_i, v_{central})$ is the distance between the point $v_i$ on the hyperedge $e_p$ and the central point $v_{central}$, and $a$ is an adjusting parameter;
step 3, calculating a prediction class mark corresponding to the network data according to the cost value matrix and the hypergraph;
and step 4, detecting the unmarked network data in the network data according to the prediction class mark.
2. The method according to claim 1, wherein the distribution parameter r is a uniformly distributed parameter sequentially selected from [0.2,0.4,0.6,0.8], and the value of the adjustment parameter β is 1.
3. The method for detecting network attack based on F value optimization according to claim 1, wherein the step 3 specifically includes:
step 31, according to the hypergraph, performing Laplace regularization transformation to generate a type matrix;
step 32, constructing a cost-sensitive hypergraph learning model according to the type matrix and the cost value matrix;
and step 33, calculating the prediction class mark F according to the cost-sensitive hypergraph learning model.
4. The method for detecting network attack based on F value optimization according to claim 1, wherein the step 4 specifically includes:
step 41, evaluating the prediction class mark against the marked network data in the network data to generate a detection score;
step 42, selecting the prediction class mark with the highest detection score and recording it as the abnormal data detection model;
and step 43, detecting the unmarked network data in the network data according to the abnormal data detection model.
5. A network attack detection system based on F value optimization is characterized by comprising: a generation unit, a construction unit, a calculation unit and a detection unit;
the generation unit is used for calculating the misclassification cost value corresponding to the received network data according to an F value calculation model, wherein the F value calculation model comprises a two-classification F value calculation formula and a multi-classification F value calculation formula, and the two-classification F value calculation formula is as follows:
Figure FDA0002557559550000031
the calculation formula of the F value of the multi-classification is as follows:
Figure FDA0002557559550000032
in the formula, $P_1$ is the marginal probability of network data of class 1, and $e_1$, $e_2$ and $e_{2k-1}$ are respectively the 1st, 2nd and (2k-1)-th entries of the error allocation $e(h)$,
the network data comprises marked network data and unmarked network data, and the calculation formula of the misclassification cost value is as follows:
Figure FDA0002557559550000033
Figure FDA0002557559550000034
in the formula, $F_\beta$ is the two-classification F value calculation formula, $mcF_\beta$ is the multi-classification F value calculation formula, r is a uniformly distributed distribution parameter, and β is an adjusting parameter,
for each uniformly distributed distribution parameter r, calculating a corresponding misclassification cost value by using the calculation formula of the misclassification cost value, and generating a cost value matrix $\gamma \in \mathbb{R}^{n \times n}$, wherein the cost value matrix γ is a diagonal matrix, n is the total number of the network data, and the calculation formula of the cost value matrix γ is as follows:
for two-class network data:
Figure FDA0002557559550000041
for multi-class network data:
Figure FDA0002557559550000042
the construction unit is used for constructing a hypergraph corresponding to the network data according to the network data, wherein the connection relationship of the hypergraph is described by an H matrix, and the calculation formula of the H matrix is as follows:
Figure FDA0002557559550000043
in the formula, $v_{central}$ is the central point of the hypergraph,
Figure FDA0002557559550000044
is the mean value of the distances between the points in the hypergraph, $d(v_i, v_{central})$ is the distance between the point $v_i$ on the hyperedge $e_p$ and the central point $v_{central}$, and $a$ is an adjusting parameter;
the calculation unit is used for calculating a prediction class mark corresponding to the network data according to the cost value matrix and the hypergraph;
the detection unit is used for detecting the unmarked network data in the network data according to the prediction class mark.
6. The system according to claim 5, wherein the distribution parameter r is a uniformly distributed parameter sequentially selected from [0.2,0.4,0.6,0.8], and the value of the adjustment parameter β is 1.
7. The system according to claim 5, wherein the calculation unit specifically includes: a generation module, a construction module and a calculation module;
the generation module is used for carrying out Laplace regularization transformation according to the hypergraph to generate a type matrix;
the construction module is used for constructing a cost-sensitive hypergraph learning model according to the type matrix and the cost value matrix;
the calculation module is used for calculating the prediction class mark F according to the cost-sensitive hypergraph learning model.
8. The system according to claim 5, wherein the detection unit specifically includes: a scoring module, a marking module and a detection module;
the scoring module is used for evaluating the prediction class mark against the marked network data in the network data to generate a detection score;
the marking module is used for selecting the prediction class mark with the highest detection score and recording it as the abnormal data detection model;
the detection module is used for detecting the unmarked network data in the network data according to the abnormal data detection model.
CN201910183415.XA 2019-03-12 2019-03-12 Network attack detection method and system based on F value optimization Active CN109951468B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910183415.XA CN109951468B (en) 2019-03-12 2019-03-12 Network attack detection method and system based on F value optimization

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910183415.XA CN109951468B (en) 2019-03-12 2019-03-12 Network attack detection method and system based on F value optimization

Publications (2)

Publication Number Publication Date
CN109951468A CN109951468A (en) 2019-06-28
CN109951468B true CN109951468B (en) 2020-08-28

Family

ID=67009501

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910183415.XA Active CN109951468B (en) 2019-03-12 2019-03-12 Network attack detection method and system based on F value optimization

Country Status (1)

Country Link
CN (1) CN109951468B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant