CN114547608A - Network security situation assessment method based on noise reduction self-coding kernel density estimation - Google Patents

Info

Publication number
CN114547608A
Authority
CN
China
Prior art keywords
network
data
situation
noise reduction
coding
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210108654.0A
Other languages
Chinese (zh)
Inventor
杜秀丽
陶帆
宋林凯
吕亚娜
邱少明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dalian University
Original Assignee
Dalian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55 Detecting local intrusion or implementing counter-measures
    • G06F21/56 Computer malware detection or handling, e.g. anti-virus arrangements
    • G06F21/566 Dynamic detection, i.e. detection performed at run-time, e.g. emulation, suspicious activities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/57 Certifying or maintaining trusted computer platforms, e.g. secure boots or power-downs, version controls, system software checks, secure updates or assessing vulnerabilities
    • G06F21/577 Assessing vulnerabilities and evaluating computer system security
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Security & Cryptography (AREA)
  • Software Systems (AREA)
  • Computer Hardware Design (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Virology (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention discloses a network security situation assessment method based on noise reduction self-coding kernel density estimation, belonging to the field of computer network security. The method comprises the following steps: acquiring network flow data as situation data; preprocessing the situation data; dividing the preprocessed situation data into training set data and test set data according to a proportion; constructing a noise reduction self-coding network and a kernel density estimation model based on the training set data; sequentially inputting the test set data into the noise reduction self-coding network and the kernel density estimation model to obtain the threat occurrence probability of the network flow data; and performing security evaluation on the network situation based on the threat occurrence probability to determine the level of the network security situation. The method uses the ability of the noise reduction self-coding network to handle redundant information and learn nonlinear features to reduce the dimensionality of the network situation data and extract hidden situation features, and then, exploiting the advantage of non-parametric estimation, applies kernel density estimation to the hidden features to obtain the threat occurrence probability.

Description

Network security situation assessment method based on noise reduction self-coding kernel density estimation
Technical Field
The invention relates to the field of computer network security, and in particular to a network security situation assessment method based on noise reduction self-coding kernel density estimation.
Background
The rapid growth of network space provides convenience and benefits to people, but also presents network security challenges to people. Although security measures such as firewalls and intrusion detection systems have been deployed when designing network architectures to detect and prevent attacks on the network, a large number of alarms and false positives are often generated. Therefore, it is necessary to develop an effective network security situation assessment method to analyze and quantitatively assess the security situation of the network system, fully understand the threat of the network system, provide a visual understanding of the network security situation, and assist the network administrator in making decisions.
Network situation awareness is a key technology in the field of information security for moving from passive defense to active awareness; the concept derives from battlefield situation awareness. According to the situation awareness model proposed by Base, situation assessment is an indispensable part of this research. At present, the theories most often applied to network security situation assessment include fuzzy theory, evidence theory, Markov models and Bayesian networks. The situation elements processed by these models are mainly alarm logs generated by security tools. They perform well in small and medium-scale networks, but limitations remain. For example, the basic probability assignment in evidence theory depends on expert experience, so the quality of the assessment is uneven; Markov and Bayesian methods, which rely on probability theory and knowledge inference, require considerable prior knowledge, so the evaluation cost is high and the efficiency of situation assessment is difficult to improve. These limitations become increasingly evident in large-scale network environments. Neural networks, which solve complex problems through nonlinear mapping, are widely applied in many fields and have therefore attracted considerable attention in situation awareness.
Xie Lixia et al. [1-2] first proposed using a BP neural network for network security situation assessment, and then optimized the weights of the BP neural network with a cuckoo optimization algorithm to obtain an improved situation assessment model. The experimental results show that the improved method converges faster and evaluates better than the traditional BP neural network. Han [3] addressed the problem that the dimensionality of traditional neural networks grows exponentially, which increases the amount of computation and makes them unsuitable for large-scale complex networks, by proposing a convolutional-neural-network-based method for quantitatively evaluating the security of wireless interconnected intelligent robot group networks, with an accuracy of 95%. Reference [4], noting that labeled situation data are difficult to obtain in practice, proposed a situation assessment method based on a deep self-coding network to avoid the BP neural network's reliance on labels for training; the situation data are trained in a semi-supervised manner to establish a situation assessment model. Experiments show that the root mean square error of this method is clearly smaller than that of a BP neural network, but experts take part in the evaluation process and the situation assessment results are not analyzed. The Yang Hongyu team [5-6] took network flow as the main situation element and applied self-encoder variants to network situation awareness twice. First, a variational self-encoder was combined with a generative adversarial network to establish a threat test model and evaluate the network security threat situation, but the threat test model is complex and places high demands on hardware. Then, a deep self-encoder model was proposed to perform two-class and five-class classification of network anomaly types; the model achieves high classification accuracy, but the dataset used is too old to suit today's complex network environment.
Traditional flow analysis shows that all network events are reflected in the flow, and that normal and abnormal network flows differ noticeably in behavior, so flow analysis can be used to evaluate the network state [7]. Related research on intrusion detection shows that abnormal events occur rarely, so normal and abnormal samples in a real network are unevenly distributed. Models based on supervised learning require the network flow to be labeled first, which is time-consuming and reduces model efficiency.
To avoid the drawbacks of supervised models, the invention provides an unsupervised network security situation assessment method based on noise reduction self-coding kernel density estimation. A self-encoder reduces the dimensionality of high-dimensional network situation data and extracts hidden features, but its output may be a simple copy of its input, in which case the model loses effectiveness. In addition, because current network situation data are high-dimensional and nonlinear, the accuracy of traditional network security situation assessment methods is low.
Disclosure of Invention
To address the problem that the accuracy of traditional network security situation assessment methods is low because current network situation data are high-dimensional and nonlinear, the invention discloses a network security situation assessment method based on noise reduction self-coding kernel density estimation, comprising the following steps:
acquiring network flow data as situation data;
preprocessing the situation data; dividing the preprocessed situation data into training set data and test set data according to a proportion;
constructing a noise reduction self-coding network and a kernel density estimation model based on the training set data;
sequentially inputting the test set data into the noise reduction self-coding network and the kernel density estimation model to obtain the threat occurrence probability of the network flow data;
and performing security evaluation on the network situation based on the threat occurrence probability of the network flow data, and determining the level of the network security situation.
Further, preprocessing the situation data comprises the following steps: removing repeated feature columns from the situation data, removing samples whose fields contain infinite or null values, and then performing normalization.
Further, the process of constructing the noise reduction self-coding network and the kernel density estimation model based on the training set data is as follows: the training set data are input into the noise reduction self-coding network to train it and obtain the hidden layer features of the network flow data; the hidden layer features are then input into the kernel density model to fit the kernel density estimate, thereby constructing the noise reduction self-coding network and the kernel density estimation model. The specific process comprises the following steps:
Step 1: setting the maximum number of training iterations;
Step 2: initializing the parameters of the noise reduction self-coding network;
Step 3: inputting the training set data into the noise reduction self-coding network, where the input data are processed in sequence as follows:
Figure BDA0003494666860000031
Figure BDA0003494666860000032
y=g(w'h+b') (3)
wherein: x, h and y respectively represent the input layer data, the hidden layer data and the output data; q_D represents noise in [0, 1], and a is the noise factor; f and g respectively represent the nonlinear activation functions of the encoding and decoding processes; w and w' are weight parameters, and b and b' are biases;
Step 4: calculating the reconstruction error using the reconstruction error formula;
Step 5: minimizing the reconstruction error by adjusting the weight and bias parameters;
Step 6: judging whether the training count k is larger than the set maximum number of training iterations; if so, training of the noise reduction self-coding network is finished; otherwise, the weight and bias parameters are adjusted and the process returns to Step 3;
Step 7: inputting the training data into the trained noise reduction self-coding network again and obtaining the hidden layer feature data;
Step 8: selecting a Gaussian function as the kernel function of the kernel density estimation model and setting its window width;
Step 9: taking the hidden layer feature data as input, constructing the kernel density estimation model through the kernel density estimation formula to obtain the probability density distribution of the training data.
Further, the process of sequentially inputting the test set data into the noise reduction self-coding network and the kernel density estimation model to obtain the threat occurrence probability of the network flow data is as follows:
Step 1: Input the test data
Figure BDA0003494666860000033
into the trained noise reduction self-coding network and obtain the hidden layer features of the test data
Figure BDA0003494666860000034
Step 2: Input the hidden layer features of the test data into the constructed kernel density estimation model to calculate the density value
Figure BDA0003494666860000035
Step 3: Through
Figure BDA0003494666860000041
derive the abnormal value of the test data
Figure BDA0003494666860000042
A smaller abnormal value indicates that the sample is more likely to be anomalous;
Step 4: Invert the abnormal value and normalize it to [0, 1]; the inverted and normalized value is taken as the threat occurrence probability.
Further, the process of performing security assessment on the network situation based on the threat occurrence probability of the network flow data and determining the level of the network security situation is as follows:
based on the threat occurrence probability of the network flow data, scoring the confidentiality C, integrity I and availability A of the network equipment according to the Common Vulnerability Scoring System (CVSS), and quantifying the threat impact value;
quantifying the network security situation value based on the threat occurrence probability and the threat impact value of the network flow data;
normalizing the network security situation values, and dividing the normalized values into 5 levels: excellent, good, medium, poor and dangerous.
Further, the threat impact value is calculated as follows:
Figure BDA0003494666860000043
wherein: E_i is the threat impact value, and i is the network flow sequence number; C: confidentiality, I: integrity, A: availability.
A network security situation assessment system based on noise reduction self-coding kernel density estimation, comprising:
an acquisition module, used for acquiring network flow data as situation data;
a preprocessing module, used for preprocessing the situation data and dividing the preprocessed situation data into training set data, verification set data and test set data according to a proportion;
a training module, used for constructing a noise reduction self-coding network and a kernel density estimation model based on the training set data;
an estimation module, used for sequentially inputting the test set data into the noise reduction self-coding network and the kernel density estimation model to obtain the threat occurrence probability of the network flow data;
a determination module, used for performing security assessment on the network situation based on the threat occurrence probability of the network flow data and determining the level of the network security situation.
By adopting the above technical scheme, the network security situation assessment method based on noise reduction self-coding kernel density estimation overcomes the limitation that traditional quantitative assessment methods based on supervised feature learning depend on data labels for modeling. It uses the ability of the noise reduction self-coding network to handle redundant information and learn nonlinear features to reduce the dimensionality of the network situation data and extract latent situation features, and, exploiting the advantage of non-parametric estimation, applies kernel density estimation to the latent features to obtain the threat occurrence probability. The method takes network flow as the basic situation element for network situation assessment. The application provides an unsupervised network security situation assessment method in which a noise reduction self-encoder suppresses the copying tendency of the traditional self-encoder and is combined with kernel density estimation to form noise reduction self-coding kernel density estimation; the improved model is more sensitive to abnormal network situations. The security situation value is quantified according to the threat occurrence probability, and the network security situation is assessed quantitatively. A real network flow dataset was used to analyze and assess the situation of a network environment containing attack behavior. The analysis shows that the accuracy and the recall rate of the proposed model are improved by 3.51% and 5.99% respectively, which improves the accuracy of the assessment.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments described in the present application, and other drawings can be obtained by those skilled in the art without creative efforts.
FIG. 1 is a flow chart of the present invention;
FIG. 2(a) is a process diagram of the noise reduction self-coding kernel density estimation model of the present invention; FIG. 2(b) is a training and testing diagram of the noise reduction self-coding kernel density estimation model;
FIG. 3 is a diagram of a noise reduction self-coding network architecture according to the present invention;
FIG. 4 is a ROC-AUC plot of the OCDAE-KDE model on the test set of the present invention;
FIG. 5 is a diagram of the network security situation values of the Friday test set according to the present invention.
Detailed Description
In order to make the technical solutions and advantages of the present invention clearer, the following describes the technical solutions in the embodiments of the present invention clearly and completely with reference to the drawings in the embodiments of the present invention:
FIG. 1 is a flow chart of the present invention. A network security situation assessment method based on noise reduction self-coding kernel density estimation comprises the following steps:
Situation acquisition stage: acquiring network flow data as situation data; preprocessing the situation data; dividing the preprocessed situation data into training set data, verification set data and test set data according to a proportion. In the situation acquisition stage, the CICIDS-2017 dataset is selected as the research object;
Situation understanding stage: constructing a noise reduction self-coding network and a kernel density estimation model based on the training set data;
Situation evaluation stage: sequentially inputting the test set data into the noise reduction self-coding network and the kernel density estimation model to obtain the threat occurrence probability of the network flow data. The noise reduction self-coding network reduces the dimensionality of the network flow situation data by handling redundant information and learning nonlinear features; it mines the hidden features, represents them in the hidden layer and removes redundant information. Then, exploiting the advantage of non-parametric estimation, the kernel density model performs density probability estimation on the hidden features to obtain the threat occurrence probability;
performing security evaluation on the network situation based on the threat occurrence probability of the network flow data, and determining the level of the network security situation.
Further, preprocessing the situation data comprises the following steps: using the pandas library, removing repeated feature columns and samples containing infinite (Infinity) or null (NaN) values from the situation data, and then performing normalization.
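The preprocessing step can be illustrated with a minimal sketch. It uses only the Python standard library rather than pandas, and it assumes min-max scaling as the normalization, since the text does not name a specific scheme. The function name `preprocess` and the sample `flows` are hypothetical.

```python
import math

def preprocess(rows):
    """rows: list of equal-length lists of floats, one row per network flow."""
    # 1. Drop samples that contain infinite or NaN values.
    clean = [r for r in rows if all(math.isfinite(v) for v in r)]
    if not clean:
        return []
    # 2. Drop repeated feature columns, keeping the first occurrence.
    seen, keep = set(), []
    for j in range(len(clean[0])):
        col = tuple(r[j] for r in clean)
        if col not in seen:
            seen.add(col)
            keep.append(j)
    # 3. Min-max normalize each remaining column to [0, 1] (assumed scheme).
    lo = [min(r[j] for r in clean) for j in keep]
    hi = [max(r[j] for r in clean) for j in keep]
    return [
        [(r[j] - l) / (h - l) if h > l else 0.0 for j, l, h in zip(keep, lo, hi)]
        for r in clean
    ]

# Hypothetical flows: column 1 duplicates column 0; two rows hold inf / NaN.
flows = [
    [1.0, 1.0, 10.0],
    [2.0, 2.0, float("inf")],
    [3.0, 3.0, 30.0],
    [5.0, 5.0, float("nan")],
    [2.0, 2.0, 20.0],
]
cleaned = preprocess(flows)
```

Invalid rows are dropped before duplicate columns are detected so that NaN values do not interfere with the column comparison; the text does not specify an order, so this is a design choice of the sketch.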
Further, FIG. 2(a) is a process diagram of the noise reduction self-coding kernel density estimation model of the present invention; FIG. 2(b) is a training and testing diagram of the noise reduction self-coding kernel density estimation model; FIG. 3 is a structure diagram of the noise reduction self-coding network of the present invention.
The process of constructing the noise reduction self-coding network and the kernel density estimation model based on the training set data is as follows: the training set data are input into the noise reduction self-coding network to train it and obtain the hidden layer features of the network flow data; the hidden layer features are then input into the kernel density model to fit the kernel density estimate, thereby constructing the noise reduction self-coding network and the kernel density estimation model. The specific process comprises the following steps:
Step 1: setting the maximum number of training iterations;
Step 2: initializing the parameters of the noise reduction self-coding network;
Step 3: inputting the training data into the noise reduction self-coding network, where the input data are processed in sequence as follows:
Figure BDA0003494666860000061
Figure BDA0003494666860000062
y=g(w'h+b') (3)
wherein: x, h and y respectively represent the input layer data, the hidden layer data and the output data; q_D represents noise in [0, 1], and a is the noise factor; f and g respectively represent the nonlinear activation functions of the encoding and decoding processes; w and w' are weight parameters, and b and b' are biases;
Step 4: obtaining the reconstruction error using the reconstruction error formula:
J_DAE(θ) = ∑ L(x, y) (4)
wherein: J_DAE(θ) represents the overall error, and L represents the reconstruction error of each sample;
Step 5: minimizing the reconstruction error using an Adagrad optimizer;
Step 6: judging whether the training count k is larger than the set maximum number of training iterations; if so, training of the noise reduction self-coding network is finished; otherwise, the weight and bias parameters are adjusted and the process returns to Step 3;
Step 7: selecting a Gaussian function as the kernel function of the kernel density estimation model and setting the value of its window width z; the Gaussian function is as follows:
Figure BDA0003494666860000071
wherein: kz(x) Representing a gaussian function; x represents a random variable;
and 8: constructing a kernel density estimation model by using hidden layer characteristic data as input through a kernel density estimation formula to obtain probability density distribution of training data
Figure BDA0003494666860000072
The kernel density estimation formula is as follows:
Figure BDA0003494666860000073
wherein: h is hidden layer data, i is an input sample serial number, and n is the total number of input samples;
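The training loop described in Steps 1 to 6 can be sketched in plain Python as follows. This is an illustrative sketch, not the patented implementation: the corruption step assumes additive uniform noise on [0, 1] scaled by the noise factor (the exact form of Equation (1) is not recoverable from the text), the per-sample error L is taken to be the squared error, and the class name `DAE`, the helper `train` and the toy `data` are hypothetical. Sigmoid activations, the Adagrad update and a noise factor of 0.4 follow the description.

```python
import math
import random

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

class DAE:
    def __init__(self, n_in, n_hid, noise=0.4, lr=0.01, seed=0):
        rng = random.Random(seed)
        self.rng, self.noise, self.lr = rng, noise, lr
        # Step 2: initialize weights w, w' and biases b, b'.
        self.w = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hid)]
        self.b = [0.0] * n_hid
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hid)] for _ in range(n_in)]
        self.b2 = [0.0] * n_in
        # Adagrad state: running sum of squared gradients per parameter.
        self.gw = [[0.0] * n_in for _ in range(n_hid)]
        self.gb = [0.0] * n_hid
        self.gw2 = [[0.0] * n_hid for _ in range(n_in)]
        self.gb2 = [0.0] * n_in

    def encode(self, x):
        return [sigmoid(sum(wi * xi for wi, xi in zip(row, x)) + bj)
                for row, bj in zip(self.w, self.b)]

    def decode(self, h):
        return [sigmoid(sum(wi * hi for wi, hi in zip(row, h)) + bk)
                for row, bk in zip(self.w2, self.b2)]

    def _adagrad(self, grad, acc):
        acc += grad * grad
        return self.lr * grad / (math.sqrt(acc) + 1e-8), acc

    def train_step(self, x):
        # Assumed corruption: additive uniform noise scaled by the noise factor.
        x_noisy = [xi + self.noise * self.rng.random() for xi in x]
        h = self.encode(x_noisy)   # hidden layer data
        y = self.decode(h)         # Eq. (3): y = g(w'h + b')
        loss = sum((yi - xi) ** 2 for yi, xi in zip(y, x))  # error vs. clean input
        # Backpropagate through both sigmoid layers (gradients use current weights).
        d_out = [2 * (yi - xi) * yi * (1 - yi) for yi, xi in zip(y, x)]
        d_hid = [hj * (1 - hj) * sum(d_out[k] * self.w2[k][j] for k in range(len(x)))
                 for j, hj in enumerate(h)]
        for k, dk in enumerate(d_out):
            for j, hj in enumerate(h):
                step, self.gw2[k][j] = self._adagrad(dk * hj, self.gw2[k][j])
                self.w2[k][j] -= step
            step, self.gb2[k] = self._adagrad(dk, self.gb2[k])
            self.b2[k] -= step
        for j, dj in enumerate(d_hid):
            for i, xi in enumerate(x_noisy):
                step, self.gw[j][i] = self._adagrad(dj * xi, self.gw[j][i])
                self.w[j][i] -= step
            step, self.gb[j] = self._adagrad(dj, self.gb[j])
            self.b[j] -= step
        return loss

def train(model, data, max_epochs):
    history = []
    for _ in range(max_epochs):  # Step 6: stop at the maximum training count
        history.append(sum(model.train_step(x) for x in data))  # Eq. (4)
    return history

# Toy, already-normalized data (hypothetical values).
data = [[0.1, 0.9, 0.1, 0.9], [0.2, 0.8, 0.2, 0.8], [0.9, 0.1, 0.9, 0.1]]
model = DAE(n_in=4, n_hid=2)
history = train(model, data, max_epochs=200)
features = [model.encode(x) for x in data]  # hidden layer features for the KDE
```

The hidden layer features are extracted from the clean input after training, so the corruption is applied only during training.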
In the testing stage, the test data are passed through the constructed noise reduction self-coding kernel density estimation model to obtain the threat occurrence probability. The specific steps are as follows:
Step 1: Input the test data
Figure BDA0003494666860000074
into the trained noise reduction self-coding network and obtain the hidden layer features of the test data
Figure BDA0003494666860000075
Step 2: Input the hidden layer features of the test data into the constructed kernel density estimation model to calculate the density value
Figure BDA0003494666860000076
Step 3: Calculate the abnormal value of the hidden layer of the test data to obtain the abnormal value of the test data. The smaller the abnormal value, the more likely the sample is anomalous; the larger the value, the more closely the sample matches the distribution of normal samples.
The abnormal value of the hidden layer is calculated as follows:
Figure BDA0003494666860000077
where
Figure BDA0003494666860000078
is the abnormal value of the hidden layer of the test data;
Step 4: Invert
Figure BDA0003494666860000079
and normalize it to [0, 1]; the inverted and normalized abnormal value of the hidden layer of the test data is taken as the threat occurrence probability.
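The testing steps can be sketched as follows. Several details are assumptions of this sketch rather than statements from the text: the kernel is a product Gaussian with a single window width, the abnormal value is taken to be the log-density (so that lower values mean more anomalous, as Step 3 describes), and the inversion plus normalization is implemented as a min-max scaling of the negated abnormal value. All function names and feature values are hypothetical.

```python
import math

def gaussian_kde_density(train_feats, query, width):
    """Density at `query` from a Gaussian kernel of window width `width`."""
    d = len(query)
    norm = (math.sqrt(2.0 * math.pi) * width) ** d
    total = 0.0
    for h_i in train_feats:
        sq_dist = sum((q - t) ** 2 for q, t in zip(query, h_i))
        total += math.exp(-sq_dist / (2.0 * width ** 2))
    return total / (len(train_feats) * norm)

def threat_probabilities(train_feats, test_feats, width):
    # Abnormal value: log-density (assumed); lower means more anomalous.
    scores = [math.log(gaussian_kde_density(train_feats, h, width) + 1e-300)
              for h in test_feats]
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    # Invert and min-max normalize: the lowest density maps to probability 1.
    return [(hi - s) / (hi - lo) for s in scores]

# Hypothetical hidden layer features: training flows cluster near (0.1, 0.1).
train_feats = [[0.10, 0.10], [0.12, 0.09], [0.09, 0.11]]
test_feats = [[0.10, 0.10], [0.50, 0.50], [0.90, 0.90]]
probs = threat_probabilities(train_feats, test_feats, width=0.2)
```

Because the scaling is relative to the batch of test samples, the resulting probabilities rank flows within that batch; a fixed reference range would be needed to compare values across batches.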
Further, the process of performing security assessment on the network situation based on the threat occurrence probability of the network flow data and determining the level of the network security situation is as follows:
Based on the threat occurrence probability of the network flow data, the confidentiality C, integrity I and availability A of the network equipment are scored according to the Common Vulnerability Scoring System (CVSS), and the threat impact value is quantified;
the threat impact value is calculated as follows:
Figure BDA00034946668600000710
wherein: E_i is the threat impact value, and i is the network flow sequence number; C: confidentiality, I: integrity, A: availability.
The scores are shown in table 1:
TABLE 1 CIA evaluation Table
Figure BDA0003494666860000081
The network security situation value is quantified based on the threat occurrence probability and the threat impact value of the network flow data.
The network security situation value comprehensively considers the threat occurrence probability P_i and the threat impact value E_i. The network security situation value S_i of the i-th network flow is set as:
S_i = P_i E_i (9)
The network security situation values are normalized, and the network security situation is classified with reference to the national public emergency plan and the situation division standard of the national Internet emergency center. The normalized network security situation value is divided into 5 intervals: [0.00, 0.20], (0.20, 0.40], (0.40, 0.60], (0.60, 0.80] and (0.80, 1.00], which correspond to the 5 levels excellent, good, medium, poor and dangerous, respectively.
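Equation (9) and the five-level division can be sketched as follows. The threat impact values are passed in as inputs because the impact formula itself is not reproduced here, and min-max scaling is assumed for the normalization since the text does not specify one. The names `level_of` and `situation_levels` and the sample values are hypothetical.

```python
# Upper bounds of the five intervals and their level names, from the text.
LEVELS = [
    (0.20, "excellent"),
    (0.40, "good"),
    (0.60, "medium"),
    (0.80, "poor"),
    (1.00, "dangerous"),
]

def level_of(s):
    """Map a normalized situation value in [0, 1] to its level name."""
    if not 0.0 <= s <= 1.0:
        raise ValueError("situation value must lie in [0, 1]")
    for upper, name in LEVELS:
        if s <= upper:
            return name

def situation_levels(threat_probs, impact_values):
    # Eq. (9): S_i = P_i * E_i for each network flow i.
    raw = [p * e for p, e in zip(threat_probs, impact_values)]
    lo, hi = min(raw), max(raw)
    # Min-max normalization (assumed scheme).
    norm = [(s - lo) / (hi - lo) if hi > lo else 0.0 for s in raw]
    return [(s, level_of(s)) for s in norm]

# Hypothetical probabilities and impact values for three flows.
result = situation_levels([0.0, 0.5, 1.0], [2.0, 6.0, 10.0])
names = [name for _, name in result]
```

Each interval is closed on the right, so a normalized value of exactly 0.20 falls in the excellent level and 0.80 in the poor level.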
A network security situation assessment system based on noise reduction self-coding kernel density estimation, comprising:
an acquisition module, used for acquiring network flow data as situation data;
a preprocessing module, used for preprocessing the situation data and dividing the preprocessed situation data into training set data, verification set data and test set data according to a proportion;
a training module, used for constructing a noise reduction self-coding network and a kernel density estimation model based on the training set data;
an estimation module, used for sequentially inputting the test set data into the noise reduction self-coding network and the kernel density estimation model to obtain the threat occurrence probability of the network flow data;
a determination module, used for performing security assessment on the network situation based on the threat occurrence probability of the network flow data and determining the level of the network security situation.
The computer environment used for the experiment was an Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz with 8.00GB RAM; the simulation language was Python 3.6 with TensorFlow 1.10, and the development environment was PyCharm Community Edition 2020.2.2 x64. The network traffic data come from the CICIDS-2017 dataset. The proportions of abnormal data relative to normal data from Tuesday to Friday are 3.2%, 4.9%, 0.49% and 69.7%, so the sample distribution is highly unbalanced. 15% of the Monday data is selected as the training set and the Tuesday to Friday data as the test sets, where the Tuesday to Friday data is 10 times the size of the training set and is selected according to the original proportions. The data volume of the final test sets is shown in Table 2.
Table 2. Experimental data set
[Table image: Figure BDA0003494666860000091]
To test the effectiveness of the network security situation assessment method based on noise reduction self-coding kernel density estimation, the parameters of the noise reduction self-coding network were selected through extensive experiments as follows. The network has 78 input neurons and 78 output neurons. The number of hidden-layer neurons is determined to be 9 according to
[Formula image: Figure BDA0003494666860000092]
where m is the number of input neurons. The error function is the mean squared error, the optimizer is Adagrad, the activation functions of the encoder and the decoder are Sigmoid functions, the learning rate is 0.01, the number of iterations is 100, the batch size is 300, and the noise factor is 0.4. The kernel function of the kernel density estimation model is a Gaussian function, and its window width (bandwidth) is set according to
[Formula image: Figure BDA0003494666860000093]
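The network settings listed above (78 input and output neurons, 9 hidden neurons, Sigmoid activations, noise factor 0.4, mean squared error) can be illustrated with a single forward pass. This is a rough sketch under stated assumptions: the corruption step (additive Gaussian noise scaled by the noise factor and clipped to [0, 1]) is assumed, because the corruption equation appears only as an image; the random weights and the helper names `dense` and `forward` are hypothetical, and no training with Adagrad is shown.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dense(weights, bias, vec):
    """Fully connected layer with Sigmoid activation: sigmoid(W vec + b)."""
    return [
        sigmoid(sum(w * v for w, v in zip(row, vec)) + b)
        for row, b in zip(weights, bias)
    ]


def forward(x, w, b, w_out, b_out, noise_factor=0.4, rng=random):
    # ASSUMPTION: corrupt the input with Gaussian noise scaled by the
    # noise factor and clip back to [0, 1]; the source equation is an image.
    x_noisy = [min(1.0, max(0.0, v + noise_factor * rng.gauss(0.0, 1.0))) for v in x]
    h = dense(w, b, x_noisy)   # encoder: hidden-layer features
    y = dense(w_out, b_out, h)  # decoder: y = g(w'h + b')
    # The reconstruction error compares the output with the clean input.
    mse = sum((yi - xi) ** 2 for yi, xi in zip(y, x)) / len(x)
    return h, y, mse


rng = random.Random(0)
n_in, n_hidden = 78, 9
w = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_hidden)]
b = [0.0] * n_hidden
w_out = [[rng.uniform(-0.1, 0.1) for _ in range(n_hidden)] for _ in range(n_in)]
b_out = [0.0] * n_in
x = [rng.random() for _ in range(n_in)]  # one normalized sample

h, y, mse = forward(x, w, b, w_out, b_out, rng=rng)
```

The 9-dimensional vector `h` plays the role of the hidden-layer features that are later passed to the kernel density estimation model.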
Example 1: noise reduction self-coding network and verification of validity of kernel code estimation model
And (3) carrying out validity verification on the noise reduction self-coding network and the kernel code estimation model by using the test set, and selecting the effects of an ROC curve and an AUC value index evaluation model. The ROC curve is formed by connecting different points under different threshold settings, and the ROC curve of an ideal model should be as close to the upper left end as possible. The AUC values represent the size of the area under the ROC curve, with larger values indicating better performance of the model. FIG. 4 is a ROC-AUC plot of the OCDAE-KDE model on the test set of the present invention; the dashed line in the figure indicates that AUC is 0.5, which indicates that the model classification effect at this value is only a random guess, and is a boundary for judging whether the model is valid or not; the model with 0< AUC <0.5 shows that the effect is extremely poor, the prediction effect is not better than random guess, and the model has no meaning in real life; a model in the range of 0.5< AUC <1 is valid and performs better as the AUC value increases, while the model is a perfect classification model when AUC is 1. As can be seen from the figure, the AUC values of the model disclosed herein are all above 0.9 in the four-day test, which indicates that the noise-reduced self-coding nuclear density estimation model has not only effectiveness, but also excellent performance.
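To make the AUC interpretation above concrete, the sketch below computes AUC through its rank-statistic form: the probability that a randomly chosen abnormal sample receives a higher score than a randomly chosen normal one, with ties counted as one half. The function name `roc_auc` and the toy labels and scores are illustrative assumptions, not data from the experiments.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (abnormal, normal) pairs ranked correctly.

    labels: 1 for abnormal, 0 for normal.
    scores: higher means more likely abnormal.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half
    return wins / (len(pos) * len(neg))


perfect = roc_auc([0, 0, 1, 1], [0.1, 0.2, 0.8, 0.9])
partial = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
chance = roc_auc([0, 0, 1, 1], [0.5, 0.5, 0.5, 0.5])
```

The pairwise loop is quadratic in the number of samples, so it is only suitable for small illustrations; a sorted-rank formulation would be preferable on a dataset the size of CICIDS-2017.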
Example 2: noise reduction self-coding network and kernel code estimation model classification performance verification
And comparing the noise reduction self-coding kernel density estimation model with an self-coding model (OCAE), an noise reduction self-coding model (OCDAE) and an self-coding kernel density estimation model (OCAE-KDE), and evaluating indexes by using an Accuracy (Accuracy), Precision (Precision), Recall (Recall) and F1 value. . Wherein TP is true positive, which indicates that the true sample is a positive type, and the model classification result is also the positive type; FP is false positive, which indicates that the real sample is a negative class, and the model classification result is a positive class; TN is true negative, which means that the true sample is a negative class, and the model classification result is also a negative class; FN is false negative, which indicates that the true sample is positive, but the model classification result is negative. The detailed formula is shown as formula (3-6).
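$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{3}$$
$$\text{Precision} = \frac{TP}{TP + FP} \tag{4}$$
$$\text{Recall} = \frac{TP}{TP + FN} \tag{5}$$
$$F1 = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{6}$$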
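The four evaluation indices can be computed directly from the TP, FP, TN and FN counts using their standard definitions. The following is a small sketch; the function name `classification_metrics` and the counts in the usage example are made up for illustration and are not results from the patent.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return (accuracy, precision, recall, f1) from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    # Guard against empty denominators when a class is never predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1


# Illustrative counts only.
accuracy, precision, recall, f1 = classification_metrics(tp=8, fp=2, tn=85, fn=5)
```

With highly imbalanced data such as the Thursday traffic (0.49% abnormal), accuracy alone can look high even for a poor detector, which is why precision, recall and F1 are reported alongside it.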
The anomaly score thresholds of the four models must be determined before the four indices are calculated. OCDAE and OCAE use the mean reconstruction error on the training set plus three times its standard deviation as the anomaly threshold. For the two KDE-based models, the anomaly scores on the training set are sorted in ascending order, and the value at the 0.5th percentile is selected as the anomaly threshold. The evaluation index results are shown in Table 3. On all four days, the accuracy and F1 value of the proposed method are the highest of the four models, improving on OCAE by 3.51% and 5.99% respectively.
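The two threshold rules described above can be sketched as follows. Several details are assumptions: the population standard deviation is used for the three-sigma rule, the percentile is taken with the nearest-rank method, and the function names are hypothetical; the source does not specify these choices.

```python
import math
import statistics


def three_sigma_threshold(train_errors):
    """Mean reconstruction error plus three standard deviations.

    ASSUMPTION: population standard deviation (pstdev).
    """
    return statistics.mean(train_errors) + 3 * statistics.pstdev(train_errors)


def percentile_threshold(train_scores, percent=0.5):
    """Value at the given percentile of the ascending-sorted scores.

    ASSUMPTION: nearest-rank percentile definition.
    """
    ordered = sorted(train_scores)
    rank = max(1, math.ceil(percent * len(ordered) / 100))
    return ordered[rank - 1]


recon_threshold = three_sigma_threshold([1.0, 2.0, 3.0, 4.0, 5.0])
score_threshold = percentile_threshold(list(range(1000, 0, -1)), percent=0.5)
```

A reconstruction error above the first threshold would be flagged as abnormal, whereas for the density-based score a value below the second threshold would be flagged, since lower density indicates an anomaly.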
Table 3. Comparison of the evaluation index results of the four models
[Table images: Figure BDA0003494666860000105, Figure BDA0003494666860000111]
Example 3: situation assessment result analysis
Network security situation values are calculated and visualized for quantitative evaluation on network flows in attack time periods from tuesday to friday by using the OCDAE-KDE, the OCAE, the OCDAE and the OCAE-KDE respectively, wherein the network security situation values of 300 groups of tests in friday are shown in figure 5. As can be seen from fig. 5, the peak time indicates that an abnormal situation occurs, the network is threatened, the basic situation value of the OCAE model is low, the trend is relatively gentle, the change amplitude of the situation value is small at the attack time, and the OCDAE model inhibits the copy tendency due to the 'damage' operation on the input data, so that the evaluation effect is slightly good. The OCDAE-KDE model and the OCAE-KDE model are combined with the nuclear density model to carry out probability estimation on the hidden layer simplified characteristics, the characterization capability on the network threat is stronger, but the basic situation values of the method provided by the invention in the four models are higher than those of the other three models, and the method has strong sensitivity to the network abnormal situation.
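The probability estimation on hidden-layer features mentioned above, together with the inversion and normalization described in claim 4, can be sketched as below. This is an illustrative sketch only: taking the log of a Gaussian kernel density as the anomaly score is an assumption (the scoring formula appears only as an image), "inverting" is read as negation, min-max scaling is assumed for the normalization, and the bandwidth value and toy feature vectors are made up.

```python
import math


def log_density(point, train_points, bandwidth):
    """Log of a Gaussian kernel density estimate at `point`."""
    d = len(point)
    log_norm = (
        -math.log(len(train_points))
        - d * math.log(bandwidth)
        - 0.5 * d * math.log(2 * math.pi)
    )
    exponents = [
        -sum((a - b) ** 2 for a, b in zip(point, t)) / (2 * bandwidth ** 2)
        for t in train_points
    ]
    m = max(exponents)  # log-sum-exp trick for numerical stability
    return log_norm + m + math.log(sum(math.exp(e - m) for e in exponents))


def threat_probabilities(scores):
    """Negate the scores and min-max scale them to [0, 1].

    ASSUMPTION: negation plus min-max scaling; a low density score
    (anomaly) therefore maps to a probability close to 1.
    """
    inverted = [-s for s in scores]
    lo, hi = min(inverted), max(inverted)
    if hi == lo:
        return [0.0] * len(scores)
    return [(v - lo) / (hi - lo) for v in inverted]


# Toy hidden-layer features (2-D here for brevity; the experiments use 9).
train_features = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1]]
test_features = [[0.05, 0.05], [0.9, 0.9], [0.4, 0.4]]

scores = [log_density(p, train_features, bandwidth=0.2) for p in test_features]
probs = threat_probabilities(scores)
```

The test point closest to the training features gets probability 0, the farthest gets 1, and the intermediate point falls in between, matching the rule that a smaller density score indicates an anomaly.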
The above is only a preferred embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any equivalent substitution or change made by a person skilled in the art, within the technical scope disclosed by the present invention and according to its technical solutions and inventive concept, shall fall within the scope of protection of the present invention.
[1] Xie Lixia, Wang Yachao, Yu Jinbo. Network security situation awareness based on neural network [J]. Journal of Tsinghua University (Science and Technology), 2013, 53(12): 1750-1760.
[2] Xie Lixia, Wang Zhihua. Network security situation assessment method based on cuckoo search optimized back propagation neural network [J]. Journal of Computer Applications, 2017, 37(7): 1926-1930.
[3] Han WeiHong, Tian ZhiHong, Huang Zizhong, et al. Quantitative Assessment of Wireless Connected Intelligent Robot Swarms Network Security Situation [J]. IEEE Access, 2019, 7(99): 134293-134300.
[4] Zhang Yuchen, Zhang Renchuan, Liu Jing, Wang Yongwei. Network Security Situation Evaluation Using Deep Auto-Encoders Network [J]. Computer Engineering and Applications, 2020, 56(06): 92-98.
[5] Yang Hongyu, Zeng Renyan, Wang Fengyan, et al. An Unsupervised Learning-Based Network Threat Situation Assessment Model for Internet of Things [J]. Security and Communication Networks, 2020, 2020(9): 1-11.
[6] Yang Hongyu, Zeng Renyun. A deep learning network security situation assessment method [J]. Journal of Xidian University, 2021, 48(01): 183-190.
[7] Lakkaraju K, Yurcik W, Lee A J. NVisionIP: NetFlow visualizations of system state for security situational awareness [C] // Workshop on Visualization & Data Mining for Computer Security. 2004.

Claims (7)

1. A network security situation assessment method based on noise reduction self-coding kernel density estimation is characterized by comprising the following steps:
acquiring network flow data as situation data;
preprocessing the situation data; dividing the preprocessed situation data into training set data and test set data according to a proportion;
constructing a noise reduction self-coding network and a kernel density estimation model based on the training set data;
sequentially inputting the test set data into the noise reduction self-coding network and the kernel density estimation model to obtain the threat occurrence probability of the network flow data;
and performing security evaluation on the network situation based on the threat occurrence probability of the network flow data, and determining the level of the network security situation.
2. The method for evaluating the network security situation based on the noise reduction self-coding kernel density estimation according to claim 1, characterized in that: the preprocessing of the situation data comprises removing repeated feature columns from the situation data, removing samples whose fields contain infinite or null values, and then performing normalization.
3. The method for evaluating the network security situation based on the noise reduction self-coding kernel density estimation according to claim 1, characterized in that: the process of constructing the noise reduction self-coding network and the kernel density estimation model based on the training set data is as follows: the training set data is sequentially input into the noise reduction self-coding network to train it and obtain the hidden-layer features of the network flow data; the hidden-layer features of the network flow data are input into the kernel density model to train it, thereby constructing the noise reduction self-coding network and the kernel density estimation model; the specific process comprises the following steps:
step 1: setting the maximum training times;
step 2: initializing parameters of a noise reduction self-coding network;
step 3: inputting the training set data into the noise reduction self-coding network, which performs the following calculations on the input data in sequence:
[Equation image: Figure FDA0003494666850000011] (1)
[Equation image: Figure FDA0003494666850000012] (2)
y = g(w'h + b') (3)
where x, h and y denote the input-layer data, the hidden-layer data and the output data, respectively; q_D denotes the [0, 1] noise used to corrupt the input, and a is the noise factor; f and g denote the nonlinear activation functions of the encoding and decoding processes, respectively; w and w' are weight parameters, and b and b' are biases;
step 4: calculating the reconstruction error using the reconstruction error formula;
step 5: minimizing the reconstruction error and adjusting the weight and bias parameters;
step 6: judging whether the training count k is greater than the set maximum number of training iterations; if so, the training of the noise reduction self-coding network is finished; otherwise, adjusting the weight and bias parameters and returning to step 3;
step 7: inputting the training data into the trained noise reduction self-coding network again and acquiring the hidden-layer feature data;
step 8: selecting a Gaussian function as the kernel function of the kernel density estimation model and setting its window width;
step 9: taking the hidden-layer feature data as input and constructing the kernel density estimation model through the kernel density estimation formula to obtain the probability density distribution of the training data.
4. The method for evaluating the network security situation based on the noise reduction self-coding kernel density estimation according to claim 1, characterized in that: the process of sequentially inputting the test set data into the noise reduction self-coding network and the kernel density estimation model to obtain the threat occurrence probability of the network flow data is as follows:
step 1: inputting the test data into the trained noise reduction self-coding network and acquiring the hidden-layer features of the test data;
step 2: inputting the hidden-layer features of the test data into the constructed kernel density estimation model to calculate the density value;
step 3: deriving the anomaly score of the test data by means of
[Formula image: Figure FDA0003494666850000024]
where a smaller anomaly score indicates the presence of an anomaly;
step 4: inverting the anomaly score and normalizing it to [0, 1], and taking the inverted and normalized value as the threat occurrence probability.
5. The method for evaluating the network security situation based on the noise reduction self-coding kernel density estimation according to claim 1, characterized in that: the process of performing security assessment on the network situation based on the threat occurrence probability of the network flow data and determining the level of the network security situation is as follows:
based on the threat occurrence probability of the network flow data, scoring the confidentiality C, the integrity I and the availability A of the network equipment according to the Common Vulnerability Scoring System (CVSS), and quantifying a threat impact value;
quantifying a network security situation value based on the threat occurrence probability and the threat impact value of the network flow data;
normalizing the network security situation values and dividing the normalized values into 5 grades: excellent, good, medium, poor and dangerous.
6. The method for evaluating the network security situation based on the noise reduction self-coding kernel density estimation according to claim 5, characterized in that: the threat impact value is calculated as follows:
[Formula image: Figure FDA0003494666850000031]
where E_i is the threat impact value and i is the network flow sequence number; C is confidentiality, I is integrity, and A is availability.
7. A network security situation assessment system based on noise reduction self-coding kernel density estimation is characterized by comprising:
an acquisition module, used for acquiring network flow data as situation data;
a preprocessing module, used for preprocessing the situation data and dividing the preprocessed situation data into training set data and test set data according to a proportion;
a training module, used for constructing a noise reduction self-coding network and a kernel density estimation model based on the training set data;
an estimation module, used for sequentially inputting the test set data into the noise reduction self-coding network and the kernel density estimation model to obtain the threat occurrence probability of the network flow data;
a determination module, used for performing a security assessment of the network situation based on the threat occurrence probability of the network flow data and determining the network security situation level.
CN202210108654.0A 2022-01-28 2022-01-28 Network security situation assessment method based on noise reduction self-coding kernel density estimation Pending CN114547608A (en)


Publications (1)

Publication Number Publication Date
CN114547608A true CN114547608A (en) 2022-05-27

Family

ID=81674113


Country Status (1)

Country Link
CN (1) CN114547608A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115242544A (en) * 2022-08-05 2022-10-25 河北师范大学 Network security situation sensing method and system based on improved Res2net
CN116915459A (en) * 2023-07-13 2023-10-20 上海戎磐网络科技有限公司 Network threat analysis method based on large language model
CN116915459B (en) * 2023-07-13 2024-03-08 上海戎磐网络科技有限公司 Network threat analysis method based on large language model

Similar Documents

Publication Publication Date Title
Yang et al. A network security situation assessment method based on adversarial deep learning
CN112039903B (en) Network security situation assessment method based on deep self-coding neural network model
CN114547608A (en) Network security situation assessment method based on noise reduction self-coding kernel density estimation
CN116366376B (en) APT attack traceability graph analysis method
CN114528547A (en) ICPS (information storage and protection System) unsupervised online attack detection method and device based on community feature selection
Zhuo et al. Attack and defense: Adversarial security of data-driven FDC systems
Wang et al. Research on network security situation assessment and forecasting technology
CN112613032B (en) Host intrusion detection method and device based on system call sequence
CN111669410B (en) Industrial control network negative sample data generation method, device, server and medium
Coli et al. DDoS attacks detection in the IoT using deep gaussian-bernoulli restricted boltzmann machine
CN114118680A (en) Network security situation assessment method and system
Huo et al. Traffic anomaly detection method based on improved GRU and EFMS-Kmeans clustering
Kaiser et al. Attack Forecast and Prediction
Al-Nafjan et al. Intrusion detection using PCA based modular neural network
Cai et al. Machine learning-based threat identification of industrial internet
Albahar et al. THE USE OF FRACTAL DIMENSION (FD) ANALYSIS IN DETECTION OF ANOMALIES, SABOTAGES, AND MALICIOUS ACTS IN A CYBER-PHYSICAL SYSTEM USING HIGUCHI'S ALGORITHM.
Olson et al. Decision making with uncertainty and data mining
Deng et al. VFD-AE: Efficient Attack Detection in Industrial Cyber-Physical Systems using Vital Feature Discovery and Deep Learning Technique
Hussein et al. ANOMALY DETECTION IN CYBER-PHYSICAL SYSTEMS USING EXPLAINABLE ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING
Raut et al. Adaptive Neuro-Fuzzy Inference System For Anomaly-Based Intrusion Detection
Meng et al. Computer Network Security Evaluation Method Based on GABP Model
CN114884694B (en) Industrial control network security risk assessment method based on hierarchical modeling
Wang Identification of damage locations in long-span continuous rigid frame bridges by using support vector machines
CN117521042B (en) High-risk authorized user identification method based on ensemble learning
Liu Multivariate Network Intrusion Detection Methods Based on Machine Learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination