CN115203683A

CN115203683A - Network security internal threat detection method

Info

Publication number: CN115203683A
Application number: CN202210554047.7A
Authority: CN
Inventors: 陶晓玲; 卢深; 符廉铕; 余玥琳; 赵峰; 杨昌松
Original assignee: Guilin University of Electronic Technology
Current assignee: Guilin University of Electronic Technology
Priority date: 2022-05-20
Filing date: 2022-05-20
Publication date: 2022-10-18

Abstract

The invention relates to the technical field of network security anomaly detection, and in particular to a network security internal threat detection method. The method addresses three problems: the data distribution of internal threat detection data sets is unbalanced, existing anomaly-based threat detection cannot be associated with a specific threat scenario, and detection efficiency is low because sufficient labeled data is lacking. The network security internal threat detection method improves the detection performance of the model.

Description

Network security internal threat detection method

Technical Field

The invention relates to the technical field of network security anomaly detection, in particular to a network security internal threat detection method.

Background

In recent years, security incidents caused by malicious operations of internal users have occurred frequently. Because most internal users have access rights to the system, know the vulnerabilities of the internal network, and hold core data, internal attacks often cause more serious losses than external attacks.

Research on internal user behavior detection has produced some results, but several problems remain: the data distribution of the data sets is unbalanced, sufficient labeled data is lacking, and unknown threat behaviors are difficult to detect. As a result, detection performance is low.

Disclosure of Invention

The invention aims to provide a network security internal threat detection method that models the problem as few-shot learning, performs feature learning and classification detection on limited data with a Prototype network based on a DNN (deep neural network), and performs sample enhancement with a CWGAN (conditional Wasserstein generative adversarial network), so as to improve the detection performance of the model.

In order to achieve the above object, the present invention provides a method for detecting internal threats of network security, which comprises the following steps:

performing characteristic collection of original data to obtain a data set;

dividing the data set into a support set and a query set;

inputting data into a Prototype network for internal detection;

and performing data enhancement based on a CWGAN network, expanding the generated data into the query set, and completing the detection task.

In the process of collecting features from the original data to obtain the data set, behavior features are first extracted from the original data, and the behavior feature data is then preprocessed by missing-value handling and normalization.

The support set is the training set, the query set is the test set, and both the support set and the query set are K-way N-shot data sets.

In the process of inputting data into the Prototype network for internal detection, the data is passed through a mapping function in the Prototype network to obtain a feature vector for each sample, the feature vectors are used to compute the prototype representation of the corresponding class, and classification verification is then performed using the feature vectors of the query set and the computed prototypes.

In the process of CWGAN-based data enhancement, the class label of the prototype representation computed during internal detection is input as the constraint condition, the trained GAN network generates data that is as close as possible to the source input data, the generated data is expanded into the query set, and the class verification test is completed.

The parameters of the Prototype network and the CWGAN network are tuned by a genetic algorithm (GA).

In the process of inputting data into the Prototype network for internal detection, a DNN is used to learn the feature representation of the sample data.

The invention provides a network security internal threat detection method. To address the problems that the data distribution is unbalanced, that deep networks cannot effectively learn data features from limited data, and that detections cannot be associated with specific threat behaviors or scenarios, the method models the task as few-shot learning and uses a DNN-based Prototype network to perform feature learning and classification detection on limited data. To address the lack of sufficient labeled data, the method uses a CWGAN network for sample enhancement. A genetic algorithm (GA) is used for automatic parameter optimization, which improves the detection performance of the model.

Drawings

In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly described below. The drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.

FIG. 1 is a flow chart of the network security internal threat detection method of the present invention.

FIG. 2 is a schematic diagram of the network composition framework of the network security internal threat detection method of the present invention.

FIG. 3 is a schematic diagram of internal threat detection by the Prototype network of the present invention.

FIG. 4 is a schematic diagram of the objective function and network structure of the CWGAN network of the present invention.

FIG. 5 is a comparison of the evaluation indexes of the present invention and the other algorithms.

Detailed Description

Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like reference numerals refer to the same or similar elements or to elements having the same or similar functions throughout. The embodiments described below with reference to the drawings are illustrative, are intended to explain the invention, and are not to be construed as limiting the invention.

Referring to FIG. 1, the present invention provides a network security internal threat detection method comprising the following steps:

S1: performing characteristic collection of original data to obtain a data set;

S2: dividing the data set into a support set and a query set;

S3: inputting data into a Prototype network for internal detection;

S4: performing data enhancement based on a CWGAN network, expanding the generated data into the query set, and completing the detection task.

In the process of collecting features from the original data to obtain the data set, behavior features are first extracted from the original data, and the behavior feature data is then preprocessed by missing-value handling and normalization.

In the process of inputting data into the Prototype network for internal detection, the data is passed through a mapping function in the Prototype network to obtain a feature vector for each sample, the feature vectors are used to compute the prototype representation of the corresponding class, and classification verification is then performed using the feature vectors of the query set and the computed prototypes.

In the process of CWGAN-based data enhancement, the class label of the prototype representation computed during internal detection is input as the constraint condition, the trained GAN network generates data that is as close as possible to the source input data, the generated data is expanded into the query set, and the class verification test is completed.

The parameters of the Prototype network and the CWGAN network are tuned by a genetic algorithm (GA).

In the process of inputting data into the Prototype network for internal detection, a DNN is used to learn the feature representation of the sample data.

The network composition framework of the network security internal threat detection method is shown in FIG. 2.

The invention is further described below with reference to specific embodiments and implementations:

The present embodiment uses version v4.2 of the CMU-CERT data set for algorithm training and testing. The CERT data set defines malicious activities by generating various usage scenarios. The data set contains 32.77 million individual events, and the available audit data sources include logon activity, email traffic, web browsing traces, file access logs, USB drive usage, and LDAP information describing organizational hierarchies and user roles. Version v4.2 records the activity logs of 1000 employees of different organizations over 17 months.

1. Data processing

Based on the log information in the dataset, the CERT dataset defines three threat scenarios, which are as follows:

(1) Scenario 1: a user who has never previously used removable drives or worked after hours begins logging in after hours, using a removable drive, and uploading data to wikileaks. Shortly thereafter, the user leaves the organization.

(2) Scenario 2: the user starts to search for new employment opportunities through job hunting and recruitment websites. Before leaving the organization, the user uses a USB drive to leak confidential information, and uses the USB drive far more frequently than before.

(3) Scenario 3: the user downloads a keylogger and obtains a list of passwords of different employees in the organization. Next, the user transfers this password list to the supervisor's computer using a USB drive and attempts to find the supervisor's password. Upon success, the user logs in to the supervisor's computer and sends an email that causes panic within the organization. This type of malicious activity often comes from system administrators, especially when they are in conflict with their supervisors.

The method performs internal threat detection analysis based on these three threat scenarios and extracts relevant features from five log files of the data set: logon, device, file, email and http. The extracted features are listed in Table 1.

Table 1. Behavior features

Twenty behavior features are extracted from the five log files listed in the table. The feature data is then preprocessed, including missing-value handling and normalization, to reduce the complexity of network training.
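The preprocessing step can be sketched as follows. This is a minimal illustration only: the text does not say how missing values are filled or which normalization is used, so column-mean imputation and min-max scaling are assumptions, and the function names `impute_missing` and `min_max_normalize` are hypothetical.

```python
from statistics import mean


def impute_missing(rows):
    """Replace None entries with the mean of the observed values in that column.

    Assumption: mean imputation; the source only says missing values are handled.
    """
    n_cols = len(rows[0])
    filled = [list(r) for r in rows]
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        fill = mean(observed) if observed else 0.0
        for r in filled:
            if r[j] is None:
                r[j] = fill
    return filled


def min_max_normalize(rows):
    """Scale every column to [0, 1]; a constant column maps to 0.0."""
    n_cols = len(rows[0])
    lo = [min(r[j] for r in rows) for j in range(n_cols)]
    hi = [max(r[j] for r in rows) for j in range(n_cols)]
    return [
        [(r[j] - lo[j]) / (hi[j] - lo[j]) if hi[j] > lo[j] else 0.0
         for j in range(n_cols)]
        for r in rows
    ]


# Toy feature matrix: 3 users x 2 behavior features, with two missing values.
raw = [[2, 10.0], [None, 30.0], [6, None]]
features = min_max_normalize(impute_missing(raw))
```

Scaling all features to a common range keeps features with large raw counts from dominating the distance computations used later in the metric space.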

2. Internal threat detection based on Prototype network

After the feature collection of the raw data is complete, the data is divided into a support set and a query set:

(1) Support set: the training set. Assume a sample set $S = \{(x_1, y_1), \ldots, (x_N, y_N)\}$ containing N labeled samples, where $x_i$ denotes the sample data, $y_i \in \{1, \ldots, K\}$ denotes the class label of the sample, and $S_k$ denotes the set of samples of class k. Such a data set is called a K-way N-shot data set, i.e. a data set of K classes with N samples per class, K × N samples in total.

(2) Query set: the test set. Like the support set, it is a K-way N-shot data set, but unlike the support set, some of the data in the query set is unlabeled.
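The support/query split can be illustrated with a small episode sampler. This is a sketch under stated assumptions: the function name `sample_episode`, the `n_query` parameter (query samples per class) and the toy data are hypothetical and are not taken from the patent.

```python
import random
from collections import defaultdict


def sample_episode(samples, k_way, n_shot, n_query, seed=None):
    """Draw one K-way N-shot episode from a list of (features, label) pairs.

    Returns (support, query); the two sets share classes but not samples.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for x, y in samples:
        by_class[y].append(x)
    # Only classes with enough samples for both sets can be drawn.
    eligible = [c for c, xs in by_class.items() if len(xs) >= n_shot + n_query]
    classes = rng.sample(sorted(eligible), k_way)
    support, query = [], []
    for c in classes:
        picked = rng.sample(by_class[c], n_shot + n_query)
        support += [(x, c) for x in picked[:n_shot]]
        query += [(x, c) for x in picked[n_shot:]]
    return support, query


# Toy data: 4 classes with 10 distinct samples each.
toy = [([float(i), float(c)], c) for c in range(4) for i in range(10)]
support, query = sample_episode(toy, k_way=3, n_shot=5, n_query=2, seed=0)
```

With `k_way=3` and `n_shot=5`, the support set holds 3 × 5 = 15 samples, matching the K × N count given above.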

After the data set is divided into the support set and the query set, the prototype network trains an embedding function $f_\theta(x)$. This function is a neural network that represents how a sample x is projected into a metric space, and the parameter θ is learned by the neural network. Learning a better value of θ describes the distribution of samples in the metric space more accurately, that is, samples of the same class lie closer together and samples of different classes lie farther apart. The prototype representation of each class is then computed in the metric space as follows:

$$c_k = \frac{1}{|S_k|} \sum_{(x_i, y_i) \in S_k} f_\theta(x_i)$$

where $c_k$ is the prototype representation of class k, $S_k$ is the set of samples of class k, $|S_k|$ is the number of samples in class k, and $(x_i, y_i)$ are the feature vector and label of a sample.
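As a minimal sketch of the prototype computation, assume (as in standard prototypical networks) that the prototype of a class is the mean of the embedded support vectors of that class. The function name `class_prototypes` is hypothetical, and the inputs are taken to be already embedded by $f_\theta$.

```python
from collections import defaultdict


def class_prototypes(embedded_support):
    """Return {label: mean embedding} for a list of (embedding, label) pairs."""
    groups = defaultdict(list)
    for vec, label in embedded_support:
        groups[label].append(vec)
    return {
        # zip(*vecs) iterates over dimensions, so each prototype is a per-dimension mean.
        label: [sum(col) / len(vecs) for col in zip(*vecs)]
        for label, vecs in groups.items()
    }


# Toy embeddings: two samples of class 0 and one sample of class 1.
embedded = [([1.0, 0.0], 0), ([3.0, 2.0], 0), ([0.0, 4.0], 1)]
prototypes = class_prototypes(embedded)
```

Here class 0 receives the prototype `[2.0, 1.0]`, the midpoint of its two embedded samples, while class 1 keeps its single sample as its prototype.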

Using a softmax over distances, the distribution of distances between a sample in the query set and the prototypes is computed to determine which class prototype the sample belongs to, which completes the classification task. The distribution function is:

$$p_\theta(y = k \mid x) = \frac{\exp\left(-d\left(f_\theta(x), c_k\right)\right)}{\sum_{k'} \exp\left(-d\left(f_\theta(x), c_{k'}\right)\right)}$$

This formula gives the probability that sample x belongs to class k, where $d(\cdot,\cdot)$ is a distance function. The cosine similarity function is used as the distance function.
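The classification step can be sketched as a softmax over negative distances to each prototype. Two assumptions are made here: the distance is taken as one minus the cosine similarity (the text only says that cosine similarity serves as the distance function), and the names `cosine_distance` and `class_probabilities` are hypothetical.

```python
import math


def cosine_distance(u, v):
    """Assumed distance: 1 - cosine similarity (0 when the vectors are aligned)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm if norm > 0 else 1.0


def class_probabilities(query_vec, prototypes):
    """Softmax over negative distances from an embedded query to each prototype."""
    logits = {k: -cosine_distance(query_vec, c) for k, c in prototypes.items()}
    top = max(logits.values())  # subtract the maximum for numerical stability
    exps = {k: math.exp(v - top) for k, v in logits.items()}
    total = sum(exps.values())
    return {k: e / total for k, e in exps.items()}


prototypes = {0: [1.0, 0.0], 1: [0.0, 1.0]}
probs = class_probabilities([0.9, 0.1], prototypes)
predicted = max(probs, key=probs.get)
```

The query vector points almost along the class 0 prototype, so class 0 receives the larger probability and becomes the predicted class.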

Finally, the prototype network uses the negative log-probability of the distribution function as the loss function, as shown below. The loss is minimized by a gradient descent algorithm so that a good value of θ is learned after convergence. Once training is complete, the function $f_\theta(x)$ can be considered to map samples of the same class to locations that are close to each other.

$$J(\theta) = -\log p_\theta(y = k \mid x)$$
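A small numeric illustration of this loss follows. It only evaluates the loss for given class probabilities; the gradient descent update of θ is not shown, and the function names `nll_loss` and `episode_loss` and the probability values are hypothetical.

```python
import math


def nll_loss(probs, true_label):
    """J(theta) = -log p_theta(y = k | x) for a single query sample."""
    return -math.log(probs[true_label])


def episode_loss(batch):
    """Average the per-sample loss over a list of (probs, true_label) pairs."""
    return sum(nll_loss(p, y) for p, y in batch) / len(batch)


# Predicted class probabilities for two query samples with true labels 0 and 1.
confident = [({0: 0.9, 1: 0.1}, 0), ({0: 0.2, 1: 0.8}, 1)]
uncertain = [({0: 0.5, 1: 0.5}, 0), ({0: 0.5, 1: 0.5}, 1)]
loss_confident = episode_loss(confident)
loss_uncertain = episode_loss(uncertain)
```

The loss shrinks as more probability mass lands on the true class, which is why minimizing it pulls each embedded sample toward its own class prototype.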

Internal threat detection by the Prototype network is summarized in FIG. 3. The samples are projected into the metric space and the prototype of each class, such as C1, C2 and C3 in the figure, is computed. The sample X to be detected is then projected into the metric space, and the class it belongs to is finally determined by the distance function.

3. CWGAN-based data enhancement

This module performs sample enhancement under few-shot conditions. A few-shot training task $\tau = (S_\tau, Q_\tau)$ consists of a support set $S_\tau$ and a query set $Q_\tau$, where $(x, y) \in S_\tau$ and x and y are the sample data and the label data, respectively. The sample x passes through the feature extraction network, i.e. the embedding function $f_\theta(x)$, to generate a representation vector.

The prototype network computes the prototype representation c from the representation vector, and the label y of prototype c is input as the condition, together with noise data z, to the generator of the WGAN network to obtain synthetic sample data.

The goal of the generator is to make the synthetic data as close as possible to the real data, while the discriminator tries to distinguish the synthetic data from the real data. Finally, the data synthesized by the trained generator is added to the training data set to complete the sample enhancement task. The objective function and network structure of the CWGAN are shown in FIG. 4.
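The conditioning and the adversarial objectives can be sketched as below. The patent's own objective function appears only in FIG. 4, so this sketch assumes the standard Wasserstein GAN losses and one-hot label concatenation. All function names are hypothetical, the critic scores are made-up numbers, and the Lipschitz constraint on the critic (weight clipping or a gradient penalty) is omitted.

```python
def one_hot(label, num_classes):
    """Encode a class label as a one-hot list."""
    return [1.0 if i == label else 0.0 for i in range(num_classes)]


def conditional_input(vector, label, num_classes):
    """Concatenate noise z (for the generator) or a sample x (for the critic)
    with the one-hot class label that acts as the condition."""
    return list(vector) + one_hot(label, num_classes)


def critic_loss(real_scores, fake_scores):
    """Standard WGAN critic loss, minimized by the critic: E[D(fake)] - E[D(real)]."""
    return sum(fake_scores) / len(fake_scores) - sum(real_scores) / len(real_scores)


def generator_loss(fake_scores):
    """Standard WGAN generator loss, minimized by the generator: -E[D(fake)]."""
    return -sum(fake_scores) / len(fake_scores)


z = [0.3, -1.2]                       # noise vector
g_in = conditional_input(z, label=2, num_classes=3)
d_loss = critic_loss(real_scores=[1.0, 3.0], fake_scores=[0.0, 1.0])
g_loss = generator_loss(fake_scores=[0.0, 1.0])
```

Feeding the class label to both networks is what lets the generator produce samples for a chosen threat class rather than for the data set as a whole.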

The overall network model consists of the Prototype network and the CWGAN network. During model training, a genetic algorithm (GA) searches for the optimal parameters so as to continuously reduce the objective function of the network. Finally, the detection task is completed on the test data set, i.e. the query set, and each evaluation index is calculated.
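A genetic algorithm for parameter tuning can be sketched as follows. The patent does not specify the encoding, the selection scheme or which parameters are tuned, so truncation selection, uniform crossover, Gaussian mutation and the toy objective (standing in for a validation loss over a learning rate and a dropout rate) are all assumptions, and the names are hypothetical.

```python
import random


def genetic_search(fitness, bounds, pop_size=20, generations=30,
                   mutation_rate=0.2, seed=0):
    """Minimize `fitness` over real-valued parameters constrained to `bounds`."""
    rng = random.Random(seed)
    population = [[rng.uniform(lo, hi) for lo, hi in bounds]
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness)
        parents = population[: pop_size // 2]  # truncation selection, parents survive
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = [rng.choice(pair) for pair in zip(a, b)]  # uniform crossover
            for i, (lo, hi) in enumerate(bounds):
                if rng.random() < mutation_rate:
                    step = rng.gauss(0.0, 0.1 * (hi - lo))    # Gaussian mutation
                    child[i] = min(hi, max(lo, child[i] + step))
            children.append(child)
        population = parents + children
    return min(population, key=fitness)


def toy_objective(params):
    """Stand-in for a validation loss; lowest at lr = 0.01 and dropout = 0.3."""
    learning_rate, dropout = params
    return (learning_rate - 0.01) ** 2 + (dropout - 0.3) ** 2


bounds = [(0.0001, 0.1), (0.0, 0.9)]
best = genetic_search(toy_objective, bounds)
```

Because the better half of each generation is carried over unchanged, the best objective value never gets worse from one generation to the next. In practice each fitness evaluation would train and validate the model, which makes the population size and generation count the main cost factors.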

Furthermore, the proposed method is compared with an LSTM-AE-based internal threat detection method, a DCNNs-based internal threat detection method and a GCN-based internal threat detection method. Precision, Recall and F1-score are calculated as macro averages, and the proposed model is verified by comparison. The results for each evaluation index are shown in FIG. 5.

The differences among the algorithms on the four evaluation indexes are clearly visible. DCNNs perform best on Accuracy, with the proposed method second, but DCNNs perform unsatisfactorily on the other indexes. LSTM-AE performs best on Precision, with the proposed method second. On the remaining two indexes, Recall and F1-score, the proposed method outperforms the other algorithms. Therefore, the network security internal threat detection method can effectively detect internal threats.
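The macro-averaged indexes can be computed as in the sketch below, which averages per-class Precision, Recall and F1-score with equal weight per class. The labels and predictions are toy values for illustration only, not results from the patent, and the function name `macro_scores` is hypothetical.

```python
def macro_scores(y_true, y_pred):
    """Return macro-averaged (precision, recall, f1) over all observed classes."""
    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for c in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    n = len(labels)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# Toy labels for three classes (illustrative only).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
precision, recall, f1 = macro_scores(y_true, y_pred)
```

Macro averaging gives rare threat classes the same weight as the majority class, which matters on an unbalanced data set where Accuracy alone can look good while threats are missed.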

While the invention has been described with reference to a preferred embodiment, it will be understood by those skilled in the art that various changes in form and detail may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims

1. A network security internal threat detection method is characterized by comprising the following steps:

performing characteristic collection of original data to obtain a data set;

dividing the data set into a support set and a query set;

inputting data into a Prototype network for internal detection;

and performing data enhancement based on a CWGAN network, expanding the generated data into the query set, and completing the detection task.

2. The method for network security internal threat detection according to claim 1,

3. The method for network security internal threat detection according to claim 1,

4. The network security internal threat detection method of claim 1,

in the process of inputting data into the Prototype network for internal detection, the data is passed through a mapping function in the Prototype network to obtain a feature vector for each sample, the feature vectors are used to compute the prototype representation of the corresponding class, and classification verification is then performed using the feature vectors of the query set and the computed prototypes.

5. The network security internal threat detection method of claim 1,

in the process of CWGAN-based data enhancement, the class label of the prototype representation computed during internal detection is input as the constraint condition, the trained GAN network generates data that is as close as possible to the source input data, the generated data is expanded into the query set, and the class verification test is completed.

6. The network security internal threat detection method of claim 1,

7. The network security internal threat detection method of claim 4,