CN115203683A - Network security internal threat detection method - Google Patents


Publication number
CN115203683A
Authority
CN
China
Prior art keywords
data
network
prototype
detection
network security
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210554047.7A
Other languages
Chinese (zh)
Inventor
陶晓玲
卢深
符廉铕
余玥琳
赵峰
杨昌松
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guilin University of Electronic Technology
Original Assignee
Guilin University of Electronic Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guilin University of Electronic Technology filed Critical Guilin University of Electronic Technology
Priority to CN202210554047.7A priority Critical patent/CN115203683A/en
Publication of CN115203683A publication Critical patent/CN115203683A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55Detecting local intrusion or implementing counter-measures
    • G06F21/554Detecting local intrusion or implementing counter-measures involving event detection and direct action
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/086Learning methods using evolutionary algorithms, e.g. genetic algorithms or genetic programming
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/12Computing arrangements based on biological models using genetic models
    • G06N3/126Evolutionary algorithms, e.g. genetic algorithms or genetic programming

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Physics & Mathematics (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Physiology (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Computer Security & Cryptography (AREA)
  • Genetics & Genomics (AREA)
  • Computer Hardware Design (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention relates to the technical field of network security anomaly detection, and in particular to a network security internal threat detection method. It addresses three problems: the data distribution of internal threat detection data sets is imbalanced, existing anomaly-based threat detection cannot be associated with a specific threat scenario, and detection efficiency is low because sufficient labeled data is lacking. The network security internal threat detection method improves the detection performance of the model.

Description

Network security internal threat detection method
Technical Field
The invention relates to the technical field of network security anomaly detection, in particular to a network security internal threat detection method.
Background
In recent years, security incidents caused by malicious operations of internal users have occurred frequently. Because most internal users have access rights to the system, know the vulnerabilities of the internal network, and hold core data, internal attacks often cause more serious losses than external attacks.
At present, research on internal user behavior detection has produced some results, but problems remain: the data distribution of data sets is imbalanced, sufficient labeled data is lacking, and unknown threat behaviors are difficult to detect. As a result, detection performance is low.
Disclosure of Invention
The invention aims to provide a network security internal threat detection method that models the task as few-shot learning, uses a Prototype network based on a DNN (deep neural network) to learn features from limited data and perform classification detection, and uses a CWGAN (conditional Wasserstein GAN) network for sample enhancement, so as to improve the detection performance of the model.
In order to achieve the above object, the present invention provides a method for detecting internal threats of network security, which comprises the following steps:
collecting features from the original data to obtain a data set;
dividing the data set into a support set and a query set;
inputting the data into a Prototype network for internal detection;
and, based on CWGAN network data enhancement, adding the generated data to the query set and then completing the detection task.
In the process of collecting features from the original data to obtain the data set, behavior features are first extracted from the original data, and the behavior feature data is then preprocessed by missing-value handling and normalization.
The support set is the training set, the query set is the test set, and both are K-way N-shot data sets.
In the process of inputting the data into the Prototype network for internal detection, the data is passed through a mapping function in the Prototype network to obtain a feature vector for each sample, the feature vectors are used to compute the prototype representation of each class, and classification is then verified using the feature vectors of the query set and the computed prototypes.
In the process of adding the generated data to the query set based on CWGAN network data enhancement and then completing the detection task, the class label of the prototype representation computed during internal detection is input as a constraint condition, so that the data generated by the trained GAN network is as close as possible to the source input data; the generated data is then added to the query set and the class verification test is completed.
The parameters of the Prototype network and the CWGAN network are tuned by a genetic algorithm (GA).
In the process of inputting the data into the Prototype network for internal detection, a deep neural network (DNN) is used to learn the feature representation of the sample data.
The invention provides a network security internal threat detection method. To address the problems that the data distribution is imbalanced, that deep networks cannot be used effectively to learn data features, and that detections cannot be associated with specific threat behaviors or scenarios, the method models the task as few-shot learning and uses a DNN-based Prototype network to learn features from limited data and perform classification detection. To address the lack of sufficient labeled data, it uses a CWGAN network, with a genetic algorithm (GA) for automatic optimization, thereby improving the detection performance of the model.
Drawings
To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly described below. The drawings described below show only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flow chart of the network security internal threat detection method according to the present invention.
FIG. 2 is a schematic diagram of the network composition framework of the network security internal threat detection method according to the present invention.
FIG. 3 is a schematic diagram of internal threat detection by the Prototype network of the present invention.
FIG. 4 is a schematic diagram of the objective function and network structure of the CWGAN network of the present invention.
FIG. 5 is a comparison of the evaluation indexes of the present invention and the other algorithms.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like reference numerals refer to the same or similar elements or to elements having the same or similar functions throughout. The embodiments described below with reference to the drawings are exemplary, are intended to explain the invention, and are not to be construed as limiting the invention.
Referring to FIG. 1, the present invention provides a network security internal threat detection method comprising the following steps:
S1: collecting features from the original data to obtain a data set;
S2: dividing the data set into a support set and a query set;
S3: inputting the data into a Prototype network for internal detection;
S4: based on CWGAN network data enhancement, adding the generated data to the query set and then completing the detection task.
In the process of collecting features from the original data to obtain the data set, behavior features are first extracted from the original data, and the behavior feature data is then preprocessed by missing-value handling and normalization.
The support set is the training set, the query set is the test set, and both are K-way N-shot data sets.
In the process of inputting the data into the Prototype network for internal detection, the data is passed through a mapping function in the Prototype network to obtain a feature vector for each sample, the feature vectors are used to compute the prototype representation of each class, and classification is then verified using the feature vectors of the query set and the computed prototypes.
In the process of adding the generated data to the query set based on CWGAN network data enhancement and then completing the detection task, the class label of the prototype representation computed during internal detection is input as a constraint condition, so that the data generated by the trained GAN network is as close as possible to the source input data; the generated data is then added to the query set and the class verification test is completed.
The parameters of the Prototype network and the CWGAN network are tuned by a genetic algorithm (GA).
In the process of inputting the data into the Prototype network for internal detection, a deep neural network (DNN) is used to learn the feature representation of the sample data.
The network composition framework in the network security internal threat detection method can be understood by referring to fig. 2.
The invention is further described below with reference to specific embodiments and implementations:
The present embodiment uses version v4.2 of the CMU-CERT dataset for algorithm training and testing. The CERT dataset defines malicious activities by generating various usage scenarios. The dataset contains 32.77 million individual events, and the available audit data sources include logon activity, email traffic, web browsing traces, file access logs, USB drive usage, and LDAP information describing the organizational hierarchy and user roles. Version v4.2 records the activity logs of 1000 employees of different organizations over 17 months.
1. Data processing
Based on the log information in the data set, the CERT dataset defines three threat scenarios:
(1) Scenario 1: a user who has never before used removable drives or worked after hours begins logging in after hours, using a removable drive, and uploading data to wikileaks. Shortly thereafter, the user leaves the organization.
(2) Scenario 2: the user starts to look for new employment opportunities on job-hunting and recruitment websites. Before leaving the organization, the user uses a USB drive to leak confidential information, at a markedly higher frequency than his or her previous USB usage.
(3) Scenario 3: the user downloads a keylogger and obtains a list of passwords of different employees in the organization. He or she then uses a USB drive to transfer the password list to the supervisor's computer and attempts to find the supervisor's password. On success, he or she logs in to the supervisor's computer and sends an email that causes panic within the organization. This type of malicious activity often comes from system administrators, especially when they are in conflict with their supervisors.
The method performs internal threat detection analysis based on these three threat scenarios and extracts relevant features from the five log files of the data set (logon, device, file, email and http). The extracted features are shown in Table 1 below.
TABLE 1 Behavioral feature table
[Table 1 is provided as images in the original publication and is not reproduced here.]
Twenty behavior features are extracted from the five log files listed in the table, and the feature data is then preprocessed (missing-value handling, normalization, and so on) to reduce the complexity of network training.
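The preprocessing step can be illustrated with a minimal sketch. The patent does not say how missing values are filled or which normalization is used, so column-mean imputation and min-max scaling are assumptions here, and the function names and the three-column toy matrix are hypothetical.

```python
from statistics import mean


def impute_missing(rows):
    """Replace None entries with the mean of the observed values in that column."""
    n_cols = len(rows[0])
    col_means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        col_means.append(mean(observed) if observed else 0.0)
    return [[col_means[j] if r[j] is None else r[j] for j in range(n_cols)]
            for r in rows]


def min_max_normalize(rows):
    """Scale every column to [0, 1]; constant columns are mapped to 0.0."""
    n_cols = len(rows[0])
    lows = [min(r[j] for r in rows) for j in range(n_cols)]
    highs = [max(r[j] for r in rows) for j in range(n_cols)]
    return [[(r[j] - lows[j]) / (highs[j] - lows[j]) if highs[j] > lows[j] else 0.0
             for j in range(n_cols)]
            for r in rows]


# Toy feature matrix: one row per user-day, one column per behavior feature.
raw = [[3, 0, None],
       [1, 2, 10.0],
       [None, 4, 30.0]]
features = min_max_normalize(impute_missing(raw))
```

In a real pipeline the minima, maxima and column means would be computed on the training split only and reused for the test split.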
2. Internal threat detection based on Prototype network
After feature collection from the raw data is complete, the data is divided into a support set and a query set:
(1) Support set: the training set. Assume a labeled sample set $S = \{(x_1, y_1), \ldots, (x_N, y_N)\}$, where $x_i$ denotes the sample data, $y_i \in \{1, \ldots, K\}$ denotes the class label of the corresponding sample, and $S_k$ denotes the set of samples with class $k$. This data set is called a K-way N-shot data set, i.e., a data set of K classes with N samples per class, K × N samples in total.
(2) Query set: the test set. Like the support set, it is a K-way N-shot data set, but unlike the support set, some of the data in the query set is unlabeled.
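The support/query split described above can be sketched as episode sampling. This is an illustrative assumption about how such a split might be drawn, not the patent's procedure: `sample_episode`, `n_query` and the toy data are hypothetical names, and setting `n_query` equal to `n_shot` gives a query set that is also K-way N-shot, as the text describes.

```python
import random
from collections import defaultdict


def sample_episode(samples, k_way, n_shot, n_query, rng):
    """Draw a K-way episode: n_shot support and n_query query samples per class.

    samples is a list of (feature_vector, label) pairs.
    """
    by_class = defaultdict(list)
    for x, y in samples:
        by_class[y].append(x)
    # Only classes with enough samples can take part in the episode.
    eligible = sorted(c for c, xs in by_class.items() if len(xs) >= n_shot + n_query)
    classes = rng.sample(eligible, k_way)
    support, query = [], []
    for c in classes:
        picked = rng.sample(by_class[c], n_shot + n_query)
        support += [(x, c) for x in picked[:n_shot]]
        query += [(x, c) for x in picked[n_shot:]]
    return support, query


rng = random.Random(0)
# Toy data: 4 classes with 10 two-dimensional samples each.
data = [([float(i), float(c)], c) for c in range(4) for i in range(10)]
support, query = sample_episode(data, k_way=3, n_shot=5, n_query=2, rng=rng)
```

Sampling without replacement inside each class keeps the support and query sets disjoint, which matters when the query set is used as the test set.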
After the data set is divided into the support set and the query set, the prototype network trains an embedding function $f_\theta(x)$. This function is a neural network that defines how a sample $x$ is projected into a metric space, and its parameters $\theta$ are learned during training. A better value of $\theta$ describes the distribution of the samples in the metric space more accurately, that is, samples of the same class lie closer together and samples of different classes lie farther apart. The prototype representation of each class is then computed in the metric space as follows:
$$c_k = \frac{1}{|S_k|} \sum_{(x_i, y_i) \in S_k} f_\theta(x_i)$$
where $c_k$ is the prototype representation of class $k$, $S_k$ is the set of samples of class $k$, $|S_k|$ is the number of samples in class $k$, and $(x_i, y_i)$ are the feature vector and label of a sample.
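As a small illustration of the prototype computation, the sketch below averages the embedded support samples of each class. The `embed` function stands in for the trained network $f_\theta$ and is an identity placeholder here, which is an assumption made only to keep the example self-contained.

```python
from collections import defaultdict


def embed(x):
    """Placeholder for the embedding network f_theta (identity, for illustration)."""
    return list(x)


def class_prototypes(support):
    """Return {label: mean embedding} over the support samples of each class."""
    sums = {}
    counts = defaultdict(int)
    for x, y in support:
        z = embed(x)
        if y not in sums:
            sums[y] = [0.0] * len(z)
        sums[y] = [s + v for s, v in zip(sums[y], z)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


support = [([1.0, 0.0], 0), ([3.0, 2.0], 0), ([0.0, 4.0], 1)]
prototypes = class_prototypes(support)
```

Class 0 has two support samples, so its prototype is their element-wise mean; class 1 has one sample, which is its own prototype.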
The distances between each query-set sample and the prototypes are then converted into a probability distribution by a softmax over the distances. This determines which prototype class the query sample belongs to and completes the classification task. The distribution function is:
$$p_\theta(y = k \mid x) = \frac{\exp\left(-d\left(f_\theta(x), c_k\right)\right)}{\sum_{k'} \exp\left(-d\left(f_\theta(x), c_{k'}\right)\right)}$$
This formula gives the probability that sample $x$ belongs to class $k$, where $d(\cdot)$ is a distance function. The cosine similarity function is used as the distance function.
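The classification rule can be sketched as a softmax over negative distances to the prototypes. The text says cosine similarity serves as the distance function without giving its exact form; taking the distance as one minus the cosine similarity is an assumption of this sketch, and the function names and toy prototypes are hypothetical.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity; zero vectors get distance 1.0 by convention."""
    dot = sum(p * q for p, q in zip(a, b))
    norm_a = math.sqrt(sum(p * p for p in a))
    norm_b = math.sqrt(sum(q * q for q in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def class_probabilities(z, prototypes):
    """Softmax over negative distances from embedding z to each class prototype."""
    logits = {k: -cosine_distance(z, c) for k, c in prototypes.items()}
    top = max(logits.values())  # subtract the maximum for numerical stability
    exps = {k: math.exp(v - top) for k, v in logits.items()}
    total = sum(exps.values())
    return {k: e / total for k, e in exps.items()}


prototypes = {0: [1.0, 0.0], 1: [0.0, 1.0]}
probs = class_probabilities([0.9, 0.1], prototypes)
predicted = max(probs, key=probs.get)
```

The query embedding points almost along the class 0 prototype, so class 0 receives the larger probability and becomes the predicted label.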
Finally, the prototype network uses the negative log-probability of the distribution function as its loss function:
$$J(\theta) = -\log p_\theta(y = k \mid x)$$
The loss is minimized by a gradient descent algorithm, so that a good value of $\theta$ is learned once training converges. After training, the function $f_\theta(x)$ can be expected to map samples of the same class to locations that are relatively close to each other.
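A short numerical sketch of this loss follows. Averaging the per-sample loss over an episode is an assumption (the formula above is stated for a single sample), and `nll_loss`, `episode_loss` and the toy probabilities are hypothetical.

```python
import math


def nll_loss(probabilities, true_label):
    """J(theta) for one sample: negative log-probability of the true class."""
    return -math.log(probabilities[true_label])


def episode_loss(batch):
    """Mean loss over (class-probability dict, true label) pairs of one episode."""
    return sum(nll_loss(p, y) for p, y in batch) / len(batch)


# Two query samples with their predicted class distributions and true labels.
batch = [({0: 0.5, 1: 0.5}, 0),
         ({0: 0.25, 1: 0.75}, 1)]
loss = episode_loss(batch)
```

The loss falls toward zero as the probability assigned to the true class approaches one, which is what gradient descent on $\theta$ drives toward.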
The internal threat detection performed by the Prototype network is summarized in FIG. 3: the samples are projected into a metric space and the prototype of each class, such as C1, C2 and C3 in the figure, is computed; the sample X to be detected is then projected into the metric space; finally, the class to which the sample belongs is determined by the distance function.
3. CWGAN-based data enhancement
This module performs sample enhancement under few-shot conditions. A few-shot training task $\tau = (S_\tau, Q_\tau)$ consists of a support set $S_\tau$ and a query set $Q_\tau$, where $(x, y) \in S_\tau$ and $x$ and $y$ are the sample data and label data, respectively. The sample $x$ is passed through the feature extraction network, i.e., the embedding function $f_\theta(x)$, to generate a representation vector
[Formula image in the original publication; not reproduced here.]
The prototype network computes the prototype representation $c$ from the representation vector, and the label $y$ of prototype $c$ is input as a condition to the generator of the WGAN network to obtain synthesized sample data
[Formula image in the original publication; not reproduced here.]
where $z$ is noise data. The goal of the generator is to make the synthesized data as close as possible to the real data, while the discriminator attempts to distinguish the synthesized data from the real data. Finally, the data synthesized by the trained generator is added to the training data set to complete the sample enhancement task. The objective function and network structure of the CWGAN are shown in FIG. 4.
[Image of FIG. 4 in the original publication; not reproduced here.]
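Because the CWGAN objective is given only as an image, the following sketch uses the standard Wasserstein GAN losses as a stand-in and conditions the generator by concatenating the noise vector with a one-hot class label. Both choices are assumptions rather than the patent's exact formulation, all names are hypothetical, and the Lipschitz constraint on the critic (weight clipping or a gradient penalty) is omitted.

```python
import random


def one_hot(label, num_classes):
    """Encode a class label as a one-hot vector."""
    return [1.0 if i == label else 0.0 for i in range(num_classes)]


def generator_input(noise, label, num_classes):
    """Condition the generator by concatenating noise z with the one-hot label y."""
    return noise + one_hot(label, num_classes)


def critic_loss(real_scores, fake_scores):
    """WGAN critic loss to minimize: mean score on fakes minus mean score on reals."""
    return sum(fake_scores) / len(fake_scores) - sum(real_scores) / len(real_scores)


def generator_loss(fake_scores):
    """WGAN generator loss to minimize: negative mean critic score on fakes."""
    return -sum(fake_scores) / len(fake_scores)


rng = random.Random(0)
z = [rng.gauss(0.0, 1.0) for _ in range(4)]          # noise vector
g_in = generator_input(z, label=2, num_classes=3)    # label of the class prototype
d_loss = critic_loss(real_scores=[0.9, 0.7], fake_scores=[0.2, 0.4])
g_loss = generator_loss([0.2, 0.4])
```

In a full implementation the scores would come from a critic network that also receives the label, and the two losses would be minimized alternately.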
The overall network model consists of the Prototype network and the CWGAN network. During model training, a genetic algorithm (GA) is used to tune the parameters so as to continuously reduce the objective function of the network. Finally, the detection task is completed on the test data set, i.e., the query set, and each evaluation index is calculated.
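The patent does not describe the genetic algorithm's encoding or operators, so the following is only a generic sketch of GA-based parameter tuning: real-valued individuals, elitist selection of the better half, uniform crossover and Gaussian mutation. The two tuned parameters (a learning rate and a hidden-layer size) and the toy objective are hypothetical stand-ins for the network's actual objective.

```python
import random


def genetic_search(fitness, bounds, pop_size=20, generations=30,
                   mutation_rate=0.2, rng=None):
    """Minimize fitness over real-valued parameters constrained to bounds."""
    rng = rng or random.Random()

    def random_individual():
        return [rng.uniform(lo, hi) for lo, hi in bounds]

    def crossover(a, b):
        # Uniform crossover: each gene is taken from one parent at random.
        return [x if rng.random() < 0.5 else y for x, y in zip(a, b)]

    def mutate(individual):
        # Gaussian mutation, clipped back into the allowed range.
        out = []
        for v, (lo, hi) in zip(individual, bounds):
            if rng.random() < mutation_rate:
                v = min(hi, max(lo, v + rng.gauss(0.0, 0.1 * (hi - lo))))
            out.append(v)
        return out

    population = [random_individual() for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness)          # lower fitness is better
        elite = population[: pop_size // 2]   # the better half survives unchanged
        children = [mutate(crossover(rng.choice(elite), rng.choice(elite)))
                    for _ in range(pop_size - len(elite))]
        population = elite + children
    return min(population, key=fitness)


def toy_objective(params):
    """Stand-in for a validation loss; its minimum is at lr=0.01, hidden=64."""
    lr, hidden = params
    return (lr - 0.01) ** 2 + ((hidden - 64) / 100) ** 2


bounds = [(0.0001, 0.1), (8, 256)]
best = genetic_search(toy_objective, bounds, rng=random.Random(42))
```

In practice each fitness evaluation would train and validate the model with the candidate parameters, so the population size and generation count trade search quality against compute time. Integer parameters such as layer sizes would be rounded before use.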
Furthermore, the proposed detection method is compared with an LSTM-AE-based internal threat detection method, a DCNNs-based internal threat detection method, and a GCN-based internal threat detection method. Precision, Recall and F1-score are calculated as macro averages to validate the proposed model comparatively; the results for each evaluation index are shown in FIG. 5.
The differences among the algorithms on the four evaluation indexes are clear. DCNNs performs best on Accuracy, with the present method second, but DCNNs performs poorly on the other indexes. LSTM-AE performs best on Precision, with the present method second. On the remaining two indexes, Recall and F1-score, the present method outperforms the other algorithms. Therefore, the network security internal threat detection method can effectively detect internal threats.
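For reference, macro-averaged Precision, Recall and F1-score can be computed as in the sketch below. It averages the per-class F1 values, which is one common convention and an assumption here; the labels are toy data and not results from the patent.

```python
def macro_scores(y_true, y_pred):
    """Return (macro precision, macro recall, macro F1) over all observed classes."""
    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# Toy labels for three classes.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
macro_p, macro_r, macro_f1 = macro_scores(y_true, y_pred)
```

Macro averaging weights every class equally, so rare threat classes count as much as the dominant normal class, which suits imbalanced data sets such as the one described here.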
While the invention has been described with reference to a preferred embodiment, it will be understood by those skilled in the art that various changes in form and detail may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (7)

1. A network security internal threat detection method is characterized by comprising the following steps:
performing characteristic collection of original data to obtain a data set;
dividing the data set into a support set and a query set;
inputting data into a Prototype network for internal detection;
and based on CWGAN network data enhancement, expanding the generated data to a query set and then completing a detection task.
2. The method for network security internal threat detection according to claim 1,
during the process of collecting the characteristics of the original data and acquiring the data set, firstly, the behavior characteristics are extracted from the original data, and then the behavior characteristic data is subjected to missing value and normalization preprocessing.
3. The method for network security internal threat detection according to claim 1,
the support set is a training set, the query set is a test set, and the support set and the query set are both K-way N-shot data sets.
4. The network security internal threat detection method of claim 1,
in the process of inputting data into the Prototype network for internal detection, a feature vector of each sample is obtained through a mapping function in the Prototype network, the feature vectors are used to calculate the prototype representation of the corresponding class, and classification verification is then performed using the feature vectors in the query set and the calculated prototypes.
5. The network security internal threat detection method of claim 1,
in the process of completing a detection task after expanding generated data to a query set based on CWGAN network data enhancement, a class label represented by a prototype calculated during internal detection is used as constraint condition input, the generated data of the trained GAN network is infinitely similar to source input data, the generated data is expanded to the query set, and a class verification test is completed.
6. The network security internal threat detection method of claim 1,
parameter adjustment of Prototype network and CWGAN network data is realized through a GA genetic algorithm.
7. The network security internal threat detection method of claim 4,
during the data input into the Prototype network for internal detection, the DNN neural network is used to learn the feature representation of the sample data.
CN202210554047.7A 2022-05-20 2022-05-20 Network security internal threat detection method Pending CN115203683A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210554047.7A CN115203683A (en) 2022-05-20 2022-05-20 Network security internal threat detection method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210554047.7A CN115203683A (en) 2022-05-20 2022-05-20 Network security internal threat detection method

Publications (1)

Publication Number Publication Date
CN115203683A true CN115203683A (en) 2022-10-18

Family

ID=83575018

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210554047.7A Pending CN115203683A (en) 2022-05-20 2022-05-20 Network security internal threat detection method

Country Status (1)

Country Link
CN (1) CN115203683A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination