CN117527295A

CN117527295A - Self-adaptive network threat detection system based on artificial intelligence

Info

Publication number: CN117527295A
Application number: CN202311254098.9A
Authority: CN
Inventors: 王文佳
Original assignee: Guangdong Information Security Evaluation Center
Current assignee: Guangdong Information Security Evaluation Center
Priority date: 2023-09-26
Filing date: 2023-09-26
Publication date: 2024-02-06

Abstract

The invention relates to the technical field of network threat detection, and in particular to an artificial-intelligence-based self-adaptive network threat detection system, comprising: a data collection module that collects network traffic data and generates a first data set; a data preprocessing module that performs denoising and standardization to generate a second data set; a feature selection module that performs feature selection on the second data set according to a preset feature selection algorithm to generate a third data set; an artificial intelligence model module that generates a fourth data set; a self-adaptive module that compares the threat judgments in the fourth data set with the original network traffic data in the first data set and automatically adjusts the parameters of the artificial intelligence model module; and a prediction correction module that generates a fifth data set. Because the data flow and parameter adjustment between the modules are carried out automatically, the invention improves both the accuracy of threat detection and the response speed and flexibility of the system.

Description

Self-adaptive network threat detection system based on artificial intelligence

Technical Field

The invention relates to the technical field of network threat detection, in particular to an artificial intelligence-based self-adaptive network threat detection system.

Background

Network security has become an integral part of today's information society. As cyber attack methods continue to escalate and diversify, conventional threat detection methods, such as rule-based and signature-based detection, struggle to cope with increasingly complex and variable cyber threats. These conventional methods rely on predefined rules or known attack signatures and lack sufficient detection capability for new or unknown attacks. Meanwhile, because network environments and data traffic are complex, their false-alarm and missed-detection rates are relatively high.

Artificial intelligence techniques, particularly machine learning and deep learning, have shown powerful performance in many fields of image recognition, natural language processing, and the like. However, the application of these advanced artificial intelligence techniques to the field of cyber threat detection still faces a number of challenges. One of the key issues is how to effectively extract features useful for threat detection from massive, multidimensional network traffic data, and how to build an accurate and real-time threat detection model based on these features.

Most existing artificial-intelligence-based network threat detection systems are static and cannot adapt in real time. As a result, their detection performance may degrade as network environments and threat patterns change. Moreover, these systems often do not predict threats in future network traffic, so they struggle to provide comprehensive, forward-looking network security protection.

In order to solve the problems, the invention provides an artificial intelligence-based self-adaptive network threat detection system, which aims to realize network threat detection with high accuracy and high response speed through integration and self-adaptive optimization of multiple modules.

Disclosure of Invention

Based on the above object, the present invention provides an artificial intelligence based adaptive cyber threat detection system.

An artificial intelligence based adaptive cyber threat detection system comprising:

a data collection module configured to collect network traffic data and generate a first data set;

a data preprocessing module connected to the data collection module, which receives the first data set, performs denoising and standardization, and generates a second data set;

a feature selection module connected to the data preprocessing module, which receives the second data set, performs feature selection on it according to a preset feature selection algorithm, and generates a third data set;

an artificial intelligence model module connected to the feature selection module, which receives the third data set, performs threat detection on it using an artificial intelligence algorithm, and generates a fourth data set, wherein the fourth data set includes a judgment of whether the network traffic is a threat;

a self-adaptive module connected to the artificial intelligence model module and the data collection module, which receives the fourth data set and the first data set, compares the threat judgments in the fourth data set with the original network traffic data in the first data set, automatically adjusts the parameters of the artificial intelligence model module, and feeds the adjusted parameters back to the artificial intelligence model module;

a prediction correction module connected to the self-adaptive module and the artificial intelligence model module, which performs threat prediction on future network traffic using the artificial intelligence model adjusted by the self-adaptive module and generates a fifth data set, and which adjusts the feature selection algorithm of the feature selection module according to the prediction results in the fifth data set so as to improve the accuracy of threat detection.

Further, the data collection module specifically includes:

accessing a network switch or router and capturing transport-layer and application-layer network data packets in real time through deep packet inspection or traffic mirroring;

performing protocol analysis and classification on the captured data packets, and sorting and indexing the analyzed and classified packets by source address, destination address, port number and protocol type;

grouping the sorted and indexed packets by time period using a time window, and calculating the statistical characteristics of each group on each preset field; and integrating the statistical characteristics of the generated groups into a matrix or data frame as the first data set.
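The grouping and aggregation steps above can be sketched as follows. This is a minimal illustration and not the patented implementation: the packet field names (`ts`, `src`, `dst`, `port`, `proto`, `size`), the 10-second window and the three statistics computed are all assumptions chosen for the example, and packet capture and protocol analysis are assumed to have happened already.

```python
from collections import defaultdict
from statistics import mean

def build_first_dataset(packets, window_seconds=10.0):
    """Group packets by time window and flow key, then compute per-group statistics.

    Each packet is a dict with hypothetical keys: ts, src, dst, port, proto, size.
    """
    # Sort by source address, destination address, port, protocol, then time.
    ordered = sorted(
        packets, key=lambda p: (p["src"], p["dst"], p["port"], p["proto"], p["ts"])
    )
    groups = defaultdict(list)
    for p in ordered:
        window = int(p["ts"] // window_seconds)  # index of the time window
        groups[(window, p["src"], p["dst"], p["port"], p["proto"])].append(p)

    rows = []
    for key in sorted(groups):
        window, src, dst, port, proto = key
        pkts = groups[key]
        total_bytes = sum(p["size"] for p in pkts)
        rows.append({
            "window_start": window * window_seconds,
            "src": src, "dst": dst, "port": port, "proto": proto,
            "packet_count": len(pkts),
            "mean_packet_size": mean(p["size"] for p in pkts),
            "bytes_per_second": total_bytes / window_seconds,
        })
    return rows  # one row per (window, flow): a simple stand-in for the "data frame"

packets = [
    {"ts": 0.5, "src": "10.0.0.1", "dst": "10.0.0.2", "port": 443, "proto": "TCP", "size": 100},
    {"ts": 3.0, "src": "10.0.0.1", "dst": "10.0.0.2", "port": 443, "proto": "TCP", "size": 300},
    {"ts": 12.0, "src": "10.0.0.1", "dst": "10.0.0.2", "port": 443, "proto": "TCP", "size": 200},
    {"ts": 1.0, "src": "10.0.0.3", "dst": "10.0.0.2", "port": 53, "proto": "UDP", "size": 80},
]
first_dataset = build_first_dataset(packets)
```

Here the two TCP packets in the first window collapse into a single row, which is the kind of per-group summary the later modules consume.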

Further, the data preprocessing module performs smoothing processing on each statistical characteristic field in the first data set through a gaussian filtering denoising algorithm, eliminates noise or abnormal values, applies Z-Score standardization to denoised data to convert the values of each statistical characteristic field into values with uniform dimension or range, and then re-integrates the values into a new matrix or data frame form to serve as the second data set.
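A minimal sketch of this preprocessing step for a single statistical characteristic field, assuming a one-dimensional Gaussian kernel with edge replication. The kernel width (`sigma`), the kernel radius and the edge handling are assumptions; the text does not specify them. Note that Gaussian smoothing attenuates an outlier rather than removing it outright.

```python
import math
from statistics import mean, pstdev

def gaussian_kernel(sigma=1.0, radius=2):
    """Discrete Gaussian weights, normalized to sum to 1."""
    weights = [math.exp(-(k * k) / (2 * sigma * sigma)) for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def gaussian_smooth(values, sigma=1.0, radius=2):
    """Smooth one column by convolving it with a Gaussian kernel."""
    kernel = gaussian_kernel(sigma, radius)
    n = len(values)
    smoothed = []
    for i in range(n):
        acc = 0.0
        for k, w in zip(range(-radius, radius + 1), kernel):
            j = min(max(i + k, 0), n - 1)  # replicate the edge values at the borders
            acc += w * values[j]
        smoothed.append(acc)
    return smoothed

def z_score(values):
    """Standardize a column to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sd = pstdev(values)
    if sd == 0:
        return [0.0 for _ in values]  # a constant column carries no information
    return [(v - mu) / sd for v in values]

def preprocess(column):
    """Denoise first, then standardize, in the order the text describes."""
    return z_score(gaussian_smooth(column))

raw = [100.0, 102.0, 98.0, 500.0, 101.0, 99.0, 100.0]  # one spike at index 3
smoothed = gaussian_smooth(raw)
second_column = preprocess(raw)
```

Applying `preprocess` to every statistical characteristic column and reassembling the columns would give the second data set.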

Further, the feature selection module specifically includes:

applying a recursive feature elimination algorithm or information-gain-based feature ranking to evaluate the importance of the statistical characteristic fields in the second data set, specifically by iteratively removing the lowest-ranked feature field in each round until the preset number of features K is reached;

according to the ranking produced by the recursive feature elimination algorithm or the information-gain-based feature ranking, selecting the 10 highest-ranked statistical characteristic fields as first features, and extracting all data corresponding to these 10 fields from the second data set;

reconstructing a new data set from the 10 extracted first characteristic fields, wherein each row of data contains only the 10 selected fields and the rows keep the order they had in the original second data set, and generating a new matrix with 10 columns as the third data set.
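The information-gain variant of this step might look like the sketch below. Several things here are assumptions not stated in the text: that a threat label is available for each row when the ranking is computed, that each numeric field is split once at its median to compute the gain, and the example field names. The toy data keeps K = 2 instead of 10 to stay readable.

```python
import math
from collections import Counter
from statistics import median

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(values, labels):
    """Gain from splitting one numeric feature at its median (an assumed split rule)."""
    cut = median(values)
    left = [y for v, y in zip(values, labels) if v <= cut]
    right = [y for v, y in zip(values, labels) if v > cut]
    remainder = sum(
        len(part) / len(labels) * entropy(part) for part in (left, right) if part
    )
    return entropy(labels) - remainder

def select_top_k(rows, labels, k):
    """Rank fields by information gain and keep the k best, preserving row order."""
    names = list(rows[0].keys())
    gains = {name: information_gain([r[name] for r in rows], labels) for name in names}
    ranked = sorted(names, key=lambda name: gains[name], reverse=True)
    kept = ranked[:k]
    third_dataset = [[r[name] for name in kept] for r in rows]
    return kept, third_dataset

rows = [
    {"mean_packet_size": 100, "bytes_per_second": 10, "packet_count": 5},
    {"mean_packet_size": 900, "bytes_per_second": 12, "packet_count": 5},
    {"mean_packet_size": 120, "bytes_per_second": 95, "packet_count": 5},
    {"mean_packet_size": 880, "bytes_per_second": 99, "packet_count": 5},
]
labels = [0, 0, 1, 1]  # hypothetical ground truth: 1 = threatening
kept, third_dataset = select_top_k(rows, labels, k=2)
```

In this toy data only `bytes_per_second` separates the two classes, so it ranks first; the constant `packet_count` field contributes no gain.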

Further, the artificial intelligence model module has a built-in pre-trained neural network model; the neural network model receives each row of data in the third data set, and each row of data comprises the 10 first characteristic fields selected by the feature selection module;

the neural network model performs a forward propagation operation on each row of data and outputs a value in the interval [0,1], wherein the value represents the probability that the corresponding network traffic data is threatening network traffic;

network traffic with an output probability value greater than or equal to a preset threshold is marked as "threatening network traffic"; otherwise, it is marked as "non-threatening network traffic";

the neural network model output and threat label of each row of data are integrated into a new data set to generate the fourth data set, wherein the fourth data set comprises the 10 first characteristic fields of the original third data set and a second characteristic field, and the second characteristic field records the judgment of whether the corresponding network traffic is threatening network traffic.
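The forward pass and thresholding can be illustrated with a tiny network. The architecture (one ReLU hidden layer with two units and a sigmoid output) and the hand-set weights are placeholders invented for this sketch, not a trained model and not the network used by the invention; the 0.8 threshold is used only as an example value.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(row, hidden_weights, hidden_bias, out_weights, out_bias):
    """One forward pass: ReLU hidden layer, then a sigmoid output in [0, 1]."""
    hidden = [
        max(0.0, sum(w * x for w, x in zip(ws, row)) + b)
        for ws, b in zip(hidden_weights, hidden_bias)
    ]
    return sigmoid(sum(w * h for w, h in zip(out_weights, hidden)) + out_bias)

def build_fourth_dataset(third_dataset, params, threshold=0.8):
    """Append the model output and the threat label to each 10-field row."""
    fourth = []
    for row in third_dataset:
        p = forward(row, *params)
        label = "threatening" if p >= threshold else "non-threatening"
        fourth.append(list(row) + [p, label])
    return fourth

# Placeholder parameters for 10 input fields (NOT trained weights).
params = (
    [[0.5] * 10, [-0.5] * 10],  # hidden_weights: 2 hidden units x 10 inputs
    [0.0, 0.0],                 # hidden_bias
    [2.0, -2.0],                # out_weights
    -1.0,                       # out_bias
)
third_dataset = [[1.0] * 10, [-1.0] * 10, [0.0] * 10]
fourth_dataset = build_fourth_dataset(third_dataset, params)
```

Each output row carries the 10 original fields followed by the probability and the label, which corresponds to the second characteristic field described above.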

Further, the self-adaptive module firstly executes a data alignment operation to pair each row of data in the fourth data set with the corresponding network traffic data in the first data set;

for each data pair, the self-adaptive module compares the threat judgment in the fourth data set with the actual network traffic label in the first data set, and calculates a misjudgment rate over the data labeled "threatening network traffic" that is in practice "non-threatening network traffic" and the data labeled "non-threatening network traffic" that is in practice "threatening network traffic";

based on the calculated misjudgment rate, the self-adaptive module generates a correction factor, which directly affects the parameters of the neural network model in the artificial intelligence model module;

the self-adaptive module applies the generated correction factor to the neural network model of the artificial intelligence model module and adjusts the parameters of the neural network model: if the misjudgment rate exceeds a preset upper limit, the threshold of the neural network model is lowered or the learning rate is raised; if the misjudgment rate is below a preset lower limit, the threshold of the neural network model is raised or the learning rate is lowered.
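A minimal sketch of this adjustment rule. The text does not say how the correction factor is computed or how large each adjustment is, so the sign-valued `correction_factor`, the step sizes and the 5% and 1% limits used here are assumptions for illustration. The sketch also adjusts both the threshold and the learning rate, whereas the text allows either one.

```python
def correction_factor(misjudgment_rate, upper=0.05, lower=0.01):
    """Hypothetical factor: -1 when too many errors, +1 when very few, else 0."""
    if misjudgment_rate > upper:
        return -1
    if misjudgment_rate < lower:
        return 1
    return 0

def adapt(threshold, learning_rate, misjudgment_rate,
          upper=0.05, lower=0.01, threshold_step=0.05, lr_factor=1.5):
    """Return the adjusted (threshold, learning_rate) pair."""
    factor = correction_factor(misjudgment_rate, upper, lower)
    # Too many errors: lower the threshold and raise the learning rate.
    # Very few errors: raise the threshold and lower the learning rate.
    new_threshold = min(1.0, max(0.0, threshold + factor * threshold_step))
    new_learning_rate = learning_rate * lr_factor ** (-factor)
    return new_threshold, new_learning_rate

high = adapt(0.8, 0.01, misjudgment_rate=0.10)   # above the upper limit
steady = adapt(0.8, 0.01, misjudgment_rate=0.03)  # inside the band: unchanged
low = adapt(0.8, 0.01, misjudgment_rate=0.005)   # below the lower limit
```

Leaving the parameters unchanged inside the band between the two limits keeps the model from oscillating when the misjudgment rate is already acceptable.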

Further, the misjudgment rate is calculated as follows:

the self-adaptive module maintains a set of counters, including a true positive (TP) counter, a true negative (TN) counter, a false positive (FP) counter and a false negative (FN) counter;

for data marked as "threatening network traffic" in the fourth data set that is actually "threatening network traffic" in the first data set, the TP counter is incremented by one;

for data marked as "non-threatening network traffic" in the fourth data set that is actually "non-threatening network traffic" in the first data set, the TN counter is incremented by one;

for data marked as "threatening network traffic" in the fourth data set that is actually "non-threatening network traffic" in the first data set, the FP counter is incremented by one;

for data marked as "non-threatening network traffic" in the fourth data set that is actually "threatening network traffic" in the first data set, the FN counter is incremented by one;

calculating the misjudgment rate from the four counters:

$$\text{misjudgment rate} = \frac{FP + FN}{TP + TN + FP + FN}$$

The misjudgment rate is used for evaluating threat detection accuracy of the system and is used as a basis for generating a correction factor by the self-adaptive module.
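The counter bookkeeping can be sketched as below, assuming the misjudgment rate is the share of wrongly labeled rows, (FP + FN) / (TP + TN + FP + FN). The boolean encoding of the labels (True for threatening) is an assumption made for brevity.

```python
from collections import Counter

def misjudgment_rate(predicted, actual):
    """Count TP/TN/FP/FN over aligned label pairs and return (rate, counters)."""
    counts = Counter()
    for p, a in zip(predicted, actual):
        if p and a:
            counts["TP"] += 1      # flagged as a threat, and it was one
        elif not p and not a:
            counts["TN"] += 1      # passed as benign, and it was benign
        elif p and not a:
            counts["FP"] += 1      # flagged as a threat, but it was benign
        else:
            counts["FN"] += 1      # passed as benign, but it was a threat
    total = sum(counts.values())
    rate = (counts["FP"] + counts["FN"]) / total if total else 0.0
    return rate, counts

predicted = [True, True, False, False, True]   # labels from the fourth data set
actual = [True, False, False, True, True]      # labels paired from the first data set
rate, counts = misjudgment_rate(predicted, actual)
```

With these five pairs, one false positive and one false negative out of five rows give a rate of 0.4, well above a 5% upper limit, so the adaptive step would fire.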

Further, the prediction correction module receives the artificial intelligence model parameters adjusted by the self-adaptive module, wherein the parameters comprise the learning rate, the threshold and the activation function parameters;

the prediction correction module has an embedded traffic prediction sub-module, which predicts the network traffic over a future period of time using a time series analysis algorithm and generates a prediction data set of future network traffic;

the prediction correction module passes the prediction data set to the artificial intelligence model module adjusted by the self-adaptive module, and threat detection is performed on the prediction data set using the adjusted parameters: the adjusted neural network model is applied to each row of data in the prediction data set, a forward propagation operation is performed, and a value in the interval [0,1] is output, wherein the value represents the probability that the corresponding predicted network traffic data is threatening network traffic;

the network traffic in the prediction data set is marked as "threatening network traffic" or "non-threatening network traffic" according to the output probability value and the threshold adjusted by the self-adaptive module;

the neural network model output and threat label of each row of predicted data are integrated into a new data set to generate the fifth data set, wherein the fifth data set comprises the statistical characteristic fields of the predicted network traffic and a third characteristic field, and the third characteristic field records the judgment of whether the corresponding predicted network traffic is threatening network traffic.
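The prediction path can be sketched as follows. The text names only "a time series analysis algorithm", so Holt's linear-trend exponential smoothing is swapped in here as one possible choice. The single forecast field (`bytes_per_second`), the logistic `score` function standing in for the adjusted neural network, and all numeric parameters are assumptions for illustration.

```python
import math

def holt_forecast(series, steps, alpha=0.5, beta=0.5):
    """Holt's linear-trend exponential smoothing; returns `steps` future values."""
    if len(series) < 2:
        raise ValueError("need at least two observations")
    level, trend = series[0], series[1] - series[0]
    for value in series[1:]:
        previous_level = level
        level = alpha * value + (1 - alpha) * (level + trend)
        trend = beta * (level - previous_level) + (1 - beta) * trend
    return [level + h * trend for h in range(1, steps + 1)]

def score(bytes_per_second, midpoint=55.0, scale=5.0):
    """Placeholder for the adjusted neural network: a logistic curve on one field."""
    return 1.0 / (1.0 + math.exp(-(bytes_per_second - midpoint) / scale))

def build_fifth_dataset(history, steps, threshold):
    """Forecast future traffic, score each forecast row, and label it."""
    fifth = []
    for value in holt_forecast(history, steps):
        p = score(value)
        fifth.append({
            "bytes_per_second": value,
            "probability": p,
            "label": "threatening" if p >= threshold else "non-threatening",
        })
    return fifth

history = [10.0, 20.0, 30.0, 40.0]  # steadily rising traffic
fifth_dataset = build_fifth_dataset(history, steps=2, threshold=0.7)
```

On this perfectly linear history the forecast simply continues the trend, and only the second forecast window crosses the example threshold.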

The invention has the beneficial effects that:

the invention achieves a high level of network security protection through multi-module integration. The data collection module, the data preprocessing module, the feature selection module, the artificial intelligence model module, the self-adaptive module and the prediction correction module work together, so that the whole process from raw network traffic to final threat judgment is carried out within a unified framework. This improves threat detection accuracy, and because the data flow and parameter adjustment between the modules are carried out automatically, it also improves the response speed and flexibility of the system.

According to the invention, the self-adaptive module automatically adjusts the parameters of the artificial intelligence model module according to the misjudgment rate, so that the system optimizes itself. The prediction correction module then uses the adjusted parameters, together with its correction of the feature selection, to predict threats in future network traffic more accurately, giving the system forward-looking protection.

Drawings

To illustrate the technical solutions of the invention or of the prior art more clearly, the drawings used in describing the embodiments or the prior art are briefly introduced below. The drawings described below show only embodiments of the invention; a person skilled in the art can derive other drawings from them without inventive effort.

Fig. 1 is a schematic diagram of a system module according to an embodiment of the invention.

Detailed Description

The present invention will be further described in detail with reference to specific embodiments in order to make the objects, technical solutions and advantages of the present invention more apparent.

It is to be noted that, unless otherwise defined, technical or scientific terms used herein have the ordinary meaning understood by one of ordinary skill in the art to which the present invention belongs. The terms "first," "second," and the like, as used herein, do not denote any order, quantity, or importance, but are used to distinguish one element from another. The word "comprising" or "comprises", and the like, means that the element or item preceding the word covers the elements or items listed after the word and their equivalents, without excluding other elements or items. The terms "connected" or "coupled," and the like, are not limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect. "Upper", "lower", "left", "right", etc. are used merely to indicate relative positional relationships, which may change when the absolute position of the described object changes.

As shown in fig. 1, an adaptive cyber threat detection system based on artificial intelligence, comprising:

The data collection module specifically comprises:

grouping the sorted and indexed network data packets by time period using a time window, and calculating the statistical characteristics of each group on each preset field, such as average packet size and data transmission rate;

integrating the statistical characteristics of each generated group into a matrix or data frame as the first data set;

the data collection module captures network data packets in real time and, through protocol analysis, classification, sorting, indexing and statistical characteristic calculation, generates a detailed, structured first data set, providing high-quality input data for the subsequent modules.

The data preprocessing module carries out smoothing treatment on each statistical characteristic field in the first data set through a Gaussian filtering denoising algorithm, eliminates noise or abnormal values, applies Z-Score standardization to the denoised data to convert the values of each statistical characteristic field into values with uniform dimension or range, and then re-integrates the values into a new matrix or data frame form to serve as a second data set;

the data preprocessing module generates a second data set by receiving the first data set from the data collecting module and performing specially designed denoising and standardization steps on the first data set, so as to ensure that the subsequent modules can process data with uniform and reliable quality.

The feature selection module specifically comprises:

applying a recursive feature elimination algorithm or information-gain-based feature ranking to evaluate the importance of the statistical characteristic fields (packet size, data transmission rate, protocol type, etc.) in the second data set, specifically by iteratively removing the lowest-ranked feature field in each round until a preset number of features K (K is a positive integer, e.g., K = 10) is reached;

the feature selection module receives the second data set in an explicit and deterministic manner and, by precisely applying a recursive feature elimination algorithm, selects the 10 statistical property fields most relevant to network threat detection, generating a third data set. Therefore, the data processing efficiency is improved, and the accuracy of threat detection by the follow-up module, particularly the artificial intelligent model module is enhanced.

The artificial intelligence model module is internally provided with a pre-trained neural network model, the neural network model receives each row of data in the third data set, and each row of data comprises 10 first characteristic fields selected by the characteristic selection module;

for network traffic with an output probability value greater than or equal to a preset threshold (e.g., the threshold is set to 0.8), it is labeled as "threatening network traffic"; otherwise, it is marked as "non-threatening network traffic";

integrating the neural network model output and threat markers of each row of data into a new data set to generate a fourth data set, wherein the fourth data set comprises 10 first characteristic fields in the original third data set and second characteristic fields, and the second characteristic fields record the judgment of whether the corresponding network traffic is threat network traffic or not;

the artificial intelligence model module performs threat detection on the third data set through its built-in pre-trained neural network model and generates the fourth data set from the detection results. The fourth data set contains not only the statistical characteristic fields of the original third data set but also an additional second characteristic field recording whether the network traffic is a threat, thereby achieving more comprehensive and accurate network threat detection.

The self-adaptive module firstly executes data alignment operation once, so that each row of data in the fourth data set is paired with corresponding network traffic data in the first data set;

the self-adaptive module applies the generated correction factor to the neural network model of the artificial intelligence model module and adjusts the parameters of the neural network model: if the misjudgment rate exceeds a preset upper limit (for example, an upper limit of 5%), the threshold of the neural network model is lowered or the learning rate is raised; if the misjudgment rate is below a preset lower limit (for example, a lower limit of 1%), the threshold of the neural network model is raised or the learning rate is lowered;

through tight integration with the artificial intelligence model module and the data collection module, the self-adaptive module can analyze network traffic data and threat detection results in real time and automatically adjust the parameters of the artificial intelligence model module according to actual performance, thereby improving the threat detection accuracy of the system.

The misjudgment rate is calculated as follows:

The misjudgment rate is used for evaluating threat detection accuracy of the system and is used as a basis for generating a correction factor by the self-adaptive module;

this way of calculating the misjudgment rate provides a quantitative method for evaluating the accuracy of the artificial intelligence model module on threat detection tasks, and gives the self-adaptive module a basis for automatically adjusting the parameters of the artificial intelligence model module.

The prediction correction module receives the artificial intelligent model parameters adjusted by the self-adaptive module, wherein the artificial intelligent model parameters comprise learning rate, threshold value and activation function parameters;

Those of ordinary skill in the art will appreciate that: the discussion of any of the embodiments above is merely exemplary and is not intended to suggest that the scope of the invention is limited to these examples; the technical features of the above embodiments or in the different embodiments may also be combined within the idea of the invention, the steps may be implemented in any order and there are many other variations of the different aspects of the invention as described above, which are not provided in detail for the sake of brevity.

The present invention is intended to embrace all such alternatives, modifications and variances which fall within the broad scope of the appended claims. Therefore, any omission, modification, equivalent replacement, improvement, etc. of the present invention should be included in the scope of the present invention.

Claims

1. An artificial intelligence based adaptive cyber threat detection system, comprising:

2. The adaptive cyber threat detection system based on artificial intelligence of claim 1, wherein the data collection module specifically comprises:

grouping the sequenced and indexed network data packets according to time periods by utilizing a time window, and calculating the statistical characteristics of each group on each preset field;

the statistical properties of the generated packets are integrated into a matrix or data frame form as a first data set.

3. The adaptive cyber threat detection system of claim 2, wherein the data preprocessing module performs smoothing on each of the statistical characteristic fields in the first data set by a gaussian filter denoising algorithm, eliminates noise or outliers, and applies Z-Score normalization to the denoised data to convert the values of each of the statistical characteristic fields into values having a uniform dimension or range, and then re-integrates the values into a new matrix or data frame form as the second data set.

4. The adaptive cyber threat detection system based on artificial intelligence of claim 3, wherein the feature selection module specifically comprises:

and reconstructing a new data set by using the extracted 10 first characteristic fields, wherein each row of data only comprises the data of the 10 selected fields, re-integrating the data according to the arrangement sequence of the data in the original second data set, and generating a new matrix with 10 columns as a third data set after integration.

5. The adaptive network threat detection system based on artificial intelligence of claim 4, wherein the artificial intelligence model module has built-in a pre-trained neural network model that receives each row of data in the third dataset, each row of data comprising 10 first feature fields selected by the feature selection module;

6. The adaptive network threat detection system of claim 5, wherein the self-adaptive module first performs a data alignment operation to pair each row of data in the fourth data set with corresponding network traffic data in the first data set;

7. The adaptive network threat detection system of claim 6, wherein the misjudgment rate is calculated as follows:

calculating the misjudgment rate according to the four counters: $\text{misjudgment rate} = \frac{FP + FN}{TP + TN + FP + FN}$

8. The adaptive network threat detection system based on artificial intelligence of claim 7, wherein the prediction correction module receives the artificial intelligence model parameters adjusted by the self-adaptive module, including the learning rate, the threshold and the activation function parameters;