CN112134847A

CN112134847A - Attack detection method based on user flow behavior baseline

Info

Publication number: CN112134847A
Application number: CN202010867932.1A
Authority: CN
Inventors: 王文冰; 孙剑文; 陈明; 毛艳芳
Original assignee: Zhengzhou University of Light Industry
Current assignee: Zhengzhou University of Light Industry
Priority date: 2020-08-26
Filing date: 2020-08-26
Publication date: 2020-12-25

Abstract

The invention belongs to the technical field of attack detection methods, and in particular relates to an attack detection method based on a user traffic behavior baseline. The method extracts a flow-level feature set from the user's network traffic behavior, feeds the feature set into an improved model for training to obtain the user behavior baseline, and uses the baseline as the criterion for detecting attacks in newly arriving traffic. Only the user's normal behavior traffic is used during training. The improved bidirectional generative adversarial network algorithm trains stably on high-dimensional traffic features and is suitable for detecting unknown attacks, with fast detection speed and high accuracy.

Description

Attack detection method based on user traffic behavior baseline

Technical Field

The invention belongs to the technical field of attack detection methods, and in particular relates to an attack detection method based on a user traffic behavior baseline.

Background

As machine learning has matured in practical applications, it has been widely applied to traffic anomaly detection. Some researchers select a small number of traffic features, target only specific attack scenarios, and use classic machine learning algorithms such as decision trees, clustering, and genetic algorithms to distinguish normal traffic from intrusion traffic, with good detection results. However, these methods perform poorly on traffic data that is inherently complex and high-dimensional, and their time overhead is very large. Deep learning models solve the problem of training on high-dimensional, complex data, but attack samples are hard to obtain. When the attack traffic sample space is insufficient, existing methods struggle to train the detection model adequately and to detect unknown attacks well.

Summary of the Invention

In view of the defects and problems of the prior art, the present invention provides an attack detection method based on a user traffic behavior baseline. The method defines the user's behavior features through network traffic metadata and characterizes the behavior pattern of a specific user. Only the traffic data of the normal user is used during training, so that abnormal behavior traffic not generated by the user can be detected. The method can detect known attack behaviors and can also raise alerts on unknown behaviors.

The technical solution adopted by the present invention is an attack detection method based on a user traffic behavior baseline, comprising the following steps:

(1) Capture at least one week of inbound and outbound traffic data at the network card gateway of a specific user. Filter the traffic data by IP address so that only traffic related to that user remains, then filter it again to discard timed-out, out-of-order, and retransmitted traffic. Obtain the complete bidirectional TCP session flows and UDP session flows in the user behavior traffic; the direction of a flow is set by the direction of its first packet. A TCP session flow starts with SYN and terminates when either end sends FIN with the FIN count less than 2; a UDP session flow uses 120 seconds as its timeout threshold. Each session flow is labeled with the five-tuple <source IP, destination IP, source port, destination port, protocol type> as its session ID. According to the chosen feature set dimension, extract the metadata features of each session flow and compute statistical features; transform the categorical features so that all features fed to the model are on the same scale; compute the standard deviation of the time-related statistical features; and merge the statistical features by session flow ID into a feature set of a specific dimension;

(2) Select part of the user's traffic feature set as the training set, normalize the numerical data in the training set, and dummy-code the categorical data in the training set so that it can be used for model training;

(3) Initialize the parameters of the bidirectional generative adversarial network model and determine the model parameters. Train the discriminator first, then train the generator and the encoder, and then alternate the two until the discriminator's loss function shows an oscillating trend. After training, obtain the training loss value of the discriminator part and the training loss value of the generator and encoder part, compute the loss score of each part through the L1 norm, and then compute the baseline of the user data through the anomaly score formula;

(4) Integrate the behavior features of the test sample traffic, normalize the continuous feature parameters of the test samples, and dummy-code the categorical features. Feed each session flow in the test samples into the bidirectional generative adversarial network model to obtain its feature distribution score, and compare that score with the baseline. If the feature distribution score is greater than the baseline, the sample is judged to be an attack sample; if it is not greater than the baseline, the sample is a normal sample.

In the above attack detection method based on a user traffic behavior baseline, for the network communication session flows generated by the user and those generated by attack behavior in step (1), packets are considered to belong to the same session flow when the following conditions are met:

TCP session flow:

$$\text{flow direction} \cap (\mathrm{SrcIP}_1 = \mathrm{SrcIP}_2) \cap (\mathrm{DstIP}_1 = \mathrm{DstIP}_2) \cap (\mathrm{Prot}_1 = \mathrm{Prot}_2) \cap \text{TCP flag bits}$$

UDP session flow:

$$\text{flow direction} \cap (\mathrm{SrcIP}_1 = \mathrm{SrcIP}_2) \cap (\mathrm{DstIP}_1 = \mathrm{DstIP}_2) \cap (\mathrm{Prot}_1 = \mathrm{Prot}_2) \cap T_{udp}$$

where SrcIP denotes the source IP address, DstIP denotes the destination IP address, and Prot denotes the port number; $T_{udp}$ denotes the timeout threshold of the UDP flow; and ∩ indicates that the conditions hold simultaneously.
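By way of illustration only, the UDP grouping condition above may be sketched in Python as follows. The sketch assumes that the timeout threshold is applied as an idle gap between consecutive packets of the same conversation, that packets arrive as simple tuples, and that both directions of a conversation share one flow; the function name `split_udp_flows` and the packet layout are hypothetical and are not taken from the specification.

```python
UDP_TIMEOUT_S = 120.0  # T_udp: UDP timeout threshold stated in the text


def split_udp_flows(packets, timeout=UDP_TIMEOUT_S):
    """Group UDP packets into session flows.

    packets: iterable of (timestamp, src_ip, dst_ip, src_port, dst_port).
    Returns a list of flows in the order they were closed; each flow holds
    its five-tuple ID (oriented by its first packet) and packet timestamps.
    """
    open_flows = {}  # direction-independent key -> flow currently open
    finished = []
    for ts, src, dst, sport, dport in sorted(packets):
        # Both directions of one conversation map to the same key.
        key = tuple(sorted([(src, sport), (dst, dport)]))
        flow = open_flows.get(key)
        # Assumption: a gap longer than the threshold ends the flow.
        if flow is not None and ts - flow["times"][-1] > timeout:
            finished.append(flow)
            flow = None
        if flow is None:
            # The first packet fixes the flow direction and the session ID.
            flow = {"id": (src, dst, sport, dport, "UDP"), "times": []}
            open_flows[key] = flow
        flow["times"].append(ts)
    finished.extend(open_flows.values())
    return finished
```

With a query, a reply one second later, and a third packet 200 seconds after the start, the sketch yields two flows, because the last gap exceeds the 120-second threshold.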

In the above attack detection method based on a user traffic behavior baseline, the network card gateway inbound and outbound traffic data in step (1) is the user's traffic data for at least one week.

In the above attack detection method based on a user traffic behavior baseline, the training set is the feature set of the user's normal behavior traffic.

In the above attack detection method based on a user traffic behavior baseline, the test samples comprise feature sets of the user's normal behavior traffic other than the training set and feature sets of several kinds of attack traffic.

Beneficial effects of the present invention: only the user's traffic behavior features are used in training the baseline model; this data is easy to collect, so the model can be fully trained. High-dimensional network traffic metadata is used to define the user's behavior pattern and to characterize the user's behavior baseline comprehensively, so the attack detection model need not target a preset attack scenario and is a general-purpose attack detection model. The invention improves the bidirectional generative adversarial network model by applying WGAN-GP within the bidirectional GAN framework in place of the loss function of the original discriminator, so that model training converges quickly and detection is efficient.

Brief Description of the Drawings

Figure 1 is a flowchart of the attack detection method based on a user traffic behavior baseline.

Figure 2 is a flowchart of the extraction of the user traffic behavior feature set.

Figure 3 is a flowchart of the training of the user traffic behavior baseline model.

Figure 4 is a flowchart of attack behavior detection by the detection model according to the baseline.

Detailed Description

Current attack detection methods target only specific attack scenarios, perform poorly on inherently complex, high-dimensional traffic data, and have a large time overhead; when the attack traffic sample space is insufficient, they struggle to train the detection model adequately and to detect unknown attacks well. To address these problems, the present invention provides an attack detection method based on a user traffic behavior baseline. The method defines the user's behavior features through network traffic metadata, characterizes the behavior pattern of a specific user, and uses only the traffic data of the normal user during training, so that abnormal behavior traffic not generated by the user can be detected. The invention is further described below with reference to the drawings and an embodiment.

Embodiment 1: This embodiment provides an attack detection method based on a user traffic behavior baseline; the overall flow of the method is shown in Figure 1.

Step 1: Extract the user's network traffic behavior feature set; the flow is shown in Figure 2.

Obtain at least one week of inbound and outbound traffic data at the network card gateway of a specific user, filter all traffic data by IP address, and retain all traffic data related to that user. Filter this traffic data again to discard timed-out, out-of-order, and retransmitted traffic, and obtain the complete bidirectional TCP session flows and UDP session flows in the user behavior traffic. For the network communication session flows generated by the user and those generated by attack behavior, packets are considered to belong to the same session flow when the following conditions are met:

TCP session flow:

$$\text{flow direction} \cap (\mathrm{SrcIP}_1 = \mathrm{SrcIP}_2) \cap (\mathrm{DstIP}_1 = \mathrm{DstIP}_2) \cap (\mathrm{Prot}_1 = \mathrm{Prot}_2) \cap \text{TCP flag bits}$$

UDP session flow:

$$\text{flow direction} \cap (\mathrm{SrcIP}_1 = \mathrm{SrcIP}_2) \cap (\mathrm{DstIP}_1 = \mathrm{DstIP}_2) \cap (\mathrm{Prot}_1 = \mathrm{Prot}_2) \cap T_{udp}$$

Here the flow direction is set by the direction of the first packet; SrcIP denotes the source IP address, DstIP denotes the destination IP address, and Prot denotes the port number. The TCP flag bits condition means that a TCP session flow starts with SYN and terminates when either end sends FIN with the FIN count less than 2. $T_{udp}$ denotes the timeout threshold of the UDP flow, set here to $T_{udp} = 120$ s. ∩ indicates that the conditions hold simultaneously.

The basic information used to generate session flows from packets comprises the source IP address, destination IP address, source port, destination port, protocol type, timestamp, payload length, flag field, time window, and header length. Each complete session flow is labeled with the five-tuple <source IP, destination IP, source port, destination port, protocol type> as its ID.

According to the chosen feature dimension, extract the metadata features of each session flow and compute statistical features. The categorical features among the extracted features, such as the destination port type and the protocol type, need preprocessing and are therefore transformed; the purpose of the transformation is to put all features fed to the model on the same scale. Compute the standard deviation of the time-related statistical features, which reflects the dispersion of the data, and merge the statistical features by session flow ID into a feature set of a specific dimension.
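As a non-limiting illustration of the time-related statistics mentioned above, the following Python sketch computes the duration of one session flow together with the mean and standard deviation of its packet inter-arrival times. The feature names and the use of the population standard deviation are assumptions made for this example; the specification does not enumerate the individual features.

```python
import statistics


def flow_time_features(timestamps):
    """Time-related statistics for one session flow.

    timestamps: packet arrival times of the flow, in ascending order.
    The keys 'duration', 'iat_mean' and 'iat_std' are illustrative names.
    """
    gaps = [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]
    if not gaps:  # a single-packet flow has no inter-arrival gaps
        return {"duration": 0.0, "iat_mean": 0.0, "iat_std": 0.0}
    return {
        "duration": timestamps[-1] - timestamps[0],
        "iat_mean": statistics.fmean(gaps),
        # Population standard deviation: the dispersion of the gaps.
        "iat_std": statistics.pstdev(gaps),
    }
```

Computing such a dictionary per session ID and merging the results would give one fixed-length row per flow, which is the shape the model input requires.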

Then part of the specific user's normal behavior traffic feature set is selected as the training set, and the numerical data in the training set is normalized; in this embodiment the numerical features are 30-dimensional raw statistical features and 22-dimensional computed statistical features. The categorical data in the training set is dummy-coded. Dummy coding arbitrarily drops one state: with four states, three state bits are needed; three of the states set their own bit to 1 when active, and the fourth state is represented as [0,0,0]. Coding treats each value of a discrete feature as a state; dummy-variable coding is used here so that the data can be used for model training, and the coding result is shown in Table 1.

Table 1: Dummy coding results
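The preprocessing described above may be sketched as follows, purely as an example. The category list is hypothetical, the last category is arbitrarily chosen as the dropped all-zero state, and min-max scaling is assumed because the specification does not name the normalization formula.

```python
def dummy_encode(value, categories):
    """Dummy-code one categorical value: k states use k-1 bits.

    The last entry of `categories` is the dropped reference state and is
    represented by all zeros.
    """
    return [1 if value == category else 0 for category in categories[:-1]]


def min_max_normalize(column):
    """Scale a numerical column to [0, 1] (assumed normalization scheme)."""
    low, high = min(column), max(column)
    if high == low:  # constant column: avoid division by zero
        return [0.0 for _ in column]
    return [(value - low) / (high - low) for value in column]


PROTOCOLS = ["TCP", "UDP", "ICMP", "OTHER"]  # hypothetical four-state feature
```

With these four states, `dummy_encode("UDP", PROTOCOLS)` gives `[0, 1, 0]` and the dropped state `"OTHER"` gives `[0, 0, 0]`, matching the four-state, three-bit description.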

Step 2: Learn the baseline of the session features through the generative adversarial algorithm; the flow is shown in Figure 3 and is as follows.

Initialize the parameters of the bidirectional generative adversarial network model and determine the model parameters. Train the discriminator first: concatenate the training sample X with its latent-space mapping, concatenate the latent variable Z with its real-space feature mapping, and feed the two pairs of data into the discriminator. The first fully connected layer outputs 64-dimensional data, and one further fully connected layer outputs 1-dimensional data; that is, the last fully connected layer connects every neuron to every output neuron.

Then train the generator and the encoder. The generator is a 4-layer neural network whose input is a 16-dimensional latent-space variable. The first fully connected layer outputs 32-dimensional data with ReLU activation, the second fully connected layer outputs 64-dimensional data, again with ReLU activation, and the third fully connected layer outputs 57 dimensions, giving the real-space feature representation, which attempts to reconstruct the original input data, i.e. G(E(x)).

The encoder is a 3-layer neural network whose input is the 57-dimensional original data. The first fully connected layer outputs 32-dimensional data with Leaky ReLU activation, and the second fully connected layer outputs 16-dimensional data, which is the latent feature representation E(x).
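The layer sizes given above can be checked with a small, untrained forward pass. The following pure-Python sketch is only illustrative: the weights are random, the Leaky ReLU slope of 0.2 and the activation used inside the discriminator are assumptions, and all function names are hypothetical.

```python
import random

X_DIM, Z_DIM = 57, 16  # real-space and latent-space sizes stated in the text


def dense(in_dim, out_dim, rng):
    """Randomly initialised fully connected layer: (weights, biases)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]
    return weights, [0.0] * out_dim


def forward(layer, x, activation=None):
    weights, biases = layer
    y = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]
    if activation == "relu":
        return [max(0.0, v) for v in y]
    if activation == "leaky_relu":
        return [v if v > 0 else 0.2 * v for v in y]  # slope 0.2 is assumed
    return y


rng = random.Random(0)
GEN = [dense(Z_DIM, 32, rng), dense(32, 64, rng), dense(64, X_DIM, rng)]
ENC = [dense(X_DIM, 32, rng), dense(32, Z_DIM, rng)]
DISC = [dense(X_DIM + Z_DIM, 64, rng), dense(64, 1, rng)]


def generate(z):
    """G: 16 -> 32 (ReLU) -> 64 (ReLU) -> 57."""
    hidden = forward(GEN[1], forward(GEN[0], z, "relu"), "relu")
    return forward(GEN[2], hidden)


def encode(x):
    """E: 57 -> 32 (Leaky ReLU) -> 16."""
    return forward(ENC[1], forward(ENC[0], x, "leaky_relu"))


def discriminate(x, z):
    """D: concatenated (x, z) -> 64 -> 1; also returns the 64-d features."""
    features = forward(DISC[0], list(x) + list(z), "leaky_relu")  # assumed activation
    return forward(DISC[1], features)[0], features
```

Calling `generate(encode(x))` on a 57-dimensional vector returns a 57-dimensional reconstruction, and `discriminate` exposes the 64-dimensional layer that precedes the 1-dimensional output.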

Map the training sample x from the real space to the latent space to obtain the distribution difference between the training sample and the latent variable, $Loss_e = -\mathbb{E}_{x \sim p_x}[D(x, E(x))]$. Then map the latent variable z from the latent space to the real space to obtain the distribution difference between the known noise z and the reconstructed variable G(z), $Loss_g = \mathbb{E}_{z \sim p_z}[D(G(z), z)]$. Compute the distribution difference between the reconstructed sample G(E(x)) and the original sample x, $Loss_d = Loss_e + Loss_g$. From these, the objective loss function of the discriminator is obtained.
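For illustration, the three loss terms above can be estimated from a batch of discriminator outputs by replacing each expectation with a sample mean. The sketch below assumes the discriminator outputs have already been computed; the gradient penalty that WGAN-GP adds to the discriminator objective is deliberately left out, because its exact form is not reproduced in this text.

```python
def bigan_loss_terms(d_encoder_pairs, d_generator_pairs):
    """Sample-mean estimates of Loss_e, Loss_g and Loss_d.

    d_encoder_pairs:   discriminator outputs D(x, E(x)) for a batch of x.
    d_generator_pairs: discriminator outputs D(G(z), z) for a batch of z.
    """
    loss_e = -sum(d_encoder_pairs) / len(d_encoder_pairs)
    loss_g = sum(d_generator_pairs) / len(d_generator_pairs)
    loss_d = loss_e + loss_g
    return loss_e, loss_g, loss_d
```

For example, outputs `[1.0, 3.0]` on encoder pairs and `[0.5, 1.5]` on generator pairs give `Loss_e = -2.0`, `Loss_g = 1.0`, and `Loss_d = -1.0`.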

Train the discriminator of the bidirectional generative adversarial network, update the gradients, and pass the parameters to the generator and the encoder; then train the generator and the encoder together. The data output by the generator and the encoder is concatenated and fed back into the discriminator, so the two phases are trained alternately until the discriminator's objective loss function shows an oscillating trend. The discriminator's final output is 1-dimensional. It yields feature matching in the latent space, can extract features using local correlation, and reduces the amount of data to process while retaining useful features; its penultimate layer serves as the feature output.

Compute the discriminator score $L_f = |f_D(x, E(x)) - f_D(E(x), G(E(x)))|$, and take the L1 norm of $L_f$ as the loss score of the discriminator, $S_f = \|L_f\|_1$.

Compute the loss $L_x$ incurred when the training sample x is encoded and then regenerated as G(E(x)):

$$L_x = \|x - G(E(x))\|$$

Take the L1 norm of $L_x$ as the loss score of the generator and encoder part, $S_x = \|L_x\|_1$.

From the loss score $S_x$ of the generator and encoder part and the loss score $S_f$ of the discriminator part, compute the anomaly score $Score = (1 - weight) \cdot S_x + weight \cdot S_f$, which is the baseline.
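The anomaly score formula can be illustrated with the short sketch below. It assumes that the reconstruction G(E(x)) and the two discriminator feature vectors are already available as plain lists, and that `weight` is supplied by the caller, since the specification does not state its value. The function names are hypothetical.

```python
def l1_norm(vector):
    """Sum of absolute values of the entries."""
    return sum(abs(v) for v in vector)


def anomaly_score(x, x_reconstructed, features_real, features_reconstructed, weight):
    """Score = (1 - weight) * S_x + weight * S_f.

    S_x: L1 norm of the residual x - G(E(x)).
    S_f: L1 norm of the difference between the two discriminator
         feature vectors.
    """
    s_x = l1_norm([a - b for a, b in zip(x, x_reconstructed)])
    s_f = l1_norm([a - b for a, b in zip(features_real, features_reconstructed)])
    return (1 - weight) * s_x + weight * s_f
```

A larger `weight` makes the score depend more on the discriminator features and less on the raw reconstruction error.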

Step 3: Perform attack detection on the test samples; the flow is shown in Figure 4 and is as follows. Integrate the behavior features of the test sample traffic, where the test samples comprise feature sets of the user's normal behavior traffic other than the training set and feature sets of several kinds of attack traffic. Normalize the continuous feature parameters of the test samples and dummy-code the categorical features. Feed each session flow in the test samples into the bidirectional generative adversarial network model to obtain its feature distribution score, and compare that score with the baseline. If the feature distribution score is greater than the baseline, the sample is judged to be an attack sample; if it is not greater than the baseline, the sample is a normal sample.
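The comparison against the baseline reduces to a strict threshold test, sketched below as an example. The baseline is taken as a single number produced by the training stage, and the labels "attack" and "normal" are illustrative names.

```python
def classify_flows(scores, baseline):
    """Label each session flow by comparing its score with the baseline.

    A score strictly greater than the baseline is an attack; a score equal
    to or below the baseline is normal.
    """
    return ["attack" if score > baseline else "normal" for score in scores]
```

Note that a score exactly equal to the baseline is labeled normal, following the "not greater than" wording.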

A total of 243,703 test samples were examined with the method of this embodiment; the results are shown in Table 2.

Table 2: Detection results of the attack detection method of the present invention

Attack type | Precision (%) | Recall (%) | Accuracy (%) | F1 (%)
DoS | 100 | 100 | 100 | 100
DDoS | 100 | 100 | 100 | 100
Web attack | 90.89 | 91.34 | 96.67 | 91.53
Infiltration | 100 | 100 | 100 | 100

Table 2 shows that, for the DoS, DDoS, and Infiltration attack types, the precision, recall, accuracy, and F1 score of the detection method are all 100%, while for Web attacks the four values lie between 90.89% and 96.67%. This indicates that the detection method of the present invention has a good detection effect.
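For reference, the four metrics reported in Table 2 follow the standard definitions over a confusion matrix, as in the sketch below. The counts used in the accompanying example are hypothetical and are not the counts behind Table 2, which the specification does not give.

```python
def detection_metrics(tp, fp, tn, fn):
    """Precision, recall, accuracy and F1 from confusion-matrix counts.

    tp: attack flows flagged as attack     fp: normal flows flagged as attack
    tn: normal flows passed as normal      fn: attack flows passed as normal
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, accuracy, f1
```

With hypothetical counts of 8 true positives, 2 false positives, 88 true negatives, and 2 false negatives, precision, recall, and F1 are each 0.8 and accuracy is 0.96.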

The above is only a preferred embodiment of the present invention and does not limit the invention. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims

1. An attack detection method based on a user traffic behavior baseline, characterized in that the method comprises the following steps:

first, capturing the network card gateway inbound and outbound traffic data of a user, filtering the traffic data by IP address, and retaining all traffic data related to the user; filtering the traffic data again, discarding timed-out, out-of-order, and retransmitted traffic, obtaining the complete bidirectional TCP session flows and UDP session flows in the user behavior traffic, and labeling each with its five-tuple as its ID; extracting the metadata features of the session flows according to the determined feature dimension, computing statistical features, transforming the categorical features among the extracted features so that the features fed to the model are on the same scale, computing the standard deviation of the time-related statistical features, and merging the statistical features by session flow ID into a feature set of a specific dimension;

second, selecting part of the user's traffic feature set as a training set, normalizing the numerical data in the training set, and dummy-coding the categorical data in the training set so that it can be used for model training;

third, initializing the parameters of a bidirectional generative adversarial network model, determining the model parameters, training the discriminator of the model, then training the generator and the encoder, and then training alternately until the loss function of the discriminator shows an oscillating trend; after training, obtaining the training loss value of the discriminator part and the training loss value of the generator and encoder part, computing the loss score of the discriminator part and of the generator and encoder part through the L1 norm, and then computing the baseline of the user data through an anomaly score formula;

fourth, integrating the behavior features of the test sample traffic, normalizing the continuous feature parameters of the test samples, and dummy-coding the categorical features; feeding each session flow in the test samples into the bidirectional generative adversarial network model to obtain the feature distribution score of each session flow; and comparing the feature distribution score of each session flow with the baseline, the sample being judged an attack sample when the feature distribution score is greater than the baseline, and a normal sample when the feature distribution score is not greater than the baseline.

2. The attack detection method based on a user traffic behavior baseline according to claim 1, characterized in that, in the first step, for the network communication session flows generated by the user and those generated by attack behavior, packets are considered to belong to the same session flow when the following conditions are met:

TCP session flow:

$$\text{flow direction} \cap (\mathrm{SrcIP}_1 = \mathrm{SrcIP}_2) \cap (\mathrm{DstIP}_1 = \mathrm{DstIP}_2) \cap (\mathrm{Prot}_1 = \mathrm{Prot}_2) \cap \text{TCP flag bits}$$

UDP session flow:

$$\text{flow direction} \cap (\mathrm{SrcIP}_1 = \mathrm{SrcIP}_2) \cap (\mathrm{DstIP}_1 = \mathrm{DstIP}_2) \cap (\mathrm{Prot}_1 = \mathrm{Prot}_2) \cap T_{udp}$$

where SrcIP denotes the source IP address, DstIP denotes the destination IP address, and Prot denotes the port number; $T_{udp}$ denotes the timeout threshold of the UDP flow; and ∩ indicates that the conditions hold simultaneously.

3. The attack detection method based on a user traffic behavior baseline according to claim 1, characterized in that, in the first step, the network card gateway inbound and outbound traffic data is the user's traffic data for at least one week.

4. The attack detection method based on a user traffic behavior baseline according to claim 1, characterized in that the training set is the feature set of the user's normal behavior traffic.

5. The attack detection method based on a user traffic behavior baseline according to claim 1, characterized in that the test samples comprise feature sets of the user's normal behavior traffic other than the training set and feature sets of several kinds of attack traffic.