CN117407733B

CN117407733B - Flow anomaly detection method and system based on countermeasure generation shapelet

Info

Publication number: CN117407733B
Application number: CN202311695002.2A
Authority: CN
Inventors: 肖勇才; 杨柳; 喻思; 杨浩; 徐健; 刘旷也; 章玲玲; 姚保明; 蔡庆; 喻宝禄; 兰鑫; 黎书适; 余存智; 杜江龙
Original assignee: Nanchang Kechen Electric Power Test And Research Co ltd; Electric Power Research Institute of State Grid Jiangxi Electric Power Co Ltd
Current assignee: Nanchang Kechen Electric Power Test And Research Co ltd; Electric Power Research Institute of State Grid Jiangxi Electric Power Co Ltd
Priority date: 2023-12-12
Filing date: 2023-12-12
Publication date: 2024-04-02
Anticipated expiration: 2043-12-12
Also published as: CN117407733A

Abstract

The invention discloses a traffic anomaly detection method and system based on adversarially generated shapelets. The method comprises the following steps: judging whether the time length of at least one traffic data time series is greater than a first preset threshold; if it is greater than the first preset threshold, generating a shapelet sequence using two one-dimensional convolution layers and an average pooling layer; discriminating between the shapelet sequence and the traffic data time series with a discriminator, and adding an adversarial loss during training so that the shape difference between the generated shapelet sequence and the traffic data time series is not greater than a second preset threshold; and calculating a DTW (dynamic time warping) value between the trained shapelet sequence and the traffic data time series, converting the traffic data time series into a shapelet-based feature vector according to the DTW value, and outputting a traffic anomaly detection result in combination with a KNN classifier. The efficiency of traffic anomaly detection can thereby be improved.

Description

Flow anomaly detection method and system based on countermeasure generation shapelet

Technical Field

The invention belongs to the technical field of traffic anomaly detection, and particularly relates to a traffic anomaly detection method and system based on adversarially generated shapelets.

Background

In recent years, shapelet-based techniques have been widely applied in the field of time series classification because they are highly interpretable, accurate, and applicable to high-dimensional data. A shapelet is a subsequence of a time series that carries discriminative information; a shapelet-based classification method classifies a time series by judging whether it contains one or more such discriminative subsequences. For example, patent application CN202310488197.7 proposes a shapelet-based method and system for detecting abnormal network traffic. Its model is trained mainly according to the following steps: 1) capturing traffic, converting it into a CSV file, preprocessing the CSV file, converting it into an .npy file with numpy, and dividing it into a training set and a test set; 2) generating the most representative shapelet sequences in the traffic data time series with a learning time-series shapelets algorithm; 3) training a CNN-LSTM anomaly detection model with the shapelet sequences; 4) performing traffic anomaly detection with the trained CNN-LSTM anomaly detection model to obtain a multi-class traffic detection result.

However, this solution has the following drawbacks: 1) Because the shapelets are learned by back-propagation without any constraint during learning, a learned shapelet may not resemble any real subsequence, which reduces the interpretability of the shapelet approach. 2) For multi-class tasks, current deep shapelet learning methods tend to focus on the local patterns of majority-class samples while ignoring minority-class samples, which degrades the classification accuracy of the model on imbalanced data sets.

Disclosure of Invention

The invention provides a flow anomaly detection method and a system based on countermeasure generation shapelet, which are used for solving the technical problem of low accuracy of flow anomaly detection.

In a first aspect, the present invention provides a method for detecting traffic anomalies based on countermeasure generation shapelet, comprising:

grabbing flow data and converting the flow data into a CSV file, wherein the CSV file comprises at least one flow data time sequence based on time sequencing;

judging whether the time length of the at least one flow data time sequence is greater than a first preset threshold value or not;

if not greater than the first preset threshold, letting $T=\{T_1,T_2,\dots,T_N\}$ be a set of N traffic data time series, assuming that each traffic data time series in $T$ has length $Q$, the i-th traffic data time series $T_i$ consisting of $Q$ elements: $T_i=(t_{i,1},t_{i,2},\dots,t_{i,Q})$, i=1,…,N;

randomly selecting one traffic data time series as the first cluster center $\mu_1$;

calculating the shortest Euclidean distance D(x) between each traffic data time series and the currently existing cluster centers, wherein the expression of D(x) is:

$$D(x)=\sqrt{(x_{11}-x_{21})^2+(x_{12}-x_{22})^2+(x_{13}-x_{23})^2+\dots+(x_{1n}-x_{2n})^2},$$

where $x$ is any traffic data time series other than a cluster center, $x_{11}$, $x_{12}$, $x_{13}$ and $x_{1n}$ are the first, second, third and n-th elements of the first traffic data time series, and $x_{21}$, $x_{22}$, $x_{23}$ and $x_{2n}$ are the first, second, third and n-th elements of the second traffic data time series;

calculating the probability $P(x)=\dfrac{D(x)^2}{\sum_{x\in X}D(x)^2}$ that each traffic data time series is selected as the next cluster center, until K cluster centers $\mu_1,\mu_2,\dots,\mu_K$ have been selected, where $X$ is the set of all elements;

calculating the cluster category to which each traffic data time series belongs according to the k-means algorithm, wherein the expression for the cluster category is:

$$c_i := \arg\min_{k}\ \lVert T_i-\mu_k\rVert^2,$$

where $c_i$ is the i-th cluster category, $:=$ is the assignment symbol, $T_i$ is the i-th traffic data time series, $\mu_k$ is the k-th cluster center, and $\arg\min$ is the function returning the minimizing argument;

wherein

$$r_{ik}=\begin{cases}1, & k=\arg\min_{j}\lVert T_i-\mu_j\rVert^2\\ 0, & \text{otherwise,}\end{cases}$$

where $r_{ik}$ equals 1 if the i-th traffic data time series is assigned to the k-th cluster center and 0 otherwise;

taking the abrupt-change points of the traffic data time series as special points, and extracting a shapelet sequence from the original subsequences that have a special point as an endpoint, wherein the special points are the peak points of the traffic data time series, namely $h=\arg\max_x f(x)$ and $l=\arg\min_x f(x)$;

where $f(x)$ is the functional expression of the traffic data time series, $\arg\max$ returns the value of the argument $x$ at which $f(x)$ attains its maximum, and $\arg\min$ returns the value of the argument $x$ at which $f(x)$ attains its minimum;

if greater than the first preset threshold, extracting at least one original subsequence of a given traffic data time series in the CSV file with a preset shapelet generator according to a sliding window of preset length, splicing the at least one original subsequence, and obtaining at least one shapelet sequence after the one-dimensional convolution layer and average pooling layer operations;

discriminating between the at least one shapelet sequence and the at least one traffic data time series with a discriminator, and adding an adversarial loss to the training on the at least one shapelet sequence and the at least one traffic data time series, so that the shape difference between the generated at least one shapelet sequence and the at least one traffic data time series is not greater than a second preset threshold;

and calculating a DTW value between the trained at least one shapelet sequence and the at least one traffic data time series, converting the at least one traffic data time series into at least one shapelet-based feature vector according to the DTW value, and outputting a traffic anomaly detection result in combination with a KNN classifier.

In a second aspect, the present invention provides a traffic anomaly detection system based on countermeasure generation shapelet, comprising:

the conversion module is configured to capture flow data and convert the flow data into a CSV file, wherein the CSV file comprises at least one flow data time sequence based on time sequencing;

the judging module is configured to judge whether the time length of the at least one flow data time sequence is larger than a first preset threshold value or not;

wherein, if not greater than the first preset threshold, a traffic data time series is randomly selected as the first cluster center, the remaining cluster centers are selected according to the shortest Euclidean distance D(x), each traffic data time series is assigned to a cluster category according to the k-means algorithm, and a shapelet sequence is extracted from the original subsequences that have a peak point as an endpoint, with D(x), the cluster categories and the peak points defined as in the first aspect;

the splicing module is configured to, if greater than the first preset threshold, extract at least one original subsequence of a given traffic data time series in the CSV file with a preset shapelet generator according to a sliding window of preset length, splice the at least one original subsequence, and obtain at least one shapelet sequence after the one-dimensional convolution layer and average pooling layer operations;

the identification module is configured to discriminate between the at least one shapelet sequence and the at least one traffic data time series with a discriminator, and to add an adversarial loss to the training on the at least one shapelet sequence and the at least one traffic data time series, so that the shape difference between the generated at least one shapelet sequence and the at least one traffic data time series is not greater than a second preset threshold;

and the output module is configured to calculate a DTW value between the trained at least one shapelet sequence and the at least one traffic data time series, convert the at least one traffic data time series into at least one shapelet-based feature vector according to the DTW value, and output a traffic anomaly detection result in combination with a KNN classifier.

In a third aspect, there is provided an electronic device, comprising: the system comprises at least one processor and a memory communicatively connected with the at least one processor, wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the steps of the method for detecting traffic anomalies based on countermeasure generation shapelet according to any one of the embodiments of the invention.

In a fourth aspect, the present invention also provides a computer readable storage medium having stored thereon a computer program, which when executed by a processor, causes the processor to perform the steps of the method for detecting traffic anomalies based on countermeasure generation shapelet of any one of the embodiments of the present invention.

The flow anomaly detection method and system based on the countermeasure generation shapelet have the following beneficial effects:

1. the idea of a generative adversarial network is adopted, and the shapelet sequence is constrained by an adversarial learning strategy, so that the generated shapelet sequence does not differ excessively in shape from the traffic data time series, which enhances the interpretability of the KNN classifier;

2. when the time length of at least one traffic data time series is not greater than the first preset threshold, the most representative traffic data time series is extracted and the shapelet sequence is extracted from it, taking the abrupt-change points of the traffic data time series as special points and extracting the shapelet sequence from subsequences that have a special point as an endpoint; when the time length of at least one traffic data time series is greater than the first preset threshold, two one-dimensional convolution layers and an average pooling layer are used to generate the shapelet sequence, whose time complexity is far lower than that of the conventional deep-learning back-propagation approach, so that the efficiency of the KNN classifier is improved;

3. In the prior art, the shapelet sequence is determined after training, but if a new sequence appears in the test set, the shapelet sequence learned in the training stage may be difficult to identify, thereby influencing the classification performance of the KNN classifier.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

Fig. 1 is a flowchart of a flow anomaly detection method based on countermeasure generation shapelet according to an embodiment of the present invention;

FIG. 2 is a block diagram of a flow anomaly detection system based on countermeasure generation shapelet according to an embodiment of the present invention;

fig. 3 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.

Detailed Description

For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

Referring to Fig. 1, a flowchart of a traffic anomaly detection method based on adversarially generated shapelets according to the present application is shown.

As shown in fig. 1, the flow anomaly detection method based on the countermeasure generation shapelet specifically includes the following steps:

step S101, capturing flow data, and converting the flow data into a CSV file, where the CSV file includes at least one flow data time sequence based on time ordering.

In this step, the scapy package for Python is called in an environment with WinPcap installed, the network card or IP address of the target host is set, and traffic data is captured; the captured traffic data is input into CICFlowMeter to obtain a CSV file; the CSV file is then preprocessed with the PyTorch framework: null values, missing values and infinite values in the CSV file are located and replaced with 0.
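As a non-limiting illustration, the replacement of null, missing and infinite values by 0 can be sketched as follows. The sketch uses only the Python standard library rather than the PyTorch framework named above, and the function names `clean_value` and `load_clean_rows`, as well as the column names in the usage note, are hypothetical.

```python
import csv
import io
import math


def clean_value(raw: str) -> float:
    """Parse one CSV cell; null, missing, NaN and infinite values become 0.0."""
    text = raw.strip()
    if text == "" or text.lower() in {"null", "none", "nan"}:
        return 0.0
    try:
        value = float(text)
    except ValueError:
        # Assumption: a non-numeric cell is treated like a missing value.
        return 0.0
    return value if math.isfinite(value) else 0.0


def load_clean_rows(csv_text: str, columns: list[str]) -> list[list[float]]:
    """Read the selected columns of a CSV document as cleaned float rows."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [[clean_value(row.get(col) or "") for col in columns] for row in reader]
```

For example, `load_clean_rows("a,b\n1.5,inf\n,NaN\n", ["a", "b"])` yields two rows in which every invalid cell has been replaced by 0.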

Step S102, judging whether the time length of the at least one flow data time sequence is larger than a first preset threshold value.

In this step, the first preset threshold value refers to a preset limit for the time length of the at least one time series of traffic data. In other words, this is a predetermined time length criterion for comparing and judging the flow data time series.

For example, if the first preset threshold is set to 24 hours, then the length of the time series of traffic data requires at least 24 hours of data to satisfy the condition. The setting of the threshold can be adjusted according to specific requirements and scenes, and is mainly used for filtering or screening out the flow data time sequence meeting a certain time length so as to further process or analyze.
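As a non-limiting illustration, the judgment of step S102 can be sketched as a comparison between the time span covered by a series and the first preset threshold. The sketch assumes that each sample carries a timestamp and uses the 24-hour example value; the function names and the branch labels `"generator"` and `"clustering"` are hypothetical.

```python
from datetime import datetime, timedelta


def exceeds_threshold(timestamps, threshold):
    """Return True if the span covered by the series is strictly greater than threshold."""
    return (max(timestamps) - min(timestamps)) > threshold


def choose_shapelet_strategy(timestamps, threshold=timedelta(hours=24)):
    """Pick the branch: the shapelet generator for long series, clustering otherwise."""
    return "generator" if exceeds_threshold(timestamps, threshold) else "clustering"
```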

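Specifically, if not greater than the first preset threshold, let $T=\{T_1,T_2,\dots,T_N\}$ be a set of N traffic data time series, and assume that each traffic data time series in $T$ has length $Q$, the i-th traffic data time series $T_i$ consisting of $Q$ elements: $T_i=(t_{i,1},t_{i,2},\dots,t_{i,Q})$, i=1,…,N;

one traffic data time series is randomly selected as the first cluster center $\mu_1$;

the shortest Euclidean distance D(x) between each traffic data time series and the currently existing cluster centers is calculated, wherein the expression of D(x) is:

$$D(x)=\sqrt{(x_{11}-x_{21})^2+(x_{12}-x_{22})^2+(x_{13}-x_{23})^2+\dots+(x_{1n}-x_{2n})^2},$$

where $x$ is any traffic data time series other than a cluster center, $x_{11}$, $x_{12}$, $x_{13}$ and $x_{1n}$ are the first, second, third and n-th elements of the first traffic data time series, and $x_{21}$, $x_{22}$, $x_{23}$ and $x_{2n}$ are the first, second, third and n-th elements of the second traffic data time series;

the probability $P(x)=\dfrac{D(x)^2}{\sum_{x\in X}D(x)^2}$ that each traffic data time series is selected as the next cluster center is calculated until K cluster centers $\mu_1,\mu_2,\dots,\mu_K$ have been selected, where $X$ is the set of all elements;

the cluster category to which each traffic data time series belongs is calculated according to the k-means algorithm:

$$c_i := \arg\min_{k}\ \lVert T_i-\mu_k\rVert^2,$$

where $c_i$ is the i-th cluster category, $:=$ is the assignment symbol, $T_i$ is the i-th traffic data time series, $\mu_k$ is the k-th cluster center, and $\arg\min$ is the function returning the minimizing argument;

wherein

$$r_{ik}=\begin{cases}1, & k=\arg\min_{j}\lVert T_i-\mu_j\rVert^2\\ 0, & \text{otherwise,}\end{cases}$$

where $r_{ik}$ equals 1 if the i-th traffic data time series is assigned to the k-th cluster center and 0 otherwise;

the abrupt-change points of the traffic data time series are taken as special points, and a shapelet sequence is extracted from the original subsequences that have a special point as an endpoint, the special points being the peak points of the traffic data time series, namely $h=\arg\max_x f(x)$ and $l=\arg\min_x f(x)$;

where $f(x)$ is the functional expression of the traffic data time series, $\arg\max$ returns the value of the argument $x$ at which $f(x)$ attains its maximum, and $\arg\min$ returns the value of the argument $x$ at which $f(x)$ attains its minimum.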
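As a non-limiting illustration, the clustering branch can be sketched in plain Python: cluster centers are seeded with probability proportional to the squared shortest distance D(x), each series is assigned to its nearest center, and the peak points are located with argmax and argmin. All function names are hypothetical, and taking the subsequence bounded by the two peak points is only one assumed reading of "a subsequence using the special point as an endpoint".

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length series."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def kmeanspp_centers(series, k, rng):
    """Select k initial cluster centers with k-means++ style seeding."""
    centers = [rng.choice(series)]  # first center chosen uniformly at random
    while len(centers) < k:
        # Squared shortest distance from each series to the existing centers.
        d2 = [min(euclidean(x, c) for c in centers) ** 2 for x in series]
        if sum(d2) == 0:
            # Every series coincides with a center; fall back to a uniform draw.
            centers.append(rng.choice(series))
            continue
        # Next center drawn with probability D(x)^2 / sum of D(x)^2.
        centers.append(rng.choices(series, weights=d2, k=1)[0])
    return centers


def assign_clusters(series, centers):
    """Index of the nearest center for every series (the argmin assignment)."""
    return [
        min(range(len(centers)), key=lambda j: euclidean(x, centers[j]))
        for x in series
    ]


def peak_points(values):
    """Positions h and l of the maximum and minimum of one series."""
    h = max(range(len(values)), key=values.__getitem__)
    l = min(range(len(values)), key=values.__getitem__)
    return h, l


def subsequence_between_peaks(values):
    """Assumed candidate shapelet: the subsequence whose endpoints are h and l."""
    start, end = sorted(peak_points(values))
    return values[start:end + 1]
```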

Step S103, if greater than the first preset threshold, extracting at least one original subsequence of a given traffic data time series in the CSV file with a preset shapelet generator according to a sliding window of preset length, splicing the at least one original subsequence, and obtaining at least one shapelet sequence after the one-dimensional convolution layer and average pooling layer operations.

In this step, the shapelet generator includes two one-dimensional convolution layers and an average pooling layer;

extracting at least one original subsequence of a given traffic data time series in the CSV file with the preset shapelet generator according to a sliding window of preset length, splicing the at least one original subsequence, and obtaining at least one shapelet sequence after the one-dimensional convolution layer and average pooling layer operations comprises the following steps:

Let $T=\{T_1,T_2,\dots,T_N\}$ be a set of N traffic data time series, and assume that each traffic data time series in $T$ has length $Q$, the i-th traffic data time series $T_i$ consisting of $Q$ elements: $T_i=(t_{i,1},t_{i,2},\dots,t_{i,Q})$, i=1,…,N;

a sliding window with stride 1 and length M is slid over $T_i$ to obtain P original subsequences of length M, where P=Q-M+1; all original subsequences of length M of the i-th traffic data time series $T_i$ are spliced together and the result is denoted $O_i$, so that:

$$O_i=T_{i,1:M}\oplus T_{i,2:M+1}\oplus\dots\oplus T_{i,P:Q},$$

where $T_{i,P:Q}$ is the original subsequence of length M of the i-th traffic data time series $T_i$ starting at time P, $\oplus$ denotes the splicing operation, $T_{i,1:M}$ is the original subsequence of length M of $T_i$ starting at time 1, and $T_{i,2:M+1}$ is the original subsequence of length M of $T_i$ starting at time 2;

a convolution with kernel size w×P and stride 1 is applied to $O_i$ along the time axis to obtain the generated shapelet sequences, wherein the j-th shapelet sequence $s^{(1)}_{i,j}$ generated from $O_i$ by the first convolution layer is:

$$s^{(1)}_{i,j}=W_j * O_i + b_j,$$

where $W_j$ is the j-th filter of width w, $b_j$ is the bias term, and $*$ denotes the convolution operation;

the output of the first convolution layer is input to the second convolution layer, on which j filters are redefined and trained according to the same logic as the first convolution layer, the j-th shapelet sequence of the second convolution layer being generated from the output of the first convolution layer by an expression of the same form;

an average pooling layer is added and applied to the output of the second convolution layer, the pooling expression involving the filter width w and the sliding window length M;

k shapelet sequences are then generated from the i-th traffic data time series $T_i$ and denoted $S_i=(s_{i,1},s_{i,2},\dots,s_{i,j},\dots,s_{i,k})$, where $S_i$ is the i-th shapelet sequence, $s_{i,1}$ is the first element of the i-th shapelet sequence, $s_{i,2}$ is its second element, $s_{i,j}$ is its j-th element, and $s_{i,k}$ is its k-th element.
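As a non-limiting illustration, the forward pass of the shapelet generator can be sketched in plain Python: sliding-window extraction of the P = Q - M + 1 original subsequences, two one-dimensional convolutions applied along the time axis, and average pooling. The filter weights are passed in as arguments (in practice they would be learned), the pooling window size is an assumed parameter because the pooling expression is not reproduced in the text, and all function names are hypothetical.

```python
def sliding_windows(series, m):
    """All P = Q - M + 1 subsequences of length m, taken with stride 1."""
    return [series[t:t + m] for t in range(len(series) - m + 1)]


def conv1d(rows, kernel, bias=0.0):
    """Slide a (len(rows) x w) kernel along the time axis with stride 1."""
    w = len(kernel[0])
    length = len(rows[0])
    out = []
    for t in range(length - w + 1):
        acc = bias
        for row, k_row in zip(rows, kernel):
            acc += sum(row[t + d] * k_row[d] for d in range(w))
        out.append(acc)
    return out


def avg_pool(seq, size):
    """Non-overlapping average pooling with the given window size."""
    return [sum(seq[t:t + size]) / size for t in range(0, len(seq) - size + 1, size)]


def generate_shapelets(series, m, layer1, layer2, pool):
    """Forward pass: windows -> first conv layer -> second conv layer -> pooling.

    layer1 and layer2 are lists of (kernel, bias) pairs, one pair per filter.
    """
    o_i = sliding_windows(series, m)                # P rows of length M
    h1 = [conv1d(o_i, k, b) for k, b in layer1]     # one output row per filter
    h2 = [conv1d(h1, k, b) for k, b in layer2]      # second layer reads all of h1
    return [avg_pool(s, pool) for s in h2]
```

Here the first-layer kernels have P rows, matching the w×P kernel size described above, and the second-layer kernels have one row per first-layer filter.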

Step S104, discriminating between the at least one shapelet sequence and the at least one traffic data time series with the discriminator, and adding an adversarial loss to the training on the at least one shapelet sequence and the at least one traffic data time series, so that the shape difference between the generated at least one shapelet sequence and the at least one traffic data time series is not greater than a second preset threshold.

In this step, the second preset threshold represents the maximum allowable difference of the shape similarity, that is, the difference between the generated shape sequence and the time sequence of the flow data should be less than or equal to the second preset threshold.

The measurement of the degree of difference between the shapelet sequence and the time series of traffic data may be achieved by calculating the distance or similarity between them. DTW is also used here.

Specifically, a deep-learning-based method learns the shapelet sequence directly through the back-propagation algorithm without any added constraint, so the generated shapelet sequence may differ greatly from the traffic data time series. This impairs both the interpretability of the shapelet sequence and the accuracy of anomaly detection. To address this, a discriminator is trained to determine whether its input is a generated shapelet sequence or a real subsequence, which turns the training process into a game between the shapelet generator and the discriminator network.

An adversarial loss is added to the objective loss function of the discriminator, and training is performed by minimizing the following loss function.

The adversarial loss is expressed in terms of the classification result of the discriminator, a real data sample, and a shapelet sample.

The goal of the discriminator is to distinguish the generated samples from the real samples as well as possible; its objective is expressed in terms of the discriminator, the number of original subsequences P=Q-M+1, the number of samples, and the binary cross-entropy function.

As a loss function, the binary cross entropy is used to evaluate the quality of a binary model prediction, that is, if the prediction value approaches 1, the value of the loss function should approach 0 for the case of a label of 1. Conversely, if the predicted value approaches 0 at this time, the value of the loss function should be very large. Therefore, the optimization effect can be achieved more intuitively.
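As a non-limiting illustration, the binary cross-entropy and its use in the game between the shapelet generator and the discriminator can be sketched as follows. The labeling convention (real subsequences labeled 1, generated shapelets labeled 0) and the form of the generator-side loss are assumptions of this sketch rather than the expressions of the present application, and all function names are hypothetical.

```python
import math


def binary_cross_entropy(prediction, label, eps=1e-12):
    """BCE for one prediction in (0, 1) against a label of 0 or 1."""
    p = min(max(prediction, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


def discriminator_loss(real_scores, shapelet_scores):
    """Mean BCE with real subsequences labeled 1 and generated shapelets labeled 0."""
    losses = [binary_cross_entropy(p, 1) for p in real_scores]
    losses += [binary_cross_entropy(p, 0) for p in shapelet_scores]
    return sum(losses) / len(losses)


def generator_adversarial_loss(shapelet_scores):
    """Assumed generator objective: shapelets should be scored as real (label 1)."""
    losses = [binary_cross_entropy(p, 1) for p in shapelet_scores]
    return sum(losses) / len(losses)
```

With a label of 1, a prediction close to 1 gives a loss close to 0 and a prediction close to 0 gives a large loss, which matches the behavior described above.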

Step S105, calculating a DTW value between the trained at least one shapelet sequence and the at least one traffic data time sequence, converting the at least one traffic data time sequence into at least one feature vector based on the shapelet sequence according to the DTW value, and combining with a KNN classifier to output a traffic abnormality detection result.

In this step, a transformed data set is constructed from the DTW values between the shapelet sequences and the traffic data time series. Each input traffic data time series is converted into a shapelet-based feature vector containing k features, where each feature corresponds to one shapelet sequence and the feature value is the DTW value between that shapelet and the input time series. In this way the original set of traffic data time series is converted into a set of feature vectors that can be fed to most machine learning algorithms, and this set is combined with a KNN classifier to perform traffic anomaly detection.

It should be noted that the DTW value between the trained at least one shapelet sequence and the at least one traffic data time series is calculated by an expression in which Q is the length of the traffic data time series, M is the sliding window length, and the total number of iterations also appears.
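As a non-limiting illustration, the shapelet transform and the classification of step S105 can be sketched in plain Python. The sketch uses the classic dynamic-programming DTW with an absolute-difference cost over whole sequences, which is an assumed stand-in for the DTW expression of the present application, together with a simple majority-vote KNN; all function names and the labels used in the usage note are hypothetical.

```python
import math
from collections import Counter


def dtw(a, b):
    """Classic dynamic time warping distance with absolute-difference cost."""
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def shapelet_transform(series, shapelets):
    """Feature vector with one DTW value per shapelet (k features)."""
    return [dtw(s, series) for s in shapelets]


def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k nearest training feature vectors."""
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    return Counter(train_y[i] for i in order[:k]).most_common(1)[0][0]
```

A query series is first passed through `shapelet_transform` with the trained shapelets and the resulting feature vector is then given to `knn_predict` together with the transformed training set and its labels, for example `"normal"` and `"anomaly"`.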

In summary, when the time length of at least one traffic data time series is not greater than the first preset threshold, the method of the present application determines the shapelet according to two criteria. Criterion 1: the most representative traffic data time series is extracted, and the shapelet is extracted from it; clustering is performed with an optimized k-means++ algorithm to obtain groups of similar series, and the group with the smallest Euclidean distance to the other groups is selected. Criterion 2: the abrupt-change points of the traffic data time series are taken as special points, and the shapelet sequence is extracted from the original subsequences that have a special point as an endpoint. When the time length of at least one traffic data time series is greater than the first preset threshold, the shapelet sequence is generated by two one-dimensional convolution layers and an average pooling layer; the shapelet sequence and the traffic data time series are then discriminated by a discriminator, and an adversarial loss is added to the training so that the generated shapelet sequence differs only slightly in shape from the traffic data time series. Finally, the DTW value between the traffic data time series and the shapelet sequence is calculated and used to build the transformed data, which is combined with a KNN classifier to output the traffic anomaly detection result, thereby improving the efficiency of traffic anomaly detection.

Referring to Fig. 2, a block diagram of a traffic anomaly detection system based on adversarially generated shapelets according to the present application is shown.

As shown in fig. 2, the flow anomaly detection system 200 includes a conversion module 210, a determination module 220, a stitching module 230, an identification module 240, and an output module 250.

The conversion module 210 is configured to capture traffic data and convert the traffic data into a CSV file, where the CSV file includes at least one time-ordered traffic data time series; the determination module 220 is configured to determine whether the time length of the at least one traffic data time series is greater than a first preset threshold; the stitching module 230 is configured to, if greater than the first preset threshold, extract at least one original subsequence of a given traffic data time series in the CSV file with a preset shapelet generator according to a sliding window of preset length, splice the at least one original subsequence, and obtain at least one shapelet sequence after the one-dimensional convolution layer and average pooling layer operations; the identification module 240 is configured to discriminate between the at least one shapelet sequence and the at least one traffic data time series with a discriminator, and to add an adversarial loss to the training on the at least one shapelet sequence and the at least one traffic data time series, so that the shape difference between the generated at least one shapelet sequence and the at least one traffic data time series is not greater than a second preset threshold; and the output module 250 is configured to calculate a DTW value between the trained at least one shapelet sequence and the at least one traffic data time series, convert the at least one traffic data time series into at least one shapelet-based feature vector according to the DTW value, and output a traffic anomaly detection result in combination with the KNN classifier.

It should be understood that the modules depicted in fig. 2 correspond to the various steps in the method described with reference to fig. 1. Thus, the operations and features described above for the method and the corresponding technical effects are equally applicable to the modules in fig. 2, and are not described here again.

In other embodiments, the present invention further provides a computer readable storage medium having a computer program stored thereon, where the program instructions, when executed by a processor, cause the processor to perform the traffic anomaly detection method based on adversarially generated shapelets in any of the method embodiments described above;

as one embodiment, the computer-readable storage medium of the present invention stores computer-executable instructions configured to:

The computer readable storage medium may include a program storage area and a data storage area, wherein the program storage area may store an operating system and at least one application program required for a function, and the data storage area may store data created through the use of the traffic anomaly detection system based on adversarially generated shapelets, and the like. In addition, the computer readable storage medium may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some embodiments, the computer readable storage medium optionally includes memory located remotely from the processor, and the remote memory may be connected over a network to the traffic anomaly detection system based on adversarially generated shapelets. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.

Fig. 3 is a schematic structural diagram of an electronic device according to an embodiment of the present invention. As shown in Fig. 3, the device includes a processor 310 and a memory 320. The electronic device may further include an input device 330 and an output device 340. The processor 310, the memory 320, the input device 330, and the output device 340 may be connected by a bus or by other means; connection by a bus is taken as an example in Fig. 3. The memory 320 is the computer readable storage medium described above. The processor 310 executes the various functional applications and data processing of the server by running the non-volatile software programs, instructions and modules stored in the memory 320, that is, it implements the traffic anomaly detection method based on adversarially generated shapelets of the above method embodiments. The input device 330 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the traffic anomaly detection system based on adversarially generated shapelets. The output device 340 may include a display device such as a display screen.

The electronic device can execute the method provided by the embodiments of the present invention and has the functional modules and beneficial effects corresponding to executing that method. Technical details not described in detail in this embodiment may be found in the methods provided in the embodiments of the present invention.

As an embodiment, the electronic device is applied to the flow anomaly detection system based on countermeasure generation shapelet, is used on a client, and includes: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor, the instructions being executed by the at least one processor to enable the at least one processor to:

From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by software plus a necessary general-purpose hardware platform, or of course by hardware. Based on this understanding, the foregoing technical solutions, in essence or in the part that contributes beyond the prior art, may be embodied in the form of a software product. The software product may be stored in a computer-readable storage medium, such as a ROM/RAM, a magnetic disk, or an optical disk, and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, etc.) to perform the methods of the various embodiments or of some parts of the embodiments.

Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims

1. A flow anomaly detection method based on countermeasure generation shapelet, comprising:

randomly selecting one traffic data time series as the first cluster center;

[formula missing in source],

where the symbols of the formula denote, in order: any traffic data time series other than the cluster center; the first, second, third, and n-th elements of the first traffic data time series; and the first, second, third, and n-th elements of the second traffic data time series;

[formula missing in source],

wherein [formula missing in source],

if the time length is greater than the first preset threshold, extracting, with a preset shapelet generator and according to a sliding window of preset length, at least one original subsequence of a traffic data time series in the CSV file, splicing the at least one original subsequence, and obtaining at least one shapelet sequence after the one-dimensional convolution layer and average pooling layer operations, wherein the shapelet generator comprises two one-dimensional convolution layers and one average pooling layer;

let $T = \{T_1, T_2, \dots, T_N\}$ be a set of N traffic data time series, each traffic data time series in $T$ having length $Q$, where the i-th traffic data time series $T_i$ consists of the following $Q$ elements: $T_i = (t_{i,1}, t_{i,2}, \dots, t_{i,Q})$, i = 1, …, N;

sliding a window of length M with a step of 1 over $T_i$ yields P original subsequences of length M, where P = Q - M + 1; all original subsequences of length M of the i-th traffic data time series $T_i$ are spliced together and the result is denoted $O_i$, so that:

$$O_i = T_{i,1:M} \oplus T_{i,2:M+1} \oplus \cdots \oplus T_{i,P:Q},$$

where $T_{i,P:Q}$ is the original subsequence of length M of the i-th traffic data time series $T_i$ starting from time P, $\oplus$ denotes the concatenation operation, $T_{i,1:M}$ is the original subsequence of length M of $T_i$ starting from time 1, and $T_{i,2:M+1}$ is the original subsequence of length M of $T_i$ starting from time 2 and ending at time M+1;

on the spliced result, a convolution operation with a convolution kernel of size w × P is applied with a step of 1 along the time axis to obtain the generated shapelet sequences, wherein the j-th shapelet sequence generated from the spliced result by the first convolution layer is expressed as:

[formula missing in source],

k shapelet sequences are then generated from the i-th traffic data time series and are denoted $S_i = \{s_{i,1}, s_{i,2}, \dots, s_{i,j}, \dots, s_{i,k}\}$, where $S_i$ is the i-th shapelet sequence, $s_{i,1}$ is the first element of the i-th shapelet sequence, $s_{i,2}$ is its second element, $s_{i,j}$ is its j-th element, and $s_{i,k}$ is its k-th element;
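As an illustration only, the sliding-window extraction and splicing recited above (stride 1, window length M, P = Q - M + 1 subsequences) can be sketched in plain Python. The two generic layers `conv1d` and `average_pool`, the kernel values, and the pooling size are hypothetical; how the claimed generator arranges its w × P kernel cannot be recovered from this text, so this is not the patented generator itself.

```python
from typing import List, Sequence


def sliding_subsequences(series: Sequence[float], window: int) -> List[List[float]]:
    """Return every length-`window` subsequence of `series`, taken with stride 1."""
    q = len(series)
    if not 1 <= window <= q:
        raise ValueError("window length M must satisfy 1 <= M <= Q")
    # There are P = Q - M + 1 windows; window p covers positions p .. p + M - 1.
    return [list(series[p:p + window]) for p in range(q - window + 1)]


def splice(subsequences: List[List[float]]) -> List[float]:
    """Concatenate the subsequences end to end into one flat sequence."""
    return [value for sub in subsequences for value in sub]


def conv1d(signal: Sequence[float], kernel: Sequence[float], bias: float = 0.0) -> List[float]:
    """Generic stride-1 1-D convolution (cross-correlation, as in deep-learning layers)."""
    w = len(kernel)
    return [sum(kernel[t] * signal[p + t] for t in range(w)) + bias
            for p in range(len(signal) - w + 1)]


def average_pool(signal: Sequence[float], size: int) -> List[float]:
    """Non-overlapping average pooling; a trailing partial window is dropped."""
    return [sum(signal[p:p + size]) / size
            for p in range(0, len(signal) - size + 1, size)]


series = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0]       # Q = 6
subs = sliding_subsequences(series, 4)         # M = 4, so P = 3
spliced = splice(subs)                         # length P * M = 12

# Hypothetical, untrained kernels standing in for the two convolution layers.
hidden = conv1d(spliced, [0.5, 0.5])
candidate = average_pool(conv1d(hidden, [1.0, -1.0]), 2)
```

In a trained generator the kernel weights would be learned under the adversarial loss rather than fixed by hand as they are here.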

2. The flow anomaly detection method based on countermeasure generation shapelet according to claim 1, wherein capturing traffic data and converting the traffic data into a CSV file comprises:

calling the scapy package of Python in an environment where WinPcap is installed, setting the network card or IP of the target host, and capturing traffic data;

and inputting the captured traffic data into CICFlowMeter to obtain the CSV file.

3. The flow anomaly detection method based on countermeasure generation shapelet according to claim 1, further comprising, before extracting at least one original subsequence of a traffic data time series in the CSV file according to a sliding window of preset length:

preprocessing the data of the CSV file using the PyTorch framework, finding the null values, missing values, and infinite values in the CSV file, and replacing them with 0.
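The replacement rule of claim 3 can be illustrated with a short sketch. It deliberately uses only the Python standard library instead of the PyTorch framework named in the claim, and it assumes every column is numeric; the column names and sample rows are hypothetical.

```python
import csv
import io
import math
from typing import List


def clean_value(cell: str) -> float:
    """Parse one CSV cell; empty, unparsable, NaN, and infinite cells become 0."""
    try:
        value = float(cell)
    except ValueError:
        return 0.0
    return value if math.isfinite(value) else 0.0


def load_clean_rows(csv_text: str) -> List[List[float]]:
    """Read CSV text with a header row and return the cleaned numeric rows."""
    reader = csv.reader(io.StringIO(csv_text))
    next(reader, None)  # skip the header row if present
    return [[clean_value(cell) for cell in row] for row in reader]


# Hypothetical flow records containing an infinite, a missing, and a NaN cell.
sample = "flow_duration,flow_bytes_per_s\n12,Infinity\n,3.5\n7,NaN\n"
rows = load_clean_rows(sample)
```

A non-numeric column such as a class label would also be mapped to 0 by this sketch, so such columns would need to be separated out beforehand.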

4. The flow anomaly detection method based on countermeasure generation shapelet according to claim 1, wherein the loss function of the adversarial loss is expressed as:

[formula missing in source],

where the symbols of the formula denote, in order: the classification result of the discriminator; a real data sample; and the j-th element of the i-th shapelet sequence;

the discriminator is expressed as:

[formula missing in source],
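Because the loss expression itself is not reproduced in this text, the following is only a sketch of the standard binary cross-entropy GAN objective, which is assumed here and may differ from the claimed formula. The function names and the small constant `eps` are hypothetical; the inputs are discriminator outputs in (0, 1) for real traffic samples and for generated shapelets.

```python
import math
from typing import Sequence


def discriminator_loss(d_real: Sequence[float], d_fake: Sequence[float],
                       eps: float = 1e-12) -> float:
    """Loss minimized by the discriminator: -(E[log D(x)] + E[log(1 - D(s))])."""
    real_term = sum(math.log(p + eps) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p + eps) for p in d_fake) / len(d_fake)
    return -(real_term + fake_term)


def generator_loss(d_fake: Sequence[float], eps: float = 1e-12) -> float:
    """Non-saturating generator loss: -E[log D(s)], small when shapelets look real."""
    return -sum(math.log(p + eps) for p in d_fake) / len(d_fake)


# An undecided discriminator (all outputs 0.5) versus a confident one.
undecided = discriminator_loss([0.5, 0.5], [0.5, 0.5])
confident = discriminator_loss([0.9, 0.9], [0.1, 0.1])
```

Under this assumed objective, lowering the generator loss drives the generated shapelets toward the shape of real traffic subsequences, which is the role the abstract assigns to the adversarial loss.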

5. The flow anomaly detection method based on countermeasure generation shapelet according to claim 1, wherein the DTW value between the at least one trained shapelet sequence and the at least one traffic data time series is calculated as:

[formula missing in source],
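The claimed DTW expression is not reproduced in this text. As a hedged illustration, the sketch below uses the classic dynamic-programming DTW recurrence with an absolute-difference step cost (an assumption), then builds the shapelet-based feature vector and classifies it with a simple KNN vote as outlined in the abstract. All function names, the value of k, and the sample data are hypothetical.

```python
from collections import Counter
from typing import List, Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Classic DTW: cheapest monotone alignment cost between sequences a and b."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def shapelet_features(series: Sequence[float],
                      shapelets: Sequence[Sequence[float]]) -> List[float]:
    """Feature vector: one DTW value per shapelet."""
    return [dtw_distance(series, s) for s in shapelets]


def knn_predict(train_x: Sequence[Sequence[float]], train_y: Sequence[str],
                query: Sequence[float], k: int = 3) -> str:
    """Majority label among the k training vectors nearest to the query."""
    def sq_dist(i: int) -> float:
        return sum((p - q) ** 2 for p, q in zip(train_x[i], query))
    nearest = sorted(range(len(train_x)), key=sq_dist)[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


shapelets = [[0.0, 0.0, 0.0], [5.0, 5.0, 5.0]]            # hypothetical shapelets
train_series = [[0, 0, 1, 0], [0, 1, 0, 0], [5, 6, 5, 5], [5, 5, 6, 5]]
train_labels = ["normal", "normal", "anomalous", "anomalous"]
train_x = [shapelet_features(s, shapelets) for s in train_series]
label = knn_predict(train_x, train_labels, shapelet_features([5, 5, 5, 6], shapelets))
```

Each traffic data time series is thus reduced to as many features as there are shapelets, which is what keeps the final KNN step inexpensive.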

6. A flow anomaly detection system based on countermeasure generation shapelet, comprising:

if the time length is not greater than the first preset threshold, letting $T = \{T_1, T_2, \dots, T_N\}$ be a set of N traffic data time series, each traffic data time series in $T$ having length $Q$, where the i-th traffic data time series $T_i$ consists of the following $Q$ elements: $T_i = (t_{i,1}, t_{i,2}, \dots, t_{i,Q})$, i = 1, …, N;

randomly selecting one traffic data time series as the first cluster center;

[formula missing in source],

where the symbols of the formula denote, in order: any traffic data time series other than the cluster center; the first, second, third, and n-th elements of the first traffic data time series; and the first, second, third, and n-th elements of the second traffic data time series;

calculating the probability that each traffic data time series is selected as the next cluster center, until K cluster centers are selected, wherein [symbol missing in source] is the full set of all elements;

[formula missing in source],

wherein [formula missing in source],

a splicing module configured to, if the time length is greater than the first preset threshold, extract, with a preset shapelet generator and according to a sliding window of preset length, at least one original subsequence of a traffic data time series in the CSV file, splice the at least one original subsequence, and obtain at least one shapelet sequence after the one-dimensional convolution layer and average pooling layer operations, wherein the shapelet generator comprises two one-dimensional convolution layers and one average pooling layer;

$$O_i = T_{i,1:M} \oplus T_{i,2:M+1} \oplus \cdots \oplus T_{i,P:Q},$$

where $O_i$ is the spliced result for the i-th traffic data time series $T_i$, $T_{i,P:Q}$ is the original subsequence of length M of $T_i$ starting from time P, $\oplus$ denotes the concatenation operation, $T_{i,1:M}$ is the original subsequence of length M of $T_i$ starting from time 1, and $T_{i,2:M+1}$ is the original subsequence of length M of $T_i$ starting from time 2 and ending at time M+1;

on the spliced result, a convolution operation with a convolution kernel of size w × P is applied with a step of 1 along the time axis to obtain the generated shapelet sequences, wherein the j-th shapelet sequence generated from the spliced result by the first convolution layer is expressed as:

[formula missing in source],

the output of the first convolution layer is input to the second convolution layer, and j filters are redefined on the second convolution layer and trained according to the same logic as the first convolution layer, wherein the j-th shapelet sequence generated by the second convolution layer is expressed as:

[formula missing in source],

where the symbols of the formula denote, in order: the width of the filter; and the length of the sliding window;
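The cluster-center selection recited in claim 6 for short time series resembles k-means++ seeding. Since the distance and probability formulas are not reproduced in this text, the sketch below assumes a Euclidean distance and the usual k-means++ rule, in which a series is chosen with probability proportional to its squared distance to the nearest center already chosen. The function names, data, and seed are hypothetical.

```python
import math
import random
from typing import List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length time series (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def selection_probabilities(series_set: Sequence[Sequence[float]],
                            centers: Sequence[Sequence[float]]) -> List[float]:
    """Probability of each series becoming the next center (k-means++ weighting)."""
    d2 = [min(euclidean(s, c) for c in centers) ** 2 for s in series_set]
    total = sum(d2)
    if total == 0.0:
        raise ValueError("every series coincides with an existing center")
    return [value / total for value in d2]


def choose_centers(series_set: Sequence[Sequence[float]], k: int,
                   rng: random.Random) -> List[Sequence[float]]:
    """Pick the first center uniformly at random, then k - 1 more by probability."""
    centers = [rng.choice(series_set)]
    while len(centers) < k:
        probs = selection_probabilities(series_set, centers)
        centers.append(rng.choices(series_set, weights=probs, k=1)[0])
    return centers


data = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
probs = selection_probabilities(data, [data[0]])
centers = choose_centers(data, 3, random.Random(0))
```

A series that already serves as a center has zero distance to itself and therefore zero selection probability, so the K centers chosen this way are distinct.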

7. An electronic device, comprising: at least one processor, and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1 to 5.

8. A computer readable storage medium, on which a computer program is stored, characterized in that the program, when being executed by a processor, implements the method of any one of claims 1 to 5.