CN112884075A - Traffic data enhancement method, traffic data classification method and related device - Google Patents

Traffic data enhancement method, traffic data classification method and related device

Info

Publication number
CN112884075A
Authority
CN
China
Prior art keywords
data
training
flow data
flow
generator
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110310934.5A
Other languages
Chinese (zh)
Inventor
陈龙
王炜
江军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Topsec Technology Co Ltd
Beijing Topsec Network Security Technology Co Ltd
Beijing Topsec Software Co Ltd
Original Assignee
Beijing Topsec Technology Co Ltd
Beijing Topsec Network Security Technology Co Ltd
Beijing Topsec Software Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Topsec Technology Co Ltd, Beijing Topsec Network Security Technology Co Ltd, Beijing Topsec Software Co Ltd
Priority to CN202110310934.5A
Publication of CN112884075A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Software Systems (AREA)
  • Medical Informatics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The application provides a traffic data enhancement method, a traffic data classification method and a related device. The enhancement method comprises: acquiring traffic data samples; merging the traffic data samples to obtain merged traffic data; training a generative adversarial network (GAN) on the merged traffic data to obtain a GAN model; generating augmented traffic samples using the GAN model; and combining the augmented traffic samples with the merged traffic data to obtain an enhanced traffic data set. In this implementation, the trained GAN model generates augmented traffic samples that are added to the traffic data set, which effectively increases the number of traffic samples.

Description

Traffic data enhancement method, traffic data classification method and related device
Technical Field
The application relates to the technical field of machine learning, artificial intelligence and deep learning, in particular to a traffic data enhancement method, a traffic data classification method and a related device.
Background
At present, a deep learning model usually needs a large number of training samples to achieve good results. When the training data set contains only a few samples of a specific class, the recognition accuracy for that class is much lower than for the other classes. The usual remedy is to manually collect more samples of that class as training data, but in special scenarios (e.g., network attack traffic data) this is inefficient, and such samples are difficult to collect in the first place.
Disclosure of Invention
An object of the embodiments of the present application is to provide a method for enhancing traffic data, a method for classifying traffic data, and a related device, which are used to solve the problem of low efficiency in obtaining a specific type of sample.
An embodiment of the present application provides a traffic data enhancement method, comprising: acquiring traffic data samples; merging the traffic data samples to obtain merged traffic data; training a generative adversarial network (GAN) on the merged traffic data to obtain a GAN model; generating augmented traffic samples using the GAN model; and combining the augmented traffic samples with the merged traffic data to obtain an enhanced traffic data set. In this implementation, because the GAN is trained on the merged traffic data, the trained GAN model can generate augmented traffic samples that are then combined with the merged traffic data, which effectively increases the number of traffic samples.
Optionally, in an embodiment of the present application, the GAN comprises a discriminator and a generator, and training the GAN on the merged traffic data comprises: pre-training the generator to obtain a pre-trained generator; pre-training the discriminator according to the pre-trained generator to obtain a pre-trained discriminator; and, after the pre-training is complete, iteratively executing an adversarial training process until the GAN converges. The adversarial training process comprises: acquiring noise data and a class label vector; applying the pre-trained generator to the noise data and the class label vector to obtain a fake data sequence; acquiring a real data sequence, and applying the pre-trained discriminator to the fake data sequence, the real data sequence and the class label vector to obtain discrimination result data and class label data; and iteratively training the pre-trained generator and the pre-trained discriminator using the discrimination result data, the class label data and the real data sequence. In this implementation, the generator and the discriminator are pre-trained first, and the adversarial training process is then iterated until the weight parameters of the GAN model converge, which effectively improves the quality of the enhanced traffic data.
Optionally, in an embodiment of the present application, pre-training the generator comprises: randomly initializing the parameters of the generator and the discriminator; and pre-training the generator with a maximum likelihood estimation algorithm, using the merged traffic data as training data, to obtain the pre-trained generator. In this implementation, pre-training the generator by maximum likelihood estimation avoids the poor quality of traffic data generated without pre-training, which effectively improves the quality of the enhanced traffic data.
Optionally, in an embodiment of the present application, pre-training the discriminator according to the pre-trained generator comprises: generating initial data using the pre-trained generator; and pre-training the discriminator with the initial data as training data and a cross-entropy function as the loss function. In this implementation, pre-training the discriminator in this way avoids the poor quality of traffic data generated without pre-training, which effectively improves the quality of the enhanced traffic data.
Optionally, in an embodiment of the present application, merging the traffic data samples comprises: deleting redundant data from the traffic data samples according to a preset rule base to obtain de-duplicated traffic data; and merging the de-duplicated traffic data according to a timestamp rule. In this implementation, redundant data is removed through the preset rule base before the data is merged, which avoids the poor quality of generated traffic data that redundant data would cause and effectively improves the quality of the enhanced traffic data.
Optionally, in an embodiment of the present application, the GAN is an auxiliary classifier generative adversarial network (AC-GAN). In this implementation, enhancing the traffic data with an AC-GAN model lets the model pay more attention to the class information of the traffic data, which avoids poor-quality generated traffic data and effectively improves the quality of the enhanced traffic data.
An embodiment of the present application further provides a traffic data classification method, comprising: obtaining an enhanced traffic data set using the method described above; training a neural network on the enhanced traffic data set to obtain a neural network model; and classifying the traffic data to be classified using the neural network model to obtain a classification result. In this implementation, the neural network is trained on the enhanced traffic data set before it is used for classification, which mitigates the low classification accuracy caused by data imbalance and effectively improves the accuracy of classification with the neural network model.
An embodiment of the present application further provides a traffic data enhancement device, comprising: a traffic data acquisition module, configured to acquire traffic data samples; a traffic data processing module, configured to merge the traffic data samples to obtain merged traffic data; an adversarial model obtaining module, configured to train a GAN on the merged traffic data to obtain a GAN model; a traffic sample augmentation module, configured to generate augmented traffic samples using the GAN model; and a traffic data enhancement module, configured to combine the augmented traffic samples with the merged traffic data to obtain an enhanced traffic data set.
Optionally, in an embodiment of the present application, the GAN comprises a discriminator and a generator, and the network model obtaining module comprises: a generator pre-training module, configured to pre-train the generator to obtain a pre-trained generator; a discriminator pre-training module, configured to pre-train the discriminator according to the pre-trained generator to obtain a pre-trained discriminator; and a network model training module, configured to iteratively execute the adversarial training process after the pre-training is complete until the GAN converges. The adversarial training process comprises: acquiring noise data and a class label vector; applying the pre-trained generator to the noise data and the class label vector to obtain a fake data sequence; acquiring a real data sequence, and applying the pre-trained discriminator to the fake data sequence, the real data sequence and the class label vector to obtain discrimination result data and class label data; and iteratively training the pre-trained generator and the pre-trained discriminator using the discrimination result data, the class label data and the real data sequence.
Optionally, in an embodiment of the present application, the generator pre-training module comprises: a parameter initialization module, configured to randomly initialize the parameters of the generator and the discriminator; and a first pre-training module, configured to pre-train the generator with a maximum likelihood estimation algorithm, using the merged traffic data as training data, to obtain the pre-trained generator.
Optionally, in an embodiment of the present application, the discriminator pre-training module comprises: an initial data generation module, configured to generate initial data using the pre-trained generator; and a second pre-training module, configured to pre-train the discriminator with the initial data as training data and a cross-entropy function as the loss function.
Optionally, in an embodiment of the present application, the traffic data processing module comprises: a redundant data deletion module, configured to delete redundant data from the traffic data samples according to a preset rule base to obtain de-duplicated traffic data; and a data merging module, configured to merge the de-duplicated traffic data according to a timestamp rule.
An embodiment of the present application further provides a traffic data classification device, comprising: an enhanced traffic obtaining module, configured to obtain an enhanced traffic data set using the method described above; a network model obtaining module, configured to train a neural network on the enhanced traffic data set to obtain a neural network model; and a classification result obtaining module, configured to classify the traffic data to be classified using the neural network model to obtain a classification result.
An embodiment of the present application further provides an electronic device, comprising a processor and a memory, wherein the memory stores machine-readable instructions executable by the processor, and the machine-readable instructions, when executed by the processor, perform the method described above.
An embodiment of the present application further provides a storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, performs the method described above.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are required to be used in the embodiments of the present application will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope, and that those skilled in the art can also obtain other related drawings based on the drawings without inventive efforts.
Fig. 1 is a schematic flow chart of a traffic data enhancement method provided in an embodiment of the present application;
Fig. 2 is a schematic diagram of an adversarial training process provided in an embodiment of the present application;
Fig. 3 is a schematic flow chart of a traffic data classification method provided in an embodiment of the present application;
Fig. 4 is a schematic structural diagram of a traffic data enhancement device provided in an embodiment of the present application;
Fig. 5 is a schematic structural diagram of an electronic device provided in an embodiment of the present application.
Detailed Description
The technical solution in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application.
Before introducing the traffic data enhancement method and the traffic data classification method provided in the embodiment of the present application, some concepts related in the embodiment of the present application are introduced:
Data enhancement, also called training data set augmentation or data augmentation, refers to applying augmentation operations to existing training data to obtain more training data. For example, if the training data are images, one can change the background color or brightness of an image, rotate it by some angle, or crop it to a different size; each such operation adds image data for training the model.
A data classification model, also called a data classification neural network model, is a neural network model for data classification obtained by training a neural network. Traffic data is fed to the data classification model as input and a probability list is obtained as output; the probability list holds the probabilities, computed by the model, that the data belongs to each class.
A generative adversarial network (GAN) is a machine learning model that learns by having two neural networks compete against each other. A GAN consists of a generator and a discriminator. The generator takes random samples from a latent space as input, and its output should imitate the real samples in the training set as closely as possible. The discriminator takes either a real sample or the generator's output as input, and its goal is to tell the generator's output apart from real samples as reliably as possible. The generator, in turn, tries to fool the discriminator as much as possible. The two networks are trained against each other, continuously adjusting their parameters, with the final goal that the discriminator can no longer tell whether the generator's output is real.
A server is a device that provides computing services over a network, for example an x86 server or a non-x86 server; non-x86 servers include mainframes, minicomputers and UNIX servers.
It should be noted that the traffic data enhancement method and the traffic data classification method provided in the embodiments of the present application may be executed by an electronic device, where the electronic device refers to a device terminal having a function of executing a computer program or the server described above, and the device terminal includes, for example: smart phones, Personal Computers (PCs), tablet computers, Personal Digital Assistants (PDAs), or Mobile Internet Devices (MIDs), etc.
Before introducing the traffic data enhancement method and the traffic data classification method provided in the embodiments of the present application, the application scenarios to which they apply are introduced. These include, but are not limited to: using the traffic data enhancement method to enhance specific types of network traffic data, such as network attacks, malware, viruses and worms; and using the traffic data classification method to classify network traffic, either coarsely into malicious and non-malicious traffic or into finer malicious categories such as Web attacks, Structured Query Language (SQL) injection, and viruses and worms.
Please refer to Fig. 1, which is a schematic flow chart of the traffic data enhancement method provided in an embodiment of the present application. The main idea of the method is to train a GAN on the merged traffic data to obtain a GAN model, generate augmented traffic samples with that model, and combine the augmented traffic samples with the merged traffic data to obtain an enhanced traffic data set, which effectively increases the number of traffic samples. The traffic data enhancement method may include:
Step S110: acquire traffic data samples.
The traffic data samples refer to network packets or network data frames in network traffic data, and it is generally difficult to collect network attack traffic data samples, so the number of network attack traffic data samples needs to be increased.
Implementations of step S110 include, but are not limited to, the following. In a first acquisition mode, a network device (such as a router or a switch) intercepts traffic data samples sent by other devices and stores them in a file system, a database or a removable storage device. In a second acquisition mode, pre-stored traffic data samples are read, for example from a file system, a database or a removable storage device. In a third acquisition mode, a compressed archive of traffic data samples is downloaded from the Internet using software such as a browser, and the traffic data samples are extracted from the archive.
After step S110, step S120 is performed: merge the traffic data samples to obtain merged traffic data.
There are many embodiments of the above step S120, including but not limited to the following:
In a first embodiment, the traffic data samples are labeled by type, redundant data is deleted, and the data is then merged. Specifically, the traffic data samples are labeled by type, and redundant data in them is deleted according to a preset rule base to obtain de-duplicated traffic data. The preset rule base can be configured as the situation requires; for example, for a Hyper Text Transfer Protocol (HTTP) message, the header data of the network routing message is removed and only the message payload is kept. The de-duplicated traffic data is then merged according to a timestamp rule. For example, with a time granularity of one minute, the de-duplicated traffic data received at 9:01 is stored together in JavaScript Object Notation (JSON) or eXtensible Markup Language (XML) format, and the de-duplicated traffic data received at 9:02 is stored together in TXT or CSV format.
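The first embodiment can be sketched as follows. This is a minimal illustration, not the patented implementation: the record layout (`timestamp`, `headers`, `payload`, `label`), the helper names and the single-rule "rule base" that drops `headers` are all assumptions made for the example.

```python
from collections import defaultdict
from datetime import datetime


def strip_redundant(record, drop_fields=("headers",)):
    """Drop the fields that a (hypothetical) rule base marks as redundant."""
    return {k: v for k, v in record.items() if k not in drop_fields}


def merge_by_minute(records):
    """Group cleaned records by the minute of their timestamp."""
    buckets = defaultdict(list)
    for rec in records:
        ts = datetime.fromisoformat(rec["timestamp"])
        key = ts.strftime("%Y-%m-%d %H:%M")  # one-minute granularity
        buckets[key].append(strip_redundant(rec))
    return dict(buckets)


# Toy, hand-written samples; real samples would come from step S110.
samples = [
    {"timestamp": "2021-03-23T09:01:05", "headers": "Host: a",
     "payload": "GET /a", "label": "benign"},
    {"timestamp": "2021-03-23T09:01:40", "headers": "Host: b",
     "payload": "GET /b?id=1' OR 1=1", "label": "sql_injection"},
    {"timestamp": "2021-03-23T09:02:10", "headers": "Host: c",
     "payload": "GET /c", "label": "benign"},
]
merged = merge_by_minute(samples)
```

Each value of `merged` is one per-minute batch that could then be serialized in whichever storage format the deployment uses.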
In a second embodiment, the traffic data samples are labeled by type, records with incomplete field values are corrected, and the data is then merged. Specifically, the traffic data samples are labeled by type and incomplete field values are filled in; for example, the mean or median of a field is computed over all the data and written into the missing values of that field to obtain corrected data. In a specific implementation, the corrected data may also be clustered, the data of each cluster merged together, and an index built for the data within each cluster so that the data can be looked up more quickly.
After step S120, step S130 is performed: train the GAN on the merged traffic data to obtain a GAN model.
The GAN may be, for example, an auxiliary classifier GAN (AC-GAN), PacketCGAN, StyleGAN2, WGAN (Wasserstein GAN) or WGAN-GP (Wasserstein GAN with gradient penalty). The GAN comprises a discriminator and a generator, which can be trained separately or together when the GAN is trained.
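The mean/median completion described above might look like the sketch below. The field name `pkt_len` and the helper `impute` are hypothetical, and missing values are assumed to be represented as `None`.

```python
from statistics import mean, median


def impute(records, field, strategy="median"):
    """Return copies of the records with missing `field` values filled in."""
    observed = [r[field] for r in records if r.get(field) is not None]
    fill = median(observed) if strategy == "median" else mean(observed)
    return [
        dict(r, **{field: fill}) if r.get(field) is None else dict(r)
        for r in records
    ]


# Toy records; the second one has a missing packet length.
rows = [{"pkt_len": 60}, {"pkt_len": None}, {"pkt_len": 100}, {"pkt_len": 1500}]
filled = impute(rows, "pkt_len")
```

The median is often the safer default for traffic features such as packet length, because a few very large values can pull the mean far from typical records.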
The implementation of the step S130 may include the following steps:
Step S131: pre-train the generator to obtain the pre-trained generator.
An implementation of step S131 is, for example, as follows. The data set of traffic data samples is divided into a training set and a test set according to a preset ratio, which can be chosen as the situation requires (for example, 7 to 3 for training set to test set), and the generator is then pre-trained using a small portion of the training set. The pre-training proceeds as follows: the parameters of the generator and the discriminator are randomly initialized to obtain an initialized generator and an initialized discriminator; the initialized generator is then pre-trained with a maximum likelihood estimation algorithm, using the merged traffic data as training data, to obtain the pre-trained generator.
Step S132: pre-train the discriminator according to the pre-trained generator to obtain the pre-trained discriminator.
An implementation of step S132 is, for example, as follows. Initial data is generated using the pre-trained generator. The initialized discriminator is then pre-trained with the initial data as training data and a cross-entropy function as the loss function to obtain the pre-trained discriminator.
Step S133: after the pre-training is complete, iteratively execute the adversarial training process until the GAN converges.
Please refer to Fig. 2, which is a schematic diagram of the adversarial training process provided in an embodiment of the present application. The adversarial training process in step S133 can be implemented in many ways, including but not limited to the following:
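To make the 7:3 split and the maximum likelihood step concrete, here is a small sketch. The source does not specify the generator architecture, so a bigram token model stands in for it purely as an assumption; for such a categorical model the maximum likelihood estimate is simply the normalized transition counts. The names `split_dataset` and `fit_bigram_mle` are hypothetical.

```python
import random
from collections import Counter, defaultdict


def split_dataset(samples, train_ratio=0.7, seed=0):
    """Shuffle and split samples into a training set and a test set."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


def fit_bigram_mle(sequences):
    """MLE of P(next token | previous token): normalized bigram counts."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for prev, nxt in zip(["<s>"] + seq, seq + ["</s>"]):
            counts[prev][nxt] += 1
    return {
        prev: {tok: c / sum(ctr.values()) for tok, c in ctr.items()}
        for prev, ctr in counts.items()
    }


# Toy token sequences standing in for merged traffic data.
data = [["GET", "/a"], ["GET", "/b"]] * 5
train_set, test_set = split_dataset(data)
model = fit_bigram_mle([["GET", "/a"], ["GET", "/b"]])
```

A neural generator would instead minimize the negative log-likelihood of the training sequences by gradient descent, which optimizes the same objective.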
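The cross-entropy loss used for discriminator pre-training can be written out as below. The sketch assumes a binary discriminator that outputs the probability that a sample is real, with real samples labeled 1 and generator outputs labeled 0; the source states only that the generated initial data is the training data and that cross-entropy is the loss, so this labeling is an assumption.

```python
import math


def binary_cross_entropy(probs, labels, eps=1e-12):
    """Mean binary cross-entropy between predicted probabilities and 0/1 labels."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(probs)


# Hypothetical discriminator outputs for one real and one generated sample.
labels = [1, 0]
confident_loss = binary_cross_entropy([0.9, 0.1], labels)
unsure_loss = binary_cross_entropy([0.6, 0.4], labels)
```

A discriminator that separates the two samples more confidently, and correctly, gets the lower loss, which is the direction pre-training pushes its parameters.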
The first implementation is described using the training of an auxiliary classifier GAN (AC-GAN) as an example. First, noise data (Noise_data) and a class label vector (C_vector) are acquired; the class label vector may be a one-hot encoded tensor of the training data's label information. Second, the pre-trained generator (Generator) is applied to the noise data and the class label vector to obtain a fake data sequence (Fake_data_Seq). Then, a real data sequence (Real_data) is acquired, and the pre-trained discriminator (Discriminator) is applied to the fake data sequence, the real data sequence and the class label vector to obtain discrimination result data (Fake/Real) and class label data (C1, C2, ..., Cn); that is, the discriminator outputs two tensors, the discrimination result data (a real-or-fake judgment tensor) and the class label data (a classification result tensor). Finally, the pre-trained generator and the pre-trained discriminator are iteratively trained using the discrimination result data, the class label data and the real data sequence.
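The inputs and outputs of one AC-GAN step can be sketched without a deep learning framework. The first two helpers build the generator input from noise and a one-hot class label vector. The third computes the two log-likelihood terms of the commonly cited AC-GAN formulation (a source term for real versus fake and a class term); the source does not state its loss, so that formulation, like every function name here, is an assumption.

```python
import math
import random


def one_hot(index, num_classes):
    """One-hot class label vector (C_vector in the text)."""
    return [1.0 if i == index else 0.0 for i in range(num_classes)]


def generator_input(class_index, num_classes, noise_dim, rng):
    """Concatenate Gaussian noise with the one-hot class label vector."""
    noise = [rng.gauss(0.0, 1.0) for _ in range(noise_dim)]
    return noise + one_hot(class_index, num_classes)


def acgan_objectives(p_real_on_real, p_real_on_fake,
                     p_class_on_real, p_class_on_fake):
    """Source term L_S and class term L_C for one real and one fake sample.

    The discriminator is trained to maximize L_S + L_C,
    the generator to maximize L_C - L_S.
    """
    l_s = math.log(p_real_on_real) + math.log(1.0 - p_real_on_fake)
    l_c = math.log(p_class_on_real) + math.log(p_class_on_fake)
    return {"discriminator": l_s + l_c, "generator": l_c - l_s}


x = generator_input(class_index=2, num_classes=4, noise_dim=8,
                    rng=random.Random(0))
# Hypothetical discriminator outputs for one real and one fake sample.
obj = acgan_objectives(0.9, 0.2, 0.8, 0.7)
```

The class term appears in both objectives, which is why the generator is pushed to produce samples that are recognizably of the requested class and not merely realistic.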
In the second implementation, noise data is acquired, and the pre-trained generator is applied to the noise data to obtain traffic data sequences, which include both complete and incomplete traffic data sequences. The incomplete traffic data sequences are completed by simulation with a Monte Carlo tree search algorithm to obtain simulated traffic data sequences. The simulated traffic data sequences are combined with the complete traffic data sequences to form new traffic data sequences. The discriminator is trained using the new traffic data sequences and the merged traffic data, and a reward value is generated. The generator is then trained with a policy gradient algorithm using the reward value.
After step S130, step S140 is performed: generate augmented traffic samples using the GAN model.
Once the GAN model has been trained, the generator outputs a traffic sample whenever noise data is input to it; such a sample is called an augmented traffic sample.
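The reward estimation in this second implementation can be illustrated with plain Monte Carlo rollouts, which is a simplification of the Monte Carlo tree search named in the text: an incomplete sequence is completed several times by sampling, each completion is scored, and the mean score serves as the reward. The toy policy and toy scoring function below are stand-ins for the generator and the discriminator and are purely hypothetical.

```python
import random


def rollout_reward(prefix, target_len, sample_next, score, n_rollouts, rng):
    """Average score over random completions of an incomplete sequence."""
    total = 0.0
    for _ in range(n_rollouts):
        seq = list(prefix)
        while len(seq) < target_len:
            seq.append(sample_next(seq, rng))  # stand-in for the generator
        total += score(seq)                    # stand-in for the discriminator
    return total / n_rollouts


def toy_policy(seq, rng):
    """Hypothetical generator: picks the next token uniformly at random."""
    return rng.choice(["A", "B"])


def toy_score(seq):
    """Hypothetical discriminator: fraction of 'A' tokens in the sequence."""
    return seq.count("A") / len(seq)


reward = rollout_reward(["A", "A"], 4, toy_policy, toy_score,
                        n_rollouts=16, rng=random.Random(0))
```

In a policy gradient update, a reward like this weights the gradient of the log-probability of the tokens the generator chose, so that choices leading to convincing sequences become more likely.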
An implementation of step S140 is, for example, as follows. A batch of augmented traffic samples is first generated using the GAN model, and the format of each sample in the batch is checked against a preset format. Samples that do not conform to the preset format are removed, and samples that conform are kept.
After step S140, step S150 is performed: combine the augmented traffic samples with the merged traffic data to obtain an enhanced traffic data set.
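A format check of this kind could be sketched as follows. The "preset format" used here (non-empty string `payload` and `label` fields, payload no longer than 1500 characters) is an invented example, since the source does not define the format.

```python
def is_well_formed(sample, required_fields=("payload", "label"),
                   max_payload_len=1500):
    """Check a generated sample against a (hypothetical) preset format."""
    if not isinstance(sample, dict):
        return False
    for field in required_fields:
        value = sample.get(field)
        if not isinstance(value, str) or not value:
            return False
    return len(sample.get("payload", "")) <= max_payload_len


def filter_generated(batch):
    """Keep only the samples that conform to the preset format."""
    return [s for s in batch if is_well_formed(s)]


# Toy batch standing in for generator output.
batch = [
    {"payload": "GET /index.html", "label": "benign"},
    {"payload": "", "label": "sql_injection"},   # empty payload: rejected
    {"payload": "x" * 2000, "label": "worm"},    # too long: rejected
    {"label": "worm"},                           # missing payload: rejected
]
kept = filter_generated(batch)
```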
There are many ways to combine the data in step S150, including but not limited to the following:
in the first combination mode, the set of expanded traffic samples is simply merged (synthesized) with the merged traffic data set to obtain a merged traffic data set, where the merged traffic data set is the enhanced traffic data set.
In a second combination mode, the number of combinations to be combined is determined, and then the selection and combination are performed, for example: assuming that what needs to be enhanced above is malicious traffic data, and there are 20 pieces of original merged malicious traffic data and 50 pieces of original merged non-malicious traffic data, the malicious traffic data and the non-malicious traffic data can be understood as data tags. Then, 60 expanded flow samples are generated according to the merged malicious flow data, and the number of the expanded flow samples needing to be merged is 50-20-30, so that 30 expanded flow samples can be screened out from the 60 expanded flow samples, the number of the malicious flow data is equal to that of the non-malicious flow data, and the problem of class label imbalance (class imbalance) in the process of training the neural network is solved.
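The arithmetic of the second combination mode can be checked with a short sketch. The function name and the random selection are assumptions; the text says only that 30 samples are "screened out" of the 60, without saying how.

```python
import random


def balance_minority(real_minority, real_majority, generated, seed=0):
    """Top up the minority class with generated samples until the counts match."""
    needed = max(len(real_majority) - len(real_minority), 0)
    if needed > len(generated):
        raise ValueError("not enough generated samples to balance the classes")
    picked = random.Random(seed).sample(generated, needed)
    return real_minority + picked


# The counts from the example: 20 malicious, 50 non-malicious, 60 generated.
malicious = [f"mal_{i}" for i in range(20)]
benign = [f"ben_{i}" for i in range(50)]
generated = [f"gen_{i}" for i in range(60)]

balanced_malicious = balance_minority(malicious, benign, generated)
added = len(balanced_malicious) - len(malicious)
```

A quality-aware selection, for instance keeping the samples the discriminator scores highest, would be a natural alternative to uniform random sampling.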
In the implementation process described above, the generative adversarial network is trained using the merged traffic data to obtain a generative adversarial network model, expanded traffic samples are generated using that model, and the expanded traffic samples are then combined with the merged traffic data. That is, the trained generative adversarial network model generates expanded traffic samples that enlarge the traffic data set, thereby effectively increasing the number of traffic samples.
Please refer to fig. 3, which is a schematic flow chart of a traffic data classification method according to an embodiment of the present application. It can be understood that, after the enhanced traffic data set is obtained as described above, traffic data may further be classified; the traffic data classification method may include:
Step S210: data enhancement is performed on the obtained traffic data samples using the traffic data enhancement method described above, to obtain an enhanced traffic data set.
Step S220: a neural network is trained using the enhanced traffic data set to obtain a neural network model.
An embodiment of step S220 described above is, for example: training a data classification neural network using the enhanced traffic data set to obtain a data classification neural network model; data classification neural networks that can be used include, for example, Convolutional Neural Networks (CNN), Deep Neural Networks (DNN), and so on.
Step S230: the traffic data to be classified is classified using the neural network model to obtain a classification result.
An embodiment of step S230 described above is, for example: classifying the traffic data to be classified using the data classification neural network model to obtain a classification result; data classification neural network models that can be used include, for example, the LeNet, AlexNet, VGG, GoogLeNet and ResNet network models, and so on.
In the implementation process described above, the neural network is trained using the enhanced traffic data set, and the traffic data to be classified is classified using the resulting neural network model. This mitigates the low classification accuracy caused by data imbalance and effectively improves the accuracy of classification using the neural network model.
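Steps S220 and S230 can be illustrated with a deliberately small stand-in. The sketch below trains a single sigmoid neuron (logistic regression) in plain Python, not one of the CNN or DNN models named above; the two-dimensional feature vectors, the function names and the hyperparameters are all hypothetical and chosen only so the example runs without external libraries.

```python
import math

def train_single_neuron(features, labels, epochs=2000, lr=0.5):
    """Step S220 stand-in: fit one sigmoid neuron by full-batch gradient descent."""
    n_features = len(features[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, y in zip(features, labels):
            z = sum(w * xi for w, xi in zip(weights, x)) + bias
            err = 1.0 / (1.0 + math.exp(-z)) - y  # predicted probability minus label
            grad_w = [g + err * xi for g, xi in zip(grad_w, x)]
            grad_b += err
        weights = [w - lr * g / len(features) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(features)
    return weights, bias

def classify(model, x):
    """Step S230 stand-in: return 1 (malicious) or 0 (non-malicious)."""
    weights, bias = model
    return int(sum(w * xi for w, xi in zip(weights, x)) + bias > 0)

# Toy balanced data set with hypothetical features; label 1 = malicious.
enhanced_x = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (1.0, 1.0), (0.9, 0.8), (0.8, 0.9)]
enhanced_y = [0, 0, 0, 1, 1, 1]

model = train_single_neuron(enhanced_x, enhanced_y)
results = [classify(model, x) for x in [(0.1, 0.1), (0.9, 0.9)]]
```

A real deployment would replace `train_single_neuron` with training of the chosen network architecture; the surrounding flow (train on the enhanced set, then classify unseen traffic) stays the same.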
Please refer to fig. 4, which is a schematic structural diagram of a traffic data enhancement device according to an embodiment of the present application. The embodiment of the present application further provides a traffic data enhancement device 300, including:
A traffic data obtaining module 310, configured to obtain traffic data samples.
A traffic data processing module 320, configured to merge the traffic data samples to obtain merged traffic data.
An adversarial model obtaining module 330, configured to train a generative adversarial network using the merged traffic data to obtain a generative adversarial network model.
A traffic sample expansion module 340, configured to generate expanded traffic samples using the generative adversarial network model.
A traffic data enhancement module 350, configured to combine the expanded traffic samples with the merged traffic data to obtain an enhanced traffic data set.
Optionally, in this embodiment of the present application, the generative adversarial network includes a discriminator and a generator; the adversarial model obtaining module includes:
A generator pre-training module, configured to pre-train the generator to obtain a pre-trained generator.
A discriminator pre-training module, configured to pre-train the discriminator according to the pre-trained generator to obtain a pre-trained discriminator.
A network model training module, configured to iteratively execute an adversarial training process after the pre-training is complete, until the generative adversarial network converges. The adversarial training process includes: acquiring noise data and a category label vector; performing a generation operation on the noise data and the category label vector using the pre-trained generator to obtain a fake data sequence; acquiring a real data sequence, and performing a discrimination operation on the fake data sequence, the real data sequence and the category label vector using the pre-trained discriminator to obtain discrimination result data and category label data; and iteratively training the pre-trained generator and the pre-trained discriminator using the discrimination result data, the category label data and the real data sequence.
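Claim 6 of this application names the auxiliary classifier GAN (AC-GAN) as one choice of generative adversarial network. Under that assumption, the discrimination result data and the category label data correspond to the two log-likelihood terms of the standard AC-GAN objective, restated here for reference. The symbols $S$ (real or fake source), $C$ (class), $z$ (noise data) and $c$ (category label) are notation introduced for this sketch and do not appear in the source.

```latex
\begin{aligned}
X_{\text{fake}} &= G(c, z) \\
L_S &= \mathbb{E}\big[\log P(S=\text{real}\mid X_{\text{real}})\big]
     + \mathbb{E}\big[\log P(S=\text{fake}\mid X_{\text{fake}})\big] \\
L_C &= \mathbb{E}\big[\log P(C=c\mid X_{\text{real}})\big]
     + \mathbb{E}\big[\log P(C=c\mid X_{\text{fake}})\big]
\end{aligned}
```

In this formulation the discriminator is trained to maximize $L_S + L_C$, while the generator is trained to maximize $L_C - L_S$.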
Optionally, in an embodiment of the present application, the generator pre-training module includes:
A parameter initialization module, configured to randomly initialize the parameters of the generator and the discriminator.
A first pre-training module, configured to pre-train the generator using a maximum likelihood estimation algorithm, with the merged traffic data as training data, to obtain the pre-trained generator.
Optionally, in an embodiment of the present application, the discriminator pre-training module includes:
An initial data generation module, configured to generate initial data using the pre-trained generator.
A second pre-training module, configured to pre-train the discriminator with the initial data as training data and a cross-entropy function as the loss function, to obtain the pre-trained discriminator.
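The cross-entropy loss used for discriminator pre-training can be written out as below. This is a generic binary cross-entropy, given as a sketch: the labeling convention (target 1 for real traffic, 0 for generator output) and the function name are assumptions, since the text states only that a cross-entropy function is the loss.

```python
import math

def binary_cross_entropy(predictions, targets, eps=1e-12):
    """Mean cross-entropy between predicted probabilities and 0/1 targets."""
    total = 0.0
    for p, t in zip(predictions, targets):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(predictions)

# Assumed convention: 1 = real sample, 0 = sample produced by the pre-trained generator.
targets = [1, 1, 0, 0]
good_guess = binary_cross_entropy([0.9, 0.8, 0.2, 0.1], targets)
poor_guess = binary_cross_entropy([0.4, 0.3, 0.7, 0.6], targets)
```

A discriminator that separates real from generated data well yields the smaller loss, which is what pre-training drives toward.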
Optionally, in an embodiment of the present application, the traffic data processing module includes:
A redundant data deleting module, configured to delete redundant data from the traffic data samples according to a preset rule base, to obtain redundancy-removed traffic data.
A data merging processing module, configured to merge the redundancy-removed traffic data according to a timestamp rule.
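The two sub-steps above, redundancy deletion followed by timestamp-based merging, can be sketched as follows. The contents of the rule base and the exact timestamp rule are not given in this passage, so both are hypothetical here: the rules drop empty payloads and keepalive records, and the timestamp rule concatenates payloads that fall into the same fixed-width time window.

```python
from itertools import groupby

# Hypothetical rule base: each rule returns True when a record is redundant.
RULE_BASE = [
    lambda rec: rec["payload"] == "",         # empty payload
    lambda rec: rec["proto"] == "keepalive",  # heartbeat traffic
]

def delete_redundant(records, rules=RULE_BASE):
    """Drop every record that matches at least one rule in the rule base."""
    return [r for r in records if not any(rule(r) for rule in rules)]

def merge_by_timestamp(records, window=1.0):
    """Concatenate payloads of records whose timestamps share a time window."""
    ordered = sorted(records, key=lambda r: r["ts"])
    merged = []
    for bucket, group in groupby(ordered, key=lambda r: int(r["ts"] // window)):
        merged.append({"window": bucket,
                       "payload": "".join(r["payload"] for r in group)})
    return merged

samples = [
    {"ts": 0.2, "proto": "http", "payload": "GET "},
    {"ts": 0.7, "proto": "http", "payload": "/index"},
    {"ts": 0.9, "proto": "keepalive", "payload": "x"},
    {"ts": 1.4, "proto": "http", "payload": ""},
    {"ts": 2.1, "proto": "dns", "payload": "example"},
]
merged = merge_by_timestamp(delete_redundant(samples))
```

Deleting before merging keeps redundant records from being folded into the merged sequences that later train the generative adversarial network.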
An embodiment of the present application further provides a traffic data classification device, including:
An enhanced traffic obtaining module, configured to obtain an enhanced traffic data set using the method described above.
A network model obtaining module, configured to train a neural network using the enhanced traffic data set to obtain a neural network model.
A classification result obtaining module, configured to classify traffic data to be classified using the neural network model to obtain a classification result.
It should be understood that these devices correspond to the embodiments of the traffic data enhancement method and the traffic data classification method described above and can perform the steps of those method embodiments. For the specific functions of the devices, refer to the description above; a detailed description is omitted here to avoid repetition. Each device includes at least one software function module that can be stored in memory in the form of software or firmware, or built into the operating system (OS) of the device.
Please refer to fig. 5, which is a schematic structural diagram of an electronic device according to an embodiment of the present application. An electronic device 400 provided in an embodiment of the present application includes a processor 410 and a memory 420; the memory 420 stores machine-readable instructions executable by the processor 410, and the machine-readable instructions, when executed by the processor 410, perform the method described above.
The embodiment of the present application also provides a storage medium 430, where the storage medium 430 stores a computer program that, when executed by the processor 410, performs the method described above.
The storage medium 430 may be implemented by any type of volatile or nonvolatile storage device or combination thereof, such as a Static Random Access Memory (SRAM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), an Erasable Programmable Read-Only Memory (EPROM), a Programmable Read-Only Memory (PROM), a Read-Only Memory (ROM), a magnetic Memory, a flash Memory, a magnetic disk, or an optical disk.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The apparatus embodiments described above are merely illustrative, and for example, the flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved.
In addition, functional modules of the embodiments in the present application may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.
In this document, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions.
The above description is only an optional implementation of the embodiments of the present application, but the scope of protection of the embodiments is not limited thereto. Any change or substitution that a person skilled in the art could readily conceive of within the technical scope disclosed by the embodiments of the present application shall fall within the scope of protection of the embodiments of the present application.

Claims (10)

1. A traffic data enhancement method, comprising:
acquiring traffic data samples;
merging the traffic data samples to obtain merged traffic data;
training a generative adversarial network using the merged traffic data to obtain a generative adversarial network model;
generating expanded traffic samples using the generative adversarial network model; and
combining the expanded traffic samples with the merged traffic data to obtain an enhanced traffic data set.
2. The method of claim 1, wherein the generative adversarial network comprises a discriminator and a generator, and training the generative adversarial network using the merged traffic data comprises:
pre-training the generator to obtain a pre-trained generator;
pre-training the discriminator according to the pre-trained generator to obtain a pre-trained discriminator;
after the pre-training is complete, iteratively executing an adversarial training process until the generative adversarial network converges, the adversarial training process comprising:
acquiring noise data and a category label vector;
performing a generation operation on the noise data and the category label vector using the pre-trained generator to obtain a fake data sequence;
acquiring a real data sequence, and performing a discrimination operation on the fake data sequence, the real data sequence and the category label vector using the pre-trained discriminator to obtain discrimination result data and category label data; and
iteratively training the pre-trained generator and the pre-trained discriminator using the discrimination result data, the category label data and the real data sequence.
3. The method of claim 2, wherein pre-training the generator comprises:
randomly initializing parameters of the generator and the discriminator; and
pre-training the generator using a maximum likelihood estimation algorithm, with the merged traffic data as training data, to obtain the pre-trained generator.
4. The method of claim 3, wherein pre-training the discriminator according to the pre-trained generator comprises:
generating initial data using the pre-trained generator; and
pre-training the discriminator with the initial data as training data and a cross-entropy function as the loss function.
5. The method of claim 1, wherein merging the traffic data samples comprises:
deleting redundant data from the traffic data samples according to a preset rule base to obtain redundancy-removed traffic data; and
merging the redundancy-removed traffic data according to a timestamp rule.
6. The method of any of claims 1-5, wherein the generative adversarial network is an auxiliary classifier generative adversarial network (AC-GAN).
7. A traffic data classification method, comprising:
obtaining an enhanced traffic data set using the method of any one of claims 1-5;
training a neural network using the enhanced traffic data set to obtain a neural network model; and
classifying traffic data to be classified using the neural network model to obtain a classification result.
8. A traffic data enhancement device, comprising:
a traffic data obtaining module, configured to obtain traffic data samples;
a traffic data processing module, configured to merge the traffic data samples to obtain merged traffic data;
an adversarial model obtaining module, configured to train a generative adversarial network using the merged traffic data to obtain a generative adversarial network model;
a traffic sample expansion module, configured to generate expanded traffic samples using the generative adversarial network model; and
a traffic data enhancement module, configured to combine the expanded traffic samples with the merged traffic data to obtain an enhanced traffic data set.
9. An electronic device, comprising: a processor and a memory, the memory storing machine-readable instructions executable by the processor, the machine-readable instructions, when executed by the processor, performing the method of any of claims 1 to 7.
10. A storage medium, having stored thereon a computer program which, when executed by a processor, performs the method of any one of claims 1 to 7.
CN202110310934.5A 2021-03-23 2021-03-23 Traffic data enhancement method, traffic data classification method and related device Pending CN112884075A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110310934.5A CN112884075A (en) 2021-03-23 2021-03-23 Traffic data enhancement method, traffic data classification method and related device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110310934.5A CN112884075A (en) 2021-03-23 2021-03-23 Traffic data enhancement method, traffic data classification method and related device

Publications (1)

Publication Number Publication Date
CN112884075A true CN112884075A (en) 2021-06-01

Family

ID=76042165

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110310934.5A Pending CN112884075A (en) 2021-03-23 2021-03-23 Traffic data enhancement method, traffic data classification method and related device

Country Status (1)

Country Link
CN (1) CN112884075A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180152467A1 (en) * 2016-11-30 2018-05-31 Cisco Technology, Inc. Leveraging synthetic traffic data samples for flow classifier training
CN109639479A (en) * 2018-12-07 2019-04-16 北京邮电大学 Based on the network flow data Enhancement Method and device for generating confrontation network
CN111651642A (en) * 2020-04-16 2020-09-11 南京邮电大学 Improved TEXT-GAN-based flow data set generation method
CN112270351A (en) * 2020-10-24 2021-01-26 国网江苏省电力有限公司信息通信分公司 Semi-supervised encryption traffic identification method for generating countermeasure network based on auxiliary classification

Non-Patent Citations (7)

* Cited by examiner, † Cited by third party
Title
YUHE Z.等: "Enhancement of real-time traffic data in navigation clients", 《13TH INTERNATIONAL IEEE CONFERENCE ON INTELLIGENT TRANSPORTATION SYSTEMS》 *
冯旸赫 等: "《在线半监督学习理论及方法》", 31 January 2019, 北京:国防工业出版社 *
娄岩主编: "《大数据应用基础》", 31 October 2018 *
文常保 等: "《人工神经网络理论及应用》", 31 March 2019, 西安:西安电子科技大学出版社 *
曾琦 等: "基于半监督深度生成对抗网络的图像识别方法", 《测控技术》 *
李杰 等: "基于生成对抗网络的网络流量特征伪装技术", 《计算机工程》 *
杭州市数据资源管理局 编著: "《数据资源管理》", 30 November 2019, 杭州:浙江大学出版社 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113947706A (en) * 2021-12-20 2022-01-18 四川师范大学 Image diversity enhancement method and system based on generation countermeasure network
CN113947706B (en) * 2021-12-20 2022-06-28 四川师范大学 Image diversity enhancement method and system based on generation countermeasure network
CN114553520A (en) * 2022-02-21 2022-05-27 华南师范大学 Network attack data stream synthesis method and device, electronic equipment and storage medium
CN114553520B (en) * 2022-02-21 2023-11-21 华南师范大学 Network attack data stream synthesis method, device, electronic equipment and storage medium
CN116737793A (en) * 2023-05-29 2023-09-12 南方电网能源发展研究院有限责任公司 Carbon emission stream generation method, model training method, device and computer equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210601