CN116304705A - Flow data set generation method and device based on conditional diffusion model - Google Patents

Flow data set generation method and device based on conditional diffusion model

Info

Publication number
CN116304705A
Authority
CN
China
Prior art keywords
flow data
noise
data set
diffusion
gray level
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310278870.4A
Other languages
Chinese (zh)
Inventor
赵莎莎
刘振娟
张登银
刘鑫
冯向南
蔡宇欣
肖睿
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Posts and Telecommunications
Original Assignee
Nanjing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Posts and Telecommunications filed Critical Nanjing University of Posts and Telecommunications
Priority to CN202310278870.4A
Publication of CN116304705A
Legal status: Pending (Current)

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G06N3/084 - Backpropagation, e.g. using gradient descent
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 - Reducing energy consumption in communication networks
    • Y02D30/50 - Reducing energy consumption in communication networks in wire-line communication networks, e.g. low power modes or reduced link rate

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a flow data set generation method and device based on a conditional diffusion model. The method comprises the following steps: collecting a labeled flow data set; preprocessing the original flow data into gray-scale maps; taking the gray-scale maps as input and training the forward process of the diffusion model; after the diffusion model converges, obtaining a trained noise predictor for use in the reverse process of the diffusion model; generating, from Gaussian noise, a noise image of the same size as the target gray-scale map as the initial value of the noisy image and running the reverse process iteratively until the target gray-scale map is finally obtained; and converting the generated gray-scale map into the corresponding numerical matrix to complete the generation of flow data. The invention avoids the drawbacks that undersampling may lose some key features and that oversampling may overfit the classifier, and, compared with a GAN, it achieves a better image generation effect while avoiding the unstable training of the original generative adversarial model.

Description

Flow data set generation method and device based on conditional diffusion model
Technical Field
The invention relates to a flow data set generation method and device based on a conditional diffusion model. It is a conditional flow data generation method and belongs to the fields of network traffic mining and network behavior analysis.
Background
With the development of computer information technology, a huge number of terminal devices have flooded into the Internet, generating large volumes of network traffic of many types. The Internet has had a profound effect on society, the economy, and daily life in China and has become an indispensable part of everyday infrastructure. In order to protect user privacy, improve user quality of service (QoS), and maintain network security, network traffic identification and classification have become important topics in fields such as network behavior analysis and anomaly detection. With the development of information security technology, traffic data analysis faces the following challenges:
1. To protect user privacy, many Internet applications encrypt their traffic with encryption protocols, which makes traffic features difficult to extract; some traditional methods based on deep packet inspection and machine learning suffer a sharp drop in accuracy when confronted with encrypted traffic.
2. Traffic data is difficult to obtain. Because different applications differ in popularity, the collected data sets are imbalanced. Given a balanced data set, a deep learning model can learn the feature attributes of every class well and achieve good classification results, whereas imbalanced data sets lead to reduced classification accuracy for minority classes, unstable model performance, and similar problems.
3. For imbalanced data, common remedies include random oversampling and undersampling, generative adversarial network (GAN) techniques, and the like. In practice, however, oversampling trains on a large number of repeated samples, so the classifier is likely to overfit and to derive inconsistent rules from the same samples; undersampling is likely to discard part of the key feature information; and the original generative adversarial network is prone to unstable training, vanishing gradients, and mode collapse.
Disclosure of Invention
The invention aims to overcome the defects in the prior art and provides a flow data set generation method and device based on a conditional diffusion model. It is a flow data generation method based on supervised learning that alleviates the difficulty of acquiring encrypted flow data and overcomes the shortcomings of existing data set balancing methods.
In order to achieve the purpose, the invention is realized by adopting the following technical scheme:
the invention provides a flow data set generation method based on a conditional diffusion model, which comprises the following steps:
collecting a flow data set with a packet capture tool, and configuring labels for the flow data according to flow categories (such as MySQL and QQ) to form a labeled flow data set;
based on the labeled flow data set, converting each sample into an array of equal length by truncation or zero padding, normalizing it, and finally converting it into a gray-scale map;
feeding the gray-scale map into a conditional denoising diffusion probability model and training the forward diffusion process;
after the forward diffusion process training is completed, generating a corresponding flow gray-scale image by inputting label information into the model;
and reducing the generated gray-scale map to an array and then applying inverse normalization to obtain the generated flow data.
Further, collecting a flow data set, comprising:
the packets in the network are captured using Wireshark and saved as PCAP format to form a traffic data set.
Further, the flow data set is flow data in PCAP format or ERF format or PCAPNG format.
Further, constructing a gray scale map based on the flow data set includes:
filtering useless local area network data packets in the data set based on the flow data set;
based on the cleaned flow data set, reading the flow data in binary mode, reading each byte so that its value lies in the range 0 to 255, and finally forming a one-dimensional array;
for the one-dimensional array, unifying the array length across the data set by clipping, and then forming a two-dimensional array by dimension transformation;
and performing data normalization on the two-dimensional array, constraining it to [0,1], to generate the gray-scale map, which is essentially a two-dimensional matrix expressed as:
Pixel = [P_1, P_2, ..., P_i]^T    (4)
P_i = (x_irc),  r ∈ {1, 2, ..., h},  c ∈ {1, 2, ..., w}    (5)
where Pixel denotes the entire data set, P_i is the feature matrix of the i-th packet, i.e. the i-th gray-scale image, and x_irc is one byte of information in the i-th packet, which is also one pixel value of the gray-scale map; c denotes the c-th column of the matrix and encodes the arrival order of the bytes, increasing from left to right; r denotes the r-th row of the matrix, with arrival order increasing from top to bottom; and h×w is the length of the packet after uniform clipping (a minimal sketch of this conversion is given below).
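For illustration, a minimal sketch of this byte-to-gray-scale conversion is given below. It assumes the per-packet payload bytes have already been extracted from the PCAP file; the 28×28 target size, the function name, and the use of NumPy are assumptions made for the example, not details fixed by the original text.

import numpy as np

H, W = 28, 28  # assumed uniform clipping size h x w

def packet_to_gray(payload: bytes) -> np.ndarray:
    """Convert one packet's raw bytes into a normalized h x w gray-scale map."""
    # Read the bytes in binary mode so each value lies in the range 0..255 (one-dimensional array).
    arr = np.frombuffer(payload, dtype=np.uint8)
    # Unify the array length by truncation or zero padding.
    if arr.size >= H * W:
        arr = arr[:H * W]
    else:
        arr = np.pad(arr, (0, H * W - arr.size))
    # Dimension transformation into a two-dimensional array, then normalization to [0, 1].
    return arr.reshape(H, W).astype(np.float32) / 255.0

Stacking these maps, together with their traffic-category labels, yields the labeled matrix Pixel described above.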
Further, the useless local area network data packets to be filtered in this process comprise data packets under the ARP protocol and the DHCP protocol.
further, placing the processed data set into a conditional denoising diffusion probability model for training comprises the following steps:
step A, a gray level diagram constructed by flow data is read, wherein:
each gray-scale map P_i is a 28×28 image; the input X of the model consists of gray-scale images and is a four-dimensional tensor [b, c, h, w], where b is the minibatch size, c is the number of channels of the gray-scale map (equal to 1), and h and w are the height and width of the gray-scale map;
the diffusion step t ~ Uniform(1, T) indicates how much noise is added to the image;
the label c = [c_1, c_2, ..., c_b] is likewise a vector of dimension b, where each value of c is the flow label of the current gray-scale image; during training of the noise predictor the condition is randomly dropped so that the model is also trained unconditionally;
Step B, add Gaussian noise to the gray-scale map to obtain a noisy map:
X_t = sqrt(ᾱ_t)·X_0 + sqrt(1 - ᾱ_t)·ε
where X_t denotes the noisy image, X_0 denotes the initial gray-scale map, t denotes the diffusion step, α = [α_1, α_2, ..., α_t, ..., α_T] is a sequence indexed by t and generated in this example by a preset schedule function, ᾱ_t (the cumulative product of α_1 through α_t) gives the weight sqrt(ᾱ_t), c denotes the condition, i.e. the label information of the data, and ε ~ N(0, I) denotes Gaussian noise drawn from a standard normal distribution;
Step C, take the noisy map X_t, the flow label c, and the diffusion step t as the input of the noise predictor G to predict the current Gaussian noise ε_θ;
Step D, compute the loss function from the predicted Gaussian noise ε_θ and the true Gaussian noise ε:
Loss = || ε - ε_θ( sqrt(ᾱ_t)·X_0 + sqrt(1 - ᾱ_t)·ε, t, c ) ||²
where Loss denotes the loss, ε denotes Gaussian noise drawn from a standard normal distribution, ε_θ denotes the noise predictor (a neural network), sqrt(ᾱ_t) denotes the weight, X_0 denotes the initial gray-scale map, t denotes the diffusion step, and c denotes the condition, i.e. the label information of the data;
the loss function of the diffusion probability model is then optimized with the Adam optimization algorithm (a training-loop sketch covering steps A to E is given after step E);
and E, repeating steps A to D until the number of training epochs reaches the preset value.
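As an illustration of steps A to E, the following PyTorch-style sketch performs one training iteration with the closed-form noising of step B and the noise-prediction loss of step D. The predictor interface noise_predictor(x_t, t, c), the linear beta schedule, the condition-drop probability, and the -1 sentinel for the unconditional case are assumptions made for the example and are not fixed by the original text.

import torch
import torch.nn.functional as F

T = 1000                                   # total number of diffusion steps
betas = torch.linspace(1e-4, 0.02, T)      # assumed linear noise schedule
alphas = 1.0 - betas
alpha_bar = torch.cumprod(alphas, dim=0)   # cumulative product: the weight in step B

def train_step(noise_predictor, optimizer, x0, labels, p_uncond=0.1):
    """One forward-process training iteration (steps A to D)."""
    b = x0.shape[0]                                    # minibatch size
    t = torch.randint(0, T, (b,), device=x0.device)    # diffusion step, uniformly sampled
    eps = torch.randn_like(x0)                         # true Gaussian noise ~ N(0, I)

    # Step B: closed-form noising X_t = sqrt(a_bar)*X_0 + sqrt(1 - a_bar)*eps.
    ab = alpha_bar.to(x0.device)[t].view(b, 1, 1, 1)
    x_t = ab.sqrt() * x0 + (1.0 - ab).sqrt() * eps

    # Randomly drop the condition so the predictor is also trained unconditionally.
    c = labels.clone()
    c[torch.rand(b, device=x0.device) < p_uncond] = -1   # assumed "no condition" sentinel

    # Step C: predict the current Gaussian noise from (X_t, t, c).
    eps_pred = noise_predictor(x_t, t, c)

    # Step D: noise-prediction loss, followed by an Adam step.
    loss = F.mse_loss(eps_pred, eps)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

Step E then corresponds to calling train_step in a loop until the preset number of epochs is reached.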
Further, step (4) of performing a reverse process of the conditional diffusion probability model by using the trained noise predictor to generate flow data, includes the steps of:
step a, using Gaussian noise ε ~ N(0, I) drawn from a standard normal distribution, generating a noise image X_T of the same size as the target gray-scale image as the initial value of the noisy map, and setting the initial value of the diffusion step t to T, i.e. the number of loop iterations, which is decremented to 0 during the iteration;
step b, predicting the current Gaussian noise ε' by taking the noisy image, the target flow label, and the diffusion step t as the input of the noise predictor:
ε' = (1 + w)·ε_θ(X_t, t, c) - w·ε_θ(X_t, t)
where the first term on the right of the equation is the conditional noise prediction obtained by feeding the current noisy map X_t, the label c, and the diffusion step t into the noise predictor G; the second term on the right corresponds to the unconditional prediction of the noise predictor G; and w is a hyperparameter that controls the proportion in which the conditional and unconditional predictions are combined to form the current predicted Gaussian noise ε';
Step c, subtracting the generated prediction noise from the noise image;
step d, repeating the step b and the step c until the diffusion step t is 0, and finally generating a gray level diagram of the target flow data;
and e, converting the gray-scale image into the corresponding numerical matrix to complete the generation of flow data (a sketch of the sampling loop in steps a to d is given below).
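A minimal sketch of the sampling loop is shown below. The reverse-update coefficients follow the standard denoising diffusion probabilistic model sampler, and the guidance combination (1 + w)·conditional - w·unconditional is one common reading of the formula above; both, along with the schedule and the -1 unconditional sentinel, are assumptions made for the example.

import torch

@torch.no_grad()
def sample(noise_predictor, label, w=2.0, T=1000, shape=(1, 1, 28, 28), device="cpu"):
    """Generate a gray-scale map for the target flow label (steps a to d)."""
    betas = torch.linspace(1e-4, 0.02, T, device=device)   # same assumed schedule as in training
    alphas = 1.0 - betas
    alpha_bar = torch.cumprod(alphas, dim=0)

    x_t = torch.randn(shape, device=device)                  # step a: X_T ~ N(0, I)
    c = torch.full((shape[0],), label, device=device, dtype=torch.long)
    c_uncond = torch.full_like(c, -1)                        # assumed "no condition" sentinel

    for t in reversed(range(T)):                             # t decremented towards 0
        ts = torch.full((shape[0],), t, device=device, dtype=torch.long)
        # Step b: combine conditional and unconditional predictions with weight w.
        eps = (1.0 + w) * noise_predictor(x_t, ts, c) - w * noise_predictor(x_t, ts, c_uncond)

        # Step c: remove the predicted noise (standard DDPM reverse update).
        x_t = (x_t - (1.0 - alphas[t]) / (1.0 - alpha_bar[t]).sqrt() * eps) / alphas[t].sqrt()
        if t > 0:
            x_t = x_t + betas[t].sqrt() * torch.randn_like(x_t)
    return x_t                                               # step d: final gray-scale map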
Further, the noise predictor G is implemented based on a U-Net model.
In a second aspect, the present invention provides a flow data set generating device based on a conditional diffusion model, including:
a data acquisition module: used to collect a flow data set and configure labels for the flow data according to flow categories, forming a labeled flow data set;
a digital-to-image conversion module: used to convert the labeled flow data set into arrays of equal length by truncation or zero padding, normalize them, and finally convert them into gray-scale maps;
a model training module: used to feed the gray-scale maps into the conditional denoising diffusion probability model and train the forward diffusion process;
a gray-scale map generation module: used, after the forward diffusion process training is completed, to generate a corresponding flow gray-scale image by inputting label information into the model;
a graph-to-digital conversion module: used to reduce the generated gray-scale map to an array and then apply inverse normalization to obtain the generated flow data.
In a third aspect, the present invention provides a flow data set generating device based on a conditional diffusion model, including a processor and a storage medium;
the storage medium is used for storing instructions;
the processor is configured to operate in accordance with the instructions to perform the steps of the method according to the first aspect.
Compared with the prior art, the invention has the beneficial effects that:
1. The invention provides a flow data generation method based on a conditional denoising diffusion probability model: gray-scale maps are first generated from the flow data, a new gray-scale map is generated by training on them, and the new gray-scale map is decoded back into data. This alleviates the difficulty of obtaining encrypted flow data and avoids the defects that undersampling may lose some key features and that oversampling may overfit the classifier;
2. The invention adopts a noise predictor G based on a U-Net model and, compared with a GAN, achieves a better image generation effect while avoiding the training instability of the original generative adversarial model.
Drawings
FIG. 1 is a forward process of a diffusion probability model as described in the present invention;
FIG. 2 is a reverse process of the diffusion probability model described in the present invention;
FIG. 3 is a block diagram of a Resblock network in a diffusion probability model as described in the present invention;
fig. 4 is a block diagram of a noise predictor U-Net network described in the present invention.
Detailed Description
The invention is further described below with reference to the accompanying drawings. The following examples are only for more clearly illustrating the technical aspects of the present invention, and are not intended to limit the scope of the present invention.
Embodiment one:
The network traffic generation method based on the conditional diffusion probability model provided by the invention has the forward diffusion process shown in FIG. 1 and specifically comprises the following steps:
the method comprises the steps of (1) collecting a flow data set, configuring labels of flow data according to flow content, and forming a flow data set with labels;
in a specific example of the embodiment of the present invention, the flow data set collecting step includes:
Capture the packets in the network using Wireshark and save them in PCAP format to form a traffic data set; the data sources include WeChat, Douyin, Taobao, mailbox applications, and the like.
In terms of content, the traffic is binary data, and the labels are the data sources, such as QQ, WeChat, and Youku.
Step (2) constructing a gray scale map based on the flow data set;
in a specific implementation method of the embodiment of the present invention, the step (2) specifically includes:
(2-1) filtering useless local area network data packets in the data set, such as data packets under ARP protocol and DHCP protocol, based on the flow data set;
(2-2) based on the cleaned flow data set, read the flow data in binary mode, reading each byte so that its value lies in the range 0 to 255, and finally form a one-dimensional array;
(2-3) for the one-dimensional array, clip it to unify the array length across the data set, and then form a two-dimensional array by dimension transformation;
(2-4) perform data normalization on the two-dimensional array, constraining it to [0,1], to generate the gray-scale map, which is essentially a two-dimensional matrix expressed as:
Pixel = [P_1, P_2, ..., P_i]^T    (4)
P_i = (x_irc),  r ∈ {1, 2, ..., h},  c ∈ {1, 2, ..., w}    (5)
where Pixel denotes the entire data set, P_i is the feature matrix of the i-th packet, i.e. the i-th gray-scale image, and x_irc is one byte of information in the i-th packet, which is also one pixel value of the gray-scale map; c denotes the c-th column of the matrix and encodes the arrival order of the bytes, increasing from left to right; r denotes the r-th row of the matrix, with arrival order increasing from top to bottom; and h×w is the length of the packet after uniform clipping.
A gray-scale image is a black-and-white image (as opposed to a color image); collecting the traffic and converting it into pictures are preparatory steps, and in computer memory a gray-scale image is a two-dimensional array.
Step (3), putting the processed data set into a conditional denoising diffusion probability model for training;
The network traffic generation method based on the conditional denoising diffusion probability model provided by the invention is divided into two processes, a forward process and a reverse process. In a specific implementation of the invention, the structure of the noise predictor G is shown in Table 1 and is implemented based on a U-Net model:
TABLE 1 U-Net network parameters
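Since the parameter table itself is not reproduced in this text, the following is a small, hedged sketch of what a conditional U-Net noise predictor of this kind can look like for 1×28×28 inputs; the channel widths, embedding size, layer layout, and the -1 sentinel for the unconditional case are illustrative assumptions, not the values from Table 1.

import torch
import torch.nn as nn

class ConditionalUNet(nn.Module):
    """Toy conditional U-Net: predicts noise from (x_t, t, c) for 1x28x28 inputs."""
    def __init__(self, num_classes, base=64, emb_dim=128):
        super().__init__()
        # Embeddings for the diffusion step t and the flow label c
        # (the last index is reserved for the unconditional case).
        self.t_emb = nn.Embedding(1000, emb_dim)
        self.c_emb = nn.Embedding(num_classes + 1, emb_dim)
        self.emb_proj = nn.Linear(emb_dim, base)

        self.enc1 = nn.Sequential(nn.Conv2d(1, base, 3, padding=1), nn.ReLU(),
                                  nn.Conv2d(base, base, 3, padding=1), nn.ReLU())
        self.down = nn.Conv2d(base, base * 2, 3, stride=2, padding=1)          # 28 -> 14
        self.enc2 = nn.Sequential(nn.Conv2d(base * 2, base * 2, 3, padding=1), nn.ReLU())
        self.up = nn.ConvTranspose2d(base * 2, base, 4, stride=2, padding=1)   # 14 -> 28
        self.dec1 = nn.Sequential(nn.Conv2d(base * 2, base, 3, padding=1), nn.ReLU(),
                                  nn.Conv2d(base, 1, 3, padding=1))

    def forward(self, x, t, c):
        # Map label -1 (no condition) to the reserved embedding index.
        c = torch.where(c < 0, torch.full_like(c, self.c_emb.num_embeddings - 1), c)
        emb = self.emb_proj(self.t_emb(t) + self.c_emb(c))[:, :, None, None]

        h1 = self.enc1(x) + emb                        # inject step and label into features
        h2 = self.enc2(self.down(h1))
        u = self.up(h2)
        return self.dec1(torch.cat([u, h1], dim=1))    # skip connection, noise output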
The forward process training step (3) comprises the following steps:
step one, a gray scale image constructed by flow data is read, wherein:
each gray-scale map P_i is a 28×28 image; the input X of the model consists of gray-scale images and is a four-dimensional tensor [b, c, h, w], where b is the minibatch size, c is the number of channels of the gray-scale map (equal to 1), and h and w are the height and width of the gray-scale map; the number of training epochs is set to 1000;
the diffusion step t ~ Uniform(1, T) indicates how much noise is added to the image;
the label c = [c_1, c_2, ..., c_b] is likewise a vector of dimension b, where each value of c is the flow label of the current gray-scale image; during training of the noise predictor the condition is randomly dropped so that the model is also trained unconditionally;
Step two, add Gaussian noise to the gray-scale map to obtain a noisy map:
X_t = sqrt(ᾱ_t)·X_0 + sqrt(1 - ᾱ_t)·ε
where X_t denotes the noisy image, X_0 denotes the initial gray-scale map, t denotes the diffusion step, α = [α_1, α_2, ..., α_t, ..., α_T] is a sequence indexed by t and generated in this example by a preset schedule function, ᾱ_t (the cumulative product of α_1 through α_t) gives the weight sqrt(ᾱ_t), c denotes the condition, i.e. the label information of the data, and ε ~ N(0, I) denotes Gaussian noise drawn from a standard normal distribution;
Step three, take the noisy map X_t, the flow label c, and the diffusion step t as the input of the noise predictor G to predict the current Gaussian noise ε_θ; the noise predictor G is currently most often implemented with a U-Net neural network model.
Step four, compute the loss function from the predicted Gaussian noise ε_θ and the true Gaussian noise ε:
Loss = || ε - ε_θ( sqrt(ᾱ_t)·X_0 + sqrt(1 - ᾱ_t)·ε, t, c ) ||²
The loss function of the diffusion probability model is then optimized with the Adam optimization algorithm.
Step five, repeat steps one to four until the model converges, i.e. until the preset number of training epochs has been completed.
After the deep learning training is finished, the model has learned the information in the data; in particular, the trained model has learned the noise distribution during training, so it can predict the noise and perform denoising on the image.
Step (4) is to use a trained noise predictor to perform a reverse process of a conditional diffusion probability model to generate flow data, as shown in fig. 2, and specifically includes the following steps:
Step one, using Gaussian noise ε ~ N(0, I) drawn from a standard normal distribution, generate a noise image X_T of the same size as the target gray-scale image as the initial value of the noisy map, and set the initial value of the diffusion step t to T, i.e. the number of denoising iterations, which is decremented to 0 during the iteration;
Step two, predict the current Gaussian noise ε' by taking the noisy image, the target flow label, and the diffusion step t as the input of the noise predictor:
ε' = (1 + w)·ε_θ(X_t, t, c) - w·ε_θ(X_t, t)
where the first term on the right of the equation is the conditional noise prediction obtained by feeding the current noisy map X_t, the label c, and the diffusion step t into the noise predictor G; the second term on the right corresponds to the unconditional prediction of the noise predictor G; and w is a hyperparameter that controls the proportion in which the conditional and unconditional predictions are combined to form the current predicted Gaussian noise ε';
Step three, subtracting the generated prediction noise from the noise image;
and fourthly, repeating the second and third steps until the diffusion step t is 0, and finally generating a gray level diagram of the target flow data.
And fifthly, reduce the generated gray-scale map to an array and then perform inverse normalization to obtain the generated flow data (a short decoding sketch follows).
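As a companion to the preprocessing sketch given earlier, the following snippet illustrates this image-to-data direction: the generated gray-scale map is flattened to a one-dimensional array and de-normalized back into byte values. The rounding and clipping choices are assumptions made for the example.

import numpy as np

def gray_to_bytes(gray) -> bytes:
    """Reduce a generated [0,1] gray-scale map to an array and undo the normalization."""
    flat = np.asarray(gray, dtype=np.float32).reshape(-1)               # dimension reduction
    values = np.clip(np.rint(flat * 255.0), 0, 255).astype(np.uint8)    # inverse normalization
    return values.tobytes()                                             # recovered flow bytes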
The invention can be deployed on a server, and solves the problem of unbalanced flow data sets in network flow identification and anomaly detection research.
Embodiment two:
the present embodiment provides a flow data set generating device based on a conditional diffusion model, including:
a data acquisition module: used to collect a flow data set and configure labels for the flow data according to flow categories, forming a labeled flow data set;
a digital-to-image conversion module: used to convert the labeled flow data set into arrays of equal length by truncation or zero padding, normalize them, and finally convert them into gray-scale maps;
a model training module: used to feed the gray-scale maps into the conditional denoising diffusion probability model and train the forward diffusion process;
a gray-scale map generation module: used, after the forward diffusion process training is completed, to generate a corresponding flow gray-scale image by inputting label information into the model;
a graph-to-digital conversion module: used to reduce the generated gray-scale map to an array and then apply inverse normalization to obtain the generated flow data.
The apparatus of this embodiment may be used to implement the method described in embodiment one.
Embodiment III:
the embodiment provides a flow data set generating device based on a conditional diffusion model, which comprises a processor and a storage medium;
the storage medium is used for storing instructions;
the processor is configured to operate in accordance with the instructions to perform the steps of the method according to the first aspect.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The foregoing is merely a preferred embodiment of the present invention, and it should be noted that modifications and variations could be made by those skilled in the art without departing from the technical principles of the present invention, and such modifications and variations should also be regarded as being within the scope of the invention.

Claims (10)

1. A method for generating a flow data set based on a conditional diffusion model, comprising the steps of:
collecting a flow data set, configuring labels of the flow data according to flow categories, and forming a flow data set with the labels;
based on the labeled flow data set, converting each sample into an array of equal length by truncation or zero padding, normalizing it, and finally converting it into a gray-scale map;
feeding the gray-scale map into a conditional denoising diffusion probability model and training the forward diffusion process;
after the forward diffusion process training is completed, generating a corresponding flow gray-scale image by inputting label information into the model;
and reducing the generated gray-scale map to an array and then applying inverse normalization to obtain the generated flow data.
2. The method of generating a flow data set based on a conditional diffusion model of claim 1, wherein collecting the flow data set comprises:
the packets in the network are captured using Wireshark and saved as PCAP format to form a traffic data set.
3. The method for generating a flow data set based on a conditional diffusion model according to claim 2, wherein the flow data set is flow data in PCAP format, ERF format, or PCAPNG format.
4. The flow data set generating method based on the conditional diffusion model according to claim 1, wherein constructing a gray scale map based on the flow data set comprises:
filtering useless local area network data packets in the data set based on the flow data set;
based on the cleaned flow data set, reading the flow data in binary mode, reading each byte so that its value lies in the range 0 to 255, and finally forming a one-dimensional array;
for the one-dimensional array, unifying the array length across the data set by clipping, and then forming a two-dimensional array by dimension transformation;
and performing data normalization on the two-dimensional array, constraining it to [0,1], to generate the gray-scale map, which is essentially a two-dimensional matrix expressed as:
Pixel = [P_1, P_2, ..., P_i]^T    (4)
P_i = (x_irc),  r ∈ {1, 2, ..., h},  c ∈ {1, 2, ..., w}    (5)
where Pixel denotes the entire data set, P_i is the feature matrix of the i-th packet, i.e. the i-th gray-scale image, and x_irc is one byte of information in the i-th packet, which is also one pixel value of the gray-scale map; c denotes the c-th column of the matrix and encodes the arrival order of the bytes, increasing from left to right; r denotes the r-th row of the matrix, with arrival order increasing from top to bottom; and h×w is the length of the packet after uniform clipping.
5. The method for generating a traffic data set based on a conditional diffusion model according to claim 1, wherein the useless local area network data packets to be filtered in the process comprise ARP and DHCP data packets.
6. The conditional diffusion model-based flow data set generation method of claim 1, wherein placing the processed data set into the conditional denoising diffusion probability model for training comprises the following steps:
step A, a gray level diagram constructed by flow data is read, wherein:
each gray-scale map P_i is a 28×28 image; the input X of the model consists of gray-scale images and is a four-dimensional tensor [b, c, h, w], where b is the minibatch size, c is the number of channels of the gray-scale map (equal to 1), and h and w are the height and width of the gray-scale map;
the diffusion step t ~ Uniform(1, T) indicates how much noise is added to the image;
the label c = [c_1, c_2, ..., c_b] is likewise a vector of dimension b, where each value of c is the flow label of the current gray-scale image; during training of the noise predictor the condition is randomly dropped so that the model is also trained unconditionally;
step B, adding Gaussian noise to the gray-scale map to obtain a noisy map:
X_t = sqrt(ᾱ_t)·X_0 + sqrt(1 - ᾱ_t)·ε
where X_t denotes the noisy image, X_0 denotes the initial gray-scale map, t denotes the diffusion step, α = [α_1, α_2, ..., α_t, ..., α_T] is a sequence indexed by t and generated in this example by a preset schedule function, ᾱ_t (the cumulative product of α_1 through α_t) gives the weight sqrt(ᾱ_t), c denotes the condition, i.e. the label information of the data, and ε ~ N(0, I) denotes Gaussian noise drawn from a standard normal distribution;
step C, taking the noisy map X_t, the flow label c, and the diffusion step t as the input of the noise predictor G to predict the current Gaussian noise ε_θ;
step D, computing the loss function from the predicted Gaussian noise ε_θ and the true Gaussian noise ε:
Loss = || ε - ε_θ( sqrt(ᾱ_t)·X_0 + sqrt(1 - ᾱ_t)·ε, t, c ) ||²
where Loss denotes the loss, ε denotes Gaussian noise drawn from a standard normal distribution, ε_θ denotes the noise predictor (a neural network), sqrt(ᾱ_t) denotes the weight, X_0 denotes the initial gray-scale map, t denotes the diffusion step, and c denotes the condition, i.e. the label information of the data;
then optimizing the loss function of the diffusion probability model with the Adam optimization algorithm;
and E, repeating steps A to D until the number of training epochs reaches the preset value.
7. The method of generating a flow data set based on a conditional diffusion model according to claim 1, wherein the step (4) of generating flow data by performing a reverse process of the conditional diffusion probability model using a trained noise predictor comprises the steps of:
step a, using Gaussian noise ε ~ N(0, I) drawn from a standard normal distribution, generating a noise image X_T of the same size as the target gray-scale image as the initial value of the noisy map, and setting the initial value of the diffusion step t to T, i.e. the number of loop iterations, which is decremented to 0 during the iteration;
step b, predicting the current Gaussian noise ε' by taking the noisy image, the target flow label, and the diffusion step t as the input of the noise predictor:
ε' = (1 + w)·ε_θ(X_t, t, c) - w·ε_θ(X_t, t)
where the first term on the right of the equation is the conditional noise prediction obtained by feeding the current noisy map X_t, the label c, and the diffusion step t into the noise predictor G; the second term on the right corresponds to the unconditional prediction of the noise predictor G; and w is a hyperparameter that controls the proportion in which the conditional and unconditional predictions are combined to form the current predicted Gaussian noise ε';
Step c, subtracting the generated prediction noise from the noise image;
step d, repeating the step b and the step c until the diffusion step t is 0, and finally generating a gray level diagram of the target flow data;
and e, converting the gray level image into a corresponding numerical matrix to finish the generation of flow data.
8. The method for generating a conditional diffusion model-based flow data set according to claim 1, wherein the noise predictor G is implemented based on a U-Net model.
9. A flow data set generating device based on a conditional diffusion model, comprising:
a data acquisition module: used to collect a flow data set and configure labels for the flow data according to flow categories, forming a labeled flow data set;
a digital-to-image conversion module: used to convert the labeled flow data set into arrays of equal length by truncation or zero padding, normalize them, and finally convert them into gray-scale maps;
a model training module: used to feed the gray-scale maps into the conditional denoising diffusion probability model and train the forward diffusion process;
a gray-scale map generation module: used, after the forward diffusion process training is completed, to generate a corresponding flow gray-scale image by inputting label information into the model;
a graph-to-digital conversion module: used to reduce the generated gray-scale map to an array and then apply inverse normalization to obtain the generated flow data.
10. A flow data set generating device based on a conditional diffusion model, which is characterized by comprising a processor and a storage medium;
the storage medium is used for storing instructions;
the processor being operative according to the instructions to perform the steps of the method according to any one of claims 1 to 8.
CN202310278870.4A 2023-03-21 2023-03-21 Flow data set generation method and device based on conditional diffusion model Pending CN116304705A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310278870.4A CN116304705A (en) 2023-03-21 2023-03-21 Flow data set generation method and device based on conditional diffusion model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310278870.4A CN116304705A (en) 2023-03-21 2023-03-21 Flow data set generation method and device based on conditional diffusion model

Publications (1)

Publication Number Publication Date
CN116304705A true CN116304705A (en) 2023-06-23

Family

ID=86819811

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310278870.4A Pending CN116304705A (en) 2023-03-21 2023-03-21 Flow data set generation method and device based on conditional diffusion model

Country Status (1)

Country Link
CN (1) CN116304705A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116542774A (en) * 2023-06-27 2023-08-04 深圳市迪博企业风险管理技术有限公司 Probability diffusion model-based method for detecting compliance of company-associated transactions on sale
CN116542774B (en) * 2023-06-27 2023-12-22 深圳市迪博企业风险管理技术有限公司 Probability diffusion model-based method for detecting compliance of company-associated transactions on sale
CN117423396A (en) * 2023-12-18 2024-01-19 烟台国工智能科技有限公司 Crystal structure generation method and device based on diffusion model
CN117423396B (en) * 2023-12-18 2024-03-08 烟台国工智能科技有限公司 Crystal structure generation method and device based on diffusion model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination