CN115277086A - Network background traffic generation method based on a generative adversarial network - Google Patents
- Publication number
- CN115277086A (application number CN202210720772.7A)
- Authority
- CN
- China
- Prior art keywords
- network
- data stream
- flow
- data
- generation
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/14—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
- H04L63/1441—Countermeasures against malicious traffic
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D30/00—Reducing energy consumption in communication networks
- Y02D30/50—Reducing energy consumption in communication networks in wire-line communication networks, e.g. low power modes or reduced link rate
Abstract
The invention provides a network background traffic generation method based on a generative adversarial network (GAN), comprising the steps of: 1) acquiring a training sample set and a test sample set; 2) constructing M generative adversarial network models; 3) iteratively training the generative adversarial network models; 4) obtaining the data-flow features of the predicted traffic; 5) obtaining the network background traffic generation result. The invention trains an adversarial network model on data flows composed of the statistical features of all raw traffic packets in each group, so it can learn the multi-dimensional flow-level statistics of the target network traffic. By using the application category of the next data flow as a feature, it also learns the correlation between different application flows of a user node and the user's behavioral habits, so that traffic sent by nodes carrying special identity information closely resembles normal traffic. The features of real data flows are thus described more accurately, and the security of a covert communication system is effectively improved.
Description
Technical Field
The invention belongs to the technical field of network security. It relates to a network background traffic generation method, and in particular to a network background traffic generation method based on a generative adversarial network, which can be used to generate network background traffic.
Background
When communication nodes on the Internet use network applications to communicate, they must exchange traffic packets, and an attacker can classify a node's traffic packets with network traffic classification techniques to intercept its traffic. Research on background traffic generation techniques that can evade an attacker's traffic analysis is therefore of real significance.
Network traffic generation techniques simulate real network traffic so that a node can communicate covertly. Existing methods fall into two main families: generation based on statistical models and generation based on traffic features.
Statistical-model methods combine Markov models, Poisson models, and similar statistical models with a traffic generation tool, and are mainly used to produce background traffic for Internet stress testing. Their drawback is that a simple probabilistic model struggles to capture the relationships between traffic packets when traffic volume is huge, while building a complex probabilistic model is very difficult.
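As a concrete illustration of the statistical-model family, the sketch below (an illustrative assumption, not part of the invention) samples packet inter-arrival times for a Poisson arrival process, whose gaps are exponentially distributed; a traffic generation tool would then emit packets at these instants:

```python
import random

def poisson_interarrivals(rate_pps, n_packets, seed=0):
    """Sample packet inter-arrival times (seconds) for a Poisson
    arrival process with mean rate `rate_pps` (packets per second).
    Inter-arrival times of a Poisson process are exponential."""
    rng = random.Random(seed)
    return [rng.expovariate(rate_pps) for _ in range(n_packets)]

gaps = poisson_interarrivals(rate_pps=100.0, n_packets=1000)
mean_gap = sum(gaps) / len(gaps)  # close to 1/100 s for this rate
```

Such a model fixes only the arrival process; it says nothing about the relationship between packets, which is exactly the limitation described above.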
Traffic-feature methods divide into packet-level and flow-level approaches by feature granularity. Packet-level generation focuses on the statistics and arrival process of individual packets; it considers only the basic features of a single packet, ignores the interactions between packets and the traffic characteristics within and across protocols, so the generated traffic has low fidelity. Flow-level generation focuses on the features of a data flow and its arrival process, where a data flow usually refers to the packets sharing a four-tuple of source IP address, source port, destination IP address, and destination port. Its drawback is that it lacks features such as user behavioral habits, the correlation between different application flows, and the time dimension, which lets an attacker analyze the user node from those angles. Flow-feature methods typically extract data-flow features with machine learning as a training sample set for a neural network, train the network iteratively, output simulated traffic features, use a traffic generation tool to produce an initial packet sequence from those features, and finally encrypt the data the user needs to send and embed it in the initial packet sequence to produce the network traffic.
An existing method extracts packet-level features from pre-collected packet sample sets of different applications, feeds them to the corresponding models in a GAN model library for iterative training, then randomly selects one trained GAN from the library and uses its generator to produce the simulated background traffic to send. Its drawbacks are that only one randomly chosen GAN generates the background traffic each time and the user's time-dimension habits of application use are ignored, so an attacker can detect the hidden traffic within the generated background traffic and thereby discover the covert communication node; the security of the generated background traffic is low. Moreover, the GAN it uses follows a packet-level generation method: traffic is characterized only by packet-level features, the interactions between packets are not considered, and the generated traffic has low fidelity, which limits any improvement in security and reliability.
Disclosure of Invention
The invention aims to overcome the above defects of the prior art by providing a network background traffic generation method based on a generative adversarial network, solving the prior art's technical problems of low security and reliability.
To achieve this purpose, the technical scheme adopted by the invention comprises the following steps:
(1) Obtain a training sample set X_train and a test sample set X_test:
(1a) Divide S raw traffic packets, covering M network applications and sent continuously by a communication node during Internet communication, into N groups, where each group contains the raw traffic packets of one communication session sharing the same source IP address, source port, destination IP address, and destination port. Extract the statistical features of all raw packets in each group to form a data flow, obtaining a data flow set F = {F_1, F_2, ..., F_n, ..., F_N} of N data flows, where M ≥ 2, S ≥ 10000, and F_n denotes the n-th data flow, which contains a next-flow application-category feature describing the correlation between different network flows; each network application corresponds to at least one data flow, and each data flow corresponds to exactly one network application;
(1b) One-hot encode the non-numeric features of each data flow F_n and normalize the encoded result to obtain a preprocessed data flow set F̃ = {F̃_1, F̃_2, ..., F̃_n, ..., F̃_N}; then label each data flow with its network application category, obtaining the corresponding label set y = {y_1, y_2, ..., y_n, ..., y_N}. Form a training sample set X_train from N_1 data flows and their labels, and a test sample set X_test from the remaining N_2 data flows and their labels, where N_1 > N/2 and N = N_1 + N_2;
(2) Construct M generative adversarial network models:
Construct a set of M parallel generative adversarial network models, one per network application: C = {C_1, C_2, ..., C_m, ..., C_M}. Each GAN C_m comprises a generator network G_m and a discriminator network D_m cascaded in sequence, where C_m denotes the GAN corresponding to the m-th network application. The generator network G_m comprises a stacked input layer, several first fully connected layers, and a tanh-activated output layer; the discriminator network D_m comprises a stacked input layer, several second fully connected layers, and a sigmoid-activated output layer;
(3) Iteratively train the generative adversarial network models:
(3a) Initialize the parameters of the generator network G_m and the discriminator network D_m in each GAN C_m as θ_Gm and θ_Dm respectively; the iteration counter is q and the maximum iteration count is Q, with Q ≥ 10000 and q = 0;
(3b) Feed the training sample set X_train to the M parallel GAN models. In each model C_m, the generator network G_m performs feature prediction for each of the K data flows F̃_k in X_train labeled m, giving the predicted data-flow feature set corresponding to C_m, where K < N_1;
(3c) The discriminator network D_m computes, for each predicted feature vector G_m(F̃_k) and each real data flow F̃_k, the probability that it comes from the training sample set X_train with label m, giving the probability set D_1 = {d̃_1, ..., d̃_k, ..., d̃_K} for the predicted features and the probability set D_2 = {d_1, d_2, ..., d_k, ..., d_K} for the K real data flows labeled m, where G_m(F̃_k) denotes the data-flow features predicted by the generator network G_m from F̃_k, d̃_k = D_m(G_m(F̃_k)), and d_k = D_m(F̃_k);
(3d) Using a cross-entropy loss function, compute the generator loss L_Gm from d̃_k and the discriminator loss L_Dm from d̃_k and d_k. Then, by back-propagation, compute the parameter gradients of G_m from L_Gm and of D_m from L_Dm, and update the parameters θ_Gm of G_m and θ_Dm of D_m with a gradient descent algorithm, yielding the M parallel GAN models for this iteration;
(3e) Judge whether q = Q holds; if so, obtain the M trained GAN models, otherwise let q = q + 1 and return to step (3b);
(4) Obtain the data-flow features of the predicted traffic:
Feed the test sample set X_test to the M trained GAN models. Each trained generator network G_m performs feature prediction for the data flow samples in X_test labeled m, giving a feature set A = {A_1, A_2, ..., A_m, ..., A_M} of M predicted data-flow feature subsets, where A_m denotes the predicted data-flow features obtained by passing each sample labeled m in X_test through the corresponding trained generator network G_m;
(5) Obtain the network background traffic generation result:
(5a) Initialize the application category v_1 of the first data flow, the iteration counter l, and the number L of background traffic flows to generate, with l = 1 and 1 ≤ v_1 ≤ M;
(5b) Select the feature subset A_{v_l} for application category v_l in the predicted feature set A, randomly pick a predicted data-flow feature from A_{v_l}, and generate an initial data stream sequence c_l that satisfies it, at the same time extracting the application category v_{l+1} of the next data flow from the selected feature, where 1 ≤ v_l, v_{l+1} ≤ M;
(5c) Judge whether l = L holds; if so, obtain the initial data stream set c = {c_1, c_2, ..., c_l, ..., c_L}, otherwise let l = l + 1 and return to step (5b);
(5d) Encrypt the information data the communication node needs to send, and embed the encrypted data into each initial data stream c_l, obtaining the set c' = {c'_1, c'_2, ..., c'_l, ..., c'_L} of L network background traffic flows.
Compared with the prior art, the invention has the following advantages:
1. By training the GAN models with the next-flow application category as an important feature, the invention can capture the correlation between different application flows of a user node and better simulate network background traffic according to the user's behavioral habits. The simulated traffic over a given time window is very close to normal traffic, so an attacker cannot detect a node anomaly by statistically analyzing the correlation of application flows over that window with traffic analysis techniques, which improves the security of the covert communication system.
2. By training the adversarial network models on data flows composed of the statistical features of all raw traffic packets in each group, the invention can learn the multi-dimensional flow-level statistics of the target network traffic, describe the features of real data flows more accurately, and further improve the security of the covert communication system.
Drawings
FIG. 1 is a flow chart of an implementation of the present invention.
Detailed Description
The invention is described in further detail below with reference to the figures and specific examples.
Referring to fig. 1, the present invention includes the steps of:
step 1) obtaining a training sample set XtrainAnd test sample set Xtest:
Step 1a) In this embodiment, the Wireshark tool is used to capture S raw traffic packets covering M network applications, sent continuously for 7 days by a laboratory computer according to users' habits in a campus network environment. The packets are divided into N groups, where each group contains the raw packets of one communication session sharing the same source IP address, source port, destination IP address, and destination port, and 10 main statistical features of all raw packets in each group are extracted to form a data flow, giving a data flow set F = {F_1, F_2, ..., F_n, ..., F_N} of N data flows, where M ≥ 2, S ≥ 10000, and F_n denotes the n-th data flow containing the next-flow application-category feature describing the correlation between different network traffic. In this embodiment S = 129054, N = 7068, and M = 5; the 5 network applications are HTTP web requests, WeChat, OneNote, the 163 mailbox, and Youdao Dictionary. Each traffic data flow contains 8 numeric features and 2 non-numeric features. The 8 numeric features are the total number of packets, the connection duration, the mean and median of packet length, the mean and median of packet inter-arrival time, and the mean and median of sliding window length; the 2 non-numeric features are the protocol type and the next-flow application category.
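The grouping and per-flow feature extraction of step 1a) can be sketched as follows; the packet field names (`src`, `sport`, `dst`, `dport`, `ts`, `len`, `win`, `proto`) are assumptions standing in for whatever the capture export actually provides:

```python
from collections import defaultdict
from statistics import mean, median

def packets_to_flows(packets):
    """Group raw packets by the (src IP, src port, dst IP, dst port)
    four-tuple and compute per-flow statistics similar to the numeric
    features listed above."""
    groups = defaultdict(list)
    for p in packets:
        groups[(p["src"], p["sport"], p["dst"], p["dport"])].append(p)
    flows = []
    for key, pkts in groups.items():
        pkts.sort(key=lambda p: p["ts"])
        lens = [p["len"] for p in pkts]
        iats = [b["ts"] - a["ts"] for a, b in zip(pkts, pkts[1:])] or [0.0]
        wins = [p["win"] for p in pkts]
        flows.append({
            "four_tuple": key,
            "pkt_count": len(pkts),                      # total packets
            "duration": pkts[-1]["ts"] - pkts[0]["ts"],  # connection duration
            "len_mean": mean(lens), "len_median": median(lens),
            "iat_mean": mean(iats), "iat_median": median(iats),
            "win_mean": mean(wins), "win_median": median(wins),
            "protocol": pkts[0]["proto"],                # non-numeric feature
        })
    return flows
```

The next-flow application category, the second non-numeric feature, would be attached afterwards by looking at which application the chronologically following flow belongs to.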
Using the next-flow application category as an important feature captures the correlation between different application flows of the user node, so network background traffic can be better simulated according to user behavioral habits; the simulated traffic closely resembles normal traffic, an attacker cannot detect a node anomaly by statistically analyzing application-flow correlation over a time window, and the security of the covert communication system is improved. The flow-level statistical features capture both the characteristics of the data flow and its arrival process, take the interaction between packets into account, and simulate the network traffic more accurately.
Step 1b) One-hot encode the non-numeric features of each data flow F_n, and normalize each one-hot-encoded flow using the formula X' = 2·log_W(X + 1) − 1 to obtain the preprocessed data flow set F̃; then label each data flow with its network application category, obtaining the corresponding label set y = {y_1, y_2, ..., y_n, ..., y_N}. Form the training sample set X_train from N_1 data flows and their labels and the test sample set X_test from the remaining N_2 data flows and their labels, where X is the original feature value, W is the upper bound of X, X' is the normalized value, N_1 > N/2, and N = N_1 + N_2. In this embodiment N_1 = 5000 and N_2 = 2068, and the two non-numeric features, protocol type and next-flow application category, are one-hot encoded into a 3-dimensional and a 5-dimensional numeric feature respectively;
One-hot encoding uses an n-bit state register to encode n states, one independent register bit per state. It expands the values of discrete features into Euclidean space and increases the feature dimension, converting non-numeric features that the GAN model learns with difficulty into numeric features it learns easily, which reduces the training difficulty of the GAN model.
Data normalization simplifies the numerical computation and avoids gradient explosion when the GAN adjusts its parameters by gradient descent, accelerating the convergence of the GAN model.
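The two preprocessing steps above can be sketched together; `one_hot` and `normalize` are illustrative helper names, and `normalize` implements the formula X' = 2·log_W(X + 1) − 1 from step 1b):

```python
import math

def one_hot(value, categories):
    """One-hot encode a non-numeric feature over a fixed category list."""
    vec = [0.0] * len(categories)
    vec[categories.index(value)] = 1.0
    return vec

def normalize(x, w):
    """Map a numeric feature x with upper bound w to roughly [-1, 1]
    using X' = 2*log_W(X + 1) - 1."""
    return 2.0 * math.log(x + 1.0, w) - 1.0
```

With this mapping, x = 0 lands at −1 and x = w − 1 lands at 1, matching the (-1, 1) range of the generator's tanh output layer.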
Step 2) Construct M generative adversarial network models:
Construct a set of M parallel generative adversarial network models, one per network application: C = {C_1, C_2, ..., C_m, ..., C_M}, where each GAN C_m comprises a generator network G_m and a discriminator network D_m cascaded in sequence, and C_m denotes the GAN corresponding to the m-th network application. The generator network G_m comprises a stacked input layer, several first fully connected layers, and a tanh-activated output layer; the discriminator network D_m comprises a stacked input layer, several second fully connected layers, and a sigmoid-activated output layer;
The generator network G_m contains 3 first fully connected layers with 80, 100, and 60 neurons respectively, all using the leaky-relu activation function; its output layer contains 15 neurons with the tanh activation function;
The discriminator network D_m contains 3 second fully connected layers with 50, 80, and 30 neurons respectively, all using the leaky-relu activation function; its output layer contains 1 neuron with the sigmoid activation function.
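A minimal NumPy sketch of the two architectures above; the layer sizes come from the text, while the noise dimension, weight initialization, and leaky-relu slope are assumptions:

```python
import numpy as np

def leaky_relu(x, alpha=0.01):
    return np.where(x > 0, x, alpha * x)

def mlp_forward(x, weights, hidden_act, out_act):
    """Forward pass through a stack of fully connected layers."""
    *hidden, last = weights
    for w, b in hidden:
        x = hidden_act(x @ w + b)
    w, b = last
    return out_act(x @ w + b)

def init_layer(rng, n_in, n_out):
    return rng.standard_normal((n_in, n_out)) * 0.1, np.zeros(n_out)

rng = np.random.default_rng(0)
# Generator: input -> 80 -> 100 -> 60 -> 15 (tanh), per the text above.
g_sizes = [32, 80, 100, 60, 15]
g_weights = [init_layer(rng, a, b) for a, b in zip(g_sizes, g_sizes[1:])]
fake = mlp_forward(rng.standard_normal((4, 32)), g_weights, leaky_relu, np.tanh)
# Discriminator: 15 -> 50 -> 80 -> 30 -> 1 (sigmoid).
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
d_sizes = [15, 50, 80, 30, 1]
d_weights = [init_layer(rng, a, b) for a, b in zip(d_sizes, d_sizes[1:])]
score = mlp_forward(fake, d_weights, leaky_relu, sigmoid)  # probability per sample
```

The 15-unit tanh output matches a flow feature vector normalized into (-1, 1), and the single sigmoid unit outputs the probability that a flow is real.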
The leaky-relu activation function is a variant of the classic relu activation function whose output has a small gradient for negative inputs; because the derivative is never zero, it reduces the occurrence of dead neurons and alleviates the vanishing-gradient problem during back-propagation. Using leaky-relu in the GANs speeds up learning and further shortens training time.
Step 3) Iteratively train the generative adversarial network models:
Step 3a) Initialize the parameters of the generator network G_m and the discriminator network D_m in each GAN C_m as θ_Gm and θ_Dm respectively; the iteration counter is q and the maximum iteration count is Q = 50000, with q = 0;
Step 3b) Feed the training sample set X_train to the M parallel GAN models. In each model C_m, the generator network G_m performs feature prediction for each of the K data flows F̃_k in X_train labeled m, giving the predicted data-flow feature set corresponding to C_m, where K < N_1;
Step 3c) The discriminator network D_m computes, for each predicted feature vector G_m(F̃_k) and each real data flow F̃_k, the probability that it comes from the training sample set X_train with label m, giving the probability set D_1 = {d̃_1, ..., d̃_k, ..., d̃_K} for the predicted features and the probability set D_2 = {d_1, d_2, ..., d_k, ..., d_K} for the K real data flows labeled m, where G_m(F̃_k) denotes the data-flow features predicted by the generator network G_m from F̃_k, d̃_k = D_m(G_m(F̃_k)), and d_k = D_m(F̃_k);
Step 3d) Using a cross-entropy loss function, compute the generator loss L_Gm from d̃_k and the discriminator loss L_Dm from d̃_k and d_k; then, using the back-propagation method built into the Adam optimizer, compute the parameter gradients of G_m from L_Gm and of D_m from L_Dm. The Adam optimizer then applies gradient descent to update the parameters θ_Gm of G_m and θ_Dm of D_m, yielding the M parallel GAN models for this iteration. The calculation formulas for the losses L_Gm and L_Dm are respectively as follows:
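A common cross-entropy instantiation of these two losses is sketched below; the patent's explicit equations are not reproduced here, so the use of the standard non-saturating generator form is an assumption:

```python
import numpy as np

def gan_losses(d_real, d_fake, eps=1e-8):
    """Cross-entropy GAN losses from the discriminator's probabilities:
    d_real = D(x) on real flows, d_fake = D(G(z)) on generated flows.
    L_D = -mean(log d_real + log(1 - d_fake))
    L_G = -mean(log d_fake)   (non-saturating generator loss)
    eps guards against log(0)."""
    loss_d = -np.mean(np.log(d_real + eps) + np.log(1.0 - d_fake + eps))
    loss_g = -np.mean(np.log(d_fake + eps))
    return loss_g, loss_d
```

When the discriminator classifies perfectly (d_real near 1, d_fake near 0), L_D approaches 0 while L_G grows, and vice versa, which is the adversarial dynamic the iterative training exploits.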
Step 3e) Judge whether q = Q holds; if so, obtain the M trained GAN models, otherwise let q = q + 1 and return to step (3b);
Step 4) Obtain the data-flow features of the predicted traffic:
Feed the test sample set X_test to the M trained GAN models. Each trained generator network G_m performs feature prediction for the data flow samples in X_test labeled m, giving a feature set A = {A_1, A_2, ..., A_m, ..., A_M} of M predicted data-flow feature subsets, where A_m denotes the predicted data-flow features obtained by passing each sample labeled m in X_test through the corresponding trained generator network G_m;
Step 5) Obtain the network background traffic generation result:
Step 5a) Initialize the application category v_1 of the first data flow, the iteration counter l, and the number L of background traffic flows to generate, with l = 1 and 1 ≤ v_1 ≤ 5;
Step 5b) Select the feature subset A_{v_l} for application category v_l from the predicted feature set A, write a configuration file from a randomly selected predicted data-flow feature in A_{v_l} using a traffic generator script such as tarfen, have the traffic generator produce the initial data stream sequence c_l from that configuration file, and at the same time extract the next-flow application category v_{l+1} from the selected feature, where 1 ≤ v_l, v_{l+1} ≤ 5;
Step 5c) Judge whether l = L holds; if so, obtain the initial data stream set c = {c_1, c_2, ..., c_l, ..., c_L}, otherwise let l = l + 1 and return to step (5b);
Step 5d) Encrypt the information data the communication node needs to send, and embed the encrypted data into each initial data stream c_l, obtaining the set c' = {c'_1, c'_2, ..., c'_l, ..., c'_L} of L network background traffic flows.
A corresponding GAN is thus selected for each next flow according to the next-flow application category learned by the GANs, simulating the correlation between different application flows of the user node over a time window and matching the user's behavioral habits. The simulated traffic is closer to normal traffic, an attacker cannot detect a node anomaly by statistically analyzing application-flow correlation over the time window, and the security of the covert communication system is improved.
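The chained selection described above — pick a predicted feature for the current application category, then let its next-flow category choose the following model — can be sketched as follows (`feature_sets`, `features`, and `next_app` are assumed names for the predicted feature subsets and their fields):

```python
import random

def generate_background_flows(feature_sets, v1, L, seed=0):
    """Chain flow generation by application category: draw a predicted
    flow feature vector from the subset for category v, then follow its
    next-application field to pick the category of the following flow.
    feature_sets maps category -> list of dicts, each with a 'features'
    vector and a 'next_app' category."""
    rng = random.Random(seed)
    v, flows = v1, []
    for _ in range(L):
        sample = rng.choice(feature_sets[v])
        flows.append((v, sample["features"]))  # hand to the traffic generator
        v = sample["next_app"]                 # category of the next flow
    return flows
```

Because each flow's category is chosen by the previous flow's learned next-application feature, the sequence of generated applications follows the user's habitual transitions rather than a uniform random pick.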
The technical effects of the invention are further illustrated below through simulation experiments:
1. simulation conditions and contents:
The simulation platform is: Windows 10 operating system, TensorFlow 2.3.0, Jupyter Notebook 4.3.0, and Python 3.8.3.
The simulation compares the invention with the prior-art "network background traffic generation method based on a generative adversarial network (GAN)". A KNN classifier is used to compare the per-application accuracy and the total accuracy of the background traffic generated by the two methods for 5 different applications; the results are listed in Table 1. An SVM classifier is used for the same comparison; the results are listed in Table 2.
TABLE 1 Accuracy of background traffic generated for different applications, KNN classifier

| Method | HTTP | WeChat | OneNote | 163 mailbox | Youdao Dictionary | Total |
| Prior art | 0.71 | 0.58 | 0.60 | 0.63 | 0.61 | 0.64 |
| The invention | 0.91 | 0.95 | 0.92 | 0.98 | 0.95 | 0.94 |
TABLE 2 Accuracy of background traffic generated for different applications, SVM classifier

| Method | HTTP | WeChat | OneNote | 163 mailbox | Youdao Dictionary | Total |
| Prior art | 0.91 | 0.40 | 0.60 | 0.42 | 0.62 | 0.61 |
| The invention | 0.95 | 0.98 | 0.96 | 0.95 | 0.96 | 0.96 |
2. Analysis of simulation results:
according to the background traffic generation method disclosed by the invention, after the correlation between different application traffic of the user node is described by using the next data stream application category as an important feature, the accuracy of the generated background traffic corresponding to each application is effectively improved, the network background traffic can be better simulated and generated according to the behavior habits of the user, and the safety of a covert communication system is improved.
The foregoing description is only an example of the present invention and does not constitute any limitation to it. It will be apparent to those skilled in the art that, after understanding the content and principle of the invention, various modifications and variations in form and detail may be made without departing from its principle, but such modifications and variations remain within the scope of the claims.
Claims (4)
1. A network background traffic generation method based on a generative adversarial network, characterized by comprising the following steps:
(1) Obtaining a training sample set X_train and a test sample set X_test:
(1a) S original traffic data packets, covering M kinds of network applications and sent continuously by a communication node during internet communication, are divided into N groups; each group contains the original traffic data packets that share the same source IP address, source port, destination IP address and destination port within one communication process. The statistical features of all original traffic data packets in each group are extracted to form a data stream, giving a data stream set F = {F_1, F_2, ..., F_n, ..., F_N}, where M ≥ 2, S ≥ 10000, and F_n represents the n-th data stream, which contains the next-data-stream application category feature describing the correlation between different network flows;
(1b) For each data stream F_n, the non-numeric features are one-hot encoded and the encoding result is normalized, giving the preprocessed data stream set F̃ = {F̃_1, F̃_2, ..., F̃_n, ..., F̃_N}. Each preprocessed data stream F̃_n is then labelled with its network application category, giving the corresponding label set y = {y_1, y_2, ..., y_n, ..., y_N}. N_1 data streams and their labels form the training sample set X_train, and the remaining N_2 data streams and their labels form the test sample set X_test, where N_1 > N/2 and N = N_1 + N_2;
(2) Constructing M generative adversarial network models:
A set of M parallel generative adversarial network models, one per network application, is constructed: C = {C_1, C_2, ..., C_m, ..., C_M}. Each generative adversarial network C_m comprises a sequentially cascaded generator network G_m and discriminator network D_m, where C_m is the generative adversarial network corresponding to the m-th network application. The generator network G_m comprises a stacked input layer, a plurality of first fully connected layers, and a tanh-activation output layer; the discriminator network D_m comprises a stacked input layer, a plurality of second fully connected layers, and a sigmoid-activation output layer;
(3) Iterative training of the generative adversarial network models:
(3a) Initialize the network parameters of the generator G_m and the discriminator D_m in each generative adversarial network model C_m; let the iteration counter be q and the maximum number of iterations be Q, with Q ≥ 10000 and q = 0;
(3b) Take the training sample set X_train as input to the M parallel generative adversarial network models. In each model C_m, the generator network G_m performs feature prediction for each of the K data streams in X_train whose label is m, giving C_m's corresponding set of predicted data stream features, where K < N_1;
(3c) The discriminator network D_m computes, for each predicted data stream feature and for each real data stream, the probability that it comes from the training sample set X_train with label m, giving the probability set D_1 = {d̂_1, d̂_2, ..., d̂_k, ..., d̂_K} for the K predicted data stream features produced by G_m, and the probability set D_2 = {d_1, d_2, ..., d_k, ..., d_K} for the K data streams in X_train with label m, where d̂_k is the probability D_m assigns to the k-th predicted data stream feature coming from X_train with label m, and d_k is the probability D_m assigns to the k-th real data stream coming from X_train with label m;
(3d) Using a cross-entropy loss function, compute the generator loss L_{G_m} from d̂_k and, at the same time, the discriminator loss L_{D_m} from d̂_k and d_k. Using back-propagation, compute the network parameter gradients of G_m from L_{G_m} and of D_m from L_{D_m}; then use a gradient descent algorithm to update the network parameters of G_m with its gradients and the network parameters of D_m with its gradients, giving the M parallel generative adversarial network models of this iteration;
(3e) Judge whether q = Q holds; if so, the M trained generative adversarial network models are obtained; otherwise let q = q + 1 and return to step (3b);
(4) Acquiring the predicted data stream features:
Take the test sample set X_test as input to the M trained generative adversarial network models. Each trained generator network performs feature prediction for every data stream sample in X_test with label m, giving a feature set A = {A_1, A_2, ..., A_m, ..., A_M} comprising M predicted data stream feature subsets, where A_m is the set of features predicted by the corresponding trained generator for the samples in X_test with label m;
(5) Obtaining the network background traffic generation result:
(5a) Initialize the application category v_1 of the first data stream, with 1 ≤ v_1 ≤ M; let the iteration counter be l, the number of network background flows to generate be L, and l = 1;
(5b) Select from the predicted data stream feature set A the feature subset A_{v_l} corresponding to application category v_l, randomly select a predicted data stream feature from it, and generate from that feature the initial data stream c_l; at the same time, extract from it the application category v_{l+1} of the next data stream, where 1 ≤ v_l, v_{l+1} ≤ M;
(5c) Judge whether l = L holds; if so, the initial data stream set c = {c_1, c_2, ..., c_l, ..., c_L} is obtained and step (5d) is executed; otherwise let l = l + 1 and return to step (5b);
(5d) Encrypt the information data to be sent by the communication node and embed the encrypted information data into each initial data stream c_l, giving the set of L network background flows c' = {c'_1, c'_2, ..., c'_l, ..., c'_L}.
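Step (5d) leaves the concrete encryption and embedding schemes open. The sketch below is a toy illustration only: it uses an XOR stream cipher purely as a stand-in for the encryption step (a real covert channel would use a proper authenticated cipher) and splits the ciphertext across the L initial data streams; all names are assumptions.

```python
import numpy as np

def xor_encrypt(data: bytes, key: bytes) -> bytes:
    """Illustrative stand-in for the encryption step: repeating-key XOR.
    Applying it twice with the same key recovers the plaintext."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

def embed(flows, payload, key):
    """Encrypt the payload and split the ciphertext across the L initial
    data streams, pairing each stream with its carried chunk."""
    cipher = xor_encrypt(payload, key)
    chunk = -(-len(cipher) // len(flows))  # ceiling division
    return [(f, cipher[i * chunk:(i + 1) * chunk]) for i, f in enumerate(flows)]

flows = [np.zeros(15) for _ in range(4)]          # L = 4 initial data streams c_l
carrying = embed(flows, b"covert message", b"k3y")

# The receiver concatenates the chunks and decrypts with the shared key.
recovered = xor_encrypt(b"".join(c for _, c in carrying), b"k3y")
```

The background flows thus remain statistically shaped by the generators while carrying the hidden payload.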
2. The method for generating network background traffic based on a generative adversarial network of claim 1, wherein the statistical features of all original traffic data packets in each group in step (1a) comprise numeric features and non-numeric features, wherein:
the numeric features mainly comprise the total number of data packets, the connection duration of the data packets, the mean and median of the data packet length, the mean and median of the data packet inter-arrival time, and the mean and median of the sliding window length;
the non-numeric features mainly comprise the protocol type and the next-data-stream application category.
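A minimal sketch of computing a subset of these per-flow statistics from packet timestamp and length lists. Function and field names are illustrative, not from the patent; the sliding-window statistics are omitted.

```python
import numpy as np

def flow_features(pkt_times, pkt_lens, protocol):
    """Per-flow statistics in the spirit of claim 2: packet count,
    connection duration, mean/median packet length, mean/median
    inter-arrival time, plus the protocol type as a non-numeric feature."""
    times = np.asarray(pkt_times, dtype=float)
    lens = np.asarray(pkt_lens, dtype=float)
    gaps = np.diff(times) if len(times) > 1 else np.zeros(1)
    return {
        "n_packets": len(lens),
        "duration": float(times[-1] - times[0]),
        "len_mean": float(lens.mean()),
        "len_median": float(np.median(lens)),
        "iat_mean": float(gaps.mean()),
        "iat_median": float(np.median(gaps)),
        "protocol": protocol,  # non-numeric feature, one-hot encoded before training
    }

feats = flow_features([0.0, 0.1, 0.3, 0.6], [60, 1500, 40, 1500], "TCP")
```

Each group of packets sharing the same 4-tuple would be reduced to one such feature record before preprocessing.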
3. The method according to claim 1, wherein each generative adversarial network C_m constructed in step (2) comprises a sequentially cascaded generator network G_m and discriminator network D_m, wherein:
in the generator network G_m, the number of first fully connected layers is 3, with 80, 100 and 60 neurons respectively and LeakyReLU activation functions throughout; the output layer comprises 15 neurons with a tanh activation function;
in the discriminator network D_m, the number of second fully connected layers is 3, with 50, 80 and 30 neurons respectively and LeakyReLU activation functions throughout; the output layer comprises 1 neuron with a sigmoid activation function.
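The stated layer sizes can be illustrated with a plain numpy forward pass. This is a sketch under assumptions: the latent input size (32), weight initialisation, and LeakyReLU slope are not specified in the patent, and the actual implementation uses TensorFlow.

```python
import numpy as np

def leaky_relu(x, alpha=0.2):
    return np.where(x > 0, x, alpha * x)

def dense(x, w, b, act):
    return act(x @ w + b)

rng = np.random.default_rng(0)
def layer(n_in, n_out):
    return rng.normal(0, 0.1, (n_in, n_out)), np.zeros(n_out)

# Generator G_m: 3 hidden layers (80, 100, 60), tanh output of 15 features.
g_params = [layer(a, b) for a, b in [(32, 80), (80, 100), (100, 60), (60, 15)]]

def generator(z):
    h = z
    for w, b in g_params[:-1]:
        h = dense(h, w, b, leaky_relu)
    w, b = g_params[-1]
    return dense(h, w, b, np.tanh)

# Discriminator D_m: 3 hidden layers (50, 80, 30), sigmoid output of 1 neuron.
d_params = [layer(a, b) for a, b in [(15, 50), (50, 80), (80, 30), (30, 1)]]

def discriminator(x):
    h = x
    for w, b in d_params[:-1]:
        h = dense(h, w, b, leaky_relu)
    w, b = d_params[-1]
    return dense(h, w, b, lambda v: 1.0 / (1.0 + np.exp(-v)))

fake = generator(rng.normal(size=(4, 32)))  # batch of 4 generated feature vectors
prob = discriminator(fake)                   # probability each came from real data
```

The tanh output bounds generated features to [-1, 1], matching the normalized preprocessing of step (1b).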
4. The method for generating network background traffic based on a generative adversarial network as claimed in claim 1, wherein in step (3d) the generator loss L_{G_m} is computed from d̂_k, and the discriminator loss L_{D_m} is computed from d̂_k and d_k, by the respective cross-entropy formulas:
L_{G_m} = -(1/K) Σ_{k=1..K} log(d̂_k)
L_{D_m} = -(1/K) Σ_{k=1..K} [log(d_k) + log(1 - d̂_k)]
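The cross-entropy losses of step (3d) can be checked numerically. The sketch below uses the standard GAN loss form implied by the claim's definitions; treating this exact form as the patent's formula is an assumption.

```python
import numpy as np

def generator_loss(d_hat):
    """L_G = -mean(log d_hat): large when the discriminator rejects fakes."""
    return float(-np.mean(np.log(d_hat)))

def discriminator_loss(d_real, d_hat):
    """L_D = -mean(log d_real + log(1 - d_hat)): large when the
    discriminator misjudges real or generated samples."""
    return float(-np.mean(np.log(d_real) + np.log(1.0 - d_hat)))

# At the GAN equilibrium the discriminator outputs 0.5 everywhere,
# giving the well-known values L_G = log 2 and L_D = 2 log 2.
half = np.full(8, 0.5)
lg = generator_loss(half)
ld = discriminator_loss(half, half)
```

These values serve as a sanity check when monitoring training convergence in step (3e).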
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210720772.7A CN115277086B (en) | 2022-06-16 | 2022-06-16 | Network background flow generation method based on generation of countermeasure network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115277086A true CN115277086A (en) | 2022-11-01 |
CN115277086B CN115277086B (en) | 2023-10-20 |
Family
ID=83760931
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115604131A (en) * | 2022-12-15 | 2023-01-13 | 广州丰石科技有限公司(Cn) | Link flow prediction method, system, electronic device and medium |
CN116708258A (en) * | 2023-06-20 | 2023-09-05 | 中国电子科技集团公司第十五研究所 | Background flow network topology convergence method and device |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109889452A (en) * | 2019-01-07 | 2019-06-14 | 中国科学院计算技术研究所 | Network context flow generation method and system based on condition production confrontation network |
US20210073630A1 (en) * | 2019-09-10 | 2021-03-11 | Robert Bosch Gmbh | Training a class-conditional generative adversarial network |
WO2021174935A1 (en) * | 2020-03-03 | 2021-09-10 | 平安科技(深圳)有限公司 | Generative adversarial neural network training method and system |
CN113726545A (en) * | 2021-06-23 | 2021-11-30 | 清华大学 | Network traffic generation method and device for generating countermeasure network based on knowledge enhancement |
Non-Patent Citations (3)
Title |
---|
PETER ASHWOOD-SMITH; BILEL JAMOUSSI; DON FEDYK; DAREK SKALECKI (Nortel Networks): "Improving Topology Data Base Accuracy with LSP Feedback via CR-LDP", IETF |
T. ZSEBY (Fraunhofer FOKUS); M. MOLINA (DANTE); N. DUFFIELD (AT&T Labs Research); S. NICCOLINI (NEC Europe Ltd.); F. RASPALL: "Sampling and Filtering Techniques for IP Packet Selection", IETF |
LI JIE; ZHOU LU; LI HUAXIN; YAN LU; ZHU HAOJIN: "Network traffic feature camouflage technology based on generative adversarial networks", Computer Engineering, no. 12 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||