CN115277086A - Network background traffic generation method based on a generative adversarial network - Google Patents

Network background traffic generation method based on a generative adversarial network

Info

Publication number
CN115277086A
CN115277086A
Authority
CN
China
Prior art keywords
network
data stream
flow
data
generation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210720772.7A
Other languages
Chinese (zh)
Other versions
CN115277086B (en)
Inventor
董庆宽
穆涛
陈原
任晓龙
杨福兴
马飞龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xidian University
Original Assignee
Xidian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xidian University
Priority to CN202210720772.7A
Publication of CN115277086A
Application granted
Publication of CN115277086B
Legal status: Active (current)
Anticipated expiration


Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00: Network architectures or network communication protocols for network security
    • H04L63/14: Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1441: Countermeasures against malicious traffic
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • G06N3/084: Backpropagation, e.g. using gradient descent
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00: Reducing energy consumption in communication networks
    • Y02D30/50: Reducing energy consumption in communication networks in wire-line communication networks, e.g. low power modes or reduced link rate

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Hardware Design (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention provides a network background traffic generation method based on a generative adversarial network, comprising the following steps: 1) acquiring a training sample set and a test sample set; 2) constructing M generative adversarial network models; 3) iteratively training the generative adversarial network models; 4) acquiring the predicted data stream features; 5) obtaining the network background traffic generation result. The invention trains the adversarial network models on data streams composed of the statistical features of all original traffic packets in each group, so the multidimensional flow-level statistical characteristics of the target network traffic can be learned; it also uses the application category of the next data stream as a feature to learn the correlation between the different application flows of a user node and the user's behavior habits, so that the traffic sent by nodes carrying special identity information closely resembles normal traffic. The features of real data streams are thereby described more accurately, and the security of the covert communication system is effectively improved.

Description

Network background traffic generation method based on a generative adversarial network
Technical Field
The invention belongs to the technical field of network security and relates to a network background traffic generation method, in particular to a network background traffic generation method based on a generative adversarial network, which can be used for generating network background traffic.
Background
When communication nodes in the internet use network applications to communicate, they must exchange traffic data packets, and an attacker can classify the traffic data packets of a communication node with network traffic classification techniques in order to intercept its traffic. Research on background traffic generation techniques that can evade such traffic analysis by an attacker is therefore of great significance.
Network traffic generation techniques can simulate real network traffic so that a node can communicate covertly. Existing network traffic generation methods mainly fall into two categories: methods based on statistical models and methods based on traffic features.
Statistical-model-based methods mainly use statistical models such as Markov models and Poisson distribution models, combined with a traffic generation tool, to produce traffic; they are mainly used to generate background network traffic for internet stress testing. Their disadvantage is that, when the volume of network traffic is huge, a simple probability model can hardly capture the relationships between traffic packets, while building a complex probability model is very difficult.
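As a rough illustration of the statistical-model approach described above (not part of the invention), the Python sketch below samples packet send times from a simple Poisson process; the function name poisson_packet_times, the rate and the packet count are arbitrary assumptions.

import numpy as np

def poisson_packet_times(rate_pps: float, n_packets: int, seed: int = 0) -> np.ndarray:
    """Packet send times whose inter-arrival gaps are exponential, i.e. a Poisson arrival process."""
    rng = np.random.default_rng(seed)
    gaps = rng.exponential(scale=1.0 / rate_pps, size=n_packets)  # seconds between consecutive packets
    return np.cumsum(gaps)

# e.g. 1000 packets at an average rate of 100 packets per second
times = poisson_packet_times(rate_pps=100.0, n_packets=1000)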
Traffic-feature-based methods are divided into packet-level and flow-level methods according to the granularity of the traffic features. Generation methods based on packet-level features mainly focus on the statistical characteristics and arrival process of individual data packets; they consider only the basic features of a single packet, ignore the mutual influence between packets and the traffic characteristics between and within protocols, so the fidelity of the generated traffic is low. Flow-level traffic generation focuses instead on the characteristics of data streams and their arrival process, where a data stream is generally identified by the quadruple consisting of source IP address, source port, destination IP address and destination port. Its disadvantage is that features such as user behavior habits, the correlation between different application flows and the time dimension are missing, so an attacker can still analyze the user node from those aspects. Traffic-feature-based methods mainly extract data stream features with machine learning techniques to form a training sample set for a neural network, build and iteratively train the neural network, and finally output simulated network traffic features; a traffic generation tool then produces an initial packet sequence according to the simulated features, and the data the user needs to send is encrypted and embedded into the initial packet sequence to produce the network traffic.
One existing method of this type extracts packet-level features from network traffic packet sample sets of different applications collected in advance and feeds them into the corresponding generative adversarial network models in a model library for iterative training; it then randomly selects one generative adversarial network from the trained library and uses its generator network to generate the simulated background traffic to be sent. Its disadvantages are that only one randomly selected generative adversarial network is used to generate background traffic each time and the user's habit of using network applications in the time dimension is not considered, so an attacker can detect the covert traffic within the generated background traffic and thus discover the covert communication node; the security of the generated network background traffic is therefore low. Moreover, the generative adversarial network it uses adopts a traffic generation method based on packet-level features, representing traffic only by packet-level features without considering the mutual influence between packets, so the generated traffic has low fidelity, which limits any improvement of security and reliability.
Disclosure of Invention
The invention aims to overcome the above defects of the prior art and provides a network background traffic generation method based on a generative adversarial network, in order to solve the technical problems of low security and low reliability in the prior art.
To achieve this purpose, the technical solution adopted by the invention comprises the following steps:
(1) Obtaining a training sample set X_train and a test sample set X_test:
(1a) Dividing S original traffic data packets, covering M kinds of network applications and continuously sent by a communication node during internet communication, into N groups, where each group contains the original traffic packets that share the same source IP address, source port, destination IP address and destination port within one communication process; extracting the statistical features of all original traffic packets in each group to form a data stream, obtaining a data stream set F = {F_1, F_2, ..., F_n, ..., F_N} containing N data streams, where M ≥ 2, S ≥ 10000, and F_n denotes the nth data stream, which contains the next-data-stream application category feature describing the correlation between different network flows; each network application corresponds to at least one data stream, and each data stream corresponds to one network application;
(1b) Performing one-hot encoding on the non-numeric features of each data stream F_n and normalizing the one-hot encoding result to obtain a preprocessed data stream set F' = {F'_1, F'_2, ..., F'_n, ..., F'_N}; then marking each preprocessed data stream F'_n with its network application category label to obtain the corresponding network application category label set y = {y_1, y_2, ..., y_n, ..., y_N}; finally, forming a training sample set X_train from N_1 data streams and their labels, and forming a test sample set X_test from the remaining N_2 data streams and their labels, where N_1 > N/2 and N = N_1 + N_2;
(2) Constructing M generative adversarial network models:
constructing M parallel generative adversarial network models, one for each kind of network application: C = {C_1, C_2, ..., C_m, ..., C_M}, where each generative adversarial network C_m comprises a generator network G_m and a discriminator network D_m cascaded in sequence, and C_m denotes the generative adversarial network corresponding to the mth network application; the generator network G_m comprises a sequentially stacked input layer, several first fully-connected layers and a tanh activation output layer; the discriminator network D_m comprises a sequentially stacked input layer, several second fully-connected layers and a sigmoid activation output layer;
(3) Performing iterative training on the generative adversarial network models:
(3a) Initializing the network parameters of the generator network G_m and the discriminator network D_m in each generative adversarial network model C_m as θ_G^m and θ_D^m respectively, setting the iteration counter to q and the maximum number of iterations to Q with Q ≥ 10000, and letting q = 0;
(3b) Taking the training sample set X_train as the input of the M parallel generative adversarial network models; the generator network G_m in each model C_m performs feature prediction on each of the K data streams F'_k^m in X_train whose label is m, obtaining the predicted data stream feature set P_m = {P_1^m, P_2^m, ..., P_k^m, ..., P_K^m} corresponding to C_m, where K < N_1;
(3c) The discriminator network D_m separately calculates the probability that each predicted feature P_k^m and each real data stream F'_k^m comes from the training sample set X_train, obtaining the probability set D_1 = {d'_1, d'_2, ..., d'_k, ..., d'_K} corresponding to the predicted features and the probability set D_2 = {d_1, d_2, ..., d_k, ..., d_K} corresponding to the K data streams in X_train whose label is m, where P_k^m denotes the data stream feature predicted by the generator network G_m from F'_k^m, d'_k denotes the probability, computed by the discriminator network D_m, that P_k^m comes from the sample set X_train with label m, and d_k denotes the probability, computed by D_m, that F'_k^m comes from the sample set X_train with label m;
(3d) Using the cross-entropy loss function, computing the loss L_G^m of the generator network G_m from d'_k and the loss L_D^m of the discriminator network D_m from d'_k and d_k; then, using a back propagation method, computing the network parameter gradient of the generator network G_m from L_G^m and the network parameter gradient of the discriminator network D_m from L_D^m; finally, using a gradient descent algorithm, updating the network parameters θ_G^m of G_m with the gradient of G_m and the parameters θ_D^m of D_m with the gradient of D_m, obtaining the M parallel generative adversarial network models of this iteration;
(3e) Judging whether q = Q holds; if so, obtaining the M trained generative adversarial network models C* = {C_1*, C_2*, ..., C_m*, ..., C_M*}; otherwise, letting q = q + 1 and returning to step (3b);
(4) Acquiring the predicted data stream features:
Taking the test sample set X_test as the input of the M trained generative adversarial network models; each trained generator network G_m* performs feature prediction on every data stream sample in X_test whose label is m, obtaining a feature set A = {A_1, A_2, ..., A_m, ..., A_M} comprising M predicted data stream feature subsets, where A_m denotes the predicted data stream feature set obtained by passing every sample of X_test with label m through the corresponding trained generator network G_m*;
(5) Obtaining the network background traffic generation result:
(5a) Initializing the application category of the first data stream as v_1, the iteration counter as l and the number of network background traffic streams to be generated as L, and letting l = 1, where 1 ≤ v_1 ≤ M;
(5b) Selecting from the predicted data stream feature set A the feature subset A_{v_l} whose application category is v_l, randomly selecting a predicted data stream feature a_l from A_{v_l}, generating an initial data stream sequence c_l that satisfies a_l, and at the same time extracting from a_l the application category v_{l+1} of the next data stream, where 1 ≤ v_l, v_{l+1} ≤ M;
(5c) Judging whether l = L holds; if so, obtaining the initial data stream set c = {c_1, c_2, ..., c_l, ..., c_L}; otherwise, letting l = l + 1 and returning to step (5b);
(5d) Encrypting the information data to be sent by the communication node and embedding the encrypted information data into each initial data stream c_l, obtaining the set of L network background traffic streams c' = {c'_1, c'_2, ..., c'_l, ..., c'_L}.
Compared with the prior art, the invention has the following advantages:
1. The invention trains the generative adversarial network models with the next-data-stream application category as an important feature, which characterizes the correlation between the different application flows of a user node, so network background traffic can be simulated and generated in better accordance with the user's behavior habits. The traffic simulated over a given time window is very close to normal traffic, an attacker cannot detect a node anomaly by using traffic analysis to measure the correlation of application flows over that window, and the security of the covert communication system is improved.
2. The invention trains the adversarial network models on data streams composed of the statistical features of all original traffic packets in each group, so the multidimensional flow-level statistical characteristics of the target network traffic can be learned, the features of real data streams are described more accurately, and the security of the covert communication system is further improved.
Drawings
FIG. 1 is a flow chart of an implementation of the present invention.
Detailed Description
The invention is described in further detail below with reference to the figures and specific examples.
Referring to fig. 1, the present invention includes the steps of:
Step 1) Obtaining a training sample set X_train and a test sample set X_test:
Step 1 a) In this embodiment, the Wireshark tool is used to capture S original traffic data packets, covering M kinds of network applications, continuously sent by a laboratory computer over 7 days according to the users' usage habits in a campus network environment. The packets are divided into N groups, where each group contains the original traffic packets that share the same source IP address, source port, destination IP address and destination port within one communication process, and the 10 main statistical features of all original traffic packets in each group are extracted to form one data stream, yielding a data stream set F = {F_1, F_2, ..., F_n, ..., F_N} containing N data streams, where M ≥ 2, S ≥ 10000, and F_n denotes the nth data stream, which contains the next-data-stream application category feature describing the correlation between different network flows. In this embodiment S = 129054, N = 7068 and M = 5; the 5 network applications are HTTP web page requests, WeChat, OneNote, the 163 mailbox and Youdao Dictionary. Each traffic data stream contains 8 numeric features and 2 non-numeric features: the 8 numeric features are the total number of packets, the connection duration of the packets, the mean and median of the packet lengths, the mean and median of the packet inter-arrival times, and the mean and median of the sliding window lengths; the 2 non-numeric features are the protocol type and the next data stream application category.
Taking the next data stream application category as an important feature characterizes the correlation between the different application flows of a user node, so network background traffic can be simulated and generated in better accordance with the user's behavior habits; the simulated traffic is very close to normal traffic, an attacker cannot detect a node anomaly by using traffic analysis to measure the correlation of application flows over a time window, and the security of the covert communication system is improved. Using flow-level statistical features focuses on the characteristics of the data streams and their arrival process, takes the interaction between packets into account, and simulates the network traffic more accurately; an illustrative sketch of this flow-level feature extraction is given after step 1 b) below.
Step 1 b) One-hot encoding is applied to the non-numeric features of each data stream F_n, and each one-hot encoded data stream is normalized with the formula X' = 2 × log_W(X + 1) - 1, giving the preprocessed data stream set F' = {F'_1, F'_2, ..., F'_n, ..., F'_N}; each preprocessed data stream F'_n is then marked with its network application category label to obtain the corresponding network application category label set y = {y_1, y_2, ..., y_n, ..., y_N}; finally, N_1 data streams and their labels form the training sample set X_train, and the remaining N_2 data streams and their labels form the test sample set X_test. Here X is an original feature value, W is the upper bound of X, X' is the normalized feature value, N_1 > N/2 and N = N_1 + N_2; in this embodiment N_1 = 5000 and N_2 = 2068, and the two non-numeric features, protocol type and next data stream application category, are one-hot encoded into a 3-dimensional and a 5-dimensional numeric feature respectively.
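The following Python sketch (for illustration only, not part of the claimed method) shows one way the flow-level feature extraction of step 1 a) could be implemented; the helper name packets_to_flows and the packet record layout, i.e. the fields src, sport, dst, dport, ts, length, win, proto and app, are hypothetical assumptions.

from collections import defaultdict
from statistics import mean, median

def packets_to_flows(packets):
    """Group captured packets by 4-tuple and compute the 10 flow-level statistics of step 1 a)."""
    groups = defaultdict(list)
    for p in packets:
        groups[(p["src"], p["sport"], p["dst"], p["dport"])].append(p)
    flows = []
    for pkts in groups.values():
        pkts.sort(key=lambda p: p["ts"])
        gaps = [b["ts"] - a["ts"] for a, b in zip(pkts, pkts[1:])] or [0.0]
        lens = [p["length"] for p in pkts]
        wins = [p["win"] for p in pkts]
        flows.append({
            "start": pkts[0]["ts"],
            "total_packets": len(pkts),
            "duration": pkts[-1]["ts"] - pkts[0]["ts"],
            "len_mean": mean(lens), "len_median": median(lens),
            "iat_mean": mean(gaps), "iat_median": median(gaps),
            "win_mean": mean(wins), "win_median": median(wins),
            "protocol": pkts[0]["proto"],   # non-numeric feature 1: protocol type
            "app": pkts[0]["app"],          # application category of this flow
        })
    # Non-numeric feature 2: the application category of the data stream that follows in time.
    flows.sort(key=lambda f: f["start"])
    for cur, nxt in zip(flows, flows[1:]):
        cur["next_app"] = nxt["app"]
    if flows:
        flows[-1]["next_app"] = flows[-1]["app"]  # last flow has no successor; reuse its own label
    return flows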
the one-hot coding uses an n-bit state register to code n states, each state has an independent register bit, the value of discrete characteristics can be expanded to Euclidean space by using the one-hot coding, the characteristic dimension is increased, non-digital characteristics which are difficult to learn of the generated confrontation network model are converted into digital characteristics which are easy to learn, and the training difficulty of generating the confrontation network model is reduced.
Data normalization simplifies the numerical computation and avoids gradient explosion when the generative adversarial network adjusts its parameters with the gradient descent algorithm, thereby accelerating the convergence of the generative adversarial network model.
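A minimal sketch of this preprocessing, assuming the flow records produced above are held in a pandas DataFrame; the function name preprocess, the column names and the per-feature upper bounds W are assumptions made only for illustration.

import numpy as np
import pandas as pd

def preprocess(flows: pd.DataFrame, numeric_cols, upper_bounds) -> pd.DataFrame:
    """One-hot encode the two non-numeric features and apply X' = 2 * log_W(X + 1) - 1 to the numeric ones."""
    out = pd.get_dummies(flows, columns=["protocol", "next_app"])  # one-hot encoding
    for col in numeric_cols:
        W = upper_bounds[col]                                      # upper bound of this feature
        out[col] = 2.0 * np.log(out[col] + 1.0) / np.log(W) - 1.0  # log to base W via change of base
    return out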
Step 2) Constructing M generative adversarial network models:
M parallel generative adversarial network models are constructed, one for each kind of network application: C = {C_1, C_2, ..., C_m, ..., C_M}, where each generative adversarial network C_m comprises a generator network G_m and a discriminator network D_m cascaded in sequence, and C_m denotes the generative adversarial network corresponding to the mth network application; the generator network G_m comprises a sequentially stacked input layer, several first fully-connected layers and a tanh activation output layer, and the discriminator network D_m comprises a sequentially stacked input layer, several second fully-connected layers and a sigmoid activation output layer.
The generator network G_m contains 3 first fully-connected layers with 80, 100 and 60 neurons respectively, all using the leaky-ReLU activation function; its output layer contains 15 neurons with the tanh activation function.
The discriminator network D_m contains 3 second fully-connected layers with 50, 80 and 30 neurons respectively, all using the leaky-ReLU activation function; its output layer contains 1 neuron with the sigmoid activation function.
The leaky-ReLU activation function is a variant of the classical ReLU activation function; it gives a small but non-zero gradient for negative inputs, and because its derivative is never zero it reduces the occurrence of dead neurons and mitigates the vanishing-gradient problem during back propagation. Using the leaky-ReLU activation function in the generative adversarial network speeds up its learning and further shortens the training time.
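For illustration, a minimal Keras sketch of these two layouts is given below (the embodiment's platform is TensorFlow 2.3.0, see the simulation conditions); the helper names build_generator and build_discriminator are hypothetical, and the generator's input dimension is an assumption, since the text above fixes only the layer widths, activations and output sizes.

import tensorflow as tf
from tensorflow.keras import layers

def build_generator(input_dim: int = 15) -> tf.keras.Model:
    """Generator G_m: three leaky-ReLU fully-connected layers (80, 100, 60) and a 15-neuron tanh output."""
    return tf.keras.Sequential([
        layers.InputLayer(input_shape=(input_dim,)),  # input width is an assumption
        layers.Dense(80), layers.LeakyReLU(),
        layers.Dense(100), layers.LeakyReLU(),
        layers.Dense(60), layers.LeakyReLU(),
        layers.Dense(15, activation="tanh"),
    ])

def build_discriminator(feature_dim: int = 15) -> tf.keras.Model:
    """Discriminator D_m: three leaky-ReLU fully-connected layers (50, 80, 30) and a 1-neuron sigmoid output."""
    return tf.keras.Sequential([
        layers.InputLayer(input_shape=(feature_dim,)),
        layers.Dense(50), layers.LeakyReLU(),
        layers.Dense(80), layers.LeakyReLU(),
        layers.Dense(30), layers.LeakyReLU(),
        layers.Dense(1, activation="sigmoid"),
    ])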
Step 3) Iteratively training the generative adversarial network models:
Step 3 a) The network parameters of the generator network G_m and the discriminator network D_m in each generative adversarial network model C_m are initialized as
θ_G^m and θ_D^m respectively; the iteration counter is q and the maximum number of iterations is Q, with Q = 50000 in this embodiment, and q = 0;
Step 3 b) The training sample set X_train is taken as the input of the M parallel generative adversarial network models; the generator network G_m in each model C_m performs feature prediction on each of the K data streams F'_k^m in X_train whose label is m, obtaining the predicted data stream feature set P_m = {P_1^m, P_2^m, ..., P_k^m, ..., P_K^m} corresponding to C_m,
where K < N_1;
Step 3 c) The discriminator network D_m separately calculates the probability that each predicted feature P_k^m and each real data stream F'_k^m comes from the training sample set X_train, obtaining the probability set D_1 = {d'_1, d'_2, ..., d'_k, ..., d'_K} corresponding to the predicted features and the probability set D_2 = {d_1, d_2, ..., d_k, ..., d_K} corresponding to the K data streams in X_train whose label is m, where P_k^m denotes the data stream feature predicted by the generator network G_m from F'_k^m, d'_k denotes the probability, computed by the discriminator network D_m, that P_k^m comes from the sample set X_train with label m, and d_k denotes the probability, computed by D_m, that F'_k^m comes from the sample set X_train with label m;
Step 3 d) Using the cross-entropy loss function, the loss L_G^m of the generator network G_m is computed from d'_k and the loss L_D^m of the discriminator network D_m is computed from d'_k and d_k; a back propagation method built into the Adam optimizer then computes the network parameter gradient of the generator network G_m from L_G^m and the network parameter gradient of the discriminator network D_m from L_D^m; the Adam optimizer then applies a gradient descent algorithm to update the network parameters θ_G^m of G_m with the gradient of G_m and the parameters θ_D^m of D_m with the gradient of D_m, obtaining the M parallel generative adversarial network models of this iteration, where the losses L_G^m and L_D^m are calculated respectively as:
L_G^m = -(1/K) Σ_{k=1}^{K} log(d'_k)
L_D^m = -(1/K) Σ_{k=1}^{K} [log(d_k) + log(1 - d'_k)]
Step 3 e) Whether q = Q holds is judged; if so, the M trained generative adversarial network models C* = {C_1*, C_2*, ..., C_m*, ..., C_M*} are obtained; otherwise q = q + 1 and step 3 b) is repeated.
Step 4) Acquiring the predicted data stream features:
The test sample set X_test is taken as the input of the M trained generative adversarial network models; each trained generator network G_m* performs feature prediction on every data stream sample in X_test whose label is m, obtaining a feature set A = {A_1, A_2, ..., A_m, ..., A_M} comprising M predicted data stream feature subsets, where A_m denotes the predicted data stream feature set obtained by passing every sample of X_test with label m through the corresponding trained generator network G_m*.
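For illustration, a hedged Keras sketch of one training iteration of step 3 for a single application category m is given below. The loss formulas above are presented in the source as images, so the standard GAN cross-entropy losses are assumed here; feeding the real flow features to the generator follows the feature-prediction reading of step 3 b), and the helper name train_step is hypothetical.

import tensorflow as tf

bce = tf.keras.losses.BinaryCrossentropy()
g_opt = tf.keras.optimizers.Adam(1e-4)   # gradient descent via Adam, as in step 3 d)
d_opt = tf.keras.optimizers.Adam(1e-4)

@tf.function
def train_step(generator, discriminator, real_flows):
    """One update of G_m and D_m on a batch of preprocessed flows whose label is m."""
    with tf.GradientTape() as g_tape, tf.GradientTape() as d_tape:
        fake_flows = generator(real_flows, training=True)       # feature prediction, step 3 b)
        d_real = discriminator(real_flows, training=True)       # d_k, step 3 c)
        d_fake = discriminator(fake_flows, training=True)       # d'_k, step 3 c)
        g_loss = bce(tf.ones_like(d_fake), d_fake)                                       # assumed L_G^m
        d_loss = bce(tf.ones_like(d_real), d_real) + bce(tf.zeros_like(d_fake), d_fake)  # assumed L_D^m
    g_opt.apply_gradients(zip(g_tape.gradient(g_loss, generator.trainable_variables),
                              generator.trainable_variables))
    d_opt.apply_gradients(zip(d_tape.gradient(d_loss, discriminator.trainable_variables),
                              discriminator.trainable_variables))
    return g_loss, d_loss

# After Q iterations, step 4 simply runs each trained generator on the test flows of its label:
# predicted_features = generator(test_flows_with_label_m, training=False)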
Step 5) Obtaining the network background traffic generation result:
Step 5 a) The application category of the first data stream is initialized as v_1, the iteration counter as l and the number of network background traffic streams to be generated as L, with l = 1 and 1 ≤ v_1 ≤ 5;
Step 5 b) The feature subset A_{v_l} whose application category is v_l is selected from the predicted data stream feature set A,
and a predicted data stream feature a_l is randomly selected from A_{v_l}; a configuration file is written from a_l with a traffic generator script such as tarfen, and the traffic generator generates an initial data stream sequence c_l based on it; at the same time the application category v_{l+1} of the next data stream is extracted from a_l, where 1 ≤ v_l, v_{l+1} ≤ 5;
Step 5 c) Whether l = L holds is judged; if so, the initial data stream set c = {c_1, c_2, ..., c_l, ..., c_L} is obtained; otherwise l = l + 1 and step 5 b) is repeated;
Step 5 d) The information data to be sent by the communication node is encrypted and embedded into each initial data stream c_l, obtaining the set of L network background traffic streams c' = {c'_1, c'_2, ..., c'_l, ..., c'_L}.
According to the next data stream application category learned by the generative adversarial networks, the corresponding generative adversarial network is selected to generate the next data stream. This simulates the correlation between the different application flows of the user node over a time window and conforms to the user's behavior habits, so the simulated traffic is closer to normal traffic; an attacker cannot detect a node anomaly by using traffic analysis to measure the correlation of application flows over that window, and the security of the covert communication system is improved.
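A minimal sketch of this chaining in step 5: the application category of each emitted flow determines which predicted-feature subset the next flow is drawn from. The function name generate_background_traffic, the record layout (a "next_app" field) and the emit_flow() hook standing in for the traffic generator script are assumptions.

import random

def generate_background_traffic(predicted_features_by_app, first_app, num_flows, emit_flow):
    """predicted_features_by_app: dict mapping application category -> list of predicted flow-feature records."""
    app = first_app                                                # v_1, step 5 a)
    emitted = []
    for _ in range(num_flows):                                     # L flows in total
        feature = random.choice(predicted_features_by_app[app])    # randomly selected predicted feature, step 5 b)
        emitted.append(emit_flow(feature))                         # hand the feature to the traffic generator
        app = feature["next_app"]                                  # category of the next data stream, v_{l+1}
    return emitted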
The technical effects of the invention are further explained below in combination with simulation experiments:
1. simulation conditions and contents:
the platform of the simulation experiment of the invention is as follows: the operating systems are Windows 10, tensorflow version 2.3.0, jupyter notebook version 4.3.0, python version 3.8.3.
The simulation experiment compares the invention with the prior-art "network background traffic generation method based on a generative adversarial network GAN". A KNN classifier is used to compare the per-application accuracy and the overall accuracy of the background traffic generated for 5 different applications by the invention and by the prior art, and the comparison results are listed in Table 1; an SVM classifier is used for the same comparison, and the results are listed in Table 2. A sketch of this classifier-based comparison is given after Table 2.
TABLE 1. Accuracy of the background traffic generated for different applications, evaluated with a KNN classifier

                HTTP    WeChat   OneNote   163 Mailbox   Youdao Dictionary   Overall
Prior art       0.71    0.58     0.60      0.63          0.61                0.64
The invention   0.91    0.95     0.92      0.98          0.95                0.94
TABLE 2. Accuracy of the background traffic generated for different applications, evaluated with an SVM classifier

                HTTP    WeChat   OneNote   163 Mailbox   Youdao Dictionary   Overall
Prior art       0.91    0.40     0.60      0.42          0.62                0.61
The invention   0.95    0.98     0.96      0.95          0.96                0.96
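For reference, the kind of classifier-based comparison described above could be scripted as follows; the sketch assumes real and generated flow-feature matrices X_real and X_gen with application labels y_real and y_gen held as numpy arrays, default scikit-learn hyperparameters, and a hypothetical helper name fidelity_scores.

import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

def fidelity_scores(X_real, y_real, X_gen, y_gen):
    """Train KNN and SVM on real flows, then measure how often generated flows are classified as their intended application."""
    scores = {}
    for name, clf in {"KNN": KNeighborsClassifier(), "SVM": SVC()}.items():
        clf.fit(X_real, y_real)
        pred = clf.predict(X_gen)
        per_app = {app: float(np.mean(pred[y_gen == app] == app)) for app in np.unique(y_gen)}
        per_app["overall"] = float(np.mean(pred == y_gen))
        scores[name] = per_app
    return scores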
2. Simulation result analysis:
According to the background traffic generation method of the invention, characterizing the correlation between the different application flows of the user node with the next data stream application category as an important feature effectively improves the accuracy of the generated background traffic for every application; the network background traffic is simulated and generated in better accordance with the user's behavior habits, and the security of the covert communication system is improved.
The foregoing description is only an example of the present invention and does not constitute any limitation to it. It will be apparent to those skilled in the art that, after understanding the content and principle of the invention, various modifications and variations in form and detail can be made without departing from its principle, but such modifications and variations remain within the scope of the claims of the invention.

Claims (4)

1. A network background traffic generation method based on a generative adversarial network, characterized by comprising the following steps:
(1) Obtaining a training sample set X_train and a test sample set X_test:
(1a) Dividing S original traffic data packets, covering M kinds of network applications and continuously sent by a communication node during internet communication, into N groups, where each group contains the original traffic packets that share the same source IP address, source port, destination IP address and destination port within one communication process; extracting the statistical features of all original traffic packets in each group to form a data stream, obtaining a data stream set F = {F_1, F_2, ..., F_n, ..., F_N} containing N data streams, where M ≥ 2, S ≥ 10000, and F_n denotes the nth data stream, which contains the next-data-stream application category feature describing the correlation between different network flows;
(1b) Performing one-hot encoding on the non-numeric features of each data stream F_n and normalizing the one-hot encoding result to obtain a preprocessed data stream set F' = {F'_1, F'_2, ..., F'_n, ..., F'_N}; then marking each preprocessed data stream F'_n with its network application category label to obtain the corresponding network application category label set y = {y_1, y_2, ..., y_n, ..., y_N}; finally, forming a training sample set X_train from N_1 data streams and their labels, and forming a test sample set X_test from the remaining N_2 data streams and their labels, where N_1 > N/2 and N = N_1 + N_2;
(2) Constructing M generative adversarial network models:
constructing M parallel generative adversarial network models, one for each kind of network application: C = {C_1, C_2, ..., C_m, ..., C_M}, where each generative adversarial network C_m comprises a generator network G_m and a discriminator network D_m cascaded in sequence, and C_m denotes the generative adversarial network corresponding to the mth network application; the generator network G_m comprises a sequentially stacked input layer, several first fully-connected layers and a tanh activation output layer; the discriminator network D_m comprises a sequentially stacked input layer, several second fully-connected layers and a sigmoid activation output layer;
(3) Iteratively training the generative adversarial network models:
(3a) Initializing the network parameters of the generator network G_m and the discriminator network D_m in each generative adversarial network model C_m as θ_G^m and θ_D^m respectively, setting the iteration counter to q and the maximum number of iterations to Q with Q ≥ 10000, and letting q = 0;
(3b) Taking the training sample set X_train as the input of the M parallel generative adversarial network models; the generator network G_m in each model C_m performs feature prediction on each of the K data streams F'_k^m in X_train whose label is m, obtaining the predicted data stream feature set P_m = {P_1^m, P_2^m, ..., P_k^m, ..., P_K^m} corresponding to C_m, where K < N_1;
(3c) The discriminator network D_m separately calculates the probability that each predicted feature P_k^m and each real data stream F'_k^m comes from the training sample set X_train, obtaining the probability set D_1 = {d'_1, d'_2, ..., d'_k, ..., d'_K} corresponding to the predicted features and the probability set D_2 = {d_1, d_2, ..., d_k, ..., d_K} corresponding to the K data streams in X_train whose label is m, where P_k^m denotes the data stream feature predicted by the generator network G_m from F'_k^m, d'_k denotes the probability, computed by the discriminator network D_m, that P_k^m comes from the sample set X_train with label m, and d_k denotes the probability, computed by D_m, that F'_k^m comes from the sample set X_train with label m;
(3d) Using the cross-entropy loss function, computing the loss L_G^m of the generator network G_m from d'_k and the loss L_D^m of the discriminator network D_m from d'_k and d_k; then, using a back propagation method, computing the network parameter gradient of the generator network G_m from L_G^m and the network parameter gradient of the discriminator network D_m from L_D^m; finally, using a gradient descent algorithm, updating the network parameters θ_G^m of G_m with the gradient of G_m and the parameters θ_D^m of D_m with the gradient of D_m, obtaining the M parallel generative adversarial network models of this iteration;
(3e) Judging whether q = Q holds; if so, obtaining the M trained generative adversarial network models C* = {C_1*, C_2*, ..., C_m*, ..., C_M*}; otherwise, letting q = q + 1 and returning to step (3b);
(4) Acquiring the predicted data stream features:
Taking the test sample set X_test as the input of the M trained generative adversarial network models; each trained generator network G_m* performs feature prediction on every data stream sample in X_test whose label is m, obtaining a feature set A = {A_1, A_2, ..., A_m, ..., A_M} comprising M predicted data stream feature subsets, where A_m denotes the predicted data stream feature set obtained by passing every sample of X_test with label m through the corresponding trained generator network G_m*;
(5) Obtaining the network background traffic generation result:
(5a) Initializing the application category of the first data stream as v_1, the iteration counter as l and the number of network background traffic streams to be generated as L, and letting l = 1, where 1 ≤ v_1 ≤ M;
(5b) Selecting from the predicted data stream feature set A the feature subset A_{v_l} whose application category is v_l, randomly selecting a predicted data stream feature a_l from A_{v_l}, generating an initial data stream sequence c_l that satisfies a_l, and at the same time extracting from a_l the application category v_{l+1} of the next data stream, where 1 ≤ v_l, v_{l+1} ≤ M;
(5c) Judging whether l = L holds; if so, obtaining the initial data stream set c = {c_1, c_2, ..., c_l, ..., c_L} and executing step (5d); otherwise, letting l = l + 1 and returning to step (5b);
(5d) Encrypting the information data to be sent by the communication node and embedding the encrypted information data into each initial data stream c_l, obtaining the set of L network background traffic streams c' = {c'_1, c'_2, ..., c'_l, ..., c'_L}.
2. The network background traffic generation method based on a generative adversarial network according to claim 1, wherein the main statistical features of all original traffic data packets in each group in step (1a) comprise numeric features and non-numeric features, wherein:
the numeric features mainly comprise the total number of packets, the connection duration of the packets, the mean and median of the packet lengths, the mean and median of the packet inter-arrival times, and the mean and median of the sliding window lengths;
the non-numeric features mainly comprise the protocol type and the next data stream application category.
3. The network background traffic generation method based on a generative adversarial network according to claim 1, wherein each generative adversarial network C_m constructed in step (2) comprises a generator network G_m and a discriminator network D_m cascaded in sequence, wherein:
the generator network G_m contains 3 first fully-connected layers with 80, 100 and 60 neurons respectively, all using the leaky-ReLU activation function, and its output layer contains 15 neurons with the tanh activation function;
the discriminator network D_m contains 3 second fully-connected layers with 50, 80 and 30 neurons respectively, all using the leaky-ReLU activation function, and its output layer contains 1 neuron with the sigmoid activation function.
4. The network background traffic generation method based on a generative adversarial network according to claim 1, wherein the loss L_G^m of the generator network G_m computed from d'_k in step (3d) and the loss L_D^m of the discriminator network D_m computed from d'_k and d_k are calculated respectively as:
L_G^m = -(1/K) Σ_{k=1}^{K} log(d'_k)
L_D^m = -(1/K) Σ_{k=1}^{K} [log(d_k) + log(1 - d'_k)]
CN202210720772.7A 2022-06-16 2022-06-16 Network background traffic generation method based on a generative adversarial network Active CN115277086B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210720772.7A CN115277086B (en) Network background traffic generation method based on a generative adversarial network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210720772.7A CN115277086B (en) Network background traffic generation method based on a generative adversarial network

Publications (2)

Publication Number Publication Date
CN115277086A (en) 2022-11-01
CN115277086B CN115277086B (en) 2023-10-20

Family

ID=83760931

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210720772.7A Active CN115277086B (en) Network background traffic generation method based on a generative adversarial network

Country Status (1)

Country Link
CN (1) CN115277086B (en)


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109889452A (en) * 2019-01-07 2019-06-14 中国科学院计算技术研究所 Network context flow generation method and system based on condition production confrontation network
US20210073630A1 (en) * 2019-09-10 2021-03-11 Robert Bosch Gmbh Training a class-conditional generative adversarial network
WO2021174935A1 (en) * 2020-03-03 2021-09-10 平安科技(深圳)有限公司 Generative adversarial neural network training method and system
CN113726545A (en) * 2021-06-23 2021-11-30 清华大学 Network traffic generation method and device for generating countermeasure network based on knowledge enhancement

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Peter Ashwood-Smith; Bilel Jamoussi; Don Fedyk; Darek Skalecki (Nortel Networks): "Improving Topology Data Base Accuracy with LSP Feedback via CR-LDP", IETF
T. Zseby (Fraunhofer FOKUS); M. Molina (DANTE); N. Duffield (AT&T Labs Research); S. Niccolini (NEC Europe Ltd.); F. Raspal: "Sampling and Filtering Techniques for IP Packet Selection", IETF
李杰; 周路; 李华欣; 闫璐; 朱浩瑾: "Network traffic feature camouflage technology based on generative adversarial networks" (基于生成对抗网络的网络流量特征伪装技术), Computer Engineering (计算机工程), no. 12

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115604131A (en) * 2022-12-15 2023-01-13 广州丰石科技有限公司(Cn) Link flow prediction method, system, electronic device and medium
CN116708258A (en) * 2023-06-20 2023-09-05 中国电子科技集团公司第十五研究所 Background flow network topology convergence method and device
CN116708258B (en) * 2023-06-20 2024-04-19 中国电子科技集团公司第十五研究所 Background flow network topology convergence method and device

Also Published As

Publication number Publication date
CN115277086B (en) 2023-10-20

Similar Documents

Publication Publication Date Title
CN115277086B (en) Network background traffic generation method based on a generative adversarial network
CN111565156B (en) Method for identifying and classifying network traffic
CN108629183A (en) Multi-model malicious code detecting method based on Credibility probability section
Liu et al. LSTM-CGAN: Towards generating low-rate DDoS adversarial samples for blockchain-based wireless network detection models
CN111245667A (en) Network service identification method and device
Soleymanpour et al. CSCNN: cost-sensitive convolutional neural network for encrypted traffic classification
Qin et al. Attentional payload anomaly detector for web applications
CN110334488B (en) User authentication password security evaluation method and device based on random forest model
CN112948578B (en) DGA domain name open set classification method, device, electronic equipment and medium
Kopal Of Ciphers and Neurons-Detecting the Type of Ciphers Using Artificial Neural Networks.
Zhang et al. Adaptive matrix sketching and clustering for semisupervised incremental learning
CN112702157B (en) Block cipher system identification method based on improved random forest algorithm
CN114826681A (en) DGA domain name detection method, system, medium, equipment and terminal
Zhou et al. Few-shot website fingerprinting attack with cluster adaptation
Cai et al. A malicious network traffic detection model based on bidirectional temporal convolutional network with multi-head self-attention mechanism
CN117318980A (en) Small sample scene-oriented self-supervision learning malicious traffic detection method
CN113542271B (en) Network background flow generation method based on generation of confrontation network GAN
Wang et al. A two-phase approach to fast and accurate classification of encrypted traffic
Jovic et al. Traditional machine learning methods for side-channel analysis
CN110188928A (en) A kind of the formative optimization system and method for cloud data education training process
Jian et al. An induction learning approach for building intrusion detection models using genetic algorithms
CN113111329B (en) Password dictionary generation method and system based on multi-sequence long-term and short-term memory network
Du et al. DBWE-Corbat: Background network traffic generation using dynamic word embedding and contrastive learning for cyber range
CN111401067B (en) Honeypot simulation data generation method and device
Zhang et al. A novel RNN-GBRBM based feature decoder for anomaly detection technology in industrial control network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant