CN114866291B - DDoS defense system and method based on deep reinforcement learning under SDN - Google Patents

DDoS defense system and method based on deep reinforcement learning under SDN

Info

Publication number
CN114866291B
CN114866291B (application CN202210405147.3A)
Authority
CN
China
Prior art keywords
flow
state
network
reinforcement learning
deep reinforcement
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210405147.3A
Other languages
Chinese (zh)
Other versions
CN114866291A (en)
Inventor
周海峰
陈述涵
杨明亮
吴春明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU filed Critical Zhejiang University ZJU
Priority to CN202210405147.3A priority Critical patent/CN114866291B/en
Publication of CN114866291A publication Critical patent/CN114866291A/en
Application granted granted Critical
Publication of CN114866291B publication Critical patent/CN114866291B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04L - TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 - Network architectures or network communication protocols for network security
    • H04L63/14 - Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1441 - Countermeasures against malicious traffic
    • H04L63/1458 - Denial of Service
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G06N3/084 - Backpropagation, e.g. using gradient descent
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 - Reducing energy consumption in communication networks
    • Y02D30/50 - Reducing energy consumption in communication networks in wire-line communication networks, e.g. low power modes or reduced link rate

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Hardware Design (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention discloses a DDoS attack active defense system and method based on deep reinforcement learning under an SDN architecture. State features are collected from the edge switches, network features are extracted from the dynamic environment by a near-end policy optimization (proximal policy optimization, PPO) algorithm, and a defense decision is made for each flow, namely the proportion of the flow that is allowed to pass is determined, so that malicious traffic is discarded as far as possible; the deep reinforcement learning action is checked against network constraint conditions, which improves the robustness of the method and completes active defense against DDoS attacks. The construction method of the invention is simple, flexible to implement and highly efficient.

Description

DDoS defense system and method based on deep reinforcement learning under SDN
Technical Field
The invention belongs to the field of active network security defense under SDN, and particularly relates to a DDoS attack active defense system and method based on deep reinforcement learning under an SDN architecture.
Background
The number of DDoS attack events continues to grow year by year, and such attacks feature extremely high attack traffic and short attack duration, so it is important to take defensive measures in time before an attack ramps up. Thanks to the advantages of the Software Defined Network (SDN) architecture in defending against DDoS attacks, such as its flexible programmability and centralized control, methods based on statistical models and machine learning models can effectively defend against DDoS attacks in SDN. However, these methods offer poor real-time performance: when the attack characteristics change, the existing model becomes ineffective and samples must be re-collected and the model rebuilt. The advent of deep reinforcement learning provides an opportunity to defend against DDoS attacks effectively and in real time. At the same time, deep reinforcement learning runs in a black-box manner and relies on an opaque data-driven model, so the defense effect against DDoS attacks based on deep reinforcement learning can vary widely; it is therefore important to take the efficiency, robustness and real-time performance of DDoS attack defense into account.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a DDoS attack active defense system and method based on deep reinforcement learning under an SDN architecture.
The aim of the invention is achieved by the following technical solution: a DDoS attack active defense system based on deep reinforcement learning under an SDN architecture comprises an SDN controller, edge switches and a deep reinforcement learning agent processing module; the SDN controller comprises a network state collection module, a defensive action execution module and a feedback acquisition module. The defense process is converted into a Markov decision process: a network view is established through the SDN controller, and network feature information is collected on the edge switches in real time to reflect the current network request state. Based on a near-end policy optimization algorithm in deep reinforcement learning, network features are extracted from the dynamic environment and the state of each flow is mapped to a defense decision, ensuring that normal traffic passes and malicious traffic is discarded, thereby realizing active defense against DDoS attacks. The deep neural networks are trained through interaction between the deep reinforcement learning agent and the network, and the defense strategy is optimized from experience.
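For illustration, a minimal sketch of the Markov decision process interface implied by this scheme is given below. The class and method names (Transition, DefenseEnvironment, step) are assumptions introduced only to show how state, action, reward and the next state are exchanged between the SDN environment and the agent; they are not defined by the invention.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Transition:
    """One step of the Markov decision process used for the defense."""
    state: List[float]       # flattened switch port and per-flow statistics
    action: List[float]      # allowed-pass proportion of every flow, in [0.05, 1]
    reward: float            # 0.9*p_n + 0.1*(1 - p_m), from the feedback module
    next_state: List[float]  # statistics collected a time interval delta_t later


class DefenseEnvironment:
    """Hypothetical wrapper around the SDN controller modules."""

    def step(self, action: List[float]) -> Tuple[List[float], float]:
        """Apply the constraint-checked action, wait delta_t, and return
        (next_state, reward) as produced by the feedback acquisition module."""
        raise NotImplementedError  # would call the controller's northbound APIs
```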
Further, the system is characterized as follows:
the network state collection module actively requests the state information of the edge switches and obtains the returned state information after a time interval Δt;
the deep reinforcement learning agent processing module is implemented based on the near-end policy optimization algorithm; its input state consists of the switch port information and the per-flow information collected by the network state collection module, and its output action represents, for each flow on an edge switch, the proportion of that flow's traffic that is allowed to pass; the allowed proportion for malicious traffic approaches 0%, and the allowed proportion for normal traffic approaches 100%;
the defensive action execution module verifies the action output by the deep reinforcement learning agent processing module using a bandwidth reallocation method: it reallocates bandwidth to each flow according to the amount of traffic originally allowed to pass, and adds a constraint condition during bandwidth reallocation; the constraint condition is that the total traffic allowed to be sent to the server through the edge switches does not exceed the available bandwidth of the server;
feedback acquisition module: after the defensive action execution module finishes executing, it triggers the feedback acquisition module; the feedback acquisition module then calls the network state collection module to actively request the state information of the edge switches and of the server, and obtains the next returned network state information state' after a time interval Δt. Then, combining the flow information passing through the edge switches and the flow information reaching the server in the previous network state information state collected by the network state collection module, it calculates the malicious traffic proportion p_m and the normal traffic proportion p_n, and computes the reward function value reward from these two proportions. The feedback acquisition module feeds the next network state information state' and the reward value reward back to the deep reinforcement learning agent processing module.
Further, when the SDN controller does not collect the state of a certain flow, the state of the corresponding flow is 0, and the proportion of traffic allowed for that flow is also 0.
Further, the proportion of traffic that a certain flow is allowed to pass on an edge switch lies in the range [0.05, 1].
Further, the deep reinforcement learning agent processing module comprises an actor neural network A, an actor neural network B, a critic neural network and a memory pool, specifically:
(2.1) the actor neural network A is responsible for interacting with the network environment; it takes state as input, outputs the standard deviation σ and the mean μ of the action distribution, and obtains action by random sampling from the corresponding normal distribution;
(2.2) the update of the neural networks in the deep reinforcement learning agent processing module depends on the set of samples collected in the memory pool; after the defensive action execution module executes the defensive action, feedback information is collected from the feedback acquisition module, comprising the next network state information state' and the reward value reward, and the tuple (state, action, reward, state') is stored in the memory pool; after f_1 groups of samples have been stored in the memory pool, the difference between the actual return of the samples and the state value function is taken as the advantage function value A, the mean square error of A is used as the loss value and back-propagated, and the parameters of the critic neural network are updated; this update process is trained f_2 times;
(2.3) the probability of the action under the distribution output by the actor neural network A and its probability under the distribution output by the actor neural network B are used to form the action probability ratio, denoted ratio; the loss value for updating the actor neural network B is loss = min(ratio·A, clip(1-e, 1+e, ratio)·A), where e is a user-defined value and the clip() function limits ratio to the range (1-e, 1+e); this update process is trained f_3 times;
(2.4) every f_1·f_3 training steps of the whole near-end policy optimization algorithm, the parameter values of the actor neural network B are assigned to the actor neural network A, completing the update of the actor neural network A.
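The structure in (2.1) to (2.4) can be sketched as follows. This is an illustrative PyTorch-style outline that assumes the action distribution is a diagonal Gaussian whose samples are clipped to [0.05, 1]; the layer sizes, activation functions and the sigmoid/softplus heads are choices made here for the sketch and are not specified by the invention.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class GaussianActor(nn.Module):
    """Actor network (A or B): maps the network state to the mean and standard
    deviation of the per-flow allowed-pass proportions."""

    def __init__(self, state_dim: int, action_dim: int, hidden: int = 64):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(state_dim, hidden), nn.Tanh(),
                                  nn.Linear(hidden, hidden), nn.Tanh())
        self.mu_head = nn.Linear(hidden, action_dim)
        self.sigma_head = nn.Linear(hidden, action_dim)

    def forward(self, state: torch.Tensor) -> torch.distributions.Normal:
        h = self.body(state)
        mu = torch.sigmoid(self.mu_head(h))              # mean kept inside (0, 1)
        sigma = F.softplus(self.sigma_head(h)) + 1e-4    # strictly positive std
        return torch.distributions.Normal(mu, sigma)


class Critic(nn.Module):
    """Critic network: estimates the state value used for the advantage A."""

    def __init__(self, state_dim: int, hidden: int = 64):
        super().__init__()
        self.v = nn.Sequential(nn.Linear(state_dim, hidden), nn.Tanh(),
                               nn.Linear(hidden, 1))

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.v(state)


def select_action(actor_a: GaussianActor, state: torch.Tensor) -> torch.Tensor:
    """Actor A interacts with the environment: sample an action from the output
    distribution and clip it to the allowed-pass range [0.05, 1]."""
    with torch.no_grad():
        action = actor_a(state).sample()
    return torch.clamp(action, 0.05, 1.0)
```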
Further, the defensive action execution module performs security defenses against two network conditions:
firstly, when the total traffic allowed to pass through the edge switches is larger than the available bandwidth of the server, the defensive action execution module reduces the amount of traffic that can reach the server according to the constraint condition;
secondly, when the total traffic allowed to pass through the edge switches is smaller than the available bandwidth of the server, the defensive action execution module allocates the remaining bandwidth such that the proportion given to normal traffic is greater than that given to malicious traffic, comprising the following step:
according to the original set TR of allowed traffic output by the deep reinforcement learning agent processing module, and under the constraint condition, the bandwidth allowed for each flow is reallocated based on the Softmax function; the reallocated set TR' of allowed traffic is assigned to the meter tables bound to the corresponding flow tables, and traffic exceeding the meter table set value is discarded.
Further, the constraint is that the total traffic allowed through the edge switches is limited to within 95% of the server load U_S.
Further, the feedback acquisition module calculates a reward function value reward:
reward = 0.9·p_n + 0.1·(1 − p_m).
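As a small helper, the reward can be transcribed directly; how the proportions p_n and p_m are derived from the switch and server counters is an assumption here (fraction of the normal, respectively malicious, traffic sent through the edge switches that actually reached the server during the last Δt).

```python
def compute_reward(normal_sent: float, normal_arrived: float,
                   malicious_sent: float, malicious_arrived: float) -> float:
    """reward = 0.9*p_n + 0.1*(1 - p_m), where p_n (p_m) is the fraction of
    normal (malicious) traffic that reached the server in the last interval."""
    p_n = normal_arrived / normal_sent if normal_sent > 0 else 0.0
    p_m = malicious_arrived / malicious_sent if malicious_sent > 0 else 0.0
    return 0.9 * p_n + 0.1 * (1.0 - p_m)
```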
A DDoS attack active defense method based on deep reinforcement learning under an SDN architecture comprises the following steps:
(1) Environment initialization: initialize the total number of training episodes episodes and the number of training steps per episode steps; set the current episode episode = 1 and the current step step = 1;
(2) Initialize the packet size and sending interval of each user, and set the current training step to step = 1;
(3) The SDN controller actively sends a message requesting, within the interval Δt, the state of the edge switches; the state comprises the port information of the switch and the information of flows passing through the switch;
(4) Parse the state information obtained in step (3);
(5) Judge whether step < steps holds;
(6) If step ≥ steps, determine whether episode < episodes holds: if so, increment the episode count by 1 and return to step (2); if not, end;
(7) If step < steps, take the network state information parsed in step (4) as the input of the near-end policy optimization algorithm and output the set TR of proportions in which the corresponding flows are allowed to pass;
(8) Verify the action output by the near-end policy optimization algorithm using the bandwidth reallocation method, and reallocate the available bandwidth of the server based on the Softmax function to obtain the set TR' of available bandwidth values for each flow;
(9) Assign TR' to the meter table rate limit of the corresponding flow; traffic exceeding the limit is discarded;
(10) The SDN controller actively requests, within Δt, the state information of the edge switches and the state information of the server, and calculates the malicious traffic proportion p_m and the normal traffic proportion p_n by combining the flow information passing through the edge switches and the flow information reaching the server in state; the reward function value reward is calculated from these two proportions;
(11) Store the current training data (state, action, reward, state') in the memory pool; every time the memory pool has collected f_1 groups of data, one update of the neural network parameters in the near-end policy optimization algorithm is performed;
(12) Increment the step count by 1 and let state = state' as the input of the near-end policy optimization algorithm in the next training step; return to step (5) for the next training step until both the training steps and the training episodes reach their maximum.
The beneficial effects of the invention are as follows: the invention collects the real-time data features (flow features and port features) of the edge switches as the input of the near-end policy optimization algorithm, completes the mapping from flow state to the proportion of traffic allowed to pass, intelligently decides the allowed proportion of each flow, actively discards malicious traffic in real time while passing normal traffic as far as possible, and adjusts the decision with the constraint condition that the total traffic reaching the server must be smaller than the server load; specifically, the bandwidth originally allowed for each flow is reallocated based on the Softmax function. The near-end policy optimization algorithm of deep reinforcement learning is adopted to realize active defense against DDoS attacks, and the decision adjustment process avoids wrong or dangerous decisions and ensures the efficiency and robustness of DDoS attack defense. The method is simple, flexible to implement and highly practical.
Drawings
FIG. 1 is a schematic diagram of the DDoS attack active defense system of the present invention;
FIG. 2 is a flow chart of the DDoS attack active defense method of the present invention.
Detailed Description
As shown in FIG. 1, the DDoS attack active defense system based on deep reinforcement learning under an SDN architecture comprises an SDN controller, edge switches and a deep reinforcement learning agent processing module; the SDN controller comprises a network state collection module, a defensive action execution module and a feedback acquisition module. The invention converts the defense process into a Markov decision process, establishes a network view through the SDN controller, and collects network feature information (flow features) on the edge switches in real time to accurately reflect the current network request state. Network features are extracted from the dynamic environment by the near-end policy optimization algorithm in deep reinforcement learning, and the state of each flow is mapped to a defense decision, ensuring that normal traffic passes and malicious traffic is discarded, thereby realizing active defense against DDoS attacks. The deep neural networks are trained through interaction between the deep reinforcement learning agent and the network, and the defense strategy is optimized from experience. This reduces the performance difference in passing normal traffic and discarding malicious traffic across dynamically changing network states and improves the robustness of the defense method.
In an embodiment of the invention, the SDN controller is implemented based on OpenDaylight (ODL) and the edge switches are implemented based on Open vSwitch (OvS). The network environment includes K edge switches, and there are at most P flows on each edge switch. The state information of a switch includes its port information (the numbers of packets and bytes sent and received on each port) and the information of flows passing through the switch (the packet count and byte count of flows whose destination address is the server). The state information of the server includes its port information (the numbers of packets and bytes on each port) and the information of flows reaching the server (the packet count and byte count of flows addressed to the server). The time Δt for the controller to request and collect state information is chosen to be 0.5 s. The system of the invention is trained for 2000 rounds (episodes) in total, each round comprising 200 steps; each round reinitializes the network environment, including the packet size and sending interval of the users, and in each step the network state collection module, the defensive action execution module and the feedback acquisition module are called once in sequence.
(1) The network state collection module actively sends an OFPT_STATS_REQUEST message to request the state information of the edge switch, and obtains the returned state information from the OFPT_STATS_REPLY message after a time interval Δt; the controller then sends the state information to the deep reinforcement learning agent processing module in JSON format.
(2) The deep reinforcement learning agent processing module is implemented based on the near-end policy optimization algorithm. Its input state consists of the switch port information E and the per-flow information F collected by the network state collection module, state = [(E_1, F_1), ..., (E_K, F_K)], where F_k = [f_k1, f_k2, ..., f_kP]; when the state of a certain flow is not collected, the corresponding f_kp is 0. The output of the module is action = [(a_11, ..., a_1P), ..., (a_K1, ..., a_KP)], where a_kp is the proportion of traffic allowed to pass for the p-th flow on the k-th edge switch and lies in the range [0.05, 1], with k = 1, ..., K and p = 1, ..., P; when f_kp is 0, the corresponding a_kp is also 0. The allowed proportion for malicious traffic approaches 0%, and the allowed proportion for normal traffic approaches 100%. For example, an action value equal to 0.4 indicates that 40% of the traffic of that flow is allowed to pass.
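A possible in-memory layout of these state and action structures is sketched below, assuming K = 3 edge switches and at most P = 8 flows per switch; the number of port and flow features per entry is an illustrative assumption, since the text only fixes the logical structure.

```python
import numpy as np

K, P = 3, 8           # example: 3 edge switches, at most 8 flows per switch
PORT_FEATURES = 4     # e.g. packets/bytes sent and received, aggregated per switch
FLOW_FEATURES = 2     # packet count and byte count of each flow

# state = [(E_1, F_1), ..., (E_K, F_K)]: port features followed by per-flow features.
state = np.zeros((K, PORT_FEATURES + P * FLOW_FEATURES), dtype=np.float32)

# action = [(a_11, ..., a_1P), ..., (a_K1, ..., a_KP)]:
# allowed-pass proportion of flow p on switch k, each in [0.05, 1].
action = np.full((K, P), 0.05, dtype=np.float32)

# A flow whose statistics were not collected keeps f_kp = 0 and forces a_kp = 0.
missing = state[:, PORT_FEATURES:].reshape(K, P, FLOW_FEATURES).sum(axis=2) == 0
action[missing] = 0.0
```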
The deep reinforcement learning agent processing module comprises an actor (Actor) neural network A, an actor (Actor) neural network B, a critic (Critic) neural network and a memory pool.
(2.1) The actor neural network A is responsible for interacting with the network environment; it takes state as input, outputs the standard deviation σ and the mean μ of the action distribution, and obtains action by random sampling from the corresponding normal distribution.
(2.2) The update of the neural networks in the deep reinforcement learning agent processing module depends on the set of samples collected in the memory pool. After the defensive action execution module executes the defensive action, feedback information is collected from the feedback acquisition module, comprising the next network state information state' and the reward value reward, and the tuple (state, action, reward, state') is stored in the memory pool. After 4 groups of samples have been stored in the memory pool, the difference between the actual return of the samples and the state value function is taken as the advantage function value A; the mean square error of A is used as the loss value, the parameters of the critic neural network are updated through back-propagation, and this update process is trained 4 times.
(2.3) The probability of the action under the distribution output by the actor neural network A and its probability under the distribution output by the actor neural network B are used to form the action probability ratio, denoted ratio. The loss value for updating the actor neural network B is loss = min(ratio·A, clip(1-e, 1+e, ratio)·A), where e takes the value 0.2 and the clip() function limits ratio to the range (0.8, 1.2). This update process is trained 4 times.
(2.4) After every 16 training steps of the whole near-end policy optimization algorithm, the parameter values of the actor neural network B are assigned to the actor neural network A, completing the update of the actor neural network A.
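The updates in (2.2) to (2.4) correspond to a clipped-surrogate policy update; a compact sketch with the embodiment's values (4 samples per update, e = 0.2, critic and actor each trained 4 times, actor B copied into actor A every 16 steps) is given below. It reuses the GaussianActor and Critic classes from the earlier sketch; the choice of optimizer and the use of a pre-computed return tensor as the actual return of the samples are assumptions of this sketch.

```python
import torch


def ppo_update(actor_a, actor_b, critic, optim_actor, optim_critic,
               batch, eps: float = 0.2, critic_epochs: int = 4, actor_epochs: int = 4):
    """One update, triggered after f_1 = 4 transitions have been stored.
    `batch` is (states, actions, returns): tensors stacked from the memory pool,
    where `returns` plays the role of the actual return of the samples."""
    states, actions, returns = batch

    # (2.2) Critic update: advantage A = return - V(state); MSE of A is the loss.
    for _ in range(critic_epochs):                       # trained f_2 = 4 times
        advantage = returns - critic(states).squeeze(-1)
        critic_loss = (advantage ** 2).mean()
        optim_critic.zero_grad()
        critic_loss.backward()
        optim_critic.step()

    with torch.no_grad():
        advantage = returns - critic(states).squeeze(-1)
        logp_a = actor_a(states).log_prob(actions).sum(-1)   # probabilities under actor A

    # (2.3) Actor B update with the clipped probability ratio, e = 0.2.
    for _ in range(actor_epochs):                        # trained f_3 = 4 times
        logp_b = actor_b(states).log_prob(actions).sum(-1)
        ratio = (logp_b - logp_a).exp()
        surrogate = torch.min(ratio * advantage,
                              torch.clamp(ratio, 1.0 - eps, 1.0 + eps) * advantage)
        actor_loss = -surrogate.mean()                   # minimise the negated objective
        optim_actor.zero_grad()
        actor_loss.backward()
        optim_actor.step()


def sync_actors(actor_a, actor_b):
    """(2.4) Every 16 training steps, copy actor B's parameters into actor A."""
    actor_a.load_state_dict(actor_b.state_dict())
```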
(3) The defensive action execution module verifies the action output by the deep reinforcement learning agent processing module using the bandwidth reallocation method: it reallocates bandwidth to each flow according to the amount of traffic originally allowed to pass, and adds a constraint condition during bandwidth reallocation, namely that the total traffic allowed to be sent to the server through the edge switches does not exceed the available bandwidth of the server.
Specifically, security defenses are made against two network conditions:
First, when the total traffic allowed to pass through the edge switches is larger than the available bandwidth of the server, the defensive action execution module reduces the traffic that can reach the server according to the constraint condition, protecting the server from overload.
Second, when the total traffic allowed to pass through the edge switches is smaller than the available bandwidth of the server: since an attacker launching a DDoS attack wants to overload the server while recruiting as few agents as possible, a malicious host tends to send more traffic than a normal host in a short time; allocating the remaining bandwidth so that the proportion given to normal traffic is larger than that given to malicious traffic therefore effectively improves the pass rate of normal traffic. The specific steps are as follows:
After the deep reinforcement learning agent processing module outputs the action giving the amount of traffic originally allowed to pass, the bandwidth allowed for each flow is reallocated based on the Softmax function, and the total traffic allowed through the edge switches is limited to within 95% of the server load U_S. Let the set of allowed traffic of the original γ flows be TR = [tr_1, tr_2, ..., tr_γ]; the reallocated set TR' is given by:
tr'_i = 0.95 · U_S · exp(tr_i) / Σ_{j=1}^{γ} exp(tr_j),  i = 1, 2, ..., γ
TR' = [tr'_1, tr'_2, ..., tr'_γ]
and assigning the redistributed TR' to the meter tables bound with the flow tables respectively, so as to achieve the effects of ensuring normal flow to pass and limiting malicious flow. This is accomplished by inserting, deleting and updating flow tables through SalFlowService and SalMeterService's APIs in OpenFlowPlugin, and traffic exceeding the meter table set point will be discarded.
(4) Feedback acquisition module: after the defensive action execution module finishes executing, it triggers the feedback acquisition module; the feedback acquisition module then calls the network state collection module to actively send OFPT_STATS_REQUEST messages requesting the state information of the edge switches and of the server, and obtains the next returned network state information state' from OFPT_STATS_REPLY after a time interval Δt. Then, combining the flow information passing through the edge switches and the flow information reaching the server in the previous network state information state collected by the network state collection module, it calculates the malicious traffic proportion p_m and the normal traffic proportion p_n. From these two proportions, the reward function value is calculated as reward = 0.9·p_n + 0.1·(1 − p_m). The feedback acquisition module feeds the next network state information state' and the reward value reward back to the deep reinforcement learning agent processing module.
As shown in FIG. 2, the DDoS attack active defense method based on deep reinforcement learning under the SDN architecture is specifically implemented as follows:
(1) Environment initialization. Initialize the total number of training episodes episodes = 2000 and the number of training steps per episode steps = 200; set the current episode episode = 1 and the current step step = 1.
(2) The packet size and sending interval of each user are initialized, and the current training step is set to step = 1.
(3) The OpenDaylight controller actively issues a message requesting the state of the edge switches, including the port information of the switch and the information of flows passing through the switch, with Δt = 0.5 s.
(4) Parse the state information obtained in step (3), which comprises the packet count and byte count of each flow whose destination address is the server, and the numbers of packets and bytes sent and received on each port of the edge switch. This state information effectively reflects the requests currently being sent to the server and whether the server is congested.
(5) It is determined whether step < steps is satisfied.
(6) If step ≥ steps, determine whether episode < episodes holds: if so, increment the episode count by 1 and return to step (2); if not, end.
(7) If step < steps, the network state information parsed in step (4) is taken as the input of the near-end policy optimization algorithm, and the set TR of proportions in which the corresponding flows are allowed to pass is output.
(8) Verify the action output by the near-end policy optimization algorithm using the bandwidth reallocation method, and reallocate 95% of the available bandwidth of the server based on the Softmax function to obtain the set TR' of available bandwidth values for each flow.
(9) Assign TR' to the meter table rate limit of the corresponding flow; excess traffic is discarded.
(10) The OpenDaylight controller actively requests, within Δt, the state information state' of the edge switches and the state information of the server, and calculates the malicious traffic proportion p_m and the normal traffic proportion p_n by combining the flow information passing through the edge switches and the flow information reaching the server in state. From these two proportions, the reward function value reward is calculated.
(11) Store the current training data (state, action, reward, state') in the memory pool; every time 4 groups of data have been collected in the memory pool, one update of the neural network parameters in the near-end policy optimization algorithm is performed.
(12) Increment the step count by 1 and let state = state' as the input of the near-end policy optimization algorithm in the next training step; return to step (5) for the next training step until both the training steps and the training episodes reach their maximum.
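The twelve steps above can be summarised by the following loop skeleton. The env and agent objects and their method names are placeholders standing for the controller interactions and the PPO sketches given earlier, so this is an outline under assumptions rather than the implementation itself.

```python
EPISODES, STEPS, DELTA_T = 2000, 200, 0.5    # values used in this embodiment
MEMORY_BATCH = 4                             # f_1: transitions per PPO update


def train(env, agent):
    """env wraps the OpenDaylight controller (state requests, meter updates,
    reward computation); agent wraps the PPO networks and the memory pool."""
    for episode in range(1, EPISODES + 1):
        env.reset_hosts()                               # step (2): packet size/interval
        state = env.request_state(DELTA_T)              # steps (3)-(4)
        for step in range(1, STEPS + 1):                # steps (5)-(6)
            tr = agent.act(state)                       # step (7): allowed proportions
            tr_prime = env.apply_constraint(tr)         # step (8): Softmax reallocation
            env.set_meters(tr_prime)                    # step (9): meter rate limits
            next_state, reward = env.observe(DELTA_T)   # step (10)
            agent.store(state, tr, reward, next_state)  # step (11)
            if agent.memory_size() >= MEMORY_BATCH:
                agent.update()                          # PPO parameter update
            state = next_state                          # step (12)
```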
The foregoing description is only of the preferred embodiments of the invention, and all changes and modifications that come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein.

Claims (9)

1. The DDoS attack active defense system based on deep reinforcement learning under an SDN architecture is characterized by comprising an SDN controller, edge switches and a deep reinforcement learning agent processing module; the SDN controller comprises a network state collection module, a defensive action execution module and a feedback acquisition module; the defense process is converted into a Markov decision process, a network view is established through the SDN controller, and network feature information is collected on the edge switches in real time to reflect the current network request state; based on a near-end policy optimization algorithm in deep reinforcement learning, network features are extracted from the dynamic environment and the state of each flow is mapped to a defense decision, ensuring that normal traffic passes and malicious traffic is discarded, thereby realizing active defense against DDoS attacks; the deep neural networks are trained through interaction between the deep reinforcement learning agent and the network, and the defense strategy is optimized from experience so as to reduce the performance difference in passing normal traffic and discarding malicious traffic across dynamically changing network states;
the network state collection module actively requests the state information of the edge switches and obtains the returned state information after a time interval Δt;
the defensive action execution module is used for verifying the action output by the deep reinforcement learning agent processing module using a bandwidth reallocation method, reallocating bandwidth to each flow according to the amount of traffic originally allowed to pass, and adding a constraint condition during the bandwidth reallocation process; the constraint condition is that the total traffic allowed to be sent to the server through the edge switches does not exceed the available bandwidth of the server;
the feedback acquisition module is used as follows: after the defensive action execution module finishes executing, it triggers the feedback acquisition module; the feedback acquisition module then calls the network state collection module to actively request the state information of the edge switches and of the server, and obtains the next returned network state information state' after a time interval Δt; then, combining the flow information passing through the edge switches and the flow information reaching the server in the previous network state information state collected by the network state collection module, it calculates the malicious traffic proportion p_m and the normal traffic proportion p_n, and computes the reward function value reward from these two proportions; the feedback acquisition module feeds the next network state information state' and the reward value reward back to the deep reinforcement learning agent processing module.
2. The DDoS attack active defense system based on deep reinforcement learning under the SDN architecture of claim 1, comprising:
the deep reinforcement learning agent processing module is implemented based on the near-end policy optimization algorithm; its input state consists of the switch port information and the per-flow information collected by the network state collection module, and its output action represents, for each flow on an edge switch, the proportion of that flow's traffic that is allowed to pass; the allowed proportion for malicious traffic approaches 0%, and the allowed proportion for normal traffic approaches 100%.
3. The DDoS attack active defense system based on deep reinforcement learning under the SDN architecture of claim 1, wherein when the SDN controller does not collect a state of a certain flow, the state of the corresponding flow is 0, and the proportion of traffic allowed by the corresponding flow is also 0.
4. The DDoS attack active defense system based on deep reinforcement learning under the SDN architecture of claim 1, wherein the proportion of traffic that a certain flow is allowed to pass on an edge switch lies in the range [0.05, 1].
5. The DDoS attack active defense system based on deep reinforcement learning under the SDN architecture of claim 2, wherein the deep reinforcement learning agent processing module comprises an actor neural network A, an actor neural network B, a critic neural network and a memory pool, specifically:
(2.1) the actor neural network A is responsible for interacting with the network environment; it takes state as input, outputs the standard deviation σ and the mean μ of the action distribution, and obtains action by random sampling from the corresponding normal distribution;
(2.2) the update of the neural networks in the deep reinforcement learning agent processing module depends on the set of samples collected in the memory pool; after the defensive action execution module executes the defensive action, feedback information is collected from the feedback acquisition module, comprising the next network state information state' and the reward value reward, and the tuple (state, action, reward, state') is stored in the memory pool; after f_1 groups of samples have been stored in the memory pool, the difference between the actual return of the samples and the state value function is taken as the advantage function value A, the mean square error of A is used as the loss value and back-propagated, and the parameters of the critic neural network are updated; this update process is trained f_2 times;
(2.3) the probability of the action under the distribution output by the actor neural network A and its probability under the distribution output by the actor neural network B are used to form the action probability ratio, denoted ratio; the loss value for updating the actor neural network B is loss = min(ratio·A, clip(1-e, 1+e, ratio)·A), where e is a user-defined value and the clip() function limits ratio to the range (1-e, 1+e); this update process is trained f_3 times;
(2.4) every f_1·f_3 training steps of the whole near-end policy optimization algorithm, the parameter values of the actor neural network B are assigned to the actor neural network A, completing the update of the actor neural network A.
6. The DDoS attack active defense system based on deep reinforcement learning under SDN architecture of claim 2, wherein the defense action execution module performs security defense for two network conditions:
firstly, when the total traffic allowed to pass through the edge switches is larger than the available bandwidth of the server, the defensive action execution module reduces the amount of traffic that can reach the server according to the constraint condition;
secondly, when the total traffic allowed to pass through the edge switches is smaller than the available bandwidth of the server, the defensive action execution module allocates the remaining bandwidth such that the proportion given to normal traffic is greater than that given to malicious traffic, comprising the following step:
according to the original set TR of allowed traffic output by the deep reinforcement learning agent processing module, and under the constraint condition, the bandwidth allowed for each flow is reallocated based on the Softmax function; the reallocated set TR' of allowed traffic is assigned to the meter tables bound to the corresponding flow tables, and traffic exceeding the meter table set value is discarded.
7. The DDoS attack active defense system based on deep reinforcement learning under the SDN architecture of claim 6, wherein the constraint is that the total traffic allowed through the edge switches is limited to within 95% of the server load U_S.
8. The DDoS attack active defense system based on deep reinforcement learning under the SDN architecture of claim 2, wherein the feedback acquisition module calculates a reward function value reward:
reward = 0.9·p_n + 0.1·(1 − p_m).
9. The DDoS attack active defense method based on deep reinforcement learning under the SDN architecture is characterized by comprising the following steps:
(1) Environment initialization: initialize the total number of training episodes episodes and the number of training steps per episode steps; set the current episode episode = 1 and the current step step = 1;
(2) Initialize the packet size and sending interval of each user, and set the current training step to step = 1;
(3) The SDN controller actively sends a message requesting, within the interval Δt, the state of the edge switches; the state comprises the port information of the switch and the information of flows passing through the switch;
(4) Parse the state information obtained in step (3);
(5) Judge whether step < steps holds;
(6) If step ≥ steps, determine whether episode < episodes holds: if so, increment the episode count by 1 and return to step (2); if not, end;
(7) If step < steps, take the network state information parsed in step (4) as the input of the near-end policy optimization algorithm and output the set TR of proportions in which the corresponding flows are allowed to pass;
(8) Verify the action output by the near-end policy optimization algorithm using the bandwidth reallocation method, and reallocate the available bandwidth of the server based on the Softmax function to obtain the set TR' of available bandwidth values for each flow;
(9) Assign TR' to the meter table rate limit of the corresponding flow; traffic exceeding the limit is discarded;
(10) The SDN controller actively requests, within Δt, the state information of the edge switches and the state information of the server, and calculates the malicious traffic proportion p_m and the normal traffic proportion p_n by combining the flow information passing through the edge switches and the flow information reaching the server in state; the reward function value reward is calculated from these two proportions;
(11) Store the current training data (state, action, reward, state') in the memory pool; every time the memory pool has collected f_1 groups of data, one update of the neural network parameters in the near-end policy optimization algorithm is performed;
(12) Increment the step count by 1 and let state = state' as the input of the near-end policy optimization algorithm in the next training step; return to step (5) for the next training step until both the training steps and the training episodes reach their maximum.
CN202210405147.3A 2022-04-18 2022-04-18 DDoS defense system and method based on deep reinforcement learning under SDN Active CN114866291B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210405147.3A CN114866291B (en) 2022-04-18 2022-04-18 DDoS defense system and method based on deep reinforcement learning under SDN

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210405147.3A CN114866291B (en) 2022-04-18 2022-04-18 DDoS defense system and method based on deep reinforcement learning under SDN

Publications (2)

Publication Number Publication Date
CN114866291A CN114866291A (en) 2022-08-05
CN114866291B true CN114866291B (en) 2023-06-23

Family

ID=82630532

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210405147.3A Active CN114866291B (en) 2022-04-18 2022-04-18 DDoS defense system and method based on deep reinforcement learning under SDN

Country Status (1)

Country Link
CN (1) CN114866291B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117579384B (en) * 2024-01-16 2024-03-29 杭州智顺科技有限公司 Network security operation and command system based on actual combat

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112491714A (en) * 2020-11-13 2021-03-12 安徽大学 Intelligent QoS route optimization method and system based on deep reinforcement learning in SDN environment

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109768981B (en) * 2019-01-20 2021-02-02 北京工业大学 Network attack defense method and system based on machine learning under SDN architecture
CN111740950A (en) * 2020-05-13 2020-10-02 南京邮电大学 SDN environment DDoS attack detection and defense method
CN113452695A (en) * 2021-06-25 2021-09-28 中国舰船研究设计中心 DDoS attack detection and defense method in SDN environment
CN114363093B (en) * 2022-03-17 2022-10-11 浙江君同智能科技有限责任公司 Honeypot deployment active defense method based on deep reinforcement learning

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112491714A (en) * 2020-11-13 2021-03-12 安徽大学 Intelligent QoS route optimization method and system based on deep reinforcement learning in SDN environment

Also Published As

Publication number Publication date
CN114866291A (en) 2022-08-05


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant