CN109039797B - Reinforcement learning based large flow detection method - Google Patents

Reinforcement learning based large flow detection method

Info

Publication number
CN109039797B
CN109039797B (granted publication of application CN201810594740.0A)
Authority
CN
China
Prior art keywords
flow
detection
state
detection data
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201810594740.0A
Other languages
Chinese (zh)
Other versions
CN109039797A (en)
Inventor
王雄
潘志豪
任婧
徐世中
王晟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China
Priority to CN201810594740.0A
Publication of CN109039797A
Application granted
Publication of CN109039797B
Expired - Fee Related (current legal status)
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/08 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • H04L43/0876 Network utilisation, e.g. volume of load or congestion level
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14 Network analysis or design
    • H04L41/142 Network analysis or design using statistical or mathematical methods
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14 Network analysis or design
    • H04L41/145 Network analysis or design involving simulating, designing, planning or modelling of a network

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Theoretical Computer Science (AREA)
  • Mathematical Physics (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Algebra (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Probability & Statistics with Applications (AREA)
  • Pure & Applied Mathematics (AREA)
  • Environmental & Geological Engineering (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a reinforcement learning-based large-flow detection method comprising the following steps. S1: detect data streams to obtain flow detection data. S2: optimize the detection data model using a historical sample buffer pool. S3: use the optimized detection data model to identify the large flows among the flow detection data, and detect the large flows again. S4: put the flow detection data into the historical sample buffer pool, then perform S2, S3 and S4 in sequence again until detection ends. The invention takes the link state of the network and the historical measurement information of the flows as the state and the measured size of the flows as the reward value, and detects the large flows in the network with a reinforcement learning-based method, which can fully extract features such as the correlation between flows and improve the accuracy of large-flow detection.

Description

Reinforcement learning based large flow detection method
Technical Field
The invention relates to the technical field of computers, in particular to a large-flow detection method based on reinforcement learning.
Background
Fine-grained network flow measurement is required for the planning, operation management, charging and security auditing of data centers. NetFlow and sFlow are both flow-based measurement methods that can provide fine-grained network traffic measurement, but they require specific network devices or specific functional support; for example, NetFlow can only be used on Cisco devices. Moreover, because the volume of traffic to be measured in a real network is huge, flow-based measurement methods usually consume a large amount of network resources (network bandwidth, node storage and computation, etc.) and scale poorly. For NetFlow, the per-packet processing time is limited by network resources, which becomes a bottleneck on high-speed switches. In an SDN (software-defined network), the constrained switch resource is the ternary content-addressable memory (TCAM), and each TCAM entry can measure only one flow. Owing to this resource shortage, NetFlow resorts to sampling, which reduces measurement accuracy. FlowRadar provides a method for fine-grained, real-time per-flow measurement under switch resource constraints; by compressing the per-flow counters it reduces packet processing time and network load.
iSTAMP provides a method of flow aggregation and decoupling: it reduces TCAM usage through flow aggregation while providing fine-grained measurement of the larger flows through decoupling. Since flow sizes vary over time, the set of decoupled flows must also change frequently, so the algorithm must find the currently large flows in the network and measure them in real time.
This problem is essentially a multi-armed bandit (MAB) problem. In the MAB setting there are many identical-looking slot machines, each paying out with a different probability that also varies over time. Each pull of a machine costs something, and the gambler's problem is how to maximize the total payoff. Once the gambler finds a machine with a relatively high winning probability, he may keep pulling it to obtain a steady return. However, there may exist a machine with an even higher winning rate, or the winning probability of the current machine may decline over time, so a longer-term strategy is to give up part of the current reward and explore the other machines. How to balance greedy exploitation of the current optimum against exploration of other possibilities is the central problem of the multi-armed bandit.
Many algorithms exist for solving the MAB problem. The greedy strategy is the most direct: it exploits the current best choice with some probability, e.g. 0.95, and leaves a probability of 0.05 for exploring other, possibly better, choices. An obvious drawback of the greedy strategy is that it does not exploit contextual information, for example the possibility that several machines are correlated. From this idea, context-based (contextual) bandit algorithms were developed. A contextual bandit algorithm maintains a d-dimensional feature vector that records context-related data and is updated after the choice made in each iteration. The goal of the algorithm is to gather enough information to find the correlation between context and reward, so that the optimal choice can be made each time and the maximum benefit obtained. Common contextual bandit algorithms include the Upper Confidence Bound (UCB) algorithm, neural networks and random forests.
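For concreteness, the following is a minimal Python sketch of the classic (non-contextual) UCB1 selection rule mentioned above; the function names and the specific confidence bound are the textbook form, not something taken from the patent.

```python
import math

def ucb1_select(means, counts, total_pulls):
    """UCB1 rule: pick the arm maximizing mean reward plus an
    exploration bonus that shrinks as the arm is pulled more often."""
    best, best_score = 0, float("-inf")
    for i, (m, n) in enumerate(zip(means, counts)):
        if n == 0:
            return i  # pull every arm once before applying the bound
        score = m + math.sqrt(2.0 * math.log(total_pulls) / n)
        if score > best_score:
            best, best_score = i, score
    return best
```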
iSTAMP uses MUCB (Modified Upper Confidence Bound) to detect large flows, but it does not exploit the correlation between flows and its accuracy is low.
At present, the various algorithms for network flow measurement suffer from low detection and measurement accuracy.
Disclosure of Invention
The invention aims to solve the technical problem that current network flow measurement algorithms have low detection and measurement accuracy, and provides a reinforcement learning-based large-flow detection method to solve this problem.
The invention is realized by the following technical scheme:
the method for detecting the large flow based on reinforcement learning comprises the following steps: s1: detecting a data stream to obtain stream detection data; s2: optimizing a detection data model by adopting a historical sample buffer pool; s3: judging the big flow of the flow detection data by adopting the optimized detection data model, and detecting the big flow again; s4: the flow inspection data is put into the history sample buffer pool, and S2, S3, and S4 are again sequentially performed until the inspection ends.
Further, step S3 comprises the following sub-steps. S31: the detection data model scores the data streams according to the current state. S32: select k flows for detection to obtain new flow detection data and a new network state. S33: derive the reward value of the current detection from the new flow detection data and the new network state.
Further, step S32 comprises the following sub-steps: set a probability threshold ε_threshold; when the random probability is less than ε_threshold, randomly select k streams for detection; when the random probability is greater than ε_threshold, sort the streams' scores in descending order and select the k streams with the highest scores for detection.
Further, ε_threshold is obtained by the following formula:

ε_threshold = ε_e + (ε_s − ε_e) · exp(−steps / ε_delay)

where steps is the number of detections; ε_s is the upper probability bound; ε_e is the lower probability bound; ε_delay is a rate parameter.
Further, step S33 comprises the following sub-step: take the proportion of the detected network traffic in all network traffic as the reward value reward, obtained by the following formula:

reward = Σ_{i ∈ action} measure_i / Σ_{i=1}^{n} est_i,  with est_i = measure_i if flow i ∈ action and est_i = last_i otherwise

where action is the set of currently detected flows; last is the set of sizes of the flows as last detected; measure is the set of sizes of the currently detected flows; n is the total number of flows in the network.
Further, step S4 comprises the following sub-step: the stream detection data put into the historical sample buffer pool comprises the state of the current network (state); the decision action made according to the current network state; the state next_state reached after the decision; and the reward value reward of each flow.
Further, step S2 comprises the following sub-steps: obtain the error between the detected value and the model value according to the current network state and the post-decision state next_state; optimize the model according to this error.
Further, the model adopts a neural network model.
Compared with the prior art, the invention has the following advantages and beneficial effects:
the invention takes the link state of the network and the historical measurement information of the flow as the state, takes the measurement size of the flow as the reward value, adopts the reinforcement learning-based large flow detection method to detect the large flow in the network, can fully extract the characteristics of the correlation and the like of the flow, and can improve the accuracy of the large flow detection.
Drawings
The accompanying drawings, which are included to provide a further understanding of the embodiments of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the principles of the invention. In the drawings:
FIG. 1 is a schematic diagram of the process steps of the present invention;
FIG. 2 is a block diagram of the system of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to examples and accompanying drawings, and the exemplary embodiments and descriptions thereof are only used for explaining the present invention and are not meant to limit the present invention.
Examples
As shown in FIG. 1 and FIG. 2, the reinforcement learning-based large-flow detection method of the present invention comprises the following steps. S1: detect data streams to obtain flow detection data. S2: optimize the detection data model using a historical sample buffer pool. S3: use the optimized detection data model to identify the large flows among the flow detection data, and detect the large flows again. S4: put the flow detection data into the historical sample buffer pool, then perform S2, S3 and S4 in sequence again until detection ends.
Step S3 comprises the following sub-steps. S31: the detection data model scores the data streams according to the current state. S32: select k flows for detection to obtain new flow detection data and a new network state. S33: derive the reward value of the current detection from the new flow detection data and the new network state.
Reinforcement learning is theoretically based on the Markov decision process. In a Markov decision process, performing a different action a_t in a different state s_t yields a different reward r(s_t, a_t), and the environment transitions to a new state s_{t+1} according to the probability p(s_{t+1} | s_t, a_t). The goal of reinforcement learning is to learn a policy π_θ(s_t, a_t), i.e. which action a_t should be taken in the current state s_t, where θ is the policy parameter; the reinforcement learning objective is to continuously optimize θ. An action a_t not only directly affects the current benefit; because the next state is also affected by the action, it influences future benefits as well, which is referred to as delayed benefit.
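For reference, the objective implied by this description can be written in the standard discounted form below. This equation is the textbook formulation rather than one printed in the patent, and the discount factor γ here plays a different role from the blending coefficient γ used later in the optimization step.

```latex
% Expected discounted return to be maximized over policy parameters theta
J(\theta) = \mathbb{E}_{\pi_\theta}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, a_t)\right],
\qquad \gamma \in [0, 1)
```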
Step S32 comprises the following sub-steps: set a probability threshold ε_threshold; when the random probability is less than ε_threshold, randomly select k streams for detection; when the random probability is greater than ε_threshold, sort the streams' scores in descending order and select the k streams with the highest scores for detection.
The ε_threshold is obtained by the following formula:

ε_threshold = ε_e + (ε_s − ε_e) · exp(−steps / ε_delay)

where steps is the number of detections; ε_s is the upper probability bound; ε_e is the lower probability bound; ε_delay is a rate parameter.
Initially, all streams are measured once to obtain their current sizes. In the early stage of the algorithm's run more exploration is needed to collect information, while in the later stage the exploration probability can be reduced to reap high returns. To this end we set a probability threshold ε_threshold: when the random probability is below ε_threshold we randomly choose k streams to observe; when it is above ε_threshold we sort the flows' scores in descending order, as the algorithm prescribes, and choose the k flows with the highest scores to observe. The probability threshold follows the formula given above: as the algorithm runs, the threshold decays gradually from ε_s down to ε_e; a typical setting is ε_s = 0.95, with ε_e left as a small residual exploration probability. At the start of the run the algorithm therefore explores randomly with high probability, and in the later stages it selects the current optimal strategy with high probability while still leaving a small probability for exploration. At this stage we obtain action, the set of flows that should currently be measured. Adjusting the value of ε_delay adjusts the rate of decay. A sketch of this selection step is given below.
Step S33 comprises the following sub-step: take the proportion of the detected network traffic in all network traffic as the reward value reward, obtained by the following formula:

reward = Σ_{i ∈ action} measure_i / Σ_{i=1}^{n} est_i,  with est_i = measure_i if flow i ∈ action and est_i = last_i otherwise

where action is the set of currently detected flows; last is the set of sizes of the flows as last detected; measure is the set of sizes of the currently detected flows; n is the total number of flows in the network.
Step S4 comprises the following sub-step: the stream detection data put into the historical sample buffer pool comprises the state of the current network (state); the decision action made according to the current network state; the state next_state reached after the decision; and the reward value reward of each flow.
The last measured size of each flow is retained: if a flow is re-measured, the stored value is replaced with the new one; otherwise the previous value is kept. The ratio of measured traffic to all network traffic serves as the quality of the selected result, and dividing the sum of the currently measured flow sizes by the sum of the estimated sizes of all flows yields the estimated score reward of the current strategy, according to the reward formula given above. The sizes of each flow over the past k measurement periods are kept as the state; storing all measured values of the previous k periods lets the model extract more context information, and by adjusting k we can extract features at different time scales. The measurement outcome under the current strategy is stored by pushing (state, action, next_state, reward) into the historical sample buffer pool, where state is the state of the current network, specifically the size of each flow over the past k measurement periods; action is the decision made according to the current state, recording which flows were selected for measurement; next_state is the state reached after the decision, i.e. the updated flow measurement data; and reward is the estimated reward value of each flow, namely the measured flow's share of the total traffic for flows that were measured, and 0 otherwise. A sketch of the reward computation and the buffer follows.
Step S2 further comprises the following sub-steps: obtain the error between the detected value and the model value according to the current network state and the post-decision state next_state; optimize the model according to this error.
The model is optimized with the data in the historical sample buffer pool. The model produces state_values from state and next_state_values from next_state. The reward actually received from the environment for a state is reward. The training target is computed as target = γ · next_state_values + (1 − γ) · reward, where γ is a number between 0 and 1 that controls the attenuation of potential future returns. Ideally the difference between state_values and target should be small, and the model is optimized according to the error obtained from loss(state_values, target). A sketch of this optimization step is given below.
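A minimal PyTorch sketch of this optimization step, assuming the sampled batch has already been stacked into tensors whose shapes match the model output; γ = 0.9 is an assumed value, since the text only constrains it to lie between 0 and 1.

```python
import torch
import torch.nn.functional as F

def optimize(model, optimizer, batch, gamma=0.9):
    """One optimization step over a replay batch, per the update rule above.

    gamma blends the model's value of the next state with the observed
    per-flow reward; 0.9 is an assumed value.
    """
    states, actions, next_states, rewards = batch   # pre-stacked tensors
    state_values = model(states)                    # (batch, n_flows)
    with torch.no_grad():
        next_state_values = model(next_states)      # no gradient through target
    target = gamma * next_state_values + (1.0 - gamma) * rewards
    loss = F.smooth_l1_loss(state_values, target)   # smooth L1, as in the text
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```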
The model adopts a neural network model.
The model used is a neural network. Its input is the state, composed of the measured values of the past k cycles; its output is the estimated reward value reward corresponding to each flow, normalized by a softmax, whose formula is

softmax(x_i) = exp(x_i) / Σ_{j=1}^{n} exp(x_j)
The loss evaluation function we use is the smooth L1 loss, defined as follows:

loss(x, y) = (1/n) Σ_i z_i,  where z_i = 0.5 (x_i − y_i)^2 if |x_i − y_i| < 1, and z_i = |x_i − y_i| − 0.5 otherwise

A sketch of such a network is given below.
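A minimal PyTorch sketch of the scoring network. Only the input (the past k periods of per-flow measurements), the per-flow output, and the softmax normalization come from the text; the two-layer structure and the hidden width are illustrative assumptions.

```python
import torch
import torch.nn as nn

class FlowScorer(nn.Module):
    """Scores flows: input is the flow sizes from the past k measurement
    periods, output is a softmax-normalized estimated reward per flow."""

    def __init__(self, n_flows, k, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_flows * k, hidden),   # layer sizes are assumptions
            nn.ReLU(),
            nn.Linear(hidden, n_flows),
        )

    def forward(self, x):                     # x: (batch, n_flows * k)
        return torch.softmax(self.net(x), dim=-1)
```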
Taking the link state of the network and the historical measurement information of the flows as the state and the measured size of the flows as the reward value, each policy step selects the k largest flows for measurement. Detecting the large flows in the network with this reinforcement learning-based method can fully extract features such as the correlation between flows and improve the accuracy of large-flow detection.
The above-mentioned embodiments are intended to illustrate the objects, technical solutions and advantages of the present invention in further detail, and it should be understood that the above-mentioned embodiments are merely exemplary embodiments of the present invention, and are not intended to limit the scope of the present invention, and any modifications, equivalent substitutions, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (2)

1. A method for detecting large flows based on reinforcement learning, characterized by comprising the following steps:
S1: detecting data streams to obtain stream detection data;
S2: optimizing a detection data model by using a historical sample buffer pool;
S3: using the optimized detection data model to identify the large flows in the stream detection data, and detecting the large flows again;
S4: putting the stream detection data into the historical sample buffer pool, and sequentially performing S2, S3 and S4 again until the detection ends;
wherein step S3 comprises the following sub-steps:
S31: the detection data model scores the data streams according to the current state;
S32: selecting k flows for detection to obtain new flow detection data and a new network state;
S33: obtaining the reward value of the current detection according to the new flow detection data and the new network state;
step S32 comprises the following sub-steps:
setting a probability threshold ε_threshold;
when the random probability is less than the probability threshold ε_threshold, randomly selecting k streams for detection;
when the random probability is greater than the probability threshold ε_threshold, sorting the scores of the streams in descending order, and selecting the k streams with the highest scores for detection;
the probability threshold ε_threshold is obtained by the following formula:

ε_threshold = ε_e + (ε_s − ε_e) · exp(−steps / ε_delay)

where steps is the number of detections; ε_s is the upper limit of the probability threshold ε_threshold; ε_e is the lower limit of the probability threshold ε_threshold; ε_delay is a rate parameter;
step S33 comprises the following sub-steps:
taking the proportion of the detected network traffic in all the network traffic as the reward value reward;
reward is obtained by the following formula:

reward = Σ_{i ∈ action} measure_i / Σ_{i=1}^{n} est_i,  with est_i = measure_i if flow i ∈ action and est_i = last_i otherwise

where action is the set of currently detected flows; last is the set of sizes of the flows as last detected; measure is the set of sizes of the currently detected flows; and n is the total number of flows in the network;
step S4 comprises the following sub-step:
the stream detection data put into the historical sample buffer pool comprises: the state of the current network (state); the decision action made according to the current network state; the state next_state reached after the decision; and the reward value reward of the detected flows;
step S2 further includes the following sub-steps:
obtaining the error between the detected value and the model value according to the current network state and the post-decision state next_state;
and optimizing the model according to the errors of the detection value and the model value.
2. The reinforcement learning-based large flow detection method according to claim 1, wherein the model is a neural network model.
CN201810594740.0A 2018-06-11 2018-06-11 Reinforcement learning based large flow detection method Expired - Fee Related CN109039797B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810594740.0A CN109039797B (en) 2018-06-11 2018-06-11 Reinforcement learning based large flow detection method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810594740.0A CN109039797B (en) 2018-06-11 2018-06-11 Reinforcement learning based large flow detection method

Publications (2)

Publication Number Publication Date
CN109039797A CN109039797A (en) 2018-12-18
CN109039797B true CN109039797B (en) 2021-11-23

Family

ID=64612503

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810594740.0A Expired - Fee Related CN109039797B (en) 2018-06-11 2018-06-11 Reinforcement learning based large flow detection method

Country Status (1)

Country Link
CN (1) CN109039797B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110351166B (en) * 2019-08-12 2021-08-17 电子科技大学 Network-level fine-grained flow measurement method based on flow statistical characteristics
CN112256739B (en) * 2020-11-12 2022-11-18 同济大学 Method for screening data items in dynamic flow big data based on multi-arm gambling machine
CN113746947B (en) * 2021-07-15 2022-05-06 清华大学 IPv6 active address detection method and device based on reinforcement learning

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102821002A (en) * 2011-06-09 2012-12-12 中国移动通信集团河南有限公司信阳分公司 Method and system for network flow anomaly detection
CN103840988A (en) * 2014-03-17 2014-06-04 湖州师范学院 Network traffic measurement method based on RBF neural network
CN106411597A (en) * 2016-10-14 2017-02-15 广东工业大学 Network traffic abnormality detection method and system
CN107682317A (en) * 2017-09-06 2018-02-09 中国科学院计算机网络信息中心 Establish method, data detection method and the equipment of Data Detection model
CN107948166A (en) * 2017-11-29 2018-04-20 广东亿迅科技有限公司 Traffic anomaly detection method and device based on deep learning

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102821002A (en) * 2011-06-09 2012-12-12 中国移动通信集团河南有限公司信阳分公司 Method and system for network flow anomaly detection
CN103840988A (en) * 2014-03-17 2014-06-04 湖州师范学院 Network traffic measurement method based on RBF neural network
CN106411597A (en) * 2016-10-14 2017-02-15 广东工业大学 Network traffic abnormality detection method and system
CN107682317A (en) * 2017-09-06 2018-02-09 中国科学院计算机网络信息中心 Establish method, data detection method and the equipment of Data Detection model
CN107948166A (en) * 2017-11-29 2018-04-20 广东亿迅科技有限公司 Traffic anomaly detection method and device based on deep learning

Also Published As

Publication number Publication date
CN109039797A (en) 2018-12-18

Similar Documents

Publication Publication Date Title
CN109039797B (en) Reinforcement learning based large flow detection method
US7493346B2 (en) System and method for load shedding in data mining and knowledge discovery from stream data
US11501636B2 (en) Road segment speed prediction method, apparatus, server, medium, and program product
CN111444021B (en) Synchronous training method, server and system based on distributed machine learning
CN110443352B (en) Semi-automatic neural network optimization method based on transfer learning
CN113489674B (en) Malicious traffic intelligent detection method and application for Internet of things system
CN109471847B (en) I/O congestion control method and control system
Liu et al. Fine-grained flow classification using deep learning for software defined data center networks
Chen et al. An experience driven design for IEEE 802.11 ac rate adaptation based on reinforcement learning
CN115277354B (en) Fault detection method for command control network management system
CN114500561B (en) Power Internet of things network resource allocation decision-making method, system, equipment and medium
CN114584406B (en) Industrial big data privacy protection system and method for federated learning
CN116524712A (en) Highway congestion prediction method, system and device integrating space-time associated data
CN117971475A (en) Intelligent management method and system for GPU computing force pool
CN114202065B (en) Stream data prediction method and device based on incremental evolution LSTM
Dong et al. Network traffic identification in packet sampling environment
Zhu et al. Adaptive deep reinforcement learning for non-stationary environments
Modi et al. QoS driven channel selection algorithm for opportunistic spectrum access
CN104283934B (en) A kind of WEB service method for pushing, device and server based on reliability prediction
Sarnovsky et al. Adaptive bagging methods for classification of data streams with concept drift
CN113037648B (en) Data transmission method and device
CN117118836A (en) Multi-stage energy-saving migration method for service function chain based on resource prediction
Wu et al. Multimedia traffic classification for imbalanced environment
Khudoyarova et al. Using Machine Learning to Analyze Network Traffic Anomalies
Li et al. Research on scale-free network user-side big data balanced partition strategy

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20211123