CN114039927A - Control method for routing flow of power information network

Publication number
CN114039927A
CN114039927A CN202111299512.9A CN202111299512A CN114039927A CN 114039927 A CN114039927 A CN 114039927A CN 202111299512 A CN202111299512 A CN 202111299512A CN 114039927 A CN114039927 A CN 114039927A
Authority
CN
China
Prior art keywords
data
network
agent
priority
flow
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111299512.9A
Other languages
Chinese (zh)
Other versions
CN114039927B (en)
Inventor
孟凡军
王震宇
潘裕庆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Power Supply Co of State Grid Jiangsu Electric Power Co Ltd
Original Assignee
Suzhou Power Supply Co of State Grid Jiangsu Electric Power Co Ltd
Application filed by Suzhou Power Supply Co of State Grid Jiangsu Electric Power Co Ltd
Priority to CN202111299512.9A
Publication of CN114039927A
Application granted
Publication of CN114039927B
Legal status: Active

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00: Traffic control in data switching networks
    • H04L47/10: Flow control; Congestion control
    • H04L47/24: Traffic characterised by specific attributes, e.g. priority or QoS
    • H04L47/2425: Traffic characterised by specific attributes, e.g. priority or QoS for supporting services specification, e.g. SLA
    • H04L47/2433: Allocation of priorities to traffic types
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00: Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)
  • Computer And Data Communications (AREA)

Abstract

The invention discloses a control method for routing traffic of a power information network. The method comprises: capturing traffic data and creating a data set X; preprocessing the data set X, wherein the preprocessing comprises adjusting unordered data; establishing a network environment and creating an agent, and having the agent interact with the network environment, which includes using the agent to assign a priority, i.e. to execute an action; after the agent executes action $y_k$, giving the agent a reward $b_k$ and the next state $z_k'$ of the network environment, and putting the quadruple $(z_k, y_k, b_k, z_k')$ into an experience pool U; sampling the quadruples $(z_k, y_k, b_k, z_k')$ in the experience pool U and then updating the agent; and processing the traffic data to be detected with the updated agent to obtain a corresponding priority, and controlling the routing traffic of the power information network based on that priority. The method provided by the invention can assign a reasonable priority to traffic data, thereby ensuring the transmission efficiency and quality of the power network system.

Description

Control method for routing flow of power information network
Technical Field
The invention relates to the field of power information networks, in particular to a control method for routing flow of a power information network.
Background
Routing traffic control is a set of rules that provides each application with a different bandwidth and priority, so that limited bandwidth is used to best effect. All kinds of devices need to connect to the Internet, and they usually do so through a network in which the route management device is responsible for both connectivity and traffic control. The traditional traffic control method is based on IP addresses and ports. It can implement simple traffic allocation, but it lacks scalability and extensibility, cannot produce an allocation scheme for new kinds of traffic, and makes it difficult to find suitable rules for allocating bandwidth to different applications. It is time-consuming and labor-intensive, and it is increasingly unable to meet increasingly complex requirements.
With the advance of smart grids, power information networks continue to develop and their communication data volume keeps growing. Because the information flows of some application systems in the network (such as video consultation systems) require a large bandwidth, network node congestion frequently degrades the service quality of the power information network. Although some methods currently address this problem, traffic in a power information network changes quickly in real time, the data volume is large, service requirements differ widely, and complexity is high, so the problem is difficult to solve with a single method.
In view of this, it is necessary to design a more reasonable dynamic control method for routing traffic that combines the advantages of several methods and implements self-learning traffic control for the power information network.
Disclosure of Invention
In view of the above, it is necessary to provide a control method for routing traffic of a power information network that can effectively control that traffic. The technical solution provided by the present invention is as follows:
The invention provides a control method for routing traffic of a power information network, which comprises the following steps:
S1, capturing traffic data and creating a data set X;
S2, preprocessing the data set X, wherein the preprocessing comprises adjusting unordered data;
S3, establishing a network environment and creating an agent, and having the agent interact with the network environment, including the following steps:
S31, using the agent to assign a priority, i.e. to execute an action, according to the following formula:
Figure BDA0003337877830000021
In the formula, $y_k$ is the action currently performed by the agent, and $y_k \in Y$, where Y comprises low priority, medium priority and high priority; argmax is a predefined operator that returns the argument at which the expression it is applied to reaches its maximum; $z_k$ is the current state of the network environment and is obtained from the data set X; and $\zeta$ is independent Gaussian noise;
S32, after the agent executes action $y_k$, giving the agent a reward $b_k$ and the next state $z_k'$ of the network environment, putting the quadruple $(z_k, y_k, b_k, z_k')$ into an experience pool U, and repeating S31-S32 until the $y_k$ corresponding to every $z_k$ in the data set X has been obtained, so as to obtain all quadruples $(z_k, y_k, b_k, z_k')$;
S33, sampling the quadruples $(z_k, y_k, b_k, z_k')$ in the experience pool U and then updating the agent;
S4, processing the traffic data to be detected with the updated agent to obtain a corresponding priority, and controlling the routing traffic of the power information network based on that priority.
With the fusion reinforcement learning method of the invention, the agent learns a policy that maximizes its return while interacting with the network environment. The method can draw on the advantages of several methods, identify the traffic of different devices and services, and allocate suitable bandwidth to them, thereby achieving more reasonable network access quality and solving the problem of dynamically controlling the routing traffic of the power information network.
Further, the traffic data in step S1 includes data ID, data IP address, protocol type, and accumulated number of bytes.
Further, in step S2, the method for adjusting unordered data includes:
performing one-hot encoding on the data IP address and the protocol type in the traffic data to obtain data of the following form:
$z = (x_1, x_2, \ldots, x_n)$
where z is the data, n is the dimension of z after the encoded fields are concatenated, and each $x_i$ is a numerical value.
Further, the agent includes a target network and a policy network.
Further preferably, the policy network comprises an input layer, fully connected layers and an output layer, the input layer being configured to input the $x_i$ and the output layer being configured to output $y_k$, i.e. the priority;
the input layer comprises 16 nodes, the fully connected layers comprise a first fully connected layer and a second fully connected layer, and the first and second fully connected layers each comprise 256 nodes.
Further, the control method may further comprise giving the agent a reward $b_k$ according to the following formula:
$b_k = -\mathrm{netlag} \cdot \mathrm{Sumsize}$
In the formula, $b_k$ is the reward given to the agent, netlag is the network delay coefficient, and Sumsize is the accumulated number of bytes.
Further, the control method further comprises sampling the quadruples $(z_k, y_k, b_k, z_k')$ in the experience pool U according to the following formulas:
Figure BDA0003337877830000031
Figure BDA0003337877830000032
where M(i) is the sampled data, i denotes the i-th quadruple in the experience pool U, $m_i$ is the weight of the i-th data sample, k is the total number of data in the experience pool U, $m_j$ is the weight of the j-th data sample, $\alpha$ is the ratio adopted by the priority, $A_i$ is the advantage function, $b_k$ is the reward, $\gamma$ is the attenuation factor, the symbol in Figure BDA0003337877830000033 denotes the policy network, the symbol in Figure BDA0003337877830000034 denotes the target network, $z_k$ is the current state of the network environment and is obtained from the data set X, $z_k'$ is the next state of the network environment, $y_k$ is the action currently performed by the agent, and $y_k \in Y$, where Y comprises low priority, medium priority and high priority, $\zeta$ is independent Gaussian noise, and $\epsilon$ is a positive number.
Further, updating the agent in step S33 includes updating the policy network parameters according to the following formula:
Figure BDA0003337877830000035
where j is the serial number of the data drawn from the experience pool U, m is the number of data drawn from the experience pool U, $\gamma$ is the attenuation factor, the symbol in Figure BDA0003337877830000036 denotes the policy network, the symbol in Figure BDA0003337877830000037 denotes the target network, $z_j$ is the state, $y_j$ is the action performed in state $z_j$, $z_j'$ is the next state entered after performing action $y_j$ in state $z_j$, $b_j$ is the reward obtained for performing action $y_j$ in state $z_j$, and $\omega_j$ is the weight of the j-th data when the policy network parameters are updated, determined by the following formula:
$\omega_j = (N \cdot M(j))^{-\beta}$
In the formula, $\omega_j$ is the weight of the j-th data when the policy network parameters are updated, N is the sampling number of the experience pool U, M(j) is the sampled data, j denotes the j-th quadruple in the experience pool U, and $\beta$ is a hyperparameter.
Further preferably, updating the agent in step S33 further includes updating the target network parameters:
after a preset time, the updated target network parameters are obtained from the policy network.
Further, the control method further includes:
classifying the traffic data to be detected into levels according to preset traffic thresholds, wherein the levels comprise small traffic, medium traffic and large traffic; small traffic corresponds to high priority, medium traffic to medium priority, and large traffic to low priority.
The invention has the following advantages: by utilizing interaction between the intelligent agent and the network environment and training learning, reasonable priority can be distributed to the flow data, so that the transmission efficiency and quality of the power network system are ensured.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
Fig. 1 is a first flowchart of a control method for routing traffic of a power information network according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of the policy network of a control method for routing traffic of a power information network according to an embodiment of the present invention;
Fig. 3 is a second flowchart of a control method for routing traffic of a power information network according to an embodiment of the present invention.
Detailed Description
In order to make the technical solutions of the present invention better understood and more clearly understood by those skilled in the art, the technical solutions of the embodiments of the present invention will be described below in detail and completely with reference to the accompanying drawings. It should be noted that the implementations not shown or described in the drawings are in a form known to those of ordinary skill in the art. Additionally, while exemplifications of parameters including particular values may be provided herein, it is to be understood that the parameters need not be exactly equal to the respective values, but may be approximated to the respective values within acceptable error margins or design constraints. It is to be understood that the described embodiments are merely exemplary of a portion of the invention and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention. In addition, the terms "comprises" and "comprising," and any variations thereof, in the description and claims of this invention, are intended to cover a non-exclusive inclusion, such that a process, method, apparatus, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
In one embodiment of the invention, a control method for routing traffic of a power information network is provided, which comprises the following steps:
S1, capturing traffic data and creating a data set X, wherein the traffic data comprises a data ID, a data IP address, a protocol type and an accumulated number of bytes, and may be historical traffic data, although the protection scope of the invention is not limited thereto;
S2, preprocessing the data set X, wherein the preprocessing comprises adjusting unordered data;
S3, establishing a network environment and creating an agent, and having the agent interact with the network environment, wherein the agent comprises a target network and a policy network, and the interaction specifically comprises the following steps:
S31, using the agent to assign a priority, i.e. to execute an action, according to the following formula:
Figure BDA0003337877830000051
In the formula, $y_k$ is the action currently performed by the agent, and $y_k \in Y$, where Y comprises low priority, medium priority and high priority; argmax is a predefined operator that returns the argument at which the expression it is applied to reaches its maximum; $z_k$ is the current state of the network environment and is obtained from the data set X; and $\zeta$ is independent Gaussian noise;
S32, after the agent executes action $y_k$, giving the agent a reward $b_k$ and the next state $z_k'$ of the network environment, putting the quadruple $(z_k, y_k, b_k, z_k')$ into an experience pool U, and repeating S31-S32 until the $y_k$ corresponding to every $z_k$ in the data set X has been obtained, so as to obtain all quadruples $(z_k, y_k, b_k, z_k')$;
S33, sampling the quadruples $(z_k, y_k, b_k, z_k')$ in the experience pool U and updating the agent, wherein updating the agent comprises updating the policy network parameters and the target network parameters; the updated target network parameters can be obtained from the policy network after a preset time.
S4, processing the traffic data to be detected with the updated agent to obtain a corresponding priority, and controlling the routing traffic of the power information network based on that priority. Specifically, the traffic data to be detected may be classified into levels according to preset traffic thresholds, where the levels include small traffic, medium traffic and large traffic; small traffic corresponds to high priority, medium traffic to medium priority, and large traffic to low priority.
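The threshold-based classification in step S4 can be illustrated with a minimal sketch. The two threshold values and the function name `priority_from_volume` are hypothetical, since the text only says the thresholds are preset; the integer encoding of the priorities follows the set Y = {0, 1, 2} used in the embodiment.

```python
# Hypothetical thresholds in bytes; the source only states that they are preset.
SMALL_MAX_BYTES = 10_000
MEDIUM_MAX_BYTES = 1_000_000

# Priority encoding taken from Y = {0, 1, 2} in the embodiment.
LOW, MEDIUM, HIGH = 0, 1, 2


def priority_from_volume(sum_size: int) -> int:
    """Map an accumulated byte count to a priority level.

    Small traffic -> high priority, medium traffic -> medium priority,
    large traffic -> low priority.
    """
    if sum_size <= SMALL_MAX_BYTES:
        return HIGH
    if sum_size <= MEDIUM_MAX_BYTES:
        return MEDIUM
    return LOW
```

A static rule like this could serve as a baseline against which the priorities chosen by the trained agent are compared.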
In an embodiment of the present invention, the method for controlling routing traffic includes:
Step one: data packet capture
Traffic data collection is a prerequisite for traffic analysis and traffic control. Packet capture is the process of capturing and collecting traffic data. In this embodiment, the service traffic characteristics are constructed from the current specific network environment, the data is stored in a fixed-size memory pool, and the traffic data forms the data set X. This speeds up training, lets the method adapt to the network usage preferences of each department subnet or application system subnet, and allows it to be updated more flexibly from the existing environment. The service traffic characteristics recorded during packet collection in this embodiment are shown in Table 1:
TABLE 1 Service traffic characteristics
Field name    Type                    Length    Description
ID            Int (integer type)      16        Data ID
IP            Char (character type)   15        Data IP address
Protocol      Char (character type)   10        Protocol type
Sumsize       Int (integer type)      16        Accumulated number of bytes
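As a rough illustration of the record in Table 1, the sketch below models one captured traffic record. The class name `TrafficRecord`, the attribute names, and the reading of the Char lengths as maximum character counts are assumptions made for this example and are not stated in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrafficRecord:
    """One row of the data set X, mirroring the fields of Table 1."""

    id: int        # ID: data ID
    ip: str        # IP: data IP address (assumed to be at most 15 characters)
    protocol: str  # Protocol: protocol type (assumed to be at most 10 characters)
    sum_size: int  # Sumsize: accumulated number of bytes

    def __post_init__(self) -> None:
        # Enforce the assumed character limits from Table 1.
        if len(self.ip) > 15:
            raise ValueError("IP exceeds 15 characters")
        if len(self.protocol) > 10:
            raise ValueError("Protocol exceeds 10 characters")


# Example record with made-up values.
record = TrafficRecord(id=1, ip="192.168.1.10", protocol="TCP", sum_size=4096)
```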
Step two: data pre-processing
During training, the data set X needs to be converted into a form that the method can use. Because the IP and protocol fields of the service data are unordered text data, both are one-hot encoded, which expands the discrete values into a numerical space. This yields the data $z = (x_1, x_2, \ldots, x_n)$ for a single service traffic record, where n is the dimension of z after the encoded fields are concatenated and each $x_i$ is a specific numerical value.
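The one-hot step can be sketched as follows. The vocabularies, the helper names `one_hot` and `encode`, and the choice to append the accumulated byte count as the last component of z are assumptions for illustration; the text does not specify how the vocabulary is built or which fields end up in z.

```python
def one_hot(value, vocabulary):
    """Return a one-hot list marking the position of value in vocabulary."""
    return [1.0 if value == item else 0.0 for item in vocabulary]


def encode(ip, protocol, sum_size, ip_vocab, protocol_vocab):
    """Concatenate the encoded fields into one state vector z = (x_1, ..., x_n)."""
    return one_hot(ip, ip_vocab) + one_hot(protocol, protocol_vocab) + [float(sum_size)]


# Hypothetical vocabularies collected from the data set X.
ip_vocab = ["10.0.0.1", "10.0.0.2"]
protocol_vocab = ["TCP", "UDP", "ICMP"]

z = encode("10.0.0.2", "UDP", 1500, ip_vocab, protocol_vocab)
# z has n = 2 + 3 + 1 = 6 components
```

With these made-up vocabularies n is 6; in the embodiment the encoded vector would need to have 16 components to match the input layer.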
Step three: network environment modeling
The core design idea of the control method is to improve the network transmission quality by searching the optimal strategy to distribute the priority of the flow, and mainly depends on an intelligent agent established by reinforcement learning to determine the priority of the service flow and the flow data.
In reinforcement learning, the agent needs to make a decision at each time step, i.e. to assign a priority to the current service traffic. At that time step the state of the environment is $z_k$, where k denotes the time step and $z_k$ is sampled from the data set X. The action of the agent is $y_k \in Y$, with $Y = \{0, 1, 2\}$, where 0 denotes low priority, 1 denotes medium priority, and 2 denotes high priority; the agent performs action $y_k$ on the current state $z_k$ to assign the priority.
In this embodiment, the reward for the agent's action $y_k$ is set as $b_k = -\mathrm{netlag} \cdot \mathrm{Sumsize}$, where $b_k$ is the reward given to the agent, netlag is the network delay coefficient, and Sumsize is the accumulated number of bytes. To provide a better network experience, traffic data with a small byte count that requires low latency is given a higher priority. The agent's goal is to prioritize small flows as far as possible so that users' immediate traffic is served promptly, while large flows such as IP downloads are given a lower priority.
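The reward formula translates directly into code. In this small sketch the function name `reward` and the sample values are illustrative only; how netlag is measured is not specified in the text.

```python
def reward(netlag: float, sum_size: int) -> float:
    """Reward b_k = -netlag * Sumsize for one step."""
    return -netlag * sum_size


# For the same delay coefficient, a small flow is penalized less than a large one.
small_flow_reward = reward(netlag=0.5, sum_size=1_000)
large_flow_reward = reward(netlag=0.5, sum_size=1_000_000)
```

Because the reward is never positive and scales with both delay and byte count, the agent is pushed toward priority assignments that keep delay low.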
Specifically, the agent includes a target network and a policy network. As shown in Fig. 2, the policy network includes an input layer, fully connected layers and an output layer. The input layer is configured to input the $x_i$, i.e. the feature vector, and the output layer is configured to output $y_k$, i.e. the priority assignment result (high, medium or low). In one embodiment of the present invention, the input layer comprises 16 nodes, the fully connected layers comprise a first and a second fully connected layer, and each of them comprises 256 nodes.
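A dependency-free sketch of a network with this shape (16 inputs, two hidden layers of 256 nodes) is given below. The ReLU activation, the random initialization, and the choice of three output values (one score per priority level) are assumptions; the text specifies only the layer sizes.

```python
import random


def make_layer(n_in, n_out, rng):
    """Create a dense layer as (weights, biases) with small random weights."""
    scale = (2.0 / n_in) ** 0.5
    weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def linear(layer, x):
    """Apply a dense layer: one weighted sum plus bias per output node."""
    weights, biases = layer
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]


def relu(x):
    return [max(0.0, v) for v in x]


def make_policy_network(seed=0):
    """Layers: 16 -> 256 -> 256 -> 3 (three outputs are an assumption)."""
    rng = random.Random(seed)
    return [make_layer(16, 256, rng), make_layer(256, 256, rng), make_layer(256, 3, rng)]


def forward(network, x):
    """Return one score per priority level for a 16-dimensional state."""
    hidden1 = relu(linear(network[0], x))
    hidden2 = relu(linear(network[1], hidden1))
    return linear(network[2], hidden2)


network = make_policy_network()
scores = forward(network, [0.0] * 15 + [1.0])
```

In practice a tensor library would replace the list arithmetic; the plain-Python version is used here only to keep the example self-contained.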
Step four: model building and training
Initialize the policy network (Figure BDA0003337877830000073) and its parameters (Figure BDA0003337877830000074), the attenuation factor $\gamma$, the target network (Figure BDA0003337877830000075) and its network parameters (Figure BDA0003337877830000076), and the experience pool U (the default data structure of the experience pool is a SumTree).
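Since the text names SumTree as the default data structure of the experience pool, a minimal sum-tree sketch may help. The class layout and method names below are one common way to write it and are not taken from the source: each leaf stores the priority of one stored item, and every internal node stores the sum of its children, so an item can be drawn in proportion to its priority in logarithmic time.

```python
class SumTree:
    """Binary tree whose leaves hold priorities and whose root holds their sum."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.tree = [0.0] * (2 * capacity - 1)  # internal nodes first, then leaves
        self.data = [None] * capacity
        self.next_slot = 0

    def total(self):
        return self.tree[0]

    def add(self, priority, item):
        """Store item with the given priority, overwriting the oldest slot when full."""
        leaf = self.next_slot + self.capacity - 1
        self.data[self.next_slot] = item
        self.update(leaf, priority)
        self.next_slot = (self.next_slot + 1) % self.capacity

    def update(self, leaf, priority):
        """Set a leaf priority and propagate the change up to the root."""
        change = priority - self.tree[leaf]
        self.tree[leaf] = priority
        while leaf > 0:
            leaf = (leaf - 1) // 2
            self.tree[leaf] += change

    def get(self, mass):
        """Find the leaf whose cumulative priority interval contains mass."""
        node = 0
        while 2 * node + 1 < len(self.tree):
            left = 2 * node + 1
            if mass <= self.tree[left]:
                node = left
            else:
                mass -= self.tree[left]
                node = left + 1
        return node, self.tree[node], self.data[node - self.capacity + 1]


pool = SumTree(capacity=4)
for priority, item in [(1.0, "a"), (2.0, "b"), (3.0, "c"), (4.0, "d")]:
    pool.add(priority, item)
```

Drawing a uniform random number in the range from 0 to `pool.total()` and passing it to `get` then returns an item with probability proportional to its priority.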
Specifically, the first step is the collection of data. First, the agent interacts with the simulated network environment to collect data. At each time step, the agent chooses the action to perform, i.e., assigns a priority, using a greedy approach according to the following formula:
Figure BDA0003337877830000071
In the formula, $y_k$ is the action currently performed by the agent, and $y_k \in Y$, where Y comprises low priority, medium priority and high priority. argmax is a predefined operator; in this embodiment it returns the argument at which the value in Figure BDA0003337877830000072 reaches its maximum. $z_k$ is the current state of the network environment and is obtained from the data set X, and $\zeta$ is independent Gaussian noise, which can enhance the exploration capability of the agent.
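One way to picture this selection rule is the sketch below. Because the formula itself is only available as an image, the exact place where the noise enters is an assumption: here the Gaussian noise is added to the score of each action before the argmax is taken. The function name `select_action` and the parameter `noise_std` are hypothetical.

```python
import random


def select_action(scores, noise_std=0.1, rng=random):
    """Pick the action index with the largest noise-perturbed score.

    scores holds one value per priority level (0 = low, 1 = medium, 2 = high).
    Adding noise before the argmax is an assumption about the source formula.
    """
    noisy = [s + rng.gauss(0.0, noise_std) for s in scores]
    return max(range(len(noisy)), key=noisy.__getitem__)


# Without noise the rule reduces to a plain greedy argmax.
greedy_action = select_action([0.1, 0.9, 0.3], noise_std=0.0)
```

A larger `noise_std` makes the agent try non-greedy priorities more often, which is the exploration effect the text attributes to the noise term.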
After performing the action, the agent obtains the next state $z_k'$ and the reward $b_k$, and puts the quadruple $(z_k, y_k, b_k, z_k')$ into the experience pool U.
The second step is training the model. First, the experience pool U is sampled. For any trajectory $(z_k, y_k, b_k, z_k')$, the sampling method is as follows:
Figure BDA0003337877830000081
Figure BDA0003337877830000082
where M(i) is the sampled data, i denotes the i-th quadruple in the experience pool U, $m_i$ is the weight of the i-th data sample, k is the total number of data in the experience pool U, $m_j$ is the weight of the j-th data sample, $\alpha$ is the ratio adopted by the priority, and $A_i$ is the advantage function, which measures how accurately the current network parameters predict the i-th data. $b_k$ is the reward, $\gamma$ is the attenuation factor, the symbol in Figure BDA0003337877830000083 denotes the policy network, the symbol in Figure BDA0003337877830000084 denotes the target network, $z_k$ is the current state of the network environment and is obtained from the data set X, $z_k'$ is the next state of the network environment, $y_k$ is the action currently performed by the agent, and $y_k \in Y$, where Y comprises low priority, medium priority and high priority. $\zeta$ is independent Gaussian noise, and $\epsilon$ is a positive number; in this embodiment $\epsilon$ is a very small positive number that prevents trajectories with $|A_i| = 0$ from never being sampled.
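The symbol definitions above match the usual form of prioritized experience replay, so the sampling step can be sketched under that assumption: the weight is taken as $m_i = |A_i| + \epsilon$ and the sampling probability as $M(i) = m_i^{\alpha} / \sum_j m_j^{\alpha}$. The formulas themselves are only available as images, so this form, the function name and the default values are assumptions rather than the patent's exact expressions.

```python
def sampling_probabilities(advantages, alpha=0.6, epsilon=1e-6):
    """Turn advantage values A_i into sampling probabilities M(i).

    Assumed form: m_i = |A_i| + epsilon, M(i) = m_i**alpha / sum_j m_j**alpha.
    """
    weights = [(abs(a) + epsilon) ** alpha for a in advantages]
    total = sum(weights)
    return [w / total for w in weights]


# A trajectory with zero advantage still keeps a small, non-zero probability.
probabilities = sampling_probabilities([0.0, 1.0, 3.0])
```

Setting `alpha` to 0 makes the sampling uniform, while larger values concentrate sampling on trajectories with large advantage magnitude.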
After sampling is completed, updating the policy network parameters is completed by using the following formula:
Figure BDA0003337877830000085
where j is the serial number of the data drawn from the experience pool U, m is the number of data drawn from the experience pool U, $\gamma$ is the attenuation factor, the symbol in Figure BDA0003337877830000086 denotes the policy network, the symbol in Figure BDA0003337877830000087 denotes the target network, $z_j$ is the state, $y_j$ is the action performed in state $z_j$, $z_j'$ is the next state entered after performing action $y_j$ in state $z_j$, argmax is a predefined operator that returns the maximizing argument, $b_j$ is the reward obtained for performing action $y_j$ in state $z_j$, and $\omega_j$ is the weight of the j-th data when the policy network parameters are updated, determined by the following formula:
$\omega_j = (N \cdot M(j))^{-\beta}$
In the formula, $\omega_j$ is the weight of the j-th data when the policy network parameters are updated, N is the sampling number of the experience pool U, M(j) is the sampled data, j denotes the j-th quadruple in the experience pool U, and $\beta$ is a hyperparameter; in this embodiment, $\beta$ adjusts the influence of the weight on the convergence rate.
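The role of the weight in the update can be sketched as follows, with two labeled assumptions. First, $\beta$ is assumed to enter as a negative exponent, as in prioritized experience replay, giving $\omega_j = (N \cdot M(j))^{-\beta}$. Second, the loss is assumed to be a weighted mean of squared errors over the m drawn samples; the update formula in the text is only available as an image. The function names are hypothetical.

```python
def importance_weight(n, probability, beta):
    """Assumed form of the weight: omega_j = (N * M(j)) ** (-beta)."""
    return (n * probability) ** (-beta)


def weighted_loss(errors, weights):
    """Assumed loss: mean over the batch of omega_j times the squared error."""
    m = len(errors)
    return sum(w * e * e for w, e in zip(weights, errors)) / m


# Samples drawn more often than uniform (probability above 1/N) get a weight below 1.
uniform_weight = importance_weight(n=4, probability=0.25, beta=0.5)
frequent_weight = importance_weight(n=4, probability=0.5, beta=1.0)
loss = weighted_loss(errors=[1.0, 2.0], weights=[1.0, 0.5])
```

Under this reading the weights compensate for the non-uniform sampling, so that frequently drawn trajectories do not dominate the parameter update.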
In addition, it should be noted that only the parameters of the policy network are updated by gradient descent. The symbol in Figure BDA0003337877830000088 denotes the target network. In this embodiment, the target network has the same structure as the policy network; its parameters are copied from the policy network after a period of time and do not take part in the direct update. Stochastic gradient descent is performed with an adaptive-learning-rate optimizer, and the gradient is updated on small batches of samples.
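The periodic copy of parameters from the policy network to the target network can be sketched like this. The function name `maybe_sync_target`, the step-based interval `sync_every`, and the dictionary representation of the parameters are assumptions; the text only says the copy happens after a period of time.

```python
import copy


def maybe_sync_target(step, sync_every, policy_params, target_params):
    """Return the target parameters to use after the given training step.

    Every sync_every steps the target receives a deep copy of the policy
    parameters; otherwise it is left unchanged.
    """
    if step % sync_every == 0:
        return copy.deepcopy(policy_params)
    return target_params


policy_params = {"w": [1.0, 2.0]}
target_params = {"w": [0.0, 0.0]}

unchanged = maybe_sync_target(5, 10, policy_params, target_params)
synced = maybe_sync_target(10, 10, policy_params, target_params)
```

The deep copy matters: the target must not share storage with the policy network, otherwise later gradient updates would change it as well.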
Step five: flow control
In this embodiment, after the policy network model has been trained and stored locally, the local policy model is first loaded. The traffic data packet to be detected is preprocessed to obtain the encoded state $z$, which is fed into the model to obtain the priority to assign to the packet. The result is then applied to the egress queue to form the queue scheduling rule, as shown in Fig. 3.
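How an assigned priority might drive the egress queue can be sketched with a strict-priority queue. The class name `EgressQueue`, the strict-priority discipline, and the first-in-first-out tie-breaking are assumptions; the text does not name the queue scheduling algorithm.

```python
import heapq
import itertools


class EgressQueue:
    """Strict-priority queue: higher priority first, FIFO within a priority."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # arrival order, used to break ties

    def enqueue(self, packet, priority):
        # heapq is a min-heap, so the priority is negated to pop the highest first.
        heapq.heappush(self._heap, (-priority, next(self._counter), packet))

    def dequeue(self):
        _, _, packet = heapq.heappop(self._heap)
        return packet


queue = EgressQueue()
queue.enqueue("bulk-download", 0)  # low priority
queue.enqueue("video", 1)          # medium priority
queue.enqueue("ssh", 2)            # high priority
queue.enqueue("dns", 2)            # high priority, arrives after ssh
order = [queue.dequeue() for _ in range(4)]
```

A weighted scheme could replace strict priority if low-priority traffic must be protected from starvation; that choice is outside what the text specifies.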
In an embodiment of the present invention, a flow control system based on the control method for the routing flow of the power information network is provided, which includes an environment modeling module, a control construction module, and a flow control module, wherein the environment modeling module is used for modeling a routing environment and adapting to a data input format of the control method; the control construction module is used for establishing a network model and training to obtain a control model; and the flow control module is used for obtaining the current flow control scheme by using the control construction module. Specifically, as shown in fig. 1, after the priority is determined, the flow control is performed based on the priority.
The invention designs a fusion reinforcement learning method, autonomously learns the flow control problem, learns the control method of the current network environment flow in continuous trial and error, and can be better suitable for solving the problem of complex dynamic control of the power information network routing flow.
The above description is only for the preferred embodiment of the present invention and is not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes that can be directly or indirectly applied to other related technical fields using the contents of the present specification and the accompanying drawings are included in the scope of the present invention.

Claims (10)

1. A control method for routing traffic of a power information network is characterized by comprising the following steps:
S1, capturing traffic data and creating a data set X;
S2, preprocessing the data set X, wherein the preprocessing comprises adjusting unordered data;
S3, establishing a network environment and creating an agent, and having the agent interact with the network environment, including the following steps:
S31, using the agent to assign a priority, i.e. to execute an action, according to the following formula:
Figure FDA0003337877820000011
In the formula, $y_k$ is the action currently performed by the agent, and $y_k \in Y$, where Y comprises low priority, medium priority and high priority; argmax is a predefined operator that returns the argument at which the expression it is applied to reaches its maximum; $z_k$ is the current state of the network environment and is obtained from the data set X; $\zeta$ is independent Gaussian noise; and the symbol in Figure FDA0003337877820000012 denotes the policy network;
S32, after the agent executes action $y_k$, giving the agent a reward $b_k$ and the next state $z_k'$ of the network environment, putting the quadruple $(z_k, y_k, b_k, z_k')$ into an experience pool U, and repeating S31-S32 until the $y_k$ corresponding to every $z_k$ in the data set X has been obtained, so as to obtain all quadruples $(z_k, y_k, b_k, z_k')$;
S33, sampling the quadruples $(z_k, y_k, b_k, z_k')$ in the experience pool U and then updating the agent;
S4, processing the traffic data to be detected with the updated agent to obtain a corresponding priority, and controlling the routing traffic of the power information network based on that priority.
2. The control method for routing traffic of a power information network according to claim 1, wherein the traffic data in step S1 includes a data ID, a data IP address, a protocol type, and an accumulated number of bytes.
3. The control method for routing traffic of a power information network according to claim 2, wherein in step S2, the method for adjusting unordered data comprises:
performing one-hot encoding on the data IP address and the protocol type in the traffic data to obtain data of the following form:
$z = (x_1, x_2, \ldots, x_n)$
where z is the data, n is the dimension of z after the encoded fields are concatenated, and each $x_i$ is a numerical value.
4. The control method for routing traffic for a power information network of claim 3, wherein the agent comprises a target network and a policy network.
5. The method of claim 4, wherein the policy network comprises an input layer, fully connected layers, and an output layer, the input layer being configured to input the $x_i$ and the output layer being configured to output $y_k$, i.e. the priority;
the input layer comprises 16 nodes, the fully connected layers comprise a first fully connected layer and a second fully connected layer, and the first and second fully connected layers each comprise 256 nodes.
6. The method of claim 4 for controlling routing traffic of a power information network, the method further comprising giving the agent a reward $b_k$ according to the following formula:
$b_k = -\mathrm{netlag} \cdot \mathrm{Sumsize}$
In the formula, $b_k$ is the reward given to the agent, netlag is the network delay coefficient, and Sumsize is the accumulated number of bytes.
7. The control method for routing traffic for an electrical power information network of claim 6, further comprising matching (z) in experience pool U byk,yk,bk,zk') sampling:
Figure FDA0003337877820000021
Figure FDA0003337877820000022
where M (i) is the sampled data, i is the ith quadruple in the experience pool U, and miIs the weight of the ith data sample, k is the total number of data in the experience pool U, mjFor the weight of the jth data sample, α is the ratio adopted by the priority, bkFor awards, gamma is the attenuation factor, zkIs the current state of the network environment, and zkObtaining, from said data set X, zk' is the next state of the network environment, ykIs the action currently performed by the agent, and ykE.g., Y, said Y comprising low, medium and high priority, ζ being independent gaussian noise,
Figure FDA0003337877820000023
in order to be the target network,
Figure FDA0003337877820000024
and
Figure FDA0003337877820000025
for policy networks, ε is a positive number, AiAs a merit function, it is determined by the following equation:
Figure FDA0003337877820000031
where b_k is the reward, γ is the attenuation factor, argmax is a predefined parameter, z_k is the current state of the network environment and is obtained from the data set X, z_k' is the next state of the network environment, y_k is the action currently performed by the agent, with y_k ∈ Y, where Y comprises a low priority, a medium priority and a high priority,
Figure FDA0003337877820000032
is the target network,
Figure FDA0003337877820000033
and
Figure FDA0003337877820000034
are the policy network, and ζ is independent Gaussian noise.
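The sampling formulas of claim 7 appear only as figure images in this text, so the following is a sketch under stated assumptions rather than the claimed formulas themselves. It assumes the common proportional form of prioritised experience replay, M(i) = m_i^α / Σ_j m_j^α with m_i = |A_i| + ε, which matches the symbols defined above. All function names are hypothetical.

```python
import random


def sample_weight(advantage, epsilon=1e-6):
    """Assumed form m_i = |A_i| + epsilon, so every sample keeps a non-zero weight."""
    return abs(advantage) + epsilon


def sampling_probabilities(weights, alpha):
    """Assumed form M(i) = m_i**alpha / sum_j m_j**alpha."""
    powered = [m ** alpha for m in weights]
    total = sum(powered)
    return [p / total for p in powered]


def sample_indices(weights, alpha, batch_size, rng):
    """Draw indices of quadruples from the experience pool according to M(i)."""
    probs = sampling_probabilities(weights, alpha)
    return rng.choices(range(len(weights)), weights=probs, k=batch_size)
```

Under this assumed form, α = 0 gives uniform sampling and larger α concentrates sampling on quadruples with large |A_i|.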
8. The method of claim 7, wherein updating the agent in step S33 comprises updating the policy network parameters by:
Figure FDA0003337877820000035
where j is the index of the data item drawn from the experience pool U, m is the number of data items drawn from the experience pool U, γ is the attenuation factor, z_j is the state, y_j is the action executed in state z_j, z'_j is the next state entered after performing action y_j in state z_j, b_j is the reward obtained by performing action y_j in state z_j,
Figure FDA0003337877820000036
is the target network,
Figure FDA0003337877820000037
and
Figure FDA0003337877820000038
are the policy network, and ω_j is the weight of the j-th data item in the policy network parameter update, determined by the following formula:
ω_j = (N * M(j))^(-β)
where ω_j is the weight corresponding to the j-th data item when the policy network parameters are updated, N is the number of samples of the experience pool U, M(j) is the sampling probability of the j-th quadruple in the experience pool U, and β is a hyperparameter.
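The update weight can be sketched as follows, assuming it takes the usual prioritised-replay form ω_j = (N * M(j))^(-β). The exponent is an assumption inferred from β being listed as a hyperparameter, and the function name `importance_weights` is hypothetical.

```python
def importance_weights(probs, n, beta):
    """Assumed form w_j = (N * M(j)) ** (-beta) for each sampled quadruple.

    probs: sampling values M(j) of the drawn quadruples
    n:     N, the number of samples of the experience pool U
    beta:  hyperparameter controlling how strongly the sampling bias is corrected
    """
    return [(n * p) ** (-beta) for p in probs]
```

With this form, quadruples that are sampled more often receive smaller update weights, and β = 0 disables the correction so that every weight equals 1.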
9. The method for control of power information network routing traffic of claim 8, wherein updating the agent in step S33 further comprises updating the target network parameters:
after a preset time, obtaining the updated values of the target network parameters from the policy network.
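The periodic target network update in claim 9 can be sketched as a parameter copy. Interpreting the "preset time" as a fixed number of policy updates is an assumption, and the class name `TargetSyncAgent` and its attributes are hypothetical.

```python
import copy


class TargetSyncAgent:
    """Holds policy and target parameters and copies policy -> target periodically."""

    def __init__(self, policy_params, sync_interval):
        self.policy_params = policy_params
        # The target network starts as an independent copy of the policy network.
        self.target_params = copy.deepcopy(policy_params)
        self.sync_interval = sync_interval  # assumed: preset time counted in updates
        self.updates = 0

    def after_update(self):
        """Call once per policy update; returns True when the target was refreshed."""
        self.updates += 1
        if self.updates % self.sync_interval == 0:
            self.target_params = copy.deepcopy(self.policy_params)
            return True
        return False
```

Keeping the target parameters fixed between copies means the policy network is trained against a slowly changing reference.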
10. The control method for routing traffic for an electrical power information network of claim 1, wherein the control method further comprises:
classifying the flow data to be detected into levels according to a preset flow threshold, wherein the levels comprise small flow, medium flow and large flow; the small flow level corresponds to high priority, the medium flow level to medium priority, and the large flow level to low priority.
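The threshold-based classification in claim 10 can be sketched as a simple mapping. The two threshold parameters, their inclusive boundaries and the function name `classify_flow` are assumptions; the claim does not give the threshold values.

```python
def classify_flow(size, small_max, medium_max):
    """Map a flow size to (level, priority) using two hypothetical thresholds.

    Flows up to small_max are small (high priority), flows up to medium_max
    are medium (medium priority), and anything larger is large (low priority).
    """
    if small_max > medium_max:
        raise ValueError("small_max must not exceed medium_max")
    if size <= small_max:
        return "small", "high"
    if size <= medium_max:
        return "medium", "medium"
    return "large", "low"
```

For example, with hypothetical thresholds of 1,000 and 100,000 bytes, a 500-byte flow is classified as small and given high priority.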
CN202111299512.9A 2021-11-04 2021-11-04 Control method for routing flow of power information network Active CN114039927B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111299512.9A CN114039927B (en) 2021-11-04 2021-11-04 Control method for routing flow of power information network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111299512.9A CN114039927B (en) 2021-11-04 2021-11-04 Control method for routing flow of power information network

Publications (2)

Publication Number Publication Date
CN114039927A true CN114039927A (en) 2022-02-11
CN114039927B CN114039927B (en) 2023-09-12

Family

ID=80143003

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111299512.9A Active CN114039927B (en) 2021-11-04 2021-11-04 Control method for routing flow of power information network

Country Status (1)

Country Link
CN (1) CN114039927B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108900419A (en) * 2018-08-17 2018-11-27 北京邮电大学 Route decision method and device based on deep reinforcement learning under SDN framework
CN111526036A (en) * 2020-03-20 2020-08-11 西安电子科技大学 Short flow real-time optimization method, system and network transmission terminal
CN113392971A (en) * 2021-06-11 2021-09-14 武汉大学 Strategy network training method, device, equipment and readable storage medium
US20210329668A1 (en) * 2020-04-15 2021-10-21 Samsung Electronics Co., Ltd. Method and system for radio-resource scheduling in telecommunication-network
CN113543156A (en) * 2021-06-24 2021-10-22 中国科学院沈阳自动化研究所 Industrial wireless network resource allocation method based on multi-agent deep reinforcement learning

Also Published As

Publication number Publication date
CN114039927B (en) 2023-09-12

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant