CN114039927A - Control method for routing flow of power information network

Info

Publication number: CN114039927A
Application number: CN202111299512.9A
Authority: CN
Inventors: 孟凡军; 王震宇; 潘裕庆
Original assignee: Suzhou Power Supply Co of State Grid Jiangsu Electric Power Co Ltd
Current assignee: Suzhou Power Supply Co of State Grid Jiangsu Electric Power Co Ltd
Priority date: 2021-11-04
Filing date: 2021-11-04
Publication date: 2022-02-11
Anticipated expiration: 2041-11-04
Also published as: CN114039927B

Abstract

The invention discloses a control method for routing flow of a power information network, comprising: capturing flow data and creating a data set X; preprocessing the data set X, wherein the preprocessing comprises adjusting unordered data; establishing a network environment and creating an agent, and making the agent interact with the network environment, which includes using the agent to assign a priority, that is, to execute an action; after the agent executes action y_k, giving the agent a reward b_k and the next state z_k' of the network environment, and putting the quadruple (z_k, y_k, b_k, z_k') into an experience pool U; sampling the quadruples (z_k, y_k, b_k, z_k') in the experience pool U and then updating the agent; and processing the flow data to be detected through the updated agent to obtain a corresponding priority, and further controlling the routing flow of the power information network based on the priority. The method provided by the invention can assign a reasonable priority to the flow data, thereby ensuring the transmission efficiency and quality of the power network system.

Description

Control method for routing flow of power information network

Technical Field

The invention relates to the field of power information networks, in particular to a control method for routing flow of a power information network.

Background

Routing flow control is a set of rules that gives each application its own bandwidth and priority so that limited bandwidth is used to best effect. Many kinds of devices need to connect to the internet, and they typically do so through a network whose routing management device is responsible for both connectivity and flow control. The traditional flow control method is based on IP addresses and ports. It can implement simple flow allocation, but it lacks scalability and extensibility, cannot adapt its allocation scheme to new kinds of traffic, and makes it difficult to find suitable rules for allocating bandwidth to different applications. It is also time-consuming and labor-intensive, and it is increasingly unable to meet ever more complex requirements.

As power grids become more intelligent, power information networks continue to develop and their communication data volume keeps growing. Because the information flows of some application systems in these networks (such as video consultation systems) require large bandwidth, network nodes frequently become congested and the service quality of the power information network degrades. Although some methods currently address this problem, traffic in a power information network changes quickly in real time, is large in volume, differs greatly in its requirements from service to service, and is highly complex, so the problem is difficult to solve with any single method.

In view of this, it is necessary to design a more reasonable dynamic control method for routing traffic by integrating the advantages of various methods and implementing autonomous learning traffic control for the power information network.

Disclosure of Invention

In view of the above, it is necessary to provide a method for controlling power information network routing traffic, which can effectively control the power information network routing traffic, and the technical solution provided by the present invention is as follows:

The invention provides a control method for routing flow of a power information network, which comprises the following steps:

S1, capturing flow data and creating a data set X;

S2, preprocessing the data set X, wherein the preprocessing comprises adjusting unordered data;

S3, establishing a network environment and creating an agent, and making the agent interact with the network environment, including the following steps:

S31, using the agent to assign a priority, that is, to execute an action, according to the following formula:

in the formula, y_k is the action currently executed by the agent, and y_k ∈ Y, where Y comprises low priority, medium priority and high priority; argmax is the predefined operator that returns the argument at which the maximum is attained; z_k is the current state of the network environment, obtained from the data set X; and ζ is independent Gaussian noise;

S32, after the agent executes action y_k, giving the agent a reward b_k and the next state z_k' of the network environment, putting the quadruple (z_k, y_k, b_k, z_k') into an experience pool U, and repeating S31 to S32 until the y_k corresponding to every z_k in the data set X has been obtained, so as to obtain all quadruples (z_k, y_k, b_k, z_k');

S33, sampling the quadruples (z_k, y_k, b_k, z_k') in the experience pool U and then updating the agent;

S4, processing the traffic data to be detected through the updated agent to obtain a corresponding priority, and further controlling the routing traffic of the power information network based on the priority.
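The formula referred to in S31 is not reproduced in this text. One reading that is consistent with the symbol definitions given for it is sketched below. The symbol Q for the policy network and θ for its parameters are assumptions introduced here for illustration; the original notation may differ.

```latex
y_k = \operatorname*{argmax}_{y \in Y} \; Q(z_k, y, \zeta; \theta)
```

Under this reading, the agent evaluates the noise-perturbed policy network for each of the three priorities and executes the one with the largest value.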

The fused reinforcement learning method of the invention enables the agent to maximize its return by learning a strategy while interacting with the network environment. It can combine the advantages of several methods, identify the traffic of different devices and services, and allocate suitable bandwidth to them, thereby achieving more reasonable network access quality and solving the problem of dynamically controlling the routing flow of the power information network.

Further, the traffic data in step S1 includes data ID, data IP address, protocol type, and accumulated number of bytes.

Further, in step S2, the method for adjusting unordered data includes:

performing one-hot encoding on the data IP address and the protocol type in the traffic data to obtain data as follows:

z = (x₁, x₂, ..., x_n)

where z is the data, n is the dimension of z after the encoded fields are concatenated, and each x_i is a numerical value.
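The encoding step can be illustrated with a minimal sketch. The vocabularies, the helper names `one_hot` and `encode_record`, and the choice to append the accumulated byte count to z are assumptions made for illustration; the text does not specify them.

```python
def one_hot(value, vocabulary):
    """Return a list with a single 1.0 at the position of `value` in `vocabulary`.

    A value that is not in the vocabulary yields an all-zero vector.
    """
    return [1.0 if value == item else 0.0 for item in vocabulary]


def encode_record(ip, protocol, sumsize, ip_vocab, protocol_vocab):
    """Concatenate one-hot IP, one-hot protocol and the byte count into z."""
    return one_hot(ip, ip_vocab) + one_hot(protocol, protocol_vocab) + [float(sumsize)]


# Hypothetical vocabularies, e.g. collected from the captured data set X.
IP_VOCAB = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
PROTOCOL_VOCAB = ["TCP", "UDP"]

z = encode_record("10.0.0.2", "UDP", 1500, IP_VOCAB, PROTOCOL_VOCAB)
print(z)  # [0.0, 1.0, 0.0, 0.0, 1.0, 1500.0]
```

Here n = 6: three positions for the IP address, two for the protocol, and one for the byte count.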

Further, the agent includes a target network and a policy network.

Further preferably, the policy network comprises an input layer, fully connected layers and an output layer, the input layer being configured to input the x_i and the output layer being configured to output the y_k, that is, the priority;

the input layer comprises 16 nodes, the fully connected layers comprise a first fully connected layer and a second fully connected layer, and the first and second fully connected layers each comprise 256 nodes.
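A forward pass through a network of this shape might look like the following sketch. The layer sizes 16, 256 and 256 come from the text; the three output nodes (one per priority), the ReLU activation, the random initialization and all function names are assumptions.

```python
import random


def init_layer(n_in, n_out, rng):
    """Create n_out rows of n_in random weights and a zero bias vector."""
    scale = (2.0 / n_in) ** 0.5
    weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(x, layer, relu):
    """Apply one fully connected layer, optionally followed by ReLU."""
    weights, biases = layer
    out = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]
    return [max(0.0, o) for o in out] if relu else out


def build_policy_network(seed=0):
    """16 inputs -> 256 -> 256 -> 3 outputs (the output width is an assumption)."""
    rng = random.Random(seed)
    return [init_layer(16, 256, rng), init_layer(256, 256, rng), init_layer(256, 3, rng)]


def forward(network, x):
    """Return one score per priority for the 16-dimensional input x."""
    hidden = dense(x, network[0], relu=True)
    hidden = dense(hidden, network[1], relu=True)
    return dense(hidden, network[2], relu=False)


network = build_policy_network()
scores = forward(network, [0.0] * 15 + [1.0])
priority = max(range(len(scores)), key=lambda a: scores[a])
```

The index of the largest score is taken as the priority, which matches the argmax-style selection described for the agent.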

Further, the control method further comprises giving the agent a reward b_k by the following formula:

b_k = -netlag*Sumsize

In the formula, b_k is the reward given to the agent, netlag is the network delay coefficient, and Sumsize is the accumulated number of bytes.
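The reward can be written directly as a function. The numeric values below are illustrative only; how netlag is measured is not specified in the text.

```python
def reward(netlag, sumsize):
    """Reward b_k = -netlag * Sumsize, as defined in the text."""
    return -netlag * sumsize


# Illustrative values: the same delay coefficient applied to a small and a large flow.
small = reward(netlag=0.5, sumsize=200)
large = reward(netlag=0.5, sumsize=20000)
print(small, large)  # -100.0 -10000.0
```

Because the reward is negative and scales with both factors, the agent is penalized most when large byte counts coincide with high delay.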

Further, the control method further comprises sampling the quadruples (z_k, y_k, b_k, z_k') in the experience pool U by the following formula:

wherein M(i) is the sampled data, i denotes the i-th quadruple in the experience pool U, m_i is the weight of the i-th data sample, k is the total number of data in the experience pool U, m_j is the weight of the j-th data sample, α is the ratio adopted by the priority, A_i is the advantage function, b_k is the reward, γ is the attenuation factor, the two network symbols in the formula denote the policy network and the target network respectively, z_k is the current state of the network environment, obtained from the data set X, z_k' is the next state of the network environment, y_k is the action currently executed by the agent, with y_k ∈ Y, Y comprising low priority, medium priority and high priority, ζ is independent Gaussian noise, and ε is a positive number.
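The sampling formula and the formula for A_i are not reproduced in this text. A reconstruction that is consistent with the listed symbols, and with the embodiment's remark that ε keeps data with |A_i| = 0 from never being sampled, might read as follows. The symbols Q and θ (policy network and its parameters) and Q' and θ' (target network and its parameters) are assumptions, as is the exact form of A_i.

```latex
M(i) = \frac{m_i^{\alpha}}{\sum_{j=1}^{k} m_j^{\alpha}},
\qquad
m_i = |A_i| + \epsilon,
\qquad
A_i = b_k + \gamma \, Q'\!\bigl(z_k',\ \operatorname*{argmax}_{y \in Y} Q(z_k', y, \zeta; \theta);\ \theta'\bigr) - Q(z_k, y_k, \zeta; \theta)
```

Read this way, M(i) acts as the probability of drawing the i-th quadruple, and α controls how strongly the weights m_i skew the draw (α = 0 would give uniform sampling).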

Further, updating the agent in step S33 includes updating the policy network parameters by:

wherein j is the serial number of the data extracted from the experience pool U, m is the number of data extracted from the experience pool U, γ is the attenuation factor, the two network symbols in the formula denote the policy network and the target network respectively, z_j is the state, y_j is the action executed in state z_j, z_j' is the next state entered after executing action y_j in state z_j, b_j is the reward obtained by executing action y_j in state z_j, and ω_j is the weight corresponding to the j-th data when the policy network parameters are updated, determined by the following formula:

ω_j = (N*M(j))^(-β)

in the formula, ω_j is the weight corresponding to the j-th data when the policy network parameters are updated, N is the number of samples drawn from the experience pool U, M(j) is the sampled data, j denotes the j-th quadruple in the experience pool U, and β is a hyperparameter.

Further preferably, the updating the agent in step S33 further includes updating the target network parameter:

and after a preset time, obtaining the updated content of the target network parameters from the policy network.

Further, the control method further includes:

classifying the flow data to be detected into levels according to a preset flow threshold, wherein the levels comprise small flow, medium flow and large flow, the small flow level corresponds to high priority, the medium flow level corresponds to medium priority, and the large flow level corresponds to low priority.
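The level division can be sketched as a pair of threshold comparisons. The threshold values, the use of the accumulated byte count as the measure of flow size, and the function name are assumptions; the priority codes 0, 1 and 2 follow the set Y = {0, 1, 2} used in the embodiment.

```python
LOW, MEDIUM, HIGH = 0, 1, 2  # priority codes from Y = {0, 1, 2}


def priority_from_flow(sumsize, small_max, medium_max):
    """Map an accumulated byte count to a priority using two preset thresholds."""
    if sumsize <= small_max:
        return HIGH    # small flow -> high priority
    if sumsize <= medium_max:
        return MEDIUM  # medium flow -> medium priority
    return LOW         # large flow -> low priority


# Hypothetical thresholds in bytes.
SMALL_MAX = 10_000
MEDIUM_MAX = 1_000_000

levels = [priority_from_flow(s, SMALL_MAX, MEDIUM_MAX) for s in (512, 50_000, 5_000_000)]
print(levels)  # [2, 1, 0]
```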

The invention has the following advantages: by utilizing interaction between the intelligent agent and the network environment and training learning, reasonable priority can be distributed to the flow data, so that the transmission efficiency and quality of the power network system are ensured.

Drawings

In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.

Fig. 1 is a first flowchart of a control method for routing traffic of a power information network according to an embodiment of the present invention;

Fig. 2 is a schematic diagram of the policy network of a control method for routing traffic of a power information network according to an embodiment of the present invention;

Fig. 3 is a second flowchart of a control method for routing traffic of a power information network according to an embodiment of the present invention.

Detailed Description

In order to make the technical solutions of the present invention better understood and more clearly understood by those skilled in the art, the technical solutions of the embodiments of the present invention will be described below in detail and completely with reference to the accompanying drawings. It should be noted that the implementations not shown or described in the drawings are in a form known to those of ordinary skill in the art. Additionally, while exemplifications of parameters including particular values may be provided herein, it is to be understood that the parameters need not be exactly equal to the respective values, but may be approximated to the respective values within acceptable error margins or design constraints. It is to be understood that the described embodiments are merely exemplary of a portion of the invention and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention. In addition, the terms "comprises" and "comprising," and any variations thereof, in the description and claims of this invention, are intended to cover a non-exclusive inclusion, such that a process, method, apparatus, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

In one embodiment of the invention, a control method for routing traffic of a power information network is provided, which comprises the following steps:

S1, capturing flow data and creating a data set, wherein the flow data comprises a data ID, a data IP address, a protocol type and an accumulated number of bytes; the flow data may be historical flow data, but the protection scope of the invention is not limited thereto;

S3, establishing a network environment and creating an agent, and making the agent interact with the network environment, wherein the agent comprises a target network and a policy network, and the interaction process specifically comprises the following steps:

S33, sampling the quadruples (z_k, y_k, b_k, z_k') in the experience pool U and updating the agent, wherein updating the agent comprises updating the policy network parameters and the target network parameters; for the target network parameters, the updated content can be obtained from the policy network after a preset time.

And S4, processing the traffic data to be detected through the updated agent to obtain a corresponding priority, and further controlling the routing traffic of the power information network based on the priority. Specifically, the flow data to be detected may be classified into classes according to a preset flow threshold, where the classes include a small flow, a medium flow, and a large flow, the small flow class corresponds to a high priority, the medium flow class corresponds to a medium priority, and the large flow class corresponds to a low priority.

In an embodiment of the present invention, the method for controlling routing traffic includes:

Step one: data packet capture

Flow data collection is a prerequisite for flow analysis and flow control. Packet capture is the process of capturing and collecting traffic data. In this embodiment, service traffic characteristics are constructed from the current specific network environment, the data is stored in a memory pool of fixed size, and the traffic data forms the data set X. This speeds up training, lets the method adapt to the network usage preferences of each department subnet or application system subnet, and allows the method to be updated more flexibly from the existing environment. The service traffic characteristics recorded during packet collection in this embodiment are shown in Table 1:

Table 1 Service traffic characteristics

Field name	Type	Length	Description
ID	Int (integer)	16	Data ID
IP	Char (character)	15	Data IP address
Protocol	Char (character)	10	Protocol type
Sumsize	Int (integer)	16	Accumulated number of bytes

Step two: data pre-processing

During training, the data set X needs to be converted into a form that the method can use. Because the IP and protocol fields of the service data are text data and are unordered, both are one-hot encoded, which expands the discrete values into a numerical space and yields the data z = (x₁, x₂, ..., x_n) of a single service traffic record, where n is the dimension of z after the encoded fields are concatenated and each x_i is a specific numerical value.

Step three: network environment modeling

The core design idea of the control method is to improve the network transmission quality by searching the optimal strategy to distribute the priority of the flow, and mainly depends on an intelligent agent established by reinforcement learning to determine the priority of the service flow and the flow data.

In reinforcement learning, the agent needs to make a decision, that is, assign a priority to the current service traffic, at each time step. At this time, the state of the environment is z_k, where k denotes the time step and z_k is sampled from the data set X. The action of the agent is y_k ∈ Y, Y = {0, 1, 2}, where 0 denotes low priority, 1 denotes medium priority and 2 denotes high priority; the agent executes action y_k for the current state z_k to assign a priority.

In this embodiment, the reward for the agent's action y_k is set as b_k = -netlag*Sumsize, where b_k is the reward given to the agent, netlag is the network delay coefficient, and Sumsize is the accumulated number of bytes. To provide a better network experience, flow data with a small byte count that requires low network delay is given higher priority. The agent's goal is set to prioritize small flows as far as possible so that the user's immediate traffic receives a prompt response, while large flows such as IP downloads are given lower priority.

Specifically, the agent includes a target network and a policy network. As shown in Fig. 2, the policy network includes an input layer, fully connected layers and an output layer; the input layer is configured to input the x_i, that is, the feature vector, and the output layer is configured to output the y_k, that is, the result of the priority assignment (high, medium or low). In one embodiment of the present invention, the input layer comprises 16 nodes, the fully connected layers comprise a first fully connected layer and a second fully connected layer, and the first and second fully connected layers each comprise 256 nodes.

Step four: model building and training

Initialize the policy network and its parameters, the attenuation factor γ, the target network and its network parameters, and the experience pool U (the default data structure of the experience pool is SumTree).
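A SumTree stores one priority per leaf and the sum of its children in every inner node, so that an item can be drawn with probability proportional to its priority in logarithmic time. The following is a minimal sketch of such a structure; the class layout and method names are assumptions and are not taken from the text.

```python
class SumTree:
    """Binary tree whose leaves hold priorities and whose inner nodes hold sums."""

    def __init__(self, capacity):
        self.capacity = capacity
        # tree[1] is the root; leaves occupy indices capacity .. 2*capacity - 1.
        self.tree = [0.0] * (2 * capacity)
        self.data = [None] * capacity
        self.next_slot = 0

    def add(self, priority, item):
        """Store item with the given priority, overwriting the oldest slot when full."""
        slot = self.next_slot
        self.data[slot] = item
        self.update(slot, priority)
        self.next_slot = (slot + 1) % self.capacity

    def update(self, slot, priority):
        """Set the priority of a slot and refresh the sums on the path to the root."""
        node = slot + self.capacity
        self.tree[node] = priority
        node //= 2
        while node >= 1:
            self.tree[node] = self.tree[2 * node] + self.tree[2 * node + 1]
            node //= 2

    def total(self):
        """Sum of all stored priorities."""
        return self.tree[1]

    def find(self, mass):
        """Return (slot, priority, item) for the leaf covering cumulative value mass."""
        node = 1
        while node < self.capacity:
            left = 2 * node
            if mass < self.tree[left]:
                node = left
            else:
                mass -= self.tree[left]
                node = left + 1
        slot = node - self.capacity
        return slot, self.tree[node], self.data[slot]


pool = SumTree(capacity=4)
pool.add(1.0, "quadruple-a")
pool.add(3.0, "quadruple-b")
```

Drawing a value uniformly from the interval between 0 and `pool.total()` and passing it to `find` selects "quadruple-b" about three times as often as "quadruple-a".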

Specifically, the first step is the collection of data. First, the agent interacts with the simulated network environment to collect data. At each time step, the agent chooses the action to perform, i.e., assigns a priority, using a greedy approach according to the following formula:

in the formula, y_k is the action currently executed by the agent, and y_k ∈ Y, where Y comprises low priority, medium priority and high priority; argmax is a predefined operator which, in this embodiment, returns the argument for which the value of the network in the formula is largest; z_k is the current state of the network environment, obtained from the data set X; and ζ is independent Gaussian noise, which can enhance the exploration ability of the agent.
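The selection step can be sketched as an argmax over noise-perturbed action values. Adding the Gaussian noise directly to the network outputs is an assumption made to keep the sketch short; the text only states that ζ is independent Gaussian noise, and it may instead enter through the network itself. The function name and `sigma` parameter are hypothetical.

```python
import random


def select_action(action_values, sigma, rng):
    """Return the index of the largest value after adding Gaussian noise to each."""
    noisy = [value + rng.gauss(0.0, sigma) for value in action_values]
    return max(range(len(noisy)), key=lambda a: noisy[a])


rng = random.Random(0)

# With sigma = 0 the choice is purely greedy: index 1 holds the largest value.
greedy_action = select_action([0.1, 0.7, 0.2], sigma=0.0, rng=rng)

# With a larger sigma, the agent sometimes explores the other priorities.
explored = {select_action([0.1, 0.7, 0.2], sigma=1.0, rng=rng) for _ in range(200)}
```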

After executing the action, the agent obtains the next state z_k' and the reward b_k, and puts the quadruple (z_k, y_k, b_k, z_k') into the experience pool U.

The second step is the training of the model. First, the experience pool U is sampled; for any trajectory (z_k, y_k, b_k, z_k'), the sampling method is as follows:

where M(i) is the sampled data, i denotes the i-th quadruple in the experience pool U, m_i is the weight of the i-th data sample, k is the total number of data in the experience pool U, m_j is the weight of the j-th data sample, α is the ratio adopted by the priority, A_i is the advantage function, which measures how accurately the current network parameters predict the i-th data, b_k is the reward, γ is the attenuation factor, the two network symbols in the formula denote the policy network and the target network respectively, z_k is the current state of the network environment, obtained from the data set X, z_k' is the next state of the network environment, y_k is the action currently executed by the agent, with y_k ∈ Y, Y comprising low priority, medium priority and high priority, ζ is independent Gaussian noise, and ε is a positive number; in this embodiment ε is a very small positive number, used to prevent trajectories with |A_i| = 0 from never being sampled.
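The effect of α and ε can be illustrated numerically. The sketch assumes that the weight of each item is m_i = |A_i| + ε and that M(i) is m_i raised to α and normalized over the pool; this form matches the symbol definitions but is an assumption, since the formula itself is not reproduced in this text.

```python
def sampling_probabilities(advantages, alpha, epsilon=1e-6):
    """Return one probability per item, proportional to (|A_i| + epsilon) ** alpha."""
    weights = [(abs(a) + epsilon) ** alpha for a in advantages]
    total = sum(weights)
    return [w / total for w in weights]


# Illustrative advantage values for three stored quadruples.
probs = sampling_probabilities([0.0, 1.0, 3.0], alpha=1.0)

# alpha = 0 removes the prioritization and gives uniform sampling.
uniform = sampling_probabilities([0.0, 1.0, 3.0], alpha=0.0)
```

The first item has |A_i| = 0, yet its probability stays slightly above zero because of ε, which is the behavior the embodiment describes.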

After sampling is completed, the policy network parameters are updated by using the following formula:

in the formula, the two network symbols denote the policy network and the target network respectively, z_j is the state, y_j is the action executed in state z_j, z_j' is the next state entered after executing action y_j in state z_j, argmax is a predefined operator, b_j is the reward obtained by executing action y_j in state z_j, and ω_j is the weight corresponding to the j-th data when the policy network parameters are updated, determined by the following formula:

ω_j = (N*M(j))^(-β)

in the formula, ω_j is the weight corresponding to the j-th data when the policy network parameters are updated, N is the number of samples drawn from the experience pool U, M(j) is the sampled data, j denotes the j-th quadruple in the experience pool U, and β is a hyperparameter; in this embodiment, β is used to adjust the influence of the weight on the convergence rate.
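The weight ω_j can be computed as stated. How the weights enter the update is sketched here as a weighted mean of squared errors, which is an assumption, since the update formula itself is not reproduced in this text; the function names and sample values are illustrative.

```python
def importance_weights(probabilities, n, beta):
    """Return omega_j = (N * M(j)) ** (-beta) for each sampled item."""
    return [(n * p) ** (-beta) for p in probabilities]


def weighted_squared_error(weights, errors):
    """Mean of omega_j * error_j ** 2 over the mini-batch (assumed loss form)."""
    return sum(w * e * e for w, e in zip(weights, errors)) / len(errors)


# Two sampled items with M(j) = 0.5 and 0.25, N = 4, beta = 1.
omega = importance_weights([0.5, 0.25], n=4, beta=1.0)
print(omega)  # [0.5, 1.0]

loss = weighted_squared_error(omega, [2.0, 1.0])
print(loss)  # 1.5
```

Items that were more likely to be sampled receive a smaller weight, which offsets the bias that the non-uniform sampling would otherwise introduce.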

In addition, it should be noted that only the parameters of the policy network are updated by gradient descent. In this embodiment, the target network has the same structure as the policy network, and its parameters are copied from the policy network at intervals and do not participate in direct updating. Stochastic gradient descent uses an optimizer with an adaptive learning rate and updates the gradient on small batches of samples.
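The periodic copy can be sketched as a small helper that counts update steps. Counting in steps rather than wall-clock time, the interval value and the class name are assumptions; the text only says the copy happens after a period of time.

```python
import copy


class TargetSync:
    """Copy the policy parameters into the target network every `interval` steps."""

    def __init__(self, policy_params, interval):
        self.interval = interval
        self.steps = 0
        self.target_params = copy.deepcopy(policy_params)

    def step(self, policy_params):
        """Advance one update step; return True when the target was refreshed."""
        self.steps += 1
        if self.steps % self.interval == 0:
            self.target_params = copy.deepcopy(policy_params)
            return True
        return False


# Hypothetical parameter container standing in for the policy network weights.
policy = {"w": [1.0]}
sync = TargetSync(policy, interval=3)
policy["w"][0] = 2.0  # a gradient update changes only the policy network
refreshed = [sync.step(policy) for _ in range(3)]
```

Until the third step the target keeps the old value 1.0, which gives the policy network a stable reference while it is being updated.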

Step five: flow control

In this embodiment, after the policy network model has been trained and stored locally, the local policy model is first read, the traffic data packet to be detected is preprocessed to obtain an encoded state z, and the encoded state z is input into the model to obtain the appropriate priority for the packet. The result is then applied to the egress queue to form the scheduling rule of the queue algorithm, as shown in Fig. 3.
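One way to apply the assigned priorities to an egress queue is strict-priority scheduling, sketched below. The text does not name a specific queue algorithm, so the strict-priority rule, the class name and the packet labels are all assumptions.

```python
import heapq
import itertools


class EgressQueue:
    """Strict-priority queue: higher priority codes leave first, FIFO within a level."""

    def __init__(self):
        self._heap = []
        self._order = itertools.count()  # tie-breaker that preserves arrival order

    def enqueue(self, packet, priority):
        """Add a packet with priority 0 (low), 1 (medium) or 2 (high)."""
        heapq.heappush(self._heap, (-priority, next(self._order), packet))

    def dequeue(self):
        """Remove and return the next packet to transmit."""
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)


queue = EgressQueue()
queue.enqueue("bulk-download", 0)
queue.enqueue("control-message", 2)
queue.enqueue("web-page", 1)
queue.enqueue("status-poll", 2)

sent = [queue.dequeue() for _ in range(len(queue))]
print(sent)  # ['control-message', 'status-poll', 'web-page', 'bulk-download']
```

Strict priority can starve low-priority traffic under sustained load, so a weighted scheme may be preferable in practice; the choice is left open by the text.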

In an embodiment of the present invention, a flow control system based on the control method for the routing flow of the power information network is provided, which includes an environment modeling module, a control construction module, and a flow control module, wherein the environment modeling module is used for modeling a routing environment and adapting to a data input format of the control method; the control construction module is used for establishing a network model and training to obtain a control model; and the flow control module is used for obtaining the current flow control scheme by using the control construction module. Specifically, as shown in fig. 1, after the priority is determined, the flow control is performed based on the priority.

The invention designs a fusion reinforcement learning method, autonomously learns the flow control problem, learns the control method of the current network environment flow in continuous trial and error, and can be better suitable for solving the problem of complex dynamic control of the power information network routing flow.

The above description is only for the preferred embodiment of the present invention and is not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes that can be directly or indirectly applied to other related technical fields using the contents of the present specification and the accompanying drawings are included in the scope of the present invention.

Claims

1. A control method for routing traffic of a power information network is characterized by comprising the following steps:

S1, capturing flow data and creating a data set;

in the formula, y_k is the action currently executed by the agent, and y_k ∈ Y, where Y comprises low priority, medium priority and high priority; argmax is the predefined operator that returns the argument at which the maximum is attained; z_k is the current state of the network environment, obtained from the data set X; ζ is independent Gaussian noise; and the network symbol in the formula denotes the policy network;

S32, after the agent executes action y_k, giving the agent a reward b_k and the next state z_k' of the network environment, putting the quadruple (z_k, y_k, b_k, z_k') into an experience pool U, and repeating S31 to S32 until the y_k corresponding to every z_k in the data set X has been obtained, so as to obtain all quadruples (z_k, y_k, b_k, z_k');

2. The control method for routing traffic of the electric power information network according to claim 1, wherein the traffic data in the step S1 includes data ID, data IP address, protocol type, and number of accumulated bytes.

3. The method for controlling routing traffic of an electrical power information network of claim 2, wherein in step S2, the method of adjusting the unordered data comprises:

performing one-hot coding on the data IP address and the protocol type in the flow data to obtain data as follows:

z＝(x₁,x₂,...,x_n)

4. The control method for routing traffic for a power information network of claim 3, wherein the agent comprises a target network and a policy network.

5. The method of claim 4, wherein the policy network comprises an input layer, fully connected layers, and an output layer, the input layer being configured to input the x_i and the output layer being configured to output the y_k, that is, the priority;

6. The method of claim 4 for controlling routing traffic for a power information network, the method further comprising giving the agent a reward b_k by the following formula:

b_k = -netlag*Sumsize

7. The control method for routing traffic for a power information network of claim 6, further comprising sampling the quadruples (z_k, y_k, b_k, z_k') in the experience pool U by the following formula:

where M(i) is the sampled data, i denotes the i-th quadruple in the experience pool U, m_i is the weight of the i-th data sample, k is the total number of data in the experience pool U, m_j is the weight of the j-th data sample, α is the ratio adopted by the priority, b_k is the reward, γ is the attenuation factor, z_k is the current state of the network environment, obtained from the data set X, z_k' is the next state of the network environment, y_k is the action currently executed by the agent, with y_k ∈ Y, Y comprising low priority, medium priority and high priority, ζ is independent Gaussian noise, the two network symbols in the formula denote the target network and the policy network respectively, ε is a positive number, and A_i is the advantage function, determined by the following formula:

in the formula, b_k is the reward, γ is the attenuation factor, argmax is the predefined operator that returns the argument at which the maximum is attained, z_k is the current state of the network environment, obtained from the data set X, z_k' is the next state of the network environment, y_k is the action currently executed by the agent, with y_k ∈ Y, Y comprising low priority, medium priority and high priority, the two network symbols in the formula denote the target network and the policy network respectively, and ζ is independent Gaussian noise.

8. The method of claim 7, wherein the updating the agent in step S33 includes updating the policy network parameters by:

where j is the serial number of the data extracted from the experience pool U, m is the number of data extracted from the experience pool U, γ is the attenuation factor, z_j is the state, y_j is the action executed in state z_j, z_j' is the next state entered after executing action y_j in state z_j, b_j is the reward obtained by executing action y_j in state z_j, the two network symbols in the formula denote the target network and the policy network respectively, and ω_j is the weight corresponding to the j-th data when the policy network parameters are updated, determined by the following formula:

ω_j = (N*M(j))^(-β)

9. The method for control of power information network routing traffic of claim 8, wherein updating the agent in step S33 further comprises updating the target network parameters:

10. The control method for routing traffic for an electrical power information network of claim 1, wherein the control method further comprises: