CN117082150A

CN117082150A - Scheduling method and device of network controller, computer equipment and storage medium

Info

Publication number: CN117082150A
Application number: CN202311139950.8A
Authority: CN
Inventors: 谭振林; 卢泉; 蓝双凤; 马培勇; 张慧月
Original assignee: China Telecom Technology Innovation Center; China Telecom Corp Ltd
Current assignee: China Telecom Technology Innovation Center; China Telecom Corp Ltd
Priority date: 2023-09-05
Filing date: 2023-09-05
Publication date: 2023-11-17

Abstract

The application relates to a scheduling method and apparatus for a network controller, a computer device, and a storage medium. The method comprises: determining a time decay factor of the network controller at the current time step, where the network controller is used for controlling each network device in its corresponding network sub-area and the time decay factor decreases as the time step increases; generating a random number within the value range of the time decay factor, where the value range is determined according to the properties of the time decay function used to calculate the time decay factor; if the random number is smaller than the time decay factor, randomly selecting one network sub-area from the network sub-areas to obtain a target network sub-area; if the random number is greater than or equal to the time decay factor, determining the target network sub-area from the network sub-areas according to the network state change indexes respectively corresponding to the network sub-areas; and scheduling the network controller to the target network sub-area. The application can improve network performance.

Description

Scheduling method and device of network controller, computer equipment and storage medium

Technical Field

The present application relates to the field of computer networks, and in particular, to a scheduling method and apparatus for a network controller, a computer device, and a storage medium.

Background

With the rapid development of wireless communication technology, the number of network devices of all kinds has also grown rapidly. To optimize the network, a software-defined networking (SDN) controller may be employed to manage and schedule network devices. Currently, scheduling of the network controller is mainly static: it relies only on preset static rules, and the SDN controller is scheduled manually. Such scheduling cannot adapt well to dynamic changes in network state, which degrades network performance.

Disclosure of Invention

In view of the foregoing, it is desirable to provide a scheduling method, apparatus, computer device, and storage medium for a network controller capable of improving network performance.

In a first aspect, the present application provides a scheduling method of a network controller. The method comprises the following steps:

determining a time decay factor of the network controller at the current time step, wherein the network controller is used for controlling each network device in its corresponding network sub-area, and the time decay factor decreases as the time step increases;

generating a random number within the value range of the time decay factor, wherein the value range is determined according to the properties of the time decay function used to calculate the time decay factor;

comparing the time decay factor with the random number;

if the random number is smaller than the time decay factor, randomly selecting one network sub-area from the network sub-areas to obtain a target network sub-area;

if the random number is greater than or equal to the time decay factor, determining the target network sub-area from the network sub-areas according to the network state change indexes respectively corresponding to the network sub-areas; and

scheduling the network controller to the target network sub-area.

In a second aspect, the application further provides a scheduling device of the network controller. The device comprises:

the time attenuation factor determining module is used for determining the time attenuation factor of the network controller under the current time step; the network controller is used for controlling each network device in the corresponding network subarea; the time decay factor decreases with increasing time steps;

the random number generation module is used for generating random numbers in the value range of the time attenuation factor; the value range is determined according to the property of the time attenuation function used in calculating the time attenuation factor;

the parameter comparison module is used for comparing the time attenuation factor with the random number;

the network region determining module is used for randomly selecting one network sub-region from the network sub-regions under the condition that the random number is smaller than the time attenuation factor to obtain a target network sub-region; under the condition that the random number is larger than or equal to the time attenuation factor, determining a target network subarea from each network subarea according to the network state change indexes respectively corresponding to each network subarea;

the controller scheduling module is used for scheduling the network controller to the target network sub-area.

In some embodiments, the network state change indexes respectively corresponding to the network sub-areas include the network state change indexes of each network sub-area at its respective scheduling time steps. A scheduling time step is a time step at which a controller scheduling event occurred in the corresponding network sub-area, and a controller scheduling event is an event in which the network controller is scheduled to that network sub-area. The network region determining module is further used for, when the random number is greater than or equal to the time decay factor, calculating for each network sub-area the average of its network state change indexes over its scheduling time steps to obtain the network performance reward corresponding to that network sub-area, and determining the target network sub-area from the network sub-areas according to the network performance rewards respectively corresponding to the network sub-areas.

In some embodiments, the network region determining module is further configured to determine, from among the network sub-areas, the network sub-area with the largest network performance reward as the target network sub-area.

In some embodiments, the scheduling device of the network controller further includes a reward updating module, configured to determine a current network performance of the target network sub-area, and obtain a first network performance; and updating the network performance rewards corresponding to the target network subareas according to the first network performance, and obtaining updated network performance rewards.

In some embodiments, the reward update module is further configured to determine a second network performance; the second network performance is the network performance of the target network sub-area before the network controller is scheduled to the target network sub-area; determining a target network performance change index of the target network subarea under the current time step according to the difference between the first network performance and the second network performance; and updating the network performance rewards corresponding to the target network subareas according to the target network performance change indexes to obtain updated network performance rewards.

In some embodiments, the reward updating module is further configured to calculate a sum of network state change indexes of the target network sub-area under each corresponding scheduling time step, to obtain a first index total value; obtaining a second index total value according to the sum of the first index total value corresponding to the target network subarea and the target network performance change index; and updating the network performance rewards corresponding to the target network subareas according to the second index total value to obtain updated network performance rewards.

In some embodiments, the reward updating module is further configured to count the current time step together with each scheduling time step corresponding to the target network sub-area to obtain a total number of scheduling steps, and to determine the ratio of the second index total value to the total number of scheduling steps as the updated network performance reward.

In a third aspect, the present application provides a computer device comprising a memory and a processor, the memory storing a computer program, the processor executing the computer program to perform the steps of the method for scheduling a network controller as described above.

In a fourth aspect, the present application provides a computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of the scheduling method of a network controller as described above.

In a fifth aspect, the present application provides a computer program product comprising a computer program which, when executed by a processor, implements the steps of the scheduling method of a network controller as described above.

The scheduling method and apparatus, computer device, storage medium, and computer program product described above determine a time decay factor of the network controller at the current time step, where the network controller is used for controlling each network device in its corresponding network sub-area and the time decay factor decreases as the time step increases; generate a random number within the value range of the time decay factor, where the value range is determined according to the properties of the time decay function used to calculate the time decay factor; compare the time decay factor with the random number; if the random number is smaller than the time decay factor, randomly select one network sub-area from the network sub-areas to obtain a target network sub-area; if the random number is greater than or equal to the time decay factor, determine the target network sub-area from the network sub-areas according to the network state change indexes respectively corresponding to the network sub-areas; and schedule the network controller to the target network sub-area. By calculating the time decay factor used by the network controller at each time step and automatically executing a different scheduling strategy depending on how the time decay factor compares with the random number, that is, either randomly selecting a network sub-area or selecting a better target network sub-area according to the network performance change index, the application requires no manual adjustment and copes better with dynamic changes in the network environment, thereby improving network performance.

In addition, the application incorporates a time decay strategy: because the time decay factor keeps decreasing as the time step increases, more exploration takes place in the initial stage, so that more network controller distributions are tried, and as time passes decisions rely increasingly on what is already known. This further achieves dynamic optimization of the network controller distribution and further improves network performance.

Drawings

Fig. 1 is an application environment schematic diagram of a scheduling method of a network controller according to an embodiment of the present application;

Fig. 2 is a flow chart of a scheduling method of a network controller according to an embodiment of the present application;

Fig. 3 is a schematic diagram of the probability of scheduling a network controller to each network sub-area according to an embodiment of the present application;

Fig. 4 is a flow chart of another scheduling method of a network controller according to an embodiment of the present application;

Fig. 5 is a block diagram of a scheduling system of a network controller according to an embodiment of the present application;

Fig. 6 is an internal structure diagram of a computer device according to an embodiment of the present application.

Detailed Description

The present application will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present application more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.

The scheduling method of the network controller provided by the embodiments of the application can be applied in the application environment shown in Fig. 1. The server 102 may communicate with each network device 104 through a network controller; for example, the server 102 may communicate with network device 1, network device 2, ..., and network device N through network controller 1, so as to manage these network devices through network controller 1. A data storage system may store the data that the server 102 needs to process; it may be integrated on the server 102, or placed on the cloud or on another network server. The server 102 determines a time decay factor of the network controller at the current time step, where the network controller is used for controlling each network device 104 in its corresponding network sub-area and the time decay factor decreases as the time step increases; generates a random number within the value range of the time decay factor, where the value range is determined according to the properties of the time decay function used to calculate the time decay factor; compares the time decay factor with the random number; if the random number is smaller than the time decay factor, randomly selects one network sub-area from the network sub-areas to obtain a target network sub-area; if the random number is greater than or equal to the time decay factor, determines the target network sub-area from the network sub-areas according to the network state change indexes respectively corresponding to the network sub-areas; and schedules the network controller to the target network sub-area.

In some embodiments, as shown in fig. 2, a scheduling method of a network controller is provided, and the method is applied to the server of fig. 1 for illustration. In this embodiment, the method includes, but is not limited to, the steps of:

Step S202, determining a time decay factor of the network controller at the current time step.

The network controller is a core component in a software-defined network architecture and is used for centrally managing and controlling each network device in the network sub-area it manages, which makes the network flexible, programmable, and automated and provides higher performance, security, and reliability. It should be noted that the network sub-areas together may constitute an entire network area.

Time step refers to the period in which decisions are made or network states are observed. The specific setting of the time steps depends on the application requirements and the real-time nature of the system, for example, the time steps may be set to every minute, every 5 minutes, every hour, etc.

The current time step refers to the time step corresponding to the current time point.

The time-decay factor is a decay factor used in time-series analysis to reduce the importance of past observations while weighting the importance of current observations. In an embodiment of the application, the time decay factor decreases with increasing time steps.

Specifically, the server substitutes the current time step into the time decay function to calculate a time decay factor of the network controller at the current time step.

The time decay function is a mathematical function for calculating the time decay factor; it maps an increase in the time step to a decrease in the time decay factor. In practical applications, the choice of a suitable time decay function depends on the characteristics of the time series and on specific requirements, and different time decay functions suit different data distributions and patterns of change.
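As a minimal, non-limiting sketch of such a function, the following assumes an exponential decay. The application does not fix a particular time decay function, so the function name `time_decay_factor` and the parameters `eps0`, `eps_min` and `rate` are hypothetical and chosen only for illustration.

```python
import math


def time_decay_factor(t: int, eps0: float = 1.0, eps_min: float = 0.0,
                      rate: float = 0.05) -> float:
    """Return a factor that starts at eps0 and decays toward eps_min.

    Exponential decay is an assumed choice; any function that decreases
    as the time step t increases would satisfy the description.
    """
    if t < 0:
        raise ValueError("time step must be non-negative")
    return eps_min + (eps0 - eps_min) * math.exp(-rate * t)


# The factor shrinks as the time step grows.
factors = [time_decay_factor(t) for t in (0, 10, 50)]
```

With these assumed defaults the value range of the function is the interval from `eps_min` (approached but not reached) up to `eps0`, which is the range a random number would later be drawn from.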

Step S204, generating a random number within the value range of the time decay factor.

The value range of the time attenuation factor is determined according to the property of the time attenuation function used in calculating the attenuation factor.

In some embodiments, the range of values of the time decay function may be determined directly as the range of values of the time decay factor. In other embodiments, the range of values of the decay factor may be further determined from the range of values of the time decay function, depending on the requirements of the particular application.

Specifically, in order to effectively perform comparison and selection between exploration (i.e., step S208) and utilization (i.e., step S210), it is necessary to ensure that the range of values of the random number is consistent with the range of values of the time-decay factor, i.e., the server randomly takes a number from the range of values of the time-decay factor, and obtains the random number.
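The following sketch illustrates why the two ranges should match, under the assumption that the time decay factor takes values in [0, 1] and that the random number is drawn uniformly. The helper name `draw_random_number` is hypothetical.

```python
import random


def draw_random_number(low: float, high: float, rng: random.Random) -> float:
    """Draw a number uniformly from the value range of the decay factor."""
    return rng.uniform(low, high)


# Assumed value range [0, 1]; the seed only makes the sketch repeatable.
rng = random.Random(0)
epsilon = 0.3
draws = [draw_random_number(0.0, 1.0, rng) for _ in range(10_000)]
explore_share = sum(r < epsilon for r in draws) / len(draws)
```

For a uniform draw on [0, 1], the probability that the random number is smaller than the factor equals the factor itself, so `explore_share` comes out close to `epsilon`. If the random number were drawn from a wider or narrower range than the factor, the comparison would no longer select exploration with the intended frequency.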

Step S206, comparing the time attenuation factor with the random number.

Specifically, the server compares the time decay factor with the random number to determine which scheduling policy the network controller is currently required to execute.

Step S208, randomly selecting one network subarea from all network subareas under the condition that the random number is smaller than the time attenuation factor, and obtaining the target network subarea.

Specifically, in the case that the random number is smaller than the time attenuation factor, the server may then use an "exploration" policy, that is, may attempt new network controller distribution in a random manner, for example, randomly select one network sub-area from each network sub-area, and use the selected network sub-area as the target network sub-area.

Step S210, if the random number is greater than or equal to the time decay factor, determining the target network sub-area from the network sub-areas according to the network state change indexes respectively corresponding to the network sub-areas.

The network state change index is an index that reflects how network performance changes in a network sub-area. In some embodiments, the network state change index includes at least one of network delay, network bandwidth, network packet loss rate, network jitter, network throughput, or network reachability.

It will be appreciated that network delay refers to the time required for network data to travel from a sender to a receiver, with lower delays generally indicating better network connection quality. Network bandwidth refers to the amount of data transmitted by the network per unit time, with higher bandwidth indicating that the network can support greater data transmission. The network packet loss rate refers to the proportion of data packets lost in the network transmission process, and lower packet loss rate indicates that the stability of network connection is better. Network jitter refers to instability of transmission time of network data packets, namely, change of interval time of data packets reaching a receiving end, and lower jitter indicates higher stability of data packet transmission. The network throughput refers to the data volume transmitted by the network in unit time, and can reflect whether the network connection has high utilization rate. Network reachability refers to the probability of normal operation of the network within a certain time, i.e. the reliability of the network service, and higher reachability means better reliability of the network service.

Specifically, when the random number is greater than or equal to the time attenuation factor, the server selects one network subarea from the network subareas according to the network state change indexes respectively corresponding to the network subareas, and determines the selected network subarea as a target network subarea.
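The branch between steps S208 and S210 can be sketched as follows. The function name `choose_target_subarea` and the `scores` mapping are hypothetical: `scores` stands in for whatever per-sub-area value is derived from the network state change indexes, and a sub-area without a score is assumed to score 0.0.

```python
import random
from typing import Mapping, Sequence


def choose_target_subarea(epsilon: float, r: float, subareas: Sequence[str],
                          scores: Mapping[str, float],
                          rng: random.Random) -> str:
    """Pick a target sub-area by comparing r with the decay factor."""
    if r < epsilon:
        # Step S208: explore by picking any sub-area at random.
        return rng.choice(list(subareas))
    # Step S210: otherwise pick the sub-area with the best score
    # (missing scores default to 0.0, an assumption of this sketch).
    return max(subareas, key=lambda s: scores.get(s, 0.0))


rng = random.Random(1)
subareas = ["A1", "A2", "A3"]
scores = {"A1": 0.1, "A2": 0.7, "A3": 0.3}
exploit_choice = choose_target_subarea(0.5, 0.9, subareas, scores, rng)
explore_choice = choose_target_subarea(0.5, 0.1, subareas, scores, rng)
```

Here `exploit_choice` is the highest-scoring sub-area because 0.9 is not smaller than 0.5, while `explore_choice` is whichever sub-area the random draw happens to return.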

Step S212, the network controller is scheduled to the target network sub-area.

Specifically, the server schedules the network controller to the target network sub-area so that the network controller controls each network device in the target network sub-area.

For example, assume that before scheduling the network controller corresponds to network sub-area 1 and manages each network device within network sub-area 1. At the current time step, the above embodiments may determine a different network sub-area, namely network sub-area 2, as the target network sub-area. The network controller is then scheduled to network sub-area 2: its authority to manage the network devices in network sub-area 1 is revoked, and it is granted authority to manage the network devices in network sub-area 2, so that it manages each network device in network sub-area 2.

According to the scheduling method of the network controller described above, a time decay factor of the network controller at the current time step is determined, where the network controller is used for controlling each network device in its corresponding network sub-area and the time decay factor decreases as the time step increases; a random number is generated within the value range of the time decay factor, where the value range is determined according to the properties of the time decay function used to calculate the time decay factor; the time decay factor is compared with the random number; if the random number is smaller than the time decay factor, one network sub-area is randomly selected from the network sub-areas to obtain a target network sub-area; if the random number is greater than or equal to the time decay factor, the target network sub-area is determined from the network sub-areas according to the network state change indexes respectively corresponding to the network sub-areas; and the network controller is scheduled to the target network sub-area. By calculating the time decay factor used by the network controller at each time step and automatically executing a different scheduling strategy depending on how the time decay factor compares with the random number, that is, either randomly selecting a network sub-area or selecting a better target network sub-area according to the network performance change index, the application requires no manual adjustment and copes better with dynamic changes in the network environment, thereby improving network performance.

In addition, the application incorporates a time decay strategy: because the time decay factor keeps decreasing as the time step increases, more exploration takes place in the initial stage, so that more network controller distributions are tried, and as time passes decisions rely increasingly on what is already known. This further achieves dynamic optimization of the network controller distribution and further improves network performance.

In some embodiments, the network state change indexes respectively corresponding to the network sub-areas include the network state change indexes of each network sub-area at its respective scheduling time steps. A scheduling time step is a time step at which a controller scheduling event occurred in the corresponding network sub-area, and a controller scheduling event is an event in which the network controller is scheduled to that network sub-area. Step S210 specifically includes, but is not limited to: when the random number is greater than or equal to the time decay factor, calculating for each network sub-area the average of its network state change indexes over its scheduling time steps to obtain the network performance reward corresponding to that network sub-area; and determining the target network sub-area from the network sub-areas according to the network performance rewards respectively corresponding to the network sub-areas.

Wherein, the network performance rewards refer to the average value of the network state change indexes of the network subareas under the corresponding scheduling time steps.

Specifically, under the condition that the random number is greater than or equal to the time attenuation factor, the server calculates the average value of the network state change indexes of each network subarea under the respective scheduling time step respectively, and obtains the network performance rewards corresponding to each network subarea. The server determines a network subarea from the network subareas according to the network performance rewards corresponding to the network subareas, and takes the determined network subarea as a target network subarea.

It can be seen that, in this embodiment, in the case that the random number is greater than or equal to the time decay factor, it is indicated that, in the current time step, the network controller needs to execute the "utilization" policy, that is, adhere to the current optimal controller distribution, where the current optimal controller distribution is determined by the network performance rewards respectively corresponding to the network sub-areas. In this way, it can be determined to which network sub-area the network controller is scheduled to achieve better network performance.

In some embodiments, the step of determining the target network sub-area from the network sub-areas according to the network performance rewards respectively corresponding to the network sub-areas includes, but is not limited to: determining the network sub-area with the largest network performance reward from among the network sub-areas to obtain the target network sub-area.

Specifically, the server determines the network sub-area with the largest network performance reward from among the network sub-areas and takes it as the target network sub-area. This is because, in previous rounds of network controller scheduling, the network state improved most markedly after a network controller was scheduled to that network sub-area. Scheduling the network controller at the current time step to the network sub-area with the largest network performance reward therefore raises the probability that the network state improves, yielding better network performance.
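The reward computation and the selection of the largest reward can be sketched as follows. The `history` mapping, which records one scalar network state change index per scheduling time step for each sub-area, is a hypothetical data layout, and assigning a reward of 0.0 to a sub-area that has never been scheduled is an assumption of this sketch rather than something the application specifies.

```python
from statistics import mean


def network_performance_rewards(history: dict[str, list[float]]) -> dict[str, float]:
    """Average each sub-area's change indexes over its scheduling steps."""
    # Sub-areas with no scheduling steps get 0.0 (assumed default).
    return {area: (mean(values) if values else 0.0)
            for area, values in history.items()}


def best_subarea(rewards: dict[str, float]) -> str:
    """Return the sub-area with the largest network performance reward."""
    return max(rewards, key=rewards.get)


history = {"A1": [0.2, 0.4], "A2": [0.5], "A3": []}
rewards = network_performance_rewards(history)
target = best_subarea(rewards)
```

In this example A1 averages about 0.3 and A2 averages 0.5, so A2 is selected as the target network sub-area.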

In some embodiments, after step S212, the scheduling method of the network controller specifically further includes, but is not limited to, including: determining the current network performance of a target network subarea to obtain a first network performance; and updating the network performance rewards corresponding to the target network subareas according to the first network performance, and obtaining updated network performance rewards.

Specifically, the server determines the current network performance of the target network subarea to obtain a first network performance; and updating the network performance rewards corresponding to the target network subareas according to the first network performance, and obtaining updated network performance rewards. It should be noted that the updated network performance rewards are used to provide references for determining new target network sub-areas for the network controllers in subsequent time steps, so as to implement dynamic and intelligent network controller scheduling.

In some embodiments, the step of updating the network performance rewards corresponding to the target network sub-area according to the first network performance and obtaining the updated network performance rewards specifically includes, but is not limited to, including: determining a second network performance; determining a target network performance change index of the target network subarea under the current time step according to the difference between the first network performance and the second network performance; and updating the network performance rewards corresponding to the target network subareas according to the target network performance change indexes to obtain updated network performance rewards.

Wherein the second network performance is the network performance of the target network sub-area before the network controller is scheduled to the target network sub-area.

Specifically, the server determines the network performance of the target network sub-area before scheduling the network controller to the target network sub-area, resulting in the second network performance. The server determines a difference between the first network performance and the second network performance as a target network performance change index for the target network sub-area at the current time step. And the server updates the network performance rewards corresponding to the target network subareas according to the target network performance change indexes to obtain updated network performance rewards.

In some embodiments, the step of updating the network performance reward corresponding to the target network sub-area according to the target network performance change index to obtain the updated network performance reward includes, but is not limited to: calculating the sum of the network state change indexes of the target network sub-area at each of its scheduling time steps to obtain a first index total value; obtaining a second index total value from the sum of the first index total value corresponding to the target network sub-area and the target network performance change index; and updating the network performance reward corresponding to the target network sub-area according to the second index total value to obtain the updated network performance reward.

Specifically, the server calculates the sum of the network state change indexes of the target network sub-area at each of its scheduling time steps and takes the result as the first index total value. The server adds the first index total value corresponding to the target network sub-area to the target network performance change index to obtain the second index total value. The server then updates the network performance reward corresponding to the target network sub-area according to the second index total value, obtaining the updated network performance reward.

In some embodiments, the step of updating the network performance reward corresponding to the target network sub-area according to the second index total value to obtain the updated network performance reward includes, but is not limited to: determining the total number of steps made up of the current time step and each scheduling time step corresponding to the target network sub-area to obtain the total scheduling step number; and determining the ratio of the second index total value to the total scheduling step number as the updated network performance reward.

Specifically, the server counts the current time step together with each scheduling time step corresponding to the target network sub-area to obtain the total scheduling step number. The server then divides the second index total value by the total scheduling step number to obtain the updated network performance reward.
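By way of illustration only, the reward update described above can be sketched as follows. The function name `update_reward` and the list-based bookkeeping are assumptions made for this sketch, and the total scheduling step number is read here as the count of earlier scheduling time steps plus the current one.

```python
def update_reward(previous_indexes, current_index):
    """Return the updated network performance reward for one sub-area.

    previous_indexes: change indexes recorded at the sub-area's earlier
                      scheduling time steps (hypothetical list of floats).
    current_index:    target network performance change index at the
                      current time step.
    """
    first_total = sum(previous_indexes)          # first index total value
    second_total = first_total + current_index   # second index total value
    total_steps = len(previous_indexes) + 1      # earlier steps + current step
    return second_total / total_steps            # updated reward (an average)


# Illustrative values only, not taken from the application.
print(update_reward([0.2, 0.1], 0.3))
```

Under this reading the updated reward is simply the mean of all change indexes observed for the sub-area, including the newest one.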

In some embodiments, the policy for scheduling the network controller is a reinforcement-learning-based dynamic network controller scheduling policy, for example an ε-greedy strategy combined with time decay. In the ε-greedy strategy, when the agent selects an action, it takes an exploratory action with probability ε (in the embodiments of the application, randomly selecting a network sub-area as the target network sub-area) and an exploitative action with probability 1-ε (in the embodiments of the application, selecting the optimal network sub-area according to the network state change indexes as the target network sub-area), where ε is the time decay factor.

The ε-greedy strategy combined with time decay introduces a time decay mechanism on top of the ε-greedy strategy. Specifically, ε is initially large, so exploratory actions are favored in order to explore the environment better. As time passes, ε decays according to the time decay function, and the strategy leans increasingly toward exploitative actions that make better use of existing knowledge. This gives a smooth transition from exploration to exploitation and enables automatic, intelligent scheduling of the network controller, thereby improving network performance.
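The selection rule just described can be sketched as below. This is a minimal illustration rather than the implementation of the application: the function name `select_subarea`, the dictionary of rewards and the `rng` parameter are all assumptions introduced for the example.

```python
import random


def select_subarea(epsilon, rewards, rng=random):
    """Pick a target sub-area with a time-decayed epsilon-greedy rule.

    epsilon: time decay factor at the current time step.
    rewards: dict mapping sub-area name -> network performance reward.
    rng:     object providing random() and choice(); hypothetical hook
             so that a seeded random.Random can be passed in.
    """
    r = rng.random()  # random number in [0, 1)
    if r < epsilon:
        # Exploration: any sub-area, chosen uniformly at random.
        return rng.choice(sorted(rewards))
    # Exploitation: the sub-area with the largest reward so far.
    return max(rewards, key=rewards.get)


# Illustrative rewards only.
rewards = {"A1": 0.1, "A2": 0.0, "A3": -0.125}
print(select_subarea(0.25, rewards, random.Random(0)))
```

With ε close to 1 the call almost always explores; with ε close to 0 it almost always returns the sub-area with the largest reward.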

For example, suppose the network area is divided into three network sub-areas A₁, A₂ and A₃, each with one network controller, namely controller 1, controller 2 and controller 3 respectively. As shown in fig. 3, taking controller 1 as an example, controller 1 selects a random action to "explore" with probability ε(t), i.e., it moves at random to area A₂ or A₃ and observes whether better network performance can be obtained. Alternatively, with probability 1-ε(t), it selects the best currently known action to "exploit"; here controller 1 can be regarded as remaining in its own area and observing whether better network performance is obtained. ε(t) is set as a time decay function: in the initial stage the value of ε is large and more "exploration" is carried out; as time passes the value of ε gradually decreases and decisions rely more on existing knowledge.

In some embodiments, as shown in fig. 4, the scheduling method of the network controller of the present application proceeds as follows. At each time step t, ε(t), i.e. the time decay factor, is first calculated from the current time step and the time decay function. Whether to select a network sub-area at random or to select the network sub-area with the largest average reward R is decided by comparing the random number r with ε(t). The network controller is then "moved" to the selected area, the reward P is calculated from the change in network performance, and the average reward R is updated. The procedure then proceeds to the next time step and continues until a termination condition is met, such as reaching the maximum number of time steps or convergence of the average reward. Here, the reward P is the target network performance change index of the selected network sub-area at the current time step, and the average reward R is the network performance reward corresponding to the selected network sub-area.

In practical applications, suppose the time decay function is ε(t) = 1/(1+t). Its value range is 0 to 1, so the generation range of the random number r can be determined to be 0 to 1. Further, suppose each time step is 5 minutes; over the following 3 time steps, the network controller performs ε-greedy scheduling based on the time decay function.
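As a small illustration of this decay function, the sketch below evaluates ε(t) = 1/(1+t) for the first three time steps. The function name `epsilon` is chosen only for this example.

```python
def epsilon(t):
    """Time decay factor eps(t) = 1 / (1 + t) for a time step t >= 0."""
    return 1.0 / (1.0 + t)


# Values for time steps 1 to 3, rounded to two decimals.
for t in range(1, 4):
    print(t, round(epsilon(t), 2))
```

For t ≥ 0 the result always lies in the interval (0, 1], which is why a random number drawn between 0 and 1 can be compared against it directly.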

At time step 1, the following is executed:

1. Calculate the ε value. Since this is the 1st time step, t = 1, and ε(1) = 0.5 according to the formula ε(t) = 1/(1+t).

2. Generate a random number between 0 and 1; assume the random number generated this time is 0.7.

3. Because the random number 0.7 is greater than ε(1), the network sub-area with the largest average reward R is selected. Since the average reward R of every network sub-area is 0 at this point, any network sub-area may be selected; assume area A₁ is selected.

4. Schedule the network controller to area A₁ and observe the change in network performance. If the network delay falls from 100 ms to 80 ms, the percentage improvement in network performance can be used as the reward, giving P₁₁ = (100-80)/100 = 0.2, where the reward P₁₁ is the target network performance change index of area A₁ at time step 1.

5. Update the average reward of area A₁: the new average reward is R₁ = (0+0.2)/2 = 0.1, where the new average reward R₁ is the updated network performance reward corresponding to area A₁.

At time step 2, the following is executed:

1. Calculate the ε value. Since this is the 2nd time step, t = 2, and ε(2) = 1/(1+2) ≈ 0.33.

2. Generate a random number between 0 and 1; assume the random number generated this time is 0.2.

3. Because the random number 0.2 is smaller than ε(2), a network sub-area is selected at random; assume area A₃ is selected.

4. Move the network controller to area A₃ and observe the change in network performance. Assuming the network delay increases from 80 ms to 100 ms, the reward is P₁₃ = (80-100)/80 = -0.25, where the reward P₁₃ is the target network performance change index of area A₃ at time step 2.

5. Update the average reward of area A₃: the new average reward is R₃ = (0-0.25)/2 = -0.125, where the new average reward R₃ is the updated network performance reward corresponding to area A₃.

At time step 3, the following is executed:

1. Calculate the ε value. Since this is the 3rd time step, t = 3, and ε(3) = 1/(1+3) = 0.25.

2. Generate a random number between 0 and 1; assume the random number generated this time is 0.4.

3. Because the random number 0.4 is greater than ε(3), the network sub-area with the largest average reward R must be selected. Area A₁ is selected at this point because its average reward, 0.1, is larger than the average rewards of areas A₂ and A₃.

4. Move the controller to area A₁ and observe the change in network performance. Assuming the network delay falls from 100 ms to 75 ms, the reward is P₁₂ = (100-75)/100 = 0.25, where the reward P₁₂ is the target network performance change index of area A₁ at time step 3.

5. Update the average reward of area A₁: the new average reward is R₁ = (0.1+0.25)/2 = 0.175, where the new average reward R₁ is the updated network performance reward corresponding to area A₁.

Through the above steps, it can be seen that the ε-greedy strategy strikes a balance between random exploration and selection of the optimal area, adjusts the distribution of controllers according to changes in network performance, and improves network performance through iterative updating and optimization.
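The reward values in the three time steps above are relative changes in network delay. A minimal sketch of that calculation follows; the helper name `latency_reward` is an assumption for this example, and other performance metrics would need their own definition of improvement.

```python
def latency_reward(before_ms, after_ms):
    """Relative latency improvement; positive when the delay drops."""
    return (before_ms - after_ms) / before_ms


print(latency_reward(100, 80))   # time step 1, area A1
print(latency_reward(80, 100))   # time step 2, area A3
print(latency_reward(100, 75))   # time step 3, area A1
```

The three calls reproduce the rewards 0.2, -0.25 and 0.25 used in the example.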

Further, the average reward R is calculated as follows:

Suppose that an area A_i has been selected N_i times and that the rewards obtained on those occasions are P_{i1}, P_{i2}, ..., P_{iN_i}. The average reward R_i of area A_i can then be calculated by formula (1):

$$R_i = \frac{1}{N_i}\sum_{j=1}^{N_i} P_{ij} \qquad (1)$$

If area A_i is selected again at the next time step and a new reward P_{i(N_i+1)} is obtained, the average reward of area A_i at that time step should be calculated according to formula (2):

$$R_{i,\mathrm{new}} = \frac{R_{i,\mathrm{old}} \cdot N_i + P_{i(N_i+1)}}{N_i + 1} \qquad (2)$$

Here, N_i + 1 indicates that area A_i has been selected N_i + 1 times, P_{i(N_i+1)} is the reward obtained by area A_i on the (N_i + 1)-th selection, R_{i,new} is the new average reward of area A_i, and R_{i,old} is the old average reward of area A_i. Formula (2) ensures that each new reward is properly incorporated into the calculation of the average reward R.
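Reading formula (1) as the arithmetic mean of the rewards collected so far and formula (2) as the corresponding incremental update from the old average, the two can be checked against each other with the sketch below. The function names and the reward values are illustrative assumptions.

```python
def batch_mean(rewards):
    """Formula (1) as read here: arithmetic mean of all rewards."""
    return sum(rewards) / len(rewards)


def incremental_mean(old_mean, n_old, new_reward):
    """Formula (2) as read here: fold one new reward into the old mean."""
    return (old_mean * n_old + new_reward) / (n_old + 1)


# Illustrative rewards only, not taken from the application.
rewards = [0.2, 0.25, -0.1]
mean = 0.0
for n, p in enumerate(rewards):
    mean = incremental_mean(mean, n, p)

print(mean, batch_mean(rewards))
```

The incremental form needs only the old average and the selection count, so the full reward history does not have to be stored.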

It should be noted that, in the embodiments of the present application, different time decay functions may be selected according to different requirements. One example is the time decay function ε(t) = e^(-λt), under which ε(t) approaches 0 at an exponential rate over time, decaying faster than ε(t) = 1/(1+t). This means that the system favors exploration in the initial phase, whereas the likelihood of exploration falls rapidly as time passes and the system increasingly favors exploiting the known optimal strategy.
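To make the comparison concrete, the sketch below evaluates both decay functions at a few time steps. The value λ = 0.5 and the function names are assumptions chosen for illustration; the application does not fix a value of λ.

```python
import math


def hyperbolic_decay(t):
    """eps(t) = 1 / (1 + t)."""
    return 1.0 / (1.0 + t)


def exponential_decay(t, lam=0.5):
    """eps(t) = exp(-lam * t); lam = 0.5 is an assumed example value."""
    return math.exp(-lam * t)


for t in (0, 1, 5, 10, 20):
    print(t, round(hyperbolic_decay(t), 4), round(exponential_decay(t), 4))
```

With this choice of λ the exponential factor is the larger of the two at t = 1, so exploration is more likely early on, and it falls below 1/(1+t) within a few steps.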

In some embodiments, the scheduling method of the network controller of the present application specifically further includes, but is not limited to, the following steps:

(1) A time decay factor of the network controller at the current time step is determined.

(2) A random number is generated that lies within the range of values of the time decay factor.

(3) Compare the time decay factor with the random number; if the random number is smaller than the time decay factor, execute step (4) and steps (7) to (14); if the random number is greater than or equal to the time decay factor, execute steps (5) to (14).

(4) Randomly select one network sub-area from the network sub-areas to obtain the target network sub-area.

(5) For each network sub-area, calculate the average of its network state change indexes over its scheduling time steps to obtain the network performance reward corresponding to that network sub-area.

(6) Determine the network sub-area with the largest network performance reward among the network sub-areas to obtain the target network sub-area.

(7) Schedule the network controller to the target network sub-area.

(8) Determine the current network performance of the target network sub-area to obtain the first network performance.

(9) Determine the second network performance.

(10) Determine the target network performance change index of the target network sub-area at the current time step according to the difference between the first network performance and the second network performance.

(11) Calculate the sum of the network state change indexes of the target network sub-area at each of its scheduling time steps to obtain the first index total value.

(12) Obtain the second index total value from the sum of the first index total value corresponding to the target network sub-area and the target network performance change index.

(13) Determine the total number of steps made up of the current time step and each scheduling time step corresponding to the target network sub-area to obtain the total scheduling step number.

(14) Determine the ratio of the second index total value to the total scheduling step number as the updated network performance reward.
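Steps (1) to (14) can be put together as in the following sketch for a single network controller. It is an illustration under stated assumptions, not the implementation of the application: the function `run_scheduler`, the `measure` callback, the fixed seed and the decay function ε(t) = 1/(1+t) are all choices made for this example, and the actual scheduling of the controller is left as a comment.

```python
import random


def run_scheduler(subareas, measure, max_steps, seed=0):
    """Sketch of steps (1)-(14) for one network controller.

    subareas:  list of sub-area names.
    measure:   hypothetical callback measure(area, scheduled) -> float that
               returns a higher-is-better performance value for `area`
               before (scheduled=False) or after (scheduled=True) the move.
    max_steps: termination condition (maximum number of time steps).
    """
    rng = random.Random(seed)
    history = {a: [] for a in subareas}   # change indexes per scheduling step
    rewards = {a: 0.0 for a in subareas}  # network performance rewards
    for t in range(1, max_steps + 1):
        eps = 1.0 / (1.0 + t)                          # (1) time decay factor
        r = rng.random()                               # (2) random number
        if r < eps:                                    # (3) compare
            target = rng.choice(subareas)              # (4) explore
        else:
            target = max(subareas, key=rewards.get)    # (5)-(6) exploit
        second = measure(target, False)                # (9) before scheduling
        # (7) the controller would be scheduled to `target` here
        first = measure(target, True)                  # (8) after scheduling
        index = first - second                         # (10) change index
        second_total = sum(history[target]) + index    # (11)-(12)
        total_steps = len(history[target]) + 1         # (13)
        rewards[target] = second_total / total_steps   # (14)
        history[target].append(index)
    return rewards


# Toy environment with made-up, fixed gains per sub-area.
gains = {"A1": 0.2, "A2": 0.0, "A3": -0.25}


def measure(area, scheduled):
    return gains[area] if scheduled else 0.0


rewards = run_scheduler(list(gains), measure, max_steps=50)
print(rewards)
```

In this toy environment every visit to a sub-area yields the same change index, so each visited sub-area's reward equals its fixed gain and unvisited sub-areas keep a reward of 0.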

It should be noted that this scheme innovatively applies the time-decay-based ε-greedy strategy to the scheduling of the network controller and continuously updates the strategy by observing changes in network state and calculating rewards, thereby achieving dynamic, intelligent scheduling of the network controller. In addition, by incorporating the time-decayed ε-greedy strategy, the embodiments of the present application balance "exploration" (trying new controller distributions) against "exploitation" (continuing to use the current optimal controller distribution). As time passes, the exploration probability ε decays according to the time decay function, giving a smooth transition from exploration to exploitation.

It should be understood that, although the steps in the flowcharts of the embodiments described above are shown in sequence as indicated by the arrows, these steps are not necessarily performed in that order. Unless explicitly stated herein, the execution of these steps is not strictly limited to that order, and they may be executed in other orders. Moreover, at least some of the steps in the flowcharts of the above embodiments may include multiple sub-steps or stages, which are not necessarily performed at the same moment but may be performed at different moments; nor are these sub-steps or stages necessarily performed sequentially, but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.

Based on the same inventive concept, the embodiments of the application also provide a scheduling device of the network controller. The way the device solves the problem is similar to that described for the method above, so for specific limitations in the one or more embodiments of the scheduling device provided below, reference may be made to the limitations of the scheduling method of the network controller above; they are not repeated here.

As shown in fig. 5, an embodiment of the present application provides a scheduling apparatus for a network controller, including:

a time attenuation factor determining module 502, configured to determine a time attenuation factor of the network controller at a current time step; the network controller is used for controlling each network device in the corresponding network subarea; the time decay factor decreases with increasing time steps;

a random number generation module 504, configured to generate a random number within a range of values of the time-decay factor; the value range is determined according to the property of the time attenuation function used in calculating the time attenuation factor;

a parameter comparison module 506, configured to compare the time attenuation factor with a random number;

a network region determining module 508, configured to randomly select one network sub-area from the network sub-areas to obtain a target network sub-area when the random number is smaller than the time attenuation factor, and, when the random number is greater than or equal to the time attenuation factor, to determine a target network sub-area from the network sub-areas according to the network state change indexes respectively corresponding to the network sub-areas;

a controller scheduling module 510, configured to schedule the network controller to the target network sub-area.

The scheduling device of the network controller determines the time attenuation factor of the network controller under the current time step; the network controller is used for controlling each network device in the corresponding network subarea; the time decay factor decreases with increasing time steps; generating a random number in the value range of the time attenuation factor; the value range is determined according to the property of the time attenuation function used in calculating the time attenuation factor; comparing the time attenuation factor with the random number; randomly selecting one network subarea from all network subareas under the condition that the random number is smaller than the time attenuation factor to obtain a target network subarea; under the condition that the random number is larger than or equal to the time attenuation factor, determining a target network subarea from each network subarea according to the network state change indexes respectively corresponding to each network subarea; the network controller is scheduled to the target network sub-area. According to the application, by calculating the time attenuation factor used by the network controller in each time step and automatically executing different scheduling strategies according to the comparison condition between the time attenuation factor and the random number, namely, automatically executing the strategy of randomly selecting the network subarea or automatically executing the strategy of selecting the better target network subarea according to the network performance change index, the application does not need to be manually adjusted, and can better cope with the dynamic change of the network environment, thereby improving the network performance. 
In addition, the application combines the time attenuation strategy, and continuously reduces the time attenuation factor through the increase of the time steps, so that more exploration can be realized in the initial stage, more network controller distribution is tried, and the decision is made as much as possible by utilizing the known knowledge along with the time, thereby further realizing the dynamic optimization of the network controller distribution and further improving the network performance.

In some embodiments, the network state change indexes respectively corresponding to the network sub-areas include the network state change indexes of each network sub-area at its respective scheduling time steps; a scheduling time step is the time step at which a controller scheduling event occurs in the corresponding network sub-area; a controller scheduling event is an event in which the network controller is scheduled to the corresponding network sub-area. The network region determining module 508 is further configured to, when the random number is greater than or equal to the time attenuation factor, calculate for each network sub-area the average of its network state change indexes over its scheduling time steps to obtain the network performance reward corresponding to that network sub-area, and to determine a target network sub-area from the network sub-areas according to the network performance rewards corresponding to the network sub-areas.

In some embodiments, the network region determining module 508 is further configured to determine the network sub-area with the largest network performance reward among the network sub-areas to obtain the target network sub-area.

Each of the modules in the scheduling device of the network controller may be implemented in whole or in part by software, hardware, or a combination thereof. The modules may be embedded in, or independent of, a processor in the computer device in hardware form, or may be stored in a memory of the computer device in software form, so that the processor can call them and execute the operations corresponding to each module.

In some embodiments, a computer device is provided, which may be a server, the internal structure of which may be as shown in fig. 6. The computer device includes a processor, a memory, an Input/Output interface (I/O) and a communication interface. The processor, the memory and the input/output interface are connected through a system bus, and the communication interface is connected to the system bus through the input/output interface. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer programs, and a database. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The database of the computer device is used to store data related to scheduling of the network controller. The input/output interface of the computer device is used to exchange information between the processor and the external device. The communication interface of the computer device is used for communicating with an external terminal through a network connection. The computer program, when executed by a processor, implements the steps in the scheduling method of the network controller described above.

It will be appreciated by those skilled in the art that the structure shown in FIG. 6 is merely a block diagram of some of the structures associated with the present inventive arrangements and is not limiting of the computer device to which the present inventive arrangements may be applied, and that a particular computer device may include more or fewer components than shown, or may combine some of the components, or have a different arrangement of components.

In some embodiments, a computer device is provided, comprising a memory storing a computer program and a processor implementing the steps of the method embodiments described above when the computer program is executed.

In some embodiments, a computer readable storage medium is provided, the computer readable storage medium storing a computer program which, when executed by a processor, implements the steps of the method embodiments described above.

In some embodiments, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the steps of the method embodiments described above.

It should be noted that, the user information (including but not limited to user equipment information, user personal information, etc.) and the data (including but not limited to data for analysis, stored data, presented data, etc.) related to the present application are information and data authorized by the user or sufficiently authorized by each party, and the collection, use and processing of the related data need to comply with the related laws and regulations and standards of the related country and region.

Those skilled in the art will appreciate that implementing all or part of the above described embodiment methods may be accomplished by way of a computer program stored on a non-transitory computer readable storage medium, which when executed, may comprise the steps of the embodiments of the methods described above. Any reference to memory, database, or other medium used in embodiments provided herein may include at least one of non-volatile and volatile memory. The nonvolatile Memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash Memory, optical Memory, high density embedded nonvolatile Memory, resistive random access Memory (ReRAM), magnetic random access Memory (Magnetoresistive Random Access Memory, MRAM), ferroelectric Memory (Ferroelectric Random Access Memory, FRAM), phase change Memory (Phase Change Memory, PCM), graphene Memory, and the like. Volatile memory can include random access memory (Random Access Memory, RAM) or external cache memory, and the like. By way of illustration, and not limitation, RAM can be in the form of a variety of forms, such as static random access memory (Static Random Access Memory, SRAM) or dynamic random access memory (Dynamic Random Access Memory, DRAM), and the like. The databases referred to in the embodiments provided herein may include at least one of a relational database and a non-relational database. The non-relational database may include, but is not limited to, a blockchain-based distributed database, and the like. The processor referred to in the embodiments provided in the present application may be a general-purpose processor, a central processing unit, a graphics processor, a digital signal processor, a programmable logic unit, a data processing logic unit based on quantum computing, or the like, but is not limited thereto.

The technical features of the above embodiments may be arbitrarily combined, and all possible combinations of the technical features in the above embodiments are not described for brevity of description, however, as long as there is no contradiction between the combinations of the technical features, they should be considered as the scope of the description.

The foregoing examples illustrate only a few embodiments of the application and are described in detail herein without thereby limiting the scope of the application. It should be noted that it will be apparent to those skilled in the art that several variations and modifications can be made without departing from the spirit of the application, which are all within the scope of the application. Accordingly, the scope of the application should be assessed as that of the appended claims.

Claims

1. A method for scheduling a network controller, comprising:

comparing the time attenuation factor with the random number;

determining a target network subarea from each network subarea according to the network state change index corresponding to each network subarea when the random number is larger than or equal to the time attenuation factor;

scheduling the network controller to the target network sub-area.

2. The method of claim 1, wherein the network state change indexes respectively corresponding to the network sub-areas include the network state change indexes of each network sub-area at its respective scheduling time steps; a scheduling time step is the time step at which a controller scheduling event occurs in the corresponding network sub-area; a controller scheduling event is an event in which the network controller is scheduled to the corresponding network sub-area;

the determining of a target network sub-area from the network sub-areas according to the network state change indexes respectively corresponding to the network sub-areas when the random number is greater than or equal to the time attenuation factor comprises:

when the random number is greater than or equal to the time attenuation factor, calculating for each network sub-area the average of its network state change indexes over its scheduling time steps to obtain the network performance reward corresponding to that network sub-area;

and determining a target network subarea from the network subareas according to the network performance rewards corresponding to the network subareas.

3. The method of claim 2, wherein the determining the target network sub-area from each of the network sub-areas according to the respective network performance rewards for each of the network sub-areas comprises:

and determining a network subarea with the largest network performance rewards from the network subareas, and obtaining the target network subarea.

4. The method of claim 1, wherein after said scheduling the network controller to the target network sub-area, the method further comprises:

determining the current network performance of the target network subarea to obtain a first network performance;

and updating the network performance rewards corresponding to the target network subareas according to the first network performance to obtain updated network performance rewards.

5. The method of claim 4, wherein updating the network performance rewards corresponding to the target network sub-area according to the first network performance, and obtaining updated network performance rewards, comprises:

determining a second network performance; the second network performance is a network performance of the target network sub-area before the network controller is scheduled to the target network sub-area;

determining a target network performance change index of the target network subarea under the current time step according to the difference between the first network performance and the second network performance;

and updating the network performance rewards corresponding to the target network subareas according to the target network performance change indexes to obtain updated network performance rewards.

6. The method of claim 5, wherein updating the network performance rewards corresponding to the target network sub-area according to the target network performance change index to obtain updated network performance rewards comprises:

calculating the sum of network state change indexes of the target network subarea under each corresponding scheduling time step to obtain a first index total value;

Obtaining a second index total value according to the sum of the first index total value corresponding to the target network subarea and the target network performance change index;

and updating the network performance rewards corresponding to the target network subareas according to the second index total value to obtain updated network performance rewards.

7. The method of claim 6, wherein updating the network performance rewards corresponding to the target network sub-area according to the second index total value to obtain updated network performance rewards comprises:

determining the sum of the step numbers of each scheduling time step corresponding to the current time step and the target network subarea to obtain the total scheduling step number;

and determining the ratio of the second index total value to the total scheduling step number as updated network performance rewards.

8. A scheduling apparatus for a network controller, comprising:

the network region determining module is used for randomly selecting one network sub-region from all the network sub-regions to obtain a target network sub-region under the condition that the random number is smaller than the time attenuation factor; determining a target network subarea from each network subarea according to the network state change index corresponding to each network subarea when the random number is larger than or equal to the time attenuation factor;

9. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any of claims 1 to 7 when the computer program is executed.

10. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 7.