WO2023207663A1 - Traffic scheduling method - Google Patents

Traffic scheduling method

Info

Publication number
WO2023207663A1
Authority
WO
WIPO (PCT)
Prior art keywords
action
combination
traffic
cloud
action combination
Application number
PCT/CN2023/088860
Other languages
French (fr)
Chinese (zh)
Inventor
彭小新
王晓亮
宋扬
薛蹦蹦
于兴兴
李嘉
徐婷婷
程冬旭
Original Assignee
阿里云计算有限公司
阿里巴巴(中国)有限公司
Application filed by 阿里云计算有限公司, 阿里巴巴(中国)有限公司
Publication of WO2023207663A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/01Protocols
    • H04L67/10Protocols in which an application is distributed across nodes in the network
    • H04L67/1001Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers
    • H04L67/1004Server selection for load balancing
    • H04L67/1023Server selection for load balancing based on a hash applied to IP addresses or costs
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14Network analysis or design
    • H04L41/145Network analysis or design involving simulating, designing, planning or modelling of a network

Definitions

  • This specification relates to the field of communication technology, and in particular, to a traffic scheduling method.
  • VNF: Virtual Network Function
  • ECS: Elastic Compute Service
  • this specification provides a traffic scheduling method to solve the deficiencies in related technologies.
  • a traffic scheduling method includes:
  • a computer-readable storage medium on which a computer program is stored, and when the program is executed by a processor, the steps of the method described in the first aspect are implemented.
  • a network function virtualization NFV platform controller, including a memory, a processor, and a computer program stored in the memory and executable on the processor; when the processor executes the program, the steps of the method described in the first aspect are implemented.
  • dynamic, personalized cloud host resource allocation can be achieved for the tenant traffic to be scheduled. Compared with allocating cloud host resources based on preset fixed weights in related technologies, the scheduling decisions for tenant traffic to be scheduled in this specification can be applied to different actual application scenarios in a timely manner, and tenant traffic is guaranteed to be reasonably scheduled to the corresponding cloud hosts, thereby achieving reasonable allocation of cloud host resources.
  • the trained reinforcement learning model can meet the traffic scheduling purpose expected in this specification. For example, by preferentially scheduling the traffic of the tenant currently to be allocated to cloud hosts that have already been allocated other tenants' traffic, the resource utilization of the corresponding cloud hosts can be improved within a safe range, and the total cost of cloud host resources can be reduced. Moreover, since the number of started cloud hosts can be reduced under the same conditions, the basic resource consumption (system operation, heat dissipation, etc.) necessary for running these cloud hosts can be saved and greenhouse gas emissions reduced, which is conducive to achieving the goals of carbon peaking and carbon neutrality at an early date.
  • Figure 1 is a schematic architectural diagram of a traffic scheduling system according to an exemplary embodiment of this specification
  • Figure 2 is a schematic flowchart of a traffic scheduling method according to an exemplary embodiment of this specification
  • Figure 3 is a schematic diagram of cloud host allocation for tenant traffic according to an exemplary embodiment of this specification
  • Figure 4 is a schematic structural diagram of a network function virtualization NFV platform controller according to an exemplary embodiment of this specification
  • Figure 5 is a schematic structural diagram of a traffic scheduling device according to an exemplary embodiment of this specification.
  • although first, second, third, etc. may be used in this specification to describe various information, the information should not be limited to these terms. These terms are only used to distinguish information of the same type from each other.
  • first information may also be called second information, and similarly, the second information may also be called first information.
  • the word "if" as used herein may be interpreted as "when" or "upon" or "in response to determining".
  • NFV: Network Functions Virtualization
  • elephant flows in tenant traffic are usually offloaded using a Virtual Distributed Process Unit (VDPU) to reduce the waste of idle resources.
  • VDPU: Virtual Distributed Process Unit
  • for other tenant traffic, the corresponding scheduling plan is usually obtained based on weighted calculation.
  • due to the uncertainty caused by dynamic changes in tenant traffic, values such as the weight values in the above weighted calculation cannot be adapted to different actual application scenarios in a timely manner. Therefore, this specification proposes the following technical solutions to solve the above problems.
  • FIG. 1 is a schematic architectural diagram of a traffic scheduling system according to an exemplary embodiment of this specification. As shown in Figure 1, a reinforcement learning module 11 and an NFV platform controller 12 may be included.
  • the reinforcement learning (Reinforcement Learning, RL) module 11 is used to receive the environmental data of the corresponding NFV platform and generate corresponding traffic scheduling decisions for the tenant traffic to be scheduled on the NFV platform, so that the NFV platform controller 12 schedules the tenant traffic to be scheduled according to the traffic scheduling decision, and the traffic is then processed by the cloud host indicated by the decision.
  • the reinforcement learning module 11 can use any type of reinforcement learning model to achieve the above functions; as an exemplary embodiment, the reinforcement learning model can be a reinforcement learning model based on a neural network algorithm, such as a deep Q-network (DQN) model.
  • the NFV platform controller 12 is used to schedule traffic for the corresponding NFV platform. After obtaining the environmental data corresponding to the NFV platform, the NFV platform controller 12 can send the environmental data to the reinforcement learning module 11 and, according to the traffic scheduling policy returned by the reinforcement learning module 11, schedule the tenant traffic to be scheduled on the NFV platform to the corresponding cloud hosts.
  • the reinforcement learning module 11 and the NFV platform controller 12 can run on the same electronic device, or they can run on different electronic devices respectively, and this specification does not limit this.
  • the above-mentioned electronic device may be, for example, a physical server including an independent host, or a virtual server hosted by a host cluster, which is not limited in this specification.
  • Figure 2 is a schematic flowchart of a traffic scheduling method according to an exemplary embodiment of this specification. As shown in Figure 2, the method may include the following steps:
  • S201: Determine environmental data, which includes resource demand information of tenant traffic to be scheduled and the actual allocation of cloud host resources.
  • tenant traffic to be scheduled can be preferentially scheduled to cloud hosts already in use, thereby reducing costs by improving cloud host resource utilization on the one hand, and on the other hand leaving the remaining cloud hosts ready to handle unpredictable traffic bursts, equipment failures, and other emergencies.
  • tenant traffic to be scheduled can be preferentially scheduled to cloud hosts with relatively low resource utilization, so that the resource utilization of each cloud host is as equal as possible, thereby balancing their service lives.
  • other scheduling purposes can also be formulated based on actual needs, which will not be detailed here.
  • the resources provided by a cloud host can include various types, such as bandwidth resources, CPU resources, GPU resources, memory resources, etc., which are not limited in this specification.
  • Tenant traffic to be scheduled needs to be processed by one or more of the above types of resources provided by the cloud host, depending on the traffic's processing requirements. For example, when the tenant traffic to be scheduled carries computing data, data calculation needs to be performed through the CPU resources, memory resources, etc. provided by the cloud host; in this case, the cloud host is equivalent to a computing node. For another example, when the tenant traffic to be scheduled carries communication data, it needs to be filtered and forwarded through the bandwidth resources, CPU resources, etc. provided by the cloud host; in this case, the cloud host is equivalent to a network element.
  • by inputting this information into the reinforcement learning model as the above-mentioned environmental data for processing, this specification realizes dynamic and personalized cloud host resource allocation for the tenant traffic to be scheduled. Compared with allocating cloud host resources based on preset fixed weights in related technologies, the scheduling decisions for tenant traffic to be scheduled in this specification can be applied to different actual application scenarios in a timely manner and achieve the scheduling purposes mentioned above.
  • the above tenant traffic to be scheduled can be newly received traffic from a new tenant; that is, the triggering condition of the traffic scheduling method in this specification can be receiving traffic from a new tenant. In other words, when traffic from a new tenant is received, it can be regarded as the above tenant traffic to be scheduled, and S201 and its subsequent steps are entered to achieve reasonable scheduling of this traffic, so that it can be processed by reasonable cloud host resources.
  • the above triggering conditions may also include any of the following: the resource utilization of a preset number of cloud hosts reaches a preset expansion threshold, triggering an expansion demand; the resource utilization of a preset number of cloud hosts falls below a preset shrinkage threshold, triggering a shrinkage demand; and so on.
  • in those cases, tenant traffic that has already been scheduled can be treated as the above tenant traffic to be scheduled and re-scheduled through the technical solution in this specification.
  • S202: Input the environmental data into the reinforcement learning model generated by pre-training, and obtain the actions output by the reinforcement learning model and the score of each action, where each action corresponds to one or more cloud hosts.
  • after the above-mentioned environmental data is determined, it can be input into a reinforcement learning model generated by pre-training.
  • the above-mentioned reinforcement learning model can determine the actions corresponding to the environmental data and the score of each action, where each action represents a set of cloud hosts that can be assigned to process the tenant traffic to be scheduled; the set includes one or more cloud hosts, that is, each action can correspond to one or more different cloud hosts.
  • when there are more alternative cloud hosts, there are relatively more combinations among them, so the number of cloud host sets formed from them is usually relatively larger, and vice versa; the number of actions is therefore positively correlated with the number of alternative cloud hosts.
  • under different scheduling purposes, the scheduling results adopted for the tenant traffic to be scheduled may also differ.
  • the technical solution of this specification will be described below with cloud host cost minimization as an exemplary scheduling purpose.
  • Equation (1) represents the minimum of the total cost of all cloud hosts used for traffic processing, corresponding to the cloud host cost minimization purpose listed above: min Σ_{e∈E} k_e · y_e, where k_e represents the cost of using cloud host e, y_e is a binary variable indicating whether cloud host e is used, and E is the set of all available cloud hosts.
  • the Markov decision process can be described as (S, A, P, R, γ), where S is the state space, A is the action space, P is the state transition function, R is the reward function, and γ is a discount coefficient belonging to the interval [0, 1].
  • the NFV platform controller can determine the above-mentioned environmental data as state information S_t and input it into the above-mentioned reinforcement learning model, thereby obtaining the action A_t output by the model; the NFV platform controller can then perform the corresponding traffic scheduling operation according to action A_t.
  • the NFV platform controller can give the above-mentioned reinforcement learning model a corresponding reward R_t for updating the model, thereby optimizing its decision-making strategy.
  • the goal is to find an optimal strategy that maximizes the cumulative reward. It can be seen that, through the reasonable formulation of the reward function in this specification, the reinforcement learning model can be gradually adjusted, as it optimizes based on rewards, to achieve or approach the scheduling purposes mentioned above; a sketch of this interaction loop follows.
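  • As an illustration only, the interaction loop described above can be sketched in Python as follows; the objects and method names (controller, rl_model, get_environment_data, and so on) are assumptions made for the sketch, not interfaces defined in this specification.

```python
def interaction_loop(controller, rl_model, num_steps):
    """Minimal sketch of the Markov decision loop described above."""
    for _ in range(num_steps):
        s_t = controller.get_environment_data()    # state S_t
        a_t = rl_model.select_action_combo(s_t)    # action A_t
        controller.execute_schedule(a_t)           # schedule traffic per A_t
        r_t = controller.compute_reward(s_t, a_t)  # reward R_t
        rl_model.update(s_t, a_t, r_t)             # optimize the decision strategy
```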
  • the above reinforcement learning model can be constructed using different algorithms according to the actual needs of the NFV platform, and this description does not limit this.
  • the above-mentioned reinforcement learning model may include a DQN model.
  • the DQN model uses a neural network to approximate the Q-value table Q(s, a) of the traditional Q-Learning algorithm, representing it as Q(s, a, θ), where θ denotes the neural network's weight parameters. For example, a multi-layer perceptron (MLP) can be used as the neural network to construct the corresponding Q network.
  • the above score for each action can be the Q value output by the Q network in the DQN model; a sketch of such a network follows.
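  • A minimal sketch of such a Q network, assuming PyTorch; the layer sizes are illustrative, since the specification does not fix the architecture. The input width 3N + 3 matches the state vector described later, and with the single-host simplification described below the output width equals the number of cloud hosts N.

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """MLP Q network: maps a state vector to one Q value per action."""

    def __init__(self, num_hosts: int, hidden: int = 128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3 * num_hosts + 3, hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, num_hosts),  # Q(s, a, theta) for each host
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.mlp(state)
```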
  • S203: Determine the action combination with the highest score, and schedule the tenant traffic to be scheduled to the cloud host corresponding to the action combination with the highest score, where each action combination corresponds to one or more actions.
  • when the above-mentioned reinforcement learning model outputs the actions corresponding to the above-mentioned environmental data and the score of each action, all actions can be permuted and combined, and the action combination with the highest score can be determined based on the scores; the NFV platform controller can then schedule the tenant traffic to be scheduled to the cloud host corresponding to that action combination.
  • reinforcement learning is particularly suitable for sequential decision-making problems in dynamic environments, such as the traffic scheduling problem (or cloud host allocation problem) in this specification.
  • if each action output by the reinforcement learning model represented a complete cloud host allocation plan selected for the tenant traffic to be allocated, the action combination with the highest score in S203 would simply be the single action with the highest score.
  • however, the number of actions formed in that way would be very large (it may exceed millions, for example), bringing great complexity and computation. Therefore, each action can be simplified by limiting the number of cloud hosts it contains, and in S203 the actions are combined to form action combinations.
  • each action combination contains multiple actions, and each action combination can represent a cloud host allocation plan selected for the tenant traffic to be allocated; keeping the total number of actions controllable in this way helps to efficiently determine the highest-scoring action combination.
  • each action can be simplified to include only one cloud host, at which point the total number of actions is minimal and equal to the total number of cloud hosts.
  • the constraints may include a first constraint used to individually constrain a single cloud host, a second constraint used to constrain all cloud hosts corresponding to an action combination as a whole, or both types of constraints at the same time.
  • the first constraint can be used to filter the actions output by the reinforcement learning model; for each filtered-out action, at least one of its corresponding cloud hosts does not satisfy the first constraint.
  • the first constraint may include: the resource proportion of the cloud host to which the tenant traffic is to be scheduled does not exceed a preset resource threshold. For example, suppose there are two actions A and B and cloud hosts 1 and 2, the available bandwidths of the two cloud hosts are a and b respectively, action A includes cloud host 1, action B includes cloud hosts 1 and 2, and the bandwidth demand of the tenant traffic to be scheduled is X, where a < X ≤ a + b.
  • since the bandwidth demand of the tenant traffic cannot exceed the sum of the maximum available bandwidth corresponding to an action, and X exceeds the available bandwidth a of the cloud host corresponding to action A, action A is filtered out based on the bandwidth requirement, leaving only action B.
  • the action combination with the highest score is determined from the alternative action combinations formed by the remaining actions.
  • the action combination with the highest score is the highest-scoring action combination among all alternative action combinations that satisfy the preset second constraint.
  • the second constraint may include at least one of the following conditions: the number of cloud hosts to which the tenant traffic to be scheduled is allocated remains within a preset interval; and the cloud hosts corresponding to the action combination do not completely overlap with the cloud hosts corresponding to already-scheduled tenant traffic.
  • continuing the example, suppose actions C and D, like action B, are not filtered out by the above first constraint, where action C includes cloud hosts 1, 2, and 3, and action D includes cloud hosts 2 and 3.
  • if the preset interval allows at most 2 cloud hosts, action C is filtered out because it corresponds to 3 cloud hosts as a whole.
  • actions B and D then remain; based on the scores given by the reinforcement learning model for actions B and D, the action combination with the highest score is determined, and the tenant traffic to be scheduled is scheduled to the cloud hosts corresponding to that action combination.
  • in one embodiment, the above remaining actions can be permuted and combined to generate the above alternative action combinations; all alternative action combinations are then filtered according to the above second constraint, the filtered alternative action combinations are sorted by the total score of all the actions they contain, and the top-ranked alternative action combination is determined as the action combination with the highest score.
  • in another embodiment, the above remaining actions are first sorted by the score of each action, and the remaining actions with relatively higher scores are preferentially permuted and combined to generate the above alternative action combinations, until a generated alternative action combination satisfies the above second constraint; that alternative action combination is then determined as the action combination with the highest score.
  • in the latter embodiment, the sorting operation precedes the permutation-and-combination operation. Therefore, when determining the action combination with the highest score, it is not necessary to traverse the scores of all action combinations; only the remaining actions with relatively higher scores are preferentially combined, which increases the chance of identifying the highest-scoring action combination early, as sketched below.
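  • A minimal sketch of this greedy variant, assuming the actions surviving the first constraint and their Q values are given; the function and parameter names are illustrative.

```python
from itertools import combinations

def best_combination(actions, q_values, satisfies_second_constraint,
                     max_actions=4):
    """Greedy search for the highest-scoring feasible action combination.

    Actions are tried in descending Q-value order, so the first
    combination that satisfies the second constraint can be taken
    without traversing the scores of all combinations.
    """
    ranked = sorted(actions, key=lambda a: q_values[a], reverse=True)
    for size in range(1, max_actions + 1):
        for combo in combinations(ranked, size):
            if satisfies_second_constraint(combo):
                return combo
    return None  # no combination satisfies the constraints
```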
  • the reinforcement learning model in this specification can be updated according to the rewards corresponding to the output actions, so as to continuously improve the processing effect of the reinforcement learning model and achieve or approximate the required scheduling purpose.
  • the NFV platform controller can store the historical environment data, the highest-scoring historical action combination corresponding to that data, the reward corresponding to that action combination, and the updated historical environment data formed after the action combination is executed, together as a set of historical data in a preset cache area; one or more sets of historical data are then periodically selected at random from the cache area to update the above reinforcement learning model.
  • the reward corresponding to the above historical action combination can be calculated by a preset reward function based on the historical environment data, the preset cloud host price model, and the historical action combination; the reward is negatively correlated with the total cost of the cloud hosts corresponding to the historical action combination, which allows the above reinforcement learning model to give higher scores to action combinations corresponding to cloud hosts with lower total costs.
  • the above-mentioned preset reward function can be used to guide the reinforcement learning algorithm to find the optimal strategy. Then, through the reasonable formulation of the preset reward function, the optimization direction of the reinforcement learning model can be controlled so that it can achieve or approach the required scheduling purpose after continuous updates.
  • when the scheduling purpose is to minimize the total cost of the cloud hosts, the above reward function can be designed as follows: 1. if an action schedules the tenant traffic to be scheduled to an unused cloud host e (an ECS that has not been assigned to-be-scheduled traffic before), the reward for this action is -k_e; 2. if an action schedules the tenant traffic to be scheduled to a used cloud host e, the reward is 0. After the reward is calculated, it can be normalized so that it falls within the interval [-1, 1], to facilitate subsequent update operations of the reinforcement learning model; one possible realization is sketched below.
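  • A minimal sketch of this reward design; the normalization shown (dividing by the largest host cost) is one possible choice, since the specification does not fix a particular scheme, and the container names are illustrative.

```python
def compute_reward(action_hosts, used_hosts, host_cost):
    """Reward for one action under the cost-minimization design.

    Newly started hosts are penalized by their cost k_e; reusing a
    host that already carries tenant traffic contributes 0.
    """
    r = -sum(host_cost[e] for e in action_hosts if e not in used_hosts)
    max_cost = max(host_cost.values())        # assumes host_cost is non-empty
    return max(-1.0, min(1.0, r / max_cost))  # normalize into [-1, 1]
```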
  • this specification models the scheduling of tenant traffic to be scheduled as a multi-stage sequential decision problem suitable for reinforcement learning, adopts a model corresponding to the reinforcement learning algorithm DQN together with a filtering-and-combination algorithm determined by the constraints, trains the reinforcement learning model on historical environment data, and schedules tenant traffic to the highest-scoring cloud host set that meets actual needs.
  • the above-mentioned multi-stage sequence decision-making problem can be analyzed and calculated based on the dynamically changing characteristics of tenant traffic.
  • the scheduling decisions for the tenant traffic to be scheduled in this specification can be applied to different practical application scenarios in a timely manner, so that the NFV platform can achieve the required scheduling purposes.
  • Figure 3 is a schematic diagram of cloud host allocation for tenant traffic according to an exemplary embodiment of this specification.
  • the NFV platform controller cooperates with the reinforcement learning module, allocating corresponding cloud hosts to tenants based on the allocation policy provided by the reinforcement learning module so that these cloud hosts process the tenants' traffic. Specifically:
  • the NFV platform controller can obtain or maintain tenant traffic information, cloud host resource information, reliability policies, price models, scheduling decisions, etc. respectively. Among them, tenant traffic information and cloud host resource information belong to environmental information.
  • the NFV platform controller can provide these environmental data as status information to the environmental assessment module in the reinforcement learning module.
  • the environmental assessment module further inputs the above status information into the reinforcement learning model.
  • the tenant traffic information and cloud host resource information involve the CPU resources, memory resources, and bandwidth resources provided by the cloud hosts.
  • b_i, c_i, and m_i respectively represent the bandwidth resources, CPU resources, and memory resources required by the traffic of tenant i.
  • the state vector S_i represents the i-th state and encapsulates the resource information of all available cloud hosts together with the resource information required by the i-th tenant; its total length can be 3N + 3 (three resource values for each of the N cloud hosts, plus the three resource demands of the tenant).
  • this specification does not require that the above environmental data and the state information S_t be consistent in format.
  • for example, if the above environmental data is in a string format, it can be converted into an equivalent vector format before being input into the above reinforcement learning model.
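  • As an illustration, the state vector can be assembled as in the following sketch; the flat concatenation order is an assumption, since the specification fixes only the total length 3N + 3.

```python
import numpy as np

def build_state(host_resources, tenant_demand):
    """Assemble the state vector S_i of length 3N + 3.

    host_resources: list of N (bandwidth, cpu, memory) triples, one per
    available cloud host (B_e, C_e, M_e).
    tenant_demand: the (b_i, c_i, m_i) triple required by tenant i.
    """
    flat = [r for host in host_resources for r in host]  # 3N values
    return np.asarray(flat + list(tenant_demand), dtype=np.float32)
```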
  • Reinforcement learning models can be built based on the DQN algorithm.
  • after the DQN model obtains the status information S_i from the environmental assessment module, it uses S_i as its input.
  • the neural network included in the DQN model then calculates and outputs the Q value of each action, where each action corresponds to one cloud host.
  • the DQN model passes the calculated Q values of all actions to the filter combination algorithm.
  • the filter combination algorithm can select the action combination adopted this time based on the Q value.
  • the cloud host corresponding to the action combination will be assigned to the tenant.
  • the filtering combination algorithm can be implemented through the constraints defined in this specification, which can be written as follows. For every cloud host e: Σ_{i∈D_e} b_i ≤ U·B_e (2), Σ_{i∈D_e} c_i ≤ U·C_e (3), Σ_{i∈D_e} m_i ≤ U·M_e (4), where D_e denotes the tenant traffic in D allocated to cloud host e. For every tenant i: G_min ≤ G_i ≤ G_max (5). For any two tenants i ≠ j: |E_i ∩ E_j| ≤ O (6), where E_i is the set of cloud hosts used by tenant i.
  • D is the set of tenant traffic to be scheduled
  • U is the cloud host resource threshold
  • G_i is the number of cloud hosts used by tenant i
  • b_i, c_i, m_i are the bandwidth, CPU, and memory resources occupied by tenant i's to-be-scheduled tenant traffic
  • B_e, C_e, and M_e are the bandwidth, CPU, and memory resources of cloud host e.
  • equations (2), (3), and (4) are used to individually constrain a single cloud host. Specifically, they limit the resource capacity of each cloud host: the percentage of resources used by all to-be-scheduled tenant traffic allocated to a cloud host should not exceed the predetermined threshold U, to ensure that each cloud host retains sufficient resources to cope with emergencies such as traffic surges.
  • Equation (5) is used to constrain the number of allocated cloud hosts: G_i indicates the number of cloud hosts used by tenant i. On the one hand, this value should not be less than a fixed minimum value G_min; by allocating the tenant traffic to be scheduled across multiple cloud hosts, high reliability can be achieved. On the other hand, the value should not exceed a fixed maximum value G_max. The values of G_min and G_max can be determined according to the actual needs of the NFV platform. Equation (6) is used to constrain the overlap of cloud hosts: the number of cloud hosts shared between two tenants should not exceed a certain parameter O, whose actual value is determined by the NFV platform administrator. The purpose of incomplete overlap between different tenants is to prevent dangerous traffic of some tenants from affecting the traffic of other tenants, thereby improving system reliability.
  • the filter combination algorithm can first filter out actions that do not satisfy equations (2), (3), and (4), sort the remaining actions according to their Q values, and then try combinations starting from the remaining actions with relatively higher Q values.
  • in this way, alternative action combinations that satisfy the above equation (5) can be obtained in sequence, with each alternative action combination obtained in order of total Q value from high to low; the constraint checks are sketched below.
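  • A minimal sketch of the two checks, assuming per-host (bandwidth, cpu, memory) triples, a given split of the tenant's demand across the hosts of a combination, and sets of host identifiers; all names are illustrative.

```python
def passes_capacity(share, capacity, allocated, U):
    """First constraint (equations (2)-(4)): for each host e receiving a
    share of the tenant's traffic, usage must stay within U times capacity."""
    for e, need in share.items():
        for n, used, cap in zip(need, allocated[e], capacity[e]):
            if used + n > U * cap:
                return False
    return True

def passes_reliability(combo_hosts, g_min, g_max, other_tenants_hosts, overlap_max):
    """Second constraint (equations (5)-(6)): host count within
    [g_min, g_max] and overlap with every other tenant bounded."""
    if not g_min <= len(combo_hosts) <= g_max:
        return False
    return all(len(combo_hosts & hosts) <= overlap_max
               for hosts in other_tenants_hosts)
```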
  • the above constraints can be defined in the reliability policy maintained by the NFV platform controller, and the relevant content of the reliability policy can be provided by the NFV platform controller to the filtering combination algorithm for generating scheduling decisions; for example, the NFV platform controller may indirectly provide the relevant content of the reliability policy to the filtering combination algorithm through the environmental assessment module.
  • the NFV platform controller can update the constraints in the reliability policy based on actual needs.
  • the filtering combination algorithm can transfer relevant information to the NFV platform controller, so that the NFV platform controller can form a scheduling strategy accordingly.
  • for example, if the cloud host allocation plan selects cloud hosts 1 to 4, corresponding to actions A1 to A4, the corresponding scheduling policy can be: dispatch the tenant traffic to cloud hosts 1 to 4 for processing.
  • the price model maintained by the NFV platform controller may be a cloud host price model.
  • the environmental assessment module can calculate the rewards generated by this cloud host allocation based on the cloud host price model, as well as the aforementioned status information, cloud host allocation plan, and preset reward function.
  • the corresponding scheduling scheme can preferentially schedule the traffic of the tenant currently to be allocated to cloud hosts that have already been allocated other tenants' traffic, improving the resource utilization of those cloud hosts within a safe range and thereby reducing the total cost of cloud host resources.
  • since the number of started cloud hosts can be reduced under the same conditions, the basic resource consumption (system operation, heat dissipation, etc.) necessary for running these cloud hosts can be saved and greenhouse gas emissions reduced, which is conducive to achieving the goals of carbon peaking and carbon neutrality at an early date.
  • the reinforcement learning model can maintain a buffered data set.
  • the above status information, the cloud host allocation plan, the reward, and the subsequent status information formed after executing the cloud host allocation plan can be recorded as a set of data in the above buffered data set.
  • the buffer data set is used to save each set of data formed by each allocation.
  • the reinforcement learning model can periodically select one or more sets of data from the buffered data set for its own model update; the selection can be random. A sketch of such a buffer follows.
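  • A minimal sketch of such a buffered data set, with uniform random sampling; the capacity and batch size are illustrative choices.

```python
import random
from collections import deque

class ReplayBuffer:
    """Buffered data set of (state, action_combo, reward, next_state) tuples."""

    def __init__(self, capacity: int = 10000):
        self.data = deque(maxlen=capacity)  # oldest entries drop out first

    def record(self, state, action_combo, reward, next_state):
        self.data.append((state, action_combo, reward, next_state))

    def sample(self, batch_size: int = 32):
        return random.sample(list(self.data), min(batch_size, len(self.data)))
```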
  • the above embodiment describes the inference process performed by the reinforcement learning model after training is complete.
  • this inference process enables the reinforcement learning module to provide the NFV platform controller with a scheduling strategy that meets the scheduling purpose. The above scheduling strategy is essentially the result output by the above reinforcement learning model after the resource demand information of the tenant traffic to be scheduled and the actual allocation of cloud host resources are input into it. Therefore, compared with allocating cloud host resources based on preset fixed weights in related technologies, the scheduling strategies obtained in this specification can be applied to different practical application scenarios in a timely manner.
  • before that, the reinforcement learning model needs to be trained iteratively to ensure that the scheduling policy it outputs can meet the scheduling purpose.
  • the training process for the reinforcement learning model is similar to the above inference process and includes: inputting the environmental data in the training samples into the reinforcement learning model, obtaining the actions and scores output by the model, and determining the action combination with the highest score; based on the highest-scoring action combination, it can then be determined whether the preset scheduling purpose has been achieved. If it has not, iterative training of the reinforcement learning model continues until the model can meet the scheduling purpose. Of course, whether to continue iterative training can also be determined based on other conditions, such as whether the number of training iterations has reached a preset number.
  • during training, each group of data cached in the above buffered data set can still be used: by periodically selecting one or more groups of data at random, the parameters of the deep neural network used in the reinforcement learning model are updated, which helps to overcome the correlation and non-stationary distribution of the experience data; one update step is sketched below.
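  • A minimal sketch of one such update step, assuming PyTorch and the common temporal-difference target with a separate target network (a standard DQN choice, not one mandated by this specification); each batch entry is (state, action index, reward, next state), with states already encoded as vectors (for example by build_state above).

```python
import torch
import torch.nn.functional as F

def dqn_update(q_net, target_net, optimizer, batch, gamma=0.99):
    """One DQN update on a batch sampled from the buffered data set."""
    states, actions, rewards, next_states = zip(*batch)
    s = torch.stack([torch.as_tensor(x) for x in states])
    a = torch.tensor(actions, dtype=torch.int64).unsqueeze(1)
    r = torch.tensor(rewards, dtype=torch.float32)
    s2 = torch.stack([torch.as_tensor(x) for x in next_states])

    q = q_net(s).gather(1, a).squeeze(1)  # Q(s, a, theta) for taken actions
    with torch.no_grad():                 # TD target from the frozen network
        target = r + gamma * target_net(s2).max(dim=1).values
    loss = F.smooth_l1_loss(q, target)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```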
  • Figure 4 is a schematic structural diagram of a network function virtualization NFV platform controller according to an exemplary embodiment.
  • the NFV platform controller includes a processor, internal bus, network interface, memory and non-volatile storage, and of course may include other required hardware.
  • the processor reads the corresponding computer program from the non-volatile storage into the memory and then runs it, forming a traffic scheduling device at the logical level.
  • this specification does not exclude other implementation methods, such as logic devices or a combination of software and hardware; that is, the execution subject of the following processing flow is not limited to logical units, and may also be a hardware or logic device.
  • this specification also provides an embodiment of a traffic scheduling device.
  • Figure 5 is a schematic structural diagram of a traffic scheduling device according to an exemplary embodiment.
  • the device may include:
  • the environment data determination unit 501 is used to determine the environment data, which includes the resource demand information of the tenant traffic to be scheduled and the actual allocation of cloud host resources;
  • the reinforcement learning model processing unit 502 is used to input the environmental data into the reinforcement learning model generated by pre-training, and obtain the actions output by the reinforcement learning model and the score of each action, where each action corresponds to one or more cloud hosts;
  • the traffic scheduling unit 503 is configured to determine the action combination with the highest score, and schedule the tenant traffic to be scheduled to the cloud host corresponding to the action combination with the highest score; wherein each action combination corresponds to one or more actions.
  • the reinforcement learning model includes a deep Q network model, and the score of each action output by the reinforcement learning model is the Q value of the corresponding action.
  • the device also includes:
  • the action constraint unit 504 is used to filter the actions output by the reinforcement learning model according to a preset first constraint, where the first constraint is used to individually constrain a single cloud host, and at least one cloud host corresponding to each filtered-out action does not satisfy the first constraint;
  • the action combination with the highest score is then determined from the alternative action combinations formed by the remaining actions.
  • the action combination with the highest score is the highest-scoring action combination among all alternative action combinations that satisfy a preset second constraint, where the second constraint is used to impose overall constraints on all cloud hosts corresponding to an action combination.
  • the action constraint unit 504 is specifically used to:
  • sort the remaining actions according to the score of each action, and preferentially permute and combine the remaining actions with relatively higher scores to generate the alternative action combinations, until a generated alternative action combination meets the second constraint, and determine that alternative action combination as the action combination with the highest score; or,
  • permute and combine the remaining actions to generate the alternative action combinations, filter all alternative action combinations according to the second constraint, sort the filtered alternative action combinations by the total score of all the actions they contain, and determine the top-ranked alternative action combination as the action combination with the highest score.
  • the first constraint includes: the resource proportion of the cloud host to which the tenant traffic is to be scheduled does not exceed a preset resource threshold;
  • the second constraint includes at least one of the following: the number of cloud hosts to which the tenant traffic to be scheduled is allocated remains within a preset interval; and the cloud hosts corresponding to the action combination do not completely overlap with the cloud hosts corresponding to already-scheduled tenant traffic.
  • the device also includes:
  • the reinforcement learning model update unit 505 is used to randomly select one or more sets of historical data from the preset cache area to update the reinforcement learning model;
  • Each set of historical data includes: historical environment data, the historical action combination with the highest score corresponding to the historical environment data, the reward corresponding to the historical action combination, and the updated historical environment data formed after the execution of the historical action combination.
  • the reward corresponding to the historical action combination is calculated by a preset reward function based on the historical environment data, the preset cloud host price model, and the historical action combination, where the reward is negatively correlated with the total cost of the cloud hosts corresponding to the historical action combination.
  • the device is executed when any of the following trigger conditions is met:
  • the resource utilization of a preset number of cloud hosts reaches a preset expansion threshold, triggering an expansion demand; or
  • the resource utilization of a preset number of cloud hosts falls below a preset shrinkage threshold, triggering a shrinkage demand.
  • since the device embodiment basically corresponds to the method embodiment, reference may be made to the partial description of the method embodiment for relevant details.
  • the device embodiments described above are only illustrative.
  • the units described as separate components may or may not be physically separated.
  • the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the solution in this specification. Persons of ordinary skill in the art can understand and implement it without creative effort.
  • Embodiments of the subject matter and functional operations described in this specification may be implemented in digital electronic circuits, tangibly embodied computer software or firmware, computer hardware including the structures disclosed in this specification and their structural equivalents, or in a combination of one or more of them.
  • Embodiments of the subject matter described in this specification may be implemented as one or more computer programs, i.e., one or more modules of computer program instructions encoded on a tangible non-transitory program carrier for execution by, or to control the operation of, data processing apparatus.
  • Alternatively or additionally, the program instructions may be encoded on an artificially generated propagated signal, such as a machine-generated electrical, optical, or electromagnetic signal, which is generated to encode information for transmission to a suitable receiver device for execution by the data processing apparatus.
  • Computer storage media may be a machine-readable storage device, a machine-readable storage substrate, a random or serial access memory device, or a combination of one or more of them.
  • the processes and logic flows described in this specification may be performed by one or more programmable computers executing one or more computer programs to perform corresponding functions by operating on input data and generating output.
  • the processes and logic flows may also be performed by dedicated logic circuits, such as FPGA (Field Programmable Gate Array) or ASIC (Application Specific Integrated Circuit), and the device may also be implemented as a dedicated logic circuit.
  • Computers suitable for executing the computer program include, for example, general and/or special purpose microprocessors, or any other type of central processing unit.
  • the central processing unit will receive instructions and data from read-only memory and/or random access memory.
  • the basic components of a computer include a central processing unit for implementing or executing instructions and one or more memory devices for storing instructions and data.
  • the computer will also include one or more mass storage devices for storing data, such as magnetic disks, magneto-optical disks, or optical disks, or the computer will be operably coupled to such a mass storage device to receive data from it, transmit data to it, or both.
  • however, the computer is not required to have such a device.
  • the computer may be embedded in another device, such as a mobile phone, a personal digital assistant (PDA), a mobile audio or video player, a game console, a global positioning system (GPS) receiver, or a portable storage device such as a universal serial bus (USB) flash drive, to name a few.
  • Computer-readable media suitable for storing computer program instructions and data include all forms of non-volatile memory, media, and memory devices, including, for example, semiconductor memory devices (such as EPROM, EEPROM, and flash memory devices), magnetic disks (such as internal hard disks or removable disks), magneto-optical disks, and CD-ROM and DVD-ROM disks.
  • the processor and memory may be supplemented by or incorporated into special purpose logic circuitry.
  • a claimed combination may be directed to a subcombination or a variation of a subcombination.

Abstract

The present description provides a traffic scheduling method. The method comprises: determining environmental data, the environmental data comprising resource requirement information of tenant traffic to be scheduled and the actual allocation of cloud host resources; inputting the environmental data into a reinforcement learning model generated by pre-training, and obtaining actions output by the reinforcement learning model and a score of each action, wherein each action corresponds to one or more cloud hosts; and determining the action combination having the highest score, and scheduling said tenant traffic to the cloud host corresponding to the action combination having the highest score, wherein each action combination corresponds to one or more actions.

Description

Traffic scheduling method
This application claims priority to the Chinese patent application filed with the China Patent Office on April 29, 2022, with application number 202210476232.9 and entitled "Traffic Scheduling Method", the entire content of which is incorporated into this application by reference.
Technical field
This specification relates to the field of communication technology, and in particular, to a traffic scheduling method.
Background
With the continuous development of cloud technology, virtual network functions (Virtual Network Function, VNF) have begun to be widely used in modern cloud networks. With the rapid increase in enterprises and organizations using cloud services, there are stricter requirements for the scalability, deployment speed, and performance of network functions, which server-based implementations of virtual network functions can hardly meet. In order to deploy and deliver network functions in a more elastic manner, deploying virtual network functions on the cloud hosts (Elastic Compute Service, ECS) provided by cloud networks has become a suitable solution. This cloud-native design is easy to manage, supplies resources on demand, and allocates resources elastically; user network functions can be conveniently deployed on cloud hosts while maintaining high availability.
Summary of the invention
In view of this, this specification provides a traffic scheduling method to solve the deficiencies in related technologies.
Specifically, this specification is implemented through the following technical solutions:
According to a first aspect of the embodiments of this specification, a traffic scheduling method is provided, and the method includes:
determining environmental data, where the environmental data includes resource demand information of tenant traffic to be scheduled and the actual allocation of cloud host resources;
inputting the environmental data into a reinforcement learning model generated by pre-training, and obtaining the actions output by the reinforcement learning model and the score of each action, where each action corresponds to one or more cloud hosts; and
determining the action combination with the highest score, and scheduling the tenant traffic to be scheduled to the cloud host corresponding to the action combination with the highest score, where each action combination corresponds to one or more actions.
According to a second aspect of the embodiments of this specification, a computer-readable storage medium is provided, on which a computer program is stored; when the program is executed by a processor, the steps of the method described in the first aspect are implemented.
According to a third aspect of the embodiments of this specification, a network function virtualization (NFV) platform controller is provided, including a memory, a processor, and a computer program stored in the memory and executable on the processor; when the processor executes the program, the steps of the method described in the first aspect are implemented.
In the technical solution provided in this specification, by inputting the resource demand information of the tenant traffic to be scheduled and the actual allocation of cloud host resources into the reinforcement learning model as environmental data, dynamic and personalized cloud host resource allocation can be achieved for that traffic. Compared with allocating cloud host resources based on preset fixed weights in related technologies, the scheduling decisions for tenant traffic to be scheduled in this specification can be applied to different actual application scenarios in a timely manner, and tenant traffic is guaranteed to be reasonably scheduled to the corresponding cloud hosts, thereby achieving reasonable allocation of cloud host resources.
By training the reinforcement learning model, the trained model can meet the traffic scheduling purpose expected in this specification. For example, by preferentially scheduling the traffic of the tenant currently to be allocated to cloud hosts that have already been allocated other tenants' traffic, the resource utilization of the corresponding cloud hosts can be improved within a safe range, and the total cost of cloud host resources can be reduced. Moreover, since the number of started cloud hosts can be reduced under the same conditions, the basic resource consumption (system operation, heat dissipation, etc.) necessary for running these cloud hosts can be saved and greenhouse gas emissions reduced, which is conducive to achieving the goals of carbon peaking and carbon neutrality at an early date.
It should be understood that the above general description and the following detailed description are only exemplary and explanatory, and do not limit this specification.
Brief description of the drawings
In order to explain the embodiments of this specification or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some of the embodiments recorded in this specification; for those of ordinary skill in the art, other drawings can also be obtained based on these drawings.
Figure 1 is a schematic architectural diagram of a traffic scheduling system according to an exemplary embodiment of this specification;
Figure 2 is a schematic flowchart of a traffic scheduling method according to an exemplary embodiment of this specification;
Figure 3 is a schematic diagram of cloud host allocation for tenant traffic according to an exemplary embodiment of this specification;
Figure 4 is a schematic structural diagram of a network function virtualization NFV platform controller according to an exemplary embodiment of this specification;
Figure 5 is a schematic structural diagram of a traffic scheduling device according to an exemplary embodiment of this specification.
Detailed description
Exemplary embodiments will be described in detail here, examples of which are illustrated in the accompanying drawings. When the following description refers to the drawings, the same numbers in different drawings refer to the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with this specification; rather, they are merely examples of apparatus and methods consistent with some aspects of this specification, as detailed in the appended claims.
The terminology used in this specification is for the purpose of describing particular embodiments only and is not intended to limit this specification. The singular forms "a", "said", and "the" used in this specification and the appended claims are also intended to include the plural forms, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and includes any or all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, third, etc. may be used in this specification to describe various information, the information should not be limited to these terms. These terms are only used to distinguish information of the same type from each other. For example, without departing from the scope of this specification, first information may also be called second information, and similarly, second information may also be called first information. Depending on the context, the word "if" as used herein may be interpreted as "when" or "upon" or "in response to determining".
In research on related technologies, the applicant found that service providers usually manage the deployment and traffic scheduling of many different network functions from tenants (i.e., the above-mentioned enterprises and organizations using cloud services) based on a Network Functions Virtualization (NFV) management and control platform. Among tenant traffic, elephant flows are usually offloaded using a Virtual Distributed Process Unit (VDPU) to reduce the waste of idle resources. For other tenant traffic, the corresponding scheduling plan is usually obtained based on weighted calculation. However, due to the uncertainty caused by dynamic changes in tenant traffic, values such as the weight values in the above weighted calculation cannot be adapted to different actual application scenarios in a timely manner. Therefore, this specification proposes the following technical solutions to solve the above problems.
图1是本说明书一示例性实施例示出的流量调度系统的架构示意图。如图1所示,可以包括强化学习模块11和NFV平台控制器12。Figure 1 is a schematic architectural diagram of a traffic scheduling system according to an exemplary embodiment of this specification. As shown in Figure 1, a reinforcement learning module 11 and an NFV platform controller 12 may be included.
强化学习(Reinforcement Learning,RL)模块11用于接收对应的NFV平台的环境数据,并针对上述NFV平台中的待调度租户流量生成相应的流量调度决策,使NFV平台控制器12根据该流量调度决策对待调度租户流量进行调度,从而通过决策所指示的云主机对待调度租户流量进行处理。其中,强化学习模块11可以采用任意类型的强化学习模型来实现上述功能;作为一示例性实施例,该强化学习模型可以为基于神经网络算法的强化学习模型,比如深度Q网络(Deep Q-Learning,DQN)模型等。The reinforcement learning (Reinforcement Learning, RL) module 11 is used to receive the environmental data of the corresponding NFV platform, and generate corresponding traffic scheduling decisions for the tenant traffic to be scheduled in the above-mentioned NFV platform, so that the NFV platform controller 12 makes the traffic scheduling decisions according to the traffic scheduling decisions. The unscheduled tenant traffic is scheduled so that the unscheduled tenant traffic is processed by the cloud host indicated by the decision. Among them, the reinforcement learning module 11 can use any type of reinforcement learning model to achieve the above functions; as an exemplary embodiment, the reinforcement learning model can be a reinforcement learning model based on a neural network algorithm, such as Deep Q-Learning (Deep Q-Learning). , DQN) model, etc.
The NFV platform controller 12 is configured to perform traffic scheduling for the corresponding NFV platform. After obtaining the environment data corresponding to the NFV platform, the NFV platform controller 12 may send the environment data to the reinforcement learning module 11 and, according to the traffic scheduling policy returned by the reinforcement learning module 11, schedule the tenant traffic to be scheduled on the NFV platform to the corresponding cloud hosts. The reinforcement learning module 11 and the NFV platform controller 12 may run on the same electronic device, or may run on different electronic devices; this specification places no limitation on this. The electronic device may be, for example, a physical server comprising an independent host, or a virtual server carried by a host cluster, which is likewise not limited by this specification.
The technical solution of this specification is described below with reference to the embodiment shown in Figure 2. Figure 2 is a schematic flowchart of a traffic scheduling method according to an exemplary embodiment of this specification. As shown in Figure 2, the method may include the following steps:
S201: determine environment data, where the environment data includes resource demand information of the tenant traffic to be scheduled and the actual allocation of cloud host resources.
In the technical solution of this specification, by determining the resource demand information of the tenant traffic to be scheduled and the actual allocation of cloud host resources, a reasonable match between tenant traffic and cloud hosts can be achieved, thereby attaining the desired scheduling purpose. For example, for effective control of cloud host operating costs and stability of traffic processing, the tenant traffic to be scheduled may be preferentially scheduled to cloud hosts already in use; this reduces costs by raising the resource utilization of those cloud hosts, while the remaining cloud hosts stay ready to handle unpredictable traffic bursts, device failures, and other emergencies at any time. As another example, to balance the service life of all cloud hosts, the tenant traffic to be scheduled may be preferentially scheduled to cloud hosts with relatively low resource utilization, so that the resource utilization of the cloud hosts is kept as equal as possible. Similarly, other scheduling purposes may be formulated based on actual needs, and these are not enumerated here.
The resources provided by a cloud host may include multiple types, such as bandwidth resources, CPU resources, GPU resources, and memory resources, which this specification does not limit. The tenant traffic to be scheduled needs to be processed by one or more of the above resource types provided by the cloud host, depending on the processing requirements of the traffic. For example, when the tenant traffic to be scheduled carries computing data, data computation needs to be performed with the CPU resources, memory resources, and the like provided by the cloud host, in which case the cloud host acts as a computing node; as another example, when the tenant traffic to be scheduled carries communication data, it needs to be filtered and forwarded using the bandwidth resources, CPU resources, and the like provided by the cloud host, in which case the cloud host acts as a network element.
Because the resource demand information of the tenant traffic to be scheduled and the actual allocation of cloud host resources are highly real-time in nature, this specification inputs them as the above environment data into the reinforcement learning model for processing, so that dynamic and personalized cloud host resource allocation can be realized for the tenant traffic to be scheduled. Compared with allocating cloud host resources based on preset fixed weights in the related art, the scheduling decisions for the tenant traffic to be scheduled in this specification can be adapted to different practical application scenarios in a timely manner and achieve the scheduling purposes described above.
The tenant traffic to be scheduled may be traffic received from a new tenant; that is, the triggering condition of the traffic scheduling method of this specification may be the reception of traffic from a new tenant. In other words, upon receiving traffic from a new tenant, that traffic may be taken as the tenant traffic to be scheduled, and S201 and its subsequent steps are entered to achieve reasonable scheduling of the traffic, so that it is processed with reasonable cloud host resources. Of course, the triggering condition may also include either of the following: the resource usage of a preset number of cloud hosts reaches a preset scale-out threshold, triggering a scale-out demand; or the resource utilization of a preset number of cloud hosts falls below a preset scale-in threshold, triggering a scale-in demand. In the event of scaling out or scaling in, at least part of the tenant traffic that has already been scheduled may be taken as the tenant traffic to be scheduled and rescheduled through the technical solution of this specification.
S202: input the environment data into a pre-trained reinforcement learning model, and obtain the actions output by the reinforcement learning model and the score of each action, where each action corresponds to one or more cloud hosts.
After the environment data is determined, it may be input into the pre-trained reinforcement learning model, and the reinforcement learning model may determine the actions corresponding to the environment data and the score of each action. Each action represents a set of cloud hosts that may be allocated to process the tenant traffic to be scheduled, and this set contains one or more cloud hosts; that is, each action may correspond to one or more different cloud hosts. The larger the number of candidate cloud hosts, the more combinations exist among them, and hence the larger the number of cloud host sets they can form; the converse also holds, so the number of actions is positively correlated with the number of candidate cloud hosts.
As mentioned above, because scheduling objectives differ, the scheduling results adopted for the tenant traffic to be scheduled may also differ. For ease of understanding, the technical solution of this specification is described below with cloud host cost minimization as an exemplary scheduling purpose.
Equation (1) expresses the minimization of the total cost of all cloud hosts used for traffic processing, which corresponds to the exemplary scheduling purpose of cloud host cost minimization listed above:

$$\min \sum_{e \in E} k_e\, y_e \tag{1}$$

where $k_e$ denotes the cost of using cloud host $e$, $y_e$ is a binary variable indicating whether cloud host $e$ is used, and $E$ is the set of all available cloud hosts.
In a multi-tenant scenario, when the traffic of each tenant is scheduled, it is easy to see that once any tenant's traffic is scheduled to the corresponding cloud hosts for processing, the actual allocation of cloud host resources changes; each tenant's traffic scheduling plan is therefore affected by the scheduling plans of all preceding tenants. The technical solution of this specification thus addresses a sequential decision problem involving multi-stage processing, which can be described as a Markov Decision Process (MDP); accordingly, this specification solves the above traffic scheduling problem with a reinforcement learning model based on the Markov decision process.
A Markov decision process can be described as a tuple $(S, A, P, R, \gamma)$, where $S$ is the state space, $A$ is the action space, $P$ is the state transition function, $R$ is the reward function, and $\gamma$ is a discount factor taking a value in the interval $[0, 1]$. In this specification, the NFV platform controller may determine the above environment data as state information $S_t$ and input it into the reinforcement learning model, then obtain the action $A_t$ output by the reinforcement learning model, and perform the corresponding traffic scheduling operation according to the action $A_t$. Further, the NFV platform controller may give the reinforcement learning model a corresponding reward $R_t$ for updating the model and thereby optimizing its decision policy; the goal is to find an optimal policy that maximizes the cumulative reward. By formulating the reward function appropriately, the reinforcement learning model, after being optimized on the basis of rewards, can be adjusted step by step to reach or approach the scheduling purpose described above.
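To make this interaction concrete, the following is a minimal sketch (in Python, not part of the original disclosure) of the controller-model-environment loop of the MDP; the `env` and `policy` interfaces are assumptions introduced purely for illustration.

```python
def scheduling_episode(env, policy, gamma=0.9):
    """Sketch of one episode of the (S, A, P, R, gamma) process above:
    observe state S_t, let the model pick action A_t, apply it to the
    environment to obtain reward R_t and the next state, and accumulate
    the discounted cumulative reward that the policy tries to maximize."""
    state = env.reset()
    cumulative, discount, done = 0.0, 1.0, False
    while not done:
        action = policy(state)                   # A_t from the model
        state, reward, done = env.step(action)   # transition P and reward R
        cumulative += discount * reward
        discount *= gamma                        # apply the discount factor
    return cumulative
```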
The reinforcement learning model may be built with different algorithms according to the actual needs of the NFV platform, which this specification does not limit. In an embodiment, the reinforcement learning model may include a DQN model. The DQN model uses a neural network to approximate the Q-value table $Q(s, a)$ of the traditional Q-learning algorithm, turning it into $Q(s, a, \theta)$ represented by the neural network weight parameters $\theta$; a multi-layer perceptron (MLP) may be used to build the corresponding Q-network. Correspondingly, when a DQN model is used as the reinforcement learning model of this specification, the score of each action may be the Q value output by the Q-network of the DQN model.
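As a hedged illustration of such an MLP-based Q-network, a minimal PyTorch sketch could look as follows; the layer sizes and class name are assumptions and do not come from the original disclosure.

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Minimal MLP approximating Q(s, a, theta): it maps a state vector
    to one Q value per action (here, one action per cloud host)."""
    def __init__(self, state_dim: int, num_actions: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, num_actions),  # one Q value per action
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)
```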
S203: determine the action combination with the highest score, and schedule the tenant traffic to be scheduled to the cloud hosts corresponding to the highest-scoring action combination, where each action combination corresponds to one or more actions.
After the reinforcement learning model outputs the actions corresponding to the environment data and the score of each action, all actions may be permuted and combined, the action combination with the highest score may be determined from the scores, and the NFV platform controller may then schedule the tenant traffic to be scheduled to the cloud hosts corresponding to the highest-scoring action combination.
As mentioned above, reinforcement learning is particularly suitable for sequential decision problems in dynamic environments, such as the traffic scheduling problem (also called the cloud host allocation problem) of this specification. Each action output by the reinforcement learning model can represent one cloud host allocation plan selected for the tenant traffic to be allocated, in which case the highest-scoring action combination in S203 is in fact simply the highest-scoring action. However, if the total number of cloud hosts is large, the number of possible actions becomes enormous, possibly exceeding a million, which would bring great complexity and computational load. Therefore, the number of cloud hosts contained in each action can be reduced, and in S203 the actions are combined into action combinations, each containing multiple actions; each action combination then represents one cloud host allocation plan selected for the tenant traffic to be allocated. This keeps the total number of actions controllable and helps determine the highest-scoring action combination efficiently. For example, each action can be simplified to contain only one cloud host, in which case the total number of actions is minimal and equal to the total number of cloud hosts.
For considerations such as cloud host operational stability and system reliability, in addition to selecting action combinations by score, it is also necessary to ensure that the cloud hosts involved in the allocation plan satisfy predefined constraints. The constraints may include a first constraint that individually constrains a single cloud host, a second constraint that constrains all cloud hosts corresponding to an action combination as a whole, or both types of constraint at the same time.
The first constraint may be used to filter the actions output by the reinforcement learning model, where at least one cloud host corresponding to each filtered-out action fails to satisfy the first constraint. The first constraint may include: the resource proportion of the cloud hosts to which the tenant traffic to be scheduled is assigned does not exceed a preset resource threshold. For example, suppose there are two actions A and B and cloud hosts 1 and 2, where the available bandwidths of the two cloud hosts are a and b respectively, action A contains cloud host 1, action B contains cloud hosts 1 and 2, and the bandwidth demand of the tenant traffic to be scheduled is X with a < X < a + b. If the first constraint includes the requirement that the bandwidth demand of the tenant traffic to be scheduled must not exceed the sum of the maximum available bandwidths corresponding to an action, then action A is filtered out because the bandwidth a available from its cloud host 1 is less than the bandwidth demand of the tenant traffic to be scheduled, leaving only action B.
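A minimal sketch of this first-constraint bandwidth check, mirroring the worked example above; the function name, the data layout, and the concrete numbers are illustrative assumptions.

```python
def passes_first_constraint(action_hosts, demand_bw, avail_bw):
    """Keep an action only if the total bandwidth still available on its
    cloud hosts can cover the demand of the tenant traffic to be scheduled.

    action_hosts: list of host ids contained in this action
    demand_bw:    bandwidth required by the tenant traffic
    avail_bw:     dict mapping host id -> remaining available bandwidth
    """
    return sum(avail_bw[e] for e in action_hosts) >= demand_bw

# The example above: hosts 1 and 2 offer bandwidths a and b, and the
# demand X satisfies a < X < a + b (numbers assumed for illustration).
a, b, X = 5.0, 8.0, 10.0
avail = {1: a, 2: b}
print(passes_first_constraint([1], X, avail))     # action A -> False
print(passes_first_constraint([1, 2], X, avail))  # action B -> True
```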
The second constraint may be used to constrain all cloud hosts corresponding to an action combination as a whole. In the scheme described above, the highest-scoring action combination is determined from the candidate action combinations formed by the remaining actions; the highest-scoring action combination is the candidate combination with the highest score among all candidate combinations that satisfy the preset second constraint. The second constraint may include at least one of the following conditions: the number of cloud hosts to which the tenant traffic to be scheduled is assigned remains within a preset interval; and the cloud hosts corresponding to the action combination do not completely overlap with the cloud hosts corresponding to already-scheduled tenant traffic. Continuing the preceding example, suppose actions C and D, like action B, are not filtered out by the first constraint, where action C contains cloud hosts 1, 2, and 3, and action D contains cloud hosts 2 and 3. If the second constraint includes the requirement that the number of cloud hosts to which the tenant traffic to be scheduled is assigned must not exceed 2, then action C is filtered out because it corresponds to more than 2 cloud hosts overall. Only actions B and D then remain, so the highest-scoring action combination is determined from the scores the reinforcement learning model gives to actions B and D, and the tenant traffic to be scheduled is scheduled to the cloud hosts corresponding to that highest-scoring combination.
In an embodiment, the remaining actions may be permuted and combined to generate the candidate action combinations; all candidate action combinations are then screened according to the second constraint, the screened candidate combinations are sorted by the total score of all the actions they contain, and the candidate combination ranked first is determined to be the highest-scoring action combination.
In another embodiment, the remaining actions are sorted by the score of each action, and the remaining actions with relatively higher scores are permuted and combined first to generate candidate action combinations, until a generated candidate combination satisfies the second constraint; that candidate combination is then determined to be the highest-scoring action combination. Those skilled in the art will appreciate that sorting the same large-scale data is clearly more efficient than enumerating its permutations and combinations, and that in this embodiment, unlike the previous one, the sorting operation precedes the combination operation. Determining the highest-scoring action combination here therefore does not require traversing the scores of all action combinations; only the remaining actions with relatively higher scores are combined first, which increases the chance of determining the highest-scoring action combination early.
As mentioned above, the reinforcement learning model of this specification can be updated according to the rewards corresponding to the output actions, so as to continuously improve its processing effect and reach or approach the desired scheduling purpose. In an embodiment, the NFV platform controller may store the historical environment data, the highest-scoring historical action combination corresponding to that data, the reward corresponding to that historical action combination, and the updated historical environment data formed after the historical action combination is executed into a preset buffer, and periodically select one or more groups of historical data at random from the preset buffer to update the reinforcement learning model. The reward corresponding to a historical action combination may be calculated by a preset reward function from the historical environment data, a preset cloud host price model, and the historical action combination, where the reward magnitude is negatively correlated with the total cost of the cloud hosts corresponding to the historical action combination; this enables the reinforcement learning model to subsequently give higher scores to action combinations corresponding to cloud hosts with a lower total cost.
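A minimal sketch of such a preset buffer with periodic random sampling, using only Python's standard library; the class and field names are illustrative assumptions.

```python
import random
from collections import deque

class ReplayBuffer:
    """Stores (state, action_combination, reward, next_state) transitions
    and supports the periodic random sampling used to update the model."""
    def __init__(self, capacity: int = 10000):
        self.buffer = deque(maxlen=capacity)  # oldest entries are evicted

    def store(self, state, action_combo, reward, next_state):
        self.buffer.append((state, action_combo, reward, next_state))

    def sample(self, batch_size: int):
        # Random sampling breaks the temporal correlation of the
        # experience data collected during scheduling.
        return random.sample(list(self.buffer),
                             min(batch_size, len(self.buffer)))
```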
It can be seen that the preset reward function can guide the reinforcement learning algorithm toward the optimal policy. By formulating the preset reward function appropriately, the optimization direction of the reinforcement learning model can be controlled so that, after continual updating, it reaches or approaches the desired scheduling purpose. For example, when the scheduling purpose is to minimize the total cost of the cloud hosts, the reward function may be designed as follows: (1) if an action schedules the tenant traffic to be scheduled to a previously unused cloud host $e$ (an ECS instance to which no to-be-scheduled traffic has been allocated before), the reward for that action is $-k_e$; (2) if an action schedules the tenant traffic to be scheduled to an already-used cloud host $e$, the reward is 0. After the reward is computed, it may be regularized to fall within the interval $[-1, 1]$ to facilitate subsequent update operations of the reinforcement learning model.
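The cost-minimizing reward design above might be sketched as follows; dividing by the largest per-host cost is only one assumed way to regularize the reward into $[-1, 1]$, and is not taken from the original disclosure.

```python
def reward_for_action(host_e, used_hosts, cost, k_max):
    """Reward sketch for the cost-minimization purpose: -k_e if the
    action starts a previously unused cloud host e, 0 if host e has
    already been allocated to-be-scheduled traffic. The raw value is
    then scaled into [-1, 1] (the normalization scheme is assumed).

    host_e:     id of the cloud host chosen by the action
    used_hosts: set of hosts already serving to-be-scheduled traffic
    cost:       dict mapping host id -> cost k_e of using that host
    k_max:      largest per-host cost, so the result lies in [-1, 0]
    """
    raw = -cost[host_e] if host_e not in used_hosts else 0.0
    return raw / k_max
```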
It can be seen from the above embodiments that this specification models the scheduling problem of tenant traffic to be scheduled as a multi-stage sequential decision problem suitable for reinforcement learning and, by combining a model corresponding to the reinforcement learning algorithm DQN with a filter-combination algorithm determined from the constraints, trains the reinforcement learning model with historical environment data and schedules tenant traffic to the set of cloud hosts with the highest score that meets actual needs. This multi-stage sequential decision formulation allows targeted analysis and computation of the dynamically changing characteristics of tenant traffic; compared with allocating cloud host resources based on preset fixed weights in the related art, the scheduling decisions for the tenant traffic to be scheduled in this specification can be adapted to different practical application scenarios in a timely manner, so that the NFV platform achieves the desired scheduling purposes.
The technical solution of this specification is further described below with reference to the embodiment shown in Figure 3. Figure 3 is a schematic diagram of cloud host allocation for tenant traffic according to an exemplary embodiment of this specification. As shown in Figure 3, the NFV platform controller cooperates with the reinforcement learning module to allocate the corresponding cloud hosts to a tenant according to the policy provided by the reinforcement learning module, so that those cloud hosts process the tenant's traffic. Specifically:
The NFV platform controller may obtain or maintain, respectively, tenant traffic information, cloud host resource information, a reliability policy, a price model, and scheduling decisions. The tenant traffic information and cloud host resource information constitute environment information; the NFV platform controller may provide this environment data as state information to the environment evaluation module within the reinforcement learning module, which in turn inputs the state information into the reinforcement learning model.
Assume the tenant traffic information and cloud host resource information concern the CPU resources, memory resources, and bandwidth resources provided by the cloud hosts. The state information can then be characterized as $S_i = \langle B, C, M, b_i, c_i, m_i \rangle$, where, assuming the number of all available cloud hosts is $N$, $B$, $C$, and $M$ are each vectors of length $N$ representing the remaining available bandwidth, CPU, and memory capacity on the $N$ ECS instances, and $b_i$, $c_i$, $m_i$ represent the bandwidth, CPU, and memory resources required by the traffic of tenant $i$. The state vector $S_i$ for the $i$-th state thus encapsulates the resource information of all available cloud hosts together with the resource information required by the $i$-th tenant, and its total length may be $3N + 3$. Of course, this specification does not require the environment data and the state information $S_t$ to share the same format; for example, when the environment data is in a format such as a character string, it may be converted into an equivalent vector format before being input into the reinforcement learning model.
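A sketch of how such a state vector could be assembled (NumPy-based; the concrete numbers are illustrative assumptions):

```python
import numpy as np

def build_state(B, C, M, b_i, c_i, m_i):
    """Assemble S_i = <B, C, M, b_i, c_i, m_i>: the remaining bandwidth,
    CPU, and memory capacities of the N candidate cloud hosts, followed
    by the resources required by tenant i. Total length is 3N + 3."""
    return np.concatenate([B, C, M, [b_i, c_i, m_i]]).astype(np.float32)

# Example with N = 2 cloud hosts (numbers are illustrative):
state = build_state([100, 80], [16, 8], [64, 32], b_i=20, c_i=4, m_i=16)
assert state.shape == (3 * 2 + 3,)
```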
The reinforcement learning model may be built on the DQN algorithm. After obtaining the state information $S_i$ from the environment evaluation module, the DQN model takes it as input, and the neural network contained in the DQN model computes and outputs the Q value of each action, where each action corresponds to one cloud host. The DQN model then passes the computed Q values of all actions to the filter-combination algorithm, which selects the action combination to adopt this time based on the Q values; the cloud hosts corresponding to that action combination are allocated to the tenant. In processing the actions, the filter-combination algorithm may operate with the following formulas defined in this specification:

$$\sum_{i \in D} x_{i,e}\, b_i \le U \cdot B_e, \quad \forall e \in E \tag{2}$$

$$\sum_{i \in D} x_{i,e}\, c_i \le U \cdot C_e, \quad \forall e \in E \tag{3}$$

$$\sum_{i \in D} x_{i,e}\, m_i \le U \cdot M_e, \quad \forall e \in E \tag{4}$$

$$G_{\min} \le G_i \le G_{\max}, \quad \forall i \in D \tag{5}$$

$$\sum_{e \in E} x_{i,e}\, x_{j,e} \le O, \quad \forall i, j \in D,\ i \ne j \tag{6}$$
where $D$ is the set of tenant traffic to be scheduled; $U$ is the cloud host resource threshold; $G_i$ is the number of cloud hosts used by tenant $i$; $x_{i,e}$ is a binary variable indicating whether the to-be-scheduled traffic of tenant $i$ is allocated on cloud host $e$; $b_i$, $c_i$, $m_i$ are the bandwidth, CPU, and memory resources occupied by the to-be-scheduled traffic of tenant $i$; and $B_e$, $C_e$, $M_e$ are the bandwidth, CPU, and memory resources of cloud host $e$.
Equations (2), (3), and (4) above individually constrain a single cloud host. Specifically, they limit the resource capacity of each cloud host: the percentage of resources used by all to-be-scheduled tenant traffic allocated to any cloud host should not exceed the predetermined threshold $U$, ensuring that every cloud host retains sufficient resources to cope with emergencies such as traffic surges.
Equations (5) and (6) above constrain all cloud hosts corresponding to an action combination as a whole. Specifically, equation (5) constrains the number of allocated cloud hosts: $G_i$ indicates the number of cloud hosts used by tenant $i$, and this value should not be less than a fixed minimum $G_{\min}$, since distributing the to-be-scheduled tenant traffic across multiple cloud hosts achieves high reliability; on the other hand, the value should not exceed a fixed maximum $G_{\max}$. The values of $G_{\min}$ and $G_{\max}$ can be determined according to the actual needs of the NFV platform. Equation (6) constrains the overlap of cloud hosts: the number of cloud hosts shared between any two tenants should not exceed a parameter $O$, whose actual value is decided by the NFV platform administrator. The purpose of preventing complete overlap between different tenants is to stop the dangerous traffic of some tenants from affecting the traffic of other tenants, thereby improving system reliability.
The filter-combination algorithm may first filter out the actions that do not satisfy equations (2), (3), and (4), then sort all remaining actions by their Q values, and then try permutations and combinations of actions starting from the remaining actions with relatively higher Q values. During this trial process, candidate action combinations satisfying equation (5) are obtained one after another, and the candidate combinations should be obtained in descending order of their total Q values.
For example, assume the remaining actions and their Q values are: action A1 with Q value 100, action A2 with Q value 98, action A3 with Q value 95, action A4 with Q value 90, action A5 with Q value 88, and so on. Then, assuming the number of cloud hosts indicated by equation (5) is 3 to 4, the four actions A1-A4 with the highest Q values should be selected first, and it is determined whether this action combination (A1, A2, A3, A4) satisfies equation (6): if it does, the combination (A1, A2, A3, A4) is determined to be the final action combination with the highest total Q value, and the cloud host allocation plan is generated accordingly; if it does not satisfy equation (6), another four actions are selected, until equation (6) is satisfied.
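Putting equations (2) through (6) together, the filter-combination procedure described above might be sketched as follows. For simplicity, this sketch checks each candidate host against the tenant's full demand rather than the sum over all tenants, and it tries the top-ranked actions first rather than strictly ordering every combination by total Q value; all names and parameter values are illustrative assumptions.

```python
from itertools import combinations

def select_action_combination(q_values, capacity, demand, used_by_others,
                              U=0.8, G_min=3, G_max=4, O=2):
    """Filter-combination sketch: drop actions whose host violates the
    capacity constraints (2)-(4), rank the rest by Q value, then try
    combinations of G_min..G_max actions until one also satisfies the
    overlap constraint (6); return the first feasible combination.

    q_values:       dict action/host id -> Q value (one host per action)
    capacity:       dict host id -> (B_e, C_e, M_e) remaining capacity
    demand:         (b_i, c_i, m_i) required by the tenant traffic
    used_by_others: dict tenant id -> set of host ids already serving it
    """
    b_i, c_i, m_i = demand
    # Equations (2)-(4): drop hosts that cannot take the demand under U.
    feasible = [e for e, (B, C, M) in capacity.items()
                if b_i <= U * B and c_i <= U * C and m_i <= U * M]
    feasible.sort(key=lambda e: q_values[e], reverse=True)  # high Q first
    # Equation (5): group sizes within [G_min, G_max]; the first
    # combination tried at each size is the top-ranked one.
    for size in range(G_max, G_min - 1, -1):
        for combo in combinations(feasible, size):
            # Equation (6): bounded host overlap with every other tenant.
            if all(len(set(combo) & hosts) <= O
                   for hosts in used_by_others.values()):
                return combo
    return None  # no combination satisfies all constraints
```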
It can be seen that determining the candidate action combinations in sequence in this way, compared with first generating all candidate action combinations and only then considering the constraints, can greatly improve the efficiency of generating the cloud host allocation plan.
The above constraints, such as those shown in equations (2) to (6), may be defined in the reliability policy maintained by the NFV platform controller, and the NFV platform controller may provide the relevant content of the reliability policy to the filter-combination algorithm for use in generating scheduling decisions. For example, the NFV platform controller may provide the relevant content of the reliability policy to the filter-combination algorithm indirectly through the environment evaluation module. The NFV platform controller may update the constraints in the reliability policy according to actual needs.
Further, after the filter-combination algorithm determines the cloud host allocation plan described above, it may pass the relevant information to the NFV platform controller, so that the controller can form a scheduling policy accordingly. For example, when the cloud host allocation plan selects cloud hosts 1 to 4 corresponding to actions A1 to A4, the corresponding scheduling policy may be: schedule the tenant traffic to cloud hosts 1 to 4 for processing.
As an exemplary description, when the scheduling purpose is to minimize the total cost of all cloud hosts, the price model maintained by the NFV platform controller may be a cloud host price model. The environment evaluation module may calculate the reward produced by the current cloud host allocation from this cloud host price model together with the aforementioned state information, the cloud host allocation plan, and the preset reward function. As stated earlier, by setting the scheduling purpose to minimizing the total cost of all cloud hosts, the corresponding scheduling scheme can preferentially schedule the traffic of the tenant currently to be allocated to cloud hosts that have already been allocated other tenants' traffic, raising the resource utilization of those cloud hosts within a safe range and reducing the total cost of cloud host resources. Moreover, since the number of started cloud hosts can be reduced under the same conditions, the basic resource consumption required to run those hosts, such as system operation and heat dissipation, can be saved and greenhouse gas emissions reduced, which is conducive to reaching the goals of carbon peaking and carbon neutrality at an early date.
The reinforcement learning model may maintain a buffer dataset. The above state information, cloud host allocation plan, reward, and the next state information formed after the allocation plan is executed may be recorded as one group of data in this buffer dataset; in effect, the buffer dataset stores the groups of data formed by every allocation. The reinforcement learning model may periodically select one or more groups of data from the buffer dataset for its own model update, and the selection may be random.
Those skilled in the art will understand the following:
First, the above embodiments describe the inference process performed with a reinforcement learning model whose training has been completed. This inference process enables the reinforcement learning module to provide the NFV platform controller with a scheduling policy that satisfies the scheduling purpose. Because the scheduling policy is essentially the output of the reinforcement learning model after the resource demand information of the tenant traffic to be scheduled and the actual allocation of cloud host resources are input into the model for processing, the scheduling policies obtained in this specification, compared with allocating cloud host resources based on preset fixed weights in the related art, can be adapted to different practical application scenarios in a timely manner.
Second, before the above inference process, the reinforcement learning model must be trained through many repeated iterations to ensure that the scheduling policy it outputs satisfies the scheduling purpose. The training process is similar to the inference process described above: the environment data in a training sample is input into the reinforcement learning model to obtain the actions and scores output by the model, and the highest-scoring action combination is determined; from that combination, it is determined whether the preset scheduling purpose has been achieved, and if not, iterative training of the reinforcement learning model continues until the model satisfies the scheduling purpose. Of course, whether to continue iterative training may also be determined by other conditions, such as whether the number of training iterations has reached a preset number.
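One iteration of such training could look like the following sketch; the separate target network and the Huber loss are common DQN practices assumed here for illustration, not details taken from the original disclosure.

```python
import torch
import torch.nn.functional as F

def dqn_update(q_net, target_net, optimizer, batch, gamma=0.9):
    """One DQN update step on a sampled batch of transitions
    (states, action indices, rewards, next_states); gamma is the
    discount factor from the MDP tuple described earlier."""
    states, actions, rewards, next_states = batch
    # Q values predicted for the actions that were actually taken.
    q_pred = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        # Bootstrapped target: reward plus discounted best next Q value.
        q_next = target_net(next_states).max(dim=1).values
        q_target = rewards + gamma * q_next
    loss = F.smooth_l1_loss(q_pred, q_target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```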
In addition, as described above, even for a reinforcement learning model whose training has been completed, the groups of data cached in the buffer dataset can still be used during actual inference: by periodically selecting one or more groups of data at random, the parameters of the deep neural network used by the reinforcement learning model are updated, which helps overcome the correlation and non-stationary distribution of the experience data.
Figure 4 is a schematic structural diagram of a network functions virtualization (NFV) platform controller in an exemplary embodiment. Referring to Figure 4, at the hardware level the NFV platform controller includes a processor, an internal bus, a network interface, memory, and non-volatile storage, and may of course also include other hardware as needed. The processor reads the corresponding computer program from the non-volatile storage into memory and runs it, forming a traffic scheduling apparatus at the logical level. Of course, besides the software implementation, this specification does not exclude other implementations, such as logic devices or a combination of software and hardware; that is, the execution subject of the following processing flow is not limited to logical units and may also be hardware or logic devices.
Corresponding to the foregoing embodiments of the traffic scheduling method, this specification also provides embodiments of a traffic scheduling apparatus.
Referring to Figure 5, Figure 5 is a schematic structural diagram of a traffic scheduling apparatus according to an exemplary embodiment. As shown in Figure 5, in a software implementation the apparatus may include:
an environment data determination unit 501, configured to determine environment data, where the environment data includes resource demand information of the tenant traffic to be scheduled and the actual allocation of cloud host resources;
a reinforcement learning model processing unit 502, configured to input the environment data into a pre-trained reinforcement learning model and obtain the actions output by the reinforcement learning model and the score of each action, where each action corresponds to one or more cloud hosts;
a traffic scheduling unit 503, configured to determine the action combination with the highest score and schedule the tenant traffic to be scheduled to the cloud hosts corresponding to the highest-scoring action combination, where each action combination corresponds to one or more actions.
Optionally, the reinforcement learning model includes a deep Q-network model, and the score of each action output by the reinforcement learning model is the Q value of the corresponding action.
Optionally, the apparatus further includes:
an action constraint unit 504, configured to filter the actions output by the reinforcement learning model according to a preset first constraint, where the first constraint is used to individually constrain a single cloud host, and at least one cloud host corresponding to each filtered-out action does not satisfy the first constraint;
and to determine the highest-scoring action combination from the candidate action combinations formed by the remaining actions, the highest-scoring action combination being the candidate action combination with the highest score among all candidate action combinations that satisfy a preset second constraint, where the second constraint is used to constrain all cloud hosts corresponding to an action combination as a whole.
Optionally, the action constraint unit 504 is specifically configured to:
sort the remaining actions by the score of each action, and preferentially permute and combine the remaining actions with relatively higher scores to generate the candidate action combinations, until a generated candidate action combination satisfies the second constraint, and determine that candidate action combination to be the highest-scoring action combination; or
permute and combine the remaining actions to generate the candidate action combinations, screen all candidate action combinations according to the second constraint, sort the screened candidate action combinations by the total score of all the actions they contain, and determine the candidate action combination ranked first to be the highest-scoring action combination.
Optionally, the first constraint includes: the resource proportion of the cloud hosts to which the tenant traffic to be scheduled is assigned does not exceed a preset resource threshold;
and the second constraint includes at least one of the following: the number of cloud hosts to which the tenant traffic to be scheduled is assigned remains within a preset interval; and the cloud hosts corresponding to the action combination do not completely overlap with the cloud hosts corresponding to already-scheduled tenant traffic.
Optionally, the apparatus further includes:
a reinforcement learning model update unit 505, configured to periodically select one or more groups of historical data at random from a preset buffer to update the reinforcement learning model;
where each group of historical data includes: historical environment data, the highest-scoring historical action combination corresponding to the historical environment data, the reward corresponding to the historical action combination, and the updated historical environment data formed after the historical action combination is executed.
Optionally, the reward corresponding to the historical action combination is calculated by a preset reward function from the historical environment data, a preset cloud host price model, and the historical action combination, where the reward magnitude is negatively correlated with the total cost of the cloud hosts corresponding to the historical action combination.
Optionally, the apparatus is executed when any of the following triggering conditions is satisfied:
traffic from a new tenant is received;
the resource usage of a preset number of cloud hosts reaches a preset scale-out threshold, triggering a scale-out demand; or
the resource utilization of a preset number of cloud hosts falls below a preset scale-in threshold, triggering a scale-in demand.
For the implementation of the functions and roles of each unit in the above apparatus, refer to the implementation of the corresponding steps in the above method; details are not repeated here.
Since the apparatus embodiments substantially correspond to the method embodiments, reference may be made to the relevant parts of the method embodiments. The apparatus embodiments described above are merely illustrative: the units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this specification. Persons of ordinary skill in the art can understand and implement it without creative effort.
Embodiments of the subject matter and the functional operations described in this specification may be implemented in digital electronic circuitry, in tangibly embodied computer software or firmware, in computer hardware including the structures disclosed in this specification and their structural equivalents, or in a combination of one or more of them. Embodiments of the subject matter described in this specification may be implemented as one or more computer programs, i.e., one or more modules of computer program instructions encoded on a tangible non-transitory program carrier for execution by, or to control the operation of, a data processing apparatus. Alternatively or additionally, the program instructions may be encoded on an artificially generated propagated signal, such as a machine-generated electrical, optical, or electromagnetic signal, generated to encode information for transmission to a suitable receiver apparatus for execution by a data processing apparatus. The computer storage medium may be a machine-readable storage device, a machine-readable storage substrate, a random or serial access memory device, or a combination of one or more of them.
The processes and logic flows described in this specification may be performed by one or more programmable computers executing one or more computer programs to perform the corresponding functions by operating on input data and generating output. The processes and logic flows may also be performed by special-purpose logic circuitry, such as an FPGA (field-programmable gate array) or an ASIC (application-specific integrated circuit), and the apparatus may also be implemented as special-purpose logic circuitry.
Computers suitable for executing a computer program include, for example, general-purpose and/or special-purpose microprocessors, or any other type of central processing unit. Generally, a central processing unit receives instructions and data from a read-only memory and/or a random access memory. The essential components of a computer include a central processing unit for implementing or executing instructions and one or more memory devices for storing instructions and data. Generally, a computer also includes, or is operatively coupled to receive data from or transfer data to, one or more mass storage devices for storing data, such as magnetic disks, magneto-optical disks, or optical disks; however, a computer need not have such devices. Moreover, a computer may be embedded in another device, such as a mobile phone, a personal digital assistant (PDA), a mobile audio or video player, a game console, a Global Positioning System (GPS) receiver, or a portable storage device such as a universal serial bus (USB) flash drive, to name just a few.
Computer-readable media suitable for storing computer program instructions and data include all forms of non-volatile memory, media, and memory devices, including, for example, semiconductor memory devices (such as EPROM, EEPROM, and flash memory devices), magnetic disks (such as internal hard disks or removable disks), magneto-optical disks, and CD-ROM and DVD-ROM disks. The processor and the memory may be supplemented by, or incorporated in, special-purpose logic circuitry.
Although this specification contains many specific implementation details, these should not be construed as limiting the scope of any invention or of what may be claimed, but rather as describing features of particular embodiments of particular inventions. Certain features that are described in this specification in the context of separate embodiments may also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment may also be implemented in multiple embodiments separately or in any suitable subcombination. Moreover, although features may be described above as acting in certain combinations and even initially claimed as such, one or more features from a claimed combination may in some cases be excised from the combination, and the claimed combination may be directed to a subcombination or a variation of a subcombination.
Similarly, although operations are depicted in the drawings in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. In certain circumstances, multitasking and parallel processing may be advantageous. Moreover, the separation of various system modules and components in the embodiments described above should not be understood as requiring such separation in all embodiments, and it should be understood that the described program components and systems can generally be integrated together in a single software product or packaged into multiple software products.
Thus, particular embodiments of the subject matter have been described. Other embodiments are within the scope of the appended claims. In some cases, the actions recited in the claims can be performed in a different order and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In certain implementations, multitasking and parallel processing may be advantageous.
The above are only preferred embodiments of this specification and are not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of this specification shall fall within its scope of protection.

Claims (10)

  1. A traffic scheduling method, characterized in that the method comprises:
    determining environment data, the environment data comprising resource demand information of tenant traffic to be scheduled and the actual allocation of cloud host resources;
    inputting the environment data into a pre-trained reinforcement learning model, and obtaining the actions output by the reinforcement learning model and a score for each action, wherein each action corresponds to one or more cloud hosts;
    determining the action combination with the highest score, and scheduling the tenant traffic to be scheduled to the cloud hosts corresponding to the action combination with the highest score, wherein each action combination corresponds to one or more actions.
  2. The method according to claim 1, characterized in that the reinforcement learning model comprises a deep Q-network model, wherein the score of each action output by the reinforcement learning model is the Q value of the corresponding action.
  3. The method according to claim 1, characterized in that determining the action combination with the highest score comprises:
    filtering the actions output by the reinforcement learning model according to a preset first constraint condition, wherein the first constraint condition is used to individually constrain a single cloud host, and at least one cloud host corresponding to each filtered-out action does not satisfy the first constraint condition;
    determining the action combination with the highest score from candidate action combinations formed by the remaining actions, the action combination with the highest score being the candidate action combination with the highest score among all candidate action combinations that satisfy a preset second constraint condition, wherein the second constraint condition is used to impose an overall constraint on all cloud hosts corresponding to an action combination.
  4. The method according to claim 3, characterized in that determining the action combination with the highest score from the candidate action combinations formed by the remaining actions comprises:
    sorting the remaining actions by their scores, and preferentially permuting and combining the remaining actions with relatively higher scores to generate the candidate action combinations, until a generated candidate action combination satisfies the second constraint condition, and determining that candidate action combination as the action combination with the highest score; or
    permuting and combining the remaining actions to generate the candidate action combinations, screening all candidate action combinations according to the second constraint condition, sorting the screened candidate action combinations by the total score of all the actions they contain, and determining the candidate action combination ranked first as the action combination with the highest score.
  5. The method according to claim 3, characterized in that:
    the first constraint condition comprises: the resource proportion of each cloud host to which the tenant traffic to be scheduled is scheduled does not exceed a preset resource threshold;
    the second constraint condition comprises at least one of the following: the number of cloud hosts to which the tenant traffic to be scheduled is scheduled remains within a preset interval; and the cloud hosts corresponding to the action combination do not completely overlap the cloud hosts corresponding to already-scheduled tenant traffic.
  6. The method according to claim 1, further comprising:
    periodically selecting one or more groups of historical data at random from a preset buffer to update the reinforcement learning model;
    wherein each group of historical data comprises: historical environment data, the historical action combination with the highest score corresponding to the historical environment data, the reward corresponding to the historical action combination, and the updated historical environment data formed after the historical action combination is executed.
  7. The method according to claim 6, characterized in that the reward corresponding to the historical action combination is calculated by a preset reward function from the historical environment data, a preset cloud host price model, and the historical action combination, wherein the magnitude of the reward is negatively correlated with the total cost of the cloud hosts corresponding to the historical action combination.
  8. The method according to claim 1, characterized in that the method is executed when any one of the following trigger conditions is met:
    traffic from a new tenant is received;
    the resource usage of a preset number of cloud hosts reaches a preset expansion threshold, triggering a scale-out requirement;
    the resource utilization of a preset number of cloud hosts falls below a preset shrinkage threshold, triggering a scale-in requirement.
  9. A computer-readable storage medium on which a computer program is stored, characterized in that, when the program is executed by a processor, the steps of the method according to any one of claims 1 to 8 are implemented.
  10. A network function virtualization (NFV) platform controller, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that, when the processor executes the program, the steps of the method according to any one of claims 1 to 8 are implemented.
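
To make the decision flow of claims 1 to 5 concrete, the following is a minimal Python sketch; it is an illustration only and not part of the claimed subject matter. It assumes the model's scored actions have already been obtained, uses a hypothetical per-host projected_usage mapping, and follows the first alternative of claim 4: actions violating the per-host first constraint are filtered out, the remainder are sorted by score, and combinations are tried in descending-score order until one satisfies the combination-level second constraint.

```python
from dataclasses import dataclass
from itertools import combinations

@dataclass(frozen=True)
class Action:
    hosts: frozenset   # cloud host IDs this action maps to
    score: float       # Q value output by the model for this action

def pick_best_combination(actions, projected_usage, used_hosts,
                          resource_threshold=0.8, host_range=(1, 4)):
    # First constraint (per host): drop every action that would push
    # at least one of its cloud hosts above the resource threshold.
    feasible = [a for a in actions
                if all(projected_usage[h] <= resource_threshold
                       for h in a.hosts)]
    # Try higher-scoring actions first (first alternative of claim 4).
    feasible.sort(key=lambda a: a.score, reverse=True)

    lo, hi = host_range
    for k in range(1, len(feasible) + 1):
        for combo in combinations(feasible, k):
            hosts = frozenset().union(*(a.hosts for a in combo))
            # Second constraint (whole combination): host count within
            # the preset interval, and the hosts must not completely
            # overlap those already serving scheduled tenant traffic
            # (read here as: not all contained in the used set).
            if lo <= len(hosts) <= hi and not hosts <= used_hosts:
                return combo, hosts
    return None, frozenset()

# Toy usage with made-up hosts, usages, and Q values:
acts = [Action(frozenset({"ecs-1"}), 0.9),
        Action(frozenset({"ecs-2", "ecs-3"}), 0.7)]
combo, hosts = pick_best_combination(
    acts,
    projected_usage={"ecs-1": 0.6, "ecs-2": 0.5, "ecs-3": 0.95},
    used_hosts=frozenset({"ecs-2"}))
print(sorted(hosts))  # ['ecs-1']: the ecs-3 action breaks the first constraint
```

Because the candidates are tried in descending score order, the first combination that passes the second constraint is taken as the highest-scoring feasible one; an exhaustive variant corresponding to the second alternative of claim 4 would instead enumerate all passing combinations and rank them by total score.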
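Claims 6 and 7 describe an experience-replay update and a cost-based reward. The sketch below is likewise only an illustration under stated assumptions: price_model is assumed to be a dict-like mapping from each cloud host to its price, model.train_on_batch is an assumed, model-specific training API, and the reward simplifies claim 7 to the negated total cost of the hosts in the executed combination, so that cheaper combinations earn higher rewards.

```python
import random
from collections import deque

replay_buffer = deque(maxlen=10_000)   # the preset buffer of claim 6

def reward(env_data, price_model, action_combo):
    # Negatively correlated with the total cost of the cloud hosts
    # corresponding to the executed action combination (claim 7);
    # env_data is kept in the signature to mirror the claim, though
    # this simplified sketch derives cost from the combination alone.
    total_cost = sum(price_model[h] for a in action_combo for h in a.hosts)
    return -total_cost

def record_transition(env, combo, r, next_env):
    # One group of historical data: environment data, the executed
    # highest-scoring action combination, its reward, and the updated
    # environment data observed after execution.
    replay_buffer.append((env, combo, r, next_env))

def update_model(model, batch_size=32):
    # Periodically sample random groups of historical data from the
    # buffer and use them to refresh the reinforcement learning model.
    if len(replay_buffer) >= batch_size:
        batch = random.sample(replay_buffer, batch_size)
        model.train_on_batch(batch)   # assumed training interface
```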
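The trigger conditions of claim 8 can be checked along the following lines; the event label, the preset host count, and both thresholds are illustrative placeholders rather than values taken from the disclosure.

```python
def should_schedule(event, host_usages, preset_count=3,
                    expand_threshold=0.85, shrink_threshold=0.25):
    """True when any trigger condition of claim 8 holds."""
    if event == "new_tenant_traffic":   # traffic received from a new tenant
        return True
    over = sum(u >= expand_threshold for u in host_usages)
    under = sum(u < shrink_threshold for u in host_usages)
    # A preset number of hosts crossing either threshold triggers a
    # scale-out or scale-in scheduling round, respectively.
    return over >= preset_count or under >= preset_count

print(should_schedule(None, [0.9, 0.88, 0.86, 0.4]))  # True: three hosts hit the expansion threshold
```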
PCT/CN2023/088860 2022-04-29 2023-04-18 Traffic scheduling method WO2023207663A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210476232.9 2022-04-29
CN202210476232.9A CN114745392A (en) 2022-04-29 2022-04-29 Flow scheduling method

Publications (1)

Publication Number Publication Date
WO2023207663A1 true WO2023207663A1 (en) 2023-11-02

Family

ID=82285412

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/088860 WO2023207663A1 (en) 2022-04-29 2023-04-18 Traffic scheduling method

Country Status (2)

Country Link
CN (1) CN114745392A (en)
WO (1) WO2023207663A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114745392A (en) * 2022-04-29 2022-07-12 阿里云计算有限公司 Flow scheduling method

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108170517A (en) * 2018-01-08 2018-06-15 武汉斗鱼网络科技有限公司 A kind of container allocation method, apparatus, server and medium
KR102154446B1 (en) * 2019-11-14 2020-09-09 한국전자기술연구원 Method for fast scheduling for resource balanced allocation on distributed and collaborative container platform environment
US11416296B2 (en) * 2019-11-26 2022-08-16 International Business Machines Corporation Selecting an optimal combination of cloud resources within budget constraints
CN112052071B (en) * 2020-09-08 2023-07-04 福州大学 Cloud software service resource allocation method combining reinforcement learning and machine learning
CN113747450B (en) * 2021-07-27 2022-12-09 清华大学 Service deployment method and device in mobile network and electronic equipment
CN113886006A (en) * 2021-09-13 2022-01-04 中铁信弘远(北京)软件科技有限责任公司 Resource scheduling method, device and equipment and readable storage medium

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109492774A (en) * 2018-11-06 2019-03-19 北京工业大学 A kind of cloud resource dispatching method based on deep learning
CN112311578A (en) * 2019-07-31 2021-02-02 中国移动通信集团浙江有限公司 VNF scheduling method and device based on deep reinforcement learning
CN111143036A (en) * 2019-12-31 2020-05-12 广东省电信规划设计院有限公司 Virtual machine resource scheduling method based on reinforcement learning
US11206221B1 (en) * 2021-06-04 2021-12-21 National University Of Defense Technology Online task dispatching and scheduling system and method thereof
CN114745392A (en) * 2022-04-29 2022-07-12 阿里云计算有限公司 Flow scheduling method

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117407718A (en) * 2023-12-15 2024-01-16 杭州宇谷科技股份有限公司 Training method, application method and system of battery replacement prediction model
CN117407718B (en) * 2023-12-15 2024-03-26 杭州宇谷科技股份有限公司 Training method, application method and system of battery replacement prediction model

Also Published As

Publication number Publication date
CN114745392A (en) 2022-07-12

Similar Documents

Publication Publication Date Title
WO2023207663A1 (en) Traffic scheduling method
CN110381541B (en) Smart grid slice distribution method and device based on reinforcement learning
Ben Alla et al. A novel task scheduling approach based on dynamic queues and hybrid meta-heuristic algorithms for cloud computing environment
CN104572307B (en) The method that a kind of pair of virtual resource carries out flexible scheduling
CN108958916B (en) Workflow unloading optimization method under mobile edge environment
WO2023184939A1 (en) Deep-reinforcement-learning-based adaptive efficient resource allocation method for cloud data center
US20180314971A1 (en) Training Machine Learning Models On A Large-Scale Distributed System Using A Job Server
CN114257599A (en) Adaptive finite duration edge resource management
US20140250440A1 (en) System and method for managing storage input/output for a compute environment
CN109788046B (en) Multi-strategy edge computing resource scheduling method based on improved bee colony algorithm
CN113032120B (en) Industrial field big data task cooperative scheduling method based on edge calculation
CN104168318A (en) Resource service system and resource distribution method thereof
CN115037749B (en) Large-scale micro-service intelligent multi-resource collaborative scheduling method and system
Fan et al. Multi-objective optimization of container-based microservice scheduling in edge computing
WO2024060571A1 (en) Heterogeneous computing power-oriented multi-policy intelligent scheduling method and apparatus
CN104580447A (en) Spatio-temporal data service scheduling method based on access heat
CN113806018A (en) Kubernetes cluster resource hybrid scheduling method based on neural network and distributed cache
US20220156633A1 (en) System and method for adaptive compression in federated learning
CN114443249A (en) Container cluster resource scheduling method and system based on deep reinforcement learning
CN114675975B (en) Job scheduling method, device and equipment based on reinforcement learning
CN116614394A (en) Service function chain placement method based on multi-target deep reinforcement learning
Zhang et al. Employ AI to improve AI services: Q-learning based holistic traffic control for distributed co-inference in deep learning
CN114691372A (en) Group intelligent control method of multimedia end edge cloud system
CN115809148B (en) Load balancing task scheduling method and device for edge computing
CN114968402A (en) Edge calculation task processing method and device and electronic equipment

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23795096

Country of ref document: EP

Kind code of ref document: A1