CN115714820A - Distributed micro-service scheduling optimization method - Google Patents


Info

Publication number
CN115714820A
CN115714820A (application CN202211424666.0A)
Authority
CN
China
Prior art keywords
service
node
micro
services
scheduling
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211424666.0A
Other languages
Chinese (zh)
Inventor
李寒 (Li Han)
赵卓峰 (Zhao Zhuofeng)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
North China University of Technology
Original Assignee
North China University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by North China University of Technology
Priority to CN202211424666.0A
Publication of CN115714820A
Legal status: Pending

Abstract

A distributed micro-service scheduling optimization method comprises the following steps. Step 1: model the environment of the t-th scheduling task as a quadruple to describe the deployment of the micro-service instances in the distributed environment instance at that moment. Step 2: model the action of the t-th scheduling task as a binary group to describe which micro-service is deployed to which node at that moment. Step 3: describe the reward as the sum of the difference in service delay and the difference in resource balance of all services after and before the action completes. Step 4: train a prediction model of the micro-service instance scheduling optimization scheme. Step 5: use the prediction model to obtain the corresponding scheduling optimization scheme. The invention introduces the dependencies between services, between services and data, and between services and users into the micro-service scheduling constraints, and exploits the continuous optimization capability of reinforcement learning, yielding a micro-service scheduling optimization method suited to distributed scenarios and capable of continuous optimization.

Description

Distributed micro-service scheduling optimization method
Technical Field
The invention belongs to the technical field of micro-service scheduling optimization, and particularly relates to a distributed micro-service scheduling optimization method.
Background
In a distributed service scenario, micro-services are deployed not only on the cloud but also, as required, at the edge and on end devices. This distributed deployment of micro-services makes scheduling considerably harder.
First, because micro-services are typically single-function, they cooperate with one another to provide a service, so calls among micro-services are ubiquitous. If such inter-service data calls occur frequently, service response time increases and service performance suffers.
Secondly, data is the core of a service, and dependencies between services and data are common. If data cannot be moved, for example because of its volume or for security reasons, a service that depends on that data cannot be deployed arbitrarily; an unsuitable deployment location is likely to increase service delay or even prevent the service from executing.
Third, if users are confined to an administrative domain, the distance between a service and its users adds data transmission time, so the deployment location of the service should be constrained accordingly.
In summary, in a distributed scenario the locations of micro-services, data and users are not arbitrary, and micro-service scheduling must consider the dependency constraints between services, between services and data, and between users and services. However, most existing micro-service scheduling work considers only scheduling on the cloud: it typically takes the resource center, the micro-services and the users as entities, with constraints such as resource and balance constraints, and does not treat the dependencies between services, between services and data, and between services and users as factors affecting scheduling. In addition, current micro-service scheduling methods basically follow a train-once, apply-many-times pattern and lack the capability of continuous optimization.
Disclosure of Invention
In order to overcome the above technical problems, the invention aims to provide a distributed micro-service scheduling optimization method that introduces the dependencies between services, between services and data, and between services and users into the micro-service scheduling constraints and exploits the continuous optimization capability of reinforcement learning: a micro-service scheduling method suited to distributed scenarios and capable of continuous optimization.
In order to achieve the purpose, the invention adopts the technical scheme that:
a distributed micro-service scheduling optimization method comprises the following steps;
step 1: modeling the environment of the t-th scheduling task as a quadruple, based on the micro-service instances, the distributed environment instance and the deployment positions of the micro-services in the distributed environment, so as to describe the deployment of the micro-service instances in the distributed environment instance at that moment;
step 2: modeling the action of the t-th scheduling task as a binary group to describe that micro-service s_i is deployed to node n_j at that moment;
step 3: aiming at low service delay and high resource balance, describing the reward as the sum of the difference in service delay and the difference in resource balance of all services after and before the action completes;
step 4: the environment is E_t = (t, S, N, SN), where t is a natural number greater than 0 denoting the t-th micro-service scheduling optimization task, S is a vector formed by the description information of all micro-service instances, N is a vector formed by the description information of all nodes in the cloud environment instance, and SN is a vector of ordered pairs whose first element is a micro-service instance number and whose second element is a node number; the action is a_t = (s_i, n_j), where s_i is the micro-service instance number and n_j is the number of the node that s_i is deployed to; the reward is r_t = (g_{t+1} - g_t) - (avgT_{t+1} - avgT_t), where g_t and g_{t+1} are the resource balance before and after action a_t executes, and avgT_t and avgT_{t+1} are the service delays before and after action a_t executes; these are input into the DDPG model for training to obtain a prediction model of the micro-service instance scheduling optimization scheme;
step 5: predicting the scheduling optimization scheme of the micro-service instances in the cloud environment instance with the trained prediction model to obtain the corresponding scheduling optimization scheme, thereby solving the technical problem of deploying the micro-service instances onto the nodes of the cloud environment instance on the premise of keeping the resource balance of the cloud environment instance high and its service delay low.
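The state, action and transition of steps 1 to 5 can be sketched as plain data structures. The following is a minimal, hypothetical Python illustration; the class and function names are assumptions for exposition, not part of the patent:

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical sketch of E_t = (t, S, N, SN): task index, service descriptions,
# node descriptions, and the (service index, node index) deployment pairs.
@dataclass
class Environment:
    t: int
    S: List[str]
    N: List[str]
    SN: List[Tuple[int, int]]

def make_action(service_idx: int, node_idx: int) -> Tuple[int, int]:
    """Action a_t = (s_i, n_j): deploy service s_i onto node n_j."""
    return (service_idx, node_idx)

def apply_action(env: Environment, action: Tuple[int, int]) -> Environment:
    """Produce the next environment state by recording the new deployment pair."""
    return Environment(env.t + 1, env.S, env.N, env.SN + [action])

env0 = Environment(1, ["s1", "s2"], ["n1", "n2"], [(0, 0)])
env1 = apply_action(env0, make_action(1, 1))
```

Each transition produced this way corresponds to one scheduling decision of the method.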
The step 1 specifically comprises the following steps:
Description of the environment: the environment state E_t of the t-th micro-service scheduling optimization task is defined as a quadruple E_t = (t, S, N, SN), where t is the scheduling optimization task number; S is a vector representing the current services, i.e. S = {s_1, s_2, …, s_i, …, s_n}; N is a vector representing the current nodes, i.e. N = {n_1, n_2, …, n_j, …, n_m}; SN is a vector representing the current service deployment scheme, i.e. SN = (<s_{s1}, n_{n1}>, <s_{s2}, n_{n2}>, …, <s_{sk}, n_{nk}>, …, <s_{sNum}, n_{nNum}>), where each element is an ordered pair <s_{sk}, n_{nk}> indicating that service s_{sk} is deployed to node n_{nk}.
The step 2 specifically comprises the following steps:
Action description: the action of the t-th micro-service scheduling optimization task is defined as the operation of deploying a service to a node in the current optimization task, a_t = (s_i, n_j), i.e. service s_i is deployed to node n_j, where the selection of n_j must satisfy the service-dependency, data-dependency and user-dependency constraints oriented to the distributed service environment.
The scheduling constraints for the service are designed as follows:
service dependence: the functions of the micro-services are relatively independent, user requirements are composite in a distributed environment, the micro-services need to be cooperated with each other to meet the user requirements, the calling relationship among the micro-services is analyzed, service dependencies with various strengths are extracted from link data of the micro-services to restrict an action set, specifically, the times of the calling relationship among the services are obtained through statistics from the micro-service calling link data, then the calling times among the services are divided by the total calling times in a service unit time to serve as the strength of the service dependencies, and then when the service to be deployed and other services have strong service dependency relationships, the node where the strong dependency services are located or adjacent nodes are preferentially selected;
data dependence: data is a core element of a service, so that data dependence is used for limiting an action set, the position of a data source can be obtained through priori knowledge, the dependence relationship between the service and the data source is obtained through analysis of a business process description file, matrix description is adopted, rows of a matrix represent the node number of the service, columns of the matrix represent the node number of the data, if dependence exists between the service and the data, the node number of the service and the node number of the data are used as subscripts, the element is 1, otherwise, the element is 0, and then when the service to be deployed and the data have the dependence relationship, the node where the data is located or an adjacent node is preferentially selected;
the user relies on: the user is a main body for calling the service, so that the user dependence is used for defining an action set, the dependence relationship between the user and the service is obtained by priori knowledge, matrix description is adopted, rows of the matrix represent the number of nodes where the service is located, columns of the matrix represent the number of nodes where the user accesses, if the dependence exists between the service and the user, the number of the nodes where the service is located and the number of the nodes where the user accesses are used as subscripts, the subscripts are taken as 1, otherwise, the subscripts are taken as 0, and then when the service to be deployed and the user have the dependence relationship, the node where the user is located or adjacent nodes are preferentially selected.
The step 3 specifically comprises the following steps:
Reward description: the sum of the difference in service delay and the difference in resource balance of all services after and before action a_t completes is used as the basis for setting the reward value, calculated as in formula (1); when the service delay decreases and the resource balance improves, the reward is positive, otherwise it is zero or negative;
r_t = (g_{t+1} - g_t) - (avgT_{t+1} - avgT_t)    formula (1)
As shown in formula (1), r_t is the reward value after the t-th micro-service scheduling optimization task executes, g_t and g_{t+1} are the resource balance before and after action a_t executes, and avgT_t and avgT_{t+1} are the service delays before and after action a_t executes.
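Formula (1) translates directly into a function. A minimal Python sketch (the name is illustrative):

```python
def reward(g_t, g_t1, avg_t, avg_t1):
    """r_t = (g_{t+1} - g_t) - (avgT_{t+1} - avgT_t), per formula (1):
    the balance difference minus the delay difference across the action."""
    return (g_t1 - g_t) - (avg_t1 - avg_t)
```

For example, a delay drop from 10.0 to 8.0 contributes +2.0 to the reward, while the same change in reverse contributes -2.0.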
The method for calculating the resource balance and the service delay comprises the following steps:
1) Resource balance: resource balance is defined based on the CPU utilization and the memory utilization;
CPU utilization rate: the CPU utilization of a node is defined as the ratio of the allocated CPU resources to the total CPU resources of the node; the allocated CPU resources are computed from the CPU usage of the containers on the node, as in formula (2), where n is the total number of containers on the node:
Used_cpu = Σ_{k=1}^{n} cpu_k    formula (2)
The CPU utilization is computed as in formula (3), where Capacity_cpu is the total CPU resources of the node:
Ratio_Node_cpu = Used_cpu / Capacity_cpu    formula (3)
The memory utilization rate: the memory utilization of a node is defined as the ratio of the allocated memory resources to the total memory resources of the node; the allocated memory resources are computed from the memory usage of the containers on the node, as in formula (4), where n is the total number of containers on the node:
Used_mem = Σ_{k=1}^{n} mem_k    formula (4)
The memory utilization is computed as in formula (5), where Capacity_mem is the total memory resources of the node:
Ratio_Node_mem = Used_mem / Capacity_mem    formula (5)
The node resource balance is defined as the absolute value of the difference between the node's CPU utilization and memory utilization, as in formula (6):
g_i = |Ratio_Node_cpu_i - Ratio_Node_mem_i|    formula (6)
where i is the node number, Ratio_Node_cpu_i is the CPU utilization of the i-th node, and Ratio_Node_mem_i is the memory utilization of the i-th node;
The cloud environment comprises a large number of nodes, and resource balance must be considered over all of them; the balance of a single node cannot represent good performance of the container cloud, so the resource balance of the cloud environment is evaluated as the variance of the node resource balances over all nodes, which assists in scoring the reward, as in formula (7):
G = (1/M) Σ_{i=1}^{M} (g_i - ḡ)²    formula (7)
where ḡ = (1/M) Σ_{i=1}^{M} g_i is the mean of the node resource balances and g_i is the resource balance of the i-th node;
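The resource-balance computation of formulas (2) through (7) can be sketched as follows. A hypothetical Python illustration; function names are assumptions, and formulas (2) and (4) are read as sums of per-container usage:

```python
def cpu_utilization(container_cpu, capacity_cpu):
    """Formulas (2)-(3): sum of per-container CPU usage over total CPU capacity."""
    return sum(container_cpu) / capacity_cpu

def mem_utilization(container_mem, capacity_mem):
    """Formulas (4)-(5): the analogous ratio for memory."""
    return sum(container_mem) / capacity_mem

def node_balance(ratio_cpu, ratio_mem):
    """Formula (6): g_i = |cpu ratio - mem ratio|; smaller means better balanced."""
    return abs(ratio_cpu - ratio_mem)

def cluster_balance(g):
    """Formula (7): variance of the node balances over all M nodes."""
    mean = sum(g) / len(g)
    return sum((gi - mean) ** 2 for gi in g) / len(g)
```

A cluster whose nodes all have the same g_i scores a variance of 0, the best attainable value under this definition.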
2) Service delay: the service delay is expressed as the sum of the communication delay and the execution delay, as in formula (8):
T_{i,j} = comT_{i,j} + exeT_{i,j}    formula (8)
where T_{i,j} is the service delay of service i on node j; comT_{i,j} is the communication delay of service i on node j, produced by the network transmission of the services and data that service i depends on; and exeT_{i,j} is the execution delay of service i on node j;
The communication delay of a service can be defined as the sum of the delays produced by service dependence and data dependence, as in formula (9):
comT_{i,j} = sevT_{i,j} + datT_{i,j}    formula (9)
where sevT_{i,j} is the delay of service i on node j caused by service dependence, represented as the sum over hops of the quotient of the dependency strength and the bandwidth, the dependency strength between services being defined as the number of calls between them divided by the total number of calls in the service's unit time; and datT_{i,j} is the delay of service i on node j produced by data dependence, which can be represented as the sum over hops of the quotient of the data volume and the bandwidth;
Assuming that a node can deploy all services scheduled to it, the execution delay of a service is defined as the ratio of the service's instruction length to the CPU processing capacity available to it on the node, as in formula (10):
exeT_{i,j} = mi_i / (mips_j × cpuU_i)    formula (10)
where mips_j is the CPU instruction execution speed of node j, cpuU_i is the CPU utilization of service i, and mi_i is the instruction length of service i; the first two indicators can be obtained from the monitoring system, and the third is prior knowledge;
A large number of services exist on the distributed nodes, and the mean of all service response times is adopted to measure the service delay of the cloud environment, as in formula (11):
avgT = (1/N) Σ_{i=1}^{N} T_{i,j}    formula (11)
where i is the service number with the total number of services N, and j is the node number with the total number of nodes M.
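The delay model of formulas (8) through (11) can be sketched likewise. A hypothetical Python illustration; the names are assumptions, and the denominator mips_j × cpuU_i is an assumed reading of the execution-delay formula, where cpuU stands for the CPU share available to the service:

```python
def exec_delay(mi_i, mips_j, cpu_share_i):
    """Execution delay: instruction length over the CPU capacity available
    to the service on the node (assumed to be mips_j * cpu_share_i)."""
    return mi_i / (mips_j * cpu_share_i)

def comm_delay(dep_volumes, hop_bandwidths):
    """Communication delay: sum over hops of (traffic volume / bandwidth),
    covering both service-dependence and data-dependence traffic."""
    return sum(v / b for v, b in zip(dep_volumes, hop_bandwidths))

def service_delay(mi_i, mips_j, cpu_share_i, dep_volumes, hop_bandwidths):
    """T_{i,j} = comT_{i,j} + exeT_{i,j}."""
    return comm_delay(dep_volumes, hop_bandwidths) + exec_delay(mi_i, mips_j, cpu_share_i)

def avg_delay(delays):
    """Mean response time over all N services."""
    return sum(delays) / len(delays)
```

The mean over all per-service delays is the quantity avgT that enters the reward of formula (1).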
The step 4 specifically comprises the following steps:
The micro-service scheduling optimization problem based on deep reinforcement learning is described as a 5-tuple O_t = (t, E_t, a_t, r_t, E_{t+1}), where t, a value between 1 and n, is the number of the scheduling optimization task; E_t represents the state of the distributed environment at the t-th task; a_t is the action taken in state E_t, i.e. the scheduling scheme of the service; r_t is the reward obtained by taking action a_t in state E_t; and E_{t+1} is the new state obtained by taking action a_t in state E_t. The 5-tuple of each scheduling optimization task is stored in an experience replay pool to provide the basis for model training;
DDPG uses a dual-network structure consisting of a policy network and a value network. The input of the policy network is the state E_t of the current environment and its output is the corresponding action a_t. The input of the value network is the state E_t of the current environment together with the action a_t, and its output is the value of performing action a_t in state E_t. In addition, both the policy network and the value network are divided into an online network and a target network; the corresponding online and target networks have the same structure but different initialization parameters.
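The experience replay pool of 5-tuples and the DDPG-style soft update that blends online parameters into the target network can be sketched without any deep-learning framework. A hypothetical, minimal Python illustration; the capacity, the rate tau, and all names are assumptions:

```python
import random
from collections import deque

class ReplayPool:
    """Stores O_t = (t, E_t, a_t, r_t, E_{t+1}) tuples; oldest entries are
    evicted once capacity is reached."""
    def __init__(self, capacity=10000):
        self.buf = deque(maxlen=capacity)

    def store(self, t, env_t, a_t, r_t, env_next):
        self.buf.append((t, env_t, a_t, r_t, env_next))

    def sample(self, k):
        """Draw a random minibatch for training."""
        return random.sample(list(self.buf), k)

def soft_update(target_params, online_params, tau=0.005):
    """theta_target <- tau * theta_online + (1 - tau) * theta_target,
    the usual DDPG target-network update (parameters shown as flat lists)."""
    return [tau * o + (1 - tau) * t for t, o in zip(target_params, online_params)]
```

In a full implementation the parameter lists would be the weight tensors of the policy and value networks; the update rule is unchanged.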
The step 5 specifically comprises the following steps:
Based on the trained model, the environment of the current micro-service scheduling optimization task and the scheduling constraints, consisting of the service dependency matrix, the data dependency matrix and the user dependency matrix, are input into the model; executing the model generates a scheduling optimization scheme, and the reward value obtained by executing the current micro-service scheduling optimization task is fed back to the environment for continuous optimization.
The invention has the following beneficial effects.
The method converts the micro-service scheduling optimization problem into a prediction problem based on reinforcement learning and quantifies the factors that influence micro-service performance, such as service dependence, data dependence, user dependence and distributed resources, into the scheduling model. It considers more optimization factors and also helps improve the scheduling efficiency of services composed of micro-services.
The invention analyzes the characteristics of micro-services, including service dependence, data dependence and user dependence. Based on these characteristics, three types of scheduling constraints are designed and a model of the distributed micro-service scheduling optimization problem is proposed. The scheduling problem of distributed micro-services is then converted into a model-based prediction problem, and continuous scheduling optimization of the micro-services is performed with the trained model. On the premise of guaranteeing the effectiveness of the scheduling scheme, the invention can effectively improve the efficiency of generating it, better meeting the timeliness requirements of the micro-services and of the services composed from them.
Description of the drawings:
FIG. 1 is a schematic diagram of the basic environment of a distributed service.
FIG. 2 is a basic flow of a micro-service scheduling optimization problem based on deep reinforcement learning.
Fig. 3 is a structural diagram of a scheduling optimization method based on deep reinforcement learning.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
This application designs a distributed micro-service scheduling optimization method. In a distributed science-and-technology service environment many factors influence service performance, and scheduling optimization algorithms with different optimization targets have emerged to guarantee service quality, their final purpose being to keep services running normally. However, the service scheduling optimization algorithms proposed at different times have corresponding limitations, and the same scheduling scheme may yield unsatisfactory results for different types of micro-services. Therefore, this application studies a scheduling optimization method for micro-services oriented to a distributed environment.
FIG. 1 shows the basic environment of distributed services, which mainly includes three types of objects: distributed nodes, services and users. Services are the core and refer to the micro-services that provide business functions. Nodes are the carriers of services, comprising cloud nodes and edge nodes that provide computing and storage capacity; nodes are interconnected through the network, and transmission speed is affected by bandwidth. Users are the consumers of services, and their distribution is limited by the administrative domain of their industry or unit. In summary, in a distributed science-and-technology service environment, micro-service scheduling is the process of deploying micro-services to appropriate nodes so as to optimize a target under given constraints.
The invention comprehensively considers the factors that influence service scheduling and scheduling efficiency. In the service scheduling optimization process it mainly considers three constraints that control the action and two optimization targets that assist the reward, and it adopts the Deep Deterministic Policy Gradient (DDPG) algorithm to realize scheduling optimization of the service.
(1) Scheduling constraints
For a distributed service environment, the scheduling constraint of the service is designed as follows:
1) Service dependence: typically, the functions of micro-services are relatively independent. In a distributed environment, users' science-and-technology service requirements are complex, and micro-services must cooperate to meet them; dependencies between micro-services are therefore ubiquitous. This work defines the calling relationship among micro-services as a dependency and, according to call frequency, divides service dependencies into strong and weak. Generally, strongly dependent services call each other frequently; deploying them near each other, or even on the same node, effectively avoids the network overhead of data transmission and thereby optimizes service performance. Service dependencies can be used to constrain the action set in the scheduling optimization approach based on deep reinforcement learning. Specifically, the number of calls between services can be obtained by statistics over the service call-link data collected by Istio monitoring, and the strength of a service dependency is the number of calls between the pair divided by the total number of calls in the service's unit time. When the service to be deployed has a strong dependency on other services, the node hosting those services, or an adjacent node, is selected preferentially.
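The dependency-strength computation described here can be sketched as follows (hypothetical Python; the strong/weak threshold value is an assumption, since the patent does not fix one):

```python
def dependency_strength(calls_between, total_calls):
    """Strength of a service dependency: the number of calls between a pair
    of services divided by the caller's total calls in the unit time window."""
    return calls_between / total_calls if total_calls else 0.0

def is_strong(strength, threshold=0.5):
    """Split dependencies into strong and weak; the 0.5 cutoff is illustrative."""
    return strength >= threshold
```

Strongly dependent pairs identified this way are the ones steered toward co-located or adjacent nodes.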
2) Data dependence: data is the core of a service, and almost all services need to process data, so dependencies between services and data are ubiquitous. A service that processes data necessarily needs to access it; if the data and the service are not on the same node, the data must be transmitted, so transmission time becomes a factor in service delay, and keeping services and data adjacent helps guarantee service performance. Especially for data of a certain size, the influence of transmission time on performance is more obvious. This application therefore uses data dependence as a behavioral constraint for delimiting the action set. The locations of data sources can be obtained from prior knowledge, and the dependency between services and data sources can be obtained by analyzing the business process description files.
3) User dependence: the user is the subject that invokes the service, and the deployment of a service may also be affected by its users. Services with security and management requirements must be deployed within the range delimited by their users, so the deployment locations of services should be constrained by user information. This application therefore takes user dependence as a behavioral constraint for delimiting the action set. The dependency between users and services is obtained from prior knowledge.
(2) Optimization targets
For a distributed service environment, the optimization target of the service is designed as follows:
generally speaking, controlling cost and improving user experience are the most interesting evaluation criteria for service scheduling. In terms of cost control, reasonable allocation of resources will help control costs. For a node bearing a service, a CPU and a memory are main resources, and if utilization rates of the CPU and the memory are unbalanced, especially when the utilization rate of the CPU or the memory is too high, another resource is wasted, so that the utilization rate of the whole resource is low, and the cost is increased. Accuracy and timeliness are fundamental quality requirements for all information, as are the services used to provide the information. The accuracy is determined by the logic inherent in the service and is not the subject of the present application. Timeliness is represented as service delay, and if timeliness of user requirements cannot be met, user experience can be directly influenced. Therefore, the method takes resource balance and service delay as the targets of micro-service scheduling optimization.
1) Resource balance: nodes are the basic units that carry services. Balanced resource utilization not only avoids the service performance degradation caused by high-load nodes but also achieves a degree of cost control, i.e. it avoids the resource waste and increased cost caused by low-load or even idle nodes. Resource balance is defined based on CPU utilization and memory utilization.
CPU utilization rate: the CPU utilization of a node is defined as the ratio of the allocated CPU resources to the total CPU resources of the node. The allocated CPU resources are computed from the CPU usage of the containers on the node, as in formula (2), where n is the total number of containers on the node:
Used_cpu = Σ_{k=1}^{n} cpu_k    formula (2)
The CPU utilization is computed as in formula (3), where Capacity_cpu is the total CPU resources of the node:
Ratio_Node_cpu = Used_cpu / Capacity_cpu    formula (3)
The memory utilization rate: the memory utilization of a node is defined as the ratio of the allocated memory resources to the total memory resources of the node. The allocated memory resources are computed from the memory usage of the containers on the node, as in formula (4), where n is the total number of containers on the node:
Used_mem = Σ_{k=1}^{n} mem_k    formula (4)
The memory utilization is computed as in formula (5), where Capacity_mem is the total memory resources of the node:
Ratio_Node_mem = Used_mem / Capacity_mem    formula (5)
Node resource balance is defined as the absolute value of the difference between the node's CPU utilization and memory utilization, as in formula (6):
g_i = |Ratio_Node_cpu_i - Ratio_Node_mem_i|    formula (6)
where i is the node number, Ratio_Node_cpu_i is the CPU utilization of the i-th node, and Ratio_Node_mem_i is the memory utilization of the i-th node.
The cloud environment comprises a large number of nodes, and resource balance must be considered over all of them; the balance of a single node cannot represent good performance of the container cloud. The resource balance of the cloud environment is therefore evaluated as the variance of the node resource balances over all nodes, which assists in scoring the reward, as in formula (7):
G = (1/M) Σ_{i=1}^{M} (g_i - ḡ)²    formula (7)
where ḡ = (1/M) Σ_{i=1}^{M} g_i is the mean of the node resource balances and g_i is the resource balance of the i-th node.
2) Service delay: the final object of a service is the user, and delay is one of the most important indicators affecting user experience; it also allows the response performance of all distributed science-and-technology services to be weighed. The main factors affecting service delay are data transmission and service execution, so the sum of the communication delay and the execution delay is used to represent the service delay, as in formula (8):
T_{i,j} = comT_{i,j} + exeT_{i,j}    formula (8)
where T_{i,j} is the service delay of service i on node j; comT_{i,j} is the communication delay of service i on node j, produced by the network transmission of the services and data that service i depends on; and exeT_{i,j} is the execution delay of service i on node j.
Services depend on other services, on data, and on users: there is data transfer between services, between data sources and services, and in users' requests to services. In general, the volume of user-request data is negligible, so the communication delay of a service can be defined as the sum of the delays produced by service dependence and data dependence, as in formula (9):
comT_{i,j} = sevT_{i,j} + datT_{i,j}    formula (9)
where sevT_{i,j} is the delay of service i on node j caused by service dependence, represented as the sum over hops of the quotient of the dependency strength and the bandwidth, the dependency strength between services being defined as the number of calls between them divided by the total number of calls in the service's unit time; and datT_{i,j} is the delay of service i on node j produced by data dependence, which can be represented as the sum over hops of the quotient of the data volume and the bandwidth.
It is assumed that a node can deploy all services scheduled to it, i.e. the memory capacity of the node meets the task requirement. Because the speed differences between memories are small, the execution time of a service is mainly affected by the CPU capability of the node, and it is also related to the complexity of the service itself. Therefore, the execution delay of a service is defined as the ratio of the service instruction length to the processing power of the node CPU, as shown in formula (10).
exeT_{i,j} = mi_i / (cpu_i × mips_j)    formula (10)
where mips_j denotes the CPU instruction execution speed of node j, cpu_i denotes the CPU utilization of service i, and mi_i denotes the instruction length of service i; the first two indexes can be obtained from the monitoring system, and the third is prior knowledge.
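Since all three indexes appear in formula (10), one plausible reading is that the service's CPU share scales the node's instruction-execution speed. The sketch below encodes that assumed reading; it is not confirmed by the patent text.

```python
def execution_delay(mi: float, cpu_share: float, mips: float) -> float:
    # exeT_{i,j} = mi_i / (cpu_i * mips_j): instruction length of the service
    # divided by the effective processing power it receives on the node.
    # Treating cpu_share as a scaling of the node's MIPS is an assumed
    # reading of formula (10).
    return mi / (cpu_share * mips)
```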
A large number of services exist on the distributed nodes, so the mean of all service response times is adopted to measure the service delay in the cloud environment, as shown in formula (11).

avgT = (1/N) Σ_{i=1}^{N} T_{i,j}    formula (11)

where i denotes the service number (N services in total) and j denotes the number of the node hosting service i (M nodes in total).
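Formula (11) reduces to a mean over the per-service delays; a trivial sketch (names assumed):

```python
def mean_service_delay(delays: list) -> float:
    # avgT: mean of T_{i,j} over all N services, each evaluated at the
    # node j that hosts service i (formula (11)).
    return sum(delays) / len(delays)

avgT = mean_service_delay([1.2, 0.8, 1.0])
```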
(3) Scheduling optimization method based on deep reinforcement learning
The present application comprehensively considers the factors influencing service scheduling and scheduling efficiency, mainly considering 3 constraints on the control actions and 2 auxiliary optimization targets in the reward during service scheduling optimization, and adopts the Deep Deterministic Policy Gradient (DDPG) algorithm to realize scheduling optimization of services.
1) Model
Description of the environment: the method and the device schedule the current service to optimize the environment state E of the task t t Defined as a quadruple, E t = (t, S, N, SN), where t is a scheduling optimization task number; s is a vector representing the current service; n is a vector representing the current node; SN is expressed asVector of previous service deployment scheme, each element in SN being an ordered couple<s i ,n j >Represents a service s i Is deployed to node n j The above.
Action description: the action of service scheduling is defined as performing the operation of deploying the service to the node in the current optimized scheduling task, denoted as a t =(s i ,n j ) I.e. service s i Is deployed to node n j The above. In particular, the selection of actions is subject to the constraints of service-dependent, data-dependent, and user-dependent constraints.
Reward description: according to the goal of service optimization scheduling, a service deployment scheme with low service delay and high resource balance is a better scheme. The present application takes action a t The sum of the difference of service delay and the difference of resource balance of all services after completion and before completion is used as a basis for setting the reward value, and the calculation method is shown as formula (12). When the service delay is reduced and the resource balance is improved, the reward is a positive value, otherwise, the reward is 0 or a negative value.
r_t = (g_{t+1} − g_t) − (avgT_{t+1} − avgT_t)    formula (12)
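Formula (12) written out directly (variable names assumed):

```python
def reward(g_before: float, g_after: float,
           avgT_before: float, avgT_after: float) -> float:
    # r_t = (g_{t+1} - g_t) - (avgT_{t+1} - avgT_t): the reward pairs the
    # change in resource balance with the negated change in mean delay,
    # so a falling delay contributes positively.
    return (g_after - g_before) - (avgT_after - avgT_before)
```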
2) Method implementation
The scheduling requirements of services are generated continuously, so the scheduling algorithm should be capable of continuous control. In addition, services on the cloud reach a certain scale, the influencing factors are complex, and scheduling is performed continuously, so the scheduling algorithm must be efficient. The Deep Deterministic Policy Gradient (DDPG) algorithm combines deep learning and reinforcement learning and was proposed to solve continuous action-control problems; it is suited to complex nonlinear problems, solves them efficiently, and supports parallel computing, making it applicable to the service scheduling problem in a cloud environment. Therefore, a scheduling algorithm based on deep reinforcement learning is designed on the basis of the deep deterministic policy gradient algorithm.
As shown in FIG. 2, the present application describes the service scheduling optimization problem based on deep reinforcement learning as a 5-tuple O_t = (t, E_t, a_t, r_t, E_{t+1}), where t, a value between 1 and n, is the number of the scheduling optimization task; E_t represents the state of the distributed environment at the time of the tth task; a_t is the action taken in state E_t, i.e. the scheduling scheme of the service; r_t is the reward obtained by taking action a_t in state E_t; and E_{t+1} is the new state obtained by taking action a_t in state E_t. The 5-tuple depicting each scheduling optimization task is stored in an experience replay pool, providing a basis for model training.
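The experience replay pool holding these tuples can be sketched as follows; the capacity and batch size are illustrative, and only the (E_t, a_t, r_t, E_{t+1}) part of O_t is stored, matching the state-transition data described in the training flow.

```python
import random
from collections import deque

class ReplayPool:
    # Fixed-capacity pool of state-transition tuples; the oldest entries
    # are evicted automatically once the capacity is reached.
    def __init__(self, capacity: int = 10000):
        self.buffer = deque(maxlen=capacity)

    def store(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size: int):
        # Uniform minibatch sample, never larger than the pool itself.
        return random.sample(list(self.buffer), min(batch_size, len(self.buffer)))
```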
In order to solve the problem of slow convergence, DDPG uses a dual-network structure consisting of a policy network and a value network; the structure of the DDPG-based service scheduling optimization method is shown in fig. 2. The input to the policy network is the state E_t of the current environment, and its output is the corresponding action a_t. The inputs to the value network are the state E_t of the current environment and the action a_t, and its output is a score for performing action a_t in state E_t. In addition, the policy network and the value network are each divided into an online network and a target network; the online network and the target network have the same structure but different initialization parameters.
The specific flow of the scheduling optimization method based on deep reinforcement learning is shown in fig. 3. First, the system parameters are initialized: the neural networks are constructed, the network weights and hyper-parameters are initialized, and the environment is initialized. Then the state E_t obtained from the environment is passed to the policy network, which outputs the corresponding action a_t. Taking a_t as input, the environment returns the corresponding reward r_t and the state E_{t+1} at the next moment. The tuple (E_t, a_t, r_t, E_{t+1}) is stored as one piece of state-transition data in the experience replay pool. Small batches of data are drawn from the experience pool to train the neural networks. The above process is repeated until the model converges. Thereafter, based on the model, service scheduling tasks are executed and optimization continues.
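The control flow of fig. 3 can be sketched structurally as follows. The environment, the one-parameter "networks" and the update rule are toy stand-ins chosen only so the loop runs end to end; they are not a working DDPG learner, which would use actor/critic neural networks trained by gradient descent with soft target updates.

```python
import random

class ToyEnv:
    # Stand-in environment: the state is a single float.
    def reset(self) -> float:
        self.state = 1.0
        return self.state

    def step(self, action: float):
        reward = -abs(self.state + action)      # toy reward signal
        self.state = random.uniform(-1.0, 1.0)  # next state E_{t+1}
        return self.state, reward

class ToyAgent:
    # One gain parameter stands in for the online policy network,
    # another for its target copy.
    def __init__(self, tau: float = 0.1):
        self.gain, self.target_gain, self.tau = 0.0, 0.0, tau

    def act(self, state: float) -> float:
        return self.gain * state

    def train_step(self, batch):
        for s, a, r, s_next in batch:
            self.gain += 0.05 * r * s           # toy parameter update
        # Soft target update, as in DDPG: theta' <- tau*theta + (1-tau)*theta'
        self.target_gain = self.tau * self.gain + (1 - self.tau) * self.target_gain

env, agent, pool = ToyEnv(), ToyAgent(), []
state = env.reset()
for _ in range(50):
    action = agent.act(state)                   # policy outputs a_t for E_t
    next_state, r = env.step(action)            # environment returns r_t, E_{t+1}
    pool.append((state, action, r, next_state)) # store the transition
    agent.train_step(random.sample(pool, min(4, len(pool))))
    state = next_state
```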

Claims (8)

1. A distributed micro-service scheduling optimization method, characterized by comprising the following steps:
step 1: modeling the environment where the t-th scheduling task is located into a quadruple based on the micro-service instance, the distributed environment instance and the deployment position of the micro-service in the distributed environment, so as to describe the deployment condition of the micro-service instance in the distributed environment instance at the moment;
step 2: modeling the action of the tth scheduling task as a two-tuple to describe that micro-service s_i is deployed to node n_j at that moment;
And step 3: aiming at low service delay and high resource balance, the sum of the difference of service delay and the difference of resource balance of all services after and before the action is finished is used for reward description;
and 4, step 4: inputting the environment, the action and the reward into a DDPG model, and training to obtain a prediction model of the micro-service instance scheduling optimization scheme;
and 5: and predicting the scheduling optimization scheme of the micro service instance in the cloud environment instance by using the micro service instance scheduling optimization scheme prediction model obtained by training to obtain a corresponding scheduling optimization scheme.
2. The distributed micro-service scheduling optimization method according to claim 1, wherein the step 1 specifically comprises:
description of the environment: the environment state E of the tth micro-service scheduling optimization task t Defined as a quadruple, E t = (t, S, N, SN), where t is a scheduling optimization task number; s is a vector representing the current service, i.e., S = { S = { S } 1 ,s 2 ,…,s i ,…,s n }; n is a vector representing the current node, i.e., N = { N 1 ,n 2 ,…n j ,…n m }; SN is a vector representing the current service deployment scenario, i.e., SN = (c =: (d))<s s1 ,n n1 >,<s s2 ,n n2 >,…<s sk ,n nk >,…<s sNum ,n nNum >) Wherein each element is an ordinal couple<s sk ,n nk >Represents a service s sk Is deployed to node n nk The above.
3. The distributed microservice scheduling optimization method according to claim 1, wherein the step 2 specifically comprises:
action description: defining the action of the t-th micro-service scheduling optimization task as executing the operation a of deploying the service to the node in the current optimization scheduling task t =(s i ,n j ) I.e. service s i Is deployed to node n j Wherein n is j The selection of (a) is to meet the limitation of service dependency, data dependency and user dependency constraints, and is oriented to the distributed service environment.
4. The distributed micro-service scheduling optimization method according to claim 3, wherein the scheduling constraint of the service is designed as follows:
service dependence: the functions of the micro-services are relatively independent, user requirements are composite in a distributed environment, the micro-services need to be cooperated with each other to meet the user requirements, the calling relationship among the micro-services is analyzed, service dependencies with various strengths are extracted from link data of the micro-services to restrict an action set, specifically, the times of the calling relationship among the services are obtained through statistics from the micro-service calling link data, then the calling times among the services are divided by the total calling times in a service unit time to serve as the strength of the service dependencies, and then when the service to be deployed and other services have strong service dependency relationships, the node where the strong dependency services are located or adjacent nodes are preferentially selected;
data dependence: data is a core element of a service, so that data dependence is used for limiting an action set, the position of a data source can be obtained through priori knowledge, the dependence relationship between the service and the data source is obtained through analysis of a business process description file, matrix description is adopted, rows of a matrix represent the node number of the service, columns of the matrix represent the node number of the data, if dependence exists between the service and the data, the node number of the service and the node number of the data are used as subscripts, the element is 1, otherwise, the element is 0, and then when the service to be deployed and the data have the dependence relationship, the node where the data is located or an adjacent node is preferentially selected;
the user relies on: the user is a main body for calling the service, so that the user dependence is used for defining an action set, the dependence relationship between the user and the service is obtained by priori knowledge, matrix description is adopted, rows of the matrix represent the number of nodes where the service is located, columns of the matrix represent the number of nodes where the user accesses, if the dependence exists between the service and the user, the number of the nodes where the service is located and the number of the nodes where the user accesses are used as subscripts, the subscripts are taken as 1, otherwise, the subscripts are taken as 0, and then when the service to be deployed and the user have the dependence relationship, the node where the user is located or adjacent nodes are preferentially selected.
5. The distributed micro-service scheduling optimization method according to claim 1, wherein the step 3 specifically includes:
the reward description: by action a t The sum of the difference of the service time delay and the difference of the resource balance of all the services after and before completion is used as a basis for setting a reward value, the calculation method is as shown in a formula (1), when the service time delay is reduced and the resource balance is improved, the reward is a positive value, otherwise, the reward is a 0 or negative value;
r_t = (g_{t+1} − g_t) − (avgT_{t+1} − avgT_t)    formula (1)
As shown in formula (1), r_t is the reward value after execution of the tth micro-service scheduling optimization task, g_t and g_{t+1} are respectively the resource balance before and after execution of action a_t, and avgT_t and avgT_{t+1} are respectively the service delay before and after execution of action a_t.
6. The distributed microservice scheduling optimization method according to claim 5, wherein the resource balance and service latency are calculated as follows:
1) Resource balance: defining resource balance based on CPU utilization and memory utilization;
CPU utilization: the CPU utilization of a node is defined as the ratio of the allocated CPU resources to the total CPU resources of the node; the allocated CPU resources are calculated from the CPU usage of the containers on the node, as shown in formula (2), where n is the total number of containers on the node;

Allocated_cpu = Σ_{k=1}^{n} cpu_k    formula (2)

the CPU utilization is then calculated as shown in formula (3), where Capacity_cpu is the total CPU resources of the node;

Ratio_Node_cpu = Allocated_cpu / Capacity_cpu    formula (3)

memory utilization: the memory utilization of a node is defined as the ratio of the allocated memory resources to the total memory resources of the node; the allocated memory resources are calculated from the memory usage of the containers on the node, as shown in formula (4), where n is the total number of containers on the node;

Allocated_mem = Σ_{k=1}^{n} mem_k    formula (4)

the memory utilization is then calculated as shown in formula (5), where Capacity_mem is the total memory resources of the node;

Ratio_Node_mem = Allocated_mem / Capacity_mem    formula (5)
the node resource balance is defined as the absolute value of the difference between the node CPU utilization rate and the memory utilization rate, as shown in formula (6);
g_i = |Ratio_Node_cpu^i − Ratio_Node_mem^i|    formula (6)
where i is the node number, Ratio_Node_cpu^i denotes the CPU utilization of the ith node, and Ratio_Node_mem^i denotes the memory utilization of the ith node;
the cloud environment comprises a large number of nodes, resource balance needs to be comprehensively considered for all the nodes, single node balance cannot represent good performance of the container cloud, the resource balance of the cloud environment is evaluated by using the variance of the node resource balance of all the nodes, and reward scoring is assisted, wherein the resource balance is shown in a formula (7);
G = (1/M) Σ_{i=1}^{M} (g_i − ḡ)²    formula (7)

where ḡ = (1/M) Σ_{i=1}^{M} g_i is the mean resource balance over the M nodes and g_i is the resource balance of the ith node;
2) Service delay: the sum of the communication delay and the execution delay is used for expressing the service delay, as shown in a formula (8);
T_{i,j} = comT_{i,j} + exeT_{i,j}    formula (8)
where T_{i,j} denotes the service delay of service i at node j, comT_{i,j} denotes the communication delay of service i at node j, produced by the network transmission of the services and data on which service i depends, and exeT_{i,j} denotes the execution delay of service i at node j;
the communication delay of a service can be defined as the sum of the delays generated by service dependence and data dependence, as shown in equation (9);
comT_{i,j} = sevT_{i,j} + datT_{i,j}    formula (9)
where sevT_{i,j} denotes the delay of service i at node j caused by service dependence, computed as the sum over hops of the dependency strength divided by the bandwidth of each hop, the dependency strength between two services being defined as the number of calls between them divided by the total number of calls in the service's unit time; datT_{i,j} denotes the delay of service i at node j caused by data dependence, computed as the sum over hops of the data volume divided by the bandwidth of each hop;
assuming that a node can deploy all services scheduled to it, the execution latency of a service is defined as the ratio of the service instruction length to the processing capacity of the node CPU, as shown in equation (10);
exeT_{i,j} = mi_i / (cpu_i × mips_j)    formula (10)
where mips_j denotes the CPU instruction execution speed of node j, cpu_i denotes the CPU utilization of service i, and mi_i denotes the instruction length of service i; the first two indexes can be obtained from the monitoring system, and the third is prior knowledge;
a large number of services exist in distributed nodes, and the mean value of all service response times is adopted to measure the service delay in the cloud environment, as shown in formula (11).
avgT = (1/N) Σ_{i=1}^{N} T_{i,j}    formula (11)

where i denotes the service number (N services in total) and j denotes the number of the node hosting service i (M nodes in total).
7. The distributed micro-service scheduling optimization method according to claim 1, wherein the step 4 specifically includes:
describing the micro-service scheduling optimization problem based on deep reinforcement learning as a 5-tuple O_t = (t, E_t, a_t, r_t, E_{t+1}), where t, a value between 1 and n, is the number of the scheduling optimization task; E_t represents the state of the distributed environment at the time of the tth task; a_t is the action taken in state E_t, i.e. the scheduling scheme of the service; r_t is the reward obtained by taking action a_t in state E_t; E_{t+1} is the new state obtained by taking action a_t in state E_t; the 5-tuple depicting each scheduling optimization task is stored in an experience replay pool, providing a basis for model training;
DDPG uses a dual-network structure consisting of a policy network and a value network; the input to the policy network is the state E_t of the current environment, and its output is the corresponding action a_t; the inputs to the value network are the state E_t of the current environment and the action a_t, and its output is a score for performing action a_t in state E_t; in addition, the policy network and the value network are each divided into an online network and a target network, and the online network and the target network have the same structure but different initialization parameters.
8. The distributed microservice scheduling optimization method according to claim 1, wherein the step 5 specifically comprises:
based on the model obtained by training, the environment of the current micro-service scheduling optimization task and the scheduling constraints consisting of the service dependency matrix, the data dependency matrix and the user dependency matrix are input into the trained model; the model is executed to generate a scheduling optimization scheme, and the reward value obtained by executing the current micro-service scheduling optimization task is fed back to the environment for continuous optimization.
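The dependency-matrix constraint described in claims 3 and 4 can be illustrated as follows. This is a simplified sketch with assumed names: the matrix is indexed by service and node directly, and "adjacent" nodes come from an assumed adjacency list.

```python
def candidate_nodes(service: int, dep_matrix, adjacency, num_nodes: int) -> set:
    # dep_matrix[s][n] == 1 if service s depends on a resource (service,
    # data source, or user access point) located at node n.
    # adjacency[n] lists the neighbors of node n.
    preferred = set()
    for n in range(num_nodes):
        if dep_matrix[service][n]:
            preferred.add(n)                 # node hosting the dependency
            preferred.update(adjacency[n])   # or its adjacent nodes
    # With no dependency, the action set is unconstrained.
    return preferred if preferred else set(range(num_nodes))
```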
CN202211424666.0A 2022-11-14 2022-11-14 Distributed micro-service scheduling optimization method Pending CN115714820A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211424666.0A CN115714820A (en) 2022-11-14 2022-11-14 Distributed micro-service scheduling optimization method


Publications (1)

Publication Number Publication Date
CN115714820A true CN115714820A (en) 2023-02-24

Family

ID=85233170


Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117255126A (en) * 2023-08-16 2023-12-19 广东工业大学 Data-intensive task edge service combination method based on multi-objective reinforcement learning
CN117041330A (en) * 2023-10-10 2023-11-10 三峡高科信息技术有限责任公司 Edge micro-service fine granularity deployment method and system based on reinforcement learning
CN117041330B (en) * 2023-10-10 2023-12-15 三峡高科信息技术有限责任公司 Edge micro-service fine granularity deployment method and system based on reinforcement learning


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination