CN109861850B - SLA-based stateless cloud workflow load balancing scheduling method - Google Patents


Info

Publication number: CN109861850B (application CN201910028641.0A)
Authority: CN (China)
Prior art keywords: request, queue, delay, service, rtl
Legal status: Active (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Application number: CN201910028641.0A
Other languages: Chinese (zh)
Other versions: CN109861850A (en)
Inventors: 余阳, 黄钦开
Current Assignee: Sun Yat Sen University (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Original Assignee: Sun Yat Sen University
Application filed by Sun Yat Sen University
Priority to CN201910028641.0A
Publication of CN109861850A
Application granted
Publication of CN109861850B
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Abstract

The invention discloses an SLA-based stateless cloud workflow load balancing scheduling method. Tenants select different SLA levels according to their service scenarios and process models, and through these levels the system provides differentiated request throughput and tiered service for different process requests. By combining shared memory with the distribution of process models across engines, the method monitors engine load in real time, flattens the request peaks of the engine services, and reduces the overall memory overhead of the engine cluster. This improves the load balancing capability of a cloud workflow under a multi-tenant architecture, so that a process service provider can serve more tenants while still meeting each tenant's requirements on request throughput and on the parsing and execution performance of different process definitions.

Description

SLA-based stateless cloud workflow load balancing scheduling method
Technical Field
The invention relates to the technical field of workflow and cloud computing, in particular to a stateless cloud workflow load balancing scheduling method based on SLA.
Background
With the development of distributed computing, and of grid technology in particular, cloud computing has emerged as a new model of service computing. Cloud computing is a mode of delivering and using resources: the resources an application needs, including hardware, platforms, and software, are obtained through a network, and the network providing them is called a "cloud". In cloud computing, everything is a service, and services are generally divided into three levels: infrastructure as a service (IaaS), platform as a service (PaaS), and software as a service (SaaS).
Cloud workflow is a PaaS-level service: a distributed system that provides workflow services under the platform-as-a-service cloud computing model. Compared with a traditional workflow system, its main advantages are twofold. First, cloud workflow offers an on-demand, pay-per-use model, which effectively reduces the cost for enterprises of adopting workflow management software and lowers the barrier to entry. Second, cloud workflow achieves high resource utilization and high service performance: centralized management can fully exploit the available computing power, and flexible resource configuration can cope with the request loads of different time periods.
A traditional workflow engine is usually implemented as a stateful service. In a cloud environment, a workflow engine built on a stateless scheme better exploits the elasticity of cloud resources and improves the reliability of the cloud workflow system, and therefore better fits the needs of cloud workflow. For a cloud workflow system built on stateless workflow engines, two factors matter. On one hand, owing to the nature of workflow services, process models must be parsed and the parsing results stored, which occupies computing and storage resources. On the other hand, in a cloud environment the cloud workflow must support multi-tenant business process execution, so its load scenario is far more complex than that of a traditional workflow engine. When such a system faces request loads from multiple tenants, multiple process models, and multiple process instances, scheduling that considers only the statelessness of the service cannot fully exploit the characteristics of workflow services, and therefore cannot achieve the best request load balancing effect and user experience.
At present, the system architecture and management structure of existing workflow cluster systems usually serve only a single user or a single organization. Under a cloud service business model, a process service provider wants to provide process parsing services to more tenants with the same hardware resources. Different tenants often have different request throughput requirements for the engine service depending on their business scenarios, and the same tenant often has different parsing and execution performance requirements for different process definitions. The tenant and the process service provider therefore sign an SLA contract, and the system provides the corresponding service level to the tenant according to that contract.
Disclosure of Invention
To address the differing service level requirements of tenants under cloud workflow and the process service provider's need to serve more tenants with the same hardware resources, the invention provides an SLA (service level agreement) based stateless cloud workflow load balancing scheduling method. It optimizes the load balancing effect and execution performance of cloud workflow requests while guaranteeing the cloud tenants' service experience, so that the cloud workflow system can provide process parsing services to more tenants in a normal service state.
In order to achieve the purpose of the invention, the technical scheme is as follows: an SLA-based stateless cloud workflow load balancing scheduling method, wherein when a process instance request corresponding to a process model uploaded by a tenant is received, the cloud workflow schedules the request to a stateless workflow engine in the cluster. Execution comprises the following steps:
Admission layer load waveform smoothing:
S101: the admission layer receives a tenant process instance request and, according to the tenant ID or the process instance request information, obtains from the tenant SLA warehouse the tenant's service request arrival rate (RAR) index and the request response time level (RTL) of the process instance request;
S102: judging, by the system current-limiting algorithm, whether the tenant's service request rate meets the RAR index; if it exceeds the service request rate specified by the RAR index, the request is filtered directly and fed back to the tenant with a prompt to purchase a higher RAR level; otherwise the next step is executed;
S103: judging the RTL level and executing the scheduling layer's balanced request dispatch according to it: the request counts of the current immediate execution queue and the delay queues are obtained, a score is calculated for the current process instance request against each delay queue from the delay queues' request counts and the historical load variable historySize, and the request is placed in the delay queue with the highest score;
Scheduling layer balanced request dispatch:
S201: the scheduling layer receives a request from the immediate execution queue of the admission layer and obtains from the shared memory the load information set E = [e1, …, em] of the process engine services published by the process service layer, where ei = (cpui, rami), cpui denotes the current cpu occupancy of process engine service ei, and rami denotes its current ram occupancy;
S202: the scheduling layer obtains from the process instance warehouse the distribution condition set D = [d1, d2, …, dm], di ∈ {0, 1}, of the process instances of the requested process model across the process engine services; di = 0 means the process model has not run on engine ei, and di = 1 means it has;
S203: the process engine services are divided into two groups E1 and E2 according to the distribution condition set D: E1 stores all engines on which the process model has already been executed (di = 1), and E2 stores the remaining engines;
S204: the engine busyness calculation is performed on the elements of E1 and E2 to obtain the least busy engine service of each group, e1* and e2* respectively, and the inequality

busyness(e1*) − busyness(e2*) > β

is judged; if it does not hold, the process instance request is dispatched to e1*, otherwise it is dispatched to e2*; the distribution condition set in the process instance warehouse is modified, completing the request scheduling of the process instance;
where β is a cost parameter for allocating the process instance request to a new engine, and may be set according to the specific hardware resource characteristics.
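The S201 to S204 selection rule can be sketched as below. This is a minimal illustration, not the patent's implementation: the engine names, the weight values w1 = w2 = 0.5, and the β value are assumptions, and the judged inequality busyness(e1*) − busyness(e2*) > β is reconstructed from the surrounding text (the original renders it only as an image).

```python
# Sketch of scheduling-layer engine selection (S201-S204).
# Engine names, w1/w2 weights, and beta values are illustrative assumptions.

def busyness(cpu: float, ram: float, w1: float = 0.5, w2: float = 0.5) -> float:
    # Engine busyness: weighted sum of cpu and ram occupancy, with w1 + w2 = 1.
    return w1 * cpu + w2 * ram

def select_engine(load, distribution, beta):
    """load: {engine: (cpu, ram)}; distribution: {engine: 0 or 1} for the
    requested model; beta: cost of parsing the model on a fresh engine."""
    # S203: split engines into E1 (model already deployed) and E2 (the rest).
    e1 = [e for e, d in distribution.items() if d == 1]
    e2 = [e for e, d in distribution.items() if d == 0]

    def least_busy(group):
        return min(group, key=lambda e: busyness(*load[e])) if group else None

    b1, b2 = least_busy(e1), least_busy(e2)
    if b1 is None:
        return b2
    if b2 is None:
        return b1
    # S204: leave the "warm" engine only when it is busier by more than beta.
    if busyness(*load[b1]) - busyness(*load[b2]) > beta:
        return b2
    return b1

load = {"e1": (0.9, 0.8), "e2": (0.2, 0.3), "e3": (0.5, 0.4)}
dist = {"e1": 1, "e2": 0, "e3": 1}
print(select_engine(load, dist, beta=0.2))  # e3: least busy warm engine wins
```

The sketch keeps the intent of β: an engine that has already parsed the model is preferred unless it is busier than the best fresh engine by more than the parsing cost.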
Preferably, in step S101, the service request arrival rate RAR measures the throughput of process instance requests and represents the highest number of process instance requests the tenant may send per second;
the RAR index is divided into three levels. Define v0, v1, v2, where v0, v1, v2 are integers and v0 > v1 > v2. The three levels are described as follows:
RAR 0 means the service request arrival rate is at most v0;
RAR 1 means the service request arrival rate is at most v1;
RAR 2 means the service request arrival rate is at most v2.
Different RAR levels correspond to different charges.
Preferably, in step S101, the request response time level RTL measures the processing performance of different process requests; RTL is proposed based on the diversity of workflow execution time ranges;
the RTL level is divided into three levels. Define parameters a, b, and t, where a and b are integers with a < b, and t is the time the engine needs to process one process instance request (the length of one time slice), which can be obtained by testing the process engine service. The levels are:
RTL 0: the process instance request is responded to within 1 time slice, i.e., t;
RTL 1: the request is responded to within (a+1) time slices at the latest, i.e., (a+1)t;
RTL 2: the request is responded to within (b+1) time slices at the latest, i.e., (b+1)t.
Further, the system current-limiting algorithm adopts a sliding window algorithm to guarantee the tenant's RAR index; the RTL level is implemented with a request cache, as follows:
the admission layer maintains b+1 queues for storing process instance requests. Each queue corresponds to a delay duration variable representing how long process instance requests stay in that queue, with values 0t, 1t, 2t, …, bt, where t is the time slice defined in RTL. The queue whose delay duration variable is 0t is the immediate execution queue; the queues with 1t, 2t, …, bt are delay queues. The admission layer must also update the delay queues and the historical load variable after every time slice; the historical load variable measures the request count of each past time slice. The admission layer places each new process instance request into the corresponding delay queue according to the RTL level the tenant set for the process task and the number of process instance requests currently stored in each queue.
Still further, step S103 specifically determines the RTL level:
if the RTL level is RTL 0, the scheduling layer's balanced request dispatch is executed directly;
if the RTL level is RTL 1, meaning the delay is at most a time slices, the request counts of the current immediate execution queue and the first a delay queues are obtained as the set N = [n0, n1, …, na], where ni is the request count of the i-th queue, i = 0, 1, …, a;
if the RTL level is RTL 2, meaning the delay is at most b time slices, the request counts of the current immediate execution queue and all delay queues are obtained as the set N = [n0, n1, …, nb], where nj is the request count of the j-th queue, j = 0, 1, …, b.
Still further, in step S103, a score is calculated for the current request against each delay queue as a function of ni ∈ N (the queue's request count), the queue's position i in the set N, and the historical load variable historySize (the score formula is given in the original only as an image and is not reproduced here).
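The placement step can be sketched as below. Note the hedge: the patent's score formula survives only as an image, so the score function here is a hypothetical stand-in (spare capacity historySize − ni, discounted by queue depth); only the overall shape, computing a score per delay queue and picking the highest, is taken from the text.

```python
def place_request(counts, history_size):
    """counts[i] = request count of the i-th queue in N (index 0 is the
    immediate execution queue). Returns the index of the delay queue with
    the highest score. The score function is a hypothetical stand-in; the
    patent gives its formula only as an image."""
    def score(i, n):
        # Stand-in: spare capacity vs. historical load, discounted by delay.
        return (history_size - n) / (i + 1)
    # Delay queues start at index 1; pick the highest-scoring one.
    return max(range(1, len(counts)), key=lambda i: score(i, counts[i]))

print(place_request([5, 4, 1, 3], history_size=6))  # queue 2 scores highest
```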
Still further, the admission layer updates the delay queues and the historical load variable after every time slice, in the following steps:
H1: subtract 1 time slice from the delay duration variable of every delay queue; if a queue's variable reaches 0, append all of its requests to the immediate execution queue and reset its delay duration to bt;
H2: the immediate execution queue keeps a thread running to check for pending requests, submits them in order to the scheduling layer for dispatch at the rate at which the engine services process requests, and records the number of requests submitted to the process engines in each time slice;
H3: obtain requestSize, the number of requests the immediate execution queue submitted to the scheduling layer for dispatch during the past time slice, and update the historical load variable historySize by:
historySize = α * historySize + (1 − α) * requestSize
where α is a weighting factor that controls how quickly earlier historySize values decay.
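Steps H1 and H3 can be sketched as a per-time-slice update. The queue representation and the α value are illustrative, and H2 (the dispatch thread) is represented only by the `submitted` count it records:

```python
from collections import deque

def tick(immediate, delays, history_size, submitted, alpha=0.7):
    """One time-slice update (H1 and H3). delays[k] holds requests whose
    remaining delay is k+1 slices; `submitted` is the number of requests
    the immediate queue dispatched during the past slice (recorded by H2)."""
    # H1: every delay queue moves one slice closer to execution; the queue
    # reaching delay 0 flushes into the immediate execution queue, and an
    # empty queue with the maximum delay b*t takes its place at the back.
    immediate.extend(delays.pop(0))
    delays.append(deque())
    # H3: exponentially weighted update of the historical load variable.
    return alpha * history_size + (1 - alpha) * submitted

immediate = deque()
delays = [deque(["r1"]), deque(["r2", "r3"])]
hs = tick(immediate, delays, history_size=10.0, submitted=4)
print(list(immediate), round(hs, 2))  # ['r1'] 8.2
```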
Still further, the engine busyness is calculated as follows:
busynessi = w1 * cpui + w2 * rami,  w1 + w2 = 1
where w1 and w2 represent the relative importance of the two load parameters, cpu and ram, and need to be configured according to the hardware resource characteristics.
In the scheduling layer's balanced request dispatch, the scheduling layer uses the load condition of the process engine services of the process service layer and the characteristics of the stateless workflow engine to allocate the requests of the same process model to a small number of engines, reducing the computation and memory consumption caused by repeatedly parsing the same process model and storing the results.
The invention has the following beneficial effects. Tenants select different SLA levels according to their service scenarios and process models, and through these SLA levels the cloud workflow system provides different request throughput services to tenants and tiered service to different process requests. Real-time engine load monitoring, combined through shared memory with the distribution of process models across engines, reduces the request peaks of the engine services while also reducing the overall memory overhead of the engine cluster. This improves the load balancing capability of the cloud workflow under a multi-tenant architecture and lets a process service provider serve more tenants while meeting each tenant's requirements on request throughput and on the parsing and execution performance of different process definitions.
Drawings
FIG. 1 is a cloud workflow core component diagram.
Detailed Description
The invention is described in detail below with reference to the drawings and the detailed description.
Example 1
As shown in fig. 1, an SLA-based stateless cloud workflow load balancing scheduling method includes the steps of admission layer load waveform smoothing and scheduling layer balanced request dispatch. Before introducing the two steps in detail, the cloud workflow service SLA defines quantitative indexes for request throughput and for the processing performance of different process requests. Request throughput is measured by the service request arrival rate RAR, the highest number of process instance requests a tenant may send per second. The processing performance of different process requests is measured by the request response time level RTL, which is proposed based on the diversity of workflow execution time ranges; execution times vary from a few microseconds to several months.
In this embodiment, according to the tenants' needs for their service scenarios, the RAR index is divided into three levels. Define v0, v1, v2, where v0, v1, v2 are integers and v0 > v1 > v2. The three levels are described as follows:
RAR 0, for high-concurrency service scenarios: the service request arrival rate is at most v0;
RAR 1, for ordinary-concurrency service scenarios: the service request arrival rate is at most v1;
RAR 2, for low-concurrency service scenarios: the service request arrival rate is at most v2.
Different RAR levels correspond to different charges; a level with a higher service request arrival rate carries a correspondingly higher charge.
In this embodiment, according to the differing real-time processing requirements of different process requests, the RTL level is divided into three levels. Before the detailed statement, define parameters a, b, and t, where a and b are integers with a < b, and t is the time the engine needs to process one process instance request (the length of one time slice), which can be obtained by testing the engine service. The levels are stated as follows:
RTL 0: for process instance requests with higher real-time requirements, the request is responded to within 1 time slice, i.e., t; these are mostly automated process tasks;
RTL 1: for process instance requests with ordinary real-time requirements, the request is responded to within (a+1) time slices at the latest, i.e., (a+1)t;
RTL 2: for process instance requests with lower real-time requirements, the request is responded to within (b+1) time slices at the latest, i.e., (b+1)t.
When uploading a process model, the tenant can select different SLA indexes for different tasks in the model according to the model's actual usage, yielding different charges; the higher the SLA index, the higher the corresponding charge.
In the SLA-based stateless cloud workflow load balancing scheduling method of this embodiment, when a process instance request corresponding to a process model uploaded by a tenant is received, the cloud workflow schedules the request to a stateless workflow engine in the cluster. Execution includes the following steps:
Admission layer load waveform smoothing:
S101: the admission layer receives a tenant process instance request and obtains the tenant's RAR index and the RTL level of the process instance request from the tenant SLA warehouse according to the tenant ID and the process request information;
S102: judging, by a time window algorithm, whether the tenant's service request rate meets the RAR index; if it exceeds the service request rate specified by the RAR index, the request is filtered directly and fed back to the tenant with a prompt to purchase a higher RAR level; otherwise the next step is executed;
S103: judging the RTL level; if the RTL level is RTL 0, the scheduling layer's balanced request dispatch is executed directly; if the RTL level is RTL 1, meaning at most a time slices may be delayed, the request counts of the current immediate execution queue and the first a delay queues are obtained as the set N = [n0, n1, …, na], where ni is the request count of the i-th queue, i = 0, 1, …, a; if the RTL level is RTL 2, meaning at most b time slices may be delayed, the request counts of the current immediate execution queue and all delay queues are obtained as the set N = [n0, n1, …, nb], where nj is the request count of the j-th queue, j = 0, 1, …, b;
S104: using the historical load variable historySize, a score is calculated for the current request against each delay queue from the queue's request count ni ∈ N, its position i in the set N, and historySize (the score formula is given in the original only as an image and is not reproduced here);
the request is then placed in the delay queue with the highest score calculated above.
The scheduling layer's balanced request dispatch comprises the following steps:
S201: the scheduling layer receives a request from the immediate execution queue of the admission layer and obtains from the shared memory the load information set E = [e1, …, em] of the process engine services published by the process service layer, where ei = (cpui, rami), cpui denotes the current cpu occupancy of process engine service ei, and rami denotes its current ram occupancy;
S202: the scheduling layer obtains from the process instance warehouse the distribution condition set D = [d1, d2, …, dm], di ∈ {0, 1}, of the process instances of the requested process model across the process engine services; di = 0 means the process model has not run on engine ei, and di = 1 means it has;
S203: the process engine services are divided into two groups E1 and E2 according to the distribution condition set D: E1 stores all engines on which the process model has already been executed (di = 1), and E2 stores the remaining engines;
S204: the engine busyness of each element of E1 and E2 is calculated by the following formula:

busynessi = w1 * cpui + w2 * rami,  w1 + w2 = 1

where w1 and w2 represent the relative importance of the two load parameters, cpu and ram, and need to be configured according to the hardware resource characteristics. Through the busyness formula, the least busy engine service of each group is obtained: e1* for E1 and e2* for E2. Then whether the inequality

busyness(e1*) − busyness(e2*) > β

holds is judged; if the inequality holds, the request is dispatched to e2*, otherwise it is dispatched to e1*, and the distribution condition set in the process instance warehouse is modified;
where β is a cost parameter for allocating the process instance request to a new engine, and may be set according to the specific hardware resource characteristics;
S205: request scheduling is complete.
In this embodiment's balanced request dispatch, the scheduling layer uses the load condition of the process engine services of the process service layer and the characteristics of the stateless workflow engine to allocate the requests of the same process model to a small number of engines, reducing the computation and memory consumption caused by repeatedly parsing the same process model and storing the results.
In the system current-limiting algorithm of this embodiment, a sliding window algorithm is adopted to guarantee the tenant's RAR index; the RTL level is implemented with a request cache, as follows:
the admission layer maintains b+1 queues for storing process instance requests. Each queue corresponds to a delay duration variable representing how long process instance requests stay in that queue, with values 0t, 1t, 2t, …, bt, where t is the time slice defined in RTL. The queue whose delay duration variable is 0t is the immediate execution queue; the queues with 1t, 2t, …, bt are delay queues. The admission layer must also update the delay queues and the historical load variable after every time slice; the historical load variable measures the request count of each past time slice. The admission layer places each new process instance request into the corresponding delay queue according to the RTL level the tenant set for the process task and the number of process instance requests currently stored in each queue.
The admission layer updates the delay queues and the historical load variable after every time slice, in the following steps:
H1: subtract 1 time slice from the delay duration variable of every delay queue; if a queue's variable reaches 0, append all of its requests to the immediate execution queue and reset its delay duration to bt;
H2: the immediate execution queue keeps a thread running to check for pending requests, submits them in order to the scheduling layer for dispatch at the rate at which the engine services process requests, and records the number of requests submitted to the process engines in each time slice;
H3: obtain requestSize, the number of requests the immediate execution queue submitted to the scheduling layer for dispatch during the past time slice, and update the historical load variable historySize by:
historySize = α * historySize + (1 − α) * requestSize
where α is a weighting factor that controls how quickly earlier historySize values decay.
The method is based on the different request throughput requirements of different tenants' service scenarios and the different parsing and execution performance requirements of different process definitions. It delays requests by exploiting the diversity of execution time ranges in workflows, in particular the fact that some tasks need no real-time response, to realize tenant rate limiting and request load balancing optimization under that rate-limiting constraint. Combined with the characteristics of stateless workflow engine services, it implements a load balancing strategy that dispatches each process model's requests to a minimal number of engines, and it uses shared memory for access to the engine services' load information. The method optimizes the load balancing effect and execution performance of cloud workflow requests while guaranteeing the cloud tenants' service experience, so that the cloud workflow system can provide process parsing services to more tenants in a normal service state.
It should be understood that the above-described embodiments of the present invention are merely examples for clearly illustrating the present invention, and are not intended to limit the embodiments of the present invention. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the claims of the present invention.

Claims (8)

1. A method for load balancing scheduling of stateless cloud workflows based on SLA is characterized in that: when a process instance request corresponding to a tenant uploading process model is received, the cloud workflow schedules the process instance request to a stateless workflow engine in a cluster, and the execution comprises the following steps:
and admission layer load waveform smoothing:
s101: an admission layer receives a tenant process instance request, and acquires a service request arrival rate RAR index of a tenant and a request response time level RTL for the process instance request from a tenant SLA warehouse according to a tenant ID or process instance request information;
s102: judging whether the tenant service request rate meets an RAR index or not according to a system current-limiting algorithm, if the tenant service request rate exceeds the service request rate specified by the RAR index, directly filtering the request, feeding back the request to the tenant, and prompting to purchase a higher RAR level, otherwise, executing the next step;
s103: judging RTL levels, executing the request balanced distribution of a scheduling layer according to different RTL levels, acquiring the request number of a current immediate execution queue and a delay queue, calculating the score of a current process instance request for each delay queue according to the request number of the delay queue by using a historical load variable historySize, and placing the process instance request in the delay queue with the highest score;
and the scheduling layer's balanced request dispatch:
S201: the scheduling layer receives a request from the immediate execution queue of the admission layer and obtains from the shared memory the load information set E = [e1, …, em] of the process engine services published by the process service layer, where ei = (cpui, rami), cpui denotes the current cpu occupancy of process engine service ei, and rami denotes its current ram occupancy;
S202: the scheduling layer obtains from the process instance warehouse the distribution condition set D = [d1, d2, …, dm], di ∈ {0, 1}, of the process instances of the requested process model across the process engine services; di = 0 means the process model has not run on engine ei, and di = 1 means it has;
S203: dividing the process engine services into two groups E1 and E2 according to the distribution condition set D, where E1 stores all engines on which the process model has already been executed (di = 1) and E2 stores the remaining engines;
S204: performing the engine busyness calculation on the elements of E1 and E2 to obtain the least busy engine service of each group, e1* and e2* respectively; judging whether the inequality
busyness(e1*) − busyness(e2*) > β
holds; if it does not hold, dispatching the process instance request to e1*, otherwise dispatching it to e2*; modifying the distribution condition set in the process instance warehouse; completing the request scheduling of the process instance;
where β is a cost parameter for allocating the process instance request to a new engine, and may be set according to specific hardware resource characteristics.
2. The SLA-based stateless cloud workflow load balancing scheduling method of claim 1, wherein: step S101, the service request arrival rate RAR is used for measuring the throughput of the process instance request and representing the highest process instance request number which can be sent by the tenant per second;
the RAR index is divided into three levels. Define v0, v1 and v2, where v0, v1 and v2 are integers with v0 > v1 > v2; the three levels are described as follows:
RAR 0 means the service request arrival rate is at most v0;
RAR 1 means the service request arrival rate is at most v1;
RAR 2 means the service request arrival rate is at most v2.
Different RAR levels correspond to different charges.
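The per-tenant arrival-rate cap can be enforced with a sliding window over recent request timestamps, which is the mechanism claim 4 names; a minimal sketch, with the window granularity and all names as illustrative assumptions:

```python
from collections import deque

class SlidingWindowLimiter:
    """Admit at most max_per_second requests in any 1-second window."""

    def __init__(self, max_per_second):
        self.max = max_per_second   # v0, v1 or v2 for the tenant's RAR level
        self.events = deque()       # timestamps of admitted requests

    def allow(self, now):
        # Drop timestamps that have slid out of the 1-second window.
        while self.events and now - self.events[0] >= 1.0:
            self.events.popleft()
        if len(self.events) < self.max:
            self.events.append(now)
            return True
        return False

lim = SlidingWindowLimiter(max_per_second=2)          # e.g. RAR 2 with v2 = 2
print([lim.allow(t) for t in (0.0, 0.1, 0.2, 1.1)])  # third request rejected
```

Because the window slides over exact timestamps rather than fixed buckets, a tenant cannot burst 2·v requests across a bucket boundary, which keeps the admitted rate within the SLA cap at every instant.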
3. The SLA-based stateless cloud workflow load balancing scheduling method of claim 1, wherein: in step S101, the request response time level RTL measures the processing performance of different process requests; RTL is proposed based on the diversity of workflow execution time ranges;
the RTL is divided into three levels. Define parameters a, b and t, where a, b and t are integers, a < b, and t is the time required by the engine to process one process instance request, i.e. the length of one time slice. The RTL levels are as follows:
RTL 0: the process instance request is responded to within 1 time slice, i.e. t;
RTL 1: the process instance request is responded to within at most (a+1) time slices, i.e. (a+1)t;
RTL 2: the process instance request is responded to within at most (b+1) time slices, i.e. (b+1)t.
4. The SLA-based stateless cloud workflow load balancing scheduling method of claim 3, wherein: the system throttling algorithm adopts a sliding window algorithm to guarantee the tenant's RAR index, while the RTL level is implemented by means of request caching, in the following way:
the admission layer maintains b+1 queues for storing process instance requests; each queue corresponds to a delay-duration variable representing how long the process instance requests in that queue are delayed, with values 0t, 1t, 2t, …, bt, where t is the time slice defined for RTL; the queue whose delay duration is 0t is the immediate-execution queue, and the queues with delay durations 1t, 2t, …, bt are delay queues; after every time slice the admission layer updates the delay queues and the historical load variable, which measures the number of requests seen in past time slices; the admission layer places each new process instance request into the corresponding delay queue according to the RTL level the tenant has set for the process task and the number of process instance requests currently stored in each queue.
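The queue layout above can be sketched as an array of b+1 FIFO queues indexed by delay duration; the candidate-queue sets follow the RTL definitions of claim 3, and the placement shown is only to illustrate indexing (the patent selects the queue by score, not by depth). All names are illustrative:

```python
from collections import deque

b = 4
# Queue i holds requests that still wait i time slices before execution;
# queue 0 is the immediate-execution queue (delay durations 0t, 1t, ..., bt).
queues = [deque() for _ in range(b + 1)]

def candidate_queues(rtl, a=2):
    # Queues a request may legally wait in, per its RTL level:
    # RTL 0 -> immediate only; RTL 1 -> up to a slices; RTL 2 -> up to b.
    limit = {0: 0, 1: a, 2: b}[rtl]
    return list(range(limit + 1))

# Park a hypothetical RTL 1 request in its deepest legal queue (for
# illustration only; the real choice maximizes the per-queue score).
queues[candidate_queues(1)[-1]].append("req-42")
print(candidate_queues(0), candidate_queues(1), len(queues[2]))
```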
5. The SLA-based stateless cloud workflow load balancing scheduling method of claim 4, wherein step S103 judges the RTL level specifically as follows:
if the RTL level is RTL 0, directly perform the balanced request dispatch of the scheduling layer;
if the RTL level is RTL 1, the delay duration is at most a time slices; obtain the request counts of the current immediate-execution queue and the first a delay queues, giving the set N = [n0, n1, …, na], where ni is the number of requests in the i-th queue, i = 0, 1, …, a;
if the RTL level is RTL 2, the delay duration is at most b time slices; obtain the request counts of the current immediate-execution queue and all delay queues, giving the set N = [n0, n1, …, nb], where nj is the number of requests in the j-th queue, j = 0, 1, …, b.
6. The SLA-based stateless cloud workflow load balancing scheduling method of claim 5, wherein: in step S103, the score of the current request for each delay queue is computed from ni and the historical load variable historySize, where ni ∈ N and i denotes the position of the current delay queue in the set N.
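The claim computes a per-queue score from ni, historySize and the queue position i, then places the request in the highest-scoring delay queue. The sketch below takes the score function as an injected parameter, since only its inputs are specified here; the example score shown is a purely hypothetical stand-in, not the patent's formula:

```python
def pick_queue(counts, history_size, score_fn):
    """counts: [n_0, ..., n_k], the request counts of the candidate queues
    for the request's RTL level. Returns the index of the queue whose
    score is highest."""
    scores = [score_fn(i, n, history_size) for i, n in enumerate(counts)]
    return max(range(len(counts)), key=lambda i: scores[i])

# Hypothetical example score: favour later (more slack) and emptier queues.
example_score = lambda i, n, hist: i * hist - n

print(pick_queue([5, 3, 1], history_size=2, score_fn=example_score))
```

Spreading requests toward emptier, later queues is what flattens the per-slice request peak that the abstract describes; any concrete score with that shape plugs into pick_queue unchanged.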
7. The SLA-based stateless cloud workflow load balancing scheduling method of claim 6, wherein the admission layer updates the delay queues and the historical load variable after every time slice, comprising the following steps:
h1: decrease the delay-duration variable of every delay queue by one time slice and check whether it has reached 0; if so, move all requests in that delay queue to the immediate-execution queue and reset its delay duration to bt;
h2: the immediate-execution queue keeps a running thread that checks whether the queue holds requests, submits them in order to the scheduling layer for dispatch at the rate at which the engine services process requests, and records the number of requests submitted to the process engines in each time slice;
h3: obtain requestSize, the number of requests the immediate-execution queue submitted to the scheduling layer for dispatch during the past time slice, and update the historical load variable historySize according to the following formula:
historySize = α * historySize + (1 - α) * requestSize
where α is a weighting factor that controls how strongly the previous historySize value is attenuated.
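A minimal sketch of the per-time-slice maintenance in steps H1 and H3 (H2, the dispatch thread, is omitted); the queue layout matches claim 4 and all names are illustrative assumptions:

```python
from collections import deque

def tick(immediate, delayed, delays, history_size, request_size, alpha, b):
    # H1: every delay queue gets one slice closer to execution; queues that
    # reach 0 drain into the immediate-execution queue and reset to b slices.
    for k in range(len(delayed)):
        delays[k] -= 1
        if delays[k] == 0:
            immediate.extend(delayed[k])
            delayed[k].clear()
            delays[k] = b
    # H3: exponentially weighted moving average of dispatched requests,
    # per the formula historySize = a*historySize + (1-a)*requestSize.
    history_size = alpha * history_size + (1 - alpha) * request_size
    return history_size

immediate = deque(["r0"])
delayed = [deque(["r1"]), deque(["r2", "r3"])]
delays = [1, 2]
hs = tick(immediate, delayed, delays, history_size=10.0,
          request_size=4, alpha=0.5, b=2)
print(list(immediate), delays, hs)
```

Running the example moves r1 into the immediate-execution queue (its delay expired), leaves r2 and r3 waiting one more slice, and decays the load history toward the latest dispatch count.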
8. The SLA-based stateless cloud workflow load balancing scheduling method of claim 7, wherein: the engine busyness calculation formula is as follows:
busynessi = w1 * cpui + w2 * rami,  w1 + w2 = 1
wherein: w1 and w2 represent the relative importance of the two load parameters, CPU and RAM, respectively, and are configured according to the characteristics of the hardware resources.
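The busyness formula transcribes directly; the weights below (0.7/0.3) and the least-busy selection over a small load set are purely illustrative:

```python
def busyness(cpu, ram, w1=0.7, w2=0.3):
    # Weighted CPU/RAM occupancy; the claim requires w1 + w2 = 1.
    assert abs(w1 + w2 - 1.0) < 1e-9
    return w1 * cpu + w2 * ram

# Hypothetical load snapshot (cpu_i, ram_i) for three engine services.
loads = [(0.8, 0.2), (0.3, 0.9), (0.5, 0.5)]
scores = [busyness(c, r) for c, r in loads]
print(min(range(len(loads)), key=lambda i: scores[i]))  # least busy engine
```

Weighting CPU above RAM (or vice versa) lets operators bias the balancer toward whichever resource their engine hardware exhausts first.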
CN201910028641.0A 2019-01-11 2019-01-11 SLA-based stateless cloud workflow load balancing scheduling method Active CN109861850B (en)

Publications (2)

Publication Number Publication Date
CN109861850A CN109861850A (en) 2019-06-07
CN109861850B true CN109861850B (en) 2021-04-02
