CN117573374A - System and method for serverless resource allocation - Google Patents

System and method for serverless resource allocation

Info

Publication number
CN117573374A
Authority
CN
China
Prior art keywords
task
module
model
resource allocation
stage
Prior art date
Legal status
Granted
Application number
CN202410050810.1A
Other languages
Chinese (zh)
Other versions
CN117573374B (en)
Inventor
金鑫
章梓立
金超
Current Assignee
Peking University
Original Assignee
Peking University
Priority date
Filing date
Publication date
Application filed by Peking University
Priority to CN202410050810.1A
Publication of CN117573374A
Application granted
Publication of CN117573374B
Status: Active


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Stored Programmes (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application provides a system and a method for serverless resource allocation, relating to the technical field of cloud computing. The system comprises an orchestrator and a function running module; the orchestrator comprises a task analysis module and an automatic resource-configuration generation module. The task analysis module is used for analyzing the historical running information of a task submitted by a user and determining the runtime model and the cost model corresponding to each stage of the task. The automatic resource-configuration generation module is used for generating the optimal resource-configuration strategy corresponding to the task according to the runtime model, the cost model, the user-specified performance bound, and the task. The function running module is used for executing the task based on the optimal resource-configuration strategy. The aim is to optimize cost (or latency) for serverless computing while satisfying the latency (or cost) bound required by the user.

Description

System and method for serverless resource allocation
Technical Field
The application relates to the technical field of cloud computing, and in particular to a system and a method for serverless resource allocation.
Background
Serverless computing automatically deploys user code, scales resources elastically, and bills on demand, which offers cloud application developers great convenience at lower cost. Developers are thus attracted to migrate their cloud applications to serverless computing platforms. A developer can organize a cloud application as a serverless workflow: each node of the workflow corresponds to one serverless function, and each edge between nodes represents a data dependency between the two functions. Defined in this way, the cloud application can be deployed on a serverless computing platform and scaled and billed on demand.
When developing cloud applications, developers often seek to meet certain criteria on key performance indicators, such as end-to-end latency (i.e., the runtime of the workflow) and running cost. For these applications there is a natural trade-off between latency and cost at run time: allocating more resources may reduce latency but increases cost accordingly, while allocating fewer resources lowers cost but may increase latency. Users typically set a specific latency or cost bound and expect the serverless computing platform to provide enough computing resources to meet it. Existing mainstream systems either do not support automatic resource configuration for the different node functions of a workflow, or fit the performance-resource relationship of functions inaccurately. These defects mean that existing systems cannot provide the optimal resource configuration, so either the user-specified performance bound is violated, or the provisioned resources are under-utilized and cost is wasted.
Disclosure of Invention
In view of this, the present application provides a system for serverless resource allocation. The aim is to optimize cost (or latency) for serverless computing while meeting the latency (or cost) bound required by the user.
In a first aspect of embodiments of the present application, there is provided a system for serverless resource allocation, the system comprising: an orchestrator and a function running module; the orchestrator comprises a task analysis module and an automatic resource-configuration generation module;
the task analysis module is used for analyzing the historical running information of a task submitted by a user and determining the runtime model and the cost model corresponding to each stage of the task;
the automatic resource-configuration generation module is used for generating the optimal resource-configuration strategy corresponding to the task according to the runtime model, the cost model, the user-specified performance bound, and the task;
and the function running module is used for executing the task based on the optimal resource-configuration strategy.
Optionally, the task analysis module includes:
a runtime model building module, used for fitting the runtime function of each stage of the task according to the historical running information, to obtain the runtime model corresponding to each stage of the task;
and a cost model building module, used for multiplying the runtime model of each stage of the task by a billing factor, to obtain the cost model corresponding to each stage of the task.
Optionally, the runtime model building module includes:
a first building module, used for fitting the random variables in the initialization runtime function of each stage of the task according to the historical running information, to obtain the first runtime model corresponding to each stage;
a second building module, used for fitting the random variables in the data read-write runtime function of each stage of the task according to the historical running information, to obtain the second runtime model corresponding to each stage;
a third building module, used for fitting the random variables in the computation runtime function of each stage of the task according to the historical running information, to obtain the third runtime model corresponding to each stage;
and a runtime model building submodule, used for determining the runtime model corresponding to each stage of the task based on the first, second, and third runtime models corresponding to that stage.
Optionally, the automatic resource-configuration generation module includes:
a mathematical model construction module, configured to construct a mathematical model of the task according to the runtime model, the cost model, the user-specified performance bound, and the task, where the mathematical model includes an objective function and a constraint;
a sampling module, used for sampling each random variable in the mathematical model within the range obtained by fitting that variable, to obtain a first objective function and a first constraint;
and a solving module, used for determining the optimal resource-configuration strategy of the task based on the first objective function and the first constraint.
Optionally, the sampling module includes:
a first sampling module, used for sampling each random variable in the constraint of the mathematical model a preset number of times within its fitted range, to obtain the first constraint;
and a second sampling module, used for sampling the mean of each random variable in the objective function of the mathematical model within its fitted range, to obtain the first objective function.
Optionally, the first sampling module further includes:
a sampling-number determination module, used for determining the value of the preset number through a preset algorithm.
Optionally, the first sampling module includes:
a first sampling submodule, used for sampling each random variable in the constraint of the mathematical model a preset number of times within its fitted range, to obtain a corresponding preset number of initial constraints;
and a constraint pruning module, used for performing redundancy pruning on the preset number of initial constraints according to a preset rule, to obtain the initial constraints remaining after redundancy pruning, the remaining initial constraints forming the first constraint.
Optionally, the solving module includes:
a locally optimal resource-configuration determination module, used for determining a locally optimal resource configuration through a gradient descent algorithm based on the first objective function and the first constraint;
and an optimal resource-configuration strategy determination module, used for determining a feasible optimal resource-configuration strategy around the locally optimal resource configuration.
Optionally, the orchestrator further comprises a function log module;
the function log module is used for recording the running information obtained after each task is executed, so as to form the historical running information of each task.
A second aspect of the present application provides a method for serverless resource allocation, the method comprising:
analyzing the historical running information of a task submitted by a user, and determining the runtime model and the cost model corresponding to each stage of the task;
generating the optimal resource-configuration strategy corresponding to the task according to the runtime model, the cost model, the user-specified performance bound, and the task;
and executing the task based on the optimal resource-configuration strategy.
Compared with the prior art, the application has the following advantages:
the embodiment of the application provides a system for server non-aware resource allocation, which comprises: a composer and a function running module; the orchestrator comprises a task analysis module and a resource configuration automatic generation module; the task analysis module is used for analyzing the historical operation information of the task submitted by the user and determining the operation time model and the cost model corresponding to each stage in the task; the resource allocation automatic generation module is used for generating an optimal resource allocation strategy corresponding to the task according to the running time model, the cost model, the performance limit designated by the user and the task; and the function operation module is used for executing the task based on the optimal resource configuration strategy. Therefore, when the optimal resource allocation strategy is determined, the optimal resource allocation strategy is determined by taking the performance limit of the task into consideration and simultaneously taking the running time model and the cost model of the task into consideration, so that the resource allocation without perceived computation on the server achieves the optimization of cost (or delay) while meeting the delay (or cost) limit required by the user.
The foregoing description is only an overview of the technical solutions of the present application. To make the technical means of the present application clearer, so that they may be implemented according to the content of the specification, and to make the above and other objects, features, and advantages of the present application more readily understood, a detailed description of the present application is given below.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings required for describing the embodiments or the prior art are briefly introduced below.
Fig. 1 is a schematic diagram of a system for serverless resource allocation according to an embodiment of the present application;
Fig. 2 is a schematic diagram of determining an optimal resource configuration in a system for serverless resource allocation according to an embodiment of the present application;
Fig. 3 is another schematic diagram of a system for serverless resource allocation according to an embodiment of the present application;
Fig. 4 is a flowchart of a method for serverless resource allocation according to an embodiment of the present application.
Detailed Description
Exemplary embodiments of the present application will be described in more detail below with reference to the accompanying drawings.
Fig. 1 is a schematic diagram of a system for serverless resource allocation according to an embodiment of the present application. As shown in fig. 1, the system 100 includes an orchestrator 101 and a function running module 102; the orchestrator 101 comprises a task analysis module 1011 and an automatic resource-configuration generation module 1012. The task analysis module is used for analyzing the historical running information of a task submitted by a user and determining the runtime model and the cost model corresponding to each stage of the task; the automatic resource-configuration generation module is used for generating the optimal resource-configuration strategy corresponding to the task according to the runtime model, the cost model, the user-specified performance bound, and the task; and the function running module is used for executing the task based on the optimal resource-configuration strategy.
In this embodiment, a task (i.e., a workflow) for serverless computation includes multiple stages, each stage corresponding to one function. A user submits a serverless-computing task to the workflow orchestrator; the task includes its DAG (Directed Acyclic Graph), which contains the function code of each execution stage of the task and the data dependencies between the execution stages. After receiving the task submitted by the user, the workflow orchestrator registers the task with the function running module and initializes the function monitoring module therein, which monitors the function execution process when the task runs. After registration completes, the orchestrator receives the performance bound submitted by the user, including but not limited to latency, which refers to the runtime the task may take to execute, and cost, which refers to the expense the task may incur to execute. After the orchestrator receives the task's performance bound, the task analysis module in the orchestrator obtains the historical running information recorded by the function log module from past executions of the task, and analyzes it to obtain the runtime model and cost model of each stage of the task — that is, for any stage of the task, there is one runtime model and one cost model corresponding to that stage. The runtime model of a stage is a function curve relating the runtime of the stage to the resource configuration for the stage, including but not limited to the parallelism of the stage, which refers to the number of compute instances used to execute the stage's function, and the number of CPUs.
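For illustration only, a minimal sketch of how such a task and its performance bound might be encoded — the field names and structure below are hypothetical, not the patent's actual interface:

```python
# Hypothetical encoding of a serverless workflow task as a DAG.
# Each node is one stage (one serverless function); each edge is a
# data dependency between the functions of two stages.
workflow = {
    "stages": {
        "extract":   {"handler": "extract.main"},
        "transform": {"handler": "transform.main"},
        "train":     {"handler": "train.main"},
    },
    # (upstream, downstream): the downstream stage consumes the
    # upstream stage's output.
    "edges": [("extract", "transform"), ("transform", "train")],
}

# A performance bound the user might submit alongside the task:
# a latency bound (or, alternatively, a cost bound).
performance_bound = {"type": "latency", "value_s": 30.0}
```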
In this embodiment, when the serverless-computing task submitted to the workflow orchestrator is submitted for the first time — that is, the task has not previously undergone serverless computation — there is no historical running information for it. In that case, the workflow orchestrator executes the task under different resource configurations, i.e., different parallelism and different numbers of CPUs, to obtain historical running information, and then determines the runtime model and cost model of each stage of the task in the same way as above. Likewise, it should be understood that if, before the user submits the serverless-computing task to the workflow orchestrator, the runtime model and cost model of each stage have already been determined through the analysis of the task analysis module, then those models are available in the task analysis module and can be invoked directly; and after each new run of the task completes, the runtime model and cost model of each stage can be updated with the newly generated running information.
In this embodiment, after the task analysis module determines the runtime model and cost model corresponding to each stage of the user-submitted task, the automatic resource-configuration generation module in the workflow orchestrator receives these models and, combining the user-submitted performance bound with the properties of the task, generates an optimal resource-configuration strategy satisfying the performance bound and inputs it to the function running module, so that the function running module adjusts the resource configuration of the function of each stage of the task accordingly. Because of the trade-off between latency and cost, when the user-submitted performance bound is a latency bound (e.g., runtime no higher than s seconds), the optimal resource-configuration strategy is the lowest-cost strategy that meets the latency bound; when the user-submitted performance bound is a cost bound (e.g., cost no more than m), the optimal resource-configuration strategy is the strategy with the lowest runtime that meets the cost bound. The properties of the task include, but are not limited to, the dependencies among the stages of the task, the resource information of each stage, and so on.
In this embodiment, after the function running module performs the corresponding resource configuration based on the determined optimal resource-configuration strategy, the user sends an execution request for the task to the workflow orchestrator, which forwards the request to the function running module; the function execution module in the function running module executes the task on the configured resources and returns the execution result. The function execution module is deployed on a centralized node and executes the tasks in the serverless workflow job according to the optimal resource-configuration strategy determined by the orchestrator. The function execution module is responsible for calling the SDK of the serverless platform to modify function configurations, invoke functions, and transfer the metadata of intermediate data.
The embodiment of the application provides a system for serverless resource allocation, comprising an orchestrator and a function running module; the orchestrator comprises a task analysis module and an automatic resource-configuration generation module; the task analysis module is used for analyzing the historical running information of a task submitted by a user and determining the runtime model and the cost model corresponding to each stage of the task; the automatic resource-configuration generation module is used for generating the optimal resource-configuration strategy corresponding to the task according to the runtime model, the cost model, the user-specified performance bound, and the task; and the function running module is used for executing the task based on the optimal resource-configuration strategy. Thus, when the optimal resource-configuration strategy is determined, the performance bound of the task is considered together with the runtime model and the cost model of the task, so that the resource configuration for serverless computing optimizes cost (or latency) while meeting the latency (or cost) bound required by the user.
In combination with the above embodiment, in one implementation, the embodiment of the application further provides a system for serverless resource allocation. In this system, the task analysis module includes: a runtime model building module, used for fitting the runtime function of each stage of the task according to the historical running information, to obtain the runtime model corresponding to each stage; and a cost model building module, used for multiplying the runtime model of each stage by a billing factor, to obtain the cost model corresponding to each stage.
In this embodiment, the runtime model of a stage in a serverless-computing task is in fact the result of fitting the random variables in a function expression. The fitted result differs from the original expression in that each random variable becomes a determined known variable after fitting, the determined known variable being a concrete value range. The expression contains several random variables, and the role of the task's historical running information is to fit each of them; once the random variables in the runtime function of each stage have been fitted, the resulting function expressions — the runtime models of the stages — are obtained. Specifically, the task analysis module includes a runtime model building module used for fitting the random variables in the runtime function of each stage according to the historical running information of the user-submitted task. After fitting completes, for any stage, the fitted results of that stage's random variables are substituted into that stage's runtime function, and the resulting expression is the runtime model of the stage. The runtime functions of all stages of the task are identical in form; the difference is that, after fitting on the task's historical running information, the fitted results of the random variables differ from stage to stage.
In this embodiment, as for the cost model: the current billing mode of serverless computing platforms multiplies the running time of a function by a fixed billing factor to obtain the corresponding cost. Hence in this application the cost model of any stage of the task is the runtime model of that stage multiplied by the corresponding billing factor. The billing factor varies across serverless computing platforms, so it can be determined according to the specific platform and is not specifically limited here.
In combination with the above embodiment, in one implementation, the embodiment of the application further provides a system for serverless resource allocation. In this system, the runtime model building module includes: a first building module, used for fitting the random variables in the initialization runtime function of each stage of the task according to the historical running information, to obtain the first runtime model corresponding to each stage; a second building module, used for fitting the random variables in the data read-write runtime function of each stage according to the historical running information, to obtain the second runtime model corresponding to each stage; a third building module, used for fitting the random variables in the computation runtime function of each stage according to the historical running information, to obtain the third runtime model corresponding to each stage; and a runtime model building submodule, used for determining the runtime model of each stage based on the first, second, and third runtime models corresponding to that stage.
In this embodiment, since the function of each stage of a serverless-computing task can be subdivided, when executed, into the steps of initialization, data read-write, and computation, the construction of the stage's runtime function is likewise divided into three parts: the initialization runtime function of the stage's function during initialization, the data read-write runtime function during data reads and writes, and the computation runtime function during computation.
In this embodiment, the initialization runtime function is piecewise: when the function of the stage is hot-started, the corresponding initialization time is 0; when the function of the stage is cold-started, the corresponding initialization time is a random variable ε, i.e., T_init = 0 (hot start) or T_init = ε (cold start). The random variable ε is affected by the network bandwidth and by the size of the memory occupied by the virtual machine behind the instance executing the function.
In this embodiment, the data read-write runtime function can be written in the form T_rw = S / (d · b) + δ, with b = min(k · β, B), where: d is the parallelism of the stage, i.e., the number of instances used to execute the stage's functions (in a cloud platform, an instance is a container or a lightweight virtual machine); b is the bandwidth available to a function in the stage; k is the number of CPU cores; β is the bandwidth available to one CPU core, a random variable; B is the total bandwidth of the device where the function is located, a random variable; S is the size of the function's data input, a random variable; and δ is the overhead of a fixed data transfer, such as API call overhead, also a random variable.
In this embodiment, the computation runtime function of a function is a polynomial with logarithmic terms: T_comp = Σ_{i=0..m} a_i · w^i + Σ_{j=0..n} b_j · w^j · log(w), where w is the amount of computation per CPU core in the stage's function, a_i and b_j are the coefficients of the polynomial, and m and n are the numbers of polynomial terms. This expression can fit computation logic of polynomial and log-polynomial complexity; for computation that conforms to neither polynomial nor log-polynomial complexity, the expression can still fit the runtime well because the range of the independent variable is limited.
In this embodiment, the first building module in the runtime model building module fits the random variables in the initialization runtime function of each stage based on the historical running information of the user-submitted task; after fitting, each stage's fitted results are substituted into its initialization runtime function to obtain the first runtime model of that stage. The second building module fits the random variables in the data read-write runtime function of each stage based on the historical running information; after fitting, each stage's fitted results are substituted into its data read-write runtime function to obtain the second runtime model of that stage. The third building module fits the random variables in the computation runtime function of each stage based on the historical running information; after fitting, each stage's fitted results are substituted into its computation runtime function to obtain the third runtime model of that stage. The fitted result of a random variable is a concrete value range.
In this embodiment, after the first, second, and third runtime models of each stage of the user-submitted task are obtained by fitting, the runtime model building submodule in the runtime model building module sums them to obtain the runtime model of each stage. That is, for each stage of the user-submitted task, there are a first, a second, and a third runtime model corresponding to that stage, and the sum of the three is the runtime model of the stage.
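As a concrete sketch of how the three fitted sub-models might compose into a stage's runtime and cost models, the following follows the functional forms described above; all parameter names, the per-core work split, and the assumption that the billing factor scales with instances and cores are illustrative, not the patent's exact definitions:

```python
import math

def stage_runtime(d, k, params, cold_start=False):
    """Runtime model of one stage = first + second + third runtime models.

    d: parallelism (number of instances); k: CPU cores per instance;
    params: sampled (or mean) values of the fitted random variables.
    """
    # First runtime model: initialization (0 on warm start, a fitted
    # random variable on cold start).
    t_init = params["eps_cold"] if cold_start else 0.0
    # Second runtime model: data read-write. Per-function bandwidth is
    # capped by per-core bandwidth times cores and by device bandwidth.
    bw = min(k * params["beta_per_core"], params["B_total"])
    t_rw = params["S_input"] / (d * bw) + params["delta_fixed"]
    # Third runtime model: compute, polynomial plus log-polynomial in the
    # per-core computation amount w (assumed here to be total work / (d*k)).
    w = params["work_total"] / (d * k)
    t_comp = sum(a * w**i for i, a in enumerate(params["poly_coeffs"]))
    t_comp += sum(b * w**j * math.log(w)
                  for j, b in enumerate(params["log_coeffs"]))
    return t_init + t_rw + t_comp

def stage_cost(d, k, params, billing_factor, **kw):
    """Cost model: runtime multiplied by the platform's billing factor.
    Serverless platforms typically bill per instance-core-second, hence
    the assumed d * k scaling of the factor here."""
    return billing_factor * d * k * stage_runtime(d, k, params, **kw)
```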
In combination with the above embodiment, in one implementation, the embodiment of the application further provides a system for serverless resource allocation. In this system, the automatic resource-configuration generation module includes: a mathematical model construction module, configured to construct a mathematical model of the task according to the runtime model, the cost model, the user-specified performance bound, and the task, where the mathematical model includes an objective function and a constraint; a sampling module, used for sampling each random variable in the mathematical model within the range obtained by fitting that variable, to obtain a first objective function and a first constraint; and a solving module, used for determining the optimal resource-configuration strategy of the task based on the first objective function and the first constraint.
In this embodiment, after the task analysis module in the orchestrator of the serverless computing platform fits the runtime model and cost model of each stage of the user-submitted task, the mathematical model construction module in the automatic resource-configuration generation module constructs the task's mathematical model according to the runtime models and cost models of all stages, the user-submitted performance bound, and the submitted task itself, thereby converting the problem of determining the optimal resource-configuration strategy into a mathematical problem. The mathematical model includes an objective function and a constraint.
Specifically: in the case where the user-submitted performance bound is a latency bound, i.e., a bound on runtime, the constructed mathematical model is: minimize Cost(d, k) over (d, k), subject to Pr[T(d, k) ≤ L] ≥ p. Here d denotes the parallelism configuration of all stages of the task, i.e., the function instances required for executing the functions of all stages, with d_1 the parallelism configuration of the first stage; k denotes the numbers of CPUs of all stages of the task, with k_1 the number of CPUs of the first stage; Cost(d, k) is the cost of the task under configuration (d, k), computed from the cost models of all stages of the task; T(d, k) is the runtime of the task under configuration (d, k); and L is the user-specified latency bound. The objective of this mathematical optimization problem is to minimize the cost of task execution, Cost(d, k); the constraint is that the runtime meets the user-submitted latency bound, i.e., the confidence that T(d, k) ≤ L must be at least a preset value p. The preset value p is preferably 99.9%. A confidence is attached to meeting the user-submitted performance bound because, in actual execution, the execution time of the same task varies from run to run even under the same configuration; executing the same task multiple times with a configuration that meets the user-submitted bound may occasionally fail to meet it. Therefore, to ensure as far as possible that the determined optimal resource-configuration strategy satisfies the user-submitted performance bound, the application requires in the constructed constraint that the confidence of meeting the bound be at least the preset value p (e.g., 99.9%), so that the finally determined optimal resource-configuration strategy has a 99.9% probability of meeting the user-submitted performance bound.
Specifically: in the case where the user-submitted performance bound is a cost bound, the constructed mathematical model is: minimize T(d, k) over (d, k), subject to Pr[Cost(d, k) ≤ M] ≥ p. Here d denotes the parallelism configuration of all stages of the task, i.e., the function instances required for executing the functions of all stages, with d_1 the parallelism configuration of the first stage; k denotes the numbers of CPUs of all stages of the task, with k_1 the number of CPUs of the first stage; Cost(d, k) is the cost of the task under configuration (d, k), computed from the cost models of all stages; T(d, k) is the runtime of the task under configuration (d, k), computed from the runtime models of all stages; and M is the user-specified cost bound. The objective of this mathematical optimization problem is to minimize the runtime of task execution, T(d, k); the constraint is that the cost meets the user-submitted cost bound, i.e., the confidence that Cost(d, k) ≤ M must be at least the preset value p, preferably 99.9%. As above, a confidence is attached to meeting the user-submitted bound because, in actual execution, the execution time of the same task varies from run to run even under the same configuration, so a configuration that meets the bound may occasionally fail to meet it; the application therefore requires in the constructed constraint that the confidence of meeting the user-submitted bound be at least the preset value p (e.g., 99.9%), so that the finally determined optimal resource-configuration strategy has a 99.9% probability of meeting the user-submitted performance bound.
In this embodiment, since multiple stages of a task may execute in parallel, the cost Cost(d, k) of a task under configuration (d, k) is the accumulation over all stages, while the runtime T(d, k) under configuration (d, k) is not simply the accumulation over all stages: which stages execute in parallel is determined by the dependencies between the stages of the task, and among stages executing in parallel only the runtime of the stage taking the longest counts. Hence the cost Cost(d, k) and the runtime T(d, k) of a task under a configuration are not related by a simple billing-factor multiple. This is also why the mathematical model construction module needs the task itself when constructing the task's mathematical model: it must determine, from information such as the dependencies among the stages of the task, which stages will execute in parallel, so as to construct the mathematical model corresponding to the task.
In the present embodiment, since the specific formulas of T(d, k) and Cost(d, k) above contain random variables whose fitted results are concrete value ranges, the mathematical optimization problem corresponding to the mathematical model is a chance-constrained (probability-constrained) problem.
In this embodiment, in order to determine the optimal resource-configuration strategy for the user-submitted task based on this mathematical model, the sampling module of the application converts the chance-constrained problem corresponding to the mathematical model into a deterministic optimization problem.
Specifically: since the cost and the runtime in the mathematical model are obtained from the runtime models and cost models of all stages of the task, and the random variables in those models are concrete value ranges, the sampling module samples concrete values of each random variable within the range obtained from its own fitting, thereby converting the objective function and the constraint of the mathematical model into the determined first objective function and first constraint. When any random variable is sampled, sampling may only occur within the value range obtained by fitting that variable.
In this embodiment, after the sampling module converts the objective function and the constraint of the mathematical model constructed for the user-submitted task into the determined first objective function and first constraint, the chance-constrained problem corresponding to the mathematical model has been converted into a deterministic optimization problem. The solving module in the automatic resource-configuration generation module then determines the optimal resource-configuration result based on the obtained first objective function and first constraint, forming the optimal resource-configuration strategy of the user-submitted task.
In combination with the above embodiment, in one implementation, the embodiment of the application further provides a system for serverless resource allocation. In this system, the sampling module includes: a first sampling module, used for sampling each random variable in the constraint of the mathematical model a preset number of times within its fitted range, to obtain the first constraint; and a second sampling module, used for sampling the mean of each random variable in the objective function of the mathematical model within its fitted range, to obtain the first objective function. The first sampling module further includes a sampling-number determination module, used for determining the value of the preset number through a preset algorithm.
In this embodiment, for the constraint in the mathematical model above, the more samples are taken, the more determined constraints are obtained, and the greater the probability that the user-submitted performance bound is ultimately satisfied. The application specifies, when constructing the mathematical model, the probability value with which the user-submitted performance bound must be met, namely the preset value p. Undersampling may fail to achieve the probability specified in the mathematical model, while oversampling wastes computing resources. To solve this problem, the application provides a preset algorithm which, once the mathematical model is constructed, determines the number of times the sampling module samples the constraint.
The preset algorithm takes the form of an inequality that lower-bounds the number of samples in terms of |D|, the size of the configuration space — i.e., the total parallelism and number of CPUs that can be provided to the task — and ε, a delay percentile such as the p95 delay. It can be proved by the Hoeffding inequality and approximate sampling theory that, when the number of samples satisfies the inequality of the preset algorithm, the probability that the user-submitted performance bound is met when the user-submitted task is executed with the determined optimal resource-configuration strategy reaches the preset value.
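The exact inequality is given in the patent's formula; purely as an illustration of the kind of sample-size bound Hoeffding's inequality yields when applied uniformly over the |D| candidate configurations (a generic Hoeffding-plus-union-bound calculation, not necessarily the patent's exact expression):

```python
import math

def preset_sample_count(config_space_size, eps, delta=0.001):
    """Generic Hoeffding + union bound: N samples suffice to estimate the
    bound-violation probability of every one of |D| configurations to
    within eps, except with probability delta. Illustrative only."""
    return math.ceil(math.log(2 * config_space_size / delta) / (2 * eps**2))

# e.g. |D| = 64 parallelism levels x 32 CPU counts, eps = 0.05:
n = preset_sample_count(64 * 32, eps=0.05)  # ~3,000 samples
```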
In this embodiment, based on the preset number of samples determined by the sampling-number determination module, the first sampling module in the sampling module performs the preset number of Monte Carlo samples within the value range of each random variable in the constraint of the mathematical model, obtaining the corresponding preset number of determined constraints. That is, however many times the random variables in the constraint are sampled, there are that many corresponding determined constraints, and all of them together constitute the first constraint. For example, if the constraint of the mathematical model contains 5 random variables and 3 samples are taken of them — i.e., 3 samples of each random variable — then 3 determined constraints are obtained, which together constitute the first constraint.
In this embodiment, the second sampling module in the sampling module directly samples the mean of each random variable in the objective function of the mathematical model within its value range, obtaining a determined objective function, which is the first objective function.
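A minimal sketch of the two sampling modules, assuming each fitted random variable is summarized by its fitted value range and sampled uniformly within it (the distribution choice, ranges, and names are illustrative assumptions):

```python
import random
import statistics

# Each fitted random variable maps to the value range obtained by fitting.
fitted_ranges = {
    "beta_per_core": (80.0, 120.0),    # per-core bandwidth, MB/s
    "B_total":       (800.0, 1200.0),  # device total bandwidth, MB/s
    "S_input":       (450.0, 550.0),   # input data size, MB
    "delta_fixed":   (0.01, 0.05),     # fixed transfer overhead, s
}

def sample_constraint_params(n_samples):
    """First sampling module: draw every random variable n_samples times
    within its fitted range; each draw instantiates one determined
    (initial) constraint."""
    return [{name: random.uniform(lo, hi)
             for name, (lo, hi) in fitted_ranges.items()}
            for _ in range(n_samples)]

def objective_params():
    """Second sampling module: take the mean of each random variable's
    fitted range, yielding the single determined first objective."""
    return {name: statistics.mean(rng)
            for name, rng in fitted_ranges.items()}
```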
In combination with the above embodiment, in one implementation, the embodiment of the application further provides a system for serverless resource allocation. In this system, the first sampling module includes: a first sampling submodule, used for sampling each random variable in the constraint of the mathematical model a preset number of times within its fitted range, to obtain a corresponding preset number of initial constraints; and a constraint pruning module, used for performing redundancy pruning on the preset number of initial constraints according to a preset rule, to obtain the initial constraints remaining after redundancy pruning, the remaining initial constraints forming the first constraint.
In this embodiment, after the constraint in the mathematical model is sampled the preset number of times as in the foregoing embodiment, if the preset number is too large, the subsequent solving and optimization process becomes very complex and the overhead of checking the constraints grows. To alleviate this problem, the application prunes redundant constraints through constraint pruning.
Specifically: first, based on the preset number of samples determined by the sampling-number determination module, the first sampling submodule in the first sampling module samples each random variable in the constraint of the mathematical model the preset number of times within its value range, obtaining the corresponding preset number of initial constraints. After the preset number of initial constraints is obtained, the constraint pruning module in the first sampling module performs redundancy pruning on them based on the pre-set preset rule, removing the redundant initial constraints that are implied by others. The initial constraints remaining after pruning are then determined as the first constraint finally used for determining the optimal resource-configuration strategy.
In this embodiment, the preset rule is: represent the coefficients of the unknown variables in each constraint expression as a vector, with a_i denoting the vector of the coefficients of all unknown variables in constraint i. If a_i ≤ a_j holds componentwise, then, between constraint i and constraint j, constraint i will be deleted, because the feasible region corresponding to constraint j is contained in the feasible region corresponding to constraint i (constraint j is the tighter of the two). Given the set of sampled constraints {a_1, ..., a_N}, pruning based on the preset rule finds the smallest subset whose feasible region is the same as that of the full set.
For example, suppose constraint C1 is a_11·x_1 + a_12·x_2 ≤ L and constraint C2 is a_21·x_1 + a_22·x_2 ≤ L. If the coefficient of the unknown variable x_1 in C1 is smaller than that in C2, while the coefficient of the unknown variable x_2 in C1 is smaller than that in C2, then C1 is implied by C2 and will be pruned.
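A sketch of this pruning rule, assuming the sampled constraints share the form a·y ≤ L over common nonnegative basis terms y with the same bound L, so that a constraint whose coefficients are all no larger than another's is implied by it:

```python
def prune_constraints(coeff_vectors):
    """Keep the minimal subset of constraints with the same feasible
    region: drop any constraint whose coefficient vector is componentwise
    <= some other constraint's vector (it is implied by the tighter one)."""
    kept = []
    for i, a in enumerate(coeff_vectors):
        dominated = any(
            j != i and a != b and all(x <= y for x, y in zip(a, b))
            for j, b in enumerate(coeff_vectors)
        )
        if not dominated:
            kept.append(a)
    return kept

# (2, 3) is componentwise below (4, 5): the constraint 2*x1 + 3*x2 <= L
# is implied by 4*x1 + 5*x2 <= L and is pruned.
print(prune_constraints([(2, 3), (4, 5), (6, 1)]))  # [(4, 5), (6, 1)]
```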
In combination with the above embodiment, in one implementation, the embodiment of the application further provides a system for serverless resource allocation. In this system, the solving module includes: a locally optimal resource-configuration determination module, used for determining a locally optimal resource configuration through a gradient descent algorithm based on the first objective function and the first constraint; and an optimal resource-configuration strategy determination module, used for determining a feasible optimal resource-configuration strategy around the locally optimal resource configuration.
In this embodiment, after the chance-constrained problem corresponding to the mathematical model is converted into a deterministic optimization problem, it can be determined from the runtime functions that the deterministic problem is a convex optimization problem; hence, for this problem, a local optimum is also the global optimum. The application therefore first solves by a gradient descent algorithm based on the determined first objective function and first constraint, each point in the solution space representing a resource configuration, i.e., a (d, k) point. Gradient descent solves to a local extremum point, and since the deterministic optimization problem is convex, the local extremum point is the global extremum point, whose corresponding resource configuration is an optimal resource configuration. However, in the actual solving process the local extremum point found by gradient descent may be fractional, whereas the actual resource configuration is discrete — the parallelism and the number of CPUs take integer values — so the local extremum point may not be the actual optimal resource configuration. Therefore, after determining the local extremum point, the application further probes around it, iteratively checking each feasible point nearby, the feasible points being the rounded points that satisfy the first objective function and first constraint; the final optimal resource configuration is found among the feasible points and determined as the optimal resource-configuration strategy for executing the user-submitted task.
In this embodiment, as shown in fig. 2, fig. 2 is a schematic diagram of determining an optimal resource configuration in a system for serverless resource allocation according to an embodiment of the present application. Each point in the figure represents a resource configuration, i.e., a (d, k) point. The feasible region circles out the points — the resource configurations — that satisfy the first objective function and the first constraint. The first procedure determines the local extremum point by the gradient descent algorithm: an entry point is randomly selected from the feasible region and iterated along the gradient until a local extremum point is approached, the local extremum point being the position at the arrow's end in the figure. Since the local extremum point may not be integral, it may not be the actual optimal resource configuration. The second procedure, probing, is then performed: a probing range is defined around the local extremum point, and all feasible points within the range — taken as actual resource-configuration points — are collected. After all feasible points in the probing range are found, if the user-specified performance bound is on runtime, the feasible point with the lowest cost is found among them and determined as the optimal resource configuration; if the user-specified performance bound is on cost, the feasible point with the lowest runtime is found among them and determined as the optimal resource configuration.
Specifically: first, based on the determined first objective function and first constraint, the locally optimal resource-configuration determination module in the solving module solves, through a gradient descent algorithm, to a local extremum point, which is the locally optimal resource configuration. After the local extremum point is solved, the optimal resource-configuration strategy determination module in the solving module probes the feasible points around it based on the determined local extremum point, determines the best feasible point among all detected feasible points, and determines that point as the final optimal resource configuration, thereby obtaining the final optimal resource-configuration strategy. The solving module may solve the first objective function and first constraint through a gradient descent algorithm based on the SciPy package, obtaining the local extremum point. It should be appreciated that, in other embodiments, the solving module may also solve the first objective function and first constraint with other gradient descent implementations to obtain the local extremum point, which is not specifically limited here.
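A condensed sketch of the two solving steps using SciPy, which the text names as one possible basis: a gradient-based continuous solve followed by integer probing around the relaxed optimum. The callables, the SLSQP method choice, and the probing radius are illustrative assumptions:

```python
import itertools
import numpy as np
from scipy.optimize import minimize

def solve(cost_fn, runtime_fns, bound, x0, lower, upper, radius=2):
    """cost_fn(x): the first objective; runtime_fns: one callable per
    (pruned) sampled constraint, each required to satisfy f(x) <= bound.
    x = (d_1..d_n, k_1..k_n) is the resource-configuration vector."""
    # Step 1: continuous relaxation via a gradient-based method; the
    # problem is convex, so the local optimum found is global.
    cons = [{"type": "ineq", "fun": lambda x, f=f: bound - f(x)}
            for f in runtime_fns]
    res = minimize(cost_fn, x0, method="SLSQP",
                   bounds=list(zip(lower, upper)), constraints=cons)
    # Step 2: probe integer points around the (possibly fractional)
    # optimum; exponential in dimension, fine for small workflows.
    center = np.round(res.x).astype(int)
    best, best_cost = None, float("inf")
    for off in itertools.product(range(-radius, radius + 1),
                                 repeat=len(center)):
        x = np.clip(center + np.array(off), lower, upper)
        if all(f(x) <= bound for f in runtime_fns) and cost_fn(x) < best_cost:
            best, best_cost = x.copy(), cost_fn(x)
    return best, best_cost
```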
In combination with the above embodiment, in one implementation, the embodiment of the application further provides a system for serverless resource allocation. In this system, the orchestrator further comprises a function log module; the function log module is used for recording the running information obtained after each task is executed, so as to form the historical running information of each task.
In this embodiment, the orchestrator further includes a function log module, which records the running information obtained after each task is executed and stores it to form the historical running information of each task. While executing a task on the configured resources and returning the execution result, the function execution module in the function running module also reports function execution results and exception information to the orchestrator's function log module. If an exception occurs, the function execution module can assist the function log module in terminating or re-executing the task. Meanwhile, the function log module receives the available resources on each server — including idle CPU, memory, and so on — reported by the function monitoring module in the function running module. In addition, the function log module is responsible for recording the execution of each function in a task, recording the function's execution result, communication status, execution time, running cost, and task state; if a function raises an exception, the function log module forwards the exception message so that the orchestrator controls the function running module to re-execute the failed function. The function monitoring module is also responsible for recording the execution time and resource consumption of the task, computing the task's running cost, and reporting it to the function log module in the orchestrator through network communication. When an exception occurs, the function execution module reports the exception information to the function log module and terminates or re-executes the task according to feedback from the orchestrator.
As shown in fig. 3, fig. 3 is another schematic diagram of the system for server-unaware resource allocation provided by an embodiment of the present application. The system mainly comprises two parts: a workflow orchestrator and a function operation module. The workflow orchestrator comprises the resource configuration automatic generation module, the task analysis module, and the function log module; the function operation module comprises the function execution module and the function monitoring module.
The system for server-unaware resource allocation provided by the embodiments of the present application captures the complex relationship between a task's resource configuration and its application performance through accurate running time and cost models. Based on these running time and cost models, the probability-constrained problem with a user-defined performance (delay or cost) bound is solved quickly. The developer only needs to specify a delay or cost performance limit for the task; the system provided by the present application automatically configures resources, minimizing execution delay under the cost limit or minimizing cost under the delay limit, and allows the developer to select the resource configuration point corresponding to the appropriate performance requirement along the entire delay and cost Pareto front curve. First, exploiting the characteristics of server-unaware execution, a performance prediction model (namely the running time model and the cost model) based on random functions and an analytical model is established, and the uncertainty and variability of function execution in server-unaware computing are captured by the random variables. Based on the performance prediction model, a probability-constrained model (i.e., the mathematical model described above) is constructed from the delay or cost limit entered by the user; it is first transformed into a deterministic optimization problem by Monte Carlo sampling of the random variables, and the performance limit is guaranteed through the inequality of the preset algorithm described above. The present application also exploits the convex nature of the problem and uses an efficient gradient descent algorithm to find the optimal resource configuration under the performance limit.
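The Pareto front curve mentioned above can be traced, for example, by sweeping the user's delay limit and re-solving at each point. In the sketch below, solve_optimal_config is the illustrative helper sketched earlier, and make_constraints (which builds the sampled latency constraints for a given limit) is an assumed callable.

def pareto_front(delay_limits, make_constraints, objective, x0, bounds):
    # Collect (delay limit, achieved cost) points along the trade-off curve.
    points = []
    for limit in delay_limits:
        cfg = solve_optimal_config(objective, make_constraints(limit), x0, bounds)
        if cfg is not None:  # skip limits with no feasible configuration
            points.append((limit, objective(cfg)))
    return points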
The system for server-unaware resource allocation may be implemented on homogeneous computing servers, or on heterogeneous computing clusters comprising various heterogeneous servers and acceleration hardware (such as GPUs). The system can also be integrated with multi-task resource allocation mechanisms across different users, improving resource utilization and reducing user cost. The system fully considers the elastic resource allocation and fine-grained billing of server-unaware computing; given the delay or cost performance limit entered by the user, it automatically determines the resource configuration of the functions of each stage in the task while satisfying the performance limit, thereby achieving optimal cost or delay. The system mainly comprises two parts: a centralized workflow orchestrator and a function operation module. The function operation module is responsible for executing the corresponding tasks according to the optimal resource configuration strategy solved by the orchestrator. To verify the performance of the system for server-unaware resource allocation provided by the present application, a corresponding system prototype was implemented and evaluated. Experiments were carried out with three typical server-unaware workflow tasks, namely big data analysis, a machine learning pipeline, and video analysis, and the system prototype was evaluated in terms of task completion time, running cost, solving time, and so on. The experimental results show that, compared with the baseline methods, the system for server-unaware resource allocation can reduce task completion time by up to 2.1 times and running cost by up to 2.5 times, while the solving time is negligible.
Based on the same inventive concept, a second aspect of the present application provides a method for server-unaware resource allocation. As shown in fig. 4, fig. 4 is a flowchart of the method for server-unaware resource allocation provided by an embodiment of the present application. The method comprises the following steps:
step S1: analyzing the historical operation information of a task submitted by a user, and determining the running time model and the cost model corresponding to each stage in the task;
step S2: generating an optimal resource allocation strategy corresponding to the task according to the running time model, the cost model, the performance limit designated by the user and the task;
step S3: and executing the task based on the optimal resource allocation strategy.
Optionally, analyzing the historical operation information of the task submitted by the user and determining the running time model and the cost model corresponding to each stage in the task includes:
fitting the respective running time functions of each stage in the task according to the historical operation information to obtain the running time models corresponding to each stage in the task;
and multiplying the running time models corresponding to the stages in the task by the charging factors respectively to obtain the cost models corresponding to the stages in the task.
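A minimal sketch of the cost model construction just described, assuming a scalar charging factor per stage (an actual platform may make the factor depend on the allocated resources, for example billing per GB-second):

def make_cost_model(runtime_model, charging_factor):
    # Cost model for one stage: charging factor x fitted running time.
    # runtime_model(r) returns the predicted running time (seconds) under
    # resource allocation r; charging_factor is the assumed per-second price.
    return lambda r: charging_factor * runtime_model(r)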
Optionally, fitting the respective running time functions of each stage in the task according to the historical operation information to obtain the running time models corresponding to each stage in the task includes:
fitting the random variables in the respective initialization running time functions of each stage in the task according to the historical operation information to obtain the first running time models corresponding to each stage in the task;
fitting the random variables in the respective data read-write running time functions of each stage in the task according to the historical operation information to obtain the second running time models corresponding to each stage in the task;
fitting the random variables in the respective calculation running time functions of each stage in the task according to the historical operation information to obtain the third running time models corresponding to each stage in the task;
and determining the running time model corresponding to each stage in the task based on the first running time model, the second running time model, and the third running time model corresponding to each stage in the task.
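The composition of the three sub-models might look like the following sketch. The parametric forms (a constant initialization term, read-write and computation terms that shrink as the allocated resource r grows) are illustrative assumptions, with the coefficients supplied by the fitting step.

def make_runtime_model(init_params, io_params, compute_params):
    a0 = init_params["a0"]                               # fitted initialization time
    b0, b1 = io_params["b0"], io_params["b1"]            # fitted read-write coefficients
    c0, c1 = compute_params["c0"], compute_params["c1"]  # fitted compute coefficients

    def runtime(r):
        t_init = a0              # first running time model: initialization
        t_io = b0 + b1 / r       # second running time model: data read-write
        t_compute = c0 + c1 / r  # third running time model: computation
        return t_init + t_io + t_compute

    return runtime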
Optionally, generating the optimal resource allocation strategy corresponding to the task according to the running time model, the cost model, the performance limit specified by the user, and the task includes:
constructing a mathematical model of the task according to the running time model, the cost model, the performance limit specified by the user, and the task, wherein the mathematical model comprises an objective function and a constraint condition;
sampling each random variable in the fitting range of each random variable in the mathematical model obtained by fitting to obtain a first objective function and a first constraint condition;
and determining an optimal resource allocation strategy for the task based on the first objective function and the first constraint condition.
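Under illustrative notation not fixed by the present application, the mathematical model can be written as a chance-constrained program, where r is the per-stage resource vector, \xi collects the fitted random variables, C and T are the cost and running time models, L is the user's delay limit, and \epsilon is the allowed violation probability:

\begin{aligned}
\min_{r}\quad & \mathbb{E}_{\xi}\left[ C(r;\xi) \right] \\
\text{s.t.}\quad & \Pr\left[ T(r;\xi) \le L \right] \ge 1-\epsilon, \\
& r_{\min} \le r \le r_{\max},
\end{aligned}

with the roles of C and T exchanged when the user fixes a cost limit instead of a delay limit.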
Optionally, sampling each random variable in the fitting range of each random variable in the mathematical model obtained by fitting to obtain the first objective function and the first constraint condition includes:
sampling each random variable for a preset number of times in the respective fitting range of each random variable in the constraint conditions of the mathematical model obtained by fitting, so as to obtain the first constraint condition;
and, in the fitting range of each random variable in the objective function of the mathematical model obtained by fitting, sampling the mean value of each random variable to obtain the first objective function.
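A hedged sketch of this sampling step: each of the N scenario draws yields one deterministic constraint, while the objective plugs in the sample mean of the random variables. The signatures runtime_model(r, xi), cost_model(r, xi), and sample_xi are assumptions standing in for the fitted models.

import numpy as np

def sample_problem(runtime_model, cost_model, sample_xi, delay_limit, n_samples):
    rng = np.random.default_rng(0)
    xis = [sample_xi(rng) for _ in range(n_samples)]

    # First constraint condition: T(r; xi_i) <= L must hold for every sample.
    constraint_funcs = [
        (lambda r, xi=xi: delay_limit - runtime_model(r, xi)) for xi in xis
    ]

    # First objective function: evaluate the cost model at the mean of the
    # sampled random variables, giving one deterministic objective.
    xi_mean = np.mean(np.asarray(xis), axis=0)
    def objective(r):
        return float(cost_model(r, xi_mean))

    return objective, constraint_funcs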
Optionally, the method further comprises: determining the value of the preset number of times through a preset algorithm.
Optionally, sampling each random variable for a preset number of times in the fitting range of each random variable in the constraint conditions of the mathematical model obtained by fitting to obtain the first constraint condition includes:
sampling each random variable for a preset number of times in the respective fitting range of each random variable in the constraint conditions of the mathematical model obtained by fitting, and obtaining a corresponding preset number of initial constraint conditions;
and performing redundancy pruning on the preset number of initial constraint conditions according to a preset rule to obtain the initial constraint conditions remaining after pruning, wherein the remaining initial constraint conditions form the first constraint condition.
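The pruning rule itself is only identified above as "a preset rule"; one simple stand-in, assuming the running time model is monotone in the sampled values, is to drop any scenario whose per-stage runtime samples are strictly dominated component-wise by another scenario, since its constraint can then never be the binding one.

import numpy as np

def prune_scenarios(scenarios):
    # scenarios: list of np.ndarray, each one sample of the per-stage runtime
    # random variables. A scenario dominated by another (<= in every stage,
    # < in at least one) yields a redundant constraint and is dropped.
    kept = []
    for i, s in enumerate(scenarios):
        dominated = any(
            np.all(t >= s) and np.any(t > s)
            for j, t in enumerate(scenarios) if j != i
        )
        if not dominated:
            kept.append(s)
    return kept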
Optionally, determining the optimal resource allocation strategy of the task based on the first objective function and the first constraint condition includes:
determining a local optimal resource configuration through a gradient descent algorithm based on the first objective function and the first constraint condition;
and determining a feasible optimal resource allocation strategy around the local optimal resource configuration based on the determined local optimal resource configuration.
Optionally, the method further comprises: recording the operation information obtained after each task is executed to form the respective historical operation information of each task.
As for the method embodiment, since it is substantially similar to the system embodiment, its description is relatively simple; for relevant details, refer to the description of the system embodiment.
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the flows or functions according to the embodiments of the present application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another, for example via wired (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g., infrared, radio, microwave) means. The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., solid state disk (SSD)), among others.
It is noted that relational terms such as first and second are used solely to distinguish one entity or action from another, and do not necessarily require or imply any actual such relationship or order between those entities or actions. Moreover, the terms "comprises," "comprising," and any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises the element.
In this specification, the embodiments are described in a related manner; identical or similar parts of the embodiments may be referred to one another, and each embodiment focuses on its differences from the other embodiments. In particular, the system embodiments are described relatively simply since they are substantially similar to the method embodiments; for relevant details, refer to the partial description of the method embodiments.
The foregoing description is only of the preferred embodiments of the present application and is not intended to limit the scope of the present application. Any modifications, equivalent substitutions, improvements, etc. that are within the spirit and principles of the present application are intended to be included within the scope of the present application.

Claims (10)

1. A system for server-unaware resource allocation, the system comprising: an orchestrator and a function operation module; the orchestrator comprises a task analysis module and a resource configuration automatic generation module;
the task analysis module is used for analyzing the historical operation information of the task submitted by the user and determining the running time model and the cost model corresponding to each stage in the task;
the resource configuration automatic generation module is used for generating the optimal resource allocation strategy corresponding to the task according to the running time model, the cost model, the performance limit designated by the user, and the task;
and the function operation module is used for executing the task based on the optimal resource allocation strategy.
2. The system for server-unaware resource allocation of claim 1, wherein the task analysis module comprises:
a running time model building module, used for fitting the respective running time functions of each stage in the task according to the historical operation information to obtain the running time models corresponding to each stage in the task;
and a cost model construction module, used for multiplying the running time models corresponding to the stages in the task by the respective charging factors to obtain the cost models corresponding to the stages in the task.
3. The system for server-unaware resource allocation of claim 2, wherein the running time model building module comprises:
a first construction module, used for fitting the random variables in the respective initialization running time functions of each stage in the task according to the historical operation information to obtain the first running time models corresponding to each stage in the task;
a second construction module, used for fitting the random variables in the respective data read-write running time functions of each stage in the task according to the historical operation information to obtain the second running time models corresponding to each stage in the task;
a third construction module, used for fitting the random variables in the respective calculation running time functions of each stage in the task according to the historical operation information to obtain the third running time models corresponding to each stage in the task;
and a running time model construction submodule, used for determining the running time model corresponding to each stage in the task based on the first running time model, the second running time model, and the third running time model corresponding to each stage in the task.
4. The system for server-unaware resource allocation of claim 3, wherein the resource configuration automatic generation module comprises:
a mathematical model construction module, used for constructing a mathematical model of the task according to the running time model, the cost model, the performance limit specified by the user, and the task, wherein the mathematical model comprises an objective function and a constraint condition;
a sampling module, used for sampling each random variable in the fitting range of each random variable in the mathematical model obtained by fitting to obtain a first objective function and a first constraint condition;
and a solving module, used for determining an optimal resource allocation strategy of the task based on the first objective function and the first constraint condition.
5. The system for server-unaware resource allocation of claim 4, wherein the sampling module comprises:
a first sampling module, used for sampling each random variable for a preset number of times in the fitting range of each random variable in the constraint conditions of the mathematical model obtained by fitting, so as to obtain the first constraint condition;
and a second sampling module, used for sampling the mean value of each random variable in the fitting range of each random variable in the objective function of the mathematical model obtained by fitting to obtain the first objective function.
6. The system for server-unaware resource allocation of claim 5, wherein the first sampling module further comprises:
a sampling times determining module, used for determining the value of the preset number of times through a preset algorithm.
7. The system for server-unaware resource allocation of claim 6, wherein the first sampling module comprises:
a first sampling submodule, used for sampling each random variable for a preset number of times in the respective fitting range of each random variable in the constraint conditions of the mathematical model obtained by fitting, to obtain a corresponding preset number of initial constraint conditions;
and a constraint condition pruning module, used for performing redundancy pruning on the preset number of initial constraint conditions according to a preset rule to obtain the initial constraint conditions remaining after pruning, wherein the remaining initial constraint conditions form the first constraint condition.
8. The system for server-unaware resource allocation of claim 7, wherein the solving module comprises:
a local optimal resource configuration determining module, used for determining a local optimal resource configuration through a gradient descent algorithm based on the first objective function and the first constraint condition;
and an optimal resource allocation strategy determining module, used for determining a feasible optimal resource allocation strategy around the local optimal resource configuration based on the determined local optimal resource configuration.
9. The system for server-unaware resource allocation of claim 1, wherein the orchestrator further comprises a function log module;
the function log module is used for recording operation information obtained after each task is executed so as to form respective historical operation information of each task.
10. A method for server-unaware resource allocation, the method comprising:
analyzing the historical operation information of a task submitted by a user, and determining the running time model and the cost model corresponding to each stage in the task;
generating an optimal resource allocation strategy corresponding to the task according to the running time model, the cost model, the performance limit designated by the user and the task;
and executing the task based on the optimal resource allocation strategy.
CN202410050810.1A 2024-01-15 2024-01-15 System and method for server to have no perceived resource allocation Active CN117573374B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410050810.1A CN117573374B (en) 2024-01-15 2024-01-15 System and method for server to have no perceived resource allocation


Publications (2)

Publication Number Publication Date
CN117573374A (en) 2024-02-20
CN117573374B CN117573374B (en) 2024-04-05

Family

ID=89864578


Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115658230A (en) * 2022-10-27 2023-01-31 南京大学 Method and system for arranging high-performance containers in cloud data center
CN116126488A (en) * 2022-12-02 2023-05-16 中国科学院深圳先进技术研究院 Self-adaptive resource scheduling method and system for server unaware computing and computer equipment
WO2023124167A1 (en) * 2021-12-31 2023-07-06 国电南瑞科技股份有限公司 Container perception-free starting method and system
CN116401055A (en) * 2023-04-07 2023-07-07 天津大学 Resource efficiency optimization-oriented server non-perception computing workflow arrangement method
CN116860419A (en) * 2023-09-04 2023-10-10 北京大学 Parallel scheduling method and system for server non-perception data analysis
CN117193963A (en) * 2023-08-03 2023-12-08 北京大学 Function feature-based server non-aware computing scheduling method and device




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant