CN111258746B

CN111258746B - Resource allocation method and service equipment

Info

Publication number: CN111258746B
Application number: CN201811455536.7A
Authority: CN
Inventors: 张杨; 冯亦挥; 李治; 汤志鹏
Original assignee: Alibaba Group Holding Ltd
Current assignee: Alibaba Group Holding Ltd
Priority date: 2018-11-30
Filing date: 2018-11-30
Publication date: 2023-04-25
Anticipated expiration: 2038-11-30
Also published as: CN111258746A

Abstract

The application provides a resource allocation method and service equipment. The method comprises the following steps: acquiring resource usage data of allocated resources in a resource pool; determining surplus resources from the allocated resources according to the resource usage data of the allocated resources; and allocating the surplus resources to a target job awaiting resource allocation. This solves the problem of resource waste in existing resource allocation methods for distributed systems, and achieves the technical effects of effectively reducing resource waste and improving resource utilization.

Description

Resource allocation method and service equipment

Technical Field

The application belongs to the technical field of data processing, and particularly relates to a resource allocation method and service equipment.

Background

With the development of data processing technology, performing job processing on a distributed system is becoming increasingly common. In most existing resource allocation methods for distributed systems, a job manager sends a resource use application to a resource scheduler according to the scale of the target job to be executed, so as to apply for physical resources that meet the resource demand of job execution. The resource scheduler searches the unallocated resources for static resources that meet the resource demand according to the resource use application, and provides the static resources to the job manager. The job manager then sends the job node for executing the target job to the machine where the static resources are located, so as to complete the corresponding job.

However, in the above resource allocation method, in order to ensure that the target job can be executed stably, a relatively large amount of resources is often set as the required resource amount according to the scale of the target job. In actual execution, the target job often does not need that many resources, and the excess is wasted.

Aiming at the problem of resource waste existing in the existing resource allocation mode, no effective solution is proposed at present.

Disclosure of Invention

The application aims to provide a resource allocation method and service equipment so as to solve the problem of resource waste in the prior art.

The application provides a resource allocation method and service equipment, which are realized by the following steps:

a resource allocation method, comprising:

acquiring resource use data of allocated resources in a resource pool;

determining a surplus resource from the allocated resources according to the resource use data of the allocated resources;

and allocating the surplus resources to a target job awaiting resource allocation.

A service device comprising a processor and a memory for storing processor-executable instructions that when executed by the processor implement:

Acquiring resource use data of allocated resources in a resource pool;

A computer readable storage medium having stored thereon computer instructions that when executed perform the steps of:

acquiring resource use data of allocated resources in a resource pool;

According to the resource allocation method and service equipment provided herein, the surplus resources among the allocated resources are determined and allocated to a target job awaiting resource allocation. In other words, the surplus resources among the allocated resources are allocated a second time. This solves the technical problem of low resource utilization in existing methods, and achieves the technical effects of making full use of resources and improving job processing efficiency.

Drawings

In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are required to be used in the embodiments or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments described in the present application, and that other drawings may be obtained according to these drawings without inventive effort to a person skilled in the art.

FIG. 1 is a schematic architecture diagram of a resource allocation system provided herein;

FIG. 2 is another architecture diagram of the resource allocation system provided herein;

FIG. 3 is a schematic diagram of user request job processing provided herein;

FIG. 4 is a timing diagram of resource allocation provided herein;

FIG. 5 is a method flow diagram of a resource allocation method provided herein;

FIG. 6 is a block diagram of a service device provided herein;

FIG. 7 is a block diagram of the resource allocation device provided herein.

Detailed Description

In order to better understand the technical solutions in the present application, the following description will clearly and completely describe the technical solutions in the embodiments of the present application with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only some embodiments of the present application, not all embodiments. All other embodiments, which can be made by one of ordinary skill in the art based on the embodiments herein without making any inventive effort, shall fall within the scope of the present application.

When allocating running resources to a target job, existing resource allocation methods determine the peak amount of resources the target job requires, and then search the unallocated resources for resources that reach this peak amount to allocate to the target job. However, in actual job processing, the target job's resource usage does not stay at the peak amount, and for most of the execution time the resources consumed by the target job are smaller than the peak amount.

For example, the target job may consume the full amount of resources previously allocated to it only for a very short period in the middle of execution, and the amount of resources actually used to run the target job during other periods may be much lower than the peak amount. During those periods, the unused surplus resources (i.e., the difference between the total resources allocated to the target job and the resources actually used by the running target job) sit idle, which wastes resources.

Further, since the amount of resources in the system is limited, when there are many jobs, if the resources have already been allocated, other jobs can only enter a waiting state, and even if there are surplus resources in the amount of resources allocated for some target jobs, these surplus resources cannot be allocated to jobs in the waiting state.

Based on this, in this example, it is considered that if the surplus resources among the allocated resources can be allocated to jobs awaiting resource allocation, the processing load of the system when there are many jobs can be relieved to some extent. Specifically, the surplus resources can be determined according to how the jobs that have been allocated resources actually use them, and the surplus resources can be allocated to jobs queuing in the system, so that the queued jobs are processed.

As shown in fig. 1, an embodiment of the present application provides a resource allocation system. The resource allocation system may comprise: a resource allocation server 101 and a plurality of physical resources 102 (machines). The resource allocation server 101 is configured to allocate the plurality of physical resources, that is, to allocate them to target jobs requesting resources.

In one embodiment, the resources provided by the physical resources may specifically include one or more of the following resources: disk resources, network resources, CPU resources, GPU resources and the like required by job execution. The physical resource may be a server or a server cluster, or may be a cloud processor, a cloud storage, or the like. It should be noted that the above-listed various types of resources are only for better illustrating the embodiments of the present application. In the specific implementation, other types of resources besides the listed resources can be introduced as the resources provided by the physical resources according to specific situations and job requirements. The present application is not limited thereto. The physical resources may be processing resources, storage resources, etc. required to process the target job.

The above-mentioned resource allocation server 101 may be a single server or a processor, or may be a server cluster, and in actual implementation, a specific implementation manner of the resource allocation server 101 may be selected according to actual needs, which is not limited in this application.

Further, it is considered that if the remaining resources among the allocated resources are required to be allocated to the job in the resource waiting state to achieve processing of the job in the waiting state, it is required to know the resource amount of the remaining resources. In order to determine the resource quantity of the surplus resources in the allocated resources, a machine node for monitoring the use condition of the resources can be arranged on the physical resources, the set machine node can collect the use data of the resources of the physical resources in real time, and the machine node can upload the determined resource surplus to the resource allocation server in real time, so that the resource allocation server can reallocate the surplus resources.

It should be noted that, setting up a machine node on a physical resource to detect a resource usage situation is only an exemplary description, and other manners of determining a resource usage situation may be adopted in actual implementation, for example, a centralized detector may be set to monitor usage situations of all physical resources, or a machine node may be set to monitor resource usage situations of one or more physical resources related to the physical resource where the machine node is located, or usage situations of resource amounts allocated to each target job may be calculated according to execution situations of the target job. The specific manner of determining the usage situation of the resource can be determined according to the actual usage scenario and the usage situation, which is not limited in this application.

Further, the surplus resources are typically relatively small in amount. If allocation starts from a job's demand and searches the surplus resources for a physical resource that satisfies it, a job with a large demand will very likely find no suitable surplus resource, and the search must then move on to the next job, so matching efficiency is low. For this reason, the matching may instead start from the surplus resource: the jobs are matched against the surplus resource one by one, and the first job that the surplus resource can handle is taken as the job to which the surplus resource is reallocated. For example, if the amount of surplus resources is 15, the job list of the queue may be retrieved and the resource amount required by each job in the list checked in turn. If job 1 in the list requires 20, then 20>15 and the requirement is not satisfied, so the next job is checked; if job 2 requires 13, then 13<15 and the requirement is satisfied, so the surplus resources may be allocated to job 2 to process job 2.

However, if the determined surplus resource has been allocated to a job in the queue (say job A), and the resource demand of the original job (i.e., the job from whose allocation the surplus resource was taken, say job M) then increases, the resource allocated to job A must be given back to job M, which means the processing of job A has to be suspended or stopped.
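The matching in the example above can be sketched as a first-fit scan over the queue. This is a minimal illustration, not the claimed implementation: the function name `first_job_that_fits` and the representation of the queue as (name, demand) pairs are assumptions, and a demand exactly equal to the surplus is assumed to fit.

```python
from typing import Optional, Sequence, Tuple


def first_job_that_fits(
    queue: Sequence[Tuple[str, int]], surplus: int
) -> Optional[str]:
    """Scan the queue in order and return the first job the surplus can hold.

    `queue` holds hypothetical (job name, required amount) pairs.
    Assumption: a demand equal to the surplus counts as satisfied.
    """
    for name, demand in queue:
        if demand <= surplus:
            return name
    return None  # no queued job fits the surplus


# The numbers from the example: surplus 15, job 1 needs 20, job 2 needs 13.
queue = [("job 1", 20), ("job 2", 13)]
print(first_job_that_fits(queue, 15))  # job 2
```

Starting from the surplus and stopping at the first fit keeps each matching pass to a single scan of the queue.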

For this purpose, in this example, a resource allocation server is provided, as shown in fig. 2, which may include: a resource scheduler and a job manager, which may be coupled to the resource scheduler. Wherein the job manager may specifically communicate with a plurality of job nodes that may be used to perform specific job tasks.

In a specific implementation, as shown in fig. 3, a user may send a job request to the resource allocation server. The job manager in the resource allocation server may determine the corresponding target job according to the job request, analyze the resource demand for executing the target job, and generate a resource use application containing the resource demand. The job manager sends the resource use application to the resource scheduler, so that the resource scheduler can allocate corresponding resources from the plurality of physical resources to execute the target job. The resource scheduler is coupled to the machine node of each of the plurality of physical resources. Each physical resource is provided with a machine node, which monitors and records the resource usage data of the corresponding physical resource in real time. The resource usage data may specifically include: the state of the unallocated resources in the physical resource, the usage of the allocated resources in the physical resource, and the like.

Specifically, the usage of the allocated resources in the physical resources may further include a current usage of the allocated resources, a current remaining amount of the allocated resources, a usage of the allocated resources within a preset period of time, and so on. The resource scheduler may obtain current resource usage data for each physical resource via the machine node. When the resource scheduler acquires the current resource use data, the resource scheduler can communicate with the machine node in real time, so that the current resource use data can be acquired in real time. Of course, in order to reduce the data processing pressure, the current resource usage data of the physical resource sent by the machine node may also be received at preset time intervals (for example, 2 minutes).
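As an illustration of the data a machine node might report, the sketch below derives the current surplus of each allocation as the allocated amount minus the amount in use, following the definition of surplus resources given earlier. The `UsageReport` type and its field names are hypothetical and are not taken from the application.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class UsageReport:
    """One hypothetical record reported by a machine node."""
    job: str        # job the resources were allocated to
    allocated: int  # total amount allocated to the job
    used: int       # amount the job is using right now

    @property
    def surplus(self) -> int:
        # Surplus = allocated amount minus amount actually in use.
        return max(self.allocated - self.used, 0)


def surplus_by_job(reports: Iterable[UsageReport]) -> Dict[str, int]:
    """Keep only the allocations that currently have idle capacity."""
    return {r.job: r.surplus for r in reports if r.surplus > 0}


reports = [UsageReport("A", allocated=5, used=3), UsageReport("C", 4, 4)]
print(surplus_by_job(reports))  # {'A': 2}
```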

After receiving the current resource usage data, the resource scheduler may first search each physical resource according to the current resource usage data, and determine whether any physical resource has an unallocated resource that meets the resource demand of the target job, i.e., a first resource (also referred to as a Normal resource). Meeting the resource demand of the target job may be understood as being greater than or equal to the resource demand of the target job. For example, if the resource demand of target job A is 5G of CPU, the resource scheduler may search the plurality of physical resources according to the current resource usage data, find that 6G of CPU in physical resource 2 is unallocated, and allocate 5G of that 6G of CPU in physical resource 2 to target job A as the first resource.

In the case that it is determined that the first resource exists in the physical resources, the resource scheduler may send a resource usage list including the first resource information to the job manager and to the machine node of the physical resource where the first resource is located. The first resource information may include location information of the first resource, indicating which physical resource contains the unallocated resource serving as the first resource. The job manager may then send the job node for executing the target job to the physical resource where the first resource is located, based on the resource usage list. The machine node of that physical resource may, according to the resource usage list, allow the job node to use the first resource in the physical resource to execute the target job. It should be noted that the first resource may be understood as a resource with a higher reliability level. Once a first resource is allocated to a target job, the system protects the target job's use of it, i.e., the first resource is not reclaimed until the target job is completed.

In the case that it is determined that the first resource does not exist in the physical resources, the resource scheduler may continue to search the allocated resources in each physical resource according to the current resource usage data, looking for surplus resources (also referred to as overstock resources) that are not currently utilized and whose amount satisfies the resource demand of the target job. For example, the resource scheduler may search the unallocated resources of the plurality of physical resources based on the current resource usage data. If it determines that the unallocated CPUs do not reach the 2 CPUs needed (i.e., there is no first resource), it may continue to search the allocated resources of each physical resource based on the current resource usage data. Suppose it finds that physical resource 2 contains an allocated resource (previously allocated for processing target job A) with a total of 5 CPUs, of which only 3 CPUs are currently utilized, so that the allocated resource includes a surplus of 2 CPUs that are currently idle. In order to increase resource utilization and avoid target job B having to wait because no unallocated resource is currently available, the 2 currently unused CPUs in the allocated resource in physical resource 2 can be temporarily allocated to target job B as a second resource. Thus, although there is currently no unallocated resource that meets the resource demand of target job B, part of the currently unused surplus can be temporarily drawn from previously allocated resources for use by other jobs.
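The two-stage search described above (unallocated first resources first, then idle surplus inside existing allocations) can be sketched as follows. This is an assumption-laden illustration: the `Machine` type, its fields, and the tuple returned by `allocate` are hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Machine:
    """Hypothetical view of one physical resource."""
    name: str
    unallocated: int
    # job name -> (amount allocated, amount currently used)
    grants: Dict[str, Tuple[int, int]] = field(default_factory=dict)


def allocate(
    machines: List[Machine], demand: int
) -> Optional[Tuple[str, str, Optional[str]]]:
    """Return (kind, machine name, lending job) or None if nothing fits."""
    # Stage 1: look for an unallocated (first) resource.
    for m in machines:
        if m.unallocated >= demand:
            return ("first", m.name, None)
    # Stage 2: look for idle surplus inside an existing allocation.
    for m in machines:
        for job, (allocated, used) in m.grants.items():
            if allocated - used >= demand:
                return ("second", m.name, job)
    return None


machines = [
    Machine("physical resource 1", unallocated=0),
    Machine("physical resource 2", unallocated=1, grants={"A": (5, 3)}),
]
print(allocate(machines, 2))  # ('second', 'physical resource 2', 'A')
```

The sketch only decides where the resource would come from; bookkeeping of the loan and notification of the job manager and machine node are left out.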

The second resource may be understood as a currently unused surplus resource temporarily borrowed from a first resource that has been allocated to another job. The second resource has a lower reliability level than the first resource. Specifically, when the target job originally allocated the first resource needs, at some stage, to use the surplus resource that has since been allocated to another target job as a second resource, the system preferentially guarantees the execution of the target job originally allocated the first resource. That is, the target job later allocated the second resource may be stopped, and the second resource returned to the target job originally allocated the first resource, so that at least the target job allocated the first resource can execute smoothly. For example, suppose that earlier, when no unallocated resource satisfied the resource demand, part of the currently unused surplus in the first resource originally allocated to target job A was allocated to target job B as a second resource, so that both target job A and target job B could execute. After a period of execution, the resource scheduler finds from the updated current resource usage data that target job A now needs the resources that were temporarily lent out. To ensure the smooth execution of target job A, the resource scheduler may stop target job B, and release and return the surplus resource temporarily borrowed from the first resource of target job A, i.e., the second resource used by target job B, so that target job A can still execute stably without stopping for lack of resources.
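A minimal sketch of the reclaim decision in this example is given below. It assumes the scheduler tracks each loan as a hypothetical `Loan` record and reclaims loans in list order until the lending job's current demand is covered; the application does not prescribe an order.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Loan:
    """Hypothetical record of surplus lent out as a second resource."""
    borrower: str  # job running on the second resource
    amount: int    # amount borrowed from the lender's first resource


def borrowers_to_stop(
    allocated: int, lender_demand: int, loans: List[Loan]
) -> List[str]:
    """Return the borrowing jobs to stop so the lender's demand is met.

    Assumption: loans are reclaimed in list order until the shortfall
    is covered.
    """
    lent = sum(loan.amount for loan in loans)
    shortfall = lender_demand - (allocated - lent)
    stopped: List[str] = []
    for loan in loans:
        if shortfall <= 0:
            break
        stopped.append(loan.borrower)
        shortfall -= loan.amount
    return stopped


# Job A was allocated 5 and lent 2 to job B; A now needs all 5 again.
print(borrowers_to_stop(5, 5, [Loan("B", 2)]))  # ['B']
# While A needs only 3, nothing has to be reclaimed.
print(borrowers_to_stop(5, 3, [Loan("B", 2)]))  # []
```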

As can be seen, the reliability level of the second resource is lower than that of the first resource. When use of the first resource and use of the second resource conflict (for example, while a target job is executing on a second resource borrowed from a first resource, the target job originally allocated that first resource needs to use the temporarily borrowed second resource), the system preferentially protects the execution of the target job allocated the first resource, stops the execution of the target job allocated the second resource, and reclaims the second resource. The target job allocated the second resource therefore runs a higher risk of being stopped during execution because of a conflict with use of the first resource. Accordingly, in order to reduce the chance that a target job allocated a second resource is stopped during execution, when the second resource is searched for and determined from the allocated first resources, a resource whose risk of conflicting with use of the first resource within a preset period is below a threshold parameter may be selected as the second resource. Specifically, for example, based on the current resource usage data and the historical resource usage data (i.e., previously obtained current resource usage data), the scheduler may search the first resources of target jobs whose resource usage has already reached its peak within the preset period, and determine whether they contain a currently unused resource satisfying the resource demand to serve as the second resource.

It should be noted that, during the execution of a target job, the amount of resources used is not always at its peak, and typically, after one peak has passed, the next peak occurs only after a relatively long interval. Therefore, a second resource taken from the first resource of a target job whose usage has already peaked within the preset period is relatively unlikely to conflict with use of the first resource in the near future, so the target job allocated that second resource has a higher probability of executing to completion. Alternatively, a prediction model of resource usage may be built from the historical resource usage data, the allocated resources predicted to have lower usage in a future period may be taken as target resources, and the surplus resource satisfying the resource demand may then be searched for among the target resources and determined as the second resource. Of course, the above ways of reducing the risk that an allocated second resource is reclaimed early are given only to better explain the embodiments of the present application, and in a specific implementation the second resource may be determined in any manner suited to the jobs to be processed.
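One way to read the selection rule above is sketched below: prefer lending from a job whose recorded usage already reached its peak within the recent window. The sketch makes several assumptions that are not in the application: the peak is taken to equal the allocated amount, history is a list of usage samples with the newest last, and the names `has_peaked_recently` and `pick_lender` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def has_peaked_recently(history: List[int], allocated: int, window: int) -> bool:
    """True if usage reached the allocated amount in the last `window` samples.

    Assumption: the job's peak equals its allocated amount.
    """
    recent = history[max(len(history) - window, 0):]
    return any(used >= allocated for used in recent)


def pick_lender(
    candidates: Dict[str, Tuple[int, List[int]]], demand: int, window: int
) -> Optional[str]:
    """Pick a job with enough idle surplus whose peak has already occurred.

    `candidates` maps job name -> (allocated amount, usage history).
    """
    for job, (allocated, history) in candidates.items():
        if not history:
            continue
        idle = allocated - history[-1]
        if idle >= demand and has_peaked_recently(history, allocated, window):
            return job
    return None


candidates = {
    "A": (5, [2, 3, 3]),  # idle surplus 2, but no peak seen yet
    "C": (5, [4, 5, 3]),  # idle surplus 2, peak of 5 already observed
}
print(pick_lender(candidates, demand=2, window=3))  # C
```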

In the present embodiment, it is considered that temporarily allocating the unused part of the allocated resources in a physical resource as second resources may itself increase the operating load of the physical resource. If the load borne by the physical resource becomes too high and exceeds a certain limit, the physical resource as a whole may go down or restart, which puts the overall running of the jobs at risk. To avoid the risk caused by excessive load on a physical resource, the resource scheduler can acquire the current resource usage data through the machine node arranged on the physical resource, determine the current running state parameter of the physical resource according to the current resource usage data, and compare it with the threshold state parameter of the physical resource to determine whether the current running state parameter is greater than the threshold state parameter. If the current running state parameter is greater than the threshold state parameter, the physical resource can be judged to be at risk of downtime or restart. In that case, to protect the overall stability of job execution, the already allocated second resources, which have the lower reliability level, can be reclaimed first, and the execution of the target jobs allocated those second resources stopped, so as to ensure the overall stability of the physical resource.
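The threshold check above can be sketched as follows. The load is modeled as an integer percentage, and second-resource jobs are stopped one at a time until the load is back at or below the threshold; both choices are assumptions made for the illustration, as is the function name `second_jobs_to_reclaim`.

```python
from typing import List, Tuple


def second_jobs_to_reclaim(
    load: int, threshold: int, second_jobs: List[Tuple[str, int]]
) -> Tuple[List[str], int]:
    """Stop second-resource jobs until the load no longer exceeds the threshold.

    `second_jobs` holds hypothetical (job name, load contributed) pairs.
    Returns the jobs to stop and the load expected afterwards.
    First-resource jobs are never touched here.
    """
    stopped: List[str] = []
    for job, contribution in second_jobs:
        if load <= threshold:
            break
        stopped.append(job)
        load -= contribution
    return stopped, load


# Load 95% against a 90% threshold; jobs B and D run on second resources.
print(second_jobs_to_reclaim(95, 90, [("B", 10), ("D", 5)]))  # (['B'], 85)
```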

Further, when execution of a target job allocated a second resource conflicts with use of the first resource, the second resource is reclaimed to protect use of the first resource, and the target job allocated the second resource is terminated. Ordinarily, the execution of that target job would then return to zero and restart only after a new resource has been applied for. In fact, however, the target job had already executed for some time on the second resource before being stopped and had obtained some intermediate results. When the second resource is reclaimed, the intermediate results obtained by the target job on the second resource can be recorded at the time the job is stopped, so that when the target job later obtains a new resource, the job node can continue executing the target job with the intermediate results as the starting point. This avoids wasting the intermediate data obtained on the previous second resource and improves resource utilization and processing efficiency.

Although the above methods can greatly reduce the risk that an allocated second resource is reclaimed before the job finishes, they cannot fully guarantee that the corresponding target job executes smoothly and completely on the second resource. Since the reliability level of the first resource is higher than that of the second resource, using a first resource can guarantee that the target job executes to completion. Therefore, in a specific implementation, after the second resource has been allocated and the job node has started to run the corresponding target job on the second resource, and before the target job has finished executing, the resource scheduler can continue to acquire updated current resource usage data and, according to the updated data, search each physical resource for an unallocated resource that meets the resource demand of the target job, i.e., a first resource. When such a first resource is found, the newly retrieved first resource is allocated to the target job that has already been allocated the second resource.
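The idea of resuming from recorded intermediate results can be illustrated with a toy job that sums a list and may be stopped after a limited number of steps. The checkpoint layout (a dictionary with `next`, `total`, and `done` keys) and the function name `run` are hypothetical; a real job would persist whatever intermediate state it needs.

```python
from typing import Dict, List, Optional


def run(items: List[int], steps: int, checkpoint: Optional[Dict] = None) -> Dict:
    """Process at most `steps` items, starting from a checkpoint if given.

    Returns the new checkpoint: index of the next item, the partial sum,
    and whether the job has finished.
    """
    state = dict(checkpoint) if checkpoint else {"next": 0, "total": 0}
    end = min(state["next"] + steps, len(items))
    for i in range(state["next"], end):
        state["total"] += items[i]
    state["next"] = end
    state["done"] = end == len(items)
    return state


items = [1, 2, 3, 4, 5]
# The job runs on a second resource and is stopped after two steps.
saved = run(items, steps=2)
print(saved)  # {'next': 2, 'total': 3, 'done': False}
# Later it obtains a new resource and continues from the saved state.
print(run(items, steps=10, checkpoint=saved))  # {'next': 5, 'total': 15, 'done': True}
```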

As shown in fig. 4, when the newly retrieved first resource is allocated to the target job that has already been allocated the second resource, the processing can be divided into cases. Specifically, it may be determined whether the newly retrieved first resource and the allocated second resource are in the same physical resource. If the newly retrieved first resource and the second resource are located in the same physical resource, the newly retrieved first resource can be given directly to the other target job from whose allocation the second resource was borrowed, and the label of the second resource used by the target job changed to first resource. Thus, the target job can continue to execute stably without interruption. If the newly retrieved first resource and the second resource are located in different physical resources, the original execution link of the target job on the allocated second resource can be maintained, and an additional execution link for the same target job opened on the newly retrieved first resource. Specifically, the resource scheduler may send a resource usage list including the newly retrieved first resource to the job manager and to the machine node of the physical resource where the newly retrieved first resource is located. The job manager can then send a job node for executing the same target job to the newly retrieved first resource according to the resource usage list, and the machine node may allow the job node to use the first resource on the physical resource to execute the target job according to the resource usage list. This amounts to executing the same target job on two different physical resources at once.
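The case split above can be sketched as a small decision function. The `Placement` type and the function name `upgrade` are hypothetical, and the sketch only returns the resulting placements of the target job; handing the new first resource to the lending job in the same-machine case is noted in a comment rather than modeled.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Placement:
    """Hypothetical record of where a job runs and on what kind of resource."""
    job: str
    machine: str
    kind: str  # "first" or "second"


def upgrade(current: Placement, new_first_machine: str) -> List[Placement]:
    """Return the job's placements after a first resource is found for it."""
    if new_first_machine == current.machine:
        # Same physical resource: the new first resource goes to the lending
        # job, and the borrowed resource is simply relabeled as "first".
        return [Placement(current.job, current.machine, "first")]
    # Different physical resource: keep the existing run on the second
    # resource and open another execution link on the new first resource.
    return [current, Placement(current.job, new_first_machine, "first")]


b = Placement("B", "physical resource 2", "second")
print(len(upgrade(b, "physical resource 2")))  # 1
print(len(upgrade(b, "physical resource 3")))  # 2
```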

The execution target operation by using the newly retrieved resource has higher reliability level and can be successfully executed. Thus, the problem that the target job which may occur by solely utilizing the second resource cannot be successfully executed can be avoided. In particular implementations, the job node that utilizes the second resource may be maintained to continue execution of the target job. At this time, the resource scheduler may acquire the execution information of the target job on the first resource and the execution information of the target job on the second resource, respectively, to determine the execution progress of the target job on the first resource and the execution progress of the target job on the second resource. The resource scheduler can further determine whether the target job is executed on one of the first resource and the second resource according to the execution information of the target job on the first resource and the execution information of the target job on the second resource; in the event that it is determined that the target job has been executed on one of the first and second resources, the resource scheduler may stop execution of the target job on the other resource and release the first resource, returning the second resource. For example, the resource scheduler may determine that the target job has been executed on the first resource according to the execution information of the target job on the first resource and the execution information of the target job on the second resource, and at this time, the resource scheduler may stop execution of the target job on the second resource, release the first resource, and return the borrowed second resource.
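The completion handling described above can be sketched as a function that turns the two completion flags into a list of scheduler actions. The action strings and the function name `resolve_race` are placeholders for illustration only.

```python
from typing import List


def resolve_race(done_on_first: bool, done_on_second: bool) -> List[str]:
    """Decide what to do once the job finishes on either resource.

    Returns an empty list while neither execution has finished.
    """
    if not (done_on_first or done_on_second):
        return []  # both executions keep running
    actions: List[str] = []
    if not done_on_second:
        actions.append("stop execution on second resource")
    if not done_on_first:
        actions.append("stop execution on first resource")
    # Either way, both resources are freed once one copy has finished.
    actions += ["release first resource", "return second resource"]
    return actions


print(resolve_race(done_on_first=True, done_on_second=False))
# ['stop execution on second resource', 'release first resource', 'return second resource']
```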

Further, it is considered that when processing a plurality of target jobs to be executed, resources are generally allocated to the target jobs with higher priorities in priority according to the priorities of the target jobs. The priority may be determined according to the importance degree of the target job. Thus, when allocating resources to a plurality of target jobs, the following situations may occur: for the job corresponding to the target job with higher priority, the importance degree of the job is higher, and the job can be successfully executed and completed by priority. Thus, even if no unallocated resources (i.e., first resources) are currently available for providing to a higher priority target job, then the higher priority target job will be preferentially allocated to the first resources once a satisfactory first resource appears, i.e., such higher priority target job has a higher probability of obtaining a higher stability level first resource. In this case, if the second resource is preferentially found and allocated for such a target job of higher priority when no allocated resource is currently available, it is highly likely that the target job has already acquired the first resource of higher stability before the target job is completed by execution with the second resource, and the second resource previously allocated to the target job is a resource waste to some extent. Therefore, in consideration of the specific characteristics of resource allocation to a plurality of target jobs, in order to avoid resource waste, determination and allocation of the second resource to the target job of the type having relatively low priority may be preferentially performed.

Specifically, where the jobs to be executed include a plurality of target jobs, the resource scheduler may determine the priority of each target job according to the resource usage application for that job. When no unallocated resources are available, corresponding second resources can then be determined and allocated to the target jobs in order of priority from low to high. This avoids the resource waste that arises when a higher-priority target job is allocated a second resource and then obtains a first resource.

Fig. 5 is a flowchart of the resource allocation method provided in the present application. Although the present application provides method operation steps or apparatus structures as shown in the following embodiments or figures, the method or apparatus may include more or fewer operation steps or module units based on routine or non-inventive labor. Where steps or structures have no logically necessary causal relationship, the execution order of the steps or the module structure of the apparatus is not limited to that shown in the embodiments or drawings of the present application. When applied in an actual device or end product, the described methods or module structures may be executed sequentially or in parallel according to the embodiments or the figures (e.g., in a parallel-processor or multithreaded environment, or even in a distributed processing environment).

As shown in fig. 5, the method for allocating resources may include the following steps:

Step 501: acquiring resource usage data of allocated resources in a resource pool;

Step 502: determining a surplus resource from the allocated resources according to the resource usage data of the allocated resources;

Step 503: allocating the surplus resource to a target job awaiting resource allocation.

Specifically, in step 503, allocating the surplus resource to the target job awaiting resource allocation may include: allocating the surplus resource to a target job whose resource demand is less than or equal to the resource amount of the surplus resource.

In actual resource allocation, when unallocated resources are sufficient, resources may be allocated from the unallocated resources without resorting to surplus resources. In implementation, before the surplus resource is determined from the allocated resources according to the resource usage data of the allocated resources, resource usage data of the unallocated resources can be obtained; whether a qualifying unallocated resource exists is then determined from that data, where a qualifying unallocated resource is one whose resource amount is greater than or equal to the resource demand of the target job awaiting resource allocation; and if a qualifying unallocated resource exists, it is allocated to the target job.
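
The preference order described above (unallocated resources first, surplus resources only as a fallback) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of the application: the `Allocation` class, the `allocate` function, and the modelling of a resource as a single integer number of units are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Allocation:
    """One allocated resource (hypothetical model: a single integer unit count)."""
    owner: str    # job that currently holds the allocation
    granted: int  # units granted to the owner
    used: int     # units the owner actually uses, taken from the usage data

    @property
    def surplus(self) -> int:
        # Granted but currently unused units; never negative.
        return max(self.granted - self.used, 0)


def allocate(demand: int,
             unallocated: List[int],
             allocated: List[Allocation]) -> Optional[Tuple[str, int]]:
    """Return ("unallocated", index) or ("surplus", index), or None if nothing fits."""
    # Prefer a qualifying unallocated resource (amount >= demand).
    for index, amount in enumerate(unallocated):
        if amount >= demand:
            return ("unallocated", index)
    # Otherwise fall back to steps 501-503: look for a surplus that covers the demand.
    for index, allocation in enumerate(allocated):
        if allocation.surplus >= demand:
            return ("surplus", index)
    return None
```

With these assumptions, a job demanding 4 units is served from an unallocated resource when one of at least 4 units exists, and from the first allocated resource whose unused portion is at least 4 units otherwise.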

To consolidate resource usage information effectively, a machine node for collecting resource statistics can be deployed on the physical resources. That is, the resource usage data of the allocated resources may be acquired by a machine node preset on the physical resources.

A surplus resource is allocated only because the original job is not fully using it. Therefore, once an unallocated resource can meet the requirements of the target job, the unallocated resource can be allocated to the target job and the surplus resource returned to the original job. That is, after the surplus resource is allocated to the target job awaiting resource allocation, whether a qualifying unallocated resource exists can be detected, and if one exists, it is provided to the target job. Further, it may be determined whether the qualifying unallocated resource and the surplus resource allocated to the target job are located on the same physical resource. If they are located on the same physical resource, the surplus resource is returned to the job that holds the allocated resource to which the surplus resource belongs. If they are not located on the same physical resource, the qualifying unallocated resource is allocated to the target job so that the target job is also executed on it.

After the qualifying unallocated resource is allocated to the target job, the execution information of the target job on the surplus resource and on the qualifying unallocated resource can be obtained respectively. From this execution information, it is determined whether the target job has finished executing on at least one of the surplus resource and the qualifying unallocated resource; once it has finished on at least one of them, execution of the target job on both resources is stopped. That is, if resources are sufficient, both the surplus resource and a regular unallocated resource can be allocated to the target job and used to execute it simultaneously; when either execution completes, both resources are released, which shortens the execution time of the target job.

Further, this allocation is a secondary allocation of resources, and it would be unreasonable for it to affect the normal operation of the job to which the resources were originally allocated. To avoid disturbing that job, in actual implementation, after the surplus resource is allocated to the target job awaiting resource allocation, the resource usage data of the allocated resource to which the surplus resource belongs can be obtained; an operation state characterization parameter of that allocated resource is determined from the resource usage data; and the surplus resource is withdrawn when the operation state characterization parameter is greater than a threshold state parameter.

In one embodiment, allocating the surplus resource to a target job whose resource demand is less than or equal to the resource amount of the surplus resource may include: when a plurality of target jobs have a resource demand less than or equal to the resource amount of the surplus resource, allocating the surplus resource to the target job with the lowest priority among them. That is, a low-priority job is selected, which avoids the problems that would arise during execution if a high-priority job were selected.

In an actual implementation, if a job node receives Normal resources (i.e., regular resources) while executing in oversell form, processing may be performed as follows:

1) If the Normal resource and the oversell resource are on the same machine (i.e., on the same machine node), the job manager may inform the machine node to change the job node from oversell form to Normal form;

2) If the Normal resource and the oversell resource are not on the same machine, the job manager may start a copy of the job node on the Normal resource, so that two identical job node instances run simultaneously. The instance that finishes first is taken as the completed job node, and the other instance, still executing, is killed.
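
The two cases above can be expressed as a small decision function. The following Python sketch is illustrative only; the names `Action`, `on_normal_resource` and `resolve_race`, and the use of machine names and finish times as plain values, are assumptions introduced here and do not come from the application.

```python
from enum import Enum
from typing import Dict, List, Tuple


class Action(Enum):
    CONVERT_IN_PLACE = "change the job node from oversell form to Normal form"
    START_COPY = "start a copy on the Normal resource and keep the first to finish"


def on_normal_resource(oversell_machine: str, normal_machine: str) -> Action:
    """Decide what to do when a Normal resource arrives during oversell execution."""
    if oversell_machine == normal_machine:
        # Case 1: same machine node, so the running job node is simply relabelled.
        return Action.CONVERT_IN_PLACE
    # Case 2: different machines, so a second identical instance is started.
    return Action.START_COPY


def resolve_race(finish_times: Dict[str, float]) -> Tuple[str, List[str]]:
    """Given a finish time per instance, return (winner, instances to kill)."""
    winner = min(finish_times, key=finish_times.get)
    losers = [name for name in finish_times if name != winner]
    return winner, losers
```

In a real system the race would be resolved by the first completion event rather than by comparing known finish times; the dictionary is used here only to keep the sketch self-contained.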

The machine node is the actual executor and monitors the actual utilization of the machine's physical resources. When resource utilization is low, the machine node allows oversell job nodes to execute. When resource utilization is high, the machine node may refuse to start oversell job nodes, or even actively kill oversell job nodes that are executing, in order to ensure that Normal job nodes can execute.

Specifically, after the surplus resource is allocated to the target job awaiting resource allocation, the operation parameters of the machine node on which the surplus resource was allocated can be monitored in real time; whether that machine node exceeds a preset load threshold is determined from the operation parameters; and the allocated surplus resource is withdrawn when the preset load threshold is determined to be exceeded.

The machine node may monitor resources in multiple dimensions, including but not limited to: disk IO, network transmission, machine load, CPU usage, and memory usage. If any one dimension is at a high level, the operating environment of the machine is unstable. For each dimension, two values may be set: an early warning value and a danger value. If a dimension reaches its early warning value, the machine node can refuse to start new oversell job nodes; if a dimension reaches its danger value, the machine node actively kills some of the oversell job nodes until resource usage falls below the danger value.
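
The two-threshold policy can be sketched as follows. This Python example is a simplified illustration: the function names, the use of integer percentages, and the assumption that killing an oversell job node frees exactly its recorded per-dimension footprint are all hypothetical and are not specified by the application.

```python
from typing import Dict, List, Tuple

# dimension name -> (early warning value, danger value), as integer percentages.
# Integer percentages are an assumption chosen to keep the arithmetic exact.
Thresholds = Dict[str, Tuple[int, int]]


def can_start_oversell(usage: Dict[str, int], thresholds: Thresholds) -> bool:
    """New oversell job nodes are refused once any dimension reaches its warning value."""
    return all(usage[dim] < warn for dim, (warn, _danger) in thresholds.items())


def below_danger(usage: Dict[str, int], thresholds: Thresholds) -> bool:
    """True when every dimension is strictly below its danger value."""
    return all(usage[dim] < danger for dim, (_warn, danger) in thresholds.items())


def kill_until_safe(usage: Dict[str, int],
                    thresholds: Thresholds,
                    oversell_nodes: List[Tuple[str, Dict[str, int]]]) -> List[str]:
    """Kill oversell job nodes in the given order until usage drops below danger.

    Each entry of oversell_nodes is (name, footprint), where footprint maps a
    dimension to the usage that is assumed to be freed when the node is killed.
    """
    current = dict(usage)
    killed: List[str] = []
    for name, footprint in oversell_nodes:
        if below_danger(current, thresholds):
            break
        for dim, amount in footprint.items():
            current[dim] -= amount
        killed.append(name)
    return killed
```

The order in which oversell job nodes are killed is left to the caller here; the application does not state a particular order.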

While monitoring its own resource usage, the machine node can periodically send the resource usage of each dimension (actual usage and early warning value) to the resource manager, so that the resource manager can select machines with low actual physical resource utilization on which to start oversell job nodes. This ensures the running stability of oversell job nodes to the greatest extent.

For the selection of job nodes, job nodes are considered to have different priorities, which can be assigned according to the importance of the service; the higher the priority of a job node, the sooner the resource scheduler allocates Normal resources to it. When cluster resources are tight, job nodes of different priorities queue for resources. In this example, oversell resources are preferentially allocated to low-priority jobs. The main reason is that, when the cluster is tight, the resource scheduler cannot say precisely when a single job node will be allocated resources, but on the whole cluster resources flow to job nodes with higher priority; that is, the higher the priority of a job node, the greater its probability of being allocated resources. If a queued high-priority job node is selected for overselling, it therefore has a high probability of being allocated Normal resources while it is executing in oversell form. The best case is that the Normal resource and the oversell resource are on the same machine, which maximizes the improvement in cluster physical resource utilization. Otherwise, an identical job node is started on the machine where the Normal resource is located, and whichever of the oversell node and the Normal node finishes first, the work done by the other is discarded; cluster resource utilization rises, but from the viewpoint of the job node the extra work is "wasted".
Thus, low-priority job nodes may be preferentially selected for overselling. Such a node has a low probability of being allocated Normal resources, so the "waste" described above is unlikely. If the low-priority job node runs successfully in oversell mode, that is the best outcome; if it is killed by the machine node while running, the total time required is no longer than it would have been had the node simply waited for Normal resources and then run.

Specifically, allocating the surplus resource to the target job awaiting resource allocation may include: determining whether the job pool contains a job whose resource demand is less than or equal to the resource amount of the surplus resource; and when the job pool contains a plurality of such jobs, allocating the surplus resource to the job with the lowest priority among them.
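
A minimal Python sketch of this selection rule is given below. The `Job` class, the `pick_job_for_surplus` function, and the convention that a larger number means a higher priority are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Job:
    name: str
    priority: int  # assumption: a larger number means a higher priority
    demand: int    # resource units requested by the job


def pick_job_for_surplus(job_pool: List[Job], surplus_amount: int) -> Optional[Job]:
    """Return the lowest-priority job whose demand fits the surplus, if any."""
    candidates = [job for job in job_pool if job.demand <= surplus_amount]
    if not candidates:
        return None
    return min(candidates, key=lambda job: job.priority)
```

For example, with a surplus of 4 units and three queued jobs demanding 2, 3 and 8 units, only the first two qualify, and the one with the lower priority value is chosen.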

For the selection of oversell machine nodes, each machine node reports its actual physical resource usage and early warning value in each dimension to the resource manager. The resource manager can then compute a health score for each dimension: health score = early warning value - resource usage. If the health score of any dimension is less than or equal to 0, the machine cannot be allocated oversell job nodes; otherwise, the health scores of all dimensions are summed to give the health score of the machine. All machines that can be allocated oversell resources are sorted by machine health score, the top N (TopN) machines are selected, and oversell resources are allocated on them. In implementation, the health score of a machine may be updated once per second.
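
The health score computation and TopN selection can be sketched in Python as follows. The function names, the report format, the use of integer percentages, and the tie-breaking by machine name are assumptions introduced for this illustration.

```python
from typing import Dict, List, Optional, Tuple

# One report per machine: (usage per dimension, early warning value per dimension).
Report = Tuple[Dict[str, int], Dict[str, int]]


def machine_health(usage: Dict[str, int], warning: Dict[str, int]) -> Optional[int]:
    """Sum of (early warning value - usage) over all dimensions.

    Returns None when any per-dimension score is <= 0, meaning the machine
    cannot be allocated oversell job nodes.
    """
    scores = [warning[dim] - usage[dim] for dim in warning]
    if any(score <= 0 for score in scores):
        return None
    return sum(scores)


def top_n_machines(reports: Dict[str, Report], n: int) -> List[str]:
    """Names of the n healthiest eligible machines, healthiest first."""
    scored = []
    for name, (usage, warning) in reports.items():
        score = machine_health(usage, warning)
        if score is not None:
            scored.append((score, name))
    # Highest score first; ties broken by name (an assumption, for determinism).
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _score, name in scored[:n]]
```

A machine whose CPU usage already exceeds the CPU early warning value is excluded outright, however much headroom it has in other dimensions, which matches the rule that a single non-positive dimension disqualifies the machine.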

Specifically, when performing oversell scheduling, the machines most suitable for allocating oversell resources may be determined as follows: operation parameters of multiple dimensions are obtained for multiple machine nodes in the resource pool; the machine nodes are ranked by allocation degree according to those operation parameters; and the surplus resources within the allocated resources of a preset number of machine nodes with the highest allocation degree are taken as the determined surplus resources and allocated to the target job awaiting resource allocation. The plurality of dimensions may include, but are not limited to, at least one of: disk IO, download load, CPU utilization, memory utilization.

In actual implementation, the TopN machines that can be allocated oversell resources are determined, low-priority job nodes in the queue are selected to receive oversell resources, and each machine is allocated only one oversell job node per second. This effectively controls the pace at which oversell resources are allocated on a single machine node, and avoids oversell job nodes being killed during execution because too many were allocated at once.
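
The pacing rule (at most one oversell job node per machine per second) can be illustrated with a tick-based sketch in Python. The `dispatch_tick` function, the representation of a queued job as a `(priority, name)` tuple, and the idea that the caller invokes the function once per second are assumptions, not details given in the application.

```python
from collections import deque
from typing import Deque, List, Tuple

QueuedJob = Tuple[int, str]  # (priority, name); a smaller number is a lower priority


def dispatch_tick(top_machines: List[str],
                  queue: Deque[QueuedJob]) -> List[Tuple[str, QueuedJob]]:
    """Perform one scheduling tick, assumed to be called once per second.

    Each machine in top_machines receives at most one oversell job node,
    taken from the front of a queue ordered from lowest to highest priority.
    """
    placements = []
    for machine in top_machines:
        if not queue:
            break
        placements.append((machine, queue.popleft()))
    return placements
```

Jobs left in the queue after a tick simply wait for the next one, by which time the machine health scores and the TopN list may have been refreshed.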

In the embodiments of the present application, when no unallocated resource (i.e., first resource) is available, the resource allocation system and method search the currently allocated resources, according to the acquired current resource usage data, for surplus resources that meet the resource demand, and take them as the second resource. The second resource, which is currently unused, is temporarily reassigned to execute the target job; this solves the technical problem of low resource utilization in existing methods and achieves the technical effects of making full use of existing resources and improving overall job processing efficiency. After the second resource is allocated to the target job, a first resource with a higher reliability level is searched for and determined for the target job according to the updated current resource usage data, which ensures the stability of overall job processing. Target jobs with relatively low priority are preferentially selected to receive second resources, which reduces resource waste and further improves resource utilization. In addition, second resources with lower risk levels are preferentially selected for allocation, which reduces the risk that a target job running on a second resource is stopped during execution and improves its execution stability.

The method embodiments provided by the embodiments of the present application may be performed in a mobile terminal, a computer terminal, or a similar computing device. Taking operation on a service device as an example, fig. 6 is a block diagram of the hardware structure of a service device for a resource allocation method according to an embodiment of the present invention. As shown in fig. 6, the service device 10 may include one or more processors 102 (only one is shown in the figure; the processor 102 may include, but is not limited to, a processing device such as a microprocessor (MCU) or a programmable logic device (FPGA)), a memory 104 for storing data, and a transmission module 106 for communication functions. Those of ordinary skill in the art will appreciate that the configuration shown in fig. 6 is merely illustrative and does not limit the configuration of the electronic device described above. For example, the service device 10 may include more or fewer components than shown in fig. 6, or have a configuration different from that shown in fig. 6.

The memory 104 may be used to store software programs and modules of application software, such as the program instructions/modules corresponding to the resource allocation method in the embodiment of the present invention. The processor 102 executes the software programs and modules stored in the memory 104, thereby executing various functional applications and data processing, that is, implementing the resource allocation method described above. The memory 104 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 104 may further include memory located remotely from the processor 102, which may be connected to the service device 10 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.

The transmission module 106 is used to receive or transmit data via a network. Specific examples of the network described above may include a wireless network provided by a communication provider of the service device 10. In one example, the transmission module 106 includes a network adapter (Network Interface Controller, NIC) that can connect to other network devices through a base station so as to communicate with the internet. In one example, the transmission module 106 may be a Radio Frequency (RF) module used to communicate with the internet wirelessly.

At the software level, the above-mentioned resource allocation device may, as shown in fig. 7, include:

an obtaining module 701, configured to obtain resource usage data of allocated resources in a resource pool;

a determining module 702, configured to determine a surplus resource from the allocated resources according to the resource usage data of the allocated resources;

an allocation module 703, configured to allocate the surplus resource to a target job of the resource to be allocated.

In one embodiment, the determining module 702 may specifically obtain operation parameters of multiple dimensions for multiple machine nodes in the resource pool; rank the machine nodes by allocation degree according to those operation parameters; and take the surplus resources within the allocated resources of a preset number of machine nodes with the highest allocation degree as the determined surplus resources to be allocated to the target job awaiting resource allocation.

In one embodiment, the plurality of dimensions may include, but are not limited to, at least one of: disk IO, download load, CPU utilization, memory utilization.

In one embodiment, the allocation module 703 may specifically determine whether the job pool contains a job whose resource demand is less than or equal to the resource amount of the surplus resource; and when the job pool contains a plurality of such jobs, allocate the surplus resource to the job with the lowest priority among them.

In one embodiment, after allocating the surplus resource to the target job awaiting resource allocation, the above device may further monitor, in real time, the operation parameters of the machine node on which the surplus resource was allocated; determine from the operation parameters whether that machine node exceeds a preset load threshold; and withdraw the allocated surplus resource when the preset load threshold is determined to be exceeded.

In one embodiment, the above apparatus may further determine whether there are allocable regular resources in the resource pool after allocating the surplus resources to the target job of the resources to be allocated; in the event that it is determined that there are allocable regular resources, the regular resources are allocated to the target job.

In one embodiment, after allocating the regular resource to the target job upon determining that a regular resource exists, the above apparatus may further determine whether the regular resource and the surplus resource are located at the same machine node; convert the surplus resource to a regular resource if they are determined to be located at the same machine node; and run the target job on the surplus resource and the regular resource in parallel if they are determined not to be located at the same machine node.

In the above example, the surplus resources within the allocated resources are determined and allocated to the target job awaiting resource allocation; that is, the surplus resources within the allocated resources undergo a secondary allocation. This solves the technical problem of low resource utilization in existing methods and achieves the technical effects of making full use of resources and improving job processing efficiency.

Although the present application provides method operational steps as described in the examples or flowcharts, more or fewer operational steps may be included based on conventional or non-inventive labor. The order of steps recited in the embodiments is merely one way of performing the order of steps and does not represent a unique order of execution. When implemented by an actual device or client product, the instructions may be executed sequentially or in parallel (e.g., in a parallel processor or multi-threaded processing environment) as shown in the embodiments or figures.

The apparatus or module set forth in the above embodiments may be implemented in particular by a computer chip or entity, or by a product having a certain function. For convenience of description, the above devices are described as being functionally divided into various modules, respectively. The functions of the various modules may be implemented in the same piece or pieces of software and/or hardware when implementing the present application. Of course, a module that implements a certain function may be implemented by a plurality of sub-modules or a combination of sub-units.

The methods, apparatus or modules described herein may be implemented as computer readable program code in a controller, and the controller may be implemented in any suitable manner. For example, the controller may take the form of a microprocessor or processor together with a computer readable medium storing computer readable program code (e.g., software or firmware) executable by the (micro)processor, logic gates, switches, an application specific integrated circuit (ASIC), a programmable logic controller, or an embedded microcontroller. Examples of controllers include, but are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20, and Silicon Labs C8051F320. A memory controller may also be implemented as part of the control logic of the memory. Those skilled in the art will also appreciate that, in addition to implementing the controller purely as computer readable program code, the same functionality can be achieved by logically programming the method steps so that the controller takes the form of logic gates, switches, application specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Such a controller can therefore be regarded as a hardware component, and the means included in it for implementing various functions can be regarded as structures within the hardware component. The means for implementing the various functions may even be regarded both as software modules implementing the methods and as structures within the hardware component.

Some of the modules of the apparatus described herein may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, classes, etc. that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.

From the description of the embodiments above, it will be apparent to those skilled in the art that the present application may be implemented in software plus necessary hardware. Based on such understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art in the form of a software product, or may be embodied in the implementation of data migration. The computer software product may be stored on a storage medium, such as ROM/RAM, magnetic disk, optical disk, etc., comprising instructions for causing a computer device (which may be a personal computer, mobile terminal, server, or network device, etc.) to perform the methods described in various embodiments or portions of embodiments herein.

The embodiments in this specification are described in a progressive manner; identical or similar parts of the embodiments may be understood by reference to one another, and each embodiment focuses on its differences from the others. All or portions of the present application can be used in a number of general purpose or special purpose computer system environments or configurations. For example: personal computers, server computers, hand-held or portable devices, tablet devices, mobile communication terminals, multiprocessor systems, microprocessor-based systems, programmable electronic devices, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like.

Although the present application has been described by way of example, those of ordinary skill in the art will recognize that there are many variations and modifications of the present application without departing from the spirit of the present application, and it is intended that the appended claims encompass such variations and modifications without departing from the spirit of the present application.

Claims

1. A method for resource allocation, comprising:

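acquiring resource usage data of allocated resources in a resource pool;

determining a surplus resource from the allocated resources according to the resource usage data of the allocated resources;

allocating the surplus resource to a target job awaiting resource allocation;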

wherein determining the surplus resource from the allocated resources according to the resource usage data of the allocated resources comprises:

acquiring operation parameters of multiple dimensions of multiple machine nodes in the resource pool;

ranking the multiple machine nodes by allocation degree according to the operation parameters of the multiple dimensions of the multiple machine nodes;

and taking the surplus resources within the allocated resources of a preset number of machine nodes with the highest allocation degree as the determined surplus resources to be allocated to the target job awaiting resource allocation.

2. The method of claim 1, wherein the plurality of dimensions comprises: disk IO, download load, CPU utilization, memory utilization.

3. The method of claim 1, wherein allocating the surplus resource to a target job awaiting resource allocation comprises:

determining whether a job whose resource demand is less than or equal to the resource amount of the surplus resource exists in the job pool;

when a plurality of jobs whose resource demand is less than or equal to the resource amount of the surplus resource exist in the job pool, allocating the surplus resource to the job with the lowest priority among the plurality of jobs.

4. The method of claim 1, wherein after allocating the surplus resource to a target job awaiting resource allocation, the method further comprises:

detecting, in real time, the operation parameters of the machine node on which the surplus resource was allocated;

determining, according to the operation parameters, whether the machine node on which the surplus resource was allocated exceeds a preset load threshold;

and withdrawing the allocated surplus resource when the preset load threshold is determined to be exceeded.

5. The method of claim 1, wherein after allocating the surplus resource to a target job awaiting resource allocation, the method further comprises:

determining whether there are allocable regular resources in the resource pool;

in the event that it is determined that there are allocable regular resources, the regular resources are allocated to the target job.

6. The method of claim 5, wherein after the regular resource is allocated to the target job in the event that it is determined that an allocable regular resource exists, the method further comprises:

determining whether the regular resource and the surplus resource are located at the same machine node;

converting the surplus resource to a regular resource if they are determined to be located at the same machine node;

and running the target job on the surplus resource and the regular resource in parallel if they are determined not to be located at the same machine node.

7. A service device comprising a processor and a memory for storing processor-executable instructions that when executed by the processor implement:

acquiring resource use data of allocated resources in a resource pool;

wherein the processor determines, from the allocated resources, a remaining resource according to the resource usage data of the allocated resources, including:

8. The apparatus of claim 7, wherein the plurality of dimensions comprises: disk IO, download load, CPU utilization, memory utilization.

9. The apparatus of claim 7, wherein the processor allocating the margin resource to a target job of resources to be allocated comprises:

10. The apparatus of claim 7, wherein the processor, after allocating the margin resource to a target job of resources to be allocated, further:

11. The apparatus of claim 7, wherein the processor, after allocating the margin resource to a target job of resources to be allocated, further:

determining whether there are allocable regular resources in the resource pool;

12. The apparatus of claim 11, wherein the processor, upon determining that a regular resource exists, allocates the regular resource to the target job, and thereafter further:

13. A computer readable storage medium having stored thereon computer instructions which when executed implement the steps of the method of any of claims 1 to 6.