CN109936604A - Resource scheduling method, device and system - Google Patents

Resource scheduling method, device and system

Info

Publication number
CN109936604A
CN109936604A (application CN201711362963.6A)
Authority
CN
China
Prior art keywords
gpu, resource, task, host, candidate
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201711362963.6A
Other languages
Chinese (zh)
Other versions
CN109936604B (English)
Inventor
张皓天
苏磊
靳江明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Tusimple Technology Co Ltd
Original Assignee
Beijing Tusimple Future Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Tusimple Future Technology Co Ltd filed Critical Beijing Tusimple Future Technology Co Ltd
Priority to CN201711362963.6A
Publication of CN109936604A
Application granted
Publication of CN109936604B
Legal status: Active

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Multi Processors (AREA)

Abstract

The present invention discloses a resource scheduling method, device and system, to solve the prior-art problem of low GPU resource utilization. The method includes: monitoring the allocatable resource of each GPU in each host; when a new task is received, determining the resource demand of the new task; determining, according to the allocatable resource of each GPU in the hosts, a target GPU whose allocatable resource meets the demand; and allocating resources to the new task from the allocatable resource of the target GPU and dispatching the new task to the host where the target GPU is located. The technical solution of the present invention not only improves GPU resource utilization, but also improves task execution efficiency and speed.

Description

Resource scheduling method, device and system
Technical field
The present invention relates to the computer field, and in particular to a resource scheduling method, a resource scheduling device, and a resource scheduling system.
Background technique
Currently, distributed computing cluster systems based on the master-worker mode (such as Docker container clusters) are increasingly widely used. Such a system comprises a master-side server and multiple worker-side hosts. The master-side server receives new tasks, allocates resources to them, and dispatches tasks to the worker hosts; each worker host receives and executes the tasks dispatched to it.
In such a distributed computing cluster system, when the master-side server allocates resources to a new task, it allocates the entire resource of one or more whole GPUs (Graphics Processing Units) in a worker host to that single task; that is, one task exclusively occupies all the resources of one or more whole GPUs.
When a new task is received, the master-side server judges whether any worker-side host contains a whole GPU not yet assigned to any task; if not, it waits until a pending task finishes executing before allocating one or more whole GPUs to the new task. In actual use, however, a task rarely uses 100% of its allocated whole-GPU resource at every moment; for example, a task may use only 30% or 50% of a whole GPU's resource over a long period, leaving the remaining resource of that GPU idle. The existing allocation scheme therefore cannot fully and reasonably utilize the resource of a whole GPU, and GPU resource utilization is low.
Summary of the invention
In view of the above problems, the present invention provides a resource scheduling method, device and system, to solve the prior-art problem of low GPU resource utilization.
In a first aspect, an embodiment of the present invention provides a resource scheduling method, applied to the master-side server in a distributed computing cluster of the master-worker mode, the method comprising:
monitoring the allocatable resource of each GPU in each host;
when a new task is received, determining the resource demand of the new task;
determining, according to the allocatable resource of each GPU in the hosts, a target GPU whose allocatable resource meets the demand;
allocating resources to the new task from the allocatable resource of the target GPU, and dispatching the new task to the host where the target GPU is located.
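The master-side steps above can be sketched in code as follows; this is an illustrative sketch only, and all names (`GpuInfo`, `schedule`, the best-fit tie-breaking rule) are assumptions, not part of the disclosure:

```python
from dataclasses import dataclass

@dataclass
class GpuInfo:
    host: str          # host where this GPU is located
    gpu_id: str
    allocatable: float # allocatable resource amount monitored from the host

def find_target_gpu(gpus, demand):
    """Pick a GPU whose allocatable resource meets the demand (best fit)."""
    candidates = [g for g in gpus if g.allocatable >= demand]
    return min(candidates, key=lambda g: g.allocatable) if candidates else None

def schedule(gpus, demand):
    """Allocate `demand` from the target GPU; return the host to dispatch to."""
    target = find_target_gpu(gpus, demand)
    if target is None:
        return None              # no GPU can satisfy the demand
    target.allocatable -= demand # allocate from the allocatable resource
    return target.host           # dispatch the new task to this host

gpus = [GpuInfo("H1", "H1G1", 4.0), GpuInfo("H1", "H1G2", 8.0)]
print(schedule(gpus, 6.0))  # H1
```

Note that a task occupies only the demanded amount, not a whole GPU, which is the core difference from the prior art described in the background.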
In a second aspect, an embodiment of the present invention provides a resource scheduling method, applied to a worker-side host in a distributed computing cluster of the master-worker mode, the method comprising:
determining the allocatable resource of each GPU in the host;
sending the allocatable resource of each GPU to the master-side server;
executing the tasks dispatched by the master-side server.
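A minimal worker-side counterpart to the three steps above, with the reporting transport abstracted as a callback (function and parameter names are illustrative assumptions):

```python
def worker_step(gpu_total, gpu_allocated, report, run_task, pending):
    """One cycle of the worker loop: determine, report, execute."""
    # 1. determine the allocatable resource of each GPU on this host
    allocatable = {g: gpu_total[g] - gpu_allocated.get(g, 0.0) for g in gpu_total}
    # 2. send the allocatable resources to the master-side server
    report(allocatable)
    # 3. execute any task the master has dispatched to this host
    for task in pending:
        run_task(task)
    return allocatable

reports, done = [], []
free = worker_step({"G1": 10.0, "G2": 10.0}, {"G1": 4.0},
                   reports.append, done.append, ["taskA"])
print(free)  # {'G1': 6.0, 'G2': 10.0}
```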
In a third aspect, an embodiment of the present invention provides a resource scheduling device, arranged in the master-side server of a distributed computing cluster of the master-worker mode, the device comprising:
a monitoring unit, for monitoring the allocatable resource of each GPU in each host;
a parsing unit, for determining the resource demand of a new task when the new task is received;
a determination unit, for determining, according to the allocatable resource of each GPU in the hosts, a target GPU whose allocatable resource meets the demand;
an allocation unit, for allocating resources to the new task from the allocatable resource of the target GPU and dispatching the new task to the host corresponding to the target GPU.
In a fourth aspect, an embodiment of the present invention provides a resource scheduling device, arranged in a worker-side host of a distributed computing cluster of the master-worker mode, the device comprising:
a resource determination unit, for determining the allocatable resource of each GPU in the host;
a communication unit, for sending the allocatable resource of each GPU to the master-side server;
an execution unit, for executing the tasks dispatched by the master-side server.
In a fifth aspect, an embodiment of the present invention provides a resource scheduling system, comprising a master-side server and multiple worker-side hosts each connected to the master-side server, wherein:
the master-side server monitors the allocatable resource of each GPU in each host; when a new task is received, determines the resource demand of the new task; determines, according to the allocatable resource of each GPU in the hosts, a target GPU whose allocatable resource meets the demand; allocates resources to the new task from the allocatable resource of the target GPU, and dispatches the new task to the host corresponding to the target GPU;
each host determines the allocatable resource of its GPUs, sends the allocatable resource to the master-side server, and executes the tasks dispatched by the master-side server.
In the embodiments of the present invention, for a distributed computing cluster of the master-worker mode, the master-side server monitors the allocatable resource of each GPU in each host. When a new task is received, the server does not allocate the whole resource of a GPU to the new task; instead, it allocates, from the allocatable resource of a GPU, an amount of resource matching the demand of the new task. With this technical solution, if a GPU still has allocatable resource remaining after some of its resource has been allocated to a running task, that remaining resource can be allocated to other tasks, so that multiple tasks share the same GPU and GPU resources are fully utilized; this solves the prior-art problem of low GPU resource utilization caused by a task monopolizing a whole GPU. Moreover, because the same amount of GPU resource can serve more tasks than in the prior art, resources can be allocated to a new task promptly upon its arrival, which improves overall task execution speed and efficiency.
Detailed description of the invention
The accompanying drawings are provided for further understanding of the present invention and constitute a part of the specification; together with the embodiments of the present invention, they serve to explain the present invention and are not to be construed as limiting the present invention.
Fig. 1 is a structural schematic diagram of the resource scheduling system in an embodiment of the present invention;
Fig. 2 is a first structural schematic diagram of the resource scheduling device arranged in the master-side server in an embodiment of the present invention;
Fig. 3 is a schematic diagram of the allocatable resource amount of each GPU recorded in the resource pool in an embodiment of the present invention;
Fig. 4 is a second structural schematic diagram of the resource scheduling device arranged in the master-side server in an embodiment of the present invention;
Fig. 5 is a schematic diagram of the per-host task information maintained in the task information maintenance unit in an embodiment of the present invention;
Fig. 6 is a schematic diagram of the task information of Fig. 5 after updating;
Fig. 7 is a first structural schematic diagram of the determination unit in an embodiment of the present invention;
Fig. 8 is a second structural schematic diagram of the determination unit in an embodiment of the present invention;
Fig. 9 is a third structural schematic diagram of the determination unit in an embodiment of the present invention;
Fig. 10 is a structural schematic diagram of the resource scheduling device arranged in a worker-side host in an embodiment of the present invention;
Fig. 11 is a flowchart of the resource scheduling method applied in the master-side server in an embodiment of the present invention;
Fig. 12 is a first flowchart of an implementation of step 103 in Fig. 11;
Fig. 13 is a second flowchart of an implementation of step 103 in Fig. 11;
Fig. 14 is a third flowchart of an implementation of step 103 in Fig. 11;
Fig. 15 is a fourth flowchart of an implementation of step 103 in Fig. 11;
Fig. 16 is a fifth flowchart of an implementation of step 103 in Fig. 11;
Fig. 17 is a flowchart of the resource scheduling method applied in a worker-side host in an embodiment of the present invention.
Specific embodiment
In order to enable those skilled in the art to better understand the technical solution of the present invention, the technical solution in the embodiments of the present invention is described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only a part, rather than all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
The technical solution of the present invention is applicable to all distributed computing clusters of the master-worker mode, such as Docker container clusters and engine computing clusters. The present application places no strict limitation on the specific distributed computing cluster.
Embodiment one
Fig. 1 is a structural schematic diagram of the resource scheduling system. The resource scheduling system is a distributed computing cluster of the master-worker mode, comprising a master server and multiple worker-side hosts each communicatively connected to the master server.
The master server can realize the following functions through the master program arranged on it: monitoring, in real time or periodically, the allocatable resource of each GPU in each host; receiving a new task and parsing its task parameters to obtain the resource demand of the new task; determining, according to the allocatable resource of each GPU in each host, a target GPU whose allocatable resource meets the demand of the new task; allocating resources to the new task from the allocatable resource of the target GPU, and dispatching the new task to the host where the target GPU is located, so that that host can invoke the corresponding worker program to execute the new task.
Each worker-side host can realize the following functions through the worker program arranged on it: determining, in real time or periodically, the allocatable resource of each GPU on the host where the worker program is located, sending the allocatable resource of each GPU to the master-side server, and executing the tasks the master server dispatches to the host.
In the embodiments of the present invention, there are many mechanisms by which a host may send the allocatable resource of each of its GPUs to the master-side server, and the present application places no strict limitation on them. For example, the worker program may periodically and actively synchronize the allocatable resource of each GPU on its host to the master-side server. As another example, the master-side server may periodically send resource acquisition requests to each host, and the worker program in each host, upon receiving such a request, sends the allocatable resource of each GPU on its host to the master-side server. As yet another example, the master-side server may periodically poll the hosts, and when a host is polled, its worker program sends the allocatable resource of each GPU on that host to the master-side server.
To facilitate further understanding of the technical solution of the present invention, the technical solution is described in detail below from the perspective of the master-side server and of the worker-side host respectively.
Embodiment two
The master program in the master-side server can realize the aforementioned functions through its subprogram scheduler (i.e. the resource scheduler). The structure of this resource scheduling device is shown in Fig. 2 and may include a monitoring unit 11, a parsing unit 12, a determination unit 13 and an allocation unit 14, wherein:
The monitoring unit 11 is configured to monitor the allocatable resource of each GPU in each host.
The parsing unit 12 is configured to determine the resource demand of a new task when the new task is received.
In the embodiments of the present invention, when the parsing unit 12 receives a new task, it parses the task parameters of the new task according to preset parsing rules to obtain the resource demand of the new task. For example, the task parameters may include the identity information of the new task (such as its name or ID) and the GPU resource information the new task needs (the GPU resource information includes the number of GPUs and the resource amount to be occupied on each GPU).
The determination unit 13 is configured to determine, according to the allocatable resource of each GPU in the hosts, a target GPU whose allocatable resource meets the demand.
The allocation unit 14 is configured to allocate resources to the new task from the allocatable resource of the target GPU and dispatch the new task to the host corresponding to the target GPU.
In the embodiments of the present invention, the monitoring unit 11 can monitor the allocatable resource of each GPU in each host in, but not limited to, the following manner:
The monitoring unit 11 establishes a resource pool and dynamically records in it the allocatable resource amount of each GPU in each host. As shown in Fig. 3, a host (denoted H1) includes three GPUs (denoted H1G1, H1G2 and H1G3), whose allocatable resource amounts are N11, N12 and N13 respectively. When the monitoring unit 11 receives from a host the allocatable resource of each GPU on that host, it updates the corresponding allocatable resource amounts in the resource pool accordingly.
Of course, those skilled in the art can also monitor the allocatable resource of each GPU in each host by other means, for example by establishing a dynamic list that records the allocatable resource amount of each GPU in each host and maintaining the information in the dynamic list in real time or periodically.
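A minimal resource-pool sketch along the lines of Fig. 3: the master keeps, per host, the allocatable amount of every GPU and overwrites entries whenever a host reports fresh values. The dict structure and names are assumptions for illustration:

```python
# resource pool: host -> {gpu_id -> allocatable resource amount}
resource_pool = {"H1": {"H1G1": 10.0, "H1G2": 10.0, "H1G3": 10.0}}

def on_host_report(pool, host, reported):
    """Update the pool from a host's per-GPU allocatable-resource report."""
    pool.setdefault(host, {}).update(reported)

on_host_report(resource_pool, "H1", {"H1G1": 6.5})
print(resource_pool["H1"]["H1G1"])  # 6.5
```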
In the embodiments of the present invention, the determination unit 13 determines, from the allocatable resource of each GPU in each host obtained by the monitoring unit 11, a target GPU whose allocatable resource meets the resource demand of the new task.
Preferably, to keep the allocatable resource amount of each GPU in the resource pool up to date, after the allocation unit 14 allocates resources to a new task from the allocatable resource of the target GPU, it synchronizes the target GPU and the resource amount allocated to the new task to the monitoring unit 11, and the monitoring unit 11 promptly updates the allocatable resource amount of the target GPU. Taking H1G1 in Fig. 3 as the target GPU: before resources are allocated to the new task, the allocatable resource amount of H1G1 is N11; after the allocation unit 14 allocates an amount M1 from the allocatable resource of the target GPU to the new task, the allocatable resource of the target GPU becomes N11-M1.
Preferably, to further keep track of the task information in each host in a timely manner, the resource scheduling device may further include a task information maintenance unit 15, as shown in Fig. 4, wherein:
The task information maintenance unit 15 is configured to record the task information of each host, the task information including all tasks in execution on the host and the GPU resource information allocated to each task in execution; the GPU resource information includes the GPUs corresponding to the task and the resource amount occupied on each GPU.
As shown in Fig. 5, host H1 includes three GPUs (denoted H1G1, H1G2 and H1G3) and two tasks (denoted task A1 and task A2), wherein: task A1 corresponds to H1G1 and H1G2, with H1G1 allocating resource amount M11 to task A1 and H1G2 allocating M12; task A2 corresponds to H1G3, with H1G3 allocating resource amount M21 to task A2.
Preferably, to keep the task information of each host up to date, in the embodiments of the present invention, after the allocation unit 14 allocates resources to a new task from the allocatable resource of the target GPU, it synchronizes the target GPU and the resource amount allocated to the new task to the task information maintenance unit 15, so that the unit promptly updates the task information of the host where the target GPU is located. Taking H1G2 in Fig. 5 as the target GPU and denoting the new task as task 3, the updated task information of host H1 is as shown in Fig. 6: task 3 is added, task 3 corresponds to H1G2, and the resource amount H1G2 allocates to task 3 is M31.
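The per-host task-information table of Figs. 5 and 6 can be sketched as a nested mapping; the concrete numbers and the `assign` helper are illustrative assumptions:

```python
# task information: host -> {task -> {gpu_id -> allocated resource amount}}
task_info = {"H1": {"A1": {"H1G1": 6.0, "H1G2": 3.0},   # M11, M12
                    "A2": {"H1G3": 5.0}}}               # M21

def assign(info, host, task, gpu, amount):
    """Record that `gpu` on `host` allocates `amount` to `task` (the Fig. 6 update)."""
    info.setdefault(host, {}).setdefault(task, {})[gpu] = amount

assign(task_info, "H1", "task3", "H1G2", 2.0)  # the M31 allocation to task 3
print(sorted(task_info["H1"]))  # ['A1', 'A2', 'task3']
```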
Preferably, in the embodiments of the present invention, after a host finishes executing a task, it releases the resources of that task and synchronizes the completion status of the task and the resource information it occupied to the monitoring unit 11 and the task information maintenance unit 15, so that both units can update their information.
In the embodiments of the present invention, the allocatable resource of a GPU may be the idle resource of the GPU, the shareable resource of the GPU, or both. The idle resource of a GPU refers to the resource of the GPU not yet allocated to any task in execution; the shareable resource of a GPU refers to the part of the resource already allocated to tasks in execution that is predicted not to be used by those tasks within a period of time. For example, taking H1G1 in Fig. 5, suppose the total resource of H1G1 is N1 and H1G1 currently carries task A1 and task A2, with resource amounts M11 and M12 allocated to them respectively; if task A1 only occupies M11' within a period of time and task A2 only occupies M12', then the idle resource of H1G1 is N1-M11-M12, and the shareable resource of H1G1 includes (M11-M11') and (M12-M12'). This is described below with Example 1, Example 2 and Example 3.
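A worked version of the H1G1 example above, with concrete (illustrative) numbers standing in for N1, M11, M12, M11' and M12':

```python
def idle_resource(total, allocations):
    """Idle = total resource minus everything allocated: N1 - M11 - M12."""
    return total - sum(allocations.values())

def shareable_resource(allocations, observed_peak):
    """Shareable = (M11 - M11') + (M12 - M12')."""
    return sum(allocations[t] - observed_peak[t] for t in allocations)

N1 = 16.0
alloc = {"A1": 6.0, "A2": 4.0}   # M11, M12
peak = {"A1": 2.0, "A2": 3.0}    # M11', M12' (observed usage over a period)
print(idle_resource(N1, alloc))        # 6.0
print(shareable_resource(alloc, peak)) # 5.0
```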
Example 1
In Example 1, the allocatable resource of a GPU is its idle resource. The structure of the determination unit 13 is shown in Fig. 7 and includes a judging subunit 131 and a determining subunit 132, wherein:
The judging subunit 131 is configured to judge whether any GPU in the hosts is a candidate GPU whose allocatable resource is greater than or equal to the demand; if a candidate GPU exists, it triggers the determining subunit 132.
The determining subunit 132 is configured to choose one GPU from the candidate GPUs as the target GPU.
The determining subunit 132 may randomly select a GPU from the candidate GPUs as the target GPU, or may select the candidate GPU with the least allocatable resource as the target GPU; the present application places no strict limitation on this.
Preferably, when the judging subunit 131 determines that no candidate GPU exists and the new task has a high priority, to ensure that high-priority tasks can be executed in time, the judging subunit 131 is further configured to: if no candidate GPU exists, judge whether the hosts contain a preemptible task whose priority is lower than that of the new task and whose allocated resource is greater than or equal to the demand; if such preemptible tasks exist, select one of them as the target task, allocate the resource of the target task to the new task, and dispatch the new task to the host where the target task is located. If no preemptible task exists, the new task is placed in a preset blocking pool to wait for resource allocation.
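The preemption fallback above can be sketched as follows. The task-tuple layout, the convention that a larger number means a higher priority, and the lowest-priority-victim tie-break are all assumptions for illustration:

```python
def find_preemptible(running_tasks, new_priority, demand):
    """running_tasks: list of (name, priority, allocated_resource).
    Return a task with lower priority than the new task whose allocated
    resource covers the demand, or None if no such task exists."""
    victims = [t for t in running_tasks
               if t[1] < new_priority and t[2] >= demand]
    # prefer the lowest-priority victim to minimise disruption
    return min(victims, key=lambda t: t[1]) if victims else None

tasks = [("A1", 3, 8.0), ("A2", 1, 5.0), ("A3", 1, 2.0)]
print(find_preemptible(tasks, 2, 4.0))  # ('A2', 1, 5.0)
```

When `find_preemptible` returns `None`, the new task would be queued in the blocking pool as described above.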
Example 2
In Example 2, the allocatable resource of a GPU is its shareable resource. The structure of the determination unit 13 is as shown in Fig. 7, including the judging subunit 131 and the determining subunit 132, whose specific functions are as in Example 1 and are not repeated here.
Preferably, since the shareable resource of a GPU is part of the resource already allocated to a task in execution on that GPU, and the resource amount that task needs may grow over time, to ensure that the task in execution can finish smoothly, the embodiments of the present invention provide that the shareable resource of a GPU can only be allocated to a new task whose priority is lower than that of every task in execution on that GPU. Therefore, in Example 2, the determining subunit 132 chooses one GPU from the candidate GPUs as the target GPU specifically by choosing, from the candidate GPUs, a GPU on which every task in execution has a priority higher than that of the new task.
Preferably, when the judging subunit 131 determines that no candidate GPU exists and the new task has a high priority, to ensure that high-priority tasks can be executed in time, the judging subunit 131 is further configured to: if no candidate GPU exists, judge whether the hosts contain a preemptible task whose priority is lower than that of the new task and whose allocated resource is greater than or equal to the demand; if such preemptible tasks exist, select one of them as the target task, allocate the resource of the target task to the new task, and dispatch the new task to the host where the target task is located; if no preemptible task exists, place the new task in the preset blocking pool to wait for resource allocation.
In the embodiments of the present invention, the master program may periodically select from the blocking pool the task with the highest priority or the task that has waited in the blocking pool the longest, and send the selected task to the parsing unit 12 as a new task.
Example 3
In Example 3, the allocatable resource of a GPU comprises both its idle resource and its shareable resource. The structure of the determination unit 13 is shown in Fig. 8 and includes a first judging subunit 133, a first determining subunit 134, a second judging subunit 135 and a second determining subunit 136, wherein:
The first judging subunit 133 is configured to judge whether any GPU in the hosts is a first candidate GPU whose idle resource is greater than or equal to the demand; if a first candidate GPU exists, it triggers the first determining subunit 134; otherwise, it triggers the second judging subunit 135.
The first determining subunit 134 is configured to choose one GPU from the first candidate GPUs as the target GPU.
The second judging subunit 135 is configured to judge whether any GPU in the hosts is a second candidate GPU whose shareable resource is greater than or equal to the demand; if a second candidate GPU exists, it triggers the second determining subunit 136.
The second determining subunit 136 is configured to choose one GPU from the second candidate GPUs as the target GPU.
Preferably, the second determining subunit 136 is specifically configured to choose, from the second candidate GPUs, a GPU on which every task in execution has a priority higher than that of the new task as the target GPU.
Preferably, when the first judging subunit 133 determines that no first candidate GPU exists and the new task has a high priority, to ensure that high-priority tasks can be executed in time, the first judging subunit 133 is further configured to: judge whether the hosts contain a preemptible task whose priority is lower than that of the new task and whose allocated resource is greater than or equal to the demand; if such preemptible tasks exist, select one of them as the target task, allocate the resource of the target task to the new task, and dispatch the new task to the host where the target task is located; if no preemptible task exists, trigger the second judging subunit 135.
Preferably, to ensure that a new task can be executed in time, the determination unit 13 may further include a third judging subunit 137 and a third determining subunit 138, as shown in Fig. 9, wherein:
The second judging subunit 135 is further configured to trigger the third judging subunit 137 if no second candidate GPU exists.
The third judging subunit 137 is configured to judge whether any GPU in the hosts is a third candidate GPU the sum of whose idle resource and shareable resource is greater than or equal to the demand; if a third candidate GPU exists, it triggers the third determining subunit 138; if no third candidate GPU exists, the new task is placed in the preset blocking pool to wait for resource allocation.
The third determining subunit 138 is configured to choose one GPU from the third candidate GPUs as the target GPU.
Preferably, in the embodiments of the present invention, the third determining subunit 138 is specifically configured to choose, from the third candidate GPUs, a GPU on which every task in execution has a priority higher than that of the new task as the target GPU.
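The three-stage selection of Example 3 can be sketched as follows: try idle resource first, then shareable resource, then the sum of both; otherwise the task goes to the blocking pool. The dict-based GPU records and the first-match selection rule are illustrative assumptions:

```python
def pick_target(gpus, demand):
    """Return (gpu_id, stage) for the first GPU satisfying the demand,
    checking idle, then shareable, then their sum, in that order."""
    for stage in ("idle", "shareable", "both"):
        for g in gpus:
            amount = (g["idle"] + g["shareable"]) if stage == "both" else g[stage]
            if amount >= demand:
                return g["id"], stage
    return None, "blocked"  # no GPU qualifies; queue in the blocking pool

gpus = [{"id": "H1G1", "idle": 1.0, "shareable": 3.0},
        {"id": "H1G2", "idle": 2.0, "shareable": 1.0}]
print(pick_target(gpus, 4.0))  # ('H1G1', 'both')
```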
Embodiment three
In the third embodiment of the present invention, the worker program in a worker-side host can realize its functions through a resource scheduling device as shown in Fig. 10, which includes a resource determination unit 21, a communication unit 22 and an execution unit 23, wherein:
The resource determination unit 21 is configured to determine the allocatable resource of each GPU in the host.
The communication unit 22 is configured to send the allocatable resource of each GPU to the master-side server.
The execution unit 23 is configured to execute the tasks the master-side server dispatches to the host.
Preferably, in the third embodiment, the allocatable resource of a GPU may be the idle resource of the GPU, the shareable resource of the GPU, or both.
In one example, the allocatable resource of a GPU is its idle resource; the resource determination unit 21 is then specifically configured to monitor, for each GPU in the host, the idle resource not yet allocated to any task in execution, and take the idle resource as the allocatable resource.
In another example, the allocatable resource of a GPU is its shareable resource; the resource determination unit 21 is then specifically configured to predict, for each GPU in the host, the shareable resource within the resource already allocated to tasks in execution that those tasks will not use within a period of time, and take the shareable resource as the allocatable resource.
In yet another example, the allocatable resource of a GPU comprises both its idle resource and its shareable resource; the resource determination unit 21 is then specifically configured to monitor, for each GPU in the host, the idle resource not yet allocated to any task in execution, to predict the shareable resource within the resource already allocated to tasks in execution that those tasks will not use within a period of time, and to take the idle resource and the shareable resource together as the allocatable resource.
In the embodiments of the present invention, the resource determination unit 21 may predict the shareable resource as follows: by monitoring the resource utilization of each task in execution on each GPU over a historical period, it predicts the resource utilization of each task in execution on the GPU over a future period, and takes the resource predicted to be unused in the future period as the shareable resource. For example, suppose a GPU carries a task A in execution with an allocated GPU resource amount M, and monitoring shows that the resource utilization of task A is always below 50% within a period T; it can then be predicted that the resource utilization of task A will still not exceed 50% in the following period, so 50% of the resource amount M allocated to task A is taken as the shareable resource for the following period.
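The prediction above can be sketched as follows; treating the observed peak utilization over the window as the bound on future utilization mirrors the task-A example, while the sampling window handling is an assumption:

```python
def predict_shareable(allocated, utilisation_history):
    """utilisation_history: utilisation fractions (0..1) sampled over period T.
    Assume future utilisation stays below the observed peak; the rest of
    the allocation is shareable for the following period."""
    peak = max(utilisation_history)
    return allocated * (1.0 - peak)

# task A: allocated amount M = 10.0, utilisation always below 50% during T
print(predict_shareable(10.0, [0.31, 0.44, 0.50, 0.28]))  # 5.0
```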
Preferably, execution unit 23 is specifically configured to: upon receiving a first instruction to execute a new task using the idle resource of a target GPU, execute the new task using the idle resource of the target GPU; and upon receiving a second instruction to execute a new task using the shareable resource of a target GPU, execute the new task using the shareable resource of the target GPU.
Preferably, execution unit 23 is further configured to: when it is detected that a high-priority task on a GPU needs to use more resource, stop a low-priority task running on the GPU, and allocate the shareable resource that had been allocated to the low-priority task to the high-priority task.
Embodiment Four
Based on the resource scheduling apparatus shown in Embodiment Two above, Embodiment Four of the present invention provides a resource scheduling method. The method is applied to the master-side server in a distributed computing cluster in master-worker mode. The flowchart of the method is shown in FIG. 11, and the method comprises:
Step 101: monitor the allocatable resource of each GPU in each host.
Step 102: when a new task is received, determine the demand resource corresponding to the new task.
Step 103: according to the allocatable resource of each GPU in the hosts, determine a target GPU whose allocatable resource meets the demand resource.
Step 104: allocate resource to the new task from the allocatable resource of the target GPU, and distribute the new task to the host where the target GPU resides.
In a specific example, the allocatable resource is the idle resource of a GPU, or the allocatable resource is the shareable resource of a GPU. A specific implementation of step 103 can then be as shown in FIG. 12, comprising:
Step A1: judge whether any GPU in the hosts is a candidate GPU whose allocatable resource is greater than or equal to the demand resource; if a candidate GPU exists, perform step A2.
Step A2: select one GPU from the candidate GPUs as the target GPU.
Preferably, when the allocatable resource is the shareable resource of a GPU, step A2 specifically comprises: selecting, from the candidate GPUs, a GPU on which every executing task has a higher priority than the new task, as the target GPU.
Preferably, the flow shown in FIG. 11 further comprises performing steps A3 to A5 when no candidate GPU exists in step A1, as shown in FIG. 13:
Step A3: judge whether the hosts contain a preemptible task whose priority is lower than that of the new task and whose allocated resource is greater than or equal to the demand resource; if a preemptible task exists, perform step A4; if no preemptible task exists, perform step A5.
Step A4: select one target task from the preemptible tasks, allocate the resource of the target task to the new task, and distribute the new task to the host where the target task resides.
Step A5: put the new task into a preset blocking pool to wait for resource allocation.
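The decision flow of steps A1 to A5 can be sketched as below. This is a minimal illustration under assumed data shapes (dictionaries with `demand`, `priority`, `allocatable`, `allocated` fields); the figures, not this sketch, define the authoritative flow.

```python
# Hypothetical sketch of steps A1-A5: pick a candidate GPU whose allocatable
# resource covers the demand (A1/A2); otherwise preempt a lower-priority task
# with enough allocated resource (A3/A4); otherwise block the task (A5).

def schedule(new_task, gpus, running_tasks, blocking_pool):
    demand = new_task["demand"]
    # Step A1/A2: any GPU with enough allocatable resource?
    candidates = [g for g in gpus if g["allocatable"] >= demand]
    if candidates:
        return ("assign", candidates[0]["id"])
    # Step A3/A4: any preemptible task (lower priority, enough resource)?
    preemptible = [t for t in running_tasks
                   if t["priority"] < new_task["priority"]
                   and t["allocated"] >= demand]
    if preemptible:
        return ("preempt", preemptible[0]["id"])
    # Step A5: wait in the preset blocking pool for resource allocation.
    blocking_pool.append(new_task)
    return ("blocked", None)

pool = []
gpus = [{"id": "gpu0", "allocatable": 2}, {"id": "gpu1", "allocatable": 6}]
tasks = [{"id": "t0", "priority": 1, "allocated": 4}]
new = {"id": "t9", "priority": 5, "demand": 4}
assert schedule(new, gpus, tasks, pool) == ("assign", "gpu1")
```

Which candidate GPU to take first among several is left open by the flow; the sketch simply takes the first one found.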
In another example, the allocatable resource comprises the idle resource and the shareable resource of a GPU. A specific implementation of step 103 can then be as shown in FIG. 14, comprising:
Step B1: judge whether any GPU in the hosts is a first candidate GPU whose idle resource is greater than or equal to the demand resource; if a first candidate GPU exists, perform step B2; if no first candidate GPU exists, perform step B3.
Step B2: select one GPU from the first candidate GPUs as the target GPU.
Step B3: judge whether any GPU in the hosts is a second candidate GPU whose shareable resource is greater than or equal to the demand resource; if a second candidate GPU exists, perform step B4.
Step B4: select one GPU from the second candidate GPUs as the target GPU.
Step B4 can specifically comprise: selecting, from the second candidate GPUs, a GPU on which every executing task has a higher priority than the new task, as the target GPU.
Preferably, to ensure that high-priority tasks can be executed in time, steps B5 and B6 may be performed before step B3 of the flow shown in FIG. 14, as shown in FIG. 15:
Step B5: judge whether the hosts contain a preemptible task whose priority is lower than that of the new task and whose allocated resource is greater than or equal to the demand resource; if a preemptible task exists, perform step B6; if no preemptible task exists, perform step B3.
Step B6: select one target task from the preemptible tasks, allocate the resource of the target task to the new task, and distribute the new task to the host where the target task resides.
Preferably, the flows shown in FIG. 14 and FIG. 15 may further comprise steps B7 and B8 when no second candidate GPU exists; FIG. 16 shows the flow of FIG. 15 extended with steps B7 and B8:
Step B7: judge whether any GPU in the hosts is a third candidate GPU for which the sum of idle resource and shareable resource is greater than or equal to the demand resource; if a third candidate GPU exists, perform step B8; if no third candidate GPU exists, put the new task into the blocking pool to wait for resource allocation.
Step B8: select one GPU from the third candidate GPUs as the target GPU.
Preferably, step B8 specifically comprises: selecting, from the third candidate GPUs, a GPU on which every executing task has a higher priority than the new task, as the target GPU.
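The tiered selection of steps B1 to B8 — idle resource first, then shareable resource, then their sum, and finally the blocking pool — can be sketched as follows. Field names (`idle`, `shareable`) and the first-match choice within each tier are assumptions for illustration.

```python
# Hypothetical sketch of steps B1-B8: try first-candidate GPUs (idle resource
# alone suffices), then second-candidate GPUs (shareable resource suffices),
# then third-candidate GPUs (idle + shareable suffice); otherwise the new
# task waits in the blocking pool.

def pick_target(demand, gpus, blocking_pool, new_task):
    # Step B1/B2: first-candidate GPUs.
    for g in gpus:
        if g["idle"] >= demand:
            return g["id"]
    # Step B3/B4: second-candidate GPUs.
    for g in gpus:
        if g["shareable"] >= demand:
            return g["id"]
    # Step B7/B8: third-candidate GPUs.
    for g in gpus:
        if g["idle"] + g["shareable"] >= demand:
            return g["id"]
    blocking_pool.append(new_task)
    return None

gpus = [{"id": "gpu0", "idle": 1, "shareable": 2},
        {"id": "gpu1", "idle": 2, "shareable": 3}]
pool = []
assert pick_target(2, gpus, pool, {"id": "t1"}) == "gpu1"   # idle tier
assert pick_target(3, gpus, pool, {"id": "t2"}) == "gpu1"   # shareable tier
assert pick_target(4, gpus, pool, {"id": "t3"}) == "gpu1"   # combined tier
assert pick_target(9, gpus, pool, {"id": "t4"}) is None and pool
```

Preferring idle resource over shareable resource keeps tasks off GPUs where they would contend with the predicted slack of other tasks.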
Embodiment Five
Based on the same idea as the resource scheduling apparatus provided in Embodiment Three above, Embodiment Five of the present invention provides a resource scheduling method. The method is applied to a worker-side host in a distributed computing cluster in master-worker mode. As shown in FIG. 17, the method comprises:
Step 201: determine the allocatable resource of each GPU in the host.
Step 202: send the allocatable resource of each GPU to the master-side server.
Step 203: execute the task distributed to the host by the master-side server.
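The worker-side loop of steps 201 to 203 can be sketched as below. The transport, message format, and callback names are assumptions; the sketch only shows the report-then-execute cycle.

```python
# Hypothetical sketch of one worker-side iteration: report each GPU's
# allocatable resource to the master-side server (steps 201-202), then run
# whatever task the master distributed to this host (step 203).

def worker_iteration(gpus, send_to_master, receive_task, run):
    # Step 201: determine allocatable resource per GPU (idle + shareable here).
    report = {g["id"]: g["idle"] + g["shareable"] for g in gpus}
    # Step 202: send the report to the master-side server.
    send_to_master(report)
    # Step 203: execute the task the master distributed to this host, if any.
    task = receive_task()
    if task is not None:
        run(task)
    return report

sent, ran = [], []
gpus = [{"id": "gpu0", "idle": 3, "shareable": 1}]
report = worker_iteration(gpus, sent.append,
                          lambda: {"id": "t1"}, ran.append)
assert report == {"gpu0": 4} and sent == [report] and ran == [{"id": "t1"}]
```

In a real deployment the report would be refreshed periodically so that the master's view of each GPU tracks the actual utilization of executing tasks.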
In one example, the allocatable resource of a GPU is its idle resource, and a specific implementation of step 201 is as follows: monitor, for each GPU in the host, the idle resource that has not yet been assigned to any executing task, and take the idle resource as the allocatable resource.
In another example, the allocatable resource of a GPU is its shareable resource, and a specific implementation of step 201 can be as follows: predict, for each GPU in the host, the shareable resource that executing tasks will not use within a coming period of time out of the resource already allocated to them, and take the shareable resource as the allocatable resource.
In yet another example, the allocatable resource of a GPU is its idle resource together with its shareable resource, and a specific implementation of step 201 can be as follows: monitor, for each GPU in the host, the idle resource that has not yet been assigned to any executing task; predict, for each GPU in the host, the shareable resource that executing tasks will not use within a coming period of time out of the resource already allocated to them; and take the idle resource and the shareable resource as the allocatable resource.
Preferably, step 203 specifically comprises: upon receiving a first instruction to execute a new task using the idle resource of a target GPU, executing the new task using the idle resource of the target GPU; and upon receiving a second instruction to execute a new task using the shareable resource of a target GPU, executing the new task using the shareable resource of the target GPU.
Preferably, step 203 further comprises: when it is detected that a high-priority task on a GPU needs to use more resource, stopping a low-priority task on the GPU, and allocating the shareable resource that had been allocated to the low-priority task to the high-priority task.
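The preemption rule above can be sketched as follows. The victim-selection policy (lowest priority first) and the field names are assumptions; the patent only requires that some lower-priority task be stopped and its shareable resource handed over.

```python
# Hypothetical sketch of worker-side preemption: when a high-priority task
# needs more resource, stop one lower-priority task on the same GPU and
# reassign its shareable resource to the high-priority task.

def preempt_for(high_task, gpu_tasks):
    """Stop one lower-priority running task and return the reclaimed resource."""
    victims = [t for t in gpu_tasks
               if t["priority"] < high_task["priority"] and t["running"]]
    if not victims:
        return 0
    victim = min(victims, key=lambda t: t["priority"])  # lowest priority first
    victim["running"] = False               # stop the low-priority task
    high_task["resource"] += victim["shared"]  # hand over its shareable resource
    return victim["shared"]

tasks = [{"id": "lo", "priority": 1, "running": True, "shared": 2},
         {"id": "hi", "priority": 9, "running": True, "shared": 0,
          "resource": 4}]
reclaimed = preempt_for(tasks[1], tasks)
assert reclaimed == 2 and tasks[0]["running"] is False
assert tasks[1]["resource"] == 6
```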
The basic principles of the present invention have been described above in conjunction with specific embodiments. However, it should be noted that, as those of ordinary skill in the art will understand, all or any of the steps or components of the method and apparatus of the present invention can be implemented in any computing device (including processors, storage media, and the like) or in a network of computing devices, in hardware, firmware, software, or a combination thereof; this can be achieved by those of ordinary skill in the art using their basic programming skills after reading the description of the present invention.
Those of ordinary skill in the art will understand that all or part of the steps of the above method embodiments can be completed by a program instructing the relevant hardware. The program can be stored in a computer-readable storage medium; when executed, the program performs one or a combination of the steps of the method embodiments.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing module, or each unit may exist physically alone, or two or more units may be integrated into one module. The integrated module may be implemented in the form of hardware or in the form of a software functional module. When implemented in the form of a software functional module and sold or used as an independent product, the integrated module may also be stored in a computer-readable storage medium.
Those skilled in the art should understand that the embodiments of the present invention may be provided as a method, a system, or a computer program product. Therefore, the present invention may take the form of a pure hardware embodiment, a pure software embodiment, or an embodiment combining software and hardware. Moreover, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including, but not limited to, disk storage and optical storage) containing computer-usable program code.
The present invention is described with reference to flowcharts and/or block diagrams of the method, device (system), and computer program product according to the embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or another programmable data processing device to work in a particular manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus that implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or another programmable data processing device, so that a series of operation steps are performed on the computer or other programmable device to produce computer-implemented processing, and the instructions executed on the computer or other programmable device thus provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Although the above embodiments of the present invention have been described, once those skilled in the art learn the basic inventive concept, additional changes and modifications can be made to these embodiments. Therefore, the appended claims are intended to be construed as covering the above embodiments and all changes and modifications falling within the scope of the present invention.
Obviously, those skilled in the art can make various changes and modifications to the present invention without departing from the spirit and scope of the present invention. Thus, if these modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalent technologies, the present invention is also intended to cover these modifications and variations.

Claims (25)

1. A resource scheduling method, characterized in that the method is applied to a master-side server in a distributed computing cluster in master-worker mode, the method comprising:
monitoring the allocatable resource of each graphics processing unit (GPU) in each host;
when a new task is received, determining the demand resource corresponding to the new task;
determining, according to the allocatable resource of each GPU in the hosts, a target GPU whose allocatable resource meets the demand resource;
allocating resource to the new task from the allocatable resource of the target GPU, and distributing the new task to the host where the target GPU resides.
2. The method according to claim 1, characterized in that the allocatable resource is the idle resource of a GPU, or the allocatable resource is the shareable resource of a GPU;
determining, according to the allocatable resource of each GPU in the hosts, a target GPU whose allocatable resource meets the demand resource specifically comprises:
judging whether any GPU in the hosts is a candidate GPU whose allocatable resource is greater than or equal to the demand resource;
if a candidate GPU exists, selecting one GPU from the candidate GPUs as the target GPU.
3. The method according to claim 2, characterized in that, when the allocatable resource is the shareable resource of a GPU, selecting one GPU from the candidate GPUs as the target GPU specifically comprises:
selecting, from the candidate GPUs, a GPU on which every executing task has a higher priority than the new task, as the target GPU.
4. The method according to claim 2 or 3, characterized in that the method further comprises:
if no candidate GPU exists: judging whether the hosts contain a preemptible task whose priority is lower than that of the new task and whose allocated resource is greater than or equal to the demand resource; if a preemptible task exists, selecting one target task from the preemptible tasks, allocating the resource of the target task to the new task, and distributing the new task to the host where the target task resides.
5. The method according to claim 1, characterized in that the allocatable resource comprises the idle resource and the shareable resource of a GPU;
determining, according to the allocatable resource of each GPU in the hosts, a target GPU whose allocatable resource meets the demand resource specifically comprises:
judging whether any GPU in the hosts is a first candidate GPU whose idle resource is greater than or equal to the demand resource;
if a first candidate GPU exists, selecting one GPU from the first candidate GPUs as the target GPU;
if no first candidate GPU exists, judging whether any GPU in the hosts is a second candidate GPU whose shareable resource is greater than or equal to the demand resource;
if a second candidate GPU exists, selecting one GPU from the second candidate GPUs as the target GPU.
6. The method according to claim 5, characterized in that, before judging whether any GPU in the hosts is a second candidate GPU whose shareable resource is greater than or equal to the demand resource, the method further comprises:
judging whether the hosts contain a preemptible task whose priority is lower than that of the new task and whose allocated resource is greater than or equal to the demand resource;
if a preemptible task exists, selecting one target task from the preemptible tasks, allocating the resource of the target task to the new task, and distributing the new task to the host where the target task resides;
if no preemptible task exists, performing the step of judging whether any GPU in the hosts is a second candidate GPU whose shareable resource is greater than or equal to the demand resource.
7. The method according to claim 5 or 6, characterized in that the method further comprises:
if no second candidate GPU exists: judging whether any GPU in the hosts is a third candidate GPU for which the sum of idle resource and shareable resource is greater than or equal to the demand resource; if a third candidate GPU exists, selecting one GPU from the third candidate GPUs as the target GPU.
8. The method according to claim 5 or 6, characterized in that selecting one GPU from the second candidate GPUs as the target GPU specifically comprises: selecting, from the second candidate GPUs, a GPU on which every executing task has a higher priority than the new task, as the target GPU;
and/or selecting one GPU from the third candidate GPUs as the target GPU specifically comprises: selecting, from the third candidate GPUs, a GPU on which every executing task has a higher priority than the new task, as the target GPU.
9. A resource scheduling method, characterized in that the method is applied to a worker-side host in a distributed computing cluster in master-worker mode, the method comprising:
determining the allocatable resource of each graphics processing unit (GPU) in the host;
sending the allocatable resource of each GPU to a master-side server;
executing the task distributed to the host by the master-side server.
10. The method according to claim 9, characterized in that determining the allocatable resource of each GPU in the host specifically comprises:
monitoring, for each GPU in the host, the idle resource that has not yet been assigned to any executing task, and taking the idle resource as the allocatable resource;
and/or predicting, for each GPU in the host, the shareable resource that executing tasks will not use within a coming period of time out of the resource already allocated to them, and taking the shareable resource as the allocatable resource.
11. The method according to claim 10, characterized in that executing the task distributed to the host by the master-side server specifically comprises:
when a first instruction to execute a new task using the idle resource of a target GPU is received, executing the new task using the idle resource of the target GPU;
when a second instruction to execute a new task using the shareable resource of a target GPU is received, executing the new task using the shareable resource of the target GPU.
12. The method according to claim 11, characterized in that the method further comprises:
when it is detected that a high-priority task on a GPU needs to use more resource, stopping a low-priority task on the GPU, and allocating the shareable resource that had been allocated to the low-priority task to the high-priority task.
13. A resource scheduling apparatus, characterized in that the apparatus is arranged in a master-side server in a distributed computing cluster in master-worker mode, the apparatus comprising:
a monitoring unit, configured to monitor the allocatable resource of each graphics processing unit (GPU) in each host;
a parsing unit, configured to determine, when a new task is received, the demand resource corresponding to the new task;
a determination unit, configured to determine, according to the allocatable resource of each GPU in the hosts, a target GPU whose allocatable resource meets the demand resource;
an allocation unit, configured to allocate resource to the new task from the allocatable resource of the target GPU, and distribute the new task to the host corresponding to the target GPU.
14. The apparatus according to claim 13, characterized in that the allocatable resource is the idle resource of a GPU, or the allocatable resource is the shareable resource of a GPU;
the determination unit specifically comprises:
a judging subunit, configured to judge whether any GPU in the hosts is a candidate GPU whose allocatable resource is greater than or equal to the demand resource, and to trigger the determining subunit if a candidate GPU exists;
a determining subunit, configured to select one GPU from the candidate GPUs as the target GPU.
15. The apparatus according to claim 14, characterized in that, when the allocatable resource is the shareable resource of a GPU, the determining subunit is specifically configured to: select, from the candidate GPUs, a GPU on which every executing task has a higher priority than the new task, as the target GPU.
16. The apparatus according to claim 14 or 15, characterized in that the judging subunit is further configured to: when no candidate GPU exists, judge whether the hosts contain a preemptible task whose priority is lower than that of the new task and whose allocated resource is greater than or equal to the demand resource; if a preemptible task exists, select one target task from the preemptible tasks, allocate the resource of the target task to the new task, and distribute the new task to the host where the target task resides.
17. The apparatus according to claim 13, characterized in that the allocatable resource comprises the idle resource and the shareable resource of a GPU;
the determination unit specifically comprises:
a first judging subunit, configured to judge whether any GPU in the hosts is a first candidate GPU whose idle resource is greater than or equal to the demand resource, to trigger the first determining subunit if a first candidate GPU exists, and to trigger the second judging subunit if no first candidate GPU exists;
a first determining subunit, configured to select one GPU from the first candidate GPUs as the target GPU;
a second judging subunit, configured to judge whether any GPU in the hosts is a second candidate GPU whose shareable resource is greater than or equal to the demand resource, and to trigger the second determining subunit if a second candidate GPU exists;
a second determining subunit, configured to select one GPU from the second candidate GPUs as the target GPU.
18. The apparatus according to claim 17, characterized in that, before triggering the second judging subunit, the first judging subunit is further configured to: judge whether the hosts contain a preemptible task whose priority is lower than that of the new task and whose allocated resource is greater than or equal to the demand resource; if a preemptible task exists, select one target task from the preemptible tasks, allocate the resource of the target task to the new task, and distribute the new task to the host where the target task resides; if no preemptible task exists, trigger the second judging subunit.
19. The apparatus according to claim 17 or 18, characterized in that the determination unit further comprises a third judging subunit and a third determining subunit, and the second judging subunit is further configured to trigger the third judging subunit if no second candidate GPU exists;
the third judging subunit is configured to judge whether any GPU in the hosts is a third candidate GPU for which the sum of idle resource and shareable resource is greater than or equal to the demand resource, and to trigger the third determining subunit if a third candidate GPU exists;
the third determining subunit is configured to select one GPU from the third candidate GPUs as the target GPU.
20. The apparatus according to claim 17 or 18, characterized in that the second determining subunit is specifically configured to: select, from the second candidate GPUs, a GPU on which every executing task has a higher priority than the new task, as the target GPU;
and/or the third determining subunit is specifically configured to: select, from the third candidate GPUs, a GPU on which every executing task has a higher priority than the new task, as the target GPU.
21. A resource scheduling apparatus, characterized in that the apparatus is arranged in a worker-side host in a distributed computing cluster in master-worker mode, the apparatus comprising:
a resource determination unit, configured to determine the allocatable resource of each graphics processing unit (GPU) in the host;
a communication unit, configured to send the allocatable resource of each GPU to a master-side server;
an execution unit, configured to execute the task distributed to the host by the master-side server.
22. The apparatus according to claim 21, characterized in that the resource determination unit is specifically configured to:
monitor, for each GPU in the host, the idle resource that has not yet been assigned to any executing task, and take the idle resource as the allocatable resource;
and/or predict, for each GPU in the host, the shareable resource that executing tasks will not use within a coming period of time out of the resource already allocated to them, and take the shareable resource as the allocatable resource.
23. The apparatus according to claim 21, characterized in that the execution unit is specifically configured to:
when a first instruction to execute a new task using the idle resource of a target GPU is received, execute the new task using the idle resource of the target GPU;
and, when a second instruction to execute a new task using the shareable resource of a target GPU is received, execute the new task using the shareable resource of the target GPU.
24. The apparatus according to claim 23, characterized in that the execution unit is further configured to: when it is detected that a high-priority task on a GPU needs to use more resource, stop a low-priority task on the GPU, and allocate the shareable resource that had been allocated to the low-priority task to the high-priority task.
25. A resource scheduling system, characterized by comprising a master-side server and a plurality of worker-side hosts each connected to the master-side server, wherein:
the master-side server is configured to monitor the allocatable resource of each GPU in each host; when a new task is received, determine the demand resource corresponding to the new task; determine, according to the allocatable resource of each GPU in the hosts, a target GPU whose allocatable resource meets the demand resource; allocate resource to the new task from the allocatable resource of the target GPU, and distribute the new task to the host where the target GPU resides;
each host is configured to determine the allocatable resource of each GPU in the host, send the allocatable resource to the master-side server, and execute the task distributed by the master-side server.
CN201711362963.6A 2017-12-18 2017-12-18 Resource scheduling method, device and system Active CN109936604B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711362963.6A CN109936604B (en) 2017-12-18 2017-12-18 Resource scheduling method, device and system


Publications (2)

Publication Number Publication Date
CN109936604A true CN109936604A (en) 2019-06-25
CN109936604B CN109936604B (en) 2022-07-26

Family

ID=66982307

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711362963.6A Active CN109936604B (en) 2017-12-18 2017-12-18 Resource scheduling method, device and system

Country Status (1)

Country Link
CN (1) CN109936604B (en)

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110362407A (en) * 2019-07-19 2019-10-22 中国工商银行股份有限公司 Computing resource dispatching method and device
CN110399222A (en) * 2019-07-25 2019-11-01 北京邮电大学 GPU cluster deep learning task parallel method, device and electronic equipment
CN110413393A (en) * 2019-07-26 2019-11-05 广州虎牙科技有限公司 Cluster resource management method, device, computer cluster and readable storage medium storing program for executing
CN110688223A (en) * 2019-09-11 2020-01-14 深圳云天励飞技术有限公司 Data processing method and related product
CN110688218A (en) * 2019-09-05 2020-01-14 广东浪潮大数据研究有限公司 Resource scheduling method and device
CN110795249A (en) * 2019-10-30 2020-02-14 亚信科技(中国)有限公司 GPU resource scheduling method and device based on MESOS containerized platform
CN111143060A (en) * 2019-12-18 2020-05-12 重庆紫光华山智安科技有限公司 GPU resource scheduling method and device and GPU
CN111190712A (en) * 2019-12-25 2020-05-22 北京推想科技有限公司 Task scheduling method, device, equipment and medium
CN111752706A (en) * 2020-05-29 2020-10-09 北京沃东天骏信息技术有限公司 Resource allocation method, device and storage medium
CN111767134A (en) * 2020-05-18 2020-10-13 鹏城实验室 Multitask dynamic resource scheduling method
CN112035220A (en) * 2020-09-30 2020-12-04 北京百度网讯科技有限公司 Processing method, device and equipment for operation task of development machine and storage medium
CN112148468A (en) * 2019-06-28 2020-12-29 杭州海康威视数字技术股份有限公司 Resource scheduling method and device, electronic equipment and storage medium
CN112751694A (en) * 2019-10-30 2021-05-04 北京金山云网络技术有限公司 Management method and device of exclusive host and electronic equipment
CN112866321A (en) * 2019-11-28 2021-05-28 中兴通讯股份有限公司 Resource scheduling method, device and system
WO2023226205A1 (en) * 2022-05-26 2023-11-30 深圳前海环融联易信息科技服务有限公司 Gpu sharing method and apparatus based on kubernetes, and device and medium

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080214200A1 (en) * 2005-07-04 2008-09-04 Motorola, Inc. Apparatus and Method For Resource Sharing Between a Plurality of Communication Networks
CN104951358A (en) * 2014-03-27 2015-09-30 英特尔公司 Priority based on context preemption
US9419773B2 (en) * 2009-11-17 2016-08-16 Sony Corporation Resource management method and system thereof
CN106155811A (en) * 2015-04-28 2016-11-23 阿里巴巴集团控股有限公司 Graphic processing facility, resource service device, resource regulating method and device
CN106293950A (en) * 2016-08-23 2017-01-04 成都卡莱博尔信息技术股份有限公司 A kind of resource optimization management method towards group system
CN107040479A (en) * 2016-02-04 2017-08-11 华为软件技术有限公司 A kind of method and apparatus of cloud computing resources regulation
US20170293994A1 (en) * 2016-04-08 2017-10-12 International Business Machines Corporation Dynamically provisioning and scaling graphic processing units for data analytic workloads in a hardware cloud
CN107291545A (en) * 2017-08-07 2017-10-24 星环信息科技(上海)有限公司 The method for scheduling task and equipment of multi-user in computing cluster
CN107357661A (en) * 2017-07-12 2017-11-17 北京航空航天大学 A kind of fine granularity GPU resource management method for mixed load


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Dong Li et al.: "Priority-based cache allocation in throughput processors", 2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA) *
Huang Dayong: "Professional technical support for the resource reallocation problem of grid jobs with different priorities", Wanfang *

Cited By (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112148468A (en) * 2019-06-28 2020-12-29 Hangzhou Hikvision Digital Technology Co., Ltd. Resource scheduling method and device, electronic equipment and storage medium
CN112148468B (en) * 2019-06-28 2023-10-10 Hangzhou Hikvision Digital Technology Co., Ltd. Resource scheduling method and device, electronic equipment and storage medium
CN110362407A (en) * 2019-07-19 2019-10-22 Industrial and Commercial Bank of China Ltd. Computing resource dispatching method and device
CN110399222A (en) * 2019-07-25 2019-11-01 Beijing University of Posts and Telecommunications GPU cluster deep learning task parallel method, device and electronic equipment
CN110413393A (en) * 2019-07-26 2019-11-05 Guangzhou Huya Technology Co., Ltd. Cluster resource management method and device, computer cluster and readable storage medium
CN110413393B (en) * 2019-07-26 2022-02-01 Guangzhou Huya Technology Co., Ltd. Cluster resource management method and device, computer cluster and readable storage medium
CN110688218B (en) * 2019-09-05 2022-11-04 Guangdong Inspur Big Data Research Co., Ltd. Resource scheduling method and device
CN110688218A (en) * 2019-09-05 2020-01-14 Guangdong Inspur Big Data Research Co., Ltd. Resource scheduling method and device
CN110688223A (en) * 2019-09-11 2020-01-14 Shenzhen Intellifusion Technologies Co., Ltd. Data processing method and related product
CN110795249A (en) * 2019-10-30 2020-02-14 AsiaInfo Technologies (China) Inc. GPU resource scheduling method and device based on MESOS containerized platform
CN112751694A (en) * 2019-10-30 2021-05-04 Beijing Kingsoft Cloud Network Technology Co., Ltd. Management method and device of exclusive host and electronic equipment
CN112866321B (en) * 2019-11-28 2024-06-18 ZTE Corporation Resource scheduling method, device and system
CN112866321A (en) * 2019-11-28 2021-05-28 ZTE Corporation Resource scheduling method, device and system
WO2021104033A1 (en) * 2019-11-28 2021-06-03 ZTE Corporation Resource scheduling method, apparatus and system
CN111143060A (en) * 2019-12-18 2020-05-12 Chongqing Unisinsight Technology Co., Ltd. GPU resource scheduling method and device and GPU
CN111143060B (en) * 2019-12-18 2021-01-26 Chongqing Unisinsight Technology Co., Ltd. GPU resource scheduling method and device and GPU
CN111190712A (en) * 2019-12-25 2020-05-22 Beijing Infervision Technology Co., Ltd. Task scheduling method, device, equipment and medium
CN111767134A (en) * 2020-05-18 2020-10-13 Peng Cheng Laboratory Multitask dynamic resource scheduling method
CN111767134B (en) * 2020-05-18 2024-07-23 Peng Cheng Laboratory Multi-task dynamic resource scheduling method
CN111752706A (en) * 2020-05-29 2020-10-09 Beijing Wodong Tianjun Information Technology Co., Ltd. Resource allocation method, device and storage medium
CN111752706B (en) * 2020-05-29 2024-05-17 Beijing Wodong Tianjun Information Technology Co., Ltd. Resource allocation method, device and storage medium
US20210191780A1 (en) * 2020-09-30 2021-06-24 Beijing Baidu Netcom Science Technology Co., Ltd. Method and apparatus for processing development machine operation task, device and storage medium
CN112035220A (en) * 2020-09-30 2020-12-04 Beijing Baidu Netcom Science and Technology Co., Ltd. Processing method, device and equipment for operation task of development machine and storage medium
WO2023226205A1 (en) * 2022-05-26 2023-11-30 Shenzhen Qianhai Huanrong Lianyi Information Technology Service Co., Ltd. GPU sharing method and apparatus based on Kubernetes, and device and medium

Also Published As

Publication number Publication date
CN109936604B (en) 2022-07-26

Similar Documents

Publication Publication Date Title
CN109936604A (en) A kind of resource regulating method, device and system
CN103870314B (en) Method and system for simultaneously operating different types of virtual machines by single node
CN104750543B (en) Thread creation method, service request processing method and relevant device
CN103729248B (en) A kind of method and apparatus of determination based on cache perception task to be migrated
EP3253027B1 (en) Resource allocation method and apparatus for virtual machines
CN113454614A (en) System and method for resource partitioning in distributed computing
CN104063279B (en) Method for scheduling task, device and terminal
CN108268317A (en) A kind of resource allocation methods and device
DE112012000946T5 (en) Predictable computing in virtualized distributed computer systems based on the partitioning of computing power and communication performance
Keller et al. An analysis of first fit heuristics for the virtual machine relocation problem
US8352950B2 (en) Algorithm to share physical processors to maximize processor cache usage and topologies
CA3027996A1 (en) Systems and methods for scheduling tasks and managing computing resource allocation for closed loop control systems
CN102508717B (en) Memory scheduling method and memory scheduling device for multi-core processor
CN106569891A (en) Method and device for carrying out task scheduling in storage system
CN114356543A (en) Kubernetes-based multi-tenant machine learning task resource scheduling method
CN108449394A (en) A kind of dispatching method of data file, dispatch server and storage medium
CN109522090A (en) Resource regulating method and device
CN114968566A (en) Container scheduling method and device under shared GPU cluster
WO2016018321A1 (en) Network resource allocation proposals
US20170075713A1 (en) Dispatching the Processing of a Computer Process Amongst a Plurality of Virtual Machines
CN109871273A (en) A kind of adaptive task moving method and device
CN100388183C (en) Server load equalization method for implementing weighted minimum linked allocation
CN114880077B (en) Resource scheduling method, equipment and storage medium
JP5178778B2 (en) Virtual machine and CPU allocation method
CN113391914A (en) Task scheduling method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20200327

Address after: 101300, No. two, 1 road, Shunyi Park, Zhongguancun Science and Technology Park, Shunyi District, Beijing

Applicant after: BEIJING TUSEN ZHITU TECHNOLOGY Co., Ltd.

Address before: 101300, No. two, 1 road, Shunyi Park, Zhongguancun Science and Technology Park, Shunyi District, Beijing

Applicant before: BEIJING TUSEN WEILAI TECHNOLOGY Co., Ltd.

GR01 Patent grant