CN109428912B - Distributed system resource allocation method, device and system - Google Patents

Publication number
CN109428912B
Authority
CN
China
Prior art keywords
resource
job
management server
job management
returning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710737516.8A
Other languages
Chinese (zh)
Other versions
CN109428912A (en
Inventor
张杨
冯亦挥
欧阳晋
韩巧焕
汪方
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd filed Critical Alibaba Group Holding Ltd
Priority to CN201710737516.8A priority Critical patent/CN109428912B/en
Priority to EP18848279.8A priority patent/EP3675434B1/en
Priority to PCT/CN2018/100579 priority patent/WO2019037626A1/en
Priority to JP2020508488A priority patent/JP2020531967A/en
Publication of CN109428912A publication Critical patent/CN109428912A/en
Priority to US16/799,616 priority patent/US11372678B2/en
Application granted granted Critical
Publication of CN109428912B publication Critical patent/CN109428912B/en

Classifications

    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5022 Mechanisms to release resources
    • H04L67/61 Scheduling or organising the servicing of application requests, e.g. requests for application data transmissions using the analysis and optimisation of the required network resources taking into account QoS or priority requirements
    • G06F9/4818 Priority circuits therefor
    • G06F9/485 Task life-cycle, e.g. stopping, restarting, resuming execution
    • G06F9/505 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the load
    • H04L41/0893 Assignment of logical groups to network elements
    • H04L41/5022 Ensuring fulfilment of SLA by giving priorities, e.g. assigning classes of service
    • H04L47/783 Distributed allocation of resources, e.g. bandwidth brokers
    • H04L47/827 Aggregation of resource allocation or reservation requests
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • G06F2209/483 Multiproc
    • G06F2209/5021 Priority

Abstract

Embodiments of the application provide a resource allocation method and device for a distributed system. The method comprises: receiving a resource preemption request sent by a resource scheduling server, the resource preemption request including job execution information corresponding to a first job management server; determining, according to that job execution information, the resources to be returned by a second job management server and a resource return deadline; and returning the resources to be returned to the resource scheduling server before the resource return deadline, according to the determined resources, the resource return deadline, and the current job execution progress of the second job management server. The method can effectively improve the utilization of system resources and reduce resource waste.

Description

Distributed system resource allocation method, device and system
Technical Field
Embodiments of the application relate to the field of computer technology, and in particular to a resource allocation method, device and system for a distributed system.
Background
In a distributed system, when a job node needs to apply for a resource, a job management server sends a resource application request to a resource scheduling server, so as to apply for a certain amount of machine resources for the job node to use. After receiving the resource application request, the resource scheduling server calculates available resources which can be allocated to the job management server according to the remaining available resources in the distributed system, generates an available resource list, and sends the available resource list to the job management server. After receiving the available resource list, the job management server allocates the job node to the corresponding machine node to execute the job program.
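The request and allocation exchange described above can be sketched as follows. This is only an illustration under stated assumptions, not the implementation of any particular system: the class `ResourceScheduler`, its method `apply`, and the modelling of machine resources as an integer number of slots per machine are all hypothetical names and simplifications introduced for the example.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ResourceScheduler:
    """Hypothetical resource scheduling server tracking free slots per machine."""
    free_slots: Dict[str, int] = field(default_factory=dict)

    def apply(self, requested: int) -> Dict[str, int]:
        """Build an available-resource list (machine -> slots granted).

        Grants at most `requested` slots out of what is still free, so the
        result may be smaller than requested, or empty, when the cluster is
        close to exhaustion.
        """
        granted: Dict[str, int] = {}
        remaining = requested
        for machine, free in self.free_slots.items():
            if remaining == 0:
                break
            take = min(free, remaining)
            if take > 0:
                granted[machine] = take
                self.free_slots[machine] = free - take
                remaining -= take
        return granted


scheduler = ResourceScheduler(free_slots={"machine-a": 2, "machine-b": 3})
# The job management server applies for 4 slots and receives the list of
# machines on which it may place its job nodes.
available = scheduler.apply(requested=4)
print(available)
```

In this sketch a later application for more slots than remain is simply granted less; the queuing behaviour for that case is what the next paragraph of the description addresses.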
In a distributed system, the resources of the whole cluster are frequently exhausted, so resource applications have to queue. Generally, resource application requests are queued by priority, and requests are periodically taken from the wait queue and resubmitted to request allocation of resources. In the wait queue, high-priority resource requests are placed at the front and low-priority resource requests at the back. When the resource scheduling server receives a high-priority resource application request that requires resources immediately, it forcibly reclaims the resources of a low-priority job so that the high-priority application can obtain resources quickly by taking over the resources the low-priority job held.
The applicant has found through research that when a high-priority job preempts the resources of a low-priority job, the resource scheduling server immediately reclaims those resources, and the program being executed by the low-priority job is immediately and forcibly terminated. If the low-priority job has already completed most of its work, terminating it immediately means that it must be executed from the beginning once it acquires resources again, which reduces the resource utilization efficiency of the whole system.
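A priority-ordered wait queue of the kind described here might look like the following sketch. The class name `WaitQueue`, the integer priorities (larger means more important), and the first-in-first-out tie-break within a priority level are assumptions made for illustration; the description does not specify them.

```python
import heapq
import itertools


class WaitQueue:
    """Hypothetical wait queue: highest priority first, FIFO within a priority."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # arrival order, used as tie-break

    def push(self, priority: int, request_id: str) -> None:
        # heapq is a min-heap, so the priority is negated to pop the highest first.
        heapq.heappush(self._heap, (-priority, next(self._counter), request_id))

    def pop(self) -> str:
        _, _, request_id = heapq.heappop(self._heap)
        return request_id

    def __len__(self) -> int:
        return len(self._heap)


queue = WaitQueue()
queue.push(1, "dev-job")
queue.push(9, "prod-job")
queue.push(1, "dev-job-2")
# Requests are taken from the queue for resubmission, front first.
order = [queue.pop() for _ in range(len(queue))]
print(order)
```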
Disclosure of Invention
Embodiments of the application provide a resource allocation method, device and system for a distributed system, in which the time for returning resources is determined according to the job execution information of the job management server that preempts the resources and the current job execution progress of the job management server that returns them, thereby effectively improving the utilization of system resources and reducing resource waste.
To this end, embodiments of the application provide the following technical solutions:
In a first aspect, an embodiment of the present application provides a distributed resource allocation system that includes a resource scheduling server, a first job management server, and a second job management server, where: the first job management server is configured to send a resource application request to the resource scheduling server, the resource application request including at least job execution information of the first job management server; the resource scheduling server is configured to, when it determines that a resource preemption condition is met, determine the first job management server that preempts resources and the second job management server that returns resources; send a resource preemption request to the second job management server, the resource preemption request including at least the job execution information corresponding to the first job management server; and receive a resource return request sent by the second job management server and allocate the resources corresponding to the resource return request to the first job management server; and the second job management server is configured to receive the resource preemption request sent by the resource scheduling server; determine, according to the job execution information corresponding to the first job management server included in the resource preemption request, the resources to be returned by the second job management server and a resource return deadline; and, according to the determined resource return deadline and the current job execution progress of the second job management server, return the resources to be returned to the resource scheduling server before the resource return deadline.
In a second aspect, an embodiment of the present application provides a resource allocation method for a distributed system, applied to a second job management server, including: receiving a resource preemption request sent by a resource scheduling server, the resource preemption request including job execution information corresponding to a first job management server; determining, according to the job execution information corresponding to the first job management server included in the resource preemption request, the resources to be returned by the second job management server and a resource return deadline; and, according to the determined resource return deadline and the current job execution progress of the second job management server, returning the resources to be returned to the resource scheduling server before the resource return deadline.
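The decision made by the second job management server can be illustrated with a short sketch. It rests on assumptions that this passage does not state: the job execution information is taken to be the preempting job's output deadline and execution time, the resource return deadline is taken to be the output deadline minus the execution time, and a job that cannot finish before that deadline is assumed to back up its progress and return the resources at the deadline. The names `PreemptionRequest`, `ReturnPlan` and `plan_return` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PreemptionRequest:
    resources_needed: int   # slots the preempting (first) job needs
    output_deadline: float  # latest finish time of the preempting job, in minutes
    execution_time: float   # expected run time of the preempting job, in minutes


@dataclass
class ReturnPlan:
    slots_to_return: int
    return_deadline: float
    action: str             # "finish_then_return" or "backup_then_return"
    return_at: float


def plan_return(req: PreemptionRequest, now: float,
                remaining_minutes: float, held_slots: int) -> ReturnPlan:
    """Decide what to return, and when, from the request and current progress."""
    # Assumption: the preempting job must hold the resources early enough to
    # run for its full execution time before its output deadline.
    deadline = req.output_deadline - req.execution_time
    slots = min(req.resources_needed, held_slots)
    if now + remaining_minutes <= deadline:
        # The running job can complete first, so none of its work is lost.
        return ReturnPlan(slots, deadline, "finish_then_return", now + remaining_minutes)
    # Otherwise keep working until the deadline, back up progress, then return.
    return ReturnPlan(slots, deadline, "backup_then_return", max(now, deadline))


req = PreemptionRequest(resources_needed=4, output_deadline=120.0, execution_time=60.0)
nearly_done = plan_return(req, now=0.0, remaining_minutes=5.0, held_slots=10)
long_running = plan_return(req, now=0.0, remaining_minutes=90.0, held_slots=10)
print(nearly_done)
print(long_running)
```

In both cases the resources are handed back no later than the return deadline; only the treatment of the running job differs.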
In a third aspect, an embodiment of the present application provides a method for allocating resources of a distributed system, which is applied to a resource scheduling server, and includes: if the resource preemption condition is judged to be met, determining a first job management server for preempting the resource and a second job management server for returning the resource; sending a resource preemption request to a second job management server for returning the resource, wherein the resource preemption request at least comprises job execution information corresponding to a first job management server for preempting the resource; and receiving a resource returning request sent by the second job management server for returning the resources, and allocating the resources corresponding to the resource returning request to the first job management server for preempting the resources.
In a fourth aspect, an embodiment of the present application provides a resource allocation method for a distributed system, applied to a first job management server, including: sending a resource application request to a resource scheduling server, the resource application request including at least the output deadline of a job or the resource acquisition deadline of the job; and receiving a resource allocation request sent by the resource scheduling server and acquiring the resources corresponding to the resource allocation request.
In a fifth aspect, an embodiment of the present application provides a second job management server, including: a receiving unit, configured to receive a resource preemption request sent by a resource scheduling server, the resource preemption request including job execution information corresponding to a first job management server; a determining unit, configured to determine, according to the job execution information corresponding to the first job management server included in the resource preemption request, the resources to be returned by the second job management server and a resource return deadline; and a resource returning unit, configured to return the resources to be returned to the resource scheduling server before the resource return deadline, according to the determined resource return deadline and the current job execution progress of the second job management server.
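The resource application request of this aspect carries at least one of two deadlines. A minimal sketch of such a message follows; the class name `ResourceApplicationRequest`, the field names, and the use of minutes as the time unit are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ResourceApplicationRequest:
    """Hypothetical message sent by the first job management server."""
    job_id: str
    slots: int
    output_deadline: Optional[float] = None       # latest time the job must finish
    acquisition_deadline: Optional[float] = None  # latest time resources must arrive

    def __post_init__(self):
        # The request must include at least one of the two deadlines.
        if self.output_deadline is None and self.acquisition_deadline is None:
            raise ValueError("request needs an output or acquisition deadline")


request = ResourceApplicationRequest(job_id="prod-job", slots=4, output_deadline=120.0)

try:
    ResourceApplicationRequest(job_id="bad-job", slots=1)
    rejected = False
except ValueError:
    rejected = True
print(request, rejected)
```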
In a sixth aspect, an embodiment of the present application provides a resource scheduling server, including: the determining unit is used for determining a first job management server for preempting the resource and a second job management server for returning the resource if the resource preemption condition is judged to be met; a sending unit, configured to send a resource preemption request to a second job management server that returns the resource, where the resource preemption request at least includes job execution information corresponding to a first job management server that preempts the resource; and the allocation unit is used for receiving the resource returning request sent by the second job management server for returning the resource and allocating the resource corresponding to the resource returning request to the first job management server for preempting the resource.
In a seventh aspect, an embodiment of the present application provides a first job management server, including: a sending unit, configured to send a resource application request to a resource scheduling server, the resource application request including at least the output deadline of a job or the resource acquisition deadline of the job; and a receiving unit, configured to receive a resource allocation request sent by the resource scheduling server and acquire the resources corresponding to the resource allocation request.
In an eighth aspect, embodiments of the present application provide an apparatus for resource allocation, comprising a memory and one or more programs, wherein the one or more programs are stored in the memory, are configured to be executed by one or more processors, and include instructions for: receiving a resource preemption request sent by a resource scheduling server, the resource preemption request including job execution information corresponding to a first job management server; determining, according to the job execution information corresponding to the first job management server included in the resource preemption request, the resources to be returned by the second job management server and a resource return deadline; and, according to the determined resource return deadline and the current job execution progress of the second job management server, returning the resources to be returned to the resource scheduling server before the resource return deadline.
In a ninth aspect, embodiments of the present application provide an apparatus for resource allocation, comprising a memory and one or more programs, wherein the one or more programs are stored in the memory, are configured to be executed by one or more processors, and include instructions for: if it is determined that a resource preemption condition is met, determining a first job management server that preempts resources and a second job management server that returns resources; sending a resource preemption request to the second job management server, the resource preemption request including at least job execution information corresponding to the first job management server; and receiving a resource return request sent by the second job management server, and allocating the resources corresponding to the resource return request to the first job management server.
In a tenth aspect, embodiments of the present application provide an apparatus for resource allocation, comprising a memory and one or more programs, wherein the one or more programs are stored in the memory, are configured to be executed by one or more processors, and include instructions for: sending a resource application request to a resource scheduling server, the resource application request including at least the output deadline of a job or the resource acquisition deadline of the job; and receiving a resource allocation request sent by the resource scheduling server and acquiring the resources corresponding to the resource allocation request.
According to the resource allocation method, device and system provided by embodiments of the application, when the resource scheduling server determines that the resource preemption condition is met, it sends a resource preemption request to the second job management server that is to return resources. The second job management server determines a resource return deadline according to the job execution information of the first job management server included in the resource preemption request, and, according to that deadline and its own current job execution progress, returns the resources to be returned to the resource scheduling server before the resource return deadline. In other words, when the resource preemption condition is met, the resources allocated to the second job management server are not reclaimed immediately. Instead, the second job management server decides when to return them according to the job execution information of the first job management server and its own current job execution progress, so that it can make effective use of the resources. This improves the overall resource utilization efficiency of the system and reduces resource waste.
Drawings
To illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings needed for describing them are briefly introduced below. The drawings described below show only some embodiments of the present application, and those skilled in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic diagram of a distributed resource allocation system according to an embodiment of the present application;
Fig. 2 is a flowchart of a method for allocating resources of a distributed system according to an embodiment of the present application;
Fig. 3 is a flowchart of a method for allocating resources of a distributed system according to another embodiment of the present application;
Fig. 4 is a schematic diagram of a resource allocation queue according to an embodiment of the present application;
Fig. 5 is a flowchart of a method for allocating resources of a distributed system according to yet another embodiment of the present application;
Fig. 6 is a flowchart of a method for allocating resources of a distributed system according to another embodiment of the present application;
Fig. 7 is a schematic diagram of a first job management server according to an embodiment of the present application;
Fig. 8 is a schematic diagram of a resource scheduling server according to an embodiment of the present application;
Fig. 9 is a schematic diagram of a second job management server according to an embodiment of the present application;
Fig. 10 is a block diagram of an apparatus for resource allocation according to an example embodiment;
Fig. 11 is a block diagram of an apparatus for resource allocation according to an example embodiment;
Fig. 12 is a block diagram of an apparatus for resource allocation according to an example embodiment.
Detailed Description
Embodiments of the application provide a resource allocation method, device and system for a distributed system, in which the time for returning resources is determined according to the job execution information of the job management server that preempts the resources and the current job execution progress of the job management server that returns them, thereby effectively improving the utilization of system resources and reducing resource waste.
The terminology used in the embodiments of the present application is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in the examples of this application and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
The term "resource scheduling server" (in English, Resource Manager) generally refers to the server in a distributed system that coordinates and allocates the resources available to the system.
The term "job management server" (in English, Job Master) generally refers to the role in a distributed system that controls all job nodes (Job Workers) of a job. It is responsible for applying to the resource scheduling server for job resources on behalf of all job nodes and for sending job programs to machine nodes. In general, a distributed system may include a plurality of job management servers.
The term "machine node" (in English, Machine Node) generally refers to the role in a distributed system that, on behalf of a machine, supervises the execution of user job programs.
The term "job node" (in English, Job Worker) generally refers to the role in a distributed system that executes a specific job program. It generally communicates only with the job management server.
The term "job application resource" (in English, Request) generally refers to the resources that a job management server in a distributed system applies for. It corresponds to a resource application request.
The term "job acquisition resource" (in English, Resource) generally refers to the resources that a job management server in a distributed system has acquired, that is, the resources allocated to the job management server by the resource scheduling server.
The term "machine acquisition resource" (in English, Resource) generally refers to the resources of a given job node that a machine obtains from the resource scheduling server. Generally, a machine allows a job node to execute its program only after obtaining that job node's resources from the resource scheduling server. When the resource scheduling server reclaims the resources of a job node, the machine forcibly stops the execution of that job node.
The term "priority of a resource application request" (in English, Priority) describes the precedence of resource application requests: the higher the priority, the more important the request and the sooner it should be satisfied.
The term "production job" generally refers to a job that is executed at a fixed time every day.
The term "development job" generally refers to an experimental job submitted by a developer to debug a program. Its daily execution time is not fixed and its priority is generally low.
The term "job output time" generally refers to the latest time by which a user job must finish executing.
The term "resource acquisition time" generally refers to the latest time by which the job management server must acquire resources.
The term "resource return time" generally refers to the latest time by which the job management server must return resources.
The term "job execution time" generally refers to the time required to execute a job.
Referring to fig. 1, a distributed resource allocation system according to an embodiment of the present application is provided. The distributed resource allocation system includes a resource scheduling server 800, a first job management server 700, and a second job management server 900. The distributed resource allocation system may include a plurality of job management servers.
It should be noted that, in the prior art, when a high-priority job applies to preempt the resources of a low-priority job, the resource scheduling server immediately reclaims the resources of the low-priority job, and the program being executed by the low-priority job is immediately and forcibly terminated.
The applicant has found through research that this approach has at least the following disadvantages: (1) Suppose the execution time of the low-priority job is 1 hour and it has already run for 55 minutes, so that it would return its resources after running for only 5 more minutes. If a resource preemption event occurs at this point, the resource scheduling server forcibly reclaims the resources of the low-priority job, and from the perspective of the resource utilization of the whole system the 55 minutes of resource usage are wasted. (2) If the low-priority job has to yield resources to the high-priority job during execution, forcible reclamation causes the part of the low-priority job that has already been executed to be lost entirely, with no opportunity to back it up. When the low-priority job acquires resources again, it can only be executed from the beginning, which reduces execution efficiency. (3) Suppose the execution time of the high-priority job is 1 hour and its resource application request is submitted 2 hours before its job output deadline. If the low-priority job whose resources are to be preempted will finish within 5 minutes, forcible preemption is unnecessary.
From the above analysis, the applicant has found that if the resources of low-priority jobs are directly reclaimed and allocated to high-priority jobs without considering the execution time and output deadline of the high-priority job and the execution progress of the low-priority job, unnecessary preemption events occur and the resource utilization of the system is reduced.
Based on the above, the application provides a resource allocation method, device and system that replace the forcible, direct reclamation of resources already allocated to low-priority jobs. When a resource preemption event occurs, the resource scheduling server and the job management server of the low-priority job first carry out a round of interactive negotiation, and decide when to reclaim the resources of the low-priority job based on information such as the execution time and output deadline of the high-priority job and the execution progress of the low-priority job. In this way the high-priority job is guaranteed to obtain resources while the effective resource utilization of the system is maximized.
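The arithmetic behind disadvantages (1) and (3) can be checked with a few lines of code. The function names are hypothetical, all times are in minutes, and the comparison used to decide whether forcible preemption is needed is an illustrative reading of example (3), not a rule stated in the text.

```python
def wasted_by_forced_termination(executed_minutes: float) -> float:
    """Work lost when a job is killed and must later restart from the beginning."""
    return executed_minutes


def slack_before_resources_needed(minutes_to_output_deadline: float,
                                  high_execution_minutes: float) -> float:
    """How long the high-priority job can wait and still meet its output deadline."""
    return minutes_to_output_deadline - high_execution_minutes


def forced_preemption_needed(low_remaining_minutes: float, slack_minutes: float) -> bool:
    """Forcible preemption is only needed if the low-priority job cannot finish in time."""
    return low_remaining_minutes > slack_minutes


# Example (1): a 60-minute low-priority job killed after 55 minutes.
wasted = wasted_by_forced_termination(55)
remaining = 60 - 55

# Example (3): the high-priority job runs for 60 minutes and its output
# deadline is 120 minutes away, so it can wait up to 60 minutes for resources.
slack = slack_before_resources_needed(120, 60)
must_force = forced_preemption_needed(low_remaining_minutes=5, slack_minutes=slack)
print(wasted, remaining, slack, must_force)
```

With these numbers, waiting 5 minutes for the low-priority job to finish loses no work and still leaves the high-priority job 55 minutes of slack.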
As shown in fig. 1, the distributed resource allocation system includes a resource scheduling server 800, a first job management server 700, and a second job management server 900, in which:
the first job management server 700 is configured to send a resource application request to the resource scheduling server, where the resource application request at least includes job execution information of the first job management server 700.
The resource scheduling server 800 is configured to determine a first job management server 700 that preempts a resource and a second job management server 900 that returns the resource when determining that a resource preemption condition is satisfied; sending a resource preemption request to the second job management server 900 returning the resource, where the resource preemption request at least includes job execution information corresponding to the first job management server 700 preempting the resource; and receiving a resource returning request sent by the second job management server 900 for returning the resource, and allocating the resource corresponding to the resource returning request to the first job management server 700 for preempting the resource.
The second job management server 900 is configured to receive a resource preemption request sent by the resource scheduling server 800, and determine, according to job execution information corresponding to the first job management server 700 included in the resource preemption request, the resource return deadline of the second job management server 900; and return the resources to be returned to the resource scheduling server 800 according to the determined resource return deadline and the current job execution progress of the second job management server 900.
The resource allocation system provided in the embodiment of the present application has been briefly introduced above. The resource allocation method provided in the embodiment of the present application is described below from the perspectives of the first job management server, the resource scheduling server, and the second job management server, respectively.
The resource allocation method according to the exemplary embodiment of the present application will be described with reference to fig. 2 to 5.
Referring to fig. 2, a flowchart of a distributed system resource allocation method according to an embodiment of the present application is provided. As shown in fig. 2, the method applied to the first job management server may include:
S201, the first job management server sends a resource application request to the resource scheduling server, where the resource application request at least includes the output deadline of a job or the resource acquisition deadline of the job.
In specific implementation, the first job management server that needs to obtain resources may send a resource application request to the resource scheduling server. In one possible implementation, the resource application request includes the output deadline of the job, that is, the latest time by which the job must produce its output. In this implementation, the resource application request includes only the output deadline of the job, and the execution time of the job is calculated by the resource scheduling server. For example, the resource scheduling server obtains the historical data of the job and calculates the execution time of the job from that data. Because the execution time of a job is generally regular, it may be determined by the job management server or the resource scheduling server based on historical data, for example, as the average execution time of the job over the past 7 days.
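The historical estimate mentioned above can be illustrated with a minimal Python sketch. The function name `estimate_execution_time`, the assumption of one recorded run per day, and the sample durations are hypothetical and are not part of the application; the text only states that an average over the past 7 days may be used.

```python
from datetime import timedelta
from statistics import mean


def estimate_execution_time(history):
    """Average the most recent (up to 7) recorded run durations.

    Assumes one recorded run per day, so the last 7 entries stand in
    for "the past 7 days".
    """
    recent = history[-7:]
    if not recent:
        raise ValueError("no historical runs available")
    avg_seconds = mean(d.total_seconds() for d in recent)
    return timedelta(seconds=avg_seconds)


# Hypothetical durations (in minutes) of the job over the past 7 days.
past_runs = [timedelta(minutes=m) for m in (118, 122, 120, 119, 121, 120, 120)]
estimated = estimate_execution_time(past_runs)
```

Other estimators (a median, or a high percentile to leave a safety margin) would fit the same interface; the average is used here only because it is the example given in the text.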
In another possible implementation manner, the resource application request may include the output deadline of the job and the execution time of the job. The job execution time generally refers to the time required for the job to be executed completely, and may be, for example, the time from when the job starts to be executed to when the job finishes and returns its resources. Generally, the job execution time may be determined in either of the following ways: (1) The user may specify the job execution time. For example, before a user job is formally submitted to a distributed system for execution, it necessarily undergoes offline debugging; the execution time of the user job can be estimated through this offline debugging, and the user specifies or sets the job execution time when submitting the resource application request. (2) The historical data of the job is acquired, and the execution time of the job is calculated from the historical data.
In another possible implementation manner, the resource application request includes the resource acquisition deadline of the job. In specific implementation, the first job management server may determine the execution time and the output deadline of the job, and obtain the resource acquisition deadline of the job from the difference between the output deadline of the job and the execution time of the job. For example, the resource acquisition deadline TA_resume = output deadline - execution time. For example, if the output deadline of a job is 22:00 and the execution time of the job is 2 hours, the resource acquisition deadline is 20:00.
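The subtraction described above can be written as a short Python sketch. The function name `resource_acquisition_deadline` and the calendar date are illustrative assumptions; only the 22:00 deadline and the 2-hour execution time come from the text.

```python
from datetime import datetime, timedelta


def resource_acquisition_deadline(output_deadline, execution_time):
    """Latest time at which the job must hold its resources to finish on time."""
    return output_deadline - execution_time


# Example from the text: output deadline 22:00, execution time 2 hours.
# The date itself is arbitrary and chosen only to make the example runnable.
output_deadline = datetime(2017, 8, 24, 22, 0)
deadline = resource_acquisition_deadline(output_deadline, timedelta(hours=2))
```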
Of course, in other implementation manners, the resource application request may also include a job execution time, a job output deadline, and a resource obtaining deadline, which are not limited herein.
S202, the first job management server receives a resource allocation request sent by the resource scheduling server, and acquires a resource corresponding to the resource allocation request.
It should be noted that, unlike the prior art, in which the resource application request includes only the request submission time, in the present application the resource application request further includes the execution information of the job. This information is used to determine the resource acquisition deadline of the job and/or the resource return deadline, so that the times for allocating and returning resources can be determined more flexibly and the resource utilization efficiency can be effectively improved.
Referring to fig. 3, a flowchart of a distributed system resource allocation method according to an embodiment of the present application is provided. As shown in fig. 3, the method applied to the resource scheduling server may include:
S301, if the resource preemption condition is satisfied, the resource scheduling server determines a first job management server for preempting the resource and a second job management server for returning the resource.
In specific implementation, judging that the resource preemption condition is met includes: when the priority of the received resource application request is determined to be high and the quantity of remaining available resources of the system is less than the quantity applied for in the resource application request, judging that the resource preemption condition is met. For example, the first job management server sends a resource application request C to the resource scheduling server, where the resources applied for are CPU: 70, MEM: 60, and the priority of the resource application request is 4. The remaining available resources of the system are CPU: 0, MEM: 0, so they cannot satisfy, or cannot fully satisfy, the request of the high-priority job, and resource preemption is triggered. The resource scheduling server then traverses the resource allocation queue to determine the job management server that needs to return resources. Specifically, the resource scheduling server maintains a resource allocation queue, which records how the resource scheduling server has allocated resources to each resource application and is ordered by the priority of the resource applications. When a high-priority resource application is submitted to the resource scheduling server and the remaining system resources are not enough to satisfy it, the resource scheduling server traverses the resource allocation queue in order from low priority to high priority. For each resource application with a priority lower than that of the high-priority resource application, it assumes that all of the allocated resources are recovered and adds them to the remaining cluster resources.
The resource scheduling server then judges whether the new remaining resources obtained after this accumulation can satisfy the high-priority resource application. If so, the traversal stops; if not, the traversal continues with the next low-priority resource application until the high-priority resource application can be satisfied. All of the resources used by a low-priority job may be preempted by the high-priority job. Fig. 4 is a schematic diagram of a resource allocation queue according to an embodiment of the present application. For example, assume that the resource allocation queue is traversed in order from low priority to high priority. For resource application B, assuming that its allocated resources are all preempted, the new remaining resources are CPU: 30, MEM: 20. This is not yet enough to satisfy resource application C, so the traversal continues to resource application A; assuming that its allocated resources are also all preempted, the new remaining resources are CPU: 70, MEM: 60. The remaining resources can now satisfy the high-priority resource application, so the second job management servers for returning the resources are determined to be the job management servers corresponding to resource application B and resource application A.
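The queue traversal described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the names `Resources`, `Allocation` and `select_victims` are hypothetical, a larger number is assumed to mean a higher priority, and the priorities of A and B (2 and 1) are invented for the example. The resource figures (B holds CPU 30 / MEM 20, A holds CPU 40 / MEM 40, C requests CPU 70 / MEM 60 at priority 4) follow from the numbers in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    cpu: int
    mem: int

    def __add__(self, other):
        return Resources(self.cpu + other.cpu, self.mem + other.mem)

    def covers(self, need):
        return self.cpu >= need.cpu and self.mem >= need.mem


@dataclass(frozen=True)
class Allocation:
    name: str
    priority: int  # assumption: larger number = higher priority
    allocated: Resources


def select_victims(queue, remaining, request, request_priority):
    """Walk the queue from low to high priority, hypothetically reclaiming
    each lower-priority allocation until the request fits.

    Returns the allocations to preempt, or None if the request cannot be met.
    """
    victims = []
    available = remaining
    for alloc in sorted(queue, key=lambda a: a.priority):
        if available.covers(request):
            break  # enough resources accumulated, stop traversing
        if alloc.priority >= request_priority:
            break  # only lower-priority allocations may be preempted
        victims.append(alloc)
        available = available + alloc.allocated
    return victims if available.covers(request) else None


queue = [
    Allocation("A", 2, Resources(cpu=40, mem=40)),
    Allocation("B", 1, Resources(cpu=30, mem=20)),
]
victims = select_victims(queue, Resources(0, 0), Resources(70, 60), request_priority=4)
victim_names = [v.name for v in victims]
```

With these inputs the sketch selects B first and then A, matching the example in the text, where the job management servers of both applications must return resources.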
S302, the resource scheduling server sends a resource preemption request to the second job management server of the returned resource, wherein the resource preemption request at least comprises job execution information corresponding to the first job management server of the preempted resource.
Unlike the prior art, in which the resources allocated to the second job management server are directly and forcibly recovered, in the embodiment of the application the resource scheduling server sends a resource preemption request to the second job management server that returns the resource, notifying the second job management server that the resources need to be returned to the high-priority job.
In a possible implementation manner, the resource application request sent by the first job management server to the resource scheduling server includes only the output deadline of the job, and the job execution information of the first job management server included in the resource preemption request may include the output deadline of the job and the execution time of the job. The execution time of the job is determined as follows: the resource scheduling server acquires the historical data of the job and calculates the execution time of the job from the historical data.
In yet another possible implementation manner, the resource application request sent by the first job management server to the resource scheduling server includes the output deadline of the job and the execution time of the job, and the job execution information of the first job management server included in the resource preemption request may include the output deadline of the job and the execution time of the job. Alternatively, the job execution information of the first job management server included in the resource preemption request may include the resource acquisition deadline of the job. The resource acquisition deadline of the job is determined as follows: the resource acquisition deadline of the job is obtained from the difference between the output deadline of the job and the execution time of the job.
In another possible implementation manner, the resource application request sent by the first job management server to the resource scheduling server includes the resource acquisition deadline of the job, and the job execution information of the first job management server included in the resource preemption request may include the resource acquisition deadline of the job.
When the resource scheduling server determines that the second job management server returning the resources comprises a plurality of job management servers, the resource scheduling server sends a resource preemption request to each of these job management servers.
S303, the resource scheduling server receives the resource returning request sent by the second job management server for returning the resource, and allocates the resource corresponding to the resource returning request to the first job management server for preempting the resource.
It should be noted that, when determining the first job management server that preempts the resource and the second job management server that returns the resource, the resource scheduling server may record the correspondence between the first job management server and the second job management server. When the second job management server sends the resource returning request, the returned resources can then be allocated directly to the corresponding first job management server rather than to other waiting job management servers, which ensures that the high-priority job obtains the resources quickly.
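One way to picture the recorded correspondence is a small lookup table kept by the scheduler. The Python sketch below is an assumption-laden illustration: the class `PreemptionLedger`, its methods, and the server identifiers such as `"jm-C"` are all hypothetical and do not appear in the application.

```python
from collections import defaultdict


class PreemptionLedger:
    """Records which preempting server each returning server gives way to."""

    def __init__(self):
        self._preemptor_of = {}           # returning server -> preempting server
        self.granted = defaultdict(list)  # preempting server -> returned resources

    def record(self, preemptor, returners):
        for returner in returners:
            self._preemptor_of[returner] = preemptor

    def on_return(self, returner, resources):
        """Route returned resources to the recorded preemptor, if any."""
        preemptor = self._preemptor_of.pop(returner, None)
        if preemptor is not None:
            self.granted[preemptor].append(resources)
        return preemptor


ledger = PreemptionLedger()
ledger.record("jm-C", ["jm-B", "jm-A"])
routed_to = ledger.on_return("jm-B", {"cpu": 30, "mem": 20})
```

Because the returned resources are looked up by the returning server, they bypass any general waiting queue, which is the behavior the paragraph above describes.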
Referring to fig. 5, a flowchart of a distributed system resource allocation method according to an embodiment of the present application is provided. As shown in fig. 5, the method applied to the second job management server may include:
S501, the second job management server receives a resource preemption request sent by the resource scheduling server, where the resource preemption request includes job execution information corresponding to the first job management server.
The job execution information corresponding to the first job management server may include the output deadline of the job and the execution time of the job, or the job execution information may include the resource acquisition deadline of the job.
It should be noted that the resource preemption request may further include the quantity of resources to be returned, and the second job management server may decide to return some or all of its allocated resources according to that quantity.
S502, the second job management server determines the resources that it needs to return and the resource return deadline according to the job execution information corresponding to the first job management server included in the resource preemption request.
In some embodiments, the job execution information corresponding to the first job management server includes the output deadline of the job and the execution time of the job, and determining the resources that the second job management server needs to return and the resource return deadline according to the job execution information included in the resource preemption request includes: obtaining the resource acquisition deadline of the job from the difference between the output deadline of the job and the execution time of the job, and using the resource acquisition deadline of the job as the resource return deadline of the second job management server.
In some embodiments, the job execution information corresponding to the first job management server includes the resource acquisition deadline of the job, and the resource acquisition deadline may be used as the resource return deadline.
S503, the second job management server returns the resources to be returned to the resource scheduling server before the resource return deadline arrives, according to the determined resource return deadline and the current job execution progress of the second job management server.
In some embodiments, returning the resources to be returned to the resource scheduling server before the resource return deadline arrives, according to the determined resource return deadline and the current job execution progress of the second job management server, includes: calculating the remaining completion time of the current job according to the current job execution progress of the second job management server; and if the sum of the current time and the remaining completion time of the current job is not greater than the resource return deadline, returning, by the second job management server, the resources to be returned to the resource scheduling server after the current job is finished.
In some embodiments, returning the resources to be returned to the resource scheduling server before the resource return deadline arrives, according to the determined resource return deadline and the current job execution progress of the second job management server, includes: calculating the remaining completion time of the current job according to the current job execution progress of the second job management server; and if the sum of the current time and the remaining completion time of the current job is greater than the resource return deadline, backing up, by the second job management server, the current job, recording the backup position, and returning the resources to be returned to the resource scheduling server after the backup is completed. Further, the method further comprises: receiving the resources re-allocated by the resource scheduling server, acquiring the backup of the current job, and continuing to execute the current job from the backup position.
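The backup-and-resume behavior can be illustrated with a toy Python sketch. The application does not specify a backup format, so everything here is assumed for illustration: the job is modeled as summing a list of work items, the backup is a JSON file holding the backup position and the partial result, and the names `run` and `must_yield` are hypothetical.

```python
import json
import os
import tempfile


def run(items, checkpoint_path, must_yield):
    """Process items in order, resuming from a backup if one exists.

    Returns the final result, or None if the job backed up and yielded.
    """
    position, total = 0, 0
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            saved = json.load(f)
        position, total = saved["position"], saved["total"]
    while position < len(items):
        if must_yield(position):
            # Back up the job and record the backup position; the resources
            # would be returned to the scheduler after this completes.
            with open(checkpoint_path, "w") as f:
                json.dump({"position": position, "total": total}, f)
            return None
        total += items[position]
        position += 1
    return total


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "job_a.ckpt")
    items = [1, 2, 3, 4, 5]
    # First run is preempted when it reaches position 3.
    first = run(items, path, must_yield=lambda pos: pos == 3)
    with open(path) as f:
        saved_position = json.load(f)["position"]
    # Resources are re-allocated; the job continues from the backup position.
    second = run(items, path, must_yield=lambda pos: False)
```

The second run does not repeat the first three items, which is the efficiency gain over restarting the job from the beginning.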
For example, after receiving the resource preemption request sent by the resource scheduling server, the second job management server may determine the final resource return deadline TA_Return, and obtain the remaining completion time of the job from the time TA_run for which the current job has already been executed and the estimated job execution time TA_all: TA_left = TA_all - TA_run. If current time + TA_left <= TA_Return, the resources are returned naturally after the execution of job A is completed. If current time + TA_left > TA_Return, job A starts recording a backup when its execution approaches the time point TA_Return, so that when it obtains resources again it can continue executing from the backup position rather than from the beginning. After the backup has been recorded, the second job management server actively returns the resources to the resource scheduling server and then waits for the resource scheduling server to allocate resources to it again.
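The comparison above can be condensed into a short Python sketch. The function name `plan_return`, the action labels, and the concrete timestamps are illustrative assumptions; the variable names mirror TA_all, TA_run, TA_left and TA_Return from the text.

```python
from datetime import datetime, timedelta


def plan_return(now, ta_all, ta_run, ta_return):
    """Decide how the low-priority job hands back its resources.

    Returns (action, latest_return_time).
    """
    ta_left = ta_all - ta_run  # remaining completion time
    if now + ta_left <= ta_return:
        # The job finishes in time: return resources naturally on completion.
        return ("finish_then_return", now + ta_left)
    # The job cannot finish in time: back up, then return by the deadline.
    return ("backup_then_return", ta_return)


ta_return = datetime(2017, 8, 24, 20, 0)  # arbitrary illustrative deadline

# A 1-hour job that has run 55 minutes can simply finish.
nearly_done = plan_return(
    datetime(2017, 8, 24, 19, 0), timedelta(hours=1), timedelta(minutes=55), ta_return
)

# A 1-hour job that has run only 10 minutes at 19:30 must back up instead.
just_started = plan_return(
    datetime(2017, 8, 24, 19, 30), timedelta(hours=1), timedelta(minutes=10), ta_return
)
```

In the first call the job needs only 5 more minutes and keeps its resources until 19:05; in the second it would finish at 20:20, past the deadline, so it takes the backup path.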
In order to facilitate a clear understanding of the embodiments of the present application for those skilled in the art, the following description of the embodiments of the present application is provided as a specific example. It should be noted that the specific example is only to make the present application more clearly understood by those skilled in the art, but the embodiments of the present application are not limited to the specific example.
Referring to fig. 6, a flowchart of a distributed system resource allocation method according to an embodiment of the present application is provided. As shown in fig. 6, the method may include:
S601, the first job management server sends a resource application request to the resource scheduling server.
The resource application request includes a job output deadline and a job execution time.
S602, the resource scheduling server receives the resource application request and judges whether the resource preemption condition is met.
S603, the resource scheduling server determines that the resource preemption condition is met, traverses the resource allocation queue, determines a second job management server for returning the resource, and records the corresponding relation between the first job management server and the second job management server.
A plurality of second job management servers may be determined as the job management servers that return the resources.
S604, the resource scheduling server sends a resource preemption request to the second job management server.
The resource preemption request includes a job output deadline and a job execution time of the first job management server.
S605, the second job management server determines the resource return deadline.
The second job management server obtains the resource acquisition deadline of the job from the difference between the output deadline of the job and the execution time of the job, and uses the resource acquisition deadline of the job as the resource return deadline of the second job management server.
S606, the second job management server determines when to return the resources according to the resource return deadline and the current job execution progress of the second job management server.
The second job management server calculates the remaining completion time of the current job according to its current job execution progress. If the sum of the current time and the remaining completion time of the current job is not greater than the resource return deadline, the second job management server returns the resources to be returned to the resource scheduling server after the current job is finished. If the sum of the current time and the remaining completion time of the current job is greater than the resource return deadline, the second job management server backs up the current job, records the backup position, and returns the resources to be returned to the resource scheduling server after the backup is completed.
S607, the second job management server transmits a resource return request.
S608, the resource scheduling server allocates resources for the first job management server.
The resource allocation method provided by the embodiment of the application can ensure that the high-priority operation can obtain resources quickly, and simultaneously improves the resource utilization efficiency of the system to the greatest extent.
The following describes a device corresponding to the method provided by the embodiment of the present application.
Referring to fig. 7, a schematic diagram of a first job management server according to an embodiment of the present application is provided.
A first job management server apparatus 700 comprising:
a sending unit 701, configured to send a resource application request to a resource scheduling server, where the resource application request at least includes the output deadline of a job or the resource acquisition deadline of the job. The specific implementation of the sending unit 701 may be implemented with reference to S201 in the embodiment shown in fig. 2.
A receiving unit 702, configured to receive a resource allocation request sent by a resource scheduling server, and acquire a resource corresponding to the resource allocation request. The specific implementation of the receiving unit 702 can be implemented with reference to S202 in the embodiment shown in fig. 2.
Referring to fig. 8, a schematic diagram of a resource scheduling server according to an embodiment of the present application is provided.
A resource scheduling server 800, comprising:
a determining unit 801, configured to determine, if it is determined that the resource preemption condition is met, a first job management server that preempts the resource and a second job management server that returns the resource. The specific implementation of the determining unit 801 may be implemented with reference to S301 in the embodiment shown in fig. 3.
A sending unit 802, configured to send a resource preemption request to the second job management server that returns the resource, where the resource preemption request at least includes job execution information corresponding to the first job management server that preempts the resource. The specific implementation of the sending unit 802 may be implemented with reference to S302 in the embodiment shown in fig. 3.
An allocating unit 803, configured to receive a resource returning request sent by the second job management server that returns the resource, and allocate the resource corresponding to the resource returning request to the first job management server that preempts the resource. The specific implementation of the allocating unit 803 may be implemented with reference to S303 in the embodiment shown in fig. 3.
In some embodiments, the resource scheduling server further comprises:
a judging unit, configured to judge that the resource preemption condition is met when the priority of the received resource application request is determined to be high and the quantity of remaining available resources of the system is less than the quantity applied for in the resource application request.
In some embodiments, the job execution information corresponding to the first job management server sent by the sending unit includes the output deadline of the job and the execution time of the job, or the job execution information includes the resource acquisition deadline of the job.
In some embodiments, the resource application request of the first job management server received by the receiving unit includes the output deadline of the job, and the execution time of the job sent by the sending unit is determined by: acquiring historical data of the job, and calculating the execution time of the job from the historical data of the job; and the resource acquisition deadline of the job sent by the sending unit is determined by: obtaining the resource acquisition deadline of the job from the difference between the output deadline of the job and the execution time of the job.
Referring to fig. 9, a schematic diagram of a second job management server according to an embodiment of the present application is provided.
A second job management server 900, comprising:
a receiving unit 901, configured to receive a resource preemption request sent by a resource scheduling server, where the resource preemption request includes job execution information corresponding to a first job management server. The specific implementation of the receiving unit 901 can be implemented with reference to S501 in the embodiment shown in fig. 5.
A determining unit 902, configured to determine, according to the job execution information corresponding to the first job management server included in the resource preemption request, the resource that needs to be returned by the second job management server and the resource return deadline. The specific implementation of the determining unit 902 can be implemented with reference to S502 in the embodiment shown in fig. 5.
A resource returning unit 903, configured to return, according to the determined resource returning deadline and the current job execution progress of the second job management server, the resource that needs to be returned to the resource scheduling server before the resource returning deadline reaches. The specific implementation of the resource returning unit 903 may be implemented with reference to S503 in the embodiment shown in fig. 5.
In some embodiments, the job execution information corresponding to the first job management server received by the receiving unit includes the output deadline of the job and the execution time of the job, and the determining unit is configured to obtain the resource acquisition deadline of the job from the difference between the output deadline of the job and the execution time of the job, and use the resource acquisition deadline of the job as the resource return deadline of the second job management server.
In some embodiments, the resource returning unit is specifically configured to: calculate the remaining completion time of the current job according to the current job execution progress of the second job management server; and if the sum of the current time and the remaining completion time of the current job is not greater than the resource return deadline, return the resources to be returned to the resource scheduling server after the current job is finished.
In some embodiments, the resource returning unit is specifically configured to: calculate the remaining completion time of the current job according to the current job execution progress of the second job management server; and if the sum of the current time and the remaining completion time of the current job is greater than the resource return deadline, back up the current job, record the backup position, and return the resources to be returned to the resource scheduling server after the backup is completed.
In some embodiments, the second job management server further comprises:
an acquiring unit, configured to receive the resources re-allocated by the resource scheduling server, acquire the backup of the current job, and continue to execute the current job from the backup position.
Referring to fig. 10, a block diagram of an apparatus for resource allocation according to another embodiment of the present application is shown. The method comprises the following steps: at least one processor 1001 (e.g., CPU), memory 1002, and at least one communication bus 1003 for enabling communications among the devices. The processor 1001 is used to execute executable modules, such as computer programs, stored in the memory 1002. The Memory 1002 may include a Random Access Memory (RAM) and may also include a non-volatile Memory (non-volatile Memory), such as at least one disk Memory. One or more programs are stored in memory and configured to be executed by the one or more processors 1001 includes instructions for: receiving a resource preemption request sent by a resource scheduling server, wherein the resource preemption request comprises job execution information corresponding to a first job management server; determining resources required to be returned by the second job management server and resource returning deadline according to job execution information corresponding to the first job management server and included in the resource preemption request; and according to the determined resource returning deadline and the current job execution progress of the second job management server, returning the resources to be returned to the resource scheduling server before the resource returning deadline reaches.
In some embodiments, processor 1001 is specifically configured to execute the one or more programs including instructions to: and when the job execution information corresponding to the first job management server comprises the output deadline of the job and the execution time of the job, obtaining the resource acquisition deadline of the job according to the difference value between the output deadline of the job and the execution time of the job, and taking the resource acquisition deadline of the job as the resource return deadline of the second job management server.
In some embodiments, the processor 1001 is specifically configured to execute the one or more programs including instructions for: calculating the remaining completion time of the current job according to the current job execution progress of the second job management server; and if the sum of the current time and the remaining completion time of the current job is determined to be not greater than the resource return deadline, returning, by the second job management server, the resources to be returned to the resource scheduling server after the current job has finished executing.
In some embodiments, the processor 1001 is specifically configured to execute the one or more programs including instructions for: calculating the remaining completion time of the current job according to the current job execution progress of the second job management server; and if the sum of the current time and the remaining completion time of the current job is determined to be greater than the resource return deadline, backing up the current job, recording the backup position, and returning the resources to be returned to the resource scheduling server after the backup is completed.
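The two branches described in the preceding embodiments can be sketched as a single decision function. This is a minimal sketch under stated assumptions: the names `ReturnPlan`, `remaining_time`, and `choose_return_plan` are hypothetical, and the linear extrapolation used to estimate the remaining completion time is an assumption, since the application does not specify how the remaining time is derived from the execution progress.

```python
from datetime import datetime, timedelta
from enum import Enum


class ReturnPlan(Enum):
    FINISH_THEN_RETURN = "finish_then_return"  # job completes before the deadline
    BACKUP_THEN_RETURN = "backup_then_return"  # job must be backed up first


def remaining_time(elapsed: timedelta, progress: float) -> timedelta:
    """Estimate remaining completion time by linear extrapolation (assumption).

    progress is the completed fraction of the current job, 0 < progress <= 1.
    """
    if not 0 < progress <= 1:
        raise ValueError("progress must be in (0, 1]")
    return elapsed * (1 - progress) / progress


def choose_return_plan(now: datetime, remaining: timedelta,
                       return_deadline: datetime) -> ReturnPlan:
    """Compare (current time + remaining time) with the resource return deadline."""
    if now + remaining <= return_deadline:
        return ReturnPlan.FINISH_THEN_RETURN
    return ReturnPlan.BACKUP_THEN_RETURN


now = datetime(2017, 8, 24, 19, 0)
deadline = datetime(2017, 8, 24, 20, 0)
# Half done after 30 minutes: about 30 minutes remain, so the job can finish.
print(choose_return_plan(now, remaining_time(timedelta(minutes=30), 0.5), deadline))
# Half done after 2 hours: about 2 hours remain, so the job is backed up instead.
print(choose_return_plan(now, remaining_time(timedelta(hours=2), 0.5), deadline))
```

When the sum equals the deadline exactly, the sketch lets the job finish, matching the "not greater than" wording of the first branch.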
In some embodiments, the processor 1001 is specifically configured to execute the one or more programs including instructions for: receiving the resources reallocated by the resource scheduling server, obtaining the backup of the current job, and continuing to execute the current job from the backup position.
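To illustrate backing up a job, recording the backup position, and later continuing from that position, the following Python sketch stores the position in a JSON file. Everything here is hypothetical: the file layout, the helper names (`backup_job`, `load_backup_position`, `run_job`), and the idea that the backup position is an index into a list of records are assumptions chosen to keep the example small.

```python
import json
import os
import tempfile


def backup_job(path: str, job_id: str, position: int) -> None:
    """Persist the backup position (index of the next unprocessed record)."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"job_id": job_id, "position": position}, f)


def load_backup_position(path: str) -> int:
    """Read back the recorded backup position."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)["position"]


def run_job(records, start, stop_after=None):
    """Process records from index `start`; return (results, next position).

    stop_after simulates preemption after that many records are processed.
    """
    results = []
    position = start
    for record in records[start:]:
        if stop_after is not None and len(results) >= stop_after:
            break
        results.append(record.upper())  # stand-in for real work
        position += 1
    return results, position


records = ["a", "b", "c", "d"]
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "job-42.backup.json")
    first, pos = run_job(records, start=0, stop_after=2)  # preempted after 2 records
    backup_job(path, "job-42", pos)                       # back up, then return resources
    # Later, once resources are reallocated, continue from the backup position.
    rest, _ = run_job(records, start=load_backup_position(path))
    print(first + rest)  # ['A', 'B', 'C', 'D']
```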
Referring to fig. 11, a block diagram of an apparatus for resource allocation according to another embodiment of the present application is shown. The apparatus comprises: at least one processor 1101 (e.g., a CPU), a memory 1102, and at least one communication bus 1103 for enabling communication among these components. The processor 1101 is used to execute executable modules, such as computer programs, stored in the memory 1102. The memory 1102 may include a high-speed Random Access Memory (RAM) and may also include a non-volatile memory, such as at least one disk memory. One or more programs are stored in the memory 1102 and configured to be executed by the one or more processors 1101, the one or more programs including instructions for: if it is determined that the resource preemption condition is met, determining a first job management server for preempting the resource and a second job management server for returning the resource; sending a resource preemption request to the second job management server for returning the resource, wherein the resource preemption request at least comprises job execution information corresponding to the first job management server for preempting the resource; and receiving a resource returning request sent by the second job management server for returning the resource, and allocating the resources corresponding to the resource returning request to the first job management server for preempting the resource.
In some embodiments, the processor 1101 is specifically configured to execute the one or more programs including instructions for: when the priority of the received resource application request is determined to be high and the quantity of remaining available resources of the system is less than the quantity requested in the resource application request, determining that the resource preemption condition is met.
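The preemption condition above is a conjunction of two checks, which the following Python sketch makes explicit. The `ResourceRequest` type, its field names, and the numeric `HIGH_PRIORITY_THRESHOLD` are assumptions for illustration; the application only states that the priority is "high" without defining a scale.

```python
from dataclasses import dataclass

# Hypothetical cut-off: requests at or above this value count as high priority.
HIGH_PRIORITY_THRESHOLD = 10


@dataclass
class ResourceRequest:
    priority: int         # larger means more urgent (assumed convention)
    requested_units: int  # quantity of resources applied for


def preemption_condition_met(request: ResourceRequest, available_units: int,
                             threshold: int = HIGH_PRIORITY_THRESHOLD) -> bool:
    """True when the request is high priority AND the system cannot satisfy it."""
    is_high_priority = request.priority >= threshold
    is_short_of_resources = available_units < request.requested_units
    return is_high_priority and is_short_of_resources


print(preemption_condition_met(ResourceRequest(priority=12, requested_units=8), 5))  # True
print(preemption_condition_met(ResourceRequest(priority=12, requested_units=8), 8))  # False
print(preemption_condition_met(ResourceRequest(priority=3, requested_units=8), 5))   # False
```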
In some embodiments, the processor 1101 is specifically configured to execute the one or more programs including instructions for: acquiring historical data of the job, and calculating the execution time of the job according to the historical data of the job; and obtaining the resource acquisition deadline of the job from the difference between the output deadline of the job and the execution time of the job.
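One possible way to derive the execution time from historical data is sketched below in Python. The choice of the arithmetic mean of past run durations is an assumption, since the application does not say which statistic is used, and the names `estimate_execution_time` and `resource_acquisition_deadline` are hypothetical.

```python
from datetime import datetime, timedelta
from statistics import mean


def estimate_execution_time(history_seconds: list) -> timedelta:
    """Estimate execution time as the mean of past run durations (assumption)."""
    if not history_seconds:
        raise ValueError("no historical data for this job")
    return timedelta(seconds=mean(history_seconds))


def resource_acquisition_deadline(output_deadline: datetime,
                                  history_seconds: list) -> datetime:
    """Output deadline minus the execution time estimated from history."""
    return output_deadline - estimate_execution_time(history_seconds)


# Past runs took 1 h, 1.5 h and 2 h, so the estimate is 1.5 h.
history = [3600, 5400, 7200]
print(estimate_execution_time(history))                                    # 1:30:00
print(resource_acquisition_deadline(datetime(2017, 8, 24, 22, 0), history))  # 2017-08-24 20:30:00
```

A more conservative scheduler might prefer a high percentile or the maximum of past durations so that the job is less likely to miss its output deadline; that trade-off is outside what the application describes.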
Referring to fig. 12, a block diagram of an apparatus for resource allocation according to another embodiment of the present application is shown. The apparatus comprises: at least one processor 1201 (e.g., a CPU), a memory 1202, and at least one communication bus 1203 for enabling communication among these components. The processor 1201 is used to execute executable modules, such as computer programs, stored in the memory 1202. The memory 1202 may include a high-speed Random Access Memory (RAM) and may also include a non-volatile memory, such as at least one disk memory. One or more programs are stored in the memory 1202 and configured to be executed by the one or more processors 1201, the one or more programs including instructions for: sending a resource application request to a resource scheduling server, wherein the resource application request at least comprises the output deadline of the job or the resource acquisition deadline of the job; and receiving a resource allocation request sent by the resource scheduling server, and acquiring the resources corresponding to the resource allocation request.
In an exemplary embodiment, a non-transitory computer-readable storage medium comprising instructions, such as the memory 304 comprising instructions, executable by the processor 320 of the apparatus 300 to perform the above-described method is also provided. For example, the non-transitory computer readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
A machine-readable medium, which may be, for example, a non-transitory computer-readable storage medium, stores instructions that, when executed by a processor of an apparatus (a terminal or a server), enable the apparatus to perform a distributed system resource allocation method as shown in fig. 1, the method comprising: receiving a resource preemption request sent by a resource scheduling server, wherein the resource preemption request comprises job execution information corresponding to a first job management server; determining the resources to be returned by the second job management server and a resource return deadline according to the job execution information corresponding to the first job management server; and returning the resources to be returned to the resource scheduling server before the resource return deadline arrives, according to the determined resources to be returned, the resource return deadline, and the current job execution progress of the second job management server.
The arrangement of each unit or module of the apparatus of the present application can be implemented by referring to the methods shown in fig. 2 to 6, which are not described herein again.
Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice in the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the application being indicated by the following claims.
It will be understood that the present application is not limited to the precise arrangements described above and shown in the drawings, and that various modifications and changes may be made without departing from the scope thereof. The scope of the application is limited only by the attached claims.
It is noted that, herein, relational terms such as first and second are used solely to distinguish one entity or action from another entity or action, without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element. The application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the apparatus embodiment, since it is substantially similar to the method embodiment, it is relatively simple to describe, and reference may be made to some descriptions of the method embodiment for relevant points. The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort. The foregoing is directed to embodiments of the present application and it is noted that numerous modifications and adaptations may be made by those skilled in the art without departing from the principles of the present application and are intended to be within the scope of the present application.

Claims (18)

1. A distributed resource allocation system, comprising a resource scheduling server, a first job management server, and a second job management server, wherein:
the first job management server is used for sending a resource application request to the resource scheduling server, wherein the resource application request at least comprises job execution information of the first job management server;
the resource scheduling server is used for determining a first job management server for preempting the resource and a second job management server for returning the resource when judging that the resource preemption condition is met; sending a resource preemption request to the second job management server for returning the resource, wherein the resource preemption request at least comprises job execution information corresponding to the first job management server for preempting the resource; receiving a resource returning request sent by the second job management server for returning the resource, and allocating the resource corresponding to the resource returning request to the first job management server for preempting the resource;
the second job management server is used for receiving the resource preemption request sent by the resource scheduling server, and determining the resources to be returned by the second job management server and a resource return deadline according to the job execution information corresponding to the first job management server and included in the resource preemption request; and returning the resources to be returned to the resource scheduling server before the resource return deadline arrives, according to the determined resources to be returned, the resource return deadline, and the current job execution progress of the second job management server.
2. A distributed system resource allocation method is applied to a second job management server and comprises the following steps:
receiving a resource preemption request sent by a resource scheduling server, wherein the resource preemption request comprises job execution information corresponding to a first job management server;
determining the resources to be returned by the second job management server and a resource return deadline according to the job execution information corresponding to the first job management server;
and returning the resources to be returned to the resource scheduling server before the resource return deadline arrives, according to the determined resources to be returned, the resource return deadline, and the current job execution progress of the second job management server.
3. The method according to claim 2, wherein the job execution information corresponding to the first job management server includes an output deadline of the job and an execution time of the job, and determining the resources to be returned by the second job management server and the resource return deadline according to the job execution information corresponding to the first job management server included in the resource preemption request includes:
obtaining the resource acquisition deadline of the job from the difference between the output deadline of the job and the execution time of the job, and taking the resource acquisition deadline of the job as the resource return deadline of the second job management server.
4. The method of claim 2, wherein returning the resources to be returned to the resource scheduling server before the resource return deadline arrives according to the determined resource return deadline and a current job execution progress of the second job management server comprises:
calculating the remaining completion time of the current job according to the current job execution progress of the second job management server;
and if the sum of the current time and the remaining completion time of the current job is determined to be not greater than the resource return deadline, returning, by the second job management server, the resources to be returned to the resource scheduling server after the current job has finished executing.
5. The method of claim 2, wherein returning the resources to be returned to the resource scheduling server before the resource return deadline arrives according to the determined resource return deadline and a current job execution progress of the second job management server comprises:
calculating the remaining completion time of the current job according to the current job execution progress of the second job management server;
and if the sum of the current time and the remaining completion time of the current job is determined to be greater than the resource return deadline, backing up, by the second job management server, the current job, recording the backup position, and returning the resources to be returned to the resource scheduling server after the backup is completed.
6. The method of claim 5, further comprising:
receiving the resources reallocated by the resource scheduling server, obtaining the backup of the current job, and continuing to execute the current job from the backup position.
7. A resource allocation method of a distributed system is applied to a resource scheduling server, and comprises the following steps:
if the resource preemption condition is judged to be met, determining a first job management server for preempting the resource and a second job management server for returning the resource;
sending a resource preemption request to a second job management server for returning the resource, wherein the resource preemption request at least comprises job execution information corresponding to a first job management server for preempting the resource;
and receiving a resource returning request sent by the second job management server for returning the resources, and allocating the resources corresponding to the resource returning request to the first job management server for preempting the resources.
8. The method of claim 7, wherein determining that a resource preemption condition is met comprises:
when the priority of the received resource application request is determined to be high and the quantity of remaining available resources of the system is less than the quantity requested in the resource application request, determining that the resource preemption condition is met.
9. The method according to claim 7 or 8, wherein the job execution information corresponding to the first job management server includes an output deadline of the job and an execution time of the job, or the job execution information includes a resource acquisition deadline of the job.
10. The method of claim 9, wherein the resource application request from the first job management server includes the output deadline of the job, and wherein the execution time of the job is determined by:
acquiring historical data of the job, and calculating the execution time of the job according to the historical data of the job;
the resource acquisition deadline of the job is determined by:
obtaining the resource acquisition deadline of the job from the difference between the output deadline of the job and the execution time of the job.
11. A distributed system resource allocation method is applied to a first job management server and comprises the following steps:
sending a resource application request to a resource scheduling server, wherein the resource application request at least comprises the output deadline of the job or the resource acquisition deadline of the job;
receiving a resource allocation request sent by a resource scheduling server, and acquiring resources corresponding to the resource allocation request.
12. A second job management server, comprising:
a receiving unit, configured to receive a resource preemption request sent by a resource scheduling server, where the resource preemption request includes job execution information corresponding to a first job management server;
a determining unit, configured to determine, according to the job execution information corresponding to the first job management server included in the resource preemption request, a resource to be returned by the second job management server and a resource return deadline;
and a resource returning unit, configured to return the resources to be returned to the resource scheduling server before the resource return deadline arrives, according to the determined resource return deadline and the current job execution progress of the second job management server.
13. A resource scheduling server, comprising:
the determining unit is used for determining a first job management server for preempting the resource and a second job management server for returning the resource if the resource preemption condition is judged to be met;
a sending unit, configured to send a resource preemption request to a second job management server that returns the resource, where the resource preemption request at least includes job execution information corresponding to a first job management server that preempts the resource;
and the allocation unit is used for receiving the resource returning request sent by the second job management server for returning the resource and allocating the resource corresponding to the resource returning request to the first job management server for preempting the resource.
14. A first job management server, comprising:
a sending unit, configured to send a resource application request to a resource scheduling server, where the resource application request at least includes an output deadline of a job or a resource acquisition deadline of the job;
and the receiving unit is used for receiving the resource allocation request sent by the resource scheduling server and acquiring the resource corresponding to the resource allocation request.
15. An apparatus for resource allocation comprising a memory, and one or more programs, wherein the one or more programs are stored in the memory, and wherein execution of the one or more programs by one or more processors comprises instructions for:
receiving a resource preemption request sent by a resource scheduling server, wherein the resource preemption request comprises job execution information corresponding to a first job management server;
determining the resources to be returned by the second job management server and a resource return deadline according to the job execution information corresponding to the first job management server and included in the resource preemption request;
and returning the resources to be returned to the resource scheduling server before the resource return deadline arrives, according to the determined resource return deadline and the current job execution progress of the second job management server.
16. An apparatus for resource allocation comprising a memory, and one or more programs, wherein the one or more programs are stored in the memory, and wherein execution of the one or more programs by one or more processors comprises instructions for:
if the resource preemption condition is judged to be met, determining a first job management server for preempting the resource and a second job management server for returning the resource;
sending a resource preemption request to a second job management server for returning the resource, wherein the resource preemption request at least comprises job execution information corresponding to a first job management server for preempting the resource;
and receiving a resource returning request sent by the second job management server for returning the resources, and allocating the resources corresponding to the resource returning request to the first job management server for preempting the resources.
17. An apparatus for resource allocation comprising a memory, and one or more programs, wherein the one or more programs are stored in the memory, and wherein execution of the one or more programs by one or more processors comprises instructions for:
sending a resource application request to a resource scheduling server, wherein the resource application request at least comprises the output deadline of the job or the resource acquisition deadline of the job;
receiving a resource allocation request sent by a resource scheduling server, and acquiring resources corresponding to the resource allocation request.
18. A machine-readable medium having stored thereon instructions, which when executed by one or more processors, cause an apparatus to perform a resource allocation method as recited in one or more of claims 2-6.
CN201710737516.8A 2017-08-24 2017-08-24 Distributed system resource allocation method, device and system Active CN109428912B (en)

Priority Applications (5)

Application Number Priority Date Filing Date Title
CN201710737516.8A CN109428912B (en) 2017-08-24 2017-08-24 Distributed system resource allocation method, device and system
EP18848279.8A EP3675434B1 (en) 2017-08-24 2018-08-15 Distributed system resource allocation method, device and system
PCT/CN2018/100579 WO2019037626A1 (en) 2017-08-24 2018-08-15 Distributed system resource allocation method, device and system
JP2020508488A JP2020531967A (en) 2017-08-24 2018-08-15 Distributed system Resource allocation methods, equipment, and systems
US16/799,616 US11372678B2 (en) 2017-08-24 2020-02-24 Distributed system resource allocation method, apparatus, and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710737516.8A CN109428912B (en) 2017-08-24 2017-08-24 Distributed system resource allocation method, device and system

Publications (2)

Publication Number Publication Date
CN109428912A CN109428912A (en) 2019-03-05
CN109428912B true CN109428912B (en) 2020-07-10

Family

ID=65438390

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710737516.8A Active CN109428912B (en) 2017-08-24 2017-08-24 Distributed system resource allocation method, device and system

Country Status (5)

Country Link
US (1) US11372678B2 (en)
EP (1) EP3675434B1 (en)
JP (1) JP2020531967A (en)
CN (1) CN109428912B (en)
WO (1) WO2019037626A1 (en)

Also Published As

Publication number Publication date
CN109428912A (en) 2019-03-05
WO2019037626A1 (en) 2019-02-28
US11372678B2 (en) 2022-06-28
EP3675434A1 (en) 2020-07-01
EP3675434B1 (en) 2023-10-04
US20200192711A1 (en) 2020-06-18
EP3675434A4 (en) 2020-09-02
JP2020531967A (en) 2020-11-05

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant