Parallel task scheduling method and system for super computing center
Technical Field
The invention belongs to the technical field of high-performance computing of computers, and particularly relates to a parallel task scheduling method and system for a super computing center.
Background
At present, high-performance computing research using computing resources of a supercomputer has gained great popularity in China.
However, most supercomputing centers currently suffer from several non-negligible problems in their task scheduling strategies. First, because task scheduling is inadequate, jobs queue for too long and scheduling efficiency is low. Second, because the price of using a supercomputer varies from place to place, a job that requires a large number of processors to calculate may have to be completed at a high price, which greatly increases cost. Third, because the scheduling strategy does not employ an effective load balancing policy, tasks cannot be efficiently dispatched to an idle queue among the queues capable of serving them, so lightly loaded queues sit idle while heavily loaded queues run at full capacity, causing serious load imbalance and thus a severe scheduling performance bottleneck.
Disclosure of Invention
Aiming at the above defects or improvement demands of the prior art, the invention provides a parallel task scheduling method and system for a supercomputing center, which aim to solve three technical problems of the scheduling strategies used by existing supercomputing centers: overlong job queuing time and low scheduling efficiency caused by inadequate task scheduling; greatly increased cost because jobs requiring large numbers of processors must be completed at the differing prices of the supercomputing centers; and the serious scheduling performance bottleneck formed because no effective load balancing strategy is used.
To achieve the above object, according to one aspect of the present invention, there is provided a parallel task scheduling method for a supercomputing center, which is applied to a client, the method comprising the steps of:
(1) Acquiring a text file from a user, wherein the text file records job information to be scheduled, schedulable queue information and server computing capability information;
(2) Preprocessing the obtained text file to obtain a preprocessed text file;
(3) Normalizing all the processing frequencies of the CPU of the server in the computing capacity information of the server, and updating the computing capacity information of the server by using the normalized processing frequencies of the CPU of the server;
(4) Screening the scheduling queue names in the schedulable queue information according to the job information to be scheduled to obtain a screened scheduling queue set;
(5) Calculating the use price of each scheduling queue in the scheduling queue set screened in step (4) according to the CPU core number required by the operation of the job in the job information to be scheduled;
(6) Setting a counter i=1;
(7) Judging whether i is larger than the total number of jobs corresponding to the job names in the job information to be scheduled; if so, turning to step (11), otherwise, turning to step (8);
(8) Selecting a scheduling queue corresponding to the minimum use price from the standard use prices of the plurality of scheduling queues obtained in the step (5), and scheduling the ith job corresponding to the job name in the job information to be scheduled to the scheduling queue corresponding to the minimum use price for execution;
(9) After the corresponding scheduling queue has finished executing the ith job, updating, in the schedulable queue information, the predicted number of CPU cores occupied by running jobs on that scheduling queue;
(10) Setting i=i+1, and returning to step (7).
(11) And storing the number of each job which is already executed, the job name corresponding to the job in the job information to be scheduled, the job global ID corresponding to the job in the job information to be scheduled, the service end name corresponding to the scheduling queue of the service end executing the job in the schedulable queue information, and the scheduling queue name.
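As an illustration only, the loop of steps (6) to (11) can be sketched in Python as follows; `price_of`, the dictionary field names, and the data layout are assumptions made for the example, not the invention's actual implementation:

```python
def schedule_jobs(jobs, queues, price_of):
    """Greedy minimum-price-first loop of steps (6)-(11): each job is
    dispatched to the scheduling queue with the lowest standard use
    price, and the queue's predicted occupied CPU cores are updated
    after the job completes (step (9))."""
    schedule = []
    for i, job in enumerate(jobs, start=1):
        # Step (8): pick the queue with the minimum standard use price.
        best = min(queues, key=lambda q: price_of(job, q))
        schedule.append((i, job["name"], best["queue_name"]))
        # Step (9): subtract the cores the job used from the queue's
        # predicted occupied core count.
        best["predicted_occupied_cores"] -= job["cpu_count"]
    return schedule
```

A usage example would supply `price_of` as the standard-price formula of step (5) and record the returned (number, job name, queue name) triples as in step (11).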
Preferably, the job information to be scheduled includes a job global ID, a job name, a software name required for job execution, a software version required for job execution, an estimated job execution completion time, and a CPU core number required for job execution.
Preferably, the schedulable queue information includes the service end name to which the schedulable queue belongs, the scheduling queue name, the maximum/minimum CPU core number provided by each scheduling queue for job running, the maximum time limit of each scheduling queue on job running, the software names contained in each scheduling queue, the software versions contained in each scheduling queue, and the per-unit-time use cost of each scheduling queue.
Preferably, the server computing capability information includes a server name, a server available dispatch queue name, and a server CPU processing frequency.
Preferably, step (4) specifically searches, for each job in the job information to be scheduled, for scheduling queues that satisfy the following four conditions: the software names contained in the scheduling queue include the software name required for the job to run; the software versions contained in the scheduling queue include the software version required for the job to run; the maximum/minimum CPU core number range provided by the scheduling queue for job running contains the CPU core number required for the job to run; and the maximum time limit of the scheduling queue on job running covers the estimated job completion time. The scheduling queues meeting these four conditions together form the screened scheduling queue set.
Preferably, step (5) specifically comprises: for each scheduling queue in the scheduling queue set screened in step (4), first querying, in the schedulable queue information, the predicted number of CPU cores occupied by running jobs on that queue; then querying the service end corresponding to the queue in the schedulable queue information; and finally multiplying the time required for the job to run by the number of CPU cores required for the job to run, by the per-unit-time use cost (hpcPrice) of the scheduling queue, and by the normalized CPU processing frequency of the corresponding service end in the server computing capability information updated in step (3), thereby obtaining the standard use price of the scheduling queue.
Preferably, step (9) updates the predicted number of CPU cores occupied by running jobs on the scheduling queue in the schedulable queue information by subtracting, from the original value, the number of CPU cores the scheduling queue used to execute the ith job.
Preferably, the numbers of the executed jobs in step (11) are arranged in the order in which the jobs were executed on the scheduling queues.
According to another aspect of the present invention, there is provided a parallel task scheduling system for a supercomputing center, which is provided in a client, the parallel task scheduling system comprising:
the first module is used for acquiring a text file from a user, wherein the text file records job information to be scheduled, schedulable queue information and server computing capability information;
the second module is used for preprocessing the obtained text file to obtain a preprocessed text file;
the third module is used for carrying out normalization processing on all the processing frequencies of the CPU of the server in the computing capacity information of the server, and updating the computing capacity information of the server by using the normalized processing frequencies of the CPU of the server;
a fourth module, configured to screen a scheduling queue name in the schedulable queue information according to the job information to be scheduled, so as to obtain a screened scheduling queue set;
a fifth module for calculating the use price of each scheduling queue in the scheduling queue set screened by the fourth module according to the CPU core number required by the operation of the job in the job information to be scheduled;
a sixth module for setting a counter i=1;
a seventh module, configured to determine whether i is greater than the total number of jobs corresponding to the job names in the job information to be scheduled, and if yes, go to the eleventh module, otherwise go to the eighth module;
an eighth module, configured to select a scheduling queue corresponding to the minimum use price from the standard use prices of the plurality of scheduling queues obtained by the fifth module, and schedule the ith job corresponding to the job name in the job information to be scheduled to the scheduling queue corresponding to the minimum use price for execution;
a ninth module, configured to update, in the schedulable queue information after the corresponding scheduling queue has finished executing the ith job, the predicted number of CPU cores occupied by running jobs on that scheduling queue;
a tenth module, configured to set i=i+1, and return to the seventh module;
an eleventh module, configured to store the number of each job that has been executed, a job name corresponding to the job in the job information to be scheduled, a job global ID corresponding to the job in the job information to be scheduled, a service end name corresponding to the scheduling queue of the service end that executes the job in the schedulable queue information, and a scheduling queue name.
In general, the above technical solutions conceived by the present invention, compared with the prior art, enable the following beneficial effects to be obtained:
(1) The invention adopts steps (1) to (11), which use a minimum-price-first scheduling strategy based on the supercomputing center processors to execute scheduling efficiently, and select, via the standard price formula, the lowest-priced queue to execute each job; this solves the technical problems of overlong job queuing time and low scheduling efficiency caused by the inadequate task scheduling of the strategies used by existing supercomputing centers;
(2) The invention adopts steps (4) and (5), which effectively screen out the schedulable queues and calculate their standard prices, so that a job requiring large-scale processing can be accurately scheduled to the lowest-priced scheduling queue for execution; this solves the technical problem that, under the scheduling strategies used by existing supercomputing centers, jobs requiring large numbers of processors cannot be efficiently scheduled to the lowest-priced scheduling queue, which greatly increases cost;
(3) The invention adopts steps (5) to (11) and uses a load-balancing scheduling strategy, so that load balance among the service ends is well maintained; this solves the technical problem that existing supercomputing centers, which do not use an effective load balancing strategy, suffer serious load imbalance and form a serious scheduling performance bottleneck.
Drawings
FIG. 1 is a flow chart of a parallel task scheduling method for a supercomputer center of the present invention;
FIG. 2 is a comparison of the performance of the method of the present invention in terms of average job overhead with the scheduling policy used by existing supercomputer centers.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention. In addition, the technical features of the embodiments of the present invention described below may be combined with each other as long as they do not collide with each other.
The basic idea of the invention is to make the final task-to-processor mapping decision with a calculation method that prioritizes scheduling tasks by the lowest price of the supercomputing center processors. The method stores all parsed data separately, calculates the use price of each job to be scheduled on each queue, sorts the resulting (queue, price) pairs, and gives scheduling priority to the pair with the lowest use price; after a job is scheduled onto the corresponding queue, the resource information is updated so that the resource data of each queue remains accurate when the remaining jobs are scheduled. By executing this scheme, higher performance and a better load balancing effect are achieved, and cost is reduced.
As shown in fig. 1, the present invention provides a parallel task scheduling method for a supercomputing center, comprising the following steps:
(1) The method comprises the steps that a client acquires a text file from a user, wherein Job to be scheduled (Job) information, schedulable Queue (Queue) information and server computing capability information are recorded in the text file;
Specifically, job information to be scheduled includes a job global ID (Jobgid), a job name (Username), a software name required for job execution (ApplicationName), a software version required for job execution (ApplicationVersion), an estimated job execution completion time (Walltime), and a CPU core number required for job execution (CPUcount).
As shown in table 1 below, which shows an example of job information to be scheduled:
TABLE 1
The schedulable queue information includes the service end name (Servername) to which the schedulable queue belongs, the scheduling queue name (Queuename), the maximum/minimum CPU core number (Max/MinCPUcount) provided by each scheduling queue for job running, the maximum time limit (WalltimeLimit) of each scheduling queue on job running, the software names (ApplicationNames) contained in each scheduling queue, the software versions (ApplicationVersions) contained in each scheduling queue, and the per-unit-time use cost (hpcPrice) of each scheduling queue.
As shown in table 2 below, which illustrates an example of schedulable queue information:
TABLE 2
The server computing capability information includes a server name (Servername), a server available dispatch queue name (queue names), and a server CPU processing Frequency (Frequency).
As shown in table 3 below, which shows an example of server-side computing capability information:
Service end name 1 | Dispatch queue name 1 | CPU processing frequency 1
Service end name 2 | Dispatch queue name 2 | CPU processing frequency 2
…… | …… | ……
Service end name n | Dispatch queue name n | CPU processing frequency n
TABLE 3
(2) The client preprocesses the obtained text file to obtain a preprocessed text file;
Specifically, preprocessing the text file means removing the redundant symbols (such as brackets, double quotation marks, colons, etc.) contained in the file.
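As an illustration, assuming the redundant symbols are exactly brackets, braces, parentheses, double quotation marks, and colons (the precise symbol set is not fixed by the text), the preprocessing of step (2) could be sketched as:

```python
import re

def preprocess(text: str) -> str:
    """Strip redundant symbols (brackets, braces, parentheses, double
    quotation marks, colons) from the raw text file, per step (2).
    The exact symbol set is an assumption for illustration."""
    return re.sub(r'[\[\]{}()":]', '', text)
```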
(3) The client normalizes all the processing frequencies (frequencies) of the CPU of the server in the computing capacity information of the server, and updates the computing capacity information of the server by using the normalized processing frequencies of the CPU of the server;
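Step (3) does not fix a particular normalization formula; the sketch below assumes simple max-normalization, so that the fastest server's frequency maps to 1:

```python
def normalize_frequencies(freqs):
    """Normalize server CPU processing frequencies to (0, 1] by dividing
    each by the maximum frequency -- one plausible reading of step (3)."""
    fmax = max(freqs)
    return [f / fmax for f in freqs]
```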
(4) The client screens the scheduling queue names in the schedulable queue information according to the job information to be scheduled to obtain a screened scheduling queue set;
Specifically, for each job in the job information to be scheduled, this step searches for scheduling queues that satisfy the following four conditions: the software names (ApplicationNames) contained in the scheduling queue include the software name required for the job to run; the software versions (ApplicationVersions) contained in the scheduling queue include the software version (ApplicationVersion) required for the job to run; the maximum/minimum CPU core number range (Max/MinCPUcount) provided by the scheduling queue for job running contains the CPU core number (CPUcount) required for the job to run; and the maximum time limit (WalltimeLimit) of the scheduling queue on job running covers the estimated job completion time (Walltime). The scheduling queues meeting these four conditions together form the screened scheduling queue set.
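The four screening conditions can be sketched as a single filter; the field names (`app_name`, `min_cpu`, and so on) are illustrative, not the invention's actual data layout:

```python
def screen_queues(job, queues):
    """Return the queues satisfying the four conditions of step (4):
    matching software name and version, a CPU-core range that contains
    the job's requested cores, and a walltime limit that covers the
    job's estimated walltime. Field names are assumptions."""
    return [
        q for q in queues
        if job["app_name"] in q["app_names"]
        and job["app_version"] in q["app_versions"]
        and q["min_cpu"] <= job["cpu_count"] <= q["max_cpu"]
        and job["walltime"] <= q["walltime_limit"]
    ]
```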
(5) The client calculates the use price of each scheduling queue in the scheduling queue set screened in the step (4) according to the CPU core number (CPU count) required by the operation of the job in the job information to be scheduled,
Specifically, this step first queries, for each scheduling queue in the screened scheduling queue set, the predicted number of CPU cores occupied by running jobs on that queue in the schedulable queue information; then queries the service end corresponding to the queue in the schedulable queue information; and finally multiplies the CPU core number required for the job to run (CPUcount) by the time required for the job to run (Walltime), by the per-unit-time use cost (hpcPrice) of the scheduling queue, and by the normalized CPU processing frequency of the corresponding service end in the server computing capability information updated in step (3), thereby obtaining the standard use price of the scheduling queue.
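Under this reading, the standard use price reduces to a single product of four factors; the helper below is a sketch of that formula with illustrative argument names:

```python
def standard_price(walltime, cpu_count, hpc_price, norm_frequency):
    """Standard use price of a queue for a job, per step (5):
    estimated walltime x required CPU cores x per-unit-time use cost
    (hpcPrice) x the normalized CPU frequency of the queue's server."""
    return walltime * cpu_count * hpc_price * norm_frequency
```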
(6) The client sets a counter i=1;
(7) The client judges whether i is larger than the total number of jobs corresponding to the job names in the job information to be scheduled; if yes, the method proceeds to step (11), otherwise, to step (8);
(8) The client selects a scheduling queue corresponding to the minimum use price from the standard use prices of the plurality of scheduling queues obtained in the step (5), and schedules the ith job corresponding to the job name in the job information to be scheduled to the scheduling queue corresponding to the minimum use price for execution;
(9) After the corresponding scheduling queue has finished executing the ith job, the client updates, in the schedulable queue information, the predicted number of CPU cores occupied by running jobs on the scheduling queue (namely, subtracting the number of CPU cores the scheduling queue used to execute the ith job from the original value);
(10) The client sets i=i+1, and returns to step (7).
(11) The client saves the number of each job that has been executed (arranged in the order in which the jobs were executed on the scheduling queues), the job name corresponding to the job in the job information to be scheduled, the job global ID corresponding to the job in the job information to be scheduled, the service end name corresponding to the scheduling queue of the service end executing the job in the schedulable queue information, and the scheduling queue name.
Performance testing
The present invention is compared to the existing scheduling algorithm (min-min algorithm) by calculating the average job overhead.
As shown in fig. 2, the abscissa indicates the time at which a job is submitted and the ordinate indicates the calculated price of a scheduled task. The price (Cost) is calculated as the elapsed time multiplied by the number of CPU cores applied and by the unit-time price, where the elapsed time is [user job end time − actual start time]; the lower the calculated price, the higher the job's scheduling priority. As is evident from fig. 2, the average job overhead of the method of the present invention (shown in the figure as the ACFS algorithm, in full Application Cost First Scheduling, i.e., an application-aware price-first scheduling algorithm) is lower than that of the existing min-min algorithm, since the method of the present invention always schedules tasks to the lowest-priced scheduling queue to ensure low-overhead execution of the overall calculation.
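The cost metric used in this comparison can be written as a small helper (a sketch with illustrative names):

```python
def job_cost(end_time, start_time, cpu_cores, unit_price):
    """Average-overhead cost metric of the performance test: elapsed
    time (job end minus actual start) times the number of CPU cores
    applied, times the per-core unit-time price."""
    return (end_time - start_time) * cpu_cores * unit_price
```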
In order to ensure low-overhead scheduling and load balancing among processors, a calculation method that schedules tasks by the lowest standard price of a queue is adopted to make the final task-to-processor mapping decision. The method stores all parsed data separately, calculates the standard use price of each job to be scheduled on each queue, sorts the resulting (queue, price) pairs, gives scheduling priority to the pair with the lowest standard use price, and updates the resource information after each job is scheduled onto the corresponding queue, so as to keep the resource data of each queue accurate while scheduling the remaining jobs. By executing this scheme, higher performance and a better load balancing effect are achieved, and cost is reduced.
The invention provides a parallel task scheduling method based on the minimum-price priority scheduling of the super computing center processor, which plays a key role in maintaining the load balancing performance and reducing the cost, and also improves the overall parallel efficiency.
It will be readily appreciated by those skilled in the art that the foregoing description is merely a preferred embodiment of the invention and is not intended to limit the invention, but any modifications, equivalents, improvements or alternatives falling within the spirit and principles of the invention are intended to be included within the scope of the invention.