CN114780213A - Resource scheduling method, system and storage medium for high-performance computing cloud platform - Google Patents

Resource scheduling method, system and storage medium for high-performance computing cloud platform

Info

Publication number
CN114780213A
Authority
CN
China
Prior art keywords
application
instance
computing
performance
types
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210297756.1A
Other languages
Chinese (zh)
Inventor
冯建新
李青松
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Beikun Cloud Computing Co ltd
Original Assignee
Shenzhen Beikun Cloud Computing Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Beikun Cloud Computing Co ltd filed Critical Shenzhen Beikun Cloud Computing Co ltd
Priority to CN202210297756.1A priority Critical patent/CN114780213A/en
Publication of CN114780213A publication Critical patent/CN114780213A/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806 Task transfer initiation or dispatching
    • G06F9/4843 Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881 Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/5038 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the execution order of a plurality of tasks, e.g. taking priority or time dependency constraints into consideration
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5061 Partitioning or combining of resources
    • G06F9/5072 Grid computing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00 Indexing scheme relating to G06F9/00
    • G06F2209/50 Indexing scheme relating to G06F9/50
    • G06F2209/5011 Pool
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00 Indexing scheme relating to G06F9/00
    • G06F2209/50 Indexing scheme relating to G06F9/50
    • G06F2209/5021 Priority

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a resource scheduling method, system and storage medium for a high-performance computing cloud platform. The method acquires the running characteristic data of the application that a user designates for computing when submitting a computing task on the high-performance computing cloud platform, together with the hardware-related parameters of the resource specification designated when the job is submitted; screens out the instance types that meet the conditions from a database containing the instance types of each region of each cloud vendor, according to the acquired running characteristic data and hardware-related parameters; sorts the instance types to obtain a sorted instance type list; and schedules resources from the cloud vendors according to the screened and sorted instance type list. Because instance types are selected automatically by the scheduling method, the computing power of cloud vendors worldwide serves as the resource pool and the optimal computing resources are selected according to the running characteristics. This improves the flexibility of resource selection, effectively avoids insufficient instance type inventory, and ensures the success rate of resource scheduling.

Description

Resource scheduling method, system and storage medium for high-performance computing cloud platform
Technical Field
The invention relates to the field of high-performance computing, in particular to a resource scheduling method, a resource scheduling system and a storage medium for a high-performance computing cloud platform.
Background
Conventional high-performance computing (HPC) resource scheduling methods use the concept of a partition or a queue, hereinafter collectively called a queue. A queue essentially describes a set of nodes with the same or similar performance characteristics and is used to specify the computing resource specification when a job is submitted. Cloud HPC follows this naming convention and uses a queue to represent a set of instance types with the same hardware specification in a specific region of a specific cloud vendor. A queue must be specified when a job is submitted, which in effect specifies the computing resources the job requires: the cloud vendor, the region and the instance type to use.
A queue usually contains the following information: resource type (CPU or GPU), number of cores or cards, memory size, cloud vendor, region and instance type.
Currently, an HPC job scheduling system consists mainly of a job management submodule, which is responsible for submitting and managing jobs, and a resource management submodule, which is responsible for allocating computing resources to jobs. The queue configuration, including the cloud vendor, region and instance type described above, is maintained in the resource management submodule. When submitting a job through the job management submodule, the user must specify the queue the job uses. The job management submodule passes the queue to the resource management submodule, which uses the queue configuration to create an elastic computing cluster of the specified instance type at the specified cloud vendor and region. This system has the following disadvantages:
1. It cannot be extended to other instance types. The mapping between queue and instance type is fixed when the cluster is configured and cannot be adjusted dynamically at run time. When the instance type is out of stock, creation of the computing cluster fails, and the job can run on another instance type only by using a different queue.
2. It cannot be extended to other regions. Because queues and regions are tightly coupled in the existing scheme, the regional limitation of this scheduling mode is exposed when the queue and the computing resources the job requires are not in the same region, or when the region where the queue is located has insufficient resources. Computing resources then cannot be fully scheduled, which reduces computing efficiency.
3. It cannot be extended to other cloud vendors. As with the regional limitation, binding a queue to a cloud vendor restricts submitted jobs to that vendor's resources. Without this restriction, the richer pool of resources would, to some extent, relieve job queuing or failures caused by resource shortage.
4. The best cost performance cannot be obtained. Instance types offered by different cloud vendors and regions differ considerably in hardware specification, performance and price. If job scheduling were decoupled from the cloud vendor, factors such as hardware configuration and price could be weighed together, and selecting the most suitable computing resources would improve computing efficiency and cost performance.
Disclosure of Invention
The main object of the invention is to provide a resource scheduling method, system and storage medium for a high-performance computing cloud platform that do not use the concept of queues and do not require instance types to be statically configured in advance. Instead, based on the resource specification and task quantity information specified when the job is submitted, the method weighs factors such as resource richness, resource cost performance and application running characteristics, and selects instance types automatically. The computing power of cloud vendors worldwide thus serves as the resource pool, and the optimal computing resources are selected according to the characteristics of the job.
In order to achieve the above object, the present invention provides a resource scheduling method for a high performance computing cloud platform, the method including the following steps:
acquiring the running characteristic data of the application that a user designates for computing when submitting a computing task on the high-performance computing cloud platform, and the hardware-related parameters of the resource specification designated when the job is submitted;
screening out the instance types that meet the conditions from a database containing the instance types of each region of each cloud vendor, according to the acquired running characteristic data of the application and the hardware-related parameters;
sorting the instance types to obtain a sorted instance type list;
and scheduling resources from the cloud vendors according to the screened and sorted instance type list.
The step of acquiring the running characteristic data of the application that the user designates for computing when submitting a computing task on the high-performance computing cloud platform comprises:
acquiring the application ID that the user designates for computing when submitting the computing task on the high-performance computing cloud platform;
querying a database containing application running characteristics according to the application ID to obtain JSON data containing the application information and the application running characteristics, wherein the application running characteristics comprise at least one of the following: single or double precision, computing coupling, high clock frequency, large memory, network I/O, disk I/O, CPU instruction set.
The step of screening out the instance types that meet the conditions from the database containing the instance types of each region of each cloud vendor, according to the acquired running characteristic data of the application and the hardware-related parameters, comprises:
querying a database containing instance type information according to the resource specification in the acquired hardware-related parameters, to obtain a list of instance types that meet the conditions;
and further screening the queried instance type list according to the acquired running characteristic data of the application, to obtain an available instance type list.
The step of sorting the instance types to obtain a sorted instance type list comprises:
calculating the weight of each index of the instance types in the available instance type list according to the running characteristics of the application;
calculating a comprehensive score for each instance type according to the index weights and the rank of each of its indexes within the instance type list;
and sorting by the comprehensive scores of the instance types to obtain the sorted instance type list.
Wherein the step of calculating the index weights of the instance types in the available instance type list according to the running characteristics of the application comprises:
the rule for calculating the weights may adopt a strategy that prioritizes computing efficiency rather than cost performance, or may adjust the weight values.
Wherein the hardware-related parameters include: resource type (-type) and minimum number of cores per node (-core).
Wherein the list of instance types that meet the conditions includes at least one of: cloud vendor and region, instance type, number of CPU cores, number of GPU cards, memory size, network I/O performance, disk I/O performance, payment type and price, instruction set, CPU clock frequency, single-precision floating-point computing performance and double-precision floating-point computing performance.
The invention also provides a resource scheduling system for a high-performance computing cloud platform, the system comprising a memory, a processor, and a computer program stored on the memory which, when executed by the processor, carries out the steps of the method described above.
The invention also proposes a computer storage medium having stored thereon a computer program which, when being executed by a processor, carries out the steps of the method as described above.
Compared with the prior art, the resource scheduling method for a high-performance computing cloud platform of the present invention does not use the concept of queues and does not require instance types to be statically configured in advance. Instead, based on the resource specification and task quantity information specified when the job is submitted, it weighs factors such as resource richness, resource cost performance and application running characteristics, and selects instance types automatically. The computing power of cloud vendors worldwide thus serves as the resource pool, and the optimal computing resources are selected according to the characteristics of the job.
Specifically, compared with the traditional HPC job scheduling method, the method of the present invention differs mainly as follows. First, it introduces the concept of application running characteristics: the application is specified and its running characteristics are obtained before the job is submitted, and screening and sorting by these characteristics yields instance types with better cost performance. Second, it no longer uses the concept of a queue and needs no static, advance configuration of instance types: the cloud vendor, region and instance type are selected dynamically through a series of screening and sorting steps, based on the resource specification and task quantity information specified when the job is submitted and on the application running indexes. This greatly improves the flexibility of resource selection, allows resources from multiple cloud vendors in multiple regions to be used at the same time, effectively avoids insufficient instance type inventory, and ensures the success rate of resource scheduling. Finally, it selects instance types with its own index weight and score calculation rules, and the weight of each index can be specified during sorting so as to find the instance type with the best cost performance.
Drawings
FIG. 1 is a schematic flow chart of a resource scheduling method for a high-performance computing cloud platform according to the present invention;
FIG. 2 is a schematic diagram of the overall process of resource scheduling for a high performance computing cloud platform according to the present invention;
FIG. 3 is a flow chart of screening the available instance types according to the present invention;
FIG. 4 is a flow chart of sorting the instance types according to the present invention.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and do not limit the invention.
Referring to FIGS. 1 to 4, the present invention provides a resource scheduling method for a high-performance computing cloud platform, including the following steps:
S10: acquiring the running characteristic data of the application that a user designates for computing when submitting a computing task on the high-performance computing cloud platform, and the hardware-related parameters of the resource specification designated when the job is submitted;
S20: screening out the instance types that meet the conditions from a database containing the instance types of each region of each cloud vendor, according to the acquired running characteristic data of the application and the hardware-related parameters;
S30: sorting the instance types to obtain a sorted instance type list;
S40: scheduling resources from the cloud vendors according to the screened and sorted instance type list.
The step of acquiring the running characteristic data of the application that the user designates for computing when submitting a computing task on the high-performance computing cloud platform comprises:
acquiring the application ID that the user designates for computing when submitting the computing task on the high-performance computing cloud platform;
querying a database containing application running characteristics according to the application ID to obtain JSON data containing the application information and the application running characteristics, wherein the application running characteristics comprise at least one of the following: single or double precision, computing coupling, high clock frequency, large memory, network I/O, disk I/O, CPU instruction set.
The step of screening out the instance types that meet the conditions from the database containing the instance types of each region of each cloud vendor, according to the acquired running characteristic data of the application and the hardware-related parameters, comprises:
querying a database containing instance type information according to the resource specification in the acquired hardware-related parameters, to obtain a list of instance types that meet the conditions;
and further screening the queried instance type list according to the acquired running characteristic data of the application, to obtain an available instance type list.
The step of sorting the instance types to obtain a sorted instance type list comprises:
calculating the weight of each index of the instance types in the available instance type list according to the running characteristics of the application;
calculating a comprehensive score for each instance type according to the index weights and the rank of each of its indexes within the instance type list;
and sorting by the comprehensive scores of the instance types to obtain the sorted instance type list.
Wherein the step of calculating the index weights of the instance types in the available instance type list according to the running characteristics of the application comprises:
the rule for calculating the weights may adopt a strategy that prioritizes computing efficiency rather than cost performance, or may adjust the weight values.
Wherein the hardware-related parameters include: resource type (-type) and minimum number of cores per node (-core).
Wherein the list of instance types that meet the conditions includes at least one of: cloud vendor and region, instance type, number of CPU cores, number of GPU cards, memory size, network I/O performance, disk I/O performance, payment type and price, instruction set, CPU clock frequency, single-precision floating-point computing performance and double-precision floating-point computing performance.
Compared with the traditional HPC job scheduling method, the method of the present invention differs mainly as follows. First, it introduces the concept of application running characteristics: the application is specified and its running characteristics are obtained before the job is submitted, and screening and sorting by these characteristics yields instance types with better cost performance. Second, it no longer uses the concept of a queue and needs no static, advance configuration of instance types: the cloud vendor, region and instance type are selected dynamically through a series of screening and sorting steps, based on the resource specification and task quantity information specified when the job is submitted and on the application running indexes. This greatly improves the flexibility of resource selection, allows resources from multiple cloud vendors in multiple regions to be used at the same time, effectively avoids insufficient instance type inventory, and ensures the success rate of resource scheduling. Finally, it selects instance types with its own index weight and score calculation rules, and the weight of each index can be specified during sorting so as to find the instance type with the best cost performance.
The scheme of the invention is explained in detail as follows:
The invention implements a resource scheduling method for a high-performance computing cloud platform that does not use the concept of a queue and does not require instance types to be statically configured in advance. Instead, based on the resource specification and task quantity information specified when the job is submitted, it weighs factors such as resource richness, resource cost performance and application running characteristics, and selects instance types automatically, so that the computing power of cloud vendors worldwide serves as the resource pool and the optimal computing resources are selected according to the characteristics of the job.
Using the integrated instance type metadata of the cloud vendors, the method compares and selects resources from two aspects. On the one hand, it screens and sorts according to the running characteristic data of the application, which includes information such as hardware preference and the degree of coupling between cluster nodes. On the other hand, it screens according to the hardware requirements given when the user submits the job. Screening and sorting along these dimensions yields a suitable instance type with high cost performance. The method effectively overcomes the inability of existing resource scheduling methods for high-performance computing cloud platforms to extend to other cloud vendors, regions and instance types, and at the same time obtains the best solution in terms of cost performance.
Specifically, the resource scheduling method of the present invention completes resource scheduling in five steps, as shown in FIG. 2. The five steps are described in detail below:
S1: Specify the application and obtain its running characteristics. When submitting a computing task on the high-performance computing cloud platform, the user first specifies the application used for computing. The application may be commercial or open-source software, such as the Ansys software suite, AlphaFold or TensorFlow, or the user's own program. The hardware resources these applications run on have a significant impact on computing efficiency. For example, compare a single RTX 3090 graphics card with a single Tesla V100 graphics card: the 3090 performs far better than the V100 for single-precision (FP32) high-performance computing, but far worse than the V100 for double-precision (FP64) computing. Whether the application depends on single-precision or double-precision floating-point computation is therefore very important. This is a running characteristic of the application, and computing efficiency can be greatly increased if the scheduled resources match it.
After the user designates the application, a database containing application running characteristics is queried by application ID. If no record is found, the resources for this application are not screened by running characteristics. If a record is found, the result is JSON data containing the application information and its running characteristics, which include at least one of the following:
single or double precision; computing coupling (the degree of interdependence between computing nodes); high clock frequency; large memory; network I/O; disk I/O; CPU instruction set; ......
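The lookup described above can be sketched as follows. This is only an illustration under stated assumptions: the in-memory dictionary `APP_DB`, the application ID `app-001`, the JSON field names and the function `get_app_characteristics` are all hypothetical and do not come from the patent, which does not specify a schema.

```python
import json
from typing import Optional

# Hypothetical stand-in for the database of application running characteristics.
# The field names below are assumptions made for this sketch.
APP_DB = {
    "app-001": json.dumps({
        "name": "example-solver",
        "characteristics": {
            "precision": "double",
            "coupling": "tight",
            "high_clock_frequency": True,
            "cpu_instruction_set": "x86_64",
        },
    }),
}


def get_app_characteristics(app_id: str) -> Optional[dict]:
    """Return the running characteristics for app_id, or None if unknown."""
    raw = APP_DB.get(app_id)
    if raw is None:
        # No record: resources are not screened by running characteristics.
        return None
    return json.loads(raw)["characteristics"]


print(get_app_characteristics("app-001"))
print(get_app_characteristics("unknown-app"))
```

Returning `None` for an unknown application mirrors the rule that an application without a record skips characteristic-based screening.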
S2: Specify a resource specification and submit the job. When submitting a job on the high-performance computing cloud platform, the user specifies hardware-related parameters in addition to parameters such as the number of tasks and the number of cores per task. For example, the following parameters are specified when submitting a job on the command line:
resource type (-type): whether CPU or GPU resources are requested;
minimum number of cores per node (-core): the number of CPU cores or GPU cards required on a single machine on which the task runs.
In one possible implementation, an existing HPC job management system may be extended to support this information, for example by extending SLURM's sbatch so that a job is submitted as: sbatch -n 1 -type CPU -core 16 job.
S3: Screen the available instance types. Using the application running characteristics and hardware-related parameters obtained in steps S1 and S2 as constraints, the instance types that meet the conditions are screened from a database containing the instance types of each region of each cloud vendor. FIG. 3 shows the screening process of step S3, which comprises:
S3-1: Query the database containing instance type information by the resource type and the minimum number of cores per node that the user specified in step S2, to obtain a list of instance types that meet the conditions. All of these instance types meet the basic requirements of the job. Each result contains the following information: cloud vendor and region, instance type, number of CPU cores, number of GPU cards, memory size, network I/O performance, disk I/O performance, payment type and price, instruction set, CPU clock frequency, single-precision floating-point computing performance, double-precision floating-point computing performance.
S3-2: Further screen the instance type list returned by step S3-1 according to the running characteristics of the application from step S1. Running characteristics fall into two classes. The first class is mandatory requirements: for example, if the application depends on an Intel CPU instruction set, the selected instance types must support the Intel instruction set. The second class is optional requirements: for example, the application computes more efficiently on a CPU with a high clock frequency. Screening in this step applies only to the first class, which prevents the selection of an instance type on which the application cannot actually run.
S3-3: Return the instance list filtered in S3-2.
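Steps S3-1 to S3-3 can be sketched as a two-stage filter. This is a minimal sketch, not the patented implementation: the `InstanceType` fields, the sample catalog entries and the `screen` function are hypothetical, and only one mandatory characteristic (the CPU instruction set) is modelled.

```python
from dataclasses import dataclass


@dataclass
class InstanceType:
    vendor: str
    region: str
    name: str
    resource_type: str    # "CPU" or "GPU"
    units: int            # CPU cores or GPU cards per node
    instruction_set: str
    clock_ghz: float
    price: float


def screen(catalog, resource_type, min_units, mandatory):
    """Return the instance types that satisfy the job and the mandatory characteristics."""
    # S3-1: basic hardware requirements given at job submission.
    basic = [i for i in catalog
             if i.resource_type == resource_type and i.units >= min_units]
    # S3-2: only mandatory characteristics filter; optional ones affect sorting later.
    required_isa = mandatory.get("cpu_instruction_set")
    if required_isa is None:
        return basic
    # S3-3: return the filtered list.
    return [i for i in basic if i.instruction_set == required_isa]


# Hypothetical catalog entries, for illustration only.
catalog = [
    InstanceType("vendor-a", "region-1", "a.large", "CPU", 16, "x86_64", 2.8, 10.0),
    InstanceType("vendor-b", "region-2", "b.large", "CPU", 32, "arm64", 3.2, 15.0),
    InstanceType("vendor-a", "region-1", "a.small", "CPU", 8, "x86_64", 3.5, 5.0),
    InstanceType("vendor-c", "region-3", "c.gpu", "GPU", 4, "x86_64", 3.0, 40.0),
]

available = screen(catalog, "CPU", 16, {"cpu_instruction_set": "x86_64"})
print([i.name for i in available])
```

Keeping optional characteristics out of the filter means they can only lower an instance type's rank in step S4, never exclude it.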
S4: Sort the instance types. If no instance type meets the conditions after the screening of step S3, resource scheduling fails directly. If a set of available instance types is obtained, it is sorted by priority and then used to purchase cloud vendor resources.
The specific steps of the sorting are shown in FIG. 4 and comprise:
S4-1: The available instance type list returned by step S3 already contains the price and the hardware indexes of each instance type. Weights are assigned to the relevant indexes according to the application running characteristics involved in the current scheduling, in order to determine the priority of the instance types. The share of each index, such as price, number of cores and clock frequency, is calculated by a specific algorithm, for example:
1. The price weight is not less than 70%, so cost performance takes priority;
2. The application running characteristics other than price share the remaining weight equally; when no running characteristics are specified for the application, the price weight is 100%.
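The two example rules above can be written out as a small function. This is a sketch under assumptions: the function name `index_weights`, the index key names and the choice to reject a price weight below 0.7 with an exception are illustrative and not taken from the patent.

```python
def index_weights(optional_features, price_weight=0.7):
    """Split the weight between price and the application's optional running characteristics."""
    if not 0.7 <= price_weight <= 1.0:
        # Rule 1: the price weight is not less than 70%.
        raise ValueError("price weight must be between 0.7 and 1.0")
    if not optional_features:
        # Rule 2, special case: no running characteristics, price takes 100%.
        return {"price": 1.0}
    # Rule 2: the other characteristics share the remaining weight equally.
    share = (1.0 - price_weight) / len(optional_features)
    weights = {"price": price_weight}
    for feature in optional_features:
        weights[feature] = share
    return weights


print(index_weights(["clock_frequency"]))
print(index_weights([]))
```

With a single characteristic such as high clock frequency, this reproduces the 0.7 and 0.3 split used in the example of step S4-2, up to floating-point rounding.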
S4-2: and calculating the comprehensive score of the instance type according to the index weight calculated by the S4-1 and the ranking of the index of the instance type in the instance list. Examples as shown in the table below, the active application run characteristics are high dominant frequency, price weight 0.7, and dominant frequency weight 0.3.
Instance type | Price / rank | Main frequency / rank | Composite score (rank 1 = 100 points, rank 2 = 50 points, rank 3 = 0 points)
A | ¥10 / 1 | 2.8 GHz / 3 | 100*0.7 + 0*0.3 = 70
B | ¥15 / 2 | 3.2 GHz / 2 | 50*0.7 + 50*0.3 = 50
C | ¥25 / 3 | 3.5 GHz / 1 | 0*0.7 + 100*0.3 = 30
S4-3: Return the instance list sorted by the composite scores of the instance types.
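The score calculation of S4-2 and the sorting of S4-3 can be sketched in Python using the figures from the table. Two points are assumptions of this sketch and are not stated in the text: rank points are interpolated linearly between 100 (best) and 0 (worst) for lists of any length, which reproduces 100/50/0 for three instance types, and ties between equal indicator values are not treated specially. The function and field names are hypothetical.

```python
from typing import Dict, List, Tuple


def rank_points(values: List[float], higher_is_better: bool) -> List[float]:
    """Map each value to 100 (best rank) down to 0 (worst rank), linearly."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i], reverse=higher_is_better)
    points = [0.0] * n
    for position, i in enumerate(order):
        # Assumption: linear spacing; a single candidate simply gets 100.
        points[i] = 100.0 if n == 1 else 100.0 * (n - 1 - position) / (n - 1)
    return points


def composite_scores(
    instances: List[Dict],
    weights: Dict[str, float],
    higher_is_better: Dict[str, bool],
) -> List[Tuple[str, float]]:
    """Return (name, score) pairs sorted by descending composite score."""
    totals = {inst["name"]: 0.0 for inst in instances}
    for indicator, weight in weights.items():
        column = [inst[indicator] for inst in instances]
        for inst, pts in zip(instances, rank_points(column, higher_is_better[indicator])):
            totals[inst["name"]] += weight * pts
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Data from the table: price in yuan, main frequency in GHz.
table = [
    {"name": "A", "price": 10, "main_frequency": 2.8},
    {"name": "B", "price": 15, "main_frequency": 3.2},
    {"name": "C", "price": 25, "main_frequency": 3.5},
]
ranking = composite_scores(
    table,
    weights={"price": 0.7, "main_frequency": 0.3},
    higher_is_better={"price": False, "main_frequency": True},
)
print(ranking)
```

With these inputs the order is A, B, C, matching the composite scores of 70, 50, and 30 in the table.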
S5: Purchase resources from the cloud vendors according to the filtered and sorted instance type list. If the instance type with the highest composite score has insufficient stock, purchase the next-ranked instance type, and so on, until enough resources have been purchased. If the resources finally purchased do not reach the quantity required by the job, resource scheduling fails and all purchased nodes are returned to the cloud vendors.
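The fallback purchasing of step S5 can be sketched in Python as follows. The `buy` and `release` callables stand in for cloud vendor purchase and return operations; they, the function `purchase_nodes`, and the in-memory `make_fake_vendor` helper are hypothetical and exist only to make the sketch runnable.

```python
from typing import Callable, Dict, List, Tuple


def purchase_nodes(
    sorted_types: List[str],
    nodes_needed: int,
    buy: Callable[[str, int], List[str]],
    release: Callable[[List[str]], None],
) -> List[str]:
    """Buy nodes in ranking order; return them all if the total falls short."""
    acquired: List[str] = []
    for instance_type in sorted_types:
        remaining = nodes_needed - len(acquired)
        if remaining <= 0:
            break
        # `buy` may return fewer nodes than requested when stock is short.
        acquired.extend(buy(instance_type, remaining))
    if len(acquired) < nodes_needed:
        # Scheduling failed: hand every purchased node back to the vendor.
        release(acquired)
        return []
    return acquired


def make_fake_vendor(stock: Dict[str, int]) -> Tuple[Callable, Callable, List[str]]:
    """In-memory stand-in for a cloud vendor, for illustration only."""
    released: List[str] = []

    def buy(instance_type: str, count: int) -> List[str]:
        granted = min(stock.get(instance_type, 0), count)
        stock[instance_type] = stock.get(instance_type, 0) - granted
        return [f"{instance_type}-{i}" for i in range(granted)]

    def release(nodes: List[str]) -> None:
        released.extend(nodes)

    return buy, release, released


buy, release, released = make_fake_vendor({"A": 2, "B": 5})
nodes = purchase_nodes(["A", "B", "C"], 4, buy, release)
print(nodes, released)
```

Here type A runs out after two nodes, so the remaining two are taken from the next-ranked type B.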
Compared with traditional HPC job scheduling methods, the computing resource scheduling method for a high-performance computing cloud platform of the present invention differs mainly in the following respects. First, it introduces the concept of application running characteristics: the application is specified before the job is submitted so that its running characteristics can be obtained, and filtering and sorting by those characteristics yields instance types with better cost performance. Second, it no longer uses the concept of a queue, and instance types need not be statically configured in advance. Instead, the cloud vendor, region, and instance type are selected dynamically through a series of filtering and sorting steps, based on the resource specification and task quantity information specified at job submission and on the application running indicators. This greatly improves the flexibility of resource selection, allows resources of multiple cloud vendors in multiple regions to be used simultaneously, effectively avoids insufficient stock of an instance type, and ensures the success rate of resource scheduling. Finally, instance types are selected using the method's own indicator weight and score calculation rules, and the weight of each indicator can be specified during sorting so as to identify the instance type with the best cost performance.
It should be noted that the application running indicator database and the cloud vendor instance type metadata database used in the resource scheduling algorithm of the present invention may also be provided in other ways, for example as a service, or may be replaced by querying cloud vendor data in real time.
In addition, when sorting the instance types, the weight of each indicator must be determined first. The rule for calculating the weights may instead adopt a strategy that prioritizes computing efficiency rather than cost performance, or the calculated weight values may be adjusted to some extent.
The invention also provides a resource scheduling system for a high-performance computing cloud platform, comprising a memory and a processor, the memory having stored thereon a computer program which, when executed by the processor, carries out the steps of the method described above.
The invention also proposes a computer storage medium having stored thereon a computer program which, when being executed by a processor, carries out the steps of the method as described above.
Compared with the prior art, the resource scheduling method for a high-performance computing cloud platform of the present invention does not use the concept of queues and does not require instance types to be statically configured in advance. Instead, it treats the computing power of cloud vendors worldwide as the resource pool and, based on the resource specification and task quantity information specified at job submission, automatically selects the optimal computing resources for the job by comprehensively considering factors such as resource abundance, resource cost-performance ratio, and application running characteristics.
The above is only a preferred embodiment of the present invention and is not intended to limit its scope. All equivalent structures or equivalent process transformations based on the present invention, whether applied directly or indirectly in other related technical fields, likewise fall within the scope of the present invention.

Claims (9)

1. A resource scheduling method for a high-performance computing cloud platform is characterized by comprising the following steps:
acquiring running characteristic data of the application that a user designates for computation when submitting a computing task on the high-performance computing cloud platform, and hardware-related parameters of the resource specification designated for the submitted job;
selecting eligible instance types from a database containing the instance types of each cloud vendor in each region, according to the acquired running characteristic data of the application and the hardware-related parameters;
sorting the instance types to obtain a sorted instance type list;
and scheduling resources from the cloud vendors according to the filtered and sorted instance type list.
2. The method of claim 1, wherein the step of acquiring running characteristic data of the application that a user designates for computation when submitting a computing task on the high-performance computing cloud platform comprises:
acquiring the application ID that the user designates for computation when submitting the computing task on the high-performance computing cloud platform;
querying a database containing application running characteristics according to the application ID to obtain JSON data containing application information and the application running characteristics, wherein the application running characteristics comprise at least one of the following: single or double precision, computational coupling, high main frequency, large memory, network I/O, disk I/O, CPU instruction set.
3. The method according to claim 1, wherein the step of selecting eligible instance types from a database containing the instance types of each cloud vendor in each region, according to the acquired running characteristic data of the application and the hardware-related parameters, comprises:
querying a database containing instance type information according to the resource specification in the acquired hardware-related parameters, to obtain a list of eligible instance types;
and further filtering the queried instance type list according to the acquired running characteristic data of the application, to obtain an available instance type list.
4. The method of claim 3, wherein the step of sorting the instance types to obtain a sorted list of instance types comprises:
calculating the weight of each indicator of the instance types in the available instance type list according to the running characteristics of the application;
calculating a composite score for each instance type according to the indicator weights and the rank of each indicator of the instance type within the instance type list;
and sorting according to the comprehensive scores of the instance types to obtain a sorted instance type list.
5. The method according to claim 4, wherein the step of calculating the weight of each indicator of the instance types in the available instance type list according to the running characteristics of the application comprises:
the rule for calculating the weights adopts a strategy that prioritizes computing efficiency rather than cost performance, or adjusts the calculated weight values.
6. The method of claim 1, wherein the hardware-related parameters comprise: resource type (type) and minimum number of cores per node (core).
7. The method of claim 3, wherein the list of eligible instance types comprises at least one of: cloud vendor and region, instance type, number of CPU cores, number of GPU cards, memory size, network I/O performance, disk I/O performance, payment type and price, instruction set, CPU main frequency, single-precision floating-point computing performance, and double-precision floating-point computing performance.
8. A resource scheduling system for a high-performance computing cloud platform, the system comprising: a memory and a processor, the memory having stored thereon a computer program which, when executed by the processor, carries out the steps of the method according to any one of claims 1-7.
9. A computer storage medium, characterized in that a computer program is stored on the computer storage medium, which computer program, when being executed by a processor, carries out the steps of the method according to any one of claims 1-7.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210297756.1A CN114780213A (en) 2022-03-24 2022-03-24 Resource scheduling method, system and storage medium for high-performance computing cloud platform

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210297756.1A CN114780213A (en) 2022-03-24 2022-03-24 Resource scheduling method, system and storage medium for high-performance computing cloud platform

Publications (1)

Publication Number Publication Date
CN114780213A true CN114780213A (en) 2022-07-22

Family

ID=82424977

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210297756.1A Pending CN114780213A (en) 2022-03-24 2022-03-24 Resource scheduling method, system and storage medium for high-performance computing cloud platform

Country Status (1)

Country Link
CN (1) CN114780213A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115129481A (en) * 2022-08-31 2022-09-30 华控清交信息科技(北京)有限公司 Computing resource allocation method and device and electronic equipment
CN115129481B (en) * 2022-08-31 2022-11-29 华控清交信息科技(北京)有限公司 Computing resource allocation method and device and electronic equipment
CN117971512A (en) * 2024-04-02 2024-05-03 杭州骋风而来数字科技有限公司 Intelligent power calculation scheduling system and method
CN117971512B (en) * 2024-04-02 2024-08-02 杭州骋风而来数字科技有限公司 Intelligent power calculation scheduling system and method

Similar Documents

Publication Publication Date Title
CN107548549B (en) Resource balancing in a distributed computing environment
CN109684065B (en) Resource scheduling method, device and system
CN114780213A (en) Resource scheduling method, system and storage medium for high-performance computing cloud platform
CN107659433B (en) Cloud resource scheduling method and equipment
CN103226467B (en) Data parallel processing method, system and load balance scheduler
US8578381B2 (en) Apparatus, system and method for rapid resource scheduling in a compute farm
CN112416585A (en) GPU resource management and intelligent scheduling method for deep learning
US11023281B2 (en) Parallel processing apparatus to allocate job using execution scale, job management method to allocate job using execution scale, and recording medium recording job management program to allocate job using execution scale
CN114356587B (en) Calculation power task cross-region scheduling method, system and equipment
CN112801448A (en) Material demand distribution method, device and system and storage medium
CN113157421B (en) Distributed cluster resource scheduling method based on user operation flow
CN110362388A (en) A kind of resource regulating method and device
CN106897136A (en) A kind of method for scheduling task and device
CN103997515B (en) Center system of selection and its application are calculated in a kind of distributed cloud
CN112817728A (en) Task scheduling method, network device and storage medium
CN110084507B (en) Scientific workflow scheduling optimization method based on hierarchical perception in cloud computing environment
US8281313B1 (en) Scheduling computer processing jobs that have stages and precedence constraints among the stages
CN114911613A (en) Cross-cluster resource high-availability scheduling method and system in inter-cloud computing environment
US8819239B2 (en) Distributed resource management systems and methods for resource management thereof
CN116467076A (en) Multi-cluster scheduling method and system based on cluster available resources
Miao et al. Efficient flow-based scheduling for geo-distributed simulation tasks in collaborative edge and cloud environments
CN117909061A (en) Model task processing system and resource scheduling method based on GPU hybrid cluster
JPWO2010001736A1 (en) Multiprocessor system, multithread processing method, and program
CN109783189B (en) Static workflow scheduling method and device
CN116775237A (en) Task scheduling method, device, network equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination