CN111212116A - High-performance computing cluster creating method and system based on container cloud - Google Patents
High-performance computing cluster creating method and system based on container cloud
- Publication number
- CN111212116A CN201911341781.XA
- Authority
- CN
- China
- Prior art keywords
- cluster
- performance computing
- subsystem
- container
- creating
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/10—Protocols in which an application is distributed across nodes in the network
- H04L67/104—Peer-to-peer [P2P] networks
- H04L67/1044—Group management mechanisms
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/44—Arrangements for executing specific programs
- G06F9/455—Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
- G06F9/45533—Hypervisors; Virtual machine monitors
- G06F9/45558—Hypervisor-specific management and integration aspects
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/08—Network architectures or network communication protocols for network security for authentication of entities
- H04L63/083—Network architectures or network communication protocols for network security for authentication of entities using passwords
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/10—Protocols in which an application is distributed across nodes in the network
- H04L67/104—Peer-to-peer [P2P] networks
- H04L67/1074—Peer-to-peer [P2P] networks for supporting data block transmission mechanisms
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/10—Protocols in which an application is distributed across nodes in the network
- H04L67/1095—Replication or mirroring of data, e.g. scheduling or transport for data synchronisation between network nodes
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/133—Protocols for remote procedure calls [RPC]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/44—Arrangements for executing specific programs
- G06F9/455—Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
- G06F9/45533—Hypervisors; Virtual machine monitors
- G06F9/45558—Hypervisor-specific management and integration aspects
- G06F2009/45562—Creating, deleting, cloning virtual machine instances
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/44—Arrangements for executing specific programs
- G06F9/455—Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
- G06F9/45533—Hypervisors; Virtual machine monitors
- G06F9/45558—Hypervisor-specific management and integration aspects
- G06F2009/45595—Network integration; Enabling network access in virtual machine instances
Abstract
The invention provides a container cloud-based method and system for creating a high-performance computing cluster. The method comprises: packaging, by an administrator using Docker, the high-performance computing cluster management service, scheduling service and storage service into separate container images, and uploading the images to the image repository of a container cloud platform; receiving a user's request to create a high-performance computing cluster; determining the computing resource demand of the cluster according to the creation request; and calling the container cloud platform interface according to that demand to complete the combined configuration of high-performance computing resources and create the cluster. The invention automates the configuration of high-performance computing clusters and simplifies deployment, operation and maintenance, so that users need only concentrate on their high-performance computing workloads. It also supports multiple versions of job management software, meeting the diverse requirements of different users. Containerization provides a user resource pool isolation mechanism, so that users do not interfere with one another when running tasks, which gives higher security than conventional approaches.
Description
Technical Field
The invention relates to the technical field of high-performance computing, in particular to a container cloud-based high-performance computing cluster creating method and system.
Background
Growing demand drives technological change. High-performance computing (HPC) is a branch of computer science dedicated to developing high-performance computers and the application software that runs on them. Historically, HPC has served as a powerful computing tool and has been inseparable from the progress of scientific research. On the one hand, the seemingly limitless demand of scientific research for computing capacity drives HPC technology forward; on the other hand, each major advance in HPC technology gives scientific research a brand-new tool. HPC is mainly applied in fields such as simulation, physical chemistry, life science, rendering, exploration and meteorology, and the upper-layer application environment of an HPC cluster is relatively fixed. Today, with the spread of cloud computing, HPC is changing at every level and is becoming ever more tightly integrated with cloud computing, and cloud service vendors continue to release cloud-based HPC solutions.
Existing high-performance computing clusters have the following drawbacks:
(1) Cluster configuration and deployment are complex and hard to manage. A typical HPC cluster application software stack includes, but is not limited to, MPI libraries, job management software and a distributed file system.
(2) Multiple versions of cluster application software cannot be supported on the same node at the same time, so an integrated development environment for HPC applications cannot be provided.
(3) Multi-tenancy is not supported, user resource allocation is cumbersome, and there is no user resource pool isolation mechanism.
(4) In virtualization-based HPC clusters, computing, storage and network performance are all degraded to varying degrees compared with physical machines.
Disclosure of Invention
In order to solve at least one technical problem, the invention provides a container cloud-based high-performance computing cluster creation method and system.
In order to achieve the above object, a first aspect of the present invention provides a container cloud-based high-performance computing cluster creation method, comprising:
packaging, by an administrator using Docker, the high-performance computing cluster management service, scheduling service and storage service into separate container images, and uploading the images to the image repository of a container cloud platform;
receiving a user's request to create a high-performance computing cluster;
determining the computing resource demand of the high-performance computing cluster according to the creation request;
and calling a container cloud platform interface according to the computing resource demand to complete the combined configuration of the high-performance computing resources and create the high-performance computing cluster.
Further, receiving a user's request to create a high-performance computing cluster specifically comprises:
receiving, by an API (application programming interface) service subsystem, the user's request to create a high-performance computing cluster, and verifying whether the relevant parameters of the creation request are valid;
if the relevant parameters are valid, calling, by the API service subsystem, a database interface to initialize the cluster information in a database;
and sending, by the API service subsystem, a cluster creation request to the scheduling subsystem through an RPC interface.
Preferably, the relevant parameters include any one or more of: the container cloud platform on which the high-performance computing cluster is to be created, the cluster login node user name, the cluster login key pair, the cluster type, the cluster image, the required resource size, and the SLURM version.
Further, determining the computing resource demand of the high-performance computing cluster according to the creation request specifically comprises:
receiving, by the scheduling subsystem, the cluster creation request sent by the API service subsystem;
summing, by the scheduling subsystem, the resources required by the high-performance computing cluster by CPU, memory and storage, and comparing the sums with the total amount of existing computing resources;
and if the total amount of existing computing resources satisfies the resources required by the high-performance computing cluster, selecting the execution subsystem node that currently has the fewest tasks, and sending the cluster creation request to the selected execution subsystem node.
Further, calling a container cloud platform interface according to the computing resource demand to complete the combined configuration of the high-performance computing resources and create the high-performance computing cluster specifically comprises:
receiving, by an execution subsystem, the cluster creation request sent by the scheduling subsystem;
constructing, by the execution subsystem, the resource parameters required for cluster creation in the API specification format of Cloudet;
creating the required resources in batch by issuing a Cloudtest resource creation request, thereby completing the resource preparation subtask;
and running an Ansible deployment on the batch-created resources, thereby completing the HPC cluster software deployment subtask.
Further, creating the required resources in batch by issuing a Cloudtest resource creation request specifically comprises:
creating a container with the user-specified combination of CPU configuration and memory size as the login node, and injecting the public key of the specified key pair into the login node container;
creating containers with the user-specified combination of CPU configuration and memory size as compute nodes;
and creating shared file storage of the user-specified size and mounting it in all containers of the login node and the compute nodes.
Further, after the resource preparation subtask is completed, the method further comprises:
writing the IP address and/or domain name of the login node container into the database;
when a login request is received from a user, verifying whether the private key provided by the user matches the public key of the key pair pre-stored in the container; if they match, allowing the user to log in to the container to check the deployment state, and otherwise refusing the login.
A second aspect of the present invention provides a container cloud-based high-performance computing cluster creation system for implementing the method described above, the system comprising an HPC container management scheduling subsystem and a container cloud platform;
the HPC container management scheduling subsystem is used for receiving a user's request to create a high-performance computing cluster; determining the computing resource demand of the high-performance computing cluster according to the creation request; and calling a container cloud platform interface according to the computing resource demand to complete the combined configuration of high-performance computing resources and create the high-performance computing cluster;
the container cloud platform configures a high-performance computing cluster by combining various containers, and comprises an image repository that holds the container images of the high-performance computing cluster management service, scheduling service and storage service, each packaged by an administrator using Docker, the container images being used to generate the corresponding containers.
Furthermore, the HPC container management scheduling subsystem comprises an API service subsystem, a scheduling subsystem and an execution subsystem, which communicate with one another through RPC;
the API service subsystem is used for receiving a user's request to create a high-performance computing cluster, verifying whether the relevant parameters of the request are valid, and calling a database interface to initialize the cluster information in a database when the relevant parameters are valid;
the scheduling subsystem is used for receiving the cluster creation request sent by the API service subsystem, summing the resources required by the high-performance computing cluster by CPU, memory and storage, comparing the sums with the total amount of existing computing resources, and, when the total amount of existing computing resources satisfies the resources required by the cluster, selecting the execution subsystem node that currently has the fewest tasks and sending it the cluster creation request;
the execution subsystem is used for receiving the cluster creation request sent by the scheduling subsystem, constructing the resource parameters required for cluster creation in the API (application programming interface) specification format of Cloudet, creating the required resources in batch by issuing a Cloudet resource creation request to complete the resource preparation subtask, and running an Ansible deployment on the batch-created resources to complete the HPC cluster software deployment subtask.
Further, the container cloud-based high-performance computing cluster creation system further comprises an external authentication system, which assists the API service subsystem in verifying the validity of the relevant parameters.
The invention automates the configuration of high-performance computing clusters and simplifies deployment, operation and maintenance, so that users need only concentrate on their high-performance computing workloads. It supports multiple versions of job management software, meeting the diverse requirements of different users, and is highly practical and extensible. The system supports multi-tenancy and provides a user resource pool isolation mechanism through containerization, so that users do not interfere with one another when running tasks, which gives higher security than conventional approaches. Compared with a virtualization-based high-performance computing cluster, the performance penalty is small.
Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
FIG. 1 is a block diagram of a container cloud based high performance computing cluster creation system of the present invention;
FIG. 2 is a flow chart illustrating a method for creating a container cloud based high performance computing cluster according to the present invention;
FIG. 3 illustrates a workflow diagram of the API services subsystem of the present invention;
FIG. 4 illustrates a workflow diagram of the scheduling subsystem of the present invention;
FIG. 5 illustrates a workflow diagram of the execution subsystem of the present invention.
Detailed Description
In order that the above objects, features and advantages of the present invention can be more clearly understood, a more particular description of the invention will be rendered by reference to the appended drawings. It should be noted that the embodiments and features of the embodiments of the present application may be combined with each other without conflict.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, however, the present invention may be practiced in other ways than those specifically described herein, and therefore the scope of the present invention is not limited by the specific embodiments disclosed below.
Referring to fig. 1 and fig. 2, in a first aspect, the present invention provides a method for creating a container cloud-based high-performance computing cluster, where the method includes:
step 204: calling a container cloud platform interface according to the computing resource demand to complete the combined configuration of the high-performance computing resources, and creating a high-performance computing cluster.
In practice, an administrator can use Docker to package the high-performance computing cluster management service, scheduling service and storage service into separate container images and upload them to the image repository of the container cloud. The user configures the amount of resources required by each service through the HPC container management scheduling subsystem, which then calls the standard container cloud interface to complete the combined configuration of the high-performance computing resources.
The invention effectively supports the release and configuration management of container cloud-based high-performance computing clusters, spares operation and maintenance staff the complexity of manual configuration, and reduces the probability of manual configuration errors. The high-performance computing resource pool can also be configured and deployed dynamically and quickly according to user requirements, and users can add or remove resources and monitor the resource utilization of each service through the HPC container management scheduling subsystem. The invention therefore greatly simplifies the deployment, operation and maintenance of high-performance computing clusters, improves cluster resource utilization, and gives users an easily extensible cluster management interface without sacrificing performance.
Further, calling a container cloud platform interface according to the computing resource demand to complete the combined configuration of the high-performance computing resources, specifically comprising:
according to the required amount of resources, finding a matching container image in the image repository of the container cloud platform, generating the corresponding number of containers from that image, and completing the creation and release of the high-performance computing cluster through the combined configuration of those containers.
As shown in FIG. 1, the HPC container management scheduling subsystem comprises three core subsystems: an API service subsystem, a scheduling subsystem and an execution subsystem. The three subsystems communicate with one another through RPC, and the execution subsystem carries out requests such as resource creation and resource changes by calling the API of the container cloud platform.
The API service subsystem manages the life cycle of the high-performance computing cluster through a RESTful interface. The life cycle operations include creating a cluster, deleting a cluster, listing clusters, scaling a cluster out or in, obtaining a cluster key, creating a cluster template, obtaining a cluster template, and so on.
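As a rough illustration of how such a RESTful life cycle interface might be laid out, the Python sketch below maps HTTP methods and paths to the operations listed above. Only the operation names come from the description; every path, the `ROUTES` table and the `resolve` helper are hypothetical and are not taken from the invention.

```python
from typing import Optional

# Hypothetical route table: the paths are illustrative assumptions,
# only the life cycle operations themselves come from the description.
ROUTES = {
    ("POST", "/clusters"): "create_cluster",
    ("DELETE", "/clusters/{id}"): "delete_cluster",
    ("GET", "/clusters"): "list_clusters",
    ("POST", "/clusters/{id}/scale"): "scale_cluster",
    ("GET", "/clusters/{id}/key"): "get_cluster_key",
    ("POST", "/cluster-templates"): "create_cluster_template",
    ("GET", "/cluster-templates/{id}"): "get_cluster_template",
}


def _segment_matches(template_part: str, request_part: str) -> bool:
    """A "{name}" segment matches any non-empty value; others match exactly."""
    if template_part.startswith("{") and template_part.endswith("}"):
        return request_part != ""
    return template_part == request_part


def resolve(method: str, path: str) -> Optional[str]:
    """Return the operation name for a request, or None if no route matches."""
    request_parts = path.strip("/").split("/")
    for (route_method, template), operation in ROUTES.items():
        template_parts = template.strip("/").split("/")
        if route_method != method.upper():
            continue
        if len(template_parts) != len(request_parts):
            continue
        if all(_segment_matches(t, r) for t, r in zip(template_parts, request_parts)):
            return operation
    return None
```

For example, under these assumed paths `resolve("GET", "/clusters/42/key")` resolves to the key retrieval operation, while an unsupported method on a known path resolves to nothing.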
The scheduling subsystem has two functions: first, responding to the API service subsystem's RPC requests to create, delete and scale clusters; second, collecting information on the clusters created and managed by each execution subsystem, and scheduling clusters to execution subsystems to carry out the creation operation.
The main tasks of the execution subsystem are to create and update the resources defined for the cluster, such as containers, storage and networks, by calling the API (application programming interface) of the container cloud platform, and to use Ansible to install and configure the containers of the login node and the compute nodes according to the different roles defined for the high-performance computing cluster. Multiple execution subsystems can run simultaneously on multiple nodes.
It should be noted that HPC (High-Performance Computing) generally refers to computing systems and environments that use many processors (as part of a single machine) or several computers organized in a cluster (operating as a single computing resource).
Docker is an open-source application container engine that lets developers package an application and its dependencies into a portable container, which can then be distributed to any popular Linux or Windows machine; it can also be used to implement virtualization. Containers are fully sandboxed and have no interfaces to one another. They incur little performance overhead and run easily on individual machines and in data centers.
A container cloud takes the container as the basic unit of resource partitioning and scheduling, encapsulates the entire software runtime environment, and provides a platform on which developers and system administrators build, publish and run distributed applications.
According to an embodiment of the invention, receiving a user's request to create a high-performance computing cluster specifically comprises:
receiving, by an API (application programming interface) service subsystem, the user's request to create a high-performance computing cluster, and verifying whether the relevant parameters of the creation request are valid;
if the relevant parameters are valid, calling, by the API service subsystem, a database interface to initialize the cluster information in a database;
and sending, by the API service subsystem, a cluster creation request to the scheduling subsystem through an RPC interface.
As shown in fig. 3, the specific workflow of the API service subsystem is as follows:
Further, if the parameters are invalid, the process ends immediately and cluster creation failure information is returned to the user.
Preferably, the relevant parameters include, but are not limited to, any one or more of: the container cloud platform on which the high-performance computing cluster is to be created, the cluster login node user name, the cluster login key pair, the cluster type, the cluster image, the required resource size, and the SLURM version.
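The API service workflow described above (validate the parameters, initialize the cluster record in the database, forward the request over RPC) can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than the implementation of the invention: the field names, the plain dict standing in for the database, the `send_rpc` callable and the "creating" and "failed" status strings are all hypothetical.

```python
from dataclasses import asdict, dataclass
from typing import Callable, Dict, List

# Hypothetical field names modelled on the parameters listed in the text.
REQUIRED_FIELDS = ("platform", "login_user", "public_key",
                   "cluster_type", "image", "slurm_version")


@dataclass
class CreateRequest:
    platform: str        # container cloud platform hosting the cluster
    login_user: str      # cluster login node user name
    public_key: str      # public half of the cluster login key pair
    cluster_type: str
    image: str           # cluster image
    slurm_version: str
    compute_nodes: int = 1


def validate(req: CreateRequest) -> List[str]:
    """Return a list of validation errors; an empty list means the request is valid."""
    errors = [f"{name} must not be empty"
              for name in REQUIRED_FIELDS if not getattr(req, name)]
    if req.compute_nodes < 1:
        errors.append("compute_nodes must be at least 1")
    return errors


def handle_create(req: CreateRequest,
                  database: Dict[int, dict],
                  send_rpc: Callable[[str, int], None]) -> dict:
    """Validate, initialize the cluster record, then notify the scheduler."""
    errors = validate(req)
    if errors:
        # Invalid parameters: end the flow and report the failure to the user.
        return {"status": "failed", "errors": errors}
    cluster_id = len(database) + 1  # toy ID allocation for the in-memory store
    database[cluster_id] = {**asdict(req), "status": "creating"}
    send_rpc("create_cluster", cluster_id)  # stand-in for the RPC interface
    return {"status": "creating", "cluster_id": cluster_id}
```

Passing the database and the RPC sender in as arguments keeps the sketch self-contained; a real service would substitute its own database interface and RPC client.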
It should be noted that SLURM (Simple Linux Utility for Resource Management) is a highly scalable, fault-tolerant cluster manager and job scheduling system that can be used on large clusters of compute nodes. SLURM maintains a queue of pending jobs and manages the overall resource utilization of those jobs. It also manages the available compute nodes in an exclusive or non-exclusive manner, depending on resource requirements. Finally, SLURM distributes jobs to a set of allocated nodes to perform the work and monitors parallel jobs until they complete.
According to an embodiment of the present invention, determining the computing resource demand of the high-performance computing cluster according to the creation request specifically comprises:
receiving, by the scheduling subsystem, the cluster creation request sent by the API service subsystem;
summing, by the scheduling subsystem, the resources required by the high-performance computing cluster by CPU, memory and storage, and comparing the sums with the total amount of existing computing resources;
and if the total amount of existing computing resources satisfies the resources required by the high-performance computing cluster, selecting the execution subsystem node that currently has the fewest tasks, and sending the cluster creation request to the selected execution subsystem node.
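The scheduling decision described above can be sketched in Python as follows: sum the demand by CPU, memory and storage, compare it with the existing capacity, and pick the execution subsystem node with the fewest current tasks. The `Resources` type, the function names and the alphabetical tie-break between equally loaded nodes are assumptions made for illustration; the description does not say how ties are resolved.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Resources:
    cpu_cores: int
    memory_gb: int
    storage_gb: int

    def __add__(self, other: "Resources") -> "Resources":
        return Resources(self.cpu_cores + other.cpu_cores,
                         self.memory_gb + other.memory_gb,
                         self.storage_gb + other.storage_gb)

    def fits_within(self, capacity: "Resources") -> bool:
        """True if every dimension of the demand is covered by the capacity."""
        return (self.cpu_cores <= capacity.cpu_cores
                and self.memory_gb <= capacity.memory_gb
                and self.storage_gb <= capacity.storage_gb)


def total_demand(node_specs: List[Resources], shared_storage_gb: int) -> Resources:
    """Accumulate per-node CPU, memory and storage, plus the shared storage."""
    total = Resources(0, 0, shared_storage_gb)
    for spec in node_specs:
        total = total + spec
    return total


def pick_executor(task_counts: Dict[str, int]) -> Optional[str]:
    """Pick the executor with the fewest tasks (ties broken by name, an assumption)."""
    if not task_counts:
        return None
    return min(sorted(task_counts), key=task_counts.__getitem__)


def schedule(node_specs: List[Resources], shared_storage_gb: int,
             capacity: Resources, task_counts: Dict[str, int]) -> Optional[str]:
    """Return the chosen executor name, or None if the capacity is insufficient."""
    demand = total_demand(node_specs, shared_storage_gb)
    if not demand.fits_within(capacity):
        return None
    return pick_executor(task_counts)
```

With one single-core, 1 GB login node, ten four-core, 8 GB compute nodes and 100 GB of shared storage, the accumulated demand is 41 cores, 81 GB of memory and 100 GB of storage.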
As shown in fig. 4, the specific workflow of the scheduling subsystem is as follows:
Further, when step 403 is executed, if the scheduling subsystem fails to send the creation request to the selected execution subsystem node, the cluster status is set to error (step 402.a).
According to an embodiment of the present invention, calling a container cloud platform interface according to the computing resource demand to complete the combined configuration of the high-performance computing resources and create the high-performance computing cluster specifically comprises:
receiving, by an execution subsystem, the cluster creation request sent by the scheduling subsystem;
constructing, by the execution subsystem, the resource parameters required for cluster creation in the API specification format of Cloudet;
creating the required resources in batch by issuing a Cloudtest resource creation request, thereby completing the resource preparation subtask;
and running an Ansible deployment on the batch-created resources, thereby completing the HPC cluster software deployment subtask.
Further, creating the required resources in batch by issuing a Cloudtest resource creation request specifically comprises:
creating a container with the user-specified combination of CPU configuration and memory size as the login node, and injecting the public key of the specified key pair into the login node container;
creating containers with the user-specified combination of CPU configuration and memory size as compute nodes;
and creating shared file storage of the user-specified size and mounting it in all containers of the login node and the compute nodes.
In practice, for example, one container with a single-core CPU and 1 GB of memory can be created as the login node, and ten containers, each with a four-core CPU and 8 GB of memory, can be created as compute nodes; the user-specified shared file storage is preferably 100 GB, but is not limited thereto.
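The batch of resources described above can be sketched in Python as a list of plain dictionaries: one shared volume, one login container carrying the injected public key, and a set of compute containers that all mount the shared volume. The dictionary layout and the `build_resource_batch` name are hypothetical; this is not the API specification format that the execution subsystem actually targets, which the description does not spell out.

```python
from typing import Dict, List


def build_resource_batch(public_key: str,
                         login_cpu: int = 1, login_mem_gb: int = 1,
                         compute_count: int = 10,
                         compute_cpu: int = 4, compute_mem_gb: int = 8,
                         shared_gb: int = 100) -> List[Dict]:
    """Describe the login node, compute nodes and shared storage as one batch.

    The defaults mirror the example figures in the text; the dict layout
    itself is an illustrative assumption.
    """
    volume = {"kind": "shared_volume", "name": "shared", "size_gb": shared_gb}
    login = {
        "kind": "container",
        "role": "login",
        "name": "login-0",
        "cpu": login_cpu,
        "memory_gb": login_mem_gb,
        "authorized_keys": [public_key],  # public key injected into the login node
        "mounts": ["shared"],
    }
    computes = [
        {
            "kind": "container",
            "role": "compute",
            "name": f"compute-{index}",
            "cpu": compute_cpu,
            "memory_gb": compute_mem_gb,
            "mounts": ["shared"],  # every node mounts the same shared storage
        }
        for index in range(compute_count)
    ]
    return [volume, login, *computes]
```

Listing the volume first reflects a design choice in this sketch: a consumer processing the batch in order would create the storage before the containers that mount it.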
Further, after the resource preparation subtask is completed, the method further comprises:
writing the IP address and/or domain name of the login node container into the database;
when a login request is received from a user, verifying whether the private key provided by the user matches the public key of the key pair pre-stored in the container; if they match, allowing the user to log in to the container to check the deployment state, and otherwise refusing the login.
As shown in fig. 5, the specific workflow of the execution subsystem is as follows:
At step 502, if the resource preparation subtask succeeds, the HPC cluster software deployment subtask starts and runs the Ansible deployment, with separate Ansible roles defined for the login node and the compute nodes. If the resource preparation subtask fails, the cluster status is updated to failed (step 502.b). If the cluster software deployment fails, the cluster status is likewise updated to failed (step 502.b); if it succeeds, the cluster status is updated to running (step 503).
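The status transitions described above can be sketched in Python as two subtasks run in sequence. The callables stand in for the real resource preparation and software deployment steps, and the function name and the "failed" and "running" strings are assumptions made for illustration.

```python
from typing import Callable


def run_create_task(prepare_resources: Callable[[], bool],
                    deploy_software: Callable[[], bool],
                    set_status: Callable[[str], None]) -> str:
    """Run the two subtasks in order and record the resulting cluster status."""
    if not prepare_resources():
        # Resource preparation failed: deployment is never attempted.
        set_status("failed")
        return "failed"
    if not deploy_software():
        # Resources exist but the cluster software deployment failed.
        set_status("failed")
        return "failed"
    set_status("running")
    return "running"
```

The deployment callable is only invoked once resource preparation has reported success, which matches the ordering of the two subtasks in the workflow.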
It should be noted that Ansible is an open-source, OpenSSH-based automated configuration management tool that can be used to configure systems, deploy software, and orchestrate higher-level IT tasks, such as continuous deployment or zero-downtime updates.
A second aspect of the present invention provides a container cloud-based high-performance computing cluster creation system (as shown in fig. 1) for implementing the method described above, the system comprising an HPC container management scheduling subsystem and a container cloud platform;
the HPC container management scheduling subsystem is used for receiving a user's request to create a high-performance computing cluster; determining the computing resource demand of the high-performance computing cluster according to the creation request; and calling a container cloud platform interface according to the computing resource demand to complete the combined configuration of high-performance computing resources and create the high-performance computing cluster;
the container cloud platform configures a high-performance computing cluster by combining various containers, and comprises an image repository that holds the container images of the high-performance computing cluster management service, scheduling service and storage service, each packaged by an administrator using Docker, the container images being used to generate the corresponding containers.
Furthermore, the HPC container management scheduling subsystem comprises an API service subsystem, a scheduling subsystem and an execution subsystem, which communicate with one another through RPC;
the API service subsystem is used for receiving a user's request to create a high-performance computing cluster, verifying whether the relevant parameters of the request are valid, and, when they are valid, calling a database interface to initialize the cluster information in a database;
the scheduling subsystem is used for receiving the cluster creation request sent by the API service subsystem, accumulating the resources required by the high-performance computing cluster in terms of CPU, memory and storage, comparing the result with the total amount of existing computing resources, and, when the existing computing resources satisfy the requirement, selecting the execution subsystem node with the fewest current tasks and sending the cluster creation request to it;
the execution subsystem is used for receiving the cluster creation request sent by the scheduling subsystem, constructing the resource parameters required for cluster creation into the API (application programming interface) specification format of Cloudet, creating the required resources in batches by calling the resource creation request of Cloudet to complete the resource preparation subtask, and executing the Ansible deployment on the resources created in batches to complete the HPC cluster software deployment subtask.
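The scheduling decision described above (accumulate the demand per resource dimension, compare it with what is available, then pick the least-loaded execution node) can be illustrated with a short sketch. Everything named here is hypothetical: `Resources`, `total_demand`, `select_worker` and the worker names are illustrative, returning `None` when the demand cannot be met is an assumption, and ties between equally loaded workers are broken by name purely to keep the result deterministic.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Resources:
    cpu_cores: int
    memory_gb: int
    storage_gb: int


def total_demand(nodes: List[Resources]) -> Resources:
    """Accumulate the per-node requests along CPU, memory and storage."""
    return Resources(
        cpu_cores=sum(n.cpu_cores for n in nodes),
        memory_gb=sum(n.memory_gb for n in nodes),
        storage_gb=sum(n.storage_gb for n in nodes),
    )


def select_worker(
    nodes: List[Resources],
    available: Resources,
    worker_tasks: Dict[str, int],
) -> Optional[str]:
    """Return the execution node with the fewest current tasks, or None."""
    demand = total_demand(nodes)
    fits = (
        demand.cpu_cores <= available.cpu_cores
        and demand.memory_gb <= available.memory_gb
        and demand.storage_gb <= available.storage_gb
    )
    # Assumption: an unsatisfiable request (or no workers) yields None.
    if not fits or not worker_tasks:
        return None
    # Fewest current tasks wins; the name is only a deterministic tie-breaker.
    return min(worker_tasks, key=lambda name: (worker_tasks[name], name))
```

For example, a request for two nodes totalling 20 cores would be rejected against a pool of 16 cores, and otherwise routed to whichever worker currently reports the smallest task count.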
Further, the container cloud-based high-performance computing cluster creation system also comprises an external authentication system, which assists the API service subsystem in verifying the validity of the relevant parameters.
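As a rough illustration of the parameter check performed by the API service subsystem, the sketch below validates a creation request locally and then defers to an external check. The field names loosely follow the parameters enumerated in claim 3, but they, the `node_count` field (a stand-in for the required resource size), the `validate_request` function and the `external_check` callable are all hypothetical; the source does not specify the validation rules or the interface of the external authentication system.

```python
from typing import Callable, Dict, List

# Hypothetical field names modeled on the parameters listed in claim 3.
REQUIRED_FIELDS = (
    "platform",
    "login_user",
    "key_pair",
    "cluster_type",
    "image",
    "slurm_version",
)


def validate_request(
    params: Dict[str, object],
    external_check: Callable[[Dict[str, object]], bool],
) -> List[str]:
    """Return a list of validation errors; an empty list means the request is valid."""
    errors = [
        f"missing field: {name}" for name in REQUIRED_FIELDS if not params.get(name)
    ]
    node_count = params.get("node_count")
    # bool is a subclass of int in Python, so it is excluded explicitly.
    if (
        not isinstance(node_count, int)
        or isinstance(node_count, bool)
        or node_count < 1
    ):
        errors.append("node_count must be a positive integer")
    # Only consult the external system once the local checks have passed.
    if not errors and not external_check(params):
        errors.append("rejected by external authentication system")
    return errors
```

Running the local checks first keeps malformed requests from reaching the external system at all; whether the described system orders the checks this way is not stated.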
The invention remedies the inability of users to configure and manage clusters on their own in a traditional HPC environment. With this method, a user can freely combine the different software versions required by a high-performance computing cluster on top of the same container image, which provides strong flexibility and greatly reduces the image maintenance burden on operations and maintenance personnel.
The invention brings multi-tenancy to traditional HPC clusters and deploys HPC software in containers. Compared with a virtual machine approach, performance is greatly improved, the HPC computing resources of different tenants are isolated from one another, and security is improved.
Through the template function, the invention enables a high-performance computing cluster to be integrated and published in one place, which greatly increases the portability of the cluster and reduces the user's burden of maintaining it.
Through its distributed multi-worker design, the invention also supports a large number of users deploying and managing clusters at the same time, and therefore scales well.
The above description covers only specific embodiments of the present invention, but the scope of protection of the invention is not limited thereto. Any change or substitution that a person skilled in the art could readily conceive within the technical scope disclosed by the invention shall fall within the scope of protection of the invention. Therefore, the scope of protection of the present invention shall be subject to the scope of protection of the appended claims.
Claims (10)
1. A method for creating a container cloud-based high-performance computing cluster, the method comprising:
packaging, by an administrator using Docker, container images of a high-performance computing cluster management service, a scheduling service and a storage service respectively, and uploading the container images to an image registry of a container cloud platform;
receiving a user's request to create a high-performance computing cluster;
determining the computing resource demand of the high-performance computing cluster according to the creation request;
and calling a container cloud platform interface according to the computing resource demand to complete the combined configuration of high-performance computing resources and create the high-performance computing cluster.
2. The method for creating a container cloud-based high-performance computing cluster according to claim 1, wherein receiving a user's request to create a high-performance computing cluster specifically includes:
receiving, by an API (application programming interface) service subsystem, the user's request to create a high-performance computing cluster, and verifying whether the relevant parameters of the creation request are valid;
if the relevant parameters are valid, calling, by the API service subsystem, a database interface to initialize the cluster information in a database;
and sending, by the API service subsystem, a cluster creation request to a scheduling subsystem through an RPC interface.
3. The method for creating the container cloud-based high-performance computing cluster as claimed in claim 2, wherein the relevant parameters include any one or more of a container cloud platform where the high-performance computing cluster to be created is located, a cluster login node user name, a key pair for cluster login, a cluster type, a cluster image, a required resource size, and a SLURM version.
4. The method for creating a container cloud-based high-performance computing cluster according to claim 1, wherein determining the computing resource demand of the high-performance computing cluster according to the creation request specifically includes:
receiving, by a scheduling subsystem, the cluster creation request sent by an API service subsystem;
accumulating, by the scheduling subsystem, the resources required by the high-performance computing cluster in terms of CPU, memory and storage, and comparing the result with the total amount of existing computing resources;
and if the total amount of existing computing resources satisfies the resources required by the high-performance computing cluster, selecting the execution subsystem node with the fewest current tasks and sending the cluster creation request to the selected execution subsystem node.
5. The method for creating a container cloud-based high-performance computing cluster according to claim 1, wherein calling a container cloud platform interface according to the computing resource demand to complete the combined configuration of high-performance computing resources and create a high-performance computing cluster specifically includes:
receiving, by an execution subsystem, the cluster creation request sent by the scheduling subsystem;
constructing, by the execution subsystem, the resource parameters required for cluster creation into the API specification format of Cloudet;
creating the required resources in batches by calling the resource creation request of Cloudet, thereby completing a resource preparation subtask;
and executing the Ansible deployment on the resources created in batches, thereby completing an HPC cluster software deployment subtask.
6. The method for creating a container cloud-based high-performance computing cluster according to claim 5, wherein creating the required resources in batches by calling the resource creation request of Cloudet specifically comprises:
creating a container with a user-specified combination of CPU configuration and memory size as a login node, and injecting the public key of a specified key pair into the container of the login node;
creating containers with a user-specified combination of CPU configuration and memory size as computing nodes;
and creating shared file storage of the size specified by the user and mounting it into all containers of the login node and the computing nodes respectively.
7. The method of claim 6, wherein after the resource preparation subtask is completed, the method further comprises:
writing the IP address and/or the domain name of the container of the login node into the database;
and when a login request from a user is received, verifying whether the private key of the key pair provided by the user matches the public key of the key pair pre-stored in the container; if so, allowing the user to log in to the container to check the deployment state, and if not, refusing the login.
8. A container cloud-based high-performance computing cluster creation system for implementing the container cloud-based high-performance computing cluster creation method of any one of claims 1 to 7, the system comprising an HPC container management scheduling subsystem and a container cloud platform;
the HPC container management scheduling subsystem is used for receiving a user's request to create a high-performance computing cluster; determining the computing resource demand of the cluster according to the creation request; and calling a container cloud platform interface according to that demand to complete the combined configuration of high-performance computing resources and create the cluster;
the container cloud platform configures a high-performance computing cluster by combining various containers, and comprises an image registry, wherein the image registry stores container images of the high-performance computing cluster management service, scheduling service and storage service, each packaged by an administrator using Docker, and the container images are used for generating the corresponding containers.
9. The container cloud-based high-performance computing cluster creation system of claim 8, wherein the HPC container management scheduling subsystem comprises an API service subsystem, a scheduling subsystem and an execution subsystem, which communicate with one another through RPC;
the API service subsystem is used for receiving a user's request to create a high-performance computing cluster, verifying whether the relevant parameters of the request are valid, and, when they are valid, calling a database interface to initialize the cluster information in a database;
the scheduling subsystem is used for receiving the cluster creation request sent by the API service subsystem, accumulating the resources required by the high-performance computing cluster in terms of CPU, memory and storage, comparing the result with the total amount of existing computing resources, and, when the existing computing resources satisfy the requirement, selecting the execution subsystem node with the fewest current tasks and sending the cluster creation request to it;
the execution subsystem is used for receiving the cluster creation request sent by the scheduling subsystem, constructing the resource parameters required for cluster creation into the API (application programming interface) specification format of Cloudet, creating the required resources in batches by calling the resource creation request of Cloudet to complete the resource preparation subtask, and executing the Ansible deployment on the resources created in batches to complete the HPC cluster software deployment subtask.
10. The system according to claim 8, further comprising an external authentication system, wherein the external authentication system is configured to assist the API service subsystem in verifying the validity of the relevant parameters.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911341781.XA CN111212116A (en) | 2019-12-24 | 2019-12-24 | High-performance computing cluster creating method and system based on container cloud |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911341781.XA CN111212116A (en) | 2019-12-24 | 2019-12-24 | High-performance computing cluster creating method and system based on container cloud |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111212116A (en) | 2020-05-29 |
Family
ID=70788229
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911341781.XA Pending CN111212116A (en) | 2019-12-24 | 2019-12-24 | High-performance computing cluster creating method and system based on container cloud |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111212116A (en) |
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102611723A (en) * | 2011-01-25 | 2012-07-25 | 赵天海 | Method for building high-performance computing application service based on virtualization technology |
CN104506620A (en) * | 2014-12-23 | 2015-04-08 | 西安电子科技大学 | Extensible automatic computing service platform and construction method for same |
CN106790483A (en) * | 2016-12-13 | 2017-05-31 | 武汉邮电科学研究院 | Hadoop group systems and fast construction method based on container technique |
CN108243157A (en) * | 2016-12-26 | 2018-07-03 | 华为技术服务有限公司 | The method for implanting and device of sensitive information in virtual machine |
CN108845878A (en) * | 2018-05-08 | 2018-11-20 | 南京理工大学 | The big data processing method and processing device calculated based on serverless backup |
US20190349305A1 (en) * | 2018-05-11 | 2019-11-14 | Huazhong University Of Science And Technology | Container communication method and system for parallel applications |
CN109656686A (en) * | 2018-12-17 | 2019-04-19 | 武汉烽火信息集成技术有限公司 | The upper deployment container cloud method of OpenStack, storage medium, electronic equipment and system |
Cited By (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112035403A (en) * | 2020-08-28 | 2020-12-04 | 广州市品高软件股份有限公司 | Cloud computing-based multi-tenant elastic file system creation method |
CN112199178A (en) * | 2020-10-21 | 2021-01-08 | 中国电子科技集团公司第十五研究所 | Cloud service dynamic scheduling method and system based on lightweight container |
CN112099924A (en) * | 2020-11-18 | 2020-12-18 | 南京信易达计算技术有限公司 | Container-based high-performance computing cluster system |
WO2022109932A1 (en) * | 2020-11-26 | 2022-06-02 | 深圳晶泰科技有限公司 | Multi-task submission system based on slurm computing platform |
CN112822028A (en) * | 2020-12-29 | 2021-05-18 | 北京浪潮数据技术有限公司 | Slurm cluster charging method, system, electronic equipment and storage medium |
CN112650560A (en) * | 2021-01-18 | 2021-04-13 | 济南浪潮高新科技投资发展有限公司 | Container design-based cloud robot model issuing method |
CN112650560B (en) * | 2021-01-18 | 2022-10-18 | 山东浪潮科学研究院有限公司 | Container design-based cloud robot model issuing method |
CN112698924A (en) * | 2021-03-23 | 2021-04-23 | 杭州太美星程医药科技有限公司 | Clinical test electronic data acquisition system and operation method thereof |
CN113766005B (en) * | 2021-07-29 | 2023-04-28 | 苏州浪潮智能科技有限公司 | RDMA-based method and system for batch creation of cloud hosts |
CN113766005A (en) * | 2021-07-29 | 2021-12-07 | 苏州浪潮智能科技有限公司 | RDMA (remote direct memory Access) -based method and system for batch creation of cloud hosts |
WO2023116420A1 (en) * | 2021-12-22 | 2023-06-29 | 中兴通讯股份有限公司 | Database deployment methods, database processing methods, related devices and storage medium |
CN114090268A (en) * | 2022-01-11 | 2022-02-25 | 北京九章云极科技有限公司 | Container management method and container management system |
CN114401280A (en) * | 2022-01-14 | 2022-04-26 | 北京天云融创软件技术有限公司 | Operation data synchronization method and system |
CN115964176A (en) * | 2023-01-05 | 2023-04-14 | 海马云(天津)信息技术有限公司 | Cloud computing cluster scheduling method, electronic device and storage medium |
CN117075930A (en) * | 2023-10-17 | 2023-11-17 | 之江实验室 | Computing framework management system |
CN117075930B (en) * | 2023-10-17 | 2024-01-26 | 之江实验室 | Computing framework management system |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN111212116A (en) | High-performance computing cluster creating method and system based on container cloud | |
US9661071B2 (en) | Apparatus, systems and methods for deployment and management of distributed computing systems and applications | |
US10467725B2 (en) | Managing access to a resource pool of graphics processing units under fine grain control | |
US10225335B2 (en) | Apparatus, systems and methods for container based service deployment | |
EP3347816B1 (en) | Extension of resource constraints for service-defined containers | |
US8892945B2 (en) | Efficient application management in a cloud with failures | |
US8756597B2 (en) | Extending functionality of legacy services in computing system environment | |
CN108255497B (en) | Application deployment method and device | |
US10148657B2 (en) | Techniques for workload spawning | |
CN112104723B (en) | Multi-cluster data processing system and method | |
JP2021518018A (en) | Function portability for service hubs with function checkpoints | |
CN102404385A (en) | Virtual cluster deployment system and deployment method for high performance computing | |
US9729610B2 (en) | Method for intercepting an instruction produced by an application on a computer | |
US10728169B1 (en) | Instance upgrade migration | |
Guan et al. | A multi-layered scheme for distributed simulations on the cloud environment | |
US11614957B1 (en) | Native-hypervisor based on-demand code execution system | |
US20230138867A1 (en) | Methods for application deployment across multiple computing domains and devices thereof | |
US11847611B2 (en) | Orchestrating and automating product deployment flow and lifecycle management | |
CN112564979A (en) | Execution method and device for construction task, computer equipment and storage medium | |
Mendez et al. | e-clouds: Scientific computing as a service | |
CN114465765B (en) | Client security management system and method of cloud desktop system | |
WO2023012553A1 (en) | System for the containerization of business workstations with low-cost remote user interfaces | |
CN117708822A (en) | Data processing method, proxy device and related equipment | |
Rathbone et al. | Cyberaide creative: On-demand cyberinfrastructure provision in clouds | |
CN115904478A (en) | Cloud platform resource management method and system and electronic equipment |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20200529 |