CN117170812B - Numerical forecasting calculation cloud system based on research and development operation and maintenance integrated architecture
- Publication number: CN117170812B
- Application number: CN202311148883.6A
- Authority: CN (China)
- Prior art keywords: node, application, numerical forecasting, target node, target
- Legal status: Active (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abstract
The invention relates to a numerical forecasting computing cloud system based on an integrated research and development operation and maintenance architecture, comprising: a hybrid cluster comprising at least one hybrid node; an image repository for storing a plurality of Docker base images and a plurality of numerical forecasting application images; a conversion unit for synchronously converting the plurality of Docker application images into a plurality of Singularity application images; and a node scheduling unit for receiving an application research and development task and determining a first target node from the hybrid cluster, wherein the first target node is scheduled by the K8S scheduler and is used for research and development of the numerical forecasting application. The node scheduling unit is further used for receiving a business job task and determining a second target node from the hybrid cluster, wherein the second target node is scheduled by the Slurm scheduler and is used for pulling the numerical forecasting application and running the job. The system has the advantages of integrating the research and development environment and the operation environment of numerical forecasting and improving the utilization of hardware resources.
Description
Technical Field
The invention relates to the field of data processing, in particular to a numerical forecasting calculation cloud system based on an integrated architecture of research and development operation and maintenance.
Background
At present, the cluster scheduler used in high-performance clusters for numerical forecasting is mainly Slurm, and the mainstream high-performance container is Singularity; the cluster scheduler used in cloud computing is mainly Kubernetes (K8S for short), and the mainstream container there is Docker. Container technology allows a computing environment to be packaged and saved, quickly deployed, reused, and securely isolated, and it has developed rapidly in recent years. The Docker container, which is more mature and more strongly isolated, and its mainstream orchestration system K8S focus on building and managing containerized applications and are suited to a numerical forecasting development system. The lightweight, weakly isolated Singularity container and its mainstream job scheduling system Slurm focus on resource management and job scheduling for high-performance computing clusters and are suited to a numerical forecasting service system. However, Slurm is currently not compatible with K8S. This makes it difficult for the numerical forecasting service system to rapidly deploy the containerized applications created by the development system, and the two systems cannot share the underlying computing resources, which separates the development environment from the service environment and wastes a large amount of hardware computing resources.
Therefore, it is necessary to provide a numerical forecasting computing cloud system based on an integrated research and development operation and maintenance architecture that realizes resource sharing and hybrid scheduling of Slurm nodes and K8S nodes, so as to integrate the research and development environment and the operation environment of numerical forecasting and improve the utilization of hardware resources.
Disclosure of Invention
One of the embodiments of the present disclosure provides a numerical forecasting computing cloud system based on an integrated research and development operation and maintenance architecture, including: a hybrid cluster comprising a Slurm cluster and a K8S cluster, the hybrid cluster comprising at least one Slurm node, at least one hybrid node, and at least one K8S node, each hybrid node being scheduled by only one of the Slurm cluster and the K8S cluster at any given time; an image repository for storing a plurality of Docker base images and a plurality of numerical forecasting application images; a conversion unit for synchronously converting the plurality of Docker application images into a plurality of Singularity application images; a shared storage unit for storing the plurality of Singularity application images converted by the conversion unit; and a node scheduling unit for receiving an application research and development task and determining a first target node from the hybrid cluster, wherein the first target node is scheduled by the K8S scheduler and is used for research and development of a numerical forecasting application. The node scheduling unit is further configured to receive a business job task and determine a second target node from the hybrid cluster, where the second target node is scheduled by the Slurm scheduler and is used to pull a numerical forecasting application and run a job.
In some embodiments, the research and development of the numerical forecasting application by the first target node includes: the first target node schedules and allocates computing resources for the image creation task; pulls a target Docker base image from the image repository; and creates a numerical forecasting application image based on the target Docker base image and user instructions, then commits the created numerical forecasting application image and uploads it to the image repository.
In some embodiments, the second target node pulling a numerical forecasting application and running a job includes: the second target node schedules and allocates computing resources for the business job task; pulls a target Singularity application image from the shared storage unit; and runs a numerical forecasting application program based on the target Singularity application image and a numerical forecasting task script.
In some embodiments, the plurality of Docker base images includes at least a MySQL application image, a programming language image, and an operating system image.
In some embodiments, the plurality of numerical forecasting application images includes at least an HPL application image, an FVCOM application image, and a WRF application image.
In some embodiments, the node scheduling unit determining a first target node from the hybrid cluster includes: installing a node group priority plug-in on the Volcano scheduler; grouping the at least one hybrid node and the at least one K8S node according to resource type to generate a plurality of node groups, and configuring a priority for each node group; and the Volcano scheduler determining the first target node from the at least one hybrid node and the at least one K8S node based on the priority configured for each node group.
In some embodiments, the plurality of node groups includes at least a Slurm node group, a hybrid CPU node group, a hybrid GPU node group, a K8S CPU node group, and a K8S GPU node group. The Volcano scheduler determining the first target node from the at least one hybrid node and the at least one K8S node based on the priority configured for each node group comprises: determining whether the first target node exists in the K8S CPU node group; if it does not, determining whether the first target node exists in the hybrid CPU node group; if it does not, determining whether the first target node exists in the K8S GPU node group; and if it does not, determining whether the first target node exists in the hybrid GPU node group.
In some embodiments, the node scheduling unit is further configured to maintain a hybrid node list, where the hybrid node list records the running identifier of each Slurm node, each hybrid node, and each K8S node.
In some embodiments, the first target node is used at least for research and development of a numerical forecasting application program, management of the research and development environment and resources, creation of a numerical forecasting application image, and management of containers; the second target node is used at least for managing the running environment, computing resources, run results, and run logs of the numerical forecasting application.
In some embodiments, the image creation task is initiated by a research and development user with root privileges; the business job task is initiated by a business user without root privileges.
Drawings
The present specification will be further elucidated by way of example embodiments, which will be described in detail by means of the accompanying drawings. The embodiments are not limiting, in which like numerals represent like structures, wherein:
FIG. 1 is a block diagram of a numerical forecasting computing cloud system based on an integrated research and development architecture according to some embodiments of the present disclosure;
FIG. 2 is a schematic diagram of an integrated system architecture for a research and development environment according to some embodiments of the present disclosure;
FIG. 3 is a flow chart of development and operation of a numerical forecasting application according to some embodiments of the present disclosure;
FIG. 4 is a flow diagram illustrating the determination of a first target node from a hybrid cluster according to some embodiments of the present disclosure.
Detailed Description
In order to more clearly illustrate the technical solutions of the embodiments of the present specification, the drawings that are required to be used in the description of the embodiments will be briefly described below. It is apparent that the drawings in the following description are only some examples or embodiments of the present specification, and it is possible for those of ordinary skill in the art to apply the present specification to other similar situations according to the drawings without inventive effort. Unless otherwise apparent from the context of the language or otherwise specified, like reference numerals in the figures refer to like structures or operations.
It will be appreciated that "system," "apparatus," "unit" and/or "module" as used herein is one method for distinguishing between different components, elements, parts, portions or assemblies at different levels. However, if other words can achieve the same purpose, the words can be replaced by other expressions.
As used in this specification and the claims, the terms "a," "an," and/or "the" do not refer specifically to the singular and may include the plural, unless the context clearly dictates otherwise. In general, the terms "comprises" and "comprising" merely indicate that the explicitly identified steps and elements are included; they do not constitute an exclusive list, as a method or apparatus may also include other steps or elements.
A flowchart is used in this specification to describe the operations performed by the system according to embodiments of the present specification. It should be appreciated that the preceding or following operations are not necessarily performed in the exact order shown. Rather, the steps may be processed in reverse order or simultaneously. Also, other operations may be added to or removed from these processes.
The terms referred to in this specification will be explained first.
Numerical forecasting: an application of high-performance computing (HPC) in which large-scale computers numerically solve the basic system of equations of atmospheric motion in order to forecast the state of the atmosphere and weather phenomena at a future time.
Kubernetes: a portable, extensible, open-source platform for managing containerized workloads and services that facilitates declarative configuration and automation.
Pod (container group): the smallest unit managed by Kubernetes; multiple containers combined together are called a Pod.
Volcano: the first and only Kubernetes-based container batch computing platform under CNCF, mainly used in high-performance computing scenarios. It provides Kubernetes with a set of mechanisms that it currently lacks and that are typically required by many high-performance workloads, such as machine learning, big data applications, scientific computing, and special-effects rendering.
Volcano Job (vcJob): a Job resource type customized by Volcano. Compared with a Kubernetes Job, a vcJob provides more advanced functions, such as a specifiable scheduler, a minimum number of running Pods, tasks, lifecycle management, specified queues, and priority scheduling. The Volcano Job is better suited to high-performance computing scenarios such as machine learning, big data, and scientific computing.
CPU: the central processing unit (CPU) is the computing and control core of a computer system and the final execution unit for information processing and program execution.
GPU: the graphics processing unit (GPU), also known as a display core, vision processor, or display chip, is a microprocessor dedicated to image and graphics operations on personal computers, workstations, gaming machines, and some mobile devices (e.g., tablet computers and smartphones).
Volcano Controller: the controller of Volcano, which manages the Volcano Jobs on the cluster.
Volcano Scheduler: schedules Volcano Jobs through a series of actions and plug-ins and finds the most suitable node for each.
Slurm: the Slurm job scheduling tool is a free and open-source job scheduler for Linux and Unix-like kernels, used by many supercomputers and computer clusters around the world.
Singularity: a container platform that allows you to create and run containers that package software in a portable and reproducible way. You can build a container with Singularity on a laptop and then run it on many of the largest HPC clusters in the world, on local university or company clusters, on a single server, in the cloud, or on a workstation down the hall.
Harbor: an enterprise-level image repository server for storing and distributing Docker images. It provides practical functions such as permission management, log review, hierarchical transmission, horizontal expansion, image replication, and a graphical interface.
Container image: a read-only template obtained by packaging with Docker, whose content does not change after it is built. It can be regarded as a standardized container template; a container is a running instance of an image.
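The Volcano Job features listed in the definitions above (a specifiable scheduler, a minimum number of running Pods, tasks, and a specified queue) can be illustrated with a minimal sketch of a vcJob manifest built as a Python dictionary. The function name make_vcjob, the job name, the registry host harbor.example.com, and the image tag are placeholders assumed for illustration; the specification does not give a manifest, so this is one possible shape rather than the system's actual configuration.

```python
import json


def make_vcjob(name, image, replicas, queue="default"):
    """Return a Volcano Job manifest as a plain dict (hypothetical helper)."""
    return {
        "apiVersion": "batch.volcano.sh/v1alpha1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            "schedulerName": "volcano",  # specifiable scheduler
            "minAvailable": replicas,    # minimum number of running Pods
            "queue": queue,              # specified queue
            "tasks": [
                {
                    "name": "worker",
                    "replicas": replicas,
                    "template": {
                        "spec": {
                            "containers": [{"name": "worker", "image": image}],
                            "restartPolicy": "Never",
                        }
                    },
                }
            ],
        },
    }


# Placeholder values; the image reference is an assumption, not from the patent.
manifest = make_vcjob("wrf-dev", "harbor.example.com/forecast/wrf:latest", replicas=2)
print(json.dumps(manifest, indent=2))
```

Serializing the dictionary to JSON or YAML would give a document that could be submitted to a cluster where Volcano is installed.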
The numerical forecasting computing cloud system based on the research and development operation and maintenance integrated architecture can be used to realize an integrated system architecture for the research and development environment and the operation and maintenance environment. FIG. 2 is a schematic structural diagram of this architecture according to some embodiments of the present disclosure. As shown in FIG. 2, the architecture comprises a numerical forecasting research and development system and a numerical forecasting service system, which are applied to research and development scenarios and service scenarios, respectively. In the numerical forecasting research and development system, research and development users use K8S and Docker technologies to develop numerical forecasting application programs, create application images, manage containers, and so on. In the numerical forecasting service system, ordinary service users pull the forecasting application programs built by the research and development system and run jobs using Slurm and Singularity technologies. A container is a virtualization technology that provides an isolated running space for an application program and allows the system environment in which the program runs to be packaged, saved, and quickly ported. Different containers share one system kernel, so compared with a virtual machine, a container occupies fewer resources, starts faster, and is easier to migrate and deploy.
The architecture builds a K8S cluster in the numerical forecasting research and development system and a Slurm cluster in the numerical forecasting service system, establishes a communication mechanism between the K8S cluster and the Slurm cluster, cooperatively schedules the physical computing nodes, and shares the underlying hardware computing resources by dynamically acquiring and releasing node resources.
Docker, used in the numerical forecasting research and development system of the architecture, is currently the most widely used container technology, but it is not suitable for ordinary numerical forecasting business users who do not have root privileges.
Singularity, used in the numerical forecasting service system of the architecture, is simple, portable, easy to extend, and easy to distribute, and it keeps user privileges consistent inside and outside the container. It is better suited than Docker to the containerized deployment of numerical forecasting applications, but it lacks Docker's more mature community support and high-quality images and cannot use efficient container orchestration systems such as K8S.
In some embodiments, in the numerical forecasting development system, a Volcano framework can be introduced on the basis of a K8S cluster to process numerical forecasting batch tasks. The Volcano framework strengthens the K8S job scheduling capability, can make up for the deficiency of the K8S scheduling capability, and supports a large number of mainstream computing frameworks in the fields of machine learning, deep learning, big data and the like.
Fig. 1 is a block diagram of a numerical forecasting calculation cloud system based on an integrated architecture of research and development operation and maintenance according to some embodiments of the present disclosure, where, as shown in fig. 1, the numerical forecasting calculation cloud system based on the integrated architecture of research and development operation and maintenance at least includes a hybrid cluster, a mirror warehouse, a conversion unit, a shared storage unit and a node scheduling unit.
The hybrid cluster may include a Slurm cluster and a K8S cluster. The hybrid cluster includes at least one Slurm node, at least one hybrid node, and at least one K8S node, and each hybrid node is scheduled by only one of the Slurm cluster and the K8S cluster at any given time.
The image repository may be used to store a plurality of Docker base images and a plurality of numerical forecasting application images. The plurality of Docker base images includes at least a MySQL application image, a programming language (e.g., Python) image, and an operating system (e.g., CentOS) image. The plurality of numerical forecasting application images includes at least an HPL application image, an FVCOM (Finite-Volume Coastal Ocean Model) application image, and a WRF (Weather Research and Forecasting) application image.
The conversion unit may be configured to synchronously convert the plurality of Docker application images into the plurality of Singularity application images.
The shared storage unit may be used to store the plurality of Singularity application images converted by the conversion unit. The Singularity application images may be stored in a cloud platform shared storage system, and a directory of the shared storage system may be mounted into an instantiated container to provide persistent data storage. The cloud platform shared storage system is mounted on the physical cluster, and the Singularity application images (SIF files) on it can be instantiated by business users and then scheduled for execution through Slurm.
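As a rough illustration of the conversion step, the sketch below builds the command line that turns a Docker image held in a registry into a SIF file on shared storage. It relies on the `singularity build <target> docker://<reference>` form of the Singularity CLI; the registry host, repository name, tag, shared directory, and the helper names sif_path and build_conversion_command are assumptions made for this example and do not come from the specification.

```python
from pathlib import PurePosixPath


def sif_path(shared_dir, repository, tag):
    """Map a Docker image reference to a SIF file path on shared storage."""
    file_name = repository.replace("/", "_") + "_" + tag + ".sif"
    return str(PurePosixPath(shared_dir) / file_name)


def build_conversion_command(registry, repository, tag, shared_dir):
    """Return the argv list that converts a Docker image into a Singularity image."""
    source = f"docker://{registry}/{repository}:{tag}"
    return ["singularity", "build", sif_path(shared_dir, repository, tag), source]


# Placeholder registry, image, and directory names (assumptions).
cmd = build_conversion_command("harbor.example.com", "forecast/wrf", "4.5", "/shared/sif")
print(" ".join(cmd))
```

The argv list could be passed to a process runner on a node that has Singularity installed; running it whenever a new application image is pushed would keep the SIF files in step with the Docker images.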
The node scheduling unit may be configured to receive an application research and development task and determine a first target node from the hybrid cluster. The first target node is scheduled by the K8S scheduler and is used at least for research and development of a numerical forecasting application program, management of the research and development environment and resources, creation of a numerical forecasting application image, and management of containers. The image creation task may be initiated by a research and development user with root privileges.
The node scheduling unit is further configured to receive a business job task and determine a second target node from the hybrid cluster. The second target node is scheduled by the Slurm scheduler and is used for pulling the numerical forecasting application and running the job. The second target node is further used at least for managing the running environment, computing resources, run results, and run logs of the numerical forecasting application. The business job task may be initiated by a business user without root privileges.
FIG. 3 is a schematic flow chart of the development and operation of a numerical forecasting application according to some embodiments of the present disclosure. As shown in FIG. 3, in some embodiments, the research and development of the numerical forecasting application by the first target node includes:
the first target node schedules and allocates computing resources for the image creation task;
a target Docker base image is pulled from the image repository;
a numerical forecasting application image is created based on the target Docker base image and user instructions, and the created numerical forecasting application image is committed and uploaded to the image repository.
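The image creation steps above can be sketched as follows, assuming the user instructions are expressed as shell commands that become RUN lines in a Dockerfile. This is only one way to realize the step; the specification does not say how the image is built. The image names and the helpers render_dockerfile and image_creation_commands are hypothetical; the docker pull, build, and push subcommands are the standard Docker CLI.

```python
def render_dockerfile(base_image, user_instructions):
    """Render a Dockerfile that layers the user's shell commands on the base image."""
    lines = [f"FROM {base_image}"]
    lines += [f"RUN {instruction}" for instruction in user_instructions]
    return "\n".join(lines) + "\n"


def image_creation_commands(base_image, app_image):
    """Return the pull, build, and push commands for one image creation task."""
    return [
        ["docker", "pull", base_image],             # pull the target base image
        ["docker", "build", "-t", app_image, "."],  # build the application image
        ["docker", "push", app_image],              # upload it to the image repository
    ]


# Placeholder image names and instructions (assumptions, not from the patent).
dockerfile = render_dockerfile(
    "harbor.example.com/base/centos:7",
    ["yum install -y gcc-gfortran", "mkdir -p /opt/wrf"],
)
commands = image_creation_commands(
    "harbor.example.com/base/centos:7", "harbor.example.com/forecast/wrf:4.5"
)
print(dockerfile)
```

In this sketch the build and push would run on the first target node under a user with root privileges, which matches the permission split described for research and development users.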
As shown in FIG. 3, in some embodiments, the second target node pulling the numerical forecasting application and running the job includes:
the second target node schedules and allocates computing resources for the business job task;
a target Singularity application image is pulled from the shared storage unit;
the numerical forecasting application program is run based on the target Singularity application image and the numerical forecasting task script.
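The job-running steps above can be sketched as a Slurm batch script that executes the task script inside the Singularity image. The sketch uses the standard `#SBATCH --job-name` and `#SBATCH --nodes` directives together with `srun` and `singularity exec`; the job name, node count, file paths, and the helper render_sbatch_script are assumptions made for illustration.

```python
def render_sbatch_script(job_name, nodes, sif_image, task_script):
    """Render a batch script that runs the task script inside the SIF image."""
    return "\n".join(
        [
            "#!/bin/bash",
            f"#SBATCH --job-name={job_name}",
            f"#SBATCH --nodes={nodes}",
            # Run the forecasting task script inside the container on the allocated nodes.
            f"srun singularity exec {sif_image} bash {task_script}",
            "",
        ]
    )


# Placeholder paths on the shared storage system (assumptions).
script = render_sbatch_script(
    "wrf-forecast", 2, "/shared/sif/forecast_wrf_4.5.sif", "/shared/jobs/run_wrf.sh"
)
print(script)
```

A business user without root privileges could submit the rendered file with sbatch, since Singularity runs the container under the user's own identity.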
In some embodiments, the node scheduling unit determining the first target node from the hybrid cluster comprises:
installing a node group priority plug-in on the Volcano scheduler;
grouping the at least one hybrid node and the at least one K8S node according to resource type to generate a plurality of node groups, and configuring a priority for each node group, wherein the plurality of node groups includes at least a Slurm node group, a hybrid CPU node group, a hybrid GPU node group, a K8S CPU node group, and a K8S GPU node group;
the Volcano scheduler determining the first target node from the at least one hybrid node and the at least one K8S node based on the priority configured for each node group.
FIG. 4 is a schematic flow diagram illustrating the determination of a first target node from a hybrid cluster according to some embodiments of the present disclosure. As shown in FIG. 4, in some embodiments, the Volcano scheduler determining the first target node from the at least one hybrid node and the at least one K8S node based on the priority configured for each node group includes:
determining whether the first target node exists in the K8S CPU node group;
if the first target node does not exist in the K8S CPU node group, determining whether the first target node exists in the hybrid CPU node group;
if the first target node does not exist in the hybrid CPU node group, determining whether the first target node exists in the K8S GPU node group;
if the first target node does not exist in the K8S GPU node group, determining whether the first target node exists in the hybrid GPU node group.
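The fall-through order above can be sketched in plain Python. This is not the Volcano plug-in API; it only models the decision order. The group labels, the Node fields, and the rule that a node "exists" when it has enough free CPUs for the task are assumptions made for illustration, since the specification does not state the matching criterion.

```python
from dataclasses import dataclass
from typing import List, Optional

# Node groups in descending priority, following the order described above.
GROUP_PRIORITY = ["k8s-cpu", "hybrid-cpu", "k8s-gpu", "hybrid-gpu"]


@dataclass
class Node:
    name: str
    group: str      # one of the labels in GROUP_PRIORITY (hypothetical labels)
    free_cpus: int  # assumed matching criterion


def pick_first_target_node(nodes: List[Node], cpus_needed: int) -> Optional[Node]:
    """Return the first suitable node, checking the node groups in priority order."""
    for group in GROUP_PRIORITY:
        for node in nodes:
            if node.group == group and node.free_cpus >= cpus_needed:
                return node
    return None  # no group contains a suitable node


nodes = [
    Node("h1", "hybrid-cpu", 16),
    Node("k1", "k8s-cpu", 4),
    Node("g1", "k8s-gpu", 32),
]
small = pick_first_target_node(nodes, 2)   # fits in the K8S CPU group
medium = pick_first_target_node(nodes, 8)  # falls through to the hybrid CPU group
print(small.name, medium.name)
```

Checking the dedicated K8S CPU nodes first keeps the hybrid nodes free for Slurm jobs, and checking GPU groups last avoids occupying GPU nodes with tasks that a CPU node could serve.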
In some embodiments, the node scheduling unit is further configured to maintain a hybrid node list, where the hybrid node list records the running identifier of each Slurm node, each hybrid node, and each K8S node. When a node's running identifier is idle, tasks from either cluster can be scheduled to run on the node. When a node is running a task from one cluster, its running identifier is set so that only tasks of that cluster can run on it. After the node's task is completed, its running identifier is reset to idle.
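A minimal sketch of this running-identifier bookkeeping follows. The class name HybridNodeList, its method names, and the cluster labels are hypothetical, and the sketch is single-threaded; a real scheduler would need locking or an atomic store so that both clusters cannot claim a node at the same moment.

```python
class HybridNodeList:
    """Tracks which cluster, if any, currently owns each node."""

    IDLE = "idle"

    def __init__(self, node_names):
        # Every node starts with an idle running identifier.
        self._state = {name: self.IDLE for name in node_names}

    def acquire(self, node, cluster):
        """Claim an idle node for a cluster; return False if it is already in use."""
        if self._state[node] != self.IDLE:
            return False
        self._state[node] = cluster
        return True

    def release(self, node):
        """Reset the running identifier to idle once the node's task has finished."""
        self._state[node] = self.IDLE

    def owner(self, node):
        return self._state[node]


node_list = HybridNodeList(["hybrid-01", "hybrid-02"])
first = node_list.acquire("hybrid-01", "k8s")     # the K8S cluster claims the node
second = node_list.acquire("hybrid-01", "slurm")  # Slurm is refused while it is busy
node_list.release("hybrid-01")
third = node_list.acquire("hybrid-01", "slurm")   # after release, Slurm can claim it
print(first, second, third)
```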
It can be appreciated that the numerical forecasting computing cloud system based on the development operation and maintenance integrated architecture can at least comprise the following beneficial effects:
1. The research and development environment can be shared and built rapidly, reducing the time multiple users spend installing and deploying the numerical forecasting system;
2. The research and development environment can be packaged and saved, and the numerical forecasting research and development environment can be rapidly restored from the packaged file;
3. The system enables efficient migration from the research and development environment to the operation environment, avoiding the complex environments, numerous dependencies, difficult deployment, and poor portability of traditional numerical forecasting systems;
4. The K8S and Slurm clusters are interconnected, so the underlying computing resources are shared and resource utilization is improved;
5. Slurm schedules the physical nodes directly and a multi-level scheduling strategy is no longer used, which greatly improves scheduling efficiency;
6. The container instance environment is scheduled and used directly by the Slurm physical cluster; only one container layer is used, so the container performance loss is smaller;
7. Packaging numerical forecasting applications in Singularity containers further reduces the performance loss of containerization;
8. Elastic scaling of the Slurm cluster is realized, ensuring that cluster storage and computing resources are scheduled dynamically according to users' task demands;
9. By running Slurm and K8S natively in a shared environment, the complexity and uncertainty of nested Slurm and K8S scheduling are avoided and system stability is guaranteed.
While the basic concepts have been described above, it will be apparent to those skilled in the art that the foregoing detailed disclosure is by way of example only and is not intended to be limiting. Although not explicitly described herein, various modifications, improvements, and adaptations to the present disclosure may occur to one skilled in the art. Such modifications, improvements, and adaptations are suggested by this specification and therefore remain within the spirit and scope of the exemplary embodiments of the present invention.
Meanwhile, the specification uses specific words to describe the embodiments of the specification. Reference to "one embodiment," "an embodiment," and/or "some embodiments" means that a particular feature, structure, or characteristic is associated with at least one embodiment of the present description. Thus, it should be emphasized and should be appreciated that two or more references to "an embodiment" or "one embodiment" or "an alternative embodiment" in various positions in this specification are not necessarily referring to the same embodiment. Furthermore, certain features, structures, or characteristics of one or more embodiments of the present description may be combined as suitable.
Furthermore, the order in which the elements and sequences are processed, the use of numerical letters, or other designations in the description are not intended to limit the order in which the processes and methods of the description are performed unless explicitly recited in the claims. While certain presently useful inventive embodiments have been discussed in the foregoing disclosure, by way of various examples, it is to be understood that such details are merely illustrative and that the appended claims are not limited to the disclosed embodiments, but, on the contrary, are intended to cover all modifications and equivalent arrangements included within the spirit and scope of the embodiments of the present disclosure. For example, while the system components described above may be implemented by hardware devices, they may also be implemented solely by software solutions, such as installing the described system on an existing server or mobile device.
Likewise, it should be noted that, to simplify the presentation of this specification and thereby aid understanding of one or more inventive embodiments, various features are sometimes grouped together in a single embodiment, figure, or description thereof. This method of disclosure does not imply that the claimed subject matter requires more features than are recited in the claims. Indeed, claimed subject matter may lie in fewer than all of the features of a single embodiment disclosed above.
Finally, it should be understood that the embodiments described in this specification are merely illustrative of the principles of the embodiments of this specification. Other variations are possible within the scope of this description. Thus, by way of example, and not limitation, alternative configurations of embodiments of the present specification may be considered as consistent with the teachings of the present specification. Accordingly, the embodiments of the present specification are not limited to only the embodiments explicitly described and depicted in the present specification.
Claims (8)
1. A numerical forecasting computing cloud system based on a research and development operation and maintenance integrated architecture, comprising:
a hybrid cluster comprising a Slurm cluster and a K8S cluster, the hybrid cluster comprising at least one Slurm node, at least one hybrid node and at least one K8S node, each hybrid node being scheduled by only one of the Slurm cluster and the K8S cluster at any given time;
an image warehouse for storing a plurality of Docker base images and a plurality of numerical forecasting application images;
a conversion unit for synchronously converting the plurality of Docker application images into a plurality of Singularity application images;
a shared storage unit for storing the plurality of Singularity application images converted by the conversion unit;
a node scheduling unit for receiving an application research and development task and determining a first target node from the hybrid cluster, wherein the first target node is scheduled by the K8S scheduler and is used for research and development of a numerical forecasting application;
the node scheduling unit being further configured to receive a business job task and determine a second target node from the hybrid cluster, wherein the second target node is scheduled by the Slurm scheduler and is used to pull a numerical forecasting application and run a job;
wherein the node scheduling unit determining the first target node from the hybrid cluster comprises:
installing a node group priority plug-in on the Volcano scheduler;
grouping the at least one hybrid node and the at least one K8S node according to resource type to generate a plurality of node groups, and configuring a priority for each node group;
determining, by the Volcano scheduler, the first target node from the at least one hybrid node and the at least one K8S node based on the priority configured for each node group;
wherein the plurality of node groups comprise at least a Slurm node group, a hybrid CPU node group, a hybrid GPU node group, a K8S CPU node group and a K8S GPU node group;
and wherein the Volcano scheduler determining the first target node from the at least one hybrid node and the at least one K8S node based on the priority configured for each node group comprises:
judging whether the first target node exists in the K8S CPU node group;
if the first target node does not exist in the K8S CPU node group, judging whether the first target node exists in the hybrid CPU node group;
if the first target node does not exist in the hybrid CPU node group, judging whether the first target node exists in the K8S GPU node group;
and if the first target node does not exist in the K8S GPU node group, judging whether the first target node exists in the hybrid GPU node group.
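The group-by-group fallback recited at the end of claim 1 can be illustrated with a minimal Python sketch. It is not the Volcano plug-in itself: the group labels, the `Node` fields and the "enough free CPUs" test are assumptions introduced for illustration, since the claim does not define what makes a node eligible.

```python
from dataclasses import dataclass
from typing import List, Optional

# Fallback order taken from claim 1; the label strings are hypothetical.
GROUP_ORDER = ["k8s-cpu", "hybrid-cpu", "k8s-gpu", "hybrid-gpu"]


@dataclass
class Node:
    name: str
    group: str      # one of GROUP_ORDER, or "slurm"
    free_cpus: int  # hypothetical eligibility measure, not from the claim


def pick_first_target_node(nodes: List[Node], cpus_needed: int) -> Optional[Node]:
    """Try each node group in priority order and return the first node that fits."""
    for group in GROUP_ORDER:
        for node in nodes:
            if node.group == group and node.free_cpus >= cpus_needed:
                return node
    # Nodes in the Slurm node group are never considered for the first target node.
    return None
```

In this sketch a research and development task lands on a hybrid or GPU node only when no node in a higher-priority group can host it.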
2. The numerical forecasting computing cloud system based on the research and development operation and maintenance integrated architecture of claim 1, wherein the research and development of the numerical forecasting application by the first target node comprises:
scheduling and allocating, by the first target node, computing resources for carrying out an image making task;
pulling a target Docker base image from the image warehouse;
and making a numerical forecasting application image based on the target Docker base image and a user instruction, and solidifying the resulting numerical forecasting application image and uploading it to the image warehouse.
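The pull, make and upload sequence of claim 2 might be sketched as follows. The claim does not say how the image is made or solidified; this sketch assumes a Dockerfile-driven `docker build`, and the registry host, image names and the `BASE_IMAGE` build argument are hypothetical. The function only assembles the command lines and does not run them.

```python
from typing import List


def image_build_commands(registry: str, base: str, app: str, context_dir: str) -> List[List[str]]:
    """Assemble the docker commands for pulling a base image, building and uploading."""
    base_ref = f"{registry}/{base}"
    app_ref = f"{registry}/{app}"
    return [
        # Pull the target base image from the image warehouse.
        ["docker", "pull", base_ref],
        # Build the application image; BASE_IMAGE is an assumed Dockerfile ARG.
        ["docker", "build", "--build-arg", f"BASE_IMAGE={base_ref}", "-t", app_ref, context_dir],
        # Upload the finished application image back to the image warehouse.
        ["docker", "push", app_ref],
    ]
```

Each inner list could be passed to `subprocess.run` on a node where the research and development user is allowed to use Docker.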
3. The numerical forecasting computing cloud system based on the research and development operation and maintenance integrated architecture of claim 1, wherein the pulling of the numerical forecasting application and running of the job by the second target node comprises:
scheduling and allocating, by the second target node, computing resources for carrying out the business job task;
pulling a target Singularity application image from the shared storage unit;
and running a numerical forecasting application program based on the target Singularity application image and a numerical forecasting task script.
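As a rough illustration of claim 3, the sketch below renders a Slurm batch script that runs a task script inside a Singularity image read from shared storage. The image path, script name, job name and task count are assumptions; the patent does not specify the batch script contents.

```python
import shlex


def render_job_script(image_path: str, task_script: str, ntasks: int = 1) -> str:
    """Render a Slurm batch script that runs the task script inside the image."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --ntasks={ntasks}",
        "#SBATCH --job-name=forecast",  # hypothetical job name
        # Run the forecasting task script inside the Singularity application image.
        f"singularity exec {shlex.quote(image_path)} bash {shlex.quote(task_script)}",
    ]
    return "\n".join(lines) + "\n"
```

The rendered text could be written to a file and submitted with `sbatch`; because Singularity runs containers as the invoking user, this fits a business user without root authority.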
4. The numerical forecasting computing cloud system based on the research and development operation and maintenance integrated architecture, wherein the plurality of Docker base images comprise at least a MySQL application image, a programming language image and an operating system image.
5. The system of claim 4, wherein the plurality of numerical forecasting application images include at least an HPL application image, an FVCOM application image, and a WRF application image.
6. The numerical forecasting computing cloud system of any one of claims 1-5, wherein the node scheduling unit is further configured to maintain a hybrid node list, where the hybrid node list is configured to record the running identifier of each Slurm node, each hybrid node, and each K8S node.
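One possible shape for the hybrid node list of claim 6 is sketched below. It assumes that the "running identifier" records which scheduler currently occupies a node, which the claim does not state explicitly; the class and method names are hypothetical.

```python
from typing import Dict, Optional


class HybridNodeList:
    """Records each node's kind and which scheduler currently occupies it."""

    def __init__(self) -> None:
        self._kind: Dict[str, str] = {}
        self._running: Dict[str, Optional[str]] = {}

    def register(self, name: str, kind: str) -> None:
        if kind not in ("slurm", "hybrid", "k8s"):
            raise ValueError(f"unknown node kind: {kind}")
        self._kind[name] = kind
        self._running[name] = None

    def acquire(self, name: str, scheduler: str) -> bool:
        """Mark the node as running under `scheduler` ("slurm" or "k8s") if allowed."""
        kind = self._kind[name]
        allowed = kind == "hybrid" or kind == scheduler
        if not allowed or self._running[name] not in (None, scheduler):
            return False  # wrong scheduler, or the other scheduler holds the node
        self._running[name] = scheduler
        return True

    def release(self, name: str) -> None:
        self._running[name] = None

    def running_identifier(self, name: str) -> Optional[str]:
        return self._running[name]
```

With such a record, a hybrid node occupied by one scheduler is refused to the other until it is released, matching the requirement in claim 1 that a hybrid node is scheduled by only one cluster at a time.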
7. The numerical forecasting computing cloud system based on the research and development operation and maintenance integrated architecture of any one of claims 1-5, wherein the first target node is used at least for development of numerical forecasting application programs, management of development environments and resources, creation of numerical forecasting application images and management of containers;
and the second target node is used at least for managing the running environment, the computing resources, the running results and the running logs of the numerical forecasting application.
8. The numerical forecasting computing cloud system based on the research and development operation and maintenance integrated architecture according to any one of claims 1-5, wherein the image making task is initiated by a research and development user with root authority;
and the business job task is initiated by a business user without root authority.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311148883.6A CN117170812B (en) | 2023-09-07 | 2023-09-07 | Numerical forecasting calculation cloud system based on research and development operation and maintenance integrated architecture |
Publications (2)
Publication Number | Publication Date |
---|---|
CN117170812A (en) | 2023-12-05 |
CN117170812B (en) | 2024-05-03 |
Family
ID=88944615
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311148883.6A Active CN117170812B (en) | 2023-09-07 | 2023-09-07 | Numerical forecasting calculation cloud system based on research and development operation and maintenance integrated architecture |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117170812B (en) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2017045424A1 (en) * | 2015-09-18 | 2017-03-23 | 乐视控股(北京)有限公司 | Application program deployment system and deployment method |
CN112835714A (en) * | 2021-01-29 | 2021-05-25 | 中国人民解放军国防科技大学 | Container arrangement method, system and medium for CPU heterogeneous cluster in cloud edge environment |
CN112965819A (en) * | 2021-03-04 | 2021-06-15 | 山东英信计算机技术有限公司 | Method and device for mixed scheduling of container resources across processor architectures |
WO2021208546A1 (en) * | 2020-04-16 | 2021-10-21 | 南京邮电大学 | Multi-dimensional resource scheduling method in kubernetes cluster architecture system |
CN114968601A (en) * | 2022-07-28 | 2022-08-30 | 合肥中科类脑智能技术有限公司 | Scheduling method and scheduling system for AI training jobs with resources reserved according to proportion |
CN115118723A (en) * | 2022-05-31 | 2022-09-27 | 中科曙光国际信息产业有限公司 | Cluster scheduling system |
WO2022227447A1 (en) * | 2021-04-27 | 2022-11-03 | 上海商汤科技开发有限公司 | Task processing apparatus and method, computer device, and storage medium |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113377520B (en) * | 2021-07-07 | 2023-03-24 | 北京百度网讯科技有限公司 | Resource scheduling method, device, equipment and storage medium |
Non-Patent Citations (3)
Title |
---|
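Design of a resource pool system for power system business applications based on container technology; 耿贞伟; 权鹏宇; 李少华; 数字技术与应用; 20170115(01); full text * |
Rapid construction of a custom task scheduling system; 于连河; 电信快报; 20200810(08); full text * |
Research on a container-oriented cluster resource management system; 李英华; 无线互联科技; 20170410(07); full text * |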
Also Published As
Publication number | Publication date |
---|---|
CN117170812A (en) | 2023-12-05 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
US10467725B2 (en) | Managing access to a resource pool of graphics processing units under fine grain control | |
WO2020108303A1 (en) | Heterogeneous computing-based task processing method and software-hardware framework system | |
US9946563B2 (en) | Batch scheduler management of virtual machines | |
CN104794194B (en) | A kind of distributed heterogeneous concurrent computational system towards large scale multimedia retrieval | |
CN104123182B (en) | Based on the MapReduce task of client/server across data center scheduling system and method | |
CN101599026A (en) | A kind of cluster job scheduling system with resilient infrastructure | |
US9104491B2 (en) | Batch scheduler management of speculative and non-speculative tasks based on conditions of tasks and compute resources | |
CN112395736B (en) | Parallel simulation job scheduling method of distributed interactive simulation system | |
CN113504902B (en) | Industrial APP integrated development system and related equipment | |
CN112860396B (en) | GPU scheduling method and system based on distributed deep learning | |
Ye et al. | SHWS: Stochastic hybrid workflows dynamic scheduling in cloud container services | |
CN111353609A (en) | Machine learning system | |
CN115686805A (en) | GPU resource sharing method and device, and GPU resource sharing scheduling method and device | |
CN113377493A (en) | Container cloud simulation system and design method thereof | |
Hu et al. | GPGPU cloud: A paradigm for general purpose computing | |
CN115080207A (en) | Task processing method and device based on container cluster | |
CN117170812B (en) | Numerical forecasting calculation cloud system based on research and development operation and maintenance integrated architecture | |
CN116302581B (en) | Novel intelligent power distribution terminal and system | |
CN103582877A (en) | Computer system interrupt handling | |
US9898343B2 (en) | Application-level dispatcher control of application-level pseudo threads and operating system threads | |
CN115269140A (en) | Container-based cloud computing workflow scheduling method, system and equipment | |
CN108762891A (en) | A kind of cloud platform resource regulating method and device | |
CN117170811B (en) | Node grouping job scheduling method and system based on volcano | |
CN114298313A (en) | Artificial intelligence computer vision reasoning method | |
Hsiao et al. | Cloud Computing, Internet of Things (IoT), Edge Computing, and Big Data Infrastructure |
Legal Events
Code | Title | Date | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||