CN111897622B - High-throughput computing method and system based on container technology - Google Patents

High-throughput computing method and system based on container technology

Info

Publication number
CN111897622B
CN111897622B (Application CN202010523599.2A)
Authority
CN
China
Prior art keywords
container
workflow
job
grid
layer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010523599.2A
Other languages
Chinese (zh)
Other versions
CN111897622A (en)
Inventor
黄荷
徐蕴琪
金钟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Computer Network Information Center of CAS
Original Assignee
Computer Network Information Center of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Computer Network Information Center of CAS filed Critical Computer Network Information Center of CAS
Priority to CN202010523599.2A priority Critical patent/CN111897622B/en
Publication of CN111897622A publication Critical patent/CN111897622A/en
Application granted granted Critical
Publication of CN111897622B publication Critical patent/CN111897622B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533 Hypervisors; Virtual machine monitors
    • G06F9/45558 Hypervisor-specific management and integration aspects

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a high-throughput computing method and system based on container technology, relating to the field of high-throughput computing. A workflow description file defines workflow jobs; each workflow job consists of one or more subtask jobs, and the dependencies between subtask jobs are defined by a directed graph. Each subtask job is built as a job container and connected to a resource pool, where the resource pool comprises computing and storage resources drawn mainly from local physical resources, grid resources, and virtualized resources. The subtasks are then scheduled, dispatched, run, monitored, and managed according to their dependencies. Using containers and related technologies, the invention realizes a high-throughput computing platform that interfaces with local physical resources, grid resources, and virtualized cloud resources, shields computations from environmental factors, meets resource requirements in different scenarios, improves research efficiency and flexibility, and supports workflow-shaped computing tasks.

Description

High-throughput computing method and system based on container technology
Technical Field
The invention relates to the field of high-throughput computing, and in particular to a high-throughput computing method and system based on container technology.
Background
With the development of cloud computing and virtualization technologies, containers package an application and its dependent environment in a standardized way through kernel-level lightweight virtualization, providing an isolated running environment for applications and services; they are fast, efficient, and easy to migrate. Compared with a traditional virtual machine, a container runs directly on the host operating system, so its additional demand on system resources is far lower than that of a virtual machine. Container orchestration frameworks and related technologies provide comprehensive support for orchestrating and managing many containers, so containers are widely applied in business scenarios such as continuous integration and continuous deployment, automated testing, and microservices.
High-throughput computing covers applications such as high-throughput materials computing, high-throughput materials integration computing, and materials-genome computing. Typically, high-throughput computing jobs are executed from the command line on a computing cluster where the relevant applications are installed. Running computing tasks in this mode still presents several challenges. First, computations place high demands on the environment: installing a new third-party computing application on an older operating system often leads to incompatibility with the operating system or system software, and the time cost and risk of upgrading them are high, which increases the cost of compatibility work and debugging. Second, reproducibility of results is hard to guarantee: reproducing a data result generally requires a complete reproducibility mechanism for the whole system stack, and relying on source-code consistency alone cannot guarantee that the exact environment that produced a specific result can be reproduced, which reduces the usability of high-throughput computing. Third, computing resources come in a single form and can hardly meet the needs of different computing scenarios. A high-throughput computing task typically includes multiple relatively independent subtask steps, and the subtasks together with their interrelations form a workflow, so high-throughput computing tasks can often be described as workflows. Support for workflow-shaped tasks is therefore an important requirement for high-throughput computing systems.
Disclosure of Invention
The invention aims to solve the problems of difficult environment compatibility, difficult result reproduction, and the single form of computing resources in high-throughput computing scenarios, and provides a high-throughput computing method and system based on container technology.
To this end, the invention provides the following technical solutions:
a high throughput computing method based on container technology, comprising the steps of:
defining workflow jobs through a workflow description file, wherein each workflow job consists of one or more subtask jobs, the subtask jobs are executed serially or synchronously and parallelly according to the sequence, and the dependency relationship between the subtask jobs is defined through a directed graph;
constructing the subtask operation into an operation container in a container mode, and connecting a resource pool, wherein the resource pool comprises computing resources and storage resources which are mainly composed of local physical resources, grid resources and virtualized resources;
and scheduling, distributing, running, monitoring and managing each subtask according to the dependency relationship.
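The directed-graph dependency model above can be made concrete with a short sketch. The snippet below is illustrative only (the task names and the dictionary-based description format are hypothetical, not the patent's actual file format): it groups subtask jobs into batches, where jobs within a batch have all dependencies satisfied and may run in parallel, and batches run serially.

```python
def execution_batches(subtasks, deps):
    """Group subtask jobs into batches: each batch contains jobs whose
    dependencies are all satisfied, so they may run in parallel; the
    batches themselves execute serially in order."""
    indegree = {t: 0 for t in subtasks}
    children = {t: [] for t in subtasks}
    for task, parents in deps.items():
        for p in parents:
            indegree[task] += 1
            children[p].append(task)
    batches = []
    ready = [t for t in subtasks if indegree[t] == 0]
    while ready:
        batches.append(sorted(ready))
        nxt = []
        for t in ready:
            for c in children[t]:
                indegree[c] -= 1
                if indegree[c] == 0:
                    nxt.append(c)
        ready = nxt
    if sum(len(b) for b in batches) != len(subtasks):
        raise ValueError("dependency graph contains a cycle")
    return batches

# Hypothetical 4-step high-throughput workflow: a preparation step,
# two independent computations, then an aggregation step.
workflow = ["prepare", "calc_a", "calc_b", "aggregate"]
dependencies = {"calc_a": ["prepare"], "calc_b": ["prepare"],
                "aggregate": ["calc_a", "calc_b"]}
print(execution_batches(workflow, dependencies))
# → [['prepare'], ['calc_a', 'calc_b'], ['aggregate']]
```

In a real deployment the batching would be driven by the workflow engine rather than computed up front, but the ordering constraint is the same.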
A high-throughput computing system based on container technology comprises a physical layer, a scheduling and running layer, a workflow engine layer, and an application layer, wherein:
the physical layer sits at the bottom and provides a unified resource pool comprising computing and storage resources drawn mainly from local physical resources, grid resources, and virtualized resources;
the scheduling and running layer sits above the physical layer and builds subtask jobs as job containers connected to the physical layer's resource pool; the definition of a workflow job is obtained from a workflow job description file, each workflow job consists of one or more subtask jobs, the subtask jobs are executed serially in order or in parallel, the dependencies between subtask jobs are defined by a directed graph, and each subtask is scheduled, dispatched, and run according to those dependencies;
the workflow engine layer sits above the scheduling and running layer and parses, dispatches, monitors, and manages the subtasks of workflow jobs;
the application layer sits above the workflow engine layer, encapsulates functions from the workflow engine layer and the scheduling and running layer, and provides users with a visual interface and a unified entry point to the system.
Further, the virtualized resources include container-instance services provided by public cloud vendors.
Further, the scheduling and running layer comprises the following two modules:
(1) a container scheduling module: monitors the configuration information of newly created jobs and then dispatches the job containers to the corresponding working modules according to a job scheduling policy;
(2) a container working module: runs the job containers on different resources.
Further, the job scheduling policy considers resource availability, resource load, and the job's affinity for particular resources.
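A minimal sketch of such a scheduling policy follows; it is an assumption-laden illustration, not the patent's actual algorithm, and the resource/job record fields (`available`, `load`, `prefers`) are names invented here. It skips unavailable resources, prefers the resource kind the job declares an affinity for, and breaks ties by lowest load.

```python
def choose_resource(job, resources):
    """Pick a resource for a job: filter by availability, prefer the
    job's declared affinity, then prefer the lowest load."""
    candidates = [r for r in resources if r["available"]]
    if not candidates:
        raise RuntimeError("no available resource for job %s" % job["name"])

    def score(r):
        # Affinity outranks load; lower load scores higher via negation.
        affinity_bonus = 1 if r["kind"] == job.get("prefers") else 0
        return (affinity_bonus, -r["load"])

    return max(candidates, key=score)["name"]

resources = [
    {"name": "local-0", "kind": "local", "available": True,  "load": 0.7},
    {"name": "grid-0",  "kind": "grid",  "available": True,  "load": 0.2},
    {"name": "cloud-0", "kind": "cloud", "available": False, "load": 0.0},
]
job = {"name": "relax-step", "prefers": "grid"}
print(choose_resource(job, resources))  # → grid-0
```

A production scheduler (e.g. a Kubernetes plug-in scheduler, as mentioned later in the description) would add many more predicates, but the filter-then-rank shape is the same.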
Further, the job containers include local job containers, grid job presentation containers, and on-cloud job containers. The local and on-cloud job containers interface with local physical resources and virtualized resources, respectively, through a unified interface provided by the container scheduling module; a grid job presentation container interfaces with the grid middleware and API through the same unified interface, runs the job in the grid environment, and acquires the job's state information in real time.
Further, a process inside the grid job presentation container acquires the state of the remote grid job in real time and presents the actual job state as the container's outward state. The grid job presentation container contains a toolkit and script code for interfacing with the grid resource environment; when the container starts, the script logs into the grid under a specified user identity and submits the job and uploads job files through the grid environment API. After the job is successfully created in the grid environment, the grid job presentation container continuously polls, through a running guardian process, to check the job's status in the remote environment and updates its own state accordingly.
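The guardian process of a grid job presentation container can be sketched as a polling loop that mirrors remote state. This is a simplified illustration under stated assumptions: the remote state names (`QUEUED`, `RUNNING`, ...) and the mapping to container states are hypothetical, and the real container would call a grid environment API rather than a supplied callable.

```python
import time

# Hypothetical mapping from remote grid-job states to the state the
# presentation container exposes outward.
STATE_MAP = {"QUEUED": "Pending", "RUNNING": "Running",
             "DONE": "Succeeded", "FAILED": "Failed"}
TERMINAL = {"Succeeded", "Failed"}

def present_grid_job(poll_remote_state, interval=0.0):
    """Guardian loop: poll the remote grid job, present its state as the
    container's outward state, and stop at a terminal state."""
    history = []
    while True:
        state = STATE_MAP[poll_remote_state()]
        if not history or history[-1] != state:
            history.append(state)  # the container's outward state changes
        if state in TERMINAL:
            return history
        time.sleep(interval)

# Simulated remote job: queued twice, then running, then done.
remote = iter(["QUEUED", "QUEUED", "RUNNING", "DONE"])
print(present_grid_job(lambda: next(remote)))
# → ['Pending', 'Running', 'Succeeded']
```

The key property, as in the description, is that the container never runs the computation itself; it only "presents" the remote job's state.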
Further, the container is a Docker container.
Further, the containers are built and run via the Kubernetes open-source platform.
Further, the workflow engine layer processes and controls workflow jobs using the workflow tool Argo.
Further, the workflow engine layer includes the following seven components:
(1) CLI: a command-line tool used to add, delete, modify, and query workflow jobs;
(2) a workflow controller: controls the execution of the workflow so that the scheduling and running layer creates containers and executes computations in flow order according to job states; the workflow controller interprets and splits the workflow description file into multiple subtask job configurations with an execution order; each subtask job configuration contains base image information, the execution command, resource requirements, and inputs and outputs, can be recognized by the scheduling and running layer, and results in a container with the corresponding computing content;
(3) a workflow queue: stores one or more pending workflow jobs; the workflow job at the head of the queue is processed first by the controller;
(4) a subtask queue: stores one or more subtask jobs to be scheduled; the job at the head of the queue is sent first by the controller to the scheduling and running layer;
(5) a log and monitoring module: collects logs, monitors the execution state of job containers, and reports to the controller;
(6) a garbage collection module: marks, archives, or deletes job containers and related files whose jobs have finished, failed, or been terminated;
(7) a client API: a programming interface exposing control and operations externally.
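The controller's splitting step in component (2) can be sketched as follows. The description format (a dictionary with a `steps` list) and the step names are assumptions made for illustration; the patent's actual description file format is not specified here.

```python
def split_workflow(description):
    """Split a workflow description into per-subtask job configurations
    that the scheduling and running layer can turn into containers."""
    configs = []
    for step in description["steps"]:
        configs.append({
            "name":      step["name"],
            "image":     step["image"],               # base image info
            "command":   step["command"],             # execution command
            "resources": step.get("resources", {"cpu": 1}),
            "inputs":    step.get("inputs", []),
            "outputs":   step.get("outputs", []),
            "depends":   step.get("depends", []),     # execution order
        })
    return configs

# Hypothetical two-step workflow: generate structures, then compute.
description = {
    "name": "ht-screening",
    "steps": [
        {"name": "gen", "image": "tools:latest", "command": ["gen.sh"],
         "outputs": ["structs/"]},
        {"name": "calc", "image": "vasp-like:1.0", "command": ["run.sh"],
         "inputs": ["structs/"], "depends": ["gen"],
         "resources": {"cpu": 8}},
    ],
}
configs = split_workflow(description)
print([c["name"] for c in configs])  # → ['gen', 'calc']
print(configs[1]["depends"])         # → ['gen']
```

Each resulting configuration carries exactly the fields the text lists (image, command, resources, inputs/outputs), so the scheduling layer can create a container from it without consulting the original workflow file.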
Furthermore, a workflow job is defined by a workflow description file; the application layer defines each subtask step of the workflow and their interrelations according to the specific application, and the flow supports parameter passing, conditional judgment, and recursive invocation.
Furthermore, the workflow engine layer sends monitoring requests to the scheduling and running layer, controls flow execution according to the job state when a job's execution state changes, and receives error log information submitted by the scheduling and running layer.
Further, for the log and monitoring module, log collection obtains the running metrics of workflow jobs from the scheduling and running layer, provides centralized management of log information, and offers viewing, saving, and deletion of the log information of all workflow jobs; monitoring covers the execution state of job containers, so that the controller can promptly rerun abnormal jobs.
Further, the application layer includes the following three parts:
(1) Web back end: provides easy-to-understand and easy-to-use APIs for the front-end interface, and uses the client API of the workflow engine layer to interface with and operate it;
(2) Web interface: provides the system's user interface, offering user login and logout, submission, termination, and deletion of a user's workflows, viewing workflow progress, and the workflow list, and displays and explains results visually;
(3) an application database: used to store user data and application-specific data.
Furthermore, the Web back end is written in Python with the Flask framework; for data interaction it connects through the Python PyMODM library, reading data from the database or storing changed data into it. The Web interface adopts the Vue framework and uses the Vue-auth authentication library for user authentication and authorization.
Further, the application database uses the MongoDB database.
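To keep this sketch of the Web back end self-contained, it uses a plain WSGI callable from the standard library rather than Flask; the route path `/api/workflows` and the in-memory `WORKFLOWS` store are invented for illustration and are not part of the patent. A Flask version would expose the same route with `@app.route`.

```python
import json

def app(environ, start_response):
    """Minimal WSGI stand-in for the Flask-based Web back end: routes a
    front-end data request and returns JSON."""
    path = environ["PATH_INFO"]
    if environ["REQUEST_METHOD"] == "GET" and path == "/api/workflows":
        body = json.dumps({"workflows": list(WORKFLOWS)}).encode()
        start_response("200 OK", [("Content-Type", "application/json")])
        return [body]
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"not found"]

# Hypothetical application data the back end would read from MongoDB.
WORKFLOWS = {"wf-001": {"status": "Running"}}

# Exercise the WSGI callable directly, without starting a server.
captured = {}
def start_response(status, headers):
    captured["status"] = status

resp = b"".join(app({"REQUEST_METHOD": "GET",
                     "PATH_INFO": "/api/workflows"}, start_response))
print(captured["status"], resp.decode())
# → 200 OK {"workflows": ["wf-001"]}
```

The front end would call this endpoint via Ajax, matching the front/back-end separation described below.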
Furthermore, a remote image repository uniformly stores the application images packaged by users, facilitating centralized management and distribution.
Further, the system adopts a layered design based on the browser/server model.
The high-throughput computing system based on container technology provided by the embodiments of the invention can interface with three resource forms (local physical resources, grid resources, and virtualized resources) and, using containers and related technologies, effectively run and monitor high-throughput workflow jobs through the workflow engine, shielding computations from environmental factors and improving the usability and flexibility of the computing system.
Drawings
The accompanying drawings, which provide a further understanding of the invention and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention; they do not limit the invention.
FIG. 1 is a design architecture diagram of a high throughput computing system based on container technology in accordance with an embodiment of the present invention;
FIG. 2 is a schematic flow chart of a method for computing high throughput workflow according to an embodiment of the present invention.
Detailed Description
To make the objects, technical solutions, and advantages of the present invention clearer, the invention is described in further detail below with reference to the embodiments and the accompanying drawings.
A first aspect of the invention is to define the overall flow of a computing task in a high-throughput job in the form of a workflow.
A workflow job is made up of one or more subtask jobs.
The subtask jobs can be executed serially in sequential order or in parallel, and their dependencies can be defined by a directed graph.
The system obtains the definition of a workflow job from the workflow job description file, schedules each subtask step according to the dependencies, and uniformly manages the running state of the job and its subtasks.
A second aspect of the invention is to build and run the subtasks of a workflow job on the basis of containers.
The system creates a container image encapsulating the running environment of each subtask and schedules jobs in units of containers, enabling flexible deployment of the diverse environments required by different subtasks and solving the problem that computing results are hard to reproduce.
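The reproducibility argument above rests on pinning the whole run-time stack in the image, not just the source code. As a hedged illustration (the base image, package names, and version numbers are examples chosen here, not from the patent), a per-subtask Dockerfile could be generated like this:

```python
def subtask_image(base, packages, entry):
    """Emit a Dockerfile that pins the subtask's full run-time
    environment, so rebuilding the image reproduces the computation."""
    lines = ["FROM %s" % base]
    if packages:
        # Pin exact versions: reproducibility depends on the whole
        # stack, not just source-code consistency.
        lines.append("RUN pip install --no-cache-dir " +
                     " ".join("%s==%s" % pv
                              for pv in sorted(packages.items())))
    lines.append('ENTRYPOINT ["%s"]' % entry)
    return "\n".join(lines)

print(subtask_image("python:3.9-slim",
                    {"numpy": "1.21.6", "ase": "3.22.1"},
                    "run_step.py"))
```

Scheduling in units of such images is what lets two runs of the same subtask, possibly months apart, see an identical environment.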
A third aspect of the invention is to interface with three different forms of physical resources (local physical resources, grid resources, and virtualized cloud resources) through polymorphic presentation containers. The system provides one class of presentation container for each physical resource; downward, these interface with the different forms of computing resources through polymorphic interfaces, while upward they expose a uniform management and operation interface supporting unified scheduling and management, thereby hiding the difficulty caused by the diversity of the underlying resources.
This embodiment provides a high-throughput computing system based on container technology. Referring to FIG. 1, the design architecture diagram, the architecture is based on the browser/server model and adopts a layered design. The system specifically comprises:
a physical layer, a scheduling and running layer, a workflow engine layer, and an application layer. Through these four layers, the system forms a high-throughput computing system that interfaces with three resource forms (local physical resources, grid resources, and virtualized resources) and can submit, run, and manage high-throughput workflow computing jobs.
The physical layer sits at the bottom of the system and provides computing and storage resources for the computing system. Specifically, it can include three resource forms (local physical resources, grid resources, and virtualized resources), forming a flexibly selectable resource pool from which the scheduling layer builds and runs high-throughput computing jobs. Optionally, the virtualized resources may be Container Instance services provided by public cloud vendors, whose advantage is that a container can be run simply by specifying an image, without managing the underlying servers and paying only for the resources actually consumed while the container runs.
The scheduling and running layer sits above the physical layer and is used to schedule and run high-throughput computing jobs, dispatching and uniformly managing them through a scheduling policy. Jobs are built and run as containers under the management of a container orchestration framework. Optionally, the container may be a Docker container; Docker is a mainstream open-source container technology implemented in Go that provides an efficient, agile, and lightweight container solution. Optionally, the orchestration technology may be Kubernetes, an open-source platform for automated deployment, scaling, and operation of container clusters that provides a complete open-source solution for container orchestration and management. Its initial design was service-centric; since version 1.2, Kubernetes has supported the Job type, i.e., batch-processing tasks. Because its scheduling framework is pluggable, users can customize scheduling policies to their own needs, which greatly improves extensibility for different task-scheduling requirements. This layer contains two modules:
(1) a container scheduling module: when this module detects the configuration information of a newly created job through its interface, it binds the job to the appropriate working module according to scheduling policies such as resource availability, resource load, and the job's affinity for particular resources, and notifies the container working module to take over the subsequent work;
(2) a container working module: actually runs the job container on different resources. In general, before the formal container is started, some necessary preparation such as data initialization is performed; the container image is then pulled and the job container started, until the container run finishes, is terminated, or exits with a failure. Illustratively, job containers are divided into three types: local job containers, grid job presentation containers, and on-cloud job containers. A grid job presentation container connects to the grid middleware and API through the unified interface of the container scheduling module, runs the job in the grid environment and acquires its state information in real time; through containerized packaging, its outward behavior is essentially consistent with the other two kinds of job container. The grid job presentation container does not actually run the computing task: the process inside it acquires the state of the remote grid job in real time and "presents" the actual job state as the container's outward state. The container contains a toolkit and script code for interfacing with the grid resource environment; when the container starts, the script logs into the grid under a specified user identity and submits the job and uploads job files through the grid environment API. After the job is successfully created in the grid environment, the monitoring process running in the container continuously polls to check the job's state in the remote environment and updates its own state accordingly until it exits.
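The container working module's life cycle (prepare, pull image, run, report a final state) can be sketched as below. The hook functions are stubs standing in for real data staging, registry, and container-runtime calls; their names and the state strings are assumptions for illustration.

```python
def run_job_container(job, prepare, pull_image, run):
    """Life cycle of the container working module: initialize data,
    pull the image, run the container, and report the final state."""
    try:
        prepare(job)                 # e.g. stage input data
        pull_image(job["image"])     # fetch the container image
        exit_code = run(job)         # run until exit or termination
        return "Succeeded" if exit_code == 0 else "Failed"
    except Exception:
        return "Failed"

# Stub hooks that just record what would happen.
log = []
state = run_job_container(
    {"name": "calc", "image": "vasp-like:1.0"},
    prepare=lambda j: log.append("staged %s" % j["name"]),
    pull_image=lambda img: log.append("pulled %s" % img),
    run=lambda j: 0,
)
print(state, log)
# → Succeeded ['staged calc', 'pulled vasp-like:1.0']
```

Local, grid-presentation, and on-cloud containers would plug different implementations of these hooks into the same life cycle, which is exactly what makes their outward behavior consistent.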
The workflow engine layer is used to process and control workflow jobs. Optionally, the workflow technology may be Argo; Argo is a workflow tool implemented on top of Kubernetes Custom Resource Definitions (CRDs), realizing workflow control and task running on Kubernetes' scheduling capability. This layer includes the following six components:
(1) CLI: this component, by encapsulating functions from the scheduling and running layer, completes addition, deletion, modification, and query of workflows through a command-line tool;
(2) a workflow controller: controls the workflow. It interprets and splits the workflow description file into multiple job configurations with an execution order, so that the scheduling and running layer can create containers and execute jobs according to those configurations; it sends monitoring requests to the scheduling and running layer, controls flow execution according to the job state when a job's execution state changes, and receives error log information submitted by the scheduling and running layer. The workflow description file can define workflow jobs in various flow forms such as DAGs; the flow supports parameter passing, conditional judgment, and recursive invocation. A job configuration generally contains base image information, the execution command, resource requirements, and inputs and outputs; the scheduling and running layer can recognize the configuration and create a container with the corresponding computing content;
(3) a workflow queue: stores one or more pending workflow descriptions; workflows submitted by users are appended in time order, and the description at the head of the queue is processed first by the controller;
(4) a job queue: stores one or more job configurations to be scheduled; jobs split out of a workflow are appended to the job queue according to the flow order and the completion state of preceding jobs, and the job at the head of the queue is sent first by the controller to the scheduling and running layer;
(5) a log and monitoring module: collects logs and monitors the execution state of job containers, reporting the information to the controller. Log collection obtains the running metrics of workflow jobs from the scheduling and running layer and provides centralized management of log information, with viewing, saving, and deletion of the log information of all workflow jobs. Monitoring mainly covers the execution state of job containers, so that the controller can promptly rerun abnormal jobs;
(6) a garbage collection module: marks, archives, or deletes job containers and related files whose jobs have finished, failed, or been terminated.
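The log and monitoring component can be sketched as a small collector that keeps per-job logs and flags abnormal containers for the controller to rerun. The class and method names here are invented for illustration; the actual module would pull metrics from the scheduling and running layer rather than receive them directly.

```python
class Monitor:
    """Collect per-job logs and flag abnormal job containers so the
    controller can rerun them."""

    def __init__(self):
        self.logs = {}

    def collect(self, job, line):
        # Centralized log management: all workflow jobs' logs in one place.
        self.logs.setdefault(job, []).append(line)

    def abnormal(self, states):
        # Any job whose container ended in error is a rerun candidate.
        return [job for job, s in states.items() if s == "Failed"]

mon = Monitor()
mon.collect("calc", "step 1 done")
mon.collect("calc", "step 2 done")
print(mon.abnormal({"gen": "Succeeded", "calc": "Failed"}))  # → ['calc']
print(len(mon.logs["calc"]))                                 # → 2
```

Centralizing logs this way is what enables the viewing, saving, and deletion functions the description attributes to this module.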
The application layer encapsulates functions from the workflow engine layer and the scheduling and running layer and provides an easy-to-use interface on which users can submit, view, terminate, and delete workflows. This layer includes the following three parts:
(1) Web back end: provides easy-to-understand API interfaces, handles data requests sent from the front end, uses the client library of the workflow engine layer to interface with and operate it, and further filters data according to specific needs. Optionally, the back-end language may be Python with the Flask framework; when interaction with the database is required, the Python PyMODM library is used for data interaction, reading data from the database or storing changed data into it.
(2) Web interface: provides users with a friendly interface to the system, offering user login and logout, submission, termination, and deletion of a user's workflows, viewing workflow progress, the workflow list, and so on. It is responsible for the front-end interface and interaction logic; the front and back ends are separated, and pages are rendered from the API provided by the back end. Optionally, the front-end framework may be Vue, the Vue-auth authentication library may be used for user authentication and authorization, and the front end may interact with the back end by calling its API via Ajax.
(3) An application database: stores user data and the data of high-throughput computing workflows, providing data support for the Web application. Optionally, the database may be MongoDB, a database based on distributed file storage that aims to provide a scalable, high-performance data-storage solution for Web applications.
The system also includes a remote image repository for uniformly storing packaged application images, facilitating centralized management and distribution.
FIG. 2 is a schematic flow chart of the high-throughput workflow computing method, in which 201 to 206 are the corresponding steps.
The high-throughput computing system based on the container technology provided by the embodiment enables a user to conveniently submit a high-throughput computing workflow, meets the requirements of different computations, and provides computing support for related research works.
The above description covers only specific embodiments of the present invention and is not intended to limit it. Any omission, modification, substitution, or improvement made within the spirit and principles of the invention is intended to be included within its scope.

Claims (8)

1. A high-throughput computing method based on container technology is characterized by comprising the following steps:
defining workflow jobs through a workflow description file, wherein each workflow job consists of one or more subtask jobs, the subtask jobs are executed serially or in parallel according to the defined order, and the dependency relationships between the subtask jobs are defined through a directed graph;
constructing each subtask job as a job container and connecting it to a resource pool, wherein the resource pool comprises computing and storage resources drawn mainly from local physical resources, grid resources, and virtualized resources; the job containers comprise local job containers, grid job representation containers, and on-cloud job containers, wherein the local and on-cloud job containers interface with local physical resources and virtualized resources respectively through a unified interface provided by a container scheduling module, and the grid job representation container interfaces with grid middleware and its API through the same unified interface, running jobs in the grid environment and obtaining their state information in real time; a process inside the grid job representation container acquires the state of the remote grid job in real time and presents that actual job state as the container's external state; the grid job representation container comprises a toolkit and script code for interfacing with the grid resource environment; when the container starts, the script code logs into the grid under a specified user identity, submits the job, and uploads job files through the grid environment API; after the job is successfully created in the grid environment, the grid job representation container continuously polls the job's state in the remote environment through a running monitoring process and updates the job state accordingly;
and scheduling, dispatching, running, monitoring, and managing each subtask job according to the dependency relationships.
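The directed-graph dependency model above implies a standard topological ordering: subtask jobs with no unmet dependencies may run in parallel, and successive groups run serially. The following is a minimal sketch of such grouping (not the patent's implementation; the `deps` mapping format is an illustrative assumption):

```python
from collections import deque

def topological_batches(deps):
    """Group subtask jobs into batches: jobs within a batch have no unmet
    dependencies and may run in parallel; batches execute serially.
    `deps` maps each job name to the set of jobs it depends on."""
    indegree = {job: len(d) for job, d in deps.items()}
    dependents = {job: [] for job in deps}
    for job, d in deps.items():
        for parent in d:
            dependents[parent].append(job)
    ready = deque(job for job, n in indegree.items() if n == 0)
    batches = []
    while ready:
        batch = sorted(ready)   # one batch of mutually independent jobs
        ready.clear()
        batches.append(batch)
        for job in batch:
            for child in dependents[job]:
                indegree[child] -= 1
                if indegree[child] == 0:
                    ready.append(child)
    if sum(len(b) for b in batches) != len(deps):
        raise ValueError("dependency graph contains a cycle")
    return batches
```

For example, a workflow where two simulations depend on a preparation step and a collection step depends on both yields three serial batches, with the two simulations running in parallel in the middle batch.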
2. A high-throughput computing system based on container technology, characterized by comprising a physical layer, a scheduling and running layer, a workflow engine layer, and an application layer, wherein:
the physical layer is the bottom layer and provides a unified resource pool comprising computing and storage resources drawn mainly from local physical resources, grid resources, and virtualized resources;
the scheduling and running layer sits above the physical layer and constructs subtask jobs as job containers connected to the physical layer's resource pool; workflow job definitions are obtained from a workflow job description file, wherein each workflow job consists of one or more subtask jobs, the subtask jobs are executed serially or in parallel according to the defined order, the dependency relationships between subtask jobs are defined through a directed graph, and each subtask is scheduled, dispatched, and run according to those dependencies; the job containers comprise local job containers, grid job representation containers, and on-cloud job containers, wherein the local and on-cloud job containers interface with local physical resources and virtualized resources respectively through a unified interface provided by a container scheduling module, and the grid job representation container interfaces with grid middleware and its API through the same unified interface, running jobs in the grid environment and obtaining their state information in real time; a process inside the grid job representation container acquires the state of the remote grid job in real time and presents that actual job state as the container's external state; the grid job representation container comprises a toolkit and script code for interfacing with the grid resource environment; when the container starts, the script code logs into the grid under a specified user identity, submits the job, and uploads job files through the grid environment API; after the job is successfully created in the grid environment, the grid job representation container continuously polls the job's state in the remote environment through a running monitoring process and updates the job state accordingly;
the workflow engine layer sits above the scheduling and running layer and parses, dispatches, monitors, and manages the subtask jobs of workflow jobs;
the application layer sits above the workflow engine layer, encapsulates the functions of the workflow engine layer and the scheduling and running layer, and provides users with a visual interface and a unified entry point to the system.
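The grid job representation container described above acts as a local proxy whose externally visible state mirrors the remote grid job: log in, submit, then poll until a terminal state. A minimal sketch of that lifecycle follows; the `grid_api` object and its `login`/`submit`/`status` methods are illustrative assumptions standing in for whatever grid middleware API the deployment uses:

```python
import time

class GridJobProxy:
    """Presents a remote grid job's actual state as the container's own
    external state, per the representation-container pattern."""
    TERMINAL = {"COMPLETED", "FAILED", "CANCELLED"}

    def __init__(self, grid_api, user, job_spec):
        self.grid_api = grid_api
        self.grid_api.login(user)                     # log in under the specified user identity
        self.job_id = self.grid_api.submit(job_spec)  # submit job / upload job files
        self.state = "SUBMITTED"

    def poll(self, interval=5.0, sleep_fn=time.sleep):
        """Continuously poll the remote job and update the mirrored state
        until the job reaches a terminal state."""
        while self.state not in self.TERMINAL:
            self.state = self.grid_api.status(self.job_id)
            if self.state in self.TERMINAL:
                break
            sleep_fn(interval)
        return self.state
```

In a real container, the monitoring process would run this loop for the container's lifetime, so that external schedulers see the grid job's state simply by inspecting the container.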
3. The system of claim 2, wherein the scheduling and running layer comprises the following two modules:
(1) a container scheduling module: upon detecting newly created job configuration information, it assigns the job container to the corresponding worker module according to a job scheduling policy, the policy considering resource availability, resource load, and the job's affinity for particular resources;
(2) a container worker module: for running job containers on the different resources.
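The three policy criteria in claim 3 (availability, load, affinity) can be combined as a simple score-and-pick scheduler. The sketch below is one plausible reading, not the patent's algorithm; the resource and job field names are assumptions for illustration:

```python
def score_resource(job, resource):
    """Score a candidate resource for a job container: availability is a
    hard filter; lower load and a matching affinity raise the score."""
    if not resource["available"]:
        return None                              # unavailable resources are filtered out
    score = 1.0 - resource["load"]               # prefer lightly loaded resources
    if job.get("affinity") == resource["kind"]:  # job's stated preference (local/grid/cloud)
        score += 1.0
    return score

def schedule(job, resources):
    """Pick the best-scoring resource for the job, or None if nothing is available."""
    scored = [(score_resource(job, r), r) for r in resources]
    scored = [(s, r) for s, r in scored if s is not None]
    return max(scored, key=lambda sr: sr[0])[1] if scored else None
```

Under this weighting, a job that declares a grid affinity is routed to a grid resource even when a local resource is less loaded, while a job with no affinity simply lands on the least-loaded available resource.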
4. The system of claim 2, wherein the containers are Docker containers constructed and operated via the Kubernetes open-source platform.
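With Kubernetes as the container platform, each subtask job configuration (image, command, resource requirements) maps naturally onto a one-shot Pod. A minimal manifest builder is sketched below; the label and naming conventions are assumptions, and a real scheduler would submit the resulting manifest through the Kubernetes API or a client library rather than just build the dict:

```python
def job_pod_manifest(name, image, command, cpu="1", memory="1Gi"):
    """Build a minimal Kubernetes Pod manifest for a one-shot job container."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name, "labels": {"app": "htc-job"}},
        "spec": {
            "restartPolicy": "Never",  # batch semantics: run once, do not restart
            "containers": [{
                "name": name,
                "image": image,
                "command": command,
                "resources": {"requests": {"cpu": cpu, "memory": memory}},
            }],
        },
    }
```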
5. The system of claim 2, wherein the workflow engine layer processes and controls workflow jobs using the workflow tool Argo, the workflow engine layer comprising:
(1) a CLI: a command-line tool for creating, deleting, modifying, and viewing workflow jobs;
(2) a workflow controller: controls the execution of workflow processes so that the scheduling and running layer creates containers and executes computations in process order according to job state; by interpreting and splitting the workflow description file, the controller produces a set of subtask job configurations with an execution order; each subtask job configuration comprises base image information, an execution command, resource requirements, and inputs and outputs, can be recognized by the scheduling and running layer, and yields a container carrying the corresponding computational content;
(3) a workflow queue: stores one or more pending workflow jobs; the workflow job at the head of the queue is processed first by the controller;
(4) a subtask queue: stores one or more subtask jobs awaiting scheduling; the job at the head of the queue is sent first by the controller to the scheduling and running layer;
(5) a log and monitoring module: collects logs, monitors the execution state of job containers, and reports to the controller; log collection obtains the runtime metrics of workflow jobs from the scheduling and running layer, provides centralized management of log information, and supports viewing, storing, and deleting the logs of all workflow jobs; monitoring covers the execution state of job containers so that the controller can promptly re-run abnormal jobs;
(6) a garbage collection module: marks, archives, or deletes job containers and related files whose jobs have completed, failed, or terminated;
(7) a client API: a programming interface exposing control and operation functions externally.
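The controller's splitting step in item (2) above can be sketched as a function that turns a parsed workflow description into an ordered list of subtask job configurations, each carrying the image, command, resource requirements, and I/O the scheduling and running layer needs. The description schema below is an illustrative assumption, not Argo's actual CRD format:

```python
def split_workflow(description):
    """Split a parsed workflow description into per-subtask job configs,
    ordered so every job appears after the jobs it depends on."""
    tasks = {t["name"]: t for t in description["tasks"]}
    done, ordered = set(), []
    while len(ordered) < len(tasks):
        progress = False
        for name, t in tasks.items():
            if name in done:
                continue
            if set(t.get("depends", [])) <= done:  # all dependencies satisfied
                ordered.append({
                    "name": name,
                    "image": t["image"],              # base image information
                    "command": t["command"],          # execution command
                    "resources": t.get("resources", {}),
                    "inputs": t.get("inputs", []),
                    "outputs": t.get("outputs", []),
                })
                done.add(name)
                progress = True
        if not progress:
            raise ValueError("cyclic or unsatisfiable dependencies")
    return ordered
```

The resulting configs would then be pushed onto the subtask queue of item (4) for the scheduling and running layer to consume in order.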
6. The system of claim 2, wherein the application layer comprises:
(1) a Web backend: provides an easy-to-understand, easy-to-use API for the front-end interface and interfaces with the workflow engine layer through that layer's client API;
(2) a Web interface: provides the system's user interface, including user login and logout, submission, termination, and deletion of a user's workflows, viewing workflow progress and workflow lists, and presenting and explaining results visually;
(3) an application database: stores user data and application-specific data.
7. The system of claim 6, wherein the Web backend is implemented in Python using the Flask framework and performs data interaction through the Python PyMODM library, reading data from the database and storing modified data back into it; the Web interface uses the Vue framework and implements user authentication and authorization with the Vue-auth library; the application database is MongoDB.
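The Web backend's core job, per claims 6 and 7, is to validate a user's workflow submission, persist it, and hand it to the workflow engine's client API. The sketch below shows that handler logic framework-agnostically (the patent specifies Flask and PyMODM; here `store` and `engine` are assumed stand-ins for the MongoDB collection and the engine's client API, so the logic stays testable without either service):

```python
import json

def submit_workflow_handler(request_body, store, engine):
    """Validate a workflow-submission request, persist it, and dispatch it
    to the workflow engine via its client API."""
    try:
        payload = json.loads(request_body)
    except json.JSONDecodeError:
        return 400, {"error": "request body is not valid JSON"}
    if "name" not in payload or "tasks" not in payload:
        return 400, {"error": "missing 'name' or 'tasks'"}
    workflow_id = store.insert(payload)   # persist the user's submission
    engine.submit(workflow_id, payload)   # dispatch through the client API
    return 201, {"id": workflow_id, "status": "submitted"}
```

In a Flask deployment, a route would wrap this handler, translating the returned status code and dict into an HTTP response.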
8. The system of claim 2, further comprising a remote image repository for uniformly storing packaged application images for centralized management and distribution.
CN202010523599.2A 2020-06-10 2020-06-10 High-throughput computing method and system based on container technology Active CN111897622B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010523599.2A CN111897622B (en) 2020-06-10 2020-06-10 High-throughput computing method and system based on container technology


Publications (2)

Publication Number Publication Date
CN111897622A CN111897622A (en) 2020-11-06
CN111897622B true CN111897622B (en) 2022-09-30

Family

ID=73206653

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010523599.2A Active CN111897622B (en) 2020-06-10 2020-06-10 High-throughput computing method and system based on container technology

Country Status (1)

Country Link
CN (1) CN111897622B (en)

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112882810B (en) * 2021-02-25 2023-02-07 国家超级计算天津中心 High-throughput task processing method based on supercomputer
CN112698924A (en) * 2021-03-23 2021-04-23 杭州太美星程医药科技有限公司 Clinical test electronic data acquisition system and operation method thereof
CN113110923B (en) * 2021-03-25 2023-10-20 南京飞灵智能科技有限公司 Use method and device of workflow engine based on k8s
CN113225269B (en) * 2021-04-16 2022-11-22 鹏城实验室 Container-based workflow scheduling method, device and system and storage medium
CN113326123B (en) * 2021-04-30 2024-03-26 杭州绳武科技有限公司 Biological information analysis and calculation system and method based on container technology
CN113190328A (en) * 2021-05-22 2021-07-30 北京理工大学 System identification-oriented containerized cloud workflow processing system and method
CN113535326B (en) * 2021-07-09 2024-04-12 粤港澳大湾区精准医学研究院(广州) Calculation flow scheduling system based on high-throughput sequencing data
CN114064083A (en) * 2021-11-22 2022-02-18 江苏安超云软件有限公司 Method for deploying cloud native application through self-defined template in configuration center and application
WO2023102869A1 (en) * 2021-12-10 2023-06-15 上海智药科技有限公司 Task management system, method and apparatus, device, and storage medium
CN114327834A (en) * 2021-12-31 2022-04-12 中国第一汽车股份有限公司 Multi-concurrent data processing method and device
CN115147031B (en) * 2022-09-07 2022-12-06 深圳华锐分布式技术股份有限公司 Clearing workflow execution method, device, equipment and medium
CN117112184B (en) * 2023-10-23 2024-02-02 深圳市魔数智擎人工智能有限公司 Task scheduling service method and system based on container technology

Citations (3)

Publication number Priority date Publication date Assignee Title
CN109376017A (en) * 2019-01-07 2019-02-22 人和未来生物科技(长沙)有限公司 Cloud computing platform task processing method, system and its application method based on container
CN110389823A (en) * 2018-04-19 2019-10-29 广东石油化工学院 It is a kind of based on virtualization container technique cloud computing environment under workflow task dispatching method
CN111045791A (en) * 2019-12-16 2020-04-21 武汉智领云科技有限公司 Big data containerization central scheduling system and method

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US11055135B2 (en) * 2017-06-02 2021-07-06 Seven Bridges Genomics, Inc. Systems and methods for scheduling jobs from computational workflows


Non-Patent Citations (3)

Title
Skyport: container-based execution environment management for multi-cloud scientific workflows;W Gerlach等;《2014 5th International Workshop on Data-Intensive Computing in the Clouds》;20141121;第25-31页 *
A containerized workflow framework supporting elastic scaling in cloud environments; Liu Biao et al.; Computer Engineering; 20190331; Vol. 45, No. 3; pp. 7-13 *


Similar Documents

Publication Publication Date Title
CN111897622B (en) High-throughput computing method and system based on container technology
US11675620B2 (en) Methods and apparatus to automate deployments of software defined data centers based on automation plan and user-provided parameter values
US20200133651A1 (en) Release automation service in software development tools
US20100262558A1 (en) Incorporating Development Tools In System For Deploying Computer Based Process On Shared Infrastructure
US11301262B2 (en) Policy enabled application-release-management subsystem
US9513874B2 (en) Enterprise computing platform with support for editing documents via logical views
US20220197249A1 (en) Dynamic Cloud Deployment of Robotic Process Automation (RPA) Robots
US20220391225A1 (en) Web-based robotic process automation designer systems and automations for virtual machines, sessions, and containers
CN112256406B (en) Operation flow platformization scheduling method
US11650810B1 (en) Annotation based automated containerization
KR102446568B1 (en) Robotic Process Automation Running in Session 2 Automation of Process Running in Session 1 via Robot
Zhao et al. Realizing fast, scalable and reliable scientific computations in grid environments
JP2023070148A (en) Systems and methods for dynamically binding robotic process automation (RPA) robots to resources
US10452371B2 (en) Automating enablement state inputs to workflows in z/OSMF
WO2022109932A1 (en) Multi-task submission system based on slurm computing platform
Shaffer et al. Lightweight function monitors for fine-grained management in large scale Python applications
KR20220007496A (en) A robot running in a second session of a process running in the first session Automation through a robot
EP4124946A1 (en) Optimized software delivery to airgapped robotic process automation (rpa) hosts
US11762676B2 (en) Optimized software delivery to airgapped robotic process automation (RPA) hosts
US11971705B2 (en) Autoscaling strategies for robotic process automation
JP2023159886A (en) System, apparatus and method for deploying robotic process automation across multiple operating systems
CN115964030A (en) Application development system and application development method
CN117806654A (en) Tekton-based custom cloud native DevOps pipeline system and method
Doninger et al. SAS® Grid 101: How It Can Modernize Your Existing SAS® Environment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant