WO2022056735A1 - Cloud high-performance scientific calculation workflow design control system and graphical user interface - Google Patents

Info

Publication number
WO2022056735A1
WO2022056735A1 PCT/CN2020/115613 CN2020115613W WO2022056735A1 WO 2022056735 A1 WO2022056735 A1 WO 2022056735A1 CN 2020115613 W CN2020115613 W CN 2020115613W WO 2022056735 A1 WO2022056735 A1 WO 2022056735A1
Authority
WO
WIPO (PCT)
Prior art keywords
workflow
node
task
file
user
Prior art date
Application number
PCT/CN2020/115613
Other languages
French (fr)
Chinese (zh)
Inventor
谈樑
刘阳
鄂同富
姜子麒
马健
温书豪
赖力鹏
Original Assignee
深圳晶泰科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳晶泰科技有限公司
Priority to PCT/CN2020/115613
Publication of WO2022056735A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/20 Software design

Definitions

  • The invention belongs to the technical field of high-performance computing and visualization, and particularly relates to a cloud-based high-performance scientific computing workflow design control system and a user graphical interface.
  • A standardized workflow language gives developers of scientific computing processes a standardized method of invoking cloud resources to perform tasks, and it satisfies the need to define and design scientific computing processes for high-throughput computing.
  • When designing a scientific computing process, however, R&D personnel still need to master writing the language and to understand the working mechanism of the cloud middleware and the abstract model of the complete process. In the R&D stage of a scientific computing workflow in particular, R&D personnel constantly improve the algorithmic functions of computing nodes and adjust and replace nodes in the workflow; without an interactive graphical interface, R&D efficiency is greatly reduced.
  • At present, this process is mainly carried out by engineers with a foundation in workflow language writing, who write description files and pass them to the workflow engine.
  • An interactive graphical interface can effectively speed up this process and help R&D personnel grasp a panoramic view of the workflow.
  • Mature scientific computing workflow software is deployed in the cloud through integration between the software developer and the cloud vendor. The whole usage process is consistent with the functions of the local software; only deployment and installation move to the cloud, so the software is used in the browser without installation and is charged by usage time.
  • This approach does not fully exploit the core advantages of scheduling cloud computing services, such as elastic scaling of computing resources, which can improve both computing efficiency and cost performance.
  • The reason is that each algorithm module or computing analysis node is not a cloud-native architecture, but an architecture integrated with the cloud software.
  • Computing resources are scheduled by the software framework on the computer where it resides.
  • Compared with the serverless function computing provided by public cloud vendors, such software has many data statistics and analysis tools, but it is at a disadvantage in large-scale parallel computing under high-throughput requirements.
  • The present invention provides a cloud high-performance scientific computing workflow design control system and a user graphical interface which, through cloud-native software, use human-computer interaction functions such as dragging and configuration on a user terminal to complete the design, deployment and execution control of cloud high-performance scientific computing workflows.
  • The present invention allows users, through real-time visual interaction, to deploy algorithms they have written to the cloud in containerized form; to design, arrange and combine different algorithms; to create and publish workflows that can call mainstream public cloud computing resources; to collaborate on and reuse them with authorized users; to test and execute computing tasks conveniently; and to control tasks dynamically through the interface while they run.
  • the cloud high-performance scientific computing workflow design control system includes three layers: the underlying service layer, the parsing layer and the human-computer interaction layer;
  • the underlying service layer provides basic services for upper-layer applications in the form of SDK, including: task scheduling and execution middleware, computing and storage resource management services, task monitoring and log services;
  • the main function of the task scheduling and execution middleware is to enable the cloud computing cluster to package and run tasks through packaged instructions;
  • the main function of the computing and storage resource management service is to provide resource uploading, downloading, storage and distribution for the client's workflows and actually running tasks;
  • the main function of the task monitoring and log service is to collect and store data returned while a task runs, such as status and thrown errors, and to make that data queryable through commands.
  • the parsing layer is the actual support of the architecture. It converts the user's actual operations on the interface into machine language so that the system and the underlying services can operate according to the user's instructions; at the same time, it converts the data returned by the underlying services into the form designed by the system, so that the data conforms to the user's interface operation habits and display logic;
  • the parsing layer includes three modules: workflow description language interpretation parser, workflow generator, and task dispatcher;
  • the main function of the workflow description language interpretation parser is to apply a standardized workflow language to describe cloud-native containerized software or algorithms and to convert them. On the graphical interface, they are converted into nodes that can be dragged and controlled, together with the properties each node needs to display; on the underlying service side, the user's operations are converted into the data structure required by the underlying SDK;
  • the main function of the workflow generator is to process the workflow nodes that users drag and drop, together with their related configurations, and combine them into workflow files, so that the task dispatcher can identify the workflow sequence relationships, distribution mode and parameter configuration;
  • the main function of the task dispatcher is to call the underlying services according to the configuration file generated by the workflow generator, and thereby realize the parsing of the workflow and the execution and distribution of tasks.
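The division of labour between the workflow generator and the task dispatcher can be illustrated with a minimal Python sketch. The dictionary layout, the function names `generate_workflow` and `dispatch_order`, and the node names are hypothetical; the patent does not specify a file format. The sketch assumes that a workflow is a set of nodes plus directed links, and that the dispatcher derives a run order from those sequence relationships.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def generate_workflow(nodes, links):
    """Workflow generator: combine dragged nodes and their links into one
    serializable description (hypothetical layout, not the patent's format)."""
    known = {node["id"] for node in nodes}
    for source, target in links:
        if source not in known or target not in known:
            raise ValueError(f"link references unknown node: {source} -> {target}")
    return {"nodes": list(nodes), "links": [list(link) for link in links]}


def dispatch_order(workflow):
    """Task dispatcher: derive an execution order from the sequence relationships."""
    predecessors = {node["id"]: set() for node in workflow["nodes"]}
    for source, target in workflow["links"]:
        predecessors[target].add(source)
    return list(TopologicalSorter(predecessors).static_order())


workflow = generate_workflow(
    nodes=[{"id": "prepare"}, {"id": "simulate"}, {"id": "analyze"}],
    links=[("prepare", "simulate"), ("simulate", "analyze")],
)
order = dispatch_order(workflow)
```

A real dispatcher would call the underlying SDK for each node in `order`; here the order alone is computed.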
  • the human-computer interaction layer is responsible for providing the user with operation and parameter configuration functions, and includes four modules: a workflow management module, a work node control panel, a workflow orchestration module, and a task manager;
  • the main function of the workflow management module is to download and load containers and workflow description files in the cloud, and manage, modify, or introduce workflows in the form of files on the client side;
  • the main function of the work node control panel is to allow the user to configure the work node on the client side, and to configure the execution properties and operation parameters of the node; the operation mode of the node is: single-point mode or parallel mode.
  • the main function of the workflow orchestration module is to allow users to introduce different workflow nodes and establish connection relationships between different workflow nodes;
  • the main function of the task manager is to be able to view the running status, input and output and task logs of each node's task or each sub-task in a concurrent task in real time.
  • the present invention also provides a user graphical interface for the above-mentioned cloud high-performance scientific computing workflow design control system, including a workflow management module, a workflow orchestration module, a workflow node control module, and a task management module;
  • the workflow management module treats the description file of a workflow node as an independent file, and likewise treats the description file of a workflow as an independent file; the relationship between them is established through parsing by the workflow orchestration module;
  • the workflow of the workflow management module is:
  • (1.2) Publish containerized software: once the cloud-native description file is complete, the workflow generator converts the user's local algorithm file or designed workflow into an expression file that the task dispatcher can recognize; the computing and storage resource service middleware in the underlying service is then called to containerize and package the algorithm or designed workflow and upload it to the cloud. Before containerization, users can configure the name of the container and the deployment environment;
  • the workflow orchestration module calls the workflow description language interpretation parser to parse an opened workflow file, or a workflow node file that has been dragged in or imported, and correctly identifies each node graphic; the node graphics and their connection order and relationships are displayed in the orchestration panel. The workflow of the workflow orchestration module is as follows:
  • Configure the connection: double-clicking a connection line connects an output parameter of the previous step with an input parameter of the next step. One output parameter can correspond to multiple input parameters; to assign one output parameter to multiple next-step input parameters, add a linked input parameter.
  • Public parameters provide fixed control of specific parameters in the workflow. They are connected to any step like a node, and pass their values one-to-many to the input parameters of nodes.
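The one-to-many relationship between an output parameter (or a public parameter) and several input parameters can be sketched as follows. This is a minimal illustration under assumed conventions: the `"node.param"` key format, the function `resolve_inputs`, and all parameter names are hypothetical and do not come from the patent.

```python
def resolve_inputs(links, outputs, public_params):
    """Return {input_parameter: value} for every configured connection.

    links         -- (source, target) pairs; one source may feed many targets
    outputs       -- values produced by earlier steps, keyed "node.param"
    public_params -- workflow-wide fixed values, keyed by name
    """
    resolved = {}
    for source, target in links:
        if source in public_params:
            resolved[target] = public_params[source]
        elif source in outputs:
            resolved[target] = outputs[source]
        else:
            raise KeyError(f"unconnected source: {source}")
    return resolved


inputs = resolve_inputs(
    links=[
        ("prepare.structure", "optimize.structure"),
        ("prepare.structure", "score.structure"),  # same output, second input
        ("temperature", "optimize.temperature"),   # public parameter
        ("temperature", "score.temperature"),
    ],
    outputs={"prepare.structure": "mol.xyz"},
    public_params={"temperature": 298.15},
)
```

Both `optimize` and `score` receive the same structure file and the same temperature, which is the one-to-many assignment described above.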
  • the workflow node control module calls the workflow description language interpretation parser to parse a node file introduced into the workflow, and provides the user with functions for configuring the node's characteristics; the process of the workflow node control module is:
  • Task node attribute configuration: at present, a node has one of the following attributes: sequential operation, conditional branch, or intervention.
  • With sequential operation selected, and absent manual intervention, the workflow proceeds to the next node once the current node finishes, so that the workflow runs automatically until it completes or an error is thrown. With conditional branch selected, the next node is chosen according to the output parameter results of a step, and the output of a conditional branch must be connected to at least two different nodes. With intervention selected, the scientific computing workflow pauses automatically when it reaches the current node, and continues to execute only after the user has reviewed the parameters and set the input parameters of the next node;
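The three node attributes can be read as a small decision rule applied whenever a node is reached or finishes. The sketch below is illustrative only: the enum `NodeAttribute`, the function `after_node`, and the `choose` callback (which stands in for whatever branch condition the user configures) are assumptions, not names from the patent.

```python
from enum import Enum


class NodeAttribute(Enum):
    SEQUENTIAL = "sequential operation"
    CONDITIONAL_BRANCH = "conditional branch"
    INTERVENTION = "intervention"


def after_node(attribute, successors, outputs=None, choose=None):
    """Return (action, next_nodes) for a node with the given attribute."""
    if attribute is NodeAttribute.SEQUENTIAL:
        # No manual intervention: continue automatically with the next node(s).
        return "run", list(successors)
    if attribute is NodeAttribute.CONDITIONAL_BRANCH:
        if len(set(successors)) < 2:
            raise ValueError("a conditional branch needs at least two different nodes")
        chosen = choose(outputs)  # hypothetical user-configured condition
        if chosen not in successors:
            raise ValueError(f"branch target {chosen!r} is not connected")
        return "run", [chosen]
    # Intervention: pause until the user has reviewed and set the next inputs.
    return "pause", list(successors)


branch = after_node(
    NodeAttribute.CONDITIONAL_BRANCH,
    successors=["refine", "report"],
    outputs={"converged": True},
    choose=lambda out: "report" if out["converged"] else "refine",
)
```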
  • the task running mode is configured in the JobIn area.
  • the user selects Scatter to perform concurrent computation on a specific input parameter.
  • the scale of concurrency can be assigned by the specific value in Value or the function calculation relationship.
  • the user selects Gather to converge multiple concurrently generated operation results to a certain output parameter to form data or files in dictionary format;
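Scatter and Gather can be sketched with Python's standard thread pool. This is a minimal local illustration, not the cloud implementation: the function `scatter_gather`, the stand-in task `energy`, and the choice to key the gathered dictionary by sub-task index are all assumptions. Scatter runs one sub-task per value of the chosen input parameter; Gather collects the results into a dictionary.

```python
from concurrent.futures import ThreadPoolExecutor


def scatter_gather(task, fixed_inputs, scatter_param, values, max_workers=4):
    """Run `task` once per value of `scatter_param`, then gather the results."""

    def run_one(value):
        # Every sub-task shares the fixed inputs and gets one scattered value.
        return task(**{**fixed_inputs, scatter_param: value})

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(run_one, values))  # map preserves input order
    # Gather: one dictionary entry per concurrent sub-task (index is assumed key).
    return dict(enumerate(results))


def energy(conformer, scale):
    return conformer * scale  # stand-in for a real computation


gathered = scatter_gather(energy, {"scale": 2}, "conformer", [1, 2, 3])
```

Here the scale of concurrency is simply the number of scattered values, capped by `max_workers`.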
  • the task management module lets the user see, by operating the orchestration panel, the running status, input parameters, output parameters and running logs of a node's single-point task or concurrent task. This module calls the interface of the task monitoring and log service in the underlying service. After a task is started, the user can see the running stage of the workflow and the running status of each node in the workflow orchestration module. Each node has five statuses: Standby (grey), Passed (green), Running (blue), Running but faulted (yellow) and Aborted (red). If all nodes turn green, the workflow has completed successfully.
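The five statuses and the success rule map naturally onto an enumeration. In the sketch below the status names and colors come from the text; the identifiers `NodeStatus` and `workflow_succeeded` are illustrative, and treating an empty workflow as not successful is an assumption.

```python
from enum import Enum


class NodeStatus(Enum):
    STANDBY = "grey"
    PASSED = "green"
    RUNNING = "blue"
    RUNNING_BUT_FAULTED = "yellow"
    ABORTED = "red"


def workflow_succeeded(statuses):
    """True only when there is at least one node and every node is green."""
    return bool(statuses) and all(
        status is NodeStatus.PASSED for status in statuses.values()
    )


finished = {"prepare": NodeStatus.PASSED, "simulate": NodeStatus.PASSED}
in_progress = {"prepare": NodeStatus.PASSED, "simulate": NodeStatus.RUNNING}
```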
  • the process of the task management module is:
  • Control of a workflow node that is being executed takes three forms: pausing the node task, modifying the node parameters, and restarting the node task. Select a workflow node in the orchestration module. If the input and output parameters during the run do not meet expectations, and the log shows an exception while the task is still running, choose Pause to stop the running node task and avoid wasting resources. If the workflow includes an intervention node, open the workflow node control module, modify the output parameters, and then continue running the selected node so that the workflow continues from the current node. If a workflow has been manually stopped at a node, select Restart Node at that node to resume the workflow from the current node.
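The three controls can be pictured as transitions on a small state machine. The class `NodeTask`, its state names, and the rule that parameters may only be changed while the task is paused are assumptions made for this sketch; the patent lists the controls but does not define such a model.

```python
class NodeTask:
    """Hypothetical model of a running node task with pause/modify/restart."""

    def __init__(self, name, params):
        self.name = name
        self.params = dict(params)
        self.state = "running"

    def pause(self):
        if self.state != "running":
            raise RuntimeError("only a running task can be paused")
        self.state = "paused"

    def modify(self, **changes):
        # Assumption: parameters are edited only while the task is paused.
        if self.state != "paused":
            raise RuntimeError("pause the task before modifying parameters")
        self.params.update(changes)

    def restart(self):
        if self.state != "paused":
            raise RuntimeError("only a paused task can be restarted")
        self.state = "running"


task = NodeTask("simulate", {"steps": 1000})
task.pause()              # stop a task whose log shows an exception
task.modify(steps=500)    # adjust parameters in the node control module
task.restart()            # resume the workflow from the current node
```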
  • the design of the parsing layer encapsulates the standardized workflow language and the standardized method for executing tasks on cloud resources. It allows scientific computing processes to be defined concisely and designed conveniently, and it retains the ability to reach through to the underlying services while meeting the needs of high-throughput computing.
  • the design of the human-computer interaction layer lets users quickly grasp a panoramic view of the scientific computing workflow through simple graphical interactions such as dragging, clicking and configuring while designing the scientific computing process, lowering the engineering threshold for designing workflows.
  • the cloud-native software architecture design ensures that in the research and development stage of scientific computing workflow, developers can continuously improve the algorithm functions of computing nodes, easily release self-developed algorithm software, adjust and replace nodes in the workflow, and improve research and development efficiency.
  • the control module of the node task realizes or performs conditional manual intervention for a node, and gradually achieves the controllability of the task process, the saving of computing resources, and the dynamic balance of visual interaction.
  • FIG. 1 is a schematic diagram of the system architecture of the present invention;
  • FIG. 2 is a schematic diagram of the user graphical interface of the present invention;
  • FIG. 3 is an abstract interface diagram of the user graphical interface of the present invention;
  • FIG. 4 is a schematic flow chart of publishing containerized software according to an embodiment;
  • FIG. 5 is a schematic diagram of designing and releasing a workflow according to an embodiment;
  • FIG. 6 is a schematic diagram of a workflow monitoring and management process in operation according to an embodiment.
  • the cloud high-performance scientific computing workflow design control system includes three layers: the underlying service layer, the parsing layer and the human-computer interaction layer;
  • the underlying service layer provides basic services for upper-layer applications in the form of SDK, including: task scheduling and execution middleware, computing and storage resource management services, task monitoring and log services;
  • the main function of the task scheduling and execution middleware is to enable the cloud computing cluster to package and run tasks through packaged instructions;
  • the main function of the computing and storage resource management service is to provide resource uploading, downloading, storage and distribution for the client's workflows and actually running tasks;
  • the main function of the task monitoring and log service is to collect and store data returned while a task runs, such as status and thrown errors, and to make that data queryable through commands.
  • the parsing layer is the actual support of the architecture. It converts the user's actual operations on the interface into machine language so that the system and the underlying services can operate according to the user's instructions; at the same time, it converts the data returned by the underlying services into the form designed by the system, so that the data conforms to the user's interface operation habits and display logic;
  • the parsing layer consists of three modules: Workflow Description Language Interpretation Parser, Workflow Generator, and Task Dispatcher;
  • the main function of the workflow description language interpretation parser is to apply a standardized workflow language to describe cloud-native containerized software or algorithms and to convert them. On the graphical interface, they are converted into nodes that can be dragged and controlled, together with the properties each node needs to display; on the underlying service side, the user's operations are converted into the data structure required by the underlying SDK;
  • the main function of the workflow generator is to process the workflow nodes that users drag and drop, together with their related configurations, and combine them into workflow files, so that the task dispatcher can identify the workflow sequence relationships, distribution mode and parameter configuration;
  • the main function of the task dispatcher is to call the underlying services according to the configuration file generated by the workflow generator, and thereby realize workflow parsing and task execution and distribution.
  • the human-computer interaction layer is responsible for providing users with operation and parameter configuration functions, and includes four modules: a workflow management module, a work node control panel, a workflow orchestration module, and a task manager.
  • the main function of the workflow management module is to download and load containers and workflow description files in the cloud, and manage, modify, or introduce workflows in the form of files on the client side;
  • the main function of the work node control panel is to allow the user to configure the work node on the client side, and to configure the execution properties and operation parameters of the node; the operation mode of the node is: single-point mode or parallel mode.
  • the main function of the workflow orchestration module is to allow users to introduce different workflow nodes and establish connection relationships between different workflow nodes;
  • the main function of the task manager is to be able to view the running status, input and output and task logs of each node's task or each sub-task in a concurrent task in real time.
  • This embodiment also provides a user graphical interface for the above-mentioned control system, and an abstract interface diagram is shown in FIG. 3 .
  • the specific structure is shown in FIG. 2, and includes a workflow management module, a workflow orchestration module, a workflow node control module, and a task management module;
  • the workflow management module treats the description file of a workflow node as an independent file, and likewise treats the description file of a workflow as an independent file; the relationship between them is established through parsing by the workflow orchestration module;
  • the workflow of the workflow management module is:
  • (1.2) Publish containerized software: once the cloud-native description file is complete, the workflow generator converts the user's local algorithm file or designed workflow into an expression file that the task dispatcher can recognize; the computing and storage resource service middleware in the underlying service is then called to containerize and package the algorithm or designed workflow and upload it to the cloud. Before containerization, users can configure the name of the container and the deployment environment;
  • the workflow orchestration module calls the workflow description language interpretation parser to parse an opened workflow file, or a workflow node file that has been dragged in or imported, and correctly identifies each node graphic; the node graphics and their connection order and relationships are displayed in the orchestration panel. The workflow of the workflow orchestration module is as follows:
  • Configure the connection: double-clicking a connection line connects an output parameter of the previous step with an input parameter of the next step. One output parameter can correspond to multiple input parameters; to assign one output parameter to multiple next-step input parameters, add a linked input parameter.
  • Public parameters provide fixed control of specific parameters in the workflow. They are connected to any step like a node, and pass their values one-to-many to the input parameters of nodes.
  • the workflow node control module calls the workflow description language interpretation parser to parse a node file introduced into the workflow, and provides the user with functions for configuring the node's characteristics; the process of the workflow node control module is:
  • Task node attribute configuration: at present, a node has one of the following attributes: sequential operation, conditional branch, or intervention.
  • With sequential operation selected, and absent manual intervention, the workflow proceeds to the next node once the current node finishes, so that the workflow runs automatically until it completes or an error is thrown. With conditional branch selected, the next node is chosen according to the output parameter results of a step, and the output of a conditional branch must be connected to at least two different nodes. With intervention selected, the scientific computing workflow pauses automatically when it reaches the current node, and continues to execute only after the user has reviewed the parameters and set the input parameters of the next node;
  • the task running mode is configured in the JobIn area.
  • the user selects Scatter to perform concurrent computation on a specific input parameter.
  • the scale of concurrency can be assigned by the specific value in Value or the function calculation relationship.
  • the user selects Gather to converge multiple concurrently generated operation results to a certain output parameter to form data or files in dictionary format;
  • the task management module lets the user see, by operating the orchestration panel, the running status, input parameters, output parameters and running logs of a node's single-point task or concurrent task. This module calls the interface of the task monitoring and log service in the underlying service. After a task is started, the user can see the running stage of the workflow and the running status of each node in the workflow orchestration module. Each node has five statuses: Standby (grey), Passed (green), Running (blue), Running but faulted (yellow) and Aborted (red). If all nodes turn green, the workflow has completed successfully.
  • the process of the task management module is:
  • Control of a workflow node that is being executed takes three forms: pausing the node task, modifying the node parameters, and restarting the node task. Select a workflow node in the orchestration module. If the input and output parameters during the run do not meet expectations, and the log shows an exception while the task is still running, choose Pause to stop the running node task and avoid wasting resources. If the workflow includes an intervention node, open the workflow node control module, modify the output parameters, and then continue running the selected node so that the workflow continues from the current node. If a workflow has been manually stopped at a node, select Restart Node at that node to resume the workflow from the current node.
  • The release of containerized software is shown in FIG. 4.
  • Through the user interface, the user can deploy a locally written algorithm to the cloud, turning it into a reusable cloud-native application that can easily be invoked in the design and running of a series of scientific computing workflows.
  • the specific implementation process is as follows:
  • Step 1: The user imports the local algorithm file through the file import function of the workflow management module, and a corresponding description file is generated for it in the left menu;
  • Step 2: The user double-clicks the description file; after translation by the workflow description language interpretation parser, the user interface opens the node control module to display the content of the description file;
  • Step 3: The user edits the parameters through the form provided by the user interface, and configures the attributes, types and default values of the parameters;
  • Step 4: After the configuration is complete, the user selects the containerized software function on the toolbar to verify the written content;
  • Step 5: If the verification passes, the client calls the underlying computing and storage services for cloud deployment. If the verification fails, the user returns to Step 3 to configure the description file.
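Steps 3 to 5 amount to a validate-then-deploy loop, which the following Python sketch illustrates. Everything specific here is hypothetical: the description layout, the `ALLOWED_TYPES` table, and the `deploy` callback that stands in for the call to the underlying computing and storage services. The patent does not state which checks the verification performs.

```python
# Hypothetical parameter types; the patent does not enumerate them.
ALLOWED_TYPES = {"string": str, "int": int, "float": float}


def validate_description(description):
    """Step 4: check the edited description and return a list of problems."""
    errors = []
    if not description.get("name"):
        errors.append("missing container name")
    for param in description.get("parameters", []):
        expected = ALLOWED_TYPES.get(param.get("type"))
        if expected is None:
            errors.append(f"{param.get('name')}: unknown type {param.get('type')!r}")
        elif "default" in param and not isinstance(param["default"], expected):
            errors.append(f"{param.get('name')}: default does not match type")
    return errors


def publish(description, deploy):
    """Step 5: deploy when valid, otherwise report errors for reconfiguration."""
    errors = validate_description(description)
    if errors:
        return {"deployed": False, "errors": errors}  # back to Step 3
    deploy(description)  # stand-in for the underlying service call
    return {"deployed": True, "errors": []}


deployed = []
good = {"name": "optimizer",
        "parameters": [{"name": "steps", "type": "int", "default": 100}]}
bad = {"name": "",
       "parameters": [{"name": "steps", "type": "int", "default": "many"}]}
good_result = publish(good, deployed.append)
bad_result = publish(bad, deployed.append)
```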
  • The workflow design and release process is shown in FIG. 5.
  • This embodiment enables users to use the deployed cloud software or algorithms to design and arrange scientific computing processes through dragging and visual form configuration, and deploy them to the cloud.
  • the specific implementation process is as follows:
  • Step 1: Import cloud-native applications. The user opens the cloud image library through the workflow management module and imports a batch of the required cloud algorithm or software description files. After import, all corresponding description files appear in the menu;
  • Step 2: Import into the editor. The user drags the description files into the task orchestration module one at a time or as a selected batch. After translation by the workflow description language interpretation parser, the orchestration module correctly displays each node's name, graphic, status and other information;
  • Step 3: Connect and arrange. The user arranges the positions of the nodes on the canvas of the task orchestration module by dragging them or by automatic layout, and then connects the nodes in turn according to the data and computation sequence of the workflow;
  • Step 4: Configure the nodes. After the connections are complete, the user configures the nodes. For nodes with data inflow and outflow, concurrency and result-collection configuration can be applied to one or more parameters;
  • Step 5: Save and publish. After configuration, the user saves the workflow and the software calls the parsing layer to verify it. If the connections and configuration are correct, the save succeeds; if there is an error, the user is prompted to continue configuring;
  • Step 6: Cloud deployment and build. Saved files can be deployed to the cloud. After the user selects cloud deployment and configures the basic cloud information, the parsing layer calls the underlying computing and storage services, the cloud service builds the workflow software, and the user is notified on the user interface.
  • the running workflow monitoring and management can be integrated by various types of user task management systems to realize the multiplexing support for each collaboration group or business department.
  • the user can open and view the running status of the workflow through the task management module in various systems.
  • the specific implementation process is as follows:
  • Step 1: Select a task in the user terminal's task system. The user selects the task through the task management module integrated into another system and invokes the interface for viewing details.
  • The task management module invokes the underlying computing and storage services through the parsing layer and requests the workflow file that carries the task. After the download succeeds, the file is parsed by the workflow description language interpretation parser and displayed in the orchestration module.
  • Step 2: Select a node in the task orchestration panel and view its details. After the workflow is parsed, the overall workflow information and the connections and status of each node are displayed.
  • The user selects a node, right-clicks, and chooses to view the node information, which opens the task management module.
  • Step 3: Select a task log among the concurrent tasks. After opening the management module of a node, if the node is in parallel mode, the user can view the status and log of each concurrent subtask, right-click the selected subtask, and click View Log to open the subtask log for task debugging.

Abstract

The present invention provides a cloud high-performance scientific calculation workflow design control system and a graphical user interface. The control system comprises three layers, wherein an underlying service layer provides, in the form of an SDK, a basic service for an upper-layer application; a parsing layer is an actual architectural support, and provides machine language conversion for an actual operation performed by a user on an interface, so that the system and an underlying service can operate according to an instruction of the user, and at the same time, data returned by the underlying service is converted according to a form designed by the system, such that the data conforms to an interface operation habit and display logic of the user; and a human-computer interaction layer is responsible for realizing operation behavior and parameter configuration functions for the user. The graphical user interface comprises a workflow management module, a workflow orchestration module, a workflow node control module and a task management module. According to the present invention, by means of cloud native software, human-computer interaction functions, such as dragging and configuration, are performed at a user terminal, so as to complete control over the design, deployment and execution of a cloud high-performance scientific calculation workflow.

Description

Cloud high-performance scientific computing workflow design control system and graphical user interface
Technical field
The invention belongs to the technical field of high-performance computing and visualization, and in particular relates to a cloud high-performance scientific computing workflow design control system and a graphical user interface.
Background
A standardized workflow language gives developers of scientific computing processes a standardized way to invoke cloud resources to execute tasks, and meets the need to define and design scientific computing processes and to complete high-throughput computing. However, when designing a scientific computing process, developers still need to master writing in that language and to understand the working mechanism of the cloud middleware and the abstract model of the complete process. In the research and development stage of a scientific computing workflow in particular, developers continually improve the algorithms of the computing nodes and adjust and replace nodes in the workflow; without an interactive graphical interface, development efficiency is greatly reduced. At present, the process is mainly realized by engineers with a grounding in workflow languages, who write a description file and pass it to the workflow engine. An interactive graphical interface can effectively speed up this process and help developers grasp the full picture of the workflow.
When mature scientific computing workflow software is deployed in the cloud, it is integrated by the software vendor together with the cloud vendor, so the whole usage process is the same as that of the local software: only deployment and installation are moved to the cloud, the software is used in a browser without installation, and it is charged by usage time. This approach does not fully exploit the core advantages of scheduling cloud computing services, such as elastic scaling of computing resources, which can improve both computing efficiency and cost performance. The reason is that each algorithm module or computing and analysis node is not cloud native but is built on an architecture integrated with the cloud software, and computing resources are scheduled by the software framework on the computer where it runs. Compared with the serverless function computing offered by public cloud vendors, such software has many data statistics and analysis tools but is at a disadvantage in large-scale parallel computing under high-throughput requirements.
In actual scenarios of designing and developing scientific computing processes, considering the convenience and feasibility of building workflows in a cloud-native way, the applicant has packaged mature software in containers and given it the same computer representation as the algorithms developed by researchers, so that actual scientific computing workflows are usable in the cloud. During workflow execution, however, one workflow corresponds to one task instance, and it is difficult for the user to judge how each node is running from the status data returned by the task. The total amount of data returned by the nodes is large, which makes it hard to search and identify in a command-line environment. In addition, users often need to control tasks dynamically, or to intervene manually at a given node under certain conditions, in order to strike a dynamic balance among a controllable task process, savings in computing resources, and visual interactive operation.
Summary of the invention
In view of the above technical problems, the present invention provides a cloud high-performance scientific computing workflow design control system and a graphical user interface. Through cloud-native software, human-computer interaction functions such as dragging and configuration are performed at the user terminal to complete the design, deployment and execution control of cloud high-performance scientific computing workflows.
The invention lets users, through real-time visual interaction, deploy the algorithms they write to the cloud in containerized form, design, arrange and combine different algorithms, and build and publish a workflow that can call mainstream public cloud computing resources. The workflow can be shared and reused in collaboration with authorized users, computing tasks can be tested and executed conveniently, and the computation can be controlled dynamically in real time through the interface.
The specific technical solution is as follows:
The cloud high-performance scientific computing workflow design control system comprises three layers: an underlying service layer, a parsing layer and a human-computer interaction layer.
The underlying service layer: the underlying services provide basic services for upper-layer applications in the form of an SDK, including task scheduling and execution middleware, a computing and storage resource management service, and a task monitoring and log service.
The main function of the task scheduling and execution middleware is to let the cloud computing cluster pack and run tasks through encapsulated instructions.
The main function of the computing and storage resource service is to provide uploading, downloading, storage and distribution of resources for the workflows on the user side and for the tasks actually running.
The main function of the task monitoring and log service is to collect and store data such as the statuses and errors returned while a task runs, and to allow that data to be queried through instructions.
The parsing layer: the parsing layer is the actual support of the architecture. It converts the user's actual operations on the interface into machine language, so that the system and the underlying services can operate according to the user's instructions; at the same time, it converts the data returned by the underlying services into the form designed by the system, so that the data conforms to the user's interface operation habits and display logic. The parsing layer contains three modules: a workflow description language interpretation parser, a workflow generator and a task dispatcher.
The main function of the workflow description language interpretation parser is to apply a standardized workflow language to describe and convert cloud-native containerized software or algorithms. The effect of the conversion is twofold: on the user side, the description file is converted into a node that the user can drag and control on the graphical interface, together with the attributes that the node needs to display; on the underlying service side, the user's operations are converted into the data structures required by the underlying SDK.
The main function of the workflow generator is to process the workflow nodes that the user has dragged and combined, together with their configuration, and convert them into a workflow file from which the task dispatcher can identify the order of the workflow, the distribution mode and the parameter configuration.
The main function of the task dispatcher is to call the underlying services according to the configuration file generated by the workflow generator, and finally to realize the parsing of the workflow and the execution and distribution of tasks.
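The hand-off from the workflow generator to the task dispatcher can be illustrated with a minimal Python sketch. It assumes a JSON workflow file with `nodes` and `edges` keys; that layout, the function names `generate_workflow_file` and `dispatch_order`, and the example node names are all hypothetical and are not taken from the patent, which does not specify a file format.

```python
import json
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def generate_workflow_file(nodes, edges):
    """Serialize dragged nodes and their connections into workflow file text.

    nodes: dict mapping node name -> configuration dict (hypothetical schema)
    edges: list of (upstream, downstream) node-name pairs
    """
    unknown = {name for edge in edges for name in edge} - set(nodes)
    if unknown:
        raise ValueError(f"edges reference unknown nodes: {sorted(unknown)}")
    return json.dumps({"nodes": nodes, "edges": [list(e) for e in edges]}, indent=2)


def dispatch_order(workflow_text):
    """Read workflow file text and return node names in an order that respects the edges."""
    workflow = json.loads(workflow_text)
    graph = {name: set() for name in workflow["nodes"]}
    for upstream, downstream in workflow["edges"]:
        graph[downstream].add(upstream)  # a downstream node depends on its upstream node
    return list(TopologicalSorter(graph).static_order())


workflow_text = generate_workflow_file(
    nodes={
        "prepare": {"mode": "single"},
        "simulate": {"mode": "parallel"},
        "analyze": {"mode": "single"},
    },
    edges=[("prepare", "simulate"), ("simulate", "analyze")],
)
order = dispatch_order(workflow_text)
print(order)
```

A topological sort is one plausible way for a dispatcher to recover the sequence relationship from such a file; a real dispatcher would also read the distribution mode and parameters of each node before calling the underlying services.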
The human-computer interaction layer: the human-computer interaction layer is responsible for implementing operation behavior and parameter configuration functions for the user, and includes four modules: a workflow management module, a work node control panel, a workflow orchestration module and a task manager.
The main function of the workflow management module is to download and load containers and workflow description files from the cloud, so that they can be managed, modified or introduced into a workflow as files on the user side.
The main function of the work node control panel is to let the user configure a work node on the user side, including the node's execution attributes and running parameters; a node runs in either single-point mode or parallel mode.
The main function of the workflow orchestration module is to let the user introduce different workflow nodes and establish the connections between them.
The main function of the task manager is to show in real time the running status, inputs and outputs, and task log of each node's task, or of each subtask in a concurrent task.
The invention also provides a graphical user interface for the above cloud high-performance scientific computing workflow design control system, comprising a workflow management module, a workflow orchestration module, a workflow node control module and a task management module.
The workflow management module: the workflow management module treats the description file of a workflow node as an independent file, and likewise treats the description file of a workflow as an independent file; the relationship between a workflow and its workflow nodes is resolved by the workflow orchestration module.
The process of the workflow management module is:
(1.1) Import local files: through the system's file management system, the user's local algorithm file is imported, and a cloud-native description file for the algorithm is generated automatically according to the algorithm name;
(1.2) Publish containerized software: once the cloud-native description file has been completed, the user's local algorithm file or the designed workflow is converted by the workflow generator into an expression file that the task dispatcher can recognize; the middleware of the computing and storage resource service in the underlying services is then called to package the algorithm or the designed workflow in a container and upload it to the cloud. Before containerization, the user can configure the container's name and deployment environment;
(1.3) Import cloud files: by calling the computing and storage resource service in the underlying services, the description files of containerized software already deployed in the cloud are downloaded to the local client, where they can be managed and operated on the user side;
(1.4) Export and save: through the system's file management system, the user's files are saved to the local disk, so that the user can manage them through the system's file explorer.
The workflow orchestration module: the workflow orchestration module calls the workflow description language interpretation parser to parse an opened workflow file, or a workflow node file that has been dragged in or imported, correctly identifies the node graphics and the connection order and relationships among them, and displays them in the orchestration panel. The process of the workflow orchestration module is:
(2.1) Read a workflow file: double-clicking a file with the workflow suffix in the workflow management module loads the workflow nodes contained in the file, and the relationships between them, into the orchestration panel;
(2.2) Add workflow nodes: new nodes can be added to the workflow by dragging node files from the workflow management module, or by selecting Add Node in the toolbar;
(2.3) Connect workflow nodes: selecting the output end of a workflow node, dragging out an arrow and connecting it to the input end of the target workflow node determines the preceding and following steps of the scientific computing workflow;
(2.4) Configure a connection: double-clicking a connection line maps the output parameters of the previous step to the input parameters of the next step, where one output parameter may optionally correspond to multiple input parameters; to map one output parameter to several input parameters of the next step, add a further input parameter to the connection line;
(2.5) Configure public parameters: a public parameter is a function for fixing the value of specific parameters in the workflow; it connects to any step like a node and passes its value to the nodes' input parameters in a one-to-many assignment;
(2.6) Configure workflow results: a workflow result is the final data or file that the designer wants to obtain from the scientific computing workflow. Once configured, the result is converted by the workflow description language interpretation parser into the interface description for retrieving result data that the task management module or other higher-level task management systems require, so that users can conveniently obtain it in each system;
(2.7) View the workflow node list and status: opening the workflow node panel shows whether each node file has been correctly downloaded locally and added to the workflow, which ensures that the user can configure the nodes correctly;
(2.8) Debug and run the workflow: for a workflow that has been published, clicking the Run button in the orchestration module causes the task dispatcher to call the task scheduling and execution middleware in the underlying services, allocate cloud computing cluster resources and run the task.
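The parameter mapping of steps (2.4) and (2.5) can be sketched in Python as follows. The `Connection` class, the `resolve_inputs` function, the tuple layout of `public_params` and all parameter names are assumptions made for illustration; the patent describes the behavior (one output parameter feeding several input parameters, and public parameters assigned one-to-many) but not a data model.

```python
from dataclasses import dataclass, field


@dataclass
class Connection:
    """A connection line between two nodes, holding output -> input parameter mappings."""
    source: str
    target: str
    # Each output parameter of the source may feed several input parameters of the target.
    mapping: dict = field(default_factory=dict)

    def link(self, output_param, input_param):
        self.mapping.setdefault(output_param, []).append(input_param)


def resolve_inputs(connection, source_outputs, public_params=None):
    """Build the target node's input values from source outputs and public parameters.

    public_params maps a public parameter name to (value, [input parameter names]).
    """
    inputs = {}
    for output_param, input_params in connection.mapping.items():
        for input_param in input_params:
            inputs[input_param] = source_outputs[output_param]
    for value, input_params in (public_params or {}).values():
        for input_param in input_params:
            inputs[input_param] = value  # fixed value assigned one-to-many
    return inputs


conn = Connection(source="optimize", target="score")
conn.link("structure", "ligand")
conn.link("structure", "reference")  # the same output parameter feeds a second input
inputs = resolve_inputs(
    conn,
    source_outputs={"structure": "mol_001.sdf"},
    public_params={"temperature": (298.15, ["temp"])},
)
print(inputs)
```

Keeping the mapping on the connection line, rather than on either node, mirrors the interface described above, where the mapping is edited by double-clicking the line itself.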
The workflow node control module: the workflow node controller calls the workflow description language interpretation parser to parse the node files introduced into the workflow, and provides the user with functions for configuring node characteristics. The process of the workflow node control module is:
(3.1) Read the workflow node configuration: double-clicking or right-clicking to open a selected node in the workflow orchestration module opens the node control panel after the node is parsed by the workflow description language interpretation parser, showing a description of the node's scientific computing capability, its input and output parameters, and the available task running modes;
(3.2) Configure task node attributes: a node currently has one of the following attributes: sequential operation, conditional branch or intervention. With sequential operation, if there is no manual intervention, the workflow proceeds to the next node once this node finishes, so that the workflow runs automatically until it throws an error or completes. With conditional branch, the next path is chosen according to the output parameter result of a given step; the output of a conditional branch usually needs to be connected to at least two different nodes. With intervention, the scientific computing workflow pauses automatically when it reaches the current node, and continues only after the user has reviewed the parameters and set the input parameters of the next node;
(3.3) Configure the task running mode: in the JobIn area, selecting Scatter runs a specific input parameter as a concurrent computation; the scale of concurrency can be assigned through a specific number in Value or through a function expression. In the JobOut area, selecting Gather converges the multiple results produced concurrently onto one output parameter, forming data or a file in dictionary format;
(3.4) Set an output parameter as a result: as with configuring workflow results in the workflow orchestration module, the process output results can be configured in the node controller.
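The Scatter and Gather options of step (3.3) can be approximated locally with a thread pool, as in the sketch below. This is only an analogy under stated assumptions: the function `run_scattered`, the `max_workers` cap, and the toy `energy` task are hypothetical, the concurrency scale is simply the number of scattered values, and a real deployment would dispatch subtasks to a cloud cluster rather than to local threads.

```python
from concurrent.futures import ThreadPoolExecutor


def run_scattered(task, inputs, scatter_param, max_workers=4):
    """Run `task` once per element of inputs[scatter_param] and gather results into a dict."""
    values = inputs[scatter_param]
    fixed = {k: v for k, v in inputs.items() if k != scatter_param}

    def run_one(value):
        return task(**fixed, **{scatter_param: value})

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(run_one, values))  # map preserves input order
    # Gather: converge the subtask results onto one output, keyed by subtask index.
    return {index: result for index, result in enumerate(results)}


def energy(molecule, scale):
    """Stand-in for a containerized algorithm; here just a toy calculation."""
    return len(molecule) * scale


gathered = run_scattered(
    energy,
    inputs={"molecule": ["H2O", "CO2", "CH3OH"], "scale": 2},
    scatter_param="molecule",
)
print(gathered)
```

The gathered output is a dictionary, matching the dictionary-format result described for Gather; keying by subtask index is one choice among several, and keying by the scattered value would work equally well when the values are unique.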
The task management module: the task management module lets the user learn, by operating the orchestration panel, the running status, input and output parameters, and running log of a node's single-point task or concurrent task. This module calls the interface of the task monitoring and log service in the underlying services. After a task starts, the workflow orchestration module shows at a glance the stage the workflow has reached and the running status of each node. Each node has five states: standby (gray), passed (green), running (blue), running but faulty (yellow) and aborted (red). If all nodes turn green, the workflow has completed successfully. The process of the task management module is:
(4.1) View a workflow node that is being executed: select a workflow node in the orchestration module and choose to view the node status details to open the task management module panel;
(4.2) Control a workflow node that is being executed: there are three control functions: pausing the node task, modifying node parameters and restarting the node task. With a workflow node selected in the orchestration module, if the input or output parameters of the running task are found not to meet expectations and the log shows an exception while the task is still running, choosing Pause stops the running node task and avoids wasting resources. When the workflow contains an intervention node, the output parameters can be modified through the workflow node control module, and choosing to continue running on the selected workflow node then resumes the workflow from the current node. When a workflow has been stopped manually at a node, choosing Restart Node at the stopped node resumes the workflow from that node.
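The five node states and the all-green completion rule can be written down as a small Python sketch. The enum member names and the helper `nodes_needing_attention` are assumptions for illustration; only the five states, their colors, and the rule that an all-green workflow has completed come from the description above.

```python
from enum import Enum


class NodeState(Enum):
    """The five node states and the colors the orchestration panel shows for them."""
    STANDBY = "gray"
    PASSED = "green"
    RUNNING = "blue"
    RUNNING_FAULTY = "yellow"
    ABORTED = "red"


def workflow_completed(states):
    """A workflow has completed successfully when every node is PASSED (all green)."""
    return bool(states) and all(s is NodeState.PASSED for s in states.values())


def nodes_needing_attention(states):
    """Hypothetical helper: nodes a user may want to pause or restart."""
    flagged = (NodeState.RUNNING_FAULTY, NodeState.ABORTED)
    return sorted(name for name, s in states.items() if s in flagged)


states = {
    "prepare": NodeState.PASSED,
    "simulate": NodeState.RUNNING_FAULTY,
    "analyze": NodeState.STANDBY,
}
print(workflow_completed(states))
print(nodes_needing_attention(states))
```

The sketch deliberately leaves out the pause, modify and restart actions, because the description does not say which of the five states a paused node displays.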
The cloud high-performance scientific computing workflow design control system and graphical user interface provided by the present invention have the following technical advantages:
1. The design of the parsing layer encapsulates the standardized workflow language and the standardized method of executing tasks on cloud resources. It allows scientific computing processes to be defined concisely and designed conveniently, meets the need for high-throughput computing, and gives the human-computer interaction layer the ability to reach through to the underlying services.
2. The design of the human-computer interaction layer lets users, while designing a scientific computing process, quickly grasp the full picture of the workflow through simple graphical interactions such as dragging, clicking and configuring, which lowers the engineering threshold for designing workflows.
3. The cloud-native software architecture ensures that, in the research and development stage of a scientific computing workflow, developers can continually improve the algorithms of the computing nodes, conveniently publish self-developed algorithm software, and adjust and replace nodes in the workflow, which improves development efficiency.
4. The combined advantages of local workflow software and containerized computing software are exploited: data statistics and domain-specific analysis tools that are self-developed, licensed through cooperation or publicly available from third parties can be fully reused, large software integration costs are avoided, and large-scale parallel computing resources are secured under high-throughput requirements.
5. The node task control module enables conditional manual intervention at a given node, gradually achieving a dynamic balance among a controllable task process, savings in computing resources, and visual interactive operation.
Description of the drawings
FIG. 1 is a schematic diagram of the system architecture of the present invention;
FIG. 2 is a schematic diagram of the graphical user interface of the present invention;
FIG. 3 is an abstract diagram of the graphical user interface of the present invention;
FIG. 4 is a schematic flowchart of publishing containerized software in the embodiment;
FIG. 5 is a schematic flowchart of designing and publishing a workflow in the embodiment;
FIG. 6 is a schematic flowchart of monitoring and managing a running workflow in the embodiment.
Detailed description
The specific technical solution of the present invention is described below with reference to an embodiment.
As shown in FIG. 1, the cloud high-performance scientific computing workflow design control system comprises three layers: an underlying service layer, a parsing layer and a human-computer interaction layer.
The underlying service layer: the underlying services provide basic services for upper-layer applications in the form of an SDK, including task scheduling and execution middleware, a computing and storage resource management service, and a task monitoring and log service.
The main function of the task scheduling and execution middleware is to let the cloud computing cluster pack and run tasks through encapsulated instructions.
The main function of the computing and storage resource service is to provide uploading, downloading, storage and distribution of resources for the workflows on the user side and for the tasks actually running.
The main function of the task monitoring and log service is to collect and store data such as the statuses and errors returned while a task runs, and to allow that data to be queried through instructions.
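The collect-then-query behavior of the task monitoring and log service can be sketched with an in-memory store. The `TaskMonitor` class, the `LogRecord` fields and the `"status"`/`"error"` levels are hypothetical; the embodiment exposes this service through an SDK whose actual interface is not given in the text.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class LogRecord:
    task_id: str
    level: str    # assumed levels: "status" or "error"
    message: str


class TaskMonitor:
    """Collects records returned by running tasks and answers simple queries."""

    def __init__(self):
        self._records = defaultdict(list)

    def collect(self, task_id, level, message):
        self._records[task_id].append(LogRecord(task_id, level, message))

    def query(self, task_id, level=None):
        records = self._records.get(task_id, [])
        return [r for r in records if level is None or r.level == level]


monitor = TaskMonitor()
monitor.collect("task-1", "status", "running")
monitor.collect("task-1", "error", "input file missing")
errors = monitor.query("task-1", level="error")
print([r.message for r in errors])
```

A production service would persist records and support richer filters, but the same two operations, collecting what tasks return and querying it on demand, are what the task manager in the interaction layer relies on.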
The parsing layer: the parsing layer is the actual support of the architecture. It converts the user's actual operations on the interface into machine language, so that the system and the underlying services can operate according to the user's instructions; at the same time, it converts the data returned by the underlying services into the form designed by the system, so that the data conforms to the user's interface operation habits and display logic.
The parsing layer contains three modules: a workflow description language interpretation parser, a workflow generator and a task dispatcher.
The main function of the workflow description language interpretation parser is to apply a standardized workflow language to describe and convert cloud-native containerized software or algorithms. The effect of the conversion is twofold: on the user side, the description file is converted into a node that the user can drag and control on the graphical interface, together with the attributes that the node needs to display; on the underlying service side, the user's operations are converted into the data structures required by the underlying SDK.
The main function of the workflow generator is to process the workflow nodes that the user has dragged and combined, together with their configuration, and convert them into a workflow file from which the task dispatcher can identify the order of the workflow, the distribution mode and the parameter configuration.
The main function of the task dispatcher is to call the underlying services according to the configuration file generated by the workflow generator, and finally to realize the parsing of the workflow and the execution and distribution of tasks.
The human-computer interaction layer: the human-computer interaction layer is responsible for implementing operation behavior and parameter configuration functions for the user, and includes four modules: a workflow management module, a work node control panel, a workflow orchestration module and a task manager.
The main function of the workflow management module is to download and load containers and workflow description files from the cloud, so that they can be managed, modified or introduced into a workflow as files on the user side.
The main function of the work node control panel is to let the user configure a work node on the user side, including the node's execution attributes and running parameters; a node runs in either single-point mode or parallel mode.
The main function of the workflow orchestration module is to let the user introduce different workflow nodes and establish the connections between them.
The main function of the task manager is to show in real time the running status, inputs and outputs, and task log of each node's task, or of each subtask in a concurrent task.
This embodiment also provides a graphical user interface for the above control system; the abstract interface diagram is shown in FIG. 3. The specific structure is shown in FIG. 2 and comprises a workflow management module, a workflow orchestration module, a workflow node control module and a task management module.
The workflow management module: the workflow management module treats the description file of a workflow node as an independent file, and likewise treats the description file of a workflow as an independent file; the relationship between a workflow and its workflow nodes is resolved by the workflow orchestration module.
The process of the workflow management module is:
(1.1) Import local files: through the system's file management system, the user's local algorithm file is imported, and a cloud-native description file for the algorithm is generated automatically according to the algorithm name;
(1.2) Publish containerized software: once the cloud-native description file has been completed, the user's local algorithm file or the designed workflow is converted by the workflow generator into an expression file that the task dispatcher can recognize; the middleware of the computing and storage resource service in the underlying services is then called to package the algorithm or the designed workflow in a container and upload it to the cloud. Before containerization, the user can configure the container's name and deployment environment;
(1.3) Import cloud files: by calling the computing and storage resource service in the underlying services, the description files of containerized software already deployed in the cloud are downloaded to the local client, where they can be managed and operated on the user side;
(1.4) Export and save: through the system's file management system, the user's files are saved to the local disk, so that the user can manage them through the system's file explorer.
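Steps (1.1) and (1.2) can be sketched as generating a skeleton description from the algorithm file name and checking that it has been completed before publishing. Every field name in the dictionary, the functions `describe_algorithm` and `ready_to_publish`, the example path and the environment label are hypothetical; the embodiment does not disclose the schema of the cloud-native description file.

```python
import json
from pathlib import PurePath


def describe_algorithm(algorithm_path):
    """Build a skeleton description for a local algorithm file, named after the file."""
    name = PurePath(algorithm_path).stem
    return {
        "name": name,
        "source": str(algorithm_path),
        "inputs": {},    # to be completed by the user before publishing
        "outputs": {},
        "container": {"name": name, "environment": None},
    }


def ready_to_publish(description):
    """Treat a description as publishable once container name and environment are set."""
    container = description["container"]
    return bool(container["name"]) and container["environment"] is not None


description = describe_algorithm("algorithms/conformer_search.py")
print(ready_to_publish(description))  # environment not configured yet
description["container"]["environment"] = "python3.8"
print(json.dumps(description["container"]))
```

Defaulting the container name to the algorithm name while leaving the environment empty reflects the order described above: the description is generated automatically on import, and the user configures the container's name and deployment environment before containerization.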
The workflow orchestration module: the workflow orchestration module calls the workflow description language interpretation parser to parse an opened workflow file, or workflow node files that have been dragged in or imported. It identifies them as node graphics together with the connection order and relationships between those graphics, and displays them in the orchestration panel. The workflow orchestration module works as follows:
(2.1) Read a workflow file: double-clicking a file with the workflow suffix in the workflow management module loads the workflow nodes and inter-node relationships contained in the file into the orchestration panel;
(2.2) Add workflow nodes: new nodes are added to the workflow by dragging node files from the workflow management module or by selecting Add Node in the toolbar;
(2.3) Connect workflow nodes: selecting the output port of a workflow node, dragging out an arrow, and connecting it to the input port of the target workflow node determines the preceding and following steps of the scientific computing workflow;
(2.4) Configure connections: double-clicking a connection line links the output parameters of the previous step to the corresponding input parameters of the next step. One output parameter may correspond to several input parameters; to feed one output parameter into several next-step input parameters, add another input parameter to the connection line;
(2.5) Configure common parameters: a common parameter fixes the value of a specific parameter in the workflow. It connects to any step in the same way as a node and passes its value to node input parameters as a one-to-many assignment;
(2.6) Configure workflow results: the workflow result is the final data or file that the designer wants to obtain from the scientific computing workflow. Once configured, the result is converted by the workflow description language interpretation parser into the interface description that the task management module, or another higher-level task management system, needs in order to fetch the result data, so that users can obtain it conveniently in each system;
(2.7) View the workflow node list and status: opening the workflow node panel shows whether each node file has been correctly downloaded locally and added to the workflow, which ensures that the user can configure the nodes correctly;
(2.8) Debug and run the workflow: for a workflow that has already been published, clicking the Run button in the orchestration module causes the task dispatcher to call the task scheduling and execution middleware in the underlying services, allocate cloud computing cluster resources, and run the task.
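Steps (2.3) and (2.4) can be pictured as building a directed graph whose edges carry parameter mappings. The Python sketch below is only an illustration under that assumption: the `Connection` class, the node names, and the use of a topological sort to derive the step order are choices made for this example, not details given in the text.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter  # standard library, Python 3.9+


@dataclass(frozen=True)
class Connection:
    """One configured link: an output parameter feeding an input parameter."""
    src_node: str
    src_output: str
    dst_node: str
    dst_input: str


def execution_order(nodes, connections):
    """Return node names in an order that respects every connection."""
    deps = {name: set() for name in nodes}
    for c in connections:
        deps[c.dst_node].add(c.src_node)  # the destination runs after the source
    return list(TopologicalSorter(deps).static_order())


nodes = ["prepare", "optimize", "analyze"]
connections = [
    Connection("prepare", "structure", "optimize", "structure"),
    # the same output parameter mapped to a second input, as allowed in step (2.4)
    Connection("prepare", "structure", "analyze", "reference"),
    Connection("optimize", "energy", "analyze", "energy"),
]
order = execution_order(nodes, connections)
print(order)
```

`TopologicalSorter` raises `graphlib.CycleError` if the connections form a loop, which is one simple way an editor could reject an invalid arrangement.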
The workflow node control module: the workflow node controller calls the workflow description language interpretation parser to parse the node files imported into the workflow and lets the user configure the characteristics of each node. The workflow node control module works as follows:
(3.1) Read the workflow node configuration: double-click the selected node in the workflow orchestration module, or right-click it and choose Open. After the workflow description language interpretation parser has parsed the node, the node control panel opens and shows a description of the node's scientific computing capability, its input and output parameters, and the available task running modes;
(3.2) Configure task node attributes: at present a node takes one of the following attributes: sequential operation, conditional branch, or manual intervention. With sequential operation, and without manual intervention, the workflow moves on to the next node's computation as soon as this node finishes, so the workflow runs automatically until it raises an error or completes. With conditional branch, the next path is chosen according to the output parameter result of a given step; the output of a conditional branch usually needs to connect to at least two different nodes. With manual intervention, the scientific computing workflow pauses automatically when it reaches the current node, and continues only after the user has reviewed the parameters and set the input parameters of the next node;
(3.3) Configure the task running mode: in the JobIn area, selecting Scatter runs a specific input parameter as concurrent computations; the degree of concurrency is assigned through a concrete number or a function expression in Value. In the JobOut area, selecting Gather converges the results produced concurrently onto a single output parameter, forming data or a file in dictionary format;
(3.4) Set an output parameter as a result: as with configuring workflow results in the workflow orchestration module, the workflow's output result can also be configured in the node controller.
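The Scatter and Gather modes of step (3.3) correspond to a familiar fan-out and fan-in pattern. The following Python sketch shows that pattern locally with a thread pool; it is an assumption-laden stand-in, since the described system dispatches sub-tasks to a cloud cluster, and the function names and the choice of sub-task index as dictionary key are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def run_scatter_gather(task, scattered_values, max_workers=4):
    """Run `task` once per scattered value and gather the results into a dict.

    Keys are sub-task indices (an assumed convention), so the gathered
    output is in dictionary form.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(task, scattered_values)  # yields results in input order
        return {index: result for index, result in enumerate(results)}


def square(x):
    # stand-in for a node's scientific computation
    return x * x


gathered = run_scatter_gather(square, [1, 2, 3])
print(gathered)
```

Here the number of scattered values plays the role of the concurrency scale set in Value, and the returned dictionary plays the role of the single gathered output parameter.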
The task management module: by operating the orchestration panel, the user can see the running status, input and output parameters, and run logs of a node's single task or concurrent tasks. The module calls the interfaces of the task monitoring and log service in the underlying services. After a task starts, the workflow orchestration module shows the workflow's running stage and the running status of every node at a glance. Each node has one of five states: standby (gray), passed (green), running (blue), running with a fault (yellow), and aborted (red). If all nodes turn green, the workflow has completed successfully. The task management module works as follows:
(4.1) View an executing workflow node: select a workflow node in the orchestration module and choose to view the node status details; the task management module panel opens;
(4.2) Control an executing workflow node: there are three control functions: pause the node task, modify the node parameters, and restart the node task. Select a workflow node in the orchestration module. If the task's input or output parameters are not as expected and the log shows an exception while the task is still running, choose Pause to stop the running node task and avoid wasting resources. If the workflow contains a manual intervention node, modify the output parameters through the workflow node control module and then resume the selected node; the workflow continues from the current node. If a workflow has been stopped manually at a node, choose Restart Node at that node to continue executing the workflow from it.
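The five node states and the completion rule can be written down compactly. The Python sketch below mirrors the states and colors listed in the text; the names `NodeState`, `workflow_completed`, and `can_pause` are hypothetical, and treating only the two running states as pausable is an interpretation of step (4.2), not something the text states as a rule.

```python
from enum import Enum


class NodeState(Enum):
    """Node states with the display colors given in the text."""
    STANDBY = "gray"
    PASSED = "green"
    RUNNING = "blue"
    RUNNING_WITH_FAULT = "yellow"
    ABORTED = "red"


def workflow_completed(states):
    """A workflow has finished successfully only when every node has passed."""
    return bool(states) and all(s is NodeState.PASSED for s in states.values())


def can_pause(state):
    """Assumption: only a task that is still running can be paused."""
    return state in (NodeState.RUNNING, NodeState.RUNNING_WITH_FAULT)


states = {"prepare": NodeState.PASSED, "optimize": NodeState.RUNNING_WITH_FAULT}
print(workflow_completed(states), can_pause(states["optimize"]))
```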
Publishing containerized software, as shown in Figure 4: this embodiment lets users deploy locally written algorithms to the cloud through the user interface, turning them into reusable cloud-native applications that a range of scientific computing workflows can conveniently call for design and execution. The procedure is as follows:
Step 1: the user imports a local algorithm file with the file import function of the workflow management module, and a corresponding description file is generated for it in the left-hand menu;
Step 2: the user double-clicks the description file; after it has been translated by the workflow description language interpretation parser, the user interface opens the node control module and displays the content of the description file;
Step 3: the user edits the parameters in the form provided by the user interface, configuring each parameter's attributes, type, and default value;
Step 4: when configuration is complete, the user selects the Containerize Software function in the toolbar, which validates the edited content;
Step 5: if validation passes, the client calls the underlying computing and storage services to deploy to the cloud. If validation fails, the user returns to step 3 to configure the description file again.
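Steps 3 to 5 amount to a validate-then-deploy gate. The Python sketch below shows one possible form of that gate: it checks that each parameter's default value matches its declared type and only then hands the parameters to a deployment callback. The type names in `ALLOWED_TYPES`, the parameter layout, and the `deploy` callback are all assumptions made for this example; the text does not describe what the validation actually checks.

```python
# Assumed mapping from declared type names to Python types (hypothetical).
ALLOWED_TYPES = {"int": int, "float": (int, float), "string": str, "file": str}


def validate_parameters(parameters):
    """Return a list of problems; an empty list means validation passed."""
    problems = []
    for name, spec in parameters.items():
        expected = ALLOWED_TYPES.get(spec.get("type"))
        if expected is None:
            problems.append(f"{name}: unknown type {spec.get('type')!r}")
            continue
        default = spec.get("default")
        if default is not None and not isinstance(default, expected):
            problems.append(f"{name}: default {default!r} does not match type {spec['type']}")
    return problems


def publish(parameters, deploy):
    """Deploy only when validation passes; otherwise report what to fix."""
    problems = validate_parameters(parameters)
    if problems:
        return problems       # the user goes back to step 3
    deploy(parameters)        # step 5: hand over to the underlying services
    return []


deployed = []
ok = publish({"steps": {"type": "int", "default": 10}}, deployed.append)
bad = publish({"steps": {"type": "int", "default": "ten"}}, deployed.append)
print(ok, bad, len(deployed))
```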
Designing and publishing a workflow, as shown in Figure 5: this embodiment lets users take software or algorithms already deployed in the cloud, design and arrange a scientific computing process by dragging and dropping and by filling in visual forms, and deploy the result to the cloud as a reusable cloud-native application that members of a collaboration group can readily use. The procedure is as follows:
Step 1, import cloud-native applications: the user opens the cloud image repository through the workflow management module and imports the description files of the required cloud algorithms or software; after the import, all of the corresponding description files appear in the menu;
Step 2, bring them into the editor: the user drags the description files into the task orchestration module, one by one or as a selected batch. After translation by the workflow description language interpretation parser, the orchestration module displays each node's name, graphic, status, and other information;
Step 3, connect and arrange: on the canvas of the task orchestration module, the user can drag nodes or apply automatic layout to tidy the positions of multiple nodes, and then connects the nodes one after another according to the workflow's data and computation order;
Step 4, configure nodes: once the connections are complete, the user configures the nodes. In the node configuration, concurrent execution and result collection can be configured for one or more parameters of nodes that have data flowing in and out;
Step 5, save and publish: when configuration is complete, the user saves the workflow. During saving, the software calls the parsing layer to validate it. If the connections and configuration are correct, the save succeeds; otherwise the user is prompted and guided to continue configuring;
Step 6, deploy and build in the cloud: a saved file can be deployed to the cloud. After the user chooses cloud deployment and configures the basic cloud information, the parsing layer calls the underlying computing and storage services, the cloud service builds the workflow's software, and on success the user is notified in the user interface.
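The save-time check in step 5 can be imagined as verifying that every connection refers to nodes and parameters that actually exist. The Python sketch below illustrates such a check under assumed data shapes: nodes as a dictionary of input and output parameter lists, connections as 4-tuples, and a `write` callback standing in for the real save operation. None of these shapes are specified in the text.

```python
def validate_workflow(nodes, connections):
    """Check that each connection names existing nodes and parameters.

    nodes: {name: {"inputs": [...], "outputs": [...]}}  (assumed layout)
    connections: list of (src_node, src_output, dst_node, dst_input) tuples
    """
    errors = []
    for src, out, dst, inp in connections:
        if src not in nodes or dst not in nodes:
            errors.append(f"{src}.{out} -> {dst}.{inp}: unknown node")
            continue
        if out not in nodes[src]["outputs"]:
            errors.append(f"{src} has no output parameter {out!r}")
        if inp not in nodes[dst]["inputs"]:
            errors.append(f"{dst} has no input parameter {inp!r}")
    return errors


def save(nodes, connections, write):
    """Save only a valid workflow; otherwise return the errors to show the user."""
    errors = validate_workflow(nodes, connections)
    if errors:
        return errors
    write({"nodes": nodes, "connections": connections})
    return []


nodes = {
    "prepare": {"inputs": [], "outputs": ["structure"]},
    "optimize": {"inputs": ["structure"], "outputs": ["energy"]},
}
saved = []
good = save(nodes, [("prepare", "structure", "optimize", "structure")], saved.append)
broken = save(nodes, [("prepare", "geometry", "optimize", "structure")], saved.append)
print(good, broken)
```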
Monitoring and managing a running workflow, as shown in Figure 6: the module for monitoring and managing running workflows can be integrated into various kinds of user task management systems, so that it can be reused by different collaboration groups or business departments. This embodiment lets users open and view a workflow's running status through the task management module in those systems. The procedure is as follows:
Step 1, select a task in the user terminal's task system: the user selects a task through the task management module integrated into another system and calls the interface for viewing details. The task management module calls the underlying computing and services through the parsing layer to request the workflow file that carries the task. Once downloaded, the file is parsed by the workflow description language interpretation parser and displayed in the orchestration module.
Step 2, select a node in the task orchestration panel and view its details: the parsed workflow shows the overall workflow information together with the connections and status of each node. The user selects a node, right-clicks, and chooses View Node Information to open the task management module.
Step 3, select the log of one of the concurrent tasks: after opening a node's management module, if the node is in parallel mode, the status and log of every concurrent sub-task can be viewed. Right-clicking the chosen sub-task and clicking View Log opens that sub-task's log for debugging.

Claims (9)

  1. A cloud high-performance scientific computing workflow design control system, characterized in that it comprises three layers: an underlying service layer, a parsing layer, and a human-computer interaction layer;
    the underlying service layer: the underlying services provide basic services to upper-layer applications in the form of an SDK;
    the parsing layer: translates the user's actual operations on the interface into machine language so that the system and the underlying services can act on the user's instructions, and converts the data returned by the underlying services into the form designed by the system so that the data matches the user's interface habits and the display logic;
    the human-computer interaction layer: implements operation behavior and parameter configuration functions for the user.
  2. The cloud high-performance scientific computing workflow design control system according to claim 1, characterized in that the underlying service layer comprises: task scheduling and execution middleware, a computing and storage resource management service, and a task monitoring and log service;
    the task scheduling and execution middleware mainly enables the cloud computing cluster to pack and run tasks through encapsulated instructions;
    the computing and storage resource service mainly provides resource upload, download, storage, and distribution for client-side workflows and for tasks that are actually running;
    the task monitoring and log service mainly collects and stores the status and error data returned while tasks run, and allows them to be queried through instructions.
  3. The cloud high-performance scientific computing workflow design control system according to claim 1, characterized in that the parsing layer comprises three modules: a workflow description language interpretation parser, a workflow generator, and a task dispatcher;
    the workflow description language interpretation parser mainly applies a standardized workflow language to convert the descriptions of cloud-native containerized software or algorithms; on the underlying service side, it converts the user's operations into the data structures required by the underlying SDK;
    the workflow generator mainly processes the workflow nodes that the user has combined by dragging and dropping, together with their configuration, and converts them into a workflow file from which the task dispatcher can identify the workflow's sequence relationships, distribution mode, and parameter configuration;
    the task dispatcher mainly calls the underlying services according to the configuration file produced by the workflow generator, and thereby carries out the parsing of the workflow and the execution and distribution of tasks.
  4. The cloud high-performance scientific computing workflow design control system according to claim 1, characterized in that the human-computer interaction layer comprises four modules: a workflow management module, a work node control panel, a workflow orchestration module, and a task manager;
    the workflow management module mainly downloads and loads containers and workflow description files from the cloud, so that they can be managed, modified, or imported into a workflow as files on the client;
    the work node control panel mainly lets the user configure a work node on the client, including the node's execution attributes and running parameters;
    the workflow orchestration module mainly lets the user bring in different workflow nodes and establish the connections between them;
    the task manager mainly allows real-time viewing of the running status, inputs and outputs, and task log of each node's task, or of each sub-task of a concurrent task.
  5. A graphical user interface, characterized in that it is used with the cloud high-performance scientific computing workflow design control system according to any one of claims 1 to 4 and comprises a workflow management module, a workflow orchestration module, a workflow node control module, and a task management module;
    the workflow management module: treats the description file of a workflow node as an independent file, and likewise treats the description file of a workflow as an independent file; the relationship between a workflow and its workflow nodes is resolved through parsing by the workflow orchestration module;
    the workflow orchestration module: calls the workflow description language interpretation parser to parse an opened workflow file, or workflow node files that have been dragged in or imported, identifies them as node graphics together with the connection order and relationships between those graphics, and displays them in the orchestration panel;
    the workflow node control module: the workflow node controller calls the workflow description language interpretation parser to parse the node files imported into the workflow and lets the user configure the characteristics of each node;
    the task management module: by operating the orchestration panel, the user can see the running status, input and output parameters, and run logs of a node's single task or concurrent tasks; the task management module calls the interfaces of the task monitoring and log service in the underlying services; after a task starts, the workflow orchestration module shows the workflow's running stage and the running status of every node at a glance; each node has one of five states: standby, passed, running, running with a fault, and aborted; if all nodes become passed, the workflow has completed successfully.
  6. The graphical user interface according to claim 5, characterized in that the workflow management module works as follows:
    (1.1) import local files: the user's local algorithm file is imported through the system's file management system, and a cloud-native description file for the algorithm is generated automatically from the algorithm name;
    (1.2) publish containerized software: once the cloud-native description file has been completed, the workflow generator converts the user's local algorithm file or designed workflow into an expression file that the task dispatcher can recognize; the middleware of the computing and storage resource service in the underlying services is then called to package the algorithm or designed workflow as a container and upload it to the cloud; before containerization, the user configures the container's name and deployment environment;
    (1.3) import cloud files: by calling the computing and storage resource service in the underlying services, the description files of containerized software already deployed in the cloud are downloaded to the local client, where they can be managed and operated on;
    (1.4) export and save: through the system's file management system, the user's files are saved to the local disk so that the user can manage them with the system file explorer.
  7. The graphical user interface according to claim 6, characterized in that the workflow orchestration module works as follows:
    (2.1) read a workflow file: double-clicking a file with the workflow suffix in the workflow management module loads the workflow nodes and inter-node relationships contained in the file into the orchestration panel;
    (2.2) add workflow nodes: new nodes are added to the workflow by dragging node files from the workflow management module or by selecting Add Node in the toolbar;
    (2.3) connect workflow nodes: selecting the output port of a workflow node, dragging out an arrow, and connecting it to the input port of the target workflow node determines the preceding and following steps of the scientific computing workflow;
    (2.4) configure connections: double-clicking a connection line links the output parameters of the previous step to the corresponding input parameters of the next step, where one output parameter may correspond to several input parameters; to feed one output parameter into several next-step input parameters, another input parameter is added to the connection line;
    (2.5) configure common parameters: a common parameter fixes the value of a specific parameter in the workflow; it connects to any step in the same way as a node and passes its value to node input parameters as a one-to-many assignment;
    (2.6) configure workflow results: the workflow result is the final data or file that the designer wants to obtain from the scientific computing workflow; once configured, the result is converted by the workflow description language interpretation parser into the interface description that the task management module, or another higher-level task management system, needs in order to fetch the result data, so that users can obtain it conveniently in each system;
    (2.7) view the workflow node list and status: opening the workflow node panel shows whether each node file has been correctly downloaded locally and added to the workflow, which ensures that the user can configure the nodes correctly;
    (2.8) debug and run the workflow: for a workflow that has already been published, clicking the Run button in the orchestration module causes the task dispatcher to call the task scheduling and execution middleware in the underlying services, allocate cloud computing cluster resources, and run the task.
  8. The graphical user interface according to claim 7, characterized in that the workflow node control module works as follows:
    (3.1) read the workflow node configuration: double-click the selected node in the workflow orchestration module, or right-click it and choose Open; after the workflow description language interpretation parser has parsed the node, the node control panel opens and shows a description of the node's scientific computing capability, its input and output parameters, and the available task running modes;
    (3.2) configure task node attributes: at present a node takes one of the following attributes: sequential operation, conditional branch, or manual intervention; with conditional branch, the next path is chosen according to the output parameter result of a given step, and the output of a conditional branch needs to connect to at least two different nodes; with manual intervention, the scientific computing workflow pauses automatically when it reaches the current node, and continues only after the user has reviewed the parameters and set the input parameters of the next node;
    (3.3) configure the task running mode: in the JobIn area, selecting Scatter runs a specific input parameter as concurrent computations, and the degree of concurrency is assigned through a concrete number or a function expression in Value; in the JobOut area, selecting Gather converges the results produced concurrently onto a single output parameter, forming data or a file in dictionary format;
    (3.4) set an output parameter as a result: as with configuring workflow results in the workflow orchestration module, the workflow's output result can also be configured in the node controller.
  9. The graphical user interface according to claim 8, characterized in that the task management module works as follows:
    (4.1) view an executing workflow node: select a workflow node in the orchestration module and choose to view the node status details; the task management module panel opens;
    (4.2) control an executing workflow node: there are three control functions: pause the node task, modify the node parameters, and restart the node task; select a workflow node in the orchestration module; if the task's input or output parameters are not as expected and the log shows an exception while the task is still running, choose Pause to stop the running node task and avoid wasting resources; if the workflow contains a manual intervention node, modify the output parameters through the workflow node control module and then resume the selected node, so that the workflow continues from the current node; if a workflow has been stopped manually at a node, choose Restart Node at that node to continue executing the workflow from it.
PCT/CN2020/115613 2020-09-16 2020-09-16 Cloud high-performance scientific calculation workflow design control system and graphical user interface WO2022056735A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/115613 WO2022056735A1 (en) 2020-09-16 2020-09-16 Cloud high-performance scientific calculation workflow design control system and graphical user interface

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/115613 WO2022056735A1 (en) 2020-09-16 2020-09-16 Cloud high-performance scientific calculation workflow design control system and graphical user interface

Publications (1)

Publication Number Publication Date
WO2022056735A1 true WO2022056735A1 (en) 2022-03-24

Family

ID=80777501

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/115613 WO2022056735A1 (en) 2020-09-16 2020-09-16 Cloud high-performance scientific calculation workflow design control system and graphical user interface

Country Status (1)

Country Link
WO (1) WO2022056735A1 (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114757124A (en) * 2022-04-21 2022-07-15 哈尔滨工程大学 CFD workflow modeling method and device based on XML, computer and storage medium
CN114792088A (en) * 2022-06-23 2022-07-26 中国科学院空天信息创新研究院 Editing method and device for digital earth workflow
CN115098093A (en) * 2022-08-24 2022-09-23 湖南云畅网络科技有限公司 Data flow expression processing method and system
CN115407998A (en) * 2022-08-22 2022-11-29 深圳市誉辰智能装备股份有限公司 Design method for software framework of upper computer of lithium battery equipment
CN115906499A (en) * 2022-12-05 2023-04-04 中国航空发动机研究院 Heterogeneous system-oriented aircraft engine integrated simulation workflow engine system
CN116107548A (en) * 2023-04-13 2023-05-12 中国科学院长春光学精密机械与物理研究所 Cross-platform integrated software architecture
WO2024011628A1 (en) * 2022-07-15 2024-01-18 京东方科技集团股份有限公司 Data processing method, apparatus, device and medium

Citations (3)

Publication number Priority date Publication date Assignee Title
CN106597993A (en) * 2016-10-28 2017-04-26 北京海普瑞森科技发展有限公司 Software architecture of fast tool servo control system
US20180041477A1 (en) * 2015-02-20 2018-02-08 Pristine Machine, LLC Method to split data operational function among system layers
CN108647886A (en) * 2018-05-10 2018-10-12 深圳晶泰科技有限公司 Scientific algorithm process management system

Patent Citations (3)

Publication number Priority date Publication date Assignee Title
US20180041477A1 (en) * 2015-02-20 2018-02-08 Pristine Machine, LLC Method to split data operational function among system layers
CN106597993A (en) * 2016-10-28 2017-04-26 北京海普瑞森科技发展有限公司 Software architecture of fast tool servo control system
CN108647886A (en) * 2018-05-10 2018-10-12 深圳晶泰科技有限公司 Scientific algorithm process management system

Non-Patent Citations (1)

Title
TAN YUN, LIU YANG: "Studies on the Software Project Management System Based on Windows Workflow Foundation Technology", JOURNAL OF QINGDAO TECHNICAL COLLEGE, vol. 21, no. 3, 15 September 2008 (2008-09-15), pages 77-81, XP055914475, ISSN: 1672-2698 *

Cited By (9)

Publication number Priority date Publication date Assignee Title
CN114757124A (en) * 2022-04-21 2022-07-15 哈尔滨工程大学 CFD workflow modeling method and device based on XML, computer and storage medium
CN114757124B (en) * 2022-04-21 2024-02-27 哈尔滨工程大学 CFD workflow modeling method and device based on XML, computer and storage medium
CN114792088A (en) * 2022-06-23 2022-07-26 中国科学院空天信息创新研究院 Editing method and device for digital earth workflow
WO2024011628A1 (en) * 2022-07-15 2024-01-18 京东方科技集团股份有限公司 Data processing method, apparatus, device and medium
CN115407998A (en) * 2022-08-22 2022-11-29 深圳市誉辰智能装备股份有限公司 Design method for software framework of upper computer of lithium battery equipment
CN115407998B (en) * 2022-08-22 2023-09-05 深圳市誉辰智能装备股份有限公司 Method for designing upper computer software framework of lithium battery equipment
CN115098093A (en) * 2022-08-24 2022-09-23 湖南云畅网络科技有限公司 Data flow expression processing method and system
CN115906499A (en) * 2022-12-05 2023-04-04 中国航空发动机研究院 Heterogeneous system-oriented aircraft engine integrated simulation workflow engine system
CN116107548A (en) * 2023-04-13 2023-05-12 中国科学院长春光学精密机械与物理研究所 Cross-platform integrated software architecture

Similar Documents

Publication Publication Date Title
WO2022056735A1 (en) Cloud high-performance scientific calculation workflow design control system and graphical user interface
US11429433B2 (en) Process discovery and automatic robotic scripts generation for distributed computing resources
Bowers et al. Enabling scientific workflow reuse through structured composition of dataflow and control-flow
US5999911A (en) Method and system for managing workflow
CN104679488B (en) A kind of flow custom development platform and flow custom development approach
CN106775632B (en) High-performance geographic information processing method and system with flexibly-expandable business process
CN112162727A (en) Cloud high-performance scientific computing workflow design control system and user graphical interface
US9916136B2 (en) Interface infrastructure for a continuation based runtime
WO2023071075A1 (en) Method and system for constructing machine learning model automated production line
US20110004564A1 (en) Model Based Deployment Of Computer Based Business Process On Dedicated Hardware
US20100262559A1 (en) Modelling Computer Based Business Process And Simulating Operation
CN110286892B (en) Rapid development engine system based on business process design
CN111682973B (en) Method and system for arranging edge cloud
CN111147555A (en) Heterogeneous resource mixed arrangement method
CN103984818A (en) AUV (autonomous underwater vehicle) design flow visualization modeling method based on Flex technology
CN111176645A (en) Power grid big data application-oriented data integration management system and implementation method thereof
Aksakalli et al. Systematic approach for generation of feasible deployment alternatives for microservices
CN112667221A (en) Deep learning model construction method and system for developing IDE (integrated development environment) based on deep learning
KR20130033652A (en) Method and apparatus for developing, distributing and executing object-wise dynamic compileless programs
CN115964185A (en) Micro-service management system for technical resource sharing
Brabra et al. Toward higher-level abstractions based on state machine for cloud resources elasticity
US20100235839A1 (en) Apparatus and method for automation of a business process
Kolonay et al. Grid interactive service-oriented programming environment
CN112395100A (en) Data-driven complex product cloud service data packet calling method and system
CN117406979B (en) Interface interaction design method and system for computing workflow

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20953593

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20953593

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 180723)