CN114661427B - Node management method and system for computing cluster for deploying containerized application service

Publication number: CN114661427B (grant); earlier publication CN114661427A
Application number: CN202210535996.0A
Authority: CN (China)
Other languages: Chinese (zh)
Inventor: 孙夏
Applicant and assignee: Shenzhen Zhixing Technology Co Ltd
Legal status: Active (application granted)
Prior art keywords: node, container, management, authorization, deployed

Classifications

    • G06F 9/45558: Hypervisor-specific management and integration aspects
    • G06F 2009/45587: Isolation or security of virtual machine instances
    • G06F 2009/45595: Network integration; enabling network access in virtual machine instances
    • G06F 21/53: Monitoring users, programs or devices to maintain the integrity of platforms by executing in a restricted environment, e.g. sandbox or secure virtual machine
    • G06F 9/44505: Configuring for program initiating, e.g. using registry, configuration files

Abstract

The present application relates to a node management method and system for a computing cluster on which a containerized application service is deployed. The method comprises the following steps: designating one master node of the at least one master node as a management node and deploying a first container, corresponding to a first node management application, on the management node; deploying a second container, corresponding to a second node management application, on each of the at least one worker node; and, in response to an authorization verification requirement, running the second container deployed on the worker node associated with the authorization verification requirement, and running the first container deployed on the management node, so as to determine whether the collected software and hardware information of that worker node is consistent with the authorization profile. This facilitates deployment and the evaluation of node security and stability.

Description

Node management method and system for computing cluster for deploying containerized application service
Technical Field
The present application relates to the technical field of cloud computing and cloud-native applications, in particular to cloud-native applications in the fields of privacy computing and federated learning, and specifically to a node management method and system for a computing cluster on which a containerized application service is deployed.
Background
With the development of cloud computing technology, the concept of cloud-native applications was proposed in order to better utilize cloud computing infrastructure and unlock the potential of cloud computing. Cloud-native applications refer to the designs, technologies, and methods associated with application programs and architectures that are built for, and run on, cloud platforms or cloud computing platforms. The key technologies of cloud-native applications and cloud-native architectures generally include container services, container orchestration technology, containerized application services, and the like. A container is generally understood as a simulation of a software application process and an abstraction at the application layer; a container service enables an application to run normally in different computing environments, without requiring the installation of a complete operating system and dependency environment after redeployment. In products and services in the fields of privacy computing and federated learning, tasks such as the training and inference of deep learning models are widely delivered through online services, in particular cloud services. However, privacy computing and federated learning place higher demands on data security and the protection of private information, for example higher demands on the security and reliability of nodes when cooperation among multiple computer network nodes is involved, which existing cloud-native technology struggles to satisfy.
Therefore, there is a need for a node management method and system, for a computing cluster on which containerized application services are deployed, that overcomes the above deficiencies.
Disclosure of Invention
In a first aspect, an embodiment of the present application provides a node management method for a computing cluster. A containerized application service is deployed on the computing cluster, and the computing cluster includes at least one master node and at least one worker node. The node management method comprises: designating one master node of the at least one master node as a management node and deploying a first container, corresponding to a first node management application, on the management node, wherein the first container is configured such that, when it is run manually or automatically, it accesses an authorization profile that is not on the management node; deploying, on each worker node of the at least one worker node, a second container corresponding to a second node management application, wherein the second container deployed on each worker node is configured such that, when it is run manually or automatically, it collects the software and hardware information of that worker node and sends the collected information to the management node; and, in response to an authorization verification requirement, running the second container deployed on the worker node associated with the authorization verification requirement, and running the first container deployed on the management node, so as to determine whether the collected software and hardware information of that worker node is consistent with the authorization profile; if so, the authorization verification passes, otherwise it fails.
The technical solution described in the first aspect targets a computing cluster on which a containerized application service is deployed: the node management operations of the computing cluster are implemented as a containerized application service deployed on the cluster itself, exploiting the lightweight virtualization and easy deployment of container technology; the evaluation of the software and hardware information of the worker node associated with the authorization verification requirement against the authorization profile is carried out by a first container running on the management node and a second container running on the worker node; and security is increased by keeping the authorization profile off the management node.
According to a possible implementation of the technical solution of the first aspect, an embodiment of the present application further provides that the node management method further includes: in response to a new node joining, running the first container deployed on the management node to add an authorization for the new node to the authorization profile, and deploying a second container on the new node; in response to an existing node exiting, running the first container deployed on the management node to delete the authorization for that node from the authorization profile, and deleting the second container deployed on that node.
According to a possible implementation of the technical solution of the first aspect, an embodiment of the present application further provides that the computing cluster is part of a cloud-native application platform, a cloud-native application architecture support system, a cloud-native PaaS management platform, a cloud-native privacy computing platform, or a cloud-native federated learning platform.
According to a possible implementation of the technical solution of the first aspect, an embodiment of the present application further provides that the containerized application service deployed on the computing cluster includes at least one of: the Kubernetes container orchestration engine, a Kubernetes container orchestration and management service, a Kubernetes container management platform, Azure Kubernetes Service, IBM Kubernetes Service, the KubeSphere container cloud platform, the Rancher container management platform, the k3s container management service, the MicroK8s container management tool, the VMware Tanzu container scheduling framework, the Red Hat OpenShift container scheduling framework, the Swarm container management platform, and the Mesos container management platform.
According to a possible implementation of the technical solution of the first aspect, an embodiment of the present application further provides that the authorization verification requirement includes at least one of: a periodic maintenance requirement, indicating authorization verification of some or all of the at least one worker node of the computing cluster and the suspension of, or restriction of the authority of, any worker node that fails verification; a software scheduling requirement, indicating a scheduling requirement for one or more of the at least one worker node of the computing cluster; and abnormal behavior detection, indicating the detection of abnormal data request behavior, abnormal data processing behavior, or authorization timeout behavior on one or more of the at least one worker node of the computing cluster.
According to a possible implementation of the technical solution of the first aspect, an embodiment of the present application further provides that the node management method further includes: upon receiving an updated authorization profile, running the first container deployed on the management node to perform a format check on the updated authorization profile, and, if the format check passes, replacing the authorization profile with the updated authorization profile.
According to a possible implementation of the technical solution of the first aspect, an embodiment of the present application further provides that the authorization profile is located in a common storage space of the computing cluster, where the common storage space is neither on any master node of the at least one master node nor on any worker node of the at least one worker node of the computing cluster.
According to a possible implementation of the technical solution of the first aspect, an embodiment of the present application further provides that the common storage space stores at least one of: configuration management resources (ConfigMaps) of the computing cluster, software-sensitive information, scheduling policies, and resource limits, wherein the software-sensitive information includes authorization certificates, passwords, and keys, and the configuration management resources are used to implement configuration management of the containerized application service deployed on the computing cluster.
According to a possible implementation of the technical solution of the first aspect, an embodiment of the present application further provides that the authorization profile further includes at least one of: software-sensitive information (including authorization certificates, passwords, and keys), scheduling policies, and resource limits.
According to a possible implementation of the technical solution of the first aspect, an embodiment of the present application further provides that the node management method further includes: in response to the master node designated as the management node going offline or becoming abnormal, deleting the first container deployed on that master node, designating another master node of the at least one master node as the new management node, and deploying the first container on the new management node.
According to a possible implementation of the technical solution of the first aspect, an embodiment of the present application further provides that the node management method further includes: in response to a software authorization application requirement, running the second container deployed on each worker node of the at least one worker node, and running the first container deployed on the management node, so as to integrate the software and hardware information of each worker node into information about the computing cluster for the software authorization application requirement.
According to a possible implementation of the technical solution of the first aspect, an embodiment of the present application further provides that the node management method further includes: in response to a local authentication request from a given worker node of the at least one worker node, running the first container deployed on the management node to compare the software and hardware information of the given worker node with the authorization profile to generate a local authentication result, and sending the local authentication result to the given worker node.
According to a possible implementation of the technical solution of the first aspect, an embodiment of the present application further provides that the at least one master node of the computing cluster includes a first master node and a second master node, the at least one worker node includes a first group of worker nodes and a second group of worker nodes, the first master node serves as the management node for the first group of worker nodes, and the second master node serves as the management node for the second group of worker nodes.
According to a possible implementation of the technical solution of the first aspect, an embodiment of the present application further provides that the node management method further includes: providing a priority scheduling label to each worker node of the at least one worker node that passes authorization verification, wherein a worker node with the priority scheduling label has a higher priority than a worker node without it when the containerized application service is invoked.
According to a possible implementation of the technical solution of the first aspect, an embodiment of the present application further provides that the software and hardware information collected by the second container deployed on each worker node of the at least one worker node includes at least one of a plurality of preset hardware tags, and the invocation of the containerized application service is based on the preset hardware tags that each worker node has.
According to a possible implementation of the technical solution of the first aspect, an embodiment of the present application further provides that the plurality of preset hardware tags include: a first tag indicating whether an FPGA is present, a second tag indicating whether a GPU is present, a third tag indicating whether SGX is supported, and a fourth tag indicating whether a TEE is present.
According to a possible implementation of the technical solution of the first aspect, an embodiment of the present application further provides that basing the invocation of the containerized application service on the preset hardware tags of each worker node includes: preferentially invoking worker nodes with the first tag when the invocation requires reconfigurable computing, preferentially invoking worker nodes with the second tag when the invocation requires parallelized computing, preferentially invoking worker nodes with the third tag when the invocation requires hardware security isolation, and preferentially invoking worker nodes with the fourth tag when the invocation requires a trusted execution environment.
In a second aspect, an embodiment of the present application provides a non-transitory computer-readable storage medium. The computer-readable storage medium stores computer instructions which, when executed by a processor, implement the method according to any implementation of the first aspect.
The technical solution described in the second aspect targets a computing cluster on which a containerized application service is deployed: the node management operations of the computing cluster are implemented as a containerized application service deployed on the cluster itself, exploiting the lightweight virtualization and easy deployment of container technology; the evaluation of the software and hardware information of the worker node associated with the authorization verification requirement against the authorization profile is carried out by a first container running on the management node and a second container running on the worker node; and security is increased by keeping the authorization profile off the management node.
In a third aspect, an embodiment of the present application provides an electronic device. The electronic device includes: a processor; and a memory for storing processor-executable instructions; wherein the processor implements the method according to any implementation of the first aspect by executing the executable instructions.
The technical solution described in the third aspect targets a computing cluster on which a containerized application service is deployed: the node management operations of the computing cluster are implemented as a containerized application service deployed on the cluster itself, exploiting the lightweight virtualization and easy deployment of container technology; the evaluation of the software and hardware information of the worker node associated with the authorization verification requirement against the authorization profile is carried out by a first container running on the management node and a second container running on the worker node; and security is increased by keeping the authorization profile off the management node.
In a fourth aspect, an embodiment of the present application provides a node management system for a computing cluster. A containerized application service is deployed on the computing cluster, and the computing cluster includes at least one master node and at least one worker node. One master node of the at least one master node is designated as a management node, and a first container corresponding to a first node management application is deployed on the management node. The first container is configured such that, when it is run manually or automatically, it accesses an authorization profile that is not on the management node. A second container corresponding to a second node management application is deployed on each worker node of the at least one worker node, and is configured such that, when it is run manually or automatically, it collects the software and hardware information of that worker node and sends the collected information to the management node. The node management system is configured to: in response to an authorization verification requirement, run the second container deployed on the worker node associated with the authorization verification requirement, and run the first container deployed on the management node, so as to determine whether the collected software and hardware information of that worker node is consistent with the authorization profile; if so, the authorization verification passes, otherwise it fails.
The technical solution described in the fourth aspect targets a computing cluster on which a containerized application service is deployed: the node management operations of the computing cluster are implemented as a containerized application service deployed on the cluster itself, exploiting the lightweight virtualization and easy deployment of container technology; the evaluation of the software and hardware information of the worker node associated with the authorization verification requirement against the authorization profile is carried out by a first container running on the management node and a second container running on the worker node; and security is increased by keeping the authorization profile off the management node.
According to a possible implementation of the technical solution of the fourth aspect, an embodiment of the present application further provides that the node management system is further configured to: in response to a new node joining, run the first container deployed on the management node to add an authorization for the new node to the authorization profile; in response to an existing node exiting, run the first container deployed on the management node to delete the authorization for that node from the authorization profile, and delete the second container deployed on that node.
According to a possible implementation of the technical solution of the fourth aspect, an embodiment of the present application further provides that the authorization profile is located in a common storage space of the computing cluster, where the common storage space is neither on any master node of the at least one master node nor on any worker node of the at least one worker node of the computing cluster.
According to a possible implementation of the technical solution of the fourth aspect, an embodiment of the present application further provides that the software and hardware information collected by the second container deployed on each worker node of the at least one worker node includes at least one of a plurality of preset hardware tags, and the invocation of the containerized application service is based on the preset hardware tags that each worker node has.
Drawings
To explain the technical solutions in the embodiments of the present application or in the background art, the drawings required for describing them are briefly introduced below.
Fig. 1 shows a flowchart of a node management method according to an embodiment of the present application.
Fig. 2 illustrates a schematic diagram of a computing cluster provided in an embodiment of the present application.
Fig. 3 shows a block diagram of an electronic device used in the node management method in fig. 1 according to an embodiment of the present application.
Fig. 4 shows a block diagram of a node management system provided in an embodiment of the present application.
Detailed Description
To address the above deficiencies in the prior art, the embodiments of the present application provide a node management method and system for a computing cluster on which a containerized application service is deployed. The computing cluster includes at least one master node and at least one worker node, and the node management method comprises: designating one master node of the at least one master node as a management node and deploying a first container, corresponding to a first node management application, on the management node, wherein the first container is configured such that, when it is run manually or automatically, it accesses an authorization profile that is not on the management node; deploying, on each worker node of the at least one worker node, a second container corresponding to a second node management application, wherein the second container deployed on each worker node is configured such that, when it is run manually or automatically, it collects the software and hardware information of that worker node and sends the collected information to the management node; and, in response to an authorization verification requirement, running the second container deployed on the worker node associated with the authorization verification requirement, and running the first container deployed on the management node, so as to determine whether the collected software and hardware information of that worker node is consistent with the authorization profile; if so, the authorization verification passes, otherwise it fails. The embodiments of the present application have the following beneficial technical effects: for a computing cluster on which a containerized application service is deployed, the node management operations of the cluster are implemented as a containerized application service deployed on the cluster itself, exploiting the lightweight virtualization and easy deployment of container technology; the evaluation of the software and hardware information of the worker node associated with the authorization verification requirement against the authorization profile is carried out by a first container running on the management node and a second container running on the worker node; and security is increased by keeping the authorization profile off the management node.
The embodiments of the present application may be used in application scenarios including, but not limited to, cloud computing, cloud services, and technologies related to cloud-native applications, such as cloud-native application platforms, cloud-native application architectures, cloud-native management platforms, cloud-native privacy computing platforms, and cloud-native federated learning platforms.
The embodiments of the present application may be modified and improved according to specific application environments, and are not limited herein.
To help those skilled in the art better understand the present application, the embodiments of the present application are described below with reference to the accompanying drawings.
Fig. 1 shows a flowchart of a node management method according to an embodiment of the present application. The node management method of Fig. 1 applies to a computing cluster on which a containerized application service is deployed and which includes at least one master node and at least one worker node. As shown in Fig. 1, the node management method includes the following steps.
Step S102: designate one master node of the at least one master node as a management node, and deploy a first container, corresponding to a first node management application, on the management node.
The first container is configured such that, when it is run manually or automatically, it accesses an authorization profile that is not on the management node.
Step S104: deploy, on each worker node of the at least one worker node, a second container corresponding to a second node management application.
The second container deployed on each worker node is configured such that, when it is run manually or automatically, it collects the software and hardware information of that worker node and sends the collected information to the management node.
Step S106: in response to an authorization verification requirement, run the second container deployed on the worker node associated with the authorization verification requirement, and run the first container deployed on the management node, so as to determine whether the collected software and hardware information of that worker node is consistent with the authorization profile.
If consistent, the authorization verification passes; otherwise, it fails.
Both the first container and the second container belong to the containerized application service deployed on the computing cluster. The containerized application service, including the first and second containers, is based on container technology. Container technology is generally understood as packaging code and its dependencies into an independent executable software package, sometimes called a container image, that includes the code, runtime, system tools, system libraries, settings, and everything else necessary to run the application process, so that applications (e.g., tasks in a container service, minimal deployment units (Pods), clusters, etc.) can be quickly moved from one computing environment to another and keep running reliably. Container technology thus simulates a software application process by running a container or container image, in other words by invoking a container. Each container simulates a different software application process and is therefore an independently running process. Containers and container technology are abstractions at the application layer and can also be understood as a kind of virtualized executable resource. Multiple containers may share one physical machine or run on different physical machines; they may share a common operating system, e.g., be started by the same virtual machine, or run under different operating systems. Containers and container technology free the software application process, and the running of various applications, from depending on a complete operating system and dependency environment, which eases portability; and because the container build only packages the necessary elements, only lightweight virtualization is needed, and a container can be conveniently deployed once built. On the other hand, precisely because a container is built and deployed independently of the operating system and dependency environment (such as a particular physical machine), the security and reliability of the computing environment in which the container is run or invoked must be ensured. Containers, container technology, and the associated containerized application services usually run on a computing cluster. A computing cluster can be understood as a collection of nodes, e.g., multiple computer network nodes, or as a multi-host cluster environment, e.g., one built from multiple hosts via local area network technology. A single computing cluster may comprise multiple physical machines, multiple virtual machines, or a combination of the two. The nodes of a single computing cluster may sit on the same or different physical machines, on the same or different virtual machines, and may run under the same or different operating systems. The operating system and dependency environment of each node may therefore be independent of, and different from, those of the others, so the security and reliability of each node may need to be evaluated separately. In addition, before a piece of software or an application is run on a node, it is generally necessary to ensure that the node satisfies the authorization requirements and/or configuration requirements of that software or application. Evaluating the security and reliability of nodes and determining whether authorization requirements are satisfied both fall within the scope of node management for the nodes of a computing cluster; so do the joining of new nodes to the cluster and the dropping of nodes from it. The relevant details and improvements of the node management method for a computing cluster on which a containerized application service is deployed are described below with reference to steps S102 to S106.
In step S102, one master node is designated as the management node and a first container corresponding to a first node management application is deployed on it; in step S104, a second container corresponding to a second node management application is deployed on each worker node. A master node is a critical node: the set of master nodes essentially represents the computing cluster, and adding, deleting, or replacing a master node may amount to replacing the old computing cluster with a new one. A worker node is a concrete node that provides services to users or performs specific tasks. The number of master nodes of a computing cluster is generally fixed or designed in advance, while the number of worker nodes may grow or shrink as needed. With a first container deployed on one master node and a second container deployed on each worker node, the subsequent steps of the node management method can be completed by running or invoking the corresponding containers. The structure of an exemplary computing cluster is described below in conjunction with Fig. 2.
Fig. 2 illustrates a schematic diagram of a computing cluster provided in an embodiment of the present application. As shown in Fig. 2, the computing cluster includes two master nodes and three worker nodes. The two master nodes are master node 202 and master node 204. The first container 212 is deployed on master node 202, which is thereby designated as the management node, while no first container is deployed on master node 204. The three worker nodes are worker node 222, worker node 224, and worker node 226. Second container 232 is deployed on worker node 222, second container 234 on worker node 224, and second container 236 on worker node 226. It should be understood that the first container 212 could equally be deployed on master node 204; it only matters that the first container is deployed on one of the master nodes included in the computing cluster. The numbers of master and worker nodes in Fig. 2 are merely exemplary, and the node management method of Fig. 1 applies to computing clusters with any number of master nodes and any number of worker nodes.
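As an illustration of steps S102 and S104, the following sketch uses the official Kubernetes Python client to pin a first container to the designated management node and to place a second container on every worker node via a DaemonSet. The image names, the node-mgmt namespace, and the worker role label are assumptions for illustration, not part of the embodiment.

```python
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() inside the cluster
apps = client.AppsV1Api()

NAMESPACE = "node-mgmt"  # assumed namespace for the node management service


def deploy_first_container(management_node: str) -> None:
    """Step S102: pin the first node management container to the designated master."""
    template = client.V1PodTemplateSpec(
        metadata=client.V1ObjectMeta(labels={"app": "first-node-mgmt"}),
        spec=client.V1PodSpec(
            node_name=management_node,  # run only on the management node
            containers=[client.V1Container(
                name="first-node-mgmt",
                image="example.com/first-node-mgmt:latest",  # hypothetical image
            )],
        ),
    )
    deployment = client.V1Deployment(
        metadata=client.V1ObjectMeta(name="first-node-mgmt"),
        spec=client.V1DeploymentSpec(
            replicas=1,
            selector=client.V1LabelSelector(match_labels={"app": "first-node-mgmt"}),
            template=template,
        ),
    )
    apps.create_namespaced_deployment(namespace=NAMESPACE, body=deployment)


def deploy_second_containers() -> None:
    """Step S104: a DaemonSet places one second node management container per worker."""
    template = client.V1PodTemplateSpec(
        metadata=client.V1ObjectMeta(labels={"app": "second-node-mgmt"}),
        spec=client.V1PodSpec(
            # assumes worker nodes carry this role label
            node_selector={"node-role.kubernetes.io/worker": ""},
            containers=[client.V1Container(
                name="second-node-mgmt",
                image="example.com/second-node-mgmt:latest",  # hypothetical image
            )],
        ),
    )
    daemon_set = client.V1DaemonSet(
        metadata=client.V1ObjectMeta(name="second-node-mgmt"),
        spec=client.V1DaemonSetSpec(
            selector=client.V1LabelSelector(match_labels={"app": "second-node-mgmt"}),
            template=template,
        ),
    )
    apps.create_namespaced_daemon_set(namespace=NAMESPACE, body=daemon_set)


deploy_first_container("master-202")  # e.g. the master designated as management node
deploy_second_containers()
```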
Continuing with Fig. 1: as noted above, the first container is deployed on one master node and a second container on each worker node, e.g., first container 212 on master node 202 and second containers 232, 234, and 236 on worker nodes 222, 224, and 226 respectively, as shown in Fig. 2. The first container is configured such that, when run manually or automatically, it accesses an authorization profile that is not on the management node; the second container deployed on each worker node is configured such that, when run manually or automatically, it collects that worker node's software and hardware information and sends it to the management node. Taking the example cluster of Fig. 2: when the first container 212 is run manually or automatically, it accesses an authorization profile that is not on the management node (i.e., not on master node 202); when a second container (e.g., second container 232, 234, or 236) deployed on a worker node (e.g., worker node 222, 224, or 226) is run manually or automatically, it collects that node's software and hardware information and sends it to the management node. Note that the first container, such as container 212, and the second containers, such as containers 232, 234, and 236, may be run, or in other words invoked, either manually or automatically. Running or invoking a first or second container means simulating the corresponding software application process, i.e., the corresponding first or second node management application, and such container runs or invocations do not depend on the operating system or dependency environment of the node on which the container sits. This means that the containerized application service deployed on the computing cluster is built from the first and second containers, with the appropriate container deployed according to whether a node is a master node or a worker node, and the node management of the cluster is completed by running or invoking these containers.
When the first container runs, it simulates the process of the first node management application on the management node on which it is deployed (i.e., the master node designated as the management node), thereby accessing the authorization profile that is not on the management node; when a second container runs, it simulates the process of the second node management application on the worker node on which it is deployed, thereby collecting that node's software and hardware information and sending it to the management node. The node management operations of the computing cluster are thus realized in the form of a containerized application service deployed on the cluster, and accessing the authorization profile, or collecting and uploading software and hardware information, is realized by running or invoking the first or second container. This exploits the lightweight virtualization and easy deployment of container technology, and provides a way to guarantee the security and stability of a node when a container is run or invoked there. Specifically, in step S106, in response to an authorization verification requirement, the second container deployed on the worker node associated with that requirement is run, and the first container deployed on the management node is run, to determine whether the collected software and hardware information of that worker node is consistent with the authorization profile. By running the second container on the worker node associated with the authorization verification requirement, that node's software and hardware information is collected and uploaded to the management node; by then running the first container to access the authorization profile, it can be determined whether that information is consistent with the profile, and hence whether the worker node satisfies the authorization verification requirement. Authorization verification requirements may arise from any scenario that calls for an evaluation of the security and reliability of one or more worker nodes. In summary, the node management method of Fig. 1 implements the node management operations of a computing cluster, on which a containerized application service is deployed, in the form of that containerized application service itself, exploiting the lightweight virtualization and easy deployment of container technology; it evaluates the software and hardware information of the worker node associated with the authorization verification requirement against the authorization profile via the first container running on the management node and the second container running on the worker node; and it increases security by keeping the authorization profile off the management node.
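The following is a minimal sketch, under stated assumptions, of the two container roles in step S106: the second container collects its worker node's software and hardware information and reports it to the management node, where the first container compares the report against the authorization profile. The report endpoint, field set, and profile layout are illustrative assumptions.

```python
import json
import os
import platform
import socket
import urllib.request

MGMT_ENDPOINT = "http://node-mgmt.internal:8080/report"  # hypothetical endpoint


def collect_node_info() -> dict:
    """What the second container gathers when it is run on a worker node."""
    return {
        "hostname": socket.gethostname(),
        "os": platform.platform(),
        "kernel": platform.release(),
        "arch": platform.machine(),
        "cpu_count": os.cpu_count() or 0,
    }


def report_to_management_node(info: dict) -> None:
    """Send the collected software and hardware information to the management node."""
    request = urllib.request.Request(
        MGMT_ENDPOINT,
        data=json.dumps(info).encode(),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(request, timeout=10)


def verify(info: dict, profile: dict) -> bool:
    """What the first container does on the management node: check consistency
    between the reported information and the authorization profile (step S106).
    `profile` is assumed to map hostname -> expected fingerprint fields."""
    expected = profile.get(info["hostname"])
    return expected is not None and all(info.get(k) == v for k, v in expected.items())
```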
Besides authorization verification requirements, i.e., requirements to evaluate the security and reliability of worker nodes, there are also situations where a new worker node joins or an existing worker node exits, which likewise involve node management. To this end, in one possible implementation, the node management method further includes: in response to a new node joining, running the first container deployed on the management node to add an authorization for the new node to the authorization profile, and deploying a second container on the new node; in response to an existing node exiting, running the first container deployed on the management node to delete that node's authorization from the authorization profile, and deleting the second container deployed on that node. In this way, for both joining and exiting, a second container is deployed on every worker node of the updated cluster and no node outside the cluster retains one, ensuring consistency between the deployment of the containerized application service and the updated computing cluster. And by running the first container to add or delete authorizations in the authorization profile, consistency between the authorization profile and the updated cluster is ensured. Note that changes to the authorization profile, including adding or deleting authorization records, are made through the first container running on the management node, and the first container is run to access an authorization profile that is not on the management node. This ensures that no node in the cluster other than the management node can alter the authorization profile, and that an attack on the management node cannot reach the authorization profile, which is not stored there. Furthermore, the response to authorization verification requirements and the node management operations performed for node joins and exits are all implemented by the first and/or second containers. Node management can therefore be realized purely on the basis of the containerized application service deployed on the cluster, and since the first container can be deployed on any master node of the cluster, the scheme is not tied to a centralized authorization management server and can make full use of whatever master node resources the cluster has available.
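A minimal sketch of the join/exit bookkeeping described above, assuming the authorization profile is a JSON file in shared storage mounted off the management node; the path and record layout are assumptions.

```python
import json
import pathlib

# Assumed location of the authorization profile in the cluster's shared
# storage (not on the management node); path and layout are illustrative.
PROFILE_PATH = pathlib.Path("/mnt/shared/authorization-profile.json")


def load_profile() -> dict:
    return json.loads(PROFILE_PATH.read_text())


def save_profile(profile: dict) -> None:
    PROFILE_PATH.write_text(json.dumps(profile, indent=2))


def on_node_join(node: str, fingerprint: dict) -> None:
    """Run by the first container: add an authorization record for the new node.
    (Deploying the second container on the new node is handled separately,
    e.g. by the DaemonSet in the earlier deployment sketch.)"""
    profile = load_profile()
    profile[node] = fingerprint
    save_profile(profile)


def on_node_exit(node: str) -> None:
    """Run by the first container: delete the exiting node's authorization."""
    profile = load_profile()
    profile.pop(node, None)
    save_profile(profile)
```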
In one possible implementation, the computing cluster is part of a cloud-native application platform, a cloud-native application architecture support system, a cloud-native PaaS management platform, a cloud-native privacy computing platform, or a cloud-native federated learning platform. It should be understood that the computing cluster and the node management method described above may be applied to any scenario in which the security and reliability of cooperation among multiple nodes must be considered, in particular to cloud-computing-based platforms for privacy computing and federated learning.
In one possible implementation, the containerized application service deployed on the computing cluster includes at least one of: the Kubernetes container orchestration engine, a Kubernetes container orchestration and management service, a Kubernetes container management platform, Azure Kubernetes Service, IBM Kubernetes Service, the KubeSphere container cloud platform, the Rancher container management platform, the k3s container management service, the MicroK8s container management tool, the VMware Tanzu container scheduling framework, the Red Hat OpenShift container scheduling framework, the Swarm container management platform, and the Mesos container management platform. It should be appreciated that the containerized application service deployed on the computing cluster may also correspond to any suitable container orchestration engine or container management service based on container technology.
In one possible embodiment, the authorization verification requirement includes at least one of: a periodic maintenance requirement, indicating authorization verification of some or all of the at least one worker node of the computing cluster and the suspension of, or restriction of the authority of, any worker node that fails verification; a software scheduling requirement, indicating a scheduling requirement for one or more of the at least one worker node; and abnormal behavior detection, indicating the detection of abnormal data request behavior, abnormal data processing behavior, or authorization timeout behavior on one or more of the at least one worker node. As mentioned above, an authorization verification requirement corresponds to a requirement to evaluate the security and reliability of worker nodes. Specifically, authorization verification requirements can generally be divided into periodic maintenance requirements, software scheduling requirements, and abnormal behavior detection, each associated with its own worker nodes, i.e., the worker nodes associated with the periodic maintenance requirement, the software scheduling requirement, or the abnormal behavior detection respectively.
In a possible implementation, the node management method further includes: upon receiving an updated authorization profile, running the first container deployed on the management node to perform a format check on the updated profile and, if the format check passes, replacing the current authorization profile with the updated one. The format check thus guarantees the validity and legitimacy of the updated authorization profile.
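A minimal sketch of such a format check, assuming a JSON profile whose records carry a fixed set of fields; the required fields and the storage path are assumptions.

```python
import json
import pathlib

PROFILE_PATH = pathlib.Path("/mnt/shared/authorization-profile.json")  # assumed
REQUIRED_FIELDS = {"hostname", "os", "arch"}  # illustrative record schema


def format_check(raw: str) -> bool:
    """Accept only a JSON object mapping node names to complete records."""
    try:
        profile = json.loads(raw)
    except json.JSONDecodeError:
        return False
    return isinstance(profile, dict) and all(
        isinstance(record, dict) and REQUIRED_FIELDS <= record.keys()
        for record in profile.values()
    )


def apply_update(raw: str) -> None:
    """Replace the current profile only if the updated one passes the check."""
    if format_check(raw):
        PROFILE_PATH.write_text(raw)
    # otherwise the existing authorization profile is kept unchanged
```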
In one possible implementation, the authorization profile is located in a common storage space of the computing cluster that is on none of the cluster's master nodes and none of its worker nodes. The common storage space serves the computing cluster as a whole, or at least all of its master nodes. A master node is a critical node: the set of master nodes essentially represents the computing cluster, and adding, deleting, or replacing a master node may amount to replacing the old cluster with a new one. Keeping the authorization profile in a common storage space that is on no master node better protects it, and only the first container on the master node designated as the management node can, when run, access the authorization profile located in the common storage space. In some embodiments, the common storage space stores at least one of: configuration management resources (ConfigMaps) of the computing cluster, software-sensitive information, scheduling policies, and resource limits, where the software-sensitive information includes authorization certificates, passwords, and keys, and the configuration management resources implement configuration management of the containerized application service deployed on the cluster. Other information requiring security protection can thus also be kept in the common storage space, which eases management of the cluster. In some embodiments, the authorization profile further includes at least one of: software-sensitive information (including authorization certificates, passwords, and keys), scheduling policies, and resource limits.
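As one way to realize a common storage space of this kind, the sketch below has the first container fetch the authorization profile from a cluster ConfigMap via the official Kubernetes Python client (ConfigMap contents live in the cluster's backing store rather than on any individual node). The ConfigMap name, namespace, and key are assumptions.

```python
import json

from kubernetes import client, config


def load_profile_from_configmap() -> dict:
    """First container, running in-cluster, reads the profile from a ConfigMap."""
    config.load_incluster_config()
    v1 = client.CoreV1Api()
    cm = v1.read_namespaced_config_map(
        name="authorization-profile",  # assumed ConfigMap name
        namespace="node-mgmt",         # assumed namespace
    )
    return json.loads(cm.data["profile.json"])  # assumed key
```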
In a possible implementation, the node management method further includes: in response to the master node designated as the management node going offline or becoming abnormal, deleting the first container deployed on that master node, designating another master node of the at least one master node as the new management node, and deploying the first container on the new management node. As mentioned above, one master node in the computing cluster is designated as the management node; when that node becomes abnormal or goes offline, another master node can be designated as the new management node so that node management is not interrupted, and the first container on the original management node is deleted. This ensures that the first container on the current management node is the only container in the cluster able to access the authorization profile, which helps keep the node management method running normally.
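An illustrative failover sketch: re-pinning the first container's Deployment to the newly designated management node causes the Deployment controller to delete the old pod and recreate it on the new master. The Deployment and namespace names match the deployment sketch above and are assumptions, as is the use of a Deployment patch as the failover mechanism.

```python
from kubernetes import client, config


def failover_management_node(new_master: str) -> None:
    """Re-pin the first container's Deployment to the newly designated master.

    Changing the pod template makes the Deployment controller delete the pod
    on the failed management node and recreate it on the new one."""
    config.load_kube_config()
    apps = client.AppsV1Api()
    patch = {"spec": {"template": {"spec": {"nodeName": new_master}}}}
    apps.patch_namespaced_deployment(
        name="first-node-mgmt", namespace="node-mgmt", body=patch)
```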
In a possible implementation, the node management method further includes: in response to a software authorization application requirement, running the second container deployed on each worker node of the at least one worker node, and running the first container deployed on the management node, so as to integrate the software and hardware information of every worker node into information about the computing cluster for the software authorization application requirement. A software authorization application requirement generally means screening the cluster for worker nodes that can meet specific demands, for example nodes that can effectively resist piracy and infringement, or nodes that satisfy digital signature requirements and the like. To respond to such a requirement, the node management method collects the software and hardware information of every worker node by running the second containers on all worker nodes, and generates the information for the software authorization application requirement by running the first container, which helps determine whether the requirement can be met and which worker nodes can be used for subsequent operations.
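A small sketch of the integration step: merging the per-worker reports (in the shape produced by the collection sketch above) into cluster-level information for the software authorization application requirement; the field names are assumptions.

```python
def cluster_summary(reports: list) -> dict:
    """Integrate per-worker reports into cluster-level information."""
    return {
        "node_count": len(reports),
        "total_cpus": sum(r.get("cpu_count", 0) for r in reports),
        "architectures": sorted({r["arch"] for r in reports}),
        "nodes": sorted(r["hostname"] for r in reports),
    }


print(cluster_summary([
    {"hostname": "worker-222", "arch": "x86_64", "cpu_count": 16},
    {"hostname": "worker-224", "arch": "aarch64", "cpu_count": 8},
]))
```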
In a possible implementation manner, the node management method further includes: in response to a local authentication request from a given worker node of the at least one worker node, running the first container deployed on the management node to compare the software and hardware information of the given worker node with the authorization profile, generating a local authentication result, and sending the local authentication result to the given worker node. The local authentication request comes from the given worker node itself, for example when software or an application about to be launched on that node must first verify the node's security and legitimacy before it can formally start. In response to the local authentication request, the node management method generates the local authentication result by running the first container, as in the comparison sketch below.
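A minimal sketch of the comparison, assuming the authorization profile maps node names to their expected software and hardware fields; this layout is an assumption made for illustration.

```python
# Sketch only: compare a worker node's reported information against its
# entry in the authorization profile. The profile layout (a dict keyed by
# node name) is an illustrative assumption.
def local_authentication(node_info, auth_profile):
    expected = auth_profile.get(node_info["node"])
    if expected is None:
        return {"node": node_info["node"], "authenticated": False,
                "reason": "node not found in authorization profile"}
    # Every field recorded in the profile must match what the node reports.
    mismatched = [k for k, v in expected.items() if node_info.get(k) != v]
    return {"node": node_info["node"],
            "authenticated": not mismatched,
            "mismatched_fields": mismatched}
```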
In one possible implementation, the at least one master node of the computing cluster includes a first master node and a second master node, the at least one worker node includes a first set of worker nodes and a second set of worker nodes, the first master node acts as the management node for the first set of worker nodes, and the second master node acts as the management node for the second set of worker nodes. Through such grouping, different management nodes correspond to different groups, which makes it possible to handle the node management needs of a large-scale computing cluster. Within each group, the node-management operations between the group's management node and its worker nodes follow the various embodiments described above and are not repeated here; a sketch of the group-to-management-node mapping follows.
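As a toy illustration of the grouping, the mapping below assigns each hypothetical master node its own set of worker nodes; all node names are invented.

```python
# Sketch only: each management (master) node is responsible for one group
# of worker nodes. The node names and group sizes are invented.
GROUPS = {
    "master-1": ["worker-1", "worker-2", "worker-3"],
    "master-2": ["worker-4", "worker-5"],
}

def management_node_for(worker):
    """Return the management node responsible for the given worker node."""
    for master, workers in GROUPS.items():
        if worker in workers:
            return master
    raise KeyError(f"{worker} is not assigned to any group")
```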
In a possible implementation manner, the node management method further includes: providing a priority scheduling label to each worker node of the at least one worker node that passes the authorization verification, wherein a worker node holding the priority scheduling label has a higher priority in the invocation of the containerized application service than a worker node without it. As mentioned above, the authorization verification requirements correspond to evaluations of the security and reliability of worker nodes, such as scheduled maintenance requirements, software scheduling requirements, and abnormal behavior detection. For a worker node that passes the authorization verification, i.e. one whose security and reliability are up to standard, its priority scheduling label can be generated from the software and hardware information already collected while evaluating the node. The priority scheduling label means the node is run or invoked with higher priority in the invocation of the containerized application service. The criteria for generating the label, or for screening nodes to invoke with higher priority, can be set according to actual needs, for example by maximum computing capacity or idle resources, so that the node management method also improves the overall resource-utilization efficiency of the computing cluster. A sketch of granting such a label follows.
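A minimal sketch of granting the label in a Kubernetes-style cluster; the label key scheduling/priority and value high are invented for the example.

```python
# Sketch only: attach a priority scheduling label to a worker node that has
# passed authorization verification. The label key/value are assumptions.
from kubernetes import client, config

def grant_priority_label(node_name):
    config.load_incluster_config()
    v1 = client.CoreV1Api()
    body = {"metadata": {"labels": {"scheduling/priority": "high"}}}
    v1.patch_node(node_name, body)
```

Workloads can then prefer labeled nodes through an ordinary nodeSelector or node-affinity rule, which is one way "higher priority in the invocation" could be realized in such a cluster.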
In a possible implementation manner, the software and hardware information of each worker node collected by the second container deployed on it includes at least one of a plurality of preset hardware tags, and the invocation of the containerized application service is based on the preset hardware tags each worker node holds. A preset hardware tag is part of a worker node's software and hardware information; it may be the answer to a preset question or a determination of whether a preset criterion is satisfied. In some embodiments, the plurality of preset hardware tags includes: a first tag indicating whether a Field Programmable Gate Array (FPGA) is present, a second tag indicating whether a Graphics Processing Unit (GPU) is present, a third tag indicating whether Software Guard Extensions (SGX) are supported, and a fourth tag indicating whether a Trusted Execution Environment (TEE) is present. With these exemplary tags, worker nodes holding a specific preset hardware tag can be conveniently screened out of the computing cluster, for example nodes that have both an FPGA and a TEE, which speeds up the selection of worker nodes to invoke and improves overall operating efficiency. In some embodiments, the invocation of the containerized application service based on each worker node's preset hardware tags includes: preferentially invoking a worker node with the first tag when the invocation requires reconfigurable computing; preferentially invoking a worker node with the second tag when it requires parallelized computing; preferentially invoking a worker node with the third tag when it requires hardware security isolation; and preferentially invoking a worker node with the fourth tag when it requires a trusted execution environment. In this way, worker nodes with specific preset hardware tags can be selectively invoked, for example a node with the first tag, i.e. with an FPGA, for reconfigurable computing. For a composite requirement, for example reconfigurable computing together with hardware security isolation, a worker node with both the first and third tags can be invoked, improving overall operating efficiency. A selection sketch based on these tags follows.
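A minimal selection sketch over the four example tags; the tag names and per-node report format are invented, and composite requirements are handled by intersecting tag sets.

```python
# Sketch only: screen worker nodes by the four example preset hardware tags.
# Tag names and the per-node report format are illustrative assumptions.
PRESET_TAGS = ("fpga", "gpu", "sgx", "tee")

def select_nodes(node_reports, required_tags):
    """Return the names of nodes carrying every tag in required_tags."""
    required = set(required_tags)
    if not required <= set(PRESET_TAGS):
        raise ValueError("unknown preset hardware tag")
    return [r["node"] for r in node_reports
            if required <= set(r.get("tags", []))]

# Composite requirement, e.g. reconfigurable computing plus hardware
# security isolation (first tag + third tag):
# select_nodes(reports, ["fpga", "sgx"])
```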
It is to be understood that the above-described method may be implemented by a corresponding execution body or carrier. In some exemplary embodiments, a non-transitory computer readable storage medium stores computer instructions that, when executed by a processor, implement the above-described method and any of the above-described embodiments, implementations, or combinations thereof. In some example embodiments, an electronic device includes: a processor; a memory for storing processor-executable instructions; wherein the processor implements the above method and any of the above embodiments, implementations, or combinations thereof by executing the executable instructions.
Fig. 3 shows a block diagram of an electronic device used in the node management method of fig. 1 according to an embodiment of the present application. As shown in fig. 3, the electronic device includes a main processor 302, an internal bus 304, a network interface 306, a main memory 308, an auxiliary processor 310 with auxiliary memory 312, and an auxiliary processor 320 with auxiliary memory 322. The main processor 302 is connected to the main memory 308, which may store computer instructions executable by the main processor 302, so that the node management method of fig. 1 can be implemented, including some or all of its steps and any possible combination, replacement, or variation of those steps. The network interface 306 provides network connectivity and transmits and receives data over a network. The internal bus 304 provides internal data interaction among the main processor 302, the network interface 306, the auxiliary processor 310, and the auxiliary processor 320. The auxiliary processor 310 is coupled to the auxiliary memory 312 and the auxiliary processor 320 is coupled to the auxiliary memory 322, each providing auxiliary computing power. The auxiliary processors 310 and 320 may provide the same or different auxiliary computing capabilities, including but not limited to capabilities optimized for particular computing requirements, such as parallel processing or tensor computation, and capabilities optimized for particular algorithms or logic structures, such as iterative computation or graph computation. The auxiliary processors 310 and 320 may include one or more processors of a particular type, such as a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), or a Field Programmable Gate Array (FPGA), so that customized functions and structures can be provided. In some exemplary embodiments, the electronic device may include no auxiliary processor, only one auxiliary processor, or any number of auxiliary processors, each with a corresponding customized function and structure, which is not specifically limited here. The architecture with two auxiliary processors shown in fig. 3 is for illustration only and should not be construed as limiting. In addition, the main processor 302 may include a single-core or multi-core computing unit to provide the functions and operations necessary for the embodiments of the present application. The main processor 302 and the auxiliary processors (such as the auxiliary processor 310 and the auxiliary processor 320 in fig. 3) may also have different architectures, that is, the electronic device may be a heterogeneous system: for example, the main processor 302 may be a general-purpose processor based on an instruction-set operating system, such as a CPU, while an auxiliary processor may be a graphics processor (GPU) suited to parallelized computation or a dedicated accelerator suited to neural-network operations. The auxiliary memories (e.g., the auxiliary memory 312 and the auxiliary memory 322 shown in fig. 3) may implement customized functions and structures together with their respective auxiliary processors, while the main memory 308 stores the necessary instructions, software, configurations, data, and so on to cooperate with the main processor 302 in providing the functions and operations necessary for the embodiments of the present application. In some exemplary embodiments, the electronic device may include no auxiliary memory, only one auxiliary memory, or any number of auxiliary memories, which is not specifically limited here. The architecture with two auxiliary memories shown in fig. 3 is illustrative only and should not be construed as limiting. The main memory 308, and any auxiliary memory, may have one or more of the following characteristics: volatile, non-volatile, dynamic, static, readable/writable, read-only, random-access, sequential-access, location-addressable, file-addressable, and content-addressable, and may include random access memory (RAM), flash memory, read-only memory (ROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), registers, a hard disk, a removable disk, a recordable and/or rewritable compact disc (CD), a digital versatile disc (DVD), a mass storage media device, or any other suitable form of storage medium. The internal bus 304 may include any of a variety of bus structures or combinations thereof, such as a memory bus or memory controller, a peripheral bus, a universal serial bus, and/or a processor or local bus using any of a variety of bus architectures. It should be understood that the structure of the electronic device shown in fig. 3 does not constitute a specific limitation on the apparatus or system: in some exemplary embodiments it may include more or fewer components than shown, combine certain components, split certain components, or have a different arrangement of components.
Fig. 4 shows a block diagram of a node management system provided in an embodiment of the present application. As shown in fig. 4, the computing cluster 410 includes one master node and three worker nodes. The node management system 420 responds to authorization verification requirements of the computing cluster 410, to new nodes joining, and to existing nodes exiting. Specifically, a containerized application service is deployed on the computing cluster 410, and the computing cluster 410 includes at least one master node and at least one worker node. One of the at least one master node is designated as the management node, and a first container corresponding to a first node management application is deployed on it. The first container is configured to: when run manually or automatically, access an authorization profile that is not on the management node. A second container corresponding to a second node management application is deployed on each of the at least one worker node and is configured to: when run manually or automatically on its worker node, collect that node's software and hardware information and send it to the management node. The node management system 420 is configured to: in response to an authorization verification requirement, run the second container deployed on the worker node associated with that requirement and run the first container deployed on the management node, so as to determine whether the collected software and hardware information of that worker node is consistent with the authorization profile; if so, the authorization verification passes, otherwise it does not. It should be appreciated that the node management system 420 accomplishes these operations by running the first container on the master node and the second containers on the worker nodes of the computing cluster 410. Although the node management system 420 is drawn beside the computing cluster 410 for illustration, it should be understood as running on the computing cluster 410 itself, completing the node management operations through the first and second containers running there. For a computing cluster 410 on which a containerized application service is deployed, the node management system 420 of fig. 4 thus implements node management in the form of containerized application services deployed on the cluster itself, exploiting the lightweight virtualization and easy deployment of container technology; the first container on the management node and the second containers on the worker nodes evaluate the software and hardware information of the worker node associated with an authorization verification requirement against the authorization profile, and keeping the authorization profile off the management node adds security.
In one possible implementation, the node management system 420 is further configured to: in response to a new node joining, run the first container deployed on the management node and add an authorization for the new node in the authorization profile; and in response to an existing node exiting, run the first container deployed on the management node, delete the authorization for the existing node in the authorization profile, and delete the second container deployed on the existing node.
In one possible implementation, the authorization profile is located in a common storage space of the computing cluster 410 that is not on any of the at least one master node included in the computing cluster 410 and not on any of the at least one worker node included in the computing cluster 410.
In a possible implementation manner, the software and hardware information collected by the second container deployed on each of the at least one worker node includes at least one preset hardware tag of a plurality of preset hardware tags, and the invocation of the containerized application service is based on the preset hardware tags that each of the at least one worker node holds.
The computing cluster 410 is illustratively shown in fig. 4 as including one master node and three worker nodes. The number of master nodes and worker nodes in computing cluster 410 is merely exemplary, and the node management system of FIG. 4 may be adapted for use in a computing cluster having any number of master nodes and any number of worker nodes.
The embodiments provided herein may be implemented in any one or combination of hardware, software, firmware, or solid state logic circuitry, and may be implemented in connection with signal processing, control, and/or application specific circuitry. Particular embodiments of the present application provide an apparatus or device that may include one or more processors (e.g., microprocessors, controllers, Digital Signal Processors (DSPs), Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs), etc.) that process various computer-executable instructions to control the operation of the apparatus or device. Particular embodiments of the present application provide an apparatus or device that can include a system bus or data transfer system that couples the various components together. A system bus can include any of a variety of different bus structures or combination of different bus structures, such as a memory bus or memory controller, a peripheral bus, a universal serial bus, and/or a processor or local bus that utilizes any of a variety of bus architectures. The devices or apparatuses provided in the embodiments of the present application may be provided separately, or may be part of a system, or may be part of other devices or apparatuses.
Particular embodiments provided herein may include or be combined with computer-readable storage media, such as one or more storage devices capable of providing non-transitory data storage. The computer-readable storage medium/storage device may be configured to store data, programs, and/or instructions that, when executed by a processor of an apparatus or device provided by the embodiments of the present application, cause the apparatus or device to perform the associated operations. The computer-readable storage medium/storage device may include one or more of the following features: volatile, non-volatile, dynamic, static, read/write, read-only, random access, sequential access, location addressability, file addressability, and content addressability. In one or more exemplary embodiments, the computer-readable storage medium/storage device may be integrated into a device or apparatus provided in the embodiments of the present application or belong to a common system. The computer-readable storage medium/storage device may include optical, semiconductor, and/or magnetic memory devices, and may also include random access memory (RAM), flash memory, read-only memory (ROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), registers, a hard disk, a removable disk, a recordable and/or rewritable compact disc (CD), a digital versatile disc (DVD), a mass storage media device, or any other suitable form of storage medium.
The above is an implementation manner of the embodiments of the present application, and it should be noted that the steps in the method described in the embodiments of the present application may be sequentially adjusted, combined, and deleted according to actual needs. In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments. It is to be understood that the embodiments of the present application and the structures shown in the drawings are not to be construed as particularly limiting the devices or systems concerned. In other embodiments of the present application, an apparatus or system may include more or fewer components than the specific embodiments and figures, or may combine certain components, or may separate certain components, or may have a different arrangement of components. Those skilled in the art will understand that various modifications and changes may be made in the arrangement, operation, and details of the methods and apparatus described in the specific embodiments without departing from the spirit and scope of the embodiments herein; without departing from the principles of embodiments of the present application, several improvements and modifications may be made, and such improvements and modifications are considered within the scope of the present application.

Claims (18)

1. A node management method of a computing cluster on which a containerized application service is deployed and which includes at least one master node and at least one worker node, the node management method comprising:
designating one of the at least one master node as a management node and deploying a first container corresponding to a first node management application on the management node, wherein the first container is configured to: when the first container is manually or automatically run, the first container accesses an authorization profile that is not on the management node, the authorization profile located in a common storage space of the computing cluster that is not on any of the at least one master node included in the computing cluster and that is not on any of the at least one worker node included in the computing cluster;
deploying, on each of the at least one worker node, a second container corresponding to a second node management application, wherein the second container deployed on each of the at least one worker node is configured to: when the second container deployed on the working node is manually or automatically operated, the second container collects the software and hardware information of the working node and sends the collected software and hardware information of the working node to the management node;
in response to an authorization verification requirement, running the second container deployed on the worker node associated with the authorization verification requirement to collect the software and hardware information of that worker node, then running the first container deployed on the management node to determine whether the collected software and hardware information of that worker node is consistent with the authorization profile, wherein if so the authorization verification passes and otherwise it does not,
wherein the node management method further comprises:
in response to the joining of a new node, running the first container deployed on the management node and adding authorization for the new node in the authorization profile, and deploying a second container on the new node;
in response to exiting an existing node, running the first container deployed on the management node and deleting authorization for the existing node in the authorization profile, and deleting a second container deployed on the existing node,
in response to the master node designated as the management node going offline or an exception occurring, deleting the first container deployed on the master node designated as the management node, and designating another master node of the at least one master node as a new management node and deploying the first container on the new management node.
2. The node management method of claim 1, wherein the computing cluster is part of a cloud-native application platform, a cloud-native application architecture support system, a cloud-native PaaS management platform, a cloud-native privacy computing platform, or a cloud-native federal learning platform.
3. The node management method of claim 1, wherein the containerized application services deployed on the computing cluster include at least one of: a Kubernetes container orchestration engine, a Kubernetes container orchestration and management service, a Kubernetes container management platform, an Azure Kubernetes Service, an IBM Kubernetes Service, a KubeSphere container cloud platform, a Rancher container management platform, a k3s container management service, a MicroK8s container management tool, a VMware Tanzu container scheduling framework, a Red Hat OpenShift container scheduling framework, a Swarm container management platform, and a Mesos container management platform.
4. The node management method of claim 1, wherein the authorization verification requirement comprises at least one of: a periodic maintenance requirement indicating authorization verification of some or all of the at least one worker node of the computing cluster and suspension or restriction of authority for worker nodes that fail the verification, a software scheduling requirement indicating a scheduling requirement for one or more of the at least one worker node of the computing cluster, and an abnormal behavior detection indicating detection of abnormal data request behavior, abnormal data processing behavior, or authorization time-out behavior of one or more of the at least one worker node of the computing cluster.
5. The node management method according to claim 1, wherein the node management method further comprises: when an updated authorization profile is received, running the first container deployed on the management node to perform a format check on the updated authorization profile, and if the format check passes, replacing the authorization profile with the updated authorization profile.
6. The node management method according to claim 1, wherein the common storage space stores at least one of: configuration management resources (ConfigMap) of the computing cluster, software sensitive information, a scheduling policy, and resource limits, wherein the software sensitive information includes authorization credentials, passwords, and keys, and the configuration management resources are used to implement configuration management of the containerized application services deployed on the computing cluster.
7. The node management method of claim 1, wherein the authorization profile further comprises at least one of: software sensitive information (including authorization credentials, passwords, and keys), scheduling policies, and resource limits.
8. The node management method according to claim 1, wherein the node management method further comprises:
in response to a software authorization application requirement, running the second container deployed on each of the at least one worker node, and running the first container deployed on the management node, so as to integrate the software and hardware information of each of the at least one worker node and generate information about the computing cluster for the software authorization application requirement.
9. The node management method according to claim 1, wherein the node management method further comprises:
in response to a local authentication request from a given one of the at least one worker node, running the first container deployed on the management node to compare software and hardware information of the given worker node with the authorization profile to generate a local authentication result, and sending the local authentication result to the given worker node.
10. The node management method of claim 1, wherein the at least one master node of the computing cluster comprises a first master node and a second master node, wherein the at least one worker node of the computing cluster comprises a first set of worker nodes and a second set of worker nodes, wherein the first master node acts as a management node with respect to the first set of worker nodes, and wherein the second master node acts as a management node with respect to the second set of worker nodes.
11. The node management method according to claim 1, wherein the node management method further comprises:
providing a priority scheduling label to the worker node of the at least one worker node that passes the authorization verification, wherein the worker node with the priority scheduling label has a higher priority in the invocation of the containerized application service than a worker node without the priority scheduling label.
12. The node management method according to claim 1, wherein the software and hardware information of each of the at least one worker node collected by the second container deployed on that worker node includes at least one preset hardware tag of a plurality of preset hardware tags, and the invocation of the containerized application service is based on the preset hardware tag of each of the at least one worker node.
13. The node management method according to claim 12, wherein the plurality of preset hardware tags comprises: a first tag indicating whether there is an FPGA, a second tag indicating whether there is a GPU, a third tag indicating whether SGX is supported, a fourth tag indicating whether there is a TEE.
14. The node management method according to claim 13, wherein the invoking of the containerized application service is based on a preset hardware tag that each of the at least one worker node has, and comprises:
preferentially invoking a worker node having a first tag when invocation of the containerized application service requires reconfigurable computation,
preferentially invoking a worker node having a second tag when invocation of the containerized application service requires parallelized computations,
preferentially calling a worker node with a third tag when the calling of the containerized application service requires hardware security isolation,
preferentially calling the working node with the fourth label when the calling of the containerized application service requires a trusted execution environment.
15. A non-transitory computer readable storage medium, wherein the computer readable storage medium stores computer instructions that when executed by a processor implement the method of any one of claims 1 to 14.
16. An electronic device, characterized in that the electronic device comprises:
a processor;
a memory for storing processor-executable instructions;
wherein the processor implements the method of any one of claims 1 to 14 by executing the executable instructions.
17. A node management system for a computing cluster, wherein a containerized application service is deployed on the computing cluster and the computing cluster includes at least one master node and at least one worker node, one of the at least one master node being designated as a management node and a first container corresponding to a first node management application being deployed on the management node, the first container being configured to: when the first container is manually or automatically run, the first container accesses an authorization profile that is not on the management node, the authorization profile being located in a common storage space of the computing cluster that is not on any of the at least one master node included in the computing cluster and that is not on any of the at least one worker node included in the computing cluster, and a second container corresponding to a second node management application being deployed on each of the at least one worker node, wherein the second container deployed on each of the at least one worker node is configured to: when the second container deployed on the worker node is manually or automatically run, the second container collects the software and hardware information of the worker node and sends the collected software and hardware information of the worker node to the management node,
the node management system is configured to:
in response to an authorization verification requirement, running the second container deployed on the worker node associated with the authorization verification requirement to collect the software and hardware information of that worker node, and then running the first container deployed on the management node to determine whether the collected software and hardware information of that worker node is consistent with the authorization profile, wherein if so the authorization verification passes and otherwise it does not;
in response to the joining of a new node, running the first container deployed on the management node and adding an authorization for the new node in the authorization profile;
in response to exiting an existing node, running the first container deployed on the management node and deleting authorization for the existing node in the authorization profile, and deleting a second container deployed on the existing node;
in response to the master node designated as the management node going offline or encountering an exception, deleting the first container deployed on that master node, and designating another master node of the at least one master node as the new management node and deploying the first container on the new management node.
18. The node management system according to claim 17, wherein the software and hardware information of each of the at least one worker node collected by the second container deployed on that worker node includes at least one preset hardware tag of a plurality of preset hardware tags, and the invocation of the containerized application service is based on the preset hardware tag of each of the at least one worker node.