CN115422010A - Node management method and device in data cluster and storage medium - Google Patents

Node management method and device in data cluster and storage medium Download PDF

Info

Publication number
CN115422010A
Authority
CN
China
Prior art keywords
node
task
nodes
abnormal
manager
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211138811.9A
Other languages
Chinese (zh)
Inventor
闾泽军
申鹏
邢乃路
付庆午
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Oppo Mobile Telecommunications Corp Ltd
Original Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Oppo Mobile Telecommunications Corp Ltd filed Critical Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority to CN202211138811.9A priority Critical patent/CN115422010A/en
Publication of CN115422010A publication Critical patent/CN115422010A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/30Monitoring
    • G06F11/3003Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/3006Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system is distributed, e.g. networked systems, clusters, multiprocessor systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/30Monitoring
    • G06F11/3051Monitoring arrangements for monitoring the configuration of the computing system or of the computing system component, e.g. monitoring the presence of processing resources, peripherals, I/O links, software programs
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/30Monitoring
    • G06F11/3055Monitoring arrangements for monitoring the status of the computing system or of the computing system component, e.g. monitoring if the computing system is on, off, available, not available
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/30Monitoring
    • G06F11/3089Monitoring arrangements determined by the means or processing involved in sensing the monitored data, e.g. interfaces, connectors, sensors, probes, agents
    • G06F11/3093Configuration details thereof, e.g. installation, enabling, spatial arrangement of the probes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/48Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806Task transfer initiation or dispatching
    • G06F9/4843Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Quality & Reliability (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The application provides a node management method, a node management device and a storage medium in a data cluster. In the technical scheme of the application, a resource manager acquires at least one abnormal node information sent by at least one application manager and at least one task running information sent by at least one node manager, wherein a node indicated by the at least one abnormal node information is a node in a plurality of nodes connected with the at least one node manager, and the at least one task running information includes running information of a target task which fails to run at any node in the plurality of nodes; determining a target abnormal node from the plurality of nodes according to the at least one abnormal node information and the at least one task running information; and then performing task scheduling on normal nodes in the plurality of nodes. The node management method can improve the accuracy of identifying abnormal nodes in the big data cluster, and further improves the scheduling stability of the big data cluster.

Description

Node management method and device in data cluster and storage medium
Technical Field
The present application relates to the field of big data technologies, and in particular, to a method and an apparatus for managing nodes in a data cluster, and a storage medium.
Background
In the field of big data technology, the resource scheduling system of a big data cluster (Hadoop Yet Another Resource Negotiator, Hadoop Yarn) is responsible for the resource management and scheduling work of the system, has become a de facto standard of big data resource management in recent years, and supports a plurality of computing engines such as a parallel computing programming model (MapReduce), a big data computing framework (Spark), a streaming computing framework (Flink), a distributed execution framework (Tez) of DAG jobs, and the like. Yarn is divided into the global Resource Manager (RM) and Node Manager (NM) roles, where the RM is mainly responsible for global resource allocation and management and the NM is responsible for resource allocation and management of an individual node.
With the continuous expansion of cluster scale, abnormal node events inevitably occur in the management of cluster resources. In the current related technology, the management of abnormal nodes appearing in cluster resources mainly relies on the health-state monitoring mechanism of the NM (node manager), which by default provides disk damage detection and also supports user-defined monitoring scripts.
However, the existing node-exception detection approach does not identify abnormal nodes accurately enough, which results in low scheduling stability of the big data cluster. Therefore, how to improve the scheduling stability of the big data cluster has become an urgent problem to be solved.
Disclosure of Invention
The application provides a node management method, a node management device and a storage medium in a data cluster.
In a first aspect, the present application provides a node management method in a data cluster, which is applied to a resource manager, and the method includes: the method comprises the steps of obtaining at least one abnormal node information sent by at least one application program manager and at least one task running information sent by at least one node manager, wherein a node indicated by the at least one abnormal node information is a node in a plurality of nodes connected with the at least one node manager, and the at least one task running information comprises running information of a target task which fails to run on any one of the plurality of nodes; determining a target abnormal node from the plurality of nodes according to the at least one abnormal node information and the at least one task running information; and performing task scheduling on normal nodes in the plurality of nodes, wherein the normal nodes are nodes except for part or all of the target abnormal nodes in the plurality of nodes.
In the embodiment of the application, the resource manager acquires abnormal node information from the application manager and task running information from the node manager, comprehensively determines the target abnormal node, and further performs task scheduling on nodes except part or all of the abnormal nodes in the target abnormal node.
In a second aspect, the present application provides a node management method in a data cluster, which is applied to a node manager, and the method includes: acquiring at least one task running information, wherein the at least one task running information comprises running information of a target task which fails to run on any one of a plurality of nodes connected with the node manager; and sending the at least one task running information to a resource manager.
In a third aspect, the present application provides a node management apparatus in a data cluster, which is applied to a resource manager, and the apparatus includes: an obtaining module, configured to obtain at least one abnormal node information sent by at least one application manager and at least one task running information sent by at least one node manager, where a node indicated by the at least one abnormal node information is a node in a plurality of nodes connected to the at least one node manager, and the at least one task running information includes running information of a target task that has failed to run on any node in the plurality of nodes; the determining module is used for determining a target abnormal node from the plurality of nodes according to the at least one abnormal node information and the at least one task running information; and the scheduling module is used for performing task scheduling on normal nodes in the plurality of nodes, wherein the normal nodes are nodes except for part or all of the target abnormal nodes in the plurality of nodes.
In a fourth aspect, the present application provides a node management apparatus in a data cluster, which is applied to a node manager, and the apparatus includes: an obtaining module, configured to obtain at least one task running information, where the at least one task running information includes running information of a target task that fails to run on any node of a plurality of nodes connected to the node manager; and the sending module is used for sending the at least one task running information to the resource manager.
In a fifth aspect, the present application provides an apparatus for managing nodes in a data cluster, including a processor and a memory, where the memory is used to store code instructions; the processor is configured to execute the code instructions to implement the method of the first aspect or the second aspect or any possible implementation manner thereof.
In a sixth aspect, the present application provides a computer-readable storage medium storing a computer program (which may also be referred to as code, or instructions) which, when run on a computer, causes the computer to perform the method of the first or second aspect or any of its possible implementations.
In a seventh aspect, the present application provides a computer program product comprising: a computer program (which may also be referred to as code, or instructions), which when executed, causes a computer to perform the method of the first or second aspect or any of its possible implementations.
Drawings
Fig. 1 is a schematic structural diagram of a Yarn scheduling system according to an embodiment of the present application;
fig. 2 is a flowchart of a node management method in a data cluster according to an embodiment of the present application;
fig. 3 is a flowchart of a node management method in a data cluster according to another embodiment of the present application;
FIG. 4 is a schematic diagram illustrating an application blacklist determining a cluster global blacklist according to an embodiment of the present application;
FIG. 5 is a flowchart of a method for managing nodes in a data cluster according to another embodiment of the present application;
fig. 6 is a schematic structural diagram of a node management apparatus in a data cluster according to an embodiment of the present application;
fig. 7 is a schematic structural diagram of a node management apparatus in a data cluster according to another embodiment of the present application;
fig. 8 is a schematic structural diagram of an apparatus according to another embodiment of the present application.
Detailed Description
The technical solution in the present application will be described below with reference to the accompanying drawings.
In the embodiments of the present application, terms such as "first" and "second" are used to distinguish the same or similar items having substantially the same function and action. For example, the first instruction and the second instruction are only for distinguishing different user instructions, and the order of the user instructions is not limited. Those skilled in the art will appreciate that the terms "first," "second," and the like do not limit the number or the execution order, nor do they indicate relative importance.
It is noted that the words "exemplary," "for example," and "such as" are used herein to mean serving as an example, instance, or illustration. Any embodiment or design described herein as "exemplary" or "e.g.," is not necessarily to be construed as preferred or advantageous over other embodiments or designs. Rather, use of the word "exemplary" or "such as" is intended to present concepts related in a concrete fashion.
Further, "at least one" means one or more, "a plurality" means two or more. "and/or" describes the association relationship of the associated objects, meaning that there may be three relationships, e.g., a and/or B, which may mean: a alone, A and B together, and B alone, wherein A and B may be singular or plural. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship. "at least one of the following" or similar expressions refer to any combination of these items, including any combination of the singular or plural items. For example, at least one (one) of a, b, and c, may represent: a, or b, or c, or a and b, or a and c, or b and c, or a, b and c, wherein a, b and c can be single or multiple.
Fig. 1 is a schematic structural diagram of a Yarn scheduling system according to an embodiment of the present application. As shown in fig. 1, the Yarn scheduling system 100 includes a Resource Manager (RM) 101 and a Node Manager (NM) 102. Where interactions between RM 101 and NM102 may take place. For example, the RM 101 may monitor the NM102, and perform unified management and scheduling on resources on the NM102, and the NM102 may report the state of the task running thereon (e.g., usage information such as a disk, a memory, and a CPU) to the RM 101 periodically.
It is understood that the number of NM102 shown in fig. 1 is only an example, and the Yarn scheduling system 100 may generally include one RM 101 and a plurality of NM102, and the application does not limit the number of NM 102.
Illustratively, the RM 101 is a central resource manager of the Yarn scheduling system 100, and is responsible for resource management and allocation of the entire system, including processing client requests, starting/monitoring an Application Manager (AM), monitoring NM102, allocation and scheduling of resources, and the like. For example, when a user submits an application, the RM 101 needs to provide an AM to track and manage the application, which is responsible for applying for resources from the RM 101 and asking the NM102 to start a task (container) that may occupy certain resources. Since different AM's are distributed to different nodes, they do not affect each other.
NM102 may be regarded as a resource and task manager on each node, and is responsible for the resource usage of its node and the running state of each container, and interacts with the RM 101, that is, reports the state of the node (usage information such as disk, memory, and CPU) to the RM 101; NM102 also handles various actual requests from the AM, such as starting and stopping containers. The container is the unit of resource allocation in Yarn and includes resources such as memory and CPU; it may also be regarded as Yarn's encapsulation of multi-dimensional resources, and the specific resources encapsulated by a container are dynamically generated according to the requirements of the application program.
As an example, in the Yarn scheduling system, when a user submits a job to the RM 101, the RM 101 schedules the AM corresponding to the job onto one NM102 node. After the AM process is started, it registers with the RM 101 and applies to the RM 101 for resources according to the job execution plan; the RM 101 then schedules containers onto the NM102 nodes. Each container periodically reports its execution state to the AM, and all NM102 in Yarn also periodically report their resource and process states to the RM 101.
In recent years, in the field of big data computing, Hadoop Yarn has become a de facto standard for big data resource management, supporting many computing engines such as MapReduce, Spark, Flink, and Tez. However, with the continuous expansion of cluster size, small-probability node-abnormality events inevitably occur in cluster resource management, so the fault tolerance of abnormal nodes must be considered in scheduling.
In the related art, the management of abnormal nodes appearing in cluster resources mainly relies on the NM health status monitoring mechanism. For example, the Yarn scheduling system includes an RM and multiple NMs, and each node may include multiple disks. Each NM periodically monitors the health state of its disks through the "LocalDirsService" service therein; if the space occupation of a certain disk exceeds the threshold set by the administrator, the disk is marked as being in an abnormal state, and if the number of disks in the abnormal state on a node exceeds a preset ratio, the whole node is marked as abnormal (Unhealthy), that is, an abnormal node. Further, the corresponding NM reports the abnormal node to the RM through a Heartbeat, the RM clears the containers running on the abnormal node, and subsequent scheduling no longer allocates containers to the abnormal node.
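As a non-limiting illustration, the threshold-based disk health check described above may be sketched roughly as follows. The class name, fields, and thresholds below are assumptions made for illustration only and are not the actual Yarn implementation.

```java
// Hedged sketch of the NM-side disk health check: each disk whose space usage exceeds
// an administrator-set threshold is marked abnormal, and the whole node is marked
// Unhealthy once the abnormal-disk ratio exceeds a preset ratio.
import java.io.File;
import java.util.List;

public class DiskHealthCheckSketch {

    private final double diskUsageThreshold;   // e.g. 0.90 = 90% space occupation
    private final double unhealthyDiskRatio;   // e.g. 0.25 = 25% of disks abnormal

    public DiskHealthCheckSketch(double diskUsageThreshold, double unhealthyDiskRatio) {
        this.diskUsageThreshold = diskUsageThreshold;
        this.unhealthyDiskRatio = unhealthyDiskRatio;
    }

    /** Returns true if the node should be reported to the RM as Unhealthy. */
    public boolean isNodeUnhealthy(List<File> localDirs) {
        int abnormal = 0;
        for (File dir : localDirs) {
            long total = dir.getTotalSpace();
            long used = total - dir.getUsableSpace();
            if (total > 0 && (double) used / total > diskUsageThreshold) {
                abnormal++;                     // this disk is marked as being in an abnormal state
            }
        }
        return !localDirs.isEmpty()
                && (double) abnormal / localDirs.size() > unhealthyDiskRatio;
    }

    public static void main(String[] args) {
        DiskHealthCheckSketch check = new DiskHealthCheckSketch(0.90, 0.25);
        boolean unhealthy = check.isNodeUnhealthy(List.of(new File("/data1"), new File("/data2")));
        System.out.println("node unhealthy: " + unhealthy);
    }
}
```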
However, the existing detection of abnormal nodes strongly depends on the NM, the detection strategy is limited to the physical environment of the node itself, and the identification of abnormal nodes is not accurate enough, which results in low scheduling stability of the big data cluster. Therefore, how to improve the scheduling stability of the big data cluster has become an urgent problem to be solved.
In view of this, embodiments of the present application provide a method and an apparatus for managing nodes in a data cluster, which can improve accuracy of identifying abnormal nodes in a big data cluster, and further improve scheduling stability of the big data cluster.
The following describes a node management method in a data cluster according to an embodiment of the present application in detail with reference to the accompanying drawings.
Referring to fig. 2, a flowchart of a node management method in a data cluster according to an embodiment of the present application is shown. The method may be applied to the Yarn scheduling system shown in fig. 1, and may also be applied to other scenarios in addition, which is not limited in this embodiment of the present application. For convenience of illustration, the method is applied to the Yarn scheduling system shown in fig. 1 as an example hereinafter, and accordingly, the resource manager hereinafter is the RM 101 shown in fig. 1, and the node manager hereinafter is the NM102 shown in fig. 1. The following describes in detail the various steps in the method shown in fig. 2, the flow chart including:
s201, the application manager sends at least one abnormal node message to the resource manager, and correspondingly, the resource manager receives the at least one abnormal node message.
Wherein the node indicated by the at least one abnormal node information is a node among a plurality of nodes connected to the at least one node manager.
It should be understood that, in the running process of the Yarn scheduling system, the resource manager starts the corresponding application program manager according to the operation requirement, and the application program manager obtains the corresponding abnormal node information according to the task condition failed in the execution process and sends the abnormal node information to the resource manager.
As an example, if tasks of the application manager (e.g., App1) fail to run on each of the node N1, the node N2, and the node N3, the abnormal node information includes the node N1, the node N2, and the node N3.
S202, the node manager acquires at least one task running information, wherein the at least one task running information comprises running information of a target task which fails to run on any one of a plurality of nodes connected with the node manager.
It should be understood that, during the operation of the Yarn scheduling system, the node manager may obtain running information of the tasks running on it, such as the status codes of all tasks (running tasks and finished tasks), where some of the status codes can effectively reflect whether the corresponding node is abnormal. Alternatively, the running information may be used to indicate the running time of a task; if the running time of a certain task exceeds a preset time threshold, the task may be considered an abnormal task. Of course, the running information may also include other contents, which is not limited herein.
S203, the node manager sends at least one task running information to the resource manager, and correspondingly, the resource manager receives the at least one task running information.
Wherein the at least one task execution information includes execution information of a target task that has failed to be executed at any one of the plurality of nodes.
Understandably, the node manager sends the at least one task running information to the resource manager periodically through heartbeat, and informs the resource manager of the condition of the corresponding node. For example, the status codes of all tasks (running task and task that has finished running) are sent to the resource manager, which in turn receives the status codes.
S204, the resource manager determines a target abnormal node from the plurality of nodes according to the at least one abnormal node information and the at least one task running information.
It should be understood that the resource manager may determine all nodes with abnormality as the target abnormal nodes according to the at least one abnormal node information and the at least one task running information received in the above steps; or it may determine only the nodes with serious abnormality as the target abnormal nodes according to the at least one abnormal node information and the at least one task running information. A node with serious abnormality may be identified by judging how many times the two kinds of information, namely the abnormal node information and the task running information, coincide on the node; if the number of coincidences exceeds a preset threshold, the abnormality is serious, and the more coincidences there are, the more serious the abnormality is.
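As a non-limiting illustration, the coincidence-counting rule described above may be sketched as follows; the data structures, method names, and the coincidence threshold are assumptions of this sketch.

```java
// Hedged sketch: a node is selected as a target abnormal node when the abnormal node
// information (from the application managers) and the task running information (from
// the node managers) coincide on it at least a preset number of times.
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

public class TargetAbnormalNodeSketch {

    public static Set<String> selectTargetAbnormalNodes(List<String> abnormalNodeReports,
                                                        List<String> failedTaskNodes,
                                                        int coincidenceThreshold) {
        Map<String, Integer> coincidences = new HashMap<>();
        Set<String> failedSet = Set.copyOf(failedTaskNodes);
        for (String node : abnormalNodeReports) {
            if (failedSet.contains(node)) {            // both information sources point at this node
                coincidences.merge(node, 1, Integer::sum);
            }
        }
        return coincidences.entrySet().stream()
                .filter(e -> e.getValue() >= coincidenceThreshold)
                .map(Map.Entry::getKey)
                .collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        Set<String> targets = selectTargetAbnormalNodes(
                List.of("N2", "N2", "N6", "N3"), List.of("N2", "N6"), 2);
        System.out.println(targets);   // [N2]
    }
}
```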
S205, the resource manager carries out task scheduling on normal nodes in the plurality of nodes.
The normal nodes are nodes except for part or all of the target abnormal nodes in the plurality of nodes.
It should be understood that after the resource manager finds the target abnormal node, the task scheduling is not performed on the target abnormal node, and the task scheduling is performed on nodes except for some or all of the target abnormal nodes.
Illustratively, some nodes may generate exceptions only when running a specific task. In this case, the normal nodes are the nodes other than these particular abnormal nodes among the target abnormal nodes, and not all target abnormal nodes are excluded; for example, the resource manager avoids these specified abnormal nodes only when scheduling the specific task, and scheduling may continue in other cases. If some nodes are determined to be abnormal nodes regardless of which task is run, the normal nodes are the nodes other than all of the target abnormal nodes.
Optionally, the method for the resource manager to perform task scheduling on the normal node may refer to a task scheduling method in the prior art, which is not limited in this application.
In the above embodiment, the resource manager obtains the abnormal node information from the application manager and the task running information from the node manager, and comprehensively determines the target abnormal node. The target abnormal node obtained in this way is relatively accurate; for example, it can identify nodes with various environmental abnormalities, such as a missing node user group, running abnormalities caused by configuration file errors, abnormal disk directory permissions, and missing dependency packages, so task retries are greatly reduced.
Further, based on the determined target abnormal node, task scheduling is performed on the nodes other than part or all of the abnormal nodes in the target abnormal node. The method for determining the target abnormal node involves the three components of the application manager, the node manager, and the resource manager together, makes full use of the state reference information generated while tasks are running, and elects abnormal nodes with a voting idea, thereby uncovering various environmental abnormality problems that physical monitoring cannot cover and forming internal autonomy of the scheduling system. This can improve the accuracy of identifying abnormal nodes in the big data cluster and further improve the scheduling stability of the big data cluster.
Based on the foregoing embodiment, fig. 3 is a flowchart of a node management method in a data cluster according to another embodiment of the present application. In the embodiment shown in fig. 3, taking how to determine the target abnormal node according to the abnormal node information and the task running information as an example, the following describes each step in the method shown in fig. 3 in detail, where the flowchart includes:
s301, the application manager sends at least one abnormal node message to the resource manager, and correspondingly, the resource manager receives the at least one abnormal node message.
Wherein the node indicated by the at least one abnormal node information is a node among a plurality of nodes connected to the at least one node manager.
This step is similar to step S201 in the embodiment shown in fig. 2, and is not repeated herein.
S302, the node manager acquires at least one task running information, wherein the at least one task running information comprises running information of a target task which fails to run on any one of a plurality of nodes connected with the node manager.
This step is similar to step S202 in the embodiment shown in fig. 2, and is not described again here.
S303, the node manager sends at least one task running information to the resource manager, and correspondingly, the resource manager receives the at least one task running information.
Wherein the at least one task execution information includes execution information of a target task that has failed to be executed at any one of the plurality of nodes.
This step is similar to step S203 in the embodiment shown in fig. 2, and is not described again here.
S304, the resource manager determines a global abnormal node and at least one application program abnormal node corresponding to at least one application program from the plurality of nodes according to the at least one abnormal node information.
The resource manager does not schedule any task on the global abnormal node, the resource manager does not schedule a task corresponding to the target application program on the target application program abnormal node, the target application program abnormal node corresponds to the target application program, and the target application program abnormal node is one of the at least one application program abnormal node.
It should be understood that the application manager obtains corresponding abnormal node information according to a task condition failing in the execution process, marks a node corresponding to the abnormal node information as an application abnormal node, and then sends the abnormal node information to the resource manager to request the resource manager to replace another node for the abnormal node.
For example, if the application manager fails to run a task on a node, the application manager marks the node as an application abnormal node, and then sends the application abnormal node to the resource manager through heartbeat, so that the resource manager is informed that the node has a problem and requests to replace another node for the node.
Optionally, the set of application exception nodes may also be called an application blacklist (appblack) of nodes.
Further, the resource manager processes the received application blacklists sent by the application managers and counts the frequency of occurrence of the abnormal nodes in the application blacklists; if the frequency of a node is greater than or equal to a preset threshold, the node is determined to be a global abnormal node, that is, it is written into the global blacklist (Global Blacklist).
For example, as shown in fig. 4, the application blacklist of the application manager App1 includes a node N1, a node N2, and a node N3, the application blacklist of the application manager App2 includes a node N2, a node N4, and a node N6, the application blacklist of the application manager App3 includes a node N2, a node N6, and a node N7, and the application blacklist of the application manager App4 includes a node N2, a node N6, and a node N8, and assuming that the preset threshold is 3, the node N2 appears 4 times, and the node N6 appears 3 times, that is, the occurrence frequency of the node N2 and the node N6 exceeds the preset threshold, the node N2 and the node N6 are considered as a cluster global blacklist.
It should be appreciated that nodes in the cluster global blacklist prohibit scheduling of any task, and nodes in the application blacklist prohibit scheduling of tasks only to the corresponding application.
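As a non-limiting illustration, the voting step described above (and shown in the example of fig. 4) may be sketched as follows; the class and method names are illustrative assumptions.

```java
// Hedged sketch: count how often each node appears across the application blacklists
// reported by the application managers; nodes whose count reaches a preset threshold
// are written into the cluster global blacklist.
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

public class GlobalBlacklistSketch {

    public static Set<String> buildGlobalBlacklist(List<Set<String>> appBlacklists, int threshold) {
        Map<String, Integer> frequency = new HashMap<>();
        for (Set<String> appBlacklist : appBlacklists) {
            for (String node : appBlacklist) {
                frequency.merge(node, 1, Integer::sum);   // one vote per application
            }
        }
        return frequency.entrySet().stream()
                .filter(e -> e.getValue() >= threshold)
                .map(Map.Entry::getKey)
                .collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        // The fig. 4 example: N2 appears 4 times and N6 appears 3 times, threshold = 3.
        List<Set<String>> appBlacklists = List.of(
                Set.of("N1", "N2", "N3"),   // App1
                Set.of("N2", "N4", "N6"),   // App2
                Set.of("N2", "N6", "N7"),   // App3
                Set.of("N2", "N6", "N8"));  // App4
        System.out.println(buildGlobalBlacklist(appBlacklists, 3));   // N2 and N6 (order may vary)
    }
}
```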
In the step, the method for determining the global abnormal node and the application program abnormal node according to the abnormal node information is accurate, and subsequent task scheduling is facilitated.
S305, the resource manager determines at least one abnormal task node of the application program from the plurality of nodes according to the at least one task running information.
The resource manager does not schedule the target tasks on the target application program abnormal task nodes, and the target application program abnormal task nodes are one of at least one application program abnormal task node.
Optionally, each of the at least one task running information received by the resource manager includes running failure task information, where the task information is used to indicate a failure reason of a running failure task.
Illustratively, the task information of the failed run may be an exit code of a target task (AM Container) in the target application of the failed run on the corresponding node manager. That is, the target task in the target application program has been scheduled on the node, but the running fails, and the exit code of the running-failed task includes the reason of the running failure, for example, the running failure is caused by failure in the starting process of the target task, failure in the running process of the target task, or failure caused by abnormal configuration environment of the node.
Further, the resource manager removes a target failed task from at least one task running information, and obtains at least one updated task running information, wherein the target failed task is a task with a failure reason indication of running failure caused by physical resources other than the node. And determining at least one abnormal task node of the application program according to the at least one piece of updated task running information.
It should be understood that the task that fails to run due to the physical resource other than the node is also the target task that sent the exit code, which has been scheduled on the node but failed to run.
Illustratively, the at least one piece of updated task running information may include a case where an application manager Launcher (AM Launcher) directly generates an exception inside the resource manager when starting the target application manager, that is, the target task does not reach the node manager, or a unidirectional network between the resource manager and the node manager is not available, and the like.
Alternatively, the set of nodes where the application manager is abnormal may also be called an application manager Blacklist (AMContainer black list), and the nodes in the application manager Blacklist only limit the scheduling of the corresponding application manager.
Optionally, the resource manager includes an abnormal node manager, and the step of acquiring the abnormal node information sent by the application manager and the task running information sent by the node manager, and determining the target abnormal node may be performed by the abnormal node manager.
In the step, the method for determining the abnormal task node of the application program according to the task running information considers the possibility that the abnormality is generated in the task running process and the abnormality is generated when the task does not start running yet, the obtained target abnormal node is more accurate, and the subsequent task scheduling is convenient.
S306, the resource manager carries out task scheduling on normal nodes in the plurality of nodes.
The normal nodes are nodes except for part or all of the abnormal nodes in the target abnormal nodes in the plurality of nodes.
It should be understood that, during the running process, the abnormal node manager in the resource manager may update the node information in the cluster global Blacklist and the application abnormal task Blacklist in real time, and transmit the information to a scheduling module (Scheduler) in the resource manager, which is used as a scheduling Blacklist Filter (Blacklist Filter), so as to achieve the purpose of avoiding scheduling.
That is to say, when a scheduling module in the resource manager performs task scheduling, the scheduling module firstly filters out relevant nodes in a cluster global blacklist and an application program abnormal task blacklist, and then performs task scheduling on normal nodes except for part or all of the abnormal nodes in the target abnormal nodes.
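As a non-limiting illustration, the scheduling-side blacklist filtering described above may be sketched as follows; the filter interface and names are assumptions of this sketch, not the actual scheduler code.

```java
// Hedged sketch: before allocating a container, drop nodes that are in the cluster
// global blacklist, and drop nodes in a per-application blacklist only for tasks of
// the corresponding application.
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

public class BlacklistFilterSketch {

    private final Set<String> globalBlacklist;             // no task may be scheduled here
    private final Map<String, Set<String>> appBlacklists;  // appId -> nodes blocked for that app

    public BlacklistFilterSketch(Set<String> globalBlacklist, Map<String, Set<String>> appBlacklists) {
        this.globalBlacklist = globalBlacklist;
        this.appBlacklists = appBlacklists;
    }

    /** Returns the candidate nodes on which a task of the given application may still be scheduled. */
    public List<String> filterCandidates(String appId, List<String> clusterNodes) {
        Set<String> appBlocked = appBlacklists.getOrDefault(appId, Set.of());
        return clusterNodes.stream()
                .filter(node -> !globalBlacklist.contains(node))
                .filter(node -> !appBlocked.contains(node))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        BlacklistFilterSketch filter = new BlacklistFilterSketch(
                Set.of("N2", "N6"), Map.of("app1", Set.of("N3")));
        System.out.println(filter.filterCandidates("app1", List.of("N1", "N2", "N3", "N4")));  // [N1, N4]
    }
}
```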
In the above embodiment, the resource manager obtains the abnormal node information from the application manager and the task running information from the node manager, determines the global abnormal node and the application abnormal node from the plurality of nodes according to the abnormal node information, and determines the application abnormal task node from the plurality of nodes according to the task running information, so the obtained target abnormal node is relatively accurate; for example, nodes with various environmental abnormalities, such as a missing node user group, running abnormalities caused by configuration file errors, abnormal disk directory permissions, and missing dependency packages, can be identified, and task retries are greatly reduced.
Furthermore, based on the determined global abnormal node, the determined application program abnormal node and the determined application program abnormal task node, task scheduling is performed on the nodes except for part or all of the abnormal nodes in the target abnormal node, so that the scheduling stability of the big data cluster can be improved.
On the basis of the foregoing embodiment, fig. 5 is a flowchart of a node management method in a data cluster according to another embodiment of the present application. In the embodiment shown in fig. 5, taking task scheduling on a normal node in a plurality of nodes according to the node health score as an example, each step in the method shown in fig. 5 is described in detail below, and the flowchart includes:
s501, the application manager sends at least one abnormal node message to the resource manager, and correspondingly, the resource manager receives the at least one abnormal node message.
Wherein the node indicated by the at least one abnormal node information is a node among a plurality of nodes connected to the at least one node manager.
This step is similar to step S201 in the embodiment shown in fig. 2, and is not described again here.
S502, the node manager acquires at least one task running information, wherein the at least one task running information comprises running information of a target task which fails to run on any one of a plurality of nodes connected with the node manager.
This step is similar to step S202 in the embodiment shown in fig. 2, and is not repeated herein.
S503, the node manager sends at least one task running information to the resource manager, and correspondingly, the resource manager receives the at least one task running information.
Wherein the at least one task execution information includes execution information of a target task that has failed to be executed at any one of the plurality of nodes.
This step is similar to step S203 in the embodiment shown in fig. 2, and is not repeated herein.
S504, the resource manager determines a target abnormal node from the plurality of nodes according to the at least one abnormal node information and the at least one task running information.
This step is similar to step S204 in the embodiment shown in fig. 2, and is not described again here.
And S505, the node manager calculates, according to preset health monitoring indexes, the resource usage of each normal node connected to the node manager when executing tasks, and obtains the node health score corresponding to each normal node.
The preset health monitoring indexes comprise the use condition of the central processing unit, the occupation condition of a disk, the occupation condition of a memory and the network condition.
It should be understood that the node manager detects resource usage, such as central processing unit usage, disk usage, memory usage, and network usage, of each normal node when executing a task, and calculates the obtained multidimensional health monitoring values in a weighted average manner to obtain a uniform index, i.e., a node health score.
Optionally, clusters of different load types may adjust the weights of different dimensions.
Illustratively, the Central Processing Unit (CPU) usage uses "SystemLoadAvg" and "AvailableProcessors" as reference indicators; the disk occupation is monitored through the util index of iostat, the util values of the mounted hard disk devices are sorted, and the average of the top 1/3 with higher utilization is taken as the disk pressure; the memory occupation is calculated as the system memory vacancy rate from the "/proc/meminfo" kernel interface file; and the network condition is calculated as the network bandwidth utilization according to "/proc/net/dev". Then the several types of monitoring values are normalized to between 0 and 1, and a total node health score (loadScore) is calculated through a weighted average, wherein the specific algorithm is as follows:
loadScore = cpuScore × cpuWeight + diskScore × diskWeight + netScore × netWeight + memScore × memWeight
where loadScore is the node health score corresponding to the node; cpuScore and cpuWeight are the normalized score and the weight corresponding to the CPU usage; diskScore and diskWeight are the normalized score and the weight corresponding to the disk occupation; netScore and netWeight are the normalized score and the weight corresponding to the network condition; and memScore and memWeight are the normalized score and the weight corresponding to the memory occupation. Of course, the above calculation process is only an example, and those skilled in the art may obtain the node health score by using other parameters and calculation methods, which are not limited herein.
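As a non-limiting illustration, the weighted-average calculation above may be sketched as follows; the example weights are assumptions and may be adjusted for clusters of different load types, as noted above.

```java
// Hedged sketch of the node health score: the four monitoring values are assumed to be
// already normalized to [0, 1], and the weights are assumed to sum to 1.
public class LoadScoreSketch {

    public static double loadScore(double cpuScore, double cpuWeight,
                                   double diskScore, double diskWeight,
                                   double netScore, double netWeight,
                                   double memScore, double memWeight) {
        // Weighted average of the normalized per-dimension scores.
        return cpuScore * cpuWeight
                + diskScore * diskWeight
                + netScore * netWeight
                + memScore * memWeight;
    }

    public static void main(String[] args) {
        // Example weights for a disk-heavy workload (illustrative only).
        double score = loadScore(0.6, 0.25, 0.8, 0.40, 0.2, 0.15, 0.5, 0.20);
        System.out.println("loadScore = " + score);   // 0.15 + 0.32 + 0.03 + 0.10 = 0.60
    }
}
```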
S506, the node manager sends at least one node health score to the resource manager, and correspondingly, the resource manager receives the at least one node health score.
Optionally, since the node health score changes rapidly, to avoid jitter, the obtained node health score may be discretized (e.g., 10, 20, 30, …, 100), and the value is reported to the resource manager only when the discrete interval of the node health score changes.
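As a non-limiting illustration, the discretization and report-on-change behavior described above may be sketched as follows; the bucketing rule and method names are assumptions of this sketch.

```java
// Hedged sketch: map the continuous node health score to buckets of 10 and report to
// the resource manager only when the bucket changes.
public class ScoreReportingSketch {

    private int lastReportedBucket = -1;

    /** Maps a score in [0, 1] to the discrete levels 10, 20, ..., 100. */
    static int discretize(double loadScore) {
        int bucket = (int) Math.ceil(loadScore * 10) * 10;
        return Math.max(10, Math.min(100, bucket));
    }

    /** Returns the bucket to report, or -1 if the bucket has not changed since the last report. */
    int maybeReport(double loadScore) {
        int bucket = discretize(loadScore);
        if (bucket == lastReportedBucket) {
            return -1;                       // same discrete interval, nothing to report
        }
        lastReportedBucket = bucket;
        return bucket;
    }

    public static void main(String[] args) {
        ScoreReportingSketch reporter = new ScoreReportingSketch();
        System.out.println(reporter.maybeReport(0.42));   // 50 (first report)
        System.out.println(reporter.maybeReport(0.47));   // -1 (same bucket, suppressed)
        System.out.println(reporter.maybeReport(0.61));   // 70
    }
}
```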
Optionally, the process of calculating the node health score may be executed by the node manager, and the node health score is sent to the resource manager after calculation is completed, or may be directly executed by the resource manager when monitoring the node manager, which is not limited in the present application.
And S507, the resource manager performs task scheduling on normal nodes in the plurality of nodes according to the node health scores.
It should be understood that, for each node, if the node health score is greater than or equal to a first preset threshold, a preset task running on the node is released; if the node health score is greater than or equal to a second preset threshold and smaller than a first preset threshold, stopping continuously scheduling a new task to the node, wherein the first preset threshold is greater than the second preset threshold; and if the node health score is smaller than a second preset threshold value, maintaining the scheduling mode of the node.
Illustratively, the resource manager classifies node health into three classes: a node health score greater than or equal to the first preset threshold is the first class (Serious); a node health score greater than or equal to the second preset threshold and less than the first preset threshold is the second class (High); and a node health score less than the second preset threshold is the third class (Low). For a first-class (Serious) node, the node cannot normally run tasks, the scheduler is triggered to perform task eviction, and some tasks are selected to be released so as to reduce the load (Load) of the node. For a second-class (High) node, the load of the current node is already high, but the currently running tasks can still continue to run; it is not suitable to schedule more tasks to it, and the state of this type of node is marked as "AutoReadOnly". For a third-class (Low) node, the node operates normally, and the original scheduling mode can be maintained.
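As a non-limiting illustration, the three-level grading described above may be sketched as follows; the enum names and threshold values used in the example are assumptions.

```java
// Hedged sketch: Serious nodes trigger task eviction, High nodes are marked AutoReadOnly
// (no new tasks), and Low nodes keep the original scheduling mode.
public class NodeGradeSketch {

    enum Grade { SERIOUS, HIGH, LOW }

    static Grade grade(double loadScore, double firstThreshold, double secondThreshold) {
        if (loadScore >= firstThreshold) {
            return Grade.SERIOUS;       // release some running tasks to reduce the node load
        } else if (loadScore >= secondThreshold) {
            return Grade.HIGH;          // keep running tasks, but stop scheduling new ones
        }
        return Grade.LOW;               // node runs normally, keep the original scheduling mode
    }

    public static void main(String[] args) {
        double first = 0.9, second = 0.7;   // assumed thresholds, first > second
        System.out.println(grade(0.95, first, second));   // SERIOUS -> trigger task eviction
        System.out.println(grade(0.80, first, second));   // HIGH    -> mark as AutoReadOnly
        System.out.println(grade(0.30, first, second));   // LOW     -> unchanged
    }
}
```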
Optionally, the states of the three types of nodes may be mutually converted in the running process, and the states are timely updated to the scheduler for corresponding scheduling intervention.
It should be understood that when the task runs on the node with the higher node health score, the task is seriously interfered, and the long tail phenomenon of the task can be effectively reduced by scheduling and avoiding the node with the higher node health score.
Optionally, the caches of the three types of nodes are maintained in a priority queue manner, for example, normal nodes are cached in a first preset queue according to the node health scores, wherein the nodes in the first preset queue are sorted from low to high according to the node health scores corresponding to the nodes. If the cache space occupied by the first preset queue is larger than or equal to a first preset cache threshold, for example, when the first preset queue is full, nodes with low node health scores are sequentially removed according to the levels of the node health scores corresponding to the nodes.
Optionally, since the node health score corresponding to the node is updated in real time, it is not necessary to write "checkpoint", and after the resource manager is restarted, the node health score may be reconstructed again according to the node health score data reported by the node manager.
Optionally, since the "shuffle" process of a part of tasks easily causes a lot of input/output (IO) data of a disk under the cluster load of spark and mapreduce types, and at the nodes of the Yarn node manager and the mixed part of the "Hdfs DataNode", such IO exceptions cause interference to other tasks and also severely interfere the services of the DataNode. Therefore, the method and the device adopt a dynamic IO limiting mode for tasks which run on the nodes, do not perform IO limitation by default, perform IO monitoring on each task when running, increase the limitation on IO through a cgroup blkio subsystem to reduce interference on other tasks and other services, and finally enable the node environment to automatically recover to a normal level. The specific algorithm is as follows:
monitoring the use information of the input/output IO resources of all tasks corresponding to each node manager in at least one node manager, and if the use value of the IO resource of the first task in the use information of the IO resource in all tasks is larger than or equal to a preset use threshold, performing suppression processing on the IO resource used by the first task.
Illustratively, the process tree of the node manager is traversed and the IO of all tasks is sorted; if the preset usage threshold of the IO of a single process is exceeded, the cgroup blkio resource configuration is written to limit the IO usage of the task, that is, an iops/bps limiting mode is introduced to limit the upper bound of the resource group usage. Accordingly, when the task ends, the corresponding cgroup blkio configuration is cleared.
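As a non-limiting illustration, the dynamic IO limiting described above may be sketched as follows; the cgroup paths, device numbers, thresholds, and limits are assumptions made for illustration only.

```java
// Hedged sketch: when a task exceeds the preset IO usage threshold, its container is
// throttled by writing an iops/bps limit into the cgroup v1 blkio subsystem; the
// configuration is cleared when the task ends.
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class IoThrottleSketch {

    private static final Path BLKIO_ROOT = Path.of("/sys/fs/cgroup/blkio/yarn");

    /** Writes a bytes-per-second write limit for the container's cgroup (e.g. device 8:0). */
    static void throttleContainer(String containerId, String device, long bpsLimit) throws IOException {
        Path cgroupDir = BLKIO_ROOT.resolve(containerId);
        Files.createDirectories(cgroupDir);
        Files.writeString(cgroupDir.resolve("blkio.throttle.write_bps_device"),
                device + " " + bpsLimit + "\n");
    }

    /** Clears the limit when the task finishes (writing 0 removes the throttle rule). */
    static void clearThrottle(String containerId, String device) throws IOException {
        Path cfg = BLKIO_ROOT.resolve(containerId).resolve("blkio.throttle.write_bps_device");
        if (Files.exists(cfg)) {
            Files.writeString(cfg, device + " 0\n");
        }
    }

    public static void main(String[] args) throws IOException {
        long threshold = 200L * 1024 * 1024;           // assumed per-task IO usage threshold (bytes/s)
        long observedBps = 350L * 1024 * 1024;         // value sampled from the NM process tree
        if (observedBps >= threshold) {
            throttleContainer("container_0001_01_000002", "8:0", 100L * 1024 * 1024);
        }
    }
}
```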
It should be understood that by dynamically controlling tasks with IO exceptions, the number of nodes affected by abnormal IO can be greatly reduced. Taking the disk read-write rate as an example, the index is collected from "/sys/block/$device/stat", and the change value within a preset time window represents the usage pressure of the disk IO. Table 1 below counts the disk read-write rates of all cluster nodes in a certain time period, where the read-write rates are sorted from small to large and the rates corresponding to different quantiles are counted. The write rate at the 99% quantile is greatly reduced, which fully reflects that the nodes with abnormal IO are effectively controlled.
TABLE 1

Quantile value    Decrease amplitude
99%               91%
90%               18.9%
80%               18%
Optionally, the proportion of DataNode drops caused by IO exceptions is also reduced by more than 90%.
Optionally, the node manager may also establish monitoring for the self-contained Shuffle service in addition to performing runtime monitoring and limiting on the task, and the Shuffle service may also bring abnormal IO consumption in some scenarios.
In the embodiment, on the basis of comprehensively determining the target abnormal node by acquiring the abnormal node information from the application program manager and the task running information from the node manager, node health scores of nodes are calculated for nodes except for part or all of the abnormal nodes in the target abnormal node, and a hierarchical processing strategy is established for the node health scores, so that the self-healing cost is minimized, and the occurrence of task abnormality is reduced from a scheduling source; and dynamic IO limitation during task operation is adopted, the machine performance is fully exerted under the condition of releasing the IO limitation by default, meanwhile, the occurrence of abnormal IO scenes can be obviously reduced, and the scheduling stability of the big data cluster is effectively improved.
Optionally, based on any one of the three embodiments, after the target abnormal node is determined from the multiple nodes, a priority queue and an timeout control mechanism may be used for storing the target abnormal node, that is, the target abnormal node is sequentially buffered in a second preset queue; if the cache space occupied by the second preset queue is larger than or equal to a second preset cache threshold, removing target abnormal nodes stored in the second preset queue in advance according to the time sequence of queue entry; and/or removing abnormal nodes corresponding to the storage time which is greater than or equal to the preset storage period.
Illustratively, the target abnormal nodes are sorted by enqueue time: each time a node is determined to be a target abnormal node, it is enqueued. Meanwhile, the buffer size of the queue is controlled; when the buffer is full, the node at the head of the queue is automatically removed, and timed-out nodes are removed periodically.
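As a non-limiting illustration, the bounded buffer with timeout control described above may be sketched as follows; the capacity and storage period are assumptions of this sketch.

```java
// Hedged sketch: target abnormal nodes are stored in a bounded FIFO buffer ordered by
// enqueue time; when the buffer is full the oldest node is removed, and nodes whose
// stay exceeds the preset storage period are removed periodically.
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayDeque;
import java.util.Deque;

public class AbnormalNodeBufferSketch {

    private record Entry(String node, Instant enqueueTime) { }

    private final Deque<Entry> buffer = new ArrayDeque<>();
    private final int capacity;
    private final Duration storagePeriod;

    public AbnormalNodeBufferSketch(int capacity, Duration storagePeriod) {
        this.capacity = capacity;
        this.storagePeriod = storagePeriod;
    }

    /** Adds a node marked as target abnormal; evicts the oldest entry if the buffer is full. */
    public void add(String node) {
        if (buffer.size() >= capacity) {
            buffer.pollFirst();                               // remove the node at the head of the queue
        }
        buffer.addLast(new Entry(node, Instant.now()));
    }

    /** Periodically removes entries that have been stored longer than the storage period. */
    public void evictExpired() {
        Instant cutoff = Instant.now().minus(storagePeriod);
        buffer.removeIf(e -> e.enqueueTime().isBefore(cutoff));
    }

    public static void main(String[] args) {
        AbnormalNodeBufferSketch sketch = new AbnormalNodeBufferSketch(2, Duration.ofHours(6));
        sketch.add("N2");
        sketch.add("N6");
        sketch.add("N7");          // buffer full: N2 (oldest) is evicted
        sketch.evictExpired();
        System.out.println(sketch.buffer.size());   // 2
    }
}
```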
It should be understood that the storage manner can effectively control the number of the target abnormal nodes so as to avoid that the computing resources are seriously unavailable due to the excessive number of the abnormal nodes.
Optionally, a node restart may be considered a valid operation and maintenance operation, and may automatically remove the node from the target abnormal node. The target abnormal node is also written into a fault-tolerant recovery mechanism (checkpoint) to ensure that the resource manager is not lost after restarting.
Optionally, a "yarn rmadmin" operation and maintenance command may be added to the present application to support human intervention (adding or removing) of the target abnormal node list.
Based on the foregoing embodiment, fig. 6 is a schematic structural diagram of a node management apparatus 600 in a data cluster according to an embodiment of the present application, where the apparatus 600 includes: an acquisition module 601, a determination module 602, and a scheduling module 603.
The obtaining module 601 is configured to obtain at least one abnormal node information sent by at least one application manager and at least one task running information sent by at least one node manager, where a node indicated by the at least one abnormal node information is a node in a plurality of nodes connected to the at least one node manager, and the at least one task running information includes running information of a target task that fails to run on any node in the plurality of nodes; a determining module 602, configured to determine a target abnormal node from the multiple nodes according to the at least one abnormal node information and the at least one task running information; the scheduling module 603 is configured to perform task scheduling on a normal node in the plurality of nodes, where the normal node is a node excluding some or all of the target abnormal nodes in the plurality of nodes.
In some embodiments, the determining module 602 is specifically configured to determine, according to the at least one exception node information, a global exception node and at least one application exception node corresponding to at least one application from the multiple nodes, where the resource manager does not perform any task scheduling on the global exception node, the resource manager does not schedule a task corresponding to a target application on the target application exception node, the target application exception node corresponds to the target application, and the target application exception node is one of the at least one application exception node; and determining at least one application program abnormal task node from the plurality of nodes according to the at least one task running information, wherein the target application program abnormal task node corresponds to a target task in a target application program respectively, the resource manager does not schedule the target task on the target application program abnormal task node, and the target application program abnormal task node is one of the at least one application program abnormal task node.
In some embodiments, each task running information includes task information of running failure, where the task information is used to indicate a failure reason of the task of running failure, and the determining module 602 is further configured to remove a target failure task from the at least one task running information, and obtain at least one updated task running information, where the target failure task is a task of running failure caused by a physical resource of a non-node indicated by the failure reason; and determining the abnormal task node of the at least one application program according to the at least one piece of updated task running information.
In some embodiments, the obtaining module 601 is further configured to obtain at least one node health score sent by each node manager, where each node health score is obtained by the node manager calculating resource usage of each normal node connected to the node manager when executing a task according to a preset health monitoring index, and the preset health monitoring index includes a central processing unit usage, a disk usage, a memory usage, and a network usage.
In some embodiments, the scheduling module 603 is specifically configured to perform task scheduling on a normal node in the plurality of nodes according to the node health score.
In some embodiments, the scheduling module 603 is further configured to, for each node, release a preset task running on the node if the node health score is greater than or equal to a first preset threshold; if the node health score is larger than or equal to a second preset threshold and smaller than the first preset threshold, stopping continuously scheduling a new task to the node, wherein the first preset threshold is larger than the second preset threshold; and if the node health score is smaller than the second preset threshold value, maintaining the scheduling mode of the node.
In some embodiments, the apparatus further comprises: the cache module is used for caching the normal nodes into a first preset queue according to the node health scores, wherein the normal nodes in the first preset queue are sorted from low to high according to the node health scores corresponding to the normal nodes; and the removing module is used for sequentially removing nodes with low node health scores according to the node health scores corresponding to the normal nodes if the cache space occupied by the first preset queue is larger than or equal to a first preset cache threshold value.
In some embodiments, the apparatus further comprises: the monitoring module is used for monitoring the use information of the input/output IO resources of all tasks corresponding to each node manager in the at least one node manager; and the processing module is used for suppressing the IO resources used by the first task if the use value of the IO resources of the first task in the use information of the IO resources in all the tasks is greater than or equal to a preset use threshold value.
In some embodiments, the cache module is further configured to cache the target abnormal nodes sequentially into a second preset queue; the removal module is further configured to remove the target abnormal nodes stored earliest in the second preset queue, in order of queue entry time, if the cache space occupied by the second preset queue is greater than or equal to a second preset cache threshold; and/or to remove abnormal nodes whose storage time is greater than or equal to a preset storage period.
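A sketch of the second preset queue, combining first-in-first-out eviction when the cache limit is reached with expiry of entries whose storage time reaches the preset storage period; the capacity and retention values, and all names, are assumptions for illustration.

```python
# Hypothetical abnormal-node cache: FIFO eviction when full, plus expiry of stale entries.
import time
from collections import OrderedDict

class AbnormalNodeCache:
    def __init__(self, capacity, retention_seconds):
        self.capacity = capacity
        self.retention = retention_seconds
        self.entries = OrderedDict()                  # node_id -> enqueue time

    def add(self, node_id):
        self.entries[node_id] = time.time()
        while len(self.entries) > self.capacity:
            self.entries.popitem(last=False)          # drop the earliest-queued node

    def expire(self):
        now = time.time()
        for node_id in [n for n, t in self.entries.items()
                        if now - t >= self.retention]:
            del self.entries[node_id]                 # drop entries past the retention period
```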
It should be appreciated that the apparatus 600 herein is embodied in the form of functional modules. The term module may refer to an application-specific integrated circuit (ASIC), an electronic circuit, a processor (for example, a shared, dedicated, or group processor) and memory that execute one or more software or firmware programs, a combinational logic circuit, and/or other suitable components that support the described functionality. In an optional example, those skilled in the art will understand that the apparatus 600 may be embodied as the resource manager in the foregoing embodiments, or the functions of the resource manager in the foregoing embodiments may be integrated into the apparatus 600; the apparatus 600 may be configured to execute each procedure and/or step corresponding to the resource manager in the foregoing method embodiments, which are not repeated here.
The apparatus 600 has the function of implementing the corresponding steps executed by the resource manager in the above method; the functions may be implemented by hardware, or by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the functions described above.
Fig. 7 is a schematic structural diagram of a node management apparatus 700 in a data cluster according to another embodiment of the present application, where the apparatus 700 includes: an acquiring module 701 and a sending module 702.
The acquiring module 701 is configured to acquire at least one task running information, where the at least one task running information includes running information of a target task that fails to run on any one of a plurality of nodes connected to the node manager; the sending module 702 is configured to send the at least one task running information to the resource manager.
In some embodiments, the apparatus further comprises: a computing module, configured to calculate, according to a preset health monitoring index, the resource usage of each normal node connected to the node manager when executing tasks, to obtain a node health score corresponding to each normal node, where the preset health monitoring index includes central processing unit (CPU) usage, disk occupancy, memory occupancy, and network condition; and the sending module 702 is further configured to send the node health score corresponding to each normal node to the resource manager.
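A sketch of the node manager's reporting side, assuming a simple periodic loop; the metric collector, scorer, transport callable, and reporting interval are placeholders supplied by the surrounding system and are not defined in the present application.

```python
# Hypothetical node-manager reporting loop: score each connected normal node and
# send the scores to the resource manager at a fixed interval.
import time

def report_health_scores(normal_nodes, collect_metrics, scorer,
                         send_to_resource_manager, interval_seconds=30):
    """collect_metrics(node_id) -> dict of resource usage; scorer(dict) -> float."""
    while True:
        scores = {node_id: scorer(collect_metrics(node_id)) for node_id in normal_nodes}
        send_to_resource_manager(scores)
        time.sleep(interval_seconds)
```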
In some embodiments, the apparatus further comprises: a monitoring module, configured to monitor input/output (IO) resource usage information of all tasks corresponding to the node manager; and a processing module, configured to suppress the IO resources used by a first task if, in the IO resource usage information of all the tasks, the IO resource usage value of the first task is greater than or equal to a preset usage threshold.
In some embodiments, each task running information includes task information of a failed run, and the task information is used for indicating the failure reason of the failed task.
It should be appreciated that the apparatus 700 herein is embodied in the form of functional modules. The term module may refer to an application-specific integrated circuit (ASIC), an electronic circuit, a processor (for example, a shared, dedicated, or group processor) and memory that execute one or more software or firmware programs, a combinational logic circuit, and/or other suitable components that support the described functionality. In an optional example, those skilled in the art will understand that the apparatus 700 may be the node manager in the foregoing embodiments, or the functions of the node manager in the foregoing embodiments may be integrated into the apparatus 700; the apparatus 700 may be configured to execute each process and/or step corresponding to the node manager in the foregoing method embodiments, which are not repeated here.
The apparatus 700 has the function of implementing the corresponding steps executed by the node manager in the above method; the functions may be implemented by hardware, or by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the functions described above.
Fig. 8 is a schematic structural diagram of an apparatus according to another embodiment of the present application. The apparatus shown in fig. 8 may be used to perform the method of any of the previous embodiments.
As shown in fig. 8, the apparatus 800 of the present embodiment includes: memory 801, processor 802, communication interface 803, and bus 804. The memory 801, the processor 802, and the communication interface 803 are communicatively connected to each other via a bus 804.
The memory 801 may be a Read Only Memory (ROM), a static memory device, a dynamic memory device, or a Random Access Memory (RAM). The memory 801 may store a program, and when the program stored in the memory 801 is executed by the processor 802, the processor 802 is configured to perform the respective steps corresponding to the resource manager or the node manager in the method shown in the above-described embodiment.
The processor 802 may be a general-purpose Central Processing Unit (CPU), a microprocessor, an Application Specific Integrated Circuit (ASIC), or one or more integrated circuits, and is configured to execute related programs to implement the methods shown in the embodiments of the present application.
The processor 802 may also be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the method of the embodiment of the present application may be implemented by integrated logic circuits of hardware or instructions in the form of software in the processor 802.
The processor 802 may also be a general-purpose processor, a digital signal processor (DSP), an ASIC, a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components. The processor may implement or perform the various methods, steps, and logic blocks disclosed in the embodiments of the present application. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
The steps of the method disclosed in connection with the embodiments of the present application may be directly implemented by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium well known in the art, such as a random access memory, a flash memory, a read-only memory, a programmable or electrically erasable programmable read-only memory, or a register. The storage medium is located in the memory 801; the processor 802 reads the information in the memory 801 and, in conjunction with its hardware, performs the functions of the units included in the apparatus of the present application.
The communication interface 803 may use a transceiver apparatus, such as but not limited to a transceiver, to enable communication between the apparatus 800 and other devices or communication networks.
The bus 804 may include a pathway to transfer information between various components of the apparatus 800 (e.g., memory 801, processor 802, communication interface 803).
It should be understood that the apparatus 800 shown in the embodiment of the present application may be an electronic device, or may also be a chip configured in the electronic device.
It should be understood that, in the various embodiments of the present application, the sequence numbers of the above-mentioned processes do not mean the execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present application.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one type of logical functional division, and other divisions may be realized in practice, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
If the functions are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application, or the part thereof that substantially contributes to the prior art, may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
The above description is only a specific implementation of the present application, but the protection scope of the present application is not limited thereto. Any change or substitution that a person skilled in the art can readily conceive of within the technical scope disclosed in the present application shall fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (18)

1. A node management method in a data cluster, applied to a resource manager, the method comprising the following steps:
acquiring at least one abnormal node information sent by at least one application program manager and at least one task running information sent by at least one node manager, wherein a node indicated by the at least one abnormal node information is a node in a plurality of nodes connected with the at least one node manager, and the at least one task running information comprises running information of a target task which fails to run at any one of the plurality of nodes;
determining a target abnormal node from the plurality of nodes according to the at least one abnormal node information and the at least one task running information;
and performing task scheduling on normal nodes in the plurality of nodes, wherein the normal nodes are nodes in the plurality of nodes other than part or all of the target abnormal nodes.
2. The method of claim 1, wherein the determining a target abnormal node from the plurality of nodes according to the at least one abnormal node information and the at least one task running information comprises:
determining a global abnormal node and at least one application program abnormal node respectively corresponding to at least one application program from the plurality of nodes according to the at least one abnormal node information, wherein the resource manager does not perform any task scheduling on the global abnormal node, the resource manager does not schedule a task corresponding to the target application program on a target application program abnormal node, the target application program abnormal node corresponds to the target application program, and the target application program abnormal node is one of the at least one application program abnormal node;
and determining at least one application program abnormal task node from the plurality of nodes according to the at least one task running information, wherein the target application program abnormal task node corresponds to a target task in a target application program respectively, the resource manager does not schedule the target task on the target application program abnormal task node, and the target application program abnormal task node is one of the at least one application program abnormal task node.
3. The method according to claim 2, wherein each task running information includes task information of a failed run, the task information is used for indicating the failure reason of the failed task, and the determining at least one application program abnormal task node from the plurality of nodes according to the at least one task running information comprises:
removing a target failure task from the at least one task running information to obtain at least one updated task running information, wherein the target failure task is a failed task whose failure reason indicates that the running failure was not caused by physical resources of the node;
and determining the abnormal task node of the at least one application program according to the at least one piece of updated task operation information.
4. The method of claim 1, wherein after the determining a target abnormal node, the method further comprises:
the method comprises the steps of obtaining at least one node health score sent by each node manager, wherein each node health score is obtained by calculating the resource use condition of each normal node connected with the node manager when executing a task according to a preset health monitoring index, and the preset health monitoring index comprises the use condition of a central processing unit, the occupation condition of a disk, the occupation condition of a memory and the network condition.
5. The method of claim 4, wherein the performing task scheduling on normal nodes in the plurality of nodes comprises:
and performing task scheduling on normal nodes in the plurality of nodes according to the node health scores.
6. The method of claim 5, wherein the performing task scheduling on normal nodes in the plurality of nodes according to the node health scores comprises:
for each node, if the node health score is greater than or equal to a first preset threshold value, releasing a preset task running on the node;
if the node health score is greater than or equal to a second preset threshold and smaller than the first preset threshold, stopping scheduling new tasks to the node, wherein the first preset threshold is greater than the second preset threshold;
and if the node health score is smaller than the second preset threshold, maintaining the scheduling mode of the node.
7. The method of claim 6, further comprising:
caching the normal nodes into a first preset queue according to the node health scores, wherein the normal nodes in the first preset queue are sorted from low to high according to the node health scores corresponding to the normal nodes;
and if the cache space occupied by the first preset queue is greater than or equal to a first preset cache threshold, removing the nodes with the lowest node health scores in turn, according to the node health scores corresponding to the normal nodes.
8. The method of claim 7, further comprising:
monitoring input/output (IO) resource usage information of all tasks corresponding to each node manager in the at least one node manager;
and if, in the IO resource usage information of all the tasks, the IO resource usage value of a first task is greater than or equal to a preset usage threshold, performing suppression processing on the IO resources used by the first task.
9. The method according to any one of claims 1 to 8, further comprising:
sequentially caching the target abnormal nodes into a second preset queue;
if the cache space occupied by the second preset queue is greater than or equal to a second preset cache threshold, removing the target abnormal nodes stored earliest in the second preset queue according to the time sequence of queue entry; and/or,
removing the abnormal nodes whose storage time is greater than or equal to the preset storage period.
10. A node management method in a data cluster, applied to a node manager, the method comprising the following steps:
acquiring at least one task running information, wherein the at least one task running information comprises running information of a target task which fails to run at any one of a plurality of nodes connected with the node manager;
and sending the at least one task running information to a resource manager.
11. The method of claim 10, further comprising:
calculating, according to a preset health monitoring index, the resource usage of each normal node connected with the node manager when executing tasks, to obtain a node health score corresponding to each normal node, wherein the preset health monitoring index comprises central processing unit usage, disk occupancy, memory occupancy, and network condition;
and sending the node health score corresponding to each normal node to the resource manager.
12. The method of claim 11, further comprising:
monitoring input/output (IO) resource usage information of all tasks corresponding to the node manager;
and if, in the IO resource usage information of all the tasks, the IO resource usage value of a first task is greater than or equal to a preset usage threshold, performing suppression processing on the IO resources used by the first task.
13. The method according to any one of claims 10 to 12, wherein each task running information includes task information of a failed run, and the task information is used for indicating the failure reason of the failed task.
14. An apparatus for managing nodes in a data cluster, the apparatus being applied to a resource manager, the apparatus comprising:
an obtaining module, configured to obtain at least one abnormal node information sent by at least one application manager and at least one task running information sent by at least one node manager, where a node indicated by the at least one abnormal node information is a node in a plurality of nodes connected to the at least one node manager, and the at least one task running information includes running information of a target task that has failed to run on any node in the plurality of nodes;
the determining module is used for determining a target abnormal node from the plurality of nodes according to the at least one abnormal node information and the at least one task running information;
and the scheduling module is configured to perform task scheduling on normal nodes in the plurality of nodes, wherein the normal nodes are nodes in the plurality of nodes other than part or all of the target abnormal nodes.
15. A node management device in a data cluster, applied to a node manager, the device comprising:
an obtaining module, configured to obtain at least one task running information, where the at least one task running information includes running information of a target task that fails to run at any one of a plurality of nodes connected to the node manager;
and the sending module is used for sending the at least one task running information to the resource manager.
16. An apparatus for node management in a data cluster, comprising a processor and a memory, the memory configured to store code instructions; the processor is configured to execute the code instructions to perform the method of any of claims 1 to 13.
17. A computer-readable storage medium for storing a computer program comprising instructions for implementing the method of any one of claims 1 to 9 or 10 to 13.
18. A computer program product comprising computer program instructions which, when run on a computer, cause the computer to carry out the method of any one of claims 1 to 9 or 10 to 13.
CN202211138811.9A 2022-09-19 2022-09-19 Node management method and device in data cluster and storage medium Pending CN115422010A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211138811.9A CN115422010A (en) 2022-09-19 2022-09-19 Node management method and device in data cluster and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211138811.9A CN115422010A (en) 2022-09-19 2022-09-19 Node management method and device in data cluster and storage medium

Publications (1)

Publication Number Publication Date
CN115422010A true CN115422010A (en) 2022-12-02

Family

ID=84203933

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211138811.9A Pending CN115422010A (en) 2022-09-19 2022-09-19 Node management method and device in data cluster and storage medium

Country Status (1)

Country Link
CN (1) CN115422010A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115994044A (en) * 2023-01-09 2023-04-21 苏州浪潮智能科技有限公司 Database fault processing method and device based on monitoring service and distributed cluster

Similar Documents

Publication Publication Date Title
US8417999B2 (en) Memory management techniques selectively using mitigations to reduce errors
US9307048B2 (en) System and method for proactive task scheduling of a copy of outlier task in a computing environment
EP0259224B1 (en) Method for performance evaluation of a data processor system
US10831387B1 (en) Snapshot reservations in a distributed storage system
US10191771B2 (en) System and method for resource management
US8132170B2 (en) Call stack sampling in a data processing system
US9495201B2 (en) Management of bottlenecks in database systems
US20070169125A1 (en) Task scheduling policy for limited memory systems
US20200034048A1 (en) Pulsed leader consensus management
US8914582B1 (en) Systems and methods for pinning content in cache
US20200042392A1 (en) Implementing Affinity And Anti-Affinity Constraints In A Bundled Application
US11972301B2 (en) Allocating computing resources for deferrable virtual machines
CN114328102A (en) Equipment state monitoring method, device, equipment and computer readable storage medium
Yang et al. Performance-aware speculative resource oversubscription for large-scale clusters
US7949903B2 (en) Memory management techniques selectively using mitigations to reduce errors
US9128754B2 (en) Resource starvation management in a computer system
CN115422010A (en) Node management method and device in data cluster and storage medium
US8140892B2 (en) Configuration of memory management techniques selectively using mitigations to reduce errors
CN113590285A (en) Method, system and equipment for dynamically setting thread pool parameters
US9021499B2 (en) Moving a logical device between processor modules in response to identifying a varying load pattern
CN116127494A (en) Control method and related device for concurrent access of users
US20130046910A1 (en) Method for managing a processor, lock contention management apparatus, and computer system
CN117593172B (en) Process management method, device, medium and equipment
CN108595625B (en) Operation and maintenance method and device of BI tool, computer device and storage medium
US10897390B2 (en) Takeover method of process, cluster construction program and cluster construction apparatus

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination