CN111930493A - NodeManager state management method and device in cluster and computing equipment - Google Patents

Publication number
CN111930493A
Authority
CN
China
Prior art keywords
cluster
state
nodemanager
determining
node
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910394996.1A
Other languages
Chinese (zh)
Other versions
CN111930493B (en)
Inventor
李瑶
许佳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Mobile Communications Group Co Ltd
China Mobile Group Hubei Co Ltd
Original Assignee
China Mobile Communications Group Co Ltd
China Mobile Group Hubei Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Mobile Communications Group Co Ltd, China Mobile Group Hubei Co Ltd filed Critical China Mobile Communications Group Co Ltd
Priority to CN201910394996.1A priority Critical patent/CN111930493B/en
Publication of CN111930493A publication Critical patent/CN111930493A/en
Application granted granted Critical
Publication of CN111930493B publication Critical patent/CN111930493B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5083 Techniques for rebalancing the load in a distributed system
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5011 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals
    • G06F9/5016 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals the resource being the memory
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/505 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the load
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The embodiment of the invention relates to the technical field of distributed resource management and scheduling systems, and discloses a method, a device and a computing device for managing NodeManager states in a cluster. The method comprises the following steps: collecting network load information of a cluster, and evaluating the hardware state of the cluster according to the network load information; determining the health state of the nodes in the cluster according to the evaluation result; and when the state of a node is unhealthy, performing an offline operation on the NodeManager. In this way, the embodiment of the invention predicts NodeManager faults and takes the NodeManager offline automatically before they occur, ensures stable operation of the system, and avoids task failures caused by Container allocation failures when a node host is preempted by multiple applications.

Description

NodeManager state management method and device in cluster and computing equipment
Technical Field
The embodiment of the invention relates to the technical field of distributed resource management and scheduling systems, in particular to a method, a device and computing equipment for managing NodeManager states in a cluster.
Background
With the development of computer technology, various computing frameworks for data-intensive applications have emerged, such as MapReduce, Spark, S4, and Storm. When adopting a computing framework, users generally weigh factors such as resource utilization, operation and maintenance cost, and data sharing, and usually want to deploy all computing frameworks on a common cluster so that cluster resources are shared and used uniformly. This led to unified resource management and scheduling platforms, of which YARN (Yet Another Resource Negotiator) is a typical example.
YARN is divided into two roles, the ResourceManager (global resource manager, RM) and the NodeManager (node manager, NM). The ResourceManager is mainly responsible for global resource allocation and management, and the NodeManager is responsible for resource allocation and management on a single node. After receiving a task, the NodeManager can allocate an ApplicationMaster and Containers; when the host resources are not used exclusively by YARN, a resource application to the ResourceManager may fail.
In the prior art, YARN resource allocation treats only CPU and memory as computing resources, which are divided in advance through the yarn-site.xml configuration when the cluster starts. The ResourceManager and the NodeManager maintain their connection through heartbeats and cannot take the state of the network into account when allocating resources. In addition, Impala, which uses an MPP architecture, is also deployed on the hosts of the Hadoop cluster, but its resource allocation is not managed by YARN. When an MPP aggregation query is executed, a large amount of data accumulates in memory; if YARN then continues to request resources according to the configured memory and CPU, Container allocation fails and the task fails as a result. Instant queries occupy a large amount of memory but only for a short time, so reserving all of that memory would waste YARN resources. Therefore, this approach cannot accommodate the situation where a node host is preempted by multiple applications.
Disclosure of Invention
In view of the foregoing problems, embodiments of the present invention provide a method, an apparatus, and a computing device for managing NodeManager states in a cluster, which overcome the foregoing problems or at least partially solve them.
According to an aspect of an embodiment of the present invention, a method for managing NodeManager states in a cluster is provided, the method including:
collecting network load information of a cluster, and evaluating the hardware state of the cluster according to the network load information;
determining the health state of the nodes in the cluster according to the evaluation result;
and when the state of the node is unhealthy, performing an offline operation on the NodeManager.
In an optional manner, the collecting network load information of a cluster, and evaluating a hardware state of the cluster according to the network load information, further includes:
collecting network load information of a cluster;
and evaluating the network delay of the cluster according to the network load information, and evaluating the disk state of the cluster.
In an optional manner, when the host resource is not exclusive to YARN, the method further comprises:
evaluating the CPU utilization rate and the memory utilization rate;
the determining the health status of the nodes in the cluster according to the result of the evaluation further comprises:
and determining the health state of the nodes in the cluster according to the evaluation results of the network delay, the disk state, the CPU utilization rate and the memory utilization rate.
In an optional manner, when the host resource is YARN exclusive, the method further comprises:
and when the network delay exceeds a preset value, evaluating the network delay of the cluster by combining the historical network delay and the health state record of the corresponding node.
In an optional manner, the method further comprises:
reconfiguring CPU resources and memory resources;
when the state of the nodes in the cluster is determined to be healthy according to the evaluation of the hardware state of the cluster, modifying the parameters of the NodeManager configuration file to the reconfigured values;
and carrying out online operation on the NodeManager.
In an optional manner, the evaluating the network delay of the cluster according to the network load information further includes:
acquiring request queuing time and processing time of an RPC queue through a JMX interface monitored by JMX in Hadoop;
summing the request queuing times of all the nodes, then averaging to obtain a reference queue time, and taking the processing time of the first host as a reference processing time;
judging whether the network delay of the first host is greater than the reference queue time or not, or whether the network delay of the second host is greater than the reference processing time or not;
the determining the health status of the nodes in the cluster according to the result of the evaluation further comprises:
and when the network delay of the first host is larger than the reference queue time or the network delay of the second host is larger than the reference processing time, determining that the state of the node is unhealthy.
In an optional manner, the evaluating the disk state of the cluster further includes:
checking the running state of the disk through a script;
judging whether the magnetic disk reports errors or not;
the determining the health status of the nodes in the cluster according to the result of the evaluation further comprises:
and when a certain disk in the disks of the cluster reports an error, determining that the state of the node is unhealthy.
In an optional manner, the evaluating the CPU usage further includes:
calculating the total number N of CPU cores through a script, and determining the CPU utilization rate p of current non-YARN processes and the number M of CPU cores allocated by the NodeManager;
subtracting the product of N and (1-p) from M to obtain the evaluated value of the CPU utilization rate;
the determining the health status of the nodes in the cluster according to the result of the evaluation further comprises:
and when the evaluated value of the CPU utilization rate exceeds a preset CPU utilization rate threshold value, determining that the state of the node is unhealthy.
In an optional manner, the evaluating the memory usage further includes:
acquiring the total memory, the total memory allocated in the NodeManager and the use amount of the system process through the script;
judging whether the difference value between the total memory amount and the system process usage amount is larger than the total memory amount distributed in the NodeManager;
the determining the health status of the nodes in the cluster according to the result of the evaluation further comprises:
and when the difference value between the total memory amount and the system process usage amount is not greater than the total memory amount distributed in the NodeManager, determining that the state of the node is unhealthy.
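The memory check described above can be sketched as follows. This is a minimal illustration only; the function and parameter names are hypothetical, and all values are assumed to be expressed in the same unit (here megabytes).

```python
def memory_unhealthy(total_mb: int, system_used_mb: int, nm_allocated_mb: int) -> bool:
    """Return True when the node should be treated as unhealthy.

    The node is healthy only if the memory left after system processes
    (total - system usage) is greater than the memory allocated in the
    NodeManager; otherwise it is unhealthy.
    """
    available_mb = total_mb - system_used_mb
    return not (available_mb > nm_allocated_mb)


# 64 GB host, 40 GB used by system processes, 32 GB allocated in the NodeManager.
print(memory_unhealthy(65536, 40960, 32768))
```

Note that the boundary case, where the remaining memory exactly equals the NodeManager allocation, counts as unhealthy because the text requires the difference to be strictly greater.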
According to another aspect of the embodiments of the present invention, there is provided a NodeManager state management apparatus in a cluster, the apparatus including:
the evaluation module is used for collecting network load information of a cluster and evaluating the hardware state of the cluster according to the network load information;
the determining module is used for determining the health state of the nodes in the cluster according to the evaluation result;
and the management module is used for performing offline operation on the NodeManager when the state of the node is unhealthy.
According to another aspect of embodiments of the present invention, there is provided a computing device including: the system comprises a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface complete mutual communication through the communication bus;
the memory is used for storing at least one executable instruction, and the executable instruction enables the processor to execute the operation of the NodeManager state management method in the cluster.
According to another aspect of the embodiments of the present invention, there is provided a computer storage medium in which at least one executable instruction is stored, and the executable instruction causes a processor to execute the method for managing NodeManager states in a cluster as described above.
The embodiment of the invention automatically collects and evaluates the hardware state of the cluster, determines the health state of the nodes in the cluster according to the evaluation result, and carries out offline operation on the NodeManager when the state of the nodes is unhealthy, thereby realizing the prejudgment and automatic offline of the NodeManager before the fault and ensuring the stable operation of the system; meanwhile, the embodiment of the invention does not only evaluate the health state of the node according to the states of the memory and the CPU in the configuration, thereby avoiding the condition that the task fails due to the failure of Container allocation when a node host is preempted by a plurality of application programs.
The foregoing description is only an overview of the technical solutions of the embodiments of the present invention, and the embodiments of the present invention can be implemented according to the content of the description in order to make the technical means of the embodiments of the present invention more clearly understood, and the detailed description of the present invention is provided below in order to make the foregoing and other objects, features, and advantages of the embodiments of the present invention more clearly understandable.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:
FIG. 1 shows a flowchart of a NodeManager state management method in a cluster according to an embodiment of the present invention;
FIG. 2 is a flowchart illustrating a NodeManager state management method in a cluster according to another embodiment of the present invention;
FIG. 3 is a flowchart illustrating a NodeManager state management method in a cluster according to another embodiment of the present invention;
FIG. 4 is a flowchart illustrating a NodeManager state management method in a cluster according to yet another embodiment of the present invention;
FIG. 5 is a flowchart illustrating a method for managing NodeManager states in a cluster according to a specific application example in an embodiment of the present invention;
FIG. 6 is a schematic structural diagram of a NodeManager state management apparatus in a cluster according to an embodiment of the present invention;
FIG. 7 is a schematic structural diagram of a computing device provided by an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present invention will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the invention are shown in the drawings, it should be understood that the invention can be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the invention to those skilled in the art.
Fig. 1 shows a flowchart of a NodeManager status management method in a cluster, which is provided in an embodiment of the present invention and is applied to a computing device, for example, a server in a communication network, a management computer in a resource unified management and scheduling platform of a cluster, and the like. As shown in fig. 1, the method comprises the steps of:
Step 110: collecting network load information of a cluster, and evaluating the hardware state of the cluster according to the network load information.
In this step, the hardware state includes network delay, disk state, and the like. Generally, when the host resources are used exclusively by YARN, the network state is judged from the automatically collected network load information, and the disk state can be judged at the same time, so as to evaluate the hardware state of the cluster. This step further includes:
Step A1: collecting network load information of a cluster;
Step A2: evaluating the network delay of the cluster according to the network load information, and evaluating the disk state of the cluster.
Step 120: determining the health state of the nodes in the cluster according to the evaluation result.
Whether there is a risk of downtime is judged from the evaluation result; if so, the state of the nodes in the cluster is determined to be unhealthy and further processing is needed. The evaluation result may be whether the hardware state of the cluster meets a preset condition, in which case the state of the nodes is determined to be unhealthy when the condition is met. Alternatively, the evaluation result may be a score, in which case the state of the nodes is judged to be unhealthy when the score is greater than or less than a preset threshold. It is understood that the hardware state in step 110 may include one or more hardware states. If the evaluation of any one hardware state meets its preset condition, or its score is greater than or less than its preset threshold, the node state is determined to be unhealthy; the evaluation results of all the hardware states need not be considered together.
Step 130: when the state of the node is unhealthy, performing an offline operation on the NodeManager.
The NodeManager is taken offline according to the condition of the current node without affecting the service, which keeps the system running stably. It will be appreciated that when the condition recovers, the NodeManager may also be modified to suitable parameters and brought back online, as will be described in detail later.
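The flow of steps 110 to 130 can be sketched as follows. This is only an illustrative outline under stated assumptions: the `Evaluation` record, the function names, and the `take_offline` callback are hypothetical, and the actual offline operation is left to the caller.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Evaluation:
    item: str        # e.g. "network_delay" or "disk_state" (illustrative labels)
    unhealthy: bool  # result of evaluating this single hardware state


def node_is_healthy(evaluations: Iterable[Evaluation]) -> bool:
    """Step 120: a single unhealthy item is enough to mark the node unhealthy."""
    return not any(e.unhealthy for e in evaluations)


def manage_node(evaluations: Iterable[Evaluation],
                take_offline: Callable[[], None]) -> bool:
    """Step 130: trigger the offline operation when the node is unhealthy."""
    healthy = node_is_healthy(evaluations)
    if not healthy:
        take_offline()
    return healthy


results = [Evaluation("network_delay", False), Evaluation("disk_state", True)]
manage_node(results, lambda: print("taking NodeManager offline"))
```

Passing the offline operation in as a callback keeps the decision logic separate from however the NodeManager is actually stopped in a given deployment.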
Fig. 2 shows a flowchart of a method for managing NodeManager states in a cluster according to another embodiment of the present invention. This embodiment covers the case where the host resources are not used exclusively by YARN. As shown in fig. 2, the method comprises the steps of:
Step 110: collecting network load information of a cluster, and evaluating the hardware state of the cluster according to the network load information.
Step 210: when the host resources are not used exclusively by YARN, evaluating the CPU utilization rate and the memory utilization rate.
This step first judges whether the host resources belong exclusively to the YARN software processes.
Step 120: determining the health state of the nodes in the cluster according to the evaluation results of the network delay, the disk state, the CPU utilization rate and the memory utilization rate.
At this time, several items are evaluated. When the evaluation result of any one item meets its preset condition, or its score is greater than or less than its preset threshold, the node state can be determined to be unhealthy without considering the evaluation results of all the items. For example, the node state may be determined to be unhealthy solely because the network delay is greater than a preset threshold.
Step 130: when the state of the node is unhealthy, performing an offline operation on the NodeManager.
Step 110, step 120 and step 130 are the same as in the foregoing embodiments; refer to the detailed description there, which is not repeated here.
In this embodiment, when the host resources are not monopolized by YARN, the current utilization of CPU and memory resources is analyzed and the priorities of other applications are fully considered, so that the state of the current node is evaluated reasonably and the node is taken offline and recovered according to that state without affecting the service.
Fig. 3 is a flowchart illustrating a method for managing NodeManager states in a cluster according to another embodiment of the present invention. This embodiment covers the case where the host resources are used exclusively by YARN and the network delay is too large. As shown in fig. 3, the method comprises the steps of:
Step 110: collecting network load information of a cluster, and evaluating the hardware state of the cluster according to the network load information.
Wherein the hardware state includes a network delay.
Step 310: when the host resources are used exclusively by YARN and the network delay exceeds a preset value, evaluating the network delay of the cluster by combining the historical network delay and the health state records of the corresponding node.
In this step, when the host resources are used exclusively by YARN and the current network traffic is too large, other services may simply be occupying the bandwidth while the node itself is not unhealthy. Judging the node to be unhealthy and taking the NodeManager offline in that case would cause an unnecessary offline operation and reduce system operating efficiency. Historical information can therefore be consulted, including records of past network delays and whether the node was healthy at those times. If the network delay exceeds a certain preset value, the historical network delay and the corresponding node health state records are used to assist in evaluating the network delay of the cluster. If a certain percentage (e.g., 80%) of the historical records under similar network delay conditions show the node as healthy, the network delay may be determined to be normal.
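The history-assisted check in step 310 might look like the sketch below. The source does not define what counts as a "similar" network delay, so the `tolerance` window used here is an assumption, as are the function name and the record layout of `(delay in seconds, node was healthy)`.

```python
from typing import List, Tuple


def delay_is_normal(current_delay: float,
                    history: List[Tuple[float, bool]],
                    tolerance: float = 0.1,
                    healthy_ratio: float = 0.8) -> bool:
    """Treat a high delay as normal if the node was usually healthy at similar delays.

    `tolerance` (assumed) selects history records whose delay is within that many
    seconds of the current delay; `healthy_ratio` is the required share of healthy
    records (80% in the example from the text).
    """
    similar = [healthy for delay, healthy in history
               if abs(delay - current_delay) <= tolerance]
    if not similar:
        # No comparable history: do not override the unhealthy judgment.
        return False
    return sum(similar) / len(similar) >= healthy_ratio


history = [(1.0, True), (0.95, True), (1.0, True), (1.0, True), (1.0, False)]
print(delay_is_normal(1.0, history))
```

When no comparable records exist the sketch returns False, which leaves the original threshold-based judgment in force; other fallbacks are equally possible.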
Step 120: determining the health state of the nodes in the cluster according to the result of the evaluation of the network delay.
Step 130: when the state of the node is unhealthy, performing an offline operation on the NodeManager.
Step 110, step 120 and step 130 are the same as in the foregoing embodiments; refer to the detailed description there, which is not repeated here.
In this embodiment, when the host resources are monopolized by YARN and the current network traffic is too large, other services may be occupying the bandwidth; whether to issue the NodeManager offline command is then decided comprehensively with reference to historical traffic peaks, which avoids taking the node offline by mistake.
Fig. 4 is a flowchart illustrating a NodeManager state management method in a cluster according to yet another embodiment of the present invention. In this embodiment, after the NodeManager has been taken offline and the condition has recovered, the NodeManager is modified to suitable parameters and brought back online. As shown in fig. 4, the method comprises the steps of:
Step 110: collecting network load information of a cluster, and evaluating the hardware state of the cluster according to the network load information.
Step 120: determining the health state of the nodes in the cluster according to the evaluation result.
Step 130: when the state of the node is unhealthy, performing an offline operation on the NodeManager.
Step 440: reconfiguring the CPU resources and the memory resources.
This step achieves dynamic allocation and utilization of resources by programmatically modifying YARN configuration values.
The CPU resources may be reconfigured by:
1. obtaining the idle time of each CPU through a system stat command, and evaluating the overall CPU utilization;
2. calculating the CPU time excluding that used by the NodeManager;
3. obtaining the idle CPU ratio, and obtaining the number of CPU cores that should be allocated by combining it with the number N of physical CPU cores, wherein the idle CPU ratio is the proportion of idle time in the sum of the user, nice, system, and idle CPU times, and the number of CPU cores to be allocated is calculated as: Pf_1 + Pf_2 + Pf_3 + … + Pf_n, where Pf_1 is the idle ratio of CPU core 1, …, and Pf_n is the idle ratio of CPU core n.
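The per-core idle ratio and the sum Pf_1 + … + Pf_n can be sketched as below. This is a rough illustration that assumes the counters come from text in the Linux /proc/stat layout (per-core lines such as `cpu0 user nice system idle ...`); it does not implement item 2, the removal of CPU time used by the NodeManager, and the function names are hypothetical.

```python
from typing import List

# Illustrative sample in /proc/stat layout; the numbers are made up.
SAMPLE_STAT = """\
cpu  400 0 600 1000 0 0 0 0 0 0
cpu0 100 0 150 750 0 0 0 0 0 0
cpu1 300 0 450 250 0 0 0 0 0 0
intr 12345
"""


def idle_ratios(stat_text: str) -> List[float]:
    """Return idle / (user + nice + system + idle) for each per-core line."""
    ratios = []
    for line in stat_text.splitlines():
        fields = line.split()
        # Keep only per-core lines ("cpu0", "cpu1", ...); skip the aggregate "cpu" line.
        if not fields or not fields[0].startswith("cpu") or fields[0] == "cpu":
            continue
        user, nice, system, idle = (int(v) for v in fields[1:5])
        total = user + nice + system + idle
        ratios.append(idle / total if total else 0.0)
    return ratios


def cores_to_allocate(stat_text: str) -> float:
    """Sum of the per-core idle ratios, Pf_1 + Pf_2 + ... + Pf_n."""
    return sum(idle_ratios(stat_text))


print(cores_to_allocate(SAMPLE_STAT))
```

The sum is generally fractional; whether to round it down before writing it into the NodeManager configuration is a choice the source does not specify.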
The memory resources may be reconfigured by:
1. obtaining the -Xmx value M_x preset when the NodeManager Java process starts, where -Xmx specifies the maximum heap memory that the Java process may occupy;
2. obtaining the total memory M_t of the current system and the total memory M_u currently occupied by the system;
3. calculating the total memory M_s to be allocated as: M_s = M_t - (M_x + M_u)
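A small sketch of the memory calculation M_s = M_t - (M_x + M_u) follows. How the -Xmx value is obtained is not stated in the source; here it is assumed to be parsed from the JVM argument string of the NodeManager process, and the helper names are hypothetical. All amounts are in megabytes.

```python
import re


def parse_xmx_mb(jvm_args: str) -> int:
    """Extract the -Xmx heap limit from a JVM argument string, in megabytes."""
    match = re.search(r"-Xmx(\d+)([kKmMgG]?)", jvm_args)
    if match is None:
        raise ValueError("no -Xmx option found")
    value, unit = int(match.group(1)), match.group(2).lower()
    # -Xmx is given in bytes unless suffixed with k, m, or g.
    factor = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}[unit]
    return value * factor // 1024 ** 2


def memory_to_allocate_mb(total_mb: int, xmx_mb: int, used_mb: int) -> int:
    """M_s = M_t - (M_x + M_u)."""
    return total_mb - (xmx_mb + used_mb)


xmx = parse_xmx_mb("-Xms1g -Xmx4g")
print(memory_to_allocate_mb(65536, xmx, 20480))
```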
Step 450: when the state of the nodes in the cluster is determined to be healthy according to the evaluation of the hardware state of the cluster, modifying the parameters of the NodeManager configuration file to the reconfigured values.
Step 460: performing an online operation on the NodeManager.
The XML configuration of the node can be modified according to the current running state of the node after it goes offline, so that when the node comes online again, memory and CPU are allocated more flexibly.
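Writing the reconfigured values into the NodeManager configuration could be sketched as follows. The source does not name the properties it changes; this example assumes the standard YARN properties `yarn.nodemanager.resource.memory-mb` and `yarn.nodemanager.resource.cpu-vcores` and the usual Hadoop `<configuration><property>` layout. The function name is hypothetical, and restarting the NodeManager afterwards is outside the sketch.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def set_properties(xml_text: str, updates: Dict[str, object]) -> str:
    """Return the configuration XML with the given properties updated or appended."""
    root = ET.fromstring(xml_text)
    remaining = dict(updates)
    for prop in root.findall("property"):
        name = prop.findtext("name")
        if name in remaining:
            value = prop.find("value")
            if value is None:
                value = ET.SubElement(prop, "value")
            value.text = str(remaining.pop(name))
    # Properties not present in the file yet are appended.
    for name, new_value in remaining.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(new_value)
    return ET.tostring(root, encoding="unicode")


original = (
    "<configuration><property>"
    "<name>yarn.nodemanager.resource.memory-mb</name><value>8192</value>"
    "</property></configuration>"
)
print(set_properties(original, {
    "yarn.nodemanager.resource.memory-mb": 40960,
    "yarn.nodemanager.resource.cpu-vcores": 6,
}))
```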
In the following, how to evaluate the network latency of the cluster according to the network load information, how to evaluate the disk state of the cluster, how to evaluate the CPU utilization, and how to evaluate the memory utilization will be further described in detail in the above embodiments.
In some embodiments, in step A2, evaluating the network delay of the cluster according to the network load information includes the following steps:
Step A21: acquiring the request queuing time and processing time of the RPC queue through the JMX interface used for JMX monitoring in Hadoop;
Step A22: summing the request queuing times of all the nodes and averaging them to obtain a reference queue time, and taking the processing time of the first host as a reference processing time;
Step A23: judging whether the network delay of the first host is greater than the reference queue time, or whether the network delay of the second host is greater than the reference processing time;
At this time, determining the health state of the nodes in the cluster according to the evaluation result further includes:
when the network delay of the first host is greater than the reference queue time, or the network delay of the second host is greater than the reference processing time, determining that the state of the node is unhealthy.
Specifically, the network delay score of the first host and the network delay score of the second host can be calculated. The network delay score of the first host is the difference between the network delay of the first host and the reference queue time when the network delay of the first host is greater than the reference queue time, and 0 otherwise. The network delay score of the second host is the difference between the network delay of the second host and the reference processing time when the network delay of the second host is greater than the reference processing time, and 0 otherwise. The two scores are added to obtain the network delay score of the cluster. When the network delay score of the cluster is not 0, the state of the node is determined to be unhealthy.
In this embodiment, RpcQueueTimeAvgTime and RpcProcessingTimeAvgTime are collected through the JMX interface used for JMX monitoring in Hadoop, giving the request queuing time and processing time of the RPC queue and the average times when the cluster runs normally. The specific calculation may refer to the following formulas:
Reference queue time: Tq = (Tq_1 + Tq_2 + Tq_3 + … + Tq_N) / N
Reference processing time: Tp = (Tp_1 + Tp_1 + Tp_1 + … + Tp_1) / N
Current network delay time: ((t_1 - Tq) > 0 ? (t_1 - Tq) : 0) + ((t_2 - Tp) > 0 ? (t_2 - Tp) : 0)
where N is the number of nodes, t_1 is the network delay of the first host, and t_2 is the network delay of the second host.
When the network delay time exceeds a preset value (for example, 0.8s), the node state is determined to be unhealthy.
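The network delay evaluation can be sketched as follows. The example assumes the metrics are read from a JSON document with a top-level `beans` list, as returned by the JSON JMX endpoint of Hadoop daemons; the function names are hypothetical, the reference processing time is passed in by the caller, and all times are assumed to be in the same unit as the 0.8 s example threshold.

```python
import json
from typing import List, Tuple


def rpc_times(jmx_json: str) -> Tuple[float, float]:
    """Return (queue time, processing time) from the first bean exposing both metrics."""
    for bean in json.loads(jmx_json).get("beans", []):
        if "RpcQueueTimeAvgTime" in bean and "RpcProcessingTimeAvgTime" in bean:
            return (float(bean["RpcQueueTimeAvgTime"]),
                    float(bean["RpcProcessingTimeAvgTime"]))
    raise KeyError("no bean exposes the RPC queue and processing time metrics")


def reference_queue_time(queue_times: List[float]) -> float:
    """Tq: average of the request queuing times of all N nodes."""
    return sum(queue_times) / len(queue_times)


def network_delay_score(t1: float, t2: float, tq: float, tp: float) -> float:
    """Sum of the amounts by which t1 exceeds Tq and t2 exceeds Tp (0 if not exceeded)."""
    return max(t1 - tq, 0.0) + max(t2 - tp, 0.0)


def delay_unhealthy(score: float, limit: float = 0.8) -> bool:
    """The node is unhealthy when the delay score exceeds the preset value."""
    return score > limit


tq = reference_queue_time([0.25, 0.5, 0.75])
score = network_delay_score(t1=1.5, t2=0.25, tq=tq, tp=0.5)
print(score, delay_unhealthy(score))
```

Using `max(x, 0)` is equivalent to the conditional expression in the formula: each term contributes only when the measured delay is above its reference value.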
In some embodiments, in the step a2, the evaluating the disk status of the cluster includes the following steps:
step a21': checking the running state of the disk through a script;
step a22': judging whether the disk reports an error.
In this case, determining the health status of the nodes in the cluster according to the evaluation result further includes:
when any disk among the disks of the cluster reports an error, determining that the state of the node is unhealthy.
Specifically, when a disk reports no error, the disk state score of that disk is 100; when a disk reports an error, the disk state score of that disk is 0. When the state score of any disk is 0, the evaluated score of the disk state of the cluster is 0; when the state scores of all disks are 100, the evaluated score of the disk state of the cluster is 100. When the evaluated score of the disk state of the cluster is 0, the state of the node is determined to be unhealthy.
In this embodiment, the script may execute smartctl -H sdaN (a disk health check tool available on Linux) to check the running status of the disk. The score is 100 when the disk reports no error and 0 when the disk reports an error, and the final score of n disks is: 0 ∈ (D1, D2, D3, …, Dn) ? 0 : 100, where D1 is the score of disk 1, …, and Dn is the score of disk n. When the disk score is 0, the node status is determined to be unhealthy.
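A minimal Python sketch of the disk scoring rule is given below. The helper names are hypothetical. The optional `disk_reports_error` helper assumes that `smartctl` is installed and that a non-zero exit status can be treated as an error report; that mapping is an assumption of this sketch and is not stated in the embodiment.

```python
import subprocess


def disk_reports_error(device):
    """Run `smartctl -H <device>`; treating a non-zero exit status as an
    error report is an assumption made for this sketch."""
    result = subprocess.run(
        ["smartctl", "-H", device], capture_output=True, text=True
    )
    return result.returncode != 0


def disk_score(has_error):
    """Per-disk score: 0 when the disk reports an error, otherwise 100."""
    return 0 if has_error else 100


def cluster_disk_score(error_flags):
    """Final score of n disks: 0 if any per-disk score is 0, otherwise 100."""
    scores = [disk_score(flag) for flag in error_flags]
    return 0 if 0 in scores else 100
```

In use, `cluster_disk_score` would be fed one flag per disk, for instance the results of calling `disk_reports_error` on each device path.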
In some embodiments, in the step a2, when the host resource is not exclusively YARN, the step of evaluating the CPU utilization rate includes the following steps:
step B1: calculating the total number N of CPU cores through a script, and determining the CPU utilization p currently consumed by non-YARN processes and the number M of CPU cores allocated to the NodeManager;
step B2: subtracting the product of N and (1-p) from M to obtain the score of the CPU utilization evaluation.
In this case, determining the health status of the nodes in the cluster according to the result of the evaluation of the CPU utilization further comprises:
when the score of the CPU utilization evaluation exceeds a preset CPU utilization threshold, determining that the state of the node is unhealthy.
In this embodiment, the total number N of CPU cores can be calculated by reading /proc/stat (a file in which the Linux system reports current system usage statistics), the CPU utilization p currently consumed by non-YARN processes and the number M of cores allocated to the NodeManager are determined, and the CPU score is M - N × (1 - p). When the CPU score exceeds a preset value (e.g., 75% or 80%), the node status is determined to be unhealthy.
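Steps B1 and B2 can be illustrated with the following Python sketch. The function names are hypothetical, the text of /proc/stat is passed in as a string so the example stays self-contained, and the threshold is left as a caller-supplied parameter because the embodiment only gives example values.

```python
def count_cores(proc_stat_text):
    """Count per-core lines ("cpu0", "cpu1", ...) in the text of /proc/stat.
    The aggregate "cpu " line is skipped because no digit follows "cpu"."""
    return sum(
        1
        for line in proc_stat_text.splitlines()
        if line.startswith("cpu") and line[3:4].isdigit()
    )


def cpu_score(n, p, m):
    """CPU score M - N * (1 - p): cores allocated to the NodeManager minus
    the cores left over after non-YARN usage."""
    return m - n * (1 - p)


def cpu_unhealthy(score, threshold):
    """The node is unhealthy when the score exceeds the preset threshold."""
    return score > threshold
```

With N = 8 cores, non-YARN utilization p = 0.75, and M = 4 cores allocated to the NodeManager, only 2 cores remain available, so the score is 4 - 2 = 2.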
In some embodiments, in the step a2, when the host resource is not exclusively YARN, the evaluating the memory usage comprises the following steps:
step C1: acquiring the total memory amount, the total memory amount allocated to the NodeManager, and the usage amount of system processes through a script;
step C2: judging whether the difference between the total memory amount and the system process usage amount is greater than the total memory amount allocated to the NodeManager.
In this case, determining the health status of the nodes in the cluster according to the result of the evaluation of the memory utilization further comprises:
when the difference between the total memory amount and the system process usage amount is not greater than the total memory amount allocated to the NodeManager, determining that the state of the node is unhealthy.
Specifically, when the difference between the total memory amount and the system process usage amount is greater than the total memory amount allocated to the NodeManager, the score of the memory utilization evaluation is 100; otherwise, the score is 0. When the score of the memory utilization evaluation is 0, the state of the node is determined to be unhealthy.
In this embodiment, the total memory amem, the total memory nmem allocated to the NodeManager, and the system process usage smem are obtained from the /proc/meminfo file (a file in which the Linux system reports current memory usage). The memory score is then amem - smem > nmem ? 100 : 0. When the memory score is 0, the node state is determined to be unhealthy.
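The memory check of steps C1 and C2 can be sketched in Python as follows. The function names are hypothetical, and the sketch only parses the MemTotal field from the text of /proc/meminfo; how smem and nmem are obtained is left to the caller, since the embodiment does not spell it out.

```python
def mem_total_kb(meminfo_text):
    """Extract MemTotal (in kB) from the text of /proc/meminfo."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            return int(line.split()[1])
    raise ValueError("MemTotal not found")


def memory_score(amem, smem, nmem):
    """Score is 100 when amem - smem > nmem, otherwise 0.
    All three values must use the same unit."""
    return 100 if amem - smem > nmem else 0
```

For example, with amem = 16000000 kB, smem = 4000000 kB, and nmem = 8000000 kB, the remaining 12000000 kB exceeds the NodeManager allocation, so the score is 100.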
The embodiment of the present invention is described in further detail below by using a specific application example. Fig. 5 shows a flowchart of a NodeManager state management method in a cluster provided by a specific application example of the embodiment of the present invention. As shown in fig. 5, the method comprises the following steps:
Step 510: evaluating the network delay and the bad disk blocks of the cluster to obtain a network score and a disk score.
Step 520: judging whether the host resource is exclusive to YARN; if yes, go to step 540; otherwise, go to step 530.
Step 530: evaluating the CPU utilization rate and the memory utilization rate of the cluster to obtain a CPU score and a memory score.
Step 540: judging whether any one of the scores meets its respective preset condition; if yes, go to step 550; otherwise, go to step 560.
Step 550: taking the NodeManager offline and modifying the configuration.
Before this step is performed, the node status is set to unhealthy.
Step 560: continuing to operate.
Before this step is performed, the node status is set to healthy.
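The decision flow of steps 510 to 560 can be sketched in Python as follows. The `Scores` class, the `decide` function, and the default thresholds are hypothetical; in particular, the CPU threshold of 0.0 is an assumption of this sketch, while 0.8 is the example network delay value given earlier.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Scores:
    network_delay: float           # step 510: network score
    disk: int                      # step 510: 0 or 100
    cpu: Optional[float] = None    # step 530: only when not YARN-exclusive
    memory: Optional[int] = None   # step 530: 0 or 100


def decide(scores, yarn_exclusive, delay_threshold=0.8, cpu_threshold=0.0):
    """Return (node_state, action) following steps 520 to 560."""
    unhealthy = scores.network_delay > delay_threshold or scores.disk == 0
    if not yarn_exclusive:  # step 520 -> step 530
        if scores.cpu is None or scores.memory is None:
            raise ValueError("CPU and memory scores are required")
        unhealthy = (
            unhealthy or scores.cpu > cpu_threshold or scores.memory == 0
        )
    # Step 540: any score meeting its preset condition leads to step 550.
    return ("unhealthy", "offline") if unhealthy else ("healthy", "continue")
```

When the host is exclusive to YARN, the CPU and memory scores are ignored, which mirrors the jump from step 520 directly to step 540.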
In this embodiment, the real-time states of the network, disks, CPU, and memory of the host are comprehensively evaluated and scored. When any score meets its preset condition, the NodeManager role of the host is taken offline. When conditions recover so that the online condition is met (for example, a disk is repaired, memory occupancy drops, or network delay decreases), that is, when the state of the node in the cluster is determined to be healthy according to the evaluation of the hardware state of the cluster, the NodeManager parameters are modified to appropriate values and the NodeManager is brought back online.
The embodiment of the invention automatically collects and evaluates the hardware state of the cluster, determines the health state of the nodes in the cluster according to the evaluation result, and carries out offline operation on the NodeManager when the state of the nodes is unhealthy, thereby realizing the prejudgment and automatic offline of the NodeManager before the fault and ensuring the stable operation of the system; meanwhile, the embodiment of the invention does not only evaluate the health state of the node according to the states of the memory and the CPU in the configuration, thereby avoiding the condition that the task fails due to the failure of Container allocation when a node host is preempted by a plurality of application programs.
Fig. 6 shows a schematic structural diagram of a NodeManager state management apparatus in a cluster according to an embodiment of the present invention. As shown in fig. 6, the apparatus 600 includes: an evaluation module 610, a determination module 620, and a management module 630.
The evaluation module 610 is configured to collect network load information of a cluster, and evaluate a hardware state of the cluster according to the network load information; the determining module 620 is configured to determine the health status of the nodes in the cluster according to the evaluation result; the management module 630 is used to perform offline operation on the NodeManager when the state of the node is unhealthy.
In an optional manner, the evaluation module 610 is further configured to:
collecting network load information of a cluster;
and evaluating the network delay of the cluster according to the network load information, and evaluating the disk state of the cluster.
In an alternative manner, when the host resource is not exclusively YARN, the evaluation module 610 is further configured to:
evaluating the CPU utilization rate and the memory utilization rate;
the determining module 620 is further configured to:
and determining the health state of the nodes in the cluster according to the evaluation results of the network delay, the disk state, the CPU utilization rate and the memory utilization rate.
In an alternative manner, when the host resource is YARN exclusive, the evaluation module 610 is further configured to:
and when the network delay exceeds a preset value, evaluating the network delay of the cluster by combining the historical network delay and the health state record of the corresponding node.
In an optional manner, the apparatus further comprises:
a configuration module 640, configured to reconfigure CPU resources and memory resources;
a modifying module 650, configured to modify the parameters of the NodeManager configuration file to the reconfigured values when the state of the nodes in the cluster is determined to be healthy according to the evaluation of the hardware state of the cluster;
the management module 630 is further configured to perform an online operation on the NodeManager.
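The parameter modification performed before bringing the NodeManager back online can be sketched in Python as follows. This is only an illustration: it assumes the NodeManager configuration file is a Hadoop-style XML file such as yarn-site.xml, and it uses the standard YARN properties `yarn.nodemanager.resource.memory-mb` and `yarn.nodemanager.resource.cpu-vcores` as the parameters being reconfigured, which the embodiment does not name. The function name and sample values are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample of a NodeManager configuration file.
SAMPLE_CONFIG = """<configuration>
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>8192</value>
  </property>
  <property>
    <name>yarn.nodemanager.resource.cpu-vcores</name>
    <value>8</value>
  </property>
</configuration>"""


def set_properties(xml_text, new_values):
    """Return the configuration text with the named properties set to the
    reconfigured values; properties not listed are left unchanged."""
    root = ET.fromstring(xml_text)
    for prop in root.findall("property"):
        name = prop.findtext("name")
        if name in new_values:
            prop.find("value").text = str(new_values[name])
    return ET.tostring(root, encoding="unicode")
```

Calling `set_properties(SAMPLE_CONFIG, {"yarn.nodemanager.resource.cpu-vcores": 4})` yields a configuration in which only the core count changes. Writing the file back and starting the NodeManager are left out of the sketch.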
In an optional manner, the evaluation module 610 is further configured to:
acquiring the request queuing time and the processing time of the RPC queue through the JMX monitoring interface in Hadoop;
summing the request queuing times of all the nodes, then averaging to obtain a reference queue time, and taking the processing time of the first host as a reference processing time;
judging whether the network delay of the first host is greater than the reference queue time or not, or whether the network delay of the second host is greater than the reference processing time or not;
the determining module 620 is further configured to:
and when the network delay of the first host is larger than the reference queue time or the network delay of the second host is larger than the reference processing time, determining that the state of the node is unhealthy.
In an optional manner, the evaluation module 610 is further configured to:
checking the running state of the disk through a script;
judging whether the disk reports an error;
the determining module 620 is further configured to:
and when a certain disk in the disks of the cluster reports an error, determining that the state of the node is unhealthy.
In an optional manner, the evaluation module 610 is further configured to:
calculating the total core number N of the current CPU through a script, and determining the utilization rate p of the CPU used by the current non-YARN and the core number M of the CPU distributed by the NodeManager;
subtracting the product of N and (1-p) from M to obtain the evaluated value of the CPU utilization rate;
the determining module 620 is further configured to:
and when the evaluated value of the CPU utilization rate exceeds a preset CPU utilization rate threshold value, determining that the state of the node is unhealthy.
In an optional manner, the evaluation module 610 is further configured to:
acquiring the total memory, the total memory allocated in the NodeManager and the use amount of the system process through the script;
judging whether the difference value between the total memory amount and the system process usage amount is larger than the total memory amount distributed in the NodeManager;
the determining module 620 is further configured to:
and when the difference value between the total memory amount and the system process usage amount is not greater than the total memory amount distributed in the NodeManager, determining that the state of the node is unhealthy.
An embodiment of the present invention provides a computer storage medium, where at least one executable instruction is stored in the storage medium, and the executable instruction enables a processor to execute the NodeManager state management method in a cluster in any of the above method embodiments.
An embodiment of the present invention provides a computer program product, where the computer program product includes a computer program stored on a computer storage medium, where the computer program includes program instructions, and when the program instructions are executed by a computer, the computer is caused to execute the NodeManager state management method in a cluster in any of the above method embodiments.
Fig. 7 is a schematic structural diagram of a computing device according to an embodiment of the present invention, and a specific embodiment of the present invention does not limit a specific implementation of the computing device.
As shown in fig. 7, the computing device may include: a processor 702, a communication interface 704, a memory 706, and a communication bus 708.
The processor 702, the communication interface 704, and the memory 706 communicate with each other via the communication bus 708. The communication interface 704 is used for communicating with network elements of other devices, such as clients or other servers. The processor 702 is configured to execute the program 710, and may specifically execute the NodeManager state management method in a cluster in any of the method embodiments described above.
In particular, the program 710 may include program code that includes computer operating instructions.
The processor 702 may be a central processing unit CPU, or an Application Specific Integrated Circuit (ASIC), or one or more Integrated circuits configured to implement an embodiment of the present invention. The computing device includes one or more processors, which may be the same type of processor, such as one or more CPUs; or may be different types of processors such as one or more CPUs and one or more ASICs.
The memory 706 stores a program 710. The memory 706 may comprise high-speed RAM, and may also include non-volatile memory, such as at least one disk memory.
The algorithms or displays presented herein are not inherently related to any particular computer, virtual system, or other apparatus. Various general purpose systems may also be used with the teachings herein. The required structure for constructing such a system will be apparent from the description above. In addition, embodiments of the present invention are not directed to any particular programming language. It is appreciated that a variety of programming languages may be used to implement the teachings of the present invention as described herein, and any descriptions of specific languages are provided above to disclose the best mode of the invention.
In the description provided herein, numerous specific details are set forth. It is understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the invention, various features of the embodiments of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the invention and aiding in the understanding of one or more of the various inventive aspects. However, the disclosed method should not be interpreted as reflecting an intention that: that the invention as claimed requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this invention.
Those skilled in the art will appreciate that the modules in the device in an embodiment may be adaptively changed and disposed in one or more devices different from the embodiment. The modules or units or components of the embodiments may be combined into one module or unit or component, and furthermore they may be divided into a plurality of sub-modules or sub-units or sub-components. All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or elements of any method or apparatus so disclosed, may be combined in any combination, except combinations where at least some of such features and/or processes or elements are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
Furthermore, those skilled in the art will appreciate that while some embodiments herein include some features included in other embodiments, rather than other features, combinations of features of different embodiments are meant to be within the scope of the invention and form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.
It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The usage of the words first, second and third, etcetera do not indicate any ordering. These words may be interpreted as names. The steps in the above embodiments should not be construed as limiting the order of execution unless specified otherwise.

Claims (12)

1. A NodeManager state management method in a cluster is characterized by comprising the following steps:
collecting network load information of a cluster, and evaluating the hardware state of the cluster according to the network load information;
determining the health state of the nodes in the cluster according to the evaluation result;
and when the state of the node is unhealthy, performing an offline operation on the NodeManager.
2. The method of claim 1, wherein collecting network load information for a cluster, evaluating a hardware state of the cluster based on the network load information, further comprises:
collecting network load information of a cluster;
and evaluating the network delay of the cluster according to the network load information, and evaluating the disk state of the cluster.
3. The method of claim 2, wherein when the host resource is not exclusively YARN, the method further comprises:
evaluating the CPU utilization rate and the memory utilization rate;
the determining the health status of the nodes in the cluster according to the result of the evaluation further comprises:
and determining the health state of the nodes in the cluster according to the evaluation results of the network delay, the disk state, the CPU utilization rate and the memory utilization rate.
4. The method of claim 2, wherein when the host resource is YARN exclusive, the method further comprises:
and when the network delay exceeds a preset value, evaluating the network delay of the cluster by combining the historical network delay and the health state record of the corresponding node.
5. The method according to any one of claims 1-4, further comprising:
reconfiguring CPU resources and memory resources;
when the state of the nodes in the cluster is determined to be healthy according to the evaluation of the hardware state of the cluster, modifying the parameters of the NodeManager configuration file to the reconfigured values;
and carrying out online operation on the NodeManager.
6. The method of claim 2, wherein the evaluating network latency of the cluster based on the network load information further comprises:
acquiring the request queuing time and the processing time of the RPC queue through the JMX monitoring interface in Hadoop;
summing the request queuing times of all the nodes, then averaging to obtain a reference queue time, and taking the processing time of the first host as a reference processing time;
judging whether the network delay of the first host is greater than the reference queue time or not, or whether the network delay of the second host is greater than the reference processing time or not;
the determining the health status of the nodes in the cluster according to the result of the evaluation further comprises:
and when the network delay of the first host is larger than the reference queue time or the network delay of the second host is larger than the reference processing time, determining that the state of the node is unhealthy.
7. The method of claim 2, wherein the evaluating disk status of the cluster further comprises:
checking the running state of the disk through a script;
judging whether the disk reports an error;
the determining the health status of the nodes in the cluster according to the result of the evaluation further comprises:
and when a certain disk in the disks of the cluster reports an error, determining that the state of the node is unhealthy.
8. The method of claim 3, wherein the evaluating CPU usage further comprises:
calculating the total core number N of the current CPU through a script, and determining the utilization rate p of the CPU used by the current non-YARN and the core number M of the CPU distributed by the NodeManager;
subtracting the product of N and (1-p) from M to obtain the evaluated value of the CPU utilization rate;
the determining the health status of the nodes in the cluster according to the result of the evaluation further comprises:
and when the evaluated value of the CPU utilization rate exceeds a preset CPU utilization rate threshold value, determining that the state of the node is unhealthy.
9. The method of claim 3, wherein the evaluating memory usage further comprises:
acquiring the total memory, the total memory allocated in the NodeManager and the use amount of the system process through the script;
judging whether the difference value between the total memory amount and the system process usage amount is larger than the total memory amount distributed in the NodeManager;
the determining the health status of the nodes in the cluster according to the result of the evaluation further comprises:
and when the difference value between the total memory amount and the system process usage amount is not greater than the total memory amount distributed in the NodeManager, determining that the state of the node is unhealthy.
10. An apparatus for managing NodeManager status in a cluster, the apparatus comprising:
the evaluation module is used for collecting network load information of a cluster and evaluating the hardware state of the cluster according to the network load information;
the determining module is used for determining the health state of the nodes in the cluster according to the evaluation result;
and the management module is used for performing offline operation on the NodeManager when the state of the node is unhealthy.
11. A computing device, comprising: the system comprises a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface complete mutual communication through the communication bus;
the memory is configured to store at least one executable instruction, which causes the processor to perform the operations of the NodeManager state management method in a cluster according to any of claims 1 to 9.
12. A computer storage medium having stored therein at least one executable instruction for causing a processor to perform the method of NodeManager state management in a cluster according to any of claims 1-9.
CN201910394996.1A 2019-05-13 2019-05-13 NodeManager state management method and device in cluster and computing equipment Active CN111930493B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910394996.1A CN111930493B (en) 2019-05-13 2019-05-13 NodeManager state management method and device in cluster and computing equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910394996.1A CN111930493B (en) 2019-05-13 2019-05-13 NodeManager state management method and device in cluster and computing equipment

Publications (2)

Publication Number Publication Date
CN111930493A true CN111930493A (en) 2020-11-13
CN111930493B CN111930493B (en) 2023-08-01

Family

ID=73282896

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910394996.1A Active CN111930493B (en) 2019-05-13 2019-05-13 NodeManager state management method and device in cluster and computing equipment

Country Status (1)

Country Link
CN (1) CN111930493B (en)


Also Published As

Publication number Publication date
CN111930493B (en) 2023-08-01

Similar Documents

Publication Publication Date Title
CN111930493B (en) NodeManager state management method and device in cluster and computing equipment
EP3335120B1 (en) Method and system for resource scheduling
CN113051075B (en) Kubernetes intelligent capacity expansion method and device
CN111625331B (en) Task scheduling method, device, platform, server and storage medium
US20140165061A1 (en) Statistical packing of resource requirements in data centers
US20110161972A1 (en) Goal oriented performance management of workload utilizing accelerators
US20050132379A1 (en) Method, system and software for allocating information handling system resources in response to high availability cluster fail-over events
CN103019853A (en) Method and device for dispatching job task
Sun et al. Rose: Cluster resource scheduling via speculative over-subscription
CN113886089B (en) Task processing method, device, system, equipment and medium
CN104298550A (en) Hadoop-oriented dynamic scheduling method
CN107430526B (en) Method and node for scheduling data processing
CN109873714B (en) Cloud computing node configuration updating method and terminal equipment
CN111459641A (en) Cross-machine-room task scheduling and task processing method and device
CN116467082A (en) Big data-based resource allocation method and system
CN114666335A (en) DDS-based distributed system load balancing device
CN109257256A (en) Apparatus monitoring method, device, computer equipment and storage medium
CN111158896A (en) Distributed process scheduling method and system
CN112612604B (en) Task scheduling method and device based on Actor model
CN111885159B (en) Data acquisition method and device, electronic equipment and storage medium
CN114237910A (en) Client load balancing implementation method and device
CN112612579A (en) Virtual machine deployment method, storage medium, and computer device
CN106844021B (en) Computing environment resource management system and management method thereof
CN112181443A (en) Automatic service deployment method and device and electronic equipment
CN112416538A (en) Multilayer architecture and management method of distributed resource management framework

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant