WO2016165242A1

WO2016165242A1 - Method of adjusting number of nodes in system and device utilizing same

Info

Publication number: WO2016165242A1
Application number: PCT/CN2015/086049
Authority: WO
Inventors: LIU Jianpeng (刘建鹏); YOU Yuanjian (尤元建)
Original assignee: ZTE Corporation (中兴通讯股份有限公司)
Priority date: 2015-04-14
Filing date: 2015-08-04
Publication date: 2016-10-20
Also published as: CN106155805A

Abstract

A method of adjusting a number of nodes in a system and device utilizing the same. The method comprises: receiving current hardware use condition information transmitted by each of node devices in a cluster (101); acquiring, according to the received hardware use information, index information describing a current use condition of the cluster (102); and adjusting, according to the index information, a number of activated node devices in the cluster (103). The adjusting method addresses the current technical problem of node device resource wastage or node device resource shortage in a Hadoop system resulting from being unable to dynamically adjust the number of the nodes in the Hadoop system.

Description

Method and device for adjusting number of nodes in system

Technical field

The present invention relates to the field of distributed storage technologies, and in particular, to a method and an apparatus for adjusting the number of nodes in a system.

Background

Hadoop is an open-source software framework for distributed processing of large amounts of data. The storage core of Hadoop is the Hadoop Distributed File System (HDFS). HDFS is designed to run on general-purpose hardware and is deployed on a large number of devices to support large-scale data sets and high-throughput data access; it achieves high fault tolerance by keeping multiple replicas of the data distributed across different devices. The large number of devices on which HDFS is deployed form a Hadoop cluster.

However, the number of nodes in a current Hadoop system is fixed when the system is first put into use and cannot be dynamically reduced or increased according to the actual amount of data, which results in unnecessary waste of node device resources or insufficient node device resources in the Hadoop system.

Summary of the invention

The main technical problem to be solved by the embodiments of the present invention is to provide a method and a device for adjusting the number of nodes in a system, which solve the technical problem that node device resources in a current Hadoop system cannot be dynamically adjusted, so that node device resources are either wasted or insufficient.

To solve the above technical problem, an embodiment of the present invention provides a method for adjusting the number of nodes in a system, including the following steps:

Receiving current hardware usage status information sent by each node device in the cluster, where the node device is an activated node device;

Obtaining indicator information for describing a current usage status of the cluster according to the received hardware usage status information;

Adjusting the number of node devices that have been started in the cluster according to the indicator information.

In the embodiment of the present invention, the step of adjusting the number of node devices that have been started in the cluster according to the indicator information includes:

Determining, according to the indicator information, whether the device resource usage rate of the cluster is too high or too low,

If the device resource usage rate of the cluster is too low, reducing the number of node devices that are started in the cluster;

If the device resource usage of the cluster is too high, the number of node devices that have been started in the cluster is increased.

In the embodiment of the present invention, the step of increasing the number of node devices that have been started in the cluster includes:

Controlling a node device that has not been started to start up and join the cluster, so as to increase the number of node devices that have been started in the cluster;

The step of reducing the number of node devices that have been started in the cluster includes:

Controlling node devices that have been started within the cluster to shut down, so as to reduce the number of node devices that have been started within the cluster.

In the embodiment of the present invention, the step of controlling the node device that is not started to start and join the cluster to increase the number of node devices that are started in the cluster includes:

Controlling, through the IPMI (Intelligent Platform Management Interface) of a node device that has not been started, that node device to start up and join the cluster, so as to increase the number of node devices that have been started in the cluster;

The step of controlling the started node device in the cluster to be closed to reduce the number of node devices in the cluster includes:

Shutting down a started node device in the cluster through the IPMI of that node device, so as to reduce the number of node devices that have been started within the cluster.

In the embodiment of the present invention, the step of controlling, through the IPMI of the node device that has not been started, that node device to start and join the cluster includes:

Sending a power-on command to the IPMI of the node device that has not been started, so that the node device boots and joins the cluster;

The step of controlling, by the IPMI of the node device that has been started in the cluster, that the activated node device is shut down includes:

A shutdown command is sent to the IPMI of the node device that has been started within the cluster to cause the activated node device to shut down.

In the embodiment of the present invention, the indicator information includes at least one indicator, and the step of determining, according to the indicator information, whether the device resource usage rate of the cluster is too high or too low includes:

Comparing the values of each indicator with their respective preset lower thresholds and their respective preset upper thresholds;

The device resource usage rate of the cluster is determined to be too high or too low according to the comparison result.

In the embodiment of the present invention, the step of determining, according to the comparison result, whether the device resource usage rate of the cluster is too high or too low includes:

Determining that the device resource usage rate of the cluster is too low when the value of at least one indicator remains below its preset lower threshold throughout a preset time period;

Determining that the device resource usage rate of the cluster is too high when the value of at least one indicator remains above its preset upper threshold throughout the preset time period.

In the embodiment of the present invention, the hardware usage status information includes: at least one of a device disk space usage rate, a current CPU usage rate of the device, and a current memory usage rate of the device;

Correspondingly, the indicator information includes: at least one of a total disk space usage rate in the cluster, an average CPU usage rate of the devices in the cluster, and an average memory usage rate of the devices in the cluster.

Also in order to solve the above technical problem, the present invention also provides another method for adjusting the number of nodes in the system. The method is applied to a node device in a system cluster, and includes the following steps:

After the node device is started, collecting hardware usage status information of the node device itself;

Sending the collected hardware usage status information to a receiving end, so that the receiving end obtains, according to the hardware usage status information, indicator information describing the current usage status of the cluster, and then adjusts the number of node devices that have been started in the cluster according to the indicator information.

In the embodiment of the present invention, the step of sending the collected hardware usage status information to the receiving end includes:

Sending the collected hardware usage status information through the JMX (Java Management Extensions) framework used by the system.

Also, in order to solve the above technical problem, the embodiment of the present invention further provides an apparatus for adjusting the number of nodes in the system, including: a receiving module, an analyzing module, and a processing module;

The receiving module is configured to receive current hardware usage status information sent by each node device in the cluster, where the node device is an activated node device;

The analyzing module is configured to obtain, according to the received hardware usage status information, indicator information used to describe a current usage status of the cluster;

The processing module is configured to adjust the number of node devices that have been started in the cluster according to the indicator information.

In the embodiment of the present invention, the processing module includes: a determining module and an adjusting module;

The determining module is configured to determine, according to the indicator information, whether a device resource usage rate of the cluster is too high or too low;

The adjusting module is configured to:

When the determining module determines that the device resource usage rate of the cluster is too low, control a started node device in the cluster to shut down, so as to reduce the number of node devices that have been started in the cluster;

When the determining module determines that the device resource usage rate of the cluster is too high, control a node device that has not been started to start up and join the cluster, so as to increase the number of node devices that have been started in the cluster.

In the embodiment of the present invention, the adjusting module is configured to:

Control, through the IPMI of a node device that has not been started, that node device to start up and join the cluster, so as to increase the number of node devices that have been started in the cluster;

In order to solve the above technical problem, the present invention further provides another apparatus for adjusting the number of nodes in a system. The apparatus is applied to a node device in a system cluster and includes: a collecting module and a sending module;

The collecting module is configured to collect hardware usage status information of the node device itself after the node device is started;

The sending module is configured to send the collected hardware usage status information, so that a receiving end obtains, according to the hardware usage status information, indicator information describing the current usage status of the cluster, and then adjusts the number of node devices that have been started in the cluster according to the indicator information.

In the embodiment of the present invention, the adjusting device further includes an IPMI and a control module;

The IPMI is configured to receive an externally sent shutdown command and pass the shutdown command to the control module;

The control module is configured to shut down the node device after receiving the shutdown command.

The beneficial effects of the embodiments of the present invention are:

An embodiment of the present invention provides a method and an apparatus for adjusting the number of nodes in a system. The method includes: receiving current hardware usage status information sent by each node device in the cluster; obtaining, according to the received hardware usage status information, indicator information describing the current usage status of the cluster; and adjusting, according to the indicator information, the number of node devices that have been started in the cluster. The adjustment method can dynamically adjust the number of node devices in a Hadoop cluster according to the current usage status of the cluster, so as to meet the cluster's demand for node device resources. This improves the utilization of node devices in the Hadoop system, reduces unnecessary resource waste, and solves the technical problem of insufficient node device resources in the Hadoop system.

DRAWINGS

FIG. 1 is a schematic flowchart of a method for adjusting the number of nodes in a Hadoop system according to Embodiment 1 of the present invention;

FIG. 2 is a schematic flowchart of another method for adjusting the number of nodes in a Hadoop system according to Embodiment 1 of the present invention;

FIG. 3 is a schematic flowchart of a method for adjusting the number of nodes in a Hadoop system according to Embodiment 2 of the present invention;

FIG. 4 is a schematic structural diagram of an apparatus for adjusting the number of nodes in a Hadoop system according to Embodiment 3 of the present invention;

FIG. 5 is a schematic structural diagram of another apparatus for adjusting the number of nodes in a Hadoop system according to Embodiment 3 of the present invention;

FIG. 6 is a schematic structural diagram of an apparatus for adjusting the number of nodes in a Hadoop system according to Embodiment 4 of the present invention;

FIG. 7 is a schematic structural diagram of another apparatus for adjusting the number of nodes in a Hadoop system according to Embodiment 3 of the present invention;

FIG. 8 is a schematic structural diagram of a system for adjusting the number of nodes in a Hadoop system according to Embodiment 4 of the present invention.

Detailed description

The present invention will be further described in detail below with reference to the accompanying drawings.

Considering the technical problem that node device resources in a Hadoop system are wasted or insufficient because the number of nodes in the Hadoop system cannot be dynamically adjusted, the present invention provides a method and a device for adjusting the number of nodes in a system, which can dynamically adjust the number of node devices in the system cluster according to the current usage status of the cluster. The adjustment method and apparatus of the present invention are described below using the number of nodes in a Hadoop system as an example. It should be understood that the method and apparatus of this embodiment are not limited to the Hadoop system and can also dynamically adjust the number of nodes in other systems.

Embodiment 1:

This embodiment provides a method for adjusting the number of nodes in a Hadoop system. As shown in FIG. 1, the method may include the following steps:

Step 101: Receive current hardware usage status information sent by each node device in the cluster, where the node device is an activated node device.

In this embodiment, the cluster is a Hadoop cluster; each node device in the cluster collects its own hardware usage status information and then sends it out. Each node device in the Hadoop cluster is a node device that has been started, and each node device has an HDFS component installed, so that after startup Hadoop can perform data access and similar operations on it.

In the embodiment of the present invention, the hardware usage status information in this embodiment may include at least one of: the current disk space usage rate of the device (that is, the current disk space usage percentage of the device), the current CPU usage rate of the device (that is, the current CPU usage percentage of the device), and the current memory usage rate of the device (that is, the current memory usage percentage of the device).

In this embodiment, the hardware usage status information sent by the node devices may be received in multiple ways, for example over a network, specifically by communicating with the node devices via the Transmission Control Protocol/Internet Protocol (TCP/IP) to receive the hardware usage status information they collect.

Step 102: Acquire indicator information for describing a current usage state of the cluster according to the received hardware usage information.

Specifically, in this embodiment, the hardware usage status information sent by each node may be processed computationally to obtain indicator information describing the current usage status of the cluster. Because the cluster is a set of node devices, the indicator information describing its current usage status can be derived from the hardware usage status information of the devices in the cluster, and each indicator corresponds to a kind of hardware usage status information of the node devices. For example, when the indicator information is the total disk space usage rate in the cluster, the hardware usage status information collected by the node devices includes the device disk space usage rate.

In the embodiment of the present invention, the indicator information in this embodiment may include at least one of: the total disk space usage rate in the cluster (that is, the percentage of total disk space used in the cluster), the average CPU usage rate of the devices in the cluster (that is, the average CPU usage percentage of the devices in the cluster), and the average memory usage rate of the devices in the cluster (that is, the average memory usage percentage of the devices in the cluster).
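As a minimal sketch of how step 102 might turn node reports into these three indicators, the Python below aggregates a list of per-node reports. It assumes, beyond what the text states, that each node also reports its used and total disk bytes, so that the cluster-wide disk usage can be weighted by capacity; the `NodeReport` type and its field names are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class NodeReport:
    """One started node's hardware usage report (field names are hypothetical)."""
    node_id: str
    disk_used_bytes: int      # assumption: node reports absolute disk usage
    disk_capacity_bytes: int  # assumption: node reports its disk capacity
    cpu_percent: float        # current CPU usage, 0-100
    mem_percent: float        # current memory usage, 0-100


def cluster_indicators(reports):
    """Aggregate per-node reports into cluster-wide indicators (percent)."""
    if not reports:
        raise ValueError("no reports from started node devices")
    total_used = sum(r.disk_used_bytes for r in reports)
    total_capacity = sum(r.disk_capacity_bytes for r in reports)
    return {
        # total disk usage is weighted by capacity, not a mean of percentages
        "disk_total": 100.0 * total_used / total_capacity,
        "cpu_avg": mean(r.cpu_percent for r in reports),
        "mem_avg": mean(r.mem_percent for r in reports),
    }
```

Weighting by capacity matters when node disks differ in size; simply averaging per-node disk percentages would give a small disk the same influence as a large one.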

Step 103: Adjust the number of node devices that have been started in the cluster according to the indicator information.

The method of this embodiment may reduce or increase the number of node devices that have been started in the cluster according to the indicator information. In this embodiment, adjusting the number of node devices that have been started in the cluster may be referred to simply as adjusting the number of node devices in the cluster. Thus, adjusting (e.g., increasing or decreasing) the number of node devices within a cluster, as described herein, means adjusting (e.g., increasing or decreasing) the number of node devices that have been started within the cluster.

The method in this embodiment can determine, according to the indicator information, whether the device resource usage rate of the cluster is too high or too low.

If the device resource usage rate in the cluster is too low, the number of node devices that are started in the cluster is reduced;

If the device resource usage rate in the cluster is too high, the number of node devices that have been started in the cluster is increased.

In the embodiment of the present invention, when the indicator information includes at least one indicator, the step of determining, according to the indicator information, whether the device resource usage rate of the cluster is too high or too low includes: comparing the value of each indicator with its corresponding preset lower threshold and preset upper threshold, and determining, according to the comparison result, whether the device resource usage rate of the cluster is too high or too low.

Specifically, the step of determining, according to the comparison result, whether the device resource usage rate of the cluster is too high or too low includes:

In this embodiment, when the value of an indicator remains below its corresponding preset lower threshold throughout the preset time period, it is determined that the device resource usage rate of the cluster is too low. For example, if the average CPU usage rate of the devices in the cluster remains below the preset lower threshold for one minute, the device resource usage rate of the cluster is determined to be too low.

When the value of an indicator remains above its corresponding preset upper threshold throughout the preset time period, it is determined that the device resource usage rate of the cluster is too high. For example, if the average CPU usage rate of the devices in the cluster remains above the preset upper threshold for one minute, the device resource usage rate of the cluster is determined to be too high.
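The rule "remains below or above the threshold throughout a preset time period" can be sketched as a sliding window over the most recent samples of one indicator. This is an illustration under stated assumptions, not the patent's implementation: the window is counted in samples (for example, six samples at an assumed 10-second reporting interval would cover one minute), and `SustainedThreshold` is a hypothetical name.

```python
from collections import deque


class SustainedThreshold:
    """Flags an indicator that stays beyond a threshold for a whole window.

    The window length is a number of samples; mapping it to a time period
    depends on the (assumed) reporting interval of the node devices.
    """

    def __init__(self, lower, upper, window):
        self.lower = lower
        self.upper = upper
        self.samples = deque(maxlen=window)  # oldest samples fall out

    def update(self, value):
        """Record a sample; return "low", "high" or None."""
        self.samples.append(value)
        if len(self.samples) < self.samples.maxlen:
            return None  # not enough history to cover the time period yet
        if all(v < self.lower for v in self.samples):
            return "low"
        if all(v > self.upper for v in self.samples):
            return "high"
        return None
```

A single spike or dip does not trigger an adjustment; only a value that stays out of range for the whole window does, which keeps nodes from being started and shut down in rapid succession.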

Alternatively, the method of this embodiment may disregard the time factor. After the value of each indicator is compared with its corresponding preset lower threshold and preset upper threshold, if the value of at least one indicator is less than its preset lower threshold, the device resource usage rate of the cluster is determined to be too low; if the value of at least one indicator is greater than its preset upper threshold, the device resource usage rate of the cluster is determined to be too high.

When the indicator information includes multiple indicators, for example the total disk space usage rate in the cluster (that is, the percentage of total disk space used in the cluster), the average CPU usage rate of the devices in the cluster (that is, the average CPU usage percentage of the devices in the cluster), and the average memory usage rate of the devices in the cluster, the value of each indicator can be compared with its corresponding preset lower threshold. If the value of at least one indicator is less than its lower threshold, for example if the total disk space usage rate in the cluster and/or the average memory usage rate of the devices in the cluster is below the preset lower threshold, the device resource usage rate of the cluster is too low; resources are being wasted, and the number of node devices in the cluster should be reduced. If the value of at least one indicator is greater than its preset upper threshold, for example if the total disk space usage rate in the cluster and/or the average memory usage rate in the cluster is above the preset upper threshold, the device resource usage rate of the cluster is too high; there is a risk of insufficient device resources, and node devices need to be added to the cluster.

In this embodiment, when the indicator information includes multiple indicators, a unified lower threshold and an upper threshold may be set in advance. Of course, different upper and lower thresholds may be set corresponding to different indicators.
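Comparing several indicators against either a unified range or per-indicator ranges might look like the sketch below. The threshold values and indicator names are hypothetical. The text also does not say what happens when one indicator is too high while another is too low; this sketch assumes "too high" takes precedence, so that a resource shortage is never masked by an idle resource.

```python
DEFAULT_RANGE = (20.0, 80.0)  # hypothetical unified (lower, upper) thresholds, percent


def classify(indicators, thresholds=None, default=DEFAULT_RANGE):
    """Return "high", "low" or "ok" for the cluster's device resource usage.

    indicators: indicator name -> current value (percent).
    thresholds: optional indicator name -> (lower, upper); indicators without
    an entry use the unified default range.
    """
    thresholds = thresholds or {}
    too_high = False
    too_low = False
    for name, value in indicators.items():
        lower, upper = thresholds.get(name, default)
        too_high = too_high or value > upper  # at least one indicator above
        too_low = too_low or value < lower    # at least one indicator below
    # Assumption: a shortage takes precedence over waste.
    if too_high:
        return "high"
    if too_low:
        return "low"
    return "ok"
```

For example, `classify({"cpu_avg": 85.0, "mem_avg": 50.0})` reports "high" with the default range, while passing `{"disk_total": (5.0, 90.0)}` as `thresholds` gives disk usage its own range.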

In this embodiment, a preferred way of increasing the number of node devices in the cluster is to control a node device that has not been started to start up and join the cluster. Specifically, the node device that has not been started can be controlled through its IPMI to start up and join the cluster, thereby increasing the number of node devices in the cluster. The node device that has not been started may be an unstarted node device within the cluster or an unstarted node device outside the cluster.

In this embodiment, a preferred way of reducing the number of node devices in the cluster is to control node devices that have been started in the cluster to shut down. Specifically, a started node device in the cluster is shut down through its IPMI (Intelligent Platform Management Interface), thereby reducing the number of node devices in the cluster.

The adjustment method in this embodiment can dynamically adjust the number of node devices in the Hadoop cluster according to the current usage status of the cluster: when the device resource usage rate of the cluster is too high, node devices are added to the cluster to avoid insufficient node resources in the Hadoop system; when the device resource usage rate of the cluster is too low, node devices in the cluster are reduced to avoid unnecessary waste of device resources.

According to the above description, the method for adjusting the number of nodes in the Hadoop system of this embodiment, as shown in FIG. 2, may include the following steps:

Step 201: Receive current hardware usage status information sent by each node device in the cluster, where the node device is an activated node device.

Specifically, the hardware usage status information transmitted by the node device through the JMX framework may be received; the received information is JMX information. The method in this embodiment can receive the hardware usage status information transmitted by the node device in multiple ways, for example by receiving the JMX information over the network. To obtain the hardware usage status information, the received JMX information must first be processed according to the network protocol (parsing, CRC check, calculation, and so on) to restore the original hardware usage status information.
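The JMX wire format is not specified here, so the sketch below substitutes a simple assumed framing, a JSON payload followed by a 4-byte big-endian CRC32, purely to illustrate the CRC check and the restoration of the original hardware usage status information. It uses only Python's standard `json`, `struct` and `zlib` modules and is not how JMX itself encodes data; the function names are hypothetical.

```python
import json
import struct
import zlib


def encode_report(report: dict) -> bytes:
    """Node side (assumed framing): JSON payload + 4-byte big-endian CRC32."""
    payload = json.dumps(report, sort_keys=True).encode("utf-8")
    return payload + struct.pack(">I", zlib.crc32(payload))


def decode_report(frame: bytes) -> dict:
    """Receiver side: verify the CRC32 trailer, then parse the payload."""
    if len(frame) < 4:
        raise ValueError("frame too short")
    payload = frame[:-4]
    (crc,) = struct.unpack(">I", frame[-4:])
    if zlib.crc32(payload) != crc:
        raise ValueError("CRC mismatch: report corrupted in transit")
    return json.loads(payload.decode("utf-8"))
```

A corrupted report is rejected rather than silently feeding a wrong value into the cluster indicators.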

In addition, because many devices may be sending data, the system uses multithreading so that the data is processed quickly and in real time, preventing buffer overflow and data loss.
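One way to realize this multithreaded processing is a pool of worker threads draining a shared queue, sketched below with Python's standard `queue` and `threading` modules. The function names and the `None` stop marker are assumptions for illustration; `handle` stands for whatever parses a report and updates the indicators.

```python
import queue
import threading


def start_workers(handle, num_workers=4):
    """Start worker threads that call handle(item) for each queued report.

    Returns the shared queue and the list of threads; None is the stop marker.
    """
    q = queue.Queue()

    def worker():
        while True:
            item = q.get()
            try:
                if item is None:
                    return  # stop marker: exit this worker
                handle(item)
            finally:
                q.task_done()  # runs for reports and for the stop marker

    threads = [threading.Thread(target=worker, daemon=True) for _ in range(num_workers)]
    for t in threads:
        t.start()
    return q, threads


def stop_workers(q, threads):
    """Send one stop marker per worker and wait for all of them to exit."""
    for _ in threads:
        q.put(None)
    for t in threads:
        t.join()
```

The network-facing receiver only has to `put` each raw report on the queue, so a slow report does not block reception of the others. Passing `maxsize` to `queue.Queue` would make the receiver block instead of letting the backlog grow without bound.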

Step 202: Acquire indicator information for describing a current usage state of the cluster according to the received hardware usage status information, where the indicator information includes at least one indicator.

For example, when the hardware usage status information sent by the node devices in the cluster includes the device disk space usage rate, the current CPU usage rate of the device, and the current memory usage rate of the device,

the indicator information describing the current usage status of the cluster, calculated in this step from the hardware usage status information, includes the total disk space usage rate in the cluster, the average CPU usage rate of the devices in the cluster, and the average memory usage rate of the devices in the cluster.

Step 203: Compare the value of each indicator with a corresponding preset lower limit threshold and a corresponding preset upper limit threshold.

Step 204: Determine, according to the comparison result, whether the device resource usage rate of the cluster is too high or too low. If the usage is too high, go to step 205. If it is too low, go to step 206.

For example, this step can determine whether the device resource usage rate of the cluster is too high or too low according to the total disk space usage in the cluster, the average CPU usage of the devices in the cluster, and the average memory usage of the devices in the cluster.

Specifically, taking as an example the case in which all indicators use the same preset range, each indicator is compared with the upper and lower thresholds of the preset range: when the value of at least one indicator is greater than the upper threshold, the device resource usage rate is determined to be too high; when the value of at least one indicator is less than the lower threshold, the device resource usage rate is determined to be too low.

To determine more accurately whether the resource usage rate is too high or too low, this step may compare each indicator with the upper and lower thresholds of the preset range: when the value of at least one indicator remains above the upper threshold throughout a preset time period, the device resource usage rate is determined to be too high; when the value of at least one indicator remains below the lower threshold throughout the preset time period, the device resource usage rate is determined to be too low.

Step 205: Control, by the IPMI of the node device that is not started, that the unstarted node device starts and joins the cluster to increase the number of node devices that are started in the cluster.

One or more node devices may be added. The number to add may be preset (for example, one by default) or determined according to the indicator information.

Specifically, this step may send a power-on command to the IPMI of the node device that has not been started, so that the node device starts and joins the cluster, thereby increasing the number of node devices in the cluster.
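The text does not name a tool for sending IPMI commands. As an assumed stand-in, the sketch below builds a command for the widely used `ipmitool` CLI over its `lanplus` interface: `chassis power on` boots a node, and `chassis power soft` requests a graceful (ACPI) shutdown. This is one option among several (step 206 below mentions sending the shutdown command over HTTP). The BMC address and user name are placeholders, and the password is read by `ipmitool -E` from the `IPMI_PASSWORD` environment variable so it never appears on the command line.

```python
import subprocess

ALLOWED_ACTIONS = ("on", "soft", "off", "status")


def ipmi_power_command(bmc_host, user, action):
    """Build an ipmitool argv for the node's BMC (ipmitool is an assumed tool).

    action: "on" to boot an unstarted node, "soft" for a graceful shutdown
    of a started node; "off" (hard power-off) and "status" are also allowed.
    """
    if action not in ALLOWED_ACTIONS:
        raise ValueError(f"unsupported power action: {action}")
    return ["ipmitool", "-I", "lanplus", "-H", bmc_host, "-U", user, "-E",
            "chassis", "power", action]


def send_power_command(bmc_host, user, action):
    """Run the command; raises CalledProcessError if ipmitool fails."""
    subprocess.run(ipmi_power_command(bmc_host, user, action),
                   check=True, capture_output=True, text=True, timeout=30)
```

For example, `send_power_command("10.0.0.21", "admin", "on")` would power on the node whose BMC answers at that (placeholder) address.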

When it is determined that the device resource usage rate of the cluster is too high, the number of node devices in the cluster needs to be increased. In this embodiment, the IPMI address list of unstarted DataNode nodes can therefore be read, the IPMI addresses of a preset number of node devices selected, and a power-on command sent to each corresponding device. After waiting for the node device to start, the state of the device is recorded: the node is moved from the list of unstarted DataNode nodes to the list of started nodes. The unstarted device receives the power-on command through its IPMI, and the command is passed to the BMC (Baseboard Management Controller) component in the node device, which starts the device. The node device, on which the HDFS component is already installed, starts automatically; after startup, the HDFS component starts together with the operating system according to the system's startup configuration and joins the cluster. Automatically adding node devices in this way expands the Hadoop environment and solves the problem of insufficient resources in the original environment.
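The state bookkeeping in step 205 (read the unstarted list, power on a preset number of nodes, and move each node to the started list once it is up) can be sketched as follows. The `power_on` and `wait_until_up` callables are hypothetical hooks supplied by the caller, for example an IPMI client and a check that the node's DataNode has joined the cluster.

```python
def scale_up(unstarted, started, count, power_on, wait_until_up):
    """Start up to `count` nodes taken from the unstarted list.

    unstarted / started: lists of IPMI addresses of DataNode nodes.
    power_on(addr) sends the power-on command (hypothetical hook);
    wait_until_up(addr) returns True once the node has booted and joined.
    Returns the addresses that were moved to `started`.
    """
    added = []
    for addr in list(unstarted[:count]):  # copy: the list is modified below
        power_on(addr)
        if wait_until_up(addr):
            # record the new state only after the node is actually up
            unstarted.remove(addr)
            started.append(addr)
            added.append(addr)
    return added
```

Recording the state only after a successful boot keeps the two lists consistent if a node fails to come up; such a node stays in the unstarted list.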

Step 206: Control, by the IPMI of the node device that is started in the cluster, that the activated node device is shut down to reduce the number of node devices that are started in the cluster.

One or more node devices may be removed by the method of this embodiment. The number to remove may be preset (for example, one node device by default) or determined according to the indicator information.

Specifically, this step may send a shutdown command to the IPMI of a node device that has been started in the cluster, so that the node device shuts down.
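Where the number of nodes to add or remove is determined according to the indicator information, one possible rule, offered purely as a hypothetical illustration, is to size the cluster so that the usage would fall to a chosen target. It assumes identical nodes and a load that spreads evenly over them, so that usage scales as current_nodes / new_nodes; the function names and the target value are assumptions.

```python
import math


def target_node_count(current_nodes, usage_percent, target_percent, min_nodes=1):
    """Hypothetical sizing rule: node count that brings usage to the target."""
    if current_nodes < 1 or not 0 < target_percent <= 100:
        raise ValueError("invalid input")
    # usage * current_nodes is the load in "node units"; round up to whole nodes
    needed = math.ceil(current_nodes * usage_percent / target_percent)
    return max(needed, min_nodes)


def node_delta(current_nodes, usage_percent, target_percent):
    """Positive: nodes to start (step 205); negative: nodes to shut down (step 206)."""
    return target_node_count(current_nodes, usage_percent, target_percent) - current_nodes
```

For example, 10 nodes at 90 % usage with a 70 % target gives +3 nodes, and 10 nodes at 20 % usage with a 50 % target gives -6. Real workloads rarely rebalance this evenly, which is why the text also allows a fixed default of one node per adjustment.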

When it is determined that the device resource usage rate of the cluster is too low, the number of node devices in the cluster needs to be reduced. In this embodiment, the IPMI address list of started DataNode nodes can therefore be read, the IPMI addresses of a predetermined number of node devices selected, and a shutdown command sent to each corresponding device over HTTP; the shutdown command is passed to the BMC component in the node device, which shuts the device down. After the node device shuts down, its state is recorded: the node is moved from the list of started DataNode nodes to the list of unstarted nodes. Once the device is shut down through IPMI, the node is temporarily removed from the Hadoop environment, which solves the problem of wasted node resources in the environment.
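The corresponding bookkeeping for step 206 is sketched below. Which started nodes to shut down is not specified in the text; this sketch assumes the most recently started nodes are chosen first, and `power_off` is a hypothetical caller-supplied hook that sends the shutdown command to the node's IPMI and returns True on success.

```python
def scale_down(started, unstarted, count, power_off):
    """Shut down up to `count` started DataNode nodes and record their state.

    started / unstarted: lists of IPMI addresses of DataNode nodes.
    Assumption: the most recently started nodes (end of the list) go first.
    Returns the addresses that were moved to `unstarted`.
    """
    removed = []
    for addr in list(reversed(started))[:count]:  # copy before modifying
        if power_off(addr):
            started.remove(addr)
            unstarted.append(addr)
            removed.append(addr)
    return removed
```

As with scale-up, a node is moved to the unstarted list only if the shutdown command succeeded.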

The method of this embodiment automatically increases or decreases the number of nodes in the Hadoop system based on IPMI: a collector in each node of the cluster obtains that node's performance data and reports it to the system through the JMX framework; the system integrates and analyzes this information and, according to the analysis result, uses IPMI to start or stop the target devices, thereby dynamically and automatically adjusting the number of devices in the cluster.

Embodiment 2:

As shown in FIG. 3, this embodiment provides a method for adjusting the number of nodes in a Hadoop system, which is applied to a node device in a Hadoop system cluster, and includes the following steps:

Step 301: After the node device is started, collect hardware usage status information of the node device itself.

After the node device is started in the cluster, the device can collect the hardware usage information of the node device, for example, at least one of a device disk space usage rate, a current CPU usage rate of the device, and a current memory usage rate of the device.
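On a Linux node, the three quantities in step 301 could be read from standard sources, as in the sketch below: CPU usage from the difference between two samples of the aggregate `cpu` line of `/proc/stat`, memory usage from `MemTotal` and `MemAvailable` in `/proc/meminfo`, and disk usage via Python's `shutil.disk_usage`. The Linux-only data sources and the choice of `MemAvailable` are assumptions; the text does not say how the collector gathers these values.

```python
import shutil


def cpu_percent(stat_before: str, stat_after: str) -> float:
    """CPU usage between two samples of the aggregate 'cpu' line of /proc/stat."""
    def parse(line):
        fields = [int(x) for x in line.split()[1:]]
        # user nice system idle iowait irq softirq steal; guest time is
        # already counted in user/nice, so only the first 8 fields are summed
        idle = fields[3] + fields[4]  # idle + iowait
        return idle, sum(fields[:8])

    idle0, total0 = parse(stat_before)
    idle1, total1 = parse(stat_after)
    elapsed = total1 - total0
    if elapsed <= 0:
        return 0.0
    return 100.0 * (elapsed - (idle1 - idle0)) / elapsed


def mem_percent(meminfo: str) -> float:
    """Memory usage from /proc/meminfo text (values are in kB)."""
    values = {}
    for line in meminfo.splitlines():
        key, _, rest = line.partition(":")
        if rest:
            values[key] = int(rest.split()[0])
    return 100.0 * (values["MemTotal"] - values["MemAvailable"]) / values["MemTotal"]


def disk_percent(path: str = "/") -> float:
    """Disk space usage of the file system that holds `path`."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total
```

In a collector, `cpu_percent` would be fed the first line of `/proc/stat` read twice a few seconds apart; keeping the parsing separate from file access makes it easy to test.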

In the embodiment of the present invention, a data conversion table may be established for the collected hardware usage status information, and the hardware usage status information may be processed based on the data conversion table to form a data table.

Step 302: Send the collected hardware usage status information to the receiving end, so that the receiving end obtains, according to this information, indicator information describing the current usage status of the cluster, and then adjusts the number of activated node devices in the cluster according to the indicator information.

Specifically, this step may send the collected hardware usage status information through the JMX (Java Management Extensions) framework used by the Hadoop system. In the embodiment of the present invention, the JMX information can be sent over a network, for example over TCP/IP.
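JMX itself is a Java API, so the following is only a sketch of the TCP/IP leg of the transfer, written in Python under an assumed wire format: a 4-byte big-endian payload length, a UTF-8 JSON payload holding the data table, and a 4-byte CRC-32 of the payload. The text mentions CRC verification on the receiving side but specifies neither the CRC variant nor the framing; `encode_frame` and `send_table` are hypothetical names.

```python
import json
import socket
import struct
import zlib
from typing import Dict


def encode_frame(table: Dict[str, object]) -> bytes:
    """Assumed frame: 4-byte length | UTF-8 JSON payload | 4-byte CRC-32 of the payload."""
    payload = json.dumps(table, sort_keys=True).encode("utf-8")
    return struct.pack(">I", len(payload)) + payload + struct.pack(">I", zlib.crc32(payload))


def send_table(host: str, port: int, table: Dict[str, object], timeout: float = 5.0) -> None:
    """Open a TCP connection to the receiving end and send one framed data table."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(encode_frame(table))
```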

In practice, functions can be added to the open-source Hadoop framework so that the HDFS component performs steps 301 and 302.

With the method of this embodiment, each node device collects its own hardware usage status information so that the management end can dynamically adjust the number of node devices in the cluster. This solves the technical problem that the number of nodes in the current Hadoop system cannot be adjusted dynamically, which leads to wasted or insufficient node device resources.

In the embodiment of the present invention, the method of this embodiment may further include: receiving a shutdown command through IPMI and shutting down after the command is received. When the number of node devices in the cluster is to be reduced, the node device receives the shutdown command through IPMI, the command is passed to the BMC component in the node device, and the BMC component performs the shutdown.

Embodiment 3:

As shown in FIG. 4, this embodiment provides a device for adjusting the number of nodes in a Hadoop system, including: a receiving module 401, an analyzing module 402, and a processing module 403;

The receiving module 401 is configured to receive current hardware usage status information sent by each node device in the cluster, where the node device is an activated node device;

The analyzing module 402 is configured to obtain, according to the received hardware usage status information, indicator information used to describe a current usage status of the cluster;

The processing module 403 is configured to adjust the number of node devices that have been started in the cluster according to the indicator information.

In the embodiment of the present invention, as shown in FIG. 5, the processing module 403 in this embodiment includes: a determining module 4031 and an adjusting module 4032;

The determining module 4031 is configured to determine, according to the indicator information, whether a device resource usage rate of the cluster is too high or too low;

The adjustment module 4032 is configured to:

When the determining module 4031 determines that the device resource usage rate of the cluster is too low, control an activated node device in the cluster to shut down, so as to reduce the number of activated node devices in the cluster;

When the determining module 4031 determines that the device resource usage rate of the cluster is too high, control an unstarted node device to boot and join the cluster, so as to increase the number of activated node devices in the cluster.

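In the embodiment of the present invention, the adjusting module 4032 is configured to: control, through the IPMI of an unstarted node device, that node device to boot and join the cluster, so as to increase the number of activated node devices in the cluster; and shut down an activated node device through its IPMI, so as to reduce the number of activated node devices in the cluster.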

The adjusting device of the present invention dynamically adjusts the number of node devices in the Hadoop cluster according to the current usage status of the cluster, so that the cluster's demand for node device resources is met. This improves the utilization of node devices in the Hadoop system, reduces unnecessary resource waste, and solves the technical problem of insufficient node device resources in the Hadoop system.

Embodiment 4:

As shown in FIG. 6, the embodiment provides a device for adjusting the number of nodes in a Hadoop system, and the device is applied to a node device in a Hadoop system cluster, and includes: an acquisition module 601 and a sending module 602;

The acquisition module 601 is configured to collect hardware usage status information of the node device itself after the node device is started;

The sending module 602 is configured to send the collected hardware usage status information to the receiving end, so that the receiving end obtains, according to this information, indicator information describing the current usage status of the cluster, and then adjusts the number of activated node devices in the cluster according to the indicator information.

In the embodiment of the present invention, the sending module 602 is configured to send the collected hardware usage status information to the receiving end through the JMX framework used by the Hadoop system.

In the embodiment of the present invention, as shown in FIG. 7, the adjustment apparatus of this embodiment further includes: an IPMI interface 604 and a control module 603;

The IPMI interface 604 is configured to receive an externally transmitted shutdown command and transmit the shutdown command to the control module;

The control module 603 is configured to shut down the node device after receiving the shutdown command.

In this embodiment, the adjusting device collects the hardware usage status information of its node device so that the management end can dynamically adjust the number of node devices in the cluster, solving the technical problem that the number of nodes in the Hadoop system cannot be adjusted dynamically, which leads to wasted or insufficient node device resources.

Embodiment 5:

As shown in FIG. 8, this embodiment provides a system for adjusting the number of nodes in a Hadoop system, including: a plurality of unstarted first node devices 80, a plurality of activated second node devices 81 located in the Hadoop system cluster, and a node management device 82;

The first node device 80 is provided with an IPMI interface 801, and the second node device 81 is provided with an IPMI interface 813, an acquisition module 811, and a sending module 812;

The node management device 82 includes: a receiving module 821, an analysis module 822, and a processing module 823;

The acquisition module 811 of the second node device 81 is configured to collect hardware usage status data of the node device on which it is located;

The collecting module 812 of the second node device 81 is specifically configured to establish a data conversion table for the collected basic data, process the host's status data according to the data conversion table to form a data table, and report it to the system through JMX. The data describes the hardware usage status of the host node and mainly includes the host's total disk space, the host's used disk space, the current CPU usage percentage, the current memory usage percentage, and the like; the data table is then sent out through the sending module.
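The data conversion table can be pictured as a mapping from raw metric names to table keywords plus a unit conversion. The sketch below assumes raw disk figures in bytes and usage figures as 0-1 ratios; the keywords (`DISK_TOTAL_GB`, `CPU_USAGE_PCT`, and so on), the `HOST` field, and rounding to two decimals are hypothetical choices, and metrics absent from the table are simply ignored.

```python
from typing import Callable, Dict, Tuple

# Hypothetical data conversion table: raw metric name -> (table keyword, converter).
CONVERSION_TABLE: Dict[str, Tuple[str, Callable[[float], float]]] = {
    "disk_total_bytes": ("DISK_TOTAL_GB", lambda b: b / 1024 ** 3),
    "disk_used_bytes": ("DISK_USED_GB", lambda b: b / 1024 ** 3),
    "cpu_usage_ratio": ("CPU_USAGE_PCT", lambda r: r * 100.0),
    "mem_usage_ratio": ("MEM_USAGE_PCT", lambda r: r * 100.0),
}


def build_data_table(host: str, raw: Dict[str, float]) -> Dict[str, object]:
    """Apply the conversion table to raw samples, producing one keyed data-table row."""
    table: Dict[str, object] = {"HOST": host}
    for raw_name, value in raw.items():
        if raw_name in CONVERSION_TABLE:  # metrics the table does not know are skipped
            keyword, convert = CONVERSION_TABLE[raw_name]
            table[keyword] = round(convert(value), 2)
    return table
```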

The sending module 813 of the second node device 81 is configured to send the collected data to the receiving module 821 in the node management device 82; specifically, the data can be sent to the receiving module 821 through the JMX framework;

The receiving module 821 in the node management device 82 is configured to receive the data, such as the data table, reported by the sending module 813. The receiving module is responsible for collecting and preliminarily processing the data tables, in two parts: receiving the raw data and processing the raw data. For example, the underlying interface may be a network interface that communicates with the sending module of each node device over TCP/IP. Raw data originating from the acquisition modules is received through the interface provided by the system and handed to the data receiving module, which parses it according to the protocol of each device, performs CRC (cyclic redundancy check) verification, calculation, and processing, and then submits the result to the data analysis module. Because many devices may send data, the system uses multithreading so that data is processed quickly and buffer overflow and data loss are prevented, meeting the requirement for real-time processing.
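A sketch of the receiving side, assuming a hypothetical frame layout of a 4-byte big-endian length, a JSON payload, and a 4-byte CRC-32 computed with `zlib.crc32` (the text does not give the actual protocol or CRC variant). `parse_frame` rejects truncated or corrupted frames, and `process_frames` fans parsing out to a `ThreadPoolExecutor`; both names and the drop-on-error policy are assumptions, not the patent's implementation.

```python
import json
import struct
import zlib
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Iterable, List, Optional


def parse_frame(frame: bytes) -> Dict[str, object]:
    """Parse one frame (length | JSON payload | CRC-32) and verify its checksum."""
    if len(frame) < 8:
        raise ValueError("frame too short")
    (length,) = struct.unpack(">I", frame[:4])
    payload = frame[4:4 + length]
    trailer = frame[4 + length:8 + length]
    if len(payload) != length or len(trailer) != 4:
        raise ValueError("truncated frame")
    (expected_crc,) = struct.unpack(">I", trailer)
    if zlib.crc32(payload) != expected_crc:
        raise ValueError("CRC mismatch")
    return json.loads(payload.decode("utf-8"))


def process_frames(frames: Iterable[bytes], workers: int = 4) -> List[Dict[str, object]]:
    """Parse frames on a thread pool, keeping input order and dropping invalid frames."""

    def safe_parse(frame: bytes) -> Optional[Dict[str, object]]:
        try:
            return parse_frame(frame)
        except ValueError:  # also covers JSON and UTF-8 decoding errors
            return None

    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(safe_parse, frames))
    return [r for r in results if r is not None]
```

In CPython, threads mainly pay off when each worker also waits on socket I/O; for pure parsing the GIL limits the speedup, so the pool here mostly illustrates the structure.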

The analysis module 822 in the node management device 82 is configured to sort the performance data passed on by the receiving module 821. Based on the data tables sent by the acquisition modules, the corresponding types of data are extracted according to the keywords in the table, and a different rule is applied to each type to calculate indicator information that accurately describes the state of the whole cluster, such as the average CPU usage of the hosts in the cluster, the average memory usage percentage of the hosts in the cluster, and the usage percentage of the total disk space in the cluster. The analysis module then determines whether the indicator data is reasonable as follows: the obtained indicator data is compared with the set threshold range; if the current indicator data is within the normal threshold range, no additional operation is performed and the collection and receiving loop continues; if the current indicator data is outside the normal threshold range, the result is passed to the processing module.
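The aggregation and range check might look like the following sketch. It assumes per-host rows keyed with hypothetical keywords (`DISK_TOTAL_GB`, `DISK_USED_GB`, `CPU_USAGE_PCT`, `MEM_USAGE_PCT`), and the values in `THRESHOLDS` are placeholders, since the text gives none. Cluster disk usage is computed from summed capacities, so a large, lightly used disk weighs more than a small, full one.

```python
from typing import Dict, List, Tuple

# Placeholder normal ranges (lower, upper) in percent; the text gives no values.
THRESHOLDS: Dict[str, Tuple[float, float]] = {
    "avg_cpu_pct": (20.0, 80.0),
    "avg_mem_pct": (20.0, 80.0),
    "disk_used_pct": (10.0, 85.0),
}


def cluster_indicators(rows: List[Dict[str, float]]) -> Dict[str, float]:
    """Aggregate per-host data tables into cluster-wide indicators."""
    if not rows:
        raise ValueError("no data tables received")
    n = len(rows)
    total_disk = sum(r["DISK_TOTAL_GB"] for r in rows)
    used_disk = sum(r["DISK_USED_GB"] for r in rows)
    return {
        "avg_cpu_pct": sum(r["CPU_USAGE_PCT"] for r in rows) / n,
        "avg_mem_pct": sum(r["MEM_USAGE_PCT"] for r in rows) / n,
        # Capacity-weighted: used space over total space across all hosts.
        "disk_used_pct": 100.0 * used_disk / total_disk,
    }


def out_of_range(indicators: Dict[str, float]) -> Dict[str, str]:
    """Map each indicator outside its normal range to 'high' or 'low'."""
    verdict: Dict[str, str] = {}
    for name, value in indicators.items():
        lower, upper = THRESHOLDS[name]
        if value > upper:
            verdict[name] = "high"
        elif value < lower:
            verdict[name] = "low"
    return verdict
```

An empty result corresponds to the "within the normal threshold range" case, in which collection and receiving simply continue.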

Based on the data obtained by the analysis module 822, the processing module 823 proceeds as follows. If the current indicator data is above the normal threshold range, the current system resource usage rate is high and there is a risk of insufficient resources. If the indicator data stays above the threshold for a period of time, the list of all devices is read, an unstarted first node device is found, and a boot command is sent to its IPMI. IPMI separates platform management tasks from the system management software and decouples low-level server management from higher-level software, so it does not depend on how the operating system is managed; by specifying a generic, streamlined, message-based interface that delivers messages to the device's BMC component, IPMI can power a device on remotely. According to the boot configuration of the system, the Hadoop components already installed on the first node device start with the operating system and join the cluster, increasing the system resources available in the cluster. Conversely, if the current indicator data is below the normal threshold range, the current system resource usage rate is low and resources are being wasted. If the indicator data stays below the threshold for a period of time, the list of all devices is read, an activated second node device is found, and a shutdown command is sent to its IPMI; the BMC component of the device carries out the remote shutdown.
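The "stays above (or below) the threshold for a period of time" condition and the IPMI call can be sketched as follows. `SustainedCondition` reports true only after its condition has held without interruption for `window` seconds, and a single in-range sample resets it. The document does not name an IPMI tool; as an assumption, the hypothetical helper `ipmitool_power` builds an `ipmitool` command line for IPMI over LAN (`-I lanplus`), with `-E` reading the BMC password from the `IPMI_PASSWORD` environment variable. `chassis power on` boots a node, `chassis power soft` asks the operating system for a graceful (ACPI) shutdown, and `chassis power off` cuts power immediately.

```python
from typing import List, Optional


class SustainedCondition:
    """Becomes true only after a condition has held continuously for `window` seconds."""

    def __init__(self, window: float) -> None:
        self.window = window
        self.since: Optional[float] = None  # when the current streak started

    def update(self, now: float, condition: bool) -> bool:
        if not condition:
            self.since = None  # any in-range sample breaks the streak
            return False
        if self.since is None:
            self.since = now
        return now - self.since >= self.window


def ipmitool_power(bmc_host: str, user: str, action: str) -> List[str]:
    """Build an ipmitool command for IPMI over LAN; the password comes from IPMI_PASSWORD (-E)."""
    if action not in ("on", "soft", "off"):
        raise ValueError(f"unsupported power action: {action}")
    return ["ipmitool", "-I", "lanplus", "-H", bmc_host, "-U", user, "-E",
            "chassis", "power", action]
```

The returned list can be passed to `subprocess.run(cmd, check=True)` once the detector fires; building the command separately keeps the decision logic testable without a BMC.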

This embodiment proposes a system that automatically increases or decreases the number of nodes in a Hadoop system based on IPMI. The acquisition module of each node in the cluster obtains that node's performance data and reports it to the management device, which starts or stops target devices according to the analysis result, thereby dynamically and automatically adjusting the number of node devices in the cluster.

The above is a further detailed description of the present invention in connection with specific embodiments, and the specific implementation of the present invention shall not be regarded as limited to these descriptions. Those skilled in the art may make various changes without departing from the spirit and scope of the invention.

Industrial applicability

The technical solution provided by the embodiments of the present invention can be used for adjusting the number of nodes in a system. The method provided by the embodiments includes: receiving the current hardware usage status information sent by each node device in the cluster; obtaining, according to the received information, indicator information describing the current usage status of the cluster; and adjusting the number of activated node devices in the cluster according to the indicator information. The number of node devices in the Hadoop cluster is thus adjusted dynamically according to the current usage status of the cluster to meet the cluster's demand for node device resources, which improves the utilization of node devices in the Hadoop system, reduces unnecessary resource waste, and solves the technical problem of insufficient node device resources in the Hadoop system.

Claims

1. A method for adjusting the number of nodes in a system, comprising the following steps:

Receiving current hardware usage status information sent by each node device in the cluster, where the node device is an activated node device;

Obtaining indicator information for describing a current usage status of the cluster according to the received hardware usage status information;

Adjusting the number of node devices that have been started in the cluster according to the indicator information.
2. The method of claim 1, wherein the step of adjusting the number of node devices activated in the cluster according to the indicator information comprises:

Determining, according to the indicator information, whether the device resource usage rate of the cluster is too high or too low,

If the device resource usage rate of the cluster is too low, reducing the number of node devices that are started in the cluster;

If the device resource usage rate of the cluster is too high, increasing the number of node devices that have been started in the cluster.
3. The method of claim 2, wherein the step of increasing the number of node devices activated within the cluster comprises:

Controlling an unstarted node device to boot and join the cluster, to increase the number of node devices that have been started in the cluster;

The step of reducing the number of node devices that have been started in the cluster includes:

The node devices that have been started within the cluster are controlled to be shut down to reduce the number of node devices that have been started within the cluster.
4. The method of claim 3, wherein the step of controlling the unstarted node device to boot and join the cluster to increase the number of node devices activated within the cluster comprises:

Controlling, through the Intelligent Platform Management Interface (IPMI) of the unstarted node device, the unstarted node device to boot and join the cluster, to increase the number of node devices that have been started in the cluster;

The step of controlling the started node device in the cluster to shut down, to reduce the number of started node devices in the cluster, comprises:

Shutting down the activated node device through the IPMI of that node device, to reduce the number of node devices that have been started within the cluster.
5. The method of claim 4, wherein the step of controlling, through the IPMI of the unstarted node device, the unstarted node device to boot and join the cluster comprises:

Sending a boot command to the IPMI of the unactivated node device to enable the unstarted node device to boot and join the cluster;

The step of controlling, by the IPMI of the node device that has been started in the cluster, that the activated node device is shut down includes:

A shutdown command is sent to the IPMI of the node device that has been started within the cluster to cause the activated node device to shut down.
6. The method of claim 2, wherein the indicator information comprises at least one indicator, and the step of determining, according to the indicator information, whether the device resource usage rate of the cluster is too high or too low comprises:

Comparing the values of each indicator with their respective preset lower thresholds and their respective preset upper thresholds;

Determining, according to the comparison result, whether the device resource usage rate of the cluster is too high or too low.
7. The method of claim 6, wherein the step of determining, according to the comparison result, whether the device resource usage rate of the cluster is too high or too low comprises:

When the value of at least one indicator stays below its corresponding preset lower threshold throughout the preset time period, determining that the device resource usage rate of the cluster is too low;

When the value of at least one indicator stays above its corresponding preset upper threshold throughout the preset time period, determining that the device resource usage rate of the cluster is too high.
8. The method according to any one of claims 1 to 7, wherein the hardware usage status information comprises: at least one of a device disk space usage rate, a current CPU usage rate of the device, and a current memory usage rate of the device;

Correspondingly, the indicator information includes: at least one of a total disk space usage rate in the cluster, an average CPU usage rate of the devices in the cluster, and an average memory usage rate of the devices in the cluster.
9. A method for adjusting the number of nodes in a system, the method being applied to a node device in a system cluster and comprising the following steps:

After the node device is started, collecting hardware usage status information of the node device itself;

Sending the collected hardware usage status information to the receiving end, so that the receiving end obtains, according to the hardware usage status information, indicator information describing the current usage status of the cluster, and then adjusts the number of activated node devices in the cluster according to the indicator information.
10. The method of claim 9, wherein the step of sending the collected hardware usage information comprises:

Sending the collected hardware usage status information through the Java Management Extensions (JMX) framework used by the system.
11. The method of claim 9 or 10, wherein the method further comprises:

Receiving a shutdown command through the Intelligent Platform Management Interface (IPMI), and shutting down the node device after the shutdown command is received.
12. An apparatus for adjusting the number of nodes in a system, comprising: a receiving module, an analyzing module, and a processing module;

The receiving module is configured to receive current hardware usage status information sent by each node device in the cluster, where the node device is an activated node device;

The analyzing module is configured to obtain, according to the received hardware usage status information, indicator information used to describe a current usage status of the cluster;

The processing module is configured to adjust the number of node devices that have been started in the cluster according to the indicator information.
13. The adjusting device according to claim 12, wherein the processing module comprises: a determining module and an adjusting module;

The determining module is configured to determine, according to the indicator information, whether a device resource usage rate of the cluster is too high or too low;

The adjustment module is set to:

When the determining module determines that the device resource usage rate of the cluster is too low, control an activated node device in the cluster to shut down, to reduce the number of node devices that are started in the cluster;

When the determining module determines that the device resource usage rate of the cluster is too high, control an unstarted node device to boot and join the cluster, to increase the number of node devices that are started in the cluster.
14. The adjusting device according to claim 13, wherein the adjusting module is configured to:

Control, through the IPMI of an unstarted node device, the unstarted node device to boot and join the cluster, to increase the number of node devices that are started in the cluster;

Shut down an activated node device through its IPMI, to reduce the number of node devices that have been started within the cluster.
15. An apparatus for adjusting the number of nodes in a system, the adjusting apparatus being applied to a node device in a system cluster and comprising: an acquisition module and a sending module;

The acquisition module is configured to collect hardware usage status information of the node device itself after the node device is started;

The sending module is configured to send the collected hardware usage status information, so that the receiving end obtains, according to the hardware usage status information, indicator information describing the current usage status of the cluster, and then adjusts the number of node devices that have been started in the cluster according to the indicator information.
16. The adjustment apparatus according to claim 15, further comprising: an Intelligent Platform Management Interface (IPMI) and a control module;

The IPMI is configured to receive an externally transmitted shutdown command and pass the shutdown command to the control module;

The control module is configured to shut down the node device after receiving the shutdown command.