WO2016165242A1 - Method of adjusting number of nodes in system and device utilizing same - Google Patents


Info

Publication number
WO2016165242A1
Authority
WO
WIPO (PCT)
Application number
PCT/CN2015/086049
Inventor
刘建鹏 (Liu Jianpeng)
尤元建 (You Yuanjian)
Original Assignee
ZTE Corporation (中兴通讯股份有限公司)
Application filed by ZTE Corporation
Publication of WO2016165242A1


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00: Arrangements for program control, e.g. control units
    • G06F9/06: Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46: Multiprogramming arrangements
    • G06F9/50: Allocation of resources, e.g. of the central processing unit [CPU]

Definitions

  • The present invention relates to the field of distributed storage technologies, and in particular to a method and an apparatus for adjusting the number of nodes in a system.
  • Hadoop is an open-source software framework for distributed processing of large amounts of data.
  • The storage core of Hadoop is the Hadoop Distributed File System (HDFS).
  • HDFS is designed to run on general-purpose hardware and is deployed on a large number of devices to support large-scale data sets and high-throughput data access; it achieves high fault tolerance by keeping multiple replicas of the data distributed across different devices.
  • The devices on which HDFS is deployed form a Hadoop cluster.
  • In current Hadoop systems, the number of nodes is fixed when the system is first put into use and cannot be dynamically reduced or increased according to the actual amount of data. This results either in unnecessary waste of node device resources or in insufficient node device resources in the Hadoop system.
  • The main technical problem to be solved by the embodiments of the present invention is to provide a method and a device for adjusting the number of nodes in a system, which solve the problems that node device resources in current Hadoop systems cannot be dynamically adjusted and that node device resources in the Hadoop system may be insufficient.
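  • An embodiment of the present invention provides a method for adjusting the number of nodes in a system, including the following steps: receiving the current hardware usage status information sent by each started node device in the cluster; obtaining, according to the received hardware usage status information, indicator information describing the current usage status of the cluster; and adjusting the number of node devices that have been started in the cluster according to the indicator information.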
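  • The step of adjusting the number of node devices that have been started in the cluster according to the indicator information includes: determining, according to the indicator information, whether the device resource usage rate of the cluster is too high or too low; if it is too low, reducing the number of node devices that have been started in the cluster; and if it is too high, increasing the number of node devices that have been started in the cluster.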
  • The step of increasing the number of node devices that have been started in the cluster includes: controlling a node device that has not been started to start and join the cluster.
  • The step of reducing the number of node devices that have been started in the cluster includes: controlling node devices that have been started within the cluster to shut down, thereby reducing the number of node devices that have been started within the cluster.
  • The step of controlling an unstarted node device to start and join the cluster, thereby increasing the number of node devices that have been started in the cluster, includes: controlling, through the Intelligent Platform Management Interface (IPMI) of the unstarted node device, that node device to start and join the cluster.
  • The step of controlling the started node devices in the cluster to shut down, thereby reducing the number of node devices in the cluster, includes: shutting down a started node device in the cluster through its IPMI, thereby reducing the number of node devices that have been started within the cluster.
  • The step of controlling, through the IPMI of an unstarted node device, that node device to start and join the cluster includes: sending a power-on command to the IPMI of the unstarted node device so that it starts and joins the cluster.
  • The step of shutting down a started node device in the cluster through its IPMI includes: sending a shutdown command to the IPMI of the node device that has been started within the cluster, causing that node device to shut down.
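  • The indicator information includes at least one indicator.
  • The step of determining, according to the indicator information, whether the device resource usage rate of the cluster is too high or too low includes: comparing the value of each indicator with a corresponding preset lower threshold and a corresponding preset upper threshold, and determining according to the comparison result whether the device resource usage rate of the cluster is too high or too low.
  • The step of determining, according to the comparison result, whether the device resource usage rate of the cluster is too high or too low includes: if the value of at least one indicator is less than the corresponding lower threshold, determining that the device resource usage rate of the cluster is too low; and if the value of at least one indicator is greater than the corresponding upper threshold, determining that the device resource usage rate of the cluster is too high.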
  • The hardware usage status information includes at least one of: the device's disk space usage rate, the device's current CPU usage rate, and the device's current memory usage rate.
  • The indicator information includes at least one of: the total disk space usage rate in the cluster, the average CPU usage rate of the devices in the cluster, and the average memory usage rate of the devices in the cluster.
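  • The present invention also provides another method for adjusting the number of nodes in a system.
  • This method is applied to a node device in a system cluster and includes the following steps: after the node device is started, collecting the hardware usage status information of the node device itself; and sending the collected hardware usage status information so that the receiving end obtains, according to it, indicator information describing the current usage status of the cluster and then adjusts the number of node devices that have been started in the cluster according to the indicator information.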
  • The step of sending the collected hardware usage status information includes: sending the collected hardware usage status information through the JMX (Java Management Extensions) framework used by the system.
  • An embodiment of the present invention further provides an apparatus for adjusting the number of nodes in a system, including a receiving module, an analyzing module, and a processing module;
  • The receiving module is configured to receive the current hardware usage status information sent by each node device in the cluster, where each such node device is a started node device;
  • The analyzing module is configured to obtain, according to the received hardware usage status information, indicator information describing the current usage status of the cluster;
  • The processing module is configured to adjust the number of node devices that have been started in the cluster according to the indicator information.
  • The processing module includes a determining module and an adjusting module;
  • The determining module is configured to determine, according to the indicator information, whether the device resource usage rate of the cluster is too high or too low;
  • The adjusting module is configured to:
  • when the determining module determines that the device resource usage rate of the cluster is too low, control a node device that has been started in the cluster to shut down, reducing the number of node devices that have been started in the cluster;
  • when the determining module determines that the device resource usage rate of the cluster is too high, control an unstarted node device to start and join the cluster, increasing the number of node devices that have been started in the cluster.
  • Specifically, the adjusting module is configured to:
  • shut down a started node device in the cluster through its IPMI, reducing the number of node devices that have been started within the cluster.
  • The present invention further provides another apparatus for adjusting the number of nodes in a system. The apparatus is applied to a node device in a system cluster and includes a collecting module and a sending module;
  • The collecting module is configured to collect the hardware usage status information of the node device itself after the node device is started;
  • The sending module is configured to send the collected hardware usage status information, so that the receiving end obtains, according to that information, indicator information describing the current usage status of the cluster and then adjusts the number of node devices that have been started in the cluster according to the indicator information.
  • The adjusting apparatus further includes an IPMI and a control module;
  • The IPMI is configured to receive an externally transmitted shutdown command and pass it to the control module;
  • The control module is configured to shut down the node device after receiving the shutdown command.
  • Embodiments of the present invention provide a method and an apparatus for adjusting the number of nodes in a system. The method includes: receiving the current hardware usage status information sent by each started node device in the cluster; obtaining, according to that information, indicator information describing the current usage status of the cluster; and adjusting the number of node devices that have been started in the cluster according to the indicator information.
  • The adjustment method of the present invention can dynamically adjust the number of node devices in a Hadoop cluster according to the current usage status of the cluster, meeting the cluster's demand for node device resources. This improves the utilization of node devices in the Hadoop system, reduces unnecessary resource waste, and solves the technical problem of insufficient node device resources in the Hadoop system.
  • FIG. 1 is a schematic flowchart of a method for adjusting the number of nodes in a Hadoop system according to Embodiment 1 of the present invention.
  • FIG. 2 is a schematic flowchart of another method for adjusting the number of nodes in a Hadoop system according to Embodiment 1 of the present invention.
  • FIG. 3 is a schematic flowchart of a method for adjusting the number of nodes in a Hadoop system according to Embodiment 2 of the present invention.
  • FIG. 4 is a schematic structural diagram of an apparatus for adjusting the number of nodes in a Hadoop system according to Embodiment 3 of the present invention.
  • FIG. 5 is a schematic structural diagram of another apparatus for adjusting the number of nodes in a Hadoop system according to Embodiment 3 of the present invention.
  • FIG. 6 is a schematic structural diagram of an apparatus for adjusting the number of nodes in a Hadoop system according to Embodiment 4 of the present invention.
  • FIG. 7 is a schematic structural diagram of another apparatus for adjusting the number of nodes in a Hadoop system according to Embodiment 4 of the present invention.
  • FIG. 8 is a schematic structural diagram of a system for adjusting the number of nodes in a Hadoop system according to Embodiment 5 of the present invention.
  • The present invention provides a method and a device for adjusting the number of nodes in a system, which can dynamically adjust the number of node devices in the system cluster according to the current usage status of the cluster.
  • The adjustment method and apparatus of the present invention are described below taking the number of nodes in a Hadoop system as an example. It should be understood that the method and apparatus of these embodiments are not limited to dynamically adjusting the number of nodes in a Hadoop system and can dynamically adjust the number of nodes in other systems.
  • Embodiment 1:
  • This embodiment provides a method for adjusting the number of nodes in a Hadoop system. As shown in FIG. 1, the method may include the following steps:
  • Step 101: Receive the current hardware usage status information sent by each node device in the cluster, where each such node device is a started node device.
  • Here the cluster is a Hadoop cluster; each node device in the cluster collects its own hardware usage status information and then sends it out.
  • Every node device in the Hadoop cluster has been started, and each node device installs the HDFS component after startup so that Hadoop can perform data access and related operations.
  • The hardware usage status information in this embodiment may include at least one of: the current disk space usage rate of the device (i.e., the percentage of the device's disk space currently in use), the current CPU usage rate of the device (i.e., its current CPU usage percentage), and the current memory usage rate of the device (i.e., its current memory usage percentage).
  • The hardware usage status information sent by the node devices may be received in several ways, for example over a network; specifically, the receiving end may communicate with the node devices using the Transmission Control Protocol (TCP)/Internet Protocol (IP) to receive the hardware usage status information they collect.
  • Step 102: Acquire, according to the received hardware usage status information, indicator information describing the current usage status of the cluster.
  • Specifically, the hardware usage status information sent by the nodes may be combined by calculation to obtain the indicator information describing the current usage status of the cluster.
  • A cluster is a set of node devices, so in this embodiment the indicator information describing its current usage status can be obtained from the hardware usage status information of the devices in the cluster, and each indicator corresponds to an item of the node devices' hardware usage status information. For example, when the indicator information is the total disk space usage rate in the cluster, the hardware usage status information collected by the node devices may include each device's disk space usage rate.
  • The indicator information in this embodiment may include at least one of: the total disk space usage rate in the cluster (i.e., the percentage of the cluster's total disk space that is in use), the average CPU usage rate of the devices in the cluster (i.e., their average CPU usage percentage), and the average memory usage rate of the devices in the cluster (i.e., their average memory usage percentage).
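The aggregation in Step 102 can be sketched in Python. This is a minimal illustration under stated assumptions, not the patent's implementation: it assumes each node reports raw used and total disk bytes alongside its CPU and memory percentages, because an exact cluster-wide disk usage rate cannot be derived from per-node percentages alone unless every node has the same capacity. The `NodeReport` class, its field names and the indicator keys are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass
class NodeReport:
    """Hypothetical per-node report; the field names are illustrative only."""
    node_id: str
    disk_used_bytes: int
    disk_total_bytes: int
    cpu_percent: float  # current CPU usage of the node, 0-100
    mem_percent: float  # current memory usage of the node, 0-100


def cluster_indicators(reports: List[NodeReport]) -> Dict[str, float]:
    """Aggregate reports from started nodes into cluster-level indicators (percent)."""
    if not reports:
        raise ValueError("no started node devices have reported")
    used = sum(r.disk_used_bytes for r in reports)
    total = sum(r.disk_total_bytes for r in reports)
    return {
        # Total disk usage rate: used space over capacity across the whole cluster.
        "disk_total_percent": 100.0 * used / total,
        # Average CPU and memory usage rates across the started nodes.
        "cpu_avg_percent": mean(r.cpu_percent for r in reports),
        "mem_avg_percent": mean(r.mem_percent for r in reports),
    }
```

Weighting disk usage by capacity means a large, mostly empty disk lowers the cluster total more than a small one would, which matches the notion of a "total" rather than an "average" disk usage rate.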
  • Step 103: Adjust the number of node devices that have been started in the cluster according to the indicator information.
  • The method of this embodiment may reduce or increase the number of node devices that have been started in the cluster according to the indicator information.
  • For brevity, adjusting (e.g., increasing or decreasing) the number of node devices in a cluster, as used herein, means adjusting (e.g., increasing or decreasing) the number of node devices that have been started in the cluster.
  • Specifically, the method of this embodiment can determine, according to the indicator information, whether the device resource usage rate of the cluster is too high or too low: if it is too low, the number of node devices that have been started in the cluster is reduced; if it is too high, the number of node devices that have been started in the cluster is increased.
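  • In this embodiment, the step of determining, according to the indicator information, whether the device resource usage rate of the cluster is too high or too low includes: comparing the value of each indicator with the corresponding preset lower threshold and the corresponding preset upper threshold, and determining according to the comparison result whether the device resource usage rate of the cluster is too high or too low.
  • The step of determining, according to the comparison result, whether the device resource usage rate of the cluster is too high or too low includes: if the value of at least one indicator is less than the corresponding preset lower threshold, determining that the device resource usage rate of the cluster is too low; and if the value of at least one indicator is greater than the corresponding preset upper threshold, determining that the device resource usage rate of the cluster is too high.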
  • The method of this embodiment may also take a time factor into account. After the value of each indicator is compared with the corresponding preset lower and upper thresholds, if the value of at least one indicator remains below its lower threshold for a preset period, the device resource usage rate of the cluster is determined to be too low; if the value of at least one indicator remains above its upper threshold for a preset period, the device resource usage rate of the cluster is determined to be too high.
  • For example, if the average CPU usage rate of the devices in the cluster stays below the preset lower threshold for one minute, the device resource usage rate of the cluster is determined to be too low; if it stays above the preset upper threshold for one minute, the device resource usage rate of the cluster is determined to be too high.
  • For example, the indicator information may include multiple indicators: the total disk space usage rate in the cluster (i.e., the percentage of the cluster's total disk space in use), the average CPU usage rate of the devices in the cluster (i.e., their average CPU usage percentage), and the average memory usage rate of the devices in the cluster. The value of each indicator can be compared with the corresponding preset lower threshold. If the value of at least one indicator is less than its lower threshold, for example if the total disk space usage rate in the cluster and/or the average memory usage rate of the devices in the cluster is below the preset lower threshold, the device resource usage rate of the cluster is too low, which means resources are being wasted.
  • If the value of at least one indicator is greater than its preset upper threshold (for example, if the total disk space usage rate in the cluster and/or the average memory usage rate of the devices in the cluster is greater than the preset upper threshold), the device resource usage rate of the cluster is too high. There is then a risk of insufficient device resources, and node devices need to be added to the cluster.
  • A unified lower threshold and upper threshold may be set in advance for all indicators.
  • Alternatively, different upper and lower thresholds may be set for different indicators.
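The threshold comparison described above can be sketched as a small classifier that accepts either per-indicator thresholds or one unified pair. The indicator names and threshold values below are hypothetical placeholders, not values from the source, and checking "too high" before "too low" when both conditions hold at once is an assumption; the source does not say how that case is resolved.

```python
from typing import Dict, Optional, Tuple

# Hypothetical (lower, upper) thresholds in percent, one pair per indicator.
# A unified pair is expressed by giving every indicator the same values.
DEFAULT_THRESHOLDS: Dict[str, Tuple[float, float]] = {
    "disk_total_percent": (30.0, 80.0),
    "cpu_avg_percent": (20.0, 85.0),
    "mem_avg_percent": (25.0, 85.0),
}


def classify(indicators: Dict[str, float],
             thresholds: Dict[str, Tuple[float, float]] = DEFAULT_THRESHOLDS) -> Optional[str]:
    """Return "too_high", "too_low", or None when every indicator is within range."""
    # Assumption: a shortage risk is checked first, so it wins if both conditions hold.
    if any(indicators[name] > upper for name, (_, upper) in thresholds.items()):
        return "too_high"
    if any(indicators[name] < lower for name, (lower, _) in thresholds.items()):
        return "too_low"
    return None
```

For a unified threshold, pass for example `{name: (20.0, 80.0) for name in DEFAULT_THRESHOLDS}` as `thresholds`.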
  • In this embodiment, the preferred way to increase the number of node devices in the cluster is to control an unstarted node device to start and join the cluster.
  • Specifically, an unstarted node device can be controlled through its IPMI to start and join the cluster, increasing the number of node devices in the cluster.
  • The unstarted node device in this embodiment may be either an unstarted node device within the cluster or an unstarted node device outside the cluster.
  • In this embodiment, the preferred way to reduce the number of node devices in the cluster is to control node devices that have been started in the cluster to shut down.
  • Specifically, a started node device in the cluster can be controlled to shut down through its IPMI (Intelligent Platform Management Interface), reducing the number of node devices in the cluster.
  • The adjustment method of this embodiment can dynamically adjust the number of node devices in the Hadoop cluster according to the current usage status of the cluster.
  • When the device resource usage rate of the cluster is too high, node devices are added to the cluster to avoid insufficient node resources in the Hadoop system.
  • When the device resource usage rate of the cluster is too low, node devices in the cluster are reduced to avoid unnecessary waste of device resources.
  • Specifically, the method for adjusting the number of nodes in the Hadoop system of this embodiment may include the following steps:
  • Step 201: Receive the current hardware usage status information sent by each node device in the cluster, where each such node device is a started node device.
  • Specifically, the hardware usage status information transmitted by the node devices through the JMX framework may be received; the received information is JMX information.
  • The method of this embodiment can receive the hardware usage status information transmitted by the node devices in multiple ways, for example by receiving the JMX information over a network.
  • The received JMX information must first be processed according to the network protocol (parsing, CRC check, calculation, and other processing) to restore the original hardware usage status information.
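As an illustration of the parse-and-CRC-check step, the sketch below uses a hypothetical frame consisting of a length prefix, a CRC32 checksum and a JSON payload. This is not the JMX wire format (JMX remote access normally runs over an RMI connector); it only shows, under that assumption, how a receiver might verify integrity before restoring the original usage information. `encode_report`, `decode_report` and the frame layout are illustrative names and choices.

```python
import json
import struct
import zlib

# Hypothetical frame layout: 4-byte payload length, 4-byte CRC32 (both big-endian),
# followed by a UTF-8 JSON payload carrying one node's hardware usage report.
HEADER = struct.Struct(">II")


def encode_report(report: dict) -> bytes:
    payload = json.dumps(report, sort_keys=True).encode("utf-8")
    return HEADER.pack(len(payload), zlib.crc32(payload)) + payload


def decode_report(frame: bytes) -> dict:
    """Parse a frame, verify its CRC32 and restore the original report."""
    if len(frame) < HEADER.size:
        raise ValueError("truncated header")
    length, crc = HEADER.unpack_from(frame)
    payload = frame[HEADER.size:HEADER.size + length]
    if len(payload) != length:
        raise ValueError("truncated payload")
    if zlib.crc32(payload) != crc:
        raise ValueError("CRC mismatch: frame corrupted in transit")
    return json.loads(payload.decode("utf-8"))
```

The CRC is checked before JSON decoding, so a corrupted frame is rejected without ever being interpreted as usage data.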
  • The system uses multithreading to ensure that data is processed quickly, preventing buffer overflow and data loss and thereby meeting real-time processing requirements.
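One common way to realize the multithreaded processing mentioned above is a bounded queue drained by a pool of worker threads, sketched here with Python's standard `queue` and `threading` modules. The pool size, the queue bound and the `handle` callback are assumptions for illustration; the source does not describe its threading design. A bounded queue makes the receiver wait when workers fall behind instead of letting the buffer grow without limit.

```python
import queue
import threading


def start_workers(handle, num_workers=4, maxsize=1000):
    """Start worker threads that drain a bounded queue of received frames."""
    # put() blocks when the queue is full (backpressure) rather than dropping data.
    inbox = queue.Queue(maxsize=maxsize)

    def worker():
        while True:
            frame = inbox.get()
            try:
                if frame is None:  # sentinel: this worker should exit
                    return
                handle(frame)  # handle() should catch its own errors; an uncaught one ends this worker
            finally:
                inbox.task_done()

    threads = [threading.Thread(target=worker, daemon=True) for _ in range(num_workers)]
    for t in threads:
        t.start()
    return inbox, threads


def stop_workers(inbox, threads):
    """Send one sentinel per worker and wait for all of them to finish."""
    for _ in threads:
        inbox.put(None)
    for t in threads:
        t.join()
```

A network receiver would call `inbox.put(frame)` for each frame it reads, leaving parsing and aggregation to the workers.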
  • Step 202: Acquire, according to the received hardware usage status information, indicator information describing the current usage status of the cluster, where the indicator information includes at least one indicator.
  • For example, the hardware usage status information sent by each node device in the cluster includes: the device's disk space usage rate, current CPU usage rate, and current memory usage rate.
  • The indicator information describing the current usage status of the cluster, calculated from that hardware usage status information, includes: the total disk space usage rate in the cluster, the average CPU usage rate of the devices in the cluster, and the average memory usage rate of the devices in the cluster.
  • Step 203: Compare the value of each indicator with the corresponding preset lower threshold and the corresponding preset upper threshold.
  • Step 204: Determine, according to the comparison result, whether the device resource usage rate of the cluster is too high or too low. If it is too high, go to Step 205; if it is too low, go to Step 206.
  • Specifically, this step can determine whether the device resource usage rate of the cluster is too high or too low according to the total disk space usage rate in the cluster, the average CPU usage rate of the devices in the cluster, and the average memory usage rate of the devices in the cluster.
  • Taking as an example the case in which all indicators use the same preset range, each indicator is compared with the upper and lower thresholds of that range: when the value of at least one indicator is greater than the upper threshold, the device resource usage rate is determined to be too high; when the value of at least one indicator is less than the lower threshold, the device resource usage rate is determined to be too low.
  • Alternatively, this step may compare each indicator with the upper and lower thresholds of the preset range and determine that the device resource usage rate is too high when the value of at least one indicator remains greater than the upper threshold throughout a preset period, and too low when the value of at least one indicator remains less than the lower threshold throughout the preset period.
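The time-based variant can be sketched as a small tracker that reports true only after a condition has held continuously for a preset window, such as the one-minute window in the example given earlier for the average CPU usage rate. The `SustainedCondition` class is hypothetical, and the sketch assumes samples arrive often enough that a brief excursion between two samples would not go unnoticed.

```python
import time
from typing import Optional


class SustainedCondition:
    """Reports True once a condition has held continuously for `window_s` seconds."""

    def __init__(self, window_s: float = 60.0) -> None:
        self.window_s = window_s
        self.since: Optional[float] = None  # when the condition last became true

    def update(self, condition: bool, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        if not condition:
            self.since = None  # any sample outside the condition resets the window
            return False
        if self.since is None:
            self.since = now
        return now - self.since >= self.window_s
```

In use, one tracker would be fed "some indicator is above its upper threshold" and another "some indicator is below its lower threshold" at every sampling interval, and Step 205 or Step 206 would run only when the corresponding tracker returns `True`.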
  • Step 205: Through the IPMI of an unstarted node device, control that node device to start and join the cluster, increasing the number of node devices that have been started in the cluster.
  • One or more node devices may be added; the number added may be preset (for example, one by default) or determined according to the indicator information.
  • Specifically, this step may send a power-on command to the IPMI of the unstarted node device so that it starts and joins the cluster, increasing the number of node devices in the cluster.
  • For example, the IPMI address list of unstarted DataNode nodes can be read and the IPMI addresses of a preset number of node devices selected.
  • A power-on command is then sent to each corresponding device, and after the node device has started, its state is recorded by moving it from the list of unstarted DataNode nodes to the list of started nodes. The unstarted device receives the power-on command through its IPMI, which passes the command to the BMC (Baseboard Management Controller) component in the node device to start it.
  • The node device then starts automatically with the HDFS component, which has already been installed and configured as a system boot entry; the component therefore starts with the operating system and joins the cluster. Automatically adding node devices in this way expands the Hadoop environment and solves the problem of insufficient resources in the original environment.
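The power-on request in Step 205 (and the shutdown in Step 206) can be illustrated with the widely used `ipmitool` command-line tool over IPMI-over-LAN. This is a swapped-in example, not necessarily how the patented system sends its commands (Step 206 states that the shutdown command is sent over HTTP). The hypothetical helper below only builds the argument list; it uses `-E` so that `ipmitool` reads the password from the `IPMI_PASSWORD` environment variable instead of the command line.

```python
from typing import List

# Power actions accepted by `ipmitool chassis power`; "soft" requests a graceful
# (ACPI) shutdown from the operating system, "off" cuts power immediately.
_ACTIONS = ("on", "off", "soft", "status")


def ipmi_power_cmd(host: str, user: str, action: str) -> List[str]:
    """Build an ipmitool argv for a remote chassis power action (IPMI over LAN)."""
    if action not in _ACTIONS:
        raise ValueError(f"unsupported power action: {action!r}")
    # -E makes ipmitool read the password from the IPMI_PASSWORD environment variable.
    return ["ipmitool", "-I", "lanplus", "-H", host, "-U", user, "-E",
            "chassis", "power", action]
```

A caller could run the result with `subprocess.run(cmd, env={**os.environ, "IPMI_PASSWORD": secret}, check=True)`, using `"on"` for Step 205 and `"soft"` for a graceful shutdown in Step 206; the host address and user name would come from the IPMI address list.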
  • Step 206: Through the IPMI of a node device that has been started in the cluster, control that node device to shut down, reducing the number of node devices that have been started in the cluster.
  • One or more node devices may be removed by the method of this embodiment; the number removed may be preset (for example, one node device by default) or determined according to the indicator information.
  • Specifically, this step may send a shutdown command to the IPMI of a started node device in the cluster to shut it down.
  • For example, the IPMI address list of started DataNode nodes can be read and the IPMI addresses of a predetermined number of node devices selected.
  • The shutdown command is sent to each selected device over HTTP and passed to the BMC component in the node device, which performs the shutdown. After the node device has shut down, its state is recorded by moving it from the list of started DataNode nodes to the list of unstarted nodes. Once the device has been shut down through IPMI, the node is temporarily removed from the Hadoop environment, which solves the problem of wasted node resources in the environment.
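The list bookkeeping in Steps 205 and 206 can be sketched as a pool object that moves IPMI addresses between a started list and an unstarted list. The `NodePool` class, the `min_started` floor and the callback signatures are hypothetical; the callbacks stand in for whatever sends the actual IPMI command and waits for the node to come up or go down. The default count of one mirrors the "one by default" example in the text.

```python
from typing import Callable, List


class NodePool:
    """Hypothetical bookkeeping of DataNode IPMI addresses in started/unstarted lists."""

    def __init__(self, started: List[str], unstarted: List[str]) -> None:
        self.started = list(started)
        self.unstarted = list(unstarted)

    def scale_out(self, power_on: Callable[[str], None], count: int = 1) -> List[str]:
        """Start up to `count` unstarted nodes and record them as started."""
        chosen = self.unstarted[:max(0, count)]
        for addr in chosen:
            power_on(addr)  # assumed to return only after the node has booted
            self.unstarted.remove(addr)
            self.started.append(addr)
        return chosen

    def scale_in(self, power_off: Callable[[str], None], count: int = 1,
                 min_started: int = 1) -> List[str]:
        """Shut down up to `count` started nodes, keeping at least `min_started` running."""
        n = max(0, min(count, len(self.started) - min_started))
        chosen = self.started[len(self.started) - n:]
        for addr in chosen:
            power_off(addr)  # assumed to return only after the node is off
            self.started.remove(addr)
            self.unstarted.append(addr)
        return chosen
```

The `min_started` floor keeps a run of "too low" decisions from powering off every node. In a real HDFS deployment one would also usually decommission a DataNode, so that its blocks are re-replicated, before powering it off; the source does not describe such a step.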
  • In summary, the method of this embodiment automatically increases or decreases the number of nodes in the Hadoop system based on IPMI. A collector in each node of the cluster obtains that node's performance data and reports it to the system through the JMX framework; the system integrates and analyzes the information and, according to the analysis result, uses IPMI to start or stop the target devices, thereby dynamically and automatically adjusting the number of devices in the cluster.
  • Embodiment 2:
  • This embodiment provides a method for adjusting the number of nodes in a Hadoop system, which is applied to a node device in a Hadoop system cluster and includes the following steps:
  • Step 301: After the node device is started, collect the hardware usage status information of the node device itself.
  • Specifically, the node device can collect its own hardware usage status information, for example at least one of: the device's disk space usage rate, current CPU usage rate, and current memory usage rate.
  • A data conversion table may be established for the collected hardware usage status information, and the information processed based on that table to form a data table.
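A node-side collector for Step 301 could look like the following sketch. It assumes a Linux node, which the source does not state: CPU usage is derived from two samples of the aggregate `cpu` line of `/proc/stat`, memory usage from `MemTotal` and `MemAvailable` in `/proc/meminfo`, and disk usage from the standard `shutil.disk_usage`. The function names are hypothetical, and the parsers take text arguments so they can be exercised without a live system.

```python
import shutil


def cpu_percent(prev_line: str, cur_line: str) -> float:
    """CPU busy percentage between two samples of the aggregate "cpu" line of /proc/stat."""
    def parse(line: str):
        fields = [int(x) for x in line.split()[1:]]
        # Fields: user nice system idle iowait irq softirq steal [guest guest_nice].
        # guest time is already counted in user/nice, so only the first 8 are summed.
        total = sum(fields[:8])
        idle = fields[3] + (fields[4] if len(fields) > 4 else 0)  # idle + iowait
        return total, idle

    total0, idle0 = parse(prev_line)
    total1, idle1 = parse(cur_line)
    elapsed = total1 - total0
    if elapsed <= 0:
        return 0.0
    return 100.0 * (elapsed - (idle1 - idle0)) / elapsed


def mem_percent(meminfo_text: str) -> float:
    """Memory usage percentage from /proc/meminfo text (needs MemAvailable, Linux 3.14+)."""
    kb = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            kb[key.strip()] = int(parts[0])
    return 100.0 * (kb["MemTotal"] - kb["MemAvailable"]) / kb["MemTotal"]


def disk_percent(path: str = "/") -> float:
    """Disk space usage percentage of the file system containing `path`."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total
```

On a node, `open("/proc/stat").readline()` read twice, about a second apart, supplies the two CPU samples, and `open("/proc/meminfo").read()` supplies the memory text.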
  • Step 302: Send the collected hardware usage status information to the receiving end, so that the receiving end obtains, according to that information, indicator information describing the current usage status of the cluster and then adjusts the number of node devices that have been started in the cluster according to the indicator information.
  • Specifically, this step may send the collected hardware usage status information through the JMX framework used by the Hadoop system.
  • For example, the JMX information can be sent over a network using TCP/IP.
  • With the method of this embodiment, each node device collects its own hardware usage status information so that the management end can dynamically adjust the number of node devices in the cluster. This solves the technical problems that node device resources in current Hadoop systems cannot be dynamically adjusted and that node device resources in the Hadoop system may be insufficient.
  • The method of this embodiment may further include: receiving a shutdown command through the IPMI and shutting down after receiving it.
  • That is, to reduce the number of node devices in the cluster, the shutdown command can be received through the IPMI and passed to the BMC component in the node device, which performs the shutdown.
  • Embodiment 3:
  • This embodiment provides an apparatus for adjusting the number of nodes in a Hadoop system, including a receiving module 401, an analyzing module 402, and a processing module 403;
  • The receiving module 401 is configured to receive the current hardware usage status information sent by each node device in the cluster, where each such node device is a started node device;
  • The analyzing module 402 is configured to obtain, according to the received hardware usage status information, indicator information describing the current usage status of the cluster;
  • The processing module 403 is configured to adjust the number of node devices that have been started in the cluster according to the indicator information.
  • In this embodiment, the processing module 403 includes a determining module 4031 and an adjusting module 4032;
  • The determining module 4031 is configured to determine, according to the indicator information, whether the device resource usage rate of the cluster is too high or too low;
  • The adjusting module 4032 is configured to:
  • when the determining module 4031 determines that the device resource usage rate of the cluster is too low, control a node device that has been started in the cluster to shut down, reducing the number of node devices that have been started in the cluster;
  • when the determining module 4031 determines that the device resource usage rate of the cluster is too high, control an unstarted node device to start and join the cluster, increasing the number of node devices that have been started in the cluster.
  • Specifically, the adjusting module 4032 is configured to:
  • shut down a started node device in the cluster through its IPMI, reducing the number of node devices that have been started within the cluster.
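The module structure of Embodiment 3 can be pictured as a set of small collaborating classes, one per module, sketched below. The class and method names are hypothetical, the comments map each class to the reference numeral it loosely mirrors, and the simplifications are assumptions: unified thresholds for every indicator and an unweighted average of each metric as the analysis step. The start and stop callbacks stand in for the IPMI power-on and shutdown commands.

```python
from statistics import mean
from typing import Callable, Dict, Optional

Usage = Dict[str, float]  # hypothetical metric name -> usage percentage


class ReceivingModule:  # cf. receiving module 401
    def __init__(self) -> None:
        self.latest: Dict[str, Usage] = {}

    def receive(self, node_id: str, usage: Usage) -> None:
        self.latest[node_id] = usage  # keep only the newest report per started node


class AnalyzingModule:  # cf. analyzing module 402
    def indicators(self, latest: Dict[str, Usage]) -> Usage:
        """Unweighted average of each metric over the nodes that reported it."""
        names = set().union(*latest.values())
        return {n: mean(u[n] for u in latest.values() if n in u) for n in names}


class DeterminingModule:  # cf. determining module 4031
    def __init__(self, lower: float, upper: float) -> None:
        self.lower, self.upper = lower, upper  # unified thresholds for every indicator

    def judge(self, ind: Usage) -> Optional[str]:
        if any(v > self.upper for v in ind.values()):
            return "too_high"
        if any(v < self.lower for v in ind.values()):
            return "too_low"
        return None


class AdjustingModule:  # cf. adjusting module 4032
    def __init__(self, start_node: Callable[[], None], stop_node: Callable[[], None]) -> None:
        self.start_node, self.stop_node = start_node, stop_node

    def adjust(self, verdict: Optional[str]) -> None:
        if verdict == "too_high":
            self.start_node()  # e.g. IPMI power-on of an unstarted node
        elif verdict == "too_low":
            self.stop_node()  # e.g. IPMI shutdown of a started node


class ProcessingModule:  # cf. processing module 403
    def __init__(self, determining: DeterminingModule, adjusting: AdjustingModule) -> None:
        self.determining, self.adjusting = determining, adjusting

    def process(self, ind: Usage) -> Optional[str]:
        verdict = self.determining.judge(ind)
        self.adjusting.adjust(verdict)
        return verdict
```

Keeping the determining and adjusting roles separate lets the decision logic be tested without touching any hardware, since only `AdjustingModule` issues commands.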
  • The adjusting apparatus of the present invention can dynamically adjust the number of node devices in the Hadoop cluster according to the current usage status of the cluster and meet the cluster's demand for node device resources, thereby improving the utilization of node devices in the Hadoop system, reducing unnecessary resource waste, and solving the technical problem of insufficient node device resources in the Hadoop system.
  • Embodiment 4:
  • this embodiment provides a device for adjusting the number of nodes in a Hadoop system. The device is applied to a node device in a Hadoop system cluster and includes a collection module 601 and a sending module 602;
  • the collection module 601 is configured to collect hardware usage status information of the node device itself after the node device starts;
  • the sending module 602 is configured to send out the collected hardware usage status information. The receiving end then obtains, according to the hardware usage status information, indicator information describing the current usage status of the cluster, and adjusts the number of started node devices in the cluster according to the indicator information.
  • the sending module 602 of this embodiment is configured to send out the collected hardware usage status information through the JMX framework used by the Hadoop system.
  • the adjustment apparatus of this embodiment further includes: an IPMI interface 604 and a control module 603;
  • the IPMI interface 604 is configured to receive an externally transmitted shutdown command and transmit the shutdown command to the control module;
  • the control module 603 is configured to shut down the node device after receiving the shutdown command.
  • the adjusting device collects hardware usage information about its own node device so that the management end can dynamically adjust the number of node devices in the cluster. This addresses the technical problem of wasted or insufficient node device resources in the Hadoop system, which arises because the number of nodes cannot be adjusted dynamically.
  • Embodiment 5:
  • this embodiment provides a system for adjusting the number of nodes in a Hadoop system. The system includes a plurality of unstarted first node devices 80, a plurality of started second node devices 81 located in the Hadoop system cluster, and a node management device 82;
  • the first node device 80 is provided with an IPMI interface 801;
  • the second node device 81 is provided with an IPMI interface 813, an acquisition module 811, and a sending module 812;
  • the node management device 82 includes: a receiving module 821, an analysis module 822, and a processing module 823;
  • the collection module 811 of the second node device 81 is configured to collect hardware usage status data of the node device where the module is located;
  • the collecting module 812 of the second node device 81 is specifically configured to do the following. It builds a data conversion table for the collected basic data. It processes the host's status data based on that table to form a data table. It then reports the data table through JMX.
  • the data table describes the hardware usage status of the host node. It mainly includes the host's total disk space, the host's used disk space, the current CPU usage percentage, the current memory usage percentage, and the like. The data table is then sent through the sending module.
  • the sending module 813 of the second node device 81 is configured to send the collected data to the receiving module 821 in the node management device 82. Specifically, it may send the data to the receiving module 821 through the JMX framework;
  • the receiving module 821 in the node management device 82 is configured to receive data reported by the sending module 813, such as the data table.
  • the receiving module is responsible for collecting and preliminarily processing the data table. This consists of two parts: receiving the raw data and processing the raw data.
  • the underlying interface may be a network interface that communicates with the sending module of the node device using TCP/IP.
  • raw data from the acquisition module is received through the interface provided by the system and submitted to the data receiving module.
  • the data receiving module parses the raw data according to the protocols of the different devices and performs CRC (Cyclic Redundancy Check) verification, calculation, and processing. After processing, it submits the result to the data analysis module.
  • the analysis module 822 in the node management device 82 is configured to sort the performance data passed on by the receiving module 821. Starting from the data table sent by the collection module, it extracts each type of data according to the keywords in the table. It then applies different rules to different types of data to calculate indicator information that accurately describes the state of the entire cluster. Examples include the average CPU usage of hosts in the cluster, the average memory usage percentage of hosts in the cluster, and the percentage of total disk space used in the cluster. The analysis module then checks whether the indicator data is reasonable by comparing it with the set threshold ranges. If the current indicator data is within the normal threshold range, no further action is taken and the collect-and-receive loop continues. If it is outside the normal threshold range, the result is passed to the processing module.
  • the processing module 823 evaluates the data obtained by the analysis module 822. If the current indicator data is above the normal threshold range, the current system resource usage rate is high and there is a risk of insufficient resources. If the indicator data stays above the threshold for a period of time, the processing module reads the list of all devices, finds an unstarted first node device, and sends a power-on command to its IPMI. IPMI separates platform management tasks from the system management software and separates low-level server management functions from high-level software, so it does not depend on how the operating system is managed. IPMI specifies a generic, streamlined, message-based interface that delivers commands to the device's BMC (Baseboard Management Controller), which allows the device to be powered on remotely.
  • according to the boot-entry configuration of the first node device, the installed Hadoop components start with the operating system and join the cluster, which increases the system resources available in the cluster. Conversely, if the current indicator data is below the normal threshold range, the current system resource usage rate is low and resources are being wasted. If the indicator data stays below the threshold for a period of time, the processing module reads the list of all devices, finds a started second node device, and sends a shutdown command to its IPMI.
  • the remote shutdown command can be implemented through the BMC component of the device.
  • a system for automatically increasing or decreasing the number of nodes in a Hadoop system based on IPMI is proposed.
  • the collection module of each node in the cluster acquires that node's performance data and reports it to the management device. According to the result, the management device controls the target device to start or stop, so that the number of node devices in the cluster is adjusted dynamically and automatically.
  • the technical solution provided by the embodiments of the present invention can be used to adjust the number of nodes in a system. The method for adjusting the number of nodes in a system provided by the embodiments has three steps. First, current hardware usage status information is received from each node device in a cluster. Second, indicator information describing the current usage status of the cluster is obtained from the received hardware usage information. Third, the number of started node devices in the cluster is adjusted according to the indicator information. The number of node devices in a Hadoop cluster can thus be adjusted dynamically to match the cluster's current usage status and meet its demand for node device resources. This improves the utilization of node devices in the Hadoop system, reduces unnecessary resource waste, and solves the technical problem of insufficient node device resources in the Hadoop system.
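The receiving module's preliminary processing described above (parse the raw data, verify its CRC, then pass it to the analysis module) can be sketched in Python as follows. The source does not specify a wire format or CRC variant, so this sketch makes its own assumptions. Each report is assumed to be a JSON-encoded data table followed by a 4-byte big-endian CRC-32 of the payload, computed with the standard-library `zlib.crc32`. The frame layout and the names `encode_frame` and `decode_frame` are hypothetical.

```python
import json
import struct
import zlib


def encode_frame(table: dict) -> bytes:
    """Serialize a data table and append a CRC-32 trailer (hypothetical wire format)."""
    payload = json.dumps(table, sort_keys=True).encode("utf-8")
    # zlib.crc32 returns an unsigned 32-bit value in Python 3.
    return payload + struct.pack(">I", zlib.crc32(payload))


def decode_frame(frame: bytes) -> dict:
    """Verify the CRC-32 trailer and return the parsed data table."""
    if len(frame) < 4:
        raise ValueError("frame too short to contain a CRC trailer")
    payload, trailer = frame[:-4], frame[-4:]
    (expected,) = struct.unpack(">I", trailer)
    if zlib.crc32(payload) != expected:
        # Corrupted report: reject it instead of passing it to the analysis module.
        raise ValueError("CRC mismatch")
    return json.loads(payload.decode("utf-8"))
```

A report whose CRC does not match is rejected before any cluster indicator is computed from it.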

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)
  • Computer And Data Communications (AREA)

Abstract

A method of adjusting a number of nodes in a system and device utilizing the same. The method comprises: receiving current hardware use condition information transmitted by each of node devices in a cluster (101); acquiring, according to the received hardware use information, index information describing a current use condition of the cluster (102); and adjusting, according to the index information, a number of activated node devices in the cluster (103). The adjusting method addresses the current technical problem of node device resource wastage or node device resource shortage in a Hadoop system resulting from being unable to dynamically adjust the number of the nodes in the Hadoop system.

Description

Method and device for adjusting the number of nodes in a system
Technical field
The present invention relates to the field of distributed storage technologies, and in particular to a method and a device for adjusting the number of nodes in a system.
Background
Hadoop is an open-source software framework for distributed processing of large amounts of data. The storage core of Hadoop is the Hadoop Distributed File System (HDFS). HDFS is designed to run on general-purpose hardware. It needs to be deployed on a large number of devices to support large-scale data sets and high-throughput data access. It achieves high fault tolerance through multiple replicas that can be distributed across different devices. The large number of devices on which HDFS is deployed forms a Hadoop cluster.
However, the number of nodes in current Hadoop systems is fixed when the system is first put into use. It cannot be reduced or increased dynamically according to the actual amount of data. The result is either unnecessary waste of node device resources or insufficient node resources in the Hadoop system.
Summary of the invention
The main technical problem solved by the embodiments of the present invention is to provide a method and a device for adjusting the number of nodes in a system. In current Hadoop systems the number of nodes cannot be adjusted dynamically, so node device resources end up either wasted or insufficient. The method and device address this problem.
To solve the above technical problem, an embodiment of the present invention provides a method for adjusting the number of nodes in a system, including the following steps:
receiving current hardware usage status information sent by each node device in a cluster, where the node devices are started node devices;
obtaining, according to the received hardware usage status information, indicator information describing the current usage status of the cluster;
adjusting, according to the indicator information, the number of started node devices in the cluster.
In an embodiment of the present invention, the step of adjusting, according to the indicator information, the number of started node devices in the cluster includes:
determining, according to the indicator information, whether the device resource usage rate of the cluster is too high or too low;
if the device resource usage rate of the cluster is too low, reducing the number of started node devices in the cluster;
if the device resource usage rate of the cluster is too high, increasing the number of started node devices in the cluster.
In an embodiment of the present invention, the step of increasing the number of started node devices in the cluster includes:
controlling an unstarted node device to start and join the cluster, so as to increase the number of started node devices in the cluster;
the step of reducing the number of started node devices in the cluster includes:
controlling a started node device in the cluster to shut down, so as to reduce the number of started node devices in the cluster.
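These steps do not say which node device should be started or shut down. The detailed embodiment says only that the device list is read and an unstarted (or started) device is found. The Python sketch below shows that selection, with two assumptions of its own. `Device` is a hypothetical record with a `started` flag. `min_started` is a hypothetical floor that keeps scale-down from emptying the cluster; the source does not describe it.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Device:
    name: str       # hypothetical device identifier
    bmc_host: str   # address of the device's BMC, used for IPMI commands
    started: bool   # True if the node is running and part of the cluster


def pick_target(devices: List[Device], decision: str,
                min_started: int = 1) -> Optional[Device]:
    """Return the device to start ("high") or shut down ("low"), or None."""
    started = [d for d in devices if d.started]
    if decision == "high":
        idle = [d for d in devices if not d.started]
        return idle[0] if idle else None  # nothing left to start
    if decision == "low" and len(started) > min_started:
        return started[-1]  # keep at least min_started nodes running
    return None
```

The sketch picks the first idle device or the last started one arbitrarily. The source does not specify a selection policy.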
In an embodiment of the present invention, the step of controlling an unstarted node device to start and join the cluster, so as to increase the number of started node devices in the cluster, includes:
controlling, through the IPMI (Intelligent Platform Management Interface) of the unstarted node device, that node device to start and join the cluster, so as to increase the number of started node devices in the cluster;
the step of controlling a started node device in the cluster to shut down, so as to reduce the number of node devices in the cluster, includes:
controlling, through the IPMI of the started node device in the cluster, that node device to shut down, so as to reduce the number of started node devices in the cluster.
In an embodiment of the present invention, the step of controlling, through the IPMI of the unstarted node device, that node device to start and join the cluster includes:
sending a power-on command to the IPMI (Intelligent Platform Management Interface) of the unstarted node device so that the node device starts and joins the cluster;
the step of controlling, through the IPMI of the started node device in the cluster, that node device to shut down includes:
sending a shutdown command to the IPMI of the started node device in the cluster so that the node device shuts down.
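One possible way to send these power-on and shutdown commands is to build an `ipmitool` command line, as in the Python sketch below. ipmitool is a widely used open-source IPMI client; the source does not name any tool. The sketch assumes IPMI over LAN (`-I lanplus`) to each device's BMC. It reads the password from the `IPMI_PASSWORD` environment variable via `-E`. The host address, the user name, and the `send_power_command` helper are hypothetical. `chassis power soft` asks the operating system for an orderly (ACPI) shutdown, whereas `chassis power off` cuts power immediately.

```python
import subprocess
from typing import List

# Map the operations described in the text (plus a hard power-off) onto
# ipmitool's "chassis power" sub-commands.
_POWER_ACTIONS = {"on": "on", "soft_off": "soft", "off": "off"}


def ipmi_power_argv(bmc_host: str, user: str, action: str) -> List[str]:
    """Build an ipmitool command line for a remote power operation."""
    if action not in _POWER_ACTIONS:
        raise ValueError(f"unsupported power action: {action}")
    return [
        "ipmitool", "-I", "lanplus",   # IPMI v2.0 over LAN to the BMC
        "-H", bmc_host, "-U", user,
        "-E",                          # password taken from IPMI_PASSWORD
        "chassis", "power", _POWER_ACTIONS[action],
    ]


def send_power_command(bmc_host: str, user: str, action: str,
                       dry_run: bool = True) -> List[str]:
    """Run the command unless dry_run is set; always return the argv used."""
    argv = ipmi_power_argv(bmc_host, user, action)
    if not dry_run:
        subprocess.run(argv, check=True)
    return argv
```

Passing the password through the environment rather than `-P` keeps it out of the process list.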
In an embodiment of the present invention, the indicator information includes at least one indicator, and the step of determining, according to the indicator information, whether the device resource usage rate of the cluster is too high or too low includes:
comparing the value of each indicator with its corresponding preset lower threshold and its corresponding preset upper threshold;
determining, according to the comparison result, whether the device resource usage rate of the cluster is too high or too low.
In an embodiment of the present invention, the step of determining, according to the comparison result, whether the device resource usage rate of the cluster is too high or too low includes:
when the value of at least one indicator stays below its corresponding preset lower threshold throughout a preset time period, determining that the device resource usage rate of the cluster is too low;
when the value of at least one indicator stays above its corresponding preset upper threshold throughout a preset time period, determining that the device resource usage rate of the cluster is too high.
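The "stays below/above throughout a preset time period" rule can be sketched in Python as a small detector that keeps, for each indicator, the time at which it first left its range. The class name, indicator names, and timestamps are hypothetical. Two assumptions go beyond the source. First, samples arrive often enough that "every sample out of range" is a fair stand-in for "continuously out of range". Second, when one indicator is sustained high and another sustained low, the high (shortage) case wins.

```python
from typing import Dict, Optional, Tuple


class SustainedThresholdDetector:
    """Flag an indicator only after it stays out of range for period_s seconds."""

    def __init__(self, thresholds: Dict[str, Tuple[float, float]], period_s: float):
        self.thresholds = thresholds      # name -> (lower threshold, upper threshold)
        self.period_s = period_s          # the "preset time period"
        self._below_since: Dict[str, Optional[float]] = {k: None for k in thresholds}
        self._above_since: Dict[str, Optional[float]] = {k: None for k in thresholds}

    def update(self, now: float, indicators: Dict[str, float]) -> str:
        """Feed one set of samples; return "high", "low", or "normal"."""
        too_high = too_low = False
        for name, (lower, upper) in self.thresholds.items():
            value = indicators[name]
            # Start the timer when the value leaves the range; reset it on return.
            if value < lower:
                if self._below_since[name] is None:
                    self._below_since[name] = now
            else:
                self._below_since[name] = None
            if value > upper:
                if self._above_since[name] is None:
                    self._above_since[name] = now
            else:
                self._above_since[name] = None
            since_low, since_high = self._below_since[name], self._above_since[name]
            if since_low is not None and now - since_low >= self.period_s:
                too_low = True
            if since_high is not None and now - since_high >= self.period_s:
                too_high = True
        # Assumption: a resource shortage takes precedence over resource waste.
        if too_high:
            return "high"
        if too_low:
            return "low"
        return "normal"
```

For example, with a 60-second period and an upper CPU threshold of 80%, samples of 90% at t = 0, 30, and 60 s return "normal", "normal", and then "high". A single in-range sample resets the timer.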
In an embodiment of the present invention, the hardware usage status information includes at least one of the following: the device's disk space usage rate, the device's current CPU usage rate, and the device's current memory usage rate;
correspondingly, the indicator information includes at least one of the following: the total disk space usage rate in the cluster, the average CPU usage rate of devices in the cluster, and the average memory usage rate of devices in the cluster.
Also to solve the above technical problem, the present invention further provides another method for adjusting the number of nodes in a system. The method is applied to a node device in a system cluster and includes the following steps:
after the node device starts, collecting hardware usage status information of the node device itself;
sending out the collected hardware usage status information. The receiving end then obtains, according to the hardware usage status information, indicator information describing the current usage status of the cluster, and adjusts the number of started node devices in the cluster according to the indicator information.
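On the node side, the collection step can be pictured as building a small status table, as in the Python sketch below. Disk figures come from the standard-library `shutil.disk_usage`. CPU and memory percentages are passed in as arguments, because the source does not say how they are measured. The keyword names (`disk_total`, `disk_used`, `cpu_percent`, `mem_percent`) and the function name are hypothetical. The source sends this information through JMX, which the sketch does not model.

```python
import shutil
from typing import Dict


def build_data_table(path: str, cpu_percent: float, mem_percent: float) -> Dict[str, float]:
    """Collect one node's hardware usage into a keyword-indexed table."""
    for name, pct in (("cpu_percent", cpu_percent), ("mem_percent", mem_percent)):
        if not 0.0 <= pct <= 100.0:
            raise ValueError(f"{name} must be a percentage, got {pct}")
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return {
        "disk_total": float(usage.total),
        "disk_used": float(usage.used),
        "cpu_percent": float(cpu_percent),
        "mem_percent": float(mem_percent),
    }
```

Raw byte counts for disk space are kept, rather than a per-node percentage, so that the receiving end can compute a cluster-wide total usage rate.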
In an embodiment of the present invention, the step of sending out the collected hardware usage status information includes:
sending out the collected hardware usage status information through the JMX (Java Management Extensions) framework used by the system.
Also to solve the above technical problem, an embodiment of the present invention further provides a device for adjusting the number of nodes in a system, including a receiving module, an analysis module, and a processing module;
the receiving module is configured to receive current hardware usage status information sent by each node device in a cluster, where the node devices are started node devices;
the analysis module is configured to obtain, according to the received hardware usage status information, indicator information describing the current usage status of the cluster;
the processing module is configured to adjust, according to the indicator information, the number of started node devices in the cluster.
In an embodiment of the present invention, the processing module includes a determining module and an adjusting module;
the determining module is configured to determine, according to the indicator information, whether the device resource usage rate of the cluster is too high or too low;
the adjusting module is configured to:
when the determining module determines that the device resource usage rate of the cluster is too low, control a started node device in the cluster to shut down, so as to reduce the number of started node devices in the cluster;
when the determining module determines that the device resource usage rate of the cluster is too high, control an unstarted node device to start and join the cluster, so as to increase the number of started node devices in the cluster.
In an embodiment of the present invention, the adjusting module is configured to:
control, through the IPMI of an unstarted node device, that node device to start and join the cluster, so as to increase the number of started node devices in the cluster;
control, through the IPMI of a started node device in the cluster, that node device to shut down, so as to reduce the number of started node devices in the cluster.
Also to solve the above technical problem, the present invention further provides another device for adjusting the number of nodes in a system. The device is applied to a node device in a system cluster and includes a collection module and a sending module;
the collection module is configured to collect hardware usage status information of the node device itself after the node device starts;
the sending module is configured to send out the collected hardware usage status information. The receiving end then obtains, according to the hardware usage status information, indicator information describing the current usage status of the cluster, and adjusts the number of started node devices in the cluster according to the indicator information.
In an embodiment of the present invention, the adjusting device further includes an IPMI and a control module;
the IPMI is configured to receive an externally transmitted shutdown command and pass the shutdown command to the control module;
the control module is configured to shut down the node device after receiving the shutdown command.
The beneficial effects of the embodiments of the present invention are as follows:
The embodiments of the present invention provide a method and a device for adjusting the number of nodes in a system. The method has three steps. First, current hardware usage status information is received from each node device in a cluster. Second, indicator information describing the current usage status of the cluster is obtained from the received hardware usage information. Third, the number of started node devices in the cluster is adjusted according to the indicator information. The adjusting method of the present invention can dynamically adjust the number of node devices in a Hadoop cluster according to the cluster's current usage status and meet the cluster's demand for node device resources. This improves the utilization of node devices in the Hadoop system, reduces unnecessary resource waste, and solves the technical problem of insufficient node device resources in the Hadoop system.
Brief description of the drawings
FIG. 1 is a schematic flowchart of a method for adjusting the number of nodes in a Hadoop system according to Embodiment 1 of the present invention;
FIG. 2 is a schematic flowchart of another method for adjusting the number of nodes in a Hadoop system according to Embodiment 1 of the present invention;
FIG. 3 is a schematic flowchart of a method for adjusting the number of nodes in a Hadoop system according to Embodiment 2 of the present invention;
FIG. 4 is a schematic structural diagram of a device for adjusting the number of nodes in a Hadoop system according to Embodiment 3 of the present invention;
FIG. 5 is a schematic structural diagram of another device for adjusting the number of nodes in a Hadoop system according to Embodiment 3 of the present invention;
FIG. 6 is a schematic structural diagram of a device for adjusting the number of nodes in a Hadoop system according to Embodiment 4 of the present invention;
FIG. 7 is a schematic structural diagram of another device for adjusting the number of nodes in a Hadoop system according to Embodiment 3 of the present invention;
FIG. 8 is a schematic structural diagram of a system for adjusting the number of nodes in a Hadoop system according to Embodiment 4 of the present invention.
Detailed description
The present invention is described in further detail below through specific embodiments with reference to the accompanying drawings.
In current Hadoop systems the number of nodes cannot be adjusted dynamically, so node device resources are either wasted or insufficient. To address this, the present invention provides a method and a device for adjusting the number of nodes in a system, which dynamically adjust the number of node devices in a system cluster according to the cluster's current usage status. The method and device are described below using the number of nodes in a Hadoop system as an example. It should be understood that the method and device of these embodiments are not limited to Hadoop systems; they can also dynamically adjust the number of nodes in other systems.
Embodiment 1:
This embodiment provides a method for adjusting the number of nodes in a Hadoop system. As shown in FIG. 1, the method may include the following steps:
Step 101: Receive current hardware usage status information sent by each node device in the cluster, where the node devices are started node devices.
In this embodiment, the cluster is a Hadoop cluster. Each node device in the cluster collects its own hardware usage status information and then sends it out. Every node device in the Hadoop cluster is a started node device, and each has HDFS components installed after startup so that Hadoop can perform data access and other processing.
In an embodiment of the present invention, the hardware usage status information may include at least one of the following: the device's current disk space usage rate (the percentage of the device's disk space currently in use), the device's current CPU usage rate (the device's current CPU usage percentage), and the device's current memory usage rate (the device's current memory usage percentage).
In this embodiment, the hardware usage status information sent by the node devices may be received in various ways, for example over a network. Specifically, the receiving end may communicate with the node devices using TCP (Transmission Control Protocol)/IP (Internet Protocol) to receive the hardware usage status information they collect.
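TCP delivers a byte stream without message boundaries, so the receiving end needs some framing to separate consecutive reports from the same node. The source does not specify one. The Python sketch below assumes a hypothetical 4-byte big-endian length prefix and uses only the standard-library `socket` and `struct` modules.

```python
import socket
import struct


def send_message(sock: socket.socket, payload: bytes) -> None:
    """Prefix the payload with its 4-byte big-endian length and send it all."""
    sock.sendall(struct.pack(">I", len(payload)) + payload)


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes; TCP may deliver one message in several pieces."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection mid-message")
        buf.extend(chunk)
    return bytes(buf)


def recv_message(sock: socket.socket) -> bytes:
    """Read one length-prefixed message from a stream socket."""
    (length,) = struct.unpack(">I", recv_exact(sock, 4))
    return recv_exact(sock, length)
```

The payload returned by `recv_message` would then go through parsing and verification before indicators are computed.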
Step 102: Obtain, according to the received hardware usage information, indicator information describing the current usage status of the cluster.
Specifically, in this embodiment, the indicator information describing the cluster's current usage status may be calculated from the hardware usage status information sent by each node. Because a cluster is a set of node devices, this indicator information can be derived from the hardware usage information of the devices in the cluster. Each indicator can correspond to an item of the node devices' hardware usage status information. For example, when the indicator is the total disk space usage rate in the cluster, the hardware usage status information collected by the node devices may include each device's disk space usage rate.
In an embodiment of the present invention, the indicator information may include at least one of the following: the total disk space usage rate in the cluster (the percentage of the cluster's total disk space in use), the average CPU usage rate of devices in the cluster (the average CPU usage percentage of devices in the cluster), and the average memory usage rate of devices in the cluster (the average memory usage percentage of devices in the cluster).
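The Python sketch below shows how the three indicators above might be calculated from per-node reports. It reads "total disk space usage rate in the cluster" as pooled used space divided by pooled capacity, which is an interpretation, not something the source spells out. The CPU and memory indicators are plain averages. The dictionary keys and the function name are hypothetical.

```python
from typing import Dict, List


def cluster_indicators(tables: List[Dict[str, float]]) -> Dict[str, float]:
    """Combine per-node data tables into cluster-level indicators (percentages)."""
    if not tables:
        raise ValueError("no node reports received")
    n = len(tables)
    total = sum(t["disk_total"] for t in tables)
    used = sum(t["disk_used"] for t in tables)
    return {
        # Pooled capacity, not an average of per-node disk usage rates.
        "disk_used_percent": 100.0 * used / total if total > 0 else 0.0,
        "avg_cpu_percent": sum(t["cpu_percent"] for t in tables) / n,
        "avg_mem_percent": sum(t["mem_percent"] for t in tables) / n,
    }
```

Consider nodes with 50 of 100 and 50 of 300 disk units used. The pooled figure is 25%, whereas averaging the per-node rates (50% and about 16.7%) would give about 33.3%.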
步骤103:根据所述指标信息调整所述集群内已启动的节点设备的数量。Step 103: Adjust the number of node devices that have been started in the cluster according to the indicator information.
本实施例方法根据所述指标信息可以减少或增加集群内已启动的节点设备的数量;本实施例中调整集群内节点设备已启动的节点设备的数量可简称为调整集群内节点设备的数量,因此,本文所述的调整(例如增加或者减少)集群内节点设备的数量均指的是调整(例如增加或者减少)集群内已启动的节点设备的数量。The method of the embodiment may reduce or increase the number of node devices that have been started in the cluster according to the indicator information. In this embodiment, the number of node devices that have been activated by the node devices in the cluster may be referred to as adjusting the number of node devices in the cluster. Thus, the adjustments (e.g., increase or decrease) of the number of node devices within a cluster as described herein refer to adjusting (e.g., increasing or decreasing) the number of node devices that have been activated within the cluster.
The method of this embodiment can use the indicator information to determine whether the cluster's device resource usage is too high or too low:
if the cluster's device resource usage is too low, the number of started node devices in the cluster is reduced;
if the cluster's device resource usage is too high, the number of started node devices in the cluster is increased.
In an embodiment of the present invention, when the indicator information includes at least one indicator, the step of determining from the indicator information whether the cluster's device resource usage is too high or too low includes:
comparing the value of each indicator with its corresponding preset lower threshold and its corresponding preset upper threshold; and
determining from the comparison result whether the cluster's device resource usage is too high or too low.
Specifically, the step of determining from the comparison result whether the cluster's device resource usage is too high or too low includes:
when the value of at least one indicator stays below its preset lower threshold throughout a preset time period, determining that the cluster's device resource usage is too low;
when the value of at least one indicator stays above its preset upper threshold throughout a preset time period, determining that the cluster's device resource usage is too high.
That is, when the value of an indicator stays below its preset lower threshold for the preset time period, the method of this embodiment determines that the cluster's device resource usage is too low. For example, if the average CPU usage of the devices in the cluster stays below the preset lower threshold for 1 minute, the cluster's device resource usage is determined to be too low.
Likewise, when the value of an indicator stays above its preset upper threshold for the preset time period, the cluster's device resource usage is determined to be too high. For example, if the average CPU usage of the devices in the cluster stays above the preset upper threshold for 1 minute, the cluster's device resource usage is determined to be too high.
The method of this embodiment may also ignore the time factor. In that case, each indicator's value is compared with its preset lower and upper thresholds. If the value of at least one indicator is below its lower threshold, the cluster's device resource usage is determined to be too low. If the value of at least one indicator is above its upper threshold, the cluster's device resource usage is determined to be too high.
Suppose the indicator information includes multiple indicators, for example the total disk space usage of the cluster, the average CPU usage of the devices in the cluster, and the average memory usage of the devices in the cluster. The value of each indicator is compared with its own preset lower threshold. If the value of at least one indicator is below its lower threshold (for example, the cluster's total disk space usage and/or the average memory usage of the devices in the cluster is below the preset lower threshold), the cluster's device resource usage is determined to be too low. Resources are then being wasted, and node devices should be removed from the cluster to reduce the waste. If the value of at least one indicator is above its preset upper threshold (for example, the cluster's total disk space usage and/or the average memory usage of the devices in the cluster is above the preset upper threshold), the cluster's device resource usage is determined to be too high. There is then a risk of insufficient device resources, and node devices should be added to the cluster.
In this embodiment, when the indicator information includes multiple indicators, one common lower threshold and one common upper threshold can be preset for all of them. Alternatively, different lower and upper thresholds can be set for different indicators.
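The judgment rules above (any indicator out of range, optionally sustained for a preset period, with per-indicator or shared thresholds) can be sketched as a small stateful checker. This is a minimal illustration under stated assumptions. `UsageJudge`, its method names, and the `"HIGH"`/`"LOW"`/`"NORMAL"` labels are hypothetical. Samples arrive with a timestamp in seconds. The source does not say what happens when one indicator is too high while another is too low, so this sketch lets "too high" take precedence.

```python
from typing import Dict, Optional, Tuple

# Indicator name -> (preset lower threshold, preset upper threshold), in percent.
Thresholds = Dict[str, Tuple[float, float]]


class UsageJudge:
    """Decide 'HIGH', 'LOW' or 'NORMAL' from a stream of cluster indicator samples.

    An indicator only counts once it has stayed outside its range for
    hold_seconds; hold_seconds=0 gives the variant that ignores time.
    """

    def __init__(self, thresholds: Thresholds, hold_seconds: float = 60.0) -> None:
        self.thresholds = thresholds
        self.hold = hold_seconds
        # When the current uninterrupted run below / above the range began.
        self._below_since: Dict[str, Optional[float]] = {k: None for k in thresholds}
        self._above_since: Dict[str, Optional[float]] = {k: None for k in thresholds}

    @staticmethod
    def _track(since: Optional[float], outside: bool, now: float) -> Optional[float]:
        if not outside:
            return None  # one in-range sample breaks the run
        return now if since is None else since

    def observe(self, now: float, values: Dict[str, float]) -> str:
        too_high = too_low = False
        for name, (lower, upper) in self.thresholds.items():
            v = values[name]
            self._below_since[name] = self._track(self._below_since[name], v < lower, now)
            self._above_since[name] = self._track(self._above_since[name], v > upper, now)
            below, above = self._below_since[name], self._above_since[name]
            too_low |= below is not None and now - below >= self.hold
            too_high |= above is not None and now - above >= self.hold
        # Assumption: the source does not cover one indicator high while another
        # is low; here "too high" wins so the cluster is never starved.
        if too_high:
            return "HIGH"
        if too_low:
            return "LOW"
        return "NORMAL"
```

For a single shared range, build the dictionary with something like `{name: (20.0, 80.0) for name in names}`. For per-indicator limits, give each key its own pair.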
In this embodiment, the preferred way to increase the number of node devices in the cluster is to have a node device that has not been started start up and join the cluster. Specifically, the unstarted node device can be controlled through its IPMI to start and join the cluster. The unstarted node device may be an unstarted node device inside the cluster or an unstarted node device outside the cluster.
In this embodiment, the preferred way to decrease the number of node devices in the cluster is to shut down a started node device in the cluster. Specifically, the started node device is shut down through its IPMI (Intelligent Platform Management Interface), which reduces the number of node devices in the cluster.
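Powering a node on or off through its IPMI can be sketched with the widely used `ipmitool` command-line tool. The source speaks only of sending power-on and shutdown commands to the node's IPMI (and of sending them "over HTTP"). Using `ipmitool` over IPMI-over-LAN (`-I lanplus`) is an assumption of this sketch, as are the helper names, host and user placeholders. The `soft` action asks the operating system for an ACPI shutdown so the DataNode process can exit cleanly, and `off` cuts power immediately. Which one matches the source's "shutdown" is not specified.

```python
import os
import subprocess
from typing import List

# "soft" asks the OS for an ACPI shutdown; "off" cuts power immediately.
POWER_ACTIONS = ("on", "soft", "off", "status")


def ipmi_power_command(bmc_host: str, user: str, action: str) -> List[str]:
    """Build an ipmitool IPMI-over-LAN command for one node's BMC."""
    if action not in POWER_ACTIONS:
        raise ValueError(f"unsupported power action: {action!r}")
    # -E makes ipmitool read the password from IPMI_PASSWORD, keeping it
    # out of the process list.
    return ["ipmitool", "-I", "lanplus", "-H", bmc_host, "-U", user, "-E",
            "chassis", "power", action]


def run_power_command(cmd: List[str], password: str, timeout: float = 30.0) -> str:
    """Execute the command; raises CalledProcessError if ipmitool fails."""
    env = dict(os.environ, IPMI_PASSWORD=password)
    done = subprocess.run(cmd, env=env, capture_output=True, text=True,
                          check=True, timeout=timeout)
    return done.stdout.strip()
```

A caller would run something like `run_power_command(ipmi_power_command("10.0.1.12", "admin", "on"), password)`, where the address and user name are placeholders. Building the argument list separately from running it makes the command easy to check without touching real hardware.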
The adjustment method of this embodiment can dynamically adjust the number of node devices in the Hadoop cluster according to the cluster's current usage status. When the cluster's device resource usage is too high, node devices are added to the cluster so that the Hadoop system does not run short of node resources. When it is too low, node devices are removed from the cluster so that device resources are not wasted.
Based on the above description, the method of this embodiment for adjusting the number of nodes in a Hadoop system, shown in FIG. 2, can include the following steps:
Step 201: Receive the current hardware usage status information sent by each node device in the cluster, where the node devices are started node devices.
Specifically, the hardware usage status information can be received as JMX information that the node devices transmit through the JMX framework. The method of this embodiment can receive this information in various ways. For example, if the JMX information arrives over a network, it must first be processed according to the network protocol (parsing, CRC checking, calculation and so on) to recover the original hardware usage status information.
In addition, because many devices may be sending data, the system uses multiple threads so that the data are processed quickly and in real time. This prevents the buffer from overflowing and data from being lost.
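The multithreaded, CRC-checked processing described in Step 201 can be sketched with a bounded queue and a pool of worker threads. This is a minimal illustration under stated assumptions. Each received frame is taken to be a JSON payload plus the CRC32 the sender computed over it. This wire format and the helper name `start_workers` are hypothetical, and the source's JMX transport is not modeled. The socket-reading code would only put raw frames on the queue. Because the queue is bounded, a burst makes the reader wait rather than overflow a buffer.

```python
import json
import queue
import threading
import zlib
from typing import Dict, List, Optional, Tuple

# (payload bytes, CRC32 the sender computed over them); an assumed wire format.
Frame = Tuple[bytes, int]


def start_workers(inbox: "queue.Queue[Optional[Frame]]", results: List[Dict],
                  n_workers: int = 4) -> List[threading.Thread]:
    """Start threads that verify, parse and store frames taken from the inbox.

    The socket-reading code only enqueues raw frames, so slow parsing never
    stalls reception; a bounded Queue blocks the reader instead of overflowing.
    """
    lock = threading.Lock()

    def worker() -> None:
        while True:
            frame = inbox.get()
            try:
                if frame is None:  # sentinel: stop this worker
                    return
                payload, crc = frame
                if zlib.crc32(payload) != crc:
                    continue  # failed CRC check: discard the frame
                report = json.loads(payload.decode("utf-8"))
                with lock:
                    results.append(report)
            finally:
                inbox.task_done()

    threads = [threading.Thread(target=worker, daemon=True) for _ in range(n_workers)]
    for t in threads:
        t.start()
    return threads
```

A typical setup creates `queue.Queue(maxsize=1000)`, starts the workers, and puts one `None` per worker on the queue at shutdown.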
Step 202: Using the received hardware usage status information, obtain indicator information describing the current usage status of the cluster. The indicator information includes at least one indicator.
For example, suppose the hardware usage status information sent by the node devices in the cluster includes each device's disk space usage, current CPU usage and current memory usage.
This step can then calculate, from that hardware usage status information, indicator information describing the cluster's current usage status: the total disk space usage of the cluster, the average CPU usage of the devices in the cluster, and the average memory usage of the devices in the cluster.
Step 203: Compare the value of each indicator with its corresponding preset lower threshold and preset upper threshold.
Step 204: Determine from the comparison result whether the cluster's device resource usage is too high or too low. If it is too high, go to step 205. If it is too low, go to step 206.
For example, this step can use the total disk space usage of the cluster, the average CPU usage of the devices in the cluster and the average memory usage of the devices in the cluster to determine whether the cluster's device resource usage is too high or too low.
Specifically, take the case where all indicators share the same preset range. Each indicator is compared with the upper and lower thresholds of that range. If the value of at least one indicator is above the upper threshold, device resource usage is determined to be too high. If the value of at least one indicator is below the lower threshold, device resource usage is determined to be too low.
To make this determination more reliable, this step can also require the condition to persist. Each indicator is compared with the upper and lower thresholds of the preset range. Device resource usage is determined to be too high only when the value of at least one indicator stays above the upper threshold throughout a preset time period. It is determined to be too low only when the value of at least one indicator stays below the lower threshold throughout a preset time period.
Step 205: Through the IPMI of a node device that has not been started, have that node device start and join the cluster, increasing the number of started node devices in the cluster.
In the method of this embodiment, one or more node devices can be added. The number added can be preset (for example, one by default) or determined from the indicator information.
Specifically, this step can send a power-on command to the IPMI of the unstarted node device so that it starts and joins the cluster, increasing the number of node devices in the cluster.
When the cluster's device resource usage is determined to be too high, the number of node devices in the cluster must be increased. In this embodiment, the system reads the list of IPMI addresses of unstarted DataNode nodes, selects the IPMI addresses of a preset number of node devices, and sends a power-on command to each selected device over HTTP. The unstarted device receives the command through IPMI, which passes it to the BMC (Baseboard Management Controller) component in the node device, and the device powers on. Once the node device has started, its state is recorded by moving the node from the list of unstarted DataNode nodes to the list of started nodes. According to the system's boot startup configuration, the HDFS component already installed on the node device starts with the operating system and joins the cluster. This expands the Hadoop environment, and adding node devices automatically resolves the shortage of resources in the original environment.
Step 206: Through the IPMI of a started node device in the cluster, shut that node device down, reducing the number of started node devices in the cluster.
In the method of this embodiment, one or more node devices can be removed. The number removed can be preset (for example, one node device by default) or determined from the indicator information.
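The source leaves open how the number of nodes to add or remove is "determined from the indicator information". One possible rule is sketched below as an assumption, not as the patent's method. It treats cluster load as evenly spread, so that active nodes × utilization stays roughly constant as nodes come and go. It then picks the node count that would bring the triggering indicator back to a target percentage. The function names, the `target` parameter and the `min_nodes` floor are hypothetical. The floor of 3 merely mirrors HDFS's default replication factor.

```python
import math


def nodes_to_add(active: int, utilization: float, target: float) -> int:
    """Nodes to start so that utilization falls to roughly `target` percent."""
    desired = math.ceil(active * utilization / target)
    return max(1, desired - active)  # at least one, the source's default


def nodes_to_remove(active: int, utilization: float, target: float,
                    min_nodes: int = 3) -> int:
    """Nodes to stop so utilization rises to roughly `target`, never below min_nodes."""
    desired = math.ceil(active * utilization / target)
    count = max(1, active - desired)
    return max(0, min(count, active - min_nodes))
```

For example, 10 nodes at 90% average CPU with a 60% target gives `nodes_to_add(10, 90.0, 60.0) == 5`. The source does not discuss HDFS block replication. Powering off several DataNodes at once can make blocks unavailable if every replica sits on those nodes, so a cautious deployment might remove only one node per round.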
Specifically, this step can send a shutdown command to the IPMI of a started node device in the cluster so that the node device shuts down.
When the cluster's device resource usage is determined to be too low, the number of node devices in the cluster must be reduced. In this embodiment, the system reads the list of IPMI addresses of started DataNode nodes, selects the IPMI addresses of a predetermined number of node devices, and sends a shutdown command to each selected device over HTTP. The command is passed to the BMC component in the node device, which shuts the device down. Once the node device has shut down, its state is recorded by moving the node from the list of started DataNode nodes to the list of unstarted nodes. Shutting the device down through IPMI temporarily removes the node from the Hadoop environment, which eliminates the waste of node resources.
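The list bookkeeping in Steps 205 and 206 (select addresses, send the power command, wait for the state change, then move the node between the started and unstarted lists) can be sketched as a small class. This is a minimal illustration under stated assumptions. `NodePool` is hypothetical, and so are the `power` and `wait_until` hooks the caller supplies. In practice `power` might wrap an IPMI tool and `wait_until` might poll the node. The source does not say which started node to shut down, so this sketch removes the most recently started one first. If the state change is never confirmed, the node stays in its original list.

```python
from typing import Callable, List

# power(addr, action) sends an IPMI power command; wait_until(addr, state)
# returns True once the node is confirmed "started" or "stopped".
# Both are hypothetical hooks supplied by the caller.
PowerFn = Callable[[str, str], None]
WaitFn = Callable[[str, str], bool]


class NodePool:
    """Keeps the started / unstarted DataNode lists, keyed by BMC (IPMI) address."""

    def __init__(self, started: List[str], unstarted: List[str],
                 power: PowerFn, wait_until: WaitFn) -> None:
        self.started = list(started)
        self.unstarted = list(unstarted)
        self._power = power
        self._wait_until = wait_until

    def scale_up(self, count: int = 1) -> List[str]:
        """Power on up to `count` unstarted nodes; move each one once it is up."""
        moved = []
        for addr in self.unstarted[:max(0, count)]:
            self._power(addr, "on")
            if self._wait_until(addr, "started"):
                self.unstarted.remove(addr)
                self.started.append(addr)
                moved.append(addr)
        return moved

    def scale_down(self, count: int = 1) -> List[str]:
        """Shut down up to `count` started nodes, most recently started first."""
        count = max(0, min(count, len(self.started)))
        moved = []
        for addr in reversed(self.started[len(self.started) - count:]):
            self._power(addr, "soft")
            if self._wait_until(addr, "stopped"):
                self.started.remove(addr)
                self.unstarted.append(addr)
                moved.append(addr)
        return moved
```

Keeping the power and wait hooks outside the class lets the list logic be tested with fakes, as in the checks below.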
The method of this embodiment uses IPMI to increase or decrease the number of nodes in a Hadoop system automatically. A collector on each node in the cluster obtains that node's performance data and reports it to the system through the JMX framework. The system integrates and analyzes this information and, based on the result, starts or stops target devices through IPMI. In this way the number of devices in the cluster is adjusted dynamically and automatically.
Embodiment 2:
As shown in FIG. 3, this embodiment provides a method for adjusting the number of nodes in a Hadoop system. The method is applied to a node device in a Hadoop system cluster and includes the following steps:
Step 301: After the node device has started, collect the node device's own hardware usage status information.
In the method of this embodiment, once a node device in the cluster has started, its hardware usage status information can be collected. This information includes, for example, at least one of the device's disk space usage, current CPU usage and current memory usage.
In an embodiment of the present invention, a data conversion table can also be built for the collected hardware usage status information, and the hardware usage status information is then processed according to that table to form a data table.
Step 302: Send out the collected hardware usage status information. The receiving end uses it to obtain indicator information describing the current usage status of the cluster, and then adjusts the number of started node devices in the cluster according to that indicator information.
Specifically, this step can send the collected hardware usage status information through the JMX framework used by the Hadoop system. In an embodiment of the present invention, the JMX information can be sent over a network, for example using TCP/IP.
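The node-side collection and sending in Steps 301 and 302 can be sketched using only Python's standard library. This is an illustration under stated assumptions, not the patent's JMX-based implementation. A Java node agent would more naturally expose these values through JMX. Here a plain TCP message stands in for JMX: a 4-byte length, a CRC32 of the payload (for the CRC check performed at the receiving end), then the JSON payload. This wire format, the field names and the helper names are hypothetical. The sketch also uses the 1-minute load average per CPU as a stand-in for "current CPU usage". That proxy can exceed 100%, and the `/proc/meminfo` memory reading works only on Linux.

```python
import json
import os
import shutil
import socket
import struct
import zlib
from typing import Dict, Union

Report = Dict[str, Union[str, int, float]]


def collect_usage(path: str = "/") -> Report:
    """Sample this node's disk, CPU and memory usage using only the standard library."""
    disk = shutil.disk_usage(path)
    report: Report = {
        "host": socket.gethostname(),
        "disk_total_bytes": disk.total,
        "disk_used_bytes": disk.used,
    }
    # Assumption: 1-minute load average / CPU count stands in for "current CPU
    # usage" (it can exceed 100). os.getloadavg does not exist on Windows.
    if hasattr(os, "getloadavg"):
        report["cpu_percent"] = 100.0 * os.getloadavg()[0] / (os.cpu_count() or 1)
    # Linux only: MemAvailable and MemTotal from /proc/meminfo (both in kB).
    try:
        with open("/proc/meminfo") as f:
            info = {line.split(":")[0]: int(line.split()[1]) for line in f}
        report["mem_percent"] = 100.0 * (1 - info["MemAvailable"] / info["MemTotal"])
    except (OSError, KeyError, ValueError, IndexError, ZeroDivisionError):
        pass  # memory usage is simply omitted where it cannot be read
    return report


def encode_frame(report: Report) -> bytes:
    """Assumed wire format: 4-byte length, 4-byte CRC32, then UTF-8 JSON."""
    payload = json.dumps(report, sort_keys=True).encode("utf-8")
    return struct.pack("!II", len(payload), zlib.crc32(payload)) + payload


def send_report(manager: str, port: int, report: Report) -> None:
    """Push one report to the management end over TCP."""
    with socket.create_connection((manager, port), timeout=5) as conn:
        conn.sendall(encode_frame(report))
```

A node agent would call `send_report(manager_host, port, collect_usage())` on a timer. The management host and port are deployment-specific placeholders.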
In practice, this functionality can be added to the open-source Hadoop framework so that the HDFS component performs steps 301 and 302.
With the method of this embodiment, a device's hardware usage status information is collected so that the management end can dynamically adjust the number of node devices in the cluster. In current Hadoop systems the number of nodes cannot be adjusted dynamically, so node device resources are either wasted or insufficient, and this method solves that technical problem.
In an embodiment of the present invention, the method of this embodiment can further include receiving a shutdown command through IPMI and shutting down after the command is received. When the number of node devices in the cluster is being reduced, the shutdown command is received through IPMI and passed to the BMC component in the node device, which performs the shutdown.
Embodiment 3:
As shown in FIG. 4, this embodiment provides an apparatus for adjusting the number of nodes in a Hadoop system. The apparatus includes a receiving module 401, an analysis module 402 and a processing module 403.
The receiving module 401 is configured to receive the current hardware usage status information sent by each node device in the cluster, where the node devices are started node devices;
the analysis module 402 is configured to obtain, from the received hardware usage status information, indicator information describing the current usage status of the cluster;
the processing module 403 is configured to adjust the number of started node devices in the cluster according to the indicator information.
In an embodiment of the present invention, as shown in FIG. 5, the processing module 403 includes a judgment module 4031 and an adjustment module 4032.
The judgment module 4031 is configured to determine, according to the indicator information, whether the cluster's device resource usage is too high or too low;
the adjustment module 4032 is configured to:
when the judgment module 4031 determines that the cluster's device resource usage is too low, shut down started node devices in the cluster to reduce the number of started node devices in the cluster;
when the judgment module 4031 determines that the cluster's device resource usage is too high, have unstarted node devices start and join the cluster to increase the number of started node devices in the cluster.
In an embodiment of the present invention, the adjustment module 4032 is configured to:
through the IPMI of an unstarted node device, have that node device start and join the cluster, increasing the number of started node devices in the cluster; and
through the IPMI of a started node device in the cluster, shut that node device down, reducing the number of started node devices in the cluster.
The adjustment apparatus of the present invention can dynamically adjust the number of node devices in a Hadoop cluster according to the cluster's current usage status, meeting the cluster's demand for node device resources. This improves the utilization of node devices in the Hadoop system, reduces unnecessary waste of resources, and solves the technical problem of insufficient node device resources in the Hadoop system.
Embodiment 4:
As shown in FIG. 6, this embodiment provides an apparatus for adjusting the number of nodes in a Hadoop system. The apparatus is applied to a node device in a Hadoop system cluster and includes a collection module 601 and a sending module 602.
The collection module 601 is configured to collect the node device's own hardware usage status information after the node device has started;
the sending module 602 is configured to send out the collected hardware usage status information. The receiving end uses it to obtain indicator information describing the current usage status of the cluster, and then adjusts the number of started node devices in the cluster according to that indicator information.
In an embodiment of the present invention, the sending module 602 is configured to send the collected hardware usage status information through the JMX framework used by the Hadoop system.
In an embodiment of the present invention, as shown in FIG. 7, the adjustment apparatus of this embodiment further includes an IPMI interface 604 and a control module 603.
The IPMI interface 604 is configured to receive a shutdown command sent from outside and pass it to the control module 603;
the control module 603 is configured to shut down the node device after receiving the shutdown command.
The adjustment apparatus of this embodiment collects the device's hardware usage status information so that the management end can dynamically adjust the number of node devices in the cluster. In current Hadoop systems the number of nodes cannot be adjusted dynamically, so node device resources are either wasted or insufficient, and this apparatus solves that technical problem.
Embodiment 5:
As shown in FIG. 8, this embodiment provides a system for adjusting the number of nodes in a Hadoop system. The system includes multiple unstarted first node devices 80, multiple started second node devices 81 located in the Hadoop system cluster, and a node management apparatus 82.
Each first node device 80 has an IPMI interface 801. Each second node device 81 has an IPMI interface 813, a collection module 811 and a sending module 812.
The node management apparatus 82 includes a receiving module 821, an analysis module 822 and a processing module 823.
The collection module 811 in the second node device 81 is configured to collect hardware usage status data for the node device on which it resides.
Specifically, the collection module 811 in the second node device 81 builds a data conversion table for the collected basic data and processes the host's status data according to that table to form a data table, which is reported to the system through JMX. The data describe the host node's hardware usage status and mainly include the host's total disk space, the host's used disk space, the current CPU usage percentage and the current memory usage percentage. The data table is then sent out through the sending module.
The sending module 813 in the second node device 81 is configured to send the collected data to the receiving module 821 in the node management apparatus 82, for example through the JMX framework.
The receiving module 821 in the node management apparatus 82 is configured to receive the data reported by the sending module 813, such as the data table. The receiving module collects and preliminarily processes the data table in two parts: it receives the raw data, and it processes the raw data. Its underlying interface can be, for example, a network interface that communicates with the sending module of each node device over TCP/IP. The raw data from the collection module arrive through the interface provided by the system and are handed to the data receiving module. The data receiving module parses the raw data according to each device's protocol and performs CRC (cyclic redundancy check) verification, calculation and processing, then passes the result to the data analysis module. Because many devices may be sending data, the system uses multiple threads so that the data are processed quickly and in real time. This prevents the buffer from overflowing and data from being lost.
The analysis module 822 in the node management apparatus 82 is configured to organize the performance data passed up by the receiving module 821. Data of each type are extracted from the data table sent by the collection module using the keywords in the table. A rule specific to each type of data is then applied to calculate indicator information that accurately describes the state of the whole cluster, such as the average CPU usage percentage of the hosts in the cluster, the average memory usage percentage of the hosts in the cluster and the percentage of the cluster's total disk space in use. The module then checks whether the indicator data are within normal limits by comparing them with the configured threshold range. If the current indicator data are within the normal threshold range, no further action is taken and the collection and reception loop continues. If they are outside the normal threshold range, the result is passed to the processing module.
The processing module 823 acts on the judgment result from the analysis module 822. If the current indicator data is above the normal threshold range, the system's resource usage is high and there is a risk of insufficient resources. If the indicator data stays above the threshold for a period of time, the processing module reads the list of all devices, finds a first node device that has not been started, and sends a power-on command to its IPMI. IPMI separates the system management software from the hardware platform management tasks of the system, and it separates the low-level server management functions from higher-level software. As a result, IPMI is not constrained by the way the operating system is managed. Because it specifies a generic, streamlined, message-based interface that carries messages to the device's BMC (baseboard management controller), IPMI makes remote power-on possible. The installed Hadoop components are configured as boot-time startup items on the first node device, so they start with the operating system and join the cluster, which increases the system resources available to the cluster. Conversely, if the current indicator data is below the normal threshold range, the system's resource usage is low and resources are being wasted. If the indicator data stays below the threshold for a period of time, the processing module reads the list of all devices, finds a second node device that has been started, and sends a power-off command to its IPMI. The remote power-off is carried out through the device's BMC.
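The description says only that the indicator must stay outside its range "for a period of time". Claim 7 requires at least one indicator to remain beyond its threshold throughout a preset time period. A minimal sketch of that rule might look like the following. It assumes that indicators are sampled periodically and that a single in-range sample resets the timer. The class name, the `hold_seconds` parameter and the tie-break between "too high" and "too low" are assumptions.

```python
from typing import Dict, Optional, Set, Tuple


class SustainedBreachDetector:
    """Report a breach only after an indicator stays out of range for hold_seconds.

    Hypothetical helper: the hold duration and the reset-on-recovery rule are
    assumptions; samples are taken to represent the interval since the last one.
    """

    def __init__(self, ranges: Dict[str, Tuple[float, float]], hold_seconds: float):
        self.ranges = ranges
        self.hold_seconds = hold_seconds
        # indicator name -> (direction, timestamp at which the current breach began)
        self._breach_start: Dict[str, Tuple[str, float]] = {}

    def observe(self, now: float, indicators: Dict[str, float]) -> Optional[str]:
        sustained: Set[str] = set()
        for name, value in indicators.items():
            low, high = self.ranges[name]
            direction = "high" if value > high else "low" if value < low else None
            if direction is None:
                self._breach_start.pop(name, None)      # back in range: reset the timer
                continue
            prev = self._breach_start.get(name)
            if prev is None or prev[0] != direction:
                self._breach_start[name] = (direction, now)  # a new breach starts now
            elif now - prev[1] >= self.hold_seconds:
                sustained.add(direction)
        # Assumption: a sustained "high" (risk of insufficient resources) wins
        # over a sustained "low" if both appear in the same sample.
        if "high" in sustained:
            return "too_high"
        if "low" in sustained:
            return "too_low"
        return None
```

Requiring the breach to persist keeps a short load spike from powering a node on, only to power it off again shortly afterwards.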
This embodiment proposes an IPMI-based system that automatically increases or decreases the number of nodes in a Hadoop system. The collection module on each node in the cluster acquires that node's performance data and reports it to the management device. Based on the analysis result, the management device uses IPMI to start or stop a target device. In this way the number of node devices in the cluster is adjusted dynamically and automatically.
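Once a scaling decision has been made, the management device reads the device list, picks a node in the required state and sends a power command to that node's BMC. The sketch below builds the command for the `ipmitool` CLI over the IPMI 2.0 `lanplus` interface. It uses `chassis power on` to start a node and `chassis power soft` for an ACPI-style graceful shutdown. The `-E` option makes `ipmitool` read the password from the `IPMI_PASSWORD` environment variable. The `Device` record, the choice of `soft` over `off`, the `min_started` guard and the dry-run default are all assumptions, not details from the patent.

```python
import subprocess
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Device:
    """Hypothetical entry in the management device's list of all node devices."""
    host: str
    bmc_address: str  # IPMI-over-LAN address of the node's BMC
    started: bool     # True if the node is powered on and part of the cluster


def ipmi_power_command(bmc_address: str, user: str, action: str) -> List[str]:
    """Build an ipmitool 'chassis power' command; -E reads the password from IPMI_PASSWORD."""
    if action not in ("on", "soft", "off"):
        raise ValueError(f"unsupported power action: {action}")
    return ["ipmitool", "-I", "lanplus", "-H", bmc_address, "-U", user, "-E",
            "chassis", "power", action]


def adjust(devices: List[Device], decision: Optional[str], user: str,
           min_started: int = 1, dry_run: bool = True) -> Optional[List[str]]:
    """Pick a target node for the decision and return (optionally run) its IPMI command."""
    started = [d for d in devices if d.started]
    stopped = [d for d in devices if not d.started]
    if decision == "too_high" and stopped:
        target, action = stopped[0], "on"     # first node device that has not been started
    elif decision == "too_low" and len(started) > min_started:
        target, action = started[0], "soft"   # a started node; keep min_started running
    else:
        return None                           # no decision, or no eligible node
    cmd = ipmi_power_command(target.bmc_address, user, action)
    if not dry_run:
        subprocess.run(cmd, check=True)       # needs ipmitool and network access to the BMC
    return cmd
```

The Hadoop components are set to start at boot, so powering a node on should be enough for it to rejoin the cluster. The sketch does not update the `started` flags. The caller would refresh them from the next round of node reports.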
The above is a further detailed description of the present invention in connection with specific embodiments, and the specific implementation of the invention shall not be regarded as limited to these descriptions. A person of ordinary skill in the art to which the invention belongs may make a number of simple deductions or substitutions without departing from the concept of the invention, and all of them shall be regarded as falling within the protection scope of the invention.
Industrial applicability
The technical solution provided by the embodiments of the present invention can be used to adjust the number of nodes in a system. The method for adjusting the number of nodes in a system provided by the embodiments includes the following steps. First, the current hardware usage information sent by each node device in the cluster is received. Next, indicator information describing the current usage state of the cluster is obtained from the received hardware usage information. Finally, the number of started node devices in the cluster is adjusted according to the indicator information. The method can dynamically adjust the number of node devices in a Hadoop cluster according to the cluster's current usage state and so meet the cluster's demand for node device resources. This improves the utilization of node devices in the Hadoop system, reduces unnecessary resource waste, and solves the technical problem of insufficient node device resources in the Hadoop system.

Claims (16)

  1. A method for adjusting the number of nodes in a system, comprising the following steps:
    receiving current hardware usage status information sent by each node device in a cluster, the node devices being started node devices;
    obtaining, according to the received hardware usage status information, indicator information describing the current usage state of the cluster;
    adjusting the number of started node devices in the cluster according to the indicator information.
  2. The method according to claim 1, wherein the step of adjusting the number of started node devices in the cluster according to the indicator information comprises:
    determining, according to the indicator information, whether the device resource usage of the cluster is too high or too low;
    if the device resource usage of the cluster is too low, reducing the number of started node devices in the cluster;
    if the device resource usage of the cluster is too high, increasing the number of started node devices in the cluster.
  3. The method according to claim 2, wherein the step of increasing the number of started node devices in the cluster comprises:
    controlling a node device that has not been started to start and join the cluster, so as to increase the number of started node devices in the cluster;
    and the step of reducing the number of started node devices in the cluster comprises:
    controlling a started node device in the cluster to shut down, so as to reduce the number of started node devices in the cluster.
  4. The method according to claim 3, wherein the step of controlling a node device that has not been started to start and join the cluster, so as to increase the number of started node devices in the cluster, comprises:
    controlling, through the Intelligent Platform Management Interface (IPMI) of the node device that has not been started, that node device to start and join the cluster, so as to increase the number of started node devices in the cluster;
    and the step of controlling a started node device in the cluster to shut down, so as to reduce the number of node devices in the cluster, comprises:
    controlling, through the IPMI of the started node device in the cluster, that node device to shut down, so as to reduce the number of started node devices in the cluster.
  5. The method according to claim 4, wherein the step of controlling, through the IPMI of the node device that has not been started, that node device to start and join the cluster comprises:
    sending a power-on command to the IPMI of the node device that has not been started, so that the node device starts and joins the cluster;
    and the step of controlling, through the IPMI of the started node device in the cluster, that node device to shut down comprises:
    sending a shutdown command to the IPMI of the started node device in the cluster, so that the node device shuts down.
  6. The method according to claim 2, wherein the indicator information comprises at least one indicator, and the step of determining, according to the indicator information, whether the device resource usage of the cluster is too high or too low comprises:
    comparing the value of each indicator with its corresponding preset lower threshold and its corresponding preset upper threshold;
    determining, according to the comparison result, whether the device resource usage of the cluster is too high or too low.
  7. The method according to claim 6, wherein the step of determining, according to the comparison result, whether the device resource usage of the cluster is too high or too low comprises:
    when the value of at least one indicator stays below its corresponding preset lower threshold throughout a preset time period, determining that the device resource usage of the cluster is too low;
    when the value of at least one indicator stays above its corresponding preset upper threshold throughout a preset time period, determining that the device resource usage of the cluster is too high.
  8. The method according to any one of claims 1 to 7, wherein the hardware usage status information comprises at least one of: the disk space usage of the device, the current CPU usage of the device, and the current memory usage of the device;
    correspondingly, the indicator information comprises at least one of: the total disk space usage in the cluster, the average CPU usage of the devices in the cluster, and the average memory usage of the devices in the cluster.
  9. A method for adjusting the number of nodes in a system, the method being applied to a node device in a system cluster and comprising the following steps:
    after the node device is started, collecting hardware usage status information of the node device itself;
    sending out the collected hardware usage status information, so that a receiving end obtains, according to the hardware usage status information, indicator information describing the current usage state of the cluster and then adjusts the number of started node devices in the cluster according to the indicator information.
  10. The method according to claim 9, wherein the step of sending out the collected hardware usage status information comprises:
    sending out the collected hardware usage status information through the Java Management Extensions (JMX) framework used by the system.
  11. The method according to claim 9 or 10, further comprising:
    receiving a shutdown command through the Intelligent Platform Management Interface (IPMI), and shutting down the node device after receiving the shutdown command.
  12. An apparatus for adjusting the number of nodes in a system, comprising a receiving module, an analysis module and a processing module, wherein:
    the receiving module is configured to receive current hardware usage status information sent by each node device in a cluster, the node devices being started node devices;
    the analysis module is configured to obtain, according to the received hardware usage status information, indicator information describing the current usage state of the cluster;
    the processing module is configured to adjust the number of started node devices in the cluster according to the indicator information.
  13. The adjustment apparatus according to claim 12, wherein the processing module comprises a determining module and an adjustment module;
    the determining module is configured to determine, according to the indicator information, whether the device resource usage of the cluster is too high or too low;
    the adjustment module is configured to:
    when the determining module determines that the device resource usage of the cluster is too low, control a started node device in the cluster to shut down, so as to reduce the number of started node devices in the cluster;
    when the determining module determines that the device resource usage of the cluster is too high, control a node device that has not been started to start and join the cluster, so as to increase the number of started node devices in the cluster.
  14. The adjustment apparatus according to claim 13, wherein the adjustment module is configured to:
    control, through the IPMI of a node device that has not been started, that node device to start and join the cluster, so as to increase the number of started node devices in the cluster;
    control, through the IPMI of a started node device in the cluster, that node device to shut down, so as to reduce the number of started node devices in the cluster.
  15. An apparatus for adjusting the number of nodes in a system, the apparatus being applied to a node device in a system cluster and comprising a collection module and a sending module, wherein:
    the collection module is configured to collect hardware usage status information of the node device itself after the node device is started;
    the sending module is configured to send out the collected hardware usage status information, so that a receiving end obtains, according to the hardware usage status information, indicator information describing the current usage state of the cluster and then adjusts the number of started node devices in the cluster according to the indicator information.
  16. The adjustment apparatus according to claim 15, further comprising an Intelligent Platform Management Interface (IPMI) and a control module, wherein:
    the IPMI is configured to receive an externally transmitted shutdown command and pass the shutdown command to the control module;
    the control module is configured to shut down the node device after receiving the shutdown command.
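Claims 9 and 15 place the collection step on the node itself, and claim 10 names JMX as the transport. The sketch below is a hypothetical Python stand-in, not a JMX implementation. On a Linux node it samples the three quantities listed in claim 8:

- CPU usage, from two `/proc/stat` snapshots;
- memory usage, from `MemTotal` and `MemAvailable` in `/proc/meminfo`;
- disk usage, via `shutil.disk_usage`.

The data directory, the sampling interval and the report field names are assumptions. Sending the report is left out.

```python
import shutil
import time
from typing import Dict, Tuple


def parse_cpu_times(proc_stat: str) -> Tuple[int, int]:
    """Return (idle, total) jiffies from the aggregate 'cpu' line of /proc/stat text."""
    for line in proc_stat.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            # user nice system idle iowait irq softirq steal; guest time is already
            # included in user/nice, so the later columns are not summed again.
            values = [int(v) for v in fields[1:9]]
            idle = values[3] + values[4]  # idle + iowait
            return idle, sum(values)
    raise ValueError("no aggregate 'cpu' line in /proc/stat text")


def cpu_percent(before: Tuple[int, int], after: Tuple[int, int]) -> float:
    """CPU usage in percent between two (idle, total) snapshots."""
    idle_delta = after[0] - before[0]
    total_delta = after[1] - before[1]
    if total_delta <= 0:
        return 0.0
    return 100.0 * (1.0 - idle_delta / total_delta)


def mem_percent(meminfo: str) -> float:
    """Memory usage in percent from /proc/meminfo text (MemTotal, MemAvailable in kB)."""
    values: Dict[str, int] = {}
    for line in meminfo.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            values[key.strip()] = int(parts[0])
    return 100.0 * (1.0 - values["MemAvailable"] / values["MemTotal"])


def collect(host: str, data_dir: str = "/", interval: float = 1.0) -> Dict[str, object]:
    """Sample this node (Linux only) and build the report the sending module would transmit."""
    with open("/proc/stat") as f:
        before = parse_cpu_times(f.read())
    time.sleep(interval)
    with open("/proc/stat") as f:
        after = parse_cpu_times(f.read())
    with open("/proc/meminfo") as f:
        mem = mem_percent(f.read())
    disk = shutil.disk_usage(data_dir)  # named tuple (total, used, free), in bytes
    return {
        "host": host,
        "cpu_percent": cpu_percent(before, after),
        "mem_percent": mem,
        "disk_used_bytes": disk.used,
        "disk_total_bytes": disk.total,
    }
```

The parsing functions take plain text, so they can be checked without a live `/proc`. Only `collect` touches the file system.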
PCT/CN2015/086049 (WO2016165242A1): Method of adjusting number of nodes in system and device utilizing same; priority date 2015-04-14, filing date 2015-08-04

Applications Claiming Priority (2)

CN201510173758.X, priority date 2015-04-14
CN201510173758.XA (published as CN106155805A), priority date 2015-04-14, filing date 2015-04-14: Method and device for adjusting the number of nodes in a system

Publications (1)

WO2016165242A1, publication date 2016-10-20

Family ID: 57125660

Country Status (2)

CN: CN106155805A
WO: WO2016165242A1

Also Published As

CN106155805A, publication date 2016-11-23

Legal Events

121 (EP): The EPO has been informed by WIPO that EP was designated in this application. Ref document number: 15888945; country of ref document: EP; kind code of ref document: A1.

NENP: Non-entry into the national phase. Ref country code: DE.

122 (EP): PCT application non-entry in European phase. Ref document number: 15888945; country of ref document: EP; kind code of ref document: A1.