WO2016165242A1 - Method for adjusting the number of nodes in a system, and device using same - Google Patents

Method for adjusting the number of nodes in a system, and device using same

Info

Publication number
WO2016165242A1
Authority
WO
WIPO (PCT)
Prior art keywords
cluster
node
node device
started
module
Prior art date
Application number
PCT/CN2015/086049
Other languages
English (en)
Chinese (zh)
Inventor
刘建鹏
尤元建
Original Assignee
中兴通讯股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中兴通讯股份有限公司
Publication of WO2016165242A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]

Definitions

  • the present invention relates to the field of distributed storage technologies, and in particular, to a method and an apparatus for adjusting the number of nodes in a system.
  • Hadoop is an open source software framework for distributed processing of large amounts of data.
  • the storage core of Hadoop is the Hadoop Distributed File System (HDFS).
  • HDFS is suitable for running on general-purpose hardware and needs to be deployed on a large number of devices to support large-scale data sets and high-throughput data access, and it achieves high fault tolerance by keeping multiple copies of the data that can be distributed across different devices.
  • a large number of devices deployed by HDFS form a Hadoop cluster.
  • the number of nodes in the current Hadoop system is fixed when the system is first put into use and cannot be dynamically reduced or increased according to the actual amount of data, which results either in unnecessary waste of node device resources or in insufficient node device resources in the Hadoop system.
  • the main technical problem to be solved by the embodiments of the present invention is to provide a method and a device for adjusting the number of nodes in the system, which can solve the problem that the node device resources in the current Hadoop system cannot be dynamically adjusted or the node device resources in the Hadoop system are insufficient.
  • an embodiment of the present invention provides a method for adjusting the number of nodes in a system, including the following steps:
  • the step of adjusting the number of node devices that have been started in the cluster according to the indicator information includes:
  • the number of node devices that have been started in the cluster is increased.
  • the step of increasing the number of node devices that have been started in the cluster includes:
  • the step of reducing the number of node devices that have been started in the cluster includes:
  • the node devices that have been started within the cluster are controlled to be shut down to reduce the number of node devices that have been started within the cluster.
  • the step of controlling the node device that is not started to start and join the cluster to increase the number of node devices that are started in the cluster includes:
  • IPMI Intelligent Platform Management Interface
  • the step of controlling the started node device in the cluster to be closed to reduce the number of node devices in the cluster includes:
  • the activated node device is turned off by the IPMI of the node device that has been started within the cluster to reduce the number of node devices that have been started within the cluster.
  • the step of controlling, by the IPMI of the node device that is not started, the initiating node device to start and join the cluster includes:
  • the step of controlling, by the IPMI of the node device that has been started in the cluster, that the activated node device is shut down includes:
  • a shutdown command is sent to the IPMI of the node device that has been started within the cluster to cause the activated node device to shut down.
  • the indicator information includes at least one indicator
  • the step of determining, according to the indicator information, whether the device resource usage rate of the cluster is too high or too low includes:
  • the device resource usage rate of the cluster is determined to be too high or too low according to the comparison result.
  • the step of determining, according to the comparison result, whether the device resource usage rate of the cluster is too high or too low includes:
  • the device resource usage rate of the cluster is too high.
  • the hardware usage status information includes: at least one of a device disk space usage rate, a current CPU usage rate of the device, and a current memory usage rate of the device;
  • the indicator information includes: at least one of a total disk space usage rate in the cluster, an average CPU usage rate of the devices in the cluster, and an average memory usage rate of the devices in the cluster.
  • the present invention also provides another method for adjusting the number of nodes in the system.
  • the method is applied to a node device in a system cluster, and includes the following steps:
  • the step of sending the collected hardware usage status information includes:
  • the collected hardware usage status information is sent using the JMX (Java Management Extensions) framework used by the system.
  • the embodiment of the present invention further provides an apparatus for adjusting the number of nodes in the system, including: a receiving module, an analyzing module, and a processing module;
  • the receiving module is configured to receive current hardware usage status information sent by each node device in the cluster, where the node device is an activated node device;
  • the analyzing module is configured to obtain, according to the received hardware usage status information, indicator information used to describe a current usage status of the cluster;
  • the processing module is configured to adjust the number of node devices that have been started in the cluster according to the indicator information.
  • the processing module includes: a determining module and an adjusting module;
  • the determining module is configured to determine, according to the indicator information, whether a device resource usage rate of the cluster is too high or too low;
  • the adjustment module is set to:
  • when the determining module determines that the device resource usage rate of the cluster is too low, control started node devices in the cluster to shut down, so as to reduce the number of node devices that are started in the cluster;
  • when the determining module determines that the device resource usage rate of the cluster is too high, control unstarted node devices to start and join the cluster, so as to increase the number of node devices that are started in the cluster.
  • the adjustment module is configured to:
  • the activated node device is turned off by the IPMI of the node device that has been started within the cluster to reduce the number of node devices that have been started within the cluster.
  • the present invention further provides another apparatus for adjusting the number of nodes in the system, where the adjusting apparatus is applied to a node device in a system cluster, including: a collecting module and a sending module;
  • the collecting module is configured to collect hardware usage status information of the node device itself after the node device is started;
  • the sending module is configured to send the collected hardware usage status information, so that the receiving end obtains indicator information used to describe the current usage status of the cluster according to the hardware usage status information, and then adjusts the number of node devices that have been started in the cluster according to the indicator information.
  • the adjusting device further includes an IPMI and a control module
  • the IPMI is configured to receive a shutdown command of an external transmission and transmit the shutdown command to the control module;
  • the control module is configured to shut down the node device after receiving the shutdown command.
  • An embodiment of the present invention provides a method and an apparatus for adjusting the number of nodes in a system, where the method for adjusting the number of nodes in the system according to the embodiment of the present invention includes: receiving current hardware usage status information sent by each node device in the cluster.
  • the adjustment method of the present invention can dynamically adjust the number of node devices in the Hadoop cluster according to the current usage status of the cluster to meet the cluster's demand for node device resources, thereby improving the utilization of node devices in the Hadoop system, reducing unnecessary resource waste, and solving the technical problem of insufficient node device resources in the Hadoop system.
  • FIG. 1 is a schematic flowchart of a method for adjusting a number of nodes in a Hadoop system according to Embodiment 1 of the present invention
  • FIG. 2 is a schematic flowchart of another method for adjusting the number of nodes in a Hadoop system according to Embodiment 1 of the present invention
  • FIG. 3 is a schematic flowchart of a method for adjusting a number of nodes in a Hadoop system according to Embodiment 2 of the present invention
  • FIG. 4 is a schematic structural diagram of an apparatus for adjusting a number of nodes in a Hadoop system according to Embodiment 3 of the present invention.
  • FIG. 5 is a schematic structural diagram of another apparatus for adjusting the number of nodes in a Hadoop system according to Embodiment 3 of the present invention.
  • FIG. 6 is a schematic structural diagram of an apparatus for adjusting a number of nodes in a Hadoop system according to Embodiment 4 of the present invention.
  • FIG. 7 is a schematic structural diagram of another apparatus for adjusting the number of nodes in a Hadoop system according to Embodiment 3 of the present invention.
  • FIG. 8 is a schematic structural diagram of a system for adjusting the number of nodes in a Hadoop system according to Embodiment 4 of the present invention.
  • the present invention provides a method and device for adjusting the number of nodes in a system, which can dynamically adjust the number of node devices in the system cluster according to the current usage state of the cluster.
  • the following describes the adjustment method and apparatus of the present invention by taking the adjustment of the number of nodes in a Hadoop system as an example. It should be understood that the method and apparatus of this embodiment are not limited to dynamically adjusting the number of nodes in a Hadoop system, and can also dynamically adjust the number of nodes in other systems.
  • Embodiment 1:
  • This embodiment provides a method for adjusting the number of nodes in a Hadoop system. As shown in FIG. 1, the method may include the following steps:
  • Step 101 Receive current hardware usage status information sent by each node device in the cluster, where the node device is an activated node device.
  • the cluster is a Hadoop cluster, and each node device in the cluster needs to collect its own hardware usage status information, and then sends it out.
  • each node device in the Hadoop cluster is a node device that has been started, and each node device installs an HDFS component after startup, so that Hadoop can perform data access and the like.
  • the hardware usage status information in this embodiment may include at least one of: the current disk space usage rate of the device (i.e., the percentage of the device's disk space currently in use), the current CPU usage rate of the device (i.e., the current CPU usage percentage of the device), and the current memory usage rate of the device (i.e., the current memory usage percentage of the device).
  • the hardware usage status information sent by the node devices may be received in multiple ways; for example, it may be received over a network, specifically by communicating with the node devices using Transmission Control Protocol (TCP)/Internet Protocol (IP) to receive the hardware usage status information collected by the node devices.
  • Step 102 Acquire indicator information for describing a current usage state of the cluster according to the received hardware usage information.
  • the hardware usage status information sent by each node may be calculated to obtain indicator information describing the current usage status of the cluster.
  • the cluster is a set of node devices. Therefore, the indicator information describing the current usage state of the cluster in this embodiment may be obtained from the hardware usage status information of the devices in the cluster, and the indicator information may correspond to the hardware usage status information of the node devices in the cluster. For example, when the indicator information is the total disk space usage rate in the cluster, the hardware usage status information collected by the node devices in the cluster may include the device disk space usage rate.
  • the indicator information in this embodiment may include at least one of: the total disk space usage rate in the cluster (that is, the percentage of the total disk space in the cluster that is in use), the average CPU usage rate of the devices in the cluster (the average CPU usage percentage of the devices in the cluster), and the average memory usage rate of the devices in the cluster (the average memory usage percentage of the devices in the cluster).
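The aggregation in Step 102 can be sketched as follows. This is a minimal illustration, not the patented implementation: the `NodeReport` fields and the `cluster_indicators` name are hypothetical, and the sketch assumes each node reports used and total disk bytes (rather than only a percentage) so that the cluster-wide disk usage can be computed over total capacity.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class NodeReport:
    """Hardware usage reported by one started node (hypothetical field names)."""
    disk_used_bytes: int
    disk_total_bytes: int
    cpu_percent: float
    mem_percent: float


def cluster_indicators(reports):
    """Aggregate per-node reports into cluster-level indicators, in percent."""
    if not reports:
        raise ValueError("at least one started node must report")
    total_used = sum(r.disk_used_bytes for r in reports)
    total_capacity = sum(r.disk_total_bytes for r in reports)
    return {
        # Total disk usage is weighted by capacity, not a plain average.
        "disk": 100.0 * total_used / total_capacity,
        "cpu": mean(r.cpu_percent for r in reports),
        "mem": mean(r.mem_percent for r in reports),
    }
```

Weighting the disk indicator by capacity is a design choice of this sketch; a cluster of equally sized disks would give the same result with a plain average.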
  • Step 103 Adjust the number of node devices that have been started in the cluster according to the indicator information.
  • the method of the embodiment may reduce or increase the number of node devices that have been started in the cluster according to the indicator information.
  • adjusting (e.g., increasing or decreasing) the number of node devices in a cluster, as described herein, refers to adjusting (e.g., increasing or decreasing) the number of node devices that have been started in the cluster.
  • the method in this embodiment can determine, according to the indicator information, whether the device resource usage rate of the cluster is too high or too low. If it is too low, the number of node devices that have been started in the cluster is reduced; if it is too high, the number of node devices that have been started in the cluster is increased.
  • the step of determining, according to the indicator information, whether the device resource usage rate of the cluster is too high or too low in this embodiment includes: comparing the value of each indicator with the corresponding preset lower threshold and the corresponding preset upper threshold, and determining, according to the comparison result, whether the device resource usage rate of the cluster is too high or too low.
  • the step of determining, according to the comparison result, whether the device resource usage rate of the cluster is too high or too low may take a time factor into account:
  • if the value of at least one indicator stays below the corresponding preset lower threshold for a preset period, the device resource usage rate of the cluster is determined to be too low; for example, if the average CPU usage rate of the devices in the cluster stays below the preset lower threshold for 1 minute, the device resource usage rate of the cluster is determined to be too low.
  • if the value of at least one indicator stays above the corresponding preset upper threshold for a preset period, the device resource usage rate of the cluster is determined to be too high; for example, if the average CPU usage rate of the devices in the cluster stays above the preset upper threshold for 1 minute, the device resource usage rate of the cluster is determined to be too high.
  • the method of this embodiment may also disregard the time factor: after the value of each indicator is compared with the corresponding preset lower threshold and the corresponding preset upper threshold, if the value of at least one indicator is smaller than the corresponding preset lower threshold, the device resource usage rate of the cluster is determined to be too low; if the value of at least one indicator is greater than the corresponding preset upper threshold, the device resource usage rate of the cluster is determined to be too high.
  • when the indicator information includes multiple indicators, for example the total disk space usage rate in the cluster (that is, the percentage of the total disk space in the cluster that is in use), the average CPU usage rate of the devices in the cluster (the average CPU usage percentage of the devices in the cluster), and the average memory usage rate of the devices in the cluster, the value of each indicator can be compared with the corresponding preset lower threshold. If the value of at least one indicator is less than the corresponding lower threshold (for example, the total disk space usage rate in the cluster is less than the preset lower threshold and/or the average memory usage rate of the devices in the cluster is less than the preset lower threshold), the device resource usage rate of the cluster is too low. In this case resources are being wasted, and the number of node devices in the cluster needs to be reduced.
  • if the value of at least one indicator is greater than the corresponding preset upper threshold (for example, the total disk space usage rate in the cluster is greater than the preset upper threshold and/or the average memory usage rate of the devices in the cluster is greater than the preset upper threshold), the device resource usage rate of the cluster is too high. In this case there is a risk of insufficient device resources, and node devices need to be added to the cluster.
  • a unified lower threshold and an upper threshold may be set in advance.
  • different upper and lower thresholds may be set corresponding to different indicators.
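The threshold comparison without a time factor can be sketched as below. The function name, the indicator keys, and the result labels are hypothetical. The text does not say what happens when one indicator is above its upper threshold while another is below its lower threshold; this sketch assumes "too high" takes precedence, since running short of resources is the riskier condition.

```python
def classify_usage(indicators, thresholds):
    """Classify cluster usage as "too_high", "too_low" or "normal".

    indicators: mapping of indicator name -> current value (percent)
    thresholds: mapping of indicator name -> (lower, upper) preset thresholds
    """
    # Assumption of this sketch: an over-threshold indicator wins over an
    # under-threshold one when both occur at the same time.
    if any(value > thresholds[name][1] for name, value in indicators.items()):
        return "too_high"
    if any(value < thresholds[name][0] for name, value in indicators.items()):
        return "too_low"
    return "normal"
```

A unified range is expressed by giving every indicator the same `(lower, upper)` pair; per-indicator ranges simply use different pairs.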
  • the preferred manner of increasing the number of node devices in the cluster in this embodiment includes: controlling an unstarted node device to start and join the cluster to increase the number of node devices in the cluster.
  • specifically, an unstarted node device can be controlled, through the IPMI of that node device, to start and join the cluster so as to increase the number of node devices in the cluster.
  • the node device that is not started in this embodiment may be a node device that is not started in the cluster or a node device that is not activated outside the cluster.
  • the preferred manner of reducing the number of node devices in the cluster in this embodiment includes: controlling the node devices that have been started in the cluster to be closed to reduce the number of node devices in the cluster.
  • specifically, a started node device in the cluster can be shut down through the IPMI (Intelligent Platform Management Interface) of that node device so as to reduce the number of node devices in the cluster.
  • the adjustment method in this embodiment can dynamically adjust the number of node devices in the Hadoop cluster according to the current usage state of the cluster. When the device resource usage rate of the cluster is too high, node devices are added to the cluster to avoid insufficient node resources in the Hadoop system. When the device resource usage rate of the cluster is too low, the number of node devices in the cluster is reduced to avoid unnecessary waste of device resources.
  • the method for adjusting the number of nodes in the Hadoop system of this embodiment may include the following steps:
  • Step 201 Receive current hardware usage status information sent by each node device in the cluster, where the node device is an activated node device.
  • the hardware usage status information transmitted by the node device through the JMX framework may be received, and the received information is JMX information.
  • the method in this embodiment can receive the hardware usage status information transmitted by the node device in multiple manners, for example, by receiving the JMX information over the network. In this case the received JMX information first needs to be processed according to the network protocol (parsing, CRC check, calculation, processing, etc.), for example to restore the original hardware usage status information.
  • the system uses multi-threading technology to ensure that data can be processed quickly, preventing buffer overflow and data loss, thereby realizing real-time processing requirements.
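The multi-threaded handling of incoming reports could look like the following sketch. It is illustrative only: the patent transports JMX information, whereas this example assumes each message is a JSON string, and the `process_reports` name and worker count are hypothetical.

```python
import json
import queue
import threading


def process_reports(raw_messages, worker_count=4):
    """Decode raw report messages on several worker threads."""
    pending = queue.Queue()
    decoded = []
    lock = threading.Lock()

    def worker():
        while True:
            raw = pending.get()
            try:
                if raw is None:  # sentinel: no more work for this thread
                    return
                report = json.loads(raw)
                with lock:
                    decoded.append(report)
            except ValueError:
                pass  # malformed message: dropped in this sketch
            finally:
                pending.task_done()

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for thread in threads:
        thread.start()
    for raw in raw_messages:
        pending.put(raw)
    for _ in threads:
        pending.put(None)  # one sentinel per worker
    pending.join()
    for thread in threads:
        thread.join()
    return decoded
```

Because several workers append concurrently, the order of the returned reports is not guaranteed to match the input order.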
  • Step 202 Acquire indicator information for describing a current usage state of the cluster according to the received hardware usage status information, where the indicator information includes at least one indicator.
  • the hardware usage information sent by the node device in the cluster includes: the device disk space usage rate, the current CPU usage of the device, and the current memory usage rate of the device;
  • the metric information used to describe the current usage status of the cluster is calculated according to the hardware usage information, including: total disk space usage in the cluster, average CPU usage of the devices in the cluster, and average memory usage of the devices in the cluster.
  • Step 203 Compare the value of each indicator with a corresponding preset lower limit threshold and a corresponding preset upper limit threshold.
  • Step 204 Determine, according to the comparison result, whether the device resource usage rate of the cluster is too high or too low. If the usage is too high, go to step 205. If it is too low, go to step 206.
  • this step can determine whether the device resource usage rate of the cluster is too high or too low according to the total disk space usage in the cluster, the average CPU usage of the devices in the cluster, and the average memory usage of the devices in the cluster.
  • taking the case where all indicators use the same preset range as an example, each indicator is compared with the upper threshold and the lower threshold of the preset range; when the value of at least one indicator is greater than the upper threshold, it is determined that the device resource usage rate is too high, and when the value of at least one indicator is less than the lower threshold, it is determined that the device resource usage rate is too low.
  • alternatively, this step may compare each indicator with the upper threshold and the lower threshold of the preset range and take time into account: when the value of at least one indicator stays above the upper threshold throughout a preset time period, it is determined that the device resource usage rate is too high; when the value of at least one indicator stays below the lower threshold throughout the preset time period, it is determined that the device resource usage rate is too low.
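The time-aware variant can be sketched with a sliding window over timestamped samples of one indicator. The class name, the sample format, and the rule that the window must be fully covered by samples before a verdict is given are assumptions of this sketch.

```python
from collections import deque


class SustainedThreshold:
    """Report when one indicator stays out of range for a whole time window."""

    def __init__(self, lower, upper, window_seconds):
        self.lower = lower
        self.upper = upper
        self.window = window_seconds
        self.samples = deque()  # (timestamp, value) pairs, oldest first

    def update(self, timestamp, value):
        """Record a sample and return "too_high", "too_low" or "normal"."""
        self.samples.append((timestamp, value))
        window_start = timestamp - self.window
        # Keep exactly one sample at or before the window start so that we
        # can tell whether the whole window is covered by observations.
        while len(self.samples) >= 2 and self.samples[1][0] <= window_start:
            self.samples.popleft()
        if self.samples[0][0] > window_start:
            return "normal"  # not enough history yet
        values = [v for _, v in self.samples]
        if all(v > self.upper for v in values):
            return "too_high"
        if all(v < self.lower for v in values):
            return "too_low"
        return "normal"
```

One instance tracks one indicator; with several indicators, the cluster would be judged too high or too low as soon as any tracker reports that state.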
  • Step 205 Control, by the IPMI of the node device that is not started, that the unstarted node device starts and joins the cluster to increase the number of node devices that are started in the cluster.
  • the number of node devices added may be one or more, and this number may be preset (for example, one node device is added by default) or determined according to the indicator information.
  • specifically, this step may include: sending a power-on command to the IPMI of the unstarted node device so that the unstarted node device starts and joins the cluster, thereby increasing the number of node devices in the cluster.
  • for example, the IPMI address list of the unstarted DataNode nodes can be read, and the IPMI addresses of a preset number of node devices are selected from it.
  • a power-on command is sent to each selected device; the unstarted device receives the power-on command through its IPMI, and the command is passed to the BMC (Baseboard Management Controller) component in the node device to start the device. After the node device has started, its state is recorded: the node is moved from the list of unstarted DataNode nodes to the list of started nodes.
  • after powering on, the node device automatically starts its HDFS component: according to the configuration of the system startup items, the installed HDFS component starts with the operating system and joins the cluster, which expands the Hadoop environment and resolves the resource shortage of the original environment by automatically adding node devices.
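The list bookkeeping described for Step 205 can be sketched as follows. The `NodePool` class and its method names are hypothetical, and the actual power-on transport is left to a caller-supplied function so that the sketch stays independent of any particular IPMI tooling.

```python
class NodePool:
    """Track IPMI addresses of unstarted and started DataNode devices."""

    def __init__(self, unstarted, started=()):
        self.unstarted = list(unstarted)
        self.started = list(started)

    def start_nodes(self, count, power_on):
        """Power on up to `count` unstarted nodes and record them as started.

        `power_on` is a caller-supplied function (an assumption of this
        sketch) that sends the power-on command to one IPMI address and
        returns True once the node has started.
        """
        moved = []
        for address in list(self.unstarted[:count]):
            if power_on(address):
                # Record the state change: unstarted list -> started list.
                self.unstarted.remove(address)
                self.started.append(address)
                moved.append(address)
        return moved
```

A node whose power-on attempt fails stays in the unstarted list, so a later adjustment round can try it again.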
  • Step 206 Control, by the IPMI of the node device that is started in the cluster, that the activated node device is shut down to reduce the number of node devices that are started in the cluster.
  • the number of node devices that are specifically reduced in the method of this embodiment may be one or more, and the reduced number may be preset, for example, by default, one node device is reduced or determined according to indicator information.
  • the step may specifically be: sending a shutdown command to the IPMI of the node device that is started in the cluster to shut down the activated node device.
  • for example, the IPMI address list of the started DataNode nodes can be read, and the IPMI addresses of a preset number of node devices are selected.
  • the shutdown command is sent to each selected device over HTTP and passed to the BMC component in the node device to carry out the shutdown. After the node device has shut down, its state is recorded: the node is moved from the list of started DataNode nodes to the list of unstarted nodes. Once a device has been shut down through IPMI, the node is temporarily removed from the Hadoop environment, which resolves the waste of node resources in the environment.
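One common way to issue IPMI power commands in practice is the `ipmitool` command-line utility. The sketch below swaps that tool in for illustration; the text itself describes sending the command over HTTP and does not name any tool. The function name, host, and credentials are hypothetical, and the function only builds the command line without running it.

```python
def ipmitool_power_command(host, user, password, action):
    """Build an ipmitool command line for a chassis power action.

    action: "on" to start a node, "soft" for a graceful (ACPI) shutdown,
    "off" for an immediate power-off.
    """
    if action not in {"on", "soft", "off"}:
        raise ValueError(f"unsupported power action: {action}")
    return [
        "ipmitool", "-I", "lanplus",          # IPMI v2.0 over LAN
        "-H", host, "-U", user, "-P", password,
        "chassis", "power", action,
    ]
```

The returned list can be passed to `subprocess.run(command, check=True)` on a management host that has `ipmitool` installed and network access to the node's BMC.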
  • the method of the embodiment automatically increases or decreases the number of nodes in the Hadoop system based on the IPMI, and the performance data of each node is obtained by the collectors in the nodes in each cluster, and is reported to the system through the JMX framework, and the system integrates and analyzes the information. According to the analysis result, the IPMI is used to control the start or stop of the target device, thereby achieving the purpose of dynamically and automatically adjusting the number of devices in the cluster.
  • Embodiment 2:
  • this embodiment provides a method for adjusting the number of nodes in a Hadoop system, which is applied to a node device in a Hadoop system cluster, and includes the following steps:
  • Step 301 After the node device is started, collect hardware usage status information of the node device itself.
  • the device can collect the hardware usage information of the node device, for example, at least one of a device disk space usage rate, a current CPU usage rate of the device, and a current memory usage rate of the device.
  • the data conversion table may be established on the collected hardware usage status information, and the hardware usage status information is processed based on the data conversion table to form a data table.
  • Step 302: Send the collected hardware usage status information to the receiving end, so that the receiving end obtains the indicator information used to describe the current usage state of the cluster according to the hardware usage status information, and then adjusts the number of node devices that have been started in the cluster according to the indicator information.
  • this step may send the collected hardware usage information by using the JMX framework used by the Hadoop system.
  • JMX information can be sent out through a network such as TCP/IP.
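The node-side collect-and-send steps can be sketched as below. This is not the JMX mechanism the embodiment uses: the example assumes a newline-delimited JSON message over a plain TCP connection, reports only the disk usage rate (one of the permitted items), and all function and field names are hypothetical.

```python
import json
import shutil
import socket
import time


def collect_usage(path="/"):
    """Collect this node's disk usage as a percentage of total capacity."""
    usage = shutil.disk_usage(path)
    percent = 100.0 * usage.used / usage.total if usage.total else 0.0
    return {
        "host": socket.gethostname(),
        "timestamp": time.time(),
        "disk_percent": percent,
    }


def encode_report(report):
    """Serialize a report as one newline-terminated JSON message."""
    return (json.dumps(report) + "\n").encode("utf-8")


def send_report(report, host, port):
    """Send one report to the receiving end over TCP (hypothetical endpoint)."""
    with socket.create_connection((host, port), timeout=5) as conn:
        conn.sendall(encode_report(report))
```

A node would typically call `collect_usage` on a timer after startup and pass the result to `send_report` with the address of the management end.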
  • with the method of this embodiment, a node device collects its own hardware usage status information so that the management end can dynamically adjust the number of node devices in the cluster, which solves the technical problem that node device resources in the current Hadoop system cannot be dynamically adjusted or that node device resources in the Hadoop system are insufficient.
  • the method of this embodiment may further include: receiving a shutdown command through the IPMI, and shutting down after receiving the shutdown command.
  • when the number of node devices in the cluster is to be reduced, the shutdown command can be received through the IPMI and then passed to the BMC component in the node device, which carries out the shutdown.
  • Embodiment 3:
  • this embodiment provides a device for adjusting the number of nodes in a Hadoop system, including: a receiving module 401, an analyzing module 402, and a processing module 403;
  • the receiving module 401 is configured to receive current hardware usage status information sent by each node device in the cluster, where the node device is an activated node device;
  • the analyzing module 402 is configured to obtain, according to the received hardware usage status information, indicator information used to describe a current usage status of the cluster;
  • the processing module 403 is configured to adjust the number of node devices that have been started in the cluster according to the indicator information.
  • the processing module 403 in this embodiment includes: a determining module 4031 and an adjusting module 4032;
  • the determining module 4031 is configured to determine, according to the indicator information, whether a device resource usage rate of the cluster is too high or too low;
  • the adjustment module 4032 is configured to:
  • when the determining module 4031 determines that the device resource usage rate of the cluster is too low, control started node devices in the cluster to shut down, so as to reduce the number of node devices that are started in the cluster;
  • when the determining module 4031 determines that the device resource usage rate of the cluster is too high, control unstarted node devices to start and join the cluster, so as to increase the number of node devices that are started in the cluster.
  • the adjusting module 4032 is configured to:
  • the activated node device is turned off by the IPMI of the node device that has been started within the cluster to reduce the number of node devices that have been started within the cluster.
  • the adjusting device of the present invention can dynamically adjust the number of node devices in the Hadoop cluster according to the current usage state of the cluster, and meet the requirements of the cluster to the node device resources, thereby improving the utilization of the node devices in the Hadoop system and reducing unnecessary resource waste. And solve the technical problem of insufficient node device resources in the Hadoop system.
  • Embodiment 4:
  • this embodiment provides a device for adjusting the number of nodes in a Hadoop system, and the device is applied to a node device in a Hadoop system cluster, and includes: a collecting module 601 and a sending module 602;
  • the collecting module 601 is configured to collect hardware usage status information of the node device itself after the node device is started;
  • the sending module 602 is configured to send the collected hardware usage status information to a receiving end, so that the receiving end obtains, according to the hardware usage status information, indicator information used to describe the current usage status of the cluster, and then adjusts, according to the indicator information, the number of node devices that have been started within the cluster.
  • the sending module 602 of this embodiment is configured to send the collected hardware usage status information through the JMX framework used by the Hadoop system.
  • the adjustment apparatus of this embodiment further includes: an IPMI interface 604 and a control module 603;
  • the IPMI interface 604 is configured to receive an externally transmitted shutdown command and pass the shutdown command to the control module 603;
  • the control module 603 is configured to shut down the node device after receiving the shutdown command.
  • the adjusting device can collect the hardware usage information of its own node device so that the management end can dynamically adjust the number of node devices in the cluster, which solves the technical problem that node device resources in the Hadoop system cannot be dynamically adjusted or that node device resources in the Hadoop system are insufficient.
  • Embodiment 5:
  • the embodiment provides a system for adjusting the number of nodes in a Hadoop system, including: a plurality of unstarted first node devices 80, a plurality of started second node devices 81 located in the Hadoop system cluster, and a node management device 82;
  • the first node device 80 is provided with an IPMI interface 801;
  • the second node device 81 is provided with an IPMI interface 813, an acquisition module 811, and a sending module 812;
  • the node management device 82 includes: a receiving module 821, an analysis module 822, and a processing module 823;
  • the collection module 811 of the second node device 81 is configured to collect hardware usage status data of the node device where the module is located;
  • the collecting module 811 of the second node device 81 is specifically configured to establish a data conversion table for the collected basic data, process the state data of the host based on the data conversion table to form a data table, and report the data through JMX.
  • the data describes the hardware usage status of the host node and mainly includes the total disk space of the host, the used disk space of the host, the current CPU usage percentage, the current memory usage percentage, and the like; the data table is then sent through the sending module.
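  • the conversion step described above can be pictured with the following minimal sketch, which turns raw host readings into a reportable data table. The raw field names, the reported keys, and the unit conversions in `CONVERSION_TABLE` are all hypothetical; the source does not define the table layout, and the JMX reporting itself is not shown.

```python
# Hypothetical data conversion table:
# raw field name -> (reported key, conversion function).
CONVERSION_TABLE = {
    "disk_total_bytes": ("DiskTotalGB", lambda b: round(b / 1024**3, 2)),
    "disk_used_bytes": ("DiskUsedGB", lambda b: round(b / 1024**3, 2)),
    "cpu_busy_fraction": ("CpuUsedPercent", lambda f: round(f * 100, 1)),
    "mem_used_fraction": ("MemUsedPercent", lambda f: round(f * 100, 1)),
}


def build_data_table(raw: dict) -> dict:
    """Convert raw host readings into the data table to be reported.

    Raw fields that have no entry in the conversion table are ignored,
    and table entries with no raw reading are simply left out.
    """
    table = {}
    for raw_key, (report_key, convert) in CONVERSION_TABLE.items():
        if raw_key in raw:
            table[report_key] = convert(raw[raw_key])
    return table


if __name__ == "__main__":
    sample = {
        "disk_total_bytes": 2 * 1024**3,
        "disk_used_bytes": 1024**3,
        "cpu_busy_fraction": 0.5,
        "mem_used_fraction": 0.25,
    }
    print(build_data_table(sample))
```

  • keeping the mapping in a table rather than in code means that a new metric only requires a new table entry, which may be the motivation for the conversion table in the text.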
  • the sending module 813 of the second node device 81 is configured to send the collected data to the receiving module 821 in the node management device 82; specifically, the data can be sent to the receiving module 821 through the JMX framework;
  • the receiving module 821 in the node management device 82 is configured to receive data reported by the sending module 813, such as a data table.
  • the receiving module is responsible for collecting and preliminarily processing the data table; the process comprises two parts: receiving the original data and processing the original data.
  • the underlying interface may be a network interface that communicates with the sending module of the node device over TCP/IP.
  • the original data from the collecting module is received through the interface provided by the system and submitted to the data receiving module.
  • the data receiving module parses the original data according to the protocols of the different devices, performs CRC (Cyclic Redundancy Check) verification, calculation, and processing, and then submits the result to the data analysis module.
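  • the CRC verification performed on the receiving side can be sketched as follows. The frame layout (a JSON payload followed by a 4-byte big-endian CRC-32) is an assumption made purely for illustration; the source states only that the received data is parsed per device protocol and CRC-checked, not which CRC variant or framing is used.

```python
import json
import struct
import zlib


def encode_frame(table: dict) -> bytes:
    """Serialize a data table and append its CRC-32 (hypothetical framing)."""
    payload = json.dumps(table, sort_keys=True).encode("utf-8")
    return payload + struct.pack(">I", zlib.crc32(payload))


def decode_frame(frame: bytes) -> dict:
    """Verify the trailing CRC-32 and return the parsed data table.

    Raises ValueError if the frame is too short or the checksum does
    not match the payload.
    """
    if len(frame) < 4:
        raise ValueError("frame too short to contain a CRC")
    payload = frame[:-4]
    (received_crc,) = struct.unpack(">I", frame[-4:])
    if zlib.crc32(payload) != received_crc:
        raise ValueError("CRC mismatch: frame is corrupted")
    return json.loads(payload.decode("utf-8"))


if __name__ == "__main__":
    frame = encode_frame({"CpuUsedPercent": 42.5})
    print(decode_frame(frame))
```

  • a frame that fails the check is rejected before any of its values reach the analysis stage, so corrupted readings cannot distort the cluster indicators.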
  • the analysis module 822 in the node management device 82 is configured to sort the performance data transmitted by the receiving module 821. Based on the data table sent by the collecting module, the receiving module extracts each type of data according to the keywords in the table and, applying different rules to different types of data, calculates indicator information that accurately describes the state of the entire cluster, such as the average CPU usage of hosts in the cluster, the average memory usage percentage of hosts in the cluster, and the percentage of total disk space used in the cluster. Whether the indicator data is reasonable is then determined by comparing the obtained indicator data with the set threshold range: if the current indicator data is within the normal threshold range, no additional operation is performed and the cycle of collecting, receiving, and so on continues; if the current indicator data is outside the normal threshold range, the result is passed to the processing module.
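  • the indicator calculation and threshold comparison described above could look like the following sketch. The per-host keys (`CpuUsedPercent`, `MemUsedPercent`, `DiskTotalGB`, `DiskUsedGB`), the indicator names, and the threshold ranges are hypothetical; the source names the three indicators but gives no formulas or limits.

```python
def cluster_indicators(tables: list) -> dict:
    """Aggregate per-host data tables into cluster-wide indicators."""
    if not tables:
        raise ValueError("no host data received")
    n = len(tables)
    disk_total = sum(t["DiskTotalGB"] for t in tables)
    disk_used = sum(t["DiskUsedGB"] for t in tables)
    return {
        # CPU and memory are averaged over hosts.
        "AvgCpuUsedPercent": sum(t["CpuUsedPercent"] for t in tables) / n,
        "AvgMemUsedPercent": sum(t["MemUsedPercent"] for t in tables) / n,
        # Disk usage is computed over the summed space of all hosts.
        "DiskUsedPercent": 100.0 * disk_used / disk_total,
    }


def out_of_range(indicators: dict, ranges: dict) -> dict:
    """Return only the indicators that fall outside their (low, high) range."""
    result = {}
    for name, (low, high) in ranges.items():
        value = indicators[name]
        if value < low:
            result[name] = "low"
        elif value > high:
            result[name] = "high"
    return result


if __name__ == "__main__":
    hosts = [
        {"CpuUsedPercent": 40.0, "MemUsedPercent": 30.0, "DiskTotalGB": 100.0, "DiskUsedGB": 50.0},
        {"CpuUsedPercent": 60.0, "MemUsedPercent": 50.0, "DiskTotalGB": 300.0, "DiskUsedGB": 50.0},
    ]
    indicators = cluster_indicators(hosts)
    print(indicators)
    print(out_of_range(indicators, {"AvgCpuUsedPercent": (20.0, 80.0), "DiskUsedPercent": (30.0, 90.0)}))
```

  • an empty result from `out_of_range` corresponds to the case in which no additional operation is performed and collection simply continues.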
  • the processing module 823 makes a determination according to the data obtained by the analysis module 822. If the current indicator data is higher than the normal threshold range, the current system resource usage rate is high and there is a risk of insufficient resources. If the indicator data remains above the threshold for a period of time, the list of all devices is read, an unstarted first node device is found, and a boot command is sent to its IPMI. Because IPMI separates the system management software from the platform management tasks of the system hardware, and separates the underlying server management functions from the high-level software, IPMI is not limited by how the operating system is managed. Through the generic, streamlined, message-based interface that IPMI specifies, information is transmitted to the BMC component of the device to perform a remote boot.
  • on the first node device, according to the configured system startup items, the installed Hadoop components start with the operating system and join the cluster, thereby increasing the system resources available in the cluster. Conversely, if the current indicator data is lower than the normal threshold range, the current system resource usage rate is low and resources are being wasted. If the indicator data remains below the threshold for a period of time, the list of all devices is read, a started second node device is found, and a shutdown command is sent to its IPMI.
  • the remote shutdown command can be implemented through the BMC component of the device.
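  • the "remains above or below the threshold for a period of time" condition and the resulting IPMI command can be sketched as follows. The class `SustainedBreach`, the function `ipmi_power_command`, and the choice of the `ipmitool` command-line client are assumptions for this example; the source says only that a boot or shutdown command is sent to the device's IPMI and carried out by its BMC. The sketch builds the command line without executing it.

```python
class SustainedBreach:
    """Report a breach only after it has persisted for hold_seconds."""

    def __init__(self, hold_seconds: float):
        self.hold_seconds = hold_seconds
        self._state = None   # "high", "low", or None (normal)
        self._since = None   # time at which the current state began

    def update(self, state, now: float):
        """Record the latest state; return it once it has been sustained."""
        if state != self._state:
            # State changed: restart the timer.
            self._state, self._since = state, now
        if self._state is not None and now - self._since >= self.hold_seconds:
            return self._state
        return None


def ipmi_power_command(bmc_host: str, user: str, state: str) -> list:
    """Build an ipmitool invocation for a sustained breach.

    "high" powers on an unstarted node; "low" requests a soft shutdown
    of a started node. The password is expected in the IPMI_PASSWORD
    environment variable (ipmitool's -E option).
    """
    action = {"high": "on", "low": "soft"}[state]
    return ["ipmitool", "-I", "lanplus", "-H", bmc_host, "-U", user, "-E",
            "chassis", "power", action]


if __name__ == "__main__":
    tracker = SustainedBreach(hold_seconds=60)
    for t, state in [(0, "high"), (30, "high"), (60, "high")]:
        sustained = tracker.update(state, now=t)
        if sustained:
            print(" ".join(ipmi_power_command("192.0.2.10", "admin", sustained)))
```

  • requiring the breach to persist before acting avoids starting or stopping nodes in response to short load spikes; the appropriate hold time is deployment-specific and is not given in the source.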
  • a system for automatically increasing or decreasing the number of nodes in a Hadoop system based on IPMI is thus proposed.
  • the collecting module of each node in the cluster acquires the performance data of that node and reports it to the management device, and the management device controls the target device to start or stop according to the result, thereby dynamically and automatically adjusting the number of node devices in the cluster.
  • the technical solution provided by the embodiments of the present invention can be used in the process of adjusting the number of nodes in a system. The method for adjusting the number of nodes in a system provided by the embodiments of the present invention includes: receiving the current hardware usage information sent by each node device in the cluster; obtaining, according to the received hardware usage information, indicator information used to describe the current usage status of the cluster; and adjusting the number of node devices that are started in the cluster according to the indicator information. The number of node devices in the Hadoop cluster can thus be dynamically adjusted according to the current usage status of the cluster to meet the cluster's demand for node device resources, thereby improving the utilization of node devices in the Hadoop system, reducing unnecessary resource waste, and solving the technical problem of insufficient node device resources in the Hadoop system.

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)
  • Computer And Data Communications (AREA)

Abstract

The invention relates to a method for adjusting the number of nodes in a system, and a device using the same. The method comprises: receiving current hardware usage status information sent by each node device in a cluster (101); obtaining, according to the received hardware usage information, indicator information describing the current usage status of the cluster (102); and adjusting, according to the indicator information, the number of started node devices in the cluster (103). The adjustment method solves the current technical problem of wasted or insufficient node device resources in a Hadoop system that results from the inability to dynamically adjust the number of nodes in the Hadoop system.
PCT/CN2015/086049 2015-04-14 2015-08-04 Method for adjusting the number of nodes in a system, and device using same WO2016165242A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201510173758.X 2015-04-14
CN201510173758.XA CN106155805A (zh) 2015-04-14 2015-04-14 系统内节点数的调整方法和装置

Publications (1)

Publication Number Publication Date
WO2016165242A1 true WO2016165242A1 (fr) 2016-10-20

Family

ID=57125660

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2015/086049 WO2016165242A1 (fr) 2015-04-14 2015-08-04 Method for adjusting the number of nodes in a system, and device using same

Country Status (2)

Country Link
CN (1) CN106155805A (fr)
WO (1) WO2016165242A1 (fr)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110445725A (zh) * 2019-07-08 2019-11-12 福建天泉教育科技有限公司 新加入负载节点分流的方法、存储介质
CN110932935A (zh) * 2019-11-26 2020-03-27 深圳前海微众银行股份有限公司 资源控制方法、装置、设备及计算机存储介质
CN111414274A (zh) * 2019-01-04 2020-07-14 营邦企业股份有限公司 运用于数据中心的机柜异常状态的远端排除方法
CN114124969A (zh) * 2021-11-25 2022-03-01 广州市昊链信息科技股份有限公司 一种数据复制方法、装置、设备以及存储介质
CN114584986A (zh) * 2020-12-01 2022-06-03 中国联合网络通信集团有限公司 资源调度的方法和装置

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109062508A (zh) * 2018-07-19 2018-12-21 郑州云海信息技术有限公司 一种数据处理的方法及装置
CN109767341B (zh) * 2018-12-13 2024-04-16 深圳平安医疗健康科技服务有限公司 一种业务请求处理方法、处理装置和终端
CN109800079B (zh) * 2018-12-13 2023-03-31 深圳平安医疗健康科技服务有限公司 医保系统中的节点调整方法及相关装置
CN111147565B (zh) * 2019-12-22 2023-01-24 北京浪潮数据技术有限公司 一种集群节点控制方法、装置、设备及可读存储介质
CN112235383B (zh) * 2020-10-09 2024-03-22 腾讯科技(深圳)有限公司 容器服务集群节点调度方法及装置、服务器、存储介质

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102143563A (zh) * 2010-07-30 2011-08-03 华为技术有限公司 一种短信中心集群的控制方法、设备及系统
CN103561428A (zh) * 2013-10-10 2014-02-05 东软集团股份有限公司 短信网关集群系统中的节点弹性分配方法及系统
CN103713935A (zh) * 2013-12-04 2014-04-09 中国科学院深圳先进技术研究院 一种在线管理Hadoop集群资源的方法和装置
US20140245298A1 (en) * 2013-02-27 2014-08-28 Vmware, Inc. Adaptive Task Scheduling of Hadoop in a Virtualized Environment

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020087693A1 (en) * 2000-12-29 2002-07-04 Marshall Donald Brent Method and system for distributing service functionality
CN101772140A (zh) * 2009-12-25 2010-07-07 中兴通讯股份有限公司 一种自适应节能方法及具备该功能的业务系统
US8959223B2 (en) * 2011-09-29 2015-02-17 International Business Machines Corporation Automated high resiliency system pool
CN103188277B (zh) * 2011-12-27 2016-05-18 中国电信股份有限公司 负载能耗管理系统、方法和服务器
CN103906207B (zh) * 2014-03-03 2017-07-18 东南大学 基于自适应按需唤醒技术的无线传感器网络数据传输方法

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111414274A (zh) * 2019-01-04 2020-07-14 营邦企业股份有限公司 运用于数据中心的机柜异常状态的远端排除方法
CN110445725A (zh) * 2019-07-08 2019-11-12 福建天泉教育科技有限公司 新加入负载节点分流的方法、存储介质
CN110445725B (zh) * 2019-07-08 2022-07-26 福建天泉教育科技有限公司 新加入负载节点分流的方法、存储介质
CN110932935A (zh) * 2019-11-26 2020-03-27 深圳前海微众银行股份有限公司 资源控制方法、装置、设备及计算机存储介质
CN114584986A (zh) * 2020-12-01 2022-06-03 中国联合网络通信集团有限公司 资源调度的方法和装置
CN114584986B (zh) * 2020-12-01 2024-04-09 中国联合网络通信集团有限公司 资源调度的方法和装置
CN114124969A (zh) * 2021-11-25 2022-03-01 广州市昊链信息科技股份有限公司 一种数据复制方法、装置、设备以及存储介质

Also Published As

Publication number Publication date
CN106155805A (zh) 2016-11-23

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 15888945

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 15888945

Country of ref document: EP

Kind code of ref document: A1