CN110618861A

CN110618861A - Hadoop cluster energy-saving system

Info

Publication number: CN110618861A
Application number: CN201910868588.5A
Authority: CN
Inventors: 倪丽娜; 张金泉; 刘浩然; 韩庆亮
Original assignee: Shandong University of Science and Technology
Current assignee: Shandong University of Science and Technology
Priority date: 2019-09-16
Filing date: 2019-09-16
Publication date: 2019-12-27
Also published as: WO2021051441A1

Abstract

The invention discloses a Hadoop cluster energy-saving system, which belongs to the field of information technology processing. The system mainly comprises resource data collection at the bottom layer, load prediction and an energy consumption calculation model at the middle layer, and job scheduling at the upper layer, and the key technologies and strategies used by each layer are described in detail. An energy consumption calculation model is established based on CPU and memory utilization, and the coefficient values C_α, C_β and C₀ in the model are calculated with benchmarks in the specific experimental environment of the application.

Description

Hadoop cluster energy-saving system

Technical Field

The invention belongs to the field of information technology processing, and particularly relates to a Hadoop cluster energy-saving system.

Background

Data centers are receiving more and more attention. The focus has shifted from the initial pursuit of data center size and number to the construction of green data centers. Enterprises continue to rely on data centers, and more workloads may shift from local deployments to cloud platforms. However, current data centers face a number of challenges. First, resources are designed and deployed according to peak demand, while traffic and computing tasks typically arrive in phases, so most servers remain powered on during off-peak times. Second, the number of data centers is continuously increasing. In a typical data center, about 70% of the electric energy is consumed by servers and only 30% by communication equipment, storage and air conditioning equipment, and IT energy consumption increases year by year. In data centers of various scales at home and abroad, Hadoop clusters account for a high proportion of deployments, and large numbers of Hadoop clusters are configured in fields such as web search, data mining and recommendation advertising. However, the design of Hadoop task scheduling and data block storage mainly considers cluster performance, data security and similar concerns. As a result, the load balancing strategy of a Hadoop cluster keeps every node in the running state all the time and does not consider energy consumption. Some clusters reach hundreds of machines, so Hadoop clusters are, to a certain extent, one of the main contributors to data center energy consumption. Research on energy-saving strategies for Hadoop clusters in job scheduling and storage is therefore of great significance for reducing the power usage effectiveness (PUE) of the data center, and it also has a positive effect on the further development of the Hadoop open source project.

In order to provide an accurate resource allocation decision basis, the state change of the nodes needs to be monitored in real time, and meanwhile, on the basis of obtaining job information submitted by a user, a job scheduling queue with optimal energy consumption is obtained.

Disclosure of Invention

Aiming at the technical problems in the prior art, the invention provides the Hadoop cluster energy-saving system which is reasonable in design, overcomes the defects in the prior art and has a good effect.

In order to achieve the purpose, the invention adopts the following technical scheme:

a Hadoop cluster energy-saving system comprises a data collection module at the bottom layer, a load prediction module and an energy consumption model module at the middle layer, and a job scheduling module at the upper layer;

a data collection module configured to obtain cluster node data;

the cluster node data includes: (1) the condition of resource utilization of a node; (2) the situation that the tasks operated by the cluster nodes occupy the system resources;

the data collection module monitors the cluster performance index by means of the Agent probe technology of Zabbix; the data collection module works in a server end, a proxy end and an agent end mode;

the data collection module comprises a plurality of clusters, each cluster comprises two hosts, and each host corresponds to n cluster nodes; each host is provided with a server, each cluster node is provided with an agent, the server sends a request to the agent at intervals to collect the index data of the monitored item, the agent returns the requested data to the server, and the server writes the obtained data into a corresponding database to complete the collection and analysis of the data;

when the cluster scale of the Hadoop is too large, the pressure of the server end is increased, and the data collection module shares the analysis and collection work of cluster data by adopting proxy, so that the stability of a bottom system is ensured;

the load prediction module and the energy consumption model module are configured to be used for monitoring cluster performance, training the constructed LSTM network model for predicting the node load through the cluster node data collected by the data collection module at the bottom layer and providing support for the task scheduling at the upper layer;

the cluster performance monitoring method is specifically realized as follows:

the load prediction module and the energy consumption model module obtain real-time data of cluster node monitoring indexes by analyzing the CPU utilization rate and the memory allocation condition collected by the server end, and realize the monitoring of each cluster node through a set threshold;

the method specifically comprises the following steps: (1) visualizing the performance index; dynamically displaying real-time data including CPU utilization rate, memory allocation condition, running task of the node and resource allocated to the task, which are collected by a server end, by constructing a visual window; (2) collecting monitoring logs; writing the CPU utilization rate, the memory allocation condition and the occupation condition of each task resource on each node collected by the server end into a cluster log library; (3) monitoring frequency control; setting the frequency at which the server end collects data, namely collecting data once per set interval;

training a constructed LSTM network model for predicting node load through cluster node data collected by a data collection module at the bottom layer, and providing support for upper-layer task scheduling; the specific implementation method comprises the following steps:

(1) predicting the trend of key indexes of the host within set time;

firstly, constructing an LSTM network model for predicting node load, using bottom layer data as training data, and continuously modifying index parameters of the LSTM network model to obtain a trained model; secondly, predicting the resource use condition of the cluster host in a given time period by using the trained model, obtaining task processing characteristics of the nodes, distributing proper tasks according to the characteristics, and obtaining an executable task list in a certain time period; finally, analyzing and processing the data through the sequence length with better effect obtained by experiments;

(2) calculating a cluster energy consumption value;

firstly, establishing an energy consumption calculation model; then, determining an index coefficient of the established model through actual test on the Hadoop cluster; finally, calculating cluster energy consumption in actual task scheduling;

the job scheduling module comprises a job scheduler which is configured to use the node load condition predicted by the trained LSTM model to schedule tasks according to the job information to be processed by the user;

the job scheduling module adopts a scheduling algorithm based on host state prediction; the algorithm needs to obtain in advance the job information input by a user, which indicates whether the job is CPU intensive or memory intensive, and then a node capable of meeting the energy consumption requirement is selected from the cluster for processing; the job scheduler allocates a node for completing the job according to the job information of the user and the predicted node load condition;

the specific implementation method of the job scheduling module function is as follows:

(1) task oscillation migration control; (2) a threshold triggering mechanism; setting a dormancy or activation threshold value for the nodes to provide support for task scheduling; (3) checking whether the user's minimum computing requirement is met; the scheduler distributes the tasks to the active nodes, then compares the resources of the nodes with the requirements of the user tasks, activates dormant nodes if the requirements are not met, and finally counts the CPU utilization rate and the memory utilization rate of the nodes; (4) a node dormancy queue suggestion; selecting nodes to be put to sleep and adding them to the node dormancy suggestion queue according to the CPU utilization rate and the memory utilization rate of the nodes.
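Steps (2) to (4) of the job scheduling module can be sketched as follows. This is only an illustrative sketch under stated assumptions: the function names `dormancy_suggestions` and `nodes_to_activate`, the single shared threshold, the "idlest first" ordering and the "largest capacity first" wake-up order are all hypothetical choices, since the text does not specify them. Task oscillation migration control (step 1) is not covered.

```python
from typing import Dict, List, Tuple


def dormancy_suggestions(usage: Dict[str, Tuple[float, float]],
                         sleep_threshold: float) -> List[str]:
    """Build a node dormancy suggestion queue.

    `usage` maps a node name to (CPU utilization, memory utilization), both 0..1.
    A node is suggested when both values are below the (assumed) threshold;
    the idlest node comes first.
    """
    idle = [(max(cpu, mem), name)
            for name, (cpu, mem) in usage.items()
            if cpu < sleep_threshold and mem < sleep_threshold]
    return [name for _, name in sorted(idle)]


def nodes_to_activate(free_capacity: float, required: float,
                      dormant: Dict[str, float]) -> List[str]:
    """Wake dormant nodes until the user's minimum requirement is covered.

    `dormant` maps a dormant node name to the capacity it would add.
    Nodes are woken largest first (an assumption) so that few nodes are activated.
    """
    woken: List[str] = []
    for name, capacity in sorted(dormant.items(), key=lambda kv: -kv[1]):
        if free_capacity >= required:
            break
        woken.append(name)
        free_capacity += capacity
    return woken


# Made-up utilization figures for three nodes.
usage = {"n1": (0.05, 0.10), "n2": (0.60, 0.40), "n3": (0.02, 0.03)}
sleep_queue = dormancy_suggestions(usage, sleep_threshold=0.15)
wake_list = nodes_to_activate(free_capacity=1.0, required=2.5,
                              dormant={"d1": 1.0, "d2": 2.0})
```

With these sample figures only the two nearly idle nodes enter the suggestion queue, and a single dormant node is enough to cover the shortfall.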

Preferably, the resource usage of the cluster host in a given time period includes a trend of CPU utilization, a trend of memory utilization, and a load condition of the node in a future time period, and the prediction result provides a reference decision for scheduling in the uppermost layer.
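How the predicted load could feed the scheduling decision is sketched below. The selection rule is an assumption made for illustration, not the rule of the invention: the job is placed on the busiest node that still has predicted headroom on the job's dominant resource, so that lightly loaded nodes stay free to sleep. `NodeForecast`, `pick_node` and the `demand` parameter are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class NodeForecast:
    name: str
    cpu: float  # predicted CPU utilization for the coming period, 0..1
    mem: float  # predicted memory utilization for the coming period, 0..1


def pick_node(job_type: str, demand: float,
              nodes: List[NodeForecast]) -> Optional[str]:
    """Pick a node for a CPU-intensive ("cpu") or memory-intensive ("mem") job.

    Assumed rule: among nodes whose predicted load plus the job's demand stays
    within capacity, choose the most loaded one (consolidation). Returns None
    when no node fits, in which case a dormant node would have to be activated.
    """
    if job_type not in ("cpu", "mem"):
        raise ValueError("job_type must be 'cpu' or 'mem'")

    def load(node: NodeForecast) -> float:
        return node.cpu if job_type == "cpu" else node.mem

    candidates = [n for n in nodes if load(n) + demand <= 1.0]
    if not candidates:
        return None
    return max(candidates, key=load).name


# Made-up forecasts for two nodes.
forecasts = [NodeForecast("a", cpu=0.8, mem=0.2),
             NodeForecast("b", cpu=0.3, mem=0.7)]
chosen = pick_node("cpu", demand=0.4, nodes=forecasts)
```

Here node a is predicted to be too busy for a CPU-intensive job needing 0.4 of a CPU, so the job goes to node b.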

The invention has the following beneficial technical effects:

(1) module low coupling

The modules acquire data and call functions through API interfaces. After the Agent probe is installed on a newly added Hadoop node, the data collection module connects the node to the system seamlessly: the node is discovered automatically, and its index data such as CPU and memory are collected automatically for use by the model training layer; meanwhile, the computing resources of the node are added to the resource pool. If a node fails and cannot work normally, the states of the other nodes are not affected, which limits the impact of the failure.

(2) The accuracy of the state prediction of the host is higher

The load prediction module of the middle layer divides the original data into a plurality of intervals. Within each interval, prediction starts from actual data; each predicted value is then appended to the known data as history and used to predict the next value, so the overall prediction is a rolling forward prediction. Because the actual data set is used as input again when the next interval is reached, which is equivalent to a data correction, the prediction follows the correct macroscopic trend overall.
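The rolling forward prediction can be sketched as follows. The trained LSTM is replaced here by a trivial stand-in predictor (`mean_predictor`) so that the sketch stays self-contained; `rolling_forecast`, `seq_len` and `interval` are hypothetical names, and the interval length is an assumption.

```python
from typing import Callable, List, Sequence


def rolling_forecast(actual: Sequence[float], seq_len: int, interval: int,
                     predict: Callable[[List[float]], float]) -> List[float]:
    """Forecast actual[seq_len:] interval by interval.

    At the start of every interval the history is reset to real observations
    (the data correction). Inside an interval each predicted value is appended
    to the history and reused as input for the next prediction.
    """
    forecasts: List[float] = []
    for start in range(seq_len, len(actual), interval):
        history = list(actual[start - seq_len:start])  # real data only
        steps = min(interval, len(actual) - start)
        for _ in range(steps):
            nxt = predict(history[-seq_len:])
            forecasts.append(nxt)
            history.append(nxt)  # the prediction becomes history
    return forecasts


def mean_predictor(window: List[float]) -> float:
    """Stand-in for the trained model: predicts the mean of the window."""
    return sum(window) / len(window)


# Made-up load series with a jump in the middle.
series = [1.0, 1.0, 4.0, 4.0, 4.0]
predicted = rolling_forecast(series, seq_len=2, interval=2,
                             predict=mean_predictor)
```

Inside the first interval the forecast cannot see the jump, but the second interval restarts from real observations and picks it up, which is the correction effect described above.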

Drawings

Fig. 1 is a diagram of the overall architecture of the system.

Detailed Description

The invention is described in further detail below with reference to the following figures and detailed description:

1. system architecture design

As shown in Fig. 1, a Hadoop cluster energy-saving system includes a data collection module at the bottom layer, a load prediction module and an energy consumption model module at the middle layer, and a job scheduling module at the upper layer;

a data collection module configured to obtain cluster node data;

the cluster performance monitoring method is specifically realized as follows:

the method specifically comprises the following steps: (1) visualizing the performance index; dynamically displaying real-time data including CPU utilization rate, memory allocation condition, running task of the node and resource allocated to the task, which are collected by a server end, by constructing a visual window; (2) collecting monitoring logs; writing the CPU utilization rate, the memory allocation condition and the occupation condition of each task resource on each node collected by the server end into a cluster log library; (3) monitoring frequency control; the frequency for collecting data at the server end is set, namely the data is collected at intervals;

(1) predicting the trend of key indexes of the host within set time;

(2) calculating a cluster energy consumption value;

The resource utilization conditions of the cluster host in a given time period comprise the trend of CPU utilization rate, the trend of memory utilization rate and the load condition of the node in the future time period, and the prediction result provides a reference decision for the scheduling of the uppermost layer.
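The collection cycle of the data collection module, in which the server requests index data from each agent at intervals and writes the replies into a database, can be sketched as below. This sketch does not use the Zabbix API: each agent is modeled as a plain Python callable, the database is an in-memory SQLite table, and the names `collect_once`, `make_db` and the `metrics` table layout are hypothetical.

```python
import sqlite3
from typing import Callable, Dict

# Hypothetical agent interface: calling it returns the node's current metrics.
Agent = Callable[[], Dict[str, float]]


def make_db() -> sqlite3.Connection:
    """Create an in-memory database standing in for the server's database."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE metrics (ts REAL, node TEXT, cpu REAL, mem REAL)")
    return db


def collect_once(agents: Dict[str, Agent], db: sqlite3.Connection,
                 ts: float) -> int:
    """Poll every agent once and store the replies; return the rows written."""
    rows = []
    for node, agent in agents.items():
        reply = agent()  # the server requests, the agent returns the data
        rows.append((ts, node, reply["cpu"], reply["mem"]))
    db.executemany("INSERT INTO metrics VALUES (?, ?, ?, ?)", rows)
    db.commit()
    return len(rows)


db = make_db()
agents = {
    "node1": lambda: {"cpu": 0.42, "mem": 0.63},  # made-up readings
    "node2": lambda: {"cpu": 0.10, "mem": 0.25},
}
for tick in range(3):
    # One collection per interval; 30 s is an assumed interval length.
    collect_once(agents, db, ts=float(tick * 30))
```

In a large cluster the same loop would run on each proxy for its share of the nodes, which is how the proxy end relieves the server end.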

2. Energy saving scheme analysis

The layered energy-saving system scheme is mainly characterized in that:

(1) module low coupling

(2) The accuracy of the state prediction of the host is higher

The model training module of the middle layer divides the original data into a plurality of intervals. Within each interval, prediction starts from actual data; each predicted value is then appended to the known data as history and used to predict the next value, so the overall prediction is a rolling forward prediction. Because the actual data set is used as input again when the next interval is reached, which is equivalent to a data correction, the prediction follows the correct macroscopic trend overall, but the disadvantage is that some details are lost.
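Before a sequence model such as the LSTM can be trained on the collected data, the utilization series has to be framed as fixed-length input windows with the next value as the target. A minimal sketch of that framing is given below; `make_windows` and `seq_len` are hypothetical names, and the sequence length of 3 is only an example, since the text states that a suitable length is found by experiment.

```python
from typing import List, Sequence, Tuple


def make_windows(series: Sequence[float],
                 seq_len: int) -> List[Tuple[List[float], float]]:
    """Turn a utilization series into (input window, next value) pairs."""
    if seq_len < 1:
        raise ValueError("seq_len must be at least 1")
    pairs: List[Tuple[List[float], float]] = []
    for start in range(len(series) - seq_len):
        window = list(series[start:start + seq_len])
        target = series[start + seq_len]
        pairs.append((window, target))
    return pairs


# Made-up CPU utilization samples from one node.
cpu_history = [0.20, 0.25, 0.30, 0.45, 0.40]
samples = make_windows(cpu_history, seq_len=3)
```

A longer sequence length gives the model more context per sample but yields fewer training pairs from the same history.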

3. Energy consumption model

3.1 selecting energy consumption model index

Research shows that the energy consumption of a Hadoop cluster is mainly determined by the CPU, the memory, and network inflow and outflow. The CPU and the memory account for the main part of the energy consumption of a node, while network energy consumption is mainly generated by switching devices and is closely related to hardware such as switches. Other indexes also affect power consumption, such as disk I/O and the server fan operating mode, but they are not considered here because the application is primarily concerned with resource allocation and data storage.

In summary, the energy consumption modeling is carried out based on two index data of the CPU and the memory, and energy consumption parts of other systems, such as a disk, network inflow and outflow, and other conventional system index energy consumption are regarded as basic constants.

In an actual environment, many factors must be considered when establishing an energy consumption model based on the CPU and the memory, including host states such as shutdown, hibernation and idle, and the type of instruction set, since complex and reduced instruction sets may involve different numbers of computing units. However, modeling all of these factors is costly. Research shows that the load of the cluster is positively correlated with the CPU utilization rate and the memory utilization rate of the nodes, so the node power can be calculated by formula (1):

P = C₀ + C_α*U_cpu + C_β*U_mem  (0 ≤ U_cpu ≤ 1, 0 ≤ U_mem ≤ 1)  (1)

In the above formula, C₀ is a constant representing the base power that is independent of CPU utilization and memory usage, C_α is the coefficient of influence of CPU utilization on energy consumption, and C_β is the coefficient of influence of memory utilization on energy consumption. C₀ and C_β are coefficient values of the linear regression obtained through a large amount of model training, and the coefficient values obtained for different servers are different.
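Formula (1) translates directly into a small function. The sketch below is illustrative only; the function name `node_power` and the coefficient values in the example (100, 20, 10) are made up and are not the measured values of the application.

```python
def node_power(u_cpu: float, u_mem: float,
               c0: float, c_alpha: float, c_beta: float) -> float:
    """Node power by formula (1): P = C0 + C_alpha*U_cpu + C_beta*U_mem."""
    if not (0.0 <= u_cpu <= 1.0 and 0.0 <= u_mem <= 1.0):
        raise ValueError("utilization rates must lie in [0, 1]")
    return c0 + c_alpha * u_cpu + c_beta * u_mem


# Hypothetical coefficients, chosen only to make the arithmetic easy to follow.
idle_power = node_power(0.0, 0.0, c0=100.0, c_alpha=20.0, c_beta=10.0)
half_load_power = node_power(0.5, 0.5, c0=100.0, c_alpha=20.0, c_beta=10.0)
```

An idle node still draws the base power C₀, which is why putting nodes to sleep saves more energy than merely lowering their utilization.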

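If the Hadoop cluster consists of n nodes, the total power can be represented by equation (2), where U_cpu,i and U_mem,i are the utilization rates of node i:

P_total = Σ_{i=1..n} (C₀ + C_α*U_cpu,i + C_β*U_mem,i)  (2)

From this, the total energy consumption of the cluster during the period from t₀ to t₁ can be obtained. It is calculated by integrating the power of the nodes over time and is represented by E, as shown in equation (3):

E = ∫_{t₀}^{t₁} P_total(t) dt  (3)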
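Since power is only available as samples collected at intervals, the integral in equation (3) has to be approximated numerically in practice. The sketch below sums the node powers at each sample time, as in equation (2), and applies the trapezoidal rule. The choice of the trapezoidal rule, the names `cluster_power` and `cluster_energy`, and the units (seconds and watts, giving joules) are assumptions.

```python
from typing import List, Sequence


def cluster_power(node_powers: Sequence[float]) -> float:
    """Total cluster power at one instant: the sum over all nodes."""
    return sum(node_powers)


def cluster_energy(timestamps: Sequence[float],
                   node_power_samples: Sequence[Sequence[float]]) -> float:
    """Approximate the energy between the first and last timestamp.

    `node_power_samples[k]` holds the power of every node at `timestamps[k]`.
    The integral of total power is approximated with the trapezoidal rule.
    """
    if len(timestamps) != len(node_power_samples):
        raise ValueError("one power sample is needed per timestamp")
    totals: List[float] = [cluster_power(s) for s in node_power_samples]
    energy = 0.0
    for k in range(1, len(timestamps)):
        dt = timestamps[k] - timestamps[k - 1]
        energy += 0.5 * (totals[k - 1] + totals[k]) * dt
    return energy


# Made-up samples: two nodes, three readings taken 10 s apart.
times = [0.0, 10.0, 20.0]
powers = [[50.0, 50.0], [60.0, 60.0], [60.0, 60.0]]
energy = cluster_energy(times, powers)
```

A shorter collection interval makes the approximation closer to the true integral at the cost of more monitoring traffic.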

3.2 energy consumption model coefficient calculation

In order to obtain a relatively accurate energy consumption calculation value, the coefficient in the energy consumption model needs to be tested and measured, the experimental environment selected by the application is built based on IBM x336, and a power analyzer is used for obtaining the following data:

(1) a CPU idle state power value and a full state power value.

(2) Power values at different memory utilization rates, with the CPU utilization rates kept close to one another.

(3) Power values when the CPU utilization rate and the memory utilization rate are close to one another.

Host resource utilization is controlled with stress-testing tools used for server evaluation: CoreMark for the CPU and the HPCC benchmark for memory. Specific values are given in the following table:

Table 1 Measured values of server power

The values of C₀, C_α and C_β in the actual cluster environment of the application are calculated from the data in Table 1. With the memory utilization rates held close, the CPU coefficient is calculated as:

C_α = 100*(P₂-P₁)/(CPU₂-CPU₁) = 16.24

The memory coefficient is calculated in the same way: C_β = 7.46

Substituting the calculated C_β = 7.46 and C_α = 16.24 into P₄ = C₀ + C_α*U_cpu + C_β*U_mem gives: C₀ = 102.16.
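The coefficient derivation can be sketched as two small functions. The measurement values used in the example are hypothetical and are not the values of Table 1; the factor 100 is taken from the formula for C_α and suggests that the measured utilization is expressed in percent, which is an interpretation rather than something the text states. `utilization_coefficient` and `base_power` are hypothetical names.

```python
def utilization_coefficient(p1: float, util1_pct: float,
                            p2: float, util2_pct: float) -> float:
    """Slope of power against utilization from two measurements.

    Utilization is given in percent, so the slope is scaled by 100 to obtain
    a coefficient for utilization expressed as a fraction in [0, 1].
    The other resource is assumed to be held at a close utilization level.
    """
    if util1_pct == util2_pct:
        raise ValueError("the two measurements need different utilization")
    return 100.0 * (p2 - p1) / (util2_pct - util1_pct)


def base_power(p: float, u_cpu: float, u_mem: float,
               c_alpha: float, c_beta: float) -> float:
    """Solve P = C0 + C_alpha*U_cpu + C_beta*U_mem for C0."""
    return p - c_alpha * u_cpu - c_beta * u_mem


# Hypothetical readings: 110 W at 20 % CPU and 118 W at 60 % CPU.
c_alpha = utilization_coefficient(110.0, 20.0, 118.0, 60.0)
# Hypothetical reading: 120 W at 50 % CPU and 50 % memory, with C_beta = 10.
c0 = base_power(120.0, u_cpu=0.5, u_mem=0.5, c_alpha=c_alpha, c_beta=10.0)
```

Using only two points per coefficient makes the result sensitive to measurement noise; fitting a regression over more readings would be the more robust variant.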

From the above calculations, the power calculation formula can be expressed as:

P＝n*102.16+16.24*(U_CPU1+U_CPU2+...+U_CPUn)+7.46*(U_mem1+U_mem2+...+U_memn)

(0≤U_CPUi≤1,0≤U_memi≤1)

In summary, the energy-saving system design scheme is introduced first; it mainly comprises resource data collection at the bottom layer, load prediction and an energy consumption calculation model at the middle layer, and job scheduling at the upper layer, and the key technologies and strategies used by each layer are described in detail. The energy consumption calculation model is then established based on CPU and memory utilization, and the coefficient values C_α, C_β and C₀ in the model are calculated with benchmarks in the specific experimental environment of the application.

It is to be understood that the above description is not intended to limit the present invention, and the present invention is not limited to the above examples, and those skilled in the art may make modifications, alterations, additions or substitutions within the spirit and scope of the present invention.

Claims

1. A Hadoop cluster energy-saving system, characterized in that: the system comprises a data collection module at the bottom layer, a load prediction module and an energy consumption model module at the middle layer, and a job scheduling module at the upper layer;

a data collection module configured to obtain cluster node data;

the cluster performance monitoring method is specifically realized as follows:

the method specifically comprises the following steps: (1) visualizing the performance index; dynamically displaying real-time data including CPU utilization rate, memory allocation condition, running task of the node and resource allocated to the task, which are collected by a server end, by constructing a visual window; (2) collecting monitoring logs; writing the CPU utilization rate, the memory allocation condition and the occupation condition of each task resource on each node collected by the server end into a cluster log library; (3) monitoring frequency control; setting the frequency at which the server end collects data, namely collecting data once per set interval;

(1) predicting the trend of key indexes of the host within set time;

(2) calculating a cluster energy consumption value;

the job scheduling module adopts a scheduling algorithm based on host state prediction, the algorithm needs to obtain job information input by a user in advance, the job information comprises CPU intensive type or memory intensive type, and then a node capable of meeting the energy consumption requirement is selected from a cluster for processing; the job scheduler allocates a node for completing the job according to the job information of the user and the predicted node load condition;

2. The Hadoop cluster energy-saving system of claim 1, characterized in that: the resource usage of the cluster host in a given time period comprises the trend of CPU utilization rate, the trend of memory utilization rate and the load condition of the node in the future time period, and the prediction result provides a reference decision for the scheduling of the uppermost layer.