CN104063501A - Replica balancing method based on HDFS

Info

Publication number: CN104063501A
Application number: CN201410321195.XA
Authority: CN
Inventors: 罗光春; 田玲; 陈爱国; 舒康
Original assignee: University of Electronic Science and Technology of China
Current assignee: University of Electronic Science and Technology of China
Priority date: 2014-07-07
Filing date: 2014-07-07
Publication date: 2014-09-24
Anticipated expiration: 2034-07-07
Also published as: CN104063501B

Abstract

The invention discloses a replica balancing method based on HDFS. An abstract Performance class is added to the cluster configuration, and performance data for every DataNode is collected through heartbeat messages. During data migration, DataNode matching must satisfy the node matching rules of the existing Balancer program and must also take the performance index data of the DataNodes into account: each DataNode is evaluated by the ratio of its performance score to the amount of data it stores, and the best-evaluated DataNode is matched with the worst-evaluated one. The amount of data stored on each DataNode thereby becomes proportional to its performance, which improves the load balancing capability of HDFS and the performance of the cluster. Differences in node configuration need not be considered when the cluster is built.

Description

Replica balancing method based on HDFS

Technical field

The present invention relates to the technical field of data processing and control, and in particular to a replica balancing method based on HDFS.

Background art

In recent years, with the arrival of the Web 2.0 era, which is characterized by user-generated content, applications such as blogs, SNS, P2P, IM, pictures and video have developed rapidly. Information services have become ever more closely tied to daily life, and the volume of data on the Internet is growing geometrically. The computing and storage capacity of a supercomputer could meet the computing and storage requirements of Internet data, but its cost is very high, so it is difficult to deploy widely. To break through the storage, computing and memory limits of a single machine at low cost, attention has turned to distributed systems. A typical distributed computing project uses idle computers, transfers data over the Internet or Ethernet, and lets each computer contribute its own computing power and storage capacity.

HDFS, short for Hadoop Distributed File System, is the distributed file system of Hadoop and provides a highly fault-tolerant, high-throughput solution for storing massive data. HDFS splits a file into data blocks, each holding 64 MB of data by default. HDFS also keeps 3 replicas of each data block by default, which improves the reliability and read performance of the data. A typical HDFS file is gigabytes in size, and HDFS also supports file counts in the millions.
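As a rough illustration of the defaults just described, the following sketch computes how many blocks a file occupies and how much raw storage its replicas consume. The function name block_layout is hypothetical; the 64 MB block size and the replica count of 3 are the defaults stated above, and the sketch assumes that a partially filled last block only occupies its actual length.

```python
BLOCK_SIZE = 64 * 1024 ** 2   # default block size stated above: 64 MB
REPLICATION = 3               # default number of replicas stated above


def block_layout(file_bytes):
    """Return (logical blocks, stored block replicas, raw bytes stored)."""
    # Integer ceiling division: a partial last block still counts as a block.
    blocks = (file_bytes + BLOCK_SIZE - 1) // BLOCK_SIZE
    # Assumption: a short last block is stored at its actual length,
    # so raw usage is simply the file size times the replica count.
    return blocks, blocks * REPLICATION, file_bytes * REPLICATION


blocks, replicas, raw_bytes = block_layout(1024 ** 3)  # a 1 GB file
print(blocks, replicas, raw_bytes // 1024 ** 2, "MB")
```

A 1 GB file therefore maps to 16 logical blocks and 48 stored block replicas spread over the DataNodes, which is the unit the Balancer later moves.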

The HDFS architecture is built on a set of dedicated nodes: data nodes (DataNode) and a metadata node (NameNode). The NameNode provides the metadata service inside HDFS: it manages the namespace of the file system and keeps the metadata of all files and directories in a file system tree. The DataNodes provide block storage for HDFS and hold the file data. All HDFS communication protocols are built on TCP/IP. A client connects to the NameNode through a configurable port and interacts with it via ClientProtocol, while a DataNode interacts with the NameNode via DataNodeProtocol. By design, a DataNode maintains contact with the NameNode by periodically sending it heartbeat messages, which carry the attributes of its data blocks, such as the file each block belongs to, the block ID and the modification time. HDFS is now widely used in large-scale online services and large storage systems, and has become the de facto mass-storage standard for major websites and other online service companies.

Although HDFS is an excellent distributed file system, it still has a number of shortcomings, and load balancing is one of its weak points. An HDFS cluster easily ends up with unbalanced disk utilization between machines, for example after new data nodes are added to the cluster. An unbalanced HDFS causes many problems: MapReduce (MR) programs cannot take good advantage of local computation, network bandwidth between machines is not used well, machine disks cannot be fully used, and so on. Hadoop includes a Balancer program; running it brings an HDFS cluster to a balanced state. In developing the Balancer program, its developers followed these principles:

1. During data redistribution, no data may be lost, the number of replicas of the data must not change, and the number of data blocks held in each rack must not change.

2. The system administrator can start or stop the data redistribution program with a single command.

3. Migrating data blocks must not consume too many system resources, such as network bandwidth.

4. Running the data redistribution program must not affect the task scheduling of the NameNode.

5. Data redistribution runs as an independent process, separate from the NameNode; it is usually executed on a separate Rebalance Server.

The Balancer program executes the following steps:

Step 1. The Rebalance Server obtains the disk usage of each DataNode from the NameNode.

Step 2. The Rebalance Server calculates which machines need to move data out and which machines can accept moved data, and obtains from the NameNode the distribution of the data that needs to be moved.

Step 3. The Rebalance Server calculates which data blocks on which machine can be moved to another machine.

Step 4. Each machine that needs to move data blocks copies them to the destination machine and deletes the corresponding block data from itself.

Step 5. The Rebalance Server obtains the result of this data movement and repeats the process until no more data can be moved or the HDFS cluster meets the balance criterion.

The existing Balancer program works very well in most situations. However, it considers only the disk utilization of each node and decides its migration tasks from disk utilization alone, so the amount of data finally stored on each node is proportional to its disk capacity. Consider the following case: (1) data is kept in 3 replicas; (2) HDFS consists of 2 racks; (3) the machines in the 2 racks have different disk configurations: each machine in rack1 has 1 TB of disk space, and each machine in rack2 has 10 TB; (4) 2 of the replicas of most data are currently stored in the first rack. In this case, after the Balancer program has run, the remaining disk space in rack1 is far smaller than in rack2, and the data in the HDFS cluster as a whole is still unbalanced. Moreover, in a real large-scale cluster it is difficult to guarantee that all nodes are configured identically. A new replica balancing method based on HDFS is therefore needed.
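The effect can be checked with simple arithmetic. The sketch below assumes, purely for illustration, that the existing Balancer has brought every machine to the same disk utilization of 50 %; the utilization figure and the function names are assumptions, while the 1 TB and 10 TB capacities come from the case above.

```python
def stored_tb(capacity_tb, utilization):
    """Data held by a machine at the given disk utilization."""
    return capacity_tb * utilization


def remaining_tb(capacity_tb, utilization):
    """Free space left on a machine at the given disk utilization."""
    return capacity_tb * (1.0 - utilization)


UTILIZATION = 0.5  # assumed common utilization after balancing

rack1_free = remaining_tb(1.0, UTILIZATION)    # a 1 TB machine in rack1
rack2_free = remaining_tb(10.0, UTILIZATION)   # a 10 TB machine in rack2
rack1_data = stored_tb(1.0, UTILIZATION)
rack2_data = stored_tb(10.0, UTILIZATION)

print("free space per machine (TB):", rack1_free, rack2_free)
print("stored data per machine (TB):", rack1_data, rack2_data)
```

With equal utilization, a rack1 machine keeps a tenth of the free space of a rack2 machine, and the stored data follows disk capacity rather than any measure of machine performance.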

Summary of the invention

The technical problem to be solved by the present invention is to provide a replica balancing method based on HDFS that makes the amount of data stored on each DataNode proportional to its performance.

The technical solution adopted by the present invention to solve this technical problem is:

A replica balancing method based on HDFS, comprising the following steps:

1) Cluster configuration:

A Performance class is designed to represent the performance evaluation indices of a DataNode; the Performance class provides a getPerformance method for obtaining the corresponding performance data;

Performance classes are defined for the individual performance indices of a DataNode; each performance class is a subclass that inherits from the Performance class;

The performance index data of the corresponding performance classes is added to the heartbeat message of the HDFS communication protocol;

2) Data collection:

Each DataNode collects the performance index data corresponding to each performance class through the getPerformance method; the DataNode then sends this performance index data to the NameNode in its periodic heartbeat messages, and the NameNode stores it;

3) Executing the Balancer program:

3.1 The Rebalance Server obtains the performance index data of each DataNode from the NameNode;

3.2 The performance score of each DataNode is calculated from the performance index data obtained;

3.3 The ratio α of the performance score of each DataNode to the amount of data it stores is calculated, together with the mean α_avg of the α values of all DataNodes;

3.4 The permitted performance error range is obtained from the threshold parameter of the Balancer command, and the target range of DataNode performance is determined from α_avg together with the threshold parameter; DataNodes whose α value is better than the target range, within the target range, or worse than the target range are placed in three queues, Q_over, Q_mid and Q_under, respectively;

3.5 Matching DataNodes, comprising the following steps:

3.5.1 If Q_under and Q_over are both non-empty, a DataNode in Q_under is matched with a DataNode in Q_over; if Q_under is empty and Q_over is non-empty, the worst-performing DataNode in Q_mid is matched with a DataNode in Q_over; if Q_under is non-empty and Q_over is empty, the best-performing DataNode in Q_mid is matched with a DataNode in Q_under;

3.5.2 If the two DataNodes chosen in 3.5.1 satisfy the node matching rules of the Balancer program, the DataNodes are matched and the method proceeds to step 3.6; otherwise step 3.5.1 is repeated;

3.6 Data block selection: a data block is selected from the DataNode that is to move data out; if the data block satisfies the data block selection rules of the Balancer program, the data migration task is executed; otherwise another data block is selected;

3.7 After the data migration task completes, the α values of the source DataNode and the destination DataNode are recalculated, and both are placed in the appropriate queues;

3.8 Steps 3.5, 3.6 and 3.7 are repeated until Q_over and Q_under are both empty.

Specifically, the performance classes defined in the cluster configuration step comprise a CpuPerformance class for obtaining the CPU speed of a DataNode, a MemoryPerformance class for obtaining its memory size, and a DiskPerformance class for obtaining its disk size.

Further, a weight is configured for each performance class in the cluster configuration step; in step 3.1, the Rebalance Server obtains from the NameNode both the performance index data of each DataNode and the weight of each performance class; in step 3.2, the performance score of each DataNode is calculated from the performance index data of each performance class and its weight. Specifically, in step 3.2, the TOPSIS algorithm is applied to the performance index data and weights obtained in order to calculate the optimal value, the worst value and the closeness (approach degree) of the performance of each DataNode, and the closeness is used as its performance score.

The beneficial effects of the invention are as follows. During data migration, DataNode matching must not only satisfy the node matching rules of the existing Balancer program but also take the performance index data of the DataNodes into account: each DataNode is evaluated by the ratio of its performance score to the amount of data it stores, and the best-evaluated DataNode is matched with the worst-evaluated one. The amount of data stored on each DataNode thus becomes proportional to its performance, which improves the load balancing capability of the HDFS distributed file system and raises cluster performance. Differences in node configuration need not be considered when the cluster is built.

Brief description of the drawings

Fig. 1 is a simplified class diagram of the cluster configuration modification of the present invention;

Fig. 2 is a flow chart of data collection in the present invention;

Fig. 3 is a flow chart of the performance score calculation of the present invention;

Fig. 4 is a flow chart of DataNode matching for data redistribution in the present invention.

Embodiment

The replica balancing method based on HDFS of the present invention comprises three parts: cluster configuration, data collection, and execution of the Balancer program. The invention is further described below with reference to the drawings and an embodiment.

1) Cluster configuration

As shown in Fig. 1, a Performance class is designed to represent the performance evaluation indices of a DataNode; the Performance class provides a getPerformance method for obtaining the corresponding performance data. Performance classes are defined for the individual performance indices of a DataNode, each being a subclass that inherits from the Performance class. Specifically, the performance classes comprise a CpuPerformance class for obtaining the CPU speed of a DataNode, a MemoryPerformance class for obtaining its memory size, and a DiskPerformance class for obtaining its disk size. The HDFS communication protocol is modified so that the heartbeat message carries the performance index data of the corresponding performance classes; the data structure transmitted by the modified protocol is shown in Table 1.

Table 1. Data structure transmitted by the heartbeat protocol after adding the Performance class

Type                    Name
DatanodeRegistration    reg
int                     xmitsInProgress
int                     xceiverCount
double[]                performances

The Performance class and the performance classes are enabled by modifying the configuration file of the HDFS distributed file system. Each performance class corresponds to a performance index to be evaluated. In use, the performance classes are not limited to CpuPerformance, MemoryPerformance and DiskPerformance: a user with other factors to consider can inherit from the Performance class and add the corresponding configuration.
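The class structure of Fig. 1 and the heartbeat fields of Table 1 can be sketched as follows. This is an illustrative Python model, not the HDFS implementation: the constructors take injected values instead of probing the operating system, reg is simplified to a string, and build_heartbeat is a hypothetical helper; only the class names, getPerformance and the four field names come from the text.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List


class Performance(ABC):
    """Abstract performance index (the Performance class of Fig. 1)."""

    @abstractmethod
    def get_performance(self) -> float:
        """Return the current value of this index (getPerformance in the text)."""


class CpuPerformance(Performance):
    def __init__(self, ghz: float):
        self._ghz = ghz  # assumption: injected value, not read from the OS

    def get_performance(self) -> float:
        return float(self._ghz)


class MemoryPerformance(Performance):
    def __init__(self, gigabytes: float):
        self._gigabytes = gigabytes

    def get_performance(self) -> float:
        return float(self._gigabytes)


class DiskPerformance(Performance):
    def __init__(self, gigabytes: float):
        self._gigabytes = gigabytes

    def get_performance(self) -> float:
        return float(self._gigabytes)


@dataclass
class Heartbeat:
    """The fields of Table 1; reg is simplified to a plain string here."""
    reg: str
    xmits_in_progress: int
    xceiver_count: int
    performances: List[float]


def build_heartbeat(reg, indices, xmits_in_progress=0, xceiver_count=0):
    """Collect one value per configured index, in configuration order."""
    values = [index.get_performance() for index in indices]
    return Heartbeat(reg, xmits_in_progress, xceiver_count, values)


heartbeat = build_heartbeat(
    "dn-1", [CpuPerformance(2.4), MemoryPerformance(16), DiskPerformance(1000)]
)
print(heartbeat.performances)
```

Adding a further index then only requires another subclass of Performance, which matches the extension mechanism described above.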

2) Data collection

Each DataNode collects the performance index data corresponding to each performance class through the getPerformance method. Then, under the modified HDFS communication protocol, the DataNode sends this performance index data to the NameNode in its periodic heartbeat messages, and the NameNode stores it. To suit the performance score algorithm described below, the parameters of the Performance class comprise classes, which names the performance classes, and weight, which gives the weight of each performance class; the weight item is configured in the cluster configuration step.

In this embodiment, for ease of use, each DataNode can be given its own configuration in the cluster configuration step, and a default configuration is also defined. As shown in Fig. 2, the data collection flow is as follows:

2.1 After the system starts, the configuration item dfs.DataNode.Performance.classes is read. If this item and its performance classes are found, the performance indices configured individually for this DataNode are used; otherwise the performance indices of the default configuration are used. The configuration item dfs.NameNode.Performance.weight is then read. If this item is found and correctly formatted, its weights are assigned to the performance indices in order; otherwise all performance indices are given equal weight by default.

2.2 Once the cluster is running, the DataNodes continuously collect the user-defined data and send heartbeat messages to the NameNode. The performance index data is stored in a matrix data structure whose entries are updated from the data in each heartbeat message received. The performance index data matrix is:

matrix = \begin{bmatrix} x_{11} & \cdots & x_{1n} \\ \vdots & \ddots & \vdots \\ x_{m1} & \cdots & x_{mn} \end{bmatrix} = \begin{bmatrix} DN_{1} \\ \vdots \\ DN_{m} \end{bmatrix}

where m is the number of DataNodes and n is the number of performance indices, so that row DN_{i} holds the n index values of the i-th DataNode.
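The weight fallback of 2.1 and the matrix update of 2.2 can be sketched as follows. The text does not give the format of dfs.NameNode.Performance.weight, so the comma-separated list parsed here is an assumption, as are the names parse_weights and PerformanceMatrix and the choice to key rows by a DataNode identifier.

```python
def parse_weights(raw, n):
    """Return n weights from a configured string, or equal weights."""
    equal = [1.0 / n] * n
    if raw is None:                      # configuration item not found
        return equal
    try:
        # Assumed format: comma-separated numbers, one per index, in order.
        weights = [float(token) for token in raw.split(",")]
    except ValueError:                   # not correctly formatted
        return equal
    if len(weights) != n or any(w < 0 for w in weights) or sum(weights) <= 0:
        return equal
    return weights


class PerformanceMatrix:
    """NameNode-side store: one row of index values per DataNode."""

    def __init__(self):
        self._rows = {}

    def on_heartbeat(self, datanode_id, performances):
        # A later heartbeat from the same DataNode overwrites its row.
        self._rows[datanode_id] = list(performances)

    def as_matrix(self):
        """Return (ids, rows) with rows ordered by DataNode identifier."""
        ids = sorted(self._rows)
        return ids, [self._rows[i] for i in ids]


store = PerformanceMatrix()
store.on_heartbeat("dn-2", [2.0, 8.0, 1000.0])
store.on_heartbeat("dn-1", [3.0, 32.0, 10000.0])
store.on_heartbeat("dn-2", [2.0, 8.0, 2000.0])   # updated disk figure
ids, rows = store.as_matrix()
weights = parse_weights("0.5, 0.3, 0.2", 3)
print(ids, rows, weights)
```

Falling back to equal weights for any malformed value keeps the score calculation defined even when the item is misconfigured, which is the default behaviour described in 2.1.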

3) Executing the Balancer program

The balance command uses the command format of existing HDFS, hadoop balancer [-threshold <threshold>], where the threshold parameter is the permitted error, 10% by default. The parameter exists because it is hard to guarantee that, after data migration, the ratio of stored data to quantified performance is exactly the same on every DataNode, so a certain error must be allowed. Before execution, the Balancer command is monitored and parsed by the NameNode.

Compared with the existing Balancer, the main difference in executing the Balancer program is the following: during data migration, DataNode matching must not only satisfy the node matching rules of the existing Balancer program but also take the performance index data of the DataNodes into account. Each DataNode is evaluated by the ratio of its performance score to the amount of data it stores, and the best-evaluated DataNode is matched with the worst-evaluated one, so that the amount of data stored on each DataNode becomes proportional to its performance.

The performance score of a DataNode can be calculated by any existing method as required; this embodiment uses the TOPSIS algorithm. TOPSIS, short for Technique for Order Preference by Similarity to an Ideal Solution, was first proposed by C. L. Hwang and K. Yoon in 1981 and is also known as the superior-inferior solution distance method. Its basic principle is to rank the evaluation objects by their distances to the optimal solution and to the worst solution, that is, by their closeness: an object that is nearest to the optimal solution and at the same time farthest from the worst solution is the best; otherwise it is not optimal. Every index value of the optimal solution takes the best value of the corresponding evaluation index, and every index value of the worst solution takes the worst value of the corresponding evaluation index.
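A minimal sketch of the TOPSIS principle just described, in Python with only the standard library. It assumes that every index is a benefit index (larger is better), standardizes each column with a z-score, and measures distance with the classic Euclidean norm; the function name topsis_scores and the sample figures are illustrative. The embodiment's own step 3.2.4 uses a logarithmic distance instead, so this shows the general technique rather than that exact formula.

```python
import math


def topsis_scores(matrix, weights):
    """Return one closeness value in [0, 1] per row of matrix."""
    m = len(matrix)
    columns = list(zip(*matrix))
    means = [sum(col) / m for col in columns]
    # Population standard deviation; a constant column falls back to 1.0
    # so that the division below is always defined.
    stds = [
        math.sqrt(sum((x - mu) ** 2 for x in col) / m) or 1.0
        for col, mu in zip(columns, means)
    ]
    # Standardize each value, then apply the weight of its index.
    weighted = [
        [w * (x - mu) / sd for x, mu, sd, w in zip(row, means, stds, weights)]
        for row in matrix
    ]
    # Assumption: benefit indices, so the best value per column is the maximum.
    best = [max(col) for col in zip(*weighted)]
    worst = [min(col) for col in zip(*weighted)]
    scores = []
    for row in weighted:
        d_best = math.dist(row, best)     # distance to the optimal solution
        d_worst = math.dist(row, worst)   # distance to the worst solution
        total = d_best + d_worst
        # If all rows are identical, no node is closer to either solution.
        scores.append(d_worst / total if total else 0.5)
    return scores


# Columns: CPU speed (GHz), memory (GB), disk (GB); values are made up.
nodes = [
    [3.0, 32.0, 10000.0],
    [2.0, 8.0, 1000.0],
    [2.5, 16.0, 4000.0],
]
scores = topsis_scores(nodes, [1 / 3, 1 / 3, 1 / 3])
print(scores)
```

A node that is best on every index coincides with the optimal solution and scores 1, a node that is worst on every index scores 0, and all other nodes fall in between.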

Specifically, in this embodiment, as shown in Figs. 3 and 4, the Balancer program is executed as follows:

3.1 The Rebalance Server first obtains from the NameNode the performance index data of each DataNode and the weight of each performance class;

3.2 From the performance index data and weights obtained, the TOPSIS algorithm calculates the optimal value, the worst value and the closeness (approach degree) of the performance of each DataNode, and the closeness is used as its performance score. The steps are as follows:

3.2.1 Calculate the z-score of each performance index of each DataNode:

D = \begin{bmatrix} r_{11} & \cdots & r_{1j} & \cdots & r_{1n} \\ \vdots & & \vdots & & \vdots \\ r_{i1} & \cdots & r_{ij} & \cdots & r_{in} \\ \vdots & & \vdots & & \vdots \\ r_{m1} & \cdots & r_{mj} & \cdots & r_{mn} \end{bmatrix}

where

r_{ij} = \frac{x_{ij} - \mu_{j}}{\sigma_{j}}

x_{ij} is the value of the j-th evaluation index of the i-th DataNode, \mu_{j} is the mean of column j, that is, of the j-th evaluation index over all DataNodes, and \sigma_{j} is the spread of column j, that is, of the j-th evaluation index over all DataNodes (the text gives it as the variance; a z-score conventionally divides by the standard deviation).

3.2.2 Apply the weights:

W = \begin{bmatrix} w_{11} & \cdots & w_{1j} & \cdots & w_{1n} \\ \vdots & & \vdots & & \vdots \\ w_{i1} & \cdots & w_{ij} & \cdots & w_{in} \\ \vdots & & \vdots & & \vdots \\ w_{m1} & \cdots & w_{mj} & \cdots & w_{mn} \end{bmatrix}

where w_{ij} = r_{ij} w_{j}, i = 1, 2, \ldots, m, j = 1, 2, \ldots, n,

and w_{j} is the weight of the j-th evaluation index.

3.2.3 Determine the optimal solution A^{+} and the worst solution A^{-} from the data of 3.2.2:

A^{+} = (v_{1}^{+}, v_{2}^{+}, \ldots, v_{n}^{+})

A^{-} = (v_{1}^{-}, v_{2}^{-}, \ldots, v_{n}^{-})

where v_{j}^{+} is the best value of the j-th evaluation index, corresponding to the best performance evaluation, and v_{j}^{-} is the worst value of the j-th evaluation index, corresponding to the worst performance evaluation.

3.2.4 Calculate the distance from each DataNode to the optimal solution and to the worst solution:

S_{i}^{+} = \sum_{j=1}^{n} \left( v_{j}^{+} \lg \frac{v_{j}^{+}}{w_{ij}} + (1 - v_{j}^{+}) \lg \frac{1 - v_{j}^{+}}{1 - w_{ij}} \right), \quad i = 1, 2, \ldots, m

S_{i}^{-} = \sum_{j=1}^{n} \left( v_{j}^{-} \lg \frac{v_{j}^{-}}{w_{ij}} + (1 - v_{j}^{-}) \lg \frac{1 - v_{j}^{-}}{1 - w_{ij}} \right), \quad i = 1, 2, \ldots, m

where S_{i}^{+} is the distance from the i-th DataNode to the optimal solution and S_{i}^{-} is the distance from the i-th DataNode to the worst solution.

3.2.5 Calculate the closeness C_{i} of each DataNode:

C_{i} = \frac{S_{i}^{-}}{S_{i}^{+} + S_{i}^{-}}, \quad i = 1, 2, \ldots, m

C_{i} = 0 indicates that the performance of the node is the worst, and C_{i} = 1 indicates that it is the best.

3.3 Calculate the ratio α of the performance score (that is, the closeness) of each DataNode to the amount of data it stores, α_{i} = C_{i} / U_{i}, and the mean α_{avg} of the α values of the DataNodes, using the formula:

α_{avg} = \frac{\sum_{i=1}^{m} C_{i}}{\sum_{i=1}^{m} U_{i}}

where U_{i} is the amount of data stored on the i-th DataNode, which the NameNode obtains from the DataNode through the heartbeat protocol;

3.4 The permitted performance error range is obtained from the threshold parameter of the Balancer command, and the target range of DataNode performance is determined from α_{avg} together with the threshold parameter. The target range is:

[α_{avg} \times (1 - threshold), α_{avg} \times (1 + threshold)]

DataNodes whose α value is better than the target range, within the target range, or worse than the target range are placed in three queues, Q_over, Q_mid and Q_under, respectively;

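3.5 Matching DataNodes, comprising the following steps:

3.5.1 If Q_under and Q_over are both non-empty, a DataNode in Q_under is matched with a DataNode in Q_over; if Q_under is empty and Q_over is non-empty, the worst-performing DataNode in Q_mid is matched with a DataNode in Q_over; if Q_under is non-empty and Q_over is empty, the best-performing DataNode in Q_mid is matched with a DataNode in Q_under;

3.5.2 If the two DataNodes chosen in 3.5.1 satisfy the node matching rules of the Balancer program, the DataNodes are matched and the method proceeds to step 3.6; otherwise step 3.5.1 is repeated. These node matching rules are the node matching rules of the existing Balancer program;

3.6 Data block selection: a data block is selected from the DataNode that is to move data out; if the data block satisfies the data block selection rules of the Balancer program, the data migration task is executed; otherwise another data block is selected. The data block selection rules are the same as the existing ones and comprise: A) the block to be migrated is not already in the migration task queue; B) the destination DataNode does not already hold the block to be migrated; C) migrating the block does not reduce the number of racks on which the block is stored;

3.7 After the data migration task completes, the α values of the source DataNode and the destination DataNode are recalculated, and both are placed in the appropriate queues;

3.8 Steps 3.5, 3.6 and 3.7 are repeated until Q_over and Q_under are both empty.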
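Steps 3.3 to 3.8 can be sketched as a small simulation. This is an illustration under several assumptions, not the patented implementation: a larger α is read as "better", so Q_over holds nodes that should receive data and Q_under nodes that should give data up; the extreme node of each queue is picked; the node matching and block selection rules of the existing Balancer are not modelled; and data is moved as a continuous amount rather than block by block. All scores and stored amounts must be positive. The name balance and the sample numbers are hypothetical.

```python
def balance(scores, stored, threshold=0.10, max_rounds=1000):
    """Simulate steps 3.3-3.8; return (final stored amounts, list of moves)."""
    stored = list(stored)
    avg = sum(scores) / sum(stored)                   # alpha_avg, step 3.3
    low, high = avg * (1 - threshold), avg * (1 + threshold)   # step 3.4
    moves = []
    for _ in range(max_rounds):                       # guard against cycling
        alpha = [c / u for c, u in zip(scores, stored)]
        nodes = range(len(scores))
        q_over = [i for i in nodes if alpha[i] > high]
        q_under = [i for i in nodes if alpha[i] < low]
        q_mid = [i for i in nodes if low <= alpha[i] <= high]
        if not q_over and not q_under:                # step 3.8: finished
            break
        # Step 3.5.1: fall back to Q_mid when one outer queue is empty.
        source = min(q_under or q_mid, key=alpha.__getitem__)
        target = max(q_over or q_mid, key=alpha.__getitem__)
        # Assumed stand-in for step 3.6: move just enough data to bring
        # one of the two nodes to the average ratio.
        excess = stored[source] - scores[source] / avg
        deficit = scores[target] / avg - stored[target]
        amount = min(excess, deficit)
        if amount <= 0:
            break
        stored[source] -= amount                      # step 3.7: the next
        stored[target] += amount                      # round re-queues both
        moves.append((source, target, amount))
    return stored, moves


scores = [0.9, 0.3, 0.6]        # made-up performance scores C_i
stored = [100.0, 500.0, 300.0]  # made-up stored amounts U_i
final, moves = balance(scores, stored)
print(final, moves)
```

In this example the low-scoring, heavily loaded node 1 hands data to the high-scoring, lightly loaded node 0, after which the stored amounts stand in roughly the same proportion as the scores.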

Claims

1. A replica balancing method based on HDFS, comprising the following steps:

1) Cluster configuration:

2) Data collection:

3) Executing the Balancer program:

3.5 Matching DataNodes, comprising the following steps:

3.8 Steps 3.5, 3.6 and 3.7 are repeated until Q_over and Q_under are both empty.

2. The replica balancing method based on HDFS according to claim 1, characterized in that: the performance classes defined in the cluster configuration step comprise a CpuPerformance class for obtaining the CPU speed of a DataNode, a MemoryPerformance class for obtaining its memory size, and a DiskPerformance class for obtaining its disk size.

3. The replica balancing method based on HDFS according to claim 1, characterized in that: a weight is configured for each performance class in the cluster configuration step; in step 3.1, the Rebalance Server first obtains from the NameNode the performance index data of each DataNode and the weight of each performance class; in step 3.2, the performance score of each DataNode is calculated from the performance index data of each performance class and its weight.

4. The replica balancing method based on HDFS according to claim 3, characterized in that: in said step 3.2, the TOPSIS algorithm is applied to the performance index data and weights obtained in order to calculate the optimal value, the worst value and the closeness of the performance of each DataNode, and the closeness is used as its performance score.