CN102937918A - Data block balancing method in operation process of HDFS (Hadoop Distributed File System) - Google Patents

Data block balancing method in operation process of HDFS (Hadoop Distributed File System)

Info

Publication number
CN102937918A
CN102937918A (application CN201210393176A)
Authority
CN
China
Prior art keywords
node
task
data block
request
local
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2012103931769A
Other languages
Chinese (zh)
Other versions
CN102937918B (en)
Inventor
曹海军
伍卫国
董小社
樊源泉
魏伟
朱霍
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xian Jiaotong University
Original Assignee
Xian Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xian Jiaotong University
Priority to CN201210393176.9A
Publication of CN102937918A
Application granted
Publication of CN102937918B
Expired - Fee Related
Anticipated expiration

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a data block balancing method applied while an HDFS (Hadoop Distributed File System) is running. The method comprises the following steps: first, the local task list of each node is preprocessed and divided into fully local tasks and non-fully local tasks, providing the basis for deciding when HDFS data block balancing should start; second, the processing rate of each node is estimated and its task requests are predicted; third, on this basis the task assignment of each node is analysed; fourth, suitable node pairs are selected and data blocks are moved between them so that the block distribution matches the predicted sequence of node task requests; finally, the data blocks are balanced. By predicting node task requests in advance, the method identifies non-local map task executions that may occur and moves suitable data blocks between the corresponding nodes, so that a node receives a local map task assignment when it issues its actual task request, thereby improving the completion efficiency of the Map phase.

Description

Data block balancing method in the operation process of an HDFS
Technical field
The invention belongs to the field of computer technology and relates to a data block balancing method, in particular to a method for balancing data blocks while an HDFS (Hadoop Distributed File System) is running in a cloud computing environment.
Background technology
Hadoop is a highly reliable and highly scalable distributed storage and parallel computing platform developed by the Apache open-source community. It was originally developed as the base platform of the open-source search engine project Nutch, was later split out of the Nutch project, and has become one of the typical open-source cloud computing platforms. The Hadoop core implements a block-based distributed file system (Hadoop Distributed File System, HDFS) and the MapReduce computation model for distributed computing. HDFS provides a storage system composed of numerous nodes for the Hadoop cluster: when a large data file is stored, it is cut into data blocks of equal size (the last block being the exception) that are distributed over all nodes of the cluster. To guarantee reliability, HDFS creates several replicas of each data block, as configured, and places them on different cluster nodes. HDFS provides the data storage service for the upper-layer MapReduce computing engine. Hadoop MapReduce splits an application into many small tasks executed in parallel, each of which processes a data block stored locally on its compute node.
The HDFS file system stores data sets in a distributed manner using a block mechanism and improves system reliability through a block replication strategy: each data block has several replicas in the system at the same time, distributed over several nodes in several racks, so that the failure of a single node does not cause the loss of a data block. In addition, this distributed redundancy scheme allows concurrent reads of a file, which makes HDFS well suited to the "write once, read many" data processing pattern. To implement this replication strategy, the HDFS file system must write all replicas at the same time when data are written.
When writing a data stream, the HDFS file system first obtains several nodes from the NameNode to form a pipeline. When the data stream reaches the first node in the pipeline, that node stores the data and forwards it to the second node in the pipeline; likewise, the second node stores the data and forwards it to the third node, and so on, until all replicas have been written.
When placing a data block and its replicas, the HDFS file system considers the following points:
1) when the node submitting the data is itself a storage node of the HDFS file system, one replica of the data block is placed on that node;
2) the replicas of a data block must be distributed over several racks, so that the failure of a single rack does not make all of the data unavailable;
3) a replica of the data block must also be placed on another node in the same rack as the submitting data node, which reduces inter-rack communication and I/O overhead as far as possible;
4) provided the preceding conditions are met, the storage utilization of the nodes is also taken into account, keeping the storage utilization of each node as balanced as possible.
The Hadoop Map phase is the first phase of the execution of a MapReduce job. It mainly converts the external input data into intermediate data in <Key, Value> form, which is provided as input to the subsequent Reduce phase. In a distributed parallel processing environment, the Map phase uses the distributed file system HDFS as its input data source and, following the guiding principle that "moving computation is cheaper than moving data", the Map processing procedure specified by the user at job submission is dispatched to the nodes that store the HDFS data blocks. When the input data required by the processing procedure assigned to a node happen to be stored on that node, the procedure is said to satisfy data locality.
Hadoop MapReduce avoids processing several replicas of the same data block repeatedly through its node task request assignment mechanism. However, analysis of the Map phase execution shows that the locality of the input data of a Map task also has a large impact on its execution speed. When a Map task runs on the node that stores its input data, the network transmission overhead for the data block is saved and the execution speed of the Map task is improved. In the existing Hadoop architecture, the placement of HDFS data block replicas directly affects, through the Hadoop task scheduler, the locality of Map task input data.
Therefore, although the existing HDFS data block placement strategy keeps the number of data blocks on each node roughly balanced, the unreasonable placement of some block replicas means that after one node "steals" the local Map tasks of other nodes, those other nodes in turn need to "steal" tasks because their own local Map tasks have already been assigned. This "task allocation offset" phenomenon further increases the amount of non-local data transferred during the Map phase, puts enormous transmission pressure on the whole network and affects the operating efficiency of every phase. In addition, even when the number of data blocks per node is balanced, differences in task processing speed between nodes can also lead to a large amount of non-local task processing.
Summary of the invention
The object of the invention is to solve the problem of low data locality of map tasks in the Map phase caused by the uneven distribution of HDFS data blocks, and to provide a data block balancing method for a running HDFS. The method proposes an HDFS balancing strategy based on moving data blocks at run time: by predicting node task requests, it determines in advance the non-local map task executions that may occur and moves suitable data blocks between the corresponding nodes, so that a node receives a local map task assignment when it issues its actual task request, thereby improving the completion efficiency of the Map phase.
The objective of the invention is achieved through the following technical solution:
The data block balancing method for a running HDFS comprises the following steps:
1) preprocessing of the node-local task lists
1.1 distinguishing fully local tasks from non-fully local tasks: because each HDFS data block has several replicas, the same task can appear in the local Map task lists of different nodes, so the fact that n map tasks remain in the local task list of a node does not mean that the node can be assigned n local tasks to execute;
1.2 preprocessing of the node-local task lists: as each node issues its task requests in turn, the currently executable tasks are taken from the node's local task list and added to the node's fully local task list, while the tasks in the local task list that cannot be assigned are added to the non-fully local task list;
2) statistics of node runtime information
This is realized by designing a NodeEvaluateInfo class, which records for each node the total number sum of data blocks processed, the total time cost spent processing those blocks, and the execution progress tip of the running task; from this information the average block processing time cost/sum of the node and the remaining time (1-tip)/(cost/sum) of the node's current task can be computed;
3) node speed assessment and task request sequence prediction
3.1 node speed assessment: using the statistics of step 2), the data processing rate of each node is represented by COST_i/NUM_i, i.e. the average time the node takes to process a single task, where NUM_i is the number of local map tasks completed by node i at a given moment and COST_i is the total time spent processing those local tasks;
3.2 system task request sequence prediction: the system task request sequence is the sequence, from the current time until the job finishes, of the moments at which each node applies to the master node for task execution; at time T_0 the progress of the task being processed by node i is P_i, and the average time the node takes to process a single data block, obtained from the preceding speed assessment formula, is T_i; the time point t_ik of the k-th task request of this node is then
t_ik = T_0 + (1-P_i)×T_i + (k-1)×T_i,   k ≥ 1;
where k counts the task requests of this node starting from the current time; after the task request sequence of each node has been obtained, the system task request sequence is determined as follows: let m be the number of outstanding tasks in the system and n the number of nodes in the system; for each node i, take the time points of its next m task requests counted from the current time, denoted {t_i1, t_i2, ..., t_im}; the n nodes thus yield n×m time points {t_11, t_12, ..., t_1m, t_21, t_22, ..., t_2m, ..., t_n1, t_n2, ..., t_nm}; sorting all time points in ascending order and taking the first m gives the request sequence R_m of the m tasks remaining in the system from the current time; R_m(j) = t_ik means that the j-th task request in the system will be issued by node i at time t_ik and that this request is the k-th request of node i;
4) node task assignment analysis and realization: under the node request sequence predicted in step 3), the task assignment of each node is determined in advance;
5) selection of data block movement node pairs: the node issuing a request is obtained from the task request sequence, and a task is then obtained from the local task list of that node; if no task is found, the node is identified as a node to be balanced and added to the list of nodes to be balanced; the first step of the node pair selection process is to traverse the allocate array and build a mapping table Map<Node, List<Task>> recording all unassigned tasks on all data block source nodes;
6) movement of data blocks between nodes
The actual data block movement can be carried out only after the node to be balanced and the data block source node have been determined; because data block movement is considered independently of node task execution, several data blocks may need to be moved; to improve efficiency and simplify the implementation, the whole data block movement is realized with the Java thread pool technique.
Further, in the above step 4), the response process of the Hadoop scheduler under the currently predicted system task request sequence R_m is simulated; according to the request time of each node and the current task assignment state of the system, the task assignment response to each request is determined and it is judged whether the assignment satisfies task locality; the task assignment record is realized by an AllocatedRecord class, which records for each task the time of assignment, the number of the node it is assigned to, whether it was assigned, and whether the data block corresponding to the task has been added to the exchange list; the node task request record stores the node issuing the request and the position of the request among all requests of that node counted from the current time; finally, according to the system task request sequence R_m determined in step 3), the j-th request R_m(j) = t_ik is issued by node i at time t_ik and is the k-th request of node i; the task request sequence R_m is traversed, and for the j-th task request occurring in the system the search for the first schedulable local Map task starts from the k-th task task_(i,k) of the local Map task list of node i; a local Map task is judged schedulable according to the following criteria:
(1) task_(i,k) is not empty;
(2) allocate[task_(i,k).id] == -1, i.e. the task has not been assigned by the scheduler to another node;
when task_(i,k+m) (m ≥ 0) satisfies task locality, the assignment record of the corresponding task is set, allocate[task_(i,k).id] = i, and the analysis of the j-th task request is finished; otherwise, when task_(i,k+m) (m ≥ 0) is not empty, task_(i,k+m) is added to the exchangeable task queue of this node and the next local Map task task_(i,k+m+1) (m ≥ 0) is examined; when task_(i,k+m) (m ≥ 0) is empty, a BalanceNode object for the node to be balanced is built from node i and its exchange task queue and recorded in the list of nodes to be balanced.
Further, the detailed process of the above step 5) is: for each unassigned task task, the set of nodes storing replicas of its data block is obtained, and the nodes of this set and the task are put into the mapping table in <Node, List<Task>> form, the task being appended to the tail of the List<Task> of a node already present; after all data block source nodes have been obtained, for each node to be balanced in the list of nodes to be balanced, the mapping table is traversed to find the first data block source node located in the same rack as the node to be balanced, and a data block movement request is constructed and submitted; two nodes are judged to be in the same rack if their node name prefixes are identical; when no data block source node in the same rack as the node to be balanced can be found, the first data block source node in the mapping table is selected.
The invention has the following beneficial effects:
The present invention addresses the differences between nodes in processing data blocks during the Map phase of a Hadoop job. By moving data blocks so that the block distribution better matches the performance of each node, it not only reduces the non-local Map task assignments of subsequent similar jobs running on the same data set, improves Map task locality and balances the Map-phase task execution across nodes, but also improves the task balance of the currently running job in the remainder of its Map phase. In the Hadoop Map phase the Map tasks execute completely independently of one another: when processing a local Map task, each Map task only reads data from the local disk and needs almost no network communication apart from reporting its own progress to the JobTracker node. Therefore, moving data blocks while nodes are processing local Map tasks puts little pressure on the network as a whole.
Description of drawings
Fig. 1 is a class diagram of the node task assignment analysis process;
Fig. 2 is a flowchart of the node task assignment analysis;
Fig. 3 illustrates the matching of nodes to be balanced with unassigned tasks;
Fig. 4 is a flowchart of the data block movement node pair matching;
Fig. 5 shows the thread pool framework for data block movement;
Fig. 6 is a class diagram of the data block movement thread pool;
Fig. 7 illustrates the movement of data blocks between nodes.
Embodiment
The present invention is described in detail below with reference to the accompanying drawings.
The specific implementation steps of the HDFS data block balancing strategy based on runtime data block movement are as follows:
Step 1: preprocessing of the node-local task lists. The local task list of each node is preprocessed and divided into a fully local part and a non-fully local part. The fully local task parts of all nodes together cover the input data set exactly once and have no tasks in common. Ideally, if the fully local tasks were assigned to the nodes simultaneously, the HDFS data block distribution would match the scheduler's assignment to the nodes, i.e. the HDFS data block placement would be balanced. Conflicting task assignments can therefore be determined by predicting the future task requests of the nodes, and from them the non-local task assignments that may occur can be judged, which provides a reliable basis for balancing the HDFS data block placement. The preprocessing of the node-local task lists simulates the task assignment process of the JobTracker. Assuming every node processes tasks at the same speed and issues task requests in turn, the preprocessing responds to each node's request: the currently executable tasks are taken from the node's local task list and added to the node's fully local task list, while the tasks in the local task list that cannot be assigned are added to the non-fully local task list. An executable local task is a task that has not already been executed on another node storing a replica of its block.
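The splitting can be illustrated with a minimal Java sketch. It assumes a simplified representation in which tasks are integer ids and each node's local list is just the list of task ids whose input blocks it stores; the round-robin simulation of equal-speed requests follows the description above, and all class and method names are illustrative assumptions, not the patent's actual implementation.

import java.util.*;

// Minimal sketch of the local task list preprocessing described above.
public class LocalListPreprocessor {

    /** Result: per node, a fully local list and a non-fully local list. */
    public static class SplitLists {
        public final Map<String, List<Integer>> fullyLocal = new HashMap<>();
        public final Map<String, List<Integer>> nonFullyLocal = new HashMap<>();
    }

    /**
     * localLists maps each node name to the ids of tasks whose input blocks it stores.
     * Nodes are assumed to request tasks in turn at equal speed, as in the text.
     */
    public static SplitLists split(Map<String, List<Integer>> localLists) {
        SplitLists result = new SplitLists();
        Set<Integer> assigned = new HashSet<>();          // tasks already handed out
        Map<String, Deque<Integer>> pending = new HashMap<>();
        for (Map.Entry<String, List<Integer>> e : localLists.entrySet()) {
            pending.put(e.getKey(), new ArrayDeque<>(e.getValue()));
            result.fullyLocal.put(e.getKey(), new ArrayList<>());
            result.nonFullyLocal.put(e.getKey(), new ArrayList<>());
        }
        boolean progress = true;
        while (progress) {                                // round-robin simulation
            progress = false;
            for (String node : localLists.keySet()) {
                Deque<Integer> queue = pending.get(node);
                // tasks already taken by another replica holder are not assignable here
                while (!queue.isEmpty() && assigned.contains(queue.peek())) {
                    result.nonFullyLocal.get(node).add(queue.poll());
                }
                if (!queue.isEmpty()) {                   // executable local task found
                    int task = queue.poll();
                    assigned.add(task);
                    result.fullyLocal.get(node).add(task);
                    progress = true;
                }
            }
        }
        return result;
    }
}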
Step 2: statistics of node runtime information. In the system design, the present invention uses a NodeEvaluateInfo class, which records for each node the total number sum of data blocks processed, the total time cost spent processing those blocks, and the execution progress tip of the running task; from this information the average block processing time cost/sum of the node and the remaining time (1-tip)/(cost/sum) of the node's current task can be computed.
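A minimal sketch of such a statistics holder is given below; the field and method names are assumptions, and the remaining-time estimate is written in the (1-P_i)×T_i form used by the request-time prediction of step 3.

// Minimal sketch of a NodeEvaluateInfo-style statistics holder (names are
// illustrative assumptions, not the patent's actual class).
public class NodeEvaluateInfo {
    private long sum;     // total number of data blocks the node has processed
    private long cost;    // total time (ms) spent processing those blocks
    private double tip;   // progress (0..1) of the task currently being executed

    public void recordFinishedBlock(long elapsedMillis) {
        sum++;
        cost += elapsedMillis;
    }

    public void updateProgress(double currentTip) {
        this.tip = currentTip;
    }

    /** Average per-block processing time cost/sum. */
    public double averageBlockTime() {
        return sum == 0 ? 0.0 : (double) cost / sum;
    }

    /**
     * Estimated remaining time of the current task, written here as
     * (1 - tip) x (cost/sum), matching the (1 - P_i) x T_i term used in the
     * request-time prediction formula of step 3.
     */
    public double remainingTimeOfCurrentTask() {
        return (1.0 - tip) * averageBlockTime();
    }
}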
Step 3: node speed assessment and task request sequence prediction. 1) Node speed assessment. The present invention proposes a node speed assessment scheme that uses the node statistics collected above: the data processing rate of each node is represented by COST_i/NUM_i, i.e. the average time the node takes to process a single task, where NUM_i is the number of local map tasks completed by node i at a given moment and COST_i is the total time spent processing those local tasks. 2) System task request sequence prediction. The system task request sequence is the sequence, from the current time until the job finishes, of the moments at which each node applies to the master node for task execution. In theory this sequence is only known exactly after the job has finished, but it can be predicted during job execution from the data processing speed of each node and the progress of the task it is currently processing. Suppose that at time T_0 the progress of the task being processed by node i is P_i, and that the average time the node takes to process a single data block, obtained from the speed assessment formula above, is T_i; the time point t_ik of the k-th task request of this node is then
t_ik = T_0 + (1-P_i)×T_i + (k-1)×T_i,   k ≥ 1
where k counts the task requests of this node starting from the current time.
After the task request sequence of each node has been obtained, the system task request sequence can be determined as follows.
Let m be the number of outstanding tasks in the system and n the number of nodes in the system. For each node i, take the time points of its next m task requests counted from the current time, denoted {t_i1, t_i2, ..., t_im}; the n nodes thus yield n×m time points {t_11, t_12, ..., t_1m, t_21, t_22, ..., t_2m, ..., t_n1, t_n2, ..., t_nm}. Sorting all time points in ascending order and taking the first m gives the request sequence R_m of the m tasks remaining in the system from the current time. R_m(j) = t_ik means that the j-th task request in the system will be issued by node i at time t_ik and that this request is the k-th request of node i.
A formal description of this process is: given integer sequences A = {a_1, a_2, ..., a_n} and B = {b_1, b_2, ..., b_n}, construct the integer set C = {c_i | c_i = a_i + k·b_i, k ≥ 0} and find the first m elements of C in ascending order.
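The prediction and merge can be sketched in Java as follows. The NodeState and Request classes are illustrative assumptions; the sketch simply evaluates t_ik = T_0 + (1-P_i)×T_i + (k-1)×T_i for k = 1..m on every node, sorts the n×m candidate time points in ascending order and keeps the first m, as described above.

import java.util.*;

// Sketch of the request-time prediction and merge described above.
public class RequestSequencePredictor {

    /** Per-node inputs to the prediction (names are assumptions). */
    public static class NodeState {
        final int nodeId;
        final double progress;   // P_i: progress of the task now running
        final double avgTime;    // T_i: average time per data block
        public NodeState(int nodeId, double progress, double avgTime) {
            this.nodeId = nodeId; this.progress = progress; this.avgTime = avgTime;
        }
    }

    /** One predicted request: issued by nodeId at time t, its k-th request. */
    public static class Request {
        public final int nodeId, k;
        public final double time;
        Request(int nodeId, int k, double time) {
            this.nodeId = nodeId; this.k = k; this.time = time;
        }
    }

    /** Returns R_m: the m earliest predicted requests from time t0 onwards. */
    public static List<Request> predict(double t0, List<NodeState> nodes, int m) {
        List<Request> candidates = new ArrayList<>();
        for (NodeState s : nodes) {
            for (int k = 1; k <= m; k++) {
                // t_ik = T_0 + (1 - P_i) * T_i + (k - 1) * T_i
                double t = t0 + (1.0 - s.progress) * s.avgTime + (k - 1) * s.avgTime;
                candidates.add(new Request(s.nodeId, k, t));
            }
        }
        candidates.sort(Comparator.comparingDouble(r -> r.time));   // ascending order
        return candidates.subList(0, Math.min(m, candidates.size()));
    }
}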
Step 4: node task assignment analysis.
1) Design of the node task assignment process. The node task assignment analysis essentially simulates the response process of the Hadoop scheduler under the currently predicted system task request sequence R_m. According to the request time of each node and the current task assignment state of the system, the task assignment response to each request is determined and it is judged whether the assignment satisfies task locality. When the task request of a node cannot be answered with a local Map task, that node is a node to be balanced. The classes involved in node task assignment are shown in Fig. 1. The node task request record NodeRequest: since the node task assignment analysis simulates the response of the Hadoop task scheduler under the currently predicted system task request sequence, NodeRequest describes one node task request in the simulation, recording the node that issues the request and the position of the request among all requests of that node counted from the current time.
2) Realization of the node task assignment process. The node task assignment analysis determines in advance the task assignment of each node under the predicted node request sequence; its detailed flow is shown in Fig. 2. As described above, the j-th request R_m(j) = t_ik of the system task request sequence R_m is issued by node i at time t_ik and is the k-th request of node i. The task request sequence R_m is traversed, and for the j-th task request occurring in the system the search for the first schedulable local Map task starts from the k-th task task_(i,k) of the local Map task list of node i. A local Map task is judged schedulable according to the following criteria:
(1) task_(i,k) is not empty;
(2) allocate[task_(i,k).id] == -1, i.e. the task has not been assigned by the scheduler to another node. When task_(i,k+m) (m ≥ 0) satisfies task locality, the assignment record of the corresponding task is set, allocate[task_(i,k).id] = i, and the analysis of the j-th task request is finished. Otherwise, when task_(i,k+m) (m ≥ 0) is not empty, task_(i,k+m) is added to the exchangeable task queue of this node and the next local Map task task_(i,k+m+1) (m ≥ 0) is examined; when task_(i,k+m) (m ≥ 0) is empty, a BalanceNode object for the node to be balanced is built from node i and its exchange task queue and recorded in the list of nodes to be balanced. After the analysis of all task requests has been completed, if the HDFS data block placement is unbalanced, the list of nodes to be balanced is not empty and some task assignment records in the allocate array are still -1. At this point the number of nodes to be balanced equals the number of unassigned entries in the allocate array, and an unassigned task is not a local Map task of any node to be balanced, since otherwise some node would have been assigned this task during the preceding node task assignment process.
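A simplified sketch of this simulated assignment loop is given below. It reuses the Request class from the sketch in step 3, represents the allocate array as an int array with -1 meaning "unassigned", and abstracts the locality test into a predicate; these simplifications are assumptions, not the patent's actual data structures.

import java.util.*;

// Sketch of the simulated assignment analysis over R_m.  allocate[taskId] == -1
// means "not yet assigned"; localLists.get(i) is node i's local Map task list.
public class AssignmentAnalysis {

    /** Abstracted locality test (an assumption in this sketch). */
    public interface LocalityTest {
        boolean satisfiesLocality(int nodeId, int taskId);
    }

    /** A node whose request could not be answered locally, with its exchangeable tasks. */
    public static class BalanceNode {
        public final int nodeId;
        public final List<Integer> exchangeableTasks;
        BalanceNode(int nodeId, List<Integer> tasks) {
            this.nodeId = nodeId; this.exchangeableTasks = tasks;
        }
    }

    public static List<BalanceNode> analyze(List<RequestSequencePredictor.Request> rm,
                                            Map<Integer, List<Integer>> localLists,
                                            int[] allocate,
                                            LocalityTest locality) {
        List<BalanceNode> toBalance = new ArrayList<>();
        for (RequestSequencePredictor.Request req : rm) {      // j-th request of the system
            List<Integer> local = localLists.getOrDefault(req.nodeId, Collections.emptyList());
            List<Integer> exchange = new ArrayList<>();
            boolean answered = false;
            // start from task_(i,k) and walk forward through the local list
            for (int idx = req.k - 1; idx < local.size(); idx++) {
                int taskId = local.get(idx);
                if (allocate[taskId] != -1) continue;           // criterion (2) not met: skip
                if (locality.satisfiesLocality(req.nodeId, taskId)) {
                    allocate[taskId] = req.nodeId;              // record the assignment
                    answered = true;
                    break;
                }
                exchange.add(taskId);                           // candidate for the exchange queue
            }
            if (!answered) {                                    // task_(i,k+m) became empty
                toBalance.add(new BalanceNode(req.nodeId, exchange));
            }
        }
        return toBalance;
    }
}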
Step 5: selection of data block movement node pairs. Data blocks can be moved between a node to be balanced and a storage node of the input block of an unassigned task only after the node task assignment analysis has been completed. The process of determining the data block source node and moving the data block is shown in Fig. 3. When matching nodes to be balanced with unassigned tasks, the following constraints apply: (1) the exchange node is selected among the storage nodes of the several replica data blocks of the unassigned task, and to reduce communication overhead a replica storage node in the same rack as the node to be balanced is preferred; (2) transferring data blocks from the same data block storage node to several nodes to be balanced should be avoided as far as possible.
To speed up the matching between nodes, the present invention uses a greedy algorithm: first the data block storage nodes of all unassigned tasks are parsed, then possible matching combinations are searched for between the set of nodes to be balanced and the set of storage nodes; as soon as a node to be balanced and an unassigned-task data block storage node that satisfy the above constraints are found, their matching relationship is fixed and no other, possibly better, matching results are searched for. The time complexity of this algorithm is O(N), where N is the number of nodes to be balanced. The concrete node matching process is shown in Fig. 4.
The first step of the node pair selection process is to traverse the allocate array and build a mapping table Map<Node, List<Task>> that records all unassigned tasks on all data block source nodes. In detail, for each unassigned task task, the set moveableNodes of nodes storing replicas of its data block is obtained, and the nodes in moveableNodes and the task are put into the mapping table nodeToTasks in <Node, List<Task>> form, the task being appended to the tail of the List<Task> of a node already present.
In the implementation, java.util.LinkedHashMap is chosen as the type of nodeToTasks. The basic characteristic of this class is that the key-value pairs in the map are iterated in access order: after a key-value pair has been accessed, it is moved to the tail of the linked list. Using LinkedHashMap effectively avoids moving data blocks from the same data source node to several nodes to be balanced.
After all data block source nodes have been obtained, for each node to be balanced in the list of nodes to be balanced, nodeToTasks is traversed to find the first data block source node located in the same rack as the node to be balanced, and a data block movement request is constructed and submitted. Two nodes are judged to be in the same rack if their node name prefixes are identical; for example, nodes /rack-A/node01 and /rack-A/node02 are in the same rack. When no data block source node in the same rack as the node to be balanced can be found, the first data block source node in nodeToTasks is selected.
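The greedy matching can be sketched as follows. nodeToTasks is assumed to have been built from the allocate array as described above and created as new LinkedHashMap<>(16, 0.75f, true) so that iteration follows access order; node names and the sameRack test based on name prefixes follow the /rack-A/node01 convention above, and everything else is an illustrative assumption.

import java.util.*;

// Sketch of the greedy source-node matching described above.
public class BlockSourceMatcher {

    /** Same-rack test from the text: identical node-name prefixes. */
    static boolean sameRack(String nodeA, String nodeB) {
        String rackA = nodeA.substring(0, nodeA.lastIndexOf('/'));
        String rackB = nodeB.substring(0, nodeB.lastIndexOf('/'));
        return rackA.equals(rackB);
    }

    /**
     * nodeToTasks: for every data block source node, the unassigned tasks whose
     * block replicas it holds.  It is assumed to be an access-ordered
     * LinkedHashMap (new LinkedHashMap<>(16, 0.75f, true)), so that a source
     * node just matched moves to the tail and is less likely to be picked again.
     */
    public static Map<String, String> match(List<String> nodesToBalance,
                                            LinkedHashMap<String, List<Integer>> nodeToTasks) {
        Map<String, String> pairs = new LinkedHashMap<>();      // balance node -> source node
        for (String balanceNode : nodesToBalance) {
            String chosen = null;
            for (String source : nodeToTasks.keySet()) {        // greedy: first acceptable match
                if (sameRack(balanceNode, source)) { chosen = source; break; }
            }
            if (chosen == null && !nodeToTasks.isEmpty()) {     // no same-rack source: take the first entry
                chosen = nodeToTasks.keySet().iterator().next();
            }
            if (chosen != null) {
                nodeToTasks.get(chosen);                        // access moves the entry to the tail
                pairs.put(balanceNode, chosen);
            }
        }
        return pairs;
    }
}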
Step 6: movement of data blocks between nodes. The actual data block movement can be carried out only after the node to be balanced and the data block source node have been determined. Because data block movement is considered independently of node task execution, several data blocks may need to be moved; to improve efficiency and simplify the implementation, the data block movement module is realized with the Java thread pool technique, as shown in Fig. 5. Each data block movement task waits for an idle thread in the pool to execute it; after the task finishes, the thread returns to the pool and accepts the next task. 1) Thread pool design: the classes involved in the thread pool are shown in Fig. 6. 2) Data block movement task: each task corresponds to one data block; according to the task assignment record of the node to be balanced, the corresponding data block can be resolved from the task number and the node with the largest index in the local Map task list is selected as the destination node. Each data block movement request is packaged into a MoveTask class containing the data block Block to be moved, the data block source node, and the destination node BalanceNode. The data block movement task Transfer implements the java.lang.Runnable interface and realizes the data block movement logic in its run() method; its flowchart is shown in Fig. 7.
For each data block movement request, Transfer parses the data block object, the data block source node and the destination node, sends the data block replacement instruction OP_REPLACE_BLOCK to the destination node, and the destination node sends the OP_COPY_BLOCK instruction to the source node to complete the transfer of the data block. After the data block has been copied successfully, the destination node notifies the NameNode to delete the replica of this data block on the source node.
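A minimal sketch of the thread-pool-driven movement is given below, using java.util.concurrent.Executors in place of a hand-written pool. MoveRequest and the body of Transfer.run() are placeholders for the OP_REPLACE_BLOCK / OP_COPY_BLOCK protocol exchange described above; all names are assumptions, not the patent's actual classes.

import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Sketch of the thread-pool-driven block movement.
public class BlockMover {

    /** One requested move: which block, from which source node, to which destination node. */
    public static class MoveRequest {
        final String blockId, sourceNode, destinationNode;
        public MoveRequest(String blockId, String sourceNode, String destinationNode) {
            this.blockId = blockId; this.sourceNode = sourceNode; this.destinationNode = destinationNode;
        }
    }

    /** One block move, run by a worker thread of the pool (cf. the Transfer class above). */
    static class Transfer implements Runnable {
        private final MoveRequest request;
        Transfer(MoveRequest request) { this.request = request; }

        @Override
        public void run() {
            // Placeholder for the actual DataNode protocol exchange
            // (OP_REPLACE_BLOCK to the destination, OP_COPY_BLOCK from the source).
            System.out.printf("moving block %s: %s -> %s%n",
                    request.blockId, request.sourceNode, request.destinationNode);
        }
    }

    public static void moveAll(List<MoveRequest> requests, int poolSize) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        for (MoveRequest r : requests) {
            pool.execute(new Transfer(r));     // idle worker threads pick up the move tasks
        }
        pool.shutdown();                        // stop accepting new tasks
        pool.awaitTermination(1, TimeUnit.HOURS);  // wait for all pending moves to finish
    }
}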

Claims (3)

1. A data block balancing method in the operation process of an HDFS, characterized in that it comprises the following steps:
1) preprocessing of the node-local task lists
1.1 distinguishing fully local tasks from non-fully local tasks: because each HDFS data block has several replicas, the same task can appear in the local Map task lists of different nodes, so the fact that n map tasks remain in the local task list of a node does not mean that the node can be assigned n local tasks to execute;
1.2 preprocessing of the node-local task lists: as each node issues its task requests in turn, the currently executable tasks are taken from the node's local task list and added to the node's fully local task list, while the tasks in the local task list that cannot be assigned are added to the non-fully local task list;
2) statistics of node runtime information
This is realized by designing a NodeEvaluateInfo class, which records for each node the total number sum of data blocks processed, the total time cost spent processing those blocks, and the execution progress tip of the running task; from this information the average block processing time cost/sum of the node and the remaining time (1-tip)/(cost/sum) of the node's current task can be computed;
3) node speed assessment and task request sequence prediction
3.1 node speed assessment: using the statistics of step 2), the data processing rate of each node is represented by COST_i/NUM_i, i.e. the average time the node takes to process a single task, where NUM_i is the number of local map tasks completed by node i at a given moment and COST_i is the total time spent processing those local tasks;
3.2 system task request sequence prediction: the system task request sequence is the sequence, from the current time until the job finishes, of the moments at which each node applies to the master node for task execution; at time T_0 the progress of the task being processed by node i is P_i, and the average time the node takes to process a single data block, obtained from the preceding speed assessment formula, is T_i; the time point t_ik of the k-th task request of this node is then t_ik = T_0 + (1-P_i)×T_i + (k-1)×T_i, k ≥ 1;
where k counts the task requests of this node starting from the current time; after the task request sequence of each node has been obtained, the system task request sequence is determined as follows: let m be the number of outstanding tasks in the system and n the number of nodes in the system; for each node i, take the time points of its next m task requests counted from the current time, denoted {t_i1, t_i2, ..., t_im}; the n nodes thus yield n×m time points {t_11, t_12, ..., t_1m, t_21, t_22, ..., t_2m, ..., t_n1, t_n2, ..., t_nm}; sorting all time points in ascending order and taking the first m gives the request sequence R_m of the m tasks remaining in the system from the current time; R_m(j) = t_ik means that the j-th task request in the system will be issued by node i at time t_ik and that this request is the k-th request of node i;
4) node task assignment analysis and realization: under the node request sequence predicted in step 3), the task assignment of each node is determined in advance;
5) selection of data block movement node pairs: the node issuing a request is obtained from the task request sequence, and a task is then obtained from the local task list of that node; if no task is found, the node is identified as a node to be balanced and added to the list of nodes to be balanced; the first step of the node pair selection process is to traverse the allocate array and build a mapping table Map<Node, List<Task>> recording all unassigned tasks on all data block source nodes;
6) movement of data blocks between nodes
The actual data block movement can be carried out only after the node to be balanced and the data block source node have been determined; because data block movement is considered independently of node task execution, several data blocks may need to be moved; to improve efficiency and simplify the implementation, the whole data block movement is realized with the Java thread pool technique.
2. The data block balancing method in the operation process of an HDFS according to claim 1, characterized in that in step 4), the response process of the Hadoop scheduler under the currently predicted system task request sequence R_m is simulated; according to the request time of each node and the current task assignment state of the system, the task assignment response to each request is determined and it is judged whether the assignment satisfies task locality; the task assignment record is realized by an AllocatedRecord class, which records for each task the time of assignment, the number of the node it is assigned to, whether it was assigned, and whether the data block corresponding to the task has been added to the exchange list; the node task request record stores the node issuing the request and the position of the request among all requests of that node counted from the current time; finally, according to the system task request sequence R_m determined in step 3), the j-th request R_m(j) = t_ik is issued by node i at time t_ik and is the k-th request of node i; the task request sequence R_m is traversed, and for the j-th task request occurring in the system the search for the first schedulable local Map task starts from the k-th task task_(i,k) of the local Map task list of node i; a local Map task is judged schedulable according to the following criteria:
A) task_(i,k) is not empty;
B) allocate[task_(i,k).id] == -1, i.e. the task has not been assigned by the scheduler to another node;
when task_(i,k+m) (m ≥ 0) satisfies task locality, the assignment record of the corresponding task is set, allocate[task_(i,k).id] = i, and the analysis of the j-th task request is finished; otherwise, when task_(i,k+m) (m ≥ 0) is not empty, task_(i,k+m) is added to the exchangeable task queue of this node and the next local Map task task_(i,k+m+1) (m ≥ 0) is examined; when task_(i,k+m) (m ≥ 0) is empty, a BalanceNode object for the node to be balanced is built from node i and its exchange task queue and recorded in the list of nodes to be balanced.
3. The data block balancing method in the operation process of an HDFS according to claim 1 or 2, characterized in that the detailed process of step 5) is: for each unassigned task task, the set of nodes storing replicas of its data block is obtained, and the nodes of this set and the task are put into the mapping table in <Node, List<Task>> form, the task being appended to the tail of the List<Task> of a node already present; after all data block source nodes have been obtained, for each node to be balanced in the list of nodes to be balanced, the mapping table is traversed to find the first data block source node located in the same rack as the node to be balanced, and a data block movement request is constructed and submitted; two nodes are judged to be in the same rack if their node name prefixes are identical; when no data block source node in the same rack as the node to be balanced can be found, the first data block source node in the mapping table is selected.
CN201210393176.9A 2012-10-16 2012-10-16 Data block balancing method in operation process of HDFS Expired - Fee Related CN102937918B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210393176.9A CN102937918B (en) 2012-10-16 2012-10-16 Data block balancing method in operation process of HDFS


Publications (2)

Publication Number Publication Date
CN102937918A true CN102937918A (en) 2013-02-20
CN102937918B CN102937918B (en) 2016-03-30

Family

ID=47696817

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210393176.9A Expired - Fee Related CN102937918B (en) 2012-10-16 2012-10-16 Data block balancing method in operation process of HDFS

Country Status (1)

Country Link
CN (1) CN102937918B (en)



Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20010013074A1 (en) * 1998-03-17 2001-08-09 Tim P. Marslano System and method for using door translation to perform inter-process communication
US20050234867A1 (en) * 2002-12-18 2005-10-20 Fujitsu Limited Method and apparatus for managing file, computer product, and file system
CN101446966A (en) * 2008-12-31 2009-06-03 中国建设银行股份有限公司 Data storage method and system
CN102142032A (en) * 2011-03-28 2011-08-03 中国人民解放军国防科学技术大学 Method and system for reading and writing data of distributed file system

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103530182A (en) * 2013-10-22 2014-01-22 海南大学 Working scheduling method and device
CN103713935A (en) * 2013-12-04 2014-04-09 中国科学院深圳先进技术研究院 Method and device for managing Hadoop cluster resources in online manner
CN103713935B (en) * 2013-12-04 2017-05-03 中国科学院深圳先进技术研究院 Method and device for managing Hadoop cluster resources in online manner
CN105981033B (en) * 2014-02-14 2019-05-07 慧与发展有限责任合伙企业 Placement Strategy is distributed into set of segments
CN105981033A (en) * 2014-02-14 2016-09-28 慧与发展有限责任合伙企业 Assign placement policy to segment set
CN104239520A (en) * 2014-09-17 2014-12-24 西安交通大学 Historical-information-based HDFS (hadoop distributed file system) data block placement strategy
CN104239520B (en) * 2014-09-17 2017-06-20 西安交通大学 A kind of HDFS data block Placement Strategies based on historical information
CN104317650B (en) * 2014-10-10 2018-05-01 北京工业大学 A kind of job scheduling method towards Map/Reduce type mass data processing platforms
CN104317650A (en) * 2014-10-10 2015-01-28 北京工业大学 Map/Reduce type mass data processing platform-orientated job scheduling method
CN105224612B (en) * 2015-09-14 2018-12-07 成都信息工程大学 MapReduce data Localization methodologies based on dynamically labeled preferred value
CN105426495A (en) * 2015-11-24 2016-03-23 中国农业银行股份有限公司 Data parallel reading method and apparatus
CN105426495B (en) * 2015-11-24 2019-03-12 中国农业银行股份有限公司 Data parallel read method and device
CN105578212B (en) * 2015-12-15 2019-02-19 南京邮电大学 A kind of point-to-point Streaming Media method of real-time in big data under stream calculation platform
CN105578212A (en) * 2015-12-15 2016-05-11 南京邮电大学 Point-to-point streaming media real-time monitoring method under big data stream computing platform
CN107872480A (en) * 2016-09-26 2018-04-03 中国电信股份有限公司 Big data cluster data balancing method and apparatus
CN108153759A (en) * 2016-12-05 2018-06-12 中国移动通信集团公司 A kind of data transmission method of distributed data base, middle tier server and system
CN108153759B (en) * 2016-12-05 2021-07-09 中国移动通信集团公司 Data transmission method of distributed database, intermediate layer server and system
CN107122242A (en) * 2017-03-28 2017-09-01 成都优易数据有限公司 A kind of balanced dicing method of big data of effective lifting distributed arithmetic performance
CN107122242B (en) * 2017-03-28 2020-09-11 成都优易数据有限公司 Big data balanced slicing method for effectively improving distributed operation performance

Also Published As

Publication number Publication date
CN102937918B (en) 2016-03-30

Similar Documents

Publication Publication Date Title
CN102937918B (en) Data block balancing method in operation process of HDFS
US11656911B2 (en) Systems, methods, and apparatuses for implementing a scheduler with preemptive termination of existing workloads to free resources for high priority items
Kaur et al. Container-as-a-service at the edge: Trade-off between energy efficiency and service availability at fog nano data centers
US10514951B2 (en) Systems, methods, and apparatuses for implementing a stateless, deterministic scheduler and work discovery system with interruption recovery
CN104331321B (en) Cloud computing task scheduling method based on tabu search and load balancing
CN101986274B (en) Resource allocation system and resource allocation method in private cloud environment
US20180321971A1 (en) Systems, methods, and apparatuses for implementing a scalable scheduler with heterogeneous resource allocation of large competing workloads types using qos
CN100428131C (en) Method for distributing resource in large scale storage system
CN103377091A (en) Method and system for efficient execution of jobs in a shared pool of resources
CN103970607A (en) Computing Optimized Virtual Machine Allocations Using Equivalence Combinations
CN110321223A (en) The data flow division methods and device of Coflow work compound stream scheduling perception
CN104050042A (en) Resource allocation method and resource allocation device for ETL (Extraction-Transformation-Loading) jobs
CN103297499A (en) Scheduling method and system based on cloud platform
CN109947532B (en) Big data task scheduling method in education cloud platform
CN107864211B (en) Cluster resource dispatching method and system
CN107038070A (en) The Parallel Task Scheduling method that reliability is perceived is performed under a kind of cloud environment
CN107678752A (en) A kind of task processing method and device towards isomeric group
CN103095788A (en) Cloud resource scheduling policy based on network topology
CN109408590A (en) Expansion method, device, equipment and the storage medium of distributed data base
Gandomi et al. HybSMRP: a hybrid scheduling algorithm in Hadoop MapReduce framework
CN105373432A (en) Cloud computing resource scheduling method based on virtual resource state prediction
Lu et al. A genetic algorithm-based job scheduling model for big data analytics
CN105005503B (en) Cloud computing load balancing method for scheduling task based on cellular automata
CN104239520B (en) A kind of HDFS data block Placement Strategies based on historical information
Ghazali et al. A classification of Hadoop job schedulers based on performance optimization approaches

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20160330

Termination date: 20201016

CF01 Termination of patent right due to non-payment of annual fee