CN104063501A - Copy balancing method based on HDFS - Google Patents

Publication number
CN104063501A
Authority
CN
China
Legal status
Granted
Application number
CN201410321195.XA
Other languages
Chinese (zh)
Other versions
CN104063501B (en)
Inventor
罗光春
田玲
陈爱国
舒康
Current Assignee
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/11 File system administration, e.g. details of archiving or snapshots
    • G06F16/119 Details of migration of file systems
    • G06F16/18 File system types
    • G06F16/182 Distributed file systems

Abstract

The invention discloses a copy balancing method based on HDFS. An abstract Performance class is added to the configuration items of the cluster, and the performance data of every DataNode is collected through heartbeat messages. During data migration, DataNode matching must satisfy the node matching rules of the existing Balancer program and must also take the performance index data of the DataNodes into account: each DataNode is evaluated by the ratio of its performance score to its stored data volume, and the best-evaluated DataNode is matched with the worst-evaluated one. As a result, the amount of data stored on each DataNode becomes proportional to its performance, which improves the load balancing capability of HDFS and the performance of the cluster. When a cluster is built, differences in the configuration of its nodes do not need to be considered.

Description

Copy balancing method based on HDFS
Technical field
The present invention relates to the technical field of data processing and control, and in particular to a copy balancing method based on HDFS.
Background
In recent years, with the arrival of the Web 2.0 era, characterized by user-generated content, applications such as blogs, SNS, P2P, IM, pictures, and video have developed rapidly. Information services have become ever more closely tied to people's lives, and the amount of data on the Internet has grown geometrically. Supercomputers have the computing and storage capacity to meet the demands of Internet data, but their cost is very high, so they are difficult to deploy widely. To break through the storage, computing, and memory limits of a single machine at low cost, attention has turned to distributed systems. A typical distributed computing project uses idle computers, transfers data over the Internet or Ethernet, and lets every computer contribute its own computing and storage capacity.
HDFS, short for Hadoop Distributed File System, is the distributed file system of Hadoop and provides a highly fault-tolerant, high-throughput solution for storing massive data. HDFS splits a file into data blocks, each holding 64 MB of data by default, and by default keeps 3 replicas of each data block to improve data reliability and read performance. A typical HDFS file is gigabytes in size, and HDFS also supports file counts in the millions.
The HDFS architecture is built on a set of specific nodes: the data node (DataNode) and the metadata node (NameNode). The NameNode provides the metadata service inside HDFS; it manages the file system namespace and keeps the metadata of all files and directories in a file system tree. The DataNode provides storage blocks for HDFS and stores the file data. All HDFS communication protocols are built on top of TCP/IP. A client connects to the NameNode through a configurable port and interacts with it through ClientProtocol, while a DataNode interacts with the NameNode through DataNodeProtocol. By design, a DataNode stays in contact with the NameNode by periodically sending it heartbeat messages, which carry the attributes of its data blocks, such as which file a block belongs to, the block ID, and the modification time. HDFS is now widely used in large-scale online services and large storage systems and has become the de facto standard for mass storage at major websites and other online service companies.
Although HDFS is an excellent distributed file system, it still has many shortcomings, and load balancing is one of its weak points. Disk utilization easily becomes unbalanced between the machines of an HDFS cluster, for example when new data nodes are added. An unbalanced HDFS causes many problems: MR programs cannot take full advantage of local computation, network bandwidth between machines is used poorly, and disk space on some machines goes unused. Hadoop includes a Balancer program that can bring an HDFS cluster to a balanced state. In developing the Balancer program, its developers followed these principles:
1. While data is being redistributed, no data may be lost, the number of replicas must not change, and the number of data blocks held in each rack must not change.
2. A system administrator can start or stop the data redistribution program with a single command.
3. Moving data blocks must not consume too many system resources, such as network bandwidth.
4. Running the data redistribution program must not affect the task scheduling work of the NameNode.
5. Data redistribution runs as an independent process, separate from the NameNode, and usually runs on a separate Rebalance Server.
The Balancer program executes the following steps:
Step 1: The Rebalance Server obtains the disk usage of every DataNode from the NameNode.
Step 2: The Rebalance Server calculates which machines need to move data out and which machines can accept data, and obtains from the NameNode the distribution of the data that needs to be moved.
Step 3: The Rebalance Server calculates which data blocks on which machine can be moved to which other machine.
Step 4: Each machine that needs to move data blocks copies them to the destination machine and deletes them from its own disk.
Step 5: The Rebalance Server collects the result of this data move and repeats the process until no more data can be moved or the HDFS cluster meets the balance criterion.
The existing Balancer program works well in most situations. However, it considers only the disk utilization of each node and plans migration tasks solely from that, so in the end the amount of data stored on each node is proportional to its disk capacity. Consider the following case: (1) data is kept in 3 replicas; (2) HDFS consists of 2 racks; (3) the machines in the two racks have different disks, with 1 TB per machine in rack1 and 10 TB per machine in rack2; (4) 2 of the replicas of most data are currently stored in the first rack. In this case, after the Balancer program runs, the remaining disk space in rack1 is far smaller than in rack2, and the data in the HDFS cluster as a whole is still unbalanced. In real large-scale clusters it is hard to guarantee that all nodes have the same configuration, so a new copy balancing method based on HDFS is needed.
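The arithmetic behind this case can be made concrete with a small sketch. It assumes, purely for illustration, one machine per rack and 5.5 TB of stored data, and it ignores replica and rack placement rules; `balance_by_utilization` is a hypothetical helper, not part of Hadoop.

```python
def balance_by_utilization(capacities_tb, total_data_tb):
    """Spread data so that every node ends up with the same disk utilization."""
    # Equal utilization means stored data is proportional to disk capacity.
    utilization = total_data_tb / sum(capacities_tb)
    stored = [c * utilization for c in capacities_tb]
    free = [c - s for c, s in zip(capacities_tb, stored)]
    return stored, free


# One 1 TB machine (rack1) and one 10 TB machine (rack2), 5.5 TB of data.
stored, free = balance_by_utilization([1.0, 10.0], 5.5)
print(stored)  # data held by each machine, in TB
print(free)    # remaining space on each machine, in TB
```

Both machines end at the same utilization, yet the small machine keeps only a tenth of the free space of the large one, and nothing in the calculation reflects how fast either machine is.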
Summary of the invention
The technical problem to be solved by the present invention is to provide a copy balancing method based on HDFS that makes the amount of data stored on each DataNode proportional to its performance.
The technical solution adopted by the present invention to solve this problem is:
A copy balancing method based on HDFS, comprising the following steps:
1) Cluster configuration:
Design a Performance class that represents the performance evaluation indexes of a DataNode; the Performance class provides a getPerformance method for obtaining the corresponding performance data;
Define performance classes, one for each performance index of a DataNode; each performance class is a subclass that inherits from the Performance class;
Add the performance index data of the performance classes to the heartbeat message of the HDFS communication protocol;
2) Data collection:
Each DataNode collects the performance index data corresponding to each performance class through the getPerformance method. The DataNode then sends this performance index data to the NameNode in its periodic heartbeat messages, and the NameNode stores it;
3) Executing the Balancer program:
3.1 The Rebalance Server obtains the performance index data of every DataNode from the NameNode;
3.2 The performance score of each DataNode is calculated from the obtained performance index data;
3.3 The ratio α of each DataNode's performance score to its stored data volume is calculated, together with α_avg, the mean of the α values of all DataNodes;
3.4 The allowed performance error range is obtained from the threshold parameter of the Balancer command, and the target range of DataNode performance is determined from α_avg and the threshold parameter. DataNodes whose α value is better than the target range, within the target range, and worse than the target range are placed in three queues, Q_over, Q_mid, and Q_under, respectively;
3.5 Match DataNodes, comprising the following steps:
3.5.1 If Q_under and Q_over are both non-empty, match a DataNode in Q_under with a DataNode in Q_over; if Q_under is empty and Q_over is non-empty, match the worst-performing DataNode in Q_mid with a DataNode in Q_over; if Q_under is non-empty and Q_over is empty, match the best-performing DataNode in Q_mid with a DataNode in Q_under;
3.5.2 If the two DataNodes chosen in 3.5.1 satisfy the node matching rules of the Balancer program, match them and go to step 3.6; otherwise repeat step 3.5.1;
3.6 Data block selection: select a data block from the DataNode that data is to be moved out of. If the block satisfies the data block selection rules of the Balancer program, execute the data migration task; otherwise select another block;
3.7 After the data migration task completes, recalculate the α values of the source DataNode and the destination DataNode and put each into the appropriate queue;
3.8 Repeat 3.5, 3.6, and 3.7 until Q_over and Q_under are both empty.
Specifically, the performance classes defined in the cluster configuration step include a CpuPerformance class for obtaining the CPU speed of the DataNode, a MemoryPerformance class for obtaining its memory size, and a DiskPerformance class for obtaining its disk size.
Further, a weight is configured for each performance class in the cluster configuration step. In step 3.1, the Rebalance Server obtains from the NameNode the performance index data of every DataNode and the weight of each performance class. In step 3.2, the performance score of each DataNode is calculated from the performance index data and the corresponding weights. Specifically, in step 3.2 the TOPSIS algorithm is applied to the obtained performance index data and weights to calculate the optimal value, the worst value, and the closeness of each DataNode's performance, and the closeness is used as its performance score.
The beneficial effects of the invention are as follows. During data migration, DataNode matching must not only satisfy the node matching rules of the existing Balancer program but also take the performance index data of the DataNodes into account. Each DataNode is evaluated by the ratio of its performance score to its stored data volume, and matching takes place between the best-evaluated and the worst-evaluated DataNodes. The amount of data stored on each DataNode thus becomes proportional to its performance, which improves the load balancing capability of the HDFS distributed file system and the performance of the cluster. When a cluster is built, differences in the configuration of its nodes do not need to be considered.
Brief description of the drawings
Fig. 1 is a simplified class diagram of the cluster configuration changes of the present invention;
Fig. 2 is a flow chart of the data collection of the present invention;
Fig. 3 is a flow chart of the performance score calculation of the present invention;
Fig. 4 is a flow chart of the matching process for data redistribution of the present invention.
Embodiment
The copy balancing method based on HDFS of the present invention comprises three parts: cluster configuration, data collection, and execution of the Balancer program. The invention is further described below with reference to the drawings and an embodiment.
1) Cluster configuration
As shown in Fig. 1, a Performance class is designed to represent the performance evaluation indexes of a DataNode; it provides a getPerformance method for obtaining the corresponding performance data. Performance classes are then defined, one for each performance index of a DataNode, each a subclass that inherits from the Performance class. Specifically, they include a CpuPerformance class for obtaining the CPU speed of the DataNode, a MemoryPerformance class for obtaining its memory size, and a DiskPerformance class for obtaining its disk size. The HDFS communication protocol is modified so that the heartbeat message carries the performance index data of these performance classes; the data structure transmitted by the modified protocol is shown in Table 1.
Table 1. Data structure transmitted by the heartbeat protocol, including the Performance data
Type                    Field
DatanodeRegistration    reg
int                     xmitsInProgress
int                     xceiverCount
double[]                performances
The Performance class and the performance classes are enabled by modifying the configuration file of the HDFS distributed file system. Each performance class corresponds to a performance index that needs to be evaluated. In use, the performance classes are not limited to CpuPerformance, MemoryPerformance, and DiskPerformance; a user with other considerations can inherit from the Performance class and add the relevant configuration.
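A minimal Python sketch of this class hierarchy follows. HDFS itself is written in Java, so this only illustrates the structure; the snake_case method name `get_performance`, the use of the core count as a stand-in for CPU speed, the default path, and the `collect` helper are assumptions of the sketch rather than details given in the source.

```python
from __future__ import annotations

import os
import shutil
from abc import ABC, abstractmethod


class Performance(ABC):
    """Base class for one DataNode performance index."""

    @abstractmethod
    def get_performance(self) -> float:
        """Return the current value of this index."""


class CpuPerformance(Performance):
    def get_performance(self) -> float:
        # Assumption: the logical core count stands in for "CPU speed".
        return float(os.cpu_count() or 1)


class DiskPerformance(Performance):
    def __init__(self, path: str = ".") -> None:
        self.path = path

    def get_performance(self) -> float:
        # Total size of the volume that holds `path`, in GiB.
        return shutil.disk_usage(self.path).total / 2**30


def collect(indexes: list[Performance]) -> list[float]:
    """Build the `performances` array carried in the heartbeat (Table 1)."""
    return [index.get_performance() for index in indexes]


print(collect([CpuPerformance(), DiskPerformance()]))
```

A user-defined index would be one more subclass that overrides `get_performance`, which mirrors the extension point the text describes.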
2) Data collection
Each DataNode collects the performance index data corresponding to each performance class through the getPerformance method. Then, following the modified HDFS communication protocol, the DataNode sends this data to the NameNode in its periodic heartbeat messages, and the NameNode stores it. To fit the performance scoring algorithm described below, the parameters of the Performance class comprise classes, which lists the performance classes, and weight, which gives the weight of each performance class; the weight item is set in the cluster configuration step.
In this embodiment, for ease of use, the cluster configuration step allows a separate configuration for each DataNode and also sets a default configuration. As shown in Fig. 2, the data collection flow is as follows:
2.1 After the system starts, the configuration item dfs.DataNode.Performance.classes is read. If this item and its performance classes are found, the DataNode uses the performance index data of its separate configuration; otherwise it uses the performance index data of the default configuration. The configuration item dfs.NameNode.Performance.weight is then read. If this item is found and correctly formatted, its weights are assigned to the performance indexes in order; otherwise all performance indexes are given equal weight by default;
2.2 Once the cluster is running, the DataNodes continuously collect the user-defined data and send heartbeat messages to the NameNode. The performance index data is stored in a matrix, whose entries are updated from the data in each heartbeat message received. The performance index data matrix is:
$$
matrix = \begin{pmatrix}
x_{11} & \cdots & x_{1n} \\
\vdots & \ddots & \vdots \\
x_{m1} & \cdots & x_{mn}
\end{pmatrix}
= \begin{pmatrix} DN_1 \\ \vdots \\ DN_m \end{pmatrix}
$$
where m is the number of DataNodes, n is the number of performance indexes, and row DN_i holds the performance index data of the i-th DataNode.
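The weight fallback of step 2.1 and the matrix update of step 2.2 can be sketched as follows. The source does not give the format of the weight item, so the comma-separated string, the normalization to a sum of 1, and the names `parse_weights` and `PerformanceMatrix` are assumptions.

```python
def parse_weights(raw, n):
    """Parse a comma-separated weight string for n indexes.

    Falls back to equal weights when the item is missing or malformed.
    """
    try:
        weights = [float(part) for part in raw.split(",")]
    except (AttributeError, ValueError):
        weights = []
    if len(weights) != n or any(w < 0 for w in weights) or sum(weights) <= 0:
        return [1.0 / n] * n
    total = sum(weights)
    return [w / total for w in weights]


class PerformanceMatrix:
    """NameNode-side store: one row of index values per DataNode."""

    def __init__(self):
        self.rows = {}

    def on_heartbeat(self, datanode_id, performances):
        # A newer heartbeat simply overwrites the row of that DataNode.
        self.rows[datanode_id] = list(performances)

    def as_matrix(self):
        ids = sorted(self.rows)
        return ids, [self.rows[i] for i in ids]


matrix = PerformanceMatrix()
matrix.on_heartbeat("dn1", [2.4, 16.0, 1000.0])
matrix.on_heartbeat("dn2", [3.0, 64.0, 10000.0])
print(matrix.as_matrix())
print(parse_weights("2,1,1", 3))
```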
3) Executing the Balancer program
The balance command uses the existing HDFS command format hadoop balancer [-threshold <threshold>], where the threshold parameter is the allowed error, 10% by default. This parameter is needed because after migration it is very hard to guarantee that the ratio of stored data volume to quantified performance is exactly the same on all DataNodes, so a certain error must be tolerated. Before execution, the Balancer command is received and parsed by the NameNode.
Compared with the existing Balancer program, the key difference in execution is this: during data migration, DataNode matching must not only satisfy the node matching rules of the existing Balancer program but also take the performance index data of the DataNodes into account. Each DataNode is evaluated by the ratio of its performance score to its stored data volume, and matching takes place between the best-evaluated and the worst-evaluated DataNodes, so that the amount of data stored on each DataNode becomes proportional to its performance.
The performance score of a DataNode can be calculated by any existing method as required; this embodiment uses the TOPSIS algorithm. TOPSIS, short for Technique for Order Preference by Similarity to an Ideal Solution, was first proposed by C. L. Hwang and K. Yoon in 1981 and is also known as the method of distance between superior and inferior solutions. Its basic principle is to rank evaluation objects by their distance to the optimal solution and to the worst solution, that is, by their closeness. An object that is closest to the optimal solution and at the same time farthest from the worst solution is the best; otherwise it is not. Every index value of the optimal solution is the best value of the corresponding evaluation index, and every index value of the worst solution is the worst value of the corresponding evaluation index.
Specifically, in this embodiment, as shown in Fig. 3 and Fig. 4, the Balancer program is executed in the following steps:
3.1 The Rebalance Server first obtains from the NameNode the performance index data of every DataNode and the weight of each performance class;
3.2 Using the obtained performance index data and weights, the TOPSIS algorithm calculates the optimal value, the worst value, and the closeness of each DataNode's performance, and the closeness is used as its performance score. The steps are as follows:
3.2.1 Calculate the z-score of each performance index of each DataNode:
$$
D = \begin{pmatrix}
r_{11} & \cdots & r_{1j} & \cdots & r_{1n} \\
\vdots &        & \vdots &        & \vdots \\
r_{i1} & \cdots & r_{ij} & \cdots & r_{in} \\
\vdots &        & \vdots &        & \vdots \\
r_{m1} & \cdots & r_{mj} & \cdots & r_{mn}
\end{pmatrix}
$$
where
$$
r_{ij} = \frac{x_{ij} - \mu_j}{\sigma_j}
$$
x_ij is the value of the j-th evaluation index of the i-th DataNode, μ_j is the mean of column j, that is, of the j-th evaluation index over all DataNodes, and σ_j is the standard deviation of column j, that is, of the j-th evaluation index over all DataNodes.
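As a rough illustration of step 3.2.1, the column-wise z-score can be computed as below. The sketch assumes the population standard deviation and maps a constant column to 0 to avoid dividing by zero; neither choice is stated in the source, and `z_scores` is a hypothetical name.

```python
from statistics import mean, pstdev


def z_scores(matrix):
    """Standardize each column (index) of an m x n performance matrix."""
    columns = list(zip(*matrix))
    mus = [mean(col) for col in columns]
    sigmas = [pstdev(col) for col in columns]
    return [
        # A column with zero spread carries no information, so it maps to 0.
        [(x - mu) / sigma if sigma else 0.0
         for x, mu, sigma in zip(row, mus, sigmas)]
        for row in matrix
    ]


# Two DataNodes, two indexes; the second index is identical on both nodes.
print(z_scores([[1.0, 10.0], [3.0, 10.0]]))
```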
3.2.2 Apply the weights:
$$
W = \begin{pmatrix}
w_{11} & \cdots & w_{1j} & \cdots & w_{1n} \\
\vdots &        & \vdots &        & \vdots \\
w_{i1} & \cdots & w_{ij} & \cdots & w_{in} \\
\vdots &        & \vdots &        & \vdots \\
w_{m1} & \cdots & w_{mj} & \cdots & w_{mn}
\end{pmatrix}
$$
where w_ij = r_ij · w_j, i = 1, 2, ..., m, j = 1, 2, ..., n, and w_j is the weight of the j-th evaluation index.
3.2.3 Determine the optimal solution A^+ and the worst solution A^- from the data of 3.2.2:
$$
A^{+} = (v_1^{+}, v_2^{+}, \ldots, v_n^{+})
$$
$$
A^{-} = (v_1^{-}, v_2^{-}, \ldots, v_n^{-})
$$
where v_j^+ is the optimal value of the j-th evaluation index, the one rated best for performance, and v_j^- is the worst value of the j-th evaluation index, the one rated worst for performance.
3.2.4 Calculate the distance from each DataNode to the optimal solution and to the worst solution:
$$
S_i^{+} = \sum_{j=1}^{n} \left( v_j^{+} \lg \frac{v_j^{+}}{w_{ij}} + (1 - v_j^{+}) \lg \frac{1 - v_j^{+}}{1 - w_{ij}} \right), \quad i = 1, 2, \ldots, m
$$
$$
S_i^{-} = \sum_{j=1}^{n} \left( v_j^{-} \lg \frac{v_j^{-}}{w_{ij}} + (1 - v_j^{-}) \lg \frac{1 - v_j^{-}}{1 - w_{ij}} \right), \quad i = 1, 2, \ldots, m
$$
where S_i^+ is the distance from the i-th DataNode to the optimal solution and S_i^- is the distance from the i-th DataNode to the worst solution.
3.2.5 Calculate the closeness C of each DataNode:
$$
C_i = \frac{S_i^{-}}{S_i^{+} + S_i^{-}}, \quad i = 1, 2, \ldots, m
$$
A closeness of C_i = 0 means the node has the worst performance, and C_i = 1 means the node has the best performance.
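Steps 3.2.3 to 3.2.5 can be sketched as follows. The sketch assumes that every weighted value w_ij lies strictly between 0 and 1, since the logarithms in S_i^+ and S_i^- are undefined otherwise (the source does not say how this is ensured), and that a larger value is better for every index. The function name `closeness` and the fallback value 0.5 for identical nodes are likewise assumptions.

```python
import math


def closeness(weighted):
    """Return the score C_i for each row of a weighted m x n matrix.

    Every entry must lie strictly between 0 and 1 (assumption of this sketch).
    """
    columns = list(zip(*weighted))
    best = [max(col) for col in columns]    # A+: best value of each index
    worst = [min(col) for col in columns]   # A-: worst value of each index

    def distance(reference, row):
        # Sum over indexes of the log-ratio terms used for S_i^+ and S_i^-.
        return sum(
            v * math.log10(v / w) + (1 - v) * math.log10((1 - v) / (1 - w))
            for v, w in zip(reference, row)
        )

    scores = []
    for row in weighted:
        s_plus = distance(best, row)
        s_minus = distance(worst, row)
        total = s_plus + s_minus
        # All nodes identical: no ranking is possible, so use a neutral 0.5.
        scores.append(s_minus / total if total > 0 else 0.5)
    return scores


print(closeness([[0.4, 0.3], [0.2, 0.1], [0.3, 0.2]]))
```

In this input the first node holds the best value of both indexes and the second node the worst, so they land at the two ends of the scale and the third node falls in between.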
3.3 Calculate the ratio α of each DataNode's performance score (its closeness) to its stored data volume, that is, α_i = C_i / U_i, and the mean α_avg of the α values of all DataNodes, by the following formula:
$$
\alpha_{avg} = \frac{\sum_{i=1}^{m} C_i}{\sum_{i=1}^{m} U_i}
$$
where U_i is the stored data volume of the i-th DataNode, which the NameNode obtains from the DataNode through the heartbeat protocol;
3.4 The allowed performance error range is obtained from the threshold parameter of the Balancer command, and the target range of DataNode performance is determined from α_avg and the threshold parameter. The target range is:
$$
[\alpha_{avg} \cdot (1 - threshold),\ \alpha_{avg} \cdot (1 + threshold)]
$$
DataNodes whose α value is better than the target range, within the target range, and worse than the target range are placed in three queues, Q_over, Q_mid, and Q_under, respectively;
3.5 Match DataNodes, comprising the following steps:
3.5.1 If Q_under and Q_over are both non-empty, match a DataNode in Q_under with a DataNode in Q_over; if Q_under is empty and Q_over is non-empty, match the worst-performing DataNode in Q_mid with a DataNode in Q_over; if Q_under is non-empty and Q_over is empty, match the best-performing DataNode in Q_mid with a DataNode in Q_under;
3.5.2 If the two DataNodes chosen in 3.5.1 satisfy the node matching rules of the Balancer program, match them and go to step 3.6; otherwise repeat step 3.5.1. These node matching rules are those of the existing Balancer program;
3.6 Data block selection: select a data block from the DataNode that data is to be moved out of. If the block satisfies the data block selection rules of the Balancer program, execute the data migration task; otherwise select another block. These rules are the same as in the existing Balancer program and include: A) the block is not already in the migration task queue; B) the destination DataNode does not already hold the block; C) moving the block must not reduce the number of racks that hold it;
3.7 After the data migration task completes, recalculate the α values of the source DataNode and the destination DataNode and put each into the appropriate queue;
3.8 Repeat 3.5, 3.6, and 3.7 until Q_over and Q_under are both empty.
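The pairing rule of step 3.5.1 might look like this in code. The sketch reads "worst-performing" and "best-performing" in Q_mid as the lowest and highest α value, which is an interpretation; the rule checks of 3.5.2 and 3.6 are left out, and `select_pair` is a hypothetical name.

```python
def select_pair(q_over, q_mid, q_under, alpha):
    """Pick the next two DataNodes to match, or None if nothing is left.

    Returns (lower_alpha_node, higher_alpha_node); alpha maps node -> alpha.
    """
    if q_under and q_over:
        return q_under[0], q_over[0]
    if q_over and q_mid:
        # Q_under is empty: use the weakest node of the middle queue.
        return min(q_mid, key=alpha.get), q_over[0]
    if q_under and q_mid:
        # Q_over is empty: use the strongest node of the middle queue.
        return q_under[0], max(q_mid, key=alpha.get)
    return None


alpha = {"a": 0.0100, "b": 0.0025, "c": 0.0, "d": 0.0026}
print(select_pair(["a"], ["b", "d"], ["c"], alpha))
print(select_pair(["a"], ["b", "d"], [], alpha))
print(select_pair([], ["b", "d"], ["c"], alpha))
```

Since α is the performance score divided by the stored data volume, moving blocks from the first node of the pair to the second pushes both α values toward α_avg, which appears to be the intent of the matching step.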

Claims (4)

1. A copy balancing method based on HDFS, comprising the following steps:
1) Cluster configuration:
Design a Performance class that represents the performance evaluation indexes of a DataNode; the Performance class provides a getPerformance method for obtaining the corresponding performance data;
Define performance classes, one for each performance index of a DataNode; each performance class is a subclass that inherits from the Performance class;
Add the performance index data of the performance classes to the heartbeat message of the HDFS communication protocol;
2) Data collection:
Each DataNode collects the performance index data corresponding to each performance class through the getPerformance method; the DataNode then sends this performance index data to the NameNode in its periodic heartbeat messages, and the NameNode stores it;
3) Executing the Balancer program:
3.1 The Rebalance Server obtains the performance index data of every DataNode from the NameNode;
3.2 The performance score of each DataNode is calculated from the obtained performance index data;
3.3 The ratio α of each DataNode's performance score to its stored data volume is calculated, together with α_avg, the mean of the α values of all DataNodes;
3.4 The allowed performance error range is obtained from the threshold parameter of the Balancer command, and the target range of DataNode performance is determined from α_avg and the threshold parameter; DataNodes whose α value is better than the target range, within the target range, and worse than the target range are placed in three queues, Q_over, Q_mid, and Q_under, respectively;
3.5 Match DataNodes, comprising the following steps:
3.5.1 If Q_under and Q_over are both non-empty, match a DataNode in Q_under with a DataNode in Q_over; if Q_under is empty and Q_over is non-empty, match the worst-performing DataNode in Q_mid with a DataNode in Q_over; if Q_under is non-empty and Q_over is empty, match the best-performing DataNode in Q_mid with a DataNode in Q_under;
3.5.2 If the two DataNodes chosen in 3.5.1 satisfy the node matching rules of the Balancer program, match them and go to step 3.6; otherwise repeat step 3.5.1;
3.6 Data block selection: select a data block from the DataNode that data is to be moved out of; if the block satisfies the data block selection rules of the Balancer program, execute the data migration task; otherwise select another block;
3.7 After the data migration task completes, recalculate the α values of the source DataNode and the destination DataNode and put each into the appropriate queue;
3.8 Repeat 3.5, 3.6, and 3.7 until Q_over and Q_under are both empty.
2. The copy balancing method based on HDFS according to claim 1, characterized in that the performance classes defined in the cluster configuration step include a CpuPerformance class for obtaining the CPU speed of the DataNode, a MemoryPerformance class for obtaining the memory size of the DataNode, and a DiskPerformance class for obtaining the disk size of the DataNode.
3. The copy balancing method based on HDFS according to claim 1, characterized in that a weight is configured for each performance class in the cluster configuration step; in step 3.1, the Rebalance Server first obtains from the NameNode the performance index data of every DataNode and the weight of each performance class; in step 3.2, the performance score of each DataNode is calculated from the performance index data of each performance class and its weight.
4. The copy balancing method based on HDFS according to claim 3, characterized in that in step 3.2 the TOPSIS algorithm is applied to the obtained performance index data and weights to calculate the optimal value, the worst value, and the closeness of each DataNode's performance, and the closeness is used as its performance score.
CN201410321195.XA 2014-07-07 2014-07-07 copy balance method based on HDFS Active CN104063501B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410321195.XA CN104063501B (en) 2014-07-07 2014-07-07 copy balance method based on HDFS

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410321195.XA CN104063501B (en) 2014-07-07 2014-07-07 copy balance method based on HDFS

Publications (2)

Publication Number Publication Date
CN104063501A (en) 2014-09-24
CN104063501B (en) 2017-06-16

Family

ID=51551215

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410321195.XA Active CN104063501B (en) 2014-07-07 2014-07-07 copy balance method based on HDFS

Country Status (1)

Country Link
CN (1) CN104063501B (en)

Also Published As

Publication number Publication date
CN104063501B (en) 2017-06-16

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant