CN104063501B - copy balance method based on HDFS - Google Patents

Info

Publication number
CN104063501B
Authority
CN
China
Prior art keywords
datanode
performance
data
hdfs
class
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201410321195.XA
Other languages
Chinese (zh)
Other versions
CN104063501A (en)
Inventor
罗光春
田玲
陈爱国
舒康
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China filed Critical University of Electronic Science and Technology of China
Priority to CN201410321195.XA priority Critical patent/CN104063501B/en
Publication of CN104063501A publication Critical patent/CN104063501A/en
Application granted granted Critical
Publication of CN104063501B publication Critical patent/CN104063501B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10File systems; File servers
    • G06F16/11File system administration, e.g. details of archiving or snapshots
    • G06F16/119Details of migration of file systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10File systems; File servers
    • G06F16/18File system types
    • G06F16/182Distributed file systems

Abstract

The invention discloses a copy balance method based on HDFS. A Performance class is abstracted in the cluster configuration, and the performance data of each DataNode is collected through heartbeat messages. During data migration, DataNode matching must satisfy the node matching rules of the existing Balancer program and must also take the performance index data of the DataNodes into account: each DataNode is evaluated by the ratio of its performance score to its stored data volume, and the best-rated DataNodes are matched with the worst-rated ones, so that the amount of data stored on a DataNode is proportional to its performance. This improves the load balancing capability of the HDFS distributed file system and raises cluster performance. When a cluster is built, differences in the configuration of its nodes need not be considered.

Description

Copy balance method based on HDFS
Technical field
The present invention relates to the technical field of data processing and control, and in particular to a copy balance method based on HDFS.
Background art
In recent years, with the arrival of the Web 2.0 era, which is characterized by user-generated content, applications such as blogs, SNS, P2P, IM, pictures, and video have developed rapidly. Information services have moved ever closer to people's daily lives, and the volume of data on the Internet has grown geometrically. The computing and storage capacity of supercomputers could meet the computing and storage requirements of Internet data, but their cost is so high that they are difficult to deploy widely. To break through the limits of single-machine storage, computing, and memory at a low price, people have turned to distributed systems. A common distributed computing scheme uses idle computers that exchange data over the Internet or Ethernet, with each computer contributing its own computing and storage capacity.
HDFS (Hadoop Distributed File System) is the distributed file system of Hadoop. It provides a highly fault-tolerant, high-throughput solution for storing massive data. HDFS splits a file into data blocks, each holding 64 MB of data by default, and by default keeps 3 replicas of each data block to improve data reliability and read performance. A typical HDFS file is gigabytes in size, and HDFS also supports file counts on the order of millions.
The HDFS architecture is built on a set of specific nodes: the data nodes (DataNode) and the metadata node (NameNode). The NameNode provides the metadata service inside HDFS and manages the file system namespace, keeping the metadata of all files and directories in a file system tree. The DataNodes provide storage blocks for HDFS and store the data files. The HDFS communication protocols are built on top of TCP/IP. A client connects to the NameNode through a configurable port and interacts with it through ClientProtocol, while a DataNode interacts with the NameNode through DataNodeProtocol. By design, a DataNode maintains communication with the NameNode by periodically sending heartbeat messages, which carry the attributes of its data blocks, that is, which file a block belongs to, the block ID, the modification time, and so on. HDFS is now widely used in large online services and large storage systems and has become the de facto standard for mass storage at major websites and other online service companies.
Although HDFS is an excellent distributed file system, it still has shortcomings, and load balancing is one of its weaknesses. An HDFS cluster can easily end up with unbalanced disk utilization across machines, for example after new DataNodes are added to the cluster. When HDFS becomes unbalanced, many problems follow: MapReduce (MR) programs cannot make good use of local computation, network bandwidth between machines is used poorly, machine disks cannot be fully used, and so on. Hadoop includes a Balancer program, and running it brings an HDFS cluster to a balanced state. The developers of the Balancer program followed these principles:
1. During data redistribution, no data may be lost, the number of replicas of the data must not change, and the number of data blocks held in each rack must not change.
2. The system administrator can start or stop the data redistribution program with a single command.
3. Data block migration must not consume too many system resources, such as network bandwidth.
4. Running the data redistribution program must not affect the task scheduling work of the NameNode.
5. The data redistribution task runs as an independent process, separate from the NameNode, usually on a separate Rebalance Server.
The Balancer program executes the following steps:
Step 1. The Rebalance Server obtains the disk usage of each DataNode from the NameNode.
Step 2. The Rebalance Server calculates which machines need to move data out and which machines can receive data, and obtains from the NameNode the distribution of the data that needs to be moved.
Step 3. The Rebalance Server calculates which data blocks on which machine can be moved to which other machine.
Step 4. Each machine that needs to move data blocks copies them to the destination machine and deletes the block data from itself.
Step 5. The Rebalance Server obtains the result of this round of data movement and repeats the process until no more data can be moved or the HDFS cluster meets the balance criterion.
The existing Balancer program works well in most situations. However, it considers only the disk utilization of each node and decides its data migration tasks from disk utilization alone, so that in the end the amount of data stored on each node is proportional to its disk capacity. Consider the following case: 1. the data has 3 replicas; 2. HDFS consists of 2 racks; 3. the machines in the 2 racks have different disk configurations, with 1 TB of disk space per machine in rack1 and 10 TB per machine in rack2; 4. 2 replicas of most of the data are currently stored in rack1. In this case, after the Balancer program runs, the remaining disk space in rack1 is far smaller than that in rack2, and the data in the whole HDFS cluster is still unbalanced. In real large-scale clusters it is difficult to guarantee that all nodes have the same configuration, so a new copy balance method based on HDFS is needed.
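The capacity-proportional outcome described above can be illustrated with a small calculation. The following is only a sketch under simplifying assumptions: the node names, the total data volume of 11 TB, and the function `balanced_layout` are hypothetical, and rack-placement constraints are ignored. It shows how much data and free space each node ends up with when every node is brought to the same disk utilization.

```python
def balanced_layout(capacities_tb, total_data_tb):
    """Return per-node stored and free space when all nodes reach equal disk utilization.

    capacities_tb: dict mapping a (hypothetical) node name to its disk capacity in TB.
    total_data_tb: total amount of data stored in the cluster, in TB.
    """
    total_capacity = sum(capacities_tb.values())
    target_utilization = total_data_tb / total_capacity
    return {
        name: {
            "stored_tb": capacity * target_utilization,
            "free_tb": capacity * (1 - target_utilization),
        }
        for name, capacity in capacities_tb.items()
    }


# Two 1 TB machines in rack1 and two 10 TB machines in rack2, as in the case above.
capacities = {"rack1-n1": 1.0, "rack1-n2": 1.0, "rack2-n1": 10.0, "rack2-n2": 10.0}
layout = balanced_layout(capacities, total_data_tb=11.0)
for name in sorted(layout):
    print(name, layout[name])
```

Under these assumed numbers every node sits at the same utilization, yet a rack2 machine holds ten times the data of a rack1 machine regardless of CPU or memory, which is the limitation the method sets out to address.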
Summary of the invention
The technical problem to be solved by the invention is to provide a copy balance method based on HDFS that makes the amount of data stored on a DataNode proportional to its performance.
The technical solution adopted by the invention to solve this technical problem is:
A copy balance method based on HDFS, comprising the following steps:
1) Cluster configuration:
A Performance class is designed to represent the performance evaluation indices of a DataNode; the Performance class provides a getPerformance method for obtaining the corresponding performance data;
A performance class is defined for each performance index of the DataNode; each performance class is a subclass that inherits from the Performance class;
The performance index data of the corresponding performance classes is added to the heartbeat message of the HDFS communication protocol;
2) Data collection:
The DataNode collects the performance index data corresponding to each performance class through the getPerformance method. The DataNode then periodically sends heartbeat messages that carry this performance index data to the NameNode, and the NameNode stores it;
3) Executing the Balancer program:
3.1. The Rebalance Server obtains the performance index data of each DataNode from the NameNode;
3.2. The performance score of each DataNode is calculated from the obtained performance index data;
3.3. The ratio $\alpha$ of each DataNode's performance score to its stored data volume is calculated, together with the average $\alpha_{avg}$ of the $\alpha$ values of all DataNodes;
3.4. The permitted performance error range is obtained from the threshold parameter of the Balancer command, and the target range of DataNode performance is determined from $\alpha_{avg}$ and the threshold parameter; the DataNodes whose $\alpha$ value is better than the target range, within the target range, and worse than the target range are placed in three queues, $Q_{over}$, $Q_{mid}$, and $Q_{under}$, respectively;
3.5. DataNode matching, comprising the following steps:
3.5.1. If $Q_{under}$ and $Q_{over}$ are both non-empty, a DataNode in $Q_{under}$ is matched with a DataNode in $Q_{over}$; if $Q_{under}$ is empty and $Q_{over}$ is non-empty, the worst-performing DataNode in $Q_{mid}$ is matched with a DataNode in $Q_{over}$; if $Q_{under}$ is non-empty and $Q_{over}$ is empty, the best-performing DataNode in $Q_{mid}$ is matched with a DataNode in $Q_{under}$;
3.5.2. If the two DataNodes chosen in 3.5.1 satisfy the node matching rules of the Balancer program, the match is made and the method proceeds to step 3.6; otherwise step 3.5.1 is repeated;
3.6. Data block selection: a data block is selected from the DataNode that is to move data out; if the data block satisfies the data block selection rules of the Balancer program, the data migration task is executed; otherwise another data block is selected;
3.7. After the data migration task completes, the $\alpha$ values of the DataNode that moved data out and the DataNode that received it are recalculated, and both are placed in the appropriate queues;
3.8. Steps 3.5, 3.6, and 3.7 are repeated until $Q_{over}$ and $Q_{under}$ are both empty.
Specifically, the performance classes defined in the cluster configuration step include a CpuPerformance class for obtaining the CPU speed of the DataNode, a MemoryPerformance class for obtaining the memory size of the DataNode, and a DiskPerformance class for obtaining the disk size of the DataNode.
Further, a weight is configured for each performance class in the cluster configuration step. In step 3.1, the Rebalance Server obtains from the NameNode the performance index data of each DataNode and the weight of each performance class. In step 3.2, the performance score of each DataNode is calculated from the performance index data and the corresponding weights. Specifically, in step 3.2 the TOPSIS algorithm is applied to the obtained performance index data and weights to calculate the optimal value, worst value, and approach degree (relative closeness) of each DataNode's performance, and the approach degree is taken as its performance score.
The beneficial effects of the invention are as follows. During data migration, DataNode matching must satisfy the node matching rules of the existing Balancer program and must also take the performance index data of the DataNodes into account: each DataNode is evaluated by the ratio of its performance score to its stored data volume, and the best-rated DataNodes are matched with the worst-rated ones, so that the amount of data stored on a DataNode is proportional to its performance. This improves the load balancing capability of the HDFS distributed file system and raises cluster performance. When a cluster is built, differences in the configuration of its nodes need not be considered.
Brief description of the drawings
Fig. 1 is a simplified class diagram of the cluster configuration modification of the invention;
Fig. 2 is a flow chart of the data collection of the invention;
Fig. 3 is a flow chart of the performance score calculation of the invention;
Fig. 4 is a flow chart of the matching in the data redistribution of the invention.
Detailed description of the embodiments
The copy balance method based on HDFS of the invention comprises three parts: cluster configuration, data collection, and execution of the Balancer program. The invention is further described below with reference to the drawings and an embodiment.
1) Cluster configuration
As shown in Fig. 1, a Performance class is designed to represent the performance evaluation indices of a DataNode; the Performance class provides a getPerformance method for obtaining the corresponding performance data. A performance class is defined for each performance index of the DataNode, and each performance class is a subclass that inherits from the Performance class. Specifically, the performance classes include a CpuPerformance class for obtaining the CPU speed of the DataNode, a MemoryPerformance class for obtaining the memory size of the DataNode, and a DiskPerformance class for obtaining the disk size of the DataNode. The communication protocol of HDFS is modified so that the heartbeat message also carries the performance index data of the corresponding performance classes; the data structure transmitted by the modified protocol is shown in Table 1.
Table 1. Data structure transmitted by the heartbeat protocol, including the Performance data
DatanodeRegistration reg
int xmitsInProgress
int xceiverCount
double[] performances
The Performance class and the performance classes are put into effect by changing the configuration file of the HDFS distributed file system. The performance classes correspond to the performance indices to be evaluated. In use, they are not limited to the CpuPerformance, MemoryPerformance, and DiskPerformance classes; a user who needs to consider other factors can inherit from the Performance class and configure the new class accordingly.
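The class structure described above can be sketched as follows. This is a minimal illustration in Python rather than the Java of HDFS, and it makes several assumptions that are not in the source: the method is spelled `get_performance` to follow Python naming, the logical core count stands in for CPU speed because the standard library offers no portable clock-speed call, memory size falls back to 0.0 on platforms without `os.sysconf`, and the helper `collect` is hypothetical.

```python
import os
import shutil
from abc import ABC, abstractmethod


class Performance(ABC):
    """Base class for one DataNode performance index."""

    @abstractmethod
    def get_performance(self) -> float:
        """Return the current value of this performance index."""


class CpuPerformance(Performance):
    def get_performance(self) -> float:
        # Assumption: logical core count is used as a stand-in for CPU speed.
        return float(os.cpu_count() or 1)


class MemoryPerformance(Performance):
    def get_performance(self) -> float:
        # Physical memory in bytes where os.sysconf supports it, else 0.0.
        try:
            return float(os.sysconf("SC_PHYS_PAGES") * os.sysconf("SC_PAGE_SIZE"))
        except (AttributeError, ValueError, OSError):
            return 0.0


class DiskPerformance(Performance):
    def __init__(self, path: str = "."):
        self.path = path

    def get_performance(self) -> float:
        # Total size in bytes of the file system that holds `path`.
        return float(shutil.disk_usage(self.path).total)


def collect(indices):
    """Gather one value per performance class, in order (cf. double[] performances)."""
    return [index.get_performance() for index in indices]


values = collect([CpuPerformance(), MemoryPerformance(), DiskPerformance()])
print(values)
```

A further index would be added by subclassing `Performance` and appending an instance to the list passed to `collect`, which mirrors the extension mechanism the text describes.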
2) Data collection
The DataNode collects the performance index data corresponding to each performance class through the getPerformance method. Then, following the modified HDFS communication protocol, the DataNode periodically sends heartbeat messages that carry this performance index data to the NameNode, and the NameNode stores it. To suit the performance scoring algorithm described below, the parameters of the Performance class include classes, the corresponding performance classes, and weight, the weight of each performance class; weight is set in the cluster configuration step.
In this embodiment, for ease of use, the cluster configuration step provides both an individual configuration for a DataNode and a default configuration. As shown in Fig. 2, the data collection flow is as follows:
2.1. After the system starts, the configuration item dfs.DataNode.Performance.classes is read. If the configuration item and its corresponding performance classes are found, the performance index data configured individually for this DataNode is used; otherwise the performance index data of the default configuration is used. The configuration item dfs.NameNode.Performance.weight is then read. If the configuration item is found and correctly formatted, its weights are assigned to the performance indices in order; otherwise all performance indices are given equal weight by default;
2.2. Once the cluster is running, the DataNodes continuously collect the user-defined data and send heartbeat messages to the NameNode. The performance index data is stored in a matrix data structure whose entries are updated from the data carried in each received heartbeat message. The performance index data matrix is:
$$X = \begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1n} \\ x_{21} & x_{22} & \cdots & x_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ x_{m1} & x_{m2} & \cdots & x_{mn} \end{pmatrix}$$
where $x_{mn}$ is the value of the $n$-th performance index of the $m$-th DataNode.
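The weight fallback of step 2.1 and the matrix update of step 2.2 can be sketched as below. The source does not give the format of the weight configuration item, so the comma-separated string accepted by `parse_weights` is an assumption, as are the names `parse_weights`, `PerformanceTable`, and `on_heartbeat`.

```python
def parse_weights(raw, n_indices):
    """Parse a weight string (assumed comma-separated); fall back to equal weights."""
    equal = [1.0 / n_indices] * n_indices
    if not raw:
        return equal  # configuration item not found
    try:
        weights = [float(token) for token in raw.split(",")]
    except ValueError:
        return equal  # malformed entry
    if len(weights) != n_indices or any(w < 0 for w in weights) or sum(weights) == 0:
        return equal  # wrong count or unusable values
    return weights


class PerformanceTable:
    """NameNode-side store: one row of index values per DataNode."""

    def __init__(self, n_indices):
        self.n_indices = n_indices
        self.rows = {}  # DataNode id -> latest list of index values

    def on_heartbeat(self, datanode_id, performances):
        """Overwrite the row of `datanode_id` with the values in a heartbeat."""
        if len(performances) != self.n_indices:
            raise ValueError("unexpected number of performance indices")
        self.rows[datanode_id] = list(performances)

    def matrix(self):
        """Return (ids, X) where X[i][j] is index j of the i-th DataNode."""
        ids = sorted(self.rows)
        return ids, [self.rows[i] for i in ids]


table = PerformanceTable(n_indices=3)
table.on_heartbeat("dn2", [2.4, 16.0, 1000.0])
table.on_heartbeat("dn1", [3.0, 32.0, 4000.0])
table.on_heartbeat("dn2", [2.4, 16.0, 2000.0])  # a later heartbeat replaces the row
ids, x = table.matrix()
print(ids, x, parse_weights("0.5,0.3,0.2", 3))
```

Keeping only the most recent row per DataNode matches the description of updating matrix entries from each heartbeat; whether the original keeps any history is not stated.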
3) Executing the Balancer program
The balance command uses the existing HDFS command format hadoop balancer [-threshold <threshold>], where the threshold parameter is the permitted error, 10% by default. The parameter exists because, after data migration, it is hard to make the ratio of stored data volume to quantified performance exactly the same on every DataNode, so a certain error rate must be tolerated. Before execution, the Balancer command is listened for and parsed by the NameNode.
The execution of the Balancer program differs from the existing one mainly in the following respect. During data migration, DataNode matching must satisfy the node matching rules of the existing Balancer program and must also take the performance index data of the DataNodes into account: each DataNode is evaluated by the ratio of its performance score to its stored data volume, and the best-rated DataNodes are matched with the worst-rated ones, so that the amount of data stored on a DataNode is proportional to its performance.
The performance score of a DataNode can be calculated by any existing method as required; this embodiment uses the TOPSIS algorithm. TOPSIS, short for Technique for Order Preference by Similarity to an Ideal Solution, was first proposed by C. L. Hwang and K. Yoon in 1981 and is also known as the distance method between superior and inferior solutions. Its basic principle is to rank the evaluation objects by their distance from the optimal solution and from the worst solution, that is, by their approach degree (relative closeness): an evaluation object that is closest to the optimal solution and at the same time farthest from the worst solution is the best; otherwise it is not the best. Each index value of the optimal solution attains the optimal value of that evaluation index, and each index value of the worst solution attains the worst value of that evaluation index.
Specifically, in this embodiment, as shown in Fig. 3 and Fig. 4, the steps of executing the Balancer program are as follows:
3.1. The Rebalance Server first obtains from the NameNode the performance index data of each DataNode and the weight of each performance class;
3.2. From the obtained performance index data and weights, the TOPSIS algorithm is used to calculate the optimal value, worst value, and approach degree (relative closeness) of each DataNode's performance, and the approach degree is taken as its performance score. The steps are as follows:
3.2.1. Calculate the z-score of each performance index of each DataNode:
$$r_{ij} = \frac{x_{ij} - \mu_j}{\sigma_j}$$
where $x_{ij}$ is the value of the $j$-th evaluation index of the $i$-th DataNode, $\mu_j$ is the mean of column $j$, that is, of the $j$-th evaluation index over all DataNodes, and $\sigma_j$ is the standard deviation of column $j$, that is, of the $j$-th evaluation index over all DataNodes.
3.2.2. Apply the weights:
$$W = (w_{ij})_{m \times n}$$
where $w_{ij} = r_{ij} w_j$, $i = 1, 2, \ldots, m$, $j = 1, 2, \ldots, n$, and $w_j$ is the weight of the $j$-th evaluation index.
3.2.3. From the data of 3.2.2, determine the optimal solution $A^+$ and the worst solution $A^-$:
$$A^+ = (w_1^+, w_2^+, \ldots, w_n^+), \qquad A^- = (w_1^-, w_2^-, \ldots, w_n^-)$$
where $w_j^+$ is the optimal value on the $j$-th evaluation index, the value rated best for performance, and $w_j^-$ is the worst value on the $j$-th evaluation index, the value rated worst for performance.
3.2.4. Calculate the distance from each DataNode to the optimal solution and to the worst solution:
$$S_i^+ = \sqrt{\sum_{j=1}^{n} \left(w_{ij} - w_j^+\right)^2}, \qquad S_i^- = \sqrt{\sum_{j=1}^{n} \left(w_{ij} - w_j^-\right)^2}$$
where $S_i^+$ is the distance from the $i$-th DataNode to the optimal solution and $S_i^-$ is the distance from the $i$-th DataNode to the worst solution.
3.2.5. Calculate the approach degree $C_i$ of each DataNode:
$$C_i = \frac{S_i^-}{S_i^+ + S_i^-}$$
$C_i = 0$ indicates that the node's performance is the worst, and $C_i = 1$ indicates that it is the best.
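Steps 3.2.1 to 3.2.5 can be sketched in a few lines of Python. The sketch rests on assumptions the source does not state: every index is treated as "larger is better", so the optimal value of a column is its maximum; the population standard deviation is used for the z-score; a column with zero spread contributes 0; and a cluster whose nodes are all identical gets a neutral score of 0.5. The function name `topsis_scores` is hypothetical.

```python
import math


def topsis_scores(x, weights):
    """Return the approach degree C_i for each row of x (rows: DataNodes, columns: indices)."""
    m, n = len(x), len(x[0])
    # 3.2.1: z-score of every column (population standard deviation).
    mu = [sum(row[j] for row in x) / m for j in range(n)]
    sigma = [math.sqrt(sum((row[j] - mu[j]) ** 2 for row in x) / m) for j in range(n)]
    r = [[(x[i][j] - mu[j]) / sigma[j] if sigma[j] > 0 else 0.0 for j in range(n)]
         for i in range(m)]
    # 3.2.2: weighted matrix w_ij = r_ij * w_j.
    w = [[r[i][j] * weights[j] for j in range(n)] for i in range(m)]
    # 3.2.3: optimal and worst value per column (assumes larger is better).
    best = [max(w[i][j] for i in range(m)) for j in range(n)]
    worst = [min(w[i][j] for i in range(m)) for j in range(n)]
    scores = []
    for i in range(m):
        # 3.2.4: Euclidean distance to the optimal and to the worst solution.
        s_plus = math.sqrt(sum((w[i][j] - best[j]) ** 2 for j in range(n)))
        s_minus = math.sqrt(sum((w[i][j] - worst[j]) ** 2 for j in range(n)))
        total = s_plus + s_minus
        # 3.2.5: approach degree; 0.5 when all nodes are identical.
        scores.append(s_minus / total if total > 0 else 0.5)
    return scores


# Three nodes, two indices: the first node is worst on both, the last is best on both.
scores = topsis_scores([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]], [0.5, 0.5])
print(scores)
```

A node that is worst on every index has zero distance to the worst solution and scores 0, while a node that is best on every index scores 1, in line with the interpretation of $C_i$ given above.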
3.3. Calculate the ratio $\alpha$ of each DataNode's performance score, that is, its approach degree, to its stored data volume, and the average $\alpha_{avg}$ of the $\alpha$ values of all DataNodes:
$$\alpha_i = \frac{C_i}{U_i}, \qquad \alpha_{avg} = \frac{1}{m} \sum_{i=1}^{m} \alpha_i$$
where $U_i$ is the stored data volume of the $i$-th DataNode, which the NameNode obtains from the DataNode through the heartbeat protocol;
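The ratio of step 3.3 is a one-line computation, shown here as a hedged sketch. The function name `alpha_values` is hypothetical, and rejecting a node with no stored data is an assumption: the source does not say how a DataNode with $U_i = 0$ is treated.

```python
def alpha_values(scores, stored):
    """Return (alphas, alpha_avg) for performance scores C_i and stored volumes U_i."""
    if any(u <= 0 for u in stored):
        # Assumption: an empty node is rejected instead of producing an infinite ratio.
        raise ValueError("stored data volume must be positive")
    alphas = [c / u for c, u in zip(scores, stored)]
    return alphas, sum(alphas) / len(alphas)


alphas, alpha_avg = alpha_values([0.2, 0.8], [2.0, 4.0])
print(alphas, alpha_avg)
```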
3.4. The permitted performance error range is obtained from the threshold parameter of the Balancer command, and the target range of DataNode performance is determined from $\alpha_{avg}$ and the threshold parameter. The target range is:
$$[\alpha_{avg} \cdot (1 - \text{threshold}),\ \alpha_{avg} \cdot (1 + \text{threshold})]$$
The DataNodes whose $\alpha$ value is better than the target range, within the target range, and worse than the target range are placed in three queues, $Q_{over}$, $Q_{mid}$, and $Q_{under}$, respectively;
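The queue split of step 3.4 can be sketched as follows. One interpretation is assumed and should be read as such: an $\alpha$ value "better than" the target range is taken to mean above its upper bound, so that $Q_{over}$ holds nodes whose performance is high relative to the data they store. The source does not spell out the direction, and the function name `split_queues` is hypothetical.

```python
def split_queues(alphas, threshold=0.10):
    """Split DataNodes into (q_over, q_mid, q_under) by their alpha value.

    alphas: dict mapping a DataNode id to its alpha value.
    threshold: permitted relative error around the average (0.10 means 10%).
    """
    alpha_avg = sum(alphas.values()) / len(alphas)
    low = alpha_avg * (1 - threshold)
    high = alpha_avg * (1 + threshold)
    q_over, q_mid, q_under = [], [], []
    for node, alpha in alphas.items():
        if alpha > high:
            q_over.append(node)   # assumption: "better than the range" = above it
        elif alpha < low:
            q_under.append(node)  # assumption: "worse than the range" = below it
        else:
            q_mid.append(node)
    return q_over, q_mid, q_under


q_over, q_mid, q_under = split_queues({"dn1": 0.05, "dn2": 0.10, "dn3": 0.15})
print(q_over, q_mid, q_under)
```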
3.5. DataNode matching, comprising the following steps:
3.5.1. If $Q_{under}$ and $Q_{over}$ are both non-empty, a DataNode in $Q_{under}$ is matched with a DataNode in $Q_{over}$; if $Q_{under}$ is empty and $Q_{over}$ is non-empty, the worst-performing DataNode in $Q_{mid}$ is matched with a DataNode in $Q_{over}$; if $Q_{under}$ is non-empty and $Q_{over}$ is empty, the best-performing DataNode in $Q_{mid}$ is matched with a DataNode in $Q_{under}$;
3.5.2. If the two DataNodes chosen in 3.5.1 satisfy the node matching rules of the Balancer program, the match is made and the method proceeds to step 3.6; otherwise step 3.5.1 is repeated. The node matching rules of the Balancer program referred to here are those of the existing Balancer program;
3.6. Data block selection: a data block is selected from the DataNode that is to move data out; if the data block satisfies the data block selection rules of the Balancer program, the data migration task is executed; otherwise another data block is selected. These data block selection rules are the same as the existing ones and include: A) the data block to be migrated is not already in the task queue; B) the destination DataNode does not already hold the data block; C) migrating the data block will not reduce the number of racks the data block originally occupies;
3.7. After the data migration task completes, the $\alpha$ values of the DataNode that moved data out and the DataNode that received it are recalculated, and both are placed in the appropriate queues;
3.8. Steps 3.5, 3.6, and 3.7 are repeated until $Q_{over}$ and $Q_{under}$ are both empty.
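The loop of steps 3.5 to 3.8 can be simulated in a simplified form. The sketch below is not the Balancer implementation and makes several assumptions: an $\alpha$ value above the target range puts a node in the "over" queue and makes it a receiver of data; "best" and "worst" within $Q_{mid}$ are judged by $\alpha$; data moves in fixed-size units of `block`; the existing node matching and data block selection rules are omitted; and a `max_moves` guard is added because the source gives no termination bound. All names are hypothetical.

```python
def classify(nodes, threshold):
    """Return (alphas, over, mid, under) for nodes given as name -> {"score", "stored"}."""
    alphas = {name: n["score"] / n["stored"] for name, n in nodes.items()}
    avg = sum(alphas.values()) / len(alphas)
    low, high = avg * (1 - threshold), avg * (1 + threshold)
    over = [name for name, a in alphas.items() if a > high]
    under = [name for name, a in alphas.items() if a < low]
    mid = [name for name in alphas if name not in over and name not in under]
    return alphas, over, mid, under


def rebalance(nodes, threshold=0.10, block=1.0, max_moves=10_000):
    """Move `block` units at a time until the over and under queues are both empty."""
    moves = []
    for _ in range(max_moves):
        alphas, over, mid, under = classify(nodes, threshold)
        if not over and not under:
            break  # step 3.8: both queues are empty
        # Step 3.5.1: choose the pair (source sheds data, target receives it).
        if over and under:
            source, target = min(under, key=alphas.get), max(over, key=alphas.get)
        elif over:
            if not mid:
                break
            source, target = min(mid, key=alphas.get), max(over, key=alphas.get)
        else:
            if not mid:
                break
            source, target = min(under, key=alphas.get), max(mid, key=alphas.get)
        if nodes[source]["stored"] <= block:
            break  # nothing left that can be moved safely
        # Step 3.6 (simplified): migrate one block; step 3.7 happens on the next pass.
        nodes[source]["stored"] -= block
        nodes[target]["stored"] += block
        moves.append((source, target))
    return moves


nodes = {"A": {"score": 0.8, "stored": 50.0}, "B": {"score": 0.2, "stored": 50.0}}
moves = rebalance(nodes)
print(len(moves), nodes)
```

In this assumed two-node example, data flows from the low-scoring node B to the high-scoring node A until both $\alpha$ values fall inside the target range, so the stored volumes end up roughly proportional to the performance scores.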

Claims (4)

1. A copy balance method based on HDFS, comprising the following steps:
1) Cluster configuration:
A Performance class is designed to represent the performance evaluation indices of a DataNode; the Performance class provides a getPerformance method for obtaining the corresponding performance data;
A performance class is defined for each performance index of the DataNode; each performance class is a subclass that inherits from the Performance class;
The performance index data of each performance class corresponding to each performance index of the DataNode is added to the heartbeat message of the HDFS communication protocol;
2) Data collection:
The DataNode collects the performance index data corresponding to each performance class through the getPerformance method; the DataNode then periodically sends heartbeat messages that carry this performance index data to the NameNode, and the NameNode stores it;
3) Executing the Balancer program:
3.1. The Rebalance Server obtains the performance index data of each DataNode from the NameNode;
3.2. The performance score of each DataNode is calculated from the obtained performance index data;
3.3. The ratio $\alpha$ of each DataNode's performance score to its stored data volume is calculated, together with the average $\alpha_{avg}$ of the $\alpha$ values of all DataNodes;
3.4. The permitted performance error range is obtained from the threshold parameter of the Balancer command, and the target range of DataNode performance is determined from $\alpha_{avg}$ and the threshold parameter; the DataNodes whose $\alpha$ value is better than the target range, within the target range, and worse than the target range are placed in three queues, $Q_{over}$, $Q_{mid}$, and $Q_{under}$, respectively;
3.5. DataNode matching, comprising the following steps:
3.5.1. If $Q_{under}$ and $Q_{over}$ are both non-empty, a DataNode in $Q_{under}$ is matched with a DataNode in $Q_{over}$; if $Q_{under}$ is empty and $Q_{over}$ is non-empty, the worst-performing DataNode in $Q_{mid}$ is matched with a DataNode in $Q_{over}$; if $Q_{under}$ is non-empty and $Q_{over}$ is empty, the best-performing DataNode in $Q_{mid}$ is matched with a DataNode in $Q_{under}$;
3.5.2. If the two DataNodes chosen in 3.5.1 satisfy the node matching rules of the Balancer program, the match is made and the method proceeds to step 3.6; otherwise step 3.5.1 is repeated;
3.6. Data block selection: a data block is selected from the DataNode that is to move data out; if the data block satisfies the data block selection rules of the Balancer program, the data migration task is executed; otherwise another data block is selected;
3.7. After the data migration task completes, the $\alpha$ values of the DataNode that moved data out and the DataNode that received it are recalculated, and both are placed in the appropriate queues;
3.8. Steps 3.5, 3.6, and 3.7 are repeated until $Q_{over}$ and $Q_{under}$ are both empty.
2. The copy balance method based on HDFS according to claim 1, characterized in that the performance classes defined in the cluster configuration step include a CpuPerformance class for obtaining the CPU speed of the DataNode, a MemoryPerformance class for obtaining the memory size of the DataNode, and a DiskPerformance class for obtaining the disk size of the DataNode.
3. The copy balance method based on HDFS according to claim 1, characterized in that a weight is configured for each performance class in the cluster configuration step; in step 3.1, the Rebalance Server first obtains from the NameNode the performance index data of each DataNode and the weight of each performance class; and in step 3.2, the performance score of each DataNode is calculated from the performance index data and the corresponding weights.
4. The copy balance method based on HDFS according to claim 3, characterized in that in step 3.2 the TOPSIS algorithm is applied to the obtained performance index data and weights to calculate the optimal value, worst value, and approach degree (relative closeness) of each DataNode's performance, and the approach degree is taken as its performance score.
CN201410321195.XA 2014-07-07 2014-07-07 copy balance method based on HDFS Active CN104063501B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410321195.XA CN104063501B (en) 2014-07-07 2014-07-07 copy balance method based on HDFS

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410321195.XA CN104063501B (en) 2014-07-07 2014-07-07 copy balance method based on HDFS

Publications (2)

Publication Number Publication Date
CN104063501A CN104063501A (en) 2014-09-24
CN104063501B true CN104063501B (en) 2017-06-16

Family

ID=51551215

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410321195.XA Active CN104063501B (en) 2014-07-07 2014-07-07 copy balance method based on HDFS

Country Status (1)

Country Link
CN (1) CN104063501B (en)

Also Published As

Publication number Publication date
CN104063501A (en) 2014-09-24

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant