CN108416054A - Dynamic HDFS copy number calculating methods based on file access temperature - Google Patents

Dynamic HDFS copy number calculating methods based on file access temperature

Info

Publication number
CN108416054A
Authority
CN
China
Prior art keywords
file
access temperature
access
hot spot
sequence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810228575.7A
Other languages
Chinese (zh)
Other versions
CN108416054B (en)
Inventor
代钰
杨雷
化红翠
王际烽
张斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Northeastern University China
Original Assignee
Northeastern University China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Northeastern University China
Priority to CN201810228575.7A
Publication of CN108416054A
Application granted
Publication of CN108416054B
Legal status: Expired - Fee Related

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/13 File access structures, e.g. distributed indices
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/17 Details of further file system functions
    • G06F16/1734 Details of monitoring file system events, e.g. by the use of hooks, filter drivers, logs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/182 Distributed file systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention provides a method for dynamically calculating the number of HDFS replicas based on file access heat, and relates to the technical field of data analysis. The method first uses an improved Markov model to analyze how the access heat of hotspot files changes over time and, together with the file access heat formula, predicts the access heat of each file. It then applies queueing theory to derive a formula for the number of replicas and dynamically adjusts the number of replicas of hotspot files. The method relieves the access bottleneck on hotspot files and improves the service efficiency of the cluster.

Description

Method for dynamically calculating the number of HDFS replicas based on file access heat
Technical field
The present invention relates to the technical field of data analysis, and in particular to a method for dynamically calculating the number of HDFS replicas based on file access heat.
Background
With the development of modern Internet technology and the progress of science and technology, data, characterized by large volume, variety, high velocity, and veracity, has penetrated every industry and field of social development. Given the growth of massive data, managing data and resources sensibly while ensuring data reliability has become a key problem in cloud computing.
Hadoop, the distributed system architecture developed by the Apache Software Foundation, implements a distributed file system, the Hadoop Distributed File System (HDFS). HDFS is highly fault tolerant and is designed to be deployed on low-cost hardware. It provides high-throughput access to application data and suits applications with very large data sets. HDFS relaxes some POSIX requirements so that data in the file system can be accessed as a stream. Under the default HDFS replica management mechanism, the cluster keeps 3 replicas of every data block of a file. This cannot satisfy the differing access demands that users place on different files: when accesses to a particular file increase, the default number of block replicas cannot serve the large number of requests, which causes an access bottleneck on hotspot files. Current replica management methods are gradually moving from static replica creation strategies to dynamic ones, so that when the external environment changes, the overall performance of the cluster stays unchanged or the cluster can keep serving clients efficiently. Dynamic replica creation strategies nevertheless still leave out some factors that strongly affect the working efficiency of the cluster.
In the prior art, the document "Research on Efficient Multi-Replica Management in Cloud Environments" proposes a dynamic replica creation method to address the cost-benefit guarantee problem of large-scale cloud storage systems. It considers the relationship between the number of replicas and availability, and adjusts the number of replicas on the premise that the availability of the cloud storage system is met, but it does not consider the relationship between file access heat and the number of replicas. The document "An Elastic Replication Management System for HDFS" proposes an active/standby storage model to manage HDFS replicas flexibly. It uses a complex event processing engine to identify the volume of data accessed in real time, adjusts the number of replicas dynamically, and introduces erasure codes to manage the number of replicas. Although that system effectively improves HDFS performance, its implementation is complicated, and identifying real-time access data is computationally expensive.
Summary of the invention
To address the shortcomings of the prior art, the present invention provides a method for dynamically calculating the number of HDFS replicas based on file access heat.
The method comprises the following steps:
Step 1: From the file access log table on the distributed file system HDFS, calculate the access heat of each file in every statistical period using the file access heat formula. Sort the files in descending order of the sum of their access heat over the statistics window, select the top 20% of the sorted list as hotspot files, and build the hotspot file-access heat sequence, which serves as the sequence to be predicted for access heat prediction;
The access heat Hot(f) of a file f is computed from the access frequency of the file and its number of data blocks, where:
Hot(f) denotes the access heat of file f; AF(f) = N/T denotes the access frequency of file f, where N is the number of accesses to f within the statistical period T; f_size denotes the size of file f; and the number of data blocks of f is the largest integer not exceeding the ratio of f_size to the data block size of f;
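The hotspot selection in Step 1 can be sketched as follows. This is only an illustration: the function name `select_hot_files`, the sample file names, and the choice to round the 20% cut-off up (so that at least one file is kept) are assumptions, not details given in the text.

```python
import math

def select_hot_files(heat_by_file, fraction=0.2):
    """Return the names of the top `fraction` of files by summed access heat."""
    # Sum each file's per-period heat values over the statistics window.
    totals = {name: sum(series) for name, series in heat_by_file.items()}
    # Rank file names by total heat, highest first (descending sort).
    ranked = sorted(totals, key=totals.get, reverse=True)
    # ASSUMPTION: round the cut-off up and always keep at least one file.
    keep = max(1, math.ceil(len(ranked) * fraction))
    return ranked[:keep]

# Hypothetical per-period heat values for five files.
heat_by_file = {
    "a.log": [10, 12, 9],
    "b.csv": [300, 280, 310],
    "c.txt": [40, 35, 50],
    "d.dat": [5, 4, 6],
    "e.bin": [90, 120, 100],
}
hot = select_hot_files(heat_by_file)
print(hot)
```

The heat series of each selected file then becomes the sequence to be predicted in the later steps.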
Step 2: Divide the state space of the hotspot file-access heat sequence using a hierarchical clustering algorithm, as follows:
The hotspot file-access heat sequence forms a data set of length N, in which each object is the access heat of the hotspot file at a particular time. Hierarchical clustering of this data set proceeds as follows:
(1) Treat each object in the data set as its own class, giving N classes. The distance between two classes is the median of the squared pairwise distances between the data points of the two classes;
(2) Merge the two closest classes into one, reducing the number of classes by one;
(3) Recalculate the distances between the new class and the other classes;
(4) Repeat steps (2) and (3) until all data objects in the data set have been merged into a single class;
These steps yield the clustering tree of the hotspot file-access heat sequence, and the state space of the Markov model is defined from the structure of that tree;
Step 3: Test the hotspot file-access heat sequence, with its state space divided, for the Markov property. If the sequence has the Markov property, use it as the input sequence of the improved Markov model; otherwise the sequence cannot be handled by the improved Markov model;
Step 4: Use the hotspot file-access heat sequence that has the Markov property as the input sequence of the improved Markov model, predict the access heat of the hotspot file at the next time step, and write the predicted access heat to the hotspot file-access heat database table, as follows:
Step 4.1: Based on the divided state space, calculate the one-step state transition probability matrix P from the file-access heat sequence;
Step 4.2: Take the state corresponding to the file access heat value at the current time as the initial state distribution, denoted p(0), and calculate the state probability distribution of the next time step from the one-step state transition probability matrix P as p(1) = p(0)P;
Step 4.3: Take the state with the highest probability in p(1) as the state at the next time step, and take the sum of the standard deviation of the hotspot file-access heat sequence and the mean of the target state space as the predicted access heat value for the next time step;
Step 4.4: Remove the first value of the input sequence, append the newly predicted access heat value as the last value of the sequence for the next prediction, and repeat the above steps to predict the access heat of the hotspot file at later time steps;
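Steps 4.1 and 4.2 amount to estimating a row-normalised transition matrix and multiplying it by a one-hot row vector. The sketch below shows that arithmetic on a made-up state sequence; the labels "ABC", the sequence itself, and the helper names are hypothetical.

```python
def transition_matrix(states, labels):
    """Row-normalised one-step transition counts (rows with no exits stay zero)."""
    index = {s: n for n, s in enumerate(labels)}
    counts = [[0] * len(labels) for _ in labels]
    for cur, nxt in zip(states, states[1:]):
        counts[index[cur]][index[nxt]] += 1
    return [[c / sum(row) if sum(row) else 0.0 for c in row] for row in counts]

def step(p0, P):
    """p(1) = p(0) P for a row vector p(0)."""
    return [sum(p0[i] * P[i][j] for i in range(len(P))) for j in range(len(P))]

labels = "ABC"
seq = list("AABBCABBCCABBA")                 # made-up state sequence
P = transition_matrix(seq, labels)           # step 4.1
p0 = [1.0 if s == seq[-1] else 0.0 for s in labels]   # one-hot current state
p1 = step(p0, P)                             # step 4.2
next_state = labels[p1.index(max(p1))]       # most probable next state
print(p1, next_state)
```

Because p(0) is one-hot, p(1) is simply the row of P that belongs to the current state, which is why the most probable next state can be read straight from that row.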
Step 5: Model replica access requests with an M/M/r queueing model (a single queue with multiple servers) and, on this basis, set the throughput of a replica on a node to determine the number of replicas, as follows:
Step 5.1: Query the hotspot file-access heat database table to obtain the average request rate λ for replicas of the specified hotspot file in the next statistical period;
Step 5.2: Set the CPU utilization threshold U of the server holding a replica. By the utilization law, CPU utilization equals the request arrival rate divided by the service rate, so the request service rate of a single server is μ = λ/U;
Step 5.3: Set the constraint Q on the total throughput of the cluster. Following Little's formula from queueing theory, the service residence time is derived from the service rate, and the throughput equals the reciprocal of the service residence time. In a homogeneous cluster the servers holding the replicas all have the same service rate, and the number of replicas r is then calculated from these relations.
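The relations named in Steps 5.2 and 5.3 can be written out as below. The first identity restates the utilization law given in the text. The residence-time expression is an assumption: it is the standard M/M/1 result and is offered only as one plausible reading, since the formulas for r are not reproduced in this text.

```latex
U = \frac{\lambda}{\mu} \;\Longrightarrow\; \mu = \frac{\lambda}{U},
\qquad
W = \frac{1}{\mu - \lambda} = \frac{1}{\mu\,(1 - U)} \quad \text{(M/M/1 residence time, assumed)},
\qquad
X = \frac{1}{W}
```

Here W stands for the service residence time and X for the throughput; both symbols are introduced for illustration and do not appear in the source.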
As the above technical solution shows, the beneficial effects of the present invention are as follows. The method predicts file access heat with an improved Markov model, which improves prediction accuracy. The queueing-theory-based replica calculation takes into account how the access heat of a hotspot file changes over time and dynamically adjusts its number of replicas to cope with highly concurrent access to hotspot files. Using queueing theory, the replicas stored on nodes are treated as service resources; by analyzing the request rate and response rate of hotspot file replicas, with cluster throughput and reliability as the objectives, the number of replicas is obtained from the replica calculation formula, laying the groundwork for subsequent dynamic adjustment of the number of replicas.
Description of the drawings
Fig. 1 is a flowchart of the method for dynamically calculating the number of HDFS replicas based on file access heat, provided by an embodiment of the present invention;
Fig. 2 is a schematic diagram of the process of predicting the access heat of a hotspot file at the next time step with the improved Markov model, provided by an embodiment of the present invention;
Fig. 3 compares the values predicted by the Markov model, the values predicted by the improved Markov model, and the actual values, provided by an embodiment of the present invention;
Fig. 4 compares the number of replicas calculated from queueing theory with the number of replicas calculated from the actual replica throughput, provided by an embodiment of the present invention.
Detailed description of the embodiments
The specific implementation of the present invention is described in further detail below with reference to the drawings and an embodiment. The embodiment illustrates the invention and does not limit its scope.
In this embodiment, 3 racks are built with 4 virtual machines configured in each, and three additional physical machines are set up. Two of them serve as the NameNode in the active state and the NameNode in the standby state, which guards against a NameNode single point of failure. The third physical machine serves as the compute node, which collects the file access logs, predicts file access heat, and calculates the number of replicas. The cluster runs Hadoop 2.2.0 with 32 GB of memory, an Intel(R) Core(TM) i3-2120 CPU @ 3.30GHz, CentOS 6.7, and a 2 TB hard disk; the development languages are Java, R, and Matlab.
As shown in Fig. 1, the method for dynamically calculating the number of HDFS replicas based on file access heat comprises the following steps:
Step 1: From the file access log table on the distributed file system HDFS, calculate the access heat of each file in every statistical period using the access heat formula. Sort the files in descending order of the sum of their access heat over the statistics window, select the top 20% of the sorted list as hotspot files, and build the hotspot file-access heat sequence, which serves as the sequence to be predicted for access heat prediction;
The access heat Hot(f) of a file f is computed from the access frequency of the file and its number of data blocks, where:
Hot(f) denotes the access heat of file f; AF(f) = N/T denotes the access frequency of file f, where N is the number of accesses to f within the statistical period T; f_size denotes the size of file f; and the number of data blocks of f is the largest integer not exceeding the ratio of f_size to the data block size of f.
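The two ingredients defined above, AF(f) = N/T and the floor-based block count, are straightforward to compute. How they combine into Hot(f) is not reproduced in this text, so the product used below is an assumption made purely for illustration, and all function and parameter names are hypothetical.

```python
def access_frequency(access_count, period):
    """AF(f) = N / T, as defined in the text."""
    return access_count / period

def block_count(file_size, block_size):
    """Largest integer not exceeding file_size / block_size, per the text."""
    return file_size // block_size

def access_heat(access_count, period, file_size, block_size):
    # ASSUMPTION: heat = access frequency x block count. The source formula
    # is not available here, so this combination is only a guess.
    return access_frequency(access_count, period) * block_count(file_size, block_size)

MB = 1024 * 1024
heat = access_heat(access_count=500, period=5, file_size=640 * MB, block_size=128 * MB)
print(heat)
```

Note that the floor, taken literally, gives a block count of 0 for any file smaller than one block; a real implementation would need to decide how to treat such files.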
In this embodiment, one statistical period is 5 days, and the access frequency of the file flu.txt is counted over 5 periods. From the file access log table and the access heat formula, the access heat of flu.txt in each statistical period is calculated, as shown in Table 1.
Table 1 Access heat information of the file flu.txt
Access time    Access heat
2017-08-01     262
2017-08-02     486
2017-08-03     632
2017-08-04     300
2017-08-05     570
2017-08-06     401
2017-10-02     382
Step 2: Divide the state space of the hotspot file-access heat sequence using a hierarchical clustering algorithm, as follows:
The hotspot file-access heat sequence forms a data set of length N, in which each object is the access heat of the hotspot file at a particular time. Hierarchical clustering of this data set proceeds as follows:
(1) Treat each object in the data set as its own class, giving N classes. The distance between two classes is the median of the squared pairwise distances between the data points of the two classes;
(2) Merge the two closest classes into one, reducing the number of classes by one;
(3) Recalculate the distances between the new class and the other classes;
(4) Repeat steps (2) and (3) until all data objects in the data set have been merged into a single class;
These steps yield the clustering tree of the hotspot file-access heat sequence, and the state space of the Markov model is defined from the structure of that tree.
In this embodiment, hierarchical clustering divides the historical access heat into 5 states, and the data set is labeled with A, B, C, D, and E.
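The clustering procedure in steps (1) to (4) can be sketched in plain Python as below. Two simplifications are assumptions of this sketch rather than part of the described method: merging stops once k clusters remain instead of building the full tree and cutting it, and only 3 states are requested because just the seven Table 1 values are used as input.

```python
from itertools import product
from statistics import median

def cluster_distance(a, b):
    """Median of squared pairwise distances between the points of two clusters."""
    return median((x - y) ** 2 for x, y in product(a, b))

def hierarchical_states(values, k):
    """Agglomerative clustering of 1-D heat values, stopped at k clusters."""
    clusters = [[v] for v in values]            # step (1): one cluster per value
    while len(clusters) > k:
        # step (2): find the closest pair of clusters
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda p: cluster_distance(clusters[p[0]], clusters[p[1]]),
        )
        merged = clusters[i] + clusters[j]
        # step (3): distances to the merged cluster are recomputed on the next pass
        clusters = [c for n, c in enumerate(clusters) if n not in (i, j)] + [merged]
    # Order clusters by mean so labels A, B, ... run from coldest to hottest.
    clusters.sort(key=lambda c: sum(c) / len(c))
    return {chr(ord("A") + n): sorted(c) for n, c in enumerate(clusters)}

heat = [262, 486, 632, 300, 570, 401, 382]      # Table 1 values
states = hierarchical_states(heat, k=3)
print(states)
# {'A': [262, 300], 'B': [382, 401, 486], 'C': [570, 632]}
```

Recomputing every pairwise distance on each pass is quadratic per merge, which is fine for short heat sequences but would need a distance cache for long ones.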
Step 3: Test the hotspot file-access heat sequence, with its state space divided, for the Markov property. If the sequence has the Markov property, use it as the input sequence of the improved Markov model; otherwise the sequence cannot be handled by the improved Markov model;
The Markov property test proceeds as follows:
For an index sequence X_n = {x_1, x_2, ..., x_n} with n possible states, the marginal probability p_·j is the sum of the j-th column of the transition frequency matrix divided by the sum of all its rows and columns, that is, p_·j = Σ_i f_ij / Σ_i Σ_j f_ij,
where f_ij denotes the frequency with which the index sequence X_n = {x_1, x_2, ..., x_n} moves from state i to state j in one step, with i, j ∈ E;
The test statistic χ², built from the frequencies f_ij, the transition probabilities p_ij, and the marginal probabilities p_·j, then has a χ² distribution with (n-1)² degrees of freedom as its limiting distribution;
Given a significance level α, if the statistic exceeds the critical value of that distribution at level α, the sequence X_n has the Markov property; otherwise the sequence cannot be handled by the Markov model.
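The quantities in this test can be written compactly as follows. The marginal probability follows from the verbal definition above. The form of the statistic is not legible in this text; the expression below is the one commonly used for this test and is an assumption here, not a quotation of the patent's formula.

```latex
p_{\cdot j} = \frac{\sum_{i=1}^{n} f_{ij}}{\sum_{i=1}^{n}\sum_{j=1}^{n} f_{ij}},
\qquad
p_{ij} = \frac{f_{ij}}{\sum_{j=1}^{n} f_{ij}},
\qquad
\chi^{2} = 2\sum_{i=1}^{n}\sum_{j=1}^{n} f_{ij}\,\left|\ln\frac{p_{ij}}{p_{\cdot j}}\right|
```

Under this reading, the sequence is accepted as having the Markov property when \chi^{2} > \chi^{2}_{\alpha}\big((n-1)^{2}\big), that is, when the hypothesis that successive states are independent is rejected.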
In this embodiment, R is used to compute the one-step transition frequency matrix f_ij, the transition probability matrix p_ij, and the marginal probabilities p_·j, which are shown in Table 2.
Table 2 Marginal probability table
State    1             2             3             4             5
p_·j     0.17021277    0.42553191    0.17021277    0.08510638    0.14893617
The χ² statistic is calculated from the above values, giving the χ² statistic calculation table shown in Table 3.
Table 3 χ² statistic calculation table
In this embodiment the significance level is α = 0.1 and n = 5. The corresponding quantile is looked up in the χ² distribution table and compared with the computed statistic. The comparison shows that the historical access heat of this file has the Markov property, so the Markov model can be used to predict the access heat of the file.
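A minimal sketch of the computation behind Tables 2 and 3 is given below. It assumes the commonly used statistic 2 Σ f_ij |ln(p_ij / p_·j)|, which may differ from the patent's exact formula, and it runs on a made-up state sequence because the embodiment's matrices are not available in this text.

```python
import math

def transition_counts(states, labels):
    """f[i][j] = number of one-step transitions from labels[i] to labels[j]."""
    index = {s: n for n, s in enumerate(labels)}
    f = [[0] * len(labels) for _ in labels]
    for cur, nxt in zip(states, states[1:]):
        f[index[cur]][index[nxt]] += 1
    return f

def markov_chi2(f):
    """Return (statistic, marginal probabilities) for a frequency matrix f."""
    total = sum(map(sum, f))
    # Marginal probability p_.j: column sum divided by the grand total.
    col = [sum(row[j] for row in f) / total for j in range(len(f))]
    stat = 0.0
    for row in f:
        row_sum = sum(row)
        for j, fij in enumerate(row):
            if fij:                              # zero counts contribute nothing
                stat += fij * abs(math.log((fij / row_sum) / col[j]))
    return 2 * stat, col                         # ASSUMED form of the statistic

seq = list("AABBCABBCCABBA")                     # made-up state sequence
f = transition_counts(seq, labels="ABC")
chi2, marginal = markov_chi2(f)
print(f, marginal, chi2)
```

The statistic would then be compared with a critical value taken from a χ² table for (n-1)² degrees of freedom; the standard library has no χ² quantile function, so the sketch leaves that lookup to the caller.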
Step 4: Use the hotspot file-access heat sequence that has the Markov property as the input sequence of the improved Markov model, predict the access heat of the hotspot file at the next time step, and write the predicted access heat to the hotspot file-access heat database table. As shown in Fig. 2, the procedure is as follows:
Step 4.1: Based on the divided state space, calculate the one-step state transition probability matrix P from the file-access heat sequence;
Step 4.2: Take the state corresponding to the file access heat value at the current time as the initial state distribution, denoted p(0), and calculate the state probability distribution of the next time step from the one-step state transition probability matrix P as p(1) = p(0)P;
Step 4.3: Take the state with the highest probability in p(1) as the state at the next time step, and take the sum of the standard deviation of the hotspot file-access heat sequence and the mean of the target state space as the predicted access heat value for the next time step;
Step 4.4: Remove the first value of the input sequence, append the newly predicted access heat value as the last value of the sequence for the next prediction, and repeat the above steps to predict the access heat of the hotspot file at later time steps.
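Steps 4.1 to 4.4 together form a sliding-window forecast, sketched below. Several details are assumptions of this sketch: states are defined by hypothetical upper bounds rather than by a clustering tree, the "mean of the target state space" is taken as the mean of the window values that fall in the target state, the population standard deviation is used, and a state with no observed exits is assumed to persist.

```python
from statistics import mean, pstdev

def predict_next(seq, bounds):
    """One pass of steps 4.1-4.3 for a heat sequence and state upper bounds."""
    def state_of(v):
        return next(s for s, upper in enumerate(bounds) if v <= upper)
    states = [state_of(v) for v in seq]
    n = len(bounds)
    counts = [[0] * n for _ in range(n)]
    for cur, nxt in zip(states, states[1:]):     # step 4.1: transition counts
        counts[cur][nxt] += 1
    row = counts[states[-1]]    # p(0) is one-hot, so p(1) is this row of P (step 4.2)
    if any(row):
        target = row.index(max(row))             # step 4.3: most probable next state
    else:
        target = states[-1]     # ASSUMPTION: stay put if the state has no history
    members = [v for v, s in zip(seq, states) if s == target]
    return pstdev(seq) + mean(members)           # std of sequence + mean of target state

def rolling_forecast(seq, bounds, horizon):
    """Step 4.4: drop the oldest value, append the prediction, and repeat."""
    window, out = list(seq), []
    for _ in range(horizon):
        nxt = predict_next(window, bounds)
        out.append(nxt)
        window = window[1:] + [nxt]
    return out

heat = [262, 486, 632, 300, 570, 401, 382]       # Table 1 values
bounds = [350, 500, float("inf")]                # hypothetical state boundaries
forecast = rolling_forecast(heat, bounds, horizon=3)
print(forecast)
```

Since the window length stays fixed, the transition matrix is re-estimated from the most recent values at every step, which is the part that distinguishes this procedure from iterating a fixed matrix.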
In this embodiment, to verify the prediction accuracy of the method, the improved and the unmodified Markov models are both used to predict the access heat of flu.txt over 5 periods, and the results are compared. Fig. 3 compares the values predicted by the Markov model, the values predicted by the improved Markov model, and the actual values. As the figure shows, when predicting the access heat at the first time step after the first period, the two models use the same access heat sequence, so they deviate from the actual value by the same amount, and neither prediction differs greatly from the actual value. For later time steps, the improved Markov model keeps updating the sequence to be predicted, so its predictions stay close to the actual values. The unmodified Markov model, because of the ergodicity and stationary distribution of a Markov chain, deviates more and more from the actual values as the number of prediction steps grows. The results show that the access heat predicted by the improved Markov model is close to the actual values and that the model also predicts the access trend of hotspot files well.
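The effect attributed to the unmodified model can be seen in a small numerical sketch: with a fixed transition matrix, the k-step distribution p(0)P^k forgets its starting state and settles on the stationary distribution, so long-range forecasts stop reflecting recent data. The matrix below is hypothetical and chosen only to be ergodic.

```python
def step(p, P):
    """One application of p <- p P for a row vector p."""
    return [sum(p[i] * P[i][j] for i in range(len(P))) for j in range(len(P))]

def k_step(p0, P, k):
    """Distribution after k steps with a fixed transition matrix."""
    p = list(p0)
    for _ in range(k):
        p = step(p, P)
    return p

# Hypothetical ergodic transition matrix (every entry positive, rows sum to 1).
P = [
    [0.2, 0.6, 0.2],
    [0.3, 0.4, 0.3],
    [0.5, 0.3, 0.2],
]

from_a = k_step([1.0, 0.0, 0.0], P, 50)   # start in the first state
from_c = k_step([0.0, 0.0, 1.0], P, 50)   # start in the last state
print(from_a)
print(from_c)
```

Both starting points end at essentially the same distribution, which is why re-estimating the model on an updated window, as the improved model does, matters for multi-step prediction.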
Step 5: Model replica access requests with an M/M/r queueing model (a single queue with multiple servers) and, on this basis, set the throughput of a replica on a node to determine the number of replicas, as follows:
Step 5.1: Query the hotspot file-access heat database table to obtain the average request rate λ for replicas of the specified hotspot file in the next statistical period;
Step 5.2: Set the CPU utilization threshold U of the server holding a replica. By the utilization law, CPU utilization equals the request arrival rate divided by the service rate, so the request service rate of a single server is μ = λ/U;
Step 5.3: Set the constraint Q on the total throughput of the cluster. Following Little's formula from queueing theory, the service residence time is derived from the service rate, and the throughput equals the reciprocal of the service residence time. In a homogeneous cluster the servers holding the replicas all have the same service rate, and the number of replicas r is then calculated from these relations.
In this embodiment, after the access heat of the hotspot file has been predicted with the improved Markov model, the node CPU utilization threshold is set to 0.5 and the throughput of a replica on a node is set to 100 requests per second. With 11 hours of access per day, the daily throughput is 100 × 11 × 3600 = 3,960,000, or about 4 million requests. The number of replicas is calculated from queueing theory and compared with the number of replicas calculated from the actual replica throughput; the comparison is shown in Fig. 4. As the figure shows, by taking into account the request rate of users and the response rate of replicas, the method adjusts the number of replicas according to the trend of the access heat. In the first period the access heat of the hotspot file is falling, so the number of replicas obtained from queueing theory is smaller than the actual number of replicas. In the following periods the number of replicas is adjusted dynamically in line with the trend of the access heat and stays close to the number calculated from the actual throughput, which demonstrates the validity of the method.
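The arithmetic in this paragraph, and one simple way of turning a request rate into a replica count, can be sketched as follows. The sizing rule is not the patent's formula, which is not reproduced in this text: it is the standard M/M/r per-server utilization ρ = λ/(rμ), with r chosen as the smallest value that keeps ρ at or below the threshold U. The arrival rate of 320 requests per second and the floor of 3 replicas (the HDFS default mentioned earlier) are assumptions.

```python
import math

def daily_capacity(per_second, hours):
    """Requests one replica can serve per day at a fixed per-second throughput."""
    return per_second * hours * 3600

def replicas_needed(arrival_rate, service_rate, max_utilization, min_replicas=3):
    """Smallest r with arrival_rate / (r * service_rate) <= max_utilization."""
    # ASSUMPTION: utilization-based sizing, not the source's own formula for r.
    r = math.ceil(arrival_rate / (service_rate * max_utilization))
    return max(min_replicas, r)

capacity = daily_capacity(per_second=100, hours=11)
r = replicas_needed(arrival_rate=320.0, service_rate=100.0, max_utilization=0.5)
print(capacity, r)
# 3960000 7
```

With a falling request rate the same rule returns fewer replicas, down to the configured minimum, which mirrors the downward adjustment described for the first period.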
Finally, it should be noted that the above embodiments only illustrate the technical solutions of the present invention and do not limit them. Although the invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions recorded in those embodiments may still be modified, and some or all of their technical features may be replaced by equivalents, without such modifications or replacements taking the essence of the corresponding technical solutions outside the scope defined by the claims of the present invention.

Claims (5)

1. A method for dynamically calculating the number of HDFS replicas based on file access heat, characterized by comprising the following steps:
Step 1: From the file access log table on the distributed file system HDFS, calculate the access heat of each file in every statistical period using the file access heat formula; sort the files in descending order of the sum of their access heat over the statistics window, select the top 20% of the sorted list as hotspot files, and build the hotspot file-access heat sequence, which serves as the sequence to be predicted for access heat prediction;
Step 2: Divide the state space of the hotspot file-access heat sequence using a hierarchical clustering algorithm;
Step 3: Test the hotspot file-access heat sequence, with its state space divided, for the Markov property; if the sequence has the Markov property, use it as the input sequence of the improved Markov model; otherwise the sequence cannot be handled by the improved Markov model;
Step 4: Use the hotspot file-access heat sequence that has the Markov property as the input sequence of the improved Markov model, predict the access heat of the hotspot file at the next time step, and write the predicted access heat to the hotspot file-access heat database table;
Step 5: Model replica access requests with an M/M/r queueing model (a single queue with multiple servers) and, on this basis, set the throughput of a replica on a node to determine the number of replicas.
2. The method for dynamically calculating the number of HDFS replicas based on file access heat according to claim 1, characterized in that the access heat Hot(f) of a file f in step 1 is computed from the access frequency of the file and its number of data blocks, where:
Hot(f) denotes the access heat of file f; AF(f) = N/T denotes the access frequency of file f, where N is the number of accesses to f within the statistical period T; f_size denotes the size of file f; and the number of data blocks of f is the largest integer not exceeding the ratio of f_size to the data block size of f.
3. The method for dynamically calculating the number of HDFS replicas based on file access heat according to claim 1, characterized in that step 2 proceeds as follows:
The hotspot file-access heat sequence forms a data set of length N, in which each object is the access heat of the hotspot file at a particular time; hierarchical clustering of this data set proceeds as follows:
(1) Treat each object in the data set as its own class, giving N classes; the distance between two classes is the median of the squared pairwise distances between the data points of the two classes;
(2) Merge the two closest classes into one, reducing the number of classes by one;
(3) Recalculate the distances between the new class and the other classes;
(4) Repeat steps (2) and (3) until all data objects in the data set have been merged into a single class;
These steps yield the clustering tree of the hotspot file-access heat sequence, and the state space of the Markov model is defined from the structure of that tree.
4. The method for dynamically calculating the number of HDFS replicas based on file access heat according to claim 1, characterized in that step 4 proceeds as follows:
Step 4.1: Based on the divided state space, calculate the one-step state transition probability matrix P from the file-access heat sequence;
Step 4.2: Take the state corresponding to the file access heat value at the current time as the initial state distribution, denoted p(0), and calculate the state probability distribution of the next time step from the one-step state transition probability matrix P as p(1) = p(0)P;
Step 4.3: Take the state with the highest probability in p(1) as the state at the next time step, and take the sum of the standard deviation of the hotspot file-access heat sequence and the mean of the target state space as the predicted access heat value for the next time step;
Step 4.4: Remove the first value of the input sequence, append the newly predicted access heat value as the last value of the sequence for the next prediction, and repeat the above steps to predict the access heat of the hotspot file at later time steps.
5. The method for dynamically calculating the number of HDFS replicas based on file access heat according to claim 1, characterized in that step 5 proceeds as follows:
Step 5.1: Query the hotspot file-access heat database table to obtain the average request rate λ for replicas of the specified hotspot file in the next statistical period;
Step 5.2: Set the CPU utilization threshold U of the server holding a replica; by the utilization law, CPU utilization equals the request arrival rate divided by the service rate, so the request service rate of a single server is μ = λ/U;
Step 5.3: Set the constraint Q on the total throughput of the cluster; following Little's formula from queueing theory, the service residence time is derived from the service rate, and the throughput equals the reciprocal of the service residence time; in a homogeneous cluster the servers holding the replicas all have the same service rate, and the number of replicas r is then calculated from these relations.
CN201810228575.7A 2018-03-20 2018-03-20 Method for calculating number of copies of dynamic HDFS (Hadoop distributed File System) based on file access heat Expired - Fee Related CN108416054B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810228575.7A CN108416054B (en) 2018-03-20 2018-03-20 Method for calculating number of copies of dynamic HDFS (Hadoop distributed File System) based on file access heat

Publications (2)

Publication Number | Publication Date
CN108416054A (en) | 2018-08-17
CN108416054B (en) | 2021-10-22

Family

ID=63132988

Family Applications (1)

Application Number | Status | Publication | Priority Date | Filing Date | Title
CN201810228575.7A | Expired - Fee Related | CN108416054B (en) | 2018-03-20 | 2018-03-20 | Method for calculating number of copies of dynamic HDFS (Hadoop Distributed File System) based on file access heat

Country Status (1)

Country | Link
CN (1) | CN108416054B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN103150347A (en) * | 2013-02-07 | 2013-06-12 | 浙江大学 | Dynamic replica management method based on file heat
CN105574153A (en) * | 2015-12-16 | 2016-05-11 | 南京信息工程大学 | Replica placement method based on file heat analysis and K-means
CN107632994A (en) * | 2016-07-19 | 2018-01-26 | 普天信息技术有限公司 | Reliability enhancement method and system based on the HDFS file system
CN107770259A (en) * | 2017-09-30 | 2018-03-06 | 武汉理工大学 | Method for dynamically adjusting the number of replicas based on file heat and node load

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN112689166A (en) * | 2020-12-18 | 2021-04-20 | 武汉市烽视威科技有限公司 | Method and system for flexibly increasing and decreasing CDN hot content in real time
CN113391765A (en) * | 2021-06-22 | 2021-09-14 | 中国工商银行股份有限公司 | Data storage method, device, equipment and medium based on distributed storage system
CN115033187A (en) * | 2022-08-10 | 2022-09-09 | 蓝深远望科技股份有限公司 | Big data based analysis management method
CN115033187B (en) * | 2022-08-10 | 2022-11-08 | 蓝深远望科技股份有限公司 | Big data based analysis management method
CN115544377A (en) * | 2022-11-25 | 2022-12-30 | 浙江星汉信息技术股份有限公司 | Cloud storage-based file heat evaluation and updating method
CN116600015A (en) * | 2023-07-18 | 2023-08-15 | 湖南快乐阳光互动娱乐传媒有限公司 | Resource node adjustment method, system, electronic equipment and readable storage medium
CN116600015B (en) * | 2023-07-18 | 2023-10-10 | 湖南快乐阳光互动娱乐传媒有限公司 | Resource node adjustment method, system, electronic equipment and readable storage medium

Also Published As

Publication number | Publication date
CN108416054B (en) | 2021-10-22

Legal Events

Code | Title
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant
CF01 | Termination of patent right due to non-payment of annual fee (granted publication date: 2021-10-22)