CN108416054A - Dynamic HDFS copy number calculating methods based on file access temperature - Google Patents
- Publication number: CN108416054A
- Application number: CN201810228575.7A
- Authority: CN (China)
- Prior art keywords: file, access temperature, access, hot spot, sequence
- Legal status: Granted (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/13—File access structures, e.g. distributed indices
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/17—Details of further file system functions
- G06F16/1734—Details of monitoring file system events, e.g. by the use of hooks, filter drivers, logs
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/18—File system types
- G06F16/182—Distributed file systems
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention provides a method for dynamically calculating the number of HDFS replicas based on file access heat (also rendered "access temperature") and relates to the technical field of data analysis. The method first uses an improved Markov model to analyze how the access heat of hot files changes over time and, using the file access heat formula, predicts the access heat of each file. It then applies queueing theory to derive a formula for the replica count and dynamically adjusts the number of replicas of hot files. The method relieves the access bottleneck on hot files and improves the service efficiency of the cluster.
Description
Technical field
The invention relates to the technical field of data analysis, and in particular to a method for dynamically calculating the number of HDFS replicas based on file access heat.
Background art
With the development of Internet technology and the progress of science, data characterized by large volume, variety, high velocity, and veracity has penetrated every industry and field of society. Given the growth of massive data, managing data and resources sensibly while guaranteeing data reliability has become a key problem in cloud computing.
Hadoop, the distributed system architecture developed by the Apache Software Foundation, implements a distributed file system, the Hadoop Distributed File System (HDFS). HDFS is highly fault tolerant and is designed to be deployed on low-cost hardware. It provides high-throughput access to application data and suits applications with large data sets. HDFS relaxes some POSIX requirements so that data in the file system can be accessed in a streaming manner.
Under the default HDFS replica management mechanism, the cluster stores 3 replicas of each data block of a file. This fixed setting cannot satisfy the differing access demands that users place on different files: when accesses to a particular file increase, the default number of block replicas cannot serve the large volume of requests, which creates an access bottleneck on hot files. Replica management methods are therefore gradually shifting from static replica creation strategies to dynamic ones, so that when the external environment changes, the overall performance of the cluster stays constant or the cluster continues to serve clients efficiently. However, dynamic replica creation strategies still leave out some factors that significantly affect the working efficiency of the cluster.
In the prior art, the document "Research on efficient multi-replica management in cloud environments" addresses the problem of guaranteeing cost effectiveness in large-scale cloud storage systems by proposing a dynamic replica creation method. It considers the relationship between the number of replicas and availability and adjusts the replica count while meeting the availability requirement of the cloud storage system, but it does not consider the relationship between file access heat and the replica count. The document "An Elastic Replication Management System for HDFS" proposes an active/standby storage model for flexible management of HDFS replicas. It uses a complex event processing engine to identify the data being accessed in real time, adjusts the replica count dynamically, and introduces erasure codes to manage replicas. Although this system effectively improves HDFS performance, its implementation is complicated, and identifying real-time access data is computationally expensive.
Summary of the invention
To address the shortcomings of the prior art, the invention provides a method for dynamically calculating the number of HDFS replicas based on file access heat, so that the replica count can be computed dynamically.
A method for dynamically calculating the number of HDFS replicas based on file access heat comprises the following steps:
Step 1: From the file access log table of the distributed file system HDFS, compute the access heat of each file in each statistics period using the file access heat formula. Sort the files in descending order of their total access heat over the statistics time, select the top 20% of the sorted list as hot files, and build a hot file access heat sequence to serve as the sequence to be predicted.
The access heat of a file is computed by the following formula:
[formula not reproduced in this text]
where Hot(f) is the access heat of file f; AF(f) = N/T is the access frequency of file f, with N the number of accesses to f within the statistics period T; f_size is the size of file f; and the number of data blocks of f is obtained from f_size and the data block size of f using the floor operation (the greatest integer not exceeding its argument);
Step 2: Partition the state space of the hot file access heat sequence using a hierarchical clustering algorithm, as follows:
Form a data set of length N from the hot file access heat sequence; each object in the data set is the access heat of the hot file at a different time. Hierarchical clustering of this data set proceeds as follows:
(1) Treat each object in the data set as its own class, giving N classes. The distance between two classes is the median of the squared pairwise distances between the data points of the two classes;
(2) Merge the two closest classes into one class, reducing the total number of classes by one;
(3) Recompute the distances between the new class and the other classes;
(4) Repeat steps (2) and (3) until all data objects in the data set have been merged into a single class;
These steps yield the clustering tree of the hot file access heat sequence; the state space of the Markov model is defined according to the structure of the clustering tree;
Step 3: Test the hot file access heat sequence, whose state space has been partitioned, for the Markov property. If the sequence has the Markov property, use it as the input sequence of the improved Markov model; otherwise the sequence cannot be handled by the improved Markov model;
Step 4: Use the hot file access heat sequence that has the Markov property as the input sequence of the improved Markov model, predict the access heat of the hot file at the next time step, and write the predicted access heat to the hot file access heat database table, as follows:
Step 4.1: Based on the partitioned state space, compute the one-step state transition probability matrix P from the file access heat sequence;
Step 4.2: Take the state corresponding to the file's current access heat value as the initial state distribution, denoted p(0), and compute the state probability distribution at the next time step as p(1) = p(0)P;
Step 4.3: Take the state with the largest probability in p(1) as the state at the next time step, and take the sum of the standard deviation of the hot file access heat sequence and the mean of the target state as the predicted access heat value for the next time step;
Step 4.4: Remove the first value of the input sequence, append the newly predicted access heat value as the last value of the sequence to be predicted, and repeat the steps above to predict the access heat of the hot file at subsequent time steps;
Step 5: Model replica access requests with the M/M/r single-queue, multi-server queueing model and, on this basis, set the throughput of a replica on a node to determine the number of replicas, as follows:
Step 5.1: Query the hot file access heat database table to obtain the average request rate λ for accesses to replicas of the specified hot file in the next statistics period;
Step 5.2: Set the CPU utilization threshold U of the server that hosts a replica. By the utilization law, CPU utilization equals the request arrival rate divided by the service rate, so the request service rate μ of a single server is:
$\mu = \lambda / U$
Step 5.3: Set the total cluster throughput constraint Q. By Little's law from queueing theory, the service sojourn time equals the reciprocal of the service rate multiplied by [term not reproduced in this text], and the throughput equals the reciprocal of the service sojourn time. In a homogeneous cluster the servers hosting the replicas have the same service rate, so the replica count r is calculated from the following two formulas:
[formulas not reproduced in this text]
As the technical solution above shows, the beneficial effects of the invention are as follows. The method predicts file access heat with an improved Markov model, which improves prediction accuracy. The queueing-theory-based replica count calculation takes into account how the access heat of hot files changes over time and dynamically adjusts their replica counts to cope with highly concurrent access to hot files. Using queueing theory, the replicas stored on nodes are treated as service resources; by analyzing the request rate and response rate of hot file replicas, with cluster throughput and reliability as the goals, the replica count is obtained from the replica calculation formula, laying the groundwork for subsequent dynamic adjustment of the replica count.
Brief description of the drawings
Fig. 1 is a flowchart of the method for dynamically calculating the number of HDFS replicas based on file access heat, provided by an embodiment of the invention;
Fig. 2 is a schematic diagram of the process of predicting the access heat of a hot file at the next time step with the improved Markov model, provided by an embodiment of the invention;
Fig. 3 compares the predicted values of the Markov model, the predicted values of the improved Markov model, and the actual values, provided by an embodiment of the invention;
Fig. 4 compares the replica count calculated with queueing theory against the replica count calculated from actual replica throughput, provided by an embodiment of the invention.
Detailed description
Specific implementations of the invention are described in further detail below with reference to the drawings and an embodiment. The following embodiment illustrates the invention and does not limit its scope.
In this embodiment, 3 racks are set up with 4 virtual machines configured in each rack, and three physical machines are set up in addition. Two of them serve as the namenode in the active state and the namenode in the standby state, respectively, to prevent a namenode single point of failure. The third physical machine serves as the compute node, used to obtain the file access log, predict file access heat, and calculate the number of replicas. The cluster configuration is: Hadoop version Hadoop-2.2.0, 32 GB memory, Intel(R) Core(TM) i3-2120 CPU @ 3.30GHz, CentOS-6.7 operating system, 2 TB hard disk, and Java, R, and Matlab as development languages.
The method for dynamically calculating the number of HDFS replicas based on file access heat, shown in Fig. 1, comprises the following steps:
Step 1: From the file access log table of the distributed file system HDFS, compute the access heat of each file in each statistics period using the access heat formula. Sort the files in descending order of their total access heat over the statistics time, select the top 20% of the sorted list as hot files, and build a hot file access heat sequence to serve as the sequence to be predicted.
The access heat of a file is computed by the following formula:
[formula not reproduced in this text]
where Hot(f) is the access heat of file f; AF(f) = N/T is the access frequency of file f, with N the number of accesses to f within the statistics period T; f_size is the size of file f; and the number of data blocks of f is obtained from f_size and the data block size of f using the floor operation (the greatest integer not exceeding its argument).
In this embodiment, one statistics period is 5 days, and the access frequency of the file flu.txt is counted over 5 periods in total. From the file access log table and the access heat formula, the access heat of flu.txt within the statistics periods is computed as shown in Table 1.
Table 1: Access heat of the file flu.txt
Access time | 2017-08-01 | 2017-08-02 | 2017-08-03 | 2017-08-04 | 2017-08-05 |
Access heat | 262 | 486 | 632 | 300 | 570 |
Access time | 2017-08-06 | … | … | 2017-10-02 | … |
Access heat | 401 | … | … | 382 | … |
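The hot file selection in Step 1 can be sketched as follows. This is a minimal Python sketch rather than the patent's implementation: it assumes the per-period access heat values have already been computed, and the function name `select_hot_files`, every file name other than flu.txt, and their values are hypothetical. Only the first five flu.txt values come from Table 1.

```python
import math


def select_hot_files(heat_by_file, fraction=0.2):
    """Return the top `fraction` of files ranked by total access heat.

    heat_by_file maps a file name to its list of per-period access heat values.
    At least one file is returned whenever the input is non-empty.
    """
    # Rank files by the sum of their access heat over the statistics time.
    ranked = sorted(heat_by_file, key=lambda name: sum(heat_by_file[name]), reverse=True)
    count = max(1, math.ceil(len(ranked) * fraction)) if ranked else 0
    return ranked[:count]


# flu.txt values are the first five entries of Table 1; the rest are made up.
heat_by_file = {
    "flu.txt": [262, 486, 632, 300, 570],
    "a.log": [10, 12, 9, 11, 8],
    "b.csv": [40, 35, 50, 42, 38],
    "c.dat": [5, 4, 6, 5, 3],
    "d.txt": [90, 80, 85, 70, 95],
}
hot_files = select_hot_files(heat_by_file)
# The heat sequence of each hot file becomes a sequence to be predicted.
sequences = {name: heat_by_file[name] for name in hot_files}
print(hot_files)
```

Rounding the 20% cut up rather than down is a design choice of this sketch, so that a small file set still yields one hot file.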
Step 2: Partition the state space of the hot file access heat sequence using a hierarchical clustering algorithm, as follows:
Form a data set of length N from the hot file access heat sequence; each object in the data set is the access heat of the hot file at a different time. Hierarchical clustering of this data set proceeds as follows:
(1) Treat each object in the data set as its own class, giving N classes. The distance between two classes is the median of the squared pairwise distances between the data points of the two classes;
(2) Merge the two closest classes into one class, reducing the total number of classes by one;
(3) Recompute the distances between the new class and the other classes;
(4) Repeat steps (2) and (3) until all data objects in the data set have been merged into a single class;
These steps yield the clustering tree of the hot file access heat sequence; the state space of the Markov model is defined according to the structure of the clustering tree.
In this embodiment, hierarchical clustering is used to partition the historical access heat into states. The historical data is divided into 5 states, and the data set is labeled with A, B, C, D, and E.
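The clustering in Step 2 can be sketched as below. This is a minimal Python sketch under stated assumptions: instead of building the full tree and cutting it, it stops merging once k classes remain, which gives the same classes as cutting the tree at that level; labeling states in ascending order of class mean is an assumption, since the text does not say how A to E are assigned; and the function names are hypothetical. The sample values are taken from Table 1, and duplicate heat values are assumed to fall into the same class.

```python
from itertools import combinations
from statistics import mean, median


def class_distance(a, b):
    """Median of the squared pairwise distances between two classes."""
    return median((x - y) ** 2 for x in a for y in b)


def cluster_states(values, k):
    """Agglomerative clustering of 1-D values, stopped when k classes remain."""
    classes = [[v] for v in values]           # (1) every object starts as its own class
    while len(classes) > k:
        # (2) find and merge the two closest classes
        i, j = min(combinations(range(len(classes)), 2),
                   key=lambda p: class_distance(classes[p[0]], classes[p[1]]))
        merged = classes[i] + classes[j]
        # (3) distances to the new class are recomputed on the next pass
        classes = [c for n, c in enumerate(classes) if n not in (i, j)] + [merged]
    return sorted(classes, key=mean)          # order classes by mean access heat


def label_states(values, k=5):
    """Map each value to a state label A, B, C, ... in ascending order of class mean."""
    classes = cluster_states(values, k)
    label_of = {v: chr(ord("A") + n) for n, c in enumerate(classes) for v in c}
    return [label_of[v] for v in values]


heat = [262, 486, 632, 300, 570, 401, 382]    # values taken from Table 1
print(label_states(heat, k=3))
```

The pairwise search is cubic in the sequence length, which is acceptable for a short heat sequence but would need a distance cache for long ones.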
Step 3: Test the hot file access heat sequence, whose state space has been partitioned, for the Markov property. If the sequence has the Markov property, use it as the input sequence of the improved Markov model; otherwise the sequence cannot be handled by the improved Markov model.
The Markov property test is performed as follows:
For an index sequence X_n = {x_1, x_2, …, x_n} with n possible states, the sum of the j-th column of the transition frequency matrix divided by the sum over all rows and columns is called the marginal probability:
$p_{\cdot j} = \sum_{i} f_{ij} \Big/ \sum_{i}\sum_{j} f_{ij}$
where f_ij is the frequency with which the index sequence X_n = {x_1, x_2, …, x_n} moves from state i to state j in one step, i, j ∈ E.
The test statistic [formula not reproduced in this text] then has, as its limiting distribution, the χ² distribution with (n−1)² degrees of freedom.
Given a significance level α, if the test condition [not reproduced in this text] holds, the sequence X_n has the Markov property; otherwise the sequence cannot be handled by a Markov model.
In this embodiment, R is used to obtain the one-step transition frequency matrix f_ij and the transition probability matrix p_ij, as well as the marginal probabilities p.j shown in Table 2.
Table 2: Marginal probabilities
State | 1 | 2 | 3 | 4 | 5 |
p.j | 0.17021277 | 0.42553191 | 0.17021277 | 0.08510638 | 0.14893617 |
The statistic is calculated from these values, giving the χ² statistic calculation table shown in Table 3.
Table 3: χ² statistic calculation table
[table not reproduced in this text]
In this embodiment, the significance level is α = 0.1 and n = 5, and the corresponding quantile is read from the χ² table [values not reproduced in this text]. The historical access heat of this file therefore has the Markov property, and a Markov model can be used to predict the access heat of the file.
Step 4: Use the hot file access heat sequence that has the Markov property as the input sequence of the improved Markov model, predict the access heat of the hot file at the next time step, and write the predicted access heat to the hot file access heat database table. As shown in Fig. 2, the procedure is as follows:
Step 4.1: Based on the partitioned state space, compute the one-step state transition probability matrix P from the file access heat sequence;
Step 4.2: Take the state corresponding to the file's current access heat value as the initial state distribution, denoted p(0), and compute the state probability distribution at the next time step as p(1) = p(0)P;
Step 4.3: Take the state with the largest probability in p(1) as the state at the next time step, and take the sum of the standard deviation of the hot file access heat sequence and the mean of the target state as the predicted access heat value for the next time step;
Step 4.4: Remove the first value of the input sequence, append the newly predicted access heat value as the last value of the sequence to be predicted, and repeat the steps above to predict the access heat of the hot file at subsequent time steps.
In this embodiment, to verify the prediction accuracy of the method, the improved and the unimproved Markov models are each used to predict the access heat of flu.txt over 5 periods, and the results are compared. Fig. 3 compares the predicted values of the Markov model, the predicted values of the improved Markov model, and the actual values. As the figure shows, when the access heat value at the next time step is predicted in the first period, both models start from the same access heat sequence, so their predictions deviate from the actual value by the same amount, and neither prediction differs much from the actual value. For later time steps, however, the improved Markov model keeps updating the sequence to be predicted, so its predicted access heat stays close to the actual values. The unimproved Markov model, because of the ergodicity and stationary distribution of the Markov chain itself, deviates more and more from the actual values as the number of prediction steps increases. The results show that the access heat values predicted by the improved Markov model are close to the actual values and that the model also predicts the access trend of hot files well.
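Steps 4.1 to 4.4 can be sketched as below. This is a minimal Python sketch under stated assumptions: states are represented by hypothetical interval boundaries rather than by the clustering tree, step 4.3 is read literally (standard deviation of the whole sequence plus the mean of the values in the target state), the population standard deviation is used because the text does not say which variant is meant, and the function names and sample data are made up.

```python
from bisect import bisect_right
from statistics import mean, pstdev


def transition_matrix(states, n):
    """One-step transition probability matrix P estimated from a state sequence."""
    counts = [[0] * n for _ in range(n)]
    for a, b in zip(states, states[1:]):
        counts[a][b] += 1
    # Rows with no observed transitions are left as all zeros.
    return [[c / sum(row) if sum(row) else 0.0 for c in row] for row in counts]


def predict_next(values, boundaries):
    """Steps 4.1 to 4.3: predict the next state and access heat value."""
    n = len(boundaries) + 1
    states = [bisect_right(boundaries, v) for v in values]   # value -> state index
    P = transition_matrix(states, n)
    p0 = [1.0 if s == states[-1] else 0.0 for s in range(n)]  # initial distribution p(0)
    p1 = [sum(p0[i] * P[i][j] for i in range(n)) for j in range(n)]  # p(1) = p(0) P
    target = max(range(n), key=p1.__getitem__)                # most probable state
    members = [v for v, s in zip(values, states) if s == target]
    # Literal reading of step 4.3: sequence standard deviation + mean of target state.
    return target, pstdev(values) + mean(members)


def rolling_forecast(values, boundaries, steps):
    """Step 4.4: slide the window forward, feeding each prediction back in."""
    window, forecasts = list(values), []
    for _ in range(steps):
        _, value = predict_next(window, boundaries)
        forecasts.append(value)
        window = window[1:] + [value]         # drop the oldest value, append the newest
    return forecasts


history = [10, 50, 10, 50, 10, 50]             # hypothetical access heat values
print(predict_next(history, boundaries=[30]))
print(rolling_forecast(history, boundaries=[30], steps=3))
```

Re-estimating P from the updated window on every step is what distinguishes this rolling scheme from repeatedly multiplying by a fixed P, which would converge toward the stationary distribution.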
Step 5: Model replica access requests with the M/M/r single-queue, multi-server queueing model and, on this basis, set the throughput of a replica on a node to determine the number of replicas, as follows:
Step 5.1: Query the hot file access heat database table to obtain the average request rate λ for accesses to replicas of the specified hot file in the next statistics period;
Step 5.2: Set the CPU utilization threshold U of the server that hosts a replica. By the utilization law, CPU utilization equals the request arrival rate divided by the service rate, so the request service rate μ of a single server is:
$\mu = \lambda / U$
Step 5.3: Set the total cluster throughput constraint Q. By Little's law from queueing theory, the service sojourn time equals the reciprocal of the service rate multiplied by [term not reproduced in this text], and the throughput equals the reciprocal of the service sojourn time. In a homogeneous cluster the servers hosting the replicas have the same service rate, so the replica count r is calculated from the following two formulas:
[formulas not reproduced in this text]
In this embodiment, after the access heat of the hot file has been predicted with the improved Markov model, the node CPU utilization threshold is set to 0.5 and the throughput of a replica on a node is set to 100/s. With 11 hours of access per day, the daily throughput is 100 × 11 h × 3600 s/h = 3,960,000, or about 4 million. The replica count is calculated with queueing theory and compared with the replica count calculated from actual replica throughput; Fig. 4 shows the comparison. As the figure shows, by taking into account the request rate and response rate of replica accesses, the method adjusts the replica count according to the trend of the access heat. In the first period the access heat of the hot file is falling, and the replica count obtained from queueing theory is lower than the actual replica count. In the subsequent periods the replica count is adjusted dynamically following the trend of the access heat of the hot file and stays close to the replica count calculated from actual throughput, which demonstrates the effectiveness of the method.
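The arithmetic behind these settings can be sketched as follows. The two formulas for r are not reproduced in the text, so this minimal Python sketch does not implement them. It covers only the utilization law from step 5.2 and the daily capacity figure above, plus a hypothetical throughput-based baseline (requests divided by per-replica capacity, rounded up, never below the HDFS default of 3 replicas). The function names, the request rate of 60, and the 20 million daily requests are assumptions for illustration.

```python
import math


def service_rate(arrival_rate, utilization_threshold):
    """Utilization law from step 5.2: U = lambda / mu, so mu = lambda / U."""
    return arrival_rate / utilization_threshold


def daily_capacity(throughput_per_second, hours_per_day):
    """Requests one replica can serve per day at the configured throughput."""
    return throughput_per_second * hours_per_day * 3600


def throughput_based_replicas(daily_requests, capacity_per_replica, minimum=3):
    """ASSUMPTION: baseline count = requests / per-replica capacity, rounded up.

    The floor of 3 mirrors the HDFS default of 3 replicas per data block.
    """
    return max(minimum, math.ceil(daily_requests / capacity_per_replica))


capacity = daily_capacity(100, 11)             # embodiment settings: 100/s, 11 h per day
print(capacity)                                # 3960000
print(service_rate(arrival_rate=60.0, utilization_threshold=0.5))   # 120.0
print(throughput_based_replicas(20_000_000, capacity))              # 6
```

With a utilization threshold of 0.5, each server is sized to serve requests at twice the rate at which they arrive, which leaves headroom for bursts.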
Finally, it should be noted that the embodiment above only illustrates the technical solution of the invention and does not limit it. Although the invention has been described in detail with reference to the foregoing embodiment, a person of ordinary skill in the art should understand that the technical solution recorded in the foregoing embodiment can still be modified, or some or all of its technical features can be replaced by equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solution to depart from the scope defined by the claims of the invention.
Claims (5)
1. A method for dynamically calculating the number of HDFS replicas based on file access heat, characterized by comprising the following steps:
Step 1: from the file access log table of the distributed file system HDFS, computing the access heat of each file in each statistics period using the file access heat formula; sorting the files in descending order of their total access heat over the statistics time; selecting the top 20% of the sorted list as hot files; and building a hot file access heat sequence as the sequence to be predicted;
Step 2: partitioning the state space of the hot file access heat sequence using a hierarchical clustering algorithm;
Step 3: testing the hot file access heat sequence, whose state space has been partitioned, for the Markov property; if the sequence has the Markov property, using it as the input sequence of the improved Markov model; otherwise the sequence cannot be handled by the improved Markov model;
Step 4: using the hot file access heat sequence that has the Markov property as the input sequence of the improved Markov model, predicting the access heat of the hot file at the next time step, and writing the predicted access heat to the hot file access heat database table;
Step 5: modeling replica access requests with the M/M/r single-queue, multi-server queueing model and, on this basis, setting the throughput of a replica on a node to determine the number of replicas.
2. The method for dynamically calculating the number of HDFS replicas based on file access heat according to claim 1, characterized in that the file access heat formula in step 1 is as follows:
[formula not reproduced in this text]
where Hot(f) is the access heat of file f; AF(f) = N/T is the access frequency of file f, with N the number of accesses to f within the statistics period T; f_size is the size of file f; and the number of data blocks of f is obtained from f_size and the data block size of f using the floor operation (the greatest integer not exceeding its argument).
3. The method for dynamically calculating the number of HDFS replicas based on file access heat according to claim 1, characterized in that step 2 is performed as follows:
forming a data set of length N from the hot file access heat sequence, each object in the data set being the access heat of the hot file at a different time, and performing hierarchical clustering on this data set as follows:
(1) treating each object in the data set as its own class, giving N classes, where the distance between two classes is the median of the squared pairwise distances between the data points of the two classes;
(2) merging the two closest classes into one class, reducing the total number of classes by one;
(3) recomputing the distances between the new class and the other classes;
(4) repeating steps (2) and (3) until all data objects in the data set have been merged into a single class;
these steps yielding the clustering tree of the hot file access heat sequence, the state space of the Markov model being defined according to the structure of the clustering tree.
4. The method for dynamically calculating the number of HDFS replicas based on file access heat according to claim 1, characterized in that step 4 is performed as follows:
Step 4.1: based on the partitioned state space, computing the one-step state transition probability matrix P from the file access heat sequence;
Step 4.2: taking the state corresponding to the file's current access heat value as the initial state distribution, denoted p(0), and computing the state probability distribution at the next time step as p(1) = p(0)P;
Step 4.3: taking the state with the largest probability in p(1) as the state at the next time step, and taking the sum of the standard deviation of the hot file access heat sequence and the mean of the target state as the predicted access heat value for the next time step;
Step 4.4: removing the first value of the input sequence, appending the newly predicted access heat value as the last value of the sequence to be predicted, and repeating the steps above to predict the access heat of the hot file at subsequent time steps.
5. The method for dynamically calculating the number of HDFS replicas based on file access heat according to claim 1, characterized in that step 5 is performed as follows:
Step 5.1: querying the hot file access heat database table to obtain the average request rate λ for accesses to replicas of the specified hot file in the next statistics period;
Step 5.2: setting the CPU utilization threshold U of the server that hosts a replica; by the utilization law, CPU utilization equals the request arrival rate divided by the service rate, so the request service rate μ of a single server is:
$\mu = \lambda / U$
Step 5.3: setting the total cluster throughput constraint Q; by Little's law from queueing theory, the service sojourn time equals the reciprocal of the service rate multiplied by [term not reproduced in this text], and the throughput equals the reciprocal of the service sojourn time; in a homogeneous cluster the servers hosting the replicas have the same service rate, so the replica count r is calculated from the following two formulas:
[formulas not reproduced in this text]
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810228575.7A CN108416054B (en) | 2018-03-20 | 2018-03-20 | Method for calculating number of copies of dynamic HDFS (Hadoop distributed File System) based on file access heat |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108416054A true CN108416054A (en) | 2018-08-17 |
CN108416054B CN108416054B (en) | 2021-10-22 |
Family
ID=63132988
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810228575.7A Expired - Fee Related CN108416054B (en) | 2018-03-20 | 2018-03-20 | Method for calculating number of copies of dynamic HDFS (Hadoop distributed File System) based on file access heat |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108416054B (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112689166A (en) * | 2020-12-18 | 2021-04-20 | 武汉市烽视威科技有限公司 | Method and system for flexibly increasing and decreasing CDN hot content in real time |
CN113391765A (en) * | 2021-06-22 | 2021-09-14 | 中国工商银行股份有限公司 | Data storage method, device, equipment and medium based on distributed storage system |
CN115033187A (en) * | 2022-08-10 | 2022-09-09 | 蓝深远望科技股份有限公司 | Big data based analysis management method |
CN115544377A (en) * | 2022-11-25 | 2022-12-30 | 浙江星汉信息技术股份有限公司 | Cloud storage-based file heat evaluation and updating method |
CN116600015A (en) * | 2023-07-18 | 2023-08-15 | 湖南快乐阳光互动娱乐传媒有限公司 | Resource node adjustment method, system, electronic equipment and readable storage medium |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103150347A (en) * | 2013-02-07 | 2013-06-12 | 浙江大学 | Dynamic replica management method based on file heat |
CN105574153A (en) * | 2015-12-16 | 2016-05-11 | 南京信息工程大学 | Replica placement method based on file heat analysis and K-means |
CN107632994A (en) * | 2016-07-19 | 2018-01-26 | 普天信息技术有限公司 | Reliability enhancement method and system based on the HDFS file system |
CN107770259A (en) * | 2017-09-30 | 2018-03-06 | 武汉理工大学 | Dynamic replica count adjustment method based on file heat and node load |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112689166A (en) * | 2020-12-18 | 2021-04-20 | 武汉市烽视威科技有限公司 | Method and system for flexibly increasing and decreasing CDN hot content in real time |
CN113391765A (en) * | 2021-06-22 | 2021-09-14 | 中国工商银行股份有限公司 | Data storage method, device, equipment and medium based on distributed storage system |
CN115033187A (en) * | 2022-08-10 | 2022-09-09 | 蓝深远望科技股份有限公司 | Big data based analysis management method |
CN115033187B (en) * | 2022-08-10 | 2022-11-08 | 蓝深远望科技股份有限公司 | Big data based analysis management method |
CN115544377A (en) * | 2022-11-25 | 2022-12-30 | 浙江星汉信息技术股份有限公司 | Cloud storage-based file heat evaluation and updating method |
CN116600015A (en) * | 2023-07-18 | 2023-08-15 | 湖南快乐阳光互动娱乐传媒有限公司 | Resource node adjustment method, system, electronic equipment and readable storage medium |
CN116600015B (en) * | 2023-07-18 | 2023-10-10 | 湖南快乐阳光互动娱乐传媒有限公司 | Resource node adjustment method, system, electronic equipment and readable storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN108416054B (en) | 2021-10-22 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN108416054A (en) | Dynamic HDFS copy number calculating methods based on file access temperature | |
Ebadi et al. | An energy‐aware method for data replication in the cloud environments using a tabu search and particle swarm optimization algorithm | |
Venkataraman et al. | The power of choice in Data-Aware cluster scheduling | |
Yu et al. | Location-aware associated data placement for geo-distributed data-intensive applications | |
Tirado et al. | Predictive data grouping and placement for cloud-based elastic server infrastructures | |
Xie et al. | Pandas: robust locality-aware scheduling with stochastic delay optimality | |
Xie et al. | Kraken: memory-efficient continual learning for large-scale real-time recommendations | |
Liu et al. | Scalable and adaptive data replica placement for geo-distributed cloud storages | |
Wang et al. | Hybrid pulling/pushing for i/o-efficient distributed and iterative graph computing | |
Khan et al. | Optimizing hadoop parameter settings with gene expression programming guided PSO | |
Abad et al. | Generating request streams on Big Data using clustered renewal processes | |
Zhang et al. | EB-BFT: An elastic batched BFT consensus protocol in blockchain | |
Myint et al. | Comparative analysis of adaptive file replication algorithms for cloud data storage | |
Zhang et al. | NADE: nodes performance awareness and accurate distance evaluation for degraded read in heterogeneous distributed erasure code-based storage | |
Soosai et al. | Dynamic replica replacement strategy in data grid | |
KR101718739B1 (en) | System and Method for Replicating Dynamic Data for Heterogeneous Hadoop | |
KR20160044623A (en) | Load Balancing Method for a Linux Virtual Server | |
Qin et al. | Fault tolerant storage and data access optimization in data center networks | |
Perozzi et al. | Scalable graph clustering with parallel approximate PageRank | |
Luo et al. | Superset: a non-uniform replica placement strategy towards perfect load balance and fine-grained power proportionality | |
Jian et al. | A HDFS dynamic load balancing strategy using improved niche PSO algorithm in cloud storage | |
Jiang et al. | Modeling and Analyzing for Data Durability Towards Cloud Storage Services | |
Li et al. | MonickerHash: A Decentralized Load-Balancing Algorithm for Resource/Traffic Distribution | |
Fan et al. | Latency-Aware Data Placements for Operational Cost Minimization of Distributed Data Centers | |
Ibrahim | Improvement of Data-Intensive Applications Running on Cloud Computing Clusters |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20211022 |