CN108416054B - Method for calculating number of copies of dynamic HDFS (Hadoop distributed File System) based on file access heat - Google Patents
- Publication number
- CN108416054B (application CN201810228575.7A)
- Authority
- CN
- China
- Prior art keywords
- file
- access heat
- access
- sequence
- copies
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/13—File access structures, e.g. distributed indices
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/17—Details of further file system functions
- G06F16/1734—Details of monitoring file system events, e.g. by the use of hooks, filter drivers, logs
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/18—File system types
- G06F16/182—Distributed file systems
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention provides a method for dynamically calculating the number of copies in HDFS (Hadoop Distributed File System) based on file access heat, and relates to the technical field of data analysis. The method first uses an improved Markov model to analyze how the access heat of hot files changes over time, and predicts the access heat of each file from the file access heat formula. It then derives a formula for the number of copies from queuing theory and dynamically adjusts the number of copies of each hot file. The method relieves the access bottleneck on hot files and improves the service efficiency of the cluster.
Description
Technical Field
The invention relates to the technical field of data analysis, in particular to a method for calculating the number of copies of a dynamic HDFS (Hadoop distributed File System) based on file access heat.
Background
With the development of modern Internet technology, data characterized by high volume, variety, velocity and veracity has permeated every industry and field of social development. As the volume of data keeps growing, managing data and resources sensibly and guaranteeing data reliability have become key problems facing the cloud computing field.
Hadoop, the distributed system infrastructure developed by the Apache Foundation, implements a distributed file system, the Hadoop Distributed File System (HDFS). HDFS is highly fault tolerant and is designed for deployment on low-cost hardware; it provides high-throughput access to application data and suits applications with very large data sets. HDFS relaxes some POSIX requirements so that data in the file system can be accessed as a stream. Under the copy management mechanism of HDFS, a cluster stores 3 copies of each data block of a file by default. This fixed number cannot meet the differing access requirements of different users for different files: when the number of accesses to a file increases, the default number of copies of its data blocks cannot serve the volume of requests, which creates an access bottleneck on hot files. Copy management methods are therefore gradually shifting from static copy creation policies to dynamic ones, so that the cluster can maintain its overall performance, or keep serving clients efficiently, when the external environment changes. Existing dynamic copy creation policies nevertheless ignore some factors that significantly affect the working efficiency of the cluster.
In the prior art, the document "high-efficiency multi-copy management research in cloud environment" proposes a dynamic copy creation method aimed at guaranteeing cost effectiveness in large-scale cloud storage systems. It considers the relationship between the number of copies and availability, that is, it adjusts the number of copies subject to the availability of the cloud storage system, but it does not consider the relationship between file access heat and the number of copies. The document "An Elastic Replication Management System for HDFS" proposes an active/standby storage model for flexible management of HDFS copies. That method uses a complex transaction engine to identify the volume of data accessed in real time, dynamically adjusts the number of copies, and introduces erasure codes to manage the number of copies. Although the system effectively improves HDFS performance, it is complex to implement, and identifying real-time access data is itself costly.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a method for calculating the number of copies of a dynamic HDFS based on the file access heat, which is used for calculating the number of the dynamic copies.
A method for calculating the number of copies of a dynamic HDFS based on file access heat comprises the following steps:
the calculation formula of the file access heat is shown as the following formula (formula image not reproduced):
where hot(f) represents the access heat of the file f, af(f) represents the access frequency of the file f, N represents the number of accesses of the file f within the statistical period T, f_size represents the size of the file f, ⌊x⌋ denotes the largest integer not more than x, and the number of data blocks of the file f is obtained from f_size and the data block size of the file f;
forming a data set with the length of N by the hotspot file-access heat sequence, wherein objects in the data set represent the access heat of the hotspot files at different moments, and the process of hierarchically clustering the hotspot file-access heat data set comprises the following steps:
(1) regarding each object in the data set as a class, and obtaining N classes in total, wherein the distance between the classes is the middle value of the square of the distance between every two data points in the two classes;
(2) merging two classes with the nearest distance into one class, so that the total number of the classes is reduced by one;
(3) recalculating distances between the new class and other classes;
(4) repeating the steps (2) - (3) until all data objects in the data set are finally merged into one class;
based on the steps, obtaining a clustering tree of the hotspot file-access heat sequence, and defining a Markov division state space according to the clustering tree structure;
step 4.1: calculating to obtain a one-step state transition probability matrix P according to the file-access heat sequence based on the divided state space;
step 4.2: taking the state corresponding to the file access heat value at the current moment as the initial state distribution, denoted p(0), and calculating the state probability distribution at the next moment, p(1) = p(0)P, from the one-step state transition probability matrix P;
step 4.3: taking the state with the maximum probability in the state probability distribution p(1) as the state at the next moment, and taking the sum of the standard deviation of the hotspot file-access heat sequence and the average value of the target state space as the predicted access heat value at the next moment;
step 4.4: removing the first value of the input sequence, and appending the newly predicted access heat value to the sequence to be predicted as its last value for the next prediction;
step 4.5: repeating the steps 4.1-4.4, and predicting the access heat of the hot spot file at the next moment;
step 5.1: obtaining the average access request rate λ of the copies of the specified hotspot file in the next statistical period by querying the hotspot file-access heat database table;
step 5.2: setting a CPU utilization threshold U for the server where the copy is located; according to the utilization law, CPU utilization equals the request arrival rate divided by the service rate, so the request service rate μ of a single server is calculated using the following formula:
$$\mu = \frac{\lambda}{U}$$
step 5.3: setting the total throughput constraint of the cluster as Q; based on Little's formula in queuing theory, the service residence time is expressed in terms of the service rate (expression not reproduced), and throughput equals the reciprocal of the service residence time; in a homogeneous cluster environment, the servers holding the copies all have the same service rate, so the number of copies r is calculated by the following two formulas (formula images not reproduced):
according to the above technical scheme, the beneficial effects of the invention are as follows: the method predicts file access heat with an improved Markov model, which improves prediction accuracy. The queuing-theory-based calculation of the number of copies takes into account how the access heat of hot files changes over time, and dynamically adjusts the number of copies to cope with highly concurrent access to hot files. Using queuing theory, the copies stored on the nodes are treated as service resources, and the request rate and response rate of the hot file copies are analyzed with the goal of guaranteeing cluster throughput and reliability. The number of copies is then obtained from the copy calculation formula, which lays the foundation for subsequent dynamic adjustment of the number of copies.
Drawings
FIG. 1 is a flowchart of a method for calculating the number of copies of a dynamic HDFS based on file access heat according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of the process of predicting the access heat of a hot file at the next moment using an improved Markov model according to an embodiment of the present invention;
FIG. 3 is a graph comparing the predicted values of the Markov model and of the improved Markov model with the true values according to an embodiment of the present invention;
FIG. 4 is a graph comparing the number of copies calculated based on queuing theory with the number of copies calculated based on actual copy throughput according to an embodiment of the present invention.
Detailed Description
The following detailed description of embodiments of the present invention is provided in connection with the accompanying drawings and examples. The following examples are intended to illustrate the invention but are not intended to limit the scope of the invention.
In this embodiment, 3 racks are built with 4 virtual machines configured on each rack, and three additional physical machines are set up. Two of them serve as the NameNode in the active state and the NameNode in the standby state, to prevent a single point of failure of the NameNode. The third physical machine serves as a computing node for collecting the file access log, predicting the access heat of files and calculating the number of copies. The cluster configuration is: Hadoop version hadoop-2.2.0; memory 32 GB; CPU Intel(R) Core(TM) i3-2120 CPU @ 3.30GHz; operating system CentOS 6.7; hard disk 2 TB; development languages Java, R and Matlab.
The method for calculating the number of the copies of the dynamic HDFS based on the file access heat comprises the following steps as shown in FIG. 1:
the calculation formula of the file access heat is shown as the following formula (formula image not reproduced):
where hot(f) represents the access heat of the file f, af(f) represents the access frequency of the file f, N represents the number of accesses of the file f within the statistical period T, f_size represents the size of the file f, ⌊x⌋ denotes the largest integer not more than x, and the number of data blocks of the file f is obtained from f_size and the data block size of the file f.
In this embodiment, 5 days are taken as a statistical period, and the access frequency of the flu.txt file in 5 periods is counted. The access heat information of flu.txt in the statistical period is obtained by calculation according to the file access log table and the calculation formula of the access heat and is shown in table 1.
Table 1 Access heat information for flu.txt
Time of access | 2017-08-01 | 2017-08-02 | 2017-08-03 | 2017-08-04 | 2017-08-05 |
Access heat | 262 | 486 | 632 | 300 | 570 |
Time of access | 2017-08-06 | ... | ... | 2017-10-02 | ... |
Access heat | 401 | ... | ... | 382 | ... |
forming a data set with the length of N by the hotspot file-access heat sequence, wherein objects in the data set represent the access heat of the hotspot files at different moments, and the process of hierarchically clustering the hotspot file-access heat data set comprises the following steps:
(1) regarding each object in the data set as a class, and obtaining N classes in total, wherein the distance between the classes is the middle value of the square of the distance between every two data points in the two classes;
(2) merging two classes with the nearest distance into one class, so that the total number of the classes is reduced by one;
(3) recalculating distances between the new class and other classes;
(4) repeating the steps (2) - (3) until all data objects in the data set are finally merged into one class;
based on the steps, a cluster tree of the hotspot file-access heat sequence is obtained, and a Markov division state space is defined according to the cluster tree structure.
In this embodiment, the hierarchical clustering method is used to divide the historical access heat data into 5 states, which are labeled A, B, C, D and E.
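The clustering steps (1) to (4) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the embodiment's own code: the function names `class_distance` and `cluster_states` are hypothetical, the "middle value" of the squared pairwise distances is read here as the median (an average-linkage reading would swap in `statistics.mean`), and the merge loop stops once k classes remain, which corresponds to cutting the cluster tree at that level instead of merging down to a single class.

```python
from itertools import product
from statistics import median


def class_distance(a, b):
    """Median of the squared distances between every cross-class pair of points.

    Assumption: "middle value" in the text is interpreted as the median.
    """
    return median((x - y) ** 2 for x, y in product(a, b))


def cluster_states(values, k):
    """Agglomerative clustering of 1-D access heat values into k classes."""
    # Step (1): every object starts as its own class.
    classes = [[v] for v in values]
    while len(classes) > k:
        # Step (2): find the two nearest classes and merge them.
        pairs = ((i, j) for i in range(len(classes))
                 for j in range(i + 1, len(classes)))
        i, j = min(pairs, key=lambda ij: class_distance(classes[ij[0]],
                                                        classes[ij[1]]))
        classes[i] = classes[i] + classes[j]
        del classes[j]
        # Step (3) is implicit: distances are recomputed on the next pass.
    return sorted(sorted(c) for c in classes)


# The seven access heat values that are legible in Table 1, cut at 3 classes.
print(cluster_states([262, 486, 632, 300, 570, 401, 382], 3))
```

Each returned class can then be treated as one Markov state; the embodiment uses 5 states on its full data set.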
the specific method for the Markov property test comprises the following steps:
for an index sequence X_n = {x_1, x_2, ..., x_n} with n possible states, the sum of the jth column of the transition frequency matrix divided by the sum over all rows and columns is called the marginal probability, as shown in the following formula:
$$p_{\cdot j} = \frac{\sum_{i=1}^{n} f_{ij}}{\sum_{i=1}^{n}\sum_{k=1}^{n} f_{ik}}$$
where f_ij denotes the frequency with which the index sequence X_n = {x_1, x_2, ..., x_n} moves from state i to state j in one step, i, j ∈ E;
then the statistic
$$\chi^2 = 2\sum_{i=1}^{n}\sum_{j=1}^{n} f_{ij}\left|\ln\frac{p_{ij}}{p_{\cdot j}}\right|$$
has the χ² distribution with (n−1)² degrees of freedom as its limiting distribution, where p_ij = f_ij / Σ_k f_ik is the one-step transition probability;
given a significance level α, if χ² > χ²_α((n−1)²), the sequence X_n satisfies the Markov property; otherwise the sequence cannot be processed using a Markov model.
In this embodiment, processing in the R language yields the one-step transition frequency matrix f_ij and the transition probability matrix p_ij (matrices not reproduced), and the marginal probabilities p_·j shown in Table 2.
Table 2 Marginal probability table
State | 1 | 2 | 3 | 4 | 5 |
p·j | 0.17021277 | 0.42553191 | 0.17021277 | 0.08510638 | 0.14893617 |
The χ² statistic is calculated from the above values, giving the χ² statistic calculation table shown in Table 3.
Table 3 χ² statistic calculation table (table not reproduced)
In this example, the significance level α is 0.1, and the quantile χ²_α((n−1)²) is obtained with n = 5. The historical access heat of the file therefore satisfies the Markov property, and the access heat of the file can be predicted using a Markov model.
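As an illustration of the Markov property test, the following Python sketch computes the transition frequency matrix, the marginal probabilities and the χ² statistic from a sequence of state labels. The function names and the integer coding of states (0 to n−1) are assumptions made for this example. The statistic is taken in its standard form 2 Σ f_ij |ln(p_ij / p_·j)|, which is also an assumption of this sketch, and cells with zero frequency are skipped.

```python
import math


def transition_counts(states, n):
    """One-step transition frequency matrix f[i][j] for states coded 0..n-1."""
    f = [[0] * n for _ in range(n)]
    for a, b in zip(states, states[1:]):
        f[a][b] += 1
    return f


def marginal_probabilities(f):
    """Column sums of f divided by the total number of transitions."""
    n = len(f)
    total = sum(sum(row) for row in f)
    return [sum(f[i][j] for i in range(n)) / total for j in range(n)]


def markov_chi2(states, n):
    """Chi-square statistic of the Markov property test (assumed standard form)."""
    f = transition_counts(states, n)
    p_col = marginal_probabilities(f)
    chi2 = 0.0
    for i in range(n):
        row_total = sum(f[i])
        for j in range(n):
            if f[i][j] == 0:
                continue  # zero-frequency cells contribute nothing
            p_ij = f[i][j] / row_total
            chi2 += 2 * f[i][j] * abs(math.log(p_ij / p_col[j]))
    return chi2


# Toy two-state sequence; the result is compared with a tabulated critical value.
print(markov_chi2([0, 1, 0, 1, 0, 1], 2))
```

For α = 0.1 and n = 5 states, the degrees of freedom are (5−1)² = 16, for which the tabulated upper critical value is approximately 23.54; a statistic above that value passes the test.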
step 4.1: calculating to obtain a one-step state transition probability matrix P according to the file-access heat sequence based on the divided state space;
step 4.2: taking the state corresponding to the file access heat value at the current moment as the initial state distribution, denoted p(0), and calculating the state probability distribution at the next moment, p(1) = p(0)P, from the one-step state transition probability matrix P;
step 4.3: taking the state with the maximum probability in the state probability distribution p(1) as the state at the next moment, and taking the sum of the standard deviation of the hotspot file-access heat sequence and the average value of the target state space as the predicted access heat value at the next moment;
step 4.4: removing the first value of the input sequence, and appending the newly predicted access heat value to the sequence to be predicted as its last value for the next prediction;
step 4.5: and repeating the steps 4.1-4.4, and predicting the access heat of the hotspot file at the next moment.
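Steps 4.1 to 4.5 can be sketched in Python as follows. This is a hedged illustration, not the embodiment's implementation: the function names are hypothetical, states are assumed to be coded 0 to n−1, the population standard deviation (`statistics.pstdev`) is assumed for "standard deviation", a state with no observed outgoing transition is assumed to stay where it is, and each predicted value is assumed to carry the state it was predicted for.

```python
from statistics import mean, pstdev


def transition_matrix(states, n):
    """Step 4.1: one-step state transition probability matrix P."""
    counts = [[0] * n for _ in range(n)]
    for a, b in zip(states, states[1:]):
        counts[a][b] += 1
    matrix = []
    for i, row in enumerate(counts):
        total = sum(row)
        if total == 0:
            # Assumption: an unobserved state is treated as staying put.
            matrix.append([1.0 if j == i else 0.0 for j in range(n)])
        else:
            matrix.append([c / total for c in row])
    return matrix


def predict_next(values, states, n):
    """Steps 4.2 and 4.3: most probable next state and predicted access heat."""
    P = transition_matrix(states, n)
    # p(0) is one-hot on the current state, so p(1) = p(0)P is that row of P.
    p1 = P[states[-1]]
    nxt = max(range(n), key=lambda j: p1[j])
    state_mean = mean(v for v, s in zip(values, states) if s == nxt)
    return nxt, pstdev(values) + state_mean


def rolling_forecast(values, states, n, steps):
    """Steps 4.4 and 4.5: slide the window forward and predict again."""
    values, states = list(values), list(states)
    forecasts = []
    for _ in range(steps):
        nxt, heat = predict_next(values, states, n)
        forecasts.append(heat)
        values = values[1:] + [heat]   # drop the oldest value, append the new one
        states = states[1:] + [nxt]
    return forecasts


print(rolling_forecast([10, 20, 10, 20, 10], [0, 1, 0, 1, 0], 2, 3))
```

Because the window is refreshed after every prediction, the transition matrix is re-estimated at each step, which is the point of difference from a Markov model whose matrix is fixed once.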
In this embodiment, to verify the prediction accuracy of the method, the access heat of flu.txt over 5 periods is predicted with both the improved and the unimproved Markov models and the results are compared. The predicted values of the Markov model, the predicted values of the improved Markov model and the true values are compared in FIG. 3. As the figure shows, when the access heat at the next moment of the first period is predicted, both models work from the same access heat sequence, so their predictions deviate from the actual access heat by the same amount, and neither differs much from the actual value. When the access heat at later moments is predicted, however, the improved Markov model uses a continuously updated sequence to be predicted, so its predictions deviate little from the actual values, whereas the deviation of the unimproved Markov model grows with the number of predictions because of the ergodicity and stationary distribution of the Markov model. The results show that the access heat predicted by the improved Markov model is relatively close to the actual value and follows the access trend of the hotspot file well.
step 5.1: obtaining the average access request rate λ of the copies of the specified hotspot file in the next statistical period by querying the hotspot file-access heat database table;
step 5.2: setting a CPU utilization threshold U for the server where the copy is located; according to the utilization law, CPU utilization equals the request arrival rate divided by the service rate, so the request service rate μ of a single server is calculated using the following formula:
$$\mu = \frac{\lambda}{U}$$
step 5.3: setting the total throughput constraint of the cluster as Q; based on Little's formula in queuing theory, the service residence time is expressed in terms of the service rate (expression not reproduced), and throughput equals the reciprocal of the service residence time; in a homogeneous cluster environment, the servers holding the copies all have the same service rate, so the number of copies r is calculated by the following two formulas (formula images not reproduced):
in this embodiment, after the access heat of the hotspot file is predicted by the improved Markov model, the node CPU utilization threshold is set to 0.5 and the throughput of a copy on a node is set to 100/s, so that with 11 hours of access per day the daily throughput is 100 × 11 × 3600 = 3,960,000, or roughly 4 million. The number of copies calculated based on queuing theory is compared with the number of copies calculated from the actual copy throughput in FIG. 4. The figure shows that, by taking the request rate and response rate of the copies into account, the method adjusts the number of copies to follow the trend of the access heat. In the first period the access heat of the hotspot file is falling, and the number of copies obtained from queuing theory is less than the actual number of copies. In subsequent periods the number of copies is dynamically adjusted to the trend of the access heat and is closer to the number calculated from actual throughput, which verifies the effectiveness of the method.
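The arithmetic of step 5.2 and of this embodiment can be checked with a short Python sketch. The function names are hypothetical, and the arrival rate of 50 requests per second in the example is an illustrative value, not one taken from the embodiment. The sketch covers only the service rate relation μ = λ/U and the daily throughput figure; it does not attempt the two formulas for the number of copies r, which are not reproduced in this text.

```python
def service_rate(arrival_rate, cpu_threshold):
    """Step 5.2: utilization U = arrival rate / service rate, so mu = lambda / U."""
    if not 0 < cpu_threshold <= 1:
        raise ValueError("CPU utilization threshold must be in (0, 1]")
    return arrival_rate / cpu_threshold


def daily_throughput(per_second, hours_per_day):
    """Requests one copy can serve in a day at a given per-second throughput."""
    return per_second * hours_per_day * 3600


# Hypothetical arrival rate of 50 requests/s with the embodiment's U = 0.5.
print(service_rate(50, 0.5))

# The embodiment's settings: 100 requests/s over 11 hours of access per day.
print(daily_throughput(100, 11))
```

The second call gives 3,960,000, which is the figure the embodiment rounds to about 4 million.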
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; such modifications and substitutions do not depart from the spirit of the corresponding technical solutions and scope of the present invention as defined in the appended claims.
Claims (1)
1. A method for calculating the number of copies of a dynamic HDFS based on file access heat, characterized by comprising the following steps:
step 1, calculating according to a file access log table on a distributed file system HDFS and a calculation formula of file access heat to obtain access heat of each file in a statistical period, sorting the files in a descending order according to the sum of the access heat of the files in statistical time, selecting the first 20% of the files in the sorted list as hot files, and constructing a hot file-access heat sequence as a sequence to be predicted;
step 2, performing state space division on the hotspot file-access heat sequence by adopting a hierarchical clustering algorithm;
step 3, conducting Markov test on the hot spot file-access heat sequence divided into the state space, if the Markov test is satisfied, using the sequence as an input sequence of the improved Markov model, otherwise, the sequence can not be processed by the improved Markov model;
step 4, taking the hot file-access heat sequence meeting Markov property as an input sequence of an improved Markov model, predicting the access heat of the hot file at the next moment, and writing the predicted access heat into a hot file-access heat database table;
step 5, modeling the copy access requests with the M/M/r single-queue, multi-server queuing model, and setting the throughput of the copies on a node to determine the number of copies;
the calculation formula of the file access heat in the step 1 is shown as the following formula (formula image not reproduced):
where hot(f) represents the access heat of the file f, af(f) represents the access frequency of the file f, N represents the number of accesses of the file f within the statistical period T, f_size represents the size of the file f, ⌊x⌋ denotes the largest integer not more than x, and the number of data blocks of the file f is obtained from f_size and the data block size of the file f;
the specific method of the step 2 comprises the following steps:
forming a data set with the length of N by the hotspot file-access heat sequence, wherein objects in the data set represent the access heat of the hotspot files at different moments, and the process of hierarchically clustering the hotspot file-access heat data set comprises the following steps:
(1) regarding each object in the data set as a class, and obtaining N classes in total, wherein the distance between the classes is the middle value of the square of the distance between every two data points in the two classes;
(2) merging two classes with the nearest distance into one class, so that the total number of the classes is reduced by one;
(3) recalculating distances between the new class and other classes;
(4) repeating the steps (2) - (3) until all data objects in the data set are finally merged into one class;
based on the steps, obtaining a clustering tree of the hotspot file-access heat sequence, and defining a Markov division state space according to the clustering tree structure;
the specific method of the step 4 comprises the following steps:
step 4.1: calculating to obtain a one-step state transition probability matrix P according to the file-access heat sequence based on the divided state space;
step 4.2: taking the state corresponding to the file access heat value at the current moment as the initial state distribution, denoted p(0), and calculating the state probability distribution at the next moment, p(1) = p(0)P, from the one-step state transition probability matrix P;
step 4.3: taking the state with the maximum probability in the state probability distribution p(1) as the state at the next moment, and taking the sum of the standard deviation of the hotspot file-access heat sequence and the average value of the target state space as the predicted access heat value at the next moment;
step 4.4: removing the first value of the input sequence, and appending the newly predicted access heat value to the sequence to be predicted as its last value for the next prediction;
step 4.5: repeating the steps 4.1-4.4, and predicting the access heat of the hot spot file at the next moment;
the specific method of the step 5 comprises the following steps:
step 5.1: obtaining the average access request rate λ of the copies of the specified hotspot file in the next statistical period by querying the hotspot file-access heat database table;
step 5.2: setting a CPU utilization threshold U for the server where the copy is located; according to the utilization law, CPU utilization equals the request arrival rate divided by the service rate, so the request service rate μ of a single server is calculated using the following formula:
$$\mu = \frac{\lambda}{U}$$
step 5.3: setting the total throughput constraint of the cluster as Q; based on Little's formula in queuing theory, the service residence time is expressed in terms of the service rate (expression not reproduced), and throughput equals the reciprocal of the service residence time; in a homogeneous cluster environment, the servers holding the copies all have the same service rate, so the number of copies r in the HDFS system is calculated by the following two formulas (formula images not reproduced):
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810228575.7A CN108416054B (en) | 2018-03-20 | 2018-03-20 | Method for calculating number of copies of dynamic HDFS (Hadoop distributed File System) based on file access heat |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108416054A (en) | 2018-08-17 |
CN108416054B (en) | 2021-10-22 |
Family
ID=63132988
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810228575.7A Expired - Fee Related CN108416054B (en) | 2018-03-20 | 2018-03-20 | Method for calculating number of copies of dynamic HDFS (Hadoop distributed File System) based on file access heat |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108416054B (en) |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112689166A (en) * | 2020-12-18 | 2021-04-20 | 武汉市烽视威科技有限公司 | Method and system for flexibly increasing and decreasing CDN hot content in real time |
CN113391765A (en) * | 2021-06-22 | 2021-09-14 | 中国工商银行股份有限公司 | Data storage method, device, equipment and medium based on distributed storage system |
CN115033187B (en) * | 2022-08-10 | 2022-11-08 | 蓝深远望科技股份有限公司 | Big data based analysis management method |
CN115544377B (en) * | 2022-11-25 | 2023-04-07 | 浙江星汉信息技术股份有限公司 | Cloud storage-based file heat evaluation and updating method |
CN116600015B (en) * | 2023-07-18 | 2023-10-10 | 湖南快乐阳光互动娱乐传媒有限公司 | Resource node adjustment method, system, electronic equipment and readable storage medium |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103150347A (en) * | 2013-02-07 | 2013-06-12 | 浙江大学 | Dynamic replica management method based on file heat |
CN105574153A (en) * | 2015-12-16 | 2016-05-11 | 南京信息工程大学 | Transcript placement method based on file heat analysis and K-means |
CN107632994A (en) * | 2016-07-19 | 2018-01-26 | 普天信息技术有限公司 | A kind of reliability Enhancement Method and system based on HDFS file system |
CN107770259A (en) * | 2017-09-30 | 2018-03-06 | 武汉理工大学 | Copy amount dynamic adjusting method based on file temperature and node load |
Also Published As
Publication number | Publication date |
---|---|
CN108416054A (en) | 2018-08-17 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20211022 |