CN106446079B - File prefetching/caching method and device based on a distributed file system - Google Patents

File prefetching/caching method and device based on a distributed file system

Info

Publication number
CN106446079B
CN106446079B (application CN201610811562.3A)
Authority
CN
China
Prior art keywords
file
queue
access
time
distributed
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610811562.3A
Other languages
Chinese (zh)
Other versions
CN106446079A (en)
Inventor
邝倍靖
宋莹
王博
孙毓忠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Zhongke Flux Technology Co ltd
Original Assignee
Institute of Computing Technology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Computing Technology of CAS
Priority to CN201610811562.3A
Publication of CN106446079A
Application granted
Publication of CN106446079B
Legal status: Active
Anticipated expiration

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/182 Distributed file systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/17 Details of further file system functions
    • G06F16/172 Caching, prefetching or hoarding of files

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The present invention proposes a file prefetching/caching method and device based on a distributed file system. The method includes: when a client accesses a file in the distributed file system, recording the access time of the file and the file information in an access log, and deciding from the access time and the file information whether to cache the file; extracting from the access log each access time point of the file within a time period TP and obtaining the neighborhood of each access time point; computing, with a clustering algorithm, the degree of association between the file and the files accessed within those neighborhoods; storing the files corresponding to the degrees of association as strings in a queue to be prefetched; computing the pairwise similarity between files in the queue with a cosine similarity algorithm; recombining the files in the queue, compressing each combination in light of the similarities, computing the total degree of association between each combination and the file, and selecting the combination with the largest total degree of association as the group of files to be prefetched.

Description

File prefetching/caching method and device based on a distributed file system
Technical field
The present invention relates to the field of file prefetching and caching, and in particular to a file prefetching/caching method and device based on a distributed file system.
Background technique
At present, with the rapid development of mobile Internet technology, network data is growing explosively. To cope with the many problems posed by access to massive data, distributed storage technology has also developed quickly. The relatively mature distributed file systems today mainly include Google's GFS, the Hadoop Distributed File System (HDFS), Lustre, FastDFS, MooseFS, MogileFS, NFS, and Taobao's TFS. These distributed file systems target different application scenarios, so their characteristics also differ.
The performance differences these distributed file systems show in production appear mainly in the efficiency of reading one or a few files out of a massive file set. Solutions to this problem fall into two camps. One optimizes the distributed file system itself, including its read path, storage mechanism, storage format rules, and metadata handling. The other optimizes file read and storage performance from outside the distributed file system, chiefly through file prefetching and caching techniques. The present invention improves the read performance of a distributed file system through prefetching and caching.
The basic idea of caching is that when a file in the distributed file system is accessed, the file just read is kept in memory rather than discarded, so that a subsequent read can be served directly from memory instead of fetching from the distributed file system again; this greatly improves the responsiveness of the whole distributed file system. However, when very many files are read, not all accessed files can be kept in the cache, because memory is limited. Various eviction policies exist to match, including the LRU and LIRS algorithms. LRU, the least-recently-used page replacement algorithm, replaces the page used least recently. LIRS uses the distance between two accesses to the same file (the number of distinct other files accessed in between) as a metric to dynamically rank accessed files and choose eviction victims. Because the number of files in a distributed file system is very large, the cache hit rates these policies achieve are relatively low and cannot satisfy certain demands.
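The LRU policy mentioned above can be sketched in a few lines. This is a generic textbook illustration of least-recently-used replacement, not part of the invention; the class name and the capacity of 3 are arbitrary choices for the example:

```python
from collections import OrderedDict

class LRUCache:
    """Minimal least-recently-used cache sketch (background illustration only)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.store = OrderedDict()  # insertion order tracks recency

    def get(self, key):
        if key not in self.store:
            return None
        self.store.move_to_end(key)  # mark as most recently used
        return self.store[key]

    def put(self, key, value):
        if key in self.store:
            self.store.move_to_end(key)
        self.store[key] = value
        if len(self.store) > self.capacity:
            self.store.popitem(last=False)  # evict least recently used

cache = LRUCache(3)
for name in ["a", "b", "c"]:
    cache.put(name, name.upper())
cache.get("a")          # "a" becomes most recently used
cache.put("d", "D")     # evicts "b", the least recently used
print(sorted(cache.store))  # ['a', 'c', 'd']
```

As the background notes, such recency-only policies struggle when the file population is huge, which motivates the association-based approach below.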
Prefetching generally uses the following techniques. Access-frequency-based prefetch models observe that Web access follows certain rules and that user preferences are historical and relatively concentrated, and therefore predict the files that will be accessed from group interest and access behavior. Data-mining-based prefetch models use data mining to extract users' interest association rules as the basis for prefetching the pages the user will access. Popularity-based prefetch models periodically count page access frequencies, select the most-accessed pages to form a popular page set, and then, according to the volume of a client's recent requests, prefetch from the popular page set on each server a volume of pages equivalent to the user's recent requests, placing them in cache or delivering them directly to the user; Zipf's first and second laws have been used to model access popularity and to propose Web-popularity-based prefetch models. These prefetching techniques are used mainly for Web pages; prefetching of files on distributed file systems is far rarer.
The above prediction methods are seldom applied to distributed file systems. Patent CN 104933110 A, however, proposes a MapReduce-based data prefetching method: it predicts the data-block workload of each compute node through capacity assessment and uses a series of calculations to assess on which compute node a non-localized task will appear, so that the task's data can be prefetched to the compute node's local storage before the node even applies to handle the task. The compute node thus never waits on data, which greatly improves the efficiency of executing MapReduce tasks, but this prefetch rule does not suit non-MapReduce scenarios.
Summary of the invention
In view of the deficiencies of the prior art, the present invention proposes a file prefetching/caching method and device based on a distributed file system.
The present invention proposes a file prefetching/caching method based on a distributed file system, comprising:
a file caching step: when a client accesses a file in the distributed file system, recording the access time of the file and the file information in an access log, and deciding from the access time and the file information whether to cache the file;
a file prefetching step: extracting from the access log each access time point of the file within a time period TP; obtaining the neighborhood of each access time point; computing, with a clustering algorithm, the degree of association between the file and the files accessed within the neighborhoods; storing the files corresponding to the degrees of association as strings in a queue to be prefetched; computing the pairwise similarity between files in the queue with a cosine similarity algorithm; recombining the files in the queue; compressing each combination in light of the similarities; computing the total degree of association between each combination and the file; and selecting the combination with the largest total degree of association as the group of files to be prefetched.
The file caching step includes setting a threshold N on the number of times the file is accessed, and determining from the access times the number num of accesses to the file within a current time window T. If num is greater than the threshold N, the file is cached into the distributed cache; otherwise the file is not cached.
The neighborhood is the period obtained by taking a period TN on each side of a time point tt.
The file prefetching step further includes compressing the group of files to be prefetched and prefetching the compressed files into the distributed cache.
The method further includes a file eviction step, comprising:
Step 31: taking from a queue Qf the file with the fewest accesses within a time window t, where Qf is the queue in the distributed cache ordered by number of accesses within the window t;
Step 32: taking the middle position of a queue Qt as the critical rejection point, locating in Qt the file with the fewest accesses within the window t and determining its position in Qt; if that file lies beyond the critical rejection point, and its earliest access time is earlier than the earliest access time of the file at the critical rejection point, evicting it; otherwise executing the file caching step, where Qt is the queue in the distributed cache ordered by access time;
Step 33: taking from Qf the files whose number of accesses within the window t is less than a threshold M, and executing Step 32.
The present invention also proposes a file prefetching/caching system based on a distributed file system, comprising:
a file caching module for, when a client accesses a file in the distributed file system, recording the access time of the file and the file information in an access log, and deciding from the access time and the file information whether to cache the file;
a file prefetching module for extracting from the access log each access time point of the file within a time period TP; obtaining the neighborhood of each access time point; computing, with a clustering algorithm, the degree of association between the file and the files accessed within the neighborhoods; storing the files corresponding to the degrees of association as strings in a queue to be prefetched; computing the pairwise similarity between files in the queue with a cosine similarity algorithm; recombining the files in the queue; compressing each combination in light of the similarities; computing the total degree of association between each combination and the file; and selecting the combination with the largest total degree of association as the group of files to be prefetched.
The file caching module sets a threshold N on the number of times the file is accessed and determines from the access times the number num of accesses to the file within a current time window T. If num is greater than the threshold N, the file is cached into the distributed cache; otherwise the file is not cached.
The neighborhood is the period obtained by taking a period TN on each side of a time point tt.
The file prefetching module further compresses the group of files to be prefetched and prefetches the compressed files into the distributed cache.
The system further includes a file eviction module, comprising:
Step 31: taking from a queue Qf the file with the fewest accesses within a time window t, where Qf is the queue in the distributed cache ordered by number of accesses within the window t;
Step 32: taking the middle position of a queue Qt as the critical rejection point, locating in Qt the file with the fewest accesses within the window t and determining its position in Qt; if that file lies beyond the critical rejection point, and its earliest access time is earlier than the earliest access time of the file at the critical rejection point, evicting it; otherwise executing the file caching module, where Qt is the queue in the distributed cache ordered by access time;
Step 33: taking from Qf the files whose number of accesses within the window t is less than a threshold M, and executing Step 32.
As can be seen from the above scheme, the present invention has the following advantages:
The caching module of the invention can, through analysis, effectively cache the files that are likely to be accessed again and leave uncached the files that are unlikely to be accessed again. The prefetching module computes associated files with a clustering algorithm and a cosine similarity algorithm and, combined with compression techniques, prefetches effectively, so that prefetched files are actually accessed and the prefetch hit rate rises. The eviction module maintains two queues, a queue Qt ordered by access time and a queue Qf ordered by the number of accesses within the window t; combining these time-based and frequency-based strategies ensures that the evicted files are those least likely to be accessed again.
Detailed description of the invention
Fig. 1 is a flow chart of the method of the present invention;
Fig. 2 is a structure diagram of the system of the present invention.
Specific embodiment
On top of an existing distributed file system, the present invention provides a layer of distributed buffer for caching and prefetching the files in the distributed file system, with the goal of accelerating the read speed of the distributed file system.
To achieve the above object, the technical solution adopted by the invention is as follows:
The present invention proposes a file prefetching/caching method based on a distributed file system, comprising the following steps, as shown in Fig. 1:
A. Cache a file, implemented as follows:
A1. A client accesses a file in the distributed file system;
A2. The access time of this file and the file information are recorded in an access log;
A3. Whether to cache this file is decided from its recorded access times in the access log and the corresponding file information.
B. Prefetch files, implemented as follows:
B1. When a file is cached in memory, each access time point t1, t2, t3, ..., tN of this file within the period TP is extracted from the historical access log;
B2. The historical access log is traversed. The neighborhood of a time point tt is the period obtained by taking a period TN on each side of tt. The files fi1, fi2, fi3, ... accessed within the neighborhood of t1 are extracted and listed, giving the access information Inf1;
B3. Following step B2, the access information Inf2, Inf3, ..., InfN of the neighborhoods of t2, t3, ..., tN is extracted;
B4. The access information Inf1, Inf2, Inf3, ..., InfN is processed with an existing clustering algorithm to obtain the degrees of association r1, r2, r3, ... between the above cached file and the files accessed within the time-point neighborhoods.
B5. The files f1, f2, f3, ... are ranked by degree of association and their filenames are stored as strings in the queue to be prefetched. According to the size of the distributed memory, a prefetch space budget M is set per cached file; the associated files are taken into the queue to be prefetched in decreasing order of association, so that the total size of the files in the queue is p*M, where p may be any number greater than 1 (3 is suggested).
B6. The pairwise similarities between the files in the queue to be prefetched are computed with a cosine similarity algorithm, whose steps are: preprocessing → text feature term selection → weighting → generation of vector space models → computation of the cosine. The resulting cosine value is the similarity of the two files.
B7. The files in the queue to be prefetched are recombined; using the pairwise similarities and an existing compression algorithm, each combination is compressed, and the combinations whose compressed total size equals M or is slightly smaller than M are taken out.
B8. The total degree of association between each combination from B7 and the cached file is computed, and the combination with the largest total degree of association is taken as the group of files to be prefetched.
B9. The group of files obtained in B8 is compressed with an existing compression technique, and the compressed files are prefetched into the distributed cache; prefetching ends.
C. Evict files, implemented as follows:
When distributed memory runs short, some files need to be evicted from memory; the eviction method is as follows.
C1. Two file queues are maintained in distributed memory: a queue Qt ordered by access time, and a queue Qf ordered by the number of accesses within the window t.
C2. The file with the fewest accesses within the window t is taken from Qf.
C3. With the middle position of Qt as the critical rejection point, the file from C2 is located in Qt and its position determined. If this file lies beyond the critical rejection point, that is, its earliest access time is earlier than the earliest access time of the file at the critical rejection point, it is evicted directly and this round ends; otherwise step C4 is executed.
C4. The files whose number of accesses within the window t is less than the threshold M are taken from Qf, and C3 is executed.
The steps of the present invention are further described below. The object of the present invention is to cache and prefetch files in a distributed file system so as to improve its response speed. The detailed steps are: A, cache files into distributed memory; B, prefetch into distributed memory the files with a high degree of association to the cached file; C, when distributed memory runs short, evict some files from memory. A specific embodiment is as follows:
A. Cache a file, implemented as follows:
A1. A client accesses a file in the distributed file system;
A2. The access time of this file and the file information are recorded in an access log;
A3. From the access times of this file recorded in the access log, its number of accesses num within the current window T is determined, and a threshold N on the number of accesses is set. If num within T exceeds the threshold N, this file is cached into the distributed cache; otherwise it is not cached. In a distributed file system many files are accessed only once over a long period; not caching them improves the utilization of distributed memory and avoids useless caching and eviction operations.
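Step A3 can be sketched as follows. The access-log layout (a list of (timestamp, filename) pairs) and the function name are illustrative assumptions, not the patent's concrete implementation:

```python
import time

def should_cache(access_log, filename, window_T, threshold_N, now=None):
    """Step A3 sketch: cache only when the file was accessed more than
    threshold_N times within the last window_T seconds.
    access_log is assumed to be a list of (timestamp, filename) pairs."""
    now = time.time() if now is None else now
    # count accesses to this file inside the window [now - window_T, now]
    num = sum(1 for ts, f in access_log
              if f == filename and now - ts <= window_T)
    return num > threshold_N

log = [(100.0, "f1"), (120.0, "f1"), (130.0, "f1"), (130.0, "f2")]
print(should_cache(log, "f1", window_T=60, threshold_N=2, now=140.0))  # True
print(should_cache(log, "f2", window_T=60, threshold_N=2, now=140.0))  # False
```

A file touched only once in the window, like f2 above, is skipped, matching the remark that one-off files waste cache space.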
B. Prefetch files, implemented as follows:
B1. When a file is cached in memory, each access time point t1, t2, t3, ..., tN of this file within the period TP is extracted from the historical access log;
B2. The historical access log is traversed. The neighborhood of a time point tt is the period obtained by taking a period TN on each side of tt. The files fi1, fi2, fi3, ... accessed within the neighborhood of t1 are extracted and listed, giving the access information Inf1;
B3. Following step B2, the access information Inf2, Inf3, ..., InfN of the neighborhoods of t2, t3, ..., tN is extracted;
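Steps B2 and B3 can be sketched as follows, under the assumption that the access log is a list of (timestamp, filename) pairs; extracting a neighborhood is then a simple filter over timestamps within TN of the access point:

```python
def access_neighborhood(access_log, t, TN):
    """Step B2 sketch: the files accessed within the neighborhood of time
    point t, i.e. the interval [t - TN, t + TN]. The log format is an
    illustrative assumption: a list of (timestamp, filename) pairs."""
    return [f for ts, f in access_log if abs(ts - t) <= TN]

log = [(10, "fa"), (12, "fb"), (30, "fc"), (31, "fa"), (55, "fd")]
inf1 = access_neighborhood(log, t=11, TN=5)  # Inf1 for access point t1
inf2 = access_neighborhood(log, t=30, TN=5)  # Inf2 for access point t2
print(inf1)  # ['fa', 'fb']
print(inf2)  # ['fc', 'fa']
```

Repeating this for every access point t1 ... tN yields the access information Inf1 ... InfN that step B4 feeds to the clustering algorithm.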
B4. The access information Inf1, Inf2, Inf3, ..., InfN is processed with an existing clustering algorithm to obtain the degrees of association r1, r2, r3, ... between the above cached file and the files accessed within the time-point neighborhoods.
B5. The files f1, f2, f3, ... are ranked by degree of association and their filenames are stored as strings in the queue to be prefetched. According to the size of the distributed memory, a prefetch space budget M is set per cached file; the associated files are taken into the queue to be prefetched in decreasing order of association, so that the total size of the files in the queue is p*M, where p may be any number greater than 1 (3 is suggested).
B6. The pairwise similarities between the files in the queue to be prefetched are computed with a cosine similarity algorithm, whose steps are: preprocessing → text feature term selection → weighting → generation of vector space models → computation of the cosine. The resulting cosine value is the similarity of the two files.
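Step B6 can be sketched as follows. As a simplifying assumption, the preprocessing, feature-term selection and weighting stages are collapsed into raw term counts over whitespace tokens; only the final cosine computation follows the text exactly:

```python
import math
from collections import Counter

def cosine_similarity(text_a, text_b):
    """Step B6 sketch: treat each file as a term-frequency vector and
    return the cosine of the angle between the two vectors.
    Whitespace tokenization stands in for the patent's preprocessing,
    feature selection and weighting, which are not specified in detail."""
    va, vb = Counter(text_a.split()), Counter(text_b.split())
    terms = set(va) | set(vb)
    dot = sum(va[t] * vb[t] for t in terms)
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0

print(round(cosine_similarity("a b c", "a b d"), 4))  # 0.6667
print(cosine_similarity("a a", "a"))                  # 1.0
```

The resulting value in [0, 1] is the file-pair similarity that step B7 uses to decide which files compress well together.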
B7. The files in the queue to be prefetched are recombined; using the pairwise similarities and an existing compression algorithm, the combinations whose compressed total size equals M or is slightly smaller than M are taken out, yielding new combinations whose total size does not exceed M. The specific steps are as follows:
B7-1. Let FN be the number of files in the prefetch queue. X files are taken from the FN files such that their combined size exceeds M before compression. Recombining the FN files in this way yields C(FN, X) combinations.
B7-2. Using the pairwise similarities computed in B6, the size obtained by compressing together the two most similar files in each combination is computed.
B7-3. From these combinations, the groups of files whose size, with the two most similar files compressed into one, equals or is slightly less than M are further screened out.
B8. The total degree of association between each group of files from B7 and the cached file is computed, and the group with the largest total degree of association is taken as the group of files to be prefetched.
B9. The group of files obtained in B8 is compressed with a prior compression technique, and the compressed files are prefetched into the distributed cache; prefetching ends.
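Steps B7 and B8 can be sketched as follows. Actual pairwise compression of similar files is replaced by a precomputed size map, which is an assumption made purely for illustration; the selection of the combination with the largest total degree of association under the budget M follows the text:

```python
from itertools import combinations

def pick_prefetch_group(files, assoc, sizes, M, X):
    """Steps B7-B8 sketch: enumerate X-file combinations from the prefetch
    queue, keep those whose (assumed already-compressed) total size fits
    within M, and return the group with the largest total degree of
    association. `assoc` and `sizes` are illustrative per-file maps."""
    best, best_score = None, -1.0
    for combo in combinations(files, X):
        if sum(sizes[f] for f in combo) > M:
            continue  # combination exceeds the prefetch budget M
        score = sum(assoc[f] for f in combo)  # total degree of association
        if score > best_score:
            best, best_score = combo, score
    return best

files = ["f1", "f2", "f3", "f4"]
assoc = {"f1": 0.9, "f2": 0.7, "f3": 0.6, "f4": 0.2}
sizes = {"f1": 40, "f2": 50, "f3": 30, "f4": 10}
print(pick_prefetch_group(files, assoc, sizes, M=100, X=2))  # ('f1', 'f2')
```

Shrinking the budget changes the winner: with M=60 the highest-association pair no longer fits and a smaller pair is chosen instead.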
C. Evict files, implemented as follows:
When distributed memory runs short, some files need to be evicted from memory; the eviction method is as follows.
C1. Two file queues are maintained in distributed memory: a queue Qt ordered by access time, and a queue Qf ordered by the number of accesses within the window t.
C2. The file with the fewest accesses within the window t is taken from Qf.
C3. With the middle position of Qt as the critical rejection point, the file from C2 is located in Qt and its position determined. If this file lies beyond the critical rejection point, that is, its earliest access time is earlier than the earliest access time of the file at the critical rejection point, it is evicted directly and this round ends; otherwise step C4 is executed.
C4. The files whose number of accesses within the window t is less than the threshold M are taken from Qf, and C3 is executed.
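Steps C1 through C3 can be sketched as follows. The queue layouts (lists of pairs, oldest-first for Qt and fewest-accesses-first for Qf) and the rule that a file is evicted only when it is both the coldest entry in Qf and older than the file at the critical rejection point of Qt are one reading of the text, offered as an illustrative assumption:

```python
def evict_candidate(qt, qf):
    """Steps C1-C3 sketch.
    qt: list of (filename, earliest_access_time), oldest first (queue Qt).
    qf: list of (filename, access_count_in_window_t), fewest first (Qf).
    Returns the filename to evict, or None when step C4 should run."""
    victim, _ = qf[0]                     # C2: fewest accesses in window t
    times = dict(qt)
    _, mid_time = qt[len(qt) // 2]        # C3: critical rejection point
    if times[victim] < mid_time:
        return victim                     # old AND cold: safe to evict
    return None                           # otherwise fall through to C4

qt = [("old", 10), ("mid", 50), ("new", 90)]   # sorted by earliest access
qf = [("old", 1), ("new", 2), ("mid", 7)]      # sorted by access count
print(evict_candidate(qt, qf))  # old
```

Combining the two queues this way matches the stated goal: a file is only dropped when both the frequency view (Qf) and the recency view (Qt) agree it is unlikely to be accessed again.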
The present invention also proposes a file prefetching/caching system based on a distributed file system, as shown in Fig. 2, comprising:
a file caching module for, when a client accesses a file in the distributed file system, recording the access time of the file and the file information in an access log, and deciding from the access time and the file information whether to cache the file;
a file prefetching module for extracting from the access log each access time point of the file within a time period TP; obtaining the neighborhood of each access time point; computing, with a clustering algorithm, the degree of association between the file and the files accessed within the neighborhoods; storing the files corresponding to the degrees of association as strings in a queue to be prefetched; computing the pairwise similarity between files in the queue with a cosine similarity algorithm; recombining the files in the queue; compressing each combination in light of the similarities; computing the total degree of association between each combination and the file; and selecting the combination with the largest total degree of association as the group of files to be prefetched.
The file caching module sets a threshold N on the number of times the file is accessed and determines from the access times the number num of accesses to the file within a current time window T. If num is greater than the threshold N, the file is cached into the distributed cache; otherwise the file is not cached.
The neighborhood is the period obtained by taking a period TN on each side of a time point tt.
The file prefetching module further compresses the group of files to be prefetched and prefetches the compressed files into the distributed cache.
The system further includes a file eviction module, comprising:
Step 31: taking from a queue Qf the file with the fewest accesses within a time window t, where Qf is the queue in the distributed cache ordered by number of accesses within the window t;
Step 32: taking the middle position of a queue Qt as the critical rejection point, locating in Qt the file with the fewest accesses within the window t and determining its position in Qt; if that file lies beyond the critical rejection point, and its earliest access time is earlier than the earliest access time of the file at the critical rejection point, evicting it; otherwise executing the file caching module, where Qt is the queue in the distributed cache ordered by access time;
Step 33: taking from Qf the files whose number of accesses within the window t is less than a threshold M, and executing Step 32.

Claims (8)

1. A file prefetching/caching method based on a distributed file system, characterized by comprising:
a file caching step: when a client accesses a file in the distributed file system, recording the access time of the file and the file information in an access log, and deciding from the access time and the file information whether to cache the file;
a file prefetching step: extracting from the access log each access time point of the file within a time period TP; obtaining the neighborhood of each access time point; computing, with a clustering algorithm, the degree of association between the file and the files accessed within the neighborhoods; storing the files corresponding to the degrees of association as strings in a queue to be prefetched; computing the pairwise similarity between files in the queue with a cosine similarity algorithm; recombining the files in the queue; compressing each combination in light of the similarities; computing the total degree of association between each combination and the file; and selecting the combination with the largest total degree of association as the group of files to be prefetched;
wherein the neighborhood is the period obtained by taking a period TN on each side of a time point tt.
2. The file prefetching/caching method based on a distributed file system of claim 1, wherein the file caching step includes setting a threshold N on the number of times the file is accessed, and determining from the access times the number num of accesses to the file within a current time window T; if num is greater than the threshold N, the file is cached into the distributed cache, and otherwise the file is not cached.
3. The file prefetching/caching method based on a distributed file system of claim 1, wherein the file prefetching step further includes compressing the group of files to be prefetched and prefetching the compressed files into the distributed cache.
4. the file of Based on Distributed file system as described in claim 1 prefetches/caching method, which is characterized in that also wrap Include rejecting Files step, comprising:
Step 31: taking from a queue Qf the file with the fewest accesses within a time period t, wherein the queue Qf is the queue of files in the distributed cache sorted by the number of accesses within the time period t;
Step 32: taking the middle position of a queue Qt as the critical eviction point; locating in the queue Qt the file with the fewest accesses within the time period t and determining its position in the queue Qt; if that file lies behind the critical eviction point, and its earliest access time is earlier than the earliest access time of the file at the critical eviction point, evicting the file, otherwise executing the file cache step; wherein the queue Qt is the queue of files in the distributed cache sorted by access time;
Step 33: taking from the queue Qf each file whose number of accesses within the time period t is less than a threshold M, and executing step 32.
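Steps 31 through 33 can be sketched as below. The dict-based queue entries and the reading of "behind the critical eviction point" as a position strictly past the midpoint of Qt are assumptions made for illustration, not details fixed by the claim.

```python
def files_to_evict(qf, qt, threshold_M):
    # qf: entries {"name", "count"} sorted ascending by access count within t
    #     (steps 31 and 33 take the least-accessed / below-threshold files)
    # qt: entries {"name", "earliest"} sorted by access time; its middle entry
    #     serves as the critical eviction point (step 32)
    critical = len(qt) // 2
    pos = {e["name"]: i for i, e in enumerate(qt)}
    earliest = {e["name"]: e["earliest"] for e in qt}
    crit_earliest = qt[critical]["earliest"]
    evicted = []
    for f in qf:
        if f["count"] >= threshold_M:
            break  # qf is sorted by count, so no later entry is below the threshold
        name = f["name"]
        # Evict only if the file sits past the critical point in Qt and was
        # first accessed earlier than the file at the critical point
        if pos[name] > critical and earliest[name] < crit_earliest:
            evicted.append(name)
    return evicted
```

With Qt = [a, b, c, d] (earliest access times 1, 3, 5, 2) and Qf ordering d, b, a by access count (1, 2, 5) under threshold M = 3, only `d` satisfies both conditions and would be evicted.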
5. A file prefetch/cache system based on a distributed file system, comprising:
a file cache module, configured to: when a client accesses a file in the distributed file system, record the access time and the file information of the accessed file in an access log, and determine, according to the access time and the file information, whether the file has been cached;
a file prefetch module, configured to: according to the access log, extract each access time point of the file within a time period TP, and obtain a neighborhood of each access time point; calculate, by a clustering algorithm, a degree of association between the file and each file accessed within the neighborhood; store the files corresponding to the degrees of association in a to-be-prefetched queue in the form of character strings; calculate, by a cosine similarity algorithm, the pairwise similarity between files in the to-be-prefetched queue; recombine the files in the to-be-prefetched queue and, in combination with the similarities, compress each combination; calculate a total degree of association between each combination and the file; and take the combination having the maximum total degree of association as the group of files to be prefetched;
wherein the neighborhood is the period obtained by taking a time period TN respectively before and after a certain time point tt.
6. The file prefetch/cache system based on a distributed file system according to claim 5, wherein the file cache module is configured to: set a threshold N for the number of times the file is accessed; determine, according to the access time, the number num of times the file has been accessed within a current time window T; and, if num is greater than the threshold N, cache the file in a distributed cache, otherwise not cache the file.
7. The file prefetch/cache system based on a distributed file system according to claim 5, wherein the file prefetch module is further configured to compress the group of files to be prefetched and prefetch the compressed files into the distributed cache.
8. The file prefetch/cache system based on a distributed file system according to claim 5, further comprising a file eviction module, configured to perform:
Step 31: taking from a queue Qf the file with the fewest accesses within a time period t, wherein the queue Qf is the queue of files in the distributed cache sorted by the number of accesses within the time period t;
Step 32: taking the middle position of a queue Qt as the critical eviction point; locating in the queue Qt the file with the fewest accesses within the time period t and determining its position in the queue Qt; if that file lies behind the critical eviction point, and its earliest access time is earlier than the earliest access time of the file at the critical eviction point, evicting the file, otherwise invoking the file cache module; wherein the queue Qt is the queue of files in the distributed cache sorted by access time;
Step 33: taking from the queue Qf each file whose number of accesses within the time period t is less than a threshold M, and executing step 32.
CN201610811562.3A 2016-09-08 2016-09-08 File prefetch/cache method and device based on a distributed file system Active CN106446079B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610811562.3A CN106446079B (en) 2016-09-08 2016-09-08 File prefetch/cache method and device based on a distributed file system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610811562.3A CN106446079B (en) 2016-09-08 2016-09-08 File prefetch/cache method and device based on a distributed file system

Publications (2)

Publication Number Publication Date
CN106446079A CN106446079A (en) 2017-02-22
CN106446079B true CN106446079B (en) 2019-06-18

Family

ID=58164345

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610811562.3A Active CN106446079B (en) File prefetch/cache method and device based on a distributed file system

Country Status (1)

Country Link
CN (1) CN106446079B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107357922A (en) * 2017-07-21 2017-11-17 郑州云海信息技术有限公司 NFS access auditing method and system for a distributed file system
CN113282235A (en) * 2018-06-16 2021-08-20 王梅 Method and system for dynamically processing data set based on shift-out in cache
CN109195180A (en) * 2018-07-20 2019-01-11 重庆邮电大学 Solution for reducing content acquisition delay in a mobile content-centric network
CN109413176B (en) * 2018-10-19 2021-06-08 中国银行股份有限公司 Report downloading method and device
CN109492009B (en) * 2018-11-25 2023-06-23 广州市塞安物联网科技有限公司 Method and system for identifying relevance time units in big data storage device
CN110018997B (en) * 2019-03-08 2021-07-23 中国农业科学院农业信息研究所 Mass small file storage optimization method based on HDFS

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101916289A (en) * 2010-08-20 2010-12-15 浙江大学 Method for establishing digital library storage system supporting mass small files and dynamic backup number
CN102332029A (en) * 2011-10-15 2012-01-25 西安交通大学 Hadoop-based mass classifiable small file association storage method
CN102332027A (en) * 2011-10-15 2012-01-25 西安交通大学 Mass non-independent small file associated storage method based on Hadoop
CN103077220A (en) * 2012-12-29 2013-05-01 中国科学院深圳先进技术研究院 User group correlation degree-based personalized recommendation method and system
CN103345449A (en) * 2013-06-19 2013-10-09 暨南大学 Method and system for prefetching fingerprints oriented to data de-duplication technology
CN103795781A (en) * 2013-12-10 2014-05-14 西安邮电大学 Distributed cache model based on file prediction
CN104023348A (en) * 2014-05-14 2014-09-03 北京大学深圳研究生院 Data pre-fetching method supporting consumer movement, access base station and terminal
CN105843841A (en) * 2016-03-07 2016-08-10 青岛理工大学 Small file storage method and system

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101916289A (en) * 2010-08-20 2010-12-15 浙江大学 Method for establishing digital library storage system supporting mass small files and dynamic backup number
CN102332029A (en) * 2011-10-15 2012-01-25 西安交通大学 Hadoop-based mass classifiable small file association storage method
CN102332027A (en) * 2011-10-15 2012-01-25 西安交通大学 Mass non-independent small file associated storage method based on Hadoop
CN103077220A (en) * 2012-12-29 2013-05-01 中国科学院深圳先进技术研究院 User group correlation degree-based personalized recommendation method and system
CN103345449A (en) * 2013-06-19 2013-10-09 暨南大学 Method and system for prefetching fingerprints oriented to data de-duplication technology
CN103795781A (en) * 2013-12-10 2014-05-14 西安邮电大学 Distributed cache model based on file prediction
CN104023348A (en) * 2014-05-14 2014-09-03 北京大学深圳研究生院 Data pre-fetching method supporting consumer movement, access base station and terminal
CN105843841A (en) * 2016-03-07 2016-08-10 青岛理工大学 Small file storage method and system

Also Published As

Publication number Publication date
CN106446079A (en) 2017-02-22

Similar Documents

Publication Publication Date Title
CN106446079B (en) File prefetch/cache method and device based on a distributed file system
Subedi et al. Stacker: an autonomic data movement engine for extreme-scale data staging-based in-situ workflows
US20150032967A1 (en) Systems and methods for adaptive prefetching
CN103795781B (en) Distributed caching method based on file prediction
CN104320448B (en) Cache and prefetch acceleration method and device for big-data-based computing devices
CN110287010B (en) Cache data prefetching method oriented to Spark time window data analysis
CN105550338A (en) HTML5 application cache based mobile Web cache optimization method
Gharaibeh et al. A GPU accelerated storage system
Shi et al. An SPN-based integrated model for Web prefetching and caching
Chen et al. A hybrid memory built by SSD and DRAM to support in-memory Big Data analytics
Kim et al. Improving small file I/O performance for massive digital archives
Kazi et al. Web object prefetching: Approaches and a new algorithm
Ahmad et al. Reducing user latency in web prefetching using integrated techniques
CN103442000B (en) WEB caching replacement method and device, http proxy server
CN111209082A (en) Docker container registry prefetching method based on relevance
Yoon et al. Design of DRAM-NAND flash hybrid main memory and Q-learning-based prefetching method
Chen et al. Exploiting application-level similarity to improve SSD cache performance in Hadoop
Li et al. Real-time data prefetching algorithm based on sequential pattern mining in cloud environment
Temgire et al. Review on web prefetching techniques
Faridi et al. Memcached vs Redis Caching Optimization Comparison using Machine Learning
Umapathi et al. Enhancing Web Services Using Predictive Caching
Lee et al. A proactive request distribution (prord) using web log mining in a cluster-based web server
Varki et al. Improve prefetch performance by splitting the cache replacement queue
Park et al. Directory search performance optimization of AMGA for the Belle II experiment
Mehta et al. Distributed database caching for web applications and web services

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20240320

Address after: Room 711C, Floor 7, Building A, Yard 19, Ronghua Middle Road, Daxing District, Beijing Economic-Technological Development Area, 100176

Patentee after: Beijing Zhongke Flux Technology Co.,Ltd.

Country or region after: China

Address before: 100190 No. 6 South Road, Zhongguancun Academy of Sciences, Beijing, Haidian District

Patentee before: Institute of Computing Technology, Chinese Academy of Sciences

Country or region before: China

TR01 Transfer of patent right