CN108710639A - A Ceph-based access optimization method for massive small files - Google Patents

A Ceph-based access optimization method for massive small files

Info

Publication number
CN108710639A
Authority
CN
China
Prior art keywords
file
small files
ceph
client
cache
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810343960.6A
Other languages
Chinese (zh)
Other versions
CN108710639B (en)
Inventor
王勇
陆小霞
叶苗
郇宜鸣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guilin University of Electronic Technology
Original Assignee
Guilin University of Electronic Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2018-04-17
Filing date
2018-04-17
Publication date
2018-10-26
Application filed by Guilin University of Electronic Technology
Priority to CN201810343960.6A
Publication of CN108710639A
Application granted
Publication of CN108710639B
Expired - Fee Related
Anticipated expiration

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The present invention discloses a Ceph-based access optimization method for massive small files. When a user stores files, associated groups of small files are first obtained with the K-means clustering algorithm; the files in each group are then sorted from largest to smallest, and the associated files in each group are merged before being stored in Ceph. When a user initiates an access request, the system first checks whether the requested file is in the cache; if it is, the file is read directly and returned. Otherwise, the request is sent to the Ceph cluster, which reads the small file, prefetches and caches related small files according to the file block utilization and the correlation ratio between files, and returns the requested file together with the prefetched small files. By reducing the interaction between user and cluster, the invention shortens user access time, improves the access efficiency for massive small files, and improves the overall performance of the system.

Description

A Ceph-based access optimization method for massive small files
Technical field
The present invention relates to the technical field of distributed file storage, and in particular to a Ceph-based access optimization method for massive small files.
Background
With the rapid development of cloud computing and big data, the global volume of data is growing exponentially, and traditional storage systems, because of factors such as equipment and maintenance costs, are increasingly unable to meet storage demand. In addition, as the number of small files keeps growing, most distributed storage systems cannot store and read small files efficiently. How to store and manage massive numbers of small files, and how to improve the efficiency of storing and accessing them, is the greatest challenge at present.
Ceph is a distributed file system that stores and manages large files efficiently, but it still has shortcomings when storing massive numbers of small files:
(1) The storage efficiency for massive small files is low. Ceph's local storage interface supports transactions and introduces a journaling mechanism, so every write operation must first be written to the journal and only then written to the local file system through the object storage interface. Under large-scale sequential I/O, the throughput actually delivered by the disk is therefore about half of its physical capability, which leads to poor small-file storage performance;
(2) The read efficiency for massive small files is low. When small files are accessed frequently, the cluster has to jump repeatedly between multiple storage nodes to look them up, which results in poor small-file read performance for the Ceph cluster.
Summary of the invention
The problem to be solved by the present invention is that Ceph stores and reads massive small files inefficiently; to that end, a Ceph-based access optimization method for massive small files is provided.
To solve the above problem, the present invention is realized through the following technical solution:
A Ceph-based access optimization method for massive small files, comprising the following steps:
Step 1, obtain the file names and file sizes of the files to be uploaded by the client within the same time period, and classify these files according to the set file threshold: a file to be uploaded whose size exceeds the file threshold is judged to be a large file and is uploaded directly to the Ceph cluster; a file to be uploaded whose size is equal to or less than the file threshold is judged to be a small file;
Step 2, group the small files by association using the K-means clustering algorithm, sort the small files in each group from largest to smallest by file size, merge the small files in each group in that order and upload the result to the Ceph cluster, and at the same time generate an index file from the mapping between each small file and its merged file;
Step 3, when the user issues an access request, the client judges whether the requested file is in the client cache: if it is, the requested file is accessed directly from the client cache; otherwise, the client sends the request to the Ceph cluster;
Step 4, the Ceph cluster receives the request and determines the file type from the name of the requested file. If the requested file is a large file, it is read directly from the Ceph cluster and stored in the client cache for the user to access. If the requested file is a small file, its specific position within the merged file is first determined from the index file; the requested file is then read from the Ceph cluster and stored in the client cache for the user to access.
In step 1 above, the file threshold is set according to the file block size of the Ceph cluster.
In step 2 above, while the small files in each group are being merged, it must be judged whether the combined size of the small file to be merged and the merged file generated so far exceeds the file threshold. If the combined size is less than or equal to the file threshold, the small file is appended directly to the existing merged file; otherwise, a new merged file must be requested.
In step 2 above, the index file has the structure <key, value>, where key stores the file name of the small file and value stores the starting position file_offset of the small file within the merged file and the size file_length of the small file.
As an improvement, the Ceph-based access optimization method for massive small files further comprises a file prefetching process, namely:
When the requested file is read from the Ceph cluster and is a small file, the correlation ratio Ψ between the requested file and each small file in the merged file containing it is calculated, and the small files in that merged file whose correlation ratio Ψ exceeds the correlation threshold are read together with the requested file and stored in the client cache. The correlation ratio Ψ is:
where n is the number of times the requested file was accessed within the statistics period, d is the number of times a small file in the merged file was accessed within the statistics period, and sum is the total number of times all small files were accessed within the statistics period.
As a further improvement, during file prefetching, when the number of small files in the merged file whose correlation ratio Ψ exceeds the correlation threshold is greater than the given maximum prefetch count num, only the num small files with the highest correlation ratio Ψ are stored in the client cache together with the requested file.
In the above scheme, the maximum prefetch count num is:
where math.floor(*) denotes rounding down, T_w is the maximum waiting time of the user, T_Ceph is the time from the Ceph cluster receiving the access request to returning the file, and T_pre is the time the Ceph cluster takes to prefetch one file.
As an improvement, the Ceph-based access optimization method for massive small files further comprises a process of optimizing the files held in the client cache: the weight R_w of each cached file is calculated, and the cached files are sorted by R_w. Files with high weight are stored in the client's second-level cache, and files with low weight are stored in the first-level cache. When a file newly read from the Ceph cluster must be stored in the client cache and cache space is insufficient, the files with the lowest weight R_w are deleted one by one from the first-level cache. The weight R_w of a file is:
$$R_w = e^{-(N_t - N_r) \times t}$$
where N_t is the maximum capacity of the client cache, N_r is the number of times the cached file has been accessed, and t is the cache update time.
Compared with the prior art, the present invention has the following features:
1. By classifying files, merging associated small files, and building an index file, the storage efficiency for massive small files can be improved.
2. With the small-file pre-read mechanism, files related to a small file are prefetched into the cache while that small file is being read, which reduces the interaction between user and cluster and thereby improves small-file read efficiency.
3. In the file cache mechanism, the weight factor of each cached object is calculated dynamically from its access count and access time, and the weight factor determines the order in which cached objects are accessed and evicted. This reduces wasted cache space, raises the cache hit rate, and further improves small-file read efficiency.
Brief description of the drawings
Fig. 1 is a functional block diagram of the Ceph-based access optimization method for massive small files according to a preferred embodiment of the present invention.
Fig. 2 is a schematic diagram of file writing.
Fig. 3 is a diagram of the index structure.
Fig. 4 is a schematic diagram of file reading.
Fig. 5 is a schematic diagram of file prefetching.
Fig. 6 is a schematic diagram of cache optimization.
Detailed description
To make the objectives, technical solutions, and advantages of the present invention clearer, the present invention is described in more detail below with reference to a specific example and the accompanying drawings.
In the Ceph-based access optimization method for massive small files, when a user stores files, associated groups of small files are first obtained with the K-means clustering algorithm; the files in each group are then sorted from largest to smallest, and the associated files in each group are merged before being stored in Ceph. When a user initiates an access request, the system first checks whether the requested file is in the cache; if it is, the file is read directly and returned. Otherwise, the request is sent to the Ceph cluster, which reads the small file, prefetches and caches small files according to the correlation between the requested file and the other small files in its merged file, and then returns the requested file and the prefetched small files.
Specifically, the Ceph-based access optimization method for massive small files, as shown in Fig. 1, comprises a file writing stage and a file reading stage. The specific steps are as follows:
(1) Writing files, as shown in Fig. 2.
Step S1: Obtain the file names and file sizes of the files to be uploaded by the client within the same time period, and classify these files according to the set file threshold: a file to be uploaded whose size exceeds the file threshold is judged to be a large file and is uploaded directly to the Ceph cluster; a file whose size is equal to or less than the file threshold is judged to be a small file, and the method proceeds to step S2. The upload procedure is shown in Fig. 2.
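The classification in step S1 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `classify`, the sample file names, and the choice of 4 MB as the threshold (taken from the 4M limit mentioned for merging) are assumptions.

```python
# Assumed threshold: 4 MB, matching the block-size limit named in the text.
FILE_THRESHOLD = 4 * 1024 * 1024

def classify(files, threshold=FILE_THRESHOLD):
    """Split (name, size) pairs into large files and small files."""
    large, small = [], []
    for name, size in files:
        # Strictly larger than the threshold means "large";
        # equal to or below the threshold means "small".
        (large if size > threshold else small).append((name, size))
    return large, small

# Hypothetical upload batch collected within one time period.
pending = [("video.bin", 9_000_000), ("a.txt", 1_200), ("edge.dat", FILE_THRESHOLD)]
large, small = classify(pending)
```

A file whose size equals the threshold lands in the small-file list, as the step specifies.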
Step S2: Group the small files by association using the K-means clustering algorithm, sort the small files in each group from largest to smallest by file size, merge the small files in each group in that order and upload the result to the Ceph cluster, and at the same time generate an index file from the mapping between each small file and its merged file.
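The text does not say which features the K-means step clusters on, so the sketch below assumes each small file is already described by a numeric feature vector; the vectors, file names, and the deterministic initialisation are illustrative assumptions only. It shows plain Lloyd-style K-means producing the association groups.

```python
import math

def kmeans(points, k, iterations=20):
    """Plain K-means; returns one cluster label per point."""
    # Assumption: deterministic start from the first k points.
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Hypothetical feature vectors for four small files.
features = {"a.log": (0.0, 0.1), "b.log": (0.2, 0.0),
            "x.jpg": (5.0, 5.1), "y.jpg": (5.2, 4.9)}
names = list(features)
labels = kmeans([features[n] for n in names], k=2)

# Collect file names per cluster label to form the association groups.
groups = {}
for name, label in zip(names, labels):
    groups.setdefault(label, []).append(name)
```

In practice a library implementation with better initialisation would likely be preferred; the point here is only that files with similar features end up in the same group.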
The small files are clustered with the K-means clustering algorithm to obtain distinct groups. Small files within the same group are more similar to one another, that is, more strongly associated, so the small files in a group can be merged. To avoid file block fragmentation, the files in a group are first sorted from largest to smallest and then merged in that order before being stored in the Ceph cluster. In addition, to keep a file from being stored across blocks, the merging process must check whether the combined size of the small file to be merged and the existing merged file exceeds the 4 MB threshold; if it does, a new merged file must be requested.
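The sort-then-merge rule described above can be sketched as below. The function name `pack_group` and the sample sizes are assumptions; the logic follows the stated rule of opening a new merged file whenever adding the next small file would exceed the 4 MB limit.

```python
MERGE_LIMIT = 4 * 1024 * 1024  # 4 MB limit named in the text

def pack_group(group, limit=MERGE_LIMIT):
    """Sort one group's (name, size) pairs largest-first and fill merged files in order."""
    merged_files, current, used = [], [], 0
    for name, size in sorted(group, key=lambda f: f[1], reverse=True):
        if current and used + size > limit:
            merged_files.append(current)  # close the merged file that would overflow
            current, used = [], 0         # and request a new merged file
        current.append((name, size))
        used += size
    if current:
        merged_files.append(current)
    return merged_files

MB = 1024 * 1024
group = [("c", 1 * MB), ("a", 3 * MB), ("b", 2 * MB), ("d", 1 * MB)]
merged = pack_group(group)
```

Here the 3 MB file gets a merged file of its own, and the remaining three files fill a second merged file to exactly 4 MB.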
The index file has the structure <key, value>: key stores the file name of the small file, and value stores the starting position file_offset of the small file within the merged file and the size file_length of the small file. The index file structure is shown in Fig. 3.
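As a rough illustration of the <key, value> index, the sketch below concatenates small files into one merged blob and records `file_offset` and `file_length` for each. The dictionary layout and helper names are assumptions; only the two field names come from the text.

```python
def build_merged_file(small_files):
    """Concatenate (name, bytes) pairs and record where each file sits."""
    blob = bytearray()
    index = {}
    for name, data in small_files:
        # key = file name; value = start offset and length inside the merged file
        index[name] = {"file_offset": len(blob), "file_length": len(data)}
        blob += data
    return bytes(blob), index

def read_small_file(blob, index, name):
    """Slice one small file back out of the merged blob using the index."""
    entry = index[name]
    start = entry["file_offset"]
    return blob[start:start + entry["file_length"]]

blob, index = build_merged_file([("a.txt", b"hello"), ("b.txt", b"ceph!")])
```

With this layout, locating a small file is a single dictionary lookup followed by one ranged read.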
(2) Reading files, as shown in Fig. 4.
Step S3: When the user initiates an access request, the client receives the request and checks whether the file exists in the client cache. If it is in the cache, the file is read from the cache and returned; otherwise the file has not yet been read, so the request is sent to the Ceph cluster and the method proceeds to step S4.
Step S4: The Ceph cluster receives the request and determines the file type from the name of the requested file. If the requested file is a large file, it is read directly from the Ceph cluster and stored in the client cache for the user to access. If the requested file is a small file, its specific position within the merged file is first determined from the index file; the requested file is then read from the Ceph cluster and stored in the client cache for the user to access.
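Steps S3 and S4 together form a cache-first read path, sketched below. `FakeCluster` is a hypothetical in-memory stand-in and not a Ceph API; treating "present in the index" as "is a small file" and adding a `merged_file` field to the index entry are also assumptions made for the example.

```python
class FakeCluster:
    """Hypothetical stand-in for the Ceph cluster: object name -> bytes."""
    def __init__(self, objects):
        self.objects = objects
        self.reads = 0  # counts round trips to the cluster

    def read(self, name, offset=0, length=None):
        self.reads += 1
        data = self.objects[name]
        return data[offset:] if length is None else data[offset:offset + length]

def read_file(name, cache, index, cluster):
    if name in cache:                 # step S3: served from the client cache
        return cache[name]
    if name in index:                 # small file: locate it inside its merged file
        e = index[name]
        data = cluster.read(e["merged_file"], e["file_offset"], e["file_length"])
    else:                             # large file: stored as its own object
        data = cluster.read(name)
    cache[name] = data                # step S4: keep it for later requests
    return data

cluster = FakeCluster({"merged-0": b"helloceph!", "big.iso": b"LARGE"})
index = {"b.txt": {"merged_file": "merged-0", "file_offset": 5, "file_length": 5}}
cache = {}
first = read_file("b.txt", cache, index, cluster)
second = read_file("b.txt", cache, index, cluster)  # no cluster round trip
```

The second request never reaches the cluster, which is the interaction the method aims to avoid.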
Step S5: File prefetching mechanism, as shown in Fig. 5.
To improve file read speed effectively, a pre-read mechanism can also be used while a small file is being read: related small files are pre-read, and the requested small file is returned together with the prefetched files.
The correlation ratio Ψ between the requested file and each other small file in the merged file containing it is determined and compared with the set correlation threshold: when the correlation ratio Ψ of a small file exceeds the correlation threshold, that small file is pre-read and stored in the client cache. The correlation ratio Ψ is:
where n is the number of times the requested file was accessed within the statistics period, d is the number of times a small file in the merged file was accessed within the statistics period, and sum is the total number of times all small files were accessed within the statistics period.
Because client cache space is limited, during file prefetching, when the number of small files in the merged file whose correlation ratio Ψ exceeds the correlation threshold is greater than the given maximum prefetch count num, only the num small files with the highest correlation ratio Ψ are stored in the client cache together with the requested file. The maximum prefetch count num can be set manually or calculated with the following formula:
where math.floor(*) denotes rounding down, T_w is the maximum waiting time of the user, T_Ceph is the time from the Ceph cluster receiving the access request to returning the file, and T_pre is the time the Ceph cluster takes to prefetch one file.
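The selection rule in step S5 can be sketched without committing to a particular formula for Ψ or num. The sketch below assumes the correlation ratios have already been computed and that num is set manually, which the text allows; the function name, file names, and numeric values are illustrative assumptions.

```python
def select_prefetch(psi, threshold, num):
    """Return up to `num` file names whose correlation ratio exceeds `threshold`,
    ordered from highest to lowest ratio."""
    candidates = [(name, value) for name, value in psi.items() if value > threshold]
    candidates.sort(key=lambda item: item[1], reverse=True)
    return [name for name, _ in candidates[:num]]

# Hypothetical, precomputed correlation ratios for the other files in the merged file.
psi = {"b.txt": 0.42, "c.txt": 0.10, "d.txt": 0.77, "e.txt": 0.55}
to_prefetch = select_prefetch(psi, threshold=0.3, num=2)
```

Three files pass the threshold in this example, but only the two with the highest ratio are prefetched because num caps the count.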
Step S6: Cache optimization mechanism, as shown in Fig. 6.
The weight R_w of each cached file is calculated from its access frequency and access time. The size of R_w determines the priority of the cached object: cached files are sorted by R_w; files with relatively high R_w have high priority and are stored in the client's second-level cache, while files with relatively low R_w have low priority and are stored in the client's first-level cache. When a file newly read from the Ceph cluster must be stored in the client cache and cache space is insufficient, the files with the lowest weight R_w are deleted one by one from the first-level cache. If a cached file is not accessed for a long time, its weight R_w decays accordingly, which avoids files wasting cache space because they have not been accessed for a long time.
The weight R_w of a file is:
$$R_w = e^{-(N_t - N_r) \times t}$$
where N_t is the maximum capacity of the client cache, N_r is the number of times the cached file has been accessed, and t is the cache update time.
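A small sketch of the weight calculation and lowest-weight eviction follows. It reads the formula as R_w = exp(-(N_t - N_r) × t), which matches the statement that the weight decays over time; the units of t, the sample statistics, and the helper names are assumptions, and the two-level cache split is left out for brevity.

```python
import math

def weight(n_t, n_r, t):
    """R_w = exp(-(N_t - N_r) * t), following the formula in the text."""
    return math.exp(-(n_t - n_r) * t)

def evict_lowest(cache_stats, n_t):
    """Remove and return the cached file with the smallest weight."""
    victim = min(cache_stats, key=lambda name: weight(n_t, *cache_stats[name]))
    del cache_stats[victim]
    return victim

# name -> (N_r access count, t update time); the units of t are not given in the text.
stats = {"a.txt": (8, 1.0), "b.txt": (2, 1.0), "c.txt": (8, 3.0)}
victim = evict_lowest(stats, n_t=10)
```

With equal access counts, the entry with the larger t has the smaller weight, and with equal t the rarely accessed entry has the smaller weight, so the rarely accessed file is evicted first here.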
When files are written, the present invention first classifies them, then performs cluster analysis on the small files with the K-means clustering algorithm to obtain associated groups, and then merges the files in each associated group before storing them in the Ceph cluster. During merging, an index file is generated from the mapping between the small files and their merged files and is stored on the client, which improves small-file lookup efficiency. When a file is read, small files are prefetched and cached according to the correlation between the requested file and the other files in the merged file block. Under the cache optimization mechanism, small files are prefetched into the cache and a weight factor is calculated for each; the weight decreases as time passes, and a file whose weight falls below the given threshold is removed from the cache. This reduces wasted cache space and raises the hit rate for cached files. By reducing the interaction between user and cluster, the present invention shortens user access time, improves the storage and read efficiency for massive small files, and improves the overall performance of the system.
It should be noted that the embodiment described above is illustrative and does not limit the present invention, so the invention is not restricted to the specific embodiment above. Any other embodiment that a person skilled in the art obtains under the inspiration of the present invention, without departing from its principles, is regarded as falling within the protection of the present invention.

Claims (8)

1. A Ceph-based access optimization method for massive small files, characterized by comprising the following steps:
Step 1, obtain the file names and file sizes of the files to be uploaded by the client within the same time period, and classify these files according to the set file threshold: a file to be uploaded whose size exceeds the file threshold is judged to be a large file and is uploaded directly to the Ceph cluster; a file to be uploaded whose size is equal to or less than the file threshold is judged to be a small file;
Step 2, group the small files by association using the K-means clustering algorithm, sort the small files in each group from largest to smallest by file size, merge the small files in each group in that order and upload the result to the Ceph cluster, and at the same time generate an index file from the mapping between each small file and its merged file;
Step 3, when the user issues an access request, the client judges whether the requested file is in the client cache: if it is, the requested file is accessed directly from the client cache; otherwise, the client sends the request to the Ceph cluster;
Step 4, the Ceph cluster receives the request and determines the file type from the name of the requested file; if the requested file is a large file, it is read directly from the Ceph cluster and stored in the client cache for the user to access; if the requested file is a small file, its specific position within the merged file is first determined from the index file, and the requested file is then read from the Ceph cluster and stored in the client cache for the user to access.
2. The Ceph-based access optimization method for massive small files according to claim 1, characterized in that, in step 1, the file threshold is set according to the file block size of the Ceph cluster.
3. The Ceph-based access optimization method for massive small files according to claim 1, characterized in that, in step 2, while the small files in each group are being merged, it must be judged whether the combined size of the small file to be merged and the merged file generated so far exceeds the file threshold; if the combined size is less than or equal to the file threshold, the small file is appended directly to the existing merged file; otherwise, a new merged file must be requested.
4. The Ceph-based access optimization method for massive small files according to claim 1, characterized in that, in step 2, the index file has the structure <key, value>, where key stores the file name of the small file and value stores the starting position file_offset of the small file within the merged file and the size file_length of the small file.
5. The Ceph-based access optimization method for massive small files according to claim 1, characterized by further comprising a file prefetching process, namely:
When the requested file is read from the Ceph cluster and is a small file, the correlation ratio Ψ between the requested file and each small file in the merged file containing it is calculated, and the small files in that merged file whose correlation ratio Ψ exceeds the correlation threshold are read together with the requested file and stored in the client cache; the correlation ratio Ψ is:
where n is the number of times the requested file was accessed within the statistics period, d is the number of times a small file in the merged file was accessed within the statistics period, and sum is the total number of times all small files were accessed within the statistics period.
6. The Ceph-based access optimization method for massive small files according to claim 5, characterized in that, during file prefetching, when the number of small files in the merged file whose correlation ratio Ψ exceeds the correlation threshold is greater than the given maximum prefetch count num, only the num small files with the highest correlation ratio Ψ are stored in the client cache together with the requested file.
7. The Ceph-based access optimization method for massive small files according to claim 6, characterized in that the maximum prefetch count num is:
where math.floor(*) denotes rounding down, T_w is the maximum waiting time of the user, T_Ceph is the time from the Ceph cluster receiving the access request to returning the file, and T_pre is the time the Ceph cluster takes to prefetch one file.
8. The Ceph-based access optimization method for massive small files according to claim 1 or 5, characterized by further comprising a process of optimizing the files held in the client cache: the weight R_w of each cached file is calculated, and the cached files are sorted by R_w; files with high weight are stored in the client's second-level cache, and files with low weight are stored in the first-level cache; when a file newly read from the Ceph cluster must be stored in the client cache and cache space is insufficient, the files with the lowest weight R_w are deleted one by one from the first-level cache; the weight R_w of a file is:
$$R_w = e^{-(N_t - N_r) \times t}$$
where N_t is the maximum capacity of the client cache, N_r is the number of times the cached file has been accessed, and t is the cache update time.
CN201810343960.6A 2018-04-17 2018-04-17 Ceph-based access optimization method for mass small files Expired - Fee Related CN108710639B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810343960.6A CN108710639B (en) 2018-04-17 2018-04-17 Ceph-based access optimization method for mass small files

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810343960.6A CN108710639B (en) 2018-04-17 2018-04-17 Ceph-based access optimization method for mass small files

Publications (2)

Publication Number Publication Date
CN108710639A true CN108710639A (en) 2018-10-26
CN108710639B CN108710639B (en) 2021-05-14

Family

ID=63867222

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810343960.6A Expired - Fee Related CN108710639B (en) 2018-04-17 2018-04-17 Ceph-based access optimization method for mass small files

Country Status (1)

Country Link
CN (1) CN108710639B (en)

Also Published As

Publication number Publication date
CN108710639B (en) 2021-05-14

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
EE01 Entry into force of recordation of patent licensing contract

Application publication date: 20181026

Assignee: Guangxi Boyan Technology Co.,Ltd.

Assignor: GUILIN University OF ELECTRONIC TECHNOLOGY

Contract record no.: X2022450000542

Denomination of invention: A Ceph based Access Optimization Method for Massive Small Files

Granted publication date: 20210514

License type: Common License

Record date: 20221229

CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20210514