CN107066505A - System and method for performance-optimized small-file storage and access - Google Patents

System and method for performance-optimized small-file storage and access Download PDF

Info

Publication number
CN107066505A
CN107066505A (application CN201710015554.2A)
Authority
CN
China
Prior art keywords
file
block
small file
data
write
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710015554.2A
Other languages
Chinese (zh)
Inventor
聂东旭 (Nie Dongxu)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhengzhou Yunhai Information Technology Co Ltd
Original Assignee
Zhengzhou Yunhai Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhengzhou Yunhai Information Technology Co Ltd
Priority to CN201710015554.2A
Publication of CN107066505A
Legal status: Pending

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10File systems; File servers
    • G06F16/17Details of further file system functions
    • G06F16/172Caching, prefetching or hoarding of files
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10File systems; File servers
    • G06F16/16File or folder operations, e.g. details of user interfaces specifically adapted to file systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10File systems; File servers
    • G06F16/18File system types
    • G06F16/182Distributed file systems

Abstract

The present invention relates to the technical field of small-file storage methods in distributed file systems, and in particular to a system and method for performance-optimized small-file storage and access. The invention stores logically contiguous data in contiguous physical disk space wherever possible, lets the cache play the role of the metadata server, and raises cache utilization through a simplified file information node, thereby improving small-file access performance. When writing data, the updated file and the related data in its folder domain are aggregated into a single I/O request, which reduces the number of file fragments and improves storage-space utilization. File transmission exploits the principle of locality: batches of frequently accessed small files are sent in advance, reducing connection-setup overhead and improving file transmission performance.

Description

System and method for performance-optimized small-file storage and access
Technical field
The present invention relates to the technical field of small-file storage methods in distributed file systems, and in particular to a system and method for performance-optimized small-file storage and access.
Background art
With the continuous development of Internet technology, the Internet is flooded with all kinds of services and massive amounts of data. To organize and manage these mass data better, various distributed file system architectures have been proposed. Because most data on the Internet takes the form of frequently accessed small files, and because ordinary users' storage accesses are dominated by small files, research on the read/write performance of high-frequency small files on the Internet has important practical significance.
In distributed file systems, large files are usually sliced with striping techniques and distributed across multiple data servers, which increases the concurrency of user accesses to a file and thus improves access performance for large files. Small files (≤64KB), however, do not lend themselves to striping, so a single file is usually stored whole on a single data server. Once the number of small files reaches a certain scale, heavy repeated access to them burdens the data servers and creates I/O bottlenecks.
Traditional distributed file systems face three main problems in small-file management:
1) small files are accessed frequently, requiring many disk accesses, so disk I/O performance is low;
2) because the files are small, file fragments form easily and waste disk space;
3) establishing a connection for each small-file request easily introduces network delay.
Optimizing the storage and access performance of small files is therefore essential. Current research on optimized small-file storage mainly targets the low I/O performance and easy fragmentation of small-file access, but does not consider the file changes brought about by other operations.
Other related work includes optimizations of existing distributed file systems and of file transmission. That research, however, mainly applies to the transmission of ordinary files and does not optimize the transmission performance of small files.
Summary of the invention
To solve the problems of the prior art, the invention provides a system and method for performance-optimized small-file storage and access. It stores logically contiguous data in contiguous physical disk space wherever possible, lets the cache play the role of the metadata server, and raises cache utilization through a simplified file information node, improving small-file access performance. When writing data, the updated file and the related data in its folder domain are aggregated into a single I/O request, reducing the number of file fragments and improving storage-space utilization. File transmission exploits the principle of locality: batches of frequently accessed small files are sent in advance, reducing connection-setup overhead and improving file transmission performance.
The technical scheme of the present invention is as follows:
A system for performance-optimized small-file storage and access comprises five modules: a file system interface, a folder domain manager, a file information node manager, a block manager and a file cache manager. The file system interface encapsulates the other modules and provides a flexible, unified file access interface to the upper layer. The folder domain manager manages the folder domains; it is responsible for each file information node of a given folder and for all file data stored in that folder. The file information node manager manages the file information nodes. The block manager is responsible for the space management of disk blocks; allocation of disk space is also handled by this module. The file cache manager manages the file cache.
The file information node manager also maintains a buffer of file information nodes, which stores the nodes that were accessed recently or are accessed frequently.
A method for performance-optimized small-file storage and access comprises:
A. Disk space is divided into multiple blocks, each 64KB in size. A file of ≤64KB is stored entirely within a single block and never spans two blocks; each file's data is stored in contiguous disk space.
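The placement rule of step A can be illustrated with a minimal sketch. The `Block` class and `place_file` helper are assumptions introduced for illustration, not the patent's actual implementation; the sketch only demonstrates the rule that a small file must fit whole in one 64KB block and is never split across two.

```python
BLOCK_SIZE = 64 * 1024  # 64KB blocks, as in step A

class Block:
    """A fixed-size disk block holding whole small files back to back."""
    def __init__(self, block_id):
        self.block_id = block_id
        self.used = 0    # bytes occupied from the start of the block
        self.files = {}  # file_id -> (start_position, length)

    def free(self):
        return BLOCK_SIZE - self.used

    def try_place(self, file_id, size):
        # A small file (<= 64KB) must fit entirely in one block;
        # it is never split across two blocks.
        if size > BLOCK_SIZE or size > self.free():
            return False
        self.files[file_id] = (self.used, size)
        self.used += size
        return True

def place_file(blocks, file_id, size):
    """Place a small file in the first block with room, else open a new block."""
    for b in blocks:
        if b.try_place(file_id, size):
            return b
    b = Block(block_id=len(blocks))
    blocks.append(b)
    b.try_place(file_id, size)
    return b
```

Because files are packed contiguously from the start of each block, the stored `(start_position, length)` pair is exactly the disk-space information the simplified inode later records.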
B. When the system reads a file, it uses pre-reading: the other files in the same block are read out together with it.
C. The cache plays the role of the metadata server: the information of the file information nodes is kept in the cache, and the inode data structure is simplified so that each file information node records only the disk-space information of its file. The simplified inode data structure is shown in the following table:
File_id
StartPosition
Length
Weight
Block_id
i_count
Lock
Wherein, File_id is the file identifier;
StartPosition is the starting position of the file within its block;
Length is the length of the file;
Weight is the file weight, which in this system represents the file's access frequency;
Block_id is the identifier of the block in which the file is stored;
i_count is the access counter of the file;
Lock is the file lock.
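The simplified inode of step C can be sketched as a plain record type. Field names follow the table above; the concrete Python types are assumptions, since the patent specifies only the members, not their representation.

```python
from dataclasses import dataclass

@dataclass
class Inode:
    """Simplified file information node: only the on-disk location of the
    file plus a few bookkeeping members, so that many inodes fit in the
    cache. Fields mirror the table in the description."""
    file_id: int         # file identifier
    start_position: int  # starting position of the file within its block
    length: int          # length of the file in bytes
    weight: int          # file weight = access frequency of the file
    block_id: int        # identifier of the block storing the file
    i_count: int         # access counter of the file
    lock: bool           # file lock flag
```

The point of the simplification is visible here: no creation time, last-access time or owner fields, only what an I/O server needs to locate and serve the file.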
D. In write operations, an optimization strategy reduces the file fragments caused by file deletion or modification; the strategy comprises a write optimization for updating a file and a write optimization for creating a file.
E. File transmission is optimized as follows: for each folder, a sorted list is formed according to the access frequency of each file in the folder; when a user accesses a file under the folder, the system automatically sends along the high-access-frequency files in the list.
The write optimization when updating a file specifically comprises:
1) when the data of a file is updated, the updated file, the files stored after it in the current block, and suitable files taken from scattered blocks are aggregated and written to disk as a single I/O request; all of these files reside in the cache;
2) if the current block contains, after the updated file, a file that is not in the cache, that file is excluded from the aggregated write;
3) if the updated file has grown so that the original block no longer has enough space for it, the system first reads the file out and selects another block: it looks for a suitable scattered block and writes the updated file together with the files of that block as one aggregated write; if no suitable scattered block is found, a fragment gap of sufficient size is used instead, with the same aggregation mechanism.
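Cases 1) and 2) above amount to a simple scan rule; a minimal sketch under assumed data shapes (a list of file ids in on-disk order and a set of cached file ids — both assumptions introduced for illustration) is:

```python
def aggregate_update(block_files, updated_id, cache):
    """Collect the updated file and every file stored after it in the
    same block, stopping before the first file that is not in the cache
    (case 2: an uncached file cannot safely be rewritten). Returns the
    file ids to rewrite as a single I/O request."""
    start = block_files.index(updated_id)  # position of the updated file
    batch = []
    for fid in block_files[start:]:
        if fid not in cache:
            break  # do not aggregate past an uncached file
        batch.append(fid)
    return batch
```

Writing the returned batch as one contiguous I/O request is what closes the gaps between neighbouring files and keeps fragment count low.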
The write optimization when creating a file specifically comprises:
1) when a new file is created, a suitable scattered block is first sought in the current folder domain, i.e. a scattered block whose remaining space exceeds the size of the new file and whose files are all in the cache;
2) if no suitable scattered block is found, the system writes into a suitable fragment gap in the folder domain; before writing, the cached neighbouring files are aggregated and rewritten together, and if space remains, appropriate files can also be taken out of scattered blocks and written along with them;
3) if no suitable fragment is found either, a new block is created and the new file data is written into it.
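The three-step fallback above is essentially a placement decision; a minimal sketch, with the tuple shapes of `scattered_blocks` and `fragments` assumed for illustration, is:

```python
def choose_placement(scattered_blocks, fragments, size, cache):
    """Decide where a new file goes, following the three creation steps:
    1) a scattered block with enough free space whose files are all cached,
    2) else a fragment gap of sufficient size in the folder domain,
    3) else a brand-new block.
    scattered_blocks: list of (block_id, free_bytes, file_ids)
    fragments: list of (block_id, offset, gap_bytes)"""
    for block_id, free, file_ids in scattered_blocks:
        if free >= size and all(f in cache for f in file_ids):
            return ("scattered_block", block_id)   # step 1
    for block_id, offset, gap in fragments:
        if gap >= size:
            return ("fragment", block_id)          # step 2
    return ("new_block", None)                     # step 3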
In method E, to avoid transmitting too many files, a high-access-frequency threshold TF is set, and the files whose access frequency exceeds TF are divided, in order, into multiple groups; each group may contain several files, and the total size of the files in a group does not exceed 64KB. When a user requests a file in the current folder, the system sends the files of one group together, in order. TF is calculated as follows:
TF = N × F_avg (N ≥ 1); (1)
where F_avg = (F_1 + F_2 + … + F_M)/M is the average access frequency, F_i is the access frequency of each file in the folder, M is the number of files in the current folder, and N is set by the user.
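Formula (1) and the selection of hot files can be sketched directly. Taking F_avg as the mean of the per-file frequencies follows the definitions above; the function names are illustrative assumptions.

```python
def high_frequency_threshold(freqs, n=1):
    """TF = N * F_avg, where F_avg is the mean access frequency of the
    M files in the current folder (formula (1)); N >= 1 is user-set."""
    assert n >= 1 and freqs
    f_avg = sum(freqs) / len(freqs)
    return n * f_avg

def select_high_frequency(files, n=1):
    """Return (file_id, freq) pairs whose access frequency exceeds TF,
    sorted by descending frequency (the folder's sorted list)."""
    tf = high_frequency_threshold([f for _, f in files], n)
    hot = [(fid, f) for fid, f in files if f > tf]
    return sorted(hot, key=lambda x: -x[1])
```

Raising N shrinks the hot set, so the user-defined N trades pre-sent data volume against the chance of a useful prefetch.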
To address the limitations of current research on small-file storage and access, the invention starts from the small-file storage access structure and data layout and proposes a performance-optimized small file storage access (SFSA) strategy. In the SFSA design, an optimized small-file storage structure is first proposed: logically contiguous data is stored on contiguous disk space as much as possible, which raises the hit rate of pre-read data and reduces disk seek time. Secondly, the cache plays the role of the metadata server, and simplifying the file information node reduces the cost of looking up file node information, improving I/O performance. In addition, write operations are optimized to reduce the file fragments produced by file deletion and modification. Finally, based on the principle of locality, a file transmission strategy of "sending batches of high-frequency files in advance" is proposed to reduce the transmission delay of small files.
The beneficial effects of the technical scheme provided by the invention are as follows:
Addressing the problems of current distributed systems in small-file management, the invention proposes a performance-optimized small file storage access (SFSA) method; the SFSA system improves both the access performance of small files and the utilization of disk space.
In addition, following the principle of locality, the invention adopts the method of "sending batches of high-frequency files in advance", which also optimizes file transmission performance. Experiments verify that the proposed method effectively optimizes the storage and access performance of small files.
Brief description of the drawings
To illustrate the technical scheme of the embodiments more clearly, the accompanying drawings used in the description of the embodiments are briefly introduced below. The drawings described below are only some embodiments of the invention; those of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is the architecture of the system for performance-optimized small-file storage and access of the invention;
Fig. 2 is an example of small-file storage in the system;
Fig. 3 is an example of file storage within the same folder in the system;
Fig. 4 shows the data structure of the folder domain in the system;
Fig. 5 shows the simplified inode data structure in the system;
Fig. 6 is a schematic diagram of the write optimization when updating a file;
Fig. 7 is an example of the write optimization when updating a file;
Fig. 8 is a schematic diagram of the write optimization when creating a file;
Fig. 9 compares the average server response time before and after the disk I/O performance optimization;
Fig. 10 compares disk space utilization before and after the write optimization;
Fig. 11 compares the average file transmission delay before and after the file transmission optimization.
Detailed description
To make the objects, technical solutions and advantages of the invention clearer, the embodiments of the invention are described in further detail below with reference to the accompanying drawings.
The structure of the system of this embodiment (the SFSA system) is shown in Fig. 1. The system mainly comprises five modules: the file system interface (FSI), the folder domain manager (FDM), the inode manager (IM), the block manager (BM) and the file cache manager (FCM).
The FSI encapsulates the other modules and provides a flexible, unified file access interface to the upper layer. The FDM manages the folder domains (folder domain, FD); it is responsible for each file information node of a given folder and for metadata such as the block ids of all file data stored in that folder. The IM manages the file information nodes; it also maintains a buffer of file information nodes, which mainly stores the nodes that were accessed recently or are accessed frequently. The BM is responsible for the space management of disk blocks, and allocation of disk space is also handled by this module. The FCM is responsible for managing the file cache.
In this embodiment, the SFSA system stores large numbers of small files by allocating large contiguous disk spaces. Disk space is first divided into blocks of 64KB each; the contiguous disk space of a large file is composed of a series of such blocks. When a file is small (defined as size ≤64KB), it is stored entirely within a single block and never spans two blocks; each file's data is stored in contiguous disk space. Fig. 2 shows an example of multiple small files stored in one block.
In Fig. 2, F1, F2, F3, F4 and F5 are five files stored contiguously one after another, e.g. F1 and F2, F2 and F3, F4 and F5. The black portions are the fragments of the block; when a file smaller than one of these fragments arrives, it should preferentially be stored in the fragment.
To improve the hit rate of pre-read data, the SFSA system stores logically contiguous data on contiguous physical disk space as much as possible: the data of a single file, or the file data under the same folder, is stored on contiguous disk blocks. Each folder owns one or more blocks, and those blocks hold only the files of that folder, as shown in Fig. 3.
The FD is the most important data structure in the SFSA system; its design underpins the design and implementation of the following optimization strategies. Its data structure is shown in Fig. 4:
Wherein, Folder_id is the identifier of the folder; the system assigns each file or folder an independent identifier to distinguish them from one another.
Scattered Block ID List and Good Block ID List are two lists of identifiers of the blocks storing this folder's file data: the former stores the identifiers of scattered blocks, the latter those of non-scattered blocks. The two lists are used for defragmenting the disk space.
Inode List is a linked list of inodes sorted by file access frequency; it is designed to support the optimized file transmission.
F_count is the access counter of the folder.
Rwlock is the read-write lock for folder access.
Normally, a distributed file system stores the attribute information of a file (the file information node) on the metadata server; an I/O server only needs the file's disk-space information to access it. Therefore on the I/O server only the disk-space information of a file needs to be recorded, not its other attributes such as creation time, last access time and owning user. Based on this, the inode data structure is simplified to retain a file's disk-space information and a small number of data members belonging to it. The simplified inode data structure is shown in Fig. 5.
File_id is the file identifier.
StartPosition is the starting position of the file within its block.
Length is the length of the file.
Weight is the file weight; in the SFSA system it represents the file's access frequency.
Block_id is the identifier of the block in which the file is stored.
i_count is the access counter of the file.
Lock is the file lock.
In this embodiment, disk I/O performance is optimized as follows:
In modern disk devices, the delay of reading or writing a small amount of data is mainly spent on head seek and positioning; once positioned, reading one data block takes little more time than reading several contiguous data blocks.
Therefore, combined with the optimized data storage structure above — storing logically contiguous data on contiguous physical disk space — and following the principle of locality, when the system reads a file it pre-reads the other files in the same block together with it, reducing the number of disk I/Os.
A pre-read scheduling algorithm is given below:
Algorithm 1. Pre-read scheduling algorithm ReadFile(File_id):
① if (IsCache(File_id)) return FileBuffer(File_id); /* if the current file is already in the cache, directly return its data from the cache */
② Block_id = GetBlockId(File_id); /* obtain, from the file identifier, the identifier of the block storing the current file */
③ OutStream = ReadBlockData(Block_id); /* read all data of that block */
④ CacheFilesBufferFromBlock(OutStream, Block_id); /* cache the data of all files stored in this block */
⑤ return FileBuffer(File_id); /* the file data is now in the cache; directly return it */
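Algorithm 1 can be sketched as follows, with the cache and on-disk layout modelled as plain dictionaries (an assumption made purely for illustration):

```python
def read_file(file_id, cache, block_of, block_data):
    """Pre-read scheduling (Algorithm 1): on a cache miss, read the
    whole block holding the file and cache every file stored in it, so
    later requests for neighbouring files hit the cache.
    block_of: file_id -> block_id; block_data: block_id -> {file_id: bytes}."""
    if file_id in cache:                   # step 1: cache hit
        return cache[file_id]
    block_id = block_of[file_id]           # step 2: locate the block
    files_in_block = block_data[block_id]  # step 3: read the whole block
    cache.update(files_in_block)           # step 4: cache all its files
    return cache[file_id]                  # step 5: return the requested data
```

One block read thus serves many subsequent requests, which is exactly the locality bet the contiguous layout is designed to win.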
Under normal conditions, the attribute information of files (the file information nodes) is stored on the metadata server, and an I/O server only needs a file's disk-space information to access it. In that arrangement, however, the I/O overhead on the metadata server is very large: its disk must be accessed frequently, so the I/O performance of the whole system is poor. In the SFSA architecture, the cache plays the role of the metadata server and holds the information of the file information nodes. By simplifying the inode data structure so that each node keeps only a file's disk-space information plus a few other useful members, cache utilization improves and the cache can hold a large number of file information nodes. In this way the number of disk accesses and the cost of reading file node information are both reduced.
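The inode buffer that serves this role can be sketched as a small bounded cache. The patent retains "recently accessed and high-frequency" nodes; plain least-recently-used eviction is an assumption standing in for that policy.

```python
from collections import OrderedDict

class InodeCache:
    """Bounded buffer of file information nodes, evicting the least
    recently used entry when full (an LRU stand-in for the patent's
    recency-plus-frequency retention policy)."""
    def __init__(self, capacity):
        self.capacity = capacity
        self._d = OrderedDict()

    def get(self, file_id):
        if file_id not in self._d:
            return None
        self._d.move_to_end(file_id)  # mark as recently used
        return self._d[file_id]

    def put(self, file_id, inode):
        if file_id in self._d:
            self._d.move_to_end(file_id)
        self._d[file_id] = inode
        if len(self._d) > self.capacity:
            self._d.popitem(last=False)  # evict least recently used
```

Because the simplified inode is small, a cache of modest byte size holds many nodes, which is what lets it stand in for the metadata server.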
In this embodiment, the write optimization that reduces wasted fragment space works as follows:
In a general distributed file system, a large number of delete-file and modify-file requests leaves many fragments between small files; these fragments are hard to reuse, which likewise wastes a large amount of disk space.
To address this, the SFSA system optimizes write operations: an optimization strategy in the write path reduces the file fragments caused by file deletion or modification. Write operations occur mainly in two situations: updating a file and creating a file.
1) Write optimization when updating a file:
The overall idea, shown in Fig. 6, is: when file data in a folder domain changes (e.g. F3 in Fig. 6), the related files in the folder domain are aggregated and written to disk as a single I/O request, reducing the fragments between files.
The more specific solution, shown in Fig. 7, is as follows:
① When the data of a file (F3) is updated, the updated file, the files stored after it in the current block, and suitable files taken from scattered blocks are aggregated and written to disk as a single I/O request, as in Fig. 7(a). Note that these files all reside in the cache.
② If the current block contains, after the updated file, a file that is not in the cache, such as F5 in Fig. 7(b), then F5 is excluded from the aggregated write, so the amount of data written at once runs from the start address of F3 to the start address of F5.
③ Cases ① and ② assume the original block still has enough space for the updated file. When the updated file has grown so that the original block can no longer hold it, the system first reads the file out and selects another block, as shown in Fig. 7. Fig. 7(c) shows the system finding a suitable scattered block and aggregating the updated file with the files of that block into one write. If no suitable scattered block is found, a fragment gap of sufficient size is used instead, with the same aggregation mechanism.
2) Write optimization when creating a file, as shown in Fig. 8:
① When a new file is created, a suitable scattered block is first sought in the current folder domain, i.e. a scattered block whose remaining space exceeds the size of the new file and whose files are all in the cache, as shown in Fig. 8(a).
② If no suitable scattered block is found, the system writes into a suitable fragment gap in the folder domain; before writing, the cached neighbouring files are aggregated and rewritten together, as shown in Fig. 8(b). If space remains, appropriate files can also be taken from scattered blocks and written along with them, as in Fig. 8(c).
③ If no suitable fragment is found either, a new block is created and the new file data is written into it.
The optimization above resembles a DFS defragmentation strategy, except that the elementary unit organized here is a whole file of variable length rather than the fixed-size file blocks of DFS.
In this embodiment, file transmission is optimized as follows:
In a distributed system, small files do not lend themselves to striping, so a whole file can only be stored on a single data server. When a user requests a small file, the system establishes a TCP connection between the user and the corresponding data server, transmits the file data to the user, and finally closes the connection. Because establishing a TCP connection requires a three-way handshake, it adds to the file transmission delay. Some current distributed systems keep connections open instead, but when a large number of users file requests, keeping a connection for every user burdens the system and wastes resources.
Therefore, for each folder, a sorted list is formed according to the access frequency of each file in the folder. When a user accesses a file under the folder, the system automatically sends along the high-access-frequency files in the list. To avoid transmitting too many files, a high-access-frequency threshold TF is set, and the files whose access frequency exceeds TF are divided, in order, into multiple groups; each group may contain several files, and the total size of the files in a group does not exceed 64KB. When a user requests a file in the current folder, the system sends the files of one group together, in order, reducing file transmission delay. TF is calculated as follows:
TF = N × F_avg (N ≥ 1); (1)
where F_avg = (F_1 + F_2 + … + F_M)/M is the average access frequency, F_i is the access frequency of each file in the folder, M is the number of files in the current folder, and N is set by the user.
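The grouping of hot files into ≤64KB batches can be sketched as a greedy packing pass over the frequency-sorted list. The `(file_id, size)` input shape is an assumption made for illustration.

```python
def group_hot_files(hot_files, limit=64 * 1024):
    """Greedily pack the frequency-sorted high-frequency files into
    groups whose total size never exceeds 64KB; one group is pushed
    along with each user request. hot_files: (file_id, size) pairs,
    already sorted by descending access frequency."""
    groups, current, current_size = [], [], 0
    for fid, size in hot_files:
        if current and current_size + size > limit:
            groups.append(current)         # close the full group
            current, current_size = [], 0
        current.append(fid)
        current_size += size
    if current:
        groups.append(current)
    return groups
```

Keeping each batch under one block's worth of data bounds the extra delay any single prefetch adds to the user's request.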
Therefore, following the principle of locality, this embodiment proposes a method of "sending batches of high-frequency files in advance", described by the following algorithm:
SendFile(ReqFileId):
1. Folder_id = GetFolderId(ReqFileId); /* obtain, from the identifier of the requested file, the identifier of the folder containing it */
2. ReqFileBuffer = ReadFile(ReqFileId); /* read the requested file data */
3. BatchFilesBuffer = GetHighFreqBatchFilesBuffer(Folder_id); /* obtain the batch of high-frequency file data */
4. SendToClient(ReqFileBuffer + BatchFilesBuffer); /* send the requested file data and the batch of high-frequency file data to the user together */
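The four steps of SendFile can be sketched with the system's lookup, read and transmit routines injected as callables — those callables are assumptions standing in for the real implementation:

```python
def send_file(req_file_id, folder_of, read_file, hot_batch_of, send):
    """SendFile sketch: reply with the requested file plus the
    pre-selected batch of high-frequency files from the same folder,
    mirroring the four algorithm steps above."""
    folder_id = folder_of(req_file_id)  # 1. folder of the requested file
    req_buf = read_file(req_file_id)    # 2. read the requested file data
    batch_buf = hot_batch_of(folder_id) # 3. batch of hot-file data
    send(req_buf + batch_buf)           # 4. one combined reply to the user
```

Sending the batch on the already-open connection is the point: the prefetched files avoid their own three-way handshakes.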
When a user requests a file in a folder, the high-access-frequency files of that folder are sent along with it, but only one portion of files is sent at a time; the total size of the portion is set to no more than 64KB because a transmission of that size adds little delay for the user.
To examine the actual effect of the SFSA system of this embodiment, a large number of simulation experiments were carried out in a laboratory environment, mainly to verify and analyse the effect of the three optimization strategies. The experimental environment consisted of five PCs, each configured with a P4 2.6GHz processor and 1GB of memory, connected by 100Mbps Ethernet.
1. Test of the disk I/O performance optimization:
Two PCs served as two kinds of servers, one without I/O optimization and one with it. The other three PCs acted as clients, continuously sending read-file requests to the two servers; the servers received and processed the requests while the corresponding experimental data was gathered on the server side. About 20 folders were placed on each server, each holding on average 100 files. Fig. 9 compares the average server response time before and after the disk I/O performance optimization.
As Fig. 9 shows, performance improves markedly after the disk I/O optimization, by about 34.7% over the unoptimized case.
2. Test of write performance optimization:
In this experiment, 200 files are stored when the system is initialized. Fig. 10 compares the disk space utilization before and after the write optimization.
As can be seen from Fig. 10, disk space utilization is highest at initialization, reaching about 95.6%. After the system has run for a while, the continual creation, deletion, and update of files causes the disk space utilization to decline, but it remains at about 91%, compared with 83.4% without optimization; the SFSA optimization method of the present invention therefore achieves better disk space utilization.
3. Test of file transmission performance optimization:
In this experiment, following the ordinary case, the value of N in the TF expression is set to 4. Fig. 11 shows the comparison of the average file transmission delay before and after the file transmission optimization.
As can be seen from Fig. 11, after a user has requested files in the same folder about 10 times, the file transmission delay is noticeably improved; compared with before the optimization, the SFSA optimization method of the present invention reduces network delay by about 14%.
The foregoing are only preferred embodiments of the present invention and are not intended to limit the invention; any modification, equivalent substitution, improvement, and the like made within the spirit and principles of the present invention shall be included within the scope of protection of the present invention.

Claims (6)

1. A performance-optimized small-file storage access system, comprising five modules: a file system interface, a folder-domain manager, a file-information-node manager, a block manager, and a file-buffer manager; the file system interface encapsulates the other modules and provides a flexible, unified file-access interface to the upper layer; the folder-domain manager manages the folder domains and is responsible for the file information nodes of each folder and for all the file data stored in that folder; the file-information-node manager manages the file information nodes; the block manager manages the disk-block space, and the allocation of disk space is also the responsibility of this module; the file-buffer manager manages the file cache.
2. The performance-optimized small-file storage access system according to claim 1, characterized in that the file-information-node manager further provides a buffer of file information nodes, which is used to hold the file information nodes that have been accessed recently and with high frequency.
3. A performance-optimized small-file storage access method, comprising:
A. dividing the disk space into multiple blocks, each block being 64KB in size; when a file of no more than 64KB is encountered, the file can only be stored in a single block and cannot span two blocks, and the data of each file is stored in contiguous disk space;
B. when the system reads a file, using pre-reading so that the files in the same block are read out together;
C. having the cache serve the role of a metadata server, keeping the information of the file information nodes in the cache, and using a simplified Inode data structure so that each file information node retains the disk-space information of its file; the Inode data structure is as shown in the following table:

File_id | StartPosition | Length | Weight | Block_id | Icount | Lock

wherein File_id is the file identifier;
StartPosition is the starting position of the file within the block;
Length is the length of the file;
Weight is the file weight, which in the system of the present invention represents the access frequency of the file;
Block_id is the identifier of the block in which the file is stored;
Icount is the access counter of the file;
Lock is the file lock;
D. using write-optimization methods in write operations to reduce file fragmentation caused by the deletion or modification of files, the optimization methods including a write-optimization method for updating a file and a write-optimization method for creating a file;
E. a method for optimizing file transmission, specifically: for each folder, a sorted list is formed according to the access frequency of each file in the folder; when a user accesses a file under this folder, the system automatically sends the high-access-frequency files in the sorted list along with it.
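The simplified Inode table of step C, together with the single-block rule of step A, can be sketched as a small Python structure. Field names follow the table above; the helper and the sample values are illustrative only:

```python
from dataclasses import dataclass

BLOCK_SIZE = 64 * 1024  # step A: each block is 64KB

@dataclass
class Inode:
    file_id: str
    start_position: int  # starting offset of the file within its block
    length: int          # length of the file in bytes
    weight: int          # file weight, i.e. the access frequency of the file
    block_id: int        # identifier of the block holding the file
    icount: int = 0      # access counter of the file
    lock: bool = False   # file lock

def fits_in_block(inode: Inode) -> bool:
    # Step A: a small file (<= 64KB) lies in a single block and may not span two.
    return inode.start_position + inode.length <= BLOCK_SIZE
```

A 4KB file starting at offset 1024 fits; a 2KB file starting at offset 63KB would spill past the 64KB boundary and must be placed elsewhere.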
4. The performance-optimized small-file storage access method according to claim 3, characterized in that the write-optimization method for updating a file specifically comprises:
1) when the data of a file is updated, then, starting from that file, the files located after it in the current block, together with files taken out of scattered blocks, are aggregated and written to disk as a single I/O request, all of these files being kept in the cache;
2) if the current block contains a file, located after the updated file, that has not been placed into the cache, no aggregated write is performed on it;
3) if the updated file has become larger, so that the original block no longer has enough space to store it, the system first reads this file out and reselects a suitable block to store it: the system finds a suitable scattered block and writes the updated file together with the files of that scattered block in an aggregated write; if no suitable scattered block can be found, a space of suitable size is found among the fragments and the aggregated write is performed with the same mechanism.
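Steps 1) and 2) of the update write optimization can be sketched as follows, under the simplifying assumptions that a block records its files in on-disk order and the cache is a plain byte map (all names hypothetical):

```python
def aggregate_update_write(updated_id, new_data, block, cache, scattered):
    """Step 1: starting from the updated file, gather the files located after it
    in the current block, plus files taken out of scattered blocks, and write
    them back as a single I/O request, keeping them all in the cache."""
    order = block["order"]                  # file ids in on-block order
    i = order.index(updated_id)
    cache[updated_id] = new_data
    write_set = [updated_id]
    for fid in order[i + 1:]:
        if fid not in cache:                # step 2: a trailing file not yet in
            break                           # the cache is excluded from the write
        write_set.append(fid)
    write_set += list(scattered)            # files pulled from scattered blocks
    payload = b"".join(cache[fid] for fid in write_set)
    return write_set, payload               # one aggregated I/O request to disk
```

With all trailing files cached, the whole tail of the block is rewritten in one request; an uncached trailing file cuts the aggregation short.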
5. The performance-optimized small-file storage access method according to claim 3, characterized in that the write-optimization method for creating a file specifically comprises:
1) when a new file is created, a suitable scattered block is first sought in the current folder domain, that is, a scattered block whose remaining space is larger than the size of the new file and whose file data are all in the cache;
2) when no suitable scattered block can be found, the system looks for a suitable fragment space in the fragments of this folder domain to write into; before writing, the system aggregates the nearby buffered files so that they are written together, and if there is still enough space, appropriate files may also be taken out of scattered blocks and written along with them;
3) when no suitable fragment can be found to store the file, a new block is created and the new file data is then written into it.
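The three-step placement policy for newly created files (scattered block first, then a fragment space, then a fresh block) can be sketched as a short selection function. The data structures are hypothetical simplifications:

```python
def place_new_file(size, scattered_blocks, fragments, cache):
    """Decide where a new file of `size` bytes is placed.
    scattered_blocks: list of (block_id, free_bytes, file_ids) tuples
    fragments: list of (block_id, free_bytes) fragment spaces
    cache: set of file ids currently buffered
    """
    # 1) a scattered block with enough remaining space whose files are all cached
    for block_id, free, files in scattered_blocks:
        if free >= size and all(f in cache for f in files):
            return ("scattered", block_id)
    # 2) otherwise a fragment space of suitable size
    for block_id, free in fragments:
        if free >= size:
            return ("fragment", block_id)
    # 3) otherwise allocate a brand-new block
    return ("new_block", None)
```
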
6. The performance-optimized small-file storage access method according to claim 3, characterized in that in method E, in order to avoid transmitting too many files, a threshold TF of high access frequency is set, and all files whose access frequency exceeds TF are divided, in order, into multiple groups; each group may contain multiple files, and the total size of all files in a group does not exceed 64KB; when a user requests a file in the current folder, the system sends the files of one group, in order, along with it; the calculation formula of TF is as follows:
TF = N × Favg  (N ≥ 1);  (1)

Favg = (Σ i=1..M Fi) / M.  (2)
wherein Fi is the access frequency of each file in the folder, M is the number of files in the current folder, and N is user-defined.
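Formulas (1)–(2) and the 64KB grouping rule of claim 6 can be checked numerically with a short sketch; the file frequencies and sizes below are invented for illustration:

```python
def tf_threshold(freqs, n=4):
    # TF = N * Favg, with Favg = (sum of Fi) / M   -- formulas (1) and (2)
    favg = sum(freqs.values()) / len(freqs)
    return n * favg

def group_hot_files(freqs, sizes, n=4, limit=64 * 1024):
    """Order the files whose access frequency exceeds TF and pack them into
    groups whose total size never exceeds 64KB."""
    tf = tf_threshold(freqs, n)
    hot = sorted((f for f in freqs if freqs[f] > tf),
                 key=lambda f: freqs[f], reverse=True)
    groups, cur, cur_size = [], [], 0
    for f in hot:
        if cur and cur_size + sizes[f] > limit:
            groups.append(cur)      # close the group at the 64KB boundary
            cur, cur_size = [], 0
        cur.append(f)
        cur_size += sizes[f]
    if cur:
        groups.append(cur)
    return tf, groups
```

With N = 1, four files of frequencies 50, 40, 1, 1 give Favg = 23, so only the first two exceed TF; at 40KB and 30KB they cannot share a 64KB group and are sent in two groups.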
CN201710015554.2A 2017-01-10 2017-01-10 The system and method that a kind of small documents storage of performance optimization is accessed Pending CN107066505A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710015554.2A CN107066505A (en) 2017-01-10 2017-01-10 The system and method that a kind of small documents storage of performance optimization is accessed


Publications (1)

Publication Number Publication Date
CN107066505A true CN107066505A (en) 2017-08-18

Family

ID=59597848

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710015554.2A Pending CN107066505A (en) 2017-01-10 2017-01-10 The system and method that a kind of small documents storage of performance optimization is accessed

Country Status (1)

Country Link
CN (1) CN107066505A (en)


Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103176754A (en) * 2013-04-02 2013-06-26 浪潮电子信息产业股份有限公司 Reading and storing method for massive amounts of small files
US20140258347A1 (en) * 2013-03-11 2014-09-11 Microsoft Corporation Grouping files for optimized file operations
CN104375782A (en) * 2014-10-21 2015-02-25 浪潮电子信息产业股份有限公司 Read-write solution for tens of millions of small file data
CN104391961A (en) * 2014-12-03 2015-03-04 浪潮集团有限公司 Tens of millions of small file data read and write solution strategy


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ZHAO Yuelong et al.: "Research on a performance-optimized small-file storage access strategy", Journal of Computer Research and Development *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109376100A (en) * 2018-11-05 2019-02-22 浪潮电子信息产业股份有限公司 A kind of caching wiring method, device, equipment and readable storage medium storing program for executing
CN109597903A (en) * 2018-11-21 2019-04-09 北京市商汤科技开发有限公司 Image file processing apparatus and method, document storage system and storage medium
CN109597903B (en) * 2018-11-21 2021-12-28 北京市商汤科技开发有限公司 Image file processing apparatus and method, file storage system, and storage medium
CN109933570A (en) * 2019-03-15 2019-06-25 中山大学 A kind of metadata management method, system and medium
CN109933570B (en) * 2019-03-15 2020-02-07 中山大学 Metadata management method, system and medium
CN110442555A (en) * 2019-07-26 2019-11-12 华中科技大学 A kind of method and system of the reduction fragment of selectivity reserved space
CN113641883A (en) * 2021-05-26 2021-11-12 中国再保险(集团)股份有限公司 Rapid reading method and interface for large amount of multi-element heterogeneous complex underlying surface space data

Similar Documents

Publication Publication Date Title
CN107066505A (en) The system and method that a kind of small documents storage of performance optimization is accessed
US10140050B2 (en) Providing access information to a storage controller to determine a storage tier for storing data
US9471248B2 (en) Snapshots and clones of volumes in a storage system
US7711916B2 (en) Storing information on storage devices having different performance capabilities with a storage system
US9110909B2 (en) File level hierarchical storage management system, method, and apparatus
US7805416B1 (en) File system query and method of use
CN103714123B Enterprise's cloud memory partitioning object data de-duplication and restructuring version control method
US8433674B2 (en) Method for clipping migration candidate file in hierarchical storage management system
CN109947668B (en) Method and device for storing data
CN104978362B (en) Data migration method, device and the meta data server of distributed file system
US20170032005A1 (en) Snapshot and/or clone copy-on-write
US11561930B2 (en) Independent evictions from datastore accelerator fleet nodes
CN104243425A (en) Content management method, device and system in content delivery network
US20170315878A1 (en) Method for low overhead, space tracking, high performance snapshots and clones by transfer of extent ownership
US8135763B1 (en) Apparatus and method for maintaining a file system index
US20150081966A1 (en) Dense tree volume metadata organization
CN103176754A (en) Reading and storing method for massive amounts of small files
US10210188B2 (en) Multi-tiered data storage in a deduplication system
US20120290595A1 (en) Super-records
CN109522283A (en) A kind of data de-duplication method and system
CN109101580A (en) A kind of hot spot data caching method and device based on Redis
CN109767274B (en) Method and system for carrying out associated storage on massive invoice data
CN108509507A (en) The account management system and its implementation of unified entrance
US20070174360A1 (en) Storage system embedding database
CN104391961A (en) Tens of millions of small file data read and write solution strategy

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20170818