CN104408091B - Data storage method and system for a distributed file system - Google Patents


Info

Publication number
CN104408091B
Authority
CN
China
Prior art keywords
file
data
data block
file system
data file
Legal status
Active
Application number
CN201410645370.0A
Other languages
Chinese (zh)
Other versions
CN104408091A (en
Inventor
陈康
郑纬民
王振钊
黄剑
Current Assignee
Tsinghua University
Original Assignee
Tsinghua University
Application filed by Tsinghua University
Priority to CN201410645370.0A
Publication of CN104408091A
Application granted
Publication of CN104408091B
Status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/182 Distributed file systems
    • G06F16/1824 Distributed file systems implemented using Network-attached Storage [NAS] architecture
    • G06F16/183 Provision of network file services by network file servers, e.g. by using NFS, CIFS

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a data storage method for a distributed file system. The method comprises the following steps:
- receiving a data file sent by a user;
- determining the size of the data file;
- if the size of the data file is less than a preset value, storing the data file in a key-value database on a cloud server using a key-value (KV) storage method based on the log-structured merge-tree (LSM-Tree);
- if the size of the data file is greater than the preset value, cutting the data file into multiple sub-data files and storing them in the local file system.

The method distinguishes data files by size, which improves the efficiency and overall performance of the distributed file system. The embodiments of the invention also disclose a data storage system for a distributed file system.

Description

Data storage method and system for a distributed file system
Technical field
The present invention relates to the field of file system technology, and in particular to a data storage method and system for a distributed file system.
Background technique
At present, distributed file systems such as GFS (Google File System), MooseFS and Lustre are built on top of single-machine file systems. The layers divide the work as follows:
- The distributed layer maps logical files to lists of logical data blocks.
- The local file system maps logical data blocks to data on the hard disk.

Together, the two layers perform data addressing and read/write operations.
The local file system usually organizes the data on disk using multi-level indexes. In the Ext family of file systems, for example, a data block can only be accessed after its metadata is found through the metadata index. That metadata can only be found after the metadata of its parent directory has been found. Every read or write therefore first walks the directory tree to locate metadata, and this access pattern directly causes poor read/write efficiency for small files.

For large files, this overhead is amortized over the data transfer, and a well-designed system cache keeps its effect on performance minimal. For small files, however, the overhead accounts for the overwhelming majority of the total access time. As a result, a local file system often performs very poorly on small files, sometimes an order of magnitude worse than on large files.

Distributed file systems mostly use a local file system as back-end storage. Reading or writing a file also involves several more network round trips than local access. Small-file read/write performance is therefore an even more serious problem in a distributed environment.
In addition, the distributed file system uses only the basic storage service of the local file system, so much of the metadata kept by the local file system is unnecessary. A distributed file system also organizes its namespace differently from the local file system, and independently of it. Distributed file systems in the related art therefore need to be improved to optimize small-file read/write performance.
Summary of the invention
The present invention aims to solve, at least to some extent, one of the technical problems in the related art described above.
To this end, one object of the present invention is to provide a simple and convenient data storage method for a distributed file system that improves system efficiency and performance.
Another object of the present invention is to propose a data storage system for a distributed file system.
To achieve the above objects, an embodiment of one aspect of the present invention proposes a data storage method for a distributed file system, comprising the following steps:
- receiving a data file sent by a user;
- determining the size of the data file;
- if the size of the data file is less than a preset value, storing the data file in a key-value database on a cloud server using a KV (key-value) storage method based on the LSM-Tree (Log-Structured Merge-Tree);
- if the size of the data file is greater than the preset value, cutting the data file into multiple sub-data files and storing them in the local file system.
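The size-based routing step of the method can be sketched as follows. This is a minimal Python illustration, not the patented implementation. Plain dictionaries stand in for the LSM-Tree-based key-value database and the local file system. The function names and the `.partN` naming are hypothetical. A file exactly equal to the preset value is assumed to take the large-file path, because the text only covers "less than" and "greater than".

```python
from typing import Dict, List

PRESET_SIZE = 64 * 1024 * 1024  # preset value given in the embodiment (64MB)


def split_into_chunks(data: bytes, chunk_size: int) -> List[bytes]:
    """Cut a file into fixed-size sub-data files; the last one may be shorter."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def store_file(name: str, data: bytes,
               kv_store: Dict[str, bytes],
               local_fs: Dict[str, bytes],
               preset: int = PRESET_SIZE) -> str:
    """Route a file by size and report which backend received it."""
    if len(data) < preset:
        # Small file: one entry in the (stand-in) LSM-Tree key-value store.
        kv_store[name] = data
        return "kv"
    # Large file (assumed to include size == preset): split, store each piece.
    for idx, piece in enumerate(split_into_chunks(data, preset)):
        local_fs[f"{name}.part{idx}"] = piece  # hypothetical naming scheme
    return "local"
```

In practice the two branches would call into the KV database and the chunk store rather than dictionaries. The decision itself is just one comparison against the preset value.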
The data storage method for a distributed file system proposed in the embodiments of the present invention distinguishes data files by size:
- A data file smaller than a certain value is stored in the key-value database on the cloud server using the LSM-Tree-based KV storage method.
- A data file larger than that value is cut into multiple sub-data files stored in the local file system.

This improves the efficiency and overall performance of the distributed file system.
In addition, the data storage method for a distributed file system according to the above embodiment of the present invention may have the following additional technical features:
Further, in one embodiment of the present invention, the method continues after the data file has been stored in the key-value database of the cloud server using the LSM-Tree KV storage method. It then generates the Key of the data file from the data file's data block ID (identity), data block version and data block sequence number.
Further, in one embodiment of the present invention, cutting the data file into multiple sub-data files and storing them in the local file system further comprises: generating, from the data block ID and data block version of each sub-data file, the name of the corresponding file in the local file system.
Further, in one embodiment of the present invention, the above method further comprises: generating a check code from the data blocks in each sub-data file, to maintain the sub-data file.
Preferably, in one embodiment of the present invention, the preset value may be 64MB.
An embodiment of another aspect of the present invention proposes a data storage system for a distributed file system, comprising:
- a receiving module, for receiving a data file sent by a user;
- a judgment module, for determining the size of the data file;
- a first storage module, for storing the data file, if its size is less than a preset value, in the key-value database of a cloud server using the LSM-Tree-based KV storage method;
- a second storage module, for cutting the data file, if its size is greater than the preset value, into multiple sub-data files and storing them in the local file system.
The data storage system for a distributed file system proposed in the embodiments of the present invention distinguishes data files by size:
- A data file smaller than a certain value is stored in the key-value database on the cloud server using the LSM-Tree-based KV storage method.
- A data file larger than that value is cut into multiple sub-data files stored in the local file system.

This improves the efficiency and overall performance of the distributed file system.
In addition, the data storage system for a distributed file system according to the above embodiment of the present invention may have the following additional technical features:
Further, in one embodiment of the present invention, the above system further comprises a Key generation module. It generates the Key of the data file from the data file's data block ID, data block version and data block sequence number.
Further, in one embodiment of the present invention, the above system further comprises a filename generation module. It generates the names of the local files corresponding to the sub-data files, from each sub-data file's data block ID and data block version.
Further, in one embodiment of the present invention, the above system further comprises a check code generation module. It generates a check code from the data blocks in each sub-data file, to maintain the sub-data file.
Preferably, in one embodiment of the present invention, the preset value may be 64MB.
Additional aspects and advantages of the present invention are given in part in the following description. Others will become apparent from it, or will be learned through practice of the invention.
Brief description of the drawings
The above and/or additional aspects and advantages of the present invention will become apparent and easy to understand from the following description of the embodiments, taken with the drawings, in which:
Fig. 1 is a flowchart of the data storage method for a distributed file system according to an embodiment of the present invention;
Fig. 2 is a schematic structural diagram of MooseFS as used in one embodiment of the present invention;
Fig. 3 is a schematic structural diagram of the data server according to one embodiment of the present invention;
Fig. 4 is a schematic diagram of the structure of the Key according to one embodiment of the present invention;
Fig. 5 is a schematic diagram of the structure of the filename corresponding to a chunk in the File Region according to one embodiment of the present invention;
Fig. 6 is a flowchart of the SepStore write operation according to one embodiment of the present invention;
Fig. 7 is a schematic structural diagram of the data storage system for a distributed file system according to an embodiment of the present invention; and
Fig. 8 is a schematic structural diagram of the data storage system for a distributed file system according to one embodiment of the present invention.
Detailed description of the embodiments
The embodiments of the present invention are described in detail below, with examples shown in the drawings. Throughout, the same or similar reference numerals denote the same or similar elements, or elements with the same or similar functions. The embodiments described with reference to the drawings are exemplary. They are intended to explain the present invention and shall not be construed as limiting it.
In addition, the terms "first" and "second" are used for descriptive purposes only. They shall not be understood as indicating or implying relative importance, or as implicitly indicating how many of the referenced technical features there are. A feature qualified by "first" or "second" may therefore explicitly or implicitly include one or more such features. In the description of the present invention, "plurality" means two or more, unless otherwise specifically defined.
In the present invention, unless otherwise expressly specified and defined, the terms "installed", "connected", "coupled", "fixed" and the like shall be understood broadly. A connection may be:
- fixed, detachable or integral;
- mechanical or electrical;
- direct, or indirect through an intermediary;
- an internal connection between two elements.

Those of ordinary skill in the art can understand the specific meaning of these terms in the present invention according to the specific circumstances.
In the present invention, unless otherwise expressly specified and defined, a first feature being "on" or "under" a second feature may mean either of the following:
- the first and second features are in direct contact;
- they are not in direct contact but touch through another feature between them.

A first feature being "on", "above" or "over" a second feature includes it being directly above or obliquely above the second feature, or merely at a higher level. A first feature being "under", "below" or "beneath" a second feature includes it being directly below or obliquely below the second feature, or merely at a lower level.
Before describing the data storage method and system for a distributed file system proposed in the embodiments of the present invention, this section briefly explains the reasons behind them.
Across the entire read/write path of a distributed file system, the data exchange between the client and the data server is an important factor in read/write performance. On the data server, the file system structure, disk seeks and similar factors make small-file read/write performance very poor.
A distributed file system has a separate metadata management service. Much of the metadata kept by the local file system, such as modification time, access time and hard link count, is therefore unnecessary.
A file in a distributed file system does not correspond one-to-one with a file in the local file system. To enable parallel access, a large file is usually cut into multiple local-file-system files. Small files can be aggregated into larger ones to reduce metadata overhead during reads and writes on the local file system.
The local file system's open and close overhead is unnecessary. Given how local file systems are designed, it cannot be eliminated entirely, but packing small files into large files reduces it to a minimum.
On the data server's read/write path, disk seeks caused by random reads and writes are the most important factor affecting file system performance. For random writes, a log-structured disk layout can increase write performance substantially. For random reads, in-memory indexing can reduce the number of I/Os per read.
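The two techniques in the paragraph above can be illustrated together:
- Appending every write to the tail of a log turns random writes into sequential ones.
- An in-memory index lets a read find its record without walking on-disk metadata.

The sketch is a simplified append-only log, not a full LSM-Tree: there is no sorting, flushing or compaction. `io.BytesIO` stands in for an on-disk log file, and the class name is hypothetical.

```python
import io
from typing import Dict, Optional, Tuple


class AppendLog:
    """Append-only data log with an in-memory key index (hypothetical sketch)."""

    def __init__(self) -> None:
        self._log = io.BytesIO()  # stands in for an append-only file on disk
        self._index: Dict[str, Tuple[int, int]] = {}  # key -> (offset, length)

    def put(self, key: str, value: bytes) -> None:
        # Every write goes to the tail, so random updates become sequential I/O.
        offset = self._log.seek(0, io.SEEK_END)
        self._log.write(value)
        # The newest record shadows older ones; a real store reclaims stale bytes.
        self._index[key] = (offset, len(value))

    def get(self, key: str) -> Optional[bytes]:
        # A read costs one in-memory lookup plus one positioned read.
        loc = self._index.get(key)
        if loc is None:
            return None
        offset, length = loc
        self._log.seek(offset)
        return self._log.read(length)

    def size(self) -> int:
        """Total bytes written to the log, including superseded records."""
        return self._log.seek(0, io.SEEK_END)
```

An LSM-Tree builds on the same idea: it batches writes in memory and flushes them as sorted runs, so that range scans and compaction also stay sequential.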
Optimizing small-file reads and writes, and in particular eliminating a large number of disk seeks, improves the performance of the whole file system.
Based on the above problems, the present invention proposes a data storage method for a distributed file system and a data storage system for a distributed file system.
The data storage method and system for a distributed file system proposed in the embodiments of the present invention are described below with reference to the drawings, starting with the method. Referring to Fig. 1, the method comprises the following steps:
S101: receive the data file sent by the user.
First, referring to Fig. 2, the embodiment keeps the overall architecture of MooseFS and most of the code of the client and of the metadata server (called Master in MooseFS, also known as Metadata Server). It implements a new data server. The structure of MooseFS is similar to that of GFS. It consists of:
- multiple clients 10 (Clients);
- a metadata management server 20 (Master);
- multiple data servers 30 (Chunkservers).

Each part also performs functions similar to its counterpart in GFS.
Specifically, in an embodiment of the present invention, the parts have the following functions:
Each client among the clients 10 uses FUSE to virtualize the resources of the entire distributed file system and exposes a POSIX interface. Once a client has mounted it, MooseFS can be used just like a local disk.
The metadata management server 20 manages metadata, decides the data placement policy and keeps the whole system running. If the Master goes down, the entire service stops. To protect the reliability of the metadata, MooseFS also provides a Metadatalogger (metadata log server) that backs up metadata operations in real time.
Each Chunkserver among the data servers 30 stores data on disk. A Chunkserver is implemented as a user-space process, and its underlying storage relies on the local file system.
S102: determine the size of the data file.
S103: if the size of the data file is less than the preset value, store the data file in the key-value database of the cloud server using the KV storage method based on the log-structured merge-tree (LSM-Tree).
Preferably, in one embodiment of the present invention, the preset value may be 64MB. This value is given only as an example; the preset value is not limited to it.
Further, in one embodiment of the present invention, the method continues after the data file has been stored in the key-value database of the cloud server using the LSM-Tree KV storage method. It then generates the Key of the data file from the data file's data block ID, data block version and data block sequence number.
S104: if the size of the data file is greater than the preset value, cut the data file into multiple sub-data files and store them in the local file system.
Further, in one embodiment of the present invention, the method continues after the data file has been cut into sub-data files and stored in the local file system. It then generates the names of the corresponding local files from each sub-data file's data block ID and data block version.
Specifically, in one embodiment of the present invention, the data read/write flow and storage scheme of MooseFS are also similar to those of GFS. MooseFS cuts a large file into 64MB pieces (the size is fixed at compile time; the default is 64MB). This produces multiple chunks (data blocks, i.e. sub-data files), which are stored on different Chunkservers. A small file is treated as a single chunk and stored on a Chunkserver. Each chunk corresponds to one actual file in the file system on its Chunkserver.

The metadata management server 20 maintains the mapping from file names to chunk lists, and each chunk is identified by a 64-bit chunkID (data block ID). To read or write data, the client first asks the metadata management server 20 for two things: the chunkid of the relevant chunk, and the IP address and port of the Chunkserver holding it. With this information, the client sends the read or write request to that Chunkserver.
- Read: the client reads the data from that Chunkserver.
- Write: the client sends the data to one Chunkserver. That Chunkserver writes it to its local disk and forwards it to the data servers holding the other replicas.

After the client receives the write result from the Chunkserver, the Master must confirm either the metadata change or the failure of the write.
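With fixed-size chunks, a client can work out which chunks a byte range touches before asking the Master for their chunkids and Chunkserver addresses. The helper below is a hypothetical sketch of that arithmetic, assuming the 64MB default chunk size. It is not taken from the MooseFS source.

```python
from typing import List, Tuple

CHUNK_SIZE = 64 * 1024 * 1024  # default chunk size stated above


def chunk_ranges(offset: int, length: int,
                 chunk_size: int = CHUNK_SIZE) -> List[Tuple[int, int, int]]:
    """Split a file-level byte range into (chunk_index, offset_in_chunk, length) pieces."""
    pieces: List[Tuple[int, int, int]] = []
    end = offset + length
    while offset < end:
        index = offset // chunk_size    # which chunk of the file
        in_chunk = offset % chunk_size  # position inside that chunk
        take = min(chunk_size - in_chunk, end - offset)
        pieces.append((index, in_chunk, take))
        offset += take
    return pieces
```

A request that crosses a 64MB boundary becomes one request per chunk. Each may go to a different Chunkserver.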
Further, in one embodiment of the present invention, referring to Fig. 3, the embodiment provides different storage schemes for data files of different sizes. For small files it combines LSM technology with a key-value database, which improves overall efficiency. Specifically, SepStore cuts a large file into chunks of a fixed size, for example 64MB, and places them on different servers. A small file is treated as a single chunk.

A Chunkserver therefore holds files of three sizes:
- files smaller than a preset size T, which are treated as small files;
- files between T and 64MB;
- files of exactly 64MB.

The choice of T is a key issue. Based on experiments, the embodiment chooses 64KB for T, but in practice T should be chosen according to the application scenario. The SepStore Chunkserver provides one dedicated storage scheme for the first kind of file and another for the second and third kinds. The embodiment combines KV storage based on the LSM (Log-Structured Merge) Tree with the local file system to optimize the data access efficiency of the distributed file system.
Specifically, the Chunkserver structure of SepStore is shown in Fig. 3. The SepStore Chunkserver is divided into three parts:
- The File Region stores the larger files, i.e. those larger than T and no larger than 64MB. T is the boundary value that decides whether a file goes into the File Region or the KV Region. It defaults to 64KB in SepStore, and T below means 64KB unless stated otherwise.
- The KV Region mainly stores small files, i.e. files smaller than T.
- The Metadata Region stores metadata such as the chunkid and version (data block version) of the data on the server.

The embodiment places files larger than T in the File Region and files smaller than T in the KV Region, but this division is not absolute. For example, a small file stored in the KV Region may grow beyond T after a write. The data is then not migrated immediately. Instead, it is temporarily cut into multiple KV pairs, and a background thread completes the actual migration asynchronously.
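The placement rule can be summarized in a short sketch. Where the text is silent, it makes two assumptions:
- A chunk of exactly T bytes goes to the File Region.
- A KV-resident chunk that outgrows T is held as 64KB pieces, matching the KV Region block size.

The function names are hypothetical.

```python
T = 64 * 1024                  # small-file threshold, SepStore default
CHUNK_SIZE = 64 * 1024 * 1024  # fixed chunk size
KV_BLOCK = 64 * 1024           # size of one key-value pair in the KV Region


def size_class(size: int) -> str:
    """Classify a chunk into the three sizes found on a Chunkserver."""
    if size < 0 or size > CHUNK_SIZE:
        raise ValueError("chunk size out of range")
    if size < T:
        return "small"
    if size < CHUNK_SIZE:
        return "medium"
    return "full"


def initial_region(size: int) -> str:
    """Region chosen when a chunk is first created (assumes size == T goes to File Region)."""
    return "KV Region" if size_class(size) == "small" else "File Region"


def kv_pairs_for(size: int, block: int = KV_BLOCK) -> int:
    """KV pairs a KV-resident chunk occupies, e.g. after growing past T
    while it waits for the background thread to migrate it."""
    return -(-size // block)  # ceiling division
```

Splitting on overflow keeps the write path fast. The more expensive move into the File Region happens off the critical path.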
Further, the Chunkserver exposes a unified access interface consisting mainly of read, write and delete. On receiving a client request, the Chunkserver creates a task and puts it into the task pool. The task queue is first-in-first-out, with no priority differences between tasks. The Chunkserver maintains a thread pool to process the tasks in the task pool, and all read and write tasks complete asynchronously.

At runtime, all metadata is loaded into memory and organized into one large hash table keyed by chunkid. Besides answering client data requests, the Chunkserver periodically reports information such as local disk usage and errors to the Master.
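The request-handling model above is a FIFO task queue drained by a fixed thread pool. A minimal sketch with Python's standard `queue` and `threading` modules follows. The class and method names are hypothetical, and a failing task is silently dropped here, whereas a real Chunkserver would report the error to the client.

```python
import queue
import threading
from typing import Callable


class TaskPool:
    """FIFO task queue drained by a fixed pool of worker threads (sketch)."""

    def __init__(self, workers: int = 4) -> None:
        self._tasks = queue.Queue()  # FIFO, no priorities between tasks
        self._threads = [threading.Thread(target=self._run, daemon=True)
                         for _ in range(workers)]
        for t in self._threads:
            t.start()

    def _run(self) -> None:
        while True:
            task = self._tasks.get()
            if task is None:  # sentinel: stop this worker
                self._tasks.task_done()
                return
            try:
                task()
            except Exception:
                pass  # sketch only: a real server returns the error to the client
            finally:
                self._tasks.task_done()

    def submit(self, task: Callable[[], None]) -> None:
        """Enqueue a request; the caller does not wait for it (asynchronous)."""
        self._tasks.put(task)

    def wait(self) -> None:
        """Block until every submitted task has finished."""
        self._tasks.join()

    def shutdown(self) -> None:
        for _ in self._threads:
            self._tasks.put(None)
        for t in self._threads:
            t.join()
```

A Python `dict` keyed by chunkid can play the role of the in-memory metadata hash table. Worker tasks that update it should hold a lock.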
Further, in one embodiment of the present invention, referring to Fig. 4, the KV Region provides a flat namespace, unlike the local file system. Each key-value pair has a unique Key, which the database uses to locate the data. The embodiment does not store Key values in the KV Region; instead, a simple algorithm generates the Key.

As shown in Fig. 4, the Key of each block is composed of:
- the chunkid (data block ID);
- the version (data block version);
- the blocknum (data block sequence number).

The chunkid and version are passed directly from the client. The blocknum identifies the position of the current key-value pair within the whole chunk. Each block has a fixed size, for example 64KB, so blocknum can be computed as blocknum = offset / 64KB, rounded down.
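Because the Key is computed rather than stored, any request carrying (chunkid, version, offset) can be turned directly into a lookup key. The sketch below is one possible encoding, not necessarily the layout in Fig. 4. Its assumptions are:
- a 64-bit chunkid (stated above for MooseFS), a 32-bit version and a 32-bit blocknum;
- the three fields packed big-endian with Python's `struct` module;
- blocknum derived by integer division of the offset by the 64KB block size.

```python
import struct
from typing import Tuple

BLOCK_SIZE = 64 * 1024
KEY_FORMAT = ">QII"  # assumed widths: 64-bit chunkid, 32-bit version, 32-bit blocknum


def block_number(offset: int, block_size: int = BLOCK_SIZE) -> int:
    """Index of the fixed-size block that contains the byte at `offset`."""
    return offset // block_size


def make_key(chunkid: int, version: int, offset: int) -> bytes:
    """Build the KV Region key for the block holding `offset` (hypothetical layout)."""
    return struct.pack(KEY_FORMAT, chunkid, version, block_number(offset))


def parse_key(key: bytes) -> Tuple[int, int, int]:
    """Recover (chunkid, version, blocknum) from a key."""
    return struct.unpack(KEY_FORMAT, key)
```

With big-endian packing, byte-wise key order matches the numeric order of (chunkid, version, blocknum). In a sorted LSM store, all blocks of one chunk version therefore sit next to each other.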
Further, in one embodiment of the present invention, referring to Fig. 5, SepStore needs to support reads and writes based on the local file system in a distributed environment. To do so, it must maintain some additional metadata that maps chunkids to system files. As in the KV Region design, the embodiment does not directly store the mapping from chunkid to local file in the File Region. Instead, a simple algorithm performs data addressing. Specifically, the name of the local file for each chunk consists of its chunkid and version, with the structure shown in Fig. 5. Encoding this metadata in the file name has two benefits:
(1) Fast lookup from chunkid to data.
(2) Consistency checking: SepStore runs a background thread that periodically scans all files in the File Region and uses the information in the file names to verify data consistency.
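Fig. 5 is not reproduced here, so the exact name layout is unknown. The sketch below assumes a hypothetical name of the form `chunk_<chunkid as 16 hex digits>_<version as 8 hex digits>`. It shows both benefits: the name is computed directly from (chunkid, version), and a scanner can parse it back to check it against the in-memory metadata.

```python
import re
from typing import Dict, Optional, Tuple

# Hypothetical layout: 64-bit chunkid and 32-bit version in upper-case hex.
NAME_RE = re.compile(r"^chunk_([0-9A-F]{16})_([0-9A-F]{8})$")


def chunk_filename(chunkid: int, version: int) -> str:
    """Compute the local file name directly; no mapping table is needed."""
    return f"chunk_{chunkid:016X}_{version:08X}"


def parse_chunk_filename(name: str) -> Optional[Tuple[int, int]]:
    """Recover (chunkid, version) from a file name, or None if it does not match."""
    m = NAME_RE.match(name)
    if m is None:
        return None
    return int(m.group(1), 16), int(m.group(2), 16)


def is_consistent(name: str, metadata: Dict[int, int]) -> bool:
    """Background-scan check: does the name agree with the chunkid -> version table?"""
    parsed = parse_chunk_filename(name)
    if parsed is None:
        return False
    chunkid, version = parsed
    return metadata.get(chunkid) == version
```

The scanner itself would list each directory (for example with `os.scandir`) and flag any name for which `is_consistent` returns False.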
To improve file read performance, the File Region adopts the following optimizations:
(1) A caching policy more effective than that of the local file system: SepStore maintains a block-level data cache on top of the file system, with an LRU (Least Recently Used) page replacement policy.
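A block-level LRU cache can be sketched with `collections.OrderedDict`, which keeps entries in recency order. The sketch assumes the following, none of which the text specifies:
- blocks are keyed by (chunkid, blocknum);
- capacity is counted in blocks;
- the class name is hypothetical.

```python
from collections import OrderedDict
from typing import Hashable, Optional


class LRUBlockCache:
    """Fixed-capacity block cache that evicts the least recently used block."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._blocks: "OrderedDict[Hashable, bytes]" = OrderedDict()

    def get(self, key: Hashable) -> Optional[bytes]:
        if key not in self._blocks:
            return None  # miss: the caller reads the block from disk
        self._blocks.move_to_end(key)  # mark as most recently used
        return self._blocks[key]

    def put(self, key: Hashable, block: bytes) -> None:
        self._blocks[key] = block
        self._blocks.move_to_end(key)
        if len(self._blocks) > self.capacity:
            self._blocks.popitem(last=False)  # evict the least recently used
```

Both operations are O(1). This matters because every read of a File Region block passes through the cache.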
(2) Balanced distribution of data across directories. Most file systems maintain their data indexes with B+ trees or variants of them. These index structures are well balanced, but they do not handle a very large number of files in one directory well. Overly deep directories, on the other hand, increase the Chunkserver's metadata burden. The goal is therefore to spread files as evenly as possible while keeping the number of directory levels small.

SepStore achieves this with a simple strategy. By default it creates 256 subdirectories under the root directory, and chunkid % 256 determines which subdirectory holds a chunk's file.
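The directory-balancing rule is a single modulo operation. In the sketch below, the subdirectory names use two hex digits ("00" to "FF"). Those names, the file-name layout built from chunkid and version, and the function names are all hypothetical.

```python
import posixpath

FANOUT = 256  # number of subdirectories created under the root by default


def chunk_subdir(chunkid: int, fanout: int = FANOUT) -> str:
    """Subdirectory for a chunk: chunkid % fanout, written in hex."""
    width = len(f"{fanout - 1:X}")  # 256 -> two hex digits
    return f"{chunkid % fanout:0{width}X}"


def chunk_path(root: str, chunkid: int, version: int, fanout: int = FANOUT) -> str:
    """Full path of a chunk file (hypothetical name built from chunkid and version)."""
    name = f"chunk_{chunkid:016X}_{version:08X}"
    return posixpath.join(root, chunk_subdir(chunkid, fanout), name)
```

Chunkids are allocated sequentially, so the modulo sends consecutive chunks to consecutive subdirectories. The result is an even spread with only one extra directory level.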
In one embodiment of the present invention, the above method further comprises generating a check code from the data blocks in each sub-data file, to maintain the sub-data file. This is one difference between the File Region and the KV Region: the File Region maintains a check code for each chunk, with one byte of check data for every 4KB.
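The text fixes only the granularity of the check code (one byte per 4KB) and does not name the check function. The sketch therefore assumes a simple XOR of all bytes in each 4KB unit. With this scheme, a 64MB chunk needs 16384 check bytes (16KB). The function names are hypothetical.

```python
from functools import reduce
from operator import xor
from typing import List

CHECK_UNIT = 4 * 1024  # one check byte per 4KB, as described for the File Region


def check_byte(unit: bytes) -> int:
    """Assumed check function: XOR of all bytes in the unit."""
    return reduce(xor, unit, 0)


def make_check_code(chunk: bytes, unit: int = CHECK_UNIT) -> bytes:
    """One check byte per unit; a short final unit still gets a byte."""
    return bytes(check_byte(chunk[i:i + unit]) for i in range(0, len(chunk), unit))


def bad_units(chunk: bytes, stored: bytes, unit: int = CHECK_UNIT) -> List[int]:
    """Indices of 4KB units whose recomputed check byte differs from the stored one."""
    fresh = make_check_code(chunk, unit)
    if len(fresh) != len(stored):
        raise ValueError("check code length does not match chunk size")
    return [i for i, (a, b) in enumerate(zip(fresh, stored)) if a != b]
```

An XOR byte always catches a change confined to a single byte, but two errors in the same unit can cancel out. A stronger per-unit hash such as CRC-32 would cost more check bytes.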
Further, in one embodiment of the present invention, referring to Fig. 6, the embodiment combines KV storage with LSM technology to build a Chunkserver that performs well for both large and small files. In SepStore:
- Large files are stored as local files, to make full use of the local file system's strength in sequential writes.
- Small files are stored in a KV database that uses LSM to improve read/write performance.

The write flow is the more complex of the two, so it is used to illustrate the read/write flow of the whole system. The improved write flow comprises the following steps:
S1: the client initiates a write request and sends the relevant parameters (file name, offset and so on) to the metadata management server 20.
S2: if the file does not exist, this is a create operation. The metadata management server 20 assigns the file a fileid, which SepStore takes from a globally incremented 64-bit integer, and a chunkid. It decides the data placement by considering the number and usage of the existing Chunkservers, and finally returns this information to the client. If the file exists, the Metadata Server (Master) looks up the Chunkserver holding the chunk by its chunkid and returns that server's IP address and port directly to the client.
S3: the client sends the data to the Chunkserver. If the chunk already exists, the Chunkserver directly generates a task and puts it into the work pool. Otherwise, this is a new-chunk operation, and the Chunkserver must decide whether to place the file in the KV Region or directly in the file system. After deciding the placement, the Chunkserver allocates the corresponding metadata for the file.
S4: the Chunkserver maintains a thread pool that processes the work in the work pool. The work pool contains two kinds of work: write operations for the File Region and write operations for the KV Region.
S5: the Chunkserver returns the result to the client. Based on the result, the client decides whether to retry or to notify the metadata management server 20 to update the relevant metadata.
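Steps S1 to S5 can be traced in a single-process sketch. Everything here is hypothetical: the classes, method names and in-memory dictionaries stand in for the Master, a Chunkserver and its two regions. The sketch also simplifies the real system in several ways:
- There is no networking, replication, versioning or background migration.
- S4 runs inline rather than through the thread pool.
- Each write replaces the whole chunk.

```python
import itertools
from dataclasses import dataclass, field
from typing import Dict, Tuple

T = 64 * 1024  # KV Region threshold, also used here as the KV block size


@dataclass
class Master:
    """Stand-in for the metadata management server."""
    _fileids: itertools.count = field(default_factory=lambda: itertools.count(1))
    _chunkids: itertools.count = field(default_factory=lambda: itertools.count(1))
    files: Dict[str, Tuple[int, int]] = field(default_factory=dict)  # name -> (fileid, chunkid)
    sizes: Dict[str, int] = field(default_factory=dict)

    def locate(self, name: str) -> int:
        """S1/S2: return the chunkid, allocating ids if the file is new."""
        if name not in self.files:
            self.files[name] = (next(self._fileids), next(self._chunkids))
        return self.files[name][1]

    def commit(self, name: str, size: int) -> None:
        """S5: record the metadata change reported by the client."""
        self.sizes[name] = size


@dataclass
class Chunkserver:
    region_of: Dict[int, str] = field(default_factory=dict)
    kv: Dict[Tuple[int, int], bytes] = field(default_factory=dict)  # (chunkid, blocknum) -> data
    files: Dict[int, bytes] = field(default_factory=dict)           # chunkid -> file contents

    def write(self, chunkid: int, data: bytes) -> str:
        # S3: only a new chunk gets a placement decision; existing chunks keep their region.
        region = self.region_of.setdefault(chunkid, "kv" if len(data) < T else "file")
        # S4: done inline here instead of through the work pool.
        if region == "kv":
            for key in [k for k in self.kv if k[0] == chunkid]:
                del self.kv[key]  # whole-chunk overwrite: drop the old blocks
            for i in range(0, len(data), T):
                self.kv[(chunkid, i // T)] = data[i:i + T]
        else:
            self.files[chunkid] = data
        return "ok"


def client_write(master: Master, server: Chunkserver, name: str, data: bytes) -> str:
    chunkid = master.locate(name)         # S1, S2
    result = server.write(chunkid, data)  # S3, S4
    if result == "ok":
        master.commit(name, len(data))    # S5
    return result
```

A small file that grows past T stays in the KV Region as several pairs. This mirrors the deferred migration described for the KV Region.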
The read flow is very similar to the write flow. The differences are that step S3 can look up the data location directly, and step S4 reads the data. To avoid redundancy, the read flow is not described in detail here.
Current distributed file systems mostly rely directly on the local file system for disk management. In a distributed file system, however, the data access path on the data server contains many unnecessary overheads, such as locating the storage address of a data block.

The embodiment of the present invention combines a key-value database, LSM technology and the local file system. It distinguishes files by size and specifically optimizes small-file reads and writes, which reduces random reads and writes globally and improves overall performance. Small files are stored in a key-value database that uses LSM technology. Large files are cut into fixed-size data blocks, which are stored as ordinary files on the local file system.

A distributed file system prototype, SepStore, was implemented, and experiments verified the effectiveness of the optimization. The experimental results show:
- SepStore increases small-file write speed by 210%.
- Under a mixed workload of large and small files, overall throughput improves by 78%.
- Under the same workload, overall IOPS (Input/Output Operations Per Second) improves by 37%.
The date storage method of the distributed file system proposed according to embodiments of the present invention, by by data file according to Size distinguishes, if the size of data file is less than certain value, data file is deposited by the KV based on LSM-Tree Method for storing is stored to the key-value database of cloud server, and if the size of data file is greater than certain value, will Data file cutting is multiple subdata files, and is stored to local file system, and particular for small documents read-write into Row optimization, to improve the efficiency of distributed file system, the data access efficiency of Optimum distribution formula file system is realized whole The promotion of performance.
Next, the data storage system of the distributed file system proposed according to embodiments of the present invention is described with reference to the accompanying drawings. As shown in Fig. 7, the storage system 1000 includes a receiving module 100, a judgment module 200, a first storage module 300, and a second storage module 400.
The receiving module 100 receives a data file sent by a user. The judgment module 200 determines the size of the data file. If the size of the data file is less than a preset value, the first storage module 300 stores the data file into the key-value database of the cloud server using the LSM-Tree-based KV storage method. If the size of the data file is greater than the preset value, the second storage module 400 cuts the data file into multiple sub-data files and stores them in the local file system. By combining LSM (Log-Structured Merge) Tree-based KV storage with the local file system, the storage system 1000 of the embodiment improves the data access efficiency of the distributed file system.
Preferably, in one embodiment of the invention, the preset value may be 64MB. It should be noted that this value is given only for exemplary purposes; the preset value (i.e., the default size) is not limited to it.
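The size-based dispatch performed by the judgment module 200 and the two storage modules can be sketched as follows. This is a minimal Python illustration under stated assumptions, not SepStore code: the names `route` and `split_into_chunks` are hypothetical, the 64MB threshold and sub-file size are the example values given above, and a file exactly equal to the preset value (a case the text leaves open) is assumed to take the local-file-system path.

```python
# Hypothetical sketch of the judgment-module dispatch; not the patent's actual code.
PRESET_VALUE = 64 * 1024 * 1024  # example preset value (64MB); configurable per the text
CHUNK_SIZE = 64 * 1024 * 1024    # assumed fixed size of each sub-data file


def route(size: int, preset: int = PRESET_VALUE) -> str:
    """Return which storage path a data file of `size` bytes should take."""
    # Smaller than the preset value -> LSM-Tree-based KV store (first storage module).
    # Otherwise (assumption: including size == preset) -> local file system.
    return "kv" if size < preset else "local_fs"


def split_into_chunks(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[bytes]:
    """Cut a large data file into fixed-size sub-data files; the last may be shorter."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


if __name__ == "__main__":
    print(route(10 * 1024))            # kv
    print(route(200 * 1024 * 1024))    # local_fs
    parts = split_into_chunks(b"x" * 10, chunk_size=4)
    print([len(p) for p in parts])     # [4, 4, 2]
```

Keeping the threshold as a parameter mirrors the note that 64MB is only an example value.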
First, as shown in Fig. 2, the embodiment of the present invention retains the overall architecture of MooseFS, including most of the code of the metadata server (called the Master in MooseFS, also known as the Metadata Server) and of the client, and implements a new data server. The structure of MooseFS is similar to that of GFS: it consists of multiple clients 10 (Clients), a metadata management server 20 (Master), and multiple data servers 30 (Chunkservers), and each part performs functions similar to its counterpart in GFS.
Specifically, in an embodiment of the present invention, the functions of each part are as follows:
Each client among the clients 10 virtualizes the resources of the entire distributed file system and exposes a POSIX interface through FUSE. Once the client is mounted, MooseFS can be used just like a local disk.
The metadata management server 20 manages metadata, determines the data placement policy, and maintains the operation of the whole system. If the Master goes down, the entire service stops. To protect the reliability of metadata, MooseFS also provides a Metadatalogger (metadata log server) that backs up metadata operations in real time.
Each Chunkserver among the data servers 30 is responsible for storing data on disk. The Chunkserver is implemented as a user-space process, while the underlying storage relies on the local file system.
Further, in one embodiment of the invention, as shown in Fig. 8, the storage system 1000 further includes a Key generation module 500, which generates the Key of the data file from the data block ID, data block version, and data block sequence number of the data file.
Further, in one embodiment of the invention, as shown in Fig. 8, the storage system 1000 further includes a filename generation module 600, which generates, from the data block ID and data block version of each sub-data file, the filename in the local file system corresponding to each of the sub-data files.
Specifically, in one embodiment of the invention, the data read/write process and storage scheme of MooseFS are also similar to those of GFS. MooseFS cuts a large file into 64MB pieces (the size is fixed at compile time; the default is 64MB), producing multiple chunks (data blocks, equivalent to the sub-data files) that are stored on different Chunkservers. A small file is treated by MooseFS as a single chunk and stored on one Chunkserver. Each chunk corresponds to one actual file in the file system of its Chunkserver. The metadata management server 20 maintains the mapping from filename to chunk list, and each chunk is identified by a 64-bit chunkid (data block ID). To read or write data, the client first requests from the metadata management server 20 the chunkid of the chunk corresponding to the file, along with the IP address and port of the Chunkserver holding that chunk. With this information, the client sends the read or write request to the specified Chunkserver. For a read, the client reads the data from that Chunkserver. For a write, the client passes the data to be written to one Chunkserver, which writes it to its local disk and at the same time forwards it to the data servers holding the other replicas. After receiving the write result from the Chunkserver, the client must have the Master confirm either the change to the metadata or the failure of the write.
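The offset-to-chunk lookup that precedes a read or write can be sketched in Python as below. `ToyMaster` and `ChunkLocation` are hypothetical stand-ins for metadata management server 20 and its chunk list, not MooseFS APIs, and the addresses and port are placeholders. The sketch only shows that, with a fixed 64MB chunk size, the chunk index is the integer quotient of the file offset by the chunk size and the remainder is the offset inside that chunk.

```python
from dataclasses import dataclass, field

CHUNK_SIZE = 64 * 1024 * 1024  # default MooseFS chunk size described above


@dataclass
class ChunkLocation:
    chunkid: int  # 64-bit chunk identifier assigned by the Master
    ip: str       # address of the Chunkserver holding this chunk
    port: int


@dataclass
class ToyMaster:
    # filename -> chunks of that file, in file order
    files: dict[str, list[ChunkLocation]] = field(default_factory=dict)

    def locate(self, filename: str, offset: int) -> tuple[ChunkLocation, int]:
        """Return the chunk covering `offset` and the offset inside that chunk."""
        index, offset_in_chunk = divmod(offset, CHUNK_SIZE)
        return self.files[filename][index], offset_in_chunk


if __name__ == "__main__":
    master = ToyMaster()
    master.files["/data/log"] = [
        ChunkLocation(0x10, "192.0.2.1", 10000),  # placeholder addresses and port
        ChunkLocation(0x11, "192.0.2.2", 10000),
    ]
    chunk, off = master.locate("/data/log", CHUNK_SIZE + 5)
    print(hex(chunk.chunkid), chunk.ip, off)  # 0x11 192.0.2.2 5
```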
Further, in one embodiment of the invention, as shown in Fig. 3, the embodiment provides different storage schemes for data files of different sizes, and for small files it provides a scheme that combines LSM technology with a key-value database, improving overall efficiency. Specifically, in SepStore, large files are cut into chunks of a fixed size, e.g., 64MB, which are placed on different servers, while a small file is treated as a single chunk. As a result, a Chunkserver holds files of three sizes: files smaller than a preset value T, which are treated as small files; files between T and 64MB; and files of exactly 64MB. The choice of T is a key issue. Based on experiments, the embodiment chooses 64KB as T, but in practice T should be chosen according to the application scenario. The Chunkserver in SepStore provides one targeted storage scheme for the first category and another for the second and third categories. By combining LSM (Log-Structured Merge) Tree-based KV storage with the local file system, the embodiment optimizes the data access efficiency of the distributed file system.
Specifically, the structure of the SepStore Chunkserver is shown in Fig. 3. It is divided into three parts. The first is the File Region, which stores larger files, i.e., files larger than T, up to the 64MB chunk size. (T is the boundary value that decides whether a file is placed in the File Region or the KV Region. In SepStore, T defaults to 64KB, and unless otherwise stated, T below always means 64KB.) The second is the KV Region, which mainly stores small files, i.e., files smaller than T. The last is the Metadata Region, which stores metadata on the data server such as chunkid and version (data block version). The embodiment stores files larger than T in the File Region and files smaller than T in the KV Region, but this division is not absolute. For example, a small file stored in the KV Region may grow beyond T after a write. When this happens, the data is not migrated immediately; instead, it is temporarily cut into multiple KV pairs, and the actual migration is completed asynchronously by a background thread.
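The placement rule and the deferred migration just described can be sketched as follows. This is a hypothetical Python model, not SepStore's implementation: the class `ToyRegions`, the tuple key `(chunkid, version, blocknum)`, and running the "background" migration step synchronously via `migrate_one` are all illustrative assumptions, and writes to File Region chunks are left out.

```python
from collections import deque
from typing import Optional

T = 64 * 1024      # KV Region / File Region boundary (SepStore default)
BLOCK = 64 * 1024  # fixed size of one KV block


class ToyRegions:
    """Hypothetical model of a Chunkserver's KV Region and File Region."""

    def __init__(self) -> None:
        self.kv: dict[tuple[int, int, int], bytes] = {}  # (chunkid, version, blocknum) -> block
        self.kv_chunks: dict[int, list[int]] = {}        # chunkid -> [version, size]
        self.file_region: dict[int, bytes] = {}          # chunkid -> file contents
        self.migration_queue: deque = deque()            # chunks awaiting migration

    def create(self, chunkid: int, version: int, size_hint: int) -> str:
        """Place a new chunk: below T goes to the KV Region, otherwise the File Region."""
        if size_hint < T:
            self.kv_chunks[chunkid] = [version, 0]
            return "kv"
        self.file_region[chunkid] = b""
        return "file"

    def write_kv(self, chunkid: int, offset: int, data: bytes) -> None:
        """Write into a KV-resident chunk, one BLOCK-sized key-value pair at a time."""
        if not data:
            return
        meta = self.kv_chunks[chunkid]
        version = meta[0]
        pos = 0
        while pos < len(data):
            blocknum, in_block = divmod(offset + pos, BLOCK)
            key = (chunkid, version, blocknum)
            block = bytearray(self.kv.get(key, b""))
            if len(block) < in_block:
                block.extend(bytes(in_block - len(block)))  # zero-fill a hole
            n = min(BLOCK - in_block, len(data) - pos)
            block[in_block:in_block + n] = data[pos:pos + n]
            self.kv[key] = bytes(block)
            pos += n
        meta[1] = max(meta[1], offset + len(data))
        # Grown past T: keep serving it as several KV pairs and defer the move.
        if meta[1] > T and chunkid not in self.migration_queue:
            self.migration_queue.append(chunkid)

    def migrate_one(self) -> Optional[int]:
        """One step of the background migration thread (run synchronously here)."""
        if not self.migration_queue:
            return None
        chunkid = self.migration_queue.popleft()
        version, size = self.kv_chunks.pop(chunkid)
        out = bytearray(size)
        for blocknum in range((size + BLOCK - 1) // BLOCK):
            block = self.kv.pop((chunkid, version, blocknum), b"")
            start = blocknum * BLOCK
            out[start:start + len(block)] = block
        self.file_region[chunkid] = bytes(out)
        return chunkid
```

Deferring the move keeps the write path short: the oversized chunk stays readable as several KV pairs until the migration step reassembles it into a single File Region file.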
Further, the Chunkserver exposes a unified access interface consisting mainly of read, write, and delete. Specifically, upon receiving a client request, the Chunkserver generates a task and places it in the task pool. The task queue is first-in-first-out, with no priority distinction among tasks. The Chunkserver maintains a thread pool to process the tasks in the task pool, and all read and write tasks complete asynchronously. At runtime, all metadata is loaded into memory and organized into one large hash table keyed by chunkid. Besides responding to client data requests, the Chunkserver also periodically reports information such as local disk usage and error conditions to the Master.
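A minimal sketch of this request-handling model is given below, assuming Python's `queue.Queue` as the FIFO task pool and a lock-protected `dict` as the in-memory metadata hash table. `ToyChunkserver`, `ChunkMeta`, `submit`, and `put_meta` are hypothetical names chosen for illustration, not SepStore interfaces.

```python
import queue
import threading
from dataclasses import dataclass
from typing import Callable


@dataclass
class ChunkMeta:
    chunkid: int
    version: int
    region: str  # "kv" or "file"


class ToyChunkserver:
    """Hypothetical model: FIFO task pool drained by a fixed thread pool."""

    def __init__(self, workers: int = 4) -> None:
        self.metadata: dict[int, ChunkMeta] = {}  # in-memory hash table keyed by chunkid
        self.meta_lock = threading.Lock()
        self.tasks: queue.Queue = queue.Queue()   # FIFO, no priorities between tasks
        self.threads = [threading.Thread(target=self._worker, daemon=True)
                        for _ in range(workers)]
        for t in self.threads:
            t.start()

    def _worker(self) -> None:
        while True:
            task = self.tasks.get()
            if task is None:          # sentinel: stop this worker
                self.tasks.task_done()
                return
            try:
                task()
            except Exception as exc:  # keep the worker alive; a real server would report it
                print(f"task failed: {exc!r}")
            finally:
                self.tasks.task_done()

    def submit(self, fn: Callable[..., None], *args) -> None:
        """Turn a request into a task; it completes asynchronously on the pool."""
        self.tasks.put(lambda: fn(*args))

    def put_meta(self, meta: ChunkMeta) -> None:
        with self.meta_lock:
            self.metadata[meta.chunkid] = meta

    def shutdown(self) -> None:
        for _ in self.threads:
            self.tasks.put(None)
        for t in self.threads:
            t.join()
```

Tasks leave the queue in FIFO order, but with more than one worker their completion order is not guaranteed, which is consistent with the text's statement that read and write tasks complete asynchronously.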
Further, in one embodiment of the invention, as shown in Fig. 4, unlike a local file system, the KV Region provides a flat namespace. Each key-value pair has a unique Key, which the database uses to locate the data. In the design of the KV Region, the embodiment does not store Key values; instead, it uses a simple algorithm to generate the Key. Specifically, the Key of each chunk is composed, as shown in Fig. 4, of chunkid (data block ID), version (data block version), and blocknum (data block sequence number). The chunkid and version are both passed directly from the client, while blocknum indicates the position of the current key-value pair within the entire chunk. Since the size of each block is fixed, e.g., 64KB, blocknum can be computed as blocknum = floor(offset / 64KB).
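The Key generation can be sketched in Python as below. The byte layout (an 8-byte chunkid, a 4-byte version, and a 4-byte blocknum, packed big-endian with the standard `struct` module) is an assumption for illustration; the text only fixes which three fields make up the Key. Big-endian, fixed-width packing makes byte-wise key order match numeric order, so in an LSM store with a byte-wise comparator all blocks of one chunk version are adjacent and sorted by blocknum.

```python
import struct

BLOCK_SIZE = 64 * 1024  # fixed KV block size from the text (64KB)


def blocknum_for_offset(offset: int, block_size: int = BLOCK_SIZE) -> int:
    """Index of the block containing byte `offset` within the chunk."""
    return offset // block_size


def make_key(chunkid: int, version: int, blocknum: int) -> bytes:
    """Pack (chunkid, version, blocknum) into a 16-byte key (assumed layout).

    ">QII": big-endian, unsigned 64-bit chunkid, then 32-bit version and blocknum.
    """
    return struct.pack(">QII", chunkid, version, blocknum)
```

Under this layout, a range scan from `make_key(c, v, 0)` up to `make_key(c, v, n)` would visit a chunk's blocks in order without any stored index.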
Further, in one embodiment of the invention, as shown in Fig. 5, to implement data reads and writes on top of the local file system in a distributed environment, SepStore needs to maintain some additional metadata that maps chunkid to system files. As in the KV Region design, the embodiment does not directly store the mapping from chunkid to local file in the File Region; instead, a simple algorithm implements data addressing. Specifically, the filename in the local file system corresponding to each chunk is composed of chunkid and version, with the structure shown in Fig. 5. Embedding this metadata in the filename has two benefits:
(1) Data can be located quickly from its chunkid.
(2) SepStore runs a background thread that periodically scans all files in the File Region and uses the information contained in the filenames to verify data consistency.
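The naming scheme and the consistency scan can be sketched as follows. The concrete format `chunk_<16 hex digits of chunkid>_<8 hex digits of version>` is a hypothetical choice (the text only says the name is built from chunkid and version), and the scan takes a list of names so that it stays self-contained; a real scanner would obtain them from the directory, e.g. with `os.scandir`.

```python
import re
from typing import Iterable, Optional

# Hypothetical naming scheme: chunk_<chunkid as 16 hex digits>_<version as 8 hex digits>
NAME_RE = re.compile(r"chunk_([0-9A-F]{16})_([0-9A-F]{8})")


def chunk_filename(chunkid: int, version: int) -> str:
    return f"chunk_{chunkid:016X}_{version:08X}"


def parse_chunk_filename(name: str) -> Optional[tuple[int, int]]:
    """Recover (chunkid, version) from a filename, or None if it does not match."""
    m = NAME_RE.fullmatch(name)
    if m is None:
        return None
    return int(m.group(1), 16), int(m.group(2), 16)


def scan_for_inconsistencies(names: Iterable[str],
                             expected_versions: dict[int, int]) -> list[tuple[str, str]]:
    """Compare versions encoded in filenames with the in-memory metadata table."""
    problems = []
    for name in names:
        parsed = parse_chunk_filename(name)
        if parsed is None:
            problems.append((name, "unparseable"))
            continue
        chunkid, version = parsed
        expected = expected_versions.get(chunkid)
        if expected is None:
            problems.append((name, "unknown chunk"))
        elif expected != version:
            problems.append((name, "stale version"))
    return problems
```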
To improve file read performance, the File Region adopts the following optimizations:
(1) A cache policy significantly more efficient than that of the local file system: SepStore maintains a block-level data cache on top of the file system and implements an LRU (Least Recently Used) replacement policy.
(2) Load balancing of data across directories. Most file systems index data with a B+ tree or one of its variants. These index structures are well balanced, but they do not handle a very large number of files in a single directory well. At the same time, an overly deep directory hierarchy increases the metadata burden on the Chunkserver. The goal is therefore to distribute files as evenly as possible while keeping the number of directory levels small. SepStore uses a simple strategy to achieve this: by default it creates 256 subdirectories under the root directory, and the subdirectory holding a chunk's file is determined by chunkid % 256.
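Both File Region optimizations can be sketched in a few lines of Python. `LRUBlockCache` and `chunk_subdir` are hypothetical names; the cache uses the standard `collections.OrderedDict` (`move_to_end` on a hit, `popitem(last=False)` to evict), the `load` callback stands in for a disk read, and rendering the subdirectory index as two hex digits is an assumed naming convention on top of the stated chunkid % 256 rule.

```python
from collections import OrderedDict
from typing import Callable, Hashable


def chunk_subdir(chunkid: int, fanout: int = 256) -> str:
    """Subdirectory for a chunk file: chunkid % 256, shown as two hex digits (assumed)."""
    return f"{chunkid % fanout:02X}"


class LRUBlockCache:
    """Block-level cache with least-recently-used eviction."""

    def __init__(self, capacity: int, load: Callable[[Hashable], bytes]) -> None:
        self.capacity = capacity
        self.load = load               # called on a miss, e.g. to read a block from disk
        self.blocks: OrderedDict = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key: Hashable) -> bytes:
        if key in self.blocks:
            self.blocks.move_to_end(key)      # mark as most recently used
            self.hits += 1
            return self.blocks[key]
        self.misses += 1
        data = self.load(key)
        self.blocks[key] = data
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)   # evict the least recently used block
        return data
```

Because chunkids are 64-bit identifiers, taking them modulo 256 spreads files roughly evenly across the subdirectories as long as the low bits vary, while keeping the hierarchy to a single extra level.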
In one embodiment of the invention, as shown in Fig. 8, the storage system 1000 further includes a check code generation module 700, which generates corresponding check codes from the data blocks in each sub-data file in order to maintain the sub-data files. In other words, one difference between the File Region and the KV Region is that the File Region maintains a check code for each chunk, with one byte of check data for every 4KB.
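The one-byte-per-4KB layout can be sketched as below. The text does not say which function produces the check byte, so the low byte of CRC-32 (via the standard `zlib.crc32`) is purely a hypothetical choice, as are the names `check_bytes` and `bad_segments`. At this ratio a full 64MB chunk needs 64MB / 4KB = 16384 check bytes (16KB).

```python
import zlib

SEGMENT = 4 * 1024  # one check byte per 4KB, as stated above


def check_bytes(chunk: bytes) -> bytes:
    """One check byte per 4KB segment (hypothetical: low byte of CRC-32)."""
    return bytes(zlib.crc32(chunk[i:i + SEGMENT]) & 0xFF
                 for i in range(0, len(chunk), SEGMENT))


def bad_segments(chunk: bytes, stored: bytes) -> list[int]:
    """Indices of 4KB segments whose recomputed check byte differs from the stored one."""
    fresh = check_bytes(chunk)
    return [i for i, (a, b) in enumerate(zip(fresh, stored)) if a != b]
```

A single check byte detects most random corruption of a segment but not all of it (roughly a 1-in-256 chance of a miss), which is the trade-off for a storage overhead of about 0.02%.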
Further, in one embodiment of the invention, as shown in Fig. 6, by combining KV storage with LSM technology, the embodiment designs and implements a Chunkserver that performs well for both large and small files. In SepStore, large files are stored as local files to take full advantage of the local file system's strength in sequential writes, while small files are stored in a KV database that uses LSM to improve read/write performance. Since the write process is the more complex one, it is used to illustrate the read/write process of the whole system. The improved write process of the whole system includes the following steps:
S1. The client initiates a write request and sends the relevant parameters (filename, offset, etc.) to the metadata management server 20.
S2. If the file does not exist, this is a new-file operation. The metadata management server 20 allocates a fileid (SepStore uses a globally incrementing 64-bit integer for this value) and a chunkid for the file, determines the data placement by taking into account factors such as the number and usage of the existing Chunkservers, and finally returns the relevant information to the client. Otherwise, the Meta Server (Master) looks up the Chunkserver holding the chunk by its chunkid and returns that Chunkserver's IP address and port directly to the client.
S3. The client sends the data to the Chunkserver. If the file already exists, a task is generated directly and placed in the work pool. Otherwise, this is a new chunk-creation operation, and the Chunkserver must decide where to place the file: in the KV Region or directly in the file system. After determining the placement, the Chunkserver allocates the corresponding metadata to the file.
S4. The Chunkserver maintains a thread pool to complete the work in the work pool. The work pool contains two types of work: write operations for the File Region and write operations for the KV Region.
S5. The Chunkserver returns the processing result to the client. Based on the result, the client decides whether to retry or to notify the metadata management server 20 to modify the associated metadata.
It should be noted that the read process is very similar to the write process. The difference is that in step S3 the data location can be looked up directly, and in step S4 the data read is completed. To avoid redundancy, the read process is not described in detail here.
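Step S2 on the metadata management server can be sketched in Python as below. `ToyMetaServer` and `ChunkserverInfo` are hypothetical; the globally incrementing fileid comes from the text, whereas allocating chunkids the same way, treating each file as a single chunk, and choosing the Chunkserver with the lowest usage ratio are simplifying assumptions standing in for "comprehensively considering the number and usage of existing Chunkservers". Python integers are unbounded, so a real implementation would hold the counters in 64-bit fields.

```python
import itertools
from dataclasses import dataclass


@dataclass
class ChunkserverInfo:
    ip: str
    port: int
    used_bytes: int
    capacity_bytes: int


class ToyMetaServer:
    """Hypothetical model of step S2 on metadata management server 20."""

    def __init__(self, servers: list[ChunkserverInfo]) -> None:
        self.servers = servers
        self._next_fileid = itertools.count(1)   # globally incrementing fileid
        self._next_chunkid = itertools.count(1)  # assumption: chunkids allocated the same way
        self.files: dict[str, tuple[int, int, ChunkserverInfo]] = {}

    def handle_write(self, filename: str) -> tuple[int, str, int]:
        """Return (chunkid, ip, port) for the chunk the client should write to."""
        if filename in self.files:  # existing file: just return the chunk's location
            _, chunkid, srv = self.files[filename]
            return chunkid, srv.ip, srv.port
        fileid = next(self._next_fileid)
        chunkid = next(self._next_chunkid)
        # Placement (assumed rule): the existing Chunkserver with the lowest usage ratio.
        srv = min(self.servers, key=lambda s: s.used_bytes / s.capacity_bytes)
        self.files[filename] = (fileid, chunkid, srv)
        return chunkid, srv.ip, srv.port
```

The client would then send its data to the returned address, which begins step S3.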
Most current distributed file systems rely directly on the local file system for disk management. However, in a distributed file system, the data access path on the data server incurs many unnecessary overheads, such as the cost of locating the storage address of a data block. The embodiment of the present invention combines a key-value database, LSM technology, and the local file system, distinguishes files by size, and specifically optimizes reads and writes of small files, reducing global random I/O and thereby improving overall performance. Small files are stored in a key-value database that uses LSM technology, while large files are cut into fixed-size data blocks and stored on the local file system as ordinary files. A distributed file system prototype, SepStore, was implemented, and experiments verified the effectiveness of the optimization. The experimental results show that SepStore improves write speed for small files by 210%. Under a mixed workload of large and small files, overall throughput improves by 78%, and overall IOPS improves by 37%.
In the data storage system of the distributed file system proposed according to embodiments of the present invention, data files are distinguished by size. If the size of a data file is less than a certain value, the data file is stored into the key-value database of the cloud server using an LSM-Tree-based KV storage method; if the size is greater than that value, the data file is cut into multiple sub-data files, which are stored in the local file system. Reads and writes of small files are optimized in particular, which improves the data access efficiency of the distributed file system and thus its overall performance.
Any process or method description in a flowchart or otherwise described herein may be understood as representing a module, segment, or portion of code that includes one or more executable instructions for implementing specific logical functions or steps of the process. The scope of the preferred embodiments of the present invention includes other implementations in which functions may be performed out of the order shown or discussed, including substantially simultaneously or in the reverse order depending on the functions involved, as should be understood by those skilled in the art to which the embodiments of the present invention belong.
The logic and/or steps represented in the flowcharts or otherwise described herein may, for example, be considered an ordered list of executable instructions for implementing logical functions, and may be embodied in any computer-readable medium for use by, or in connection with, an instruction execution system, apparatus, or device (such as a computer-based system, a system including a processor, or another system that can fetch instructions from an instruction execution system, apparatus, or device and execute them). For the purposes of this specification, a "computer-readable medium" may be any apparatus that can contain, store, communicate, propagate, or transmit a program for use by, or in connection with, an instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of computer-readable media include: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), fiber optic devices, and portable compact disc read-only memory (CD-ROM). In addition, the computer-readable medium may even be paper or another suitable medium on which the program is printed, since the program can be obtained electronically, for example by optically scanning the paper or other medium and then editing, interpreting, or otherwise processing it in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that each part of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, multiple steps or methods may be implemented by software or firmware stored in memory and executed by a suitable instruction execution system. If implemented in hardware, as in another embodiment, they may be implemented by any one or a combination of the following technologies well known in the art: discrete logic circuits having logic gates for implementing logic functions on data signals, application-specific integrated circuits having suitable combinational logic gates, programmable gate arrays (PGA), field-programmable gate arrays (FPGA), and so on.
Those skilled in the art will understand that all or part of the steps of the above method embodiments may be completed by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, performs one or a combination of the steps of the method embodiments.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing module, each unit may exist physically alone, or two or more units may be integrated into one module. The integrated module may be implemented in the form of hardware or in the form of a software functional module. If the integrated module is implemented as a software functional module and sold or used as an independent product, it may also be stored in a computer-readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disc, or the like.
In the description of this specification, references to the terms "one embodiment", "some embodiments", "an example", "a specific example", or "some examples" mean that the specific features, structures, materials, or characteristics described in connection with that embodiment or example are included in at least one embodiment or example of the present invention. In this specification, schematic uses of these terms do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
Although embodiments of the present invention have been shown and described above, it should be understood that the above embodiments are exemplary and are not to be construed as limiting the present invention. Those skilled in the art may make changes, modifications, substitutions, and variations to the above embodiments within the scope of the present invention without departing from its principle and spirit.

Claims (6)

1. A data storage method of a distributed file system, comprising the following steps:
receiving a data file sent by a user;
determining the size of the data file;
if the size of the data file is less than a preset value, storing the data file into a key-value database of a cloud server by a KV storage method based on the log-structured merge tree (LSM-Tree), and, after storing the data file into the key-value database of the cloud server by the LSM-Tree-based KV storage method, further comprising: generating the Key of the data file according to the data block ID, data block version, and data block sequence number of the data file, wherein the data block ID and data block version are obtained directly from the client, and the data block sequence number is the position of the Key of the data file within the entire data block; and
if the size of the data file is greater than the preset value, cutting the data file into multiple sub-data files and storing them in a local file system, and, after cutting the data file into multiple sub-data files and storing them in the local file system, further comprising: generating, according to the data block ID and data block version of each sub-data file, the filename in the local file system corresponding to each of the multiple sub-data files, so as to implement data addressing.
2. The data storage method of the distributed file system according to claim 1, further comprising: generating a corresponding check code according to the data blocks in each sub-data file, so as to maintain the sub-data files.
3. The data storage method of the distributed file system according to claim 1, wherein the preset value is 64MB.
4. A data storage system of a distributed file system, comprising:
a receiving module, configured to receive a data file sent by a user;
a judgment module, configured to determine the size of the data file;
a first storage module, configured to store the data file into a key-value database of a cloud server by an LSM-Tree-based KV storage method if the size of the data file is less than a preset value;
a Key generation module, configured to generate the Key of the data file according to the data block ID, data block version, and data block sequence number of the data file, wherein the data block ID and data block version are obtained directly from the client, and the data block sequence number is the position of the Key of the data file within the entire data block; and
a second storage module, configured to cut the data file into multiple sub-data files and store them in a local file system if the size of the data file is greater than the preset value;
a filename generation module, configured to generate, according to the data block ID and data block version of each sub-data file, the filename in the local file system corresponding to each of the multiple sub-data files, so as to implement data addressing.
5. The data storage system of the distributed file system according to claim 4, further comprising:
a check code generation module, configured to generate a corresponding check code according to the data blocks in each sub-data file, so as to maintain the sub-data files.
6. The data storage system of the distributed file system according to claim 4, wherein the preset value is 64MB.
CN201410645370.0A 2014-11-11 2014-11-11 The date storage method and system of distributed file system Active CN104408091B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410645370.0A CN104408091B (en) 2014-11-11 2014-11-11 The date storage method and system of distributed file system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410645370.0A CN104408091B (en) 2014-11-11 2014-11-11 The date storage method and system of distributed file system

Publications (2)

Publication Number Publication Date
CN104408091A (en) 2015-03-11
CN104408091B (en) 2019-03-01

Family

ID=52645722

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410645370.0A Active CN104408091B (en) 2014-11-11 2014-11-11 The date storage method and system of distributed file system

Country Status (1)

Country Link
CN (1) CN104408091B (en)

Families Citing this family (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105159915B (en) * 2015-07-16 2018-07-10 中国科学院计算技术研究所 The LSM trees merging method and system of dynamic adaptable
CN105138632A (en) * 2015-08-20 2015-12-09 浪潮(北京)电子信息产业有限公司 Organization and management method for file data and file management server
CN106557509A (en) * 2015-09-29 2017-04-05 镇江雅迅软件有限责任公司 A kind of distributed file system
CN105787093B (en) * 2016-03-17 2019-07-02 清华大学 A kind of construction method of the log file system based on LSM-Tree structure
CN107656697B (en) * 2016-07-26 2021-03-02 阿里巴巴集团控股有限公司 Method and device for operating data on storage medium
CN107870940B (en) 2016-09-28 2021-06-18 杭州海康威视数字技术股份有限公司 File storage method and device
CN107977341A (en) * 2016-10-21 2018-05-01 北京航天爱威电子技术有限公司 Big data text immediate processing method
CN106412093B (en) * 2016-10-25 2019-07-23 Oppo广东移动通信有限公司 A kind of method for uploading of data, apparatus and system
CN106708427B (en) * 2016-11-17 2019-05-10 华中科技大学 A kind of storage method suitable for key-value pair data
CN107193988A (en) * 2017-05-30 2017-09-22 梅婕 The quick method for cleaning of data
CN108052284B (en) * 2017-12-08 2020-11-06 北京奇虎科技有限公司 Distributed data storage method and device
CN108446363B (en) * 2018-03-13 2021-05-25 北京奇安信科技有限公司 Data processing method and device of KV engine
CN109241015B (en) * 2018-07-24 2021-07-16 北京百度网讯科技有限公司 Method for writing data in a distributed storage system
CN109491807A (en) * 2018-11-01 2019-03-19 浪潮软件集团有限公司 Data exchange method, device and system
CN109684414B (en) * 2018-12-26 2022-04-08 百度在线网络技术(北京)有限公司 Method, device and equipment for synchronizing block data and storage medium
CN110321077B (en) * 2019-06-17 2023-04-14 浩云科技股份有限公司 Method and device for managing centrally stored files
CN112486939A (en) * 2019-09-11 2021-03-12 上海擎感智能科技有限公司 Public cloud-based Moosefs distributed file storage method, system, medium and device
CN112699092B (en) * 2021-01-13 2023-02-03 浪潮云信息技术股份公司 Method for storing big value data by RocksDB
CN112965856B (en) * 2021-02-24 2022-04-08 上海英方软件股份有限公司 Backup data-based fast fine-grained recovery method and device
CN113094372A (en) 2021-04-16 2021-07-09 三星(中国)半导体有限公司 Data access method, data access control device and data access system
CN113688099B (en) * 2021-08-09 2023-10-13 上海沄熹科技有限公司 SPDK-based database storage engine acceleration method and system
CN117520305B (en) * 2023-11-21 2024-04-23 北京中领启天信息科技有限公司 High concurrency data migration method and data security storage device
CN118349532B (en) * 2024-06-17 2024-08-27 北京乐讯科技有限公司 Filecoin scene adaptation method and system based on additional storage

Citations (1)

Publication number Priority date Publication date Assignee Title
CN101916289A (en) * 2010-08-20 2010-12-15 浙江大学 Method for establishing digital library storage system supporting mass small files and dynamic backup number

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US7024427B2 (en) * 2001-12-19 2006-04-04 Emc Corporation Virtual file system

Patent Citations (1)

Publication number Priority date Publication date Assignee Title
CN101916289A (en) * 2010-08-20 2010-12-15 浙江大学 Method for establishing digital library storage system supporting mass small files and dynamic backup number

Non-Patent Citations (2)

Title
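Research and Design of the SDFS Distributed File System; 罗雄威 (Luo Xiongwei); 《中国优秀硕士学位论文全文数据库 信息科技辑》 (China Master's Theses Full-text Database, Information Science and Technology); 2013-12-15 (No. 82); pp. 7, 38, 40, 45, 47
Research on a Cloud Storage Model Based on Massive Information Processing; 张桂刚 (Zhang Guigang) et al.; 《计算机研究与发展》 (Journal of Computer Research and Development); 2012-12-31; pp. 32-36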

Also Published As

Publication number Publication date
CN104408091A (en) 2015-03-11


Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant