CN104408091B - Data storage method and system for a distributed file system - Google Patents
- Publication number
- CN104408091B (application number CN201410645370.0A)
- Authority
- CN
- China
- Prior art keywords
- file
- data
- data block
- file system
- data file
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/18—File system types
- G06F16/182—Distributed file systems
- G06F16/1824—Distributed file systems implemented using Network-attached Storage [NAS] architecture
- G06F16/183—Provision of network file services by network file servers, e.g. by using NFS, CIFS
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a data storage method for a distributed file system, comprising the following steps: receiving a data file sent by a user; determining the size of the data file; if the size of the data file is less than a preset value, storing the data file in a key-value database of a cloud server using a KV storage method based on a log-structured merge tree (LSM-Tree); and if the size of the data file is greater than the preset value, splitting the data file into multiple sub-data files and storing them in the local file system. By distinguishing data files according to their size, the method of the embodiments of the invention improves the efficiency of the distributed file system and raises overall performance. The embodiments of the invention also disclose a data storage system for a distributed file system.
Description
Technical field
The present invention relates to the field of file system technology, and in particular to a data storage method and system for a distributed file system.
Background
At present, distributed file systems such as GFS (Google File System), MooseFS, and Lustre are built on top of single-machine file systems. The distributed layer organizes the mapping from logical files to lists of logical data blocks, while the local file system handles the mapping from logical data blocks to data on disk. The two layers each perform their own function and together complete data addressing and read/write operations.
The local file system usually organizes the data on disk with multi-level indexes. In the Ext family of file systems, for example, the metadata of a file must be found before its data blocks can be accessed through the metadata index, and before that the metadata of its parent directory must be found. Every read or write is therefore preceded by a metadata lookup along the directory tree. This access pattern directly causes the low read/write efficiency of small files. For large files, the overhead is amortized over the data transfer, and together with the system cache its effect on performance can be reduced to a minimum; for small files, however, this overhead accounts for most of the total access time. As a result, local file systems often perform poorly on small files, with read/write performance that can be an order of magnitude lower than for large files. Because a distributed file system uses the local file system as its back-end storage, and its read/write path adds several network interactions compared with local access, the problem of poor small-file read/write performance is equally present, and more severe, in a distributed environment.
In addition, a distributed file system uses only the basic storage service of the local file system, so much of the metadata kept by the local file system is unnecessary. The namespace of a distributed file system is organized differently from that of the local file system and is independent of it. Distributed systems in the related art therefore need to be improved in order to optimize small-file read/write performance.
Summary of the invention
The present invention aims to solve, at least to some extent, one of the above technical problems in the related art.
To this end, one object of the present invention is to provide a simple and convenient data storage method for a distributed file system that improves system efficiency and performance.
Another object of the present invention is to provide a data storage system for a distributed file system.
To achieve the above objects, an embodiment of one aspect of the present invention provides a data storage method for a distributed file system, comprising the following steps: receiving a data file sent by a user; determining the size of the data file; if the size of the data file is less than a preset value, storing the data file in a key-value database of a cloud server using a KV (key-value) storage method based on an LSM-Tree (Log-Structured Merge-Tree); and if the size of the data file is greater than the preset value, splitting the data file into multiple sub-data files and storing them in the local file system.
With the data storage method for a distributed file system according to the embodiments of the present invention, data files are distinguished by size: a data file smaller than a certain value is stored in the key-value database of the cloud server using the LSM-Tree-based KV storage method, and a data file larger than that value is split into multiple sub-data files and stored in the local file system. This improves the efficiency of the distributed file system and raises overall performance.
In addition, the data storage method for a distributed file system according to the above embodiments of the present invention may have the following additional technical features:
Further, in one embodiment of the present invention, after the data file is stored in the key-value database of the cloud server using the LSM-Tree-based KV storage method, the method further comprises: generating the Key of the data file from the data block ID (identity), data block version, and data block sequence number of the data file.
Further, in one embodiment of the present invention, after the data file is split into multiple sub-data files and stored in the local file system, the method further comprises: generating the local file system file names of the sub-data files from the data block ID and data block version of each sub-data file.
Further, in one embodiment of the present invention, the method further comprises: generating a corresponding check code from the data blocks in each sub-data file, in order to maintain the sub-data files.
Preferably, in one embodiment of the present invention, the preset value may be 64 MB.
An embodiment of another aspect of the present invention provides a data storage system for a distributed file system, comprising: a receiving module for receiving a data file sent by a user; a judgment module for determining the size of the data file; a first storage module for storing the data file in a key-value database of a cloud server using an LSM-Tree-based KV storage method if the size of the data file is less than a preset value; and a second storage module for splitting the data file into multiple sub-data files and storing them in the local file system if the size of the data file is greater than the preset value.
With the data storage system for a distributed file system according to the embodiments of the present invention, data files are distinguished by size: a data file smaller than a certain value is stored in the key-value database of the cloud server using the LSM-Tree-based KV storage method, and a data file larger than that value is split into multiple sub-data files and stored in the local file system. This improves the efficiency of the distributed file system and raises overall performance.
In addition, the data storage system for a distributed file system according to the above embodiments of the present invention may have the following additional technical features:
Further, in one embodiment of the present invention, the system further comprises: a Key generation module for generating the Key of the data file from the data block ID, data block version, and data block sequence number of the data file.
Further, in one embodiment of the present invention, the system further comprises: a file name generation module for generating the local file system file names of the sub-data files from the data block ID and data block version of each sub-data file.
Further, in one embodiment of the present invention, the system further comprises: a check code generation module for generating a corresponding check code from the data blocks in each sub-data file, in order to maintain the sub-data files.
Preferably, in one embodiment of the present invention, the preset value may be 64 MB.
Additional aspects and advantages of the present invention will be set forth in part in the following description, and in part will become apparent from the description or be learned through practice of the invention.
Brief description of the drawings
The above and/or additional aspects and advantages of the present invention will become apparent and readily understood from the following description of the embodiments taken in conjunction with the accompanying drawings, in which:
Fig. 1 is a flowchart of the data storage method for a distributed file system according to an embodiment of the present invention;
Fig. 2 is a structural diagram of MooseFS as used in one embodiment of the present invention;
Fig. 3 is a structural diagram of the data server according to one embodiment of the present invention;
Fig. 4 is a structural diagram of the Key according to one embodiment of the present invention;
Fig. 5 is a structural diagram of the file name corresponding to a chunk in the File Region according to one embodiment of the present invention;
Fig. 6 is a flowchart of the SepStore write operation according to one embodiment of the present invention;
Fig. 7 is a structural diagram of the data storage system for a distributed file system according to an embodiment of the present invention; and
Fig. 8 is a structural diagram of the data storage system for a distributed file system according to one embodiment of the present invention.
Detailed description of the embodiments
Embodiments of the present invention are described in detail below, and examples of the embodiments are shown in the accompanying drawings, in which the same or similar reference numerals throughout denote the same or similar elements or elements having the same or similar functions. The embodiments described below with reference to the drawings are exemplary; they are intended to explain the present invention and should not be construed as limiting it.
In addition, the terms "first" and "second" are used for descriptive purposes only and are not to be understood as indicating or implying relative importance or implicitly indicating the number of technical features referred to. A feature defined as "first" or "second" may therefore explicitly or implicitly include one or more of such features. In the description of the present invention, "plurality" means two or more, unless specifically defined otherwise.
In the present invention, unless otherwise expressly specified and limited, the terms "installed", "connected", "coupled", "fixed", and the like are to be understood broadly. For example, a connection may be fixed, detachable, or integral; it may be mechanical or electrical; it may be direct or indirect through an intermediary; and it may be an internal connection between two elements. A person of ordinary skill in the art can understand the specific meaning of these terms in the present invention according to the specific situation.
In the present invention, unless otherwise expressly specified and limited, a first feature being "on" or "under" a second feature may include the first and second features being in direct contact, or the first and second features not being in direct contact but being in contact through another feature between them. Moreover, a first feature being "on", "above", or "over" a second feature includes the first feature being directly above or obliquely above the second feature, or merely indicates that the first feature is at a higher level than the second feature. A first feature being "under", "below", or "beneath" a second feature includes the first feature being directly below or obliquely below the second feature, or merely indicates that the first feature is at a lower level than the second feature.
Before describing the data storage method and system for a distributed file system according to the embodiments of the present invention, the reasons for proposing the embodiments are briefly explained.
Viewed across the whole read/write path of a distributed file system, the data exchange between the client and the data server is an important factor in read/write performance, and the file system structure and disk seeks on the data server make the read/write performance of small files very poor.
A distributed file system has its own metadata management service, so most of the metadata kept by the local file system, such as modification time, waiting time, and hard link count, is unnecessary.
The notion of a file in a distributed file system does not correspond one-to-one to the notion of a file in the local file system. In practice, a large file is often cut into multiple local files to enable parallelism, and small files can likewise be aggregated into larger ones to reduce the metadata overhead on the local file system during reads and writes.
The open and close overhead of the local file system is unnecessary. Given the design of the local file system, eliminating this overhead completely is impossible, but it can be reduced to a minimum by packing small files into large files.
Viewed from the read/write path on the data server, disk seeks caused by random reads and writes are the most important factor affecting file system performance. For random writes, a log-structured disk layout can greatly improve write performance. For random reads, an in-memory index can reduce the number of I/O operations during reads.
Once small-file reads and writes are optimized, and especially once a large number of disk seeks are eliminated, the performance of the whole file system improves.
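The two ideas above, sequential log-structured writes and an in-memory index for reads, can be illustrated with a minimal sketch. It is not the patent's implementation: the class name `AppendOnlyStore` and its methods are hypothetical, and compaction, crash recovery, and concurrency are left out.

```python
import os


class AppendOnlyStore:
    """Toy log-structured store: sequential appends plus an in-memory index.

    Hypothetical illustration only; not part of SepStore or MooseFS.
    """

    def __init__(self, path):
        self.path = path
        self.index = {}  # key -> (offset, length) of the newest value
        open(path, "ab").close()  # make sure the log file exists

    def put(self, key, value):
        # Every write is appended at the end of the log, so updates to
        # arbitrary keys become sequential disk writes.
        with open(self.path, "ab") as log:
            offset = log.seek(0, os.SEEK_END)
            log.write(value)
        self.index[key] = (offset, len(value))

    def get(self, key):
        # The index lives in memory, so a read needs one seek and one read.
        offset, length = self.index[key]
        with open(self.path, "rb") as log:
            log.seek(offset)
            return log.read(length)
```

Overwriting a key only appends a new record and repoints the index; the stale record stays in the log until some later cleanup, which this sketch does not model.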
Based on the above problems, the present invention provides a data storage method for a distributed file system and a data storage system for a distributed file system.
The data storage method and system for a distributed file system according to the embodiments of the present invention are described below with reference to the drawings, beginning with the data storage method. Referring to Fig. 1, the method comprises the following steps:
S101, receive the data file sent by the user.
First, referring to Fig. 2, the embodiment of the present invention retains the overall architecture of MooseFS, together with most of the code of the metadata server (called Master in MooseFS, also referred to as the Metadata Server) and of the client, and implements a new data server. The structure of MooseFS is similar to that of GFS: it consists of multiple clients 10 (Clients), a metadata management server 20 (Master), and multiple data servers 30 (Chunkservers), and the function of each part is also similar to GFS.
Specifically, in the embodiments of the present invention, each part performs the following functions:
Each client among the multiple clients 10 virtualizes the resources of the whole distributed file system and uses FUSE to expose a POSIX interface. Once the file system is mounted through the client, MooseFS can be used like a local disk.
The metadata management server 20 manages metadata, decides the data placement policy, and maintains the operation of the whole system. If Master goes down, the whole service stops. To protect the reliability of the metadata, MooseFS also provides a Metadatalogger (metadata log server) that backs up metadata operations in real time.
Each Chunkserver among the multiple data servers 30 stores data on disk. A Chunkserver is implemented as a user-space process, and the underlying storage relies on the local file system.
S102, determine the size of the data file.
S103, if the size of the data file is less than the preset value, store the data file in the key-value database of the cloud server using the KV storage method based on the log-structured merge tree (LSM-Tree).
Preferably, in one embodiment of the present invention, the preset value may be 64 MB. It should be noted that this value is given only as an example, and the default size of the preset value is not limited to it.
Further, in one embodiment of the present invention, after the data file is stored in the key-value database of the cloud server using the LSM-Tree-based KV storage method, the method further comprises: generating the Key of the data file from the data block ID, data block version, and data block sequence number of the data file.
S104, if the size of the data file is greater than the preset value, split the data file into multiple sub-data files and store them in the local file system.
Further, in one embodiment of the present invention, after the data file is split into multiple sub-data files and stored in the local file system, the method further comprises: generating the local file system file names of the sub-data files from the data block ID and data block version of each sub-data file.
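Steps S102 to S104 amount to a size-based dispatch, which the following minimal sketch illustrates. The two dictionaries stand in for the key-value database and the local file system, and the function names and the `.partN` naming are hypothetical. The text does not say what happens when the size equals the preset value; the sketch assumes such a file is treated as large.

```python
PRESET_VALUE = 64 * 1024 * 1024  # example threshold from the text (64 MB)


def split_file(data, part_size):
    """Cut data into consecutive sub-data files of at most part_size bytes."""
    return [data[i:i + part_size] for i in range(0, len(data), part_size)]


def store_data_file(name, data, kv_store, local_fs, preset=PRESET_VALUE):
    """Route a data file by size (steps S102 to S104).

    kv_store and local_fs are plain dicts used as stand-ins (assumption).
    """
    if len(data) < preset:
        kv_store[name] = data  # S103: small file goes to the key-value database
        return "kv"
    # S104: large file is cut into sub-data files kept as local files.
    for seq, part in enumerate(split_file(data, preset)):
        local_fs[f"{name}.part{seq}"] = part
    return "file"
```

Passing a small `preset` makes the behavior easy to try out without allocating 64 MB buffers.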
Specifically, in one embodiment of the present invention, the read/write path and storage scheme of MooseFS are also similar to those of GFS. MooseFS splits a large file into blocks of 64 MB (the block size is fixed at compile time and defaults to 64 MB), producing multiple chunks (data blocks, equivalent to multiple sub-data files) that are stored on different Chunkservers. A small file is treated by MooseFS as a single chunk stored on one Chunkserver. Each chunk corresponds to one actual file in the file system on the Chunkserver. The metadata management server 20 maintains the mapping from file names to chunk lists, and each chunk is identified by a 64-bit chunkid (data block ID). To read or write data, the client first asks the metadata management server 20 for the chunkid of the chunk corresponding to the file and for the IP address and port number of the Chunkserver holding that chunk. With this information, the client sends the read or write request to the specified Chunkserver. For a read, the client reads the data from the specified Chunkserver. For a write, the client sends the data to be written to one Chunkserver, which writes it to its local disk and at the same time forwards it to the data servers holding the other replicas. After the client receives the result of the write from the Chunkserver, it must confirm with Master either the change to the metadata or the failure of the write.
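The fixed-size splitting described above can be sketched as a function that maps a file size to chunk byte ranges. The function name and the tuple layout are hypothetical; only the 64 MB default comes from the text.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB, the default block size named in the text


def chunk_ranges(file_size, chunk_size=CHUNK_SIZE):
    """Return (chunk_index, start, end) for each chunk; end is exclusive."""
    if file_size <= chunk_size:
        return [(0, 0, file_size)]  # a small file is a single chunk
    return [
        (index, start, min(start + chunk_size, file_size))
        for index, start in enumerate(range(0, file_size, chunk_size))
    ]
```

Each range would then be assigned its own chunkid and placed on a Chunkserver; that placement step is outside this sketch.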
Further, in one embodiment of the present invention, referring to Fig. 3, the embodiment provides different storage schemes for data files of different sizes, and for small files it combines LSM technology with a key-value database, which improves overall efficiency. Specifically, in SepStore a large file is cut at a fixed size, for example 64 MB, and its chunks are placed on different servers, while a small file is treated as a single chunk. A Chunkserver therefore ends up holding files of three sizes: files smaller than a preset size T, which are regarded as small files; files between T and 64 MB; and files of exactly 64 MB. The choice of T is a key issue. Based on experiments, the embodiment uses 64 KB as T, but in practice T should be chosen according to the application scenario. The Chunkserver in SepStore provides one storage scheme for the first kind of file and another for the second and third kinds. By combining KV storage based on the LSM (Log-Structured Merge) Tree with the local file system, the embodiment optimizes the data access efficiency of the distributed file system.
Specifically, the structure of the SepStore Chunkserver is shown in Fig. 3. The Chunkserver is divided into three parts. The first is the File Region, which stores larger files, namely those larger than T and no larger than 64 MB. T is the boundary value that decides whether a file goes into the File Region or the KV Region; it defaults to 64 KB in SepStore, and unless otherwise stated T below always means 64 KB. The second is the KV Region, which mainly stores small files, that is, files smaller than T. The third is the Metadata Region, which stores metadata about the data on the server, such as chunkid and version (data block version). The embodiment stores files larger than T in the File Region and files smaller than T in the KV Region, but the division is not absolute. For example, a small file in the KV Region may grow larger than T after a write. When this happens, the data is not migrated immediately; it is temporarily split into multiple KV pairs, and the actual migration is completed asynchronously by a background thread.
Further, the Chunkserver exposes a unified access interface, consisting mainly of read, write, and delete. Specifically, on receiving a client request, the Chunkserver generates a task and puts it into the task pool. The task pool is a first-in, first-out queue, and tasks have no priorities. The Chunkserver maintains a thread pool to process the tasks in the task pool, and read and write tasks are all completed asynchronously. At run time, all metadata is loaded into memory and organized into one large hash table keyed by chunkid. Besides answering data requests from clients, the Chunkserver also periodically reports information such as local disk usage and errors to Master.
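The task pool, thread pool, and in-memory metadata table can be sketched as follows. This is only an illustration under assumptions: the task tuple layout, the metadata fields, and the sentinel-based shutdown are hypothetical, and real read/write I/O is replaced by a metadata update.

```python
import queue
import threading

metadata = {}                   # in-memory hash table: chunkid -> metadata
metadata_lock = threading.Lock()
task_pool = queue.Queue()       # FIFO queue; tasks carry no priority


def worker():
    """Take tasks from the pool in arrival order until a sentinel arrives."""
    while True:
        task = task_pool.get()
        if task is None:        # sentinel: stop this worker
            task_pool.task_done()
            break
        op, chunkid, payload = task
        with metadata_lock:
            if op == "write":
                metadata[chunkid] = {"version": 1, "size": len(payload)}
            elif op == "delete":
                metadata.pop(chunkid, None)
        task_pool.task_done()


def run_tasks(tasks, num_threads=4):
    """Start the thread pool, enqueue the tasks, and wait for completion."""
    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for task in tasks:
        task_pool.put(task)
    for _ in threads:
        task_pool.put(None)     # one sentinel per worker
    task_pool.join()
    for t in threads:
        t.join()
```

The queue hands tasks out in FIFO order, but with several workers the completion order of tasks on different chunks is not guaranteed, which matches the asynchronous completion described in the text.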
Further, in one embodiment of the present invention, referring to Fig. 4, the KV Region differs from the local file system in that it provides a flat namespace. Each key-value pair has a unique Key, and the database uses the Key to locate the data. In the design of the KV Region, the embodiment does not store Key values; instead, it uses a simple algorithm to generate them. Specifically, as shown in the figure, the Key of each chunk is composed of chunkid (data block ID), version (data block version), and blocknum (data block sequence number). chunkid and version are passed in directly by the client, while blocknum identifies the position of the current key-value pair within the whole chunk. The size of each block is fixed, for example 64 KB, so blocknum can be obtained from the equation blocknum = floor(offset / 64KB).
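A minimal sketch of Key generation follows. The text specifies only that the Key combines chunkid, version, and blocknum; the field widths (64-bit chunkid, 32-bit version, 32-bit blocknum) and the big-endian packing are assumptions made here for illustration. The sketch also assumes blocknum is the index of the 64 KB block that contains the byte offset, which is computed by integer division.

```python
import struct

BLOCK_SIZE = 64 * 1024  # fixed block size from the text (64 KB)

# Assumed layout: 64-bit chunkid, 32-bit version, 32-bit blocknum, big-endian.
KEY_FORMAT = ">QII"


def make_key(chunkid, version, offset):
    """Build the Key for the block that contains byte `offset` of the chunk."""
    blocknum = offset // BLOCK_SIZE  # index of the 64 KB block
    return struct.pack(KEY_FORMAT, chunkid, version, blocknum)


def parse_key(key):
    """Recover (chunkid, version, blocknum) from a packed Key."""
    return struct.unpack(KEY_FORMAT, key)
```

With big-endian packing, the keys of one chunk sort in block order under byte-wise comparison, which suits an ordered store such as an LSM-Tree. This ordering property is a consequence of the assumed layout, not something the text states.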
Further, in one embodiment of the present invention, referring to Fig. 5, in a distributed environment SepStore would need to maintain some additional metadata to map chunkid to system files in order to read and write data through the local file system. As in the design of the KV Region, in the File Region the embodiment does not store the mapping from chunkid to local file directly; it uses a simple algorithm to implement data addressing. Specifically, the local file system file name of each chunk is composed of chunkid and version, with the structure shown in the figure. Embedding metadata in the file name in this way has two benefits:
(1) It allows fast positioning from chunkid to data.
(2) SepStore starts a background thread that periodically scans all files in the File Region and uses the information contained in the file names to verify the consistency of the data.
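Both benefits can be illustrated with a short sketch. The exact file name layout is shown only in Fig. 5, so the pattern used here (`chunk_<16 hex digits>_<8 hex digits>.dat`) and all function names are hypothetical, as is the shape of the metadata table (chunkid mapped to the expected version).

```python
import re

# Hypothetical layout: chunk_<chunkid as 16 hex digits>_<version as 8 hex digits>.dat
FILENAME_RE = re.compile(r"^chunk_([0-9A-F]{16})_([0-9A-F]{8})\.dat$")


def chunk_filename(chunkid, version):
    """Derive the local file name directly from chunkid and version."""
    return f"chunk_{chunkid:016X}_{version:08X}.dat"


def parse_chunk_filename(name):
    """Return (chunkid, version) encoded in a file name, or None if malformed."""
    match = FILENAME_RE.match(name)
    if match is None:
        return None
    return int(match.group(1), 16), int(match.group(2), 16)


def find_inconsistent(filenames, metadata):
    """List file names whose embedded (chunkid, version) disagree with metadata."""
    bad = []
    for name in filenames:
        parsed = parse_chunk_filename(name)
        if parsed is None or metadata.get(parsed[0]) != parsed[1]:
            bad.append(name)
    return bad
```

A background scan would call something like `find_inconsistent` on a directory listing; no separate chunkid-to-file map has to be stored because the name can always be recomputed.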
To improve file read performance, the File Region applies the following optimizations:
(1) A cache policy more efficient than that of the local file system. SepStore maintains a block-level data cache on top of the file system and implements an LRU (Least Recently Used) replacement policy.
(2) Load balancing of data across directories. Most file systems use a B+ tree or one of its variants to maintain the data index. These index structures are well balanced, but they do not handle a very large number of files in a single directory well. At the same time, a directory hierarchy that is too deep increases the metadata burden on the Chunkserver. The goal is therefore to distribute files as evenly as possible while keeping the number of directory levels as small as possible. SepStore uses a simple strategy to achieve this. Under the root directory, SepStore creates 256 subdirectories by default, and the subdirectory in which a chunk's file is placed is determined by chunkid % 256.
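The two optimizations can be sketched together. The `chunkid % 256` rule comes from the text; the two-hex-digit directory names, the `BlockCache` class, and its capacity counted in blocks are assumptions for illustration.

```python
from collections import OrderedDict

NUM_SUBDIRS = 256  # default number of subdirectories under the root


def subdir_for(chunkid):
    """Pick the subdirectory by chunkid % 256 (hex naming is an assumption)."""
    return f"{chunkid % NUM_SUBDIRS:02X}"


class BlockCache:
    """Block-level cache with least-recently-used eviction (hypothetical sketch)."""

    def __init__(self, capacity):
        self.capacity = capacity          # maximum number of cached blocks
        self.blocks = OrderedDict()       # oldest entry first

    def get(self, key):
        if key not in self.blocks:
            return None                   # cache miss
        self.blocks.move_to_end(key)      # mark as most recently used
        return self.blocks[key]

    def put(self, key, data):
        self.blocks[key] = data
        self.blocks.move_to_end(key)
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict the least recently used
```

Because consecutive chunkid values fall into consecutive subdirectories, files spread evenly across the 256 directories with only one extra directory level.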
In one embodiment of the present invention, the method further comprises: generating a corresponding check code from the data blocks in each sub-data file, in order to maintain the sub-data files. In other words, one difference between the File Region and the KV Region is that the File Region maintains a check code for each chunk, with one byte of check data for every 4 KB.
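The text gives the granularity of the check data (one byte per 4 KB) but not the algorithm that produces it. The sketch below uses a simple sum of bytes modulo 256 purely as a stand-in; both the algorithm and the function names are assumptions.

```python
CHECK_UNIT = 4 * 1024  # one check byte per 4 KB of chunk data


def check_codes(chunk_data):
    """Return one check byte per 4 KB unit.

    The check function (sum of bytes mod 256) is an assumption; the source
    does not name the algorithm.
    """
    codes = bytearray()
    for start in range(0, len(chunk_data), CHECK_UNIT):
        unit = chunk_data[start:start + CHECK_UNIT]
        codes.append(sum(unit) % 256)
    return bytes(codes)


def verify(chunk_data, codes):
    """True if the stored check bytes match the data."""
    return check_codes(chunk_data) == codes
```

At this granularity a full 64 MB chunk carries 16 KB of check data, and a failed comparison points to the specific 4 KB unit that changed.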
Further, in one embodiment of the present invention, referring to Fig. 6, by combining KV storage with LSM technology, the embodiment designs and implements a Chunkserver that performs well for both large and small files. In SepStore, large files are stored as local files to take full advantage of the local file system's strength in sequential writes, while small files are stored in a KV database that uses LSM to improve read/write performance. Because the write path is the more complex one, it is used here to illustrate the read/write path of the whole system. After the improvement, the write path of the whole system comprises the following steps:
S1, the client initiates a write request and sends the relevant parameters (file name, offset, and so on) to the metadata management server 20.
S2, if the file does not exist, this is a create operation. The metadata management server 20 assigns the file a fileid (SepStore assigns this value from a globally incrementing 64-bit integer) and a chunkid, decides where to place the data by considering factors such as the number and usage of the existing Chunkservers, and finally returns the relevant information to the client. Otherwise, the Meta Server (Master) looks up the Chunkserver by chunkid and returns its IP address and port directly to the client.
S3, the client sends the data to the Chunkserver. If the file already exists, a task is generated directly and put into the work pool. Otherwise, this is a chunk creation, and the Chunkserver must decide where to place the file: in the KV Region or directly in the file system. After deciding the placement, the Chunkserver allocates the corresponding metadata for the file.
S4, the Chunkserver maintains a thread pool to complete the work in the work pool. The work pool holds two kinds of work: write operations for the File Region and write operations for the KV Region.
S5, the Chunkserver returns the result to the client. Depending on the result, the client either retries or notifies the metadata management server 20 to update the relevant metadata.
It should be noted that reading process and to write process closely similar, distinguishes and be that step S3 can directly search data bit
It sets, and completes reading data in step S4 and be not described in detail herein to reduce redundancy.
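Steps S1 and S2 can be sketched with an in-memory stand-in for the metadata management server. The class `Master`, its method names, the separate counter for chunk IDs, and the "least-used server" placement policy are all assumptions for illustration; the text only says that the fileid comes from a globally incrementing 64-bit integer and that placement considers the number and usage of the Chunkservers.

```python
import itertools


class Master:
    """Hypothetical in-memory stand-in for the metadata management server."""

    def __init__(self, chunkservers):
        self.chunkservers = chunkservers         # address -> bytes in use
        self.files = {}                          # filename -> (fileid, chunkid, address)
        self._fileids = itertools.count(1)       # globally incrementing integer
        self._chunkids = itertools.count(1)      # assumed: a second counter for chunks

    def locate_for_write(self, filename):
        """Return (chunkid, address, created) for a write request (S1/S2)."""
        if filename in self.files:
            # Existing file: return the known chunk location directly.
            _, chunkid, address = self.files[filename]
            return chunkid, address, False
        # New file: allocate identifiers and choose a placement.
        fileid = next(self._fileids)
        chunkid = next(self._chunkids)
        # Assumed policy: pick the Chunkserver with the least space in use.
        address = min(self.chunkservers, key=self.chunkservers.get)
        self.files[filename] = (fileid, chunkid, address)
        return chunkid, address, True
```

The client would then send the data to the returned address, which corresponds to step S3.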
Current distributed file systems mostly rely directly on the local file system to provide disk management functions. However, in a distributed file system, the data access path on a data server carries many unnecessary overheads, such as the cost of locating the storage address of a data block. The embodiment of the present invention combines a key-value database, LSM technology and the local file system, distinguishes files by size, and optimizes in particular the reading and writing of small files to reduce global random reads and writes, thereby improving overall performance. Small files are stored in a key-value database that uses LSM technology, while large files are cut into fixed-size data blocks and stored on the local file system as ordinary files. A distributed file system prototype, SepStore, was implemented, and the effectiveness of the optimization scheme was verified by experiment. The experimental results show that SepStore can improve the write speed for small files by 210%. Under a mixed load of large and small files, overall throughput can improve by 78%, and overall IOPS (Input/Output Operations Per Second, the number of read/write (I/O) operations per second) can improve by 37%.
In the data storage method of the distributed file system proposed according to the embodiments of the present invention, data files are distinguished by size. If the size of a data file is less than a certain value, the data file is stored in the key-value database of the cloud server by the LSM-Tree-based KV storage method; if the size of the data file is greater than that value, the data file is cut into multiple sub-data files and stored in the local file system. The reading and writing of small files in particular is optimized, which improves the efficiency of the distributed file system, optimizes its data access efficiency, and improves overall performance.
Next, the data storage system of the distributed file system proposed according to the embodiments of the present invention is described with reference to the drawings. Referring to Fig. 7, the system includes: a receiving module 100, a judgment module 200, a first storage module 300 and a second storage module 400.
The receiving module 100 is configured to receive a data file sent by a user. The judgment module 200 is configured to judge the size of the data file. If the size of the data file is less than a preset value, the first storage module 300 stores the data file in the key-value database of the cloud server by the LSM-Tree-based KV storage method. If the size of the data file is greater than the preset value, the second storage module 400 cuts the data file into multiple sub-data files and stores them in the local file system. By combining KV storage technology based on the LSM (Log-Structured Merge) Tree with the local file system, the storage system 1000 of the embodiment of the present invention improves the data access efficiency of the distributed file system.
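The division of labour among the four modules can be sketched as follows. This is only an illustration: the class `StorageSystem`, the plain dictionaries standing in for the LSM-Tree key-value database and the local file system, and the `subfile_size` parameter are all hypothetical. The text does not say what happens when the size equals the preset value; the sketch treats that case as a large file.

```python
class StorageSystem:
    """Hypothetical sketch of modules 100-400; dicts stand in for real storage."""

    def __init__(self, preset: int, subfile_size: int):
        self.preset = preset              # size threshold in bytes
        self.subfile_size = subfile_size  # size of each sub-data file (assumed parameter)
        self.kv = {}                      # stand-in for the LSM-Tree based KV database
        self.files = {}                   # stand-in for the local file system

    def receive(self, name: str, data: bytes) -> None:
        """Receiving module: accept a file and route it by size."""
        if self.is_small(data):
            self.store_small(name, data)
        else:
            self.store_large(name, data)

    def is_small(self, data: bytes) -> bool:
        """Judgment module: compare the file size with the preset value."""
        return len(data) < self.preset    # size == preset is treated as large (assumption)

    def store_small(self, name: str, data: bytes) -> None:
        """First storage module: store the whole file as one KV entry."""
        self.kv[name] = data

    def store_large(self, name: str, data: bytes) -> None:
        """Second storage module: cut the file into sub-data files."""
        for start in range(0, len(data), self.subfile_size):
            index = start // self.subfile_size
            self.files[(name, index)] = data[start:start + self.subfile_size]
```

For example, with `preset=8` and `subfile_size=4`, a 3-byte file lands in `kv` and a 10-byte file becomes three entries in `files`.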
Preferably, in one embodiment of the invention, the preset value may be 64MB. It should be noted that this value is given only for exemplary purposes, and the default size of the preset value is not limited to it.
First, referring to Fig. 2, the embodiment of the present invention retains the overall architecture of MooseFS and most of the code of the metadata server (known as the Master in MooseFS, or the Metadata Server) and of the client, and implements a new data server. The structure of MooseFS is similar to that of GFS: it consists of multiple clients 10 (Clients), a metadata management server 20 (Master) and multiple data servers 30 (Chunkserver), and the function of each part is also similar to GFS.
Specifically, in an embodiment of the present invention, the function of each part is as follows:
Each client among the multiple clients 10 is responsible for virtualizing the resources of the whole distributed file system and uses FUSE to expose a POSIX interface. After mounting through the client, MooseFS can be used in the same way as a local disk.
The metadata management server 20 is responsible for managing metadata information, deciding the data placement policy, and maintaining the operation of the whole system. If the Master goes down, the whole service stops. To protect the reliability of the metadata, MooseFS also provides a Metadatalogger (metadata log server) that backs up metadata operations in real time.
Each Chunkserver among the multiple data servers 30 is responsible for storing data on disk. The Chunkserver is implemented as a user-space process, and the underlying storage relies on the local file system.
Further, in one embodiment of the invention, referring to Fig. 8, the storage system 1000 further includes a Key generation module 500. The Key generation module 500 is configured to generate the Key of the data file according to the data block ID, data block version and data block sequence number of the data file.
Further, in one embodiment of the invention, referring to Fig. 8, the storage system 1000 further includes a file name generation module 600. The file name generation module 600 is configured to generate the file names of the local file system corresponding to the multiple sub-data files according to the data block ID and data block version of each sub-data file.
Specifically, in one embodiment of the invention, the data read/write flow and storage scheme of MooseFS are also similar to those of GFS. For a large file, MooseFS cuts the file into blocks of 64MB (the size is fixed at compile time and defaults to 64MB), producing multiple chunks (data blocks, equivalent to multiple sub-data files) that are stored on different Chunkservers. A small file is treated by MooseFS as a single chunk stored on a Chunkserver. Each chunk corresponds to one actual file in the file system on the Chunkserver. The metadata management server 20 maintains the mapping from file name to chunk list, and a chunk is identified by a 64-bit chunkID (data block ID). When reading or writing data, the client first asks the metadata management server 20 for the chunkid of the chunk corresponding to the file and for the IP and port number of the Chunkserver that holds the chunk. With this information, the client issues the read or write request to the specified Chunkserver. For a read operation, the client completes the data read from the specified Chunkserver. For a write operation, the client passes the data to be written to one Chunkserver, which forwards the data to the data servers holding the other replicas while completing the write to its local disk. After the client receives the result of the write operation from the Chunkserver, it must confirm with the Master either the change of metadata information or the failure of the write operation.
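The fixed-size cutting described above can be sketched as a generator of chunk boundaries. The function name `chunk_ranges` and its tuple layout are illustrative assumptions; only the 64MB default comes from the text.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB default stated in the text


def chunk_ranges(file_size: int, chunk_size: int = CHUNK_SIZE):
    """Yield (index, offset, length) for each chunk of a file.

    A file no larger than chunk_size yields a single chunk; an empty
    file yields nothing in this sketch.
    """
    for index, offset in enumerate(range(0, file_size, chunk_size)):
        yield index, offset, min(chunk_size, file_size - offset)
```

Only the last chunk of a file can be shorter than 64MB, which is why the Chunkserver sees both full-size and smaller chunks.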
Further, in one embodiment of the invention, referring to Fig. 3, the embodiment of the present invention provides different storage schemes for data files of different sizes, and for small files provides a scheme that combines LSM technology with a key-value database, thereby improving overall efficiency. Specifically, in SepStore, a large file is cut at a fixed size, for example 64MB, and its chunks are placed on different servers, while a small file is treated as a single chunk. As a result, files of three sizes exist on a Chunkserver: files smaller than the preset size T, which are regarded as small files; files between T and 64MB; and files of exactly 64MB. The choice of T is a key issue. Based on experiments, the embodiment of the present invention chooses 64KB as T, but in practice T should be chosen according to the application scenario. In SepStore, the Chunkserver provides one targeted storage scheme for the first kind of file and another for the second and third kinds. By combining KV storage technology based on the LSM (Log-Structured Merge) Tree with the local file system, the embodiment of the present invention optimizes the data access efficiency of the distributed file system.
Specifically, the Chunkserver structure of SepStore is shown in Fig. 3. The Chunkserver of SepStore is divided into three parts. The first is the File Region, which stores the larger files, those larger than T and up to 64MB (T is the boundary value that decides whether a file goes to the File Region or the KV Region; T defaults to 64KB in SepStore, and unless otherwise specified T is 64KB in what follows). The second is the KV Region, which mainly stores small files, that is, files smaller than T. The last is the Metadata Region, which stores metadata such as the chunkid and version (data block version) of the data on the server. The embodiment of the present invention puts files larger than T into the File Region and files smaller than T into the KV Region, but this division is not absolute. For example, a small file in the KV Region may grow larger than T after a write operation. When this happens, the data is not migrated immediately; it is temporarily split into multiple KV pairs, and the actual migration is completed asynchronously by a background thread.
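The three file sizes and the choice of region can be expressed as a small sketch. The function names and the string labels are hypothetical, and the handling of a file whose size is exactly T is an assumption, since the text only covers "smaller than T" and "larger than T".

```python
T = 64 * 1024                  # default boundary T (64 KB) stated in the text
CHUNK = 64 * 1024 * 1024       # maximum chunk size (64 MB)


def size_class(size: int, t: int = T, chunk: int = CHUNK) -> str:
    """Classify a chunk as 'small', 'medium' or 'full'."""
    if size < t:
        return "small"
    if size < chunk:
        return "medium"        # size == T falls here (assumption)
    return "full"


def region_for(size: int) -> str:
    """Pick the initial placement region for a newly created chunk."""
    return "KV Region" if size_class(size) == "small" else "File Region"
```

This covers only the initial placement; a small file that later grows past T would stay in the KV Region until the background migration described above moves it.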
Further, the Chunkserver exposes a unified access interface that mainly includes read, write and delete. Specifically, after receiving a client request, the Chunkserver generates a task and puts it into the task pool. The task queue is a first-in, first-out queue, and there is no priority difference between tasks. The Chunkserver maintains a thread pool to process the tasks in the task pool, and read and write tasks are all completed asynchronously. At runtime, all metadata is loaded into memory and organized into one large hash table keyed by chunkid. Besides responding to data requests from clients, the Chunkserver also periodically reports information such as local disk usage and error conditions to the Master.
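A minimal sketch of the FIFO task pool, worker threads and in-memory metadata table might look like this. The class layout, the `submit` method, the per-task result queue and the single lock are illustrative assumptions, and a dictionary stands in for the on-disk regions.

```python
import queue
import threading


class Chunkserver:
    """Hypothetical sketch: FIFO task pool served by a pool of worker threads."""

    def __init__(self, workers: int = 4):
        self.tasks = queue.Queue()      # first-in, first-out, no priorities
        self.metadata = {}              # in-memory hash table keyed by chunkid
        self.store = {}                 # stand-in for the File/KV regions
        self.lock = threading.Lock()
        for _ in range(workers):
            threading.Thread(target=self._worker, daemon=True).start()

    def submit(self, op: str, chunkid: int, data: bytes = None) -> queue.Queue:
        """Enqueue a read/write/delete task; the result arrives on the returned queue."""
        done = queue.Queue(maxsize=1)
        self.tasks.put((op, chunkid, data, done))
        return done

    def _worker(self):
        while True:
            op, chunkid, data, done = self.tasks.get()
            with self.lock:
                if op == "write":
                    self.store[chunkid] = data
                    self.metadata[chunkid] = {"size": len(data)}
                    result = True
                elif op == "read":
                    result = self.store.get(chunkid)
                elif op == "delete":
                    self.metadata.pop(chunkid, None)
                    result = self.store.pop(chunkid, None) is not None
                else:
                    result = None       # unknown operation
            done.put(result)
            self.tasks.task_done()
```

A caller that needs the outcome blocks on the returned queue, for example `cs.submit("read", 7).get()`, while other requests continue to be served by the remaining workers.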
Further, in one embodiment of the invention, referring to Fig. 4, unlike the local file system, the KV Region provides a flat namespace. Each key-value pair has a unique Key, and data blocks are located by Key. In the design of the KV Region, the embodiment of the present invention does not store the Key values but instead uses a simple algorithm to generate them. Specifically, as shown in the figure, the Key of each chunk is composed of chunkid (data block ID), version (data block version) and blocknum (data block sequence number). Both chunkid and version are passed in directly by the client, while blocknum identifies the position of the current key-value pair within the whole chunk. The size of each block is fixed, for example 64KB, so blocknum can be obtained from the formula blocknum = offset % 64KB.
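One possible way to build such a Key is sketched below. The byte layout (big-endian, 8 bytes for chunkid, 4 each for version and blocknum) and the function names are assumptions; the text only lists the three components. Note also that the sketch derives the block index by integer division of the offset by the block size, which is an interpretation: the formula as written in the text uses `%`, which would give the position inside a block rather than the index of the block.

```python
import struct

BLOCK_SIZE = 64 * 1024  # fixed block size given as an example in the text

_KEY_FORMAT = ">QII"    # assumed layout: chunkid (8 bytes), version (4), blocknum (4)


def make_key(chunkid: int, version: int, blocknum: int) -> bytes:
    """Pack the three components into a fixed-width, byte-sortable key."""
    # Big-endian fields keep all blocks of one chunk adjacent and in order.
    return struct.pack(_KEY_FORMAT, chunkid, version, blocknum)


def parse_key(key: bytes):
    """Recover (chunkid, version, blocknum) from a key."""
    return struct.unpack(_KEY_FORMAT, key)


def block_number(offset: int) -> int:
    """Index of the block containing this byte offset (assumed: integer division)."""
    return offset // BLOCK_SIZE
```

Because the Key is computed from values the client already supplies, no separate key index has to be stored.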
Further, in one embodiment of the invention, referring to Fig. 5, in a distributed environment, implementing data reads and writes on top of the local file system requires SepStore to maintain some additional metadata that maps a chunkid to a system file. As in the design of the KV Region, in the File Region the embodiment of the present invention does not directly store the mapping from chunkid to local file, but uses a simple algorithm to implement data addressing. Specifically, the name of the local file corresponding to each chunk is composed of chunkid and version, with the structure shown in the figure. Embedding the metadata in the file name in this way has two benefits:
It enables fast location of the data from the chunkid.
SepStore starts a background thread that periodically scans all files in the File Region and uses the information contained in the file names to verify data consistency.
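A sketch of such a naming scheme, together with the consistency check it enables, is given below. The concrete format `chunk_<chunkid>_<version>.dat` and the function names are hypothetical; the text only states that the file name is built from chunkid and version.

```python
def chunk_filename(chunkid: int, version: int) -> str:
    """Build the local file name for a chunk (format is an assumption)."""
    return f"chunk_{chunkid:016X}_{version:08X}.dat"


def parse_chunk_filename(name: str):
    """Recover (chunkid, version) from a name produced by chunk_filename."""
    stem = name[len("chunk_"):-len(".dat")]
    chunkid_hex, version_hex = stem.split("_")
    return int(chunkid_hex, 16), int(version_hex, 16)


def is_consistent(name: str, metadata: dict) -> bool:
    """Check a file name against in-memory metadata (chunkid -> version)."""
    chunkid, version = parse_chunk_filename(name)
    return metadata.get(chunkid) == version
```

A background scanner could call `is_consistent` on every file name in the File Region and flag any file whose name disagrees with the metadata table.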
To improve file read performance, the File Region adopts the following optimization measures:
(1) A caching policy more efficient than that of the local file system. SepStore maintains a block-level data cache above the file system and implements an LRU (Least Recently Used) replacement policy.
(2) Data load balancing across directories. Most file systems maintain their data index with a B+ tree or a variant of it. These index structures are well balanced, but they do not handle too many files in the same directory well. At the same time, an excessively deep directory hierarchy increases the metadata burden on the chunkserver. It is therefore desirable to distribute files as evenly as possible while keeping the number of directory levels as small as possible. SepStore uses a simple policy to achieve a balanced distribution of files: under the root directory, SepStore creates 256 subdirectories by default, and the subdirectory that holds a chunk's file is determined by chunkid % 256.
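The LRU policy named in measure (1) can be sketched with an ordered dictionary. The class `BlockCache`, its capacity expressed in number of blocks, and the choice of cache key are assumptions for illustration; the text does not describe how the cache is implemented.

```python
from collections import OrderedDict


class BlockCache:
    """Hypothetical block cache with least-recently-used eviction."""

    def __init__(self, capacity: int):
        self.capacity = capacity          # maximum number of cached blocks
        self._blocks = OrderedDict()      # oldest entry first

    def get(self, key):
        """Return the cached block, or None on a miss."""
        if key not in self._blocks:
            return None
        self._blocks.move_to_end(key)     # mark as most recently used
        return self._blocks[key]

    def put(self, key, data: bytes) -> None:
        """Insert or refresh a block, evicting the least recently used one if full."""
        self._blocks[key] = data
        self._blocks.move_to_end(key)
        if len(self._blocks) > self.capacity:
            self._blocks.popitem(last=False)
```

A natural cache key would be the pair (chunkid, block index), so that repeated reads of a hot block avoid going through the local file system.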
In one embodiment of the invention, referring to Fig. 8, the storage system 1000 further includes a check code generation module 700. The check code generation module 700 is configured to generate a corresponding check code from the data blocks in each sub-data file, so as to maintain the sub-data file. In other words, one difference between the File Region and the KV Region is that the File Region maintains a check code for each chunk, with one byte of check data for every 4KB.
Further, in one embodiment of the invention, referring to Fig. 6, by combining KV storage with LSM technology, the embodiment of the present invention designs and implements a Chunkserver that performs well for both large and small files. In SepStore, large files are stored as local files to take full advantage of the local file system's strength in sequential writes, while small files are stored in a KV database that uses LSM to improve read/write performance. Because the write flow is the more complex one, it is used here to illustrate the read/write flow of the whole system. The improved write flow comprises the following steps:
S1: the client initiates a write request and sends the relevant parameters (file name, offset, etc.) to the metadata management server 20.
S2: if the file does not exist, this is a create operation. The metadata management server 20 allocates a fileid (SepStore uses a globally incrementing 64-bit integer to assign this value) and a chunkid to the file, decides the data placement by jointly considering factors such as the number and usage of the existing Chunkservers, and finally returns the relevant information to the client. Otherwise, the metadata server (Master) looks up the IP and port of the chunk server according to the chunkid and returns them directly to the client.
S3: the client sends the data to the Chunkserver. If the file already exists, a task is generated directly and placed in the work pool. Otherwise, this is a new-chunk operation, and the Chunkserver must decide where to place the file: in the KV Region or directly in the file system. After the placement is decided, the Chunkserver allocates the corresponding metadata information for the file.
S4: the Chunkserver maintains a thread pool to complete the work in the work pool. The work pool holds two kinds of work: write operations for the File Region and write operations for the KV Region.
S5: the Chunkserver returns the processing result to the client. Based on the result, the client decides whether to retry or to notify the metadata management server 20 to modify the associated metadata.
It should be noted that the read flow is very similar to the write flow. The difference is that step S3 can look up the data location directly, and the data read is completed in step S4. To reduce redundancy, it is not described in detail here.
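The client-side decision in step S5 can be sketched as a small retry loop. The function `write_with_retry`, its callback parameters and the fixed attempt limit are hypothetical; the text only says that the client either retries or notifies the metadata management server according to the result.

```python
def write_with_retry(send, notify_master, max_attempts: int = 3):
    """Retry a write and report the final outcome to the metadata server.

    send() is a hypothetical callable that performs one write attempt and
    returns True on success; notify_master(ok) is a hypothetical callable
    that tells the metadata server whether the write finally succeeded.
    Returns the number of attempts used, or None if every attempt failed.
    """
    for attempt in range(1, max_attempts + 1):
        if send():
            notify_master(True)    # metadata can now be updated
            return attempt
    notify_master(False)           # report the failed write (assumed behaviour)
    return None
```

How many retries are appropriate, and whether a failed write should be reported at all, are policy choices that the text leaves open.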
Current distributed file systems mostly rely directly on the local file system to provide disk management functions. However, in a distributed file system, the data access path on a data server carries many unnecessary overheads, such as the cost of locating the storage address of a data block. The embodiment of the present invention combines a key-value database, LSM technology and the local file system, distinguishes files by size, and optimizes in particular the reading and writing of small files to reduce global random reads and writes, thereby improving overall performance. Small files are stored in a key-value database that uses LSM technology, while large files are cut into fixed-size data blocks and stored on the local file system as ordinary files. A distributed file system prototype, SepStore, was implemented, and the effectiveness of the optimization scheme was verified by experiment. The experimental results show that SepStore can improve the write speed for small files by 210%. Under a mixed load of large and small files, overall throughput can improve by 78% and overall IOPS can improve by 37%.
In the data storage system of the distributed file system proposed according to the embodiments of the present invention, data files are distinguished by size. If the size of a data file is less than a certain value, the data file is stored in the key-value database of the cloud server by the LSM-Tree-based KV storage method; if the size of the data file is greater than that value, the data file is cut into multiple sub-data files and stored in the local file system. The reading and writing of small files in particular is optimized, which improves the efficiency of the distributed file system, optimizes its data access efficiency, and improves overall performance.
Any process or method description in a flow chart or otherwise described herein may be understood as representing a module, segment or portion of code that includes one or more executable instructions for implementing specific logical functions or steps of the process. The scope of the preferred embodiments of the present invention includes other implementations in which functions may be performed out of the order shown or discussed, including in a substantially simultaneous manner or in the reverse order depending on the functions involved, as should be understood by those skilled in the art to which the embodiments of the present invention belong.
The logic and/or steps represented in a flow chart or otherwise described herein, for example an ordered list of executable instructions for implementing logical functions, may be embodied in any computer-readable medium for use by, or in connection with, an instruction execution system, apparatus or device (such as a computer-based system, a system including a processor, or another system that can fetch instructions from the instruction execution system, apparatus or device and execute them). For the purposes of this specification, a "computer-readable medium" may be any means that can contain, store, communicate, propagate or transport a program for use by, or in connection with, an instruction execution system, apparatus or device. More specific examples (a non-exhaustive list) of computer-readable media include: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). In addition, the computer-readable medium may even be paper or another suitable medium on which the program is printed, because the program can be obtained electronically, for example by optically scanning the paper or other medium and then editing, interpreting or otherwise processing it in a suitable manner if necessary, and can then be stored in a computer memory.
It should be understood that each part of the present invention may be implemented in hardware, software, firmware or a combination thereof. In the above embodiments, multiple steps or methods may be implemented by software or firmware that is stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or a combination of the following technologies well known in the art may be used: a discrete logic circuit having logic gates for implementing logical functions on data signals, an application-specific integrated circuit having suitable combinational logic gates, a programmable gate array (PGA), a field-programmable gate array (FPGA), and so on.
Those of ordinary skill in the art will understand that all or part of the steps carried by the above embodiment methods may be completed by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, includes one of the steps of the method embodiments or a combination thereof.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing module, or each unit may exist physically on its own, or two or more units may be integrated into one module. The integrated module may be implemented in the form of hardware or in the form of a software functional module. If the integrated module is implemented as a software functional module and sold or used as an independent product, it may also be stored in a computer-readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disc, or the like.
In the description of this specification, descriptions that refer to the terms "one embodiment", "some embodiments", "example", "specific example" or "some examples" mean that a specific feature, structure, material or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic uses of the above terms do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
Although embodiments of the present invention have been shown and described above, it is to be understood that the above embodiments are exemplary and are not to be construed as limiting the present invention. Those of ordinary skill in the art may make changes, modifications, substitutions and variations to the above embodiments within the scope of the present invention without departing from its principles and purpose.
Claims (6)
1. A data storage method for a distributed file system, characterized by comprising the following steps:
receiving a data file sent by a user;
judging the size of the data file;
if the size of the data file is less than a preset value, storing the data file in a key-value database of a cloud server by a KV storage method based on a log-structured merge tree (LSM-Tree), and, after the data file is stored in the key-value database of the cloud server by the LSM-Tree-based KV storage method, further: generating the Key of the data file according to the data block ID, data block version and data block sequence number of the data file, wherein the data block ID and the data block version are obtained directly from the client, and the data block sequence number is the position of the Key of the data file within the whole data block; and
if the size of the data file is greater than the preset value, cutting the data file into multiple sub-data files and storing them in a local file system, and, after the data file is cut into multiple sub-data files and stored in the local file system, further: generating the file names of the local file system corresponding to the multiple sub-data files according to the data block ID and data block version of each sub-data file, so as to realize the addressing function of the data.
2. The data storage method for a distributed file system according to claim 1, characterized by further comprising: generating a corresponding check code according to the data blocks in each sub-data file, so as to maintain the sub-data file.
3. The data storage method for a distributed file system according to claim 1, characterized in that the preset value is 64MB.
4. A data storage system for a distributed file system, characterized by comprising:
a receiving module, configured to receive a data file sent by a user;
a judgment module, configured to judge the size of the data file;
a first storage module, configured to store the data file in a key-value database of a cloud server by an LSM-Tree-based KV storage method if the size of the data file is less than a preset value;
a Key generation module, configured to generate the Key of the data file according to the data block ID, data block version and data block sequence number of the data file, wherein the data block ID and the data block version are obtained directly from the client, and the data block sequence number is the position of the Key of the data file within the whole data block; and
a second storage module, configured to cut the data file into multiple sub-data files and store them in a local file system if the size of the data file is greater than the preset value;
a file name generation module, configured to generate the file names of the local file system corresponding to the multiple sub-data files according to the data block ID and data block version of each sub-data file, so as to realize the addressing function of the data.
5. The data storage system for a distributed file system according to claim 4, characterized by further comprising:
a check code generation module, configured to generate a corresponding check code according to the data blocks in each sub-data file, so as to maintain the sub-data file.
6. The data storage system for a distributed file system according to claim 4, characterized in that the preset value is 64MB.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410645370.0A CN104408091B (en) | 2014-11-11 | 2014-11-11 | The date storage method and system of distributed file system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410645370.0A CN104408091B (en) | 2014-11-11 | 2014-11-11 | The date storage method and system of distributed file system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104408091A CN104408091A (en) | 2015-03-11 |
CN104408091B true CN104408091B (en) | 2019-03-01 |
Family
ID=52645722
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410645370.0A Active CN104408091B (en) | 2014-11-11 | 2014-11-11 | The date storage method and system of distributed file system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104408091B (en) |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101916289A (en) * | 2010-08-20 | 2010-12-15 | 浙江大学 | Method for establishing digital library storage system supporting mass small files and dynamic backup number |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7024427B2 (en) * | 2001-12-19 | 2006-04-04 | Emc Corporation | Virtual file system |
- 2014-11-11: application CN201410645370.0A (CN), patent CN104408091B, status: Active
Non-Patent Citations (2)
Title |
---|
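Research and Design of the SDFS Distributed File System; Luo Xiongwei; China Master's Theses Full-text Database, Information Science and Technology; 2013-12-15 (No. 82); pp. 7, 38, 40, 45, 47 |
Research on a Cloud Storage Model Based on Massive Information Processing; Zhang Guigang et al.; Journal of Computer Research and Development; 2012-12-31; pp. 32-36 |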
Also Published As
Publication number | Publication date |
---|---|
CN104408091A (en) | 2015-03-11 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN104408091B (en) | The date storage method and system of distributed file system | |
US10198356B2 (en) | Distributed cache nodes to send redo log records and receive acknowledgments to satisfy a write quorum requirement | |
US10977124B2 (en) | Distributed storage system, data storage method, and software program | |
US9507800B2 (en) | Data management in distributed file systems | |
US10474631B2 (en) | Method and apparatus for content derived data placement in memory | |
KR100490723B1 (en) | Apparatus and method for file-level striping | |
US10216757B1 (en) | Managing deletion of replicas of files | |
US9400792B1 (en) | File system inline fine grained tiering | |
US8312242B2 (en) | Tracking memory space in a storage system | |
US8996490B1 (en) | Managing logical views of directories | |
US20230016822A1 (en) | Creating Batches Of Training Data For Machine Learning Workflows | |
CN101137981A (en) | Methods and apparatus for managing the storage of content in a file system | |
US9383936B1 (en) | Percent quotas for deduplication storage appliance | |
JP2015521310A (en) | Efficient data object storage and retrieval | |
CN101567003A (en) | Method for managing and allocating resource in parallel file system | |
CN103635900A (en) | Time-based data partitioning | |
US20140181455A1 (en) | Category based space allocation for multiple storage devices | |
US20200042399A1 (en) | Method, apparatus and computer program product for managing data storage | |
US11199990B2 (en) | Data reduction reporting in storage systems | |
US9916102B1 (en) | Managing data storage reservations on a per-family basis | |
US10481820B1 (en) | Managing data in storage systems | |
CN109522283A (en) | A kind of data de-duplication method and system | |
CN108733306A (en) | A kind of Piece file mergence method and device | |
CN103412929A (en) | Mass data storage method | |
US10409687B1 (en) | Managing backing up of file systems |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||