WO2018233331A1

WO2018233331A1 - File storage method and system and computer storage medium

Info

Publication number: WO2018233331A1
Application number: PCT/CN2018/079683
Authority: WO
Inventors: 江汛洋; 葛利亚; 王静; 李道兵; 许式伟
Original assignee: 上海七牛信息技术有限公司
Priority date: 2017-06-22
Filing date: 2018-03-20
Publication date: 2018-12-27
Also published as: CN107229427B; CN107229427A

Abstract

A file storage method and system, and a computer storage medium. The method comprises: in a file system layer, dividing a file into multiple data blocks and performing out-of-order write operations on the data blocks (S110); synchronizing the data blocks into a queue for the object storage, adding a task to the queue whenever the data of a data block changes, and periodically executing the tasks in the queue according to a first preset period (S120); in an object storage layer, splicing, according to an operation instruction, the multiple data blocks in a preset order into a file (S130); and setting a cyclic recycling task in the file system layer, the cyclic recycling task comprising: according to a preset policy, reclaiming data blocks in the file system that have been synchronized to the object storage and satisfy a preset condition, deleting those data blocks, and marking their addresses as object storage (S140). This tiered storage method, which combines the file system with the object storage, supports out-of-order reading and writing while retaining the advantages of object storage: low cost, easy distribution of concurrent access, and support for mass storage.

Description

File storage method, system and computer storage medium

Technical field

The present invention relates to the field of storage technologies, and more particularly to a file storage method, system, and computer storage medium.

Background

Object storage is low in cost, makes concurrent access easy to distribute, and supports mass storage, but it does not support random writes. At present, much software still needs out-of-order reads and writes from the storage system, which a file system can support.

Summary of the invention

The technical problem to be solved by the present invention is to provide a file storage method, system, and computer storage medium that support out-of-order reads and writes while retaining the advantages of object storage.

The object of the present invention is achieved by the following technical solutions:

A file storage method comprising:

At the file system layer, dividing a file into a plurality of data blocks and performing out-of-order write operations on the data blocks;

Synchronizing the data blocks to an object storage queue, adding a task to the object storage queue if the data of a data block changes, and cyclically executing the tasks in the object storage queue according to a first preset period;

At the object storage layer, splicing the plurality of data blocks into a file in a preset order according to an operation instruction;

Setting a cyclic recycling task in the file system layer, the cyclic recycling task including: reclaiming, according to a preset policy, data blocks in the file system that have been synchronized to the object storage and satisfy a preset condition, deleting those data blocks, and marking their addresses as object storage.

Further, it also includes:

Setting a cyclic retransmission task in the file system layer;

The cyclic retransmission task includes: acquiring, according to a second preset period, data blocks in the file system that have not been synchronized to the object storage, generating a synchronization task for each such data block, and adding the synchronization task to the object storage queue.

Further, performing the out-of-order write operation on the data blocks includes:

If the data block of the file is in the file system layer, writing to it directly;

If the data block of the file is in the object store, reading the data block from the object store, storing it in the file system, and then overwriting it.

Further, the step of splicing the plurality of data blocks into the file in a preset order according to the operation instruction includes:

If a file data block is in the object store and has not expired, using it to splice the file;

If a file data block has been overwritten at the file system layer, re-uploading the file data block to the object store;

If a file data block is in the object store but has expired, reading the data block from the object store by offset, downloading it to the file system, and re-uploading it to form a data block.

Further, it also includes:

In the file system layer, when an out-of-order read operation is performed on a file, if the file is in the file system, the data blocks are read directly from the file system to form the file.

A file storage system comprising:

A processing module, configured to divide a file into a plurality of data blocks at the file system layer and to perform out-of-order write operations on the data blocks;

A synchronization module, configured to synchronize the data blocks to the object storage queue, to add a task to the object storage queue if the data of a data block changes, and to cyclically execute the tasks in the object storage queue according to the first preset period;

A splicing module, configured to splice the plurality of data blocks into a file in a preset order according to an operation instruction at the object storage layer;

A recycler, configured to set a cyclic recycling task in the file system layer, the cyclic recycling task including: reclaiming, according to a preset policy, data blocks in the file system that have been synchronized to the object store and satisfy a preset condition, deleting those data blocks, and marking their addresses as object storage.

Further, the processing module is further configured to set a cyclic retransmission task in the file system layer; the cyclic retransmission task includes: acquiring, according to the second preset period, data blocks in the file system that have not been synchronized to the object storage, generating a synchronization task for each such data block, and adding the synchronization task to the object storage queue.

Further, the processing module is further configured to write directly if the data block of the file is in the file system layer; the processing module is further configured, if the data block of the file is in the object storage, to read the data block from the object store, store it in the file system, and then overwrite it.

Further, the splicing module is further configured to use a file data block for splicing the file if the file data block is in the object storage and has not expired;

The splicing module is further configured to re-upload a file data block to the object storage if the file data block has been overwritten at the file system layer;

The splicing module is further configured, if a file data block is in the object store but has expired, to read the data block from the object storage by offset, download it to the file system, and re-upload it to form a data block.

Further, the processing module is further configured, when an out-of-order read operation is performed on a file in the file system layer, to read the data blocks directly from the file system to form the file if the file is in the file system.

A computer storage medium storing a program, the program, when executed, performing the steps of any of the methods described above.

In the present invention, at the file system layer, a file is divided into a plurality of data blocks, and the data blocks are written out of order; the data blocks are then synchronized to the object storage queue, a task is added to the object storage queue whenever the data of a data block changes, and the tasks in the object storage queue are cyclically executed according to the first preset period; at the object storage layer, the plurality of data blocks are spliced into a file in a preset order according to the operation instruction; and a cyclic recycling task is set in the file system layer. The cyclic recycling task includes: reclaiming, according to a preset policy, data blocks in the file system that have been synchronized to the object storage and satisfy the preset condition, deleting those data blocks, and marking their addresses as object storage. This tiered storage method, which combines the file system with the object storage, supports out-of-order reading and writing while retaining the advantages of object storage: low cost, easy distribution of concurrent access, and support for mass storage. With the recycling task, a large number of files can be maintained at the file system layer and transferred to the object store.

Brief description of the drawings

FIG. 1 is a flowchart of a file storage method according to an embodiment of the present invention;

FIG. 2 is a block diagram of a file storage system according to an embodiment of the present invention;

FIG. 3 is a schematic flowchart of a file system layer writing process according to an embodiment of the present invention;

FIG. 4 is a schematic diagram of a file storage system according to an embodiment of the present invention.

Detailed description

Before discussing the exemplary embodiments in more detail, it should be noted that some exemplary embodiments are described as processes or methods depicted as flowcharts. Although the flowcharts describe the operations as a sequential process, many of the operations can be performed in parallel, concurrently, or simultaneously. In addition, the order of the operations can be rearranged. A process may be terminated when its operations are completed, but it may also have additional steps not included in the figures.

It should also be noted that in some alternative implementations, the functions/acts noted may occur in a different order than that illustrated in the drawings. For example, two figures shown in succession may in fact be executed substantially concurrently or sometimes in the reverse order, depending on the function/acts involved.

The invention will now be further described with reference to the drawings and preferred embodiments.

As shown in FIG. 1, a file storage method includes steps S110-S140, as follows:

S110: At the file system layer, the file is divided into blocks to form a plurality of data blocks, and the data blocks are written out of order.

Specifically, at the file system layer, the file is split into a plurality of data blocks, and the size of each data block may be set by the system or specified by the user. The file system layer supports out-of-order write operations on a file: a single write operation on the file is split into write operations on multiple data blocks.
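The splitting of a single file write into per-block writes can be illustrated with a minimal sketch. The function name `split_write`, the tuple layout, and the tiny block size are assumptions made for this illustration only; the source does not specify an implementation.

```python
from typing import Iterator, Tuple

BLOCK_SIZE = 4  # hypothetical block size, kept tiny so the example is easy to follow


def split_write(offset: int, data: bytes,
                block_size: int = BLOCK_SIZE) -> Iterator[Tuple[int, int, bytes]]:
    """Yield (block_index, offset_within_block, chunk) for one file-level write."""
    pos = 0
    while pos < len(data):
        abs_off = offset + pos
        block_index = abs_off // block_size
        in_block = abs_off % block_size
        # Never cross a block boundary within one per-block write.
        take = min(block_size - in_block, len(data) - pos)
        yield block_index, in_block, data[pos:pos + take]
        pos += take


# A 7-byte write at file offset 6 touches blocks 1, 2 and 3.
pieces = list(split_write(offset=6, data=b"abcdefg"))
```

Each yielded tuple can then be applied to its data block independently, which is what allows writes to arrive in any order.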

S120: Synchronize the data blocks to the object storage queue. If the data of a data block changes, add a task to the object storage queue, and cyclically execute the tasks in the queue according to the first preset period. In the object storage layer, multiple files can be spliced in order into one large file. Because files in the object storage are stored in order, reading part of the data at an offset is easy to support. The object storage layer supports uploading in chunks and combining the chunks into a single file, but it does not support out-of-order write operations. Data in the object storage layer is mainly used for archiving and distribution. The file system layer maintains a queue Q for synchronizing data blocks to the object storage. After data is written or modified, a task is added to the synchronization queue and duplicate tasks are merged; the queued tasks are executed according to the first preset period. The length of the first preset period can be set automatically by the system or set by the user.
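As a rough illustration of a synchronization queue that merges duplicate tasks, consider the sketch below. The class `SyncQueue`, its methods, and the `(file name, block index)` key are hypothetical; the periodic trigger (the first preset period) is assumed to be an external timer that calls `run_once`, and is not shown.

```python
from collections import OrderedDict
from typing import Callable, List, Tuple

BlockKey = Tuple[str, int]  # (file name, block index); an assumed task identity


class SyncQueue:
    """Queue of pending block uploads in which duplicate tasks are merged."""

    def __init__(self) -> None:
        self._pending: "OrderedDict[BlockKey, None]" = OrderedDict()

    def add(self, key: BlockKey) -> None:
        # Adding a key that is already queued changes nothing: the tasks merge.
        self._pending[key] = None

    def run_once(self, upload: Callable[[BlockKey], None]) -> int:
        """Execute every queued task once; meant to be called once per period."""
        batch = list(self._pending)
        self._pending.clear()
        for key in batch:
            upload(key)
        return len(batch)


uploaded: List[BlockKey] = []
queue = SyncQueue()
queue.add(("fileA", 0))
queue.add(("fileA", 1))
queue.add(("fileA", 0))  # block 0 modified again before the period elapsed
executed = queue.run_once(uploaded.append)
```

Merging means a block that is written many times within one period is uploaded only once, with whatever content it holds when the period elapses.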

S130: In the object storage layer, the multiple data blocks are spliced into a file in a preset order according to an operation instruction.

The operation instruction may be an instruction to splice a plurality of data blocks into a file. When the user triggers a close/sync operation on the file at the file system layer, a task is triggered to splice the data blocks into a file in the object store.

S140: Set a cyclic recycling task in the file system layer; the cyclic recycling task includes: reclaiming, according to a preset policy, data blocks in the file system that have been synchronized to the object storage and satisfy a preset condition, deleting those data blocks, and marking their addresses as object storage.

The file system layer runs a cyclic recycling task, which periodically reclaims, according to the policy, data that has already been synchronized to the object store; when such data is later read through the file system layer, it is pulled back from the object store. With the recycling task, the volume of files kept at the file system layer can be held at a reasonable level, with the remainder transferred to the object store. When the cyclic recycling task runs in the file system layer, it obtains, according to the preset policy, the files in the file system that have been synchronized to the object storage and meet the user-set conditions, deletes them, and marks their addresses as object storage. The preset policy can be a user-specified policy based on, for example, modification date and frequency of use.
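One pass of such a recycling task might look like the following sketch. The `BlockMeta` fields, the `"fs"`/`"object"` location markers, and the idle-time condition are assumptions standing in for the unspecified preset policy and preset condition.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class BlockMeta:
    location: str         # "fs" (local copy exists) or "object" (object store only)
    synced: bool          # True once the block has been uploaded
    last_modified: float  # seconds; used by the example policy below


def recycle(blocks: Dict[str, BlockMeta], local_data: Dict[str, bytes],
            now: float, max_idle: float) -> List[str]:
    """Reclaim local blocks that are synced and idle for at least max_idle."""
    reclaimed = []
    for block_id, meta in blocks.items():
        if meta.location == "fs" and meta.synced and now - meta.last_modified >= max_idle:
            local_data.pop(block_id, None)  # delete the local copy
            meta.location = "object"        # mark the address as object storage
            reclaimed.append(block_id)
    return reclaimed


blocks = {
    "a": BlockMeta("fs", synced=True, last_modified=0.0),    # old and synced
    "b": BlockMeta("fs", synced=False, last_modified=0.0),   # old but not synced
    "c": BlockMeta("fs", synced=True, last_modified=90.0),   # synced but recent
}
local_data = {"a": b"A", "b": b"B", "c": b"C"}
reclaimed = recycle(blocks, local_data, now=100.0, max_idle=50.0)
```

Only block `a` is reclaimed here: unsynced data is never deleted, so the local copy remains the only source of truth until an upload succeeds.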

In this embodiment, the tiered storage method combining the file system with the object storage supports out-of-order reading and writing while retaining the advantages of object storage: low cost, easy distribution of concurrent access, and support for mass storage. This embodiment forms a storage structure in which out-of-order writes land on the file system and reads fall mainly on the object storage. With the recycling task, a large number of files can be maintained at the file system layer and transferred to the object store.

Optionally, the file storage method further includes: setting a cyclic retransmission task in the file system layer; the cyclic retransmission task includes: acquiring, according to the second preset period, data blocks in the file system that have not been synchronized to the object storage, generating a synchronization task for each such data block, and adding the synchronization task to the object storage queue.

A cyclic retransmission task is started in the file system layer. It periodically obtains the files in the file system that have not been synchronized to the object storage, generates synchronization tasks for them, and adds the tasks to the synchronization queue Q.
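A single scan of the retransmission task can be sketched as follows. The function `retransmit_scan`, the string block identifiers, and the plain list used as the queue are illustrative assumptions; the scan is assumed to be invoked once per second preset period by an external timer.

```python
from typing import Dict, List, Set


def retransmit_scan(synced: Dict[str, bool], queue: List[str]) -> int:
    """Queue a sync task for every unsynced block that is not already queued."""
    queued: Set[str] = set(queue)
    added = 0
    for block_id, is_synced in synced.items():
        if not is_synced and block_id not in queued:
            queue.append(block_id)
            queued.add(block_id)
            added += 1
    return added


synced = {"fileA/0": True, "fileA/1": False, "fileB/0": False}
sync_queue = ["fileA/1"]  # already waiting from an earlier write
added = retransmit_scan(synced, sync_queue)
```

This acts as a safety net: a block whose upload failed or was never queued is picked up again on the next scan.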

Specifically, the out-of-order write operation on the data blocks includes:

If the data block of the file is in the file system layer, writing to it directly;

At the file system layer, during out-of-order reads and writes, if the data block is already at the file system layer, it is written directly; if it is in the object storage, the corresponding data block of the file is first read from the object storage and stored in the file system, and then overwritten.

Specifically, splicing the plurality of data blocks into a file in a preset order according to the operation instruction includes:

When the end user triggers file splicing, the number of file data blocks is determined from the current size of the file, and each data block is checked in turn to determine whether it is in the object storage and still valid. As needed, data blocks are read from disk, re-downloaded from the object storage, or read directly from memory and re-uploaded. Splicing the file data blocks involves four cases:

Case 1: The file data block is already in the object store and has not expired. It can be reused directly for splicing the file, that is, read from disk.

Case 2: The file data block has been overwritten at the file system layer and is then re-uploaded to the object storage; it is re-downloaded from the object storage. Version variables can also be used. Specifically, each file data block has two version variables. One is the file content version, which is zero when the file data block is created and is incremented on each subsequent update. The other is the upload version of the file data block: after each upload of the data block, the upload version number is set to the file content version number. In this case, the file content version and the upload version are compared; if they are inconsistent, the data block is retransmitted.

Case 3: The file data block is already in the object store but has expired. It is downloaded from the object store by offset to the file system and then re-uploaded to form a block.

Case 4: The data block has not been uploaded, so the upload is triggered. If splicing the file fails during the process, the splicing is terminated and is triggered again by the retransmission module.
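The four cases and the two version variables can be summarized in a small decision function. This is a sketch under stated assumptions: the names `BlockState`, `Action`, and `plan` are invented here, and representing "never uploaded" as an upload version of `None` is a choice made for the example rather than something the source specifies.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Action(Enum):
    REUSE = "reuse"                     # case 1
    REUPLOAD = "re-upload"              # case 2
    REFRESH = "download and re-upload"  # case 3
    UPLOAD = "upload"                   # case 4


@dataclass
class BlockState:
    content_version: int = 0              # zero at creation, +1 on each update
    upload_version: Optional[int] = None  # None means never uploaded (assumption)
    expired: bool = False

    def write(self) -> None:
        self.content_version += 1

    def mark_uploaded(self) -> None:
        # After an upload, the upload version equals the content version.
        self.upload_version = self.content_version
        self.expired = False


def plan(block: BlockState) -> Action:
    """Decide what splicing must do with one data block."""
    if block.upload_version is None:
        return Action.UPLOAD
    if block.upload_version != block.content_version:
        return Action.REUPLOAD
    if block.expired:
        return Action.REFRESH
    return Action.REUSE


b = BlockState()
history = [plan(b)]      # new block, never uploaded
b.mark_uploaded()
history.append(plan(b))  # uploaded and current
b.write()
history.append(plan(b))  # overwritten after the upload
b.mark_uploaded()
b.expired = True
history.append(plan(b))  # current but expired in the object store
```

Comparing the two version numbers is what lets the splicing step detect a block that was modified after its last upload without comparing contents.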

When a truncate operation is performed on a file, the file size in the local file system is updated, the data block at the truncation boundary is updated, and its file content version is incremented. The truncation is thus correctly reflected when file splicing is triggered in the object storage.

Optionally, the file storage method further includes: when an out-of-order read operation is performed on a file in the file system layer, if the file is in the file system, reading the data blocks directly from the file system to form the file.

In the file system layer, during an out-of-order read, if the file is in the file system, it is read directly from the file system; a user-configurable number of consecutive data blocks is read at a time to reduce the number of network requests.
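The read path can be sketched as below. It assumes that the consecutive-block read applies when a block has to be fetched from the object store, which is one reading of the passage above; `read_block`, `fetch_range`, and `read_ahead` are hypothetical names.

```python
from typing import Callable, Dict, List, Tuple


def read_block(index: int, local: Dict[int, bytes],
               fetch_range: Callable[[int, int], List[bytes]],
               read_ahead: int, total_blocks: int) -> bytes:
    """Return block `index`, fetching several consecutive blocks on a miss."""
    if index in local:
        return local[index]  # served directly from the file system layer
    last = min(index + read_ahead, total_blocks)
    fetched = fetch_range(index, last)  # one request for blocks [index, last)
    for i, data in enumerate(fetched):
        # Keep any block that already exists locally; it may be newer.
        local.setdefault(index + i, data)
    return local[index]


remote = [b"b0", b"b1", b"b2", b"b3"]
calls: List[Tuple[int, int]] = []


def fetch_range(start: int, end: int) -> List[bytes]:
    calls.append((start, end))
    return remote[start:end]


local = {1: b"LOCAL"}
first = read_block(0, local, fetch_range, read_ahead=2, total_blocks=4)
second = read_block(1, local, fetch_range, read_ahead=2, total_blocks=4)
third = read_block(3, local, fetch_range, read_ahead=2, total_blocks=4)
```

Fetching a run of blocks per request trades a little extra bandwidth for fewer round trips when reads are mostly sequential.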

As shown in FIG. 2, file A is divided into several data blocks (file A block 1, file A block 2, and so on), and file B is likewise divided into several data blocks (file B block 1 and file B block 2).

The data blocks are then added to the synchronization queue Q by the file writing module or by the cyclic retransmission module.

The data blocks in the synchronization queue Q are stored in the object storage and combined into file A and file B.

The file system can read data out of order from the object storage and store the file data blocks in the file system layer.

The policy-based file recycling module can reclaim the data blocks in the file system.

As shown in FIG. 3, the file system layer writing process starts with data being written at random positions to the file system layer.

The content of the file write is split into writes to multiple data blocks.

If the write to a data block overwrites existing data, it is handled as an overwrite; if it appends data to a new data block, it is handled as an append write.

For an overwrite, it is determined whether the corresponding data block is in the file system layer. If it is not, the data block is first read from the object storage layer and stored in the file system layer, and the data is then written to disk at the file system layer. If it is, the data is written directly to disk at the file system layer.

For an append write, the data is written to disk at the file system layer.
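The branches of this write flow can be illustrated with a minimal sketch. The function `write_block`, the in-memory dictionaries standing in for the file system layer disk and the object store, and the returned path labels are all assumptions made for the example.

```python
from typing import Callable, Dict


def write_block(index: int, offset: int, chunk: bytes,
                local: Dict[int, bytearray],
                in_object_store: Callable[[int], bool],
                download: Callable[[int], bytes]) -> str:
    """Apply one per-block write and report which branch was taken."""
    if index in local:
        path = "local"                            # block already at the file system layer
    elif in_object_store(index):
        local[index] = bytearray(download(index))  # pull the block back first
        path = "downloaded"
    else:
        local[index] = bytearray()                # append write into a new block
        path = "new"
    block = local[index]
    end = offset + len(chunk)
    if len(block) < end:
        block.extend(b"\0" * (end - len(block)))  # grow the block if needed
    block[offset:end] = chunk
    return path


remote = {0: b"AAAA"}  # stand-in for the object storage layer
local: Dict[int, bytearray] = {}
p1 = write_block(0, 1, b"xy", local, remote.__contains__, remote.__getitem__)
p2 = write_block(0, 0, b"z", local, remote.__contains__, remote.__getitem__)
p3 = write_block(1, 0, b"new", local, remote.__contains__, remote.__getitem__)
```

Downloading before overwriting matters because a write may cover only part of a block; the untouched bytes must come from the copy in the object store.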

In another preferred embodiment of the present invention, as shown in FIG. 4, a file storage system 200 includes a processing module 210, a synchronization module 220, a splicing module 230, and a recycler 240.

The processing module 210 is configured to divide the file into a plurality of data blocks at the file system layer and to perform out-of-order write operations on the data blocks.

Specifically, at the file system layer, the processing module 210 splits the file into a plurality of data blocks, and the size of each data block may be set by the system or specified by the user. The file system layer supports out-of-order write operations on a file: a single write operation on the file is split into write operations on multiple data blocks.

The synchronization module 220 is configured to synchronize the data blocks to the object storage queue, to add a task to the object storage queue if the data of a data block changes, and to cyclically execute the tasks in the object storage queue according to the first preset period.

The synchronization module 220 maintains, in the file system layer, a queue Q for synchronizing data blocks to the object storage. After data is written or modified, a task is added to the synchronization queue and duplicate tasks are merged; the queued tasks are executed according to the first preset period. The length of the first preset period can be set automatically by the system or set by the user.

The splicing module 230 is configured to splice the plurality of data blocks into a file in a preset order according to an operation instruction at the object storage layer.

The operation instruction may be an instruction to splice a plurality of data blocks into a file. When the user triggers a close/sync operation on the file at the file system layer, the splicing module 230 triggers a task to splice the data blocks into a file in the object storage.

The recycler 240 is configured to set a cyclic recycling task in the file system layer, where the cyclic recycling task includes: reclaiming, according to a preset policy, data blocks in the file system that have been synchronized to the object store and satisfy a preset condition, deleting those data blocks, and marking their addresses as object storage.

The file system layer runs the recycler 240. The recycler 240 periodically reclaims, according to the policy, data that has already been synchronized to the object store; when such data is later read through the file system layer, it is pulled back from the object store. With the recycler, the volume of files kept at the file system layer can be held at a reasonable level, with the remainder transferred to the object store. When the cyclic recycling task runs in the file system layer, it obtains, according to the preset policy, the files in the file system that have been synchronized to the object storage and meet the user-set conditions, deletes them, and marks their addresses as object storage. The preset policy can be a user-specified policy based on, for example, modification date and frequency of use.

Optionally, the processing module is further configured to set a cyclic retransmission task in the file system layer; the cyclic retransmission task includes: acquiring, according to the second preset period, data blocks in the file system that have not been synchronized to the object storage, generating a synchronization task for each such data block, and adding the synchronization task to the object storage queue.

Optionally, the processing module is further configured to write directly if the data block of the file is in the file system layer, and, if the data block of the file is in the object storage, to read the data block from the object storage, store it in the file system, and then overwrite it.

Optionally, the splicing module is further configured to use a file data block for splicing the file if the file data block is in the object storage and has not expired; to re-upload the file data block to the object storage if the file data block has been overwritten at the file system layer; and, if the file data block is in the object store but has expired, to read the data block from the object store by offset, download it to the file system, and re-upload it to form a data block.

When a truncate operation is performed on a file, the file size in the local file system is updated, the data block at the truncation boundary is updated, and its file content version is incremented. The truncation is thus correctly reflected when file splicing is triggered in the object storage.

Optionally, the processing module is further configured, when an out-of-order read operation is performed on a file in the file system layer, to read the data blocks directly from the file system to form the file if the file is in the file system.

Another preferred embodiment of the present invention is a computer storage medium, the computer storage medium storing a program, the program performing the steps of any of the above.

The above is a further detailed description of the present invention in connection with specific preferred embodiments, and the specific embodiments of the present invention are not limited to this description. It will be apparent to those skilled in the art that modifications may be made without departing from the spirit and scope of the invention.

Claims

1. A file storage method, comprising:

At the file system layer, dividing a file into a plurality of data blocks and performing out-of-order write operations on the data blocks;

Synchronizing the data blocks to an object storage queue, adding a task to the object storage queue if the data of a data block changes, and cyclically executing the tasks in the object storage queue according to a first preset period;

At the object storage layer, splicing the plurality of data blocks into a file in a preset order according to an operation instruction;

Setting a cyclic recycling task in the file system layer, the cyclic recycling task including: reclaiming, according to a preset policy, data blocks in the file system that have been synchronized to the object storage and satisfy a preset condition, deleting those data blocks, and marking their addresses as object storage.

2. The file storage method according to claim 1, further comprising:

Setting a cyclic retransmission task in the file system layer;

The cyclic retransmission task includes: acquiring, according to a second preset period, data blocks in the file system that have not been synchronized to the object storage, generating a synchronization task for each such data block, and adding the synchronization task to the object storage queue.

3. The file storage method according to claim 1, wherein performing the out-of-order write operation on the data blocks comprises:

If the data block of the file is in the file system layer, writing to it directly;

If the data block of the file is in the object store, reading the data block from the object store, storing it in the file system, and then overwriting it.

4. The file storage method according to claim 3, wherein splicing the plurality of data blocks into the file in a preset order according to the operation instruction comprises:

If a file data block is in the object store and has not expired, using it to splice the file;

If a file data block has been overwritten at the file system layer, re-uploading the file data block to the object store;

If a file data block is in the object store but has expired, reading the data block from the object store by offset, downloading it to the file system, and re-uploading it to form a data block.

5. The file storage method according to claim 1, further comprising:

In the file system layer, when an out-of-order read operation is performed on a file, if the file is in the file system, reading the data blocks directly from the file system to form the file.

6. A file storage system, comprising:

A processing module, configured to divide a file into a plurality of data blocks at the file system layer and to perform out-of-order write operations on the data blocks;

A synchronization module, configured to synchronize the data blocks to the object storage queue, to add a task to the object storage queue if the data of a data block changes, and to cyclically execute the tasks in the object storage queue according to the first preset period;

A splicing module, configured to splice the plurality of data blocks into a file in a preset order according to an operation instruction at the object storage layer;

A recycler, configured to set a cyclic recycling task in the file system layer, the cyclic recycling task including: reclaiming, according to a preset policy, data blocks in the file system that have been synchronized to the object store and satisfy a preset condition, deleting those data blocks, and marking their addresses as object storage.

7. The file storage system according to claim 6, wherein the processing module is further configured to set a cyclic retransmission task in the file system layer; the cyclic retransmission task comprises: acquiring, according to a second preset period, data blocks in the file system that have not been synchronized to the object storage, generating a synchronization task for each such data block, and adding the synchronization task to the object storage queue.

8. The file storage system according to claim 6, wherein the processing module is further configured to write directly if the data block of the file is in the file system layer; the processing module is further configured, if the data block of the file is in the object store, to read the data block from the object store, store it in the file system, and then overwrite it.

9. The file storage system according to claim 8, wherein the splicing module is further configured to use a file data block for splicing the file if the file data block is in the object storage and has not expired;

The splicing module is further configured to re-upload a file data block to the object storage if the file data block has been overwritten at the file system layer;

The splicing module is further configured, if a file data block is in the object storage but has expired, to read the data block from the object storage by offset, download it to the file system, and re-upload it to form a data block.

10. The file storage system according to claim 6, wherein the processing module is further configured, when an out-of-order read operation is performed on a file in the file system layer, to read the data blocks directly from the file system to form the file if the file is in the file system.

11. A computer storage medium, characterized in that the computer storage medium stores a program which, when executed, performs the steps of the method according to any one of claims 1-5.