WO2018233331A1 - File storage method and system, and computer storage medium - Google Patents


Info

Publication number
WO2018233331A1
WO2018233331A1 PCT/CN2018/079683 CN2018079683W WO2018233331A1 WO 2018233331 A1 WO2018233331 A1 WO 2018233331A1 CN 2018079683 W CN2018079683 W CN 2018079683W WO 2018233331 A1 WO2018233331 A1 WO 2018233331A1
Authority
WO
WIPO (PCT)
Prior art keywords
file
data block
file system
object storage
storage
Application number
PCT/CN2018/079683
Other languages
English (en)
Chinese (zh)
Inventor
江汛洋
葛利亚
王静
李道兵
许式伟
Original Assignee
上海七牛信息技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2017-06-22
Filing date
2018-03-20
Publication date
2018-12-27
Application filed by 上海七牛信息技术有限公司
Publication of WO2018233331A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0604 Improving or facilitating administration, e.g. storage management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0638 Organizing or formatting or addressing of data

Definitions

  • The present invention relates to the field of storage technologies, and in particular to a file storage method and system and a computer storage medium.
  • Object storage is low-cost, makes concurrent distributed access easy, and supports mass storage, but it does not support random writes. Much software still needs to read and write its storage out of order, which a file system supports.
  • The technical problem solved by the present invention is to provide a file storage method and system and a computer storage medium that support out-of-order reads and writes while retaining the advantages of object storage.
  • A file storage method comprises:
  • at the file system layer, dividing a file into blocks to form a plurality of data blocks, and performing out-of-order write operations on the data blocks;
  • synchronizing the data blocks to an object storage queue, where a task is added to the queue whenever the data of a data block changes and the tasks in the queue are executed cyclically with a first preset period;
  • at the object storage layer, splicing a plurality of data blocks into a file in a preset order according to an operation instruction; and
  • setting a loop recycling task at the file system layer, the loop recycling task comprising: according to a preset policy, reclaiming data blocks in the file system that have already been synchronized to the object store and satisfy a preset condition, deleting those data blocks, and marking their addresses as located in the object store.
  • The method further comprises setting a cyclic retransmission task at the file system layer. With a second preset period, this task obtains the data blocks in the file system that have not been synchronized to the object store, generates a synchronization task from each such data block, and adds the synchronization task to the object storage queue.
  • Performing an out-of-order write operation on a data block includes:
  • if the data block of the file is at the file system layer, writing it directly;
  • if the data block of the file is in the object store, reading it from the object store, storing it in the file system, and then overwriting it.
  • Splicing the plurality of data blocks into the file in the preset order according to the operation instruction includes:
  • if a file data block is in the object store and has not expired, splicing the file from it directly;
  • if a file data block has been overwritten at the file system layer, re-uploading it to the object store;
  • if a file data block is in the object store but has expired, reading it from the object store by offset, downloading it to the file system, and re-uploading it to form a data block.
  • When an out-of-order file read is performed at the file system layer, if the file is in the file system, its data blocks are read directly from the file system.
  • A file storage system comprises:
  • a processing module, configured to divide a file into blocks at the file system layer to form a plurality of data blocks and to perform out-of-order write operations on the data blocks;
  • a synchronization module, configured to synchronize the data blocks to an object storage queue, adding a task to the queue whenever the data of a data block changes and executing the tasks in the queue cyclically with the first preset period;
  • a splicing module, configured to splice, at the object storage layer, a plurality of data blocks into a file in a preset order according to an operation instruction; and
  • a recycler, configured to run a loop recycling task at the file system layer, the loop recycling task comprising: according to a preset policy, reclaiming data blocks in the file system that have been synchronized to the object store and meet a preset condition, deleting those data blocks, and marking their addresses as located in the object store.
  • The processing module is further configured to set a cyclic retransmission task at the file system layer. With a second preset period, this task obtains the data blocks in the file system that have not been synchronized to the object store, generates a synchronization task from each such data block, and adds the synchronization task to the object storage queue.
  • The processing module is further configured to write a data block of the file directly if it is at the file system layer. If the data block is in the object store, the module reads it from the object store, stores it in the file system, and then overwrites it.
  • The splicing module is further configured to splice the file directly from a file data block that is in the object store and has not expired;
  • the splicing module is further configured to re-upload a file data block to the object store once it has been overwritten at the file system layer;
  • the splicing module is further configured, if a file data block is in the object store but has expired, to read the data block from the object store by offset, download it to the file system, and re-upload it to form a data block.
  • The processing module is further configured to perform out-of-order file reads at the file system layer; if the file is in the file system, its data blocks are read directly from the file system.
  • A computer storage medium stores a program that performs the steps of any of the above methods.
  • In this method, a file is divided into blocks to form a plurality of data blocks, which are written out of order. The data blocks are then synchronized to the object storage queue: a task is added to the queue whenever the data of a data block changes, and the tasks in the queue are executed cyclically with the first preset period. At the object storage layer, the data blocks are spliced into files in a preset order according to an operation instruction, and a loop recycling task is set at the file system layer.
  • The loop recycling task reclaims, according to a preset policy, data blocks in the file system that have been synchronized to the object store and satisfy the preset condition. It deletes those data blocks and marks their addresses as located in the object store.
  • By tiering storage between the file system and the object store, the method supports out-of-order reads and writes while keeping the advantages of object storage: low cost, easy concurrent distributed access, and support for mass storage.
  • With the recycling task, large numbers of files held at the file system layer can be transferred to the object store in an orderly way.
  • FIG. 1 is a flowchart of a file storage method according to an embodiment of the present invention.
  • FIG. 2 is a block diagram of a file storage system according to an embodiment of the present invention.
  • FIG. 3 is a schematic flowchart of a file system layer writing process according to an embodiment of the present invention.
  • FIG. 4 is a schematic diagram of a file storage system according to an embodiment of the present invention.
  • A file storage method includes steps S110 to S140, as follows:
  • S110: At the file system layer, the file is divided into blocks to form a plurality of data blocks, and out-of-order write operations are performed on the data blocks.
  • The file is split into a plurality of data blocks, and the size of each data block may be set by the system or by the user.
  • Out-of-order writes to a file are supported at the file system layer: a single write operation on the file is split into write operations on multiple data blocks.
  • S120: The data blocks are synchronized to the object storage queue. If the data of a data block changes, a task is added to the object storage queue, and the tasks in the queue are executed cyclically with the first preset period.
  • At the object storage layer, multiple files can be spliced in order into one large file. Because files in the object store are stored in order, reading part of the data at a given offset is easy to support.
  • The object storage layer supports uploading in chunks and combining the chunks into a single file, but it does not support out-of-order writes.
  • Data at the object storage layer is mainly used for archiving and distribution. At the file system layer, data blocks are synchronized to the object storage queue Q.
  • After data is written or modified, a task is added to the synchronization queue, repeated tasks are merged, and the queued tasks are executed with the first preset period.
  • The first preset period may be set automatically by the system or set by the user.
  • S130: At the object storage layer, a plurality of data blocks are spliced into a file in a preset order according to an operation instruction.
  • The operation instruction may be an instruction to splice a plurality of data blocks into a file.
  • It triggers the task of splicing the data blocks into a file in the object store.
  • S140: A loop recycling task is set at the file system layer. According to a preset policy, the task reclaims data blocks in the file system that have been synchronized to the object store and satisfy a preset condition. It deletes those data blocks and marks their addresses as located in the object store.
  • The file system layer runs the loop recycling task, which periodically reclaims, according to the policy, data that has already been synchronized to the object store. When that data is later read through the file system layer, it is pulled from the object store.
  • The loop recycling task is started at the file system layer. According to the preset policy, it finds files in the file system that have been synchronized to the object store and meet the user-set conditions, deletes them, and marks their addresses as located in the object store.
  • The preset policy may be user-specified, for example based on modification date and frequency of use.
  • Combining the file system and the object store into a hierarchical storage scheme supports out-of-order reads and writes while keeping the advantages of object storage: low cost, easy concurrent distributed access, and support for mass storage.
  • In this storage structure, out-of-order writes land on the file system, while reads are served mainly from the object store.
  • With the recycling task, large numbers of files held at the file system layer can be transferred to the object store in an orderly way.
  • The file storage method further includes setting a cyclic retransmission task at the file system layer. With a second preset period, the task obtains the data blocks in the file system that have not been synchronized to the object store, generates a synchronization task from each such data block, and adds the synchronization task to the object storage queue.
  • Once started at the file system layer, the cyclic retransmission task periodically finds the files in the file system that have not been synchronized to the object store. It generates synchronization tasks for them and adds those tasks to the synchronization queue Q.
  • The out-of-order write operation on a data block includes:
  • if the data block of the file is at the file system layer, writing it directly;
  • if the data block of the file is in the object store, reading it from the object store, storing it in the file system, and then overwriting it.
  • Splicing a plurality of data blocks into a file in a preset order according to an operation instruction includes:
  • if a file data block is in the object store and has not expired, splicing the file from it directly;
  • if a file data block has been overwritten at the file system layer, re-uploading it to the object store;
  • if a file data block is in the object store but has expired, reading it from the object store by offset, downloading it to the file system, and re-uploading it to form a data block.
  • The number of file data blocks is determined from the current file size, and each data block is checked in turn to determine whether it is in the object store and still valid. Depending on the result, the block is reused as is, read from disk and re-uploaded, or re-downloaded from the object store and re-uploaded. There are four cases to handle when splicing file data blocks:
  • Case 1: The file data block is already in the object store and has not expired, so it is reused directly when splicing the file.
  • Case 2: The file data block has been overwritten at the file system layer and is re-uploaded to the object store. Version variables can be used to detect this case. Each file data block has two version variables. The first is the file content version, which is zero when the block is created and is incremented on every subsequent update. The second is the upload version, which is set to the file content version after each upload of the block. The two versions are compared, and if they differ, the block is re-uploaded.
  • Case 3: The file data block is already in the object store but has expired. It is downloaded from the object store by offset into the file system and then re-uploaded to form a block.
  • The file storage method further includes: when an out-of-order file read is performed at the file system layer, if the file is in the file system, its data blocks are read directly from the file system.
  • During an out-of-order read at the file system layer, a file that is in the file system is read directly from it. A user-configured number of consecutive data blocks is read at once to reduce the number of network requests.
  • For example, file A is divided into data blocks file A block 1, file A block 2, and so on, and file B is likewise divided into data blocks file B block 1, file B block 2, and so on.
  • The data blocks are then added to the synchronization queue Q, either by the file writing module or by the cyclic retransmission module.
  • The data blocks in the synchronization queue Q are stored in the object store and combined into file A and file B.
  • The file system can read data out of order from the object store and store the file data blocks at the file system layer.
  • The file recycling module reclaims data blocks from the file system according to the policy.
  • In the file system layer write process, data is written randomly to the file system layer.
  • The content of the file write is split into writes to multiple data blocks.
  • Each block write is checked to determine whether it is an overwrite; if it is an append that falls in a new data block, it is performed as an append.
  • If the data block is in the object storage layer, its data is first read from the object storage layer and stored at the file system layer, and the data is then written to the file system layer disk. If the data block is already at the file system layer, the data is written to the file system layer disk directly.
  • The data is written to the file system layer disk.
  • A file storage system 200 includes a processing module 210, a synchronization module 220, a splicing module 230, and a recycler 240.
  • The processing module 210 is configured to divide the file into a plurality of data blocks at the file system layer and to perform out-of-order write operations on the data blocks.
  • The processing module 210 splits the file into a plurality of data blocks; the size of each data block may be set by the system or specified by the user.
  • Out-of-order writes to a file are supported at the file system layer: a single write operation on the file is split into write operations on multiple data blocks.
  • The synchronization module 220 is configured to synchronize the data blocks to the object storage queue. If the data of a data block changes, a task is added to the queue, and the tasks in the queue are executed cyclically with the first preset period.
  • The synchronization module 220 maintains, at the file system layer, the synchronization of data blocks to the object storage queue Q. After data is written or modified, a task is added to the synchronization queue, repeated tasks are merged, and the queued tasks are executed with the first preset period.
  • The first preset period may be set automatically by the system or set by the user.
  • The splicing module 230 is configured to splice, at the object storage layer, a plurality of data blocks into a file in a preset order according to an operation instruction.
  • The operation instruction may be an instruction to splice a plurality of data blocks into a file.
  • The splicing module 230 triggers the task of splicing the data blocks into a file in the object store.
  • The recycler 240 is configured to run a loop recycling task at the file system layer. According to a preset policy, the task reclaims data blocks in the file system that have been synchronized to the object store and meet a preset condition. It deletes those data blocks and marks their addresses as located in the object store.
  • The file system layer runs the recycler 240.
  • The recycler 240 periodically reclaims, according to the policy, data that has been synchronized to the object store. When that data is later read through the file system layer, it is pulled from the object store. With the recycler, files at the file system layer can be transferred to the object store in a reasonable, controlled way.
  • The loop recycling task is started at the file system layer. According to the preset policy, it finds files in the file system that have been synchronized to the object store and meet the user-set conditions, deletes them, and marks their addresses as located in the object store.
  • The preset policy may be user-specified, for example based on modification date and frequency of use.
  • The processing module is further configured to set a cyclic retransmission task at the file system layer. With a second preset period, the task obtains the data blocks in the file system that have not been synchronized to the object store, generates a synchronization task from each such data block, and adds the synchronization task to the object storage queue.
  • The processing module is further configured to write a data block of the file directly if it is at the file system layer. If the data block is in the object store, the module reads it from the object store, stores it in the file system, and then overwrites it.
  • The splicing module is further configured to do three things. It splices the file directly from a file data block that is in the object store and has not expired. It re-uploads a file data block to the object store once the block has been overwritten at the file system layer. If a file data block is in the object store but has expired, it reads the block from the object store by offset, downloads it to the file system, and re-uploads it to form a data block.
  • The number of file data blocks is determined from the current file size, and each data block is checked in turn to determine whether it is in the object store and still valid. Depending on the result, the block is reused as is, read from disk and re-uploaded, or re-downloaded from the object store and re-uploaded. There are four cases to handle when splicing file data blocks:
  • Case 1: The file data block is already in the object store and has not expired, so it is reused directly when splicing the file.
  • Case 2: The file data block has been overwritten at the file system layer and is re-uploaded to the object store. Version variables can be used to detect this case. Each file data block has two version variables. The first is the file content version, which is zero when the block is created and is incremented on every subsequent update. The second is the upload version, which is set to the file content version after each upload of the block. The two versions are compared, and if they differ, the block is re-uploaded.
  • Case 3: The file data block is already in the object store but has expired. It is downloaded from the object store by offset into the file system and then re-uploaded to form a block.
  • When a file is truncated, its size in the local file system is updated, the data block at the new size boundary is updated, and that block's file content version is incremented.
  • As a result, the truncation is correctly reflected in the file splicing stage triggered in the object store.
  • The processing module is further configured to perform out-of-order file reads at the file system layer; if the file is in the file system, its data blocks are read directly from the file system.
  • During an out-of-order read at the file system layer, a file that is in the file system is read directly from it. A user-configured number of consecutive data blocks is read at once to reduce the number of network requests.
  • Another preferred embodiment of the present invention is a computer storage medium storing a program that performs the steps of any of the above methods.
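The write path described above (S110 and the file system layer write process) splits one file write into per-block writes. Before overwriting a block that was recycled to the object store, it pulls that block back. The Python sketch below illustrates this under stated assumptions:

- the names split_write and TieredFile are hypothetical;
- plain dictionaries stand in for the local disk and the object store;
- blocks are addressed by index with a fixed block size.

It is an illustration, not the patent's implementation.

```python
from typing import Dict, List, Tuple


def split_write(offset: int, data: bytes, block_size: int) -> List[Tuple[int, int, bytes]]:
    """Split one file-level write into (block_index, offset_in_block, chunk) pieces."""
    pieces = []
    pos = 0
    while pos < len(data):
        block_index, inner = divmod(offset + pos, block_size)
        n = min(block_size - inner, len(data) - pos)  # bytes that fit in this block
        pieces.append((block_index, inner, data[pos:pos + n]))
        pos += n
    return pieces


class TieredFile:
    """Toy model of one file whose blocks live either locally or in the object store."""

    def __init__(self, block_size: int):
        self.block_size = block_size
        self.local: Dict[int, bytearray] = {}  # blocks held at the file system layer
        self.remote: Dict[int, bytes] = {}     # stand-in for the object store (assumption)
        self.dirty = set()                     # blocks changed since the last sync

    def write(self, offset: int, data: bytes) -> None:
        for index, inner, chunk in split_write(offset, data, self.block_size):
            if index not in self.local:
                # Not on local disk: pull it back if it was recycled to the object
                # store; otherwise start an empty block for an append.
                self.local[index] = bytearray(self.remote.get(index, b""))
            block = self.local[index]
            if len(block) < inner:
                block.extend(b"\0" * (inner - len(block)))  # zero-fill a hole
            block[inner:inner + len(chunk)] = chunk        # overwrite or extend
            self.dirty.add(index)  # these blocks would get tasks in the sync queue
```

A real system would persist blocks to disk and call the object store's API instead of reading a dictionary. The dirty set marks the blocks whose change would add a task to the synchronization queue.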
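The synchronization queue Q (S120) merges repeated tasks for the same block and is drained once per first preset period. Alongside it, the cyclic retransmission task re-queues blocks whose latest content never reached the object store. This is a minimal sketch under these assumptions:

- each block carries a content version and an upload version, as in Case 2 above;
- upload is a caller-supplied function;
- SyncQueue, retransmit_scan, and sync_loop are hypothetical names.

```python
import time
from collections import OrderedDict
from typing import Callable, Dict, Hashable, List, Tuple


class SyncQueue:
    """Queue Q holding at most one pending upload task per data block."""

    def __init__(self) -> None:
        self._tasks = OrderedDict()  # block key -> content version to upload

    def add(self, block_key: Hashable, content_version: int) -> None:
        # Merging repeated tasks: a later task for the same block overwrites the
        # earlier one in place, so the block is uploaded once per period.
        self._tasks[block_key] = content_version

    def __len__(self) -> int:
        return len(self._tasks)

    def run_once(self, upload: Callable[[Hashable, int], None]) -> List[Tuple[Hashable, int]]:
        """Execute every pending task; intended to run once per first preset period."""
        done = []
        while self._tasks:
            key, version = self._tasks.popitem(last=False)  # FIFO order
            upload(key, version)
            done.append((key, version))
        return done


def retransmit_scan(versions: Dict[Hashable, Tuple[int, int]], queue: SyncQueue) -> None:
    """Cyclic retransmission: re-queue blocks whose latest content was not uploaded.

    versions maps block_key -> (content_version, upload_version).
    """
    for key, (content_version, upload_version) in versions.items():
        if upload_version != content_version:
            queue.add(key, content_version)


def sync_loop(queue: SyncQueue, upload: Callable[[Hashable, int], None],
              period_s: float, rounds: int) -> None:
    """Drain the queue every period_s seconds (bounded so the sketch terminates)."""
    for _ in range(rounds):
        queue.run_once(upload)
        time.sleep(period_s)
```

A task is removed from the queue before upload runs, so a failed upload is simply dropped from Q. The retransmission scan is what eventually re-queues it, which is one plausible reading of why the text pairs the two tasks.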
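S130 relies on the object store keeping a spliced file's parts in order. A byte offset then maps to a single part, and a partial read is a contiguous byte range. The sketch below shows that mapping using a sorted list of part start offsets and bisect. The range_header helper assumes the object store accepts standard HTTP range requests, which the text does not state. All function names are hypothetical.

```python
from bisect import bisect_right
from typing import List, Tuple


def splice(parts: List[bytes]) -> Tuple[bytes, List[int]]:
    """Concatenate ordered parts into one object and record where each part starts."""
    starts, pos = [], 0
    for part in parts:
        starts.append(pos)
        pos += len(part)
    return b"".join(parts), starts


def part_at(starts: List[int], offset: int) -> int:
    """Index of the part that contains byte `offset` of the spliced object."""
    return bisect_right(starts, offset) - 1


def range_header(offset: int, length: int) -> str:
    """HTTP Range header value for `length` bytes at `offset` (end is inclusive)."""
    return f"bytes={offset}-{offset + length - 1}"
```

For example, after splicing b"abc", b"de", and b"fghi", byte 4 falls in the second part, and reading 4 bytes at offset 2 corresponds to the header value bytes=2-5.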
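The loop recycling task (S140) reclaims only blocks that are already synchronized. For each one, it deletes the local copy and records that the block now lives in the object store. This sketch makes several assumptions:

- the policy is built from the two examples the text gives, modification date and frequency of use;
- the thresholds min_idle_s and max_accesses are hypothetical;
- the BlockState fields are likewise illustrative.

```python
import time
from dataclasses import dataclass
from typing import Dict, Hashable, List, Optional


@dataclass
class BlockState:
    synced: bool              # latest content already uploaded to the object store
    last_modified: float      # seconds since the epoch
    access_count: int         # hypothetical usage-frequency counter
    location: str = "local"   # "local" or "object_store"


def recycle(blocks: Dict[Hashable, BlockState], local_data: Dict[Hashable, bytes],
            min_idle_s: float, max_accesses: int,
            now: Optional[float] = None) -> List[Hashable]:
    """One pass of the loop recycling task under a modification-date/frequency policy."""
    now = time.time() if now is None else now
    reclaimed = []
    for key, st in blocks.items():
        if st.location != "local" or not st.synced:
            continue  # never delete data whose only copy is local
        if now - st.last_modified >= min_idle_s and st.access_count <= max_accesses:
            local_data.pop(key, None)      # delete the file system copy
            st.location = "object_store"   # later reads pull from the object store
            reclaimed.append(key)
    return reclaimed
```

In a running system, this pass would be scheduled cyclically, like the synchronization queue.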
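The splicing cases above can be expressed as a per-block decision driven by the two version variables from Case 2. As the text describes, the block count comes from the current file size. The following are assumptions made for illustration:

- the expired flag;
- the initial upload version of -1;
- the names BlockVersions, Action, and plan_splice.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Tuple


class Action(Enum):
    REUSE = "reuse the stored object"               # Case 1
    REUPLOAD = "upload the local copy"              # Case 2
    REFRESH = "download by offset, then re-upload"  # Case 3


@dataclass
class BlockVersions:
    content_version: int = 0    # 0 when the block is created, +1 on every update
    upload_version: int = -1    # copied from content_version after each upload; -1 = never uploaded (assumption)
    in_object_store: bool = False
    expired: bool = False       # hypothetical validity flag for the stored copy

    def update(self) -> None:
        self.content_version += 1

    def uploaded(self) -> None:
        self.upload_version = self.content_version
        self.in_object_store = True


def plan_splice(file_size: int, block_size: int,
                blocks: Dict[int, BlockVersions]) -> List[Tuple[int, Action]]:
    n_blocks = -(-file_size // block_size)  # ceiling division: blocks implied by current size
    plan = []
    for i in range(n_blocks):
        b = blocks[i]
        if not b.in_object_store or b.upload_version != b.content_version:
            plan.append((i, Action.REUPLOAD))  # changed locally since the last upload
        elif b.expired:
            plan.append((i, Action.REFRESH))   # stored copy is no longer valid
        else:
            plan.append((i, Action.REUSE))     # stored and still valid
    return plan
```

Comparing versions rather than contents keeps the check cheap: no block data has to be read to decide whether a re-upload is needed.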
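For out-of-order reads, blocks already in the file system are served locally. A miss fetches a user-configured number of consecutive blocks in one request to cut network round trips. This is a sketch under these assumptions:

- the caller supplies fetch_range(start_index, count), which returns up to count blocks from the object store in a single request;
- read and readahead are hypothetical names;
- every block except the last is full-size.

```python
from typing import Callable, Dict, List


def read(offset: int, length: int, block_size: int, local: Dict[int, bytes],
         fetch_range: Callable[[int, int], List[bytes]], readahead: int = 2) -> bytes:
    """Out-of-order read: serve local blocks directly, fetch misses with read-ahead."""
    if length <= 0:
        return b""
    first = offset // block_size
    last = (offset + length - 1) // block_size
    out = bytearray()
    for i in range(first, last + 1):
        if i not in local:
            # One request brings back block i plus `readahead` following blocks.
            for j, data in enumerate(fetch_range(i, 1 + readahead), start=i):
                local.setdefault(j, data)
        out += local.get(i, b"")  # a block past end of file contributes nothing
    skip = offset - first * block_size
    return bytes(out[skip:skip + length])
```

Blocks fetched this way stay in the local dictionary. A later read of a neighbouring range therefore costs no request, which is the reduction in network requests the text describes.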
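The truncate handling described above updates the file size and trims the block at the new size boundary. It also increments that block's content version so the next splice notices the change. This minimal sketch assumes blocks are indexed from zero with a fixed block size. LocalFile, LocalBlock, and truncate are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class LocalBlock:
    data: bytearray
    content_version: int = 0


@dataclass
class LocalFile:
    block_size: int
    size: int = 0
    blocks: Dict[int, LocalBlock] = field(default_factory=dict)

    def truncate(self, new_size: int) -> None:
        if new_size < 0:
            raise ValueError("new_size must be non-negative")
        self.size = new_size
        n_blocks = -(-new_size // self.block_size)  # blocks still covered by the file
        for idx in [i for i in self.blocks if i >= n_blocks]:
            del self.blocks[idx]                     # wholly past the new end of file
        boundary, keep = divmod(new_size, self.block_size)
        blk = self.blocks.get(boundary) if keep else None
        if blk is not None and len(blk.data) > keep:
            del blk.data[keep:]                      # cut the block straddling the boundary
            blk.content_version += 1                 # signals a re-upload at splice time
```

When the new size falls exactly on a block boundary, no block straddles it, so only whole blocks are dropped and no version changes.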

Abstract

The invention relates to a file storage method and system, and a computer storage medium. The method comprises the following steps:
- at a file system layer, dividing a file into blocks to form multiple data blocks and performing an out-of-order write operation on the data blocks (S110);
- synchronizing the data blocks to an object storage queue, adding a task to the object storage queue if the data of the data blocks changes, and cyclically executing the tasks in the object storage queue with a first preset period (S120);
- at an object storage layer, splicing, according to an operation instruction, multiple data blocks into a file in a preset order (S130);
- setting a loop recycling task at the file system layer, the loop recycling task comprising: according to a preset policy, reclaiming data blocks in the file system that have been synchronized to the object storage and satisfy a preset condition, deleting those data blocks, and marking the data block addresses as object storage (S140).

The described hierarchical storage method, which combines the file system and object storage, not only supports out-of-order reads and writes but also has the advantages of object storage: it is low-cost, makes concurrent distributed access easy, and supports mass storage.
PCT/CN2018/079683 2017-06-22 2018-03-20 File storage method and system, and computer storage medium WO2018233331A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710480364.8A CN107229427B (zh) 2017-06-22 2017-06-22 A file storage method, system, and computer storage medium
CN201710480364.8 2017-06-22

Publications (1)

Publication Number Publication Date
WO2018233331A1 true WO2018233331A1 (fr) 2018-12-27

Family

ID=59936588

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/079683 WO2018233331A1 (fr) 2017-06-22 2018-03-20 File storage method and system, and computer storage medium

Country Status (2)

Country Link
CN (1) CN107229427B (fr)
WO (1) WO2018233331A1 (fr)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107229427B (zh) * 2017-06-22 2019-10-18 上海七牛信息技术有限公司 A file storage method, system, and computer storage medium
CN116048424B (zh) * 2023-03-07 2023-06-06 浪潮电子信息产业股份有限公司 IO data processing method, apparatus, device, and medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090228511A1 (en) * 2008-03-05 2009-09-10 Nec Laboratories America, Inc. System and Method for Content Addressable Storage
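CN103077245A (zh) * 2013-01-18 2013-05-01 浪潮电子信息产业股份有限公司 Method for extending a parallel file system using the idle hard disk space of cluster computing nodes
CN106021256A (zh) * 2015-03-31 2016-10-12 Emc 公司 Deduplicated distributed file system using cloud-based object storage
CN107229427A (zh) * 2017-06-22 2017-10-03 上海七牛信息技术有限公司 A file storage method, system, and computer storage medium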

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102722506A (zh) * 2009-12-29 2012-10-10 华为数字技术(成都)有限公司 Data storage method and device
US8954408B2 (en) * 2011-07-28 2015-02-10 International Business Machines Corporation Allowing writes to complete without obtaining a write lock to a file
US9959207B2 (en) * 2015-06-25 2018-05-01 Vmware, Inc. Log-structured B-tree for handling random writes
CN106021536A (zh) * 2016-05-27 2016-10-12 成都索贝数码科技股份有限公司 Data insertion method and system based on fics object storage
CN106406981A (zh) * 2016-09-18 2017-02-15 深圳市深信服电子科技有限公司 Method for reading and writing disk data, and virtual machine monitor
CN106776967B (zh) * 2016-12-05 2020-03-27 哈尔滨工业大学(威海) Real-time storage method and device for massive small files based on a time-series aggregation algorithm

Also Published As

Publication number Publication date
CN107229427B (zh) 2019-10-18
CN107229427A (zh) 2017-10-03

Similar Documents

Publication Publication Date Title
US8738883B2 (en) Snapshot creation from block lists
JP6236533B2 (ja) Method and apparatus for creating a differential update package, and system differential update method and apparatus
US9436556B2 Customizable storage system for virtual databases
CN108628874B (zh) Method and apparatus for migrating data, electronic device, and readable storage medium
CN107193615B (zh) Method and apparatus for updating and deploying project code information
CN109634774B (zh) Data backup and recovery method and apparatus
WO2015107666A1 (fr) Storage apparatus and cache control method for a storage apparatus
US10474537B2 Utilizing an incremental backup in a decremental backup system
US10810035B2 Deploying a cloud instance of a user virtual machine
US10372555B1 Reversion operations for data store components
EP3042289A1 (fr) Replication of snapshots and clones
CN109144785B (zh) Method and apparatus for backing up data
US8966200B1 Pruning free blocks out of a decremental backup chain
JP2016045869A (ja) Data recovery method, program, and data processing system
CN110442648B (zh) Data synchronization method and apparatus
CN111433760A (zh) Enhanced techniques for copying files in cloud storage
WO2017113694A1 (fr) File synchronization system, device, and method
WO2018233331A1 (fr) File storage method and system, and computer storage medium
CN113254394B (zh) Snapshot processing method, system, device, and storage medium
US20220382642A1 Reducing bandwidth during synthetic restores from a deduplication file system
CN112579550B (zh) Metadata information synchronization method and system for a distributed file system
CN106339176B (zh) Intermediate file processing method, client, server, and system
CN113971041A (zh) Method and apparatus for version synchronization across version control systems
US9921918B1 (en) Cloud-based data backup and management
US10031961B1 (en) Systems and methods for data replication

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 18820563

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry in European phase

Ref document number: 18820563

Country of ref document: EP

Kind code of ref document: A1