WO2016101283A1 - Data processing method, apparatus, and system - Google Patents

Data processing method, apparatus, and system

Info

Publication number
WO2016101283A1
Authority
WO
WIPO (PCT)
Prior art keywords
written
strip
stripe
read
version number
Prior art date
Application number
PCT/CN2014/095223
Other languages
English (en)
French (fr)
Inventor
方新
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority to SG11201703410YA (patent SG11201703410YA)
Priority to AU2014415350A (patent AU2014415350B2)
Priority to JP2017528138A (patent JP6607941B2)
Priority to CN201480075382.2A (patent CN105993013B)
Priority to CA2965715A (patent CA2965715C)
Priority to PCT/CN2014/095223 (patent WO2016101283A1)
Priority to KR1020177012992A (patent KR102030786B1)
Priority to BR112017011412-7A (patent BR112017011412B1)
Application filed by 华为技术有限公司
Priority to CN201810336937.4A (patent CN108733761B)
Priority to EP14908851.0A (patent EP3203386A4)
Publication of WO2016101283A1
Priority to US15/634,774 (patent US20170295239A1)
Priority to US15/634,819 (patent US11032368B2)
Priority to US17/160,032 (patent US11799959B2)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/16 File or folder operations, e.g. details of user interfaces specifically adapted to file systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/11 File system administration, e.g. details of archiving or snapshots
    • G06F16/128 Details of file system snapshots on the file-level, e.g. snapshot creation, administration, deletion
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1097 Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1402 Saving, restoring, recovering or retrying
    • G06F11/1446 Point-in-time backing up or restoration of persistent data
    • G06F11/1458 Management of the backup or restore process
    • G06F11/1461 Backup scheduling policy
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1402 Saving, restoring, recovering or retrying
    • G06F11/1446 Point-in-time backing up or restoration of persistent data
    • G06F11/1458 Management of the backup or restore process
    • G06F11/1464 Management of the backup or restore process for networked environments
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/11 File system administration, e.g. details of archiving or snapshots
    • G06F16/122 File system administration, e.g. details of archiving or snapshots using management policies
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0614 Improving the reliability of storage systems
    • G06F3/0619 Improving the reliability of storage systems in relation to data integrity, e.g. data losses, bit errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0646 Horizontal data movement in storage systems, i.e. moving data in between storage devices or systems
    • G06F3/0647 Migration mechanisms
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/067 Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2201/00 Indexing scheme relating to error detection, to error correction, and to monitoring
    • G06F2201/84 Using snapshots, i.e. a logical point-in-time copy of the data

Definitions

  • The present invention relates to the field of storage, and in particular to a data processing method, apparatus, and system.
  • An object-based storage system is a distributed storage system consisting of a storage server and an object-based storage device (OSD).
  • An object-based storage system may also be referred to as an object storage system, and an object-based storage device may also be referred to as an object storage device.
  • An object is the most basic unit of stored content.
  • The data can be a file or a volume. Taking a file as an example, the file is split into fragments, and each file fragment has attribute information.
  • A file fragment, the metadata of the file fragment, and the attributes of the file fragment together constitute an object, and objects are stored in multiple OSDs.
  • A snapshot is a copy of a specified collection of data; it records an image of that data at a point in time (the point in time at which the copy begins).
  • The modified data needs to be stored in the storage system.
  • The prior art uses the object ID as the unique identifier of an object. When a file is updated, the updated data must be stored in the storage device under a new object ID. If files are updated frequently, the total number of object IDs becomes very large, which takes up more storage space and consumes more system resources.
  • The invention provides a data management technology that can reduce the total number of object IDs and thereby the storage space that object IDs occupy.
  • An embodiment of the present invention provides a data storage method, including: an object storage device (OSD) receives a strip write request sent by a client server, where the strip write request carries a to-be-written strip, the version number of the to-be-written strip, the offset of the to-be-written strip, and the object ID of the to-be-written strip. The version number of the to-be-written strip corresponds to the snapshot ID of the latest snapshot of the file or volume to which the to-be-written strip belongs; the offset of the to-be-written strip describes the position of the strip within the object to which it belongs; and the object ID of the to-be-written strip is the ID of the object to which the strip belongs. The OSD writes the to-be-written strip to the storage location determined by the object ID, the version number of the to-be-written strip, and the offset of the to-be-written strip.
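The addressing rule in this method can be illustrated with a minimal in-memory sketch. The class name `ObjectStorageDevice`, the dictionary layout, and the sample IDs are assumptions made for illustration only; the source does not specify how an OSD lays out data on disk.

```python
from typing import Dict, Optional, Tuple


class ObjectStorageDevice:
    """Toy in-memory OSD (hypothetical): strips are addressed by
    (object ID, version number) and then by offset within the object."""

    def __init__(self) -> None:
        # (object_id, version) -> {strip offset -> strip bytes}
        self._objects: Dict[Tuple[str, int], Dict[int, bytes]] = {}

    def write_strip(self, object_id: str, version: int,
                    offset: int, strip: bytes) -> None:
        # The storage location is determined by object ID, version, and offset.
        self._objects.setdefault((object_id, version), {})[offset] = strip

    def stored(self, object_id: str, version: int,
               offset: int) -> Optional[bytes]:
        return self._objects.get((object_id, version), {}).get(offset)


osd = ObjectStorageDevice()
osd.write_strip("obj-7", 0, 0, b"AAAA")  # written before any snapshot
osd.write_strip("obj-7", 1, 0, b"BBBB")  # written after snapshot 1
```

Both writes reuse the object ID `obj-7`; only the version number differs, which is the property the description relies on to keep the number of object IDs small.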
  • An embodiment of the present invention provides a data storage method, including: an object storage device (OSD) receives a strip write request sent by a client server, where the strip write request carries a to-be-written strip, the version number of the to-be-written strip, the offset of the to-be-written strip, and the object ID of the to-be-written strip. The version number of the to-be-written strip corresponds to the snapshot ID of the latest snapshot of the file or volume to which the to-be-written strip belongs; the offset of the to-be-written strip describes the position of the strip within the object to which it belongs; and the object ID of the to-be-written strip is the ID of the object to which the strip belongs. The OSD determines whether the object determined by the version number of the to-be-written strip and the object ID has been backed up: if it has, the OSD writes the to-be-written strip to the storage location determined by the object ID, the version number of the to-be-written strip, and the offset of the to-be-written strip; if it has not, the OSD creates a spliced object from the to-be-written strip and then writes the spliced object to the storage location determined by the version number of the to-be-written strip and the object ID.
  • An embodiment of the present invention provides a data storage method, including: an object storage device (OSD) receives a strip write request sent by a client server, where the strip write request carries a to-be-written strip, the version number of the to-be-written strip, the offset of the to-be-written strip, and the object ID of the to-be-written strip. The version number of the to-be-written strip corresponds to the snapshot ID of the latest snapshot of the file or volume to which the to-be-written strip belongs; the offset of the to-be-written strip describes the position of the strip within the object to which it belongs; and the object ID of the to-be-written strip is the ID of the object to which the strip belongs. The OSD determines whether the strip determined by the version number of the to-be-written strip, the object ID of the to-be-written strip, and the offset of the to-be-written strip has been backed up: if it has, the OSD writes the to-be-written strip to the storage location determined by the version number of the to-be-written strip, the object ID of the to-be-written strip, and the offset of the to-be-written strip; if it has not, the data in the initial version object in the OSD that is located at the offset of the to-be-written strip and whose size equals the size of the to-be-written strip is backed up to the storage location determined by the version number of the to-be-written strip, the offset of the to-be-written strip, and the object ID of the to-be-written strip, where the object ID of the initial version object is the same as the object ID of the to-be-written strip, the version number of the initial version object is an initial version
  • An embodiment of the present invention provides a data storage method, including: an object storage device (OSD) receives a strip write request sent by a client server, where the strip write request carries a to-be-written strip, the version number of the to-be-written strip, the offset of the to-be-written strip, and the object ID of the to-be-written strip. The version number of the to-be-written strip corresponds to the snapshot ID of the latest snapshot of the file or volume to which the to-be-written strip belongs; the offset of the to-be-written strip describes the position of the strip within the object to which it belongs; and the object ID of the to-be-written strip is the ID of the object to which the strip belongs. The OSD determines whether the object determined by the version number of the to-be-written strip and the object ID has been backed up: if it has, the OSD writes the to-be-written strip to the storage location determined by the object ID, the version number of the object, and the offset of the to-be-written strip; if it has not, the OSD backs up the data in the initial version object in the OSD to the storage location determined by the version number of the to-be-written strip and the object ID, where the object ID of the initial version object is the same as the object ID of the to-be-written strip and the version number of the initial version object is an initial version number; the OSD then writes the to-be-written strip to the storage location determined by the object ID, the initial version number, and the offset of the to-be-written strip.
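The object-level backup-then-write flow of this method resembles copy-on-write, and can be sketched as follows. This is a minimal sketch under stated assumptions: `CowOsd`, `INITIAL_VERSION = 0`, the `bytearray` representation, and zero padding are all hypothetical. The source is also ambiguous about which version receives the write when a backup already exists; the sketch assumes the initial version object in both branches.

```python
from typing import Dict, Tuple

INITIAL_VERSION = 0  # assumed value of the "initial version number"


class CowOsd:
    """Toy copy-on-write OSD (hypothetical): live data stays in the
    initial version object; the pre-write image is backed up once per version."""

    def __init__(self) -> None:
        self.objects: Dict[Tuple[str, int], bytearray] = {}

    def write_strip(self, object_id: str, version: int,
                    offset: int, strip: bytes) -> None:
        live = self.objects.setdefault((object_id, INITIAL_VERSION), bytearray())
        backup_key = (object_id, version)
        # Not backed up yet: copy the whole initial version object to the
        # location determined by (object ID, version) before overwriting it.
        if version != INITIAL_VERSION and backup_key not in self.objects:
            self.objects[backup_key] = bytearray(live)
        # Write the strip into the initial version object at its offset.
        end = offset + len(strip)
        if len(live) < end:
            live.extend(b"\x00" * (end - len(live)))  # pad a short object
        live[offset:end] = strip


cow = CowOsd()
cow.write_strip("obj-1", 0, 0, b"old!")    # before any snapshot
cow.write_strip("obj-1", 1, 0, b"new!")    # first write after snapshot 1
cow.write_strip("obj-1", 1, 0, b"newer")   # backup already exists
```

After these calls the object stored under version 1 still holds the data as it was when the snapshot was taken, while the initial version object holds the latest data.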
  • An embodiment of the present invention provides a data reading method, including: an object storage device (OSD) receives a strip read request sent by a client server, where the strip read request carries the size of the to-be-read strip, the offset of the to-be-read strip, the version number of the to-be-read strip, and the object ID of the to-be-read strip. The version number of the to-be-read strip corresponds to the snapshot ID of the latest snapshot of the file or volume to which the to-be-read strip belongs, and the object ID of the to-be-read strip is the ID of the object to which the to-be-read strip belongs. The OSD determines whether the strip determined by the object ID, the version number of the to-be-read strip, and the offset of the to-be-read strip has been backed up: if it has, the OSD reads the data determined by the object ID, the version number, the offset, and the size of the to-be-read strip and sends the data read to the client server as the to-be-read strip; if it has not, the OSD searches, one by one in order from latest to earliest, the objects whose object ID is the same as that of the to-be-read strip and whose version number differs from that of the to-be-read strip, until it finds an object that stores valid data at the storage location given by the offset of the to-be-read strip, and sends the valid data found to the client server as the to-be-read strip. The version number of an object corresponds to the snapshot ID of the latest snapshot of the file or volume taken before the object was generated.
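The latest-to-earliest search in this read method can be sketched as below. The class `RowOsd` and its dictionary layout are hypothetical, and larger version numbers are assumed to be later. The source only says the searched objects have a different version number; the sketch additionally assumes that only versions earlier than the requested one are searched, which gives the same result whenever the requested version is the latest.

```python
from typing import Dict, Optional, Tuple


class RowOsd:
    """Toy OSD (hypothetical) where each version holds only the strips
    written under it; reads fall back to earlier versions."""

    def __init__(self) -> None:
        # (object_id, version) -> {strip offset -> strip bytes}
        self.strips: Dict[Tuple[str, int], Dict[int, bytes]] = {}

    def write_strip(self, object_id: str, version: int,
                    offset: int, strip: bytes) -> None:
        self.strips.setdefault((object_id, version), {})[offset] = strip

    def read_strip(self, object_id: str, version: int,
                   offset: int, size: int) -> Optional[bytes]:
        # Candidate versions of the same object, from latest to earliest,
        # starting with the requested version itself.
        versions = sorted(
            (v for (oid, v) in self.strips if oid == object_id and v <= version),
            reverse=True,
        )
        for v in versions:
            data = self.strips[(object_id, v)].get(offset)
            if data is not None:  # valid data found at this offset
                return data[:size]
        return None  # no version holds data at this offset


row = RowOsd()
row.write_strip("obj-3", 0, 0, b"aaaa")
row.write_strip("obj-3", 0, 4, b"bbbb")
row.write_strip("obj-3", 2, 0, b"cccc")  # only offset 0 rewritten under version 2
```

Reading offset 4 at version 2 finds nothing under version 2 and falls back to the strip written under version 0.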
  • An embodiment of the present invention provides a data reading method, including: the OSD receives a strip read request sent by the client server, where the strip read request carries the size of the to-be-read strip, the offset of the to-be-read strip, the version number of the to-be-read strip, and the object ID of the to-be-read strip. The version number of the to-be-read strip corresponds to the snapshot ID of the latest snapshot of the file or volume to which the to-be-read strip belongs, and the object ID of the to-be-read strip is the ID of the object to which the to-be-read strip belongs. The OSD determines whether the object determined by the object ID and the version number of the to-be-read strip has been backed up: if it has, the OSD reads the data determined by the object ID, the version number, the offset, and the size of the to-be-read strip and sends the data read to the client server as the to-be-read strip; if it has not, for the objects whose object ID is the same as that of the to-be-read strip and whose version number differs from that of the to-be-read strip, according to the snapshot time of the object.
  • An embodiment of the present invention provides a data processing apparatus, including: a strip request receiving module, configured to receive a strip write request sent by a client server, where the strip write request carries a to-be-written strip, the version number of the to-be-written strip, the offset of the to-be-written strip, and the object ID of the to-be-written strip, where the version number of the to-be-written strip corresponds to the snapshot ID of the latest snapshot of the file or volume to which the to-be-written strip belongs, the offset of the to-be-written strip describes the position of the strip within the object to which it belongs, and the object ID of the to-be-written strip is the ID of the object to which the strip belongs; and a storage module, configured to write the to-be-written strip to the storage location determined by the object ID, the version number of the to-be-written strip, and the offset of the to-be-written strip.
  • An embodiment of the present invention provides a data processing apparatus, including: a strip request receiving module, configured to receive a strip write request sent by a client server, where the strip write request carries a to-be-written strip, the version number of the to-be-written strip, the offset of the to-be-written strip, and the object ID of the to-be-written strip, where the version number of the to-be-written strip corresponds to the snapshot ID of the latest snapshot of the file or volume to which the to-be-written strip belongs, the offset of the to-be-written strip describes the position of the strip within the object to which it belongs, and the object ID of the to-be-written strip is the ID of the object to which the strip belongs; and a strip storage module, configured to determine whether the object determined by the version number of the to-be-written strip and the object ID has been backed up: if it has, the strip storage module is further configured to write the to-be-written strip to the storage location determined by the object ID, the version number of the to-be-written strip, and the offset of the to-be-written strip; if it has not, the strip storage module is further configured to create a spliced object from the to-be-written strip and then write the spliced object to the storage location determined by the version number of the to-be-written strip and the object ID.
  • An embodiment of the present invention provides a data processing apparatus, including: a strip request receiving module, configured to receive a strip write request sent by a client server, where the strip write request carries a to-be-written strip, the version number of the to-be-written strip, the offset of the to-be-written strip, and the object ID of the to-be-written strip, where the version number of the to-be-written strip corresponds to the snapshot ID of the latest snapshot of the file or volume to which the to-be-written strip belongs, the offset of the to-be-written strip describes the position of the strip within the object to which it belongs, and the object ID of the to-be-written strip is the ID of the object to which the strip belongs; and a strip storage module, configured to determine whether the strip determined by the version number of the to-be-written strip, the object ID of the to-be-written strip, and the offset of the to-be-written strip has been backed up; if it has, to write the to-be-written strip to the storage location determined by the version number of the to-be-written strip, the
  • An embodiment of the present invention provides a data processing apparatus, including: a strip request receiving module, configured to receive a strip write request sent by a client server, where the strip write request carries a to-be-written strip, the version number of the to-be-written strip, the offset of the to-be-written strip, and the object ID of the to-be-written strip, where the version number of the to-be-written strip corresponds to the snapshot ID of the latest snapshot of the file or volume to which the to-be-written strip belongs, the offset of the to-be-written strip describes the position of the to-be-written strip within the object to which it belongs, and the object ID of the to-be-written strip is the ID of the object to which the strip belongs; and a strip storage module, configured to determine whether the object determined by the version number of the to-be-written strip and the object ID has been backed up: if it has, to write the to-be-written strip to the storage location determined by the object ID, the version number of the object, and the offset of the to-be-written strip;
  • An embodiment of the present invention provides a data processing apparatus, including: a strip request receiving module, configured to receive a strip read request sent by a client server, where the strip read request carries the size of the to-be-read strip, the offset of the to-be-read strip, the version number of the to-be-read strip, and the object ID of the to-be-read strip, where the version number of the to-be-read strip corresponds to the snapshot ID of the latest snapshot of the file or volume to which the to-be-read strip belongs, and the object ID of the to-be-read strip is the ID of the object to which the to-be-read strip belongs; and a strip reading module, configured to determine whether the strip determined by the object ID, the version number of the to-be-read strip, and the offset of the to-be-read strip has been backed up: if it has, to read the data determined by the object ID, the version number, the offset, and the size of the to-be-read strip, and to send the data read to the client server as the to-be-read strip;
  • An embodiment of the present invention provides a data processing apparatus, including: a strip request receiving module, configured to receive a strip read request sent by a client server, where the strip read request carries the size of the to-be-read strip, the offset of the to-be-read strip, the version number of the to-be-read strip, and the object ID of the to-be-read strip, where the version number of the to-be-read strip corresponds to the snapshot ID of the latest snapshot of the file or volume to which the to-be-read strip belongs, and the object ID of the to-be-read strip is the ID of the object to which the to-be-read strip belongs; and a strip reading module, configured to determine whether the object determined by the object ID and the version number of the to-be-read strip has been backed up: if it has, to read the data determined by the object ID, the version number, the offset, and the size of the to-be-read strip, and to send the data read to the client server as the to-be-read strip; if it has not, to search, object by object in order of snapshot time from latest to earliest, the objects whose object ID is the same as that of the to-be-read strip and whose version number differs from that of the to-be-read strip, until an object that stores valid data at the storage location given by the offset of the to-be-read strip is found, and to send the valid data found to the client server as the to-be-read strip. The version number of an object corresponds to the snapshot ID of the latest snapshot of the file or volume taken before the object was generated.
  • An embodiment of the present invention provides a data storage system, including a client server and an object storage device. The client server is configured to: receive a file write request, where the file write request carries to-be-written data, the offset of the to-be-written data, and the name of the file, the to-be-written data being a part of the file; obtain the file number (FID) according to the name of the file, query the metadata of the file according to the FID to obtain the version number of the file, and use the version number of the file as the version number of the to-be-written strip, where the version number of the file corresponds to the snapshot ID of the latest snapshot of the file; split the to-be-written data, according to the offset and the size of the to-be-written data, into multiple strips including the to-be-written strip, determine the ID of the object to which the to-be-written strip belongs, and obtain the offset of the to-be-written strip; and create a strip write request and send it to the object storage device. The object storage device is configured to: receive the strip write request, where the strip write request carries the to-be-written strip, the version number of the to-be-written strip, the offset of the to-be-written strip, and the object ID of the to-be-written strip, where the version number of the to-be-written strip corresponds to the snapshot ID of the latest snapshot of the file to which the to-be-written strip belongs, and the offset of the to-be-written strip describes the position of the strip
  • An embodiment of the present invention provides a data storage system, including a client server and an object storage device. The client server is configured to: receive a volume write request, where the volume write request carries to-be-written data, the offset of the to-be-written data, and the ID of the volume, the to-be-written data being a part of the volume; query the metadata of the volume according to the ID of the volume to obtain the version number of the volume, where the version number of the volume corresponds to the snapshot ID of the latest snapshot of the volume; split the to-be-written data, according to the offset and the size of the to-be-written data, into multiple strips including the to-be-written strip, determine the ID of the object to which the to-be-written strip belongs, and obtain the offset of the to-be-written strip; and create a strip write request and send it to the object storage device. The object storage device is configured to: receive the strip write request, the strip write request carrying the to-be-written strip,
  • the object ID of the to-be-written strip is the ID of the object to which the to-be-written strip belongs; the OSD writes the to-be-written strip to the storage location determined by the object ID, the version number of the to-be-written strip, and the offset of the to-be-written strip.
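The client-side step of splitting to-be-written data into strips and deriving each strip's object ID and offset can be sketched as follows. The layout is an assumption: fixed-size strips packed contiguously into fixed-size objects, with the object ID formed from the file or volume ID plus an object index. The actual strip distribution strategies are those of FIG. 3A and FIG. 3B, which are not reproduced here, and `STRIP_SIZE`, `STRIPS_PER_OBJECT`, `StripWriteRequest`, and `split_into_strips` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List

STRIP_SIZE = 4          # bytes per strip (tiny, for demonstration only)
STRIPS_PER_OBJECT = 2   # assumed number of strips per object


@dataclass
class StripWriteRequest:
    object_id: str      # ID of the object the strip belongs to
    version: int        # version number of the file or volume
    strip_offset: int   # position of the strip within its object
    strip: bytes


def split_into_strips(data_id: str, version: int,
                      data_offset: int, data: bytes) -> List[StripWriteRequest]:
    """Cut data at strip boundaries and address each piece."""
    object_size = STRIP_SIZE * STRIPS_PER_OBJECT
    requests: List[StripWriteRequest] = []
    pos, end = data_offset, data_offset + len(data)
    while pos < end:
        # End of the strip containing pos, clipped to the end of the data.
        strip_end = min((pos // STRIP_SIZE + 1) * STRIP_SIZE, end)
        object_index, offset_in_object = divmod(pos, object_size)
        requests.append(StripWriteRequest(
            object_id=f"{data_id}-{object_index}",
            version=version,
            strip_offset=offset_in_object,
            strip=data[pos - data_offset:strip_end - data_offset],
        ))
        pos = strip_end
    return requests


reqs = split_into_strips("fid9", 3, 2, b"abcdefghij")
```

Here ten bytes written at offset 2 produce three requests: a partial strip at offset 2 of object `fid9-0`, a full strip at offset 4 of the same object, and a full strip at offset 0 of object `fid9-1`, all carrying version number 3.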
  • An embodiment of the present invention provides a data storage system, including a client server and an object storage device. The client server is configured to: receive a file write request, where the file write request carries to-be-written data, the offset of the to-be-written data, and the name of the file, the to-be-written data being a part of the file; obtain the file number (FID) according to the name of the file, and query the metadata of the file according to the FID to obtain the version number of the file, where the version number of the file corresponds to the snapshot ID of the latest snapshot of the file;
  • the client server splits the to-be-written data, according to the offset and the size of the to-be-written data, into multiple strips including the to-be-written strip, and determines the to-be-written strip.
  • An embodiment of the present invention provides a data storage system, including a client server and an object storage device. The client server is configured to: receive a volume write request, where the volume write request carries to-be-written data, the offset of the to-be-written data, and the ID of the volume, the to-be-written data being a part of the volume; query the metadata of the volume according to the ID of the volume to obtain the version number of the volume, where the version number of the volume corresponds to the snapshot ID of the latest snapshot of the volume; split the to-be-written data, according to the offset and the size of the to-be-written data, into multiple strips including the to-be-written strip, determine the ID of the object to which the to-be-written strip belongs, and obtain the offset of the to-be-written strip; and create the strip write request and send it to the object storage device. The object storage device is configured to: receive the strip write request, where the strip write request carries the to-be-written strip, the stripe version number to
  • An embodiment of the present invention provides a data reading system, including a client server and an object storage device. The client server is configured to: receive a file read request, where the file read request carries the file name, the size of the to-be-read data, and the offset of the to-be-read data, the to-be-read data being a part of the file; obtain the file number (FID) according to the name of the file, query the metadata of the file according to the FID to obtain the version number of the file, and use the version number of the file as the version number of the to-be-read strip, where the version number of the file corresponds to the snapshot ID of the latest snapshot of the file to which the to-be-read strip belongs; determine, according to the offset and the size of the to-be-read data, the ID of the object to which the to-be-read strip belongs, and obtain the offset of the to-be-read strip; and generate a strip read request and send it. The object storage device is configured to: receive the strip read request, where the strip read request carries the size of
  • an object that stores valid data at the storage location given by the offset of the to-be-read strip, and the valid data found is sent to the client server as the to-be-read strip, where the version number of the object corresponds to the snapshot ID of the latest snapshot of the file or volume taken before the object was generated.
  • An embodiment of the present invention provides a data reading system, including a client server and an object storage device. The client server is configured to: receive a volume read request, where the volume read request carries the volume ID, the size of the to-be-read data, and the offset of the to-be-read data, the to-be-read data being a part of the volume; query the metadata of the volume according to the volume ID to obtain the version number of the volume, and use the version number of the volume as the version number of the to-be-read strip, where the version number of the volume corresponds to the snapshot ID of the latest snapshot of the volume to which the to-be-read strip belongs; determine, according to the offset and the size of the to-be-read data, the ID of the object to which the to-be-read strip belongs, and obtain the offset of the to-be-read strip; and generate a strip read request and send it. The object storage device is configured to: receive the strip read request, where the strip read request carries a stripe size to be
  • an object that stores valid data at the storage location given by the offset of the to-be-read strip, and the valid data found is sent to the client server as the to-be-read strip, where the version number of the object corresponds to the snapshot ID of the latest snapshot of the file or volume taken before the object was generated.
  • The combination of the object ID and the version number is used in place of the object ID of the prior art, which reduces the number of object IDs and thereby the consumption of system resources.
  • FIG. 1 is a block diagram of an object storage system according to an embodiment of the present invention.
  • FIG. 2 is a flow chart of an embodiment of a data processing method of the present invention.
  • FIG. 3A and FIG. 3B are schematic diagrams of a strip distribution strategy according to an embodiment of the present invention.
  • FIG. 4 is a diagram of an embodiment of a ROW-based read strip scheme.
  • FIG. 5 is a diagram of an embodiment of a COW-based read strip scheme.
  • FIG. 6 is a schematic structural diagram of an embodiment of a storage system of the present invention.
  • FIG. 7 is a schematic diagram showing the composition of an embodiment of a storage system of the present invention.
  • FIG. 1 is an architectural diagram of an object-based storage system, which may include a client server 11 and an object storage device 12.
  • the object storage device 12 can provide the client server 11 with a storage service of objects.
  • An object storage device is also referred to as an Object-based Storage Device (OSD).
  • a storage system is constructed based on an object storage device, and each object storage device can have certain intelligence and can automatically manage the distribution of data thereon.
  • the object is the basic unit of data storage in the system. Taking a file as an example, an object is actually a combination of a part of the data of the file and the attribute information of the part of the data.
  • the attribute information is also called Meta Data, and can be defined based on the file.
  • RAID Redundant Arrays of Independent Disks
  • As for the properties of each block in the object, the object maintains its own properties by communicating with the storage system.
  • Every object has an object identifier (ID), which is used to access the object.
  • The OSD has some intelligence: it can have its own CPU, memory, and storage media.
  • the OSD provides different access interfaces than block devices.
  • Figure 1 is exemplified by two OSDs.
  • In practice, the object storage device is usually implemented with a blade architecture.
  • the OSD can provide three functions:
  • the OSD manages objects and stores them in a storage medium such as a disk.
  • the OSD does not provide a block interface access method.
  • When the client requests data, the object ID and offset are used to read and write the data.
  • the OSD optimizes the distribution of locally stored data with its own CPU and memory and supports data prefetching. Since the OSD can intelligently support prefetching of objects, data reading speed can be optimized.
  • the OSD manages the metadata of the objects stored thereon, which is recorded in a data structure called an index node (inode). Metadata usually includes information such as the size of the object, the number of stripes included, and so on. In traditional Network Attached Storage (NAS) systems, this metadata is maintained by the file server.
  • the object storage architecture can manage the metadata with the metadata server, or the main metadata management work in the system can be completed by the OSD, which reduces the overhead of the client.
  • COW stands for Copy On Write (also called Copy On First Write); ROW stands for Redirect On Write (also called Redirect On First Write).
  • Each OSD is responsible for managing the distribution and retrieval of its locally stored data, so 90% of the metadata management work is distributed to the intelligent storage devices and only 10% is performed by the metadata server, which improves the performance of system metadata management.
  • The OSD is a device connected to the network; it contains its own storage media, such as disk or tape, and has enough intelligence to manage locally stored data.
  • the storage server communicates directly with the OSD and accesses the data it stores. Since the OSD is intelligent, there is no need for file server intervention.
  • An object is a combination of data and data attributes.
  • Data attributes can be set according to the needs of the application, including data distribution, quality of service, and so on.
  • The client server 11 may be a server based on a NAS protocol or a server based on a Storage Area Network (SAN) protocol. That is, embodiments of the present invention are applicable to both file systems and block systems.
  • For a file system, the object in the embodiment of the present invention comes from a file: the file is split into multiple fragments, and one fragment together with its attributes, metadata, and the like makes up an object.
  • For a block system, what is split into fragments is a volume.
  • The prior art uses only object IDs to identify objects, so every object must have a unique ID. When the same file is updated multiple times, a large number of object IDs are generated, and recording them requires a large amount of storage space.
  • In the embodiment of the present invention, an object is determined by the combination of the object ID and the version number. When the data of one file is updated multiple times, if the offset range of the updated data is unchanged, the object ID of the updated data can remain the same and only the object version number differs, which reduces the total number of object IDs maintained by the system.
  • The object version number and the snapshot ID have a corresponding relationship. Between two snapshots, regardless of how many times the file data is updated, all objects of the same file use the same version number, so the version number occupies very little storage space.
  • In the prior art, the metadata of the object involved in a modification must also be updated in the metadata stored at the file layer (the volume semantic layer for a block system), so the amount of updated data is large.
  • An access node can access the OSD through the client server. If different access nodes can access the objects involved in a modification, metadata must be synchronized between the nodes. Specifically, after one access node updates the metadata of an object, the other access nodes must update all object IDs of the file in which the modified object is located, and frequent synchronization causes serious expansion of the metadata.
  • With the solution provided by the embodiment of the present invention, the object ID does not need to change; only the version number needs to be updated at the OSD layer, so the amount of updated data is much smaller than in the prior art. Further, the object ID in the embodiment of the present invention is obtained by calculation from the offset.
  • The following describes a flowchart of an embodiment of the data processing method of the present invention, taking a file request as an example. If the terms of the file system are replaced with the corresponding terms of the block system, another embodiment is obtained.
  • the file is replaced with a volume
  • the file metadata is replaced with the volume metadata
  • the file ID is replaced with the volume ID
  • the file version number is replaced with the volume version number
  • the difference is: (1) the volume metadata has another storage location, not stored in the inode; (2) the volume ID can be obtained directly, without using the volume name conversion.
  • Step 20 Create a snapshot.
  • the target of the snapshot is a file or a file system including the file, that is, the target of the snapshot includes a file, and the snapshot ID is assigned to the snapshot.
  • One is to create a snapshot of the file, and the target of the snapshot is a single file.
  • the other is to create a snapshot of the file system, the target of the snapshot is the entire file system, and the file system includes multiple files.
  • In the two cases, file metadata is saved in different locations.
  • In the first way, a snapshot is created for the selected file and a snapshot name is set for it. If the snapshot name is not already in use, a snapshot ID is assigned to the file.
  • The file snapshot ID is saved as metadata of the file in the inode (index node) of the file. Note that the snapshot ID is a mark of the snapshot: for example, the point in time at which the snapshot is created can be used as the snapshot ID, or an incrementing number assigned in order of snapshot creation time can be used.
  • In the second way, a snapshot is created for the selected file system. If the snapshot name is not already in use, a snapshot ID is assigned to the file system. The assigned snapshot ID is then saved in the root inode of the file system. In this way, the snapshot ID of each file in the file system can be considered the same as the snapshot ID of the file system.
  • the difference from the former method is that the snapshot ID of the file is not stored in the inode of the file, but is stored in the root inode of the file system.
  • In addition to the file snapshot ID, the file metadata also includes a file identification (FID). File metadata can also include information such as file size (Size) and write time.
  • Step 20 is a preliminary step and is relatively independent of the other steps of the method embodiment.
  • the embodiment of the present invention mainly describes operations performed by the client server and the OSD after creating a snapshot and before creating the next snapshot.
  • Step 21 The client server receives a file write request, where the file write request carries data to be written, a data offset to be written, and a file name.
  • the data to be written is part of the file.
  • a file write request is a write request that can be recognized by the file system.
  • the file write request may be to create a file, or to update an existing file using the data to be written, and the data to be written is part of the file or all of the file.
  • the file write request may further carry the size of the data to be written, so that the subsequent step splits the data to be written into strips according to the offset of the data to be written. It is also possible not to carry the size of the data to be written, because the size of the data to be written can be obtained by measuring the data to be written.
  • The offset of the data to be written describes the relative position of the data to be written within the file. Specifically, it may describe the distance between the start position of the data to be written and the file header. If the offset is 0, the start position of the data to be written is the start position of the file. If the offset is 1 KB, the start position of the data to be written is 1 KB from the start position of the file.
  • the file write request may further carry a file path of the file write request, where the file path indicates a storage location of the file and the mapping relationship table.
  • the file path and file name together determine a file.
  • For example, if the combination of file path and file name is /root/mma/a1, then /root/mma/ is the file path and a1 is the file name, and the file and the mapping relationship table are stored under the path /root/mma/.
  • File names under the same file path are not duplicated.
  • the write request may further carry a storage location of the mapping relationship table, and the mapping relationship table records the mapping relationship between the file name and the FID.
  • Each snapshot ID has a corresponding file version number, and the snapshot ID and the file version number correspond one-to-one.
  • the change rule of the snapshot ID corresponding to the adjacent snapshot time is the same as the change rule of the file version number corresponding to the adjacent snapshot time.
  • If the write mode set in the client server is ROW, the updated version number is saved in the inode being backed up.
  • If the write mode set in the client server is COW, the updated version number is saved in the inode generated by the backup; optionally, the inode being backed up can also record the updated version number. For example, if inode A is backed up to generate inode B, then inode A is the inode being backed up and inode B is the inode generated by the backup.
  • the snapshot ID is generated in step 20.
  • the file version number and the snapshot ID have a corresponding relationship, and the snapshot ID corresponds to the snapshot time. Therefore, the file version number and the snapshot time can also be considered as corresponding.
  • the corresponding relationship means that each file version number corresponds to a unique snapshot ID, and the file version number changes in a similar manner to the snapshot ID. For example, the larger the snapshot ID, the larger the file version number; or the larger the snapshot ID, the smaller the file version number. Between multiple snapshots, the later the snapshot, the larger the ID of the snapshot.
  • For a block system, a volume is identified by a volume ID instead of a file name.
  • The role of the volume ID is similar to that of the FID.
  • A volume has no concept similar to the file path. Therefore, in step 22, the mapping relationship table no longer needs to be queried; the volume metadata can be queried directly by the volume ID to obtain the volume version number.
  • Step 22 The client server uses the file name to query the mapping relationship table, obtains the file number (FID) of the file in which the data to be written is located, and queries the file metadata according to the FID to obtain the file version number.
  • the mapping relationship table records the mapping relationship between the file name and the FID, and the file name and the FID correspond one-to-one.
  • the storage location of the mapping relationship table can be carried in the file write request and obtained by the client server from the write request.
  • The mapping relationship table may also be pre-stored in the client server, in which case the client server finds the mapping relationship table according to the file path.
  • the mapping relationship table can also be stored in other storage devices.
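  • The lookup in step 22 can be sketched as two table queries. This is only an illustration: the dictionary layout, the field name `version`, and the sample values are assumptions and are not taken from the embodiment.

```python
def lookup_file_version(file_name, mapping_table, file_metadata):
    """Resolve file name -> FID -> file version number (hypothetical layout)."""
    # The mapping relationship table maps each file name to exactly one FID.
    fid = mapping_table[file_name]
    # The file metadata (for example, kept in the inode) holds the version number.
    version = file_metadata[fid]["version"]
    return fid, version


# Hypothetical sample data for the file /root/mma/a1.
mapping_table = {"a1": 1001}
file_metadata = {1001: {"version": 3, "size": 4096}}

fid, version = lookup_file_version("a1", mapping_table, file_metadata)
```

  • For a volume, the first query would be skipped, because the volume ID is available directly.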
  • file metadata can be updated.
  • file metadata can be saved in the inode information.
  • the file path indicates the storage location of the inode: As can be seen from the above, for ROW, the version number is stored in the inode being backed up, so this step reads the inode being backed up. For COW, the version number is saved in the inode generated by the backup. Therefore, this step reads the inode generated by the backup.
  • the file version number and the file snapshot ID have a one-to-one correspondence.
  • After the client server generates the snapshot ID, it generates a file version number that corresponds one-to-one with the snapshot ID.
  • The snapshot ID can be used directly as the file version number, or the file version number can be calculated from the snapshot ID. If a snapshot created later has a larger snapshot ID, one option is: the later the snapshot is created, the larger the file version number. The other option is: the later the snapshot is created, the smaller the file version number.
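  • As a minimal sketch of the two options, assuming integer snapshot IDs that grow over time, the version number could be derived as follows. The constant `MAX_VERSION` and both function names are hypothetical; the embodiment only requires a one-to-one, monotonic correspondence.

```python
MAX_VERSION = 2**32 - 1  # hypothetical upper bound for version numbers


def version_increasing(snapshot_id):
    """Later snapshot -> larger version number (snapshot ID used directly)."""
    return snapshot_id


def version_decreasing(snapshot_id):
    """Later snapshot -> smaller version number (computed from the snapshot ID)."""
    return MAX_VERSION - snapshot_id
```

  • Either mapping keeps the correspondence one-to-one, so a version number can always be traced back to exactly one snapshot ID.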
  • The term "version number of the strip to be written" is sometimes also used.
  • The version number of the strip to be written is the file version number of the file to which the strip to be written belongs. That is, different strips from the same file have the same strip version number.
  • The object version number (or the version number of the object) is likewise the file version number of the file to which the strip to be written belongs. That is, different objects from the same file have the same object version number.
  • Step 23 The client server splits the data to be written into a plurality of strips, including the strip to be written.
  • In this step, the offset of the strip to be written and the ID of the object to which the strip to be written belongs (also referred to as the object ID) are obtained.
  • the client server splits the data into one or more stripes according to the size of the stripe.
  • A strip is data of a certain size. When the data to be written is less than or equal to the size of a single strip, it is split into 1 strip; otherwise, it is split into multiple strips.
  • the strips split from the same file are the same size.
  • Stripe size can be saved in file metadata, in which case different files can use different stripe sizes.
  • Alternatively, the stripe size may not be stored in the metadata of the file to which the object belongs; instead, all files in the file system share one stripe size, which is saved in the root inode of the file system. In this case, different files use the same stripe size.
  • the object can be seen as a container that can hold strips.
  • The strips in this step refer to the data strips split from the data to be written.
  • Optionally, a plurality of check strips are also generated to protect the data strips, in which case the strips in this step include both data strips and check strips.
  • the total number of stripes owned by each object can be saved in the file metadata, in which case the total number of stripes owned by objects of different files can be different.
  • the total number of stripes owned by each object may also not be stored in the metadata of the file to which the object belongs, in which case the total number of stripes owned by objects of different files is the same.
  • the starting position of the data to be written in the file can be known from the data offset to be written.
  • The end position of the data to be written in the file can be known from the offset and the size of the data to be written. If the starting position of the data to be written is not an integer multiple of the stripe size, or the offset of the ending position plus 1 is not an integer multiple of the stripe size, the data to be written is first split according to the stripe size, with the split boundaries at integer multiples of the stripe size. If splitting produces data that is smaller than one strip (this data can also be called the dirty data of the strip), it is padded to form a full strip. Because of this completion operation, the strips and strip offsets mentioned in the subsequent steps refer to the strips and strip offsets after completion, unless otherwise specified.
  • the offset of the data to be written is the relative position of the data to be written in the file.
  • Another completion method is: if the starting position of the data to be written is not an integer multiple of the strip size, or the ending offset plus 1 is not an integer multiple of the strip size, the strips can be completed so that all strips after splitting are the same size and no strip contains a blank. The data already stored in the OSD can be read out and used as the data for completion.
  • For example, the offset of the data to be written ranges from 4 KB to 300 KB, and the stripe size is 256 KB. The data to be written can then be completed to form data with an offset range of 0 KB-511 KB, and split into 2 strips of 0 KB-255 KB and 256 KB-511 KB, so that the size of each strip is 256 KB.
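  • The boundary alignment described above can be sketched as follows. This is an illustrative sketch, not the embodiment's implementation: the function name is hypothetical, offsets are treated as byte positions, and each returned range is half-open (start inclusive, end exclusive).

```python
KB = 1024


def aligned_stripes(data_offset, data_size, stripe_size):
    """Return the (start, end) ranges of the completed stripes that cover
    the data to be written, with every boundary on a stripe-size multiple."""
    if data_size <= 0 or stripe_size <= 0:
        raise ValueError("data_size and stripe_size must be positive")
    # Round the start down and the end up to a multiple of the stripe size.
    start = (data_offset // stripe_size) * stripe_size
    end = -(-(data_offset + data_size) // stripe_size) * stripe_size
    return [(s, s + stripe_size) for s in range(start, end, stripe_size)]


# Data occupying 4 KB up to and including the 300th KB, with 256 KB stripes.
stripes = aligned_stripes(4 * KB, 297 * KB, 256 * KB)
```

  • With these inputs the data is completed to the range 0 KB-512 KB (exclusive) and covered by two 256 KB stripes; the bytes outside the original range would be padded or read back from the OSD, depending on the completion method chosen.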
  • The stripe distribution policy is provided by the file system of the client server. It describes the object to which each strip belongs, that is, the correspondence between strips and objects. Specifically, it may be a correspondence between the offset of the strip and the object.
  • the object ID uniquely marks an object, and the IDs of the objects belonging to the same file are different, and the IDs of the objects of different files are also different.
  • The object ID and the FID of the file to which the object belongs may have a corresponding relationship. That is, from the object ID it is possible to tell which file the object represented by this object ID comes from.
  • an optional object ID generation method is that the object ID is composed of 64-bit binary numbers, wherein the first 32 bits are the ID of the file to which the object belongs, the last 32 bits are given by the client server, and the last 32 bits are unique within the file.
  • Different objects of the same file have different last 32 bits, for example by using the object number within the file.
  • the corresponding FID can be known from the first 32 bits of the object ID.
  • the relationship between the object ID and the volume ID can also be established.
  • Another optional object ID generation method is: the object ID is composed of a 48-bit binary number, where the first 16 bits correspond to the file, and the first 16 bits of different files are different; the last 32 bits are given by the client server and are unique within the file, that is, different objects of the same file have different last 32 bits.
  • The object ID and the FID of the file to which the object belongs may also have no corresponding relationship.
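  • The 64-bit layout described above can be sketched with simple bit operations. The sketch assumes that "first 32 bits" means the high-order 32 bits and that the last 32 bits hold the object number within the file; the function names are hypothetical.

```python
def make_object_id(fid, object_number):
    """Pack a 32-bit FID and a 32-bit in-file object number into a 64-bit ID."""
    if not 0 <= fid < 2**32:
        raise ValueError("fid must fit in 32 bits")
    if not 0 <= object_number < 2**32:
        raise ValueError("object_number must fit in 32 bits")
    return (fid << 32) | object_number


def fid_of(object_id):
    """Recover the FID from the high-order 32 bits of the object ID."""
    return object_id >> 32


def object_number_of(object_id):
    """Recover the in-file object number from the low-order 32 bits."""
    return object_id & 0xFFFFFFFF


oid = make_object_id(7, 2)
```

  • Because the FID is embedded in the ID, objects of different files never collide, and the owning file can be found from the object ID alone.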
  • Figures 3A and 3B illustrate two different strip distribution strategies.
  • the stripe index describes the offset relationship of the stripe in the file.
  • the stripe index is an integer greater than or equal to 0.
  • the smallest stripe index is 0, the second stripe index is 1, and the third stripe index is 2, ..., and so on.
  • Two strips with adjacent index values are also adjacent to each other in the file.
  • An optional stripe distribution strategy is shown in Figure 3A: (1) the size of the objects belonging to the same file is fixed; since the strips of the same file are the same size, this means that different objects hold the same total number of strips; (2) strips are filled into objects in index order, and the next object is filled only after the previous one is full, that is, in the order of the strip offsets in the data to be written, successive strips belong to the same object. As shown in Figure 3A, each object holds a fixed 4 strips.
  • the first object holds the 0th to 3rd strips
  • the second object holds the 4th to 7th strips
  • the third object holds the 8th to 11th strips. Correspondingly, the ID of the first object is 0, the ID of the second object is 1, the ID of the third object is 2, and so on.
  • the strip offset is used to describe the relative position of the strip within the object, and in particular, the relative position of the strip's starting data within the object.
  • Stripe offset = (stripe index % number of stripes in the object) × stripe size.
  • Here % means dividing the preceding term by the following term and taking the remainder. Therefore, the value of "stripe index % number of stripes in the object" is the remainder of the stripe index divided by the number of stripes in the object.
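  • The Figure 3A strategy can be sketched as follows. The stripe offset follows the formula above; the object number is derived by integer division, which is inferred here from the description of objects filled in order with a fixed number of strips and numbered from 0. The function name is hypothetical.

```python
KB = 1024


def locate_sequential(stripe_index, stripes_per_object, stripe_size):
    """Figure 3A style: consecutive stripes fill one object before the next.

    Returns (object_number, stripe_offset_within_object).
    """
    object_number = stripe_index // stripes_per_object
    stripe_offset = (stripe_index % stripes_per_object) * stripe_size
    return object_number, stripe_offset


# Stripe 5 with 4 stripes per object and 256 KB stripes.
location = locate_sequential(5, 4, 256 * KB)
```

  • Here stripe 5 lands in the second object (number 1) as its second strip, at an offset of 256 KB within that object.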
  • Another optional stripe distribution strategy is shown in Figure 3B: (1) the size of the objects in the same file is not fixed, that is, different objects of the same file can have different total numbers of stripes; (2) the total number of objects is fixed, that is, different files have the same number of objects; in Figure 3B there are 3 objects in total.
  • the stripe size is 256KB, and the total number of objects is fixed at 3.
  • the first strip (strip 0) is in the first object (object 0)
  • the second strip (strip 1) is in the second object (object 1), and the third strip (strip 2) is in the third object (object 2)
  • the 4th strip (strip 3) is again in the first object
  • the 5th strip (strip 4) is in the second object.
  • a stripe index is an integer greater than or equal to 0, describing the positional relationship between strips in a file.
  • the offset of each strip within the belonging object can be determined.
  • the object number in the file can be the remainder obtained by dividing the strip index by the total number of objects in the file.
  • Stripe offset = (stripe index / number of objects, rounded down to an integer) × stripe size.
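  • The Figure 3B strategy can be sketched in the same way. The division in the stripe offset formula is taken to be integer division, which is an assumption, although it is the only reading consistent with strip 3 being the second strip of object 0. The function name is hypothetical.

```python
KB = 1024


def locate_round_robin(stripe_index, object_count, stripe_size):
    """Figure 3B style: stripes are dealt to a fixed number of objects in turn.

    Returns (object_number, stripe_offset_within_object).
    """
    object_number = stripe_index % object_count
    stripe_offset = (stripe_index // object_count) * stripe_size
    return object_number, stripe_offset


# Stripe 4 with 3 objects and 256 KB stripes.
location = locate_round_robin(4, 3, 256 * KB)
```

  • Stripe 4 lands in object 1 at an offset of 256 KB, matching the description that the 5th strip is in the second object.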
  • The stripe index can be determined from the offset of the data to be written. For example, with respect to the entire file, the first split strip (strip 0) is located in the first object; if the offset of the data to be written falls in the 5th strip of the file (strip 4, which is in object 1), then among the strips generated by splitting the data to be written, the index of the first strip is 4, and the indexes of the remaining strips follow in sequence.
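  • A minimal sketch of deriving stripe indexes from the offset of the data to be written, assuming byte offsets and a fixed stripe size for the file; the function name is hypothetical.

```python
KB = 1024


def stripe_indexes(data_offset, data_size, stripe_size):
    """Return the file-wide indexes of all stripes touched by the write."""
    if data_size <= 0:
        raise ValueError("data_size must be positive")
    first = data_offset // stripe_size
    last = (data_offset + data_size - 1) // stripe_size
    return list(range(first, last + 1))


# A 300 KB write that starts 10 bytes into the 5th strip (strip 4).
indexes = stripe_indexes(4 * 256 * KB + 10, 300 * KB, 256 * KB)
```

  • The first index is 4 and the write spills into strip 5; each index can then be passed to the chosen distribution strategy to find the object and the stripe offset.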
  • The above describes two schemes for calculating the ID of the object to which a strip belongs. Depending on the stripe distribution strategy, there are other implementations. Different distribution strategies can use different parameters, and these parameters can usually be obtained from the client server.
  • Step 24 The client server selects an OSD for storing the strip to be written.
  • this step can be performed by the object storage client of the client server.
  • An optional algorithm is to determine the OSD storing the strip to be written according to the FID of the strip to be written. For example, the hash value of the FID is divided by the total number of OSDs, and the remainder is used as the number of the OSD storing the strip to be written. That is, the hash value of the FID is modulo the total number of OSDs.
  • The OSD storing the strip may also be determined according to both the FID of the strip to be written and the object ID.
  • The algorithm can be chosen freely, as long as it selects an OSD.
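  • One possible selection algorithm is sketched below: the hash of the FID (optionally combined with the object ID) is taken modulo the total number of OSDs. The use of SHA-256 and the key format are assumptions made for the example; the embodiment does not specify a hash function.

```python
import hashlib


def select_osd(fid, osd_count, object_id=None):
    """Pick an OSD number in [0, osd_count) from the FID, or FID + object ID."""
    if osd_count <= 0:
        raise ValueError("osd_count must be positive")
    key = str(fid) if object_id is None else f"{fid}:{object_id}"
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an integer and take it modulo the OSD count.
    return int.from_bytes(digest[:8], "big") % osd_count


osd_number = select_osd(1001, 2)
```

  • Because the hash is deterministic, the same FID always maps to the same OSD, so a later read can locate the strip without any lookup table.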
  • Step 25 The client server sends a stripe write request to the OSD. The stripe write request carries the strip to be written, the version number of the strip to be written, the offset of the strip to be written, and the ID of the object to which the strip to be written belongs. Optionally, it may also carry the size of the strip to be written.
  • The write mode may also be sent, so that the OSD writes the strip to be written according to the write mode specified by the client server.
  • The write mode is ROW or COW. If the OSD supports only one write mode, the write mode need not be sent to the OSD.
  • Step 26 The OSD receives the stripe write request and writes the strip to be written to the storage medium of the OSD.
  • If the OSD supports only one write mode, it can write the strip to be written to the storage medium directly in the default write mode, without confirming whether the write mode is ROW or COW.
  • the OSD temporarily stores the data in the cache. This step stores the data to be written in the cache to the storage medium.
  • The backup status of the data is marked in the OSD, and the object ID can be used as an index to query the backup mark granularity of the data from the OSD. It is also possible to set all strips received by the OSD to be stored, by default, at the same backup mark granularity. Strips belonging to the same file use the same backup mark granularity. An actual device may support only the object as the backup mark granularity, or only the stripe; in this case, the OSD can store data directly without querying the backup mark granularity.
  • The two parameters of object ID and version number jointly determine one object; in this embodiment, the set of these two parameters is referred to as the object key parameters. Once the object is determined, the strip offset determines the strip. That is, the three parameters of object ID, version number, and stripe offset jointly determine one stripe, so the set of these three parameters is called the stripe key parameters.
  • The object key parameters can point to a storage location for storing the object. Specifically, they can point to a starting address for the object to use or, optionally, to an address segment for the object to use. Similarly, the stripe key parameters can point to a starting address or an address segment for storing the strip. The starting address and the address segment can be physical addresses or logical addresses.
  • In one case, before receiving the stripe write request, the OSD has already recorded the key parameters carried in the stripe write request and has allocated a storage location for the strip represented by this set of key parameters.
  • In the other case, the OSD has not recorded this set of key parameters and has not allocated a storage location for the strip represented by them. After the OSD receives the stripe write request, it allocates a storage location for this set of key parameters.
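  • The role of the key parameters can be sketched with a small lookup table that allocates a location the first time a key is seen. The class names and the simple bump allocator are assumptions made for illustration; a real OSD would manage its storage medium differently.

```python
from typing import NamedTuple


class StripeKey(NamedTuple):
    """Stripe key parameters: together they identify exactly one stripe."""
    object_id: int
    version: int
    stripe_offset: int


class StripeTable:
    """Maps stripe key parameters to a starting address (hypothetical)."""

    def __init__(self, stripe_size):
        self.stripe_size = stripe_size
        self._locations = {}
        self._next_free = 0

    def locate(self, key):
        # Allocate a storage location only if this key has not been recorded.
        if key not in self._locations:
            self._locations[key] = self._next_free
            self._next_free += self.stripe_size
        return self._locations[key]


table = StripeTable(256 * 1024)
addr_v3 = table.locate(StripeKey(object_id=1, version=3, stripe_offset=0))
addr_v4 = table.locate(StripeKey(object_id=1, version=4, stripe_offset=0))
```

  • The same object ID with two different version numbers resolves to two different locations, which is what allows the object ID to stay unchanged across snapshots.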
  • Object set: a collection of objects with the same object ID and different version numbers.
  • the object set includes at least one object.
  • An object set can be a logical concept without real division.
  • Object ID: determined by the offset range, within the file, of the data carried in the object. If the same file has been snapshotted multiple times, and changed data is stored to the OSD after each snapshot, then changed data with the same offset has the same object ID.
  • the object or strip is marked for backup.
  • the backup markup granularity can be a stripe or an object. If the smallest unit to be marked is a stripe, the backup markup granularity is a stripe. If the smallest unit to be marked is an object, the granularity of the backup markup is the object.
  • Backup mark of an object: indicates whether the object identified by object ID + version number has been backed up, that is, whether the object corresponding to the object ID has been backed up after the snapshot corresponding to the version number was created. 1 means that it has been backed up, and 0 means that it has not. A backup mark of 0 covers two cases: in one case, the object determined by object ID + version number has been modified but the backup operation has not yet been performed; in the other case, the object determined by object ID + version number has not been modified.
  • Backup mark of a stripe: indicates whether the strip determined by object ID + version number + stripe offset has been backed up, that is, whether the strip corresponding to object ID + stripe offset has been backed up after the snapshot corresponding to the version number was created. 1 means that it has been backed up, and 0 means that it has not. A backup mark of 0 covers two cases: in one case, the strip determined by object ID + version number + stripe offset has been modified but the backup operation has not yet been performed; in the other case, the strip determined by object ID + version number + stripe offset has not been modified.
  • There are four possibilities for how the OSD writes the strip to be written: (1) the write mode is ROW and the backup mark granularity is the stripe; (2) the write mode is ROW and the backup mark granularity is the object; (3) the write mode is COW and the backup mark granularity is the stripe; (4) the write mode is COW and the backup mark granularity is the object.
  • Method 1 For ROW, the backup mark granularity of the data in the OSD is stripe.
  • The strip to be written is written directly into the OSD at the storage location determined by the stripe key parameters in the stripe write request.
  • this step can also mark the storage location (starting storage address or address segment) occupied by the written stripe as "written valid data".
  • the storage location occupied by the stripe stored in the storage medium of the OSD is also referred to as stripe space.
  • Bits can be used to mark the individual strip spaces within the object. For example, the flag of the storage location of this strip is set to 1, where 1 indicates that data has been written and 0 indicates that no data has been written.
  • A strip index can be used to describe the order of the strips in the object, with one bit marking each strip space within the object. For example, if there are 4 strip spaces in total, then 0000 means that no data is written in any of the four strip spaces; 0010 means that only the 2nd strip space holds written data; 0101 means that the 1st and 3rd strip spaces hold written data and the 2nd and 4th strip spaces do not.
  • The Nth (N is a natural number) strip space described in this embodiment refers to the relative position of the strip space within the object to which the strip belongs, and does not refer to the strip index.
  • The position of a strip space can be derived from the offset of the strip, that is, the offset of the strip within the object. If the strip space has already been marked as "backed up", this step can be repeated without changing the mark.
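  • The bit marking described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names are hypothetical, the rightmost bit is assumed to stand for the 1st strip space (which matches the 0010 and 0101 examples), and converting an offset to a strip space number assumes fixed-size strips.

```python
def mark_written(bitmap: int, n: int) -> int:
    """Return the bitmap with the Nth strip space (1-based) marked as written."""
    return bitmap | (1 << (n - 1))


def is_written(bitmap: int, n: int) -> bool:
    """Report whether the Nth strip space (1-based) is marked as written."""
    return bool((bitmap >> (n - 1)) & 1)


def strip_space_number(offset: int, strip_size: int) -> int:
    """Convert a strip's offset within its object to a 1-based strip space number.

    Assumes every strip in the object has the same size (an assumption of this sketch).
    """
    return offset // strip_size + 1


def show(bitmap: int, total: int) -> str:
    """Render the bitmap as a string of `total` bits, 1st strip space rightmost."""
    return format(bitmap, f"0{total}b")


bitmap = mark_written(0, 2)
print(show(bitmap, 4))  # 0010
```

  • Marking an already marked strip space again leaves the bitmap unchanged, which matches the remark that the step can be repeated safely.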
  • Method 2: the write mode is ROW, and the backup mark granularity of the data in the OSD is the object.
  • Method 2 differs from Method 1 in the granularity of the backup mark: instead of checking the identification bit of the strip, the OSD checks the identification bit of the object.
  • the key parameters of the object carried in the stripe write request are queried in the write record of the OSD, and it is determined whether the storage location pointed to by the key parameter of the object stores valid data.
  • This embodiment can use a flag bit to determine whether a storage location stores valid data. For example, a flag bit of 1 indicates that valid data is stored, and a flag bit of 0 indicates that the storage location does not store valid data.
  • From the identification bit of the storage location pointed to by the key parameters of the object, it can be determined whether the stripe write request received this time is the first write operation to the object after the snapshot was created. For example, a flag of 0, or a flag that cannot be found, indicates that this is the first write after the snapshot; a flag of 1 indicates that it is not the first write after the snapshot.
  • If it is not the first write, the strip to be written is written directly into the storage location occupied by the object; the specific write position can be determined by the strip key parameters.
  • If the stripe write request is the first write request to this object after the snapshot, the strip to be written is spliced together with strips obtained from another object in the OSD into a complete object, called a spliced object. Specifically, the remaining strips come from the object that has the largest version number (but smaller than the version number carried in the stripe write request) among the objects that hold valid data.
  • That is, the object with the largest version number is selected, the strips whose offsets differ from that of the strip to be written are obtained from it, and these strips together with the strip to be written constitute the spliced object.
  • The set of objects stored in the OSD that have the same object ID as the strip to be written but different version numbers is referred to as the object set of the object ID of the strip to be written.
  • When the write mode is ROW, the later the snapshot time, the larger the corresponding version number of the object.
  • the object ID of the strip to be written is the ID of the object to which the strip to be written belongs.
  • For example, suppose each object consists of 32 strips and the strip to be written by the OSD is the 15th. The remaining 31 strips, that is, the 1st to 14th and the 16th to 32nd, come from the object with the same object ID that holds valid data recorded in the OSD after the previous snapshot.
  • The identification bit of this object is then recorded as backed up, for example by setting the flag bit to 1. This means that the first stripe write operation after the snapshot has been completed: if any strip in this object is written again before the next snapshot, that write is no longer the first write to the object after the snapshot, so the strip can be written directly without a backup operation.
  • Multiple objects may share the same object ID, one for each snapshot ID.
  • These objects are written into the OSD at different times. Objects adjacent in write time have adjacent version numbers, and the later the write time, the larger the version number.
  • the newly written object becomes a new member in the object set.
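  • The first-write handling of Method 2 can be sketched as follows. This is only an illustration under assumptions: objects are modelled as Python lists of strips keyed by (object ID, version number), the flag bits live in a separate dictionary, and the function name is hypothetical.

```python
def row_write_object_granularity(store, flags, object_id, version,
                                 strip_index, strip, strips_per_object):
    """ROW write with an object-level backup mark (illustrative sketch).

    store: {(object_id, version): [strip, ...]}
    flags: {(object_id, version): 1} when the object holds valid data
    """
    key = (object_id, version)
    if flags.get(key, 0) == 1:
        # Not the first write after the snapshot: write the strip in place.
        store[key][strip_index] = strip
        return
    # First write after the snapshot: find the object with the largest
    # version number below the requested one that holds valid data.
    older = [v for (oid, v) in store
             if oid == object_id and v < version and flags.get((oid, v), 0) == 1]
    if older:
        spliced = list(store[(object_id, max(older))])
    else:
        spliced = [None] * strips_per_object
    # Splice the strip to be written with the strips of the older object.
    spliced[strip_index] = strip
    store[key] = spliced   # the spliced object joins the object set
    flags[key] = 1         # later writes to this version skip the splicing
```

  • The older object is copied rather than modified, so the data seen by the earlier snapshot stays intact.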
  • Method 3: the write mode is COW, and the backup mark granularity of the data is the stripe.
  • A storage location can be determined by the key parameters of the object in the stripe write request plus the stripe offset. First, the OSD detects whether the storage location determined by the key parameters of the strip to be written already stores data. If the result is no, or no record is found, the current write request is the first write request after the snapshot was created, and a backup operation must be performed before the strip to be written is written.
  • The latest strip that the OSD has stored is the strip most recently sent by the client server. In this embodiment, it is the strip already stored in the OSD whose object ID is that of the strip to be written, whose version number is 0, and whose offset is the offset of the strip to be written. When stripe write requests are received subsequently, the strip to be written can be written directly without making a backup.
  • the latest object that the OSD has stored always uses the same version number, for example, using 0 or NULL as the version number, which is called the initial version number in this embodiment.
  • the strip version number used when writing data to the OSD before the first snapshot of the file is the initial version number.
  • the value of the initial version number can be either 0 or NULL.
  • If the storage location pointed to by the stripe key parameters carried in the stripe write request already stores data, the OSD has received a COW write request for this offset again. The data no longer needs to be migrated, and the received strip is written into the object with version number 0 in an overwrite manner.
  • the strip to be written is written to the storage location determined by the stripe object ID to be written, the initial version number, and the stripe offset to be written.
  • This step can also mark the storage location of the strip to be written as "written valid data".
  • For the marking method, refer to Method 1.
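  • The flow of this method can be sketched as follows. It is a simplified illustration: strips are modelled as dictionary entries keyed by (object ID, version number, offset), the set `backed_up` stands in for the marks, the initial version number is taken to be 0, and the function name is hypothetical.

```python
INITIAL_VERSION = 0  # assumed value of the initial version number


def cow_write_stripe_granularity(stripes, backed_up, object_id, version,
                                 offset, data):
    """COW write with a stripe-level backup mark (illustrative sketch).

    stripes:   {(object_id, version, offset): bytes}
    backed_up: set of (object_id, version, offset) keys already handled
    """
    snap_key = (object_id, version, offset)
    live_key = (object_id, INITIAL_VERSION, offset)
    if snap_key not in backed_up:
        # First write to this strip after the snapshot: copy the latest
        # strip to the location named by the request before overwriting it.
        if live_key in stripes:
            stripes[snap_key] = stripes[live_key]
        backed_up.add(snap_key)
    # The new data always overwrites the strip of the initial-version object.
    stripes[live_key] = data
```

  • Only the first write after a snapshot pays for the copy; later writes to the same offset go straight to the initial-version object.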
  • Method 4: the write mode is COW, and the backup mark granularity of the data in the OSD is the object.
  • The difference between Method 4 and Method 3 is that the granularity of the backup mark changes from the stripe to the object, and the granularity of the backup also changes from the stripe to the object.
  • A storage location can be determined by the key parameters of the object in the stripe write request.
  • The OSD uses the key parameters of the object to query its write record and determines whether the storage location pointed to by the key parameters of the object to be written stores valid data.
  • This embodiment may mark the object with an identification bit. For example, an identification bit of 1 indicates that valid data is stored; an identification bit of 0, or the absence of an identification bit for the object key parameters in the write record of the OSD, indicates that no valid data is stored.
  • The result determines whether a backup of the object data is required. Specifically, if valid data is stored, the object determined by the object ID of the strip to be written and the version number of the strip to be written has already been backed up after the snapshot was created, and no backup is needed. If no valid data is stored, or no record of the object key parameters in the stripe write request is found in the OSD, the backup must be performed in this step before the strip to be written in the received stripe write request can be written.
  • If no backup is needed, the strip to be written is written directly to the position determined by the object ID of the strip to be written, version number 0, and the offset of the strip to be written.
  • If a backup is needed, all strips in the version 0 object are first backed up to the storage location pointed to by the object key parameters in the stripe write request. After the backup is complete, the storage location pointed to by the object key parameters in the stripe write request is marked as 1. The strip to be written is then written to the storage location originally occupied by the version 0 object. The write position is determined by the object ID of the strip to be written, the initial version number, and the offset of the strip to be written.
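  • With the object as the backup granularity, the same flow copies the whole object once. A minimal sketch, assuming objects are lists of strips keyed by (object ID, version number), that the initial-version object already exists, and that the initial version number is 0; the function name is hypothetical.

```python
INITIAL_VERSION = 0  # assumed value of the initial version number


def cow_write_object_granularity(objects, flags, object_id, version,
                                 strip_index, strip):
    """COW write with an object-level backup mark (illustrative sketch).

    objects: {(object_id, version): [strip, ...]}
    flags:   {(object_id, version): 1} once the object has been backed up
    """
    snap_key = (object_id, version)
    live_key = (object_id, INITIAL_VERSION)
    if flags.get(snap_key, 0) != 1:
        # First write to the object after the snapshot: back up every strip
        # of the initial-version object, then set the mark to 1.
        objects[snap_key] = list(objects[live_key])
        flags[snap_key] = 1
    # Write the new strip into the initial-version object.
    objects[live_key][strip_index] = strip
```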
  • step 26 the OSD sends a response message that the strip to be written is successfully stored to the client server.
  • step 26 is performed before the next snapshot occurs. That is to say, steps 21-26 are performed after the first snapshot and before the next snapshot. Steps 21-26 are the flow of writing the strip to be written to the OSD. The following describes how to read out the strips that have been written to the OSD. The read and write processes are relatively independent.
  • Step 27 The client server receives the file read request, where the file read request carries the file name, the size of the data to be read, and the offset of the data to be read.
  • A file read request can also carry a file path.
  • the file path records the storage location of the mapping table.
  • Together, the file path and the file name uniquely identify a file.
  • A file read request is a read request that can be recognized by the file system.
  • The file read request asks to read out a complete file or part of the data of a file.
  • The data offset to be read describes the relative position of the data to be read in the file. Specifically, it may describe the distance of the start position of the data to be read from the file header. If the data offset to be read is 0, the start position of the data to be read is the start position of the file. If the data offset to be read is 2 KB, the start position of the data to be read is 2 KB from the start position of the file.
  • the file read request may further carry a file path, where the file path records the storage location of the mapping relationship table. See step 21 for details on the mapping table.
  • The file name may be the file name of the file where the data to be read is located, or the file name of a snapshot of that file. In the former case, the file read request wants to access the latest version of the data to be read; in the latter case, it wants to access the data to be read as of a certain snapshot.
  • Step 28 The client server uses the file name to query the mapping relationship table, obtains the FID of the file where the data to be read is located, and queries the file metadata according to the FID to obtain the file version number.
  • If the file name is that of the file in which the data to be read is located, the file path under which the mapping relationship table is stored is the file path of that file, and the file version number is obtained by querying the metadata corresponding to the file according to the FID.
  • If the file name is that of a snapshot, the file path of the mapping relationship table is the path where the snapshot file is located, and the file version number is obtained by querying the metadata of the snapshot file according to the FID.
  • the mapping relationship table records the mapping relationship between the file name and the FID, and the file name and the FID correspond one-to-one. Refer to Step 21 and Step 22 for the description of the FID and the relationship between the FID and the file version number.
  • The storage location of the mapping relationship table can be carried in the file read request and obtained by the client server from the read request.
  • The mapping relationship table can also be pre-stored in the client server, in which case the client server finds the mapping relationship table according to the file path.
  • the mapping relationship table can also be stored in other storage devices.
  • Metadata may be stored in the inode of the file or in the root inode of the file system.
  • the snapshot ID and the file version number have a one-to-one relationship, so the client server can obtain the file version number according to the snapshot ID. This correspondence can be stored in the file metadata.
  • The client server converts the file read request into a plurality of read requests including a stripe read request.
  • Each stripe read request is used to request the reading of a stripe
  • the stripe read request is used to request the OSD to read out the strip to be read.
  • the stripe read request carries: the version number of the strip to be read, the offset of the strip to be read, the size of the strip to be read, and the object ID of the strip to be read.
  • The offset of each strip to be read can be derived from the size of the data to be read and the offset of the data to be read.
  • As in the method for generating strips on the write path, the data is divided into strips according to the strip size, using the offset and the length of the data, and the offset of each strip to be read is obtained from this division.
  • this step can also obtain the offset of each strip to be read by the strip size, the data offset to be read, and the length of the data to be read.
  • the stripe size can come from a file inode, in which case different strip sizes can be used for different files. It is also possible to share a stripe size for all files in the entire system.
  • The ID of the object to be read can be obtained in the same manner as in step 23. It should be noted that, regardless of whether the file name is the file name of the file in which the data to be read is located or the file name of a snapshot, the FID used to look up the object ID for the read request is the FID of the file in which the data to be read is located.
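  • How a byte range is cut into per-strip reads can be sketched as follows. This is an illustration only: the class and function names are hypothetical, and the sketch works with strip positions relative to the file. Mapping each strip to an object ID and an offset within that object depends on the stripe distribution strategy of step 23, which is not modelled here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StripeRead:
    stripe_index: int  # 0-based position of the strip within the file
    skip: int          # bytes to skip at the start of that strip
    length: int        # bytes wanted from that strip


def stripes_for_range(offset: int, size: int, stripe_size: int) -> list:
    """Split the byte range [offset, offset + size) into per-strip reads."""
    reads = []
    pos, end = offset, offset + size
    while pos < end:
        index = pos // stripe_size
        skip = pos % stripe_size
        length = min(stripe_size - skip, end - pos)
        reads.append(StripeRead(index, skip, length))
        pos += length
    return reads


# Reading 8 KB starting 2 KB into a file with 4 KB strips touches 3 strips.
for read in stripes_for_range(2048, 8192, 4096):
    print(read)
```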
  • Step 30 The client server selects the OSD to which the stripe read request is sent.
  • this step can be performed by the object storage client of the client server.
  • Stripe read requests and stripe write requests for the same stripe must correspond to the same OSD.
  • One possible solution is to use the same OSD selection algorithm as step 24.
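  • The OSD selection algorithm of step 24 is not reproduced in this part of the description. As an assumed stand-in, a deterministic hash of the object ID modulo the number of OSDs shows why using one algorithm on both paths sends reads and writes for the same strip to the same OSD. The function name and the choice of SHA-256 are hypothetical.

```python
import hashlib


def select_osd(object_id: str, osd_count: int) -> int:
    """Pick an OSD index from the object ID alone (assumed selection rule)."""
    digest = hashlib.sha256(object_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % osd_count


# The write path and the read path compute the same index for the same object.
write_target = select_osd("object-17", 8)
read_target = select_osd("object-17", 8)
print(write_target == read_target)  # True
```

  • A plain modulo scheme remaps most objects when the number of OSDs changes; the description does not say how such changes are handled.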
  • Step 31 The client server sends a stripe read request to the OSD selected in step 30.
  • the version number of the strip to be read is actually the version number of the file to which the strip to be read belongs.
  • The write mode can also be sent to the OSD; it is consistent with the write mode in the stripe write request of step 25.
  • the object ID of the strip to be read is the ID of the object to which the strip to be read belongs.
  • Step 32 The OSD receives the stripe read request, searches for the storage location of the strip to be read, and sends the strip to be read to the client server.
  • The storage location of the strip to be read may be the start address of the strip to be read. Data of one strip size is read starting from the start address, and the data read out is the strip to be read.
  • As described in step 26, there are several ways in which the strip may have been written. The OSD therefore reads out the strip to be read in the corresponding manner; the cases are described separately below.
  • the method of determining whether the stripe/object has been backed up may use the identifier bit introduced in step 26. For example, if the flag bit is 1 to indicate that it has been backed up, the flag is 0 to indicate that it is not backed up.
  • If the version number carried in the stripe read request is the initial version number, the strip to be read is read differently from the other cases. This is equivalent to treating the initial version number as the maximum version number (even if the value of the initial version number is 0). For example, the version number 0 introduced in step 26 is already the largest version number, so it is not necessary to determine whether the strip determined by the key parameters of the strip to be read has been backed up; the data in the storage location is read out directly as the strip to be read and sent to the client server.
  • Apart from this special case, the remaining situations fall into the following two ways of reading the strip to be read.
  • the backup mark granularity of the data in the OSD is stripe.
  • Step 26 Determine whether the strip determined by the key parameters carried in the stripe read request has been backed up. In other words, determine whether the strip at the storage location determined by the object ID of the strip to be read, the version number of the strip to be read, and the offset of the strip to be read has been backed up. In this step, the offset of the strip to be read can be converted into the number of the strip to be read in the object to which it belongs. See Method 1 of Step 26 for the conversion method.
  • If it has been backed up, the strip determined by the object ID of the strip to be read, the version number of the strip to be read, and the offset of the strip to be read is read out and sent to the client server as the strip to be read.
  • If it has not been backed up, the offset of the strip to be read is used to search object by object, in order of object snapshot time from late to early, until a strip marked as backed up is found; the found strip is sent to the client server as the strip to be read.
  • the snapshot time of the object refers to the time of the last snapshot of the file or the file system containing the file before the object is generated.
  • the search is performed from late to early. Specifically, for ROW, the search is performed one by one according to the version number from the largest to the smallest; for the COW, the version number is searched one by one from small to large.
  • This order assumes that the later the snapshot time point, the larger the version number of the strip to be read. If version numbers are assigned the other way round, the search order in this step is reversed.
  • the backup mark granularity of the data in the OSD is the object.
  • If the object has been backed up, the valid data determined by the object ID of the strip to be read, the version number of the strip to be read, and the offset of the strip to be read is read out and sent to the client server as the data to be read.
  • Otherwise, the objects in the object set are searched one by one according to the snapshot version number, in order from small to large, until an object holding valid data is found.
  • The strip to be read is then read out of the found object according to the offset of the strip to be read and sent to the client server.
  • file A is composed of object 1, object 2, and object 3. After these objects are first stored in the OSD, their version number is 0.
  • object 1.0 represents an object whose object ID is 1 and whose version number is 0.
  • object 3.2 represents an object whose object ID is 3 and whose version number is 2.
  • the solid object indicates that the object has a backup, and the dotted object has no backup.
  • The object set in which object 1.0 is located includes object 1.0 and object 1.3.
  • The object set in which object 2.0 is located includes object 2.0 and object 2.1.
  • The object set in which object 3.0 is located includes object 3.0, object 3.1, and object 3.2.
  • The direction of the arrows in Figure 4 marks the lookup relationship between the objects. If the stripe read request wishes to read a strip in object 1.2, then, as shown in the figure, object 1.2 is not backed up; searching in version-number order finds that object 1.0 has a backup, so the strip in object 1.0 is read. For the same reason, if the stripe read request wishes to read a strip in object 2.2 or object 2.3, the data in object 2.1 is actually read. Of course, if the stripe read request wishes to read the data in object 1.3, object 2.1, or object 3.2, these objects have been backed up and can be read directly.
  • FIG. 5 is a COW-based read strip scheme. The difference from FIG. 4 is that the search order is reversed, and the search is performed in the order of the version numbers from small to large.
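  • The two search orders can be sketched as follows. The ROW function reproduces the lookups described for FIG. 4. The COW function is an interpretation: it assumes the search starts at the requested version, proceeds through larger version numbers, and finally falls back to the initial version number (taken to be 0), which is treated as the maximum. The function names are hypothetical.

```python
INITIAL_VERSION = 0  # assumed value of the initial version number


def find_row(backed_up_versions, wanted):
    """ROW: search from the wanted version towards smaller version numbers."""
    candidates = [v for v in backed_up_versions if v <= wanted]
    return max(candidates) if candidates else None


def find_cow(backed_up_versions, wanted):
    """COW: search towards larger version numbers, initial version last."""
    if wanted == INITIAL_VERSION:
        return INITIAL_VERSION
    candidates = [v for v in backed_up_versions
                  if v >= wanted and v != INITIAL_VERSION]
    return min(candidates) if candidates else INITIAL_VERSION


# Object sets from the FIG. 4 description: object ID -> backed-up versions.
object_sets = {1: {0, 3}, 2: {0, 1}, 3: {0, 1, 2}}
print(find_row(object_sets[1], 2))  # 0: a read of object 1.2 is served by object 1.0
print(find_row(object_sets[2], 3))  # 1: a read of object 2.3 is served by object 2.1
```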
  • When the backup mark granularity is the stripe, the principle is similar to that of FIG. 4 and FIG. 5, except that the target marked by the backup mark is a strip in the object instead of the object.
  • The client server receives the data returned for the stripe read request and for the other read requests, and splices the returned data together to generate the data to be read.
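  • The splicing step can be sketched as follows, assuming each response is tagged with the position of its data within the requested range so that responses arriving out of order can be put back in sequence. The function name and the tuple layout are hypothetical.

```python
def splice(parts):
    """Join returned pieces in order of their position in the requested data.

    parts: iterable of (position_in_requested_data, bytes), in any order.
    """
    ordered = sorted(parts, key=lambda part: part[0])
    return b"".join(chunk for _, chunk in ordered)


print(splice([(4, b"efgh"), (0, b"abcd"), (8, b"ij")]))  # b'abcdefghij'
```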
  • FIG. 6 shows the hardware that performs the above method.
  • the interface 413 of the client server 41 is connected to the interface 423 of the object storage device 42.
  • the client server 41 is composed of a processor 411, a storage medium 412, and an interface 413.
  • the processor 411 is connected to the storage medium 412 and the interface 413.
  • the storage medium 412 is, for example, a memory in which a computer program is stored.
  • the processor 411 runs a program in the storage medium 412 to perform the steps performed by the client server in the above method.
  • Interface 413 provides an interface to the OSD, such as sending a read strip request or writing a strip request to the OSD.
  • The client server 41 need not be equipped with persistent storage; that is, all the information required by the client server 41 in the above method may be recorded in the volatile storage medium 412 of the client server 41.
  • the OSD 42 includes a processor 421, a storage medium 422, an interface 423, and a hard disk 424.
  • the processor 421 is connected to the storage medium 422 and the interface 423; the hard disk 424 is connected to the storage medium 422.
  • Storage medium 422 can be a volatile medium, such as a memory, in which a computer program is stored.
  • The processor 421 runs a program in the storage medium 422 to perform the steps performed by the OSD in the above method.
  • Interface 423 provides an interface to the client server, for example for receiving stripe read requests or stripe write requests from the client server.
  • Hard disk 424, typically a non-volatile storage medium, provides persistent storage for strips: it provides physical storage space for strips/objects to be written and stores the strips/objects to be read.
  • The hard disk 424 can be replaced by a medium such as a flash memory or a rewritable optical disk.
  • Referring to FIG. 7, there is shown a block diagram of a data processing system in accordance with an embodiment of the present invention.
  • the data processing system is composed of a client service device 51 and an object storage device 52.
  • the client service device 51 may be a physical device such as a server, or may be a virtual module implemented by software running on the server;
  • The object storage device 52 may be a physical device such as an object storage device, or may be a virtual module implemented by software running on the object storage device.
  • The client service device 51 can be configured to perform the steps performed by the client server in the above method; the object storage device 52 can be configured to perform the steps performed by the object storage device in the above method.
  • the client service device 51 includes a stripe request generating module 511 and a stripe request sending module 512 connected to the stripe request generating module 511.
  • the snapshot module 513 connected to the stripe request generating module 511 may also be included.
  • The object storage device 52 includes a stripe request receiving module 521, and a stripe storage module 522 and a stripe reading module 523 connected to the stripe request receiving module 521.
  • The stripe reading module is not required to implement the function of storing strips.
  • The stripe storage module is not required to implement the function of reading strips.
  • the stripe request receiving module 521 and the stripe request sending module 512 are connected.
  • the snapshot module 513 is configured to create a snapshot.
  • the target of the snapshot includes a file, and the snapshot ID is assigned to the snapshot.
  • file metadata is saved in different locations.
  • A snapshot is created for the selected file, and a snapshot name is set for it. If the snapshot name is not already in use, a snapshot ID is assigned to the file.
  • the file snapshot ID is saved as metadata of the file in the inode (index node) of the file. It should be noted that the snapshot ID is a mark of the snapshot, for example, using the point in time at which the snapshot is created as the snapshot ID. Or use the incremented number as the snapshot ID in the order of the point in time when the snapshot was created.
  • A snapshot is taken of the selected file system. If the snapshot name is not already in use, a snapshot ID is assigned to the file system. The assigned snapshot ID is then saved in the root inode of the file system. In this way, the snapshot ID of each file in the file system can be considered the same as the snapshot ID of the file system.
  • the difference from the former method is that the snapshot ID of the file is not stored in the inode of the file, but is stored in the root inode of the file system.
  • In addition to the file snapshot ID, the file metadata also includes the file number (FID, File Identification). File metadata can also include file size (Size), write time, and so on.
  • the snapshot module 513 is optional.
  • the embodiment of the present invention mainly describes the operations of the client service device and the object storage device after creating a snapshot and before creating the next snapshot.
  • the stripe request generating module 511 is configured to receive a file write request, where the file write request carries data to be written, a data offset to be written, and a file name.
  • the data to be written is part of the file.
  • the function of the stripe request generation module 511 may be performed by a file system program of the client server.
  • a file write request is a write request that can be recognized by the file system.
  • the file write request may be to create a file, or to update an existing file using the data to be written, and the data to be written is part of the file or all of the file.
  • the file write request may further carry a data size to be written, so as to subsequently split the data to be written into strips according to the data offset to be written. It is also possible not to carry the size of the data to be written, because the size of the data to be written can be obtained by measuring the data to be written.
  • The data offset to be written describes the relative position of the data to be written within the file. Specifically, it can describe the distance of the start position of the data to be written from the file header. If the data offset to be written is 0, the start position of the data to be written is the start position of the file. If the data offset to be written is 1 KB, the start position of the data to be written is 1 KB from the start position of the file.
  • the file write request may further carry a file path of the file write request, where the file path indicates a storage location of the file and the mapping relationship table.
  • the file path and file name together determine a file.
  • For example, if the combination of file path and file name is /root/mma/a1, then /root/mma/ is the file path and a1 is the file name, and the file and the mapping relationship table are stored under the path /root/mma/.
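  • The split in this example can be reproduced with the standard `posixpath` module. A small sketch; the helper name is hypothetical, and the trailing slash is added only to match the way the file path is written in the example.

```python
import posixpath


def split_file_path(full_name: str):
    """Split a full name into (file path with trailing slash, file name)."""
    directory, name = posixpath.split(full_name)
    if not directory.endswith("/"):
        directory += "/"
    return directory, name


print(split_file_path("/root/mma/a1"))  # ('/root/mma/', 'a1')
```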
  • File names under the same file path are not duplicated.
  • the write request may further carry a storage location of the mapping relationship table, and the mapping relationship table records the mapping relationship between the file name and the FID.
  • The mapping relationship between the snapshot ID and the file version number can be recorded by one of the following two operations.
  • If the write mode set in the client server is ROW, the updated version number is saved in the inode being backed up.
  • If the write mode set in the client server is COW, the updated version number is saved in the inode generated by the backup; optionally, the inode being backed up can also record the updated version number. For example, if inode A is backed up to generate inode B, then inode A is the inode being backed up and inode B is the inode generated by the backup.
  • the file version number and the snapshot ID have a corresponding relationship, and the snapshot ID corresponds to the snapshot time. Therefore, the file version number and the snapshot time can also be considered as corresponding.
  • Correspondence means that each file version number corresponds to a unique snapshot ID. And the change of the file version number is similar to the snapshot ID. For example, the larger the snapshot ID, the larger the file version number. Or the larger the snapshot ID, the smaller the file version number. Between multiple snapshots, the later the snapshot, the larger the ID of the snapshot.
  • A volume ID may be used instead of the file name.
  • the role of the volume ID is similar to the FID.
  • the volume does not have a similar concept to the file path. Therefore, it is no longer necessary to query the mapping relationship table, and the volume metadata can be directly queried by the volume ID to obtain the file version number.
  • The stripe request generating module 511 is further configured to use the file name to query the mapping relationship table to obtain the file number (FID) of the file where the data to be written is located, and to query the file metadata according to the FID to obtain the file version number.
  • the mapping relationship table records the mapping relationship between the file name and the FID, and the file name and the FID correspond one-to-one.
  • the storage location of the mapping relationship table can be carried in the file write request and obtained by the client server from the write request.
  • the mapping relationship table may also be pre-stored in the client server by the client server, and the client server finds the mapping relationship table according to the file path.
  • the mapping relationship table can also be stored in other storage devices.
  • the stripe request generation module 511 can also update the obtained file version number into the metadata. After the update, the FID and the file version number are recorded in the file metadata, and the file version number can be obtained by querying from the file metadata using the FID.
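  • The two lookups (file name to FID, then FID to file version number) can be sketched with plain dictionaries. The data layout, the field name `version`, and the function name are assumptions made for illustration; the description does not specify how the mapping relationship table or the metadata is encoded.

```python
def lookup_version(mapping_tables, file_metadata, file_path, file_name):
    """Resolve a file name to its FID, then the FID to the file version number.

    mapping_tables: {file_path: {file_name: fid}}  (one table per file path)
    file_metadata:  {fid: {"version": int, ...}}
    """
    fid = mapping_tables[file_path][file_name]
    return fid, file_metadata[fid]["version"]


mapping_tables = {"/root/mma/": {"a1": 1001}}
file_metadata = {1001: {"version": 3, "size": 8192}}
print(lookup_version(mapping_tables, file_metadata, "/root/mma/", "a1"))  # (1001, 3)
```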
  • File metadata can be saved in the inode information.
  • The file path indicates the storage location of the inode. As can be seen from the above, for ROW the version number is saved in the backed-up inode, so the stripe request generation module 511 reads the backed-up inode. For COW the version number is saved in the inode generated by the backup, so the stripe request generation module 511 reads the inode generated by the backup.
  • The file version number and the file snapshot ID have a one-to-one correspondence.
  • After the client server generates a snapshot ID, it generates the file version number that corresponds one-to-one with it.
  • The snapshot ID can be used directly as the file version number, or the file version number can be computed from the snapshot ID. If a snapshot created later has a larger snapshot ID, one option is: the later the snapshot is created, the larger its file version number. Another option is: the later the snapshot is created, the smaller its file version number.
  • The stripe request generating module 511 is further configured to split the data to be written into a plurality of strips including a strip to be written. According to the strip distribution strategy, the offset of the strip to be written and the ID of the object to which the strip to be written belongs are obtained. This ID is called the object ID.
  • The client server splits the data into one or more strips according to the strip size.
  • A strip is a piece of data of a certain size. When the data to be written is less than or equal to the size of a single strip, it is split into 1 strip; otherwise, it is split into multiple strips.
  • The strips split from the same file are the same size.
  • The strip size can be saved in the file metadata, in which case different files can use different strip sizes.
  • The strip size may also not be stored in the metadata of the file to which the object belongs; instead, all files in the file system share one strip size. In this case different files use the same strip size, which is saved in the root inode of the file system.
  • the object can be seen as a container that can hold strips.
  • the strip generated by the split refers to the data strip that is split.
  • a plurality of check strips are also generated to perform data protection on the data strips, and the strips generated by the split include both data strips and check strips.
  • the total number of stripes owned by each object can be saved in the file metadata, in which case the total number of stripes owned by objects of different files can be different.
  • the total number of stripes owned by each object may also not be stored in the metadata of the file to which the object belongs, in which case the total number of stripes owned by objects of different files is the same.
  • The starting position of the data to be written in the file is given by the offset of the data to be written.
  • The end position of the data to be written in the file is given by the offset and the size of the data to be written. If the starting position of the data to be written is not an integer multiple of the strip size, or the end offset plus 1 is not an integer multiple of the strip size, the data to be written is first split according to the strip size so that the split boundaries are integer multiples of the strip size. If splitting produces data that is smaller than one strip (such data can also be called dirty data of the strip), it is padded to form a full strip. Because the stripe request generating module 511 performs this padding, the strips and strip offsets mentioned later refer to the strips and strip offsets after padding, unless otherwise specified.
  • The offset of the data to be written is the relative position of the data to be written in the file.
  • Another padding method is: if the starting position of the data to be written is not an integer multiple of the strip size, or the end offset plus 1 is not an integer multiple of the strip size, the strip can be padded so that the strips after splitting are the same size and contain no blanks. Data already stored in the OSD can be read out and used as the padding data.
  • For example, the offset of the data to be written ranges from 4 KB to 300 KB, and the strip size is 256 KB. The data to be written can then be padded to form data with an offset range of 0 KB-511 KB and split into 2 strips, 0 KB-255 KB and 256 KB-511 KB, so that each strip is 256 KB.
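The padding and splitting described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function name `align_and_split` is hypothetical, offsets are byte offsets, and only the strip boundaries are computed (the padding content itself, for example data read back from the OSD, is left out).

```python
def align_and_split(offset, size, strip_size):
    """Return inclusive (start, end) byte ranges of the strips that cover
    the data [offset, offset + size) after padding to strip boundaries."""
    if size <= 0 or strip_size <= 0:
        raise ValueError("size and strip_size must be positive")
    # Round the start down and the (exclusive) end up to a strip boundary.
    start = (offset // strip_size) * strip_size
    end = -(-(offset + size) // strip_size) * strip_size
    return [(s, s + strip_size - 1) for s in range(start, end, strip_size)]


KB = 1024
# Data occupying 4 KB through 300 KB (inclusive, at KB granularity), 256 KB strips.
for first, last in align_and_split(4 * KB, 297 * KB, 256 * KB):
    print(first // KB, "KB ..", last // KB, "KB")
```

With the figures from the example, the two ranges returned correspond to the 0 KB-255 KB and 256 KB-511 KB strips.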
  • The strip distribution strategy is provided by the file system of the client server. It describes the object to which a strip belongs, that is, the correspondence between strips and objects. Specifically, it may be a correspondence between the offset of a strip and an object.
  • The object ID uniquely identifies an object: the IDs of objects belonging to the same file are different, and the IDs of objects of different files are also different.
  • The object ID and the FID of the file to which the object belongs may have a correspondence. That is, the file from which an object is derived can be determined from its object ID.
  • For example, an optional object ID generation method is that the object ID is composed of a 64-bit binary number, where the first 32 bits are the ID of the file to which the object belongs, the last 32 bits are assigned by the client server, and the last 32 bits are unique within the file.
  • Different objects of the same file have different last 32 bits, for example by using the number of the object within the file.
  • The corresponding FID can be determined from the first 32 bits of the object ID.
  • Similarly, a relationship between the object ID and the volume ID can also be established.
  • Another optional object ID generation method is: the object ID is composed of a 48-bit binary number; the first 16 bits correspond to the file, and the first 16 bits of different files are different; the last 32 bits are assigned by the client server and are unique within the file, that is, different objects of the same file have different last 32 bits.
  • The object ID and the FID of the file to which the object belongs may also have no correspondence.
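The 64-bit object ID layout described above (file ID in the first 32 bits, a per-file object number in the last 32 bits) can be sketched with plain bit operations. The function names are hypothetical and the choice of the object's number within the file for the low bits is only one of the options the text mentions.

```python
MASK32 = 0xFFFFFFFF


def make_object_id(fid, object_number):
    """Pack a 32-bit FID and a 32-bit per-file object number into a 64-bit object ID."""
    if not (0 <= fid <= MASK32 and 0 <= object_number <= MASK32):
        raise ValueError("fid and object_number must each fit in 32 bits")
    return (fid << 32) | object_number


def split_object_id(object_id):
    """Recover (fid, object_number) from a 64-bit object ID."""
    return object_id >> 32, object_id & MASK32


oid = make_object_id(fid=7, object_number=2)
print(hex(oid), split_object_id(oid))
```

Because the FID occupies the high bits, the file an object belongs to can be read back from the object ID without any lookup table.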
  • Figures 3A and 3B illustrate two different strip distribution strategies.
  • The strip index describes the position of a strip within the file in terms of offset.
  • The strip index is an integer greater than or equal to 0: the smallest strip index is 0, the second smallest is 1, the third smallest is 2, and so on.
  • Two strips with adjacent index values are also adjacent in the file.
  • An optional strip distribution strategy is shown in Figure 3A: (1) the size of the objects belonging to the same file is fixed; since the strips of the same file are the same size, this means that different objects have the same total number of strips; (2) strips are placed in index order, filling one object before moving on to the next, that is, in order of strip offset in the data to be written, consecutive strips belong to the same object. As shown in Figure 3A, each object holds a fixed 4 strips.
  • The first object holds strips 0 to 3.
  • The second object holds strips 4 to 7.
  • The third object holds strips 8 to 11. Correspondingly, the ID of the first object is 0, the ID of the second object is 1, the ID of the third object is 2, and so on.
  • Strip offset = (strip index % number of strips in an object) × strip size.
  • Here, "strip index % number of strips in an object" means the remainder of the strip index divided by the number of strips in an object.
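The Figure 3A strategy can be sketched as follows. The offset formula is the one given above; deriving the object number by integer division of the strip index is an assumption inferred from the listed example (strips 0 to 3 in object 0, strips 4 to 7 in object 1, and so on), and the function name is hypothetical.

```python
def locate_contiguous(strip_index, strips_per_object, strip_size):
    """Map a strip index to (object number, offset within the object)
    when consecutive strips fill one object before the next."""
    object_number = strip_index // strips_per_object               # assumed from the example
    strip_offset = (strip_index % strips_per_object) * strip_size  # formula from the text
    return object_number, strip_offset


KB = 1024
for index in (0, 3, 4, 9):
    print(index, locate_contiguous(index, strips_per_object=4, strip_size=256 * KB))
```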
  • Another optional strip distribution strategy is shown in Figure 3B: (1) the size of the objects of the same file is not fixed, that is, different objects of the same file can have different total numbers of strips; (2) the total number of objects is fixed, that is, different files have the same number of objects; Figure 3B shows a total of 3 objects.
  • The strip size is 256 KB, and the total number of objects is fixed at 3.
  • The first strip (strip 0) is in the first object (object 0).
  • The second strip (strip 1) is in the second object (object 1).
  • The fourth strip (strip 3) is again in the first object.
  • The fifth strip (strip 4) is in the second object.
  • A strip index is an integer greater than or equal to 0 that describes the positional relationship between strips in a file.
  • From the strip index, the offset of each strip within the object to which it belongs can be determined.
  • The number of the object within the file can be the remainder of the strip index divided by the total number of objects in the file.
  • Strip offset = (strip index / number of objects) × strip size, where the quotient is rounded down to an integer.
  • The strip index can be determined from the offset of the data to be written. For example, counting over the entire file, the first strip of the file is strip 0 in the first object; if the offset of the data to be written falls in the fifth strip of the file (strip 4, located in object 1), then among the strips generated by splitting the data to be written, the index of the first strip is 4, and the indexes of the remaining strips follow by analogy.
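The Figure 3B strategy distributes strips round-robin over a fixed number of objects. A minimal sketch, assuming that the division in the offset formula is integer division (consistent with strips 3 and 4 landing in the second row of objects 0 and 1); the function name is hypothetical.

```python
def locate_round_robin(strip_index, num_objects, strip_size):
    """Map a strip index to (object number, offset within the object)
    when strips are dealt out to a fixed number of objects in turn."""
    object_number = strip_index % num_objects                # remainder picks the object
    strip_offset = (strip_index // num_objects) * strip_size  # completed rounds give the offset
    return object_number, strip_offset


KB = 1024
for index in range(5):
    print(index, locate_round_robin(index, num_objects=3, strip_size=256 * KB))
```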
  • The above describes two schemes for calculating the ID of the object to which a strip belongs.
  • There are other implementations of the strip distribution strategy. Different distribution strategies can use different parameters, and these parameters can usually be obtained by querying the client server.
  • a stripe request sending module 512 is configured to select an OSD for storing a strip to be written.
  • An optional algorithm is to determine the OSD that stores the strip to be written according to the FID of the strip to be written. For example, the hash value of the FID is divided by the total number of OSDs, and the remainder is used as the number of the OSD that stores the strip to be written. That is, the hash value of the FID is taken modulo the total number of OSDs.
  • The OSD that stores the strip may also be determined according to both the FID of the strip to be written and the object ID.
  • The algorithm can be chosen arbitrarily, as long as it selects an OSD.
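The hash-modulo selection can be sketched as below. The text does not name a hash function, so SHA-256 is used here purely as an assumption to get a value that is stable across processes; `select_osd` and its parameters are hypothetical names.

```python
import hashlib


def select_osd(fid, total_osds, object_id=None):
    """Pick an OSD number as hash(key) mod total_osds.
    The key is the FID alone, or the FID together with the object ID."""
    if total_osds <= 0:
        raise ValueError("total_osds must be positive")
    key = str(fid) if object_id is None else f"{fid}:{object_id}"
    digest = hashlib.sha256(key.encode("utf-8")).digest()  # assumed hash function
    return int.from_bytes(digest[:8], "big") % total_osds


print(select_osd(fid=42, total_osds=5))
print(select_osd(fid=42, total_osds=5, object_id=3))
```

Since the same inputs always yield the same OSD number, a later read of the same strip can locate the OSD that received the write by repeating the calculation.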
  • The stripe request sending module 512 is further configured to send a stripe write request to the OSD. The stripe write request carries the strip to be written, the version number of the strip to be written, the size of the strip to be written, the offset of the strip to be written, and the object ID of the strip to be written.
  • The write mode may also be sent, so that the OSD writes the strip to be written according to the write mode specified by the client server.
  • The write mode is ROW or COW. If the OSD supports only one write mode, the write mode need not be sent to the OSD.
  • the stripe request receiving module 521 is configured to receive a stripe write request, and write the strip to be written to the storage medium of the OSD.
  • the stripe request receiving module 521 can perform the method of step 26. For example, writing of a strip to be written can be implemented using one or more of four ways.
  • The stripe request generating module 511 is further configured to receive a file read request, where the file read request carries a file name, the size of the data to be read, and the offset of the data to be read.
  • The file read request can also carry a file path, and the file path records the storage location of the mapping relationship table.
  • The file path and the file name together uniquely identify a file.
  • A file read request is a read request that can be recognized by the file system.
  • The file read request requests reading a complete file or part of the data of a file.
  • The offset of the data to be read describes the relative position of the data to be read in the file. Specifically, it may describe the distance of the start position of the data to be read from the file header. If the offset is 0, the start position of the data to be read is the start position of the file to be read. If the offset is 2 KB, the start position of the data to be read is 2 KB from the start position of the file.
  • The file read request may further carry a file path, where the file path records the storage location of the mapping relationship table. See step 21 for details on the mapping relationship table.
  • The file name may be the file name of the file in which the data to be read is located, or the file name of a snapshot of that file. In the former case, the file read request is for the latest version of the data to be read; in the latter case, it is for the data to be read as of a particular snapshot.
  • The stripe storage module 522 is configured to query the mapping relationship table by file name to obtain the FID of the file in which the data to be read is located, and to query the file metadata by FID to obtain the file version number.
  • The file path where the mapping relationship table is stored is the file path of the file in which the data to be read is located.
  • The file version number is obtained by querying the metadata corresponding to the file by FID.
  • The file path of the mapping relationship table is the path where the snapshot file is located.
  • The file version number is obtained by querying the metadata of the snapshot file by FID.
  • The mapping relationship table records the mapping between file names and FIDs, which correspond one-to-one. Refer to step 21 and step 22 for the description of the FID and of the relationship between the FID and the file version number.
  • The storage location of the mapping relationship table can be carried in the file read request and obtained by the client server from the read request.
  • The mapping relationship table may also be pre-stored in the client server, in which case the client server finds it according to the file path.
  • the mapping relationship table can also be stored in other storage devices.
  • Metadata may be stored in the inode of the file or in the root inode of the file system.
  • the snapshot ID and the file version number have a one-to-one relationship, so the client server can obtain the file version number according to the snapshot ID. This correspondence can be stored in the file metadata.
  • The stripe request generating module 511 is further configured to process the file read request into a plurality of read requests including a stripe read request.
  • Each stripe read request requests the OSD to read out one strip, the strip to be read.
  • The stripe read request carries: the version number of the strip to be read, the offset of the strip to be read, the size of the strip to be read, and the object ID of the strip to be read.
  • The offset of each strip to be read can be derived from the size and the offset of the data to be read.
  • Following the strip generation method used for writes, the data is divided into strips according to the strip size using its offset and length, which yields the offset of each strip to be read.
  • That is, this step can obtain the offset of each strip to be read from the strip size, the offset of the data to be read, and the length of the data to be read.
  • The strip size can come from the file inode, in which case different files can use different strip sizes. All files in the entire system can also share one strip size.
  • The ID of the object to be read can be obtained in the same manner as in step 23. Note that regardless of whether the file name is that of the file in which the data to be read is located or that of a snapshot, the FID used to determine the object ID for the read request is the FID of the file in which the data to be read is located.
  • the stripe request sending module 512 is further configured to: select an OSD for sending a stripe read request.
  • this step can be performed by the object storage client of the client server.
  • Stripe read requests and stripe write requests for the same stripe must correspond to the same OSD.
  • One possible solution is to use the same OSD selection algorithm as step 24.
  • the stripe request sending module 512 is further configured to: send a stripe read request to the selected OSD.
  • The version number of the strip to be read is in fact the version number of the file to which the strip to be read belongs.
  • The write mode can also be sent to the OSD; it must be consistent with the write mode in the stripe write request in step 25.
  • The object ID of the strip to be read is the ID of the object to which the strip to be read belongs.
  • The stripe request receiving module 521 is further configured to: receive a stripe read request, find the storage location of the strip to be read, and send the strip to be read to the client server.
  • The stripe request receiving module 521 can implement the function of step 32, for example reading out the strip to be read using mode one or mode two described in step 32. For the specific function of the stripe request receiving module 521, see step 32.
  • Aspects of the invention, or possible implementations of the various aspects, may be embodied as a system, method, or computer program product.
  • Aspects of the invention, or possible implementations of the various aspects, may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, etc.), or an embodiment combining software and hardware aspects, all of which are collectively referred to herein as "circuits," "modules," or "systems."
  • Aspects of the invention, or possible implementations of the various aspects, may take the form of a computer program product, which is computer readable program code stored in a computer readable medium.
  • The computer readable medium can be a computer readable signal medium or a computer readable storage medium.
  • The computer readable storage medium includes, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing, such as random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, or portable read-only memory (CD-ROM).
  • A processor in a computer reads the computer readable program code stored in the computer readable medium, so that the processor can perform the functional actions specified in each step, or combination of steps, of the flowchart, and produce means for implementing the functional actions specified in each block, or combination of blocks, of the block diagram.

Abstract

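A data management technology is provided. An OSD receives a stripe write request sent by a client server (11), where the stripe write request carries a strip to be written, a version number of the strip to be written, an offset of the strip to be written, and an object ID of the strip to be written. The OSD writes the strip to be written to the storage location determined by the object ID, the version number of the strip to be written, and the offset of the strip to be written. The number of object IDs can thereby be reduced.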

Description

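Data processing method, apparatus, and system
Technical Field
The present invention relates to the storage field, and in particular to a data processing method, apparatus, and system.
Background
An object-based storage system is a distributed storage system consisting of storage servers and object-based storage devices (OSDs). An object-based storage system may also be called an object storage system, and an object-based storage device may also be called an object storage device. In an object storage system, the object is the most basic unit of stored content. The data may be a file or a volume. Taking a file as an example, the file is split into fragments, and each file fragment has attribute information; a file fragment, the metadata of the file fragment, and the attributes of the file fragment together make up an object, and objects are stored across multiple OSDs.
An object storage system provides a snapshot function. A snapshot is a copy of a specified data set that records an image of the data at a point in time (the point at which the copy starts).
Taking a file as an example, if the whole file or part of its data is modified after a snapshot, the modified data must be stored in the storage system. The prior art uses the object ID as the unique identifier of an object; if the same file is updated, the updated data must be stored in the storage device under a new object ID. If the file is updated frequently, the total number of object IDs becomes very large and occupies considerable storage space, which increases the consumption of system resources.
Summary
The present invention provides a data management technology that can reduce the total number of object IDs and save the storage space occupied by object IDs.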
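In a first aspect, an embodiment of the present invention provides a data storage method, including: an object storage device (OSD) receives a stripe write request sent by a client server, where the stripe write request carries a strip to be written, a version number of the strip to be written, an offset of the strip to be written, and an object ID of the strip to be written, where the version number of the strip to be written corresponds to the snapshot ID of the most recent snapshot of the file or volume to which the strip to be written belongs, the offset of the strip to be written describes the position of the strip to be written within the object to which it belongs, and the object ID of the strip to be written is the ID of the object to which the strip to be written belongs; and the OSD writes the strip to be written to the storage location determined by the object ID, the version number of the strip to be written, and the offset of the strip to be written.
In a second aspect, an embodiment of the present invention provides a data storage method, including: an object storage device (OSD) receives a stripe write request sent by a client server, where the stripe write request carries a strip to be written, a version number of the strip to be written, an offset of the strip to be written, and an object ID of the strip to be written, where the version number of the strip to be written corresponds to the snapshot ID of the most recent snapshot of the file or volume to which the strip to be written belongs, the offset of the strip to be written describes the position of the strip to be written within the object to which it belongs, and the object ID of the strip to be written is the ID of the object to which the strip to be written belongs; the OSD determines whether the object determined by the version number of the strip to be written and the object ID has been backed up: if it has been backed up, the OSD writes the strip to be written to the storage location determined by the object ID, the version number of the strip to be written, and the offset of the strip to be written; if it has not been backed up, the OSD builds a spliced object using the strip to be written and then writes the spliced object to the storage location determined by the version number of the strip to be written and the object ID.
In a third aspect, an embodiment of the present invention provides a data storage method, including: an object storage device (OSD) receives a stripe write request sent by a client server, where the stripe write request carries a strip to be written, a version number of the strip to be written, an offset of the strip to be written, and an object ID of the strip to be written, where the version number of the strip to be written corresponds to the snapshot ID of the most recent snapshot of the file or volume to which the strip to be written belongs, the offset of the strip to be written describes the position of the strip to be written within the object to which it belongs, and the object ID of the strip to be written is the ID of the object to which the strip to be written belongs; the OSD determines whether the strip determined by the version number of the strip to be written, the object ID of the strip to be written, and the offset of the strip to be written has been backed up; if it has been backed up, the strip to be written is written to the storage location determined by the version number of the strip to be written, the object ID of the strip to be written, and the offset of the strip to be written; if it has not been backed up, the data in the initial-version object in the OSD that is located at the offset of the strip to be written and whose size is the size of the strip to be written is backed up to the storage location determined by the version number of the strip to be written, the offset of the strip to be written, and the object ID of the strip to be written, where the object ID of the initial-version object is the same as the object ID of the strip to be written and the version number of the initial-version object is the initial version number; and the strip to be written is written to the storage location determined by the object ID of the strip to be written, the initial version number, and the offset of the strip to be written.
In a fourth aspect, an embodiment of the present invention provides a data storage method, including: an object storage device (OSD) receives a stripe write request sent by a client server, where the stripe write request carries a strip to be written, a version number of the strip to be written, an offset of the strip to be written, and an object ID of the strip to be written, where the version number of the strip to be written corresponds to the snapshot ID of the most recent snapshot of the file or volume to which the strip to be written belongs, the offset of the strip to be written describes the position of the strip to be written within the object to which it belongs, and the object ID of the strip to be written is the ID of the object to which the strip to be written belongs; the OSD determines whether the object determined by the version number of the strip to be written and the object ID has been backed up: if it has been backed up, the OSD writes the strip to be written to the storage location determined by the object ID, the version number of the object, and the offset of the strip to be written; if it has not been backed up, the data in the initial-version object in the OSD is backed up to the storage location determined by the version number of the strip to be written and the object ID, where the object ID of the initial-version object is the same as the object ID of the strip to be written and the version number of the initial-version object is the initial version number; and the OSD writes the strip to be written to the storage location determined by the object ID, the initial version number, and the offset of the strip to be written.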
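In a fifth aspect, an embodiment of the present invention provides a data reading method, including: an object storage device (OSD) receives a stripe read request sent by a client server, where the stripe read request carries the size of the strip to be read, the offset of the strip to be read, the version number of the strip to be read, and the object ID of the strip to be read, where the version number of the strip to be read corresponds to the snapshot ID of the most recent snapshot of the file or volume to which the strip to be read belongs, and the object ID of the strip to be read is the ID of the object to which the strip to be read belongs; the OSD determines whether the strip determined by the object ID, the version number of the strip to be read, and the offset of the strip to be read has been backed up: if it has been backed up, the OSD reads the data determined by the object ID, the version number of the strip to be read, the offset of the strip to be read, and the size of the strip to be read, and sends the data read to the client server as the strip to be read; if it has not been backed up, then among the objects whose object ID is the same as the object ID of the strip to be read and whose version number differs from the version number of the strip to be read, the OSD searches object by object in order of snapshot time from latest to earliest until it finds an object that stores valid data at the storage location of the offset of the strip to be read, and sends the valid data found to the client server as the strip to be read, where the version number of an object corresponds to the snapshot ID of the most recent snapshot, taken before the object was generated, of the file or volume to which the object belongs.
In a sixth aspect, an embodiment of the present invention provides a data reading method, including: the OSD receives a stripe read request sent by the client server, where the stripe read request carries the size of the strip to be read, the offset of the strip to be read, the version number of the strip to be read, and the object ID of the strip to be read, where the version number of the strip to be read corresponds to the snapshot ID of the most recent snapshot of the file or volume to which the strip to be read belongs, and the object ID of the strip to be read is the ID of the object to which the strip to be read belongs; the OSD determines whether the object determined by the object ID and the version number of the strip to be read has been backed up: if it has been backed up, the OSD reads the data determined by the object ID, the version number of the strip to be read, the offset of the strip to be read, and the size of the strip to be read, and sends the data read to the client server as the strip to be read; if it has not been backed up, then among the objects whose object ID is the same as the object ID of the strip to be read and whose version number differs from the version number of the strip to be read, the OSD searches object by object in order of snapshot time from latest to earliest until it finds an object that stores valid data at the storage location of the offset of the strip to be read, and sends the valid data found to the client server as the strip to be read, where the version number of an object corresponds to the snapshot ID of the most recent snapshot, taken before the object was generated, of the file or volume to which the object belongs.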
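In a seventh aspect, an embodiment of the present invention provides a data processing apparatus, including: a stripe request receiving module, configured to receive a stripe write request sent by a client server, where the stripe write request carries a strip to be written, a version number of the strip to be written, an offset of the strip to be written, and an object ID of the strip to be written, where the version number of the strip to be written corresponds to the snapshot ID of the most recent snapshot of the file or volume to which the strip to be written belongs, the offset of the strip to be written describes the position of the strip to be written within the object to which it belongs, and the object ID of the strip to be written is the ID of the object to which the strip to be written belongs; and a stripe storage module, configured to write the strip to be written to the storage location determined by the object ID, the version number of the strip to be written, and the offset of the strip to be written.
In an eighth aspect, an embodiment of the present invention provides a data processing apparatus, including: a stripe request receiving module, configured to receive a stripe write request sent by a client server, where the stripe write request carries a strip to be written, a version number of the strip to be written, an offset of the strip to be written, and an object ID of the strip to be written, where the version number of the strip to be written corresponds to the snapshot ID of the most recent snapshot of the file or volume to which the strip to be written belongs, the offset of the strip to be written describes the position of the strip to be written within the object to which it belongs, and the object ID of the strip to be written is the ID of the object to which the strip to be written belongs; and a stripe storage module, configured to determine whether the object determined by the version number of the strip to be written and the object ID has been backed up: if it has been backed up, the stripe storage module is further configured to write the strip to be written to the storage location determined by the object ID, the version number of the strip to be written, and the offset of the strip to be written; if it has not been backed up, the stripe storage module is further configured to build a spliced object using the strip to be written and then write the spliced object to the storage location determined by the version number of the strip to be written and the object ID.
In a ninth aspect, an embodiment of the present invention provides a data processing apparatus, including: a stripe request receiving module, configured to receive a stripe write request sent by a client server, where the stripe write request carries a strip to be written, a version number of the strip to be written, an offset of the strip to be written, and an object ID of the strip to be written, where the version number of the strip to be written corresponds to the snapshot ID of the most recent snapshot of the file or volume to which the strip to be written belongs, the offset of the strip to be written describes the position of the strip to be written within the object to which it belongs, and the object ID of the strip to be written is the ID of the object to which the strip to be written belongs; and a stripe storage module, configured to determine whether the strip determined by the version number of the strip to be written, the object ID of the strip to be written, and the offset of the strip to be written has been backed up; if it has been backed up, to write the strip to be written to the storage location determined by the version number of the strip to be written, the object ID of the strip to be written, and the offset of the strip to be written; if it has not been backed up, to back up the data in the initial-version object in the data storage apparatus that is located at the offset of the strip to be written and whose size is the size of the strip to be written to the storage location determined by the version number of the strip to be written, the offset of the strip to be written, and the object ID of the strip to be written, where the object ID of the initial-version object is the same as the object ID of the strip to be written and the version number of the initial-version object is the initial version number; and to write the strip to be written to the storage location determined by the object ID of the strip to be written, the initial version number, and the offset of the strip to be written.
In a tenth aspect, an embodiment of the present invention provides a data processing apparatus, including: a stripe request receiving module, configured to receive a stripe write request sent by a client server, where the stripe write request carries a strip to be written, a version number of the strip to be written, an offset of the strip to be written, and an object ID of the strip to be written, where the version number of the strip to be written corresponds to the snapshot ID of the most recent snapshot of the file or volume to which the strip to be written belongs, the offset of the strip to be written describes the position of the strip to be written within the object to which it belongs, and the object ID of the strip to be written is the ID of the object to which the strip to be written belongs; and a stripe storage module, configured to determine whether the object determined by the version number of the strip to be written and the object ID has been backed up: if it has been backed up, to write the strip to be written to the storage location determined by the object ID, the version number of the object, and the offset of the strip to be written; if it has not been backed up, to back up the data in the initial-version object to the storage location determined by the version number of the strip to be written and the object ID, where the object ID of the initial-version object is the same as the object ID of the strip to be written and the version number of the initial-version object is the initial version number; the stripe storage module is further configured to write the strip to be written to the storage location determined by the object ID, the initial version number, and the offset of the strip to be written.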
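In an eleventh aspect, an embodiment of the present invention provides a data processing apparatus, including: a stripe request receiving module, configured to receive a stripe read request sent by a client server, where the stripe read request carries the size of the strip to be read, the offset of the strip to be read, the version number of the strip to be read, and the object ID of the strip to be read, where the version number of the strip to be read corresponds to the snapshot ID of the most recent snapshot of the file or volume to which the strip to be read belongs, and the object ID of the strip to be read is the ID of the object to which the strip to be read belongs; and a stripe reading module, configured to determine whether the strip determined by the object ID, the version number of the strip to be read, and the offset of the strip to be read has been backed up: if it has been backed up, to read the data determined by the object ID, the version number of the strip to be read, the offset of the strip to be read, and the size of the strip to be read, and send the data read to the client server as the strip to be read; if it has not been backed up, then among the objects whose object ID is the same as the object ID of the strip to be read and whose version number differs from the version number of the strip to be read, to search object by object in order of snapshot time from latest to earliest until an object is found that stores valid data at the storage location of the offset of the strip to be read, and send the valid data found to the client server as the strip to be read, where the version number of an object corresponds to the snapshot ID of the most recent snapshot, taken before the object was generated, of the file or volume to which the object belongs.
In a twelfth aspect, an embodiment of the present invention provides a data processing apparatus, including: a stripe request receiving module, configured to receive a stripe read request sent by a client server, where the stripe read request carries the size of the strip to be read, the offset of the strip to be read, the version number of the strip to be read, and the object ID of the strip to be read, where the version number of the strip to be read corresponds to the snapshot ID of the most recent snapshot of the file or volume to which the strip to be read belongs, and the object ID of the strip to be read is the ID of the object to which the strip to be read belongs; and a stripe reading module, configured to determine whether the object determined by the object ID and the version number of the strip to be read has been backed up: if it has been backed up, to read the data determined by the object ID, the version number of the strip to be read, the offset of the strip to be read, and the size of the strip to be read, and send the data read to the client server as the strip to be read; if it has not been backed up, then among the objects whose object ID is the same as the object ID of the strip to be read and whose version number differs from the version number of the strip to be read, to search object by object in order of snapshot time from latest to earliest until an object is found that stores valid data at the storage location of the offset of the strip to be read, and send the valid data found to the client server as the strip to be read, where the version number of an object corresponds to the snapshot ID of the most recent snapshot, taken before the object was generated, of the file or volume to which the object belongs.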
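In a thirteenth aspect, an embodiment of the present invention provides a data storage system, including a client server and an object storage device. The client server is configured to: receive a file write request, where the file write request carries data to be written, an offset of the data to be written, and a file name, and the data to be written is part of the file; obtain the file number (FID) from the file name, query the metadata of the file by FID to obtain the version number of the file, and use the version number of the file as the version number of the strip to be written, where the version number of the file corresponds to the snapshot ID of the most recent snapshot of the file; split the data to be written, according to the offset and the size of the data to be written, into multiple strips including the strip to be written, determine the ID of the object to which the strip to be written belongs, and obtain the offset of the strip to be written; and create a stripe write request and send it to the object storage device. The object storage device is configured to: receive the stripe write request, where the stripe write request carries the strip to be written, the version number of the strip to be written, the offset of the strip to be written, and the object ID of the strip to be written, where the version number of the strip to be written corresponds to the snapshot ID of the most recent snapshot of the file to which the strip to be written belongs, the offset of the strip to be written describes the position of the strip to be written within the object to which it belongs, and the object ID of the strip to be written is the ID of the object to which the strip to be written belongs; the OSD writes the strip to be written to the storage location determined by the object ID, the version number of the strip to be written, and the offset of the strip to be written.
In a fourteenth aspect, an embodiment of the present invention provides a data storage system, including a client server and an object storage device. The client server is configured to: receive a volume write request, where the volume write request carries data to be written, an offset of the data to be written, and a volume ID, and the data to be written is part of the volume; query the metadata of the volume by volume ID to obtain the version number of the volume, where the version number of the volume corresponds to the snapshot ID of the most recent snapshot of the volume; split the data segment to be written, according to the offset and the size of the data to be written, into multiple strips including the strip to be written, determine the ID of the object to which the strip to be written belongs, and obtain the offset of the strip to be written; and create a stripe write request and send it to the object storage device. The object storage device is configured to: receive the stripe write request, where the stripe write request carries the strip to be written, the version number of the strip to be written, the offset of the strip to be written, and the object ID of the strip to be written, where the version number of the volume is the version number of the strip to be written, the offset of the strip to be written describes the position of the strip to be written within the object to which it belongs, and the object ID of the strip to be written is the ID of the object to which the strip to be written belongs; the OSD writes the strip to be written to the storage location determined by the object ID, the version number of the strip to be written, and the offset of the strip to be written.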
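In a fifteenth aspect, an embodiment of the present invention provides a data storage system, including a client server and an object storage device. The client server is configured to: receive a file write request, where the file write request carries data to be written, an offset of the data to be written, and a file name, and the data to be written is part of the file; obtain the file number (FID) from the file name and query the metadata of the file by FID to obtain the version number of the file, where the version number of the file corresponds to the snapshot ID of the most recent snapshot of the file; split the data to be written, according to the offset and the size of the data to be written, into multiple strips including the strip to be written, determine the ID of the object to which the strip to be written belongs, and obtain the offset of the strip to be written; and create the stripe write request and send it to the object storage device. The object storage device is configured to: receive the stripe write request, where the stripe write request carries the strip to be written, the version number of the strip to be written, the offset of the strip to be written, and the object ID of the strip to be written, where the version number of the strip to be written corresponds to the snapshot ID of the most recent snapshot of the file to which the strip to be written belongs, the offset of the strip to be written describes the position of the strip to be written within the object to which it belongs, and the object ID of the strip to be written is the ID of the object to which the strip to be written belongs; and determine whether the object determined by the version number of the strip to be written and the object ID has been backed up: if it has been backed up, write the strip to be written to the storage location determined by the object ID, the version number of the strip to be written, and the offset of the strip to be written; if it has not been backed up, build a spliced object using the strip to be written and then write the spliced object to the storage location determined by the version number of the strip to be written and the object ID.
In a sixteenth aspect, an embodiment of the present invention provides a data storage system, including a client server and an object storage device. The client server is configured to: receive a volume write request, where the volume write request carries data to be written, an offset of the data to be written, and a volume ID, and the data to be written is part of the volume; query the metadata of the volume by volume ID to obtain the version number of the volume, where the version number of the volume corresponds to the snapshot ID of the most recent snapshot of the volume; split the data segment to be written, according to the offset and the size of the data to be written, into multiple strips including the strip to be written, determine the ID of the object to which the strip to be written belongs, and obtain the offset of the strip to be written; and create the stripe write request and send it to the object storage device. The object storage device is configured to: receive the stripe write request, where the stripe write request carries the strip to be written, the version number of the strip to be written, the offset of the strip to be written, and the object ID of the strip to be written, where the version number of the strip to be written corresponds to the snapshot ID of the most recent snapshot of the volume to which the strip to be written belongs, the offset of the strip to be written describes the position of the strip to be written within the object to which it belongs, and the object ID of the strip to be written is the ID of the object to which the strip to be written belongs; and determine whether the object determined by the version number of the strip to be written and the object ID has been backed up: if it has been backed up, write the strip to be written to the storage location determined by the object ID, the version number of the strip to be written, and the offset of the strip to be written; if it has not been backed up, build a spliced object using the strip to be written and then write the spliced object to the storage location determined by the version number of the strip to be written and the object ID.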
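In a seventeenth aspect, an embodiment of the present invention provides a data reading system, including a client server and an object storage device. The client server is configured to: receive a file read request, where the file read request carries a file name, the size of the data to be read, and the offset of the data to be read, and the data to be read is part of the file; obtain the file number (FID) from the file name, query the metadata of the file by FID to obtain the version number of the file, and use the version number of the file as the version number of the strip to be read, where the version number of the file corresponds to the snapshot ID of the most recent snapshot of the file to which the strip to be read belongs; determine, according to the offset and the size of the data to be read, the ID of the object to which the strip to be read belongs, and obtain the offset of the strip to be read; and generate a stripe read request and send it. The object storage device is configured to: receive the stripe read request, where the stripe read request carries the size of the strip to be read, the offset of the strip to be read, the version number of the strip to be read, and the object ID of the strip to be read, where the version number of the strip to be read corresponds to the snapshot ID of the most recent snapshot of the file to which the strip to be read belongs, and the object ID of the strip to be read is the ID of the object to which the strip to be read belongs; and determine whether the strip determined by the object ID, the version number of the strip to be read, and the offset of the strip to be read has been backed up: if it has been backed up, read the data determined by the object ID, the version number of the strip to be read, the offset of the strip to be read, and the size of the strip to be read, and send the data read to the client server as the strip to be read; if it has not been backed up, then among the objects whose object ID is the same as the object ID of the strip to be read and whose version number differs from the version number of the strip to be read, search object by object in order of snapshot time from latest to earliest until an object is found that stores valid data at the storage location of the offset of the strip to be read, and send the valid data found to the client server as the strip to be read, where the version number of an object corresponds to the snapshot ID of the most recent snapshot, taken before the object was generated, of the file or volume to which the object belongs.
In an eighteenth aspect, an embodiment of the present invention provides a data reading system, including a client server and an object storage device. The client server is configured to: receive a volume read request, where the volume read request carries a volume ID, the size of the data to be read, and the offset of the data to be read, and the data to be read is part of the volume; query the metadata of the volume by volume ID to obtain the version number of the volume, and use the version number of the volume as the version number of the strip to be read, where the version number of the volume corresponds to the snapshot ID of the most recent snapshot of the volume to which the strip to be read belongs; determine, according to the offset and the size of the data to be read, the ID of the object to which the strip to be read belongs, and obtain the offset of the strip to be read; and generate a stripe read request and send it. The object storage device is configured to: receive the stripe read request, where the stripe read request carries the size of the strip to be read, the offset of the strip to be read, the version number of the strip to be read, and the object ID of the strip to be read, where the version number of the strip to be read corresponds to the snapshot ID of the most recent snapshot of the file or volume to which the strip to be read belongs, and the object ID of the strip to be read is the ID of the object to which the strip to be read belongs; and determine whether the strip determined by the object ID, the version number of the strip to be read, and the offset of the strip to be read has been backed up: if it has been backed up, read the data determined by the object ID, the version number of the strip to be read, the offset of the strip to be read, and the size of the strip to be read, and send the data read to the client server as the strip to be read; if it has not been backed up, then among the objects whose object ID is the same as the object ID of the strip to be read and whose version number differs from the version number of the strip to be read, search object by object in order of snapshot time from latest to earliest until an object is found that stores valid data at the storage location of the offset of the strip to be read, and send the valid data found to the client server as the strip to be read, where the version number of an object corresponds to the snapshot ID of the most recent snapshot, taken before the object was generated, of the file or volume to which the object belongs.
By applying the present invention, the combination of object ID and version number replaces the object ID of the prior art, which reduces the number of object IDs and thereby lowers the consumption of system resources.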
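The idea of addressing a strip by the combination of object ID, version number, and strip offset can be sketched with an in-memory store. This is only an illustration under stated assumptions: the class `OsdStore` and its methods are hypothetical, a larger version number is assumed to mean a later snapshot, and the fallback search is restricted to versions older than the requested one, which suits a ROW-style layout (the text itself only says "objects with the same object ID and a different version number", searched from the latest snapshot to the earliest).

```python
class OsdStore:
    """Toy in-memory stand-in for an OSD's strip storage."""

    def __init__(self):
        # (object_id, version, strip_offset) -> strip bytes
        self._strips = {}

    def write_strip(self, object_id, version, strip_offset, data):
        """Write to the location determined by object ID, version, and offset."""
        self._strips[(object_id, version, strip_offset)] = data

    def read_strip(self, object_id, version, strip_offset):
        """Read the strip at the requested version, or fall back to the
        nearest older version that holds valid data at the same offset."""
        key = (object_id, version, strip_offset)
        if key in self._strips:
            return self._strips[key]
        older_versions = sorted(
            {v for (oid, v, _) in self._strips if oid == object_id and v < version},
            reverse=True,  # latest snapshot first (assumes larger = later)
        )
        for v in older_versions:
            data = self._strips.get((object_id, v, strip_offset))
            if data is not None:
                return data
        return None  # no version holds data at this offset


store = OsdStore()
store.write_strip(object_id=1, version=0, strip_offset=0, data=b"old")
store.write_strip(object_id=1, version=0, strip_offset=4, data=b"keep")
store.write_strip(object_id=1, version=1, strip_offset=0, data=b"new")
print(store.read_strip(1, 1, 0), store.read_strip(1, 1, 4), store.read_strip(1, 0, 0))
```

In this sketch the object ID stays at 1 across both versions; only the version component of the key changes after the snapshot.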
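Brief Description of Drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the following briefly introduces the accompanying drawings needed for describing the embodiments or the prior art. The drawings in the following description show only some embodiments of the present invention, and other drawings may be derived from them.
FIG. 1 is an architecture diagram of an object storage system according to an embodiment of the present invention;
FIG. 2 is a flowchart of an embodiment of the data processing method of the present invention;
FIG. 3A and FIG. 3B are both schematic diagrams of strip distribution strategies according to embodiments of the present invention;
FIG. 4 is a diagram of an embodiment of a ROW-based strip reading scheme;
FIG. 5 is a diagram of an embodiment of a COW-based strip reading scheme;
FIG. 6 is a schematic structural diagram of an embodiment of the storage system of the present invention;
FIG. 7 is a schematic composition diagram of an embodiment of the storage system of the present invention.
Detailed Description
The technical solutions of the present invention are described clearly and completely below with reference to the accompanying drawings in the embodiments. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained on the basis of the embodiments of the present invention fall within the protection scope of the present invention.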
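FIG. 1 shows the architecture of an object-based storage system, which may include a client server 11 and an object storage device 12. The object storage device 12 can provide an object storage service for the client server 11.
An object-based storage device (OSD) may be called an object storage device. In object storage technology, the storage system is built on object storage devices; each object storage device can have a degree of intelligence and can automatically manage the distribution of the data on it.
The object is the basic unit of data storage in the system. Taking a file as an example, an object is in fact the combination of part of the file's data and the attribute information of that data. The attribute information, also called metadata, can define file-based redundant array of independent disks (RAID) parameters, data distribution, quality of service, and so on. Traditional storage systems, by contrast, use files or blocks as the basic storage unit, and a block storage system must also keep track of the attributes of every block in the system at all times, whereas an object maintains its own attributes by communicating with the storage system. In an object storage device, every object has an object identifier (ID) through which the object is accessed.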
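An OSD has a degree of intelligence; it can have a CPU, memory, and storage media. Compared with a block device, an OSD can provide a different access interface. A single storage system can contain one or more OSDs; FIG. 1 uses two OSDs as an example. At present, object storage devices are commonly implemented with a blade architecture. An OSD can provide three functions:
(1) Data storage. The OSD manages objects and stores them on storage media such as disks. The OSD does not provide a block-interface access mode; when requesting data, a client reads and writes data using an object ID and an offset.
(2) Intelligent distribution. The OSD uses its own CPU and memory to optimize the distribution of locally stored data and supports data prefetching. Because the OSD can intelligently prefetch objects, data read speed can be optimized.
(3) Management of per-object metadata. The OSD manages the metadata of the objects stored on it; this metadata is recorded in a data structure called an index node (inode). The metadata usually includes information such as the size of the object and the number of strips it contains. In a traditional network attached storage (NAS) system, this metadata is maintained by the file server. In an object storage architecture, metadata can be managed by a metadata server, or the bulk of the metadata management work in the system can be done by the OSDs, which reduces the overhead on the client.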
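One current storage approach is copy on first write (COFW), sometimes simply called copy on write (COW). When new data is written to a storage location of the storage device for the first time, the original data at that location is first read out and written to another storage location (a location reserved for the snapshot, called the snapshot space), and then the new data is written to the storage device. As this procedure shows, this approach requires one read operation and two write operations.
Redirect on write (Redirect On First Write, ROW) is another method of storing new data. ROW writes the new data to a reserved storage location and leaves the storage location of the old data unchanged. Compared with COW, it saves one write operation.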
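The difference in I/O cost between the two approaches can be illustrated with a toy model that only counts operations. Everything here is hypothetical: dictionaries stand in for the live storage, the snapshot space, and the reserved (redirected) space, and the function names are chosen for illustration.

```python
def cow_write(live, snapshot_space, location, new_data):
    """Copy on (first) write: preserve the old data in the snapshot space
    before overwriting it in place. Returns the I/O operations performed."""
    ops = {"reads": 0, "writes": 0}
    if location in live and location not in snapshot_space:
        old_data = live[location]            # read the original data
        ops["reads"] += 1
        snapshot_space[location] = old_data  # write it to the snapshot space
        ops["writes"] += 1
    live[location] = new_data                # write the new data in place
    ops["writes"] += 1
    return ops


def row_write(redirected, location, new_data):
    """Redirect on write: leave the old data where it is and write the
    new data to reserved space. Returns the I/O operations performed."""
    redirected[location] = new_data
    return {"reads": 0, "writes": 1}


live, snapshot_space, redirected = {0: b"old"}, {}, {}
print("COW first write:", cow_write(live, snapshot_space, 0, b"new"))
print("ROW write:      ", row_write(redirected, 0, b"new"))
```

The first COW write after a snapshot costs one read and two writes, as the text states; later COW writes to the same location, and every ROW write, cost a single write in this model.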
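In object storage technology, most metadata management work can be distributed to the intelligent OSDs, each of which is responsible for managing the distribution and retrieval of its locally stored data. With 90% of the metadata management work distributed to the intelligent storage devices and only 10% performed by the metadata server, the metadata management performance of the system improves. In addition, an OSD is a network-attached device that contains its own storage media, such as disk or tape, and has enough intelligence to manage the data stored locally. The storage server communicates directly with the OSD and accesses the data stored on it; because the OSD is intelligent, no file server needs to be involved.
An object is a combination of data and data attributes. Data attributes can be set according to application requirements and include data distribution, quality of service, and so on. The client server 11 may be a server based on the NAS protocol or a server based on a storage area network (SAN) protocol. That is, the embodiments of the present invention apply to both file systems and block systems.
For network attached storage (NAS) data, the objects in the embodiments of the present invention come from files: a file is split into multiple fragments, and one fragment together with its attributes, metadata, and other information forms an object. Similarly, for storage area network (SAN) data, what is split into fragments is a volume.
The prior art uses the object ID alone to identify an object, so every object's ID is unique; when the same file has been updated many times, a large number of object IDs are generated, and recording them consumes a large amount of storage space. In the embodiments of the present invention, the combination of object ID and version number identifies an object. When the data of a file is updated multiple times, if the offset range of the updated data does not change, the object ID of the updated data can stay the same and only the object version number needs to change, which reduces the total number of object IDs the system maintains. Moreover, in the solutions of the embodiments, the object version number corresponds to the snapshot ID: between two snapshots, no matter how many times the file's data is updated, all objects in the same file use the same version number, so version numbers occupy very little storage space.
In the prior art, after the content of a file or volume is updated, the metadata of the objects affected by the modification, which is stored at the file layer (for a block system, the volume semantic layer), must be updated, and the amount of data updated is large. In addition, access nodes can access the OSDs through the client server. If different access nodes can all access the objects affected by a modification, metadata must be synchronized between the nodes. Specifically, after one access node updates an object's metadata, the other access nodes must update all object IDs of the file containing the modified object, and frequent synchronization causes severe metadata bloat. In the solution provided by the embodiments of the present invention, object IDs do not need to change and only the version number needs to be updated at the OSD layer, so the amount of data updated is far smaller than in the prior art. In addition, in the embodiments of the present invention the object ID is calculated from the offset.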
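As shown in FIG. 2, the flow of the data processing method embodiment is described in detail below, taking a file request as an example. Replacing each file-system term with the corresponding block-system term yields another implementation: for example, file becomes volume, file metadata becomes volume metadata, file ID becomes volume ID, and file version number becomes volume version number. The differences are: (1) volume metadata has a different storage location and is not stored in an inode; (2) the volume ID can be obtained directly and need not be derived from a volume name.
Step 20: Create a snapshot. The target of the snapshot is a file or a file system that includes the file, that is, the snapshot target includes the file. A snapshot ID is allocated for the snapshot.
A snapshot can be created in two ways. One is to snapshot a file, in which case the target is a single file. The other is to snapshot a file system, in which case the target is the whole file system, which includes multiple files. The two ways differ in where the file metadata is saved.
When creating a file snapshot, a file is selected and a snapshot name is set for it; if the snapshot name has not been used, a snapshot ID is allocated for the file's snapshot, and the file snapshot ID is saved as file metadata in the file's inode (index node). Note that the snapshot ID is the identifier of the snapshot: for example, the point in time at which the snapshot was created can be used as the snapshot ID, or increasing numbers can be used as snapshot IDs in the order in which the snapshots are created.
When creating a file system snapshot, a file system is selected; if the snapshot name has not been used, a snapshot ID is allocated for the file system's snapshot and saved in the root inode of the file system. In this way, the snapshot ID of each file in the file system can be regarded as equal to the snapshot ID of the file system. Unlike the previous way, the file's snapshot ID is saved not in the file's inode but in the root inode of the file system.
Besides the file snapshot ID, the file metadata also includes the file number (File Identification, FID). The file metadata may further include information such as the file size and the write time.
Note that step 20 is a preliminary step and is relatively independent of the other steps of this method embodiment. The embodiments of the present invention mainly describe the operations performed by the client server and the OSD after one snapshot has been created and before the next snapshot is created.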
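Step 21: The client server receives a file write request, which carries the data to be written, the offset of the data to be written, and the file name. The data to be written is part of the file.
Specifically, this step can be performed by the file system program of the client server. A file write request is a write request that the file system can recognize. It may create a file, or update an existing file with the data to be written; the data to be written is part or all of the file.
The file write request may also carry the size of the data to be written, so that later steps can split the data to be written into strips based on its offset. The size may also be omitted, because it can be obtained by measuring the data to be written.
The offset of the data to be written describes the relative position of the data to be written within the file. Specifically, it can describe the distance of the start of the data to be written from the file header. An offset of 0 means that the data to be written starts at the start of the file. An offset of 1 KB means that the data to be written starts 1 KB from the start of the file.
Optionally, the file write request may also carry a file path, which indicates where the file and the mapping relationship table are stored. The file path and the file name together identify a file. For example, in the combination /root/mma/a1, /root/mma/ is the file path and a1 is the file name, and the file and the mapping relationship table are stored under the path /root/mma/.
Different files can have different file names. File names under the same file path are not repeated.
Optionally, the write request may also carry the storage location of the mapping relationship table, which records the mapping between file names and FIDs.
Each time a snapshot is created, a snapshot ID is generated. Each snapshot ID has a corresponding file version number, and snapshot IDs and file version numbers correspond one-to-one. Moreover, snapshot IDs and file version numbers change in the same pattern across adjacent snapshot times.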
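Before step 22 is performed, the mapping between the snapshot ID and the file version number can be recorded. This includes the following two steps:
(1) Back up the current, most recent file metadata; specifically, this can be done by backing up the inode. For a file-level snapshot, the file's inode is backed up. For a file-system-level snapshot, the file system's inodes are backed up, which include both the files' inodes and the file system's root inode.
(2) Update the version number in the inode. If the write mode set in the client server is ROW, the updated version number is saved in the inode that was backed up. If the write mode set in the client server is COW, the updated version number is saved in the inode produced by the backup; optionally, the inode that was backed up may also record the updated version number. For example, if inode A is backed up to produce inode B, then inode A is the inode that was backed up and inode B is the inode produced by the backup.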
步骤20中生成了快照ID。文件版本号和快照ID存在对应关系,而快照ID又和快照时间对应,因此也可以认为文件版本号和快照时间有对应关系。对应关系是指,每个文件版本号对应有唯一的一个快照ID,以及文件版本号的变化规律和快照ID相似。例如:快照ID越大,其文件版本号越大;或者快照ID越大,其文件版本号越小。在多个快照之间,快照时间越晚的快照其ID也越大。
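Note that in block-based write methods, SAN for example, a volume is identified by a volume ID rather than a file name. The volume ID plays a role similar to the FID. A volume also has no concept comparable to a file path. Step 22 therefore no longer needs to query the mapping table: the volume metadata can be queried directly by volume ID to obtain the volume version number.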
步骤22,客户服务器使用文件名查询映射关系表,获得待写数据所在文件的文件编号(FID);根据FID查询文件元数据,获得文件版本号。
映射关系表记录了文件名和FID的映射关系,文件名和FID一一对应。映射关系表的存储位置可以携带在文件写请求中,由客户服务器从写请求中获得。映射关系表也可以由客户服务器预先存储在客户服务器中,客户服务器根据文件路径找到映射关系表。映射关系表还可以存储在其他存储设备中。
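The obtained file version number can also be updated into the metadata. After the update, the file metadata records the FID and the file version number, so the file version number can be looked up in the file metadata using the FID. The file metadata can be stored in the inode information. The file path indicates where the inode is stored. As described above, for ROW the version number is stored in the inode that was backed up, so this step reads the inode that was backed up; for COW the version number is stored in the inode produced by the backup, so this step reads the inode produced by the backup.
The file version number and the file snapshot ID correspond one to one: after generating a snapshot ID, the client server generates the file version number that corresponds to it. For example, the snapshot ID can be used directly as the file version number, or the file version number can be computed from the snapshot ID. If a snapshot created later has a larger snapshot ID, one option is that a later snapshot has a larger file version number; another option is that a later snapshot has a smaller file version number.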
在本发明各实施例中,有时候也采用待写条带版本号。待写条带版本号是待写条带所属文件的文件版本号。也就是说,来自同一文件的不同条带,其条带版本号相同。类似的,对象版本号(或者对象的版本号),是待写条带所属文件的文件版本号。也就是说,来自同一文件的不同对象,其对象版本号相同。
步骤23,客户服务器将待写数据拆分成包括待写条带(strip)在内的多个条带。按照条带分布策略,获得待写条带偏移量以及待写条带所属对象的ID,待写条带所属对象的ID也称为对象ID。
客户服务器按照条带大小(Size)把数据拆分成一个或者多个条带。条带是一定大小的数据。其中,当待写数据小于或者等于单个条带的大小时,拆分成1个条带;否则拆分成多个条带。同一个文件拆分出的条带大小相同。条带大小(Size)可以保存在文件元数据中,在这种情况下,不同的文件可以使用不同的条带大小。条带大小也可以不保存在对象所属于的文件的元数据中,而是整个文件系统中的文件共用一个条带大小,在这种情况下,不同文件使用相同的条带大小,条带大小保存在文件系统的根inode中。对象可以看做一个容器,可以容纳条带。
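For example, if the data to be written is split into several data strips, the strips in this step are those data strips. Alternatively, if several parity strips are also generated after the split to protect the data strips, the strips in this step include both the data strips and the parity strips.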
每个对象中拥有的条带总数可以保存在文件元数据中,在这种情况下,不同的文件的对象拥有的条带总数可以是不同的。每个对象中拥有的条带总数也可以不保存在对象所属于的文件的元数据中,在这种情况下,不同文件的对象拥有的条带总数是相同的。
需要说明的是,由待写数据偏移量可以知道待写数据在文件中的起始位置。由待写数据偏移量和待写数据大小可以知道待写数据在文件中的结束位置。如果待写数据的起始位置不是条带大小整数倍,或者结束位置的偏移量加1的值不是条带大小的整数倍,先按照条带大小对待写数据进行拆分,拆分的边界是条带大小整数倍。如果拆分后产生大小不足一个条带的数据(这种数据也可以称为条带的脏数据),将其补齐形成条带。由于本步骤的补齐操作,在没有特别说明的情况下,后续步骤中提到的条带、条带偏移量都是指补齐后的条带、条带偏移量。
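For example, the data to be written occupies the offset range 4KB-300KB and the strip size is 256KB. The data is split at the strip boundaries 0KB and 256KB, forming two data blocks whose offset ranges in the file are 4KB-255KB and 256KB-300KB. The two blocks are then padded to form two strips of 256KB each. The data that pads the first block (offsets 0KB-3KB) comes from the first strip, and the data that pads the second block (offsets 301KB-511KB, that is, 511KB - 300KB = 211KB) comes from the second strip. The offset of the data to be written is its relative position within the file.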
另外一种补齐办法是:如果待写数据的起始位置不是条带大小整数倍,或者结束偏移量加1的值不是条带大小整数倍,可以对条带进行补齐操作。使得拆分后的条带大小一致,并且条带中不存在空白。可以把OSD中已经存储的数据读取出来作为补齐用的数据。
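For example, the data to be written occupies the offset range 4KB-300KB and the strip size is 256KB. The data can be padded to form data with the offset range 0KB-511KB, which is then split into two strips, 0KB-255KB and 256KB-511KB, so that each strip is 256KB in size.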
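The alignment described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `aligned_strips` is hypothetical, offsets are taken in bytes, and range ends are exclusive (so the inclusive range 4KB-300KB becomes an offset of 4KB and a size of 297KB).

```python
KB = 1024


def aligned_strips(offset, size, strip_size):
    """Return the (start, end) byte ranges, end exclusive, of the padded
    strips that cover a write of `size` bytes starting at `offset`."""
    if size <= 0 or strip_size <= 0:
        raise ValueError("size and strip_size must be positive")
    # Round the start down and the end up to the nearest strip boundary.
    first = (offset // strip_size) * strip_size
    last = -(-(offset + size) // strip_size) * strip_size
    return [(start, start + strip_size) for start in range(first, last, strip_size)]


# The example from the text: data at 4KB..300KB (inclusive), 256KB strips.
strips = aligned_strips(4 * KB, 297 * KB, 256 * KB)
```

Everything inside a returned range but outside the written data is the part that would have to be padded, for instance with data already stored in the OSD.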
条带分布策略由客户服务器的文件系统提供。描述了条带所属于的对象,也就是条带和对象的对应关系。具体而言,可以是条带的偏移量和对象的对应关系。
对象ID唯一标记了一个对象,属于同一个文件的对象的ID不同,不同文件的对象的ID也不同。
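Optionally, the object ID can correspond to the FID of the file the object belongs to. In that case, the file an object comes from can be determined from its object ID.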
例如:一种可选的对象ID生成方式是,对象ID由64位二进制数组成,其中,前32位是对象所属文件的ID,后32位由客户服务器赋予,后32位在文件内唯一,同一个文件不同对象的后32位不同,例如使用文件内的对象编号。在这种方式中,由对象ID的前32位即可获知对应的FID。类似的,在块(block)系统中,也可以建立对象ID和卷ID的关系。
另外一种可选的对象ID生成方式是:对象ID由48位二进制数组成,前16位和文件对应,不同文件前16位不同;后32位由客户服务器赋予,后32位在文件内唯一,同一个文件不同对象的后32位不同。
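The 64-bit layout described above (upper 32 bits for the file ID, lower 32 bits unique within the file) can be sketched as below. The function names are hypothetical, and using the in-file object number for the lower half is just one of the options the text mentions.

```python
MASK_32 = 0xFFFFFFFF


def make_object_id(fid, object_number):
    """Pack a 32-bit FID and a 32-bit in-file object number into a 64-bit ID."""
    if not (0 <= fid <= MASK_32 and 0 <= object_number <= MASK_32):
        raise ValueError("fid and object_number must each fit in 32 bits")
    return (fid << 32) | object_number


def fid_of(object_id):
    """Recover the FID from the upper 32 bits of the object ID."""
    return object_id >> 32


def object_number_of(object_id):
    """Recover the in-file object number from the lower 32 bits."""
    return object_id & MASK_32


oid = make_object_id(7, 2)
```

With this layout the file an object belongs to can be read straight from its ID, which is the correspondence between object ID and FID mentioned above.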
In other embodiments, there may be no correspondence between the object ID and the FID of the file the object belongs to.
图3A与图3B示例了两种不同的条带分布策略。条带索引描述了条带在文件中的偏移量关系,条带索引是大于等于0的整数,最小的条带索引是0,第二的条带索引是1,第三的条带索引是2,……,以此类推。索引数值相邻的2个条带,在文件中的偏移量也相邻。
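One optional strip distribution policy is shown in FIG. 3A: (1) objects belonging to the same file have a fixed size; since the strips of a file are all the same size, every object holds the same number of strips; (2) strips fill objects in index order, filling one object before moving to the next, so that, ordered by offset, a run of consecutive strips belongs to the same object. In FIG. 3A each object consists of exactly 4 strips. Example: with a strip size of 256KB and 4 strips per object, the object size is 256KB × 4 = 1024KB. The 1st object holds strips 0-3, the 2nd object holds strips 4-7, the 3rd object holds strips 8-11, and so on. Correspondingly, the ID of the first object is 0, the ID of the second object is 1, the ID of the third object is 2, and so on.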
用条带偏移量描述条带在对象内的相对位置,具体而言,可以是条带的起始数据在对象内的相对位置。条带偏移量=(条带索引%对象内的条带数量)×条带大小。其中,%是指取前项除以后项,然后取余数。因此,“条带索引%对象内的条带数量”的值是计算条带索引除以对象内的条带数量的余数。
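A small sketch of the FIG. 3A policy follows. The offset formula is the one given in the text; the object number as the integer quotient follows from the example (strips 0-3 in the first object, 4-7 in the second). The function name is hypothetical.

```python
KB = 1024


def locate_fixed_size(strip_index, strips_per_object, strip_size):
    """FIG. 3A style: consecutive strips fill one object before the next.

    Returns (object number within the file, strip offset within the object).
    """
    object_number = strip_index // strips_per_object
    strip_offset = (strip_index % strips_per_object) * strip_size
    return object_number, strip_offset


# Strip 5 with 4 strips of 256KB per object lands in the second object
# (number 1), one strip size into that object.
location = locate_fixed_size(5, 4, 256 * KB)
```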
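Another optional strip distribution policy is shown in FIG. 3B: (1) the size of the objects in a file is not fixed, that is, different objects of the same file can hold different numbers of strips; (2) the total number of objects is fixed, that is, different files have the same number of objects; in FIG. 3B there are 3 objects in total. Example: with a strip size of 256KB and the number of objects fixed at 3, the 1st strip (strip 0) is in the first object (object 0), the 2nd strip (strip 1) is in the second object (object 1), and so on; the 4th strip (strip 3) is again in the first object and the 5th strip (strip 4) is again in the second object. The strip index is an integer greater than or equal to 0 that describes the position of the strips relative to one another in the file, and it also determines each strip's offset within its object. The object number within the file can be the remainder of the strip index divided by the number of objects in the file. The formulas can be: object number within the file = strip index % number of objects in the file; strip offset = (strip index / number of objects) × strip size, where the division keeps only the integer quotient.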
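The FIG. 3B policy can be sketched the same way, using the two formulas from the text with integer division. The function name is hypothetical.

```python
KB = 1024


def locate_round_robin(strip_index, object_count, strip_size):
    """FIG. 3B style: strips are dealt out to a fixed number of objects in turn.

    Returns (object number within the file, strip offset within the object).
    """
    object_number = strip_index % object_count
    strip_offset = (strip_index // object_count) * strip_size
    return object_number, strip_offset


# With 3 objects, strip 4 goes to the second object (number 1) and is the
# second strip stored there.
location = locate_round_robin(4, 3, 256 * KB)
```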
条带索引可以由待写数据的偏移量确定。例如:对整个文件而言,其拆分后起始数据位于第一个对象的条带(条带0),而本次待写数据偏移量位于对象1的第5个条带(条带4)。那么由待写数据拆分生成的条带中,第一个条带的索引就是4,其余条带的索引依次类推。
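Two schemes for computing the ID of the object a strip belongs to have been described above. Depending on the strip distribution policy, other implementations are possible; different distribution policies may use different parameters, and these parameters can usually be obtained by querying the client server.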
由于各个条带的处理方式相同,因此下面仅以某一个“待写条带”为例进行介绍。
步骤24,客户服务器选择用于存储待写条带的OSD。
具体而言,本步骤可以由客户服务器的对象存储客户端执行。
一种可选的算法是根据待写条带的FID确定存储待写条带的OSD。例如:FID的哈希值除以OSD总数,余数作为存储待写条带的OSD的编号。也就是FID的哈希值对OSD总数取模。还可以有其他方案,例如由客户服务器选择任意一个OSD存储属于某个对象的待写条带。属于同一对象的条带可以存储到同一个OSD中。
此外,也可以根据待写条带的FID和对象ID共同确定存储条带的OSD。实际上,算法可以任意选择,只要能选择一个OSD出来即可。
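The hash-and-modulo selection mentioned above can be sketched as follows. The text does not name a hash function; CRC32 from the Python standard library is used here purely as an assumption, and `select_osd` is a hypothetical name.

```python
import zlib


def select_osd(fid, osd_count):
    """Pick an OSD number as hash(FID) modulo the number of OSDs.

    CRC32 is an illustrative choice of hash; any deterministic hash works,
    as long as reads and writes of the same strip use the same one.
    """
    if osd_count <= 0:
        raise ValueError("osd_count must be positive")
    digest = zlib.crc32(fid.to_bytes(8, "big"))
    return digest % osd_count


osd_number = select_osd(42, 5)
```

Because the result depends only on the FID and the OSD count, a later read of the same strip can recompute the same OSD without any lookup table.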
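Step 25: the client server sends a strip write request to the OSD. The strip write request carries the strip to be written, the version number of the strip to be written, the offset of the strip to be written, and the ID of the object to which the strip belongs. Optionally, it may also carry the size of the strip to be written.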
可选的,OSD既支持ROW也支持COW的情况下,还可以发送写模式,以便OSD按照客户服务器指定的写模式写入待写条带。写模式是ROW或者COW。如果OSD仅支持一种写模式,则可以不用发送写模式给OSD。
步骤26,OSD接收条带写请求,把待写条带写入OSD的存储介质。
当OSD只支持一种写模式时,OSD可以不用确认写模式是ROW还是COW,而直接以缺省写模式把待写条带写入存储介质。
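When the OSD receives data it first holds the data temporarily in a cache; in this step the data to be written in the cache can be stored to the storage medium.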
条带偏移量描述了条带在对象中的相对位置。具体而言,可以是条带的起始数据在对象中的相对位置;条带偏移量+条带大小=条带结束数据在对象中的相对位置。
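The backup-flag granularity of the data is recorded in the OSD and can be queried from the OSD using the object ID as the index. Alternatively, by default all strips received by the OSD can be stored with the same backup-flag granularity. Strips belonging to the same file use the same granularity. An actual device may support only objects, or only strips, as the backup-flag granularity; in that case the OSD can store the data directly without querying the granularity.
In the OSD, the two parameters object ID and version number together identify an object, so in this embodiment the set formed by these two parameters is called the object key parameters. Once the object is identified, the strip offset then identifies a strip. In other words, the three parameters object ID, version number and strip offset together identify a strip, so the set formed by these three parameters is called the strip key parameters.
In the OSD, the object key parameters can point to a storage location used to store the object. Specifically, they can point to a start address for the object or, optionally, to an address range for the object. Similarly, the strip key parameters can point to a start address or to an address range used to store the strip. The start address and the address range can be physical or logical addresses.
There are several possible cases when the object key parameters are used to look up the storage location of the object they identify. In one case, before receiving the strip write request, the OSD has already recorded the object key parameters carried in the request and has allocated a storage location for the object they represent. In the other case, the OSD has not recorded this set of key parameters and has not allocated a storage location for the object they represent; the OSD then allocates a storage location for this set of object key parameters after receiving the strip write request.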
对象集:对象ID相同、版本号不同的对象的集合。对象集中包括至少一个对象。对象集可以是一个逻辑概念,不用真实划分。
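Object ID: determined by the offset range, within the file, of the data the object carries. If the same file has been snapshotted several times and the data changed after each snapshot is stored to the OSD, then changed data at the same offset has the same object ID.
The OSD marks whether an object or a strip has been backed up. The backup-flag granularity can be the strip or the object: if the smallest unit marked is the strip, the granularity is the strip; if the smallest unit marked is the object, the granularity is the object.
Object backup flag: indicates whether the object identified by object ID + version number has been backed up. Specifically, it indicates whether the object with that object ID has been backed up since the snapshot corresponding to the version number was created. 1 means backed up and 0 means not backed up. A flag of 0 covers two cases: either the object identified by object ID + version number has been modified but the backup has not yet been performed, or the object identified by object ID + version number has not been modified.
Strip backup flag: indicates whether the strip identified by object ID + version number + strip offset has been backed up. Specifically, it indicates whether the strip at that object ID + strip offset has been backed up since the snapshot corresponding to the version number was created. 1 means backed up and 0 means not backed up. A flag of 0 covers two cases: either the strip identified by object ID + version number + strip offset has been modified but the backup has not yet been performed, or the strip identified by object ID + version number + strip offset has not been modified.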
通过比较对象版本号,可以确定同一个对象集中,不同对象快照时间的早晚。
OSD写入待写条带的方式一共有四种可能性:(1)写模式是ROW,备份标记粒度是条带;(2)写模式是ROW,备份标记粒度是对象;(3)写模式是COW,备份标记粒度是条带;(4)写模式是COW,备份标记粒度是对象。对于一个OSD,可以支持其中一种或者数种。下面对这四种可能性分别进行说明。
方式一:对于ROW,OSD中数据的备份标记粒度是条带。
按照条带请求中条带关键参数确定的存储位置,直接把待写条带写入OSD中。此外,写入完成后本步骤还可以把写入的条带所占用的存储位置(起始存储地址或者地址段)标记为“已写入有效数据”。条带存储在OSD的存储介质中所占用的存储位置也称为条带空间。
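Bits can be used to mark whether each strip in an object has been backed up. For example, the flag bit of the strip's storage location is set to 1, where 1 means data has been written and 0 means no data. The strip index can describe the order of the strips in the object, and one flag bit marks each strip in the object. For example, with 4 strip spaces in total, 0000 means that none of the 4 strip spaces has data written; 0010 means that only the 2nd strip space has data written; 0101 means that the 1st and 3rd strip spaces have data written and the 2nd and 4th do not.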
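The per-strip flag bits can be sketched with a plain integer bitmap. This assumes the rightmost bit stands for the 1st strip space, which is the reading that matches the 0010 and 0101 examples above; the function names are hypothetical.

```python
def mark_written(bitmap, space_number):
    """Set the flag of the given strip space (1-based, rightmost bit first)."""
    return bitmap | (1 << (space_number - 1))


def is_written(bitmap, space_number):
    """Return True if the flag of the given strip space is 1."""
    return bool(bitmap & (1 << (space_number - 1)))


# Writing the 1st and 3rd strip spaces of a 4-strip object yields 0101.
bitmap = 0
bitmap = mark_written(bitmap, 1)
bitmap = mark_written(bitmap, 3)
flags = format(bitmap, "04b")
```

Marking a space that is already flagged leaves the bitmap unchanged, so repeated writes need no special handling.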
需要说明的是,本实施例中所描述的第N(N是自然数)个条带空间,是指条带空间在条带所属对象内的相对位置,并不是指条带索引。
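As for how the number of a strip within an object is determined, the strip's offset can be used, for example: the smaller the offset, the smaller the strip number; adjacent strips have numbers that differ by 1, and the smallest strip number is 0. If the strip distribution policy is the one described for FIG. 3A, a quick way to compute a strip's number is: strip number = strip offset / strip size, where the strip offset is the strip's offset within the object. If this strip space has already been marked as "backed up", this step need not mark it again and can leave the flag unchanged.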
方式二:对于ROW,OSD中数据的备份标记粒度是对象。
和方式一相比,方式二判断备份标记的粒度不同,由判断条带的标识位变成判断对象的标识位。
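The object key parameters carried in the strip write request are used to query the OSD's write records to judge whether the storage location they point to holds valid data. This embodiment can use a flag bit to determine whether a storage location holds valid data: for example, a flag of 1 means valid data is stored and a flag of 0 means the location holds no valid data. By checking the flag of the storage location that the object key parameters point to, the OSD can determine whether the strip write request just received is the first write to this object since the snapshot was created. For example, a flag of 0, or no flag found, indicates the first write after the snapshot; a flag of 1 indicates it is not the first write after the snapshot.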
如果不是对这个对象快照后的第一次写入,则直接把待写条带写入这个对象占用的存储位置,具体写入位置可以由条带关键参数确定。
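If the strip write request is the first write request received for this object since the snapshot, the strip to be written in the request is combined with strips obtained from another object in the OSD to assemble a complete object, called the stitched object. Specifically, the remaining strips come from the object that, among the objects holding valid data, has the largest version number that is still smaller than the version number carried in the strip write request.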
也就是说从属于所述待写条带的对象ID的对象集、拥有有效数据的对象中,选择版本号最大的对象,从中获得与所述待写条带偏移量不同的条带,和所述待写条带共同组成拼接对象。所述OSD中存储的和所述待写条带的对象ID相同、且版本号不同的对象的集合称为所述待写条带的对象ID的对象集。写模式是ROW时,快照时间越晚则对应的对象版本号越大。待写条带的对象ID就是待写条带所属对象的ID。
举例:每个对象由32个条带组成,而OSD收到的待写条带是其中的第15个,对于余下的31个条带,也就是第1-第14,以及第16-第32个条带所来自的对象是:前一次快照后记录到OSD中的、拥有有效数据的相同对象ID的对象。
写入完成后,把这个对象的标识位记录为已经备份,例如把标识位设置为1。这意味着已经完成了快照后的首次条带写入操作,也就说,如果在下一次快照前再次对这个对象中的任意条带进行写入,将不再是快照后对这个对象的第一次写入,因此无需备份操作,直接写入条带即可。
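As described above, one object ID can have several objects, one for each snapshot ID. These objects are written to the OSD at different times; objects written at adjacent times have adjacent version numbers, and the later the write time, the larger the version number.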
在完成本次写操作后,本次新写入的对象成为对象集中新的成员。
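Mode two (ROW, object granularity) can be sketched as below. This is an illustration under stated assumptions, not the patent's implementation: the OSD is modelled as a dictionary from (object ID, version number) to a list of strips, the presence of a key plays the role of the flag bit being 1, and the function name is hypothetical.

```python
def row_write_object_granularity(store, object_id, version, strip_number,
                                 strip, strips_per_object):
    """Write one strip in ROW mode with per-object backup flags.

    store maps (object_id, version) -> list of strips (None = no data).
    """
    key = (object_id, version)
    if key in store:
        # Not the first write since the snapshot: write in place.
        store[key][strip_number] = strip
        return
    # First write since the snapshot: build a stitched object from the
    # newest older object that holds data, then overlay the new strip.
    older = [v for (oid, v) in store if oid == object_id and v < version]
    if older:
        stitched = list(store[(object_id, max(older))])
    else:
        stitched = [None] * strips_per_object
    stitched[strip_number] = strip
    store[key] = stitched


store = {}
row_write_object_granularity(store, 1, 0, 0, b"a", 4)  # before any snapshot
row_write_object_granularity(store, 1, 0, 1, b"b", 4)
row_write_object_granularity(store, 1, 2, 1, b"B", 4)  # first write after snapshot 2
```

After these calls the version-0 object still holds the pre-snapshot strips, while the version-2 object holds the stitched result.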
方式三:写模式是COW,数据的备份标记粒度是条带。
用条带写请求中的对象关键参数加上条带偏移量可以确定一个存储位置。先检测待写条带的关键参数所确定的存储位置是否已经存储有数据,如果判断结果为否或者没有找到记录,则意味着本次写请求是在创建快照后的首次写请求,需要先进行备份操作再写入待写条带。
通常情况下,在进行下一次快照之前,仅收到第一次条带写请求后,需要进行条带数据的备份,备份到待写条带的对象ID、待写条带版本号以及待写条带偏移量共同确定的存储位置。因此,需要先把OSD已存储的最新条带备份到待写条带的关键参数指向的存储位置,再在备份出数据的存储位置中,写入本次收到的条带。OSD已存储的最新条带是最近一次由客户服务器发送来的条带。本实施例中,它是OSD中已存储的条带中,拥有待写条带的对象ID、版本号是0、且偏移量和待写条带偏移量相同的条带。后续再收到条带写请求可以直接执行待写条带的写操作,不用再做备份。
在COW中,OSD已存储的最新对象始终使用同一个版本号,例如使用0或者空(NULL)作为版本号,本实施例称之为初始版本号。在对象集的其余对象中,除了初始版本号之外的版本号中,版本号越小的对象,其对应的快照时间越晚。
在ROW或者COW中,文件第一次快照之前,写入数据到OSD时,使用的条带版本号就是初始版本号。初始版本号的值可以使用0或者空(NULL)。
在完成备份操作后,将条带写请求中携带的条带关键参数指向存储位置标记为已经存储有数据。在下一次快照之前,如果OSD再次收到对这个待写偏移量位置的COW写请求,可以不再迁移数据,以覆盖的方式把收到的条带写入版本号为0对象中、待写条带偏移量占用的存储位置。换句话说,把待写条带写入由待写条带对象ID、初始版本号以及待写条带偏移量所确定的存储位置。
此外,本步骤还可以把写入待写条带的存储位置标记为已写入有效数据,具体标记方法可以参见方式一。
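Mode three (COW, strip granularity) can be sketched in the same style. The assumptions are: strips live in a dictionary keyed by (object ID, version number, strip offset), a separate set records which strip key parameters have been flagged, the initial version number is 0, and all names are hypothetical.

```python
INITIAL_VERSION = 0  # assumed value of the initial version number


def cow_write_strip_granularity(strips, flagged, object_id, version,
                                strip_offset, strip):
    """Write one strip in COW mode with per-strip backup flags.

    strips maps (object_id, version, strip_offset) -> bytes.
    flagged holds the strip key parameters already marked as backed up.
    """
    latest_key = (object_id, INITIAL_VERSION, strip_offset)
    backup_key = (object_id, version, strip_offset)
    if version != INITIAL_VERSION and backup_key not in flagged:
        # First write since the snapshot: move the current data to the
        # location named by the request's key parameters, then flag it.
        if latest_key in strips:
            strips[backup_key] = strips[latest_key]
        flagged.add(backup_key)
    # The newest data always lives under the initial version number.
    strips[latest_key] = strip


strips, flagged = {}, set()
cow_write_strip_granularity(strips, flagged, 1, INITIAL_VERSION, 0, b"old")
cow_write_strip_granularity(strips, flagged, 1, 1, 0, b"new1")  # backs up b"old"
cow_write_strip_granularity(strips, flagged, 1, 1, 0, b"new2")  # overwrite only
```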
方式四,写模式是COW,OSD中数据的备份标记粒度是对象。
方式四和方式三的区别是:数据备份标记粒度由条带变成了对象,而且备份的粒度也由条带变成了对象。
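The object key parameters in the strip write request determine a storage location. The OSD uses the object key parameters to query its write records and judges whether the storage location that the object key parameters of the strip to be written point to holds valid data. As described for mode one, this embodiment can mark objects with a flag bit: for example, a flag of 1 means valid data is stored, while a flag of 0, or no flag found for the object key parameters in the OSD's write records, means no valid data is stored.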
通常情况下,在进行下一次快照之前,仅收到第一次条带写请求后,需要进行对象数据的备份。具体而言:如果存储有有效数据,意味着在创建快照后,已经对待写条带的对象ID以及待写条带版本号所共同确定的对象做过备份,不需要再次备份;如果没有存储有效数据或者在OSD中没有找到条带写请求中的对象关键参数的记录,意味着本步骤中需要先进行备份,然后才能写入本次收到的条带写请求中的待写条带。
如果对象关键参数所指向的存储位置已经存储有有效数据。则直接把待写条带写入待写对象ID、版本号0以及待写条带偏移量共同确定的位置。
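If the storage location that the object key parameters point to holds no valid data, all strips in the version-0 object are first backed up to the storage location that the object key parameters in the strip write request point to. After the backup, the storage location that the object key parameters in the strip write request point to is flagged as 1. The strip to be written is then written into the storage location originally occupied by the version-0 object; the write location is determined jointly by the object ID of the strip to be written, the initial version number and the offset of the strip to be written.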
步骤26执行完成后,OSD发送待写条带被存储成功的响应消息给客户服务器。
需要说明的是,步骤26在发生下一次快照之前执行。也就是说,步骤21-26都是第一次快照之后,下一次快照之前执行。步骤21-26是把待写条带写入OSD的流程。下面介绍如何把已经写入OSD的条带读出来,读、写过程是相对独立的两个方法。
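Step 27: the client server receives a file read request, which carries the file name, the size of the data to be read and the offset of the data to be read.
As with the file write request, the file read request can also carry the file path, which records the storage location of the mapping table. The file path and the file name together uniquely identify a file.
Specifically, this step can be performed by the file system program of the client server. The file read request is a read request that the file system can recognize. It requests either a complete file or part of a file's data.
The offset of the data to be read describes the relative position of that data within the file. Specifically, it can describe the distance from the start of the file to the start of the data to be read. An offset of 0 means the data to be read starts at the start of the file; an offset of 2KB means the data to be read starts 2KB of data after the start of the file.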
可选的,文件读请求还可以携带文件路径,文件路径记录了映射关系表的存储位置。映射关系表的细节参见步骤21的介绍。
文件名可能是待读数据所在文件的文件名,也可能是待读数据所在文件的一个快照的文件名。如果是前者,说明文件读请求希望访问的是最新的待读数据;如果是后者,说明文件读请求希望访问的是某个快照的待读数据。
步骤28,客户服务器使用文件名查询映射关系表,获得待读数据所在文件的FID;根据FID查询文件元数据,获得文件版本号。
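If the file name is the name of the file containing the data to be read, the file path at which the mapping table is stored is the file path of that file. The metadata is queried with the file's FID to obtain the file version number.
If the file name is the name of a snapshot, the file path of the mapping table is the path where the snapshot file is located. The metadata is queried with the snapshot file's FID to obtain the file version number.
The mapping table records the mapping between file names and FIDs, which correspond one to one. For the FID and its relationship to the file version number, see steps 21 and 22. The storage location of the mapping table can be carried in the file read request and obtained by the client server from the read request. The mapping table can also be stored in advance in the client server, which finds it from the file path. The mapping table can also be stored in another storage device.
As described in step 22, depending on the case, the metadata may be stored in the file's inode or in the file system's root inode.
The snapshot ID and the file version number correspond one to one, so the client server can obtain the file version number from the snapshot ID. This correspondence can be stored in the file metadata.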
步骤29,客户服务器将文件读请求处理转换成包括条带读请求在内的多个读请求。每个条带读请求用于请求读出一个条带,条带读请求用于向OSD请求读出待读条带。确定每个读请求对应的对象ID。条带读请求中携带:待读条带版本号、待读条带偏移量、待读条带大小以及待读条带的对象ID。
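Specifically, the offset of every strip that needs to be read, including the strip to be read, can be determined from the size and the offset of the data to be read.
Refer to the strip generation method of step 23: given the strip size, the data to be written is split into strips according to its offset and length, which yields the offsets of the strips to be written. In the same way, this step obtains the offset of every strip that needs to be read from the strip size, the offset of the data to be read and the length of the data to be read. The strip size can come from the file's inode, in which case different files can use different strip sizes. Alternatively, all files in the system can share one strip size.
After the offset of the strip to be read is obtained, the ID of the object containing the strip to be read can be obtained in the same way as in step 23. Note that whether the file name is the name of the file containing the data to be read or the name of a snapshot, the FID used to determine the object ID for a read request is always the FID of the file containing the data to be read.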
步骤30,客户服务器选择用于发送条带读请求的OSD。
具体而言,本步骤可以由客户服务器的对象存储客户端执行。
同一个条带的条带读请求和条带写请求必须对应到同一个OSD。一种可行的办法是:使用和步骤24相同的OSD选择算法。
步骤31,客户服务器发送条带读请求给步骤30选出的OSD。
待读条带版本号实际上是待读条带所属文件的版本号。
可选的,还可以发送写模式给OSD,写模式和步骤25中条带写请求中携带写模式保持一致。待读条带的对象ID就是待读条带所属对象的ID。
步骤32,OSD接收条带读请求,查找待读条带的存储位置,把待读条带发送给客户服务器。
待读条带的存储位置可以是待读条带的起始地址,从起始地址开始读起,读出一个条带大小的数据,被读出的数据就是待读条带。
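As described in step 26, a strip may have been written in several ways, so the OSD reads the strip in the corresponding way; the ways are again described separately below. Whether a strip or object has been backed up can be judged with the flag bit introduced in step 26: for example, a flag of 1 means backed up and a flag of 0 means not backed up.
For COW there is one special case: if the version number carried in the strip read request is the initial version number, the strip is read differently from the other cases. The initial version number is treated as the largest version number (even if its value is 0). Therefore, for the version-0 case described in step 26, since it is already the largest version number, there is no need to judge whether the strip identified by the key parameters of the strip to be read has been backed up; the data at that storage location is read directly and sent to the client server as the strip to be read. Apart from this special case, the strip is read in one of the following two ways.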
方式一,OSD中数据的备份标记粒度是条带。
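Judge whether the strip identified by the strip key parameters carried in the strip read request has been backed up. In other words, judge whether the strip at the storage location determined by the object ID, the version number and the offset of the strip to be read has been backed up. In this step, the offset of the strip to be read can be converted into the number of the strip within its object; for the conversion method, see mode one of step 26.
If it has been backed up, the strip determined by the object ID, the version number and the offset of the strip to be read is read and sent to the client server as the strip to be read.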
如果没有备份,则判断待读条带的对象ID的对象集中,前一个快照对象中的条带数据是否存在有效数据,直至找到有效条带数据返回。
具体而言,是从属于所述待读条带的对象ID的对象集、且快照时间比所述待读条带的快照时间早的对象中,使用待读条带偏移量,按照对象的快照时间由晚到早的顺序逐个对象进行查找,直至找到标记为已备份的条带,把找到的条带作为待读条带发送给所述客户服务器。其中,对象的快照时间是指生成这个对象前,文件或者包含这个文件的文件系统最晚一次快照的时间。
如果快照时间越晚快照版本号越大。那么按照对象的快照时间由晚到早进行查找,具体而言:对于ROW,按照版本号从大到小的顺序逐个进行查找;对于COW,从版本号从小到大顺序逐个进行查找。
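Of course, if the version numbers used when strips were written to the OSD are such that a later snapshot time means a smaller version number, the search order in this step is reversed.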
方式二,OSD中数据的备份标记粒度是对象。
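The difference between this mode and mode one is that the backup-flag granularity changes from the strip to the object.
Judge whether the storage location determined by the object key parameters carried in the strip read request holds valid data. In other words, judge whether the storage location (object space) determined by the object ID and the version number of the strip to be read holds valid data.
If it holds valid data, the valid data determined by the object ID, the version number and the offset of the strip to be read is read and sent to the client server as the data to be read.
If it holds no valid data, then, in a manner similar to mode one of this step, the objects in the object set are searched one by one in order of snapshot version number from small to large until a snapshot object holding valid data is found. The strip to be read is read from it at the offset of the strip to be read and sent to the client server.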
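FIG. 4 shows a ROW-based strip read scheme. As shown, file A consists of object 1, object 2 and object 3. After these objects are first stored to the OSD, their version number is 0. In FIG. 4, object 1.0 denotes the object whose object ID is 1 and whose version number is 0; similarly, object 3.2 denotes the object whose object ID is 3 and whose version number is 2. An object drawn with a solid line has a backup; an object drawn with a dashed line has no backup.
In this example, after the first snapshot (version number 1), the data of object 1 is not updated: object 2 and object 3 have backups and object 1 has none. After the second snapshot (version number 2), object 3 has a backup, and object 1 and object 2 have none. After the third snapshot (version number 3), object 1 has a backup, and objects 2 and 3 have none.
From the concept of the object set, the object set containing object 1.0 includes object 1.0 and object 1.3. The object set containing object 2.0 includes object 2.0 and object 2.1. The object set containing object 3.0 includes object 3.0, object 3.1 and object 3.2.
The arrows in FIG. 4 mark the lookup relationships between objects. If a strip read request asks for a strip in object 1.2, the figure shows that this object has no backup; searching in order of version number from large to small, object 1.0 has a backup, so the strip in object 1.0 is read. By the same reasoning, if a strip read request asks for the strip to be read in object 2.2 or object 2.3, the data actually read is that in object 2.1. Of course, if a strip read request asks for data in object 1.3, object 2.1 or object 3.2, these objects have backups, so the data can be read directly.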
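The ROW lookup of FIG. 4 can be sketched as follows. The object sets are those listed in the text; the function name and the representation of each object set as a set of backed-up version numbers are assumptions made for illustration.

```python
def row_resolve_version(backed_up_versions, requested_version):
    """Return the version to read from in ROW mode: the requested version
    if it has a backup, otherwise the largest backed-up version below it."""
    candidates = [v for v in backed_up_versions if v <= requested_version]
    if not candidates:
        raise LookupError("no backed-up object at or below the requested version")
    return max(candidates)


# Object sets from FIG. 4, keyed by object ID.
figure4 = {1: {0, 3}, 2: {0, 1}, 3: {0, 1, 2}}

resolved = row_resolve_version(figure4[1], 2)  # object 1.2 has no backup
```

Here a request for object 1.2 resolves to version 0, and a request for object 2.2 or 2.3 resolves to version 1, matching the arrows described above.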
图5是基于COW的读条带方案,和图4不同的是,查找的顺序相反,按照版本号由小到大的顺序进行查找。
如果备份标记粒度是条带,其原理和图4、图5相似,区别在于备份标记所标记的目标物是对象中的条带而不是对象。
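Using mode one or mode two above, the client server receives the data returned for the strip read request and for the other read requests, and concatenates it to produce the data to be read.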
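FIG. 6 shows the hardware that performs the above method. The interface 413 of the client server 41 is connected to the interface 423 of the object storage device 42. The client server 41 consists of a processor 411, a storage medium 412 and an interface 413; the processor 411 is connected to the storage medium 412 and the interface 413. The storage medium 412 is, for example, memory, and stores a computer program. The processor 411 runs the program in the storage medium 412 and performs the steps of the above method that are performed by the client server. The interface 413 provides the interface to the OSD, for example to send strip read requests or strip write requests to the OSD. The client server 41 need not have persistent storage; that is, all information that the above method requires the client server 41 to record can be recorded in the volatile storage medium 412 of the client server 41.
The OSD 42 consists of a processor 421, a storage medium 422, an interface 423 and a hard disk 424. The processor 421 is connected to the storage medium 422 and the interface 423; the hard disk 424 is connected to the storage medium 422. The storage medium 422 can be a volatile medium, for example memory, and stores a computer program. The processor 421 runs the program in the storage medium 422 and performs the steps of the above method that are performed by the OSD. The interface 423 provides the interface to the client server, for example to receive the strip read requests or strip write requests sent by the client server. The hard disk 424 provides persistent storage for strips, for example physical storage space for strips or objects to be written and storage for strips or objects to be read; it is usually a non-volatile storage medium. The hard disk 424 can be replaced by other media such as flash memory or a rewritable optical disc.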
参见图7,是本发明实施例数据处理系统的结构图。
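The data processing system consists of a client service apparatus 51 and an object storage apparatus 52. The client service apparatus 51 can be a physical device such as a server, or a virtual module implemented by software running on a server; the object storage apparatus 52 can be a physical device such as an object storage device, or a virtual module implemented by software running on an object storage device. The client service apparatus 51 can be used to perform the steps of the above method that are performed by the client server; the object storage apparatus 52 can be used to perform the steps of the above method that are performed by the object storage device.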
客户服务装置51包括条带请求生成模块511、和条带请求生成模块511连接的条带请求发送模块512。可选的,还可以包括与条带请求生成模块511连接的快照模块513。
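The object storage apparatus 52 includes a strip request receiving module 521, and a strip storage module 522 and a strip reading module 523 that are connected to the strip request receiving module 521. The strip reading module is not required when only the strip storage function is implemented, and the strip storage module is not required when only the strip reading function is implemented. The strip request receiving module 521 is connected to the strip request sending module 512.
The functions of the modules are described in detail below.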
快照模块513,用于创建快照,快照的目标包括文件,为快照分配快照ID。
创建快照包括2种方式,一种是对文件创建快照,快照的目标是单个文件。另外一种是对文件系统创建快照,快照的目标是整个文件系统,文 件系统中包括多个文件。这两种方式下,文件元数据的保存位置不同。
在创建文件快照的方式中,选定文件创建快照,给文件设置快照名,如果快照名是没有使用过的,为文件的快照分配快照ID。并将文件快照ID作为文件的元数据保存在文件的inode(index node)中。需要说明的是,快照ID是快照的标记,例如使用创造快照的时间点作为快照ID。或者按创建快照的时间点的先后顺序,使用递增的数字作为快照ID。
在创建文件系统快照的方式中,选定文件系统进行快照,如果快照名是没有使用过的,则为文件系统的快照分配快照ID。然后把分配的快照ID保存在文件系统的根inode中。在这种方式中,可以认为文件系统中各个文件的快照ID和文件系统的快照ID相同。和前一种方式所不同的是,文件的快照ID不是保存在文件的inode中,而是保存在文件系统的根inode中。
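In addition to the file snapshot ID, the file metadata also includes the file identification (FID). The file metadata may also include information such as the file size and the write time.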
需要说明的是,快照模块513是可选的。本发明实施例主要描述在创建一次快照之后,以及创建下一次快照之前,客户服务装置和对象存储装置的操作。
条带请求生成模块511,用于接收文件写请求,文件写请求中携带有待写数据、待写数据偏移量以及文件名。待写数据是文件的一部分。
具体而言,条带请求生成模块511的功能可以是由客户服务器的文件系统程序执行。文件写请求是能够被文件系统识别的写请求。文件写请求可以是创建某个文件,或者使用待写数据对已经存在的文件进行更新,待写数据是文件的一部分或者文件的全部。
所述文件写请求还可以携带待写数据大小,以便后续根据待写数据偏移量把待写数据拆分成条带。也可以不携带待写数据大小,因为待写数据大小可以通过对待写数据进行测量获得。
待写数据偏移量描述了待写数据在文件内的相对位置。具体而言,待 写数据偏移量可以描述待写数据的起始位置相对于文件头的距离。如果待写数据偏移量是0,表示待写数据的起始位置是待写文件的起始位置。如果待写数据偏移量是1KB,表示待写数据的起始位置距离文件的起始位置1KB的数据大小。
可选的,文件写请求还可以携带文件写请求的文件路径,文件路径指示了文件以及映射关系表的存储位置。文件路径和文件名共同确定一个文件。例如文件路径和文件名的组合是/root/mma/a1,其中/root/mma/是文件路径,a1是文件名,/root/mma/这个路径下存储有文件以及映射关系表。
不同的文件可以有不同的文件名。同一个文件路径下的文件名不重复。
可选的,写请求还可以携带映射关系表的存储位置,映射关系表记录了文件名和FID的映射关系。
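Before the mapping table is queried using the file name, the mapping between the snapshot ID and the file version number can be recorded by performing the following two operations.
(1) Back up the current, most recent file metadata; specifically, this can be done by backing up the inode. For a file-level snapshot, the file's inode is backed up. For a file-system snapshot, the file system's inodes are backed up, which include both the files' inodes and the file system's root inode.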
(2)更新inode中的版本号。如果客户服务器中设置的写模式是ROW,更新的版本号保存在被备份的inode中。对于客户服务器中设置的写模式是COW,更新的版本号保存在备份生成的inode中;可选的,被备份的inode也可以记录更新的版本号。例如把A inode备份生成B inode,那么A inode是被备份的inode;B inode是备份生成的inode。
文件版本号和快照ID存在对应关系,而快照ID又和快照时间对应,因此也可以认为文件版本号和快照时间有对应关系。对应关系是指,每个文件版本号对应有唯一的一个快照ID。以及文件版本号的变化规律和快照ID相似。例如:快照ID越大,其文件版本号越大。或者快照ID越大,其文件版本号越小。在多个快照之间,快照时间越晚的快照其ID也越大。
需要说明的是,在例如SAN在内的基于块系统的写数据技术中。使用 卷ID而不是文件名对卷进行标记。卷ID的作用和FID类似。此外卷没有和文件路径相类似的概念。因此,不再需要查询映射关系表,可以直接由卷ID查询卷元数据,获得文件版本号。
条带请求生成模块511,还用于使用文件名查询映射关系表,获得待写数据所在文件的文件编号(FID);根据FID查询文件元数据,获得文件版本号。
映射关系表记录了文件名和FID的映射关系,文件名和FID一一对应。映射关系表的存储位置可以携带在文件写请求中,由客户服务器从写请求中获得。映射关系表也可以由客户服务器预先存储在客户服务器中,客户服务器根据文件路径找到映射关系表。映射关系表还可以存储在其他存储设备中。
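The strip request generation module 511 can also update the obtained file version number into the metadata. After the update, the file metadata records the FID and the file version number, so the file version number can be looked up in the file metadata using the FID. The file metadata can be stored in the inode information. The file path indicates where the inode is stored. As described above, for ROW the version number is stored in the inode that was backed up, so the strip request generation module 511 reads the inode that was backed up; for COW the version number is stored in the inode produced by the backup, so the strip request generation module 511 reads the inode produced by the backup.
The file version number and the file snapshot ID correspond one to one: after generating a snapshot ID, the client server generates the file version number that corresponds to it. For example, the snapshot ID can be used directly as the file version number, or the file version number can be computed from the snapshot ID. If a snapshot created later has a larger snapshot ID, one option is that a later snapshot has a larger file version number; another option is that a later snapshot has a smaller file version number.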
条带请求生成模块511,还用于将待写数据拆分成包括待写条带(strip)在内的多个条带。按照条带分布策略,获得待写条带偏移量以及待写条带所属于的对象的ID,这个ID称为对象ID。
客户服务器按照条带大小(Size)把数据拆分成一个或者多个条带。条 带是一定大小的数据。其中,当待写数据小于或者等于单个条带的大小时,拆分成1个条带;否则拆分成多个条带。同一个文件拆分出的条带大小相同。条带大小(Size)可以保存在文件元数据中,在这种情况下,不同的文件可以使用不同的条带大小。条带大小也可以不保存在对象所属于的文件的元数据中,而是整个文件系统中的文件共用一个条带大小,在这种情况下,不同文件使用相同的条带大小,条带大小保存在文件系统根的inode中。对象可以看做一个容器,可以容纳条带。
举例:待写数据被拆分成若干个数据条带,则拆分生成的条带是指被拆分出的数据条带。或者待写数据在拆分成数据条带后,还生成若干个校验条带对数据条带进行数据保护,则拆分生成的条带既包括数据条带也包括校验条带。
每个对象中拥有的条带总数可以保存在文件元数据中,在这种情况下,不同的文件的对象拥有的条带总数可以是不同的。每个对象中拥有的条带总数也可以不保存在对象所属于的文件的元数据中,在这种情况下,不同文件的对象拥有的条带总数是相同的。
需要说明的是,由待写数据偏移量可以知道待写数据在文件中的起始位置。由待写数据偏移量和待写数据大小可以知道待写数据在文件中的结束位置。如果待写数据的起始位置不是条带大小整数倍,或者结束位置的偏移量加1的值不是条带大小的整数倍,先按照条带大小对待写数据进行拆分,拆分的边界是条带大小整数倍。如果拆分后产生大小不足一个条带的数据(这种数据也可以称为条带的脏数据),将其补齐形成条带。由于条带请求生成模块511执行的补齐操作,在没有特别说明的情况下,后续提到的条带、条带偏移量都是指补齐后的条带、条带偏移量。
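For example, the data to be written occupies the offset range 4KB-300KB and the strip size is 256KB. The data is split at the strip boundaries 0KB and 256KB, forming two data blocks whose offset ranges in the file are 4KB-255KB and 256KB-300KB. The two blocks are then padded to form two strips of 256KB each. The data that pads the first block (offsets 0KB-3KB) comes from the first strip, and the data that pads the second block (offsets 301KB-511KB, that is, 511KB - 300KB = 211KB) comes from the second strip. The offset of the data to be written is its relative position within the file.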
另外一种补齐办法是:如果待写数据的起始位置不是条带大小整数倍,或者结束偏移量加1的值不是条带大小整数倍,可以对条带进行补齐操作。使得拆分后的条带大小一致,并且条带中不存在空白。可以把OSD中已经存储的数据读取出来作为补齐用的数据。
例如:待写数据的偏移量范围是4KB-300KB,条带的大小是256KB。那么,可以把待写数据补齐后形成偏移量范围0KB-511KB的数据,然后再将其拆分成0KB-255KB和256KB-511KB共2个条带,使得每个条带的大小都是256KB。
条带分布策略由客户服务器的文件系统提供。描述了条带所属于的对象,也就是条带和对象的对应关系。具体而言,可以是条带的偏移量和对象的对应关系。
对象ID唯一标记了一个对象,属于同一个文件的对象的ID不同,不同文件的对象的ID也不同。
可选的,对象ID和对象所属的文件的FID可以存在对应关系。也就说例如,由对象ID可以知道这个对象ID所代表的对象所来自的文件。
例如:一种可选的对象ID生成方式是,对象ID由64位二进制数组成,其中,前32位是对象所属的文件的ID,后32位由客户服务器赋予,后32位在文件内唯一,同一个文件不同对象的后32位不同,例如使用文件内的对象编号。在这种方式中,由对象ID的前32位即可获知对应的FID。类似的,在块(block)系统中,也可以建立对象ID和卷ID的关系。
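Another optional way to generate the object ID is as follows: the object ID consists of a 48-bit binary number; the first 16 bits correspond to the file and differ between files; the last 32 bits are assigned by the client server and are unique within the file, so different objects of the same file have different last 32 bits.
In other embodiments, there may be no correspondence between the object ID and the FID of the file the object belongs to.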
图3A与图3B示例了两种不同的条带分布策略。条带索引描述了条带在文件中的偏移量关系,条带索引是大于等于0的整数,最小的条带索引是0,第二小的条带索引是1,第三小的条带索引是2,……,以此类推。索引数值相邻的2个条带,在文件中的偏移量也相邻。
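One optional strip distribution policy is shown in FIG. 3A: (1) objects belonging to the same file have a fixed size; since the strips of a file are all the same size, every object holds the same number of strips; (2) strips fill objects in index order, filling one object before moving to the next, so that, ordered by offset, a run of consecutive strips belongs to the same object. In FIG. 3A each object consists of exactly 4 strips. Example: with a strip size of 256KB and 4 strips per object, the object size is 256KB × 4 = 1024KB. The 1st object holds strips 0-3, the 2nd object holds strips 4-7, the 3rd object holds strips 8-11, and so on. Correspondingly, the ID of the first object is 0, the ID of the second object is 1, the ID of the third object is 2, and so on.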
用条带偏移量描述条带在对象内的相对位置,具体而言,可以是条带的起始数据在对象内的相对位置。条带偏移量=(条带索引%对象内的条带数量)×条带大小。其中,条带索引%对象内的条带数量是计算条带索引除以对象内的条带数量的余数的意思。
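Another optional strip distribution policy is shown in FIG. 3B: (1) the size of the objects in a file is not fixed, that is, different objects of the same file can hold different numbers of strips; (2) the total number of objects is fixed, that is, different files have the same number of objects; in FIG. 3B there are 3 objects in total. Example: with a strip size of 256KB and the number of objects fixed at 3, the 1st strip (strip 0) is in the first object (object 0), the 2nd strip (strip 1) is in the second object (object 1), and so on; the 4th strip (strip 3) is again in the first object and the 5th strip (strip 4) is again in the second object. The strip index is an integer greater than or equal to 0 that describes the position of the strips relative to one another in the file, and it also determines each strip's offset within its object. The object number within the file can be the remainder of the strip index divided by the number of objects in the file. The formulas can be: object number within the file = strip index % number of objects in the file; strip offset = (strip index / number of objects) × strip size, where the division keeps only the integer quotient.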
条带索引可以由待写数据的偏移量确定。例如:对整个文件而言,其拆分后起始数据位于第一个对象的条带(条带0),而本次待写数据偏移量位于对象1的第5个条带(条带4)。那么由待写数据拆分生成的条带中,第一个条带的索引就是4,其余条带的索引依次类推。
以上介绍了两种计算条带所属对象的ID的方案,根据条带分布策略的不同,还可以有其他实现方案,不同的分布策略使用的参数可以不同,而这些参数通常可以从客户服务器中查询获得。
由于各个条带的处理方式相同,因此下面仅以“待写条带”作为代表进行介绍。
条带请求发送模块512,用于选择用于存储待写条带的OSD。
一种可选的算法是根据待写条带的FID确定存储待写条带的OSD。例如:FID的哈希值除以OSD总数,余数作为存储待写条带的OSD的编号。也就是FID的哈希值对OSD总数取模。还可以有其他方案,例如由客户服务器选择任意一个OSD存储属于某个对象的待写条带。属于同一对象的条带可以存储到同一个OSD中。
此外,也可以根据待写条带的FID和对象ID共同确定存储条带的OSD。实际上,算法可以任意选择,只要能选择一个OSD出来即可。
条带请求发送模块512,还用于发送条带写请求给OSD,条带写请求携带待写条带、待写条带版本号、待写条带大小、待写条带偏移量、待写条带所属对象ID。
可选的,OSD既支持ROW也支持COW的情况下,还可以发送写模式,以便OSD按照客户服务器指定的写模式写入待写条带。写模式是ROW或者COW。如果OSD仅支持一种写模式,则可以不用发送写模式给OSD。
条带请求接收模块521,用于接收条带写请求,把待写条带写入OSD的存储介质。
条带请求接收模块521可以执行步骤26的方法。例如可以使用四种方式中的一种或者多种实现待写条带的写入。
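The strip request generation module 511 can also be used to receive a file read request, which carries the file name, the size of the data to be read and the offset of the data to be read.
As with the file write request, the file read request can also carry the file path, which records the storage location of the mapping table. The file path and the file name together uniquely identify a file.
Specifically, this operation can be performed by the file system program of the client server. The file read request is a read request that the file system can recognize. It requests either a complete file or part of a file's data.
The offset of the data to be read describes the relative position of that data within the file. Specifically, it can describe the distance from the start of the file to the start of the data to be read. An offset of 0 means the data to be read starts at the start of the file; an offset of 2KB means the data to be read starts 2KB of data after the start of the file.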
可选的,文件读请求还可以携带文件路径,文件路径记录了映射关系表的存储位置。映射关系表的细节参见步骤21的介绍。
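The file name may be the name of the file containing the data to be read, or the name of a snapshot of that file. In the former case the file read request wants the latest version of the data to be read; in the latter case it wants the data to be read as of a particular snapshot.
The strip request generation module 511 is also used to query the mapping table with the file name to obtain the FID of the file containing the data to be read, and to query the file metadata with the FID to obtain the file version number.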
如果文件名是待读数据所在文件的的文件名,那么存储映射关系表的文件路径是待读数据所在文件的文件路径。根据文件对应的FID查询元数据获取到文件版本号。
如果文件名是快照的文件名,那么映射关系表的文件路径是快照文件所在路径。根据快照文件的FID查询元数据获取文件版本号。
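The mapping table records the mapping between file names and FIDs, which correspond one to one. For the FID and its relationship to the file version number, see steps 21 and 22. The storage location of the mapping table can be carried in the file read request and obtained by the client server from the read request. The mapping table can also be stored in advance in the client server, which finds it from the file path. The mapping table can also be stored in another storage device.
As described in step 22, depending on the case, the metadata may be stored in the file's inode or in the file system's root inode.
The snapshot ID and the file version number correspond one to one, so the client server can obtain the file version number from the snapshot ID. This correspondence can be stored in the file metadata.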
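The strip request generation module 511 can also be used to convert the file read request into multiple read requests, including a strip read request. Each strip read request requests one strip; the strip read request asks the OSD for the strip to be read. The object ID for each read request is determined. The strip read request carries the version number, the offset and the size of the strip to be read and the object ID of the strip to be read.
Specifically, the offset of every strip that needs to be read, including the strip to be read, can be determined from the size and the offset of the data to be read.
Refer to the strip generation method of step 23: given the strip size, the data to be written is split into strips according to its offset and length, which yields the offsets of the strips to be written. In the same way, the offset of every strip that needs to be read is obtained from the strip size, the offset of the data to be read and the length of the data to be read. The strip size can come from the file's inode, in which case different files can use different strip sizes. Alternatively, all files in the system can share one strip size.
After the offset of the strip to be read is obtained, the ID of the object containing the strip to be read can be obtained in the same way as in step 23. Note that whether the file name is the name of the file containing the data to be read or the name of a snapshot, the FID used to determine the object ID for a read request is always the FID of the file containing the data to be read.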
条带请求发送模块512,还可以用于:选择用于发送条带读请求的OSD。
具体而言,本步骤可以由客户服务器的对象存储客户端执行。
同一个条带的条带读请求和条带写请求必须对应到同一个OSD。一种可行的办法是:使用和步骤24相同的OSD选择算法。
条带请求发送模块512,还可以用于:发送条带读请求给选出的OSD。
待读条带版本号实际上是待读条带所属文件的版本号。
可选的,还可以发送写模式给OSD,写模式和步骤25中条带写请求中携带写模式保持一致。待读条带的对象ID就是待读条带所属对象的ID。
条带请求接收模块521,还可以用于:接收条带读请求,查找待读条带的存储位置,把待读条带发送给客户服务装置。
条带请求接收模块521可以实现步骤32的功能,例如使用步骤32提及方式一或者方式二读出待读条带。因此条带请求接收模块521的具体功能,可以参见步骤32。
本发明的各个方面、或各个方面的可能实现方式可以被具体实施为系统、方法或者计算机程序产品。因此,本发明的各方面、或各个方面的可能实现方式可以采用完全硬件实施例、完全软件实施例(包括固件、驻留软件等等),或者组合软件和硬件方面的实施例的形式,在这里都统称为“电路”、“模块”或者“系统”。此外,本发明的各方面、或各个方面的可能实现方式可以采用计算机程序产品的形式,计算机程序产品是指存储在计算机可读介质中的计算机可读程序代码。
计算机可读介质可以是计算机可读信号介质或者计算机可读存储介质。计算机可读存储介质包含但不限于电子、磁性、光学、电磁、红外或半导体系统、设备或者装置,或者前述的任意适当组合,如随机存取存储器(RAM)、只读存储器(ROM)、可擦除可编程只读存储器(EPROM或者快闪存储器)、光纤、便携式只读存储器(CD-ROM)。
计算机中的处理器读取存储在计算机可读介质中的计算机可读程序代码,使得处理器能够执行在流程图中每个步骤、或各步骤的组合中规定的 功能动作;生成实施在框图的每一块、或各块的组合中规定的功能动作的装置。

Claims (45)

  1. 一种数据存储方法,其特征在于,所述方法包括:
    对象存储设备OSD接收客户服务器发送的条带写请求,所述条带写请求中携带待写条带、待写条带版本号、待写条带偏移量、以及所述待写条带的对象ID,其中,所述待写条带版本号与所述待写条带所属文件或者卷的最近一次快照的快照ID对应,所述待写条带偏移量描述所述待写条带在所属对象中的位置,所述待写条带的对象ID是所述待写条带所属对象的ID;
    所述OSD将所述待写条带写入由所述对象ID、所述待写条带版本号以及所述待写条带偏移量确定的存储位置。
  2. 根据权利要求1所述的方法,其特征在于,在接收所述客户服务器发送的条带写请求之前,所述方法还包括:
    所述客户服务器对所述待写条带所属文件或者卷进行快照,生成所述最近一次快照的快照ID;
    根据所述最近一次快照的快照ID生成所述待写条带版本号。
  3. 根据权利要求2所述的方法,其特征在于,所述方法还包括:
    所述客户服务器把所述待写条带版本号更新到所述文件或者卷的元数据中。
  4. 根据权利要求1、2或3所述的方法,在所述OSD接收所述条带写请求之前,所述方法进一步包括:
    所述客户服务器接收文件写请求,所述文件写请求携带待写数据、待写数据偏移量、以及文件的名称,所述待写数据是所述文件的一部分;
    所述客户服务器根据所述文件的名称获得文件编号FID,根据FID查询所述文件的元数据,获得所述文件的版本号,将所述文件的版本号作为所述待写条带版本号,其中,所述文件的版本号与所述待写条带所属文件的最近一次快照的快照ID对应;
    所述客户服务器按照所述待写数据偏移量以及所述待写数据的大小,把所述待写数据拆分成包括所述待写条带的多个条带,确定所述待写条带所属对象 的ID,以及获得所述待写条带偏移量;
    创建所述条带写请求。
  5. 根据权利要求1、2或3所述的方法,在所述OSD接收所述条带写请求之前,所述方法进一步包括:
    所述客户服务器接收卷写请求,所述卷写请求携带有待写数据、待写数据偏移量以及卷的编号ID,所述待写数据是所述卷的一部分;
    所述客户服务器根据所述卷的ID查询所述卷的元数据,获得所述卷的版本号,将所述卷的版本号作为所述待写条带版本号,其中,所述卷的版本号与所述待写条带所属卷的最近一次快照的快照ID对应;
    所述客户服务器按照所述待写数据偏移量以及所述待写数据的大小,把所述待写数据段拆分成包括所述待写条带的多个条带,确定所述待写条带所属对象的ID,以及获得所述待写条带偏移量;
    创建所述条带写请求。
  6. 一种数据存储方法,其特征在于,所述方法包括:
    对象存储设备OSD接收客户服务器发送的条带写请求,所述条带写请求中携带待写条带、待写条带版本号、待写条带偏移量、以及所述待写条带的对象ID,其中,所述待写条带版本号与所述待写条带所属文件或者卷的最近一次快照的快照ID对应,所述待写条带偏移量描述所述待写条带在所属对象中的位置,所述待写条带的对象ID是所述待写条带所属对象的ID;
    所述OSD判断由所述待写条带版本号和所述对象ID确定的对象是否已备份:
    如果已备份,则所述OSD将所述待写条带写入由所述对象ID、所述待写条带版本号以及所述待写条带偏移量确定的存储位置;
    如果未备份,则所述OSD使用所述待写条带建立一个拼接对象,然后把所述拼接对象写入由所述待写条带版本号和所述对象ID确定的存储位置。
  7. 根据权利要求6所述的方法,则所述OSD使用所述待写条带建立一个 拼接对象,具体包括:
    所述OSD从属于所述待写条带的对象ID的对象集中的已备份对象中,选择快照时间最晚的对象,从中获得与所述待写条带偏移量不同的条带,使用与所述待写条带偏移量不同的条带以及所述待写条带共同组成所述拼接对象;
    其中,所述OSD中存储的和所述待写条带的对象ID相同、且与所述待写条带版本号不同的对象的集合称为所述待写条带的对象ID的对象集。
  8. 根据权利要求6或7所述的方法,其特征在于,在接收所述客户服务器发送的条带写请求之前,所述方法还包括:
    所述客户服务器对所述待写条带所属文件或者卷进行快照,生成所述最近一次快照的快照ID;
    根据所述最近一次快照的快照ID生成所述待写条带版本号;
    所述客户服务器把所述待写条带版本号更新到所述文件或者卷的元数据中。
  9. 一种数据处理方法,其特征在于,该方法包括:
    对象存储设备OSD接收客户服务器发送的条带写请求,所述条带写请求中携带待写条带、待写条带版本号、待写条带偏移量、以及所述待写条带的对象ID,其中,所述待写条带版本号与所述待写条带所属文件或者卷的最近一次快照的快照ID对应,所述待写条带偏移量描述所述待写条带在所属对象中的位置,所述待写条带的对象ID是待写条带所属对象的ID;
    所述OSD判断由所述待写条带版本号、所述待写条带的对象ID以及所述待写条带偏移量确定的条带是否已备份;
    如果已备份,则将所述待写条带写入由所述待写条带版本号、所述待写条带的对象ID以及所述待写条带偏移量确定的存储位置;
    如果未备份,则将所述OSD中初始版本对象中位于所述待写条带偏移量、大小是所述待写条带大小的数据备份到由所述待写条带版本号、所述待写条带偏移量以及所述待写条带的对象ID确定的存储位置,其中,所述初始版本对象的对象ID和所述待写条带的对象ID相同,所述初始版本对象的版本号是初始 版本号;把所述待写条带写入由所述待写条带的对象ID、所述初始版本号以及所述待写条带偏移量确定的存储位置。
  10. 根据权利要求9所述的方法,其特征在于,在接收所述客户服务器发送的条带写请求之前,所述方法还包括:
    所述客户服务器对所述待写条带所属文件或者卷进行快照,生成所述最近一次快照的快照ID;
    根据所述最近一次快照的快照ID生成所述待写条带版本号。
  11. 根据权利要求10所述的方法,其特征在于,所述方法还包括:
    所述客户服务器把所述待写条带版本号更新到所述文件或者卷的元数据中。
  12. 根据权利要求9、10或11所述的方法,在所述OSD接收所述条带写请求之前,所述方法进一步包括:
    所述客户服务器接收文件写请求,所述文件写请求携带待写数据、待写数据偏移量、以及文件的名称,所述待写数据是所述文件的一部分;
    所述客户服务器根据所述文件的名称获得文件编号FID,根据FID查询所述文件的元数据,获得所述文件的版本号,将所述文件的版本号作为所述待写条带版本号,其中,所述文件的版本号与所述待写条带所属文件的最近一次快照的快照ID对应;
    所述客户服务器按照所述待写数据偏移量以及所述待写数据的大小,把所述待写数据拆分成包括所述待写条带的多个条带,确定所述待写条带所属对象的ID,以及获得所述待写条带偏移量;
    创建所述条带写请求。
  13. 根据权利要求9、10或11所述的方法,在所述OSD接收所述条带写请求之前,所述方法进一步包括:
    所述客户服务器接收卷写请求,所述卷写请求携带有待写数据、待写数据偏移量以及卷的编号ID,所述待写数据是所述卷的一部分;
    所述客户服务器根据所述卷的ID查询所述卷的元数据,获得所述卷的版本 号,将所述卷的版本号作为所述待写条带版本号,其中,所述卷的版本号与所述待写条带所属卷的最近一次快照的快照ID对应;
    所述客户服务器按照所述待写数据偏移量以及所述待写数据的大小,把所述待写数据段拆分成包括所述待写条带的多个条带,确定所述待写条带所属对象的ID,以及获得所述待写条带偏移量;
    创建所述条带写请求。
  14. 一种数据处理方法,其特征在于,所述方法包括:
    对象存储设备OSD接收客户服务器发送的条带写请求,所述条带写请求中携带待写条带、待写条带版本号、待写条带偏移量、以及所述待写条带的对象ID,其中,所述待写条带版本号与所述待写条带所属文件或者卷的最近一次快照的快照ID对应,所述待写条带偏移量描述所述待定条带在所属对象中的位置,所述待写条带的对象ID是所述待写条带所属对象的ID;
    所述OSD判断由所述待写条带版本号和所述对象ID确定的对象是否已备份:
    如果已备份,则所述OSD将所述待写条带写入由所述对象ID、所述对象的版本号以及所述待写条带偏移量确定的存储位置;
    如果未备份,则将所述OSD中初始版本对象中的数据备份到由所述待写条带版本号以及所述对象ID确定的存储位置,其中,所述初始版本对象的对象ID和所述待写条带的对象ID相同,所述初始版本对象的版本号是初始版本号;
    所述OSD将所述待写条带写入由所述对象ID、所述初始版本号以及所述待写条带偏移量确定的存储位置。
  15. 一种读数据方法,其特征在于,所述方法包括:
    对象存储设备OSD接收客户服务器发送的读条带请求,所述读条带请求中携带待读条带大小、待读条带偏移量、待读条带版本号以及待读条带的对象ID,其中,所述待读条带版本号与所述待读条带所属文件或者卷的最近一次快照的快照ID对应,所述待读条带的对象ID是所述待读条带所属对象的ID;
    所述OSD判断由所述对象ID、所述待读条带版本号以及所述待读条带偏移量所确定的条带是否已备份:
    如果已备份,则读取由所述对象ID、所述待读条带版本号、所述待读条带偏移量以及所述待读条带大小所确定的数据,把读取的数据作为待读条带发送给所述客户服务器;
    如果未备份,则从对象ID和所述待读条带的对象ID相同、版本号和所述待读条带的版本号不同的对象中,按照对象的快照时间由晚到早的的顺序,逐个对象进行查找,直至找到在所述待读条带偏移量的存储位置存储有有效数据的对象,把找到的有效数据作为待读条带发送给所述客户服务器,其中,所述对象的版本号和所述对象生成之前,所属文件或者卷的最近一次快照的快照ID对应。
  16. 根据权利要求15所述的方法,该方法进一步包括:
    如果所述OSD的写模式是ROW,则对象的快照时间由晚到早的的顺序是版本号由大到小的顺序;或者
    如果所述OSD的写模式是COW,则对象的快照时间由晚到早的的顺序是版本号由小到大的顺序。
  17. 根据权利要求15或16的方法,在所述OSD接收所述客户服务器发送的条带写请求之前,所述方法还包括:
    所述客户服务器接收文件读请求,所述文件读请求中携带文件的名称、待读数据大小、待读数据偏移量,所述待读数据是所述文件的一部分;
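    the client server obtains the file identification FID according to the name of the file, queries the metadata of the file according to the FID to obtain the version number of the file, and uses the version number of the file as the version number of the strip to be read, wherein the version number of the file corresponds to the snapshot ID of the most recent snapshot of the file to which the strip to be read belongs;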
    所述客户服务器,按照所述待读数据偏移量以及所述待读数据的大小,确定所述待读条带所属对象的ID,以及获得所述待读条带偏移量;
    生成所述条带读请求。
  18. 根据权利要求15或16的方法,在所述OSD接收所述客户服务器发送的条带写请求之前,所述方法还包括:
    所述客户服务器接收卷读请求,所述卷读请求中携带卷ID、待读数据大小、待读数据偏移量,所述待读数据是所述卷的一部分;
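    the client server queries the metadata of the volume according to the volume ID to obtain the version number of the volume, and uses the version number of the volume as the version number of the strip to be read, wherein the version number of the volume corresponds to the snapshot ID of the most recent snapshot of the volume to which the strip to be read belongs;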
    所述客户服务器,按照所述待读数据偏移量以及所述待读数据的大小,确定所述待读条带所属对象的ID,以及获得所述待读条带偏移量;
    生成所述条带读请求。
  19. 一种读数据方法,应用于对象存储系统中,所述对象存储系统包括基于对象的存储设备OSD和客户服务器,其特征在于,所述方法包括:
    所述OSD接收所述客户服务器发送的读条带请求,所述读条带请求中携带待读条带大小、待读条带偏移量、待读条带版本号以及待读条带的对象ID,其中,所述待读条带版本号与所述待读条带所属文件或者卷的最近一次快照的快照ID对应,所述待读条带的对象ID是所述待读条带所属对象的ID;
    所述OSD判断由所述对象ID、所述待读条带版本号确定的对象是否已备份:
    如果已备份,则读取由所述对象ID、所述待读条带版本号、所述待读条带偏移量以及所述待读条带大小所确定的数据,把读取的数据作为待读条带发送给所述客户服务器;
    如果未备份,则从对象ID和所述待读条带的对象ID相同、版本号和所述待读条带的版本号不同的对象中,按照对象的快照时间由晚到早的的顺序,逐个对象进行查找,直至找到在所述待读条带偏移量的存储位置存储有有效数据的对象,把找到的有效数据作为待读条带发送给所述客户服务器,其中,所述对象的版本号和所述对象生成之前,所属文件或者卷的最近一次快照的快照ID 对应。
  20. 一种数据存储装置,其特征在于,所述装置包括:
    条带请求接收模块,用于接收客户服务器发送的条带写请求,所述条带写请求中携带待写条带、待写条带版本号、待写条带偏移量、以及所述待写条带的对象ID,其中,所述待写条带版本号与所述待写条带所属文件或者卷的最近一次快照的快照ID对应,所述待写条带偏移量描述所述待写条带在所属对象中的位置,所述待写条带的对象ID是所述待写条带所属对象的ID;
    条带存储模块,用于将所述待写条带写入由所述对象ID、所述待写条带版本号以及所述待写条带偏移量确定的存储位置。
  21. 一种数据存储装置,其特征在于,所述装置包括:
    条带请求接收模块,用于接收客户服务器发送的条带写请求,所述条带写请求中携带待写条带、待写条带版本号、待写条带偏移量、以及所述待写条带的对象ID,其中,所述待写条带版本号与所述待写条带所属文件或者卷的最近一次快照的快照ID对应,所述待写条带偏移量描述所述待写条带在所属对象中的位置,所述待写条带的对象ID是所述待写条带所属对象的ID;
    条带存储模块,用于判断由所述待写条带版本号和所述对象ID确定的对象是否已备份:
    如果已备份,则所述条带存储模块还用于将所述待写条带写入由所述对象ID、所述待写条带版本号以及所述待写条带偏移量确定的存储位置;
    如果未备份,则所述条带存储模块还用于使用所述待写条带建立一个拼接对象,然后把所述拼接对象写入由所述待写条带版本号和所述对象ID确定的存储位置。
  22. 根据权利要求21所述的装置,则所述条带存储模块还用于使用所述待写条带建立一个拼接对象,具体包括:
    所述条带存储模块,用于从属于所述待写条带的对象ID的对象集中的已备份对象中,选择快照时间最晚的对象,从中获得与所述待写条带偏移量不同的 条带,使用与所述待写条带偏移量不同的条带以及所述待写条带共同组成所述拼接对象;
    其中,所述数据存储装置中存储的和所述待写条带的对象ID相同、且与所述待写条带版本号不同的对象的集合称为所述待写条带的对象ID的对象集。
  23. 一种数据处理装置,其特征在于,所述装置包括:
    条带请求接收模块,用于接收客户服务器发送的条带写请求,所述条带写请求中携带待写条带、待写条带版本号、待写条带偏移量、以及所述待写条带的对象ID,其中,所述待写条带版本号与所述待写条带所属文件或者卷的最近一次快照的快照ID对应,所述待写条带偏移量描述所述待写条带在所属对象中的位置,所述待写条带的对象ID是待写条带所属对象的ID;
    条带存储模块,用于判断由所述待写条带版本号、所述待写条带的对象ID以及所述待写条带偏移量确定的条带是否已备份;
    如果已备份,则将所述待写条带写入由所述待写条带版本号、所述待写条带的对象ID以及所述待写条带偏移量确定的存储位置;
    如果未备份,则将所述数据存储装置中初始版本对象中位于所述待写条带偏移量、大小是所述待写条带大小的数据备份到由所述待写条带版本号、所述待写条带偏移量以及所述待写条带的对象ID确定的存储位置,其中,所述初始版本对象的对象ID和所述待写条带的对象ID相同,所述初始版本对象的版本号是初始版本号;把所述待写条带写入由所述待写条带的对象ID、所述初始版本号以及所述待写条带偏移量确定的存储位置。
  24. 一种数据处理装置,其特征在于,所述装置包括:
    条带请求接收模块,用于接收客户服务器发送的条带写请求,所述条带写请求中携带待写条带、待写条带版本号、待写条带偏移量、以及所述待写条带的对象ID,其中,所述待写条带版本号与所述待写条带所属文件或者卷的最近一次快照的快照ID对应,所述待写条带偏移量描述所述待定条带在所属对象中的位置,所述待写条带的对象ID是所述待写条带所属对象的ID;
    条带存储模块,用于判断由所述待写条带版本号和所述对象ID确定的对象是否已备份:
    如果已备份,则将所述待写条带写入由所述对象ID、所述对象的版本号以及所述待写条带偏移量确定的存储位置;
    如果未备份,则将初始版本对象中的数据备份到由所述待写条带版本号以及所述对象ID确定的存储位置,其中,所述初始版本对象的对象ID和所述待写条带的对象ID相同,所述初始版本对象的版本号是初始版本号;
    所述条带存储模块,还用于将所述待写条带写入由所述对象ID、所述初始版本号以及所述待写条带偏移量确定的存储位置。
  25. 一种读数据装置,其特征在于,所述装置包括:
    条带请求接收模块,用于接收客户服务器发送的读条带请求,所述读条带请求中携带待读条带大小、待读条带偏移量、待读条带版本号以及待读条带的对象ID,其中,所述待读条带版本号与所述待读条带所属文件或者卷的最近一次快照的快照ID对应,所述待读条带的对象ID是所述待读条带所属对象的ID;
    条带读取模块,用于判断由所述对象ID、所述待读条带版本号以及所述待读条带偏移量所确定的条带是否已备份:
    如果已备份,则读取由所述对象ID、所述待读条带版本号、所述待读条带偏移量以及所述待读条带大小所确定的数据,把读取的数据作为待读条带发送给所述客户服务器;
    如果未备份,则从对象ID和所述待读条带的对象ID相同、版本号和所述待读条带的版本号不同的对象中,按照对象的快照时间由晚到早的的顺序,逐个对象进行查找,直至找到在所述待读条带偏移量的存储位置存储有有效数据的对象,把找到的有效数据作为待读条带发送给所述客户服务器,其中,所述对象的版本号和所述对象生成之前,所属文件或者卷的最近一次快照的快照ID对应。
  26. 根据权利要求25所述的装置,所述对象的快照时间由晚到早的的顺序, 具体包括:
    如果所述数据存储装置的写模式是ROW,则对象的快照时间由晚到早的的顺序是版本号由大到小的顺序;或者
    如果所述数据存储装置的写模式是COW,则对象的快照时间由晚到早的的顺序是版本号由小到大的顺序。
  27. 一种读数据装置,其特征在于,所述装置包括:
    条带请求接收模块,用于接收客户服务器发送的读条带请求,所述读条带请求中携带待读条带大小、待读条带偏移量、待读条带版本号以及待读条带的对象ID,其中,所述待读条带版本号与所述待读条带所属文件或者卷的最近一次快照的快照ID对应,所述待读条带的对象ID是所述待读条带所属对象的ID;
    条带读取模块,用于判断由所述对象ID、所述待读条带版本号确定的对象是否已备份:
    如果已备份,则读取由所述对象ID、所述待读条带版本号、所述待读条带偏移量以及所述待读条带大小所确定的数据,把读取的数据作为待读条带发送给所述客户服务器;
    如果未备份,则从对象ID和所述待读条带的对象ID相同、版本号和所述待读条带的版本号不同的对象中,按照对象的快照时间由晚到早的的顺序,逐个对象进行查找,直至找到在所述待读条带偏移量的存储位置存储有有效数据的对象,把找到的有效数据作为待读条带发送给所述客户服务器,其中,所述对象的版本号和所述对象生成之前,所属文件或者卷的最近一次快照的快照ID对应。
  28. 一种数据存储系统,包括客户服务器和对象存储设备,其特征在于:
    所述客户服务器用于:
    接收文件写请求,所述文件写请求携带待写数据、待写数据偏移量、以及文件的名称,所述待写数据是所述文件的一部分;
    所述客户服务器根据所述文件的名称获得文件编号FID,根据FID查询所 述文件的元数据,获得所述文件的版本号,其中,所述文件的版本号与所述文件的最近一次快照的快照ID对应;
    所述客户服务器按照所述待写数据偏移量以及所述待写数据的大小,把所述待写数据拆分成包括待写条带的多个条带,确定所述待写条带所属对象的ID,以及获得待写条带偏移量;
    创建条带写请求发送给所述对象存储设备;
    所述对象存储设备用于:
    接收所述条带写请求,所述条带写请求中携带所述待写条带、待写条带版本号、所述待写条带偏移量、以及所述待写条带的对象ID,其中,所述文件的版本号为所述待写条带版本号,所述待写条带偏移量描述所述待写条带在所属对象中的位置,所述待写条带的对象ID是所述待写条带所属对象的ID;
    所述OSD将所述待写条带写入由所述对象ID、所述待写条带版本号以及所述待写条带偏移量确定的存储位置。
  29. 根据权利要求28所述的系统,其特征在于,所述客户服务器还用于:
    在接收所述客户服务器发送的条带写请求之前,对所述待写条带所属文件进行快照,生成所述最近一次快照的快照ID;
    根据所述最近一次快照的快照ID生成所述待写条带版本号;
    把所述待写条带版本号更新到所述文件的元数据中。
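  30. A data storage system, comprising a client server and an object storage device, wherein:
    the client server is configured to:
    receive a volume write request, wherein the volume write request carries to-be-written data, a to-be-written data offset, and the ID of a volume, and the to-be-written data is a part of the volume;
    query the metadata of the volume according to the ID of the volume to obtain the version number of the volume, wherein the version number of the volume corresponds to the snapshot ID of the most recent snapshot of the volume;
    split, according to the to-be-written data offset and the size of the to-be-written data, the to-be-written data into multiple strips including a to-be-written strip, determine the ID of the object to which the to-be-written strip belongs, and obtain a to-be-written strip offset;
    create a strip write request and send it to the object storage device;
    the object storage device is configured to:
    receive the strip write request, wherein the strip write request carries the to-be-written strip, a to-be-written strip version number, the to-be-written strip offset, and the object ID of the to-be-written strip, the version number of the volume is the to-be-written strip version number, the to-be-written strip offset describes the position of the to-be-written strip in the object to which it belongs, and the object ID of the to-be-written strip is the ID of the object to which the to-be-written strip belongs;
    write, as the object storage device (OSD), the to-be-written strip into the storage location determined by the object ID, the to-be-written strip version number, and the to-be-written strip offset.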
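  31. The system according to claim 30, wherein the client server is further configured to:
    before the strip write request sent by the client server is received, take a snapshot of the volume to which the to-be-written strip belongs, and generate the snapshot ID of the most recent snapshot;
    generate the to-be-written strip version number according to the snapshot ID of the most recent snapshot;
    update the to-be-written strip version number into the metadata of the volume.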
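  32. A data storage system, comprising a client server and an object storage device, wherein:
    the client server is configured to:
    receive a file write request, wherein the file write request carries to-be-written data, a to-be-written data offset, and the name of a file, and the to-be-written data is a part of the file;
    obtain a file identifier (FID) according to the name of the file, and query the metadata of the file according to the FID to obtain the version number of the file, wherein the version number of the file corresponds to the snapshot ID of the most recent snapshot of the file;
    split, according to the to-be-written data offset and the size of the to-be-written data, the to-be-written data into multiple strips including the to-be-written strip, determine the ID of the object to which the to-be-written strip belongs, and obtain the to-be-written strip offset;
    create the strip write request and send it to the object storage device;
    the object storage device is configured to:
    receive the strip write request, wherein the strip write request carries a to-be-written strip, a to-be-written strip version number, a to-be-written strip offset, and an object ID of the to-be-written strip, the to-be-written strip version number corresponds to the snapshot ID of the most recent snapshot of the file to which the to-be-written strip belongs, the to-be-written strip offset describes the position of the to-be-written strip in the object to which it belongs, and the object ID of the to-be-written strip is the ID of the object to which the to-be-written strip belongs;
    determine whether the object determined by the to-be-written strip version number and the object ID has been backed up:
    if it has been backed up, write the to-be-written strip into the storage location determined by the object ID, the to-be-written strip version number, and the to-be-written strip offset;
    if it has not been backed up, build a spliced object using the to-be-written strip, and then write the spliced object into the storage location determined by the to-be-written strip version number and the object ID.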
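  33. The system according to claim 32, wherein building a spliced object using the to-be-written strip specifically comprises:
    selecting, from the backed-up objects in the object set of the object ID of the to-be-written strip, the object with the latest snapshot time, obtaining from it the strips whose offsets differ from the to-be-written strip offset, and composing the spliced object from those strips together with the to-be-written strip;
    wherein the set of objects that are stored in the data storage device, whose object ID is the same as the object ID of the to-be-written strip, and whose version number differs from the to-be-written strip version number is referred to as the object set of the object ID of the to-be-written strip.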
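
The spliced-object write of claims 32 and 33 can be sketched as follows. This is an illustration only: the in-memory `Store` layout and the name `write_strip_row` are hypothetical, every object present in the store is treated as backed up, and "latest snapshot time" is taken to mean the largest version number, which matches the ROW ordering stated elsewhere in the claims.

```python
from typing import Dict, Tuple

# Hypothetical layout: (object ID, version number) -> {strip offset: strip bytes}
Store = Dict[Tuple[str, int], Dict[int, bytes]]


def write_strip_row(store: Store, object_id: str, version: int,
                    offset: int, strip: bytes) -> None:
    """Write one strip; build a spliced object first if this version has none yet."""
    key = (object_id, version)
    if key in store:
        # The object for this version already exists: write the strip in place.
        store[key][offset] = strip
        return
    # Object set: same object ID, different version number.
    object_set = [v for (oid, v) in store if oid == object_id and v != version]
    spliced: Dict[int, bytes] = {}
    if object_set:
        latest = max(object_set)   # assumption: ROW, larger version = later snapshot
        # Take every strip of the latest object except the one being overwritten.
        spliced = {off: s for off, s in store[(object_id, latest)].items()
                   if off != offset}
    spliced[offset] = strip
    store[key] = spliced
```

After the first write to a new version, the spliced object holds a full set of strips, so later reads of that version need not consult older objects, and the older object stays untouched as the snapshot's data.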
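  34. A data storage system, comprising a client server and an object storage device, wherein:
    the client server is configured to:
    receive a volume write request, wherein the volume write request carries to-be-written data, a to-be-written data offset, and the ID of a volume, and the to-be-written data is a part of the volume;
    query the metadata of the volume according to the ID of the volume to obtain the version number of the volume, wherein the version number of the volume corresponds to the snapshot ID of the most recent snapshot of the volume;
    split, according to the to-be-written data offset and the size of the to-be-written data, the to-be-written data into multiple strips including the to-be-written strip, determine the ID of the object to which the to-be-written strip belongs, and obtain the to-be-written strip offset;
    create the strip write request and send it to the object storage device;
    the object storage device is configured to:
    receive the strip write request, wherein the strip write request carries a to-be-written strip, a to-be-written strip version number, a to-be-written strip offset, and an object ID of the to-be-written strip, the to-be-written strip version number corresponds to the snapshot ID of the most recent snapshot of the volume to which the to-be-written strip belongs, the to-be-written strip offset describes the position of the to-be-written strip in the object to which it belongs, and the object ID of the to-be-written strip is the ID of the object to which the to-be-written strip belongs;
    determine whether the object determined by the to-be-written strip version number and the object ID has been backed up:
    if it has been backed up, write the to-be-written strip into the storage location determined by the object ID, the to-be-written strip version number, and the to-be-written strip offset;
    if it has not been backed up, build a spliced object using the to-be-written strip, and then write the spliced object into the storage location determined by the to-be-written strip version number and the object ID.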
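  35. The system according to claim 34, wherein building a spliced object using the to-be-written strip specifically comprises:
    selecting, from the backed-up objects in the object set of the object ID of the to-be-written strip, the object with the latest snapshot time, obtaining from it the strips whose offsets differ from the to-be-written strip offset, and composing the spliced object from those strips together with the to-be-written strip;
    wherein the set of objects that are stored in the data storage device, whose object ID is the same as the object ID of the to-be-written strip, and whose version number differs from the to-be-written strip version number is referred to as the object set of the object ID of the to-be-written strip.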
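  36. A data reading system, comprising a client server and an object storage device, wherein:
    the client server is configured to:
    receive a file read request, wherein the file read request carries the name of a file, a to-be-read data size, and a to-be-read data offset, and the to-be-read data is a part of the file;
    obtain a file identifier (FID) according to the name of the file, and query the metadata of the file according to the FID to obtain the version number of the file, wherein the version number of the file corresponds to the snapshot ID of the most recent snapshot of the file;
    determine, according to the to-be-read data offset and the size of the to-be-read data, the ID of the object to which the to-be-read strip belongs, and obtain the to-be-read strip offset;
    generate a read strip request and send it;
    the object storage device is configured to:
    receive the read strip request, wherein the read strip request carries a to-be-read strip size, a to-be-read strip offset, a to-be-read strip version number, and an object ID of the to-be-read strip, the to-be-read strip version number corresponds to the snapshot ID of the most recent snapshot of the file to which the to-be-read strip belongs, and the object ID of the to-be-read strip is the ID of the object to which the to-be-read strip belongs;
    determine whether the strip determined by the object ID, the to-be-read strip version number, and the to-be-read strip offset has been backed up:
    if it has been backed up, read the data determined by the object ID, the to-be-read strip version number, the to-be-read strip offset, and the to-be-read strip size, and send the read data to the client server as the to-be-read strip;
    if it has not been backed up, search, one object at a time and in order of snapshot time from latest to earliest, among the objects whose object ID is the same as the object ID of the to-be-read strip and whose version number differs from the to-be-read strip version number, until an object that stores valid data at the storage location of the to-be-read strip offset is found, and send the found valid data to the client server as the to-be-read strip, wherein the version number of each object corresponds to the snapshot ID of the most recent snapshot taken, before the object was generated, of the file or volume to which the object belongs.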
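
The client-side step of claim 36, turning a byte range of a file into read strip requests, can be sketched as below. The strip size, object size, `"<FID>.<index>"` object naming, and the names `ReadStripRequest` and `plan_read` are assumptions for illustration; the claim only requires that each request carry the strip size, strip offset, version number, and object ID.

```python
from typing import List, NamedTuple

STRIP_SIZE = 4     # assumed strip size in bytes (tiny, for illustration)
OBJECT_SIZE = 8    # assumed object size: two strips per object


class ReadStripRequest(NamedTuple):
    object_id: str   # ID of the object the strip belongs to
    version: int     # file version number, used as the strip version number
    offset: int      # position of the strip inside its object
    size: int        # number of bytes to read from that position


def plan_read(fid: int, version: int, data_offset: int,
              data_size: int) -> List[ReadStripRequest]:
    """Map a byte range of the file onto per-strip read requests."""
    requests = []
    pos, end = data_offset, data_offset + data_size
    while pos < end:
        # Stop at the next strip boundary or at the end of the range.
        strip_end = min(end, (pos // STRIP_SIZE + 1) * STRIP_SIZE)
        requests.append(ReadStripRequest(
            f"{fid}.{pos // OBJECT_SIZE}",   # hypothetical naming scheme
            version, pos % OBJECT_SIZE, strip_end - pos))
        pos = strip_end
    return requests
```

A range that starts or ends in the middle of a strip yields a shorter first or last request; the object storage device then resolves each request on its own, including the fallback to older versions.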
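  37. A data reading system, comprising a client server and an object storage device, wherein:
    the client server is configured to:
    receive a volume read request, wherein the volume read request carries a volume ID, a to-be-read data size, and a to-be-read data offset, and the to-be-read data is a part of the volume;
    query the metadata of the volume according to the volume ID to obtain the version number of the volume, wherein the version number of the volume corresponds to the snapshot ID of the most recent snapshot of the volume;
    determine, according to the to-be-read data offset and the size of the to-be-read data, the ID of the object to which the to-be-read strip belongs, and obtain the to-be-read strip offset;
    generate a read strip request and send it;
    the object storage device is configured to:
    receive the read strip request, wherein the read strip request carries a to-be-read strip size, a to-be-read strip offset, a to-be-read strip version number, and an object ID of the to-be-read strip, the to-be-read strip version number corresponds to the snapshot ID of the most recent snapshot of the file or volume to which the to-be-read strip belongs, and the object ID of the to-be-read strip is the ID of the object to which the to-be-read strip belongs;
    determine whether the strip determined by the object ID, the to-be-read strip version number, and the to-be-read strip offset has been backed up:
    if it has been backed up, read the data determined by the object ID, the to-be-read strip version number, the to-be-read strip offset, and the to-be-read strip size, and send the read data to the client server as the to-be-read strip;
    if it has not been backed up, search, one object at a time and in order of snapshot time from latest to earliest, among the objects whose object ID is the same as the object ID of the to-be-read strip and whose version number differs from the to-be-read strip version number, until an object that stores valid data at the storage location of the to-be-read strip offset is found, and send the found valid data to the client server as the to-be-read strip, wherein the version number of each object corresponds to the snapshot ID of the most recent snapshot taken, before the object was generated, of the file or volume to which the object belongs.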
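  38. An object storage device, comprising a processor, and a storage medium and an interface that are connected to the processor, wherein:
    the interface is configured to receive a strip write request sent by a client server, wherein the strip write request carries a to-be-written strip, a to-be-written strip version number, a to-be-written strip offset, and an object ID of the to-be-written strip, the to-be-written strip version number corresponds to the snapshot ID of the most recent snapshot of the file or volume to which the to-be-written strip belongs, the to-be-written strip offset describes the position of the to-be-written strip in the object to which it belongs, and the object ID of the to-be-written strip is the ID of the object to which the to-be-written strip belongs;
    the storage medium stores a computer program;
    the processor, by running the computer program, performs the following step:
    writing the to-be-written strip into the storage location determined by the object ID, the to-be-written strip version number, and the to-be-written strip offset.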
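  39. An object storage device, comprising a processor, and a storage medium and an interface that are connected to the processor, wherein:
    the interface is configured to receive a strip write request sent by a client server, wherein the strip write request carries a to-be-written strip, a to-be-written strip version number, a to-be-written strip offset, and an object ID of the to-be-written strip, the to-be-written strip version number corresponds to the snapshot ID of the most recent snapshot of the file or volume to which the to-be-written strip belongs, the to-be-written strip offset describes the position of the to-be-written strip in the object to which it belongs, and the object ID of the to-be-written strip is the ID of the object to which the to-be-written strip belongs;
    the storage medium stores a computer program;
    the processor, by running the computer program, performs the following steps:
    determining whether the object determined by the to-be-written strip version number and the object ID has been backed up:
    if it has been backed up, writing the to-be-written strip into the storage location determined by the object ID, the to-be-written strip version number, and the to-be-written strip offset;
    if it has not been backed up, building a spliced object using the to-be-written strip, and then writing the spliced object into the storage location determined by the to-be-written strip version number and the object ID.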
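  40. The object storage device according to claim 23, wherein the building, by the processor, of a spliced object using the to-be-written strip specifically comprises:
    the processor is configured to select, from the backed-up objects in the object set of the object ID of the to-be-written strip, the object with the latest snapshot time, obtain from it the strips whose offsets differ from the to-be-written strip offset, and compose the spliced object from those strips together with the to-be-written strip;
    wherein the set of objects that are stored in the data storage device, whose object ID is the same as the object ID of the to-be-written strip, and whose version number differs from the to-be-written strip version number is referred to as the object set of the object ID of the to-be-written strip.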
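  41. An object storage device, comprising a processor, and a storage medium and an interface that are connected to the processor, wherein:
    the interface is configured to receive a strip write request sent by a client server, wherein the strip write request carries a to-be-written strip, a to-be-written strip version number, a to-be-written strip offset, and an object ID of the to-be-written strip, the to-be-written strip version number corresponds to the snapshot ID of the most recent snapshot of the file or volume to which the to-be-written strip belongs, the to-be-written strip offset describes the position of the to-be-written strip in the object to which it belongs, and the object ID of the to-be-written strip is the ID of the object to which the to-be-written strip belongs;
    the storage medium stores a computer program;
    the processor, by running the computer program, performs the following steps:
    determining whether the strip determined by the to-be-written strip version number, the object ID of the to-be-written strip, and the to-be-written strip offset has been backed up;
    if it has been backed up, writing the to-be-written strip into the storage location determined by the to-be-written strip version number, the object ID of the to-be-written strip, and the to-be-written strip offset;
    if it has not been backed up, backing up the data that is located at the to-be-written strip offset in the initial-version object in the data storage device and whose size is the to-be-written strip size to the storage location determined by the to-be-written strip version number, the to-be-written strip offset, and the object ID of the to-be-written strip, wherein the object ID of the initial-version object is the same as the object ID of the to-be-written strip, and the version number of the initial-version object is the initial version number; and writing the to-be-written strip into the storage location determined by the object ID of the to-be-written strip, the initial version number, and the to-be-written strip offset.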
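  42. An object storage device, comprising a processor, and a storage medium and an interface that are connected to the processor, wherein:
    the interface is configured to receive a strip write request sent by a client server, wherein the strip write request carries a to-be-written strip, a to-be-written strip version number, a to-be-written strip offset, and an object ID of the to-be-written strip, the to-be-written strip version number corresponds to the snapshot ID of the most recent snapshot of the file or volume to which the to-be-written strip belongs, the to-be-written strip offset describes the position of the to-be-written strip in the object to which it belongs, and the object ID of the to-be-written strip is the ID of the object to which the to-be-written strip belongs;
    the storage medium stores a computer program;
    the processor, by running the computer program, performs the following steps:
    determining whether the object determined by the to-be-written strip version number and the object ID has been backed up:
    if it has been backed up, writing the to-be-written strip into the storage location determined by the object ID, the version number of the object, and the to-be-written strip offset;
    if it has not been backed up, backing up the data in the initial-version object in the data storage device to the storage location determined by the to-be-written strip version number and the object ID, wherein the object ID of the initial-version object is the same as the object ID of the to-be-written strip, and the version number of the initial-version object is the initial version number;
    writing the to-be-written strip into the storage location determined by the object ID, the initial version number, and the to-be-written strip offset.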
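  43. An object storage device, comprising a processor, and a storage medium and an interface that are connected to the processor, wherein:
    the interface is configured to receive a read strip request sent by a client server, wherein the read strip request carries a to-be-read strip size, a to-be-read strip offset, a to-be-read strip version number, and an object ID of the to-be-read strip, the to-be-read strip version number corresponds to the snapshot ID of the most recent snapshot of the file or volume to which the to-be-read strip belongs, and the object ID of the to-be-read strip is the ID of the object to which the to-be-read strip belongs;
    the storage medium stores a computer program;
    the processor, by running the computer program, performs the following steps:
    determining whether the strip determined by the object ID, the to-be-read strip version number, and the to-be-read strip offset has been backed up:
    if it has been backed up, reading the data determined by the object ID, the to-be-read strip version number, the to-be-read strip offset, and the to-be-read strip size, and sending the read data to the client server as the to-be-read strip;
    if it has not been backed up, searching, one object at a time and in order of snapshot time from latest to earliest, among the objects whose object ID is the same as the object ID of the to-be-read strip and whose version number differs from the to-be-read strip version number, until an object that stores valid data at the storage location of the to-be-read strip offset is found, and sending the found valid data to the client server as the to-be-read strip, wherein the version number of each object corresponds to the snapshot ID of the most recent snapshot taken, before the object was generated, of the file or volume to which the object belongs.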
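  44. The object storage device according to claim 43, wherein the order of snapshot time of the objects from latest to earliest specifically comprises:
    if the write mode of the data storage device is ROW, the order of snapshot time of the objects from latest to earliest is the order of version numbers from largest to smallest; or
    if the write mode of the data storage device is COW, the order of snapshot time of the objects from latest to earliest is the order of version numbers from smallest to largest.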
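  45. An object storage device, comprising a processor, and a storage medium and an interface that are connected to the processor, wherein:
    the interface is configured to receive a read strip request sent by a client server, wherein the read strip request carries a to-be-read strip size, a to-be-read strip offset, a to-be-read strip version number, and an object ID of the to-be-read strip, the to-be-read strip version number corresponds to the snapshot ID of the most recent snapshot of the file or volume to which the to-be-read strip belongs, and the object ID of the to-be-read strip is the ID of the object to which the to-be-read strip belongs;
    the storage medium stores a computer program;
    the processor, by running the computer program, performs the following steps:
    determining whether the object determined by the object ID and the to-be-read strip version number has been backed up:
    if it has been backed up, reading the data determined by the object ID, the to-be-read strip version number, the to-be-read strip offset, and the to-be-read strip size, and sending the read data to the client server as the to-be-read strip;
    if it has not been backed up, searching, one object at a time and in order of snapshot time from latest to earliest, among the objects whose object ID is the same as the object ID of the to-be-read strip and whose version number differs from the to-be-read strip version number, until an object that stores valid data at the storage location of the to-be-read strip offset is found, and sending the found valid data to the client server as the to-be-read strip, wherein the version number of each object corresponds to the snapshot ID of the most recent snapshot taken, before the object was generated, of the file or volume to which the object belongs.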
PCT/CN2014/095223 2014-12-27 2014-12-27 Data processing method, apparatus, and system WO2016101283A1 (zh)

Priority Applications (13)

Application Number Priority Date Filing Date Title
KR1020177012992A KR102030786B1 (ko) 2014-12-27 2014-12-27 Data processing method, apparatus, and system
JP2017528138A JP6607941B2 (ja) 2014-12-27 2014-12-27 Data processing method, apparatus, and system
CN201480075382.2A CN105993013B (zh) 2014-12-27 2014-12-27 Data processing method, apparatus, and system
CA2965715A CA2965715C (en) 2014-12-27 2014-12-27 Data processing method, apparatus, and system
PCT/CN2014/095223 WO2016101283A1 (zh) 2014-12-27 2014-12-27 Data processing method, apparatus, and system
SG11201703410YA SG11201703410YA (en) 2014-12-27 2014-12-27 Data processing method, apparatus, and system
BR112017011412-7A BR112017011412B1 (pt) 2014-12-27 2014-12-27 Data storage method and apparatus
AU2014415350A AU2014415350B2 (en) 2014-12-27 2014-12-27 Data processing method, apparatus and system
CN201810336937.4A CN108733761B (zh) 2014-12-27 2014-12-27 Data processing method, apparatus, and system
EP14908851.0A EP3203386A4 (en) 2014-12-27 2014-12-27 Data processing method, apparatus and system
US15/634,774 US20170295239A1 (en) 2014-12-27 2017-06-27 Data processing method, apparatus, and system
US15/634,819 US11032368B2 (en) 2014-12-27 2017-06-27 Data processing method, apparatus, and system
US17/160,032 US11799959B2 (en) 2014-12-27 2021-01-27 Data processing method, apparatus, and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2014/095223 WO2016101283A1 (zh) 2014-12-27 2014-12-27 Data processing method, apparatus, and system

Related Child Applications (2)

Application Number Title Priority Date Filing Date
US15/634,819 Continuation US11032368B2 (en) 2014-12-27 2017-06-27 Data processing method, apparatus, and system
US15/634,774 Continuation US20170295239A1 (en) 2014-12-27 2017-06-27 Data processing method, apparatus, and system

Publications (1)

Publication Number Publication Date
WO2016101283A1 (zh) 2016-06-30

Family

ID=56149009

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2014/095223 WO2016101283A1 (zh) 2014-12-27 2014-12-27 Data processing method, apparatus, and system

Country Status (10)

Country Link
US (3) US20170295239A1 (zh)
EP (1) EP3203386A4 (zh)
JP (1) JP6607941B2 (zh)
KR (1) KR102030786B1 (zh)
CN (2) CN108733761B (zh)
AU (1) AU2014415350B2 (zh)
BR (1) BR112017011412B1 (zh)
CA (1) CA2965715C (zh)
SG (1) SG11201703410YA (zh)
WO (1) WO2016101283A1 (zh)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101515296A (zh) * 2009-03-06 2009-08-26 成都市华为赛门铁克科技有限公司 Data updating method and apparatus
CN101783814A (zh) * 2009-12-29 2010-07-21 上海交通大学 Metadata storage method for a mass storage system
CN103558998A (zh) * 2013-11-07 2014-02-05 华为技术有限公司 Data operation method and device

Family Cites Families (30)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6651147B2 (en) 2001-05-08 2003-11-18 International Business Machines Corporation Data placement and allocation using virtual contiguity
US6957362B2 (en) 2002-08-06 2005-10-18 Emc Corporation Instantaneous restoration of a production copy from a snapshot copy in a data storage system
US7209933B2 (en) * 2003-12-12 2007-04-24 Oracle International Corporation Object versioning
US7386663B2 (en) * 2004-05-13 2008-06-10 Cousins Robert E Transaction-based storage system and method that uses variable sized objects to store data
US7814273B2 (en) * 2004-11-05 2010-10-12 Data Robotics, Inc. Dynamically expandable and contractible fault-tolerant storage system permitting variously sized storage devices and method
US7228320B2 (en) 2004-11-17 2007-06-05 Hitachi, Ltd. System and method for creating an object-level snapshot in a storage system
US20060204134A1 (en) 2005-03-01 2006-09-14 James Modrall Method and system of viewing digitized roll film images
US7373366B1 (en) * 2005-06-10 2008-05-13 American Megatrends, Inc. Method, system, apparatus, and computer-readable medium for taking and managing snapshots of a storage volume
US7716171B2 (en) * 2005-08-18 2010-05-11 Emc Corporation Snapshot indexing
CN100355899C (zh) 2006-04-14 2007-12-19 清华大学 一种对具荚膜细菌发酵液过滤预处理方法
US8285758B1 (en) * 2007-06-30 2012-10-09 Emc Corporation Tiering storage between multiple classes of storage on the same container file system
US20100082538A1 (en) * 2008-09-29 2010-04-01 Heiko Rentsch Isolated replication of shared objects
US8099572B1 (en) * 2008-09-30 2012-01-17 Emc Corporation Efficient backup and restore of storage objects in a version set
JP5244979B2 (ja) * 2009-02-23 2013-07-24 株式会社日立製作所 ストレージシステムおよびその制御方法
US20100332401A1 (en) * 2009-06-30 2010-12-30 Anand Prahlad Performing data storage operations with a cloud storage environment, including automatically selecting among multiple cloud storage sites
US8352501B2 (en) 2010-01-28 2013-01-08 Cleversafe, Inc. Dispersed storage network utilizing revision snapshots
US8825602B1 (en) * 2010-03-15 2014-09-02 Symantec Corporation Systems and methods for providing data protection in object-based storage environments
US9824095B1 (en) 2010-05-03 2017-11-21 Panzura, Inc. Using overlay metadata in a cloud controller to generate incremental snapshots for a distributed filesystem
US9852150B2 (en) 2010-05-03 2017-12-26 Panzura, Inc. Avoiding client timeouts in a distributed filesystem
US8396905B2 (en) * 2010-11-16 2013-03-12 Actifio, Inc. System and method for improved garbage collection operations in a deduplicated store by tracking temporal relationships among copies
WO2013038447A1 (en) * 2011-09-14 2013-03-21 Hitachi, Ltd. Method for creating clone file, and file system adopting the same
US9804928B2 (en) 2011-11-14 2017-10-31 Panzura, Inc. Restoring an archived file in a distributed filesystem
US9635132B1 (en) * 2011-12-15 2017-04-25 Amazon Technologies, Inc. Service and APIs for remote volume-based block storage
US9817834B1 (en) 2012-10-01 2017-11-14 Veritas Technologies Llc Techniques for performing an incremental backup
US9092837B2 (en) 2012-11-29 2015-07-28 International Business Machines Corporation Use of snapshots to reduce risk in migration to a standard virtualized environment
US9742873B2 (en) 2012-11-29 2017-08-22 International Business Machines Corporation Adjustment to managed-infrastructure-as-a-service cloud standard
CN104079600B (zh) 2013-03-27 2018-10-12 中兴通讯股份有限公司 文件存储方法、装置、访问客户端及元数据服务器系统
US20140344539A1 (en) 2013-05-20 2014-11-20 Kaminario Technologies Ltd. Managing data in a storage system
US20150244795A1 (en) * 2014-02-21 2015-08-27 Solidfire, Inc. Data syncing in a distributed system
US9400741B1 (en) * 2014-06-30 2016-07-26 Emc Corporation Reclaiming space from file system hosting many primary storage objects and their snapshots

Also Published As

Publication number Publication date
CN105993013B (zh) 2018-05-04
BR112017011412A8 (pt) 2022-09-06
AU2014415350B2 (en) 2019-02-21
EP3203386A1 (en) 2017-08-09
CN108733761A (zh) 2018-11-02
CA2965715C (en) 2019-02-26
SG11201703410YA (en) 2017-06-29
US11032368B2 (en) 2021-06-08
KR102030786B1 (ko) 2019-10-10
JP2017537397A (ja) 2017-12-14
BR112017011412A2 (zh) 2018-06-26
US11799959B2 (en) 2023-10-24
US20170295239A1 (en) 2017-10-12
US20170293533A1 (en) 2017-10-12
JP6607941B2 (ja) 2019-11-20
CA2965715A1 (en) 2016-06-30
EP3203386A4 (en) 2017-12-27
CN108733761B (zh) 2021-12-03
KR20170068564A (ko) 2017-06-19
US20210152638A1 (en) 2021-05-20
CN105993013A (zh) 2016-10-05
BR112017011412B1 (pt) 2023-02-14
AU2014415350A1 (en) 2017-05-18

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 14908851; Country of ref document: EP; Kind code of ref document: A1)
ENP Entry into the national phase (Ref document number: 2965715; Country of ref document: CA)
REEP Request for entry into the european phase (Ref document number: 2014908851; Country of ref document: EP)
WWE Wipo information: entry into national phase (Ref document number: 11201703410Y; Country of ref document: SG)
ENP Entry into the national phase (Ref document number: 20177012992; Country of ref document: KR; Kind code of ref document: A)
ENP Entry into the national phase (Ref document number: 2014415350; Country of ref document: AU; Date of ref document: 20141227; Kind code of ref document: A)
ENP Entry into the national phase (Ref document number: 2017528138; Country of ref document: JP; Kind code of ref document: A)
NENP Non-entry into the national phase (Ref country code: DE)
REG Reference to national code (Ref country code: BR; Ref legal event code: B01A; Ref document number: 112017011412; Country of ref document: BR)
ENP Entry into the national phase (Ref document number: 112017011412; Country of ref document: BR; Kind code of ref document: A2; Effective date: 20170530)