WO2018166526A1 - Data storage, dispersal, reconstruction, and recovery method, apparatus, and data processing system - Google Patents

Data storage, dispersal, reconstruction, and recovery method, apparatus, and data processing system

Info

Publication number
WO2018166526A1
WO2018166526A1 (Application PCT/CN2018/079277)
Authority
WO
WIPO (PCT)
Prior art keywords
data
storage
index information
data segment
storage object
Prior art date
Application number
PCT/CN2018/079277
Other languages
English (en)
French (fr)
Inventor
叶敏
林鹏
汪渭春
林起芊
Original Assignee
杭州海康威视数字技术股份有限公司 (Hangzhou Hikvision Digital Technology Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 杭州海康威视数字技术股份有限公司 (Hangzhou Hikvision Digital Technology Co., Ltd.)
Priority to US16/495,042 (published as US11010072B2)
Priority to EP18768099.6A (published as EP3598289B1)
Publication of WO2018166526A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 11/00 Error detection; Error correction; Monitoring
    • G06F 11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F 11/08 Error detection or correction by redundancy in data representation, e.g. by using checking codes
    • G06F 11/10 Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's
    • G06F 11/1076 Parity data used in redundant arrays of independent storages, e.g. in RAID systems
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F 3/0601 Interfaces specially adapted for storage systems
    • G06F 3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F 3/062 Securing storage systems
    • G06F 3/0614 Improving the reliability of storage systems
    • G06F 3/0619 Improving the reliability of storage systems in relation to data integrity, e.g. data losses, bit errors
    • G06F 3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F 3/0629 Configuration or reconfiguration of storage systems
    • G06F 3/0631 Configuration or reconfiguration of storage systems by allocating resources to storage systems
    • G06F 3/0646 Horizontal data movement in storage systems, i.e. moving data in between storage devices or systems
    • G06F 3/065 Replication mechanisms
    • G06F 3/0662 Virtualisation aspects
    • G06F 3/0665 Virtualisation aspects at area level, e.g. provisioning of virtual or logical volumes
    • G06F 3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F 3/067 Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]
    • G06F 3/0671 In-line storage system
    • G06F 3/0683 Plurality of storage devices
    • G06F 3/0688 Non-volatile semiconductor memory arrays
    • G06F 3/0689 Disk arrays, e.g. RAID, JBOD

Definitions

  • the present application relates to the field of data processing technologies, and in particular, to a data storage, dispersal, reconstruction, and recovery method, apparatus, and data processing system.
  • A Redundant Array of Independent Disks (RAID) is a large-capacity disk group composed of multiple disks; the data to be stored is divided into multiple segments and stored across the disks.
  • the data to be stored may be divided into k original segments, and the k original segments are encoded to obtain m redundant segments; the k original segments and m redundant segments are respectively stored to the disks. In this way, if a disk fails and data is lost, the lost data segments can be recovered based on the original segments and redundant segments that are not lost.
  • each data segment (original segment or redundant segment) is usually randomly stored in the disks, and multiple data segments may be stored in one disk. If that disk fails, all the data segments stored in it may be lost, making data recovery impossible.
  • the purpose of the embodiments of the present application is to provide a data storage, dispersal, reconstruction, and recovery method, device, and data processing system, so as to avoid the situation in which a disk failure causes all data segments of the same data to be lost and data recovery to fail.
  • the embodiment of the present application discloses a data storage method, including:
  • the data to be stored is sliced and redundantly processed to obtain k+m data segments;
  • the index information corresponding to the data to be stored is recorded, and the index information includes: a correspondence between each data segment and a storage object storing the data segment.
  • the embodiment of the present application further discloses a data dispersion method, including:
  • the index information includes: a correspondence between each data segment and a storage object storing the data segment, the storage object being a minimum unit for storing data;
  • the embodiment of the present application further discloses a data reconstruction method, including:
  • the index information to be reconstructed is determined; wherein the index information includes: a correspondence between each data segment and a storage object storing the data segment, the index information to be reconstructed includes information of the faulty storage object, and the storage object is the minimum unit for storing data;
  • a target data segment is read from each non-faulty storage object, where the target data segments are: the data segments of the data to be reconstructed corresponding to the index information to be reconstructed;
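  • As an illustrative sketch of this reconstruction idea (not the patent's own algorithm), consider the simplest erasure-code case m = 1, where the redundant segment is the XOR of the k original segments: any single lost segment can then be rebuilt by XOR-ing the surviving target segments. All names here are hypothetical:

```python
def xor_segments(segments):
    """XOR equal-length byte segments together, byte by byte."""
    out = bytearray(len(segments[0]))
    for seg in segments:
        for i, b in enumerate(seg):
            out[i] ^= b
    return bytes(out)

# k = 3 original segments plus m = 1 XOR parity segment.
originals = [b"abcd", b"efgh", b"ijkl"]
parity = xor_segments(originals)

# Suppose the storage object holding originals[1] fails: the target
# data segments are the surviving ones, and XOR-ing them yields the
# repair segment.
survivors = [originals[0], originals[2], parity]
repaired = xor_segments(survivors)
assert repaired == b"efgh"
```

With m greater than 1, a real system would use a proper erasure code such as Reed-Solomon; the XOR case is only the smallest instance of the same recover-from-survivors principle.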
  • the embodiment of the present application further discloses a data recovery method, including:
  • the index information includes: a correspondence between each data segment and a storage object storing the data segment, where the storage object is a minimum unit for storing data;
  • new index information is generated to replace the to-be-recovered index information.
  • a data storage device including:
  • a first allocation module configured to allocate x storage objects for the data to be stored according to the preset erasure code policy k+m; wherein k represents the number of original segments, m represents the number of redundant segments, x is greater than 1 and not greater than k+m, and the storage object is the minimum unit for storing data;
  • a slicing module configured to perform slice and redundancy processing on the data to be stored by using the erasure code strategy k+m, to obtain k+m data segments;
  • a first storage module configured to store the k+m data segments to the x storage objects, where the difference between the numbers of data segments stored in the storage objects is less than a first preset threshold;
  • the first recording module is configured to record index information corresponding to the data to be stored, where the index information includes: a correspondence between each data segment and a storage object storing the data segment.
  • a data dispersing apparatus including:
  • a first determining module configured to determine, in the recorded index information, the index information to be distributed; wherein the index information includes: a correspondence between each data segment and a storage object storing the data segment, and the storage object is the minimum unit for storing data;
  • a second determining module configured to determine, according to the determined index information to be distributed, a data segment to be distributed
  • a second allocation module configured to allocate a distributed storage object to the to-be-distributed data segment
  • a second storage module configured to separately store the to-be-distributed data segments to the distributed storage object
  • the first update module is configured to update the information to be distributed.
  • a data reconstruction apparatus including:
  • a third determining module configured to determine the index information to be reconstructed in the recorded index information, where the index information includes: a correspondence between each data segment and a storage object storing the data segment, the index information to be reconstructed includes information of the faulty storage object, and the storage object is the minimum unit for storing data;
  • a reading module configured to read target data segments from the non-faulty storage objects according to the information of the non-faulty storage objects included in the index information to be reconstructed, where the target data segments are: the data segments of the data to be reconstructed corresponding to the index information to be reconstructed;
  • a reconstruction module configured to reconstruct the target data segment to obtain a repair segment
  • a third storage module configured to store the repaired segment into a storage object allocated thereto
  • a second update module configured to update the index information to be reconstructed.
  • a data recovery device including:
  • a fourth determining module configured to determine the index information to be recovered in the recorded index information, where the index information includes: a correspondence between each data segment and a storage object storing the data segment, and the storage object is the minimum unit for storing data;
  • a fifth determining module configured to determine, according to the to-be-recovered index information, a data segment to be recovered
  • a third allocation module configured to allocate a storage object for the to-be-recovered data segment
  • a fourth storage module configured to separately store the to-be-recovered data segments into a storage object allocated thereto;
  • a second recording module configured to record a correspondence between each data segment to be recovered and a storage object that stores the data segment to be recovered
  • a replacement module configured to replace the to-be-recovered index information with the recorded correspondence.
  • the embodiment of the present application further discloses a data processing system, including: a platform server and a management server, where
  • the platform server allocates x storage objects for the data to be stored according to the preset erasure code policy k+m; wherein k represents the number of original segments, m represents the number of redundant segments, x is greater than 1 and not greater than k+m, and the storage object is the minimum unit for storing data;
  • the management server uses the erasure code strategy k+m to slice and redundantly process the data to be stored to obtain k+m data segments, and stores the k+m data segments separately to the x storage objects, wherein the difference between the numbers of data segments stored in the storage objects is less than a fourth preset threshold;
  • the platform server records index information corresponding to the data to be stored, and the index information includes: a correspondence between each data segment and a storage object storing the data segment.
  • the platform server determines index information corresponding to the data to be read
  • the management server reads, according to the index information determined by the platform server, each data segment of the data to be read from the storage objects, and combines the read data segments to obtain the data to be read.
  • the platform server determines, in the recorded index information, the information to be distributed;
  • the management server determines, according to the determined index information to be distributed, a data segment to be distributed; allocates a distributed storage object to the data segment to be distributed; and stores the to-be-distributed data segment to the distributed storage object, respectively;
  • the platform server updates the to-be-distributed index information
  • the system further includes an audit server;
  • the platform server determines, in the recorded index information, the information to be distributed
  • the auditing server determines, according to the determined index information to be distributed, a data segment to be distributed; allocates a distributed storage object to the data segment to be distributed; and stores the to-be-distributed data segment to the distributed storage object, respectively;
  • the platform server updates the to-be-distributed index information.
  • the platform server determines, in the recorded index information, the index information to be reconstructed, where the index information to be reconstructed includes information of the storage object that is faulty;
  • the management server reads target data segments from the non-faulty storage objects according to the information of the non-faulty storage objects included in the index information to be reconstructed, where the target data segments are: the data segments of the data to be reconstructed corresponding to the index information to be reconstructed; reconstructs the target data segments to obtain a repair segment; and stores the repair segment into a storage object allocated thereto;
  • the system further includes an audit server;
  • the platform server determines, in the recorded index information, the index information to be reconstructed; wherein the index information to be reconstructed includes information of the faulty storage object;
  • the audit server reads target data segments from the non-faulty storage objects according to the information of the non-faulty storage objects included in the index information to be reconstructed, where the target data segments are: the data segments of the data to be reconstructed corresponding to the index information to be reconstructed; reconstructs the target data segments to obtain a repair segment; and stores the repair segment into a storage object allocated thereto;
  • the platform server updates the index information to be reconstructed.
  • the platform server determines, in the recorded index information, the index information to be recovered;
  • the platform server allocates a storage object for the to-be-recovered data segment
  • the management server stores the pieces of data to be recovered separately into storage objects allocated thereto;
  • the platform server records a correspondence between each of the to-be-recovered data segments and the storage object that stores the to-be-recovered data segment, and replaces the to-be-recovered index information with the recorded correspondence;
  • the system further includes an audit server;
  • the platform server determines, in the recorded index information, the index information to be recovered
  • the audit server determines, according to the to-be-recovered index information, the data segments to be recovered;
  • the platform server allocates a storage object for the to-be-recovered data segment
  • the audit server stores the pieces of data to be recovered separately into storage objects allocated thereto;
  • the platform server records a correspondence between each data segment to be recovered and the storage object that stores the data segment to be recovered, and replaces the to-be-recovered index information with the recorded correspondence.
  • the system may further include: a storage server, where the storage server includes multiple storage objects;
  • the storage server reports the running status information of the multiple storage objects to the platform server, so that the platform server, according to the running status information reported by each storage server, allocates storage objects for the data to be stored, allocates distributed storage objects for the data segments to be distributed, and determines the index information to be reconstructed.
  • an embodiment of the present application further discloses an electronic device, including a processor and a memory, wherein the memory is used to store executable program code, and the processor runs, by reading the executable program code stored in the memory, a program corresponding to the executable program code, so as to perform any of the above methods.
  • an embodiment of the present application further discloses an executable program code, where the executable program code is executed to perform any of the above data storage, dispersal, reconstruction, and recovery methods.
  • an embodiment of the present application further discloses a computer readable storage medium for storing executable program code, where the executable program code is executed to perform any of the above data storage, dispersal, reconstruction, and recovery methods.
  • each data segment of the data to be stored is separately stored in a storage object, the storage object being the minimum unit for storing data, and the difference between the numbers of data segments stored in the storage objects is less than a first preset threshold. That is to say, in this solution, the data segments are stored relatively uniformly across multiple storage objects, which avoids the situation in which one storage object fails, all the data segments of the same data are lost, and data recovery becomes impossible.
  • FIG. 1 is a schematic flowchart of a data storage method according to an embodiment of the present application;
  • FIG. 2 is a schematic flowchart of a data dispersion method according to an embodiment of the present application;
  • FIG. 3 is a schematic flowchart of a data reconstruction method according to an embodiment of the present application;
  • FIG. 4 is a schematic flowchart of a data recovery method according to an embodiment of the present application;
  • FIG. 5 is a schematic structural diagram of a data storage device according to an embodiment of the present application;
  • FIG. 6 is a schematic structural diagram of a data dispersing apparatus according to an embodiment of the present application;
  • FIG. 7 is a schematic structural diagram of a data reconstruction apparatus according to an embodiment of the present application;
  • FIG. 8 is a schematic structural diagram of a data recovery device according to an embodiment of the present application;
  • FIG. 9 is a schematic diagram of a first structure of a data processing system according to an embodiment of the present application;
  • FIG. 10 is a schematic diagram of a second structure of a data processing system according to an embodiment of the present application;
  • FIG. 11 is a schematic diagram of a third structure of a data processing system according to an embodiment of the present application;
  • FIG. 12 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
  • the embodiments of the present application provide a data storage, dispersal, reconstruction, and recovery method, device, and data processing system.
  • the method and device can be applied to a server, a client, and various electronic devices, and are not limited.
  • the data storage method provided by the embodiment of the present application is first described in detail below.
  • FIG. 1 is a schematic flowchart of a data storage method according to an embodiment of the present disclosure, including:
  • k represents the original fragment number
  • m represents the redundant segment number
  • x is greater than 1 and not greater than k+m.
  • a storage object is the smallest unit that stores data, and can be understood as the smallest unit that stores data segments (including original segments and redundant segments). If the smallest unit is a disk, the storage object is a disk; if the disk can be divided into smaller object blocks, the storage object is an object block. This is not limited herein. The following describes the case where the storage object is a disk.
  • S101 may include:
  • the x storage objects are determined among the available storage nodes, wherein each determined storage object is located in a different storage node, and x is equal to k+m;
  • k+m available storage nodes are allocated to store the data to be stored, so that each data segment of the data to be stored is stored in a different storage node.
  • all available disks are allocated to store the data to be stored, so that the data segments of the data to be stored are stored in as many different disks as possible.
  • in this way, the data to be stored is dispersed as widely as possible. If a disk or a storage node fails and data is lost, the lost data can still be recovered using the data segments stored in other disks or storage nodes.
  • the storage object may be allocated to the data to be stored according to the resource occupation of the storage object. Specifically, the storage object with a small resource occupancy rate can be preferentially allocated.
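  • A hedged sketch of such occupancy-aware allocation; the object representation and function name are assumptions for illustration, not the patent's interface:

```python
def allocate_storage_objects(objects, x):
    """Pick x storage objects, preferring low resource occupancy and
    spreading the picks across distinct storage nodes.

    `objects` is a list of dicts shaped like
    {"id": "B1", "node": "N1", "occupancy": 0.3} (hypothetical)."""
    chosen, used_nodes = [], set()
    # First pass: least-occupied objects on nodes not yet used.
    for obj in sorted(objects, key=lambda o: o["occupancy"]):
        if obj["node"] not in used_nodes:
            chosen.append(obj)
            used_nodes.add(obj["node"])
        if len(chosen) == x:
            return chosen
    # Fewer distinct nodes than x: fill up with the least-occupied rest.
    for obj in sorted(objects, key=lambda o: o["occupancy"]):
        if obj not in chosen:
            chosen.append(obj)
        if len(chosen) == x:
            break
    return chosen

objects = [
    {"id": "B1", "node": "N1", "occupancy": 0.9},
    {"id": "B2", "node": "N1", "occupancy": 0.1},
    {"id": "B3", "node": "N2", "occupancy": 0.5},
    {"id": "B4", "node": "N3", "occupancy": 0.2},
]
chosen = allocate_storage_objects(objects, x=3)
assert {o["id"] for o in chosen} == {"B2", "B3", "B4"}
```

The two-pass structure mirrors the text: spread across storage nodes first, and fall back to occupancy alone only when distinct nodes run out.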
  • S102 Perform the slicing and redundancy processing on the to-be-stored data by using the erasure code strategy k+m to obtain k+m data segments.
  • the data to be stored may be divided into k original segments; the k original segments are redundantly processed to obtain m redundant segments.
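  • A minimal sketch of this slice-and-encode step, using XOR parity as a stand-in for a real erasure code (a production k+m scheme would typically use something like Reed-Solomon coding to support m greater than 1); the helper name is hypothetical:

```python
def slice_and_encode(data: bytes, k: int):
    """Split `data` into k equal-length original segments (padding the
    tail with zero bytes) and append one XOR parity segment (m = 1)."""
    seg_len = -(-len(data) // k)                 # ceiling division
    padded = data.ljust(k * seg_len, b"\x00")
    originals = [padded[i * seg_len:(i + 1) * seg_len] for i in range(k)]
    parity = bytearray(seg_len)
    for seg in originals:
        for i, b in enumerate(seg):
            parity[i] ^= b
    return originals + [bytes(parity)]

segments = slice_and_encode(b"hello world!", k=4)
assert len(segments) == 5                        # k + m = 4 + 1 segments
assert len({len(s) for s in segments}) == 1      # all equal length
```

The XOR of all k+m segments is zero bytes, which is exactly the property the later reconstruction step relies on.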
  • S103 Store the k+m data segments to the x storage objects.
  • the difference between the number of copies of the data segments stored in each storage object is less than a first preset threshold.
  • the first preset threshold can be set according to the actual situation, which is assumed to be 2.
  • the storage object has been allocated for the data to be stored, and at this time, each data segment can be stored in the allocated storage object.
  • each data segment can be separately stored to a storage object in a different storage node.
  • the number of copies of the data segments stored in each storage object is 1, and the difference is 0, which is smaller than the first preset threshold.
  • each piece of data can be stored separately on a different disk. In this way, the number of copies of the data segments stored in each storage object is 1, and the difference is 0, which is smaller than the first preset threshold.
  • the k + m data segments may be evenly divided into x shares, and the x pieces of data segments are separately stored into the x storage objects.
  • Uniform division can be understood as follows: if the k+m data segments can be equally divided into x shares, they are equally divided into x shares; if they cannot be equally divided into x shares, they are divided into x shares as evenly as possible.
  • every two data segments are stored in one storage object, so that the number of data segments stored in each storage object is 2, and the difference is 0, which is smaller than the first preset threshold.
  • For example, when five data segments are divided into three shares, the first share may include two data segments, the second share may include two data segments, and the third share may include one data segment. The numbers of data segments stored in the three storage objects are then 2, 2, and 1, respectively, and the difference is 0 or 1, which is still smaller than the first preset threshold.
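  • The even-division rule above can be sketched with a simple round-robin split (an illustrative assumption, not the patent's mandated algorithm):

```python
def divide_evenly(segments, x):
    """Divide the k+m data segments into x shares whose sizes differ
    by at most one, e.g. 5 segments into 3 shares -> sizes 2, 2, 1."""
    shares = [[] for _ in range(x)]
    for i, seg in enumerate(segments):
        shares[i % x].append(seg)    # round-robin keeps counts balanced
    return shares

shares = divide_evenly(["A1", "A2", "A3", "A4", "A5"], 3)
assert [len(s) for s in shares] == [2, 2, 1]
```

Each resulting share is then stored to one of the x allocated storage objects, so per-object segment counts differ by at most one.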
  • S104 Record index information corresponding to the data to be stored, where the index information includes: a correspondence between each data segment and a storage object storing the data segment.
  • For example, the data to be stored is divided into four original segments, and the four original segments are redundantly processed to obtain two redundant segments; the six data segments (four original segments and two redundant segments) are: A1, A2, A3, A4, A5, A6. It is assumed that six disks are allocated as storage objects of the data to be stored, and the six disks are: B1, B2, B3, B4, B5, and B6.
  • the index information corresponding to the data to be stored may include: A1-B1, A2-B2, A3-B3, A4-B4, A5-B5, and A6-B6.
  • in practice, the index information may distinguish original segments from redundant segments; the foregoing form is used only to simplify the description and is not limiting.
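  • In an implementation, the recorded index information might be held as a plain mapping from data segment to storage object; this shape is an assumption for illustration, not the patent's data structure:

```python
# Hypothetical in-memory form of the index information for one piece
# of stored data: each data segment mapped to the storage object
# holding it.
index_info = {
    "A1": "B1", "A2": "B2", "A3": "B3",
    "A4": "B4", "A5": "B5", "A6": "B6",
}

# Reading the data back needs only this mapping: fetch each segment
# from its storage object and combine them in order.
assert index_info["A3"] == "B3"
assert len(index_info) == 6
```

The same mapping is what the later dispersal, reconstruction, and recovery steps inspect and update.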
  • in this solution, each data segment of the data to be stored is separately stored in a storage object, the storage object being the minimum unit for storing data, and the difference between the numbers of data segments stored in the storage objects is less than the first preset threshold; that is, the data segments are stored relatively uniformly across multiple storage objects, which avoids the situation in which one storage object fails, all the data segments of the same data are lost, and the data cannot be recovered.
  • FIG. 2 is a schematic flowchart of a data dispersion method according to an embodiment of the present disclosure, including:
  • S201 Determine, in the recorded index information, index information to be distributed.
  • the data in the embodiment shown in FIG. 2 of the present application may be stored according to the data storage method provided by the embodiment shown in FIG. 1 of the present application, or may be stored based on other storage methods.
  • index information of the data is recorded in the data storage process. Therefore, the index information to be distributed can be determined in the recorded index information.
  • other storage methods also need to record index information during storage.
  • the index information includes: a correspondence between each data segment and a storage object storing the data segment, and the storage object is a minimum unit for storing data.
  • the number of records of each storage object corresponding to the index information may be determined for each piece of index information.
  • if the number of records of any storage object in the index information exceeds a third preset threshold, the index information is determined as the index information to be distributed.
  • the number of records of one storage object in one index information is: the number of data segments in the index information stored in the storage object.
  • For example, the index information includes: A1-B1, A2-B1, A3-B1, A4-B4, A5-B4, A6-B6. Three data segments in the index information (A1, A2, and A3) are stored in the storage object B1, so the storage object B1 is recorded three times; that is, the number of records of the storage object B1 is three, the number of records of the storage object B4 is two, and the number of records of the storage object B6 is one.
  • For example, assuming the third preset threshold is 2, the data corresponding to the above index information has three data segments stored in the same storage object, so the index information is determined as the index information to be distributed.
  • This scheme disperses data that is stored unevenly: if multiple data segments of the same data are stored in one storage object, those data segments can be dispersed.
  • the index information corresponding to the data may be used to determine whether the data has a data segment that needs to be dispersed, that is, when there is a target storage object whose recording times exceed a third preset threshold, the index information is determined as the information to be distributed. .
  • S202 Determine, according to the determined index information to be distributed, the data segments to be distributed.
  • The data segments to be distributed may be determined among the data segments corresponding to the target storage object.
  • Continuing the example with a third preset threshold of 2, there is a target storage object B1 whose number of records is greater than 2, so the data segments to be distributed are determined among the data segments A1, A2, and A3 corresponding to B1. Assume A2 and A3 are determined as the data segments to be distributed.
  • S203 Allocate a distributed storage object for the to-be-distributed data segment.
  • The principle for allocating the distributed storage objects may be the same as that for allocating x storage objects to the data to be stored in the embodiment shown in FIG. 1 of the present application:
  • scatter the data segments to be distributed as widely as possible, so that if a disk or a storage node fails and some data segments are lost, the data can still be restored from the data segments stored in the other disks or storage nodes.
  • The distributed storage objects may also be allocated according to the resource occupancy of the storage objects; specifically, storage objects with a low resource occupancy rate may be preferentially allocated as distributed storage objects.
  • The specific manner of allocating the distributed storage objects may take various forms and is not limited here.
  • S204 Store the to-be-distributed data segments to the distributed storage object.
  • Assume the distributed storage objects are disk B3 and disk B5. Which data segment is stored to which disk is not limited here; assume A2 is stored to B3 and A3 is stored to B5.
  • The index information to be distributed may then be updated according to the correspondence between each data segment to be distributed and the distributed storage object that stores it.
  • The index information to be distributed is updated to: A1-B1, A2-B3, A3-B5, A4-B4, A5-B4, A6-B6.
  • With this solution, unevenly stored data can be dispersed so that the data segments are stored more uniformly across the storage objects. Since the storage object is the smallest unit for storing data, storing the data segments across multiple storage objects prevents the failure of a single storage object from losing all segments of the same data, which would make data recovery impossible.
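  • The dispersion flow of S201-S204 can be sketched as follows. This is a minimal illustration, not the embodiment's implementation: the in-memory index representation, the `disperse` helper, and the policy of keeping one segment on the target object (matching the example, where A2 and A3 leave B1) are all assumptions.

```python
from collections import Counter

# Index information: (data segment, storage object) pairs from the example above.
index_info = [("A1", "B1"), ("A2", "B1"), ("A3", "B1"),
              ("A4", "B4"), ("A5", "B4"), ("A6", "B6")]
THIRD_THRESHOLD = 2  # third preset threshold

def targets(index_info, threshold=THIRD_THRESHOLD):
    """S201: storage objects whose number of records exceeds the threshold."""
    counts = Counter(obj for _, obj in index_info)
    return [obj for obj, n in counts.items() if n > threshold]

def disperse(index_info, free_objects, threshold=THIRD_THRESHOLD):
    """S202-S204: move surplus segments off each target object, update the index."""
    index = dict(index_info)
    for target in targets(index_info, threshold):
        segments = [seg for seg, obj in index.items() if obj == target]
        for seg in segments[1:]:              # keep one segment, disperse the rest
            index[seg] = free_objects.pop(0)  # allocate a distributed storage object
    return sorted(index.items())

print(disperse(index_info, ["B3", "B5"]))
# [('A1', 'B1'), ('A2', 'B3'), ('A3', 'B5'), ('A4', 'B4'), ('A5', 'B4'), ('A6', 'B6')]
```

  • Running the sketch on the example index reproduces the updated index of the text, with A2 and A3 moved off B1 to the newly allocated B3 and B5.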
  • FIG. 3 is a schematic flowchart of a data reconstruction method according to an embodiment of the present disclosure, including:
  • S301 Determine, in the recorded index information, index information to be reconstructed; the index information to be reconstructed includes information of a failed storage object, and the storage object is a minimum unit for storing data.
  • The data in the embodiment shown in FIG. 3 of the present application may be stored according to the data storage method provided by the embodiment shown in FIG. 1, or based on other storage methods; in either case, index information of the data is recorded during the data storage process, so the index information to be reconstructed can be determined from the recorded index information.
  • The index information includes: a correspondence between each data segment and the storage object storing the data segment.
  • The electronic device executing this solution can obtain the information of the failed storage object and thereby determine the index information to be reconstructed: index information containing the information of the failed storage object is determined as the index information to be reconstructed.
  • The electronic device can periodically detect whether a storage node, a disk, or a smaller storage unit under a disk has failed.
  • The storage object may be the smallest unit that stores data segments. If a storage object is detected to have failed, index information containing the information of that storage object is determined as the index information to be reconstructed.
  • For example, disks C2 and C6 fail, and a piece of index information includes: A1-C1, A2-C2, A3-C3, A4-C4, A5-C5, A6-C6; this index information is determined as the index information to be reconstructed.
  • S302 Read the target data segments from the non-failed storage objects according to the information of the non-failed storage objects included in the index information to be reconstructed.
  • The target data segments are: the data segments of the data to be reconstructed corresponding to the index information to be reconstructed.
  • The non-failed storage objects in the above index information are C1, C3, C4, and C5, so A1 is read from C1, A3 from C3, A4 from C4, and A5 from C5.
  • A1, A3, A4, and A5 are reconstructed to obtain the repair segments, namely a new A2 and a new A6.
  • Storage objects C7 and C8 are allocated for the new A2 and A6; assume A2 is stored to C7 and A6 is stored to C8.
  • The index information to be reconstructed may be updated according to the correspondence between each repair segment and the storage object storing it.
  • The index information is updated to: A1-C1, A2-C7, A3-C3, A4-C4, A5-C5, A6-C8.
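  • The reconstruction flow of FIG. 3 can be sketched as follows. This is a minimal illustration that substitutes a single-parity XOR code (k+m = 4+1) for a general erasure code such as Reed-Solomon; the in-memory `disks` map and the segment values are assumptions, not part of the embodiment.

```python
def xor_bytes(blocks):
    """XOR equal-length byte blocks together (single-parity erasure code)."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            out[i] ^= b
    return bytes(out)

# Original segments A1..A4 and their XOR parity A5.
segments = {"A1": b"\x01", "A2": b"\x02", "A3": b"\x03", "A4": b"\x04"}
parity = xor_bytes(list(segments.values()))

# Index information: segment -> storage object; disk C2 then fails.
index = {"A1": "C1", "A2": "C2", "A3": "C3", "A4": "C4", "A5": "C5"}
disks = {"C1": b"\x01", "C3": b"\x03", "C4": b"\x04", "C5": parity}
failed = {"C2"}

# S302: read the target segments from the non-failed storage objects.
surviving = [disks[obj] for obj in index.values() if obj not in failed]
# With single parity, the missing segment is the XOR of all survivors.
repaired_A2 = xor_bytes(surviving)

# Store the repair segment to a newly allocated object and update the index.
disks["C7"] = repaired_A2
index["A2"] = "C7"
print(repaired_A2)  # b'\x02'
```

  • A k+m code with m > 1 (as in the C1..C8 example with two failures) would follow the same read-reconstruct-store-update sequence, with the XOR step replaced by the corresponding erasure decode.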
  • FIG. 4 is a schematic flowchart of a data recovery method according to an embodiment of the present disclosure, including:
  • S401 Determine, in the recorded index information, index information to be recovered.
  • The data in the embodiment shown in FIG. 4 of the present application may be stored according to the data storage method provided by the embodiment shown in FIG. 1, or based on other storage methods; in either case (other storage methods also need to record index information), index information of the data is recorded, so the index information to be recovered can be determined from the recorded index information.
  • The index information includes: a correspondence between each data segment and the storage object storing the data segment, and the storage object is a minimum unit for storing data.
  • For example, the index information includes: A1-B1, A2-B2, A3-B2, A4-B4, A5-B4, A6-B6, in which disks B2 and B6 fail. The data segments A2, A3, and A6 corresponding to B2 and B6 are then invalid data segments, and the number of invalid data segments is 3.
  • Assuming the fourth preset threshold is 2, the number of invalid data segments is greater than the fourth preset threshold, so the index information is determined as the index information to be recovered.
  • S402 Determine, according to the index information to be recovered, the data segments to be recovered.
  • The data segments to be recovered may be determined according to the valid data segments in the index information to be recovered, where a valid data segment is a data segment other than an invalid data segment.
  • In the example, the data segments corresponding to the index information include A1, A2, A3, A4, A5, and A6, among which A2, A3, and A6 are invalid data segments; the remaining data segments A1, A4, and A5 are referred to herein as valid data segments.
  • Each valid data segment is determined as a data segment to be recovered.
  • S403 Allocate storage objects for the data segments to be recovered. The storage objects may be allocated according to their resource occupancy; specifically, storage objects with a low resource occupancy rate can be preferentially allocated.
  • S404 Store the to-be-recovered data segments into storage objects allocated thereto.
  • Assume the allocated storage objects are disk B7, disk B8, and disk B9. Which data segment is stored to which disk is not limited here; assume A1 is stored to B7, A4 to B8, and A5 to B9.
  • S405 Record a correspondence between each data segment to be recovered and a storage object that stores the data segment to be recovered.
  • S406 Generate new index information to replace the to-be-recovered index information according to the recorded correspondence.
  • the new index information is: A1-B7, A4-B8, A5-B9.
  • Assume one piece of index information corresponds to 6 data segments. In this case, 6 data segments to be recovered need to be collected before recovery can be performed, and the same piece of index information can then correspond to data segments of different data.
  • For example, another piece of index information includes: A10-B10, A20-B20, A30-B20, A40-B40, A50-B40, A60-B60, in which disks B20 and B60 fail.
  • The data segments A20, A30, and A60 corresponding to B20 and B60 are invalid data segments; the number of invalid data segments is 3, which is greater than the fourth preset threshold of 2, so the index information is determined as index information to be recovered.
  • the data segments corresponding to the index information include A10, A20, A30, A40, A50, and A60, wherein A20, A30, and A60 are invalid data segments, and the remaining data segments A10, A40, and A50 are referred to herein as valid data segments.
  • Each valid data segment is determined as a data segment to be recovered. In this way, the two pieces of index information to be recovered together yield six data segments to be recovered.
  • Assume the storage objects allocated for the data segments A10, A40, and A50 to be recovered are disk B70, disk B80, and disk B90. Which data segment is stored to which disk is not limited here; assume A10 is stored to B70, A40 to B80, and A50 to B90.
  • Take A1-B7, A4-B8, A5-B9, A10-B70, A40-B80, and A50-B90 as the new index information, and replace the above two pieces of index information to be recovered with this new index information.
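  • The recovery (recycle) flow can be sketched as follows. This is a minimal illustration: the fourth preset threshold, the stripe width of 6, and the helper names are taken from or assumed around the examples above, not from a literal implementation in the embodiment.

```python
FOURTH_THRESHOLD = 2
STRIPE_WIDTH = 6  # one piece of index information corresponds to 6 data segments

def invalid_segments(entry, failed_objects):
    """Segments of one index entry whose storage object has failed."""
    return [seg for seg, obj in entry if obj in failed_objects]

def recycle(entries, failed_objects, free_disks):
    """S401-S406: collect valid segments from entries to be recovered and,
    once a full stripe's worth is gathered, restore them to freshly
    allocated disks as one replacement index entry."""
    pool = []
    for entry in entries:
        if len(invalid_segments(entry, failed_objects)) > FOURTH_THRESHOLD:
            pool += [seg for seg, obj in entry if obj not in failed_objects]
    if len(pool) >= STRIPE_WIDTH:
        return [(seg, free_disks.pop(0)) for seg in pool[:STRIPE_WIDTH]]
    return None  # not enough valid segments collected yet

e1 = [("A1", "B1"), ("A2", "B2"), ("A3", "B2"),
      ("A4", "B4"), ("A5", "B4"), ("A6", "B6")]
e2 = [("A10", "B10"), ("A20", "B20"), ("A30", "B20"),
      ("A40", "B40"), ("A50", "B40"), ("A60", "B60")]
new_entry = recycle([e1, e2], failed_objects={"B2", "B6", "B20", "B60"},
                    free_disks=["B7", "B8", "B9", "B70", "B80", "B90"])
print(new_entry)
```

  • On the two example entries, the valid segments A1, A4, A5 and A10, A40, A50 are merged into one new index entry, matching the replacement described in the text.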
  • a data storage, dispersion, reconstruction, and recovery device is also provided.
  • FIG. 5 is a schematic structural diagram of a data storage device according to an embodiment of the present disclosure, including:
  • the first allocation module 501 is configured to allocate x storage objects to the data to be stored according to the preset erasure code policy k+m, where k represents the number of original segments, m represents the number of redundant segments,
  • x is greater than 1 and not greater than k+m, and the storage object is a minimum unit for storing data;
  • the slicing module 502 is configured to perform slicing and redundancy processing on the data to be stored by using the erasure code strategy k+m to obtain k+m data segments;
  • the first storage module 503 is configured to store the k+m data segments to the x storage objects, where the difference in the number of data segments stored in each storage object is less than a first preset threshold;
  • the first recording module 504 is configured to record index information corresponding to the data to be stored, where the index information includes: a correspondence between each data segment and a storage object storing the data segment.
  • The first allocation module 501 may include: a first judging submodule and a first determining submodule (not shown), where
  • the first judging submodule is configured to judge whether the number of available storage nodes is not less than k+m;
  • the first determining submodule is configured to, when the judging result of the first judging submodule is yes, determine x storage objects among the available storage nodes, where the storage nodes where the determined storage objects are located are different from each other, and x is equal to k+m;
  • the first storage module 503 may include:
  • the first storage submodule (not shown) is configured to store each data segment to a storage object in a different storage node when the judging result of the first judging submodule is yes.
  • The first allocation module 501 may include: a second judging submodule and a second determining submodule (not shown), where
  • the second judging submodule is configured to, when the judging result of the first judging submodule is no, judge whether the number of available storage objects in all available storage nodes is not less than k+m;
  • the second determining submodule is configured to, when the judging result of the second judging submodule is yes, determine x storage objects among all the available storage nodes, where x is equal to k+m, and the difference in the number of storage objects determined in each available storage node is less than a second preset threshold;
  • the first storage module 503 may include:
  • the second storage submodule (not shown) is configured to store each data segment to a different storage object when the judging result of the second judging submodule is yes.
  • the first allocating module 501 may include:
  • an allocation submodule (not shown), configured to allocate all available storage objects to the data to be stored when the judging result of the second judging submodule is no, where x is equal to the number of all available storage objects;
  • the first storage module 503 may include:
  • a third storage submodule (not shown), configured to, when the judging result of the second judging submodule is no, divide the k+m data segments evenly into x shares and store the x shares of data segments to the x storage objects, respectively.
  • With this solution, the data segments of the data to be stored are stored separately into the storage objects, the storage object being a minimum unit for storing data, and the difference in the number of data segments stored in each storage object is less than the first preset threshold; that is, the data segments are stored relatively uniformly into multiple storage objects, which prevents the failure of a single storage object from losing all segments of the same data and making data recovery impossible.
  • FIG. 6 is a schematic structural diagram of a data dispersing apparatus according to an embodiment of the present application.
  • The data in the embodiment shown in FIG. 6 may be stored by the data storage device provided in the embodiment shown in FIG. 5 of the present application, or by other storage devices. FIG. 6 includes:
  • the first determining module 601 is configured to determine index information to be distributed in the recorded index information, where the index information includes: a correspondence between each data segment and a storage object storing the data segment, where the storage object is a storage The smallest unit of data;
  • a second determining module 602 configured to determine, according to the determined index information to be distributed, a data segment to be distributed;
  • a second allocation module 603, configured to allocate a distributed storage object to the data segment to be distributed
  • a second storage module 604 configured to separately store the to-be-distributed data segments to the distributed storage object
  • the first update module 605 is configured to update the to-be-distributed index information.
  • the first determining module 601 is specifically configured to:
  • determine, for each piece of index information, the number of records of each storage object corresponding to the index information, where the number of records of one storage object in one piece of index information is: the number of data segments in the index information stored in that storage object;
  • when there is a target storage object whose number of records exceeds the third preset threshold, determine the index information as the index information to be distributed;
  • the second determining module 602 is specifically configured to:
  • determine the data segments to be distributed among the data segments corresponding to the target storage object.
  • With this device, unevenly stored data can be dispersed so that the data segments are stored more uniformly across the storage objects. Since the storage object is the smallest unit for storing data, storing the data segments across multiple storage objects prevents the failure of a single storage object from losing all segments of the same data, which would make data recovery impossible.
  • FIG. 7 is a schematic structural diagram of a data reconstruction apparatus according to an embodiment of the present disclosure.
  • The data in the embodiment shown in FIG. 7 may be stored by the data storage device provided in the embodiment shown in FIG. 5 of the present application, or by other storage devices. FIG. 7 includes:
  • the third determining module 701 is configured to determine index information to be reconstructed in the recorded index information, where the index information includes: a correspondence between each data segment and a storage object storing the data segment, where the to-be-reconstructed The index information includes information of the failed storage object, where the storage object is a minimum unit for storing data;
  • the reading module 702 is configured to read a target data segment from the non-failed storage object according to the information of the storage object that is not faulty included in the information to be reconstructed, where the target data segment is: Determining each data segment of the data to be reconstructed corresponding to the reconstructed information;
  • a reconstruction module 703, configured to reconstruct the target data segment to obtain a repair segment
  • a third storage module 704, configured to store the repair segments into storage objects allocated thereto;
  • the second update module 705 is configured to update the index information to be reconstructed.
  • FIG. 8 is a schematic structural diagram of a data recovery device according to an embodiment of the present disclosure.
  • The data in the embodiment shown in FIG. 8 may be stored by the data storage device provided in the embodiment shown in FIG. 5 of the present application, or by other storage devices. FIG. 8 includes:
  • the fourth determining module 801, configured to determine index information to be recovered in the recorded index information, where the index information includes: a correspondence between each data segment and the storage object storing the data segment, and the storage object is a minimum unit for storing data;
  • a fifth determining module 802 configured to determine, according to the to-be-recovered index information, a data segment to be recovered
  • a third allocation module 803, configured to allocate a storage object for the to-be-recovered data segment
  • a fourth storage module 804 configured to separately store the to-be-recovered data segments into a storage object allocated thereto;
  • a second recording module 805, configured to record a correspondence between each data segment to be recovered and a storage object that stores the data segment to be recovered;
  • the replacing module 806 is configured to generate new index information to replace the to-be-recovered index information according to the recorded correspondence.
  • The fourth determining module 801 is specifically configured to: determine, in each piece of index information, the invalid data segments corresponding to failed storage objects, and when the number of invalid data segments is greater than the fourth preset threshold, determine the index information as the index information to be recovered.
  • The fifth determining module 802 is specifically configured to: determine the data segments to be recovered according to the valid data segments in the index information to be recovered, where a valid data segment is a data segment other than an invalid data segment.
  • the embodiment of the present application further provides a data processing system.
  • the system can be as shown in FIG. 9 and includes: a platform server and a management server.
  • the platform server allocates x storage objects for the data to be stored according to the preset erasure code policy k+m; wherein k represents the original fragment number, the m represents the redundant fragment number, and the x is greater than 1 and not greater than k+m, the storage object is the smallest unit for storing data;
  • the management server uses the erasure code strategy k+m to slice and redundantly process the data to be stored to obtain k+m data segments, and stores the k+m data segments to the x storage objects, where the difference in the number of data segments stored in each storage object is less than the first preset threshold;
  • the platform server records the index information corresponding to the data to be stored, and the index information includes: a correspondence between each data segment and a storage object storing the data segment.
  • The platform server may judge whether the number of available storage nodes is not less than k+m; if it is not less, determine x storage objects among the available storage nodes, where the storage nodes where the determined storage objects are located are different from each other and x is equal to k+m; each data segment is then stored to a storage object in a different storage node.
  • If the number of available storage nodes is less than k+m, the platform server may judge whether the number of available storage objects in all available storage nodes is not less than k+m; if so, determine x storage objects among all the available storage nodes, where x is equal to k+m and the difference in the number of storage objects determined in each available storage node is less than a second preset threshold; each data segment is then stored to a different storage object.
  • When determining that the number of available storage objects in the available storage nodes is less than k+m, the platform server may allocate all available storage objects to the data to be stored, where x is equal to the number of all available storage objects; the k+m data segments are divided evenly into x shares, and the x shares of data segments are stored to the x storage objects, respectively.
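  • The three-level allocation fallback described above can be sketched as follows; the `nodes` mapping, the function name, and the return shape (disk IDs only) are illustrative assumptions:

```python
from itertools import chain, zip_longest

def allocate(nodes, k, m):
    """Pick storage objects for a k+m stripe from {node_id: [disk_id, ...]}.

    Preference order: one disk on each of k+m different nodes; otherwise
    k+m distinct disks taken round-robin over the nodes (so the per-node
    counts stay balanced); otherwise all available disks.
    """
    need = k + m
    if len(nodes) >= need:
        # Enough nodes: one disk from each of k+m different nodes.
        return [disks[0] for disks in list(nodes.values())[:need]]
    # Round-robin across nodes keeps the per-node allocation difference small.
    rr = [d for d in chain.from_iterable(zip_longest(*nodes.values()))
          if d is not None]
    return rr[:need] if len(rr) >= need else rr

print(allocate({"OSD_1": ["a", "b"], "OSD_2": ["c", "d"], "OSD_3": ["e"]}, 4, 1))
# ['a', 'c', 'e', 'b', 'd']
```

  • With only three nodes available for a 4+1 stripe, the round-robin branch picks five distinct disks while spreading them as evenly as possible over the nodes; with fewer than five disks in total, all disks are returned and the segments are divided evenly among them.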
  • After receiving the data storage request sent by the user, the platform server specifies the management server that is to receive the data to be stored.
  • The designated management server receives the data to be stored sent by the user.
  • The management server applies to the platform server for a stripe resource, and the platform server allocates the stripe resource for the data to be stored according to the preset erasure code policy k+m.
  • For example, the preset erasure code policy is 4+1.
  • The platform server organizes the 4+1 stripe resource and sends it to the management server.
  • The platform server generates a unique stripe ID for each stripe. For example, when organizing the 4+1 stripe resource, the platform server allocates 5 storage objects (disks) for the stripe when resources permit.
  • The rule for allocating 5 storage objects may include: if there are enough available storage nodes (not fewer than 5), allocate storage objects in 5 different available storage nodes to store the data to be stored, so that each data segment of the data to be stored is stored to a different storage node.
  • OSD can be understood as a storage node; OSD_1, OSD_2, etc. can be understood as identification information of storage nodes; wwn can be understood as a disk; wwn_1, wwn_2, etc. can be understood as identification information of disks.
  • The management server slices and redundantly processes the data to be stored according to the preset erasure code strategy to obtain the original segments and redundant segments, and stores the obtained original segments and redundant segments into the allocated storage objects, respectively.
  • The management server generates unique key information for each data segment in the stripe, so that each data segment in the stripe corresponds to a five-tuple <stripe_id, OSD, wwn, key, value>, where stripe_id represents the stripe ID, OSD represents the identification information of the storage node, wwn represents the identification information of the disk, key represents the key of the data segment, and value represents the value or content of the data segment.
  • The management server sends each data segment to its corresponding storage node according to the complete stripe described above.
  • The management server may send a triple <wwn, key, value> to the storage node; the storage node stores the data <key, value> to the corresponding disk according to the triple, and returns a storage success message to the management server after the storage is completed.
  • After the management server receives the storage success messages sent by each storage node corresponding to the stripe (indicating that the data to be stored is stored successfully), the management server sends the <stripe_id, wwn, key> (that is, the index information) of each data segment to the platform server.
  • After the platform server records the <stripe_id, wwn, key> (that is, the index information) of each data segment of the data to be stored, the storage of the data is complete.
  • In practice, the same disk may belong to different storage nodes at different times. Therefore, the stripe recorded by the platform server may not include the identification information of the storage nodes; that is, the stripe recorded by the platform server can be: {<stripe_id, wwn_1, key_1>, <stripe_id, wwn_2, key_2>, <stripe_id, wwn_3, key_3>, <stripe_id, wwn_4, key_4>, <stripe_id, wwn_5, key_5>}.
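  • The stripe bookkeeping above can be modelled with plain tuples, as a sketch: the management server works with five-tuples, while the platform server records only <stripe_id, wwn, key> (since a disk's node may change over time). Identifiers follow the examples in the text; the segment values are made up.

```python
stripe_id = "stripe_0001"

# Management server's view: one five-tuple per data segment in the stripe.
five_tuples = [
    (stripe_id, "OSD_1", "wwn_1", "key_1", b"segment-1"),
    (stripe_id, "OSD_2", "wwn_2", "key_2", b"segment-2"),
    (stripe_id, "OSD_3", "wwn_3", "key_3", b"segment-3"),
    (stripe_id, "OSD_4", "wwn_4", "key_4", b"segment-4"),
    (stripe_id, "OSD_5", "wwn_5", "key_5", b"segment-5"),
]

# What the management server sends to each storage node: <wwn, key, value>.
to_store = [(wwn, key, value) for _, _, wwn, key, value in five_tuples]

# What the platform server records after the success messages: <stripe_id, wwn, key>.
index = [(sid, wwn, key) for sid, _, wwn, key, _ in five_tuples]
print(index[0])  # ('stripe_0001', 'wwn_1', 'key_1')
```

  • Dropping the OSD field from the recorded index is what later forces the platform server to re-resolve each wwn to its current storage node before a read, dispersion, or reconstruction operation.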
  • the foregoing storage node may also be a storage server, or other components, which are not specifically limited.
  • The platform server determines the index information corresponding to the data to be read;
  • the management server reads each data segment of the data to be read from the storage objects according to the index information determined by the platform server, and combines the read data segments to obtain the data to be read.
  • the platform server receives the data read request sent by the user, determines the data to be read according to the data read request, and specifies the management server that performs the read operation.
  • the designated management server requests the platform server for the stripe information of the data to be read (that is, the index information corresponding to the data to be read).
  • As described above, the same disk may belong to different storage nodes at different times,
  • and the stripe information recorded by the platform server does not include the identification information of the storage nodes. Therefore, the platform server needs to obtain the identification information of the corresponding storage node according to the identification information of each disk,
  • fill the identification information of the obtained storage nodes into the stripe, and the padded stripe is:
  • The management server may send a two-tuple <wwn, key> to each storage node according to the above padded stripe; the storage node reads the value of the key on the wwn (disk) according to the two-tuple, and sends the read <key, value> to the management server.
  • The management server combines the <key, value> of each data segment sent by the storage nodes, and the stripe is:
  • the management server sends the data to be read to the user.
  • the foregoing storage node may also be a storage server, or other components, which are not specifically limited.
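  • The read path above can be sketched as follows; the `disk_to_node` map and the in-memory `cluster` stand in for the real platform lookup and the storage nodes, and are assumptions:

```python
# Recorded stripe: <stripe_id, wwn, key> triples (no node IDs recorded).
stripe = [("stripe_1", "wwn_1", "key_1"), ("stripe_1", "wwn_2", "key_2"),
          ("stripe_1", "wwn_3", "key_3")]

# Current disk-to-node mapping and the data held by the cluster.
disk_to_node = {"wwn_1": "OSD_1", "wwn_2": "OSD_2", "wwn_3": "OSD_3"}
cluster = {("wwn_1", "key_1"): b"he",
           ("wwn_2", "key_2"): b"ll",
           ("wwn_3", "key_3"): b"o!"}

# Platform server: pad the stripe with the node that currently owns each disk.
padded = [(sid, disk_to_node[wwn], wwn, key) for sid, wwn, key in stripe]

# Management server: read <key, value> from each node/disk, combine in key order.
pairs = [(key, cluster[(wwn, key)]) for _, _, wwn, key in padded]
data = b"".join(value for key, value in sorted(pairs))
print(data)  # b'hello!'
```

  • Combining in key order assumes the keys encode the segment order within the stripe; any other ordering convention carried by the stripe metadata would work the same way.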
  • The platform server determines, in the recorded index information, the index information to be distributed;
  • the management server determines, according to the determined index information to be distributed, the data segments to be distributed, allocates distributed storage objects for the data segments to be distributed, and stores the data segments to be distributed separately to the distributed storage objects;
  • the platform server updates the index information to be distributed.
  • The platform server may determine the number of records of each storage object corresponding to each piece of index information, where the number of records of one storage object in one piece of index information is: the number of data segments in the index information stored in that storage object; when there is a target storage object whose number of records exceeds the third preset threshold, the index information is determined as the index information to be distributed.
  • The management server may determine the data segments to be distributed among the data segments corresponding to the target storage object.
  • the data processing system may further include an audit server.
  • The platform server determines, in the recorded index information, the index information to be distributed;
  • the audit server determines, according to the determined index information to be distributed, the data segments to be distributed, allocates distributed storage objects for the data segments to be distributed, and stores the data segments to be distributed separately to the distributed storage objects;
  • the platform server updates the index information to be distributed.
  • the platform server may determine the number of records of each storage object corresponding to each index information; and when there is a target storage object whose recording times exceed the third preset threshold, the index information is determined to be an index to be distributed. information.
  • the audit server may determine the data segment to be distributed in the data segment corresponding to the target storage object.
  • The platform server scans the recorded stripes (that is, the index information corresponding to each data segment) and determines the number of records of each storage object corresponding to each stripe; when there is a target storage object whose number of records exceeds the third preset threshold,
  • the stripe is determined as the stripe to be dispersed.
  • As described above, the same disk may belong to different storage nodes at different times,
  • and the stripe information recorded by the platform server does not include the identification information of the storage nodes. Therefore, the platform server needs to obtain the identification information of the corresponding storage node according to the identification information of each disk,
  • fill the identification information of the obtained storage nodes into the stripe, and the padded stripe is:
  • The platform server sends the padded stripe (that is, the index information to be distributed) to the audit server.
  • The audit server receives the stripe.
  • The audit server analyzes the stripe and determines the data segments to be dispersed. For example:
  • <key_1, key_2> are stored on one disk wwn_1,
  • and <key_1, key_2>, <key_3, key_4> are stored on one storage node OSD_1;
  • then key_1 and key_3 can be determined as the data segments to be distributed. That is, when multiple data segments of the same data are stored on the same disk or the same storage node, the data segments to be distributed may be determined among those segments.
  • The audit server sends <wwn_1, key_1> and <wwn_3, key_3> to the corresponding storage nodes to read the data.
  • The audit server applies to the platform server for distributed storage nodes and disks <OSD_x, wwn_x>, <OSD_y, wwn_y> (that is, allocates distributed storage objects for the data segments to be distributed).
  • The audit server writes the read data <key_1, value_1> and <key_3, value_3> to the new storage nodes and disks <OSD_x, wwn_x> and <OSD_y, wwn_y>; the dispersion of the data segments is completed after the writing.
  • The audit server notifies the platform server that the dispersion operation has been completed, and the platform server modifies the stripe to: {<stripe_id, wwn_x, key_1>, <stripe_id, wwn_1, key_2>, <stripe_id, wwn_y, key_3>, <stripe_id, wwn_4, key_4>, <stripe_id, wwn_5, key_5>}.
  • the foregoing storage node may also be a storage server, or other components, which are not specifically limited.
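  • The audit check above can be sketched as follows. Which of several co-located keys is moved is a free choice (the text picks key_1 and key_3; this sketch keeps the first key of each group), and the stripe layout is illustrative:

```python
from collections import defaultdict

# Padded stripe: (OSD, wwn, key) per data segment, with keys key_1..key_4
# crowded onto node OSD_1 across two disks.
stripe = [("OSD_1", "wwn_1", "key_1"), ("OSD_1", "wwn_1", "key_2"),
          ("OSD_1", "wwn_2", "key_3"), ("OSD_1", "wwn_2", "key_4"),
          ("OSD_5", "wwn_5", "key_5")]

def to_disperse(stripe):
    """Flag keys that share a disk or a storage node with another key."""
    by_disk, by_node = defaultdict(list), defaultdict(list)
    for osd, wwn, key in stripe:
        by_disk[wwn].append(key)
        by_node[osd].append(key)
    moves = set()
    for keys in by_disk.values():  # several keys of one stripe on one disk
        moves.update(keys[1:])     # keep the first key in place, move the rest
    for keys in by_node.values():  # several keys of one stripe on one node
        moves.update(keys[1:])
    return sorted(moves)

print(to_disperse(stripe))  # ['key_2', 'key_3', 'key_4']
```

  • The flagged keys are then read with <wwn, key>, rewritten to newly allocated node/disk pairs, and the platform server's stripe record is updated, as described above.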
  • With this system, unevenly stored data can be dispersed so that the data segments are stored more uniformly across the storage objects. Since the storage object is the smallest unit for storing data, storing the data segments across multiple storage objects prevents the failure of a single storage object from losing all segments of the same data, which would make data recovery impossible.
  • The platform server determines, in the recorded index information, the index information to be reconstructed, where the index information to be reconstructed includes information of a failed storage object;
  • the management server reads the target data segments from the non-failed storage objects according to the information of the non-failed storage objects included in the index information to be reconstructed, where the target data segments are: the data segments of the data to be reconstructed corresponding to the index information to be reconstructed; reconstructs the target data segments to obtain repair segments; and stores the repair segments into storage objects allocated thereto;
  • the platform server updates the index information to be reconstructed.
  • the data processing system may further include an audit server.
  • a platform server in the recorded index information, determining index information to be reconstructed; wherein the index information to be reconstructed includes information of a storage object that is faulty;
  • the audit server reads the target data segment from the non-failed storage object according to the information of the non-faulty storage object included in the information to be reconstructed, where the target data segment is: the to-be-reconstructed Corresponding to each data segment of the data to be reconstructed; reconstructing the target data segment to obtain a repair segment; storing the repair segment into a storage object allocated thereto;
  • the platform server updates the index information to be reconstructed.
  • the platform server obtains the information of the failed storage node or disk; scans the recorded stripes (that is, the index information corresponding to the data segments); determines the stripes containing the information of the failed storage node or disk as stripes to be reconstructed (that is, determines the index information to be reconstructed); and designates an audit server to perform the reconstruction operation.
  • the same disk may belong to different storage nodes at different times.
  • the stripe information recorded by the platform server does not include the identification information of the storage nodes; therefore, the platform server needs to obtain the identification information of the corresponding storage nodes according to the identification information of the disks.
  • the obtained identification information of the storage nodes is filled into the stripe, and the padded stripe is:
  • the platform server sends the padded stripe and the data segment that needs to be repaired, <stripe_id, OSD_1, wwn_1, key_1> (that is, the index information to be reconstructed), to the audit server.
  • the audit server classifies the information in the stripe:
  • the audit server sends <wwn_2, key_2>, <wwn_3, key_3>, <wwn_4, key_4>, <wwn_5, key_5> to the corresponding storage nodes to read the data.
  • the audit server requests a new storage node and disk <OSD_z, wwn_z> from the platform server.
  • the audit server writes the recovered data segment value_1 to the new storage node and disk <OSD_z, wwn_z>.
  • after the reconstruction operation is completed, the audit server notifies the platform server to update the stripe; at this time, the platform server updates the stripe to:
  • the foregoing storage node may also be a storage server or another component; this is not specifically limited.
  • the platform server determines, among the recorded index information, index information to be recycled;
  • the management server determines, according to the index information to be recycled, data segments to be recycled;
  • the platform server allocates storage objects for the data segments to be recycled;
  • the management server stores the data segments to be recycled into the storage objects allocated for them respectively;
  • the platform server records a correspondence between each data segment to be recycled and the storage object storing that data segment, and generates, according to the recorded correspondences, new index information to replace the index information to be recycled;
  • the data processing system may further include an audit server.
  • the platform server determines, among the recorded index information, index information to be recycled;
  • the audit server determines, according to the index information to be recycled, data segments to be recycled;
  • the platform server allocates storage objects for the data segments to be recycled;
  • the audit server stores the data segments to be recycled into the storage objects allocated for them respectively;
  • the platform server records a correspondence between each data segment to be recycled and the storage object storing that data segment, and generates, according to the recorded correspondences, new index information to replace the index information to be recycled.
  • the platform server may judge, for each piece of recorded index information, whether the number of invalid data segments it contains is greater than a fourth preset threshold, and if so, determine it as index information to be recycled, where an invalid data segment is a data segment corresponding to a failed storage object.
  • the management server may determine the data segments to be recycled according to the valid data segments in the index information to be recycled, where a valid data segment is a data segment other than the invalid data segments.
  • the platform server scans the recorded stripes (that is, the index information corresponding to the data segments) and determines the stripes to be recycled:
  • the number of invalid data segments (represented by NULL) contained in each of the following five stripes is 4, so the five stripes are determined as stripes to be recycled:
  • the same disk may belong to different storage nodes at different times.
  • the stripe information recorded by the platform server does not include the identification information of the storage nodes; therefore, the platform server needs to obtain the identification information of the corresponding storage nodes according to the identification information of the disks.
  • the obtained identification information of the storage nodes is filled into the stripes, and the padded stripes are:
  • the platform server sends the padded stripes to the audit server.
  • the audit server receives the above five stripes to be recycled.
  • the audit server determines the data segments to be recycled according to the five received stripes; that is, the data segments in the stripes other than the invalid data segments (represented by NULL) are determined as the data segments to be recycled.
  • the audit server sends the valid data segment <wwn_11, key_11> in stripe_id_1 to the storage node OSD_11 to read the data value11; sends the valid data segment <wwn_21, key_21> in stripe_id_2 to the storage node OSD_21 to read the data value21; sends the valid data segment <wwn_31, key_31> in stripe_id_3 to the storage node OSD_31 to read the data value31; sends the valid data segment <wwn_41, key_41> in stripe_id_4 to the storage node OSD_41 to read the data value41; and sends the valid data segment <wwn_51, key_51> in stripe_id_5 to the storage node OSD_51 to read the data value51.
  • the audit server applies to the platform server for an idle stripe; assume the allocated stripe is:
  • the audit server organizes the read data according to the new stripe; after the organization is completed, the new stripe is:
  • the audit server sends the data segments to their corresponding storage nodes according to the above new stripe.
  • the audit server may send a triple <wwn, key, value> to a storage node; the storage node stores the data <key, value> to the corresponding disk according to the triple, and after the storage is completed, returns a storage-success message to the audit server.
  • after the recycling is completed, the audit server notifies the platform server and sends the above new stripe to the platform server for storage; that is, the platform server records the correspondence between each data segment to be recycled and the storage object storing that data segment, and generates, according to the recorded correspondences, new index information to replace the index information to be recycled.
  • the same disk may belong to different storage nodes at different times; therefore, in this case, the stripes recorded by the platform server may not include the identification information of the storage nodes, that is to say, the recorded stripes can be:
  • the platform server deletes the above five stripes to be recycled.
  • the platform server may execute this solution every preset period, recycling the valid segments in the stripes in a timely manner to save stripe resources.
  • the foregoing storage node may also be a storage server or another component; this is not specifically limited.
  • the data processing system may further include a storage server, as shown in FIG.
  • the storage server may include the foregoing storage node, or the storage server may itself be the foregoing storage node.
  • the storage server reports the running status information of its multiple storage objects to the platform server, so that the platform server allocates storage objects for data to be stored according to the running status information reported by each storage server, allocates dispersion storage objects for the data segments to be dispersed, and determines the index information to be reconstructed.
  • an embodiment of the present application further provides an electronic device, as shown in FIG. 12, including a processor 1201 and a memory 1202, where the memory 1202 is configured to store executable program code, and the processor 1201 runs a program corresponding to the executable program code by reading the executable program code stored in the memory 1202, so as to perform any of the above data storage, dispersion, reconstruction, and recycling methods.
  • an embodiment of the present application further provides executable program code, the executable program code being run to perform any of the above data storage, dispersion, reconstruction, and recycling methods.
  • an embodiment of the present application further provides a computer-readable storage medium for storing executable program code, the executable program code being run to perform any of the above data storage, dispersion, reconstruction, and recycling methods.
  • since the above executable program code embodiment and computer-readable storage medium embodiment are substantially similar to the data storage, dispersion, reconstruction, and recycling method embodiments, their description is relatively brief; for related parts, refer to the description of the data storage, dispersion, reconstruction, and recycling method embodiments shown in FIG.


Abstract

A data storage, dispersion, reconstruction and recycling method, apparatus, and data processing system. Applying the storage method, each data segment of the data to be stored is stored into a respective storage object, a storage object being the smallest unit for storing data, and the difference between the numbers of data segments stored in the individual storage objects is less than a first preset threshold. The method stores the data segments relatively evenly across multiple storage objects, which avoids the situation where one storage object fails, multiple data segments of the same data are lost, and the data can no longer be recovered.

Description

Data storage, dispersion, reconstruction and recycling methods, apparatuses, and data processing system
This application claims priority to Chinese Patent Application No. 201710162001.X, filed with the Chinese Patent Office on March 17, 2017 and entitled "Data storage, dispersion, reconstruction and recycling methods, apparatuses, and data processing system", the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the field of data processing technologies, and in particular to data storage, dispersion, reconstruction and recycling methods, apparatuses, and a data processing system.
Background
RAID (Redundant Arrays of Independent Disks) technology refers to combining multiple disks into a large-capacity disk group, splitting the data to be stored into multiple segments, and storing them on the respective disks.
Erasure coding (EC) is typically used in a disk array for data protection. An erasure coding policy can be expressed as n = k + m, where k is the number of original segments, m is the number of redundant segments, and n is the total number of data segments. Specifically, the data to be stored can be split into k original segments; the k original segments are extended and encoded to obtain m redundant segments; the k original segments and m redundant segments are then stored on respective disks. In this way, if a disk fails and data is lost, the lost data segments can be recovered from the original segments and redundant segments that were not lost.
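As an illustration of the k+m split described above, here is a minimal sketch in Python using a single XOR parity segment (that is, m = 1); real erasure codes such as Reed-Solomon tolerate m > 1 failures, and the function names below are illustrative only:

```python
def ec_encode(data, k):
    """Split data into k original segments plus one XOR redundancy
    segment: a minimal k+1 erasure code (m = 1)."""
    seg_len = -(-len(data) // k)               # ceiling division
    padded = data.ljust(seg_len * k, b"\x00")  # pad so segments align
    segments = [padded[i * seg_len:(i + 1) * seg_len] for i in range(k)]
    parity = segments[0]
    for seg in segments[1:]:
        parity = bytes(a ^ b for a, b in zip(parity, seg))
    return segments + [parity]                 # k + m segments in total

def ec_recover(segments):
    """Rebuild the single missing segment (marked None) by XORing
    the surviving segments together."""
    missing = segments.index(None)
    length = len(next(s for s in segments if s is not None))
    acc = bytes(length)                        # all-zero accumulator
    for i, seg in enumerate(segments):
        if i != missing:
            acc = bytes(a ^ b for a, b in zip(acc, seg))
    repaired = list(segments)
    repaired[missing] = acc
    return repaired
```

With k = 4 this yields the 4+1 layout used in the examples later in the document: any one lost segment can be rebuilt from the remaining four.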
In the above scheme, the data segments (original and redundant) are usually stored randomly on the disks, so multiple data segments may end up on one disk. If that disk fails, all the data segments stored on it may be lost, making data recovery impossible.
Summary
The purpose of the embodiments of this application is to provide data storage, dispersion, reconstruction and recycling methods, apparatuses, and a data processing system, so as to avoid the situation where one disk fails, multiple data segments are all lost, and data recovery becomes impossible.
To achieve the above purpose, an embodiment of this application discloses a data storage method, including:
allocating x storage objects for data to be stored according to a preset erasure coding policy k+m, where k is the number of original segments, m is the number of redundant segments, x is greater than 1 and not greater than k+m, and a storage object is the smallest unit for storing data;
slicing the data to be stored and performing redundancy processing on it using the erasure coding policy k+m, to obtain k+m data segments;
storing the k+m data segments into the x storage objects respectively, where the difference between the numbers of data segments stored in the individual storage objects is less than a first preset threshold;
recording index information corresponding to the data to be stored, the index information including a correspondence between each data segment and the storage object storing that data segment.
To achieve the above purpose, an embodiment of this application further discloses a data dispersion method, including:
determining, among the recorded index information, index information to be dispersed, where the index information includes a correspondence between each data segment and the storage object storing that data segment, and a storage object is the smallest unit for storing data;
determining, according to the determined index information to be dispersed, data segments to be dispersed;
allocating dispersion storage objects for the data segments to be dispersed;
storing the data segments to be dispersed into the dispersion storage objects respectively;
updating the index information to be dispersed.
To achieve the above purpose, an embodiment of this application further discloses a data reconstruction method, including:
determining, among the recorded index information, index information to be reconstructed, where the index information includes a correspondence between each data segment and the storage object storing that data segment, the index information to be reconstructed includes information of a failed storage object, and a storage object is the smallest unit for storing data;
reading target data segments from the non-failed storage objects according to the information of the non-failed storage objects included in the index information to be reconstructed, where the target data segments are the data segments of the data to be reconstructed corresponding to the index information to be reconstructed;
reconstructing the target data segments to obtain repair segments;
storing the repair segments into the storage objects allocated for them;
updating the index information to be reconstructed.
To achieve the above purpose, an embodiment of this application further discloses a data recycling method, including:
determining, among the recorded index information, index information to be recycled, where the index information includes a correspondence between each data segment and the storage object storing that data segment, and a storage object is the smallest unit for storing data;
determining, according to the index information to be recycled, data segments to be recycled;
allocating storage objects for the data segments to be recycled;
storing the data segments to be recycled into the storage objects allocated for them respectively;
recording a correspondence between each data segment to be recycled and the storage object storing that data segment;
generating, according to the recorded correspondences, new index information to replace the index information to be recycled.
To achieve the above purpose, an embodiment of this application further discloses a data storage apparatus, including:
a first allocation module, configured to allocate x storage objects for data to be stored according to a preset erasure coding policy k+m, where k is the number of original segments, m is the number of redundant segments, x is greater than 1 and not greater than k+m, and a storage object is the smallest unit for storing data;
a slicing module, configured to slice the data to be stored and perform redundancy processing on it using the erasure coding policy k+m, to obtain k+m data segments;
a first storage module, configured to store the k+m data segments into the x storage objects respectively, where the difference between the numbers of data segments stored in the individual storage objects is less than a first preset threshold;
a first recording module, configured to record index information corresponding to the data to be stored, the index information including a correspondence between each data segment and the storage object storing that data segment.
To achieve the above purpose, an embodiment of this application further discloses a data dispersion apparatus, including:
a first determining module, configured to determine, among the recorded index information, index information to be dispersed, where the index information includes a correspondence between each data segment and the storage object storing that data segment, and a storage object is the smallest unit for storing data;
a second determining module, configured to determine, according to the determined index information to be dispersed, data segments to be dispersed;
a second allocation module, configured to allocate dispersion storage objects for the data segments to be dispersed;
a second storage module, configured to store the data segments to be dispersed into the dispersion storage objects respectively;
a first updating module, configured to update the index information to be dispersed.
To achieve the above purpose, an embodiment of this application further discloses a data reconstruction apparatus, including:
a third determining module, configured to determine, among the recorded index information, index information to be reconstructed, where the index information includes a correspondence between each data segment and the storage object storing that data segment, the index information to be reconstructed includes information of a failed storage object, and a storage object is the smallest unit for storing data;
a reading module, configured to read target data segments from the non-failed storage objects according to the information of the non-failed storage objects included in the index information to be reconstructed, where the target data segments are the data segments of the data to be reconstructed corresponding to the index information to be reconstructed;
a reconstruction module, configured to reconstruct the target data segments to obtain repair segments;
a third storage module, configured to store the repair segments into the storage objects allocated for them;
a second updating module, configured to update the index information to be reconstructed.
To achieve the above purpose, an embodiment of this application further discloses a data recycling apparatus, including:
a fourth determining module, configured to determine, among the recorded index information, index information to be recycled, where the index information includes a correspondence between each data segment and the storage object storing that data segment, and a storage object is the smallest unit for storing data;
a fifth determining module, configured to determine, according to the index information to be recycled, data segments to be recycled;
a third allocation module, configured to allocate storage objects for the data segments to be recycled;
a fourth storage module, configured to store the data segments to be recycled into the storage objects allocated for them respectively;
a second recording module, configured to record a correspondence between each data segment to be recycled and the storage object storing that data segment;
a replacement module, configured to replace the index information to be recycled with the correspondences.
To achieve the above purpose, an embodiment of this application further discloses a data processing system, including a platform server and a management server, where:
the platform server allocates x storage objects for data to be stored according to a preset erasure coding policy k+m, where k is the number of original segments, m is the number of redundant segments, x is greater than 1 and not greater than k+m, and a storage object is the smallest unit for storing data;
the management server slices the data to be stored and performs redundancy processing on it using the erasure coding policy k+m to obtain k+m data segments, and stores the k+m data segments into the x storage objects respectively, where the difference between the numbers of data segments stored in the individual storage objects is less than a fourth preset threshold;
the platform server records index information corresponding to the data to be stored, the index information including a correspondence between each data segment and the storage object storing that data segment.
Optionally, the platform server determines index information corresponding to data to be read;
the management server reads each data segment of the data to be read from the storage objects according to the index information determined by the platform server, and combines the read data segments to obtain the data to be read.
Optionally, the platform server determines, among the recorded index information, index information to be dispersed;
the management server determines, according to the determined index information to be dispersed, data segments to be dispersed; allocates dispersion storage objects for the data segments to be dispersed; and stores them into the dispersion storage objects respectively;
the platform server updates the index information to be dispersed;
alternatively, the system further includes an audit server,
the platform server determines, among the recorded index information, index information to be dispersed;
the audit server determines, according to the determined index information to be dispersed, data segments to be dispersed; allocates dispersion storage objects for the data segments to be dispersed; and stores them into the dispersion storage objects respectively;
the platform server updates the index information to be dispersed.
Optionally, the platform server determines, among the recorded index information, index information to be reconstructed, where the index information to be reconstructed includes information of a failed storage object;
the management server reads target data segments from the non-failed storage objects according to the information of the non-failed storage objects included in the index information to be reconstructed, where the target data segments are the data segments of the data to be reconstructed corresponding to the index information to be reconstructed; reconstructs the target data segments to obtain repair segments; and stores the repair segments into the storage objects allocated for them;
the platform server updates the index information to be reconstructed;
alternatively, the system further includes an audit server,
the platform server determines, among the recorded index information, index information to be reconstructed, where the index information to be reconstructed includes information of a failed storage object;
the audit server reads target data segments from the non-failed storage objects according to the information of the non-failed storage objects included in the index information to be reconstructed, where the target data segments are the data segments of the data to be reconstructed corresponding to the index information to be reconstructed; reconstructs the target data segments to obtain repair segments; and stores the repair segments into the storage objects allocated for them;
the platform server updates the index information to be reconstructed.
Optionally, the platform server determines, among the recorded index information, index information to be recycled;
the management server determines, according to the index information to be recycled, data segments to be recycled;
the platform server allocates storage objects for the data segments to be recycled;
the management server stores the data segments to be recycled into the storage objects allocated for them respectively;
the platform server records a correspondence between each data segment to be recycled and the storage object storing that data segment, and replaces the index information to be recycled with the correspondences;
alternatively, the system further includes an audit server,
the platform server determines, among the recorded index information, index information to be recycled;
the audit server determines, according to the index information to be recycled, data segments to be recycled;
the platform server allocates storage objects for the data segments to be recycled;
the audit server stores the data segments to be recycled into the storage objects allocated for them respectively;
the platform server records a correspondence between each data segment to be recycled and the storage object storing that data segment, and replaces the index information to be recycled with the correspondences.
Optionally, the system may further include a storage server, the storage server including multiple storage objects;
the storage server reports the running status information of its multiple storage objects to the platform server, so that the platform server allocates storage objects for data to be stored, allocates dispersion storage objects for data segments to be dispersed, and determines index information to be reconstructed according to the running status information reported by each storage server.
To achieve the above purpose, an embodiment of this application further discloses an electronic device, including a processor and a memory, where the memory is configured to store executable program code, and the processor runs a program corresponding to the executable program code by reading the executable program code stored in the memory, so as to perform any of the above data storage, dispersion, reconstruction, and recycling methods.
To achieve the above purpose, an embodiment of this application further discloses executable program code, the executable program code being run to perform any of the above data storage, dispersion, reconstruction, and recycling methods.
To achieve the above purpose, an embodiment of this application further discloses a computer-readable storage medium for storing executable program code, the executable program code being run to perform any of the above data storage, dispersion, reconstruction, and recycling methods.
Applying the embodiments of this application, each data segment of the data to be stored is stored into a respective storage object, a storage object being the smallest unit for storing data, and the difference between the numbers of data segments stored in the individual storage objects is less than the first preset threshold. That is, in this solution the data segments are stored relatively evenly across multiple storage objects, which avoids the situation where one storage object fails, multiple data segments of the same data are all lost, and the data can no longer be recovered.
Of course, implementing any product or method of this application does not necessarily require achieving all of the above advantages at the same time.
Brief Description of the Drawings
To describe the technical solutions of the embodiments of this application and of the prior art more clearly, the drawings required by the embodiments and the prior art are briefly introduced below. Apparently, the drawings described below are merely some embodiments of this application, and a person of ordinary skill in the art may derive other drawings from these drawings without creative efforts.
FIG. 1 is a schematic flowchart of a data storage method according to an embodiment of this application;
FIG. 2 is a schematic flowchart of a data dispersion method according to an embodiment of this application;
FIG. 3 is a schematic flowchart of a data reconstruction method according to an embodiment of this application;
FIG. 4 is a schematic flowchart of a data recycling method according to an embodiment of this application;
FIG. 5 is a schematic structural diagram of a data storage apparatus according to an embodiment of this application;
FIG. 6 is a schematic structural diagram of a data dispersion apparatus according to an embodiment of this application;
FIG. 7 is a schematic structural diagram of a data reconstruction apparatus according to an embodiment of this application;
FIG. 8 is a schematic structural diagram of a data recycling apparatus according to an embodiment of this application;
FIG. 9 is a first schematic structural diagram of a data processing system according to an embodiment of this application;
FIG. 10 is a second schematic structural diagram of a data processing system according to an embodiment of this application;
FIG. 11 is a third schematic structural diagram of a data processing system according to an embodiment of this application;
FIG. 12 is a schematic structural diagram of an electronic device according to an embodiment of this application.
Detailed Description
To make the objectives, technical solutions, and advantages of this application clearer, this application is further described in detail below with reference to the accompanying drawings and embodiments. Apparently, the described embodiments are merely some rather than all of the embodiments of this application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of this application without creative efforts shall fall within the protection scope of this application.
To solve the foregoing technical problem, embodiments of this application provide data storage, dispersion, reconstruction and recycling methods, apparatuses, and a data processing system. The methods and apparatuses may be applied to servers, clients, and various electronic devices, which are not specifically limited. The data storage method provided by the embodiments of this application is first described in detail below.
FIG. 1 is a schematic flowchart of a data storage method according to an embodiment of this application, including:
S101: Allocate x storage objects for data to be stored according to a preset erasure coding policy k+m.
Here, k is the number of original segments, m is the number of redundant segments, and x is greater than 1 and not greater than k+m.
A storage object is the smallest unit for storing data, and can be understood as the smallest unit for storing data segments (including original segments and redundant segments). If that smallest unit is a disk, the storage object is a disk; if a disk can be divided into smaller object blocks, the storage object is an object block; this is not specifically limited. The following description takes the case where the storage object is a disk as an example.
As an implementation, S101 may include:
judging whether the number of available storage nodes is not less than k+m;
if it is not less, determining x storage objects among the available storage nodes, where each determined storage object is located on a different storage node, and x equals k+m;
if it is less, judging whether the number of available storage objects on all available storage nodes is not less than k+m;
if so, determining x storage objects among all the available storage nodes, where x equals k+m and the difference between the numbers of storage objects determined on the individual available storage nodes is less than a second preset threshold;
if not, allocating all the available storage objects to the data to be stored.
That is to say, when allocating storage objects for the data to be stored, if there are enough available storage nodes (no fewer than k+m), k+m available storage nodes are allocated to store the data, so that each data segment of the data is stored on a different storage node.
If there are not enough available storage nodes (fewer than k+m) but there are enough available storage objects, that is, the number of available disks is not less than k+m, then k+m available disks are allocated to store the data, so that each data segment of the data is stored on a different disk. It should be noted that when k+m available disks are allocated among all the available storage nodes, the difference between the numbers of storage objects determined on the individual available storage nodes is less than the second preset threshold; that is, the data segments are guaranteed to be stored evenly across the storage nodes.
If there are not enough available disks either (fewer than k+m), all the available disks (x available disks) are allocated to store the data, so that the data segments of the data are stored on disks that are as different as possible.
In summary, the data to be stored is stored in as dispersed a manner as possible, so that if a disk or a storage node fails and data is lost, the lost data segments can still be recovered using the data segments stored on the other disks or storage nodes.
As an implementation, storage objects may also be allocated to the data to be stored according to the resource usage of the storage objects. Specifically, storage objects with lower resource usage may be allocated preferentially.
There may be many specific ways of allocating storage objects, which are not limited.
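One possible allocation strategy following the rules above (prefer distinct nodes, then distinct disks spread evenly over nodes, then whatever disks remain) can be sketched as follows; the node/disk layout is a hypothetical example, not the patent's implementation:

```python
def allocate_objects(nodes, n):
    """nodes: dict of node id -> list of available disk ids.
    Returns up to n (node, disk) pairs for one piece of data."""
    if len(nodes) >= n:
        # Enough nodes: one disk on each of n different nodes.
        return [(nid, disks[0]) for nid, disks in list(nodes.items())[:n]]
    total = sum(len(d) for d in nodes.values())
    if total >= n:
        # Enough disks: take disks round-robin across nodes so the
        # per-node counts stay as even as possible.
        remaining = {nid: list(d) for nid, d in nodes.items()}
        picked = []
        while len(picked) < n:
            for nid, disks in remaining.items():
                if disks and len(picked) < n:
                    picked.append((nid, disks.pop(0)))
        return picked
    # Too few disks: use every available disk.
    return [(nid, d) for nid, disks in nodes.items() for d in disks]
```

For example, with three nodes holding 2, 2 and 1 disks, asking for 5 objects yields per-node counts of 2, 2 and 1, whose pairwise differences stay within 1.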
S102: Slice the data to be stored and perform redundancy processing on it using the erasure coding policy k+m, to obtain k+m data segments.
Specifically, the data to be stored may be split into k original segments, and redundancy processing may be performed on the k original segments to obtain m redundant segments.
S103: Store the k+m data segments into the x storage objects respectively, where the difference between the numbers of data segments stored in the individual storage objects is less than a first preset threshold.
The first preset threshold may be set according to the actual situation; here it is assumed to be 2.
In S101, storage objects have already been allocated for the data to be stored; at this point, the data segments can be stored into the allocated storage objects.
According to the above description, if k+m storage nodes have been allocated, each data segment can be stored into a storage object on a different storage node. In this way, the number of data segments stored in each storage object is 1, the difference is 0, which is less than the first preset threshold.
If k+m storage objects (disks) have been allocated, each data segment can be stored on a different disk. In this way, the number of data segments stored in each storage object is 1, the difference is 0, which is less than the first preset threshold.
If x (fewer than k+m) disks have been allocated, the k+m data segments can be evenly divided into x shares, and the x shares of data segments are stored into the x storage objects respectively. Even division can be understood as follows: if the k+m data segments can be divided into x equal shares, they are divided into x equal shares; if not, they are divided into x shares as equally as possible.
For example, if k is 4, m is 2 and x is 3, the 6 data segments are divided into 3 shares, which can be done equally, so each share includes 2 data segments. Every 2 data segments are stored into 1 storage object; in this way, the number of data segments stored in each storage object is 2, the difference is 0, which is less than the first preset threshold.
If k is 4, m is 1 and x is 3, the 5 data segments are divided into 3 shares, which cannot be done equally; the first share may include 2 data segments, the second 2, and the third 1. In this way, the numbers of data segments stored in the individual storage objects are 2, 2 and 1, with differences of 0 or 1, still less than the first preset threshold.
In summary, the k+m data segments are stored as evenly as possible across the disks. In this way, if a disk or a storage node fails and data is lost, the lost data segments can still be recovered using the data segments stored on the other disks or storage nodes.
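The "as even as possible" division of the k+m segments into x shares (the 4+2/x=3 and 4+1/x=3 cases above) amounts to a simple remainder split, which can be sketched as:

```python
def partition_evenly(segments, x):
    """Divide the k+m data segments into x shares whose sizes
    differ by at most one (the first shares absorb the remainder)."""
    base, extra = divmod(len(segments), x)
    shares, start = [], 0
    for i in range(x):
        size = base + (1 if i < extra else 0)  # first `extra` shares get one more
        shares.append(segments[start:start + size])
        start += size
    return shares
```

With 6 segments and x = 3 this gives shares of size 2, 2, 2; with 5 segments and x = 3 it gives 2, 2, 1, matching the examples in the text.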
S104: Record index information corresponding to the data to be stored, the index information including a correspondence between each data segment and the storage object storing that data segment.
Suppose the preset erasure coding policy is 4+2: the data to be stored is split into 4 original segments, and redundancy processing is performed on the 4 original segments to obtain 2 redundant segments; the 6 data segments (4 original and 2 redundant) are A1, A2, A3, A4, A5 and A6. Suppose 6 disks, B1, B2, B3, B4, B5 and B6, are allocated as the storage objects for the data.
Which data segment is stored on which disk is not limited; here, suppose A1 is stored on B1, A2 on B2, A3 on B3, A4 on B4, A5 on B5, and A6 on B6. In this case, the recorded index information corresponding to the data may include: A1—B1, A2—B2, A3—B3, A4—B4, A5—B5, A6—B6.
It should be noted that the index information may also distinguish between original segments and redundant segments; the above index information is only for simplicity of description and is not specifically limited.
Applying the embodiment shown in FIG. 1 of this application, each data segment of the data to be stored is stored into a respective storage object, a storage object being the smallest unit for storing data, and the difference between the numbers of data segments stored in the individual storage objects is less than the first preset threshold. That is, in this solution the data segments are stored relatively evenly across multiple storage objects, which avoids the situation where one storage object fails, multiple data segments of the same data are all lost, and the data can no longer be recovered.
FIG. 2 is a schematic flowchart of a data dispersion method according to an embodiment of this application, including:
S201: Determine, among the recorded index information, index information to be dispersed.
It should be noted that the data in the embodiment shown in FIG. 2 may be stored using the data storage method provided by the embodiment shown in FIG. 1, or using another storage method. As described in the embodiment shown in FIG. 1, index information of the data is recorded during the data storage process; therefore, the index information to be dispersed can be determined among the recorded index information.
Alternatively, other storage methods also need to record index information, the index information including a correspondence between each data segment and the storage object storing that data segment, a storage object being the smallest unit for storing data.
Specifically, for each piece of index information, the record count of each of its corresponding storage objects may be determined; when there is a target storage object whose record count exceeds a third preset threshold, that piece of index information is determined as index information to be dispersed. Here, the record count of a storage object in a piece of index information is the number of data segments of that piece of index information stored in that storage object.
For example, suppose the index information includes A1—B1, A2—B1, A3—B1, A4—B4, A5—B4, A6—B6, where the number of data segments of this index information stored in storage object B1 (A1, A2 and A3) is 3, so storage object B1 is recorded 3 times; that is, the record count of B1 is 3, the record count of B4 is 2, and the record count of B6 is 1.
Suppose the third preset threshold is 2; then there is a target storage object B1 whose record count is greater than 2. That is, 3 data segments of the data corresponding to this index information are stored in the same storage object. In this case, this index information is determined as index information to be dispersed.
It can be understood that various situations may lead to uneven data storage, causing multiple data segments of the same data to be stored in one storage object. For example, when the system is expanded and new storage nodes or disks join the system, data storage may become uneven; or, when the number of storage nodes is insufficient, data storage may also become uneven.
This solution can be executed to disperse the unevenly stored data. If multiple data segments of the same data are stored in one storage object, those data segments can be dispersed. Specifically, whether the data has data segments that need dispersing can be judged from its corresponding index information; that is, when there is a target storage object whose record count exceeds the third preset threshold, that index information is determined as index information to be dispersed.
S202: Determine, according to the determined index information to be dispersed, data segments to be dispersed.
Specifically, the data segments to be dispersed may be determined among the data segments corresponding to the target storage object.
Continuing the above example, the third preset threshold is 2, and there is a target storage object B1 whose record count is greater than 2; the data segments to be dispersed are determined among the data segments A1, A2 and A3 corresponding to B1. Suppose A2 and A3 are determined as the data segments to be dispersed.
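Steps S201 and S202 (count how often each storage object appears in one piece of index information, then pick segments off any overloaded object) can be sketched as follows; which segments are moved off the overloaded object is a free choice, here all but the first:

```python
from collections import defaultdict

def find_segments_to_disperse(index, threshold):
    """index: list of (segment, storage_object) pairs for one piece
    of data. Flags segments stored on any object whose record count
    exceeds `threshold`, keeping one segment in place per object."""
    per_object = defaultdict(list)
    for segment, obj in index:
        per_object[obj].append(segment)
    to_disperse = []
    for obj, segments in per_object.items():
        if len(segments) > threshold:      # target storage object
            to_disperse.extend(segments[1:])
    return to_disperse
```

On the example above (A1, A2, A3 all on B1, threshold 2) this flags A2 and A3, the same choice made in the text.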
S203: Allocate dispersion storage objects for the data segments to be dispersed.
It can be understood that when multiple data segments of the same data are stored in the same storage object, those data segments can be dispersed into other storage objects. Therefore, new storage objects need to be allocated; here the new storage objects are called dispersion storage objects.
The principle for allocating dispersion storage objects may be the same as the principle for allocating x storage objects for data to be stored in the embodiment shown in FIG. 1:
if there are enough available storage nodes, one available storage node is allocated for each data segment to be dispersed;
if there are not enough available storage nodes but there are enough available disks, one available disk is allocated for each data segment to be dispersed;
if there are not enough available disks either, all the available disks are allocated to store these data segments to be dispersed.
In summary, the data segments to be dispersed are stored in as dispersed a manner as possible, so that if a disk or a storage node fails and data is lost, the lost data segments can still be recovered using the data segments stored on the other disks or storage nodes.
As an implementation, dispersion storage objects may also be allocated according to the resource usage of the storage objects. Specifically, storage objects with lower resource usage may be allocated preferentially as dispersion storage objects.
There may be many specific ways of allocating dispersion storage objects, which are not limited.
S204: Store the data segments to be dispersed into the dispersion storage objects respectively.
Here, suppose there are not enough available storage nodes but enough available disks, and one available disk is allocated for each data segment to be dispersed.
Suppose the dispersion storage objects are disks B3 and B5; which data segment is stored on which disk is not limited; here suppose A2 is stored on B3 and A3 on B5.
S205: Update the index information to be dispersed.
Specifically, the index information to be dispersed may be updated according to the correspondence between the data segments to be dispersed and the dispersion storage objects storing them.
The index information to be dispersed is updated to: A1—B1, A2—B3, A3—B5, A4—B4, A5—B4, A6—B6.
Applying the embodiment shown in FIG. 2 of this application, when data is stored unevenly, the unevenly stored data can be dispersed so that the data segments are stored relatively evenly across the storage objects, a storage object being the smallest unit for storing data. This avoids the situation where one storage object fails, multiple data segments of the same data are all lost, and data recovery becomes impossible.
FIG. 3 is a schematic flowchart of a data reconstruction method according to an embodiment of this application, including:
S301: Determine, among the recorded index information, index information to be reconstructed, where the index information to be reconstructed includes information of a failed storage object, a storage object being the smallest unit for storing data.
It should be noted that the data in the embodiment shown in FIG. 3 may be stored using the data storage method provided by the embodiment shown in FIG. 1, or using another storage method. As described in the embodiment shown in FIG. 1, index information of the data is recorded during the data storage process; therefore, the index information to be reconstructed can be determined among the recorded index information.
Alternatively, other storage methods also need to record index information, the index information including a correspondence between each data segment and the storage object storing that data segment.
The electronic device executing this solution can obtain the information of the failed storage objects; in this way, the device can determine the index information to be reconstructed: the index information containing the information of a failed storage object is determined as the index information to be reconstructed.
For example, the electronic device may periodically check whether storage nodes, disks, or smaller storage units within a disk have failed. According to the above description, a storage object may be the smallest unit for storing data segments. If a storage object is detected to have failed, the index information containing the information of that storage object is determined as the index information to be reconstructed.
Suppose storage objects C2 and C6 fail and a piece of index information includes A1—C1, A2—C2, A3—C3, A4—C4, A5—C5, A6—C6; this index information is determined as the index information to be reconstructed.
S302: Read target data segments from the non-failed storage objects according to the information of the non-failed storage objects included in the index information to be reconstructed. The target data segments are the data segments of the data to be reconstructed corresponding to the index information to be reconstructed.
The non-failed storage objects in the above index information are C1, C3, C4 and C5; A1 is read from C1, A3 from C3, A4 from C4, and A5 from C5.
S303: Reconstruct the target data segments to obtain repair segments.
Using the preset erasure coding policy, A1, A3, A4 and A5 are reconstructed to obtain the repair segments, that is, the new A2 and A6.
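The reconstruction flow of S301 to S305 (read the surviving segments, rebuild the lost ones, store the repair segments, update the index) can be sketched with the storage operations stubbed out; read, rebuild, allocate and write below are hypothetical hooks, not the patent's interfaces:

```python
def reconstruct(index, failed, read, rebuild, allocate, write):
    """index: dict of segment id -> storage object id.
    failed: set of failed object ids. Returns the updated index."""
    # S302: read the target segments from the non-failed objects.
    survivors = {seg: read(obj) for seg, obj in index.items()
                 if obj not in failed}
    # S303: rebuild the repair segments, e.g. by erasure-code decode.
    repaired = rebuild(survivors)
    new_index = dict(index)
    for seg, value in repaired.items():
        obj = allocate()                   # S304: newly allocated object
        write(obj, value)
        new_index[seg] = obj               # S305: update the index info
    return new_index
```

Only the index entries of the repaired segments change; entries pointing at healthy storage objects are carried over untouched.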
S304: Store the repair segments into the storage objects allocated for them.
New storage objects C7 and C8 are allocated for the new A2 and A6; suppose A2 is stored on C7 and A6 on C8.
S305: Update the index information to be reconstructed.
Specifically, the index information to be reconstructed may be updated according to the correspondence between the repair segments and the storage objects storing them.
The above index information is updated to: A1—C1, A2—C7, A3—C3, A4—C4, A5—C5, A6—C8.
Applying the embodiment shown in FIG. 3 of this application, when a storage object fails and data segments are lost, the lost data segments can be reconstructed using the data segments stored in the non-failed storage objects, thereby achieving data recovery.
FIG. 4 is a schematic flowchart of a data recycling method according to an embodiment of this application, including:
S401: Determine, among the recorded index information, index information to be recycled.
It should be noted that the data in the embodiment shown in FIG. 4 may be stored using the data storage method provided by the embodiment shown in FIG. 1, or using another storage method. As described in the embodiment shown in FIG. 1, index information of the data is recorded during the data storage process; therefore, the index information to be recycled can be determined among the recorded index information.
Alternatively, other storage methods also need to record index information, the index information including a correspondence between each data segment and the storage object storing that data segment, a storage object being the smallest unit for storing data.
Specifically, for each piece of recorded index information, it may be judged whether the number of invalid data segments it contains is greater than a fourth preset threshold; if so, it is determined as index information to be recycled. An invalid data segment is a data segment corresponding to a failed storage object.
For example, suppose the index information includes A1—B1, A2—B2, A3—B2, A4—B4, A5—B4, A6—B6, where disks B2 and B6 have failed. Then the data segments A2, A3 and A6 corresponding to B2 and B6 are invalid data segments, and the number of invalid data segments is 3.
Suppose the fourth preset threshold is 2; then the number of invalid data segments is greater than the fourth preset threshold, and the above index information is determined as index information to be recycled.
S402: Determine, according to the index information to be recycled, data segments to be recycled.
Specifically, the data segments to be recycled may be determined according to the valid data segments in the index information to be recycled, a valid data segment being a data segment other than the invalid data segments.
Continuing the above example, the data segments corresponding to the above index information include A1, A2, A3, A4, A5 and A6, where A2, A3 and A6 are invalid data segments; the remaining data segments A1, A4 and A5 are here called valid data segments. The valid data segments are determined as the data segments to be recycled.
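The recycling selection of S401 and S402 (count a stripe's invalid segments against the fourth preset threshold, then keep its valid segments for re-storage) can be sketched as follows, using the stripe layout from the example above:

```python
def select_for_recycling(stripes, failed, threshold):
    """stripes: dict of stripe id -> list of (segment, object) pairs.
    A segment whose object is in `failed` is invalid. Returns, per
    stripe to be recycled, its valid segments."""
    to_recycle = {}
    for sid, entries in stripes.items():
        invalid = sum(1 for _, obj in entries if obj in failed)
        if invalid > threshold:            # S401: recycle this stripe
            to_recycle[sid] = [(seg, obj) for seg, obj in entries
                               if obj not in failed]   # S402: valid segments
    return to_recycle
```

On the example (B2 and B6 failed, threshold 2) the stripe has 3 invalid segments, so it is selected and its valid segments A1, A4 and A5 are returned.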
S403: Allocate storage objects for the data segments to be recycled.
As an implementation, storage objects may be allocated according to their resource usage. Specifically, storage objects with lower resource usage may be allocated preferentially.
S404: Store the data segments to be recycled into the storage objects allocated for them respectively.
Here, suppose there are not enough available storage nodes but enough available disks, and one available disk is allocated for each data segment to be recycled.
Suppose the allocated storage objects are disks B7, B8 and B9; which data segment is stored on which disk is not limited; here suppose A1 is stored on B7, A4 on B8, and A5 on B9.
S405: Record the correspondence between each data segment to be recycled and the storage object storing it.
That is, record A1—B7, A4—B8, A5—B9.
S406: Generate, according to the recorded correspondences, new index information to replace the index information to be recycled.
The new index information is: A1—B7, A4—B8, A5—B9.
It should be noted that if the number of data segments corresponding to a piece of recorded index information is fixed, for example, a piece of index information can only correspond to 6 data segments, then 6 data segments to be recycled must be gathered before data recycling can be performed, and the same index information may correspond to data segments of different data.
Continuing the above example, suppose there is another piece of index information including A10—B10, A20—B20, A30—B20, A40—B40, A50—B40, A60—B60, where disks B20 and B40 have failed. Then the data segments A20, A30 and A60 corresponding to B20 and B40 are invalid data segments; the number of invalid data segments is 3, which is greater than the fourth preset threshold of 2, so this index information is determined as index information to be recycled.
The data segments corresponding to this index information include A10, A20, A30, A40, A50 and A60, where A20, A30 and A60 are invalid data segments; the remaining data segments A10, A40 and A50 are here called valid data segments and are determined as the data segments to be recycled. In this way, the two pieces of index information to be recycled together gather 6 data segments to be recycled.
Suppose the storage objects allocated for the data segments to be recycled A10, A40 and A50 are disks B70, B80 and B90; which data segment is stored on which disk is not limited; here suppose A10 is stored on B70, A40 on B80, and A50 on B90.
A10—B70, A40—B80, A50—B90 are recorded as new index information; the above two pieces of new index information replace the index information to be recycled, and can be merged into: A1—B7, A4—B8, A5—B9, A10—B70, A40—B80, A50—B90. That is, the above two pieces of index information to be recycled are deleted, and only A1—B7, A4—B8, A5—B9, A10—B70, A40—B80, A50—B90 is retained.
Applying the embodiment shown in FIG. 4 of this application, when index information contains many invalid data segments, the valid segments in the index information can be recycled, saving storage resources.
Corresponding to the above method embodiments, data storage, dispersion, reconstruction and recycling apparatuses are also provided.
FIG. 5 is a schematic structural diagram of a data storage apparatus according to an embodiment of this application, including:
a first allocation module 501, configured to allocate x storage objects for data to be stored according to a preset erasure coding policy k+m, where k is the number of original segments, m is the number of redundant segments, x is greater than 1 and not greater than k+m, and a storage object is the smallest unit for storing data;
a slicing module 502, configured to slice the data to be stored and perform redundancy processing on it using the erasure coding policy k+m, to obtain k+m data segments;
a first storage module 503, configured to store the k+m data segments into the x storage objects respectively, where the difference between the numbers of data segments stored in the individual storage objects is less than a first preset threshold;
a first recording module 504, configured to record index information corresponding to the data to be stored, the index information including a correspondence between each data segment and the storage object storing that data segment.
In this embodiment, the first allocation module 501 may include a first judging submodule and a first determining submodule (not shown in the figure), where:
the first judging submodule is configured to judge whether the number of available storage nodes is not less than k+m;
the first determining submodule is configured to, when the judgment result of the first judging submodule is yes, determine x storage objects among the available storage nodes, where each determined storage object is located on a different storage node, and x equals k+m;
the first storage module 503 may include:
a first storage submodule (not shown in the figure), configured to, when the judgment result of the first judging submodule is yes, store each data segment into a storage object on a different storage node.
In this embodiment, the first allocation module 501 may include a second judging submodule and a second determining submodule (not shown in the figure), where:
the second judging submodule is configured to, when the judgment result of the first judging submodule is no, judge whether the number of available storage objects on all available storage nodes is not less than k+m;
the second determining submodule is configured to, when the judgment result of the second judging submodule is yes, determine x storage objects among all the available storage nodes, where x equals k+m and the difference between the numbers of storage objects determined on the individual available storage nodes is less than a second preset threshold;
the first storage module 503 may include:
a second storage submodule (not shown in the figure), configured to, when the judgment result of the second judging submodule is yes, store each data segment into a different storage object.
In this embodiment, the first allocation module 501 may include:
an allocation submodule (not shown in the figure), configured to, when the judgment result of the second judging submodule is no, allocate all the available storage objects to the data to be stored, where x equals the number of all available storage objects;
the first storage module 503 may include:
a third storage submodule (not shown in the figure), configured to, when the judgment result of the second judging submodule is no, evenly divide the k+m data segments into x shares and store the x shares of data segments into the x storage objects respectively.
Applying the embodiment shown in FIG. 5 of this application, each data segment of the data to be stored is stored into a respective storage object, a storage object being the smallest unit for storing data, and the difference between the numbers of data segments stored in the individual storage objects is less than the first preset threshold. That is, in this solution the data segments are stored relatively evenly across multiple storage objects, which avoids the situation where one storage object fails, multiple data segments of the same data are all lost, and the data can no longer be recovered.
FIG. 6 is a schematic structural diagram of a data dispersion apparatus according to an embodiment of this application. The data in the embodiment shown in FIG. 6 may be stored using the data storage apparatus provided by the embodiment shown in FIG. 5, or using another storage apparatus. FIG. 6 includes:
a first determining module 601, configured to determine, among the recorded index information, index information to be dispersed, where the index information includes a correspondence between each data segment and the storage object storing that data segment, and a storage object is the smallest unit for storing data;
a second determining module 602, configured to determine, according to the determined index information to be dispersed, data segments to be dispersed;
a second allocation module 603, configured to allocate dispersion storage objects for the data segments to be dispersed;
a second storage module 604, configured to store the data segments to be dispersed into the dispersion storage objects respectively;
a first updating module 605, configured to update the index information to be dispersed.
In this embodiment, the first determining module 601 may be specifically configured to:
determine, for each piece of index information, the record count of each of its corresponding storage objects, where the record count of a storage object in a piece of index information is the number of data segments of that piece of index information stored in that storage object;
when there is a target storage object whose record count exceeds a third preset threshold, determine that index information as index information to be dispersed;
the second determining module 602 may be specifically configured to:
determine the data segments to be dispersed among the data segments corresponding to the target storage object.
Applying the embodiment shown in FIG. 6 of this application, when data is stored unevenly, the unevenly stored data can be dispersed so that the data segments are stored relatively evenly across the storage objects, a storage object being the smallest unit for storing data. This avoids the situation where one storage object fails, multiple data segments of the same data are all lost, and data recovery becomes impossible.
FIG. 7 is a schematic structural diagram of a data reconstruction apparatus according to an embodiment of this application. The data in the embodiment shown in FIG. 7 may be stored using the data storage apparatus provided by the embodiment shown in FIG. 5, or using another storage apparatus. FIG. 7 includes:
a third determining module 701, configured to determine, among the recorded index information, index information to be reconstructed, where the index information includes a correspondence between each data segment and the storage object storing that data segment, the index information to be reconstructed includes information of a failed storage object, and a storage object is the smallest unit for storing data;
a reading module 702, configured to read target data segments from the non-failed storage objects according to the information of the non-failed storage objects included in the index information to be reconstructed, where the target data segments are the data segments of the data to be reconstructed corresponding to the index information to be reconstructed;
a reconstruction module 703, configured to reconstruct the target data segments to obtain repair segments;
a third storage module 704, configured to store the repair segments into the storage objects allocated for them;
a second updating module 705, configured to update the index information to be reconstructed.
Applying the embodiment shown in FIG. 7 of this application, when a storage object fails and data segments are lost, the lost data segments can be reconstructed using the data segments stored in the non-failed storage objects, thereby achieving data recovery.
FIG. 8 is a schematic structural diagram of a data recycling apparatus according to an embodiment of this application. The data in the embodiment shown in FIG. 8 may be stored using the data storage apparatus provided by the embodiment shown in FIG. 5, or using another storage apparatus. FIG. 8 includes:
a fourth determining module 801, configured to determine, among the recorded index information, index information to be recycled, where the index information includes a correspondence between each data segment and the storage object storing that data segment, and a storage object is the smallest unit for storing data;
a fifth determining module 802, configured to determine, according to the index information to be recycled, data segments to be recycled;
a third allocation module 803, configured to allocate storage objects for the data segments to be recycled;
a fourth storage module 804, configured to store the data segments to be recycled into the storage objects allocated for them respectively;
a second recording module 805, configured to record a correspondence between each data segment to be recycled and the storage object storing that data segment;
a replacement module 806, configured to generate, according to the recorded correspondences, new index information to replace the index information to be recycled.
In this embodiment, the fourth determining module 801 may be specifically configured to:
judge, for each piece of recorded index information, whether the number of invalid data segments it contains is greater than a fourth preset threshold, and if so, determine it as index information to be recycled, where an invalid data segment is a data segment corresponding to a failed storage object.
In this embodiment, the fifth determining module 802 may be specifically configured to:
determine the data segments to be recycled according to the valid data segments in the index information to be recycled, a valid data segment being a data segment other than the invalid data segments.
Applying the embodiment shown in FIG. 8 of this application, when index information contains many invalid data segments, the valid segments in the index information can be recycled, saving storage resources.
An embodiment of this application further provides a data processing system, which, as shown in FIG. 9, includes a platform server and a management server. The process by which the system stores data is described in detail below:
the platform server allocates x storage objects for data to be stored according to a preset erasure coding policy k+m, where k is the number of original segments, m is the number of redundant segments, x is greater than 1 and not greater than k+m, and a storage object is the smallest unit for storing data;
the management server slices the data to be stored and performs redundancy processing on it using the erasure coding policy k+m to obtain k+m data segments, and stores the k+m data segments into the x storage objects respectively, where the difference between the numbers of data segments stored in the individual storage objects is less than a fourth preset threshold;
the platform server records index information corresponding to the data to be stored, the index information including a correspondence between each data segment and the storage object storing that data segment.
As an implementation, the platform server may judge whether the number of available storage nodes is not less than k+m; if it is not less, determine x storage objects among the available storage nodes, where each determined storage object is located on a different storage node and x equals k+m; each data segment is then stored into a storage object on a different storage node.
As an implementation, when the number of available storage nodes is less than k+m, the platform server may judge whether the number of available storage objects on all available storage nodes is not less than k+m; if so, determine x storage objects among all the available storage nodes, where x equals k+m and the difference between the numbers of storage objects determined on the individual available storage nodes is less than a second preset threshold; each data segment is then stored into a different storage object.
As an implementation, when the number of available storage objects on the available storage nodes is less than k+m, the platform server may allocate all the available storage objects to the data to be stored, where x equals the number of all available storage objects; the k+m data segments are evenly divided into x shares, and the x shares of data segments are stored into the x storage objects respectively.
A specific implementation is described below:
1. After receiving a data storage request sent by a user, the platform server designates a management server to receive the data to be stored.
2. The designated management server receives the data to be stored sent by the user.
3. The management server applies to the platform server for stripe resources, and the platform server allocates stripe resources for the data to be stored according to the preset erasure coding policy k+m:
For example, if the preset erasure coding policy is 4+1, the platform server organizes 4+1 stripe resources and delivers them to the management server.
Specifically, the platform server generates a unique stripe ID (stripe_id) for each stripe. For example, when organizing 4+1 stripe resources, resources permitting, the platform server allocates 5 storage objects (disks) for the stripe.
The rules for allocating the 5 storage objects may include: if there are enough available storage nodes (no fewer than 5), 5 available storage nodes are allocated to store the data, so that each data segment of the data is stored on a different storage node.
If there are not enough available storage nodes (fewer than 5) but there are enough available disks (no fewer than 5), 5 available disks are allocated to store the data, so that each data segment of the data is stored on a different disk. It should be noted that when 5 available disks are allocated among all the available storage nodes, the data segments should be stored as evenly as possible across the storage nodes.
If there are not enough available disks either (fewer than 5), all the available disks are allocated to store the data, so that the data segments of the data are stored on disks that are as different as possible.
Suppose the stripe allocated by the platform server for the data is:
{<stripe_id,OSD_1,wwn_1>,<stripe_id,OSD_2,wwn_2>,<stripe_id,OSD_3,wwn_3>,<stripe_id,OSD_4,wwn_4>,<stripe_id,OSD_5,wwn_5>}, where OSD can be understood as a storage node, OSD_1, OSD_2 and so on as the identification information of storage nodes, wwn as a disk, and wwn_1, wwn_2 and so on as the identification information of disks.
4. According to the allocated stripe, which can also be understood as according to the preset erasure coding policy 4+1, the management server slices the data to be stored and performs redundancy processing on it to obtain the original segments and redundant segments, and stores them into the allocated storage objects respectively.
Specifically, the management server generates a unique key for each data segment in the stripe, so that each data segment in the stripe corresponds to a five-tuple <stripe_id,OSD,wwn,key,value>, where stripe_id is the stripe ID, OSD is the identification information of the storage node, wwn is the identification information of the disk, key is the key of the data segment, and value is the value or content of the data segment.
The above stripe can be fully represented as:
{<stripe_id,OSD_1,wwn_1,key_1,value_1>,<stripe_id,OSD_2,wwn_2,key_2,value_2>,<stripe_id,OSD_3,wwn_3,key_3,value_3>,<stripe_id,OSD_4,wwn_4,key_4,value_4>,<stripe_id,OSD_5,wwn_5,key_5,value_5>}.
5. According to the above complete stripe, the management server sends the data segments to their corresponding storage nodes.
Specifically, the management server may send a triple <wwn,key,value> to a storage node; the storage node stores the data <key,value> to the corresponding disk according to the triple, and after the storage is completed, returns a storage-success message to the management server.
6. After the management server receives a storage-success message from each storage node of the stripe (indicating that the data to be stored has been stored successfully), the management server sends the <stripe_id,wwn,key> of each data segment (that is, the index information) to the platform server.
7. After the platform server records the <stripe_id,wwn,key> of each data segment of the data to be stored (that is, the index information), the storage of the data is complete.
It should be noted that in some implementations the same disk may belong to different storage nodes at different times; therefore, in this case, the stripe recorded by the platform server may not include the identification information of the storage nodes; that is to say, the recorded stripe may be: {<stripe_id,wwn_1,key_1>,<stripe_id,wwn_2,key_2>,<stripe_id,wwn_3,key_3>,<stripe_id,wwn_4,key_4>,<stripe_id,wwn_5,key_5>}.
In addition, the above storage node may also be a storage server or another component; this is not specifically limited.
The process by which the data processing system reads data is described in detail below:
the platform server determines index information corresponding to data to be read;
the management server reads each data segment of the data to be read from the storage objects according to the index information determined by the platform server, and combines the read data segments to obtain the data to be read.
A specific implementation is described below:
1. The platform server receives a data read request sent by a user, determines the data to be read according to the request, and designates a management server to perform the read operation.
2. The designated management server requests from the platform server the stripe information of the data to be read (that is, the index information corresponding to the data to be read).
Suppose the erasure coding policy corresponding to the data to be read is 4+1, and the stripe of the data recorded on the platform server is:
{<stripe_id,wwn_1,key_1>,<stripe_id,wwn_2,key_2>,<stripe_id,wwn_3,key_3>,<stripe_id,wwn_4,key_4>,<stripe_id,wwn_5,key_5>}.
According to the above description, the same disk may belong to different storage nodes at different times; in this case, the stripe information recorded by the platform server does not include the identification information of the storage nodes. Therefore, the platform server needs to obtain the identification information of the corresponding storage nodes according to the identification information of the disks and fill it into the above stripe; the padded stripe is:
{<stripe_id,OSD_1,wwn_1,key_1>,<stripe_id,OSD_2,wwn_2,key_2>,<stripe_id,OSD_3,wwn_3,key_3>,<stripe_id,OSD_4,wwn_4,key_4>,<stripe_id,OSD_5,wwn_5,key_5>};
3. According to the above padded stripe, the management server may send a pair <wwn,key> to a storage node; the storage node uses the key to read the value on the disk (wwn) according to the pair, and sends the read <key,value> to the management server.
4. The management server combines the <key,value> of each data segment sent by the storage nodes; after the combination, the stripe is:
{<key_1,value_1>,<key_2,value_2>,<key_3,value_3>,<key_4,value_4>,<key_5,value_5>}; in this way, the data to be read is obtained.
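Step 4's combination of the returned <key, value> pairs into the data to be read can be sketched as follows; ordering by key works here only because the keys key_1 to key_5 sort in stripe order, and a real system would record explicit segment positions:

```python
def combine_segments(pairs):
    """pairs: (key, value) tuples returned by the storage nodes,
    possibly out of order. Reassembles the data to be read."""
    ordered = sorted(pairs, key=lambda kv: kv[0])   # stripe order by key
    return b"".join(value for _, value in ordered)
```

The management server would run this after all storage nodes have answered, then send the result to the user.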
5. The management server sends the data to be read to the user.
In addition, the above storage node may also be a storage server or another component; this is not specifically limited.
下面对数据处理系统进行数据分散的过程进行详细说明:
平台服务器,在所记录的索引信息中,确定待分散索引信息;
管理服务器,根据所确定的待分散索引信息,确定待分散数据片段;为所述待分散数据片段分配分散存储对象;将所述待分散数据片段分别存储至所述分散存储对象;
平台服务器,更新所述待分散索引信息。
具体的,平台服务器,可以针对每条索引信息,确定其所对应的各个存储对象的记录次数,其中,一条索引信息中一个存储对象的记录次数为:该存储对象中存储的该条索引信息中的数据片段的数量;当存在记录次数超过第三预设阈值的目标存储对象时,将该索引信息确定为待分散索引信息。
管理服务器,可以在所述目标存储对象对应的数据片段中,确定待分散数据片段。
作为另一种实施方式,如图10所示,该数据处理系统还可以包括审计服务器,
平台服务器,在所记录的索引信息中,确定待分散索引信息;
审计服务器,根据所确定的待分散索引信息,确定待分散数据片段;为所述待分散数据片段分配分散存储对象;将所述待分散数据片段分别存储至所述分散存储对象;
平台服务器,更新所述待分散索引信息。
具体的,平台服务器,可以针对每条索引信息,确定其所对应的各个存 储对象的记录次数;当存在记录次数超过第三预设阈值的目标存储对象时,将该索引信息确定为待分散索引信息。
审计服务器,可以在所述目标存储对象对应的数据片段中,确定待分散数据片段。
下面针对图10介绍一个具体的实施方式:
1、平台服务器扫描记录的条带(也就是各数据片段对应的索引信息),确定各条带所对应的各个存储对象的记录次数,当存在记录次数超过第三预设阈值的目标存储对象时,将该条带确定为待分散条带。
假设确定的待分散条带为:
{<stripe_id,wwn_1,key_1>,<stripe_id,wwn_1,key_2>,<stripe_id,wwn_3,key_3>,<stripe_id,wwn_4,key_4>,<stripe_id,wwn_5,key_5>}(其中wwn_1的记录次数为2,假设第三预设阈值为1,该条带中存在记录次数超过第三预设阈值的目标存储对象时,将该条带确定为待分散条带)。
根据上面描述,同一个磁盘在不同时刻可以属于不同的存储节点,在这种情况下,平台服务器记录的条带信息不包括存储节点的标识信息,因此,平台服务器需要根据磁盘的标识信息获取对应的存储节点的标识信息,并将所获取的存储节点的标识信息填充到上述条带中,填充后的条带为:
{<stripe_id,OSD_1,wwn_1,key_1>,<stripe_id,OSD_1,wwn_1,key_2>,<stripe_id,OSD_3,wwn_3,key_3>,<stripe_id,OSD_3,wwn_4,key_4>,<stripe_id,OSD_5,wwn_5,key_5>}。
平台服务器将填充后的条带(也就是待分散索引信息)发送给审计服务器。
2、审计服务器接收到该条带。
3、审计服务器对该条带进行分析,确定待分散数据片段:
具体的,<key_1,key_2>存储在同一块磁盘wwn_1上(即同一存储节点OSD_1上),<key_3,key_4>存储在同一台存储节点OSD_3上,可以将key_1及key_3确定为待分散数据片段。也就是说,当同一磁盘或者同一存储节点中存储了同一数据的多个数据片段时,可以在这多个数据片段中确定待分散数据片段。
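上述选取待分散数据片段的规则可以用如下 Python 示意(an illustrative sketch: which member of a colliding group is moved is arbitrary, as long as one fragment stays in place; this sketch keeps the first fragment seen on each crowded disk/node and marks the later ones, while the example above marks key_1 and key_3):

```python
from collections import Counter

# Hypothetical sketch of the dispersal check: count how many fragments of
# one stripe sit on the same disk (wwn) or node (osd); for every disk or
# node holding more than `threshold` fragments, keep one fragment in
# place and mark the surplus for dispersal.

def fragments_to_disperse(stripe, threshold=1):
    # stripe: list of (osd, wwn, key)
    by_disk = Counter(wwn for _, wwn, _ in stripe)
    by_node = Counter(osd for osd, _, _ in stripe)
    to_move, seen_disk, seen_node = [], set(), set()
    for osd, wwn, key in stripe:
        crowded = by_disk[wwn] > threshold or by_node[osd] > threshold
        if crowded and (wwn in seen_disk or osd in seen_node):
            to_move.append(key)  # a duplicate placement: move this fragment
        seen_disk.add(wwn)
        seen_node.add(osd)
    return to_move

stripe = [("OSD_1", "wwn_1", "key_1"), ("OSD_1", "wwn_1", "key_2"),
          ("OSD_3", "wwn_3", "key_3"), ("OSD_3", "wwn_4", "key_4"),
          ("OSD_5", "wwn_5", "key_5")]
print(fragments_to_disperse(stripe))  # ['key_2', 'key_4']
```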
审计服务器将<wwn_1,key_1>、<wwn_3,key_3>发送给对应的存储节点读取数据。
4、审计服务器向平台服务器申请用于分散的存储节点及磁盘<OSD_x,wwn_x>、<OSD_y,wwn_y>(也就是为待分散数据片段分配分散存储对象)。
5、审计服务器将读取到的数据<OSD_1,wwn_1,key_1,value_1>、<OSD_3,wwn_3,key_3,value_3>写入到新的存储节点、及磁盘中,写入完成后的数据片段可以表示为三元组:
<OSD_x,wwn_x,key_1,value_1>、<OSD_y,wwn_y,key_3,value_3>。
6、将所记录的分散之前的数据片段的三元组:<OSD_1,wwn_1,key_1,value_1>、<OSD_3,wwn_3,key_3,value_3>删除。
7、审计服务器通知平台服务器分散操作已完成,平台服务器将条带修改为:{<stripe_id,wwn_x,key_1>,<stripe_id,wwn_1,key_2>,<stripe_id,wwn_y,key_3>,<stripe_id,wwn_4,key_4>,<stripe_id,wwn_5,key_5>}。
也就是更新待分散索引信息。
另外,上述存储节点也可以为存储服务器,或者其他,具体不做限定。
应用上述实施例,当存在数据存储不均匀的情况时,可以对存储不均匀的数据进行分散,使得各个数据片段较均匀地存储至各个存储对象中(存储对象为存储数据的最小单元),能够避免一个存储对象出现故障时,同一数据的多份数据片段全部丢失,导致无法进行数据恢复的情况。
下面对数据处理系统进行数据重构的过程进行详细说明:
平台服务器,在所记录的索引信息中,确定待重构索引信息;其中,所述待重构索引信息中包含出现故障的存储对象的信息;
管理服务器,根据所述待重构信息中包含的未出现故障的存储对象的信息,从所述未出现故障的存储对象中读取目标数据片段,所述目标数据片段为:所述待重构信息对应的待重构数据的每个数据片段;将所述目标数据片段进行重构,得到修复片段;将所述修复片段存储至为其分配的存储对象中;
平台服务器,更新所述待重构索引信息。
作为另一种实施方式,如图10所示,该数据处理系统还可以包括审计服务器,
平台服务器,在所记录的索引信息中,确定待重构索引信息;其中,所述待重构索引信息中包含出现故障的存储对象的信息;
审计服务器,根据所述待重构信息中包含的未出现故障的存储对象的信息,从所述未出现故障的存储对象中读取目标数据片段,所述目标数据片段为:所述待重构信息对应的待重构数据的每个数据片段;将所述目标数据片段进行重构,得到修复片段;将所述修复片段存储至为其分配的存储对象中;
平台服务器,更新所述待重构索引信息。
当存储节点出现故障或者磁盘出现故障等情况时,数据处理系统将启动数据重构。下面针对图10介绍一个具体的实施方式:
1、平台服务器获取出现故障的存储节点或者磁盘的信息;扫描记录的条带(也就是各数据片段对应的索引信息);将包含出现故障的存储节点或者磁盘的信息的条带确定为待重构条带(也就是确定待重构索引信息);指定执行重构操作的审计服务器。
假设确定的待重构条带为:
{<stripe_id,wwn_1,key_1>,<stripe_id,wwn_2,key_2>,<stripe_id,wwn_3,key_3>,<stripe_id,wwn_4,key_4>,<stripe_id,wwn_5,key_5>},其中wwn_1磁盘下线;
根据上面描述,同一个磁盘在不同时刻可以属于不同的存储节点,在这种情况下,平台服务器记录的条带信息不包括存储节点的标识信息,因此,平台服务器需要根据磁盘的标识信息获取对应的存储节点的标识信息,并将所获取的存储节点的标识信息填充到上述条带中,填充后的条带为:
{<stripe_id,OSD_1,wwn_1,key_1>,<stripe_id,OSD_2,wwn_2,key_2>,<stripe_id,OSD_3,wwn_3,key_3>,<stripe_id,OSD_4,wwn_4,key_4>,<stripe_id,OSD_5,wwn_5,key_5>}。
平台服务器将填充后的条带、以及需要修复的数据片段<stripe_id,OSD_1,wwn_1,key_1>(也就是待重构索引信息)发送给审计服务器。
2、审计服务器接收到上述填充后的条带、以及需要修复的数据片段后,将条带中的信息进行分类:
{<stripe_id,OSD_2,wwn_2,key_2>,<stripe_id,OSD_3,wwn_3,key_3>,<stripe_id,OSD_4,wwn_4,key_4>,<stripe_id,OSD_5,wwn_5,key_5>}这四条索引信息对应的数据片段未丢失,可以正常读取;<stripe_id,OSD_1,wwn_1,key_1>对应的数据片段丢失,需要利用纠删码策略进行修复。
3、审计服务器将{<wwn_2,key_2>,<wwn_3,key_3>,<wwn_4,key_4>,<wwn_5,key_5>}发送给对应的存储节点读取数据。
4、读取数据完毕后,条带为:
{<wwn_2,key_2,value_2>,<wwn_3,key_3,value_3>,<wwn_4,key_4,value_4>,<wwn_5,key_5,value_5>},利用纠删码策略修复出丢失的wwn_1磁盘中的数据片段value_1。
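对于本例的4+1策略(m=1),单份冗余片段可以取为各原始片段的按位异或校验,此时任意一份丢失片段等于其余四份存活片段的异或。以下 Python 示意仅是纠删码修复步骤的一个最简替身;实际系统通常使用可推广到 m>1 的 Reed-Solomon 等纠删码:

```python
from functools import reduce

# Minimal stand-in for erasure-code repair with m = 1: parity is the
# bytewise XOR of the k original fragments, so any single lost fragment
# (data or parity) is the XOR of the four survivors.

def xor_fragments(fragments):
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*fragments))

originals = [b"\x01\x02", b"\x03\x04", b"\x05\x06", b"\x07\x08"]
parity = xor_fragments(originals)      # the redundant fragment (value_5)

survivors = originals[1:] + [parity]   # value_1 is lost
value_1 = xor_fragments(survivors)     # recover it from the survivors
print(value_1 == originals[0])         # True
```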
5、审计服务器向平台服务器申请新的存储节点与磁盘<OSD_z,wwn_z>。
6、审计服务器将恢复出的数据片段value_1写入到新的存储节点与磁盘<OSD_z,wwn_z>中。
7、重构操作完成,审计服务器通知平台服务器更新条带,此时平台服务器更新条带为:
{<stripe_id,wwn_z,key_1>,<stripe_id,wwn_2,key_2>,<stripe_id,wwn_3,key_3>,<stripe_id,wwn_4,key_4>,<stripe_id,wwn_5,key_5>}。
另外,上述存储节点也可以为存储服务器,或者其他,具体不做限定。
下面对数据处理系统进行数据回收的过程进行详细说明:
平台服务器,在所记录的索引信息中,确定待回收索引信息;
管理服务器,根据所述待回收索引信息,确定待回收数据片段;
平台服务器,为所述待回收数据片段分配存储对象;
管理服务器,将所述待回收数据片段分别存储至为其分配的存储对象中;
平台服务器,记录每个待回收数据片段与存储所述待回收数据片段的存储对象的对应关系;根据所记录的对应关系,生成新的索引信息替换所述待回收索引信息。
作为另一种实施方式,如图10所示,该数据处理系统还可以包括审计服务器,
平台服务器,在所记录的索引信息中,确定待回收索引信息;
审计服务器,根据所述待回收索引信息,确定待回收数据片段;
平台服务器,为所述待回收数据片段分配存储对象;
审计服务器,将所述待回收数据片段分别存储至为其分配的存储对象中;
平台服务器,记录每个待回收数据片段与存储所述待回收数据片段的存储对象的对应关系;根据所记录的对应关系,生成新的索引信息替换所述待回收索引信息。
具体的,平台服务器可以针对所记录的每条索引信息,判断其所包含的无效数据片段份数是否大于第四预设阈值,如果是,将其确定为待回收索引信息;其中,所述无效数据片段为:出现故障的存储对象对应的数据片段。
管理服务器可以根据所述待回收索引信息中的有效数据片段,确定待回收数据片段,所述有效数据片段为除所述无效数据片段之外的数据片段。
下面针对图10介绍一个具体的实施方式:
1、平台服务器扫描记录的条带(也就是各数据片段对应的索引信息),确定待回收条带:
假设第四预设阈值为2,如下五个条带中包含的无效数据片段(使用NULL表示)份数都为4,因此,将这五个条带确定为待回收条带:
{<stripe_id_1,wwn_11,key_11>,NULL,NULL,NULL,NULL},
{<stripe_id_2,wwn_21,key_21>,NULL,NULL,NULL,NULL},
{<stripe_id_3,wwn_31,key_31>,NULL,NULL,NULL,NULL},
{<stripe_id_4,wwn_41,key_41>,NULL,NULL,NULL,NULL},
{<stripe_id_5,wwn_51,key_51>,NULL,NULL,NULL,NULL}。
根据上面描述,同一个磁盘在不同时刻可以属于不同的存储节点,在这种情况下,平台服务器记录的条带信息不包括存储节点的标识信息,因此,平台服务器需要根据磁盘的标识信息获取对应的存储节点的标识信息,并将所获取的存储节点的标识信息填充到上述条带中,填充后的条带为:
{<stripe_id_1,OSD_11,wwn_11,key_11>,NULL,NULL,NULL,NULL},
{<stripe_id_2,OSD_21,wwn_21,key_21>,NULL,NULL,NULL,NULL},
{<stripe_id_3,OSD_31,wwn_31,key_31>,NULL,NULL,NULL,NULL},
{<stripe_id_4,OSD_41,wwn_41,key_41>,NULL,NULL,NULL,NULL},
{<stripe_id_5,OSD_51,wwn_51,key_51>,NULL,NULL,NULL,NULL}。
平台服务器将填充后的条带发送给审计服务器。
2、审计服务器接收到上述5个待回收条带。
3、审计服务器根据所接收到的5个条带,确定待回收数据片段。也就是将条带中除无效数据片段(使用NULL表示)以外的数据片段确定为待回收数据片段。
审计服务器将stripe_id_1中的有效数据片段<wwn_11,key_11>发送给存储节点OSD_11读取数据value_11;将stripe_id_2中的有效数据片段<wwn_21,key_21>发送给存储节点OSD_21读取数据value_21;将stripe_id_3中的有效数据片段<wwn_31,key_31>发送给存储节点OSD_31读取数据value_31;将stripe_id_4中的有效数据片段<wwn_41,key_41>发送给存储节点OSD_41读取数据value_41;将stripe_id_5中的有效数据片段<wwn_51,key_51>发送给存储节点OSD_51读取数据value_51。
4、审计服务器向平台服务器申请空闲条带,假设申请到的条带为:
{<stripe_id,OSD_1,wwn_1>,<stripe_id,OSD_2,wwn_2>,<stripe_id,OSD_3,wwn_3>,<stripe_id,OSD_4,wwn_4>,<stripe_id,OSD_5,wwn_5>}。
5、审计服务器将读取到的数据按照新的条带进行组织,组织完成后新条带为:
{<stripe_id,OSD_1,wwn_1,key_11,value_11>,<stripe_id,OSD_2,wwn_2,key_21,value_21>,<stripe_id,OSD_3,wwn_3,key_31,value_31>,<stripe_id,OSD_4,wwn_4,key_41,value_41>,<stripe_id,OSD_5,wwn_5,key_51,value_51>}。
6、审计服务器根据上述新条带,将数据片段发送给其对应的存储节点。
具体的,审计服务器可以向存储节点发送一个三元组<wwn,key,value>,存储节点根据该三元组,将数据<key,value>存储到对应的磁盘中,存储节点存储完毕后返回给审计服务器存储成功的消息。
7、审计服务器回收完成后通知平台服务器,将上述新条带发送给平台服务器保存。也就是说,平台服务器记录每个待回收数据片段与存储所述待回收数据片段的存储对象的对应关系;根据所记录的对应关系,生成新的索引信息替换所述待回收索引信息。
需要说明的是,在一些实施方式中,同一个磁盘在不同时刻可以属于不同的存储节点,因此,在这种情况下,平台服务器记录的条带中可以不包括存储节点的标识信息,也就是说,平台服务器记录的条带可以为:
{<stripe_id,wwn_1,key_11>,<stripe_id,wwn_2,key_21>,<stripe_id,wwn_3,key_31>,<stripe_id,wwn_4,key_41>,<stripe_id,wwn_5,key_51>}。
8、平台服务器删除上述5个待回收条带。
可以理解的是,系统长时间运行,部分存储节点或者磁盘出现故障,或者其他情况会导致条带中的部分数据无效,无效数据仍占据条带资源,这样会造成条带资源浪费。应用本方案,条带中的无效数据片段份数较多时,可以对条带中的有效片段进行回收,节省条带资源。
作为一种实施方式,平台服务器可以每隔预设周期执行本方案,及时对条带中的有效片段进行回收,节省条带资源。
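上述步骤1~8的回收流程可以浓缩为如下 Python 示意:无效片段(NULL)份数超过阈值的条带被选为待回收条带,其存活的有效片段被收集起来,以便重新写入一个新申请的条带(the data layout here is a simplification assumed for illustration):

```python
# Hypothetical sketch of the recycling pass: a stripe whose invalid
# (NULL) fragment count exceeds the threshold is recyclable; the valid
# fragments of several such stripes are gathered so they can be packed
# into one freshly allocated stripe, freeing the old stripe slots.

NULL = None

def recyclable(stripe, threshold):
    return sum(1 for entry in stripe if entry is NULL) > threshold

def recycle(stripes, threshold):
    valid = []
    for stripe in stripes:
        if recyclable(stripe, threshold):
            valid.extend(e for e in stripe if e is not NULL)
    return valid  # fragments to rewrite into a newly allocated stripe

stripes = [
    [("wwn_11", "key_11"), NULL, NULL, NULL, NULL],
    [("wwn_21", "key_21"), NULL, NULL, NULL, NULL],
    [("wwn_31", "key_31"), ("wwn_32", "key_32"), ("wwn_33", "key_33"),
     ("wwn_34", "key_34"), NULL],  # only one invalid fragment: kept as-is
]
print(recycle(stripes, threshold=2))
# [('wwn_11', 'key_11'), ('wwn_21', 'key_21')]
```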
另外,上述存储节点也可以为存储服务器,或者其他,具体不做限定。
需要说明的是,在本申请实施例提供的数据处理系统中还可以包括存储服务器,如图11所示。
作为一种实施方式,存储服务器可以包括上述存储节点,或者,存储服务器也可以为上述存储节点。
存储服务器,向所述平台服务器上报自身多个存储对象的运行状态信息,以使所述平台服务器根据每个存储服务器上报的运行状态信息为待存储数据分配存储对象、以及为待分散数据片段分配分散存储对象、以及确定待重构索引信息。
本申请实施例还提供一种电子设备,如图12所示,包括:处理器1201和存储器1202,其中,存储器1202用于存储可执行程序代码,处理器1201通过读取存储器1202中存储的可执行程序代码来运行与可执行程序代码对应的程序,以用于执行上述任一种数据存储、分散、重构、回收方法。
本申请实施例还提供一种可执行程序代码,所述可执行程序代码用于被运行以执行上述任一种数据存储、分散、重构、回收方法。
本申请实施例还提供一种计算机可读存储介质,所述计算机可读存储介质用于存储可执行程序代码,所述可执行程序代码用于被运行以执行上述任一种数据存储、分散、重构、回收方法。
需要说明的是,在本文中,诸如第一和第二等之类的关系术语仅仅用来将一个实体或者操作与另一个实体或操作区分开来,而不一定要求或者暗示这些实体或操作之间存在任何这种实际的关系或者顺序。而且,术语“包括”、“包含”或者其任何其他变体意在涵盖非排他性的包含,从而使得包括一系列要素的过程、方法、物品或者设备不仅包括那些要素,而且还包括没有明确列出的其他要素,或者是还包括为这种过程、方法、物品或者设备所固有的要素。在没有更多限制的情况下,由语句“包括一个……”限定的要素,并不排除在包括所述要素的过程、方法、物品或者设备中还存在另外的相同要素。
本说明书中的各个实施例均采用相关的方式描述,各个实施例之间相同相似的部分互相参见即可,每个实施例重点说明的都是与其他实施例的不同之处。尤其,对于图5所示的数据存储装置实施例而言,由于其基本相似于图1所示的数据存储方法实施例,所以描述的比较简单,相关之处参见图1所示的数据存储方法实施例的部分说明即可;对于图6所示的数据分散装置实施例而言,由于其基本相似于图2所示的数据分散方法实施例,所以描述的比较简单,相关之处参见图2所示的数据分散方法实施例的部分说明即可;对于图7所示的数据重构装置实施例而言,由于其基本相似于图3所示的数据重构方法实施例,所以描述的比较简单,相关之处参见图3所示的数据重构方法实施例的部分说明即可;对于图8所示的数据回收装置实施例而言,由于其基本相似于图4所示的数据回收方法实施例,所以描述的比较简单,相关之处参见图4所示的数据回收方法实施例的部分说明即可;对于图9-11所示的数据处理系统实施例而言,由于其基本相似于图1-4所示的数据存储、分散、重构、回收方法实施例,所以描述的比较简单,相关之处参见图1-4所示的数据存储、分散、重构、回收方法实施例的部分说明即可;对于上述可执行程序代码实施例以及计算机可读存储介质实施例而言,由于其基本相似于图1-4所示的数据存储、分散、重构、回收方法实施例,所以描述的比较简单,相关之处参见图1-4所示的数据存储、分散、重构、回收方法实施例的部分说明即可。
本领域普通技术人员可以理解,实现上述方法实施方式中的全部或部分步骤可以通过程序来指令相关的硬件完成,所述的程序可以存储于计算机可读存储介质中,这里所称的存储介质,如:ROM/RAM、磁碟、光盘等。
以上所述仅为本申请的较佳实施例而已,并非用于限定本申请的保护范围。凡在本申请的精神和原则之内所作的任何修改、等同替换、改进等,均包含在本申请的保护范围内。

Claims (32)

  1. 一种数据存储方法,其特征在于,包括:
    根据预设的纠删码策略k+m,为待存储数据分配x个存储对象;其中,所述k表示原始片段份数,所述m表示冗余片段份数,所述x大于1且不大于k+m,所述存储对象为存储数据的最小单元;
    利用所述纠删码策略k+m,对所述待存储数据进行切片及冗余处理,得到k+m个数据片段;
    将所述k+m个数据片段分别存储至所述x个存储对象,其中,各个存储对象中存储的数据片段份数的差值小于第一预设阈值;
    记录所述待存储数据对应的索引信息,所述索引信息包括:每个数据片段与存储该数据片段的存储对象的对应关系。
  2. 根据权利要求1所述的方法,其特征在于,所述根据预设的纠删码策略k+m,为待存储数据分配x个存储对象的步骤包括:
    判断可用存储节点的数量是否不小于k+m;
    如果不小于,在可用存储节点中确定x个存储对象;其中,所确定的每个存储对象所在的存储节点不同;所述x等于所述k+m;
    所述将所述k+m个数据片段分别存储至所述x个存储对象的步骤包括:
    将每个数据片段分别存储至不同存储节点中的存储对象。
  3. 根据权利要求2所述的方法,其特征在于,在判断可用存储节点的数量小于k+m的情况下,所述方法还包括:
    判断全部可用存储节点中的可用存储对象的数量是否不小于k+m;
    如果是,在全部可用存储节点中确定x个存储对象;其中,所述x等于所述k+m,每个可用存储节点中确定的存储对象的数量差值小于第二预设阈值;
    所述将所述k+m个数据片段分别存储至所述x个存储对象的步骤包括:
    将每个数据片段分别存储至不同存储对象。
  4. 根据权利要求3所述的方法,其特征在于,在判断可用存储节点中的可用存储对象的数量小于k+m的情况下,所述方法还包括:
    将全部可用存储对象分配给所述待存储数据;其中,所述x等于全部可用存储对象的数量;
    所述将所述k+m个数据片段分别存储至所述x个存储对象的步骤包括:
    将所述k+m个数据片段均匀划分为x份,将所述x份数据片段分别存储至所述x个存储对象。
  5. 一种数据分散方法,其特征在于,包括:
    在所记录的索引信息中,确定待分散索引信息;其中,索引信息包括:每个数据片段与存储该数据片段的存储对象的对应关系,所述存储对象为存储数据的最小单元;
    根据所确定的待分散索引信息,确定待分散数据片段;
    为所述待分散数据片段分配分散存储对象;
    将所述待分散数据片段分别存储至所述分散存储对象。
  6. 根据权利要求5所述的方法,其特征在于,在所记录的索引信息中,确定待分散索引信息的步骤包括:
    针对每条索引信息,确定其所对应的各个存储对象的记录次数,其中,一条索引信息中一个存储对象的记录次数为:该存储对象中存储的该条索引信息中的数据片段的数量;
    当存在记录次数超过第三预设阈值的目标存储对象时,将该索引信息确定为待分散索引信息;
    所述根据所确定的待分散索引信息,确定待分散数据片段的步骤包括:
    在所述目标存储对象对应的数据片段中,确定待分散数据片段。
  7. 根据权利要求5所述的方法,其特征在于,在将所述待分散数据片段分别存储至所述分散存储对象之后,还包括:
    根据所述待分散数据片段与存储所述待分散数据片段的分散存储对象的对应关系,更新所述待分散索引信息。
  8. 一种数据重构方法,其特征在于,包括:
    在所记录的索引信息中,确定待重构索引信息;其中,索引信息包括:每个数据片段与存储该数据片段的存储对象的对应关系,所述待重构索引信息中包含出现故障的存储对象的信息,所述存储对象为存储数据的最小单元;
    根据所述待重构信息中包含的未出现故障的存储对象的信息,从所述未出现故障的存储对象中读取目标数据片段,所述目标数据片段为:所述待重构信息对应的待重构数据的每个数据片段;
    将所述目标数据片段进行重构,得到修复片段;
    将所述修复片段存储至为其分配的存储对象中。
  9. 根据权利要求8所述的方法,其特征在于,在将所述修复片段存储至为其分配的存储对象中之后,还包括:
    根据所述修复片段与存储所述修复片段的存储对象的对应关系,更新所述待重构索引信息。
  10. 一种数据回收方法,其特征在于,包括:
    在所记录的索引信息中,确定待回收索引信息;其中,索引信息包括:每个数据片段与存储该数据片段的存储对象的对应关系,所述存储对象为存储数据的最小单元;
    根据所述待回收索引信息,确定待回收数据片段;
    为所述待回收数据片段分配存储对象;
    将所述待回收数据片段分别存储至为其分配的存储对象中。
  11. 根据权利要求10所述的方法,其特征在于,在将所述待回收数据片段分别存储至为其分配的存储对象中之后,还包括:
    记录每个待回收数据片段与存储所述待回收数据片段的存储对象的对应关系;
    根据所记录的对应关系,生成新的索引信息替换所述待回收索引信息。
  12. 根据权利要求10所述的方法,其特征在于,所述在所记录的索引信息中,确定待回收索引信息的步骤包括:
    针对所记录的每条索引信息,判断其所包含的无效数据片段份数是否大于第四预设阈值,如果是,将其确定为待回收索引信息;其中,所述无效数据片段为:出现故障的存储对象对应的数据片段。
  13. 根据权利要求12所述的方法,其特征在于,所述根据所述待回收索引信息,确定待回收数据片段的步骤包括:
    根据所述待回收索引信息中的有效数据片段,确定待回收数据片段,所述有效数据片段为除所述无效数据片段之外的数据片段。
  14. 一种数据存储装置,其特征在于,包括:
    第一分配模块,用于根据预设的纠删码策略k+m,为待存储数据分配x个存储对象;其中,所述k表示原始片段份数,所述m表示冗余片段份数,所述x大于1且不大于k+m,所述存储对象为存储数据的最小单元;
    切片模块,用于利用所述纠删码策略k+m,对所述待存储数据进行切片及冗余处理,得到k+m个数据片段;
    第一存储模块,用于将所述k+m个数据片段分别存储至所述x个存储对象,其中,各个存储对象中存储的数据片段份数的差值小于第一预设阈值;
    第一记录模块,用于记录所述待存储数据对应的索引信息,所述索引信息包括:每个数据片段与存储该数据片段的存储对象的对应关系。
  15. 根据权利要求14所述的装置,其特征在于,所述第一分配模块,包括:
    第一判断子模块,用于判断可用存储节点的数量是否不小于k+m;
    第一确定子模块,用于当所述第一判断子模块判断结果为是时,在可用存储节点中确定x个存储对象;其中,所确定的每个存储对象所在的存储节点不同;所述x等于所述k+m;
    所述第一存储模块,包括:
    第一存储子模块,用于当所述第一判断子模块判断结果为是时,将每个数据片段分别存储至不同存储节点中的存储对象。
  16. 根据权利要求15所述的装置,其特征在于,所述第一分配模块,包括:
    第二判断子模块,用于当所述第一判断子模块判断结果为否时,判断全部可用存储节点中的可用存储对象的数量是否不小于k+m;
    第二确定子模块,用于当所述第二判断子模块判断结果为是时,在全部可用存储节点中确定x个存储对象;其中,所述x等于所述k+m,每个可用存储节点中确定的存储对象的数量差值小于第二预设阈值;
    所述第一存储模块,包括:
    第二存储子模块,用于当所述第二判断子模块判断结果为是时,将每个数据片段分别存储至不同存储对象。
  17. 根据权利要求16所述的装置,其特征在于,所述第一分配模块,包括:
    分配子模块,用于当所述第二判断子模块判断结果为否时,将全部可用存储对象分配给所述待存储数据;其中,所述x等于全部可用存储对象的数量;
    所述第一存储模块,包括:
    第三存储子模块,用于当所述第二判断子模块判断结果为否时,将所述k+m个数据片段均匀划分为x份,将所述x份数据片段分别存储至所述x个存储对象。
  18. 一种数据分散装置,其特征在于,包括:
    第一确定模块,用于在所记录的索引信息中,确定待分散索引信息;其中,索引信息包括:每个数据片段与存储该数据片段的存储对象的对应关系,所述存储对象为存储数据的最小单元;
    第二确定模块,用于根据所确定的待分散索引信息,确定待分散数据片段;
    第二分配模块,用于为所述待分散数据片段分配分散存储对象;
    第二存储模块,用于将所述待分散数据片段分别存储至所述分散存储对象。
  19. 根据权利要求18所述的装置,其特征在于,所述第一确定模块,具体用于:
    针对每条索引信息,确定其所对应的各个存储对象的记录次数,其中,一条索引信息中一个存储对象的记录次数为:该存储对象中存储的该条索引信息中的数据片段的数量;
    当存在记录次数超过第三预设阈值的目标存储对象时,将该索引信息确定为待分散索引信息;
    所述第二确定模块,具体用于:
    在所述目标存储对象对应的数据片段中,确定待分散数据片段。
  20. 根据权利要求18所述的装置,其特征在于,所述装置还包括:
    第一更新模块,用于根据所述待分散数据片段与存储所述待分散数据片段的分散存储对象的对应关系,更新所述待分散索引信息。
  21. 一种数据重构装置,其特征在于,包括:
    第三确定模块,用于在所记录的索引信息中,确定待重构索引信息;其中,索引信息包括:每个数据片段与存储该数据片段的存储对象的对应关系,所述待重构索引信息中包含出现故障的存储对象的信息,所述存储对象为存储数据的最小单元;
    读取模块,用于根据所述待重构信息中包含的未出现故障的存储对象的信息,从所述未出现故障的存储对象中读取目标数据片段,所述目标数据片段为:所述待重构信息对应的待重构数据的每个数据片段;
    重构模块,用于将所述目标数据片段进行重构,得到修复片段;
    第三存储模块,用于将所述修复片段存储至为其分配的存储对象中。
  22. 根据权利要求21所述的装置,其特征在于,所述装置还包括:
    第二更新模块,根据所述修复片段与存储所述修复片段的存储对象的对应关系,更新所述待重构索引信息。
  23. 一种数据回收装置,其特征在于,包括:
    第四确定模块,用于在所记录的索引信息中,确定待回收索引信息;其中,索引信息包括:每个数据片段与存储该数据片段的存储对象的对应关系,所述存储对象为存储数据的最小单元;
    第五确定模块,用于根据所述待回收索引信息,确定待回收数据片段;
    第三分配模块,用于为所述待回收数据片段分配存储对象;
    第四存储模块,用于将所述待回收数据片段分别存储至为其分配的存储对象中。
  24. 根据权利要求23所述的装置,其特征在于,所述装置还包括:
    第二记录模块,用于记录每个待回收数据片段与存储所述待回收数据片段的存储对象的对应关系;
    替换模块,用于根据所记录的对应关系,生成新的索引信息替换所述待回收索引信息。
  25. 根据权利要求23所述的装置,其特征在于,所述第四确定模块,具体用于:
    针对所记录的每条索引信息,判断其所包含的无效数据片段份数是否大于第四预设阈值,如果是,将其确定为待回收索引信息;其中,所述无效数据片段为:出现故障的存储对象对应的数据片段。
  26. 根据权利要求25所述的装置,其特征在于,所述第五确定模块,具体用于:
    根据所述待回收索引信息中的有效数据片段,确定待回收数据片段,所述有效数据片段为除所述无效数据片段之外的数据片段。
  27. 一种数据处理系统,其特征在于,包括:平台服务器和管理服务器,其中,
    所述平台服务器,根据预设的纠删码策略k+m,为待存储数据分配x个存储对象;其中,所述k表示原始片段份数,所述m表示冗余片段份数,所述x大于1且不大于k+m,所述存储对象为存储数据的最小单元;
    所述管理服务器,利用所述纠删码策略k+m,对所述待存储数据进行切片及冗余处理,得到k+m个数据片段;将所述k+m个数据片段分别存储至所述x个存储对象,其中,各个存储对象中存储的数据片段份数的差值小于第四预设阈值;
    所述平台服务器,记录所述待存储数据对应的索引信息,所述索引信息包括:每个数据片段与存储该数据片段的存储对象的对应关系。
  28. 根据权利要求27所述的系统,其特征在于,
    所述平台服务器,确定待读取数据对应的索引信息;
    所述管理服务器,根据所述平台服务器所确定的索引信息,从存储对象中读取所述待读取数据的每个数据片段;将所读取的每个数据片段进行组合,得到所述待读取数据。
  29. 根据权利要求27所述的系统,其特征在于,
    所述平台服务器,在所记录的索引信息中,确定待分散索引信息;
    所述管理服务器,根据所确定的待分散索引信息,确定待分散数据片段;为所述待分散数据片段分配分散存储对象;将所述待分散数据片段分别存储至所述分散存储对象;
    所述平台服务器,更新所述待分散索引信息;
    或者,所述系统还包括审计服务器,
    所述平台服务器,在所记录的索引信息中,确定待分散索引信息;
    所述审计服务器,根据所确定的待分散索引信息,确定待分散数据片段;为所述待分散数据片段分配分散存储对象;将所述待分散数据片段分别存储至所述分散存储对象;
    所述平台服务器,更新所述待分散索引信息。
  30. 根据权利要求27所述的系统,其特征在于,
    所述平台服务器,在所记录的索引信息中,确定待重构索引信息;其中,所述待重构索引信息中包含出现故障的存储对象的信息;
    所述管理服务器,根据所述待重构信息中包含的未出现故障的存储对象的信息,从所述未出现故障的存储对象中读取目标数据片段,所述目标数据片段为:所述待重构信息对应的待重构数据的每个数据片段;将所述目标数据片段进行重构,得到修复片段;将所述修复片段存储至为其分配的存储对象中;
    所述平台服务器,更新所述待重构索引信息;
    或者,所述系统还包括审计服务器,
    所述平台服务器,在所记录的索引信息中,确定待重构索引信息;其中,所述待重构索引信息中包含出现故障的存储对象的信息;
    所述审计服务器,根据所述待重构信息中包含的未出现故障的存储对象的信息,从所述未出现故障的存储对象中读取目标数据片段,所述目标数据片段为:所述待重构信息对应的待重构数据的每个数据片段;将所述目标数据片段进行重构,得到修复片段;将所述修复片段存储至为其分配的存储对象中;
    所述平台服务器,更新所述待重构索引信息。
  31. 根据权利要求27所述的系统,其特征在于,
    所述平台服务器,在所记录的索引信息中,确定待回收索引信息;
    所述管理服务器,根据所述待回收索引信息,确定待回收数据片段;
    所述平台服务器,为所述待回收数据片段分配存储对象;
    所述管理服务器,将所述待回收数据片段分别存储至为其分配的存储对象中;
    所述平台服务器,记录每个待回收数据片段与存储所述待回收数据片段的存储对象的对应关系;根据所记录的对应关系,生成新的索引信息替换所述待回收索引信息;
    或者,所述系统还包括审计服务器,
    所述平台服务器,在所记录的索引信息中,确定待回收索引信息;
    所述审计服务器,根据所述待回收索引信息,确定待回收数据片段;
    所述平台服务器,为所述待回收数据片段分配存储对象;
    所述审计服务器,将所述待回收数据片段分别存储至为其分配的存储对象中;
    所述平台服务器,记录每个待回收数据片段与存储所述待回收数据片段的存储对象的对应关系;根据所记录的对应关系,生成新的索引信息替换所述待回收索引信息。
  32. 根据权利要求27-31任一项所述的系统,其特征在于,还包括:存储服务器,所述存储服务器包含多个存储对象;
    所述存储服务器,向所述平台服务器上报自身多个存储对象的运行状态信息,以使所述平台服务器根据每个存储服务器上报的运行状态信息为待存储数据分配存储对象、以及为待分散数据片段分配分散存储对象、以及确定待重构索引信息。
PCT/CN2018/079277 2017-03-17 2018-03-16 数据存储、分散、重构、回收方法、装置及数据处理系统 WO2018166526A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
US16/495,042 US11010072B2 (en) 2017-03-17 2018-03-16 Data storage, distribution, reconstruction and recovery methods and devices, and data processing system
EP18768099.6A EP3598289B1 (en) 2017-03-17 2018-03-16 Data storage, distribution, reconstruction and recovery methods and devices, and data processing system

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710162001.X 2017-03-17
CN201710162001.XA CN108628539B (zh) 2017-03-17 2017-03-17 数据存储、分散、重构、回收方法、装置及数据处理系统

Publications (1)

Publication Number Publication Date
WO2018166526A1 true WO2018166526A1 (zh) 2018-09-20

Family

ID=63521721

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/079277 WO2018166526A1 (zh) 2017-03-17 2018-03-16 数据存储、分散、重构、回收方法、装置及数据处理系统

Country Status (4)

Country Link
US (1) US11010072B2 (zh)
EP (1) EP3598289B1 (zh)
CN (1) CN108628539B (zh)
WO (1) WO2018166526A1 (zh)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108664223B (zh) * 2018-05-18 2021-07-02 百度在线网络技术(北京)有限公司 一种分布式存储方法、装置、计算机设备及存储介质
GB201905209D0 * 2019-04-12 2019-05-29 Vaion Ltd Method of storing items of data
CN112653539B (zh) * 2020-12-29 2023-06-20 杭州趣链科技有限公司 一种待存储数据的存储方法、装置以及设备
CN115113819A (zh) * 2022-06-29 2022-09-27 京东方科技集团股份有限公司 一种数据存储的方法、单节点服务器及设备

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101175011A (zh) * 2007-11-02 2008-05-07 南京大学 基于dht的p2p系统中获得高可用数据冗余的方法
US20120060072A1 (en) * 2010-09-08 2012-03-08 Microsoft Corporation Erasure coding immutable data
CN103631539A (zh) * 2013-12-13 2014-03-12 百度在线网络技术(北京)有限公司 基于擦除编码机制的分布式存储系统及其存储方法
CN105159603A (zh) * 2015-08-18 2015-12-16 福建省海峡信息技术有限公司 一种分布式数据存储系统的修复方法
CN105630423A (zh) * 2015-12-25 2016-06-01 华中科技大学 一种基于数据缓存的纠删码集群存储扩容方法
CN105760116A (zh) * 2016-03-10 2016-07-13 天津科技大学 一种多网盘下的增量纠删码存储方法及系统

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8103904B2 (en) * 2010-02-22 2012-01-24 International Business Machines Corporation Read-other protocol for maintaining parity coherency in a write-back distributed redundancy data storage system
CN102279777B (zh) * 2011-08-18 2014-09-03 华为数字技术(成都)有限公司 数据冗余处理方法、装置和分布式存储系统
CN103136215A (zh) * 2011-11-24 2013-06-05 腾讯科技(深圳)有限公司 存储系统的数据读写方法和装置
CN103257831B (zh) * 2012-02-20 2016-12-07 深圳市腾讯计算机系统有限公司 存储器的读写控制方法及对应的存储器
US8799746B2 (en) 2012-06-13 2014-08-05 Caringo, Inc. Erasure coding and replication in storage clusters
US10019323B1 (en) * 2014-03-25 2018-07-10 EMC IP Holding Company LLC Method and system for container data recovery in a storage system
US9665428B2 (en) * 2015-02-05 2017-05-30 Netapp, Inc. Distributing erasure-coded fragments in a geo-distributed storage system
US9672905B1 (en) * 2016-07-22 2017-06-06 Pure Storage, Inc. Optimize data protection layouts based on distributed flash wear leveling

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101175011A (zh) * 2007-11-02 2008-05-07 南京大学 基于dht的p2p系统中获得高可用数据冗余的方法
US20120060072A1 (en) * 2010-09-08 2012-03-08 Microsoft Corporation Erasure coding immutable data
CN103631539A (zh) * 2013-12-13 2014-03-12 百度在线网络技术(北京)有限公司 基于擦除编码机制的分布式存储系统及其存储方法
CN105159603A (zh) * 2015-08-18 2015-12-16 福建省海峡信息技术有限公司 一种分布式数据存储系统的修复方法
CN105630423A (zh) * 2015-12-25 2016-06-01 华中科技大学 一种基于数据缓存的纠删码集群存储扩容方法
CN105760116A (zh) * 2016-03-10 2016-07-13 天津科技大学 一种多网盘下的增量纠删码存储方法及系统

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP3598289A4

Also Published As

Publication number Publication date
EP3598289B1 (en) 2023-06-07
CN108628539A (zh) 2018-10-09
US20200097199A1 (en) 2020-03-26
US11010072B2 (en) 2021-05-18
EP3598289A1 (en) 2020-01-22
CN108628539B (zh) 2021-03-26
EP3598289A4 (en) 2020-03-25

Similar Documents

Publication Publication Date Title
US10795789B2 (en) Efficient recovery of erasure coded data
WO2018166526A1 (zh) 数据存储、分散、重构、回收方法、装置及数据处理系统
US10114580B1 (en) Data backup management on distributed storage systems
CN106776130B (zh) 一种日志恢复方法、存储装置和存储节点
US8433947B2 (en) Computer program, method, and apparatus for controlling data allocation
US7761431B2 (en) Consolidating session information for a cluster of sessions in a coupled session environment
CN109582213B (zh) 数据重构方法及装置、数据存储系统
CN113176858B (zh) 数据处理方法、存储系统及存储设备
CN110018783B (zh) 一种数据存储方法、装置及系统
US11449402B2 (en) Handling of offline storage disk
CN109726036B (zh) 一种存储系统中的数据重构方法和装置
US11003554B2 (en) RAID schema for providing metadata protection in a data storage system
CN105404565B (zh) 一种双活数据保护方法和装置
CN111124264A (zh) 用于重建数据的方法、设备和计算机程序产品
US10936453B2 (en) Data storage systems using elastic spares
CN104486438A (zh) 分布式存储系统的容灾方法及装置
US20200285401A1 (en) Resiliency schemes for distributed storage systems
US10346610B1 (en) Data protection object store
CN112256204B (zh) 存储资源分配方法、装置、存储节点及存储介质
CN110825552B (zh) 数据存储方法、数据恢复方法、节点及存储介质
CN114721585A (zh) 存储管理方法、设备和计算机程序产品
KR100994342B1 (ko) 분산 파일 시스템 및 복제본 기반 장애 처리 방법
CN116069545A (zh) 一种用于数据异步冗余编码的方法、装置及设备
CN108153606A (zh) 一种无冗余保护集群实现前端业务连续性方法
JP7491545B2 (ja) 情報処理方法

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 18768099; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
WWE Wipo information: entry into national phase (Ref document number: 2018768099; Country of ref document: EP)
ENP Entry into the national phase (Ref document number: 2018768099; Country of ref document: EP; Effective date: 20191017)