WO2015165351A1 - Data storage method and device

Data storage method and device

Info

Publication number
WO2015165351A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
storage device
value
storage
daily
Prior art date
Application number
PCT/CN2015/077214
Other languages
English (en)
French (fr)
Inventor
岳银亮
熊劲
Original Assignee
华为技术有限公司
Priority date
Filing date
Publication date
Application filed by 华为技术有限公司
Publication of WO2015165351A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/16 Protection against loss of memory contents

Definitions

  • The present invention relates to the field of computer and storage technologies, and in particular, to a data storage method and device.
  • Data has different input/output (I/O) access characteristics at different stages of its lifecycle from generation to death. For example, in the data generation phase, data needs to be written to the storage system at high speed; in the analysis phase, the data needs to be read or scanned at high speed to participate in the calculation.
  • RAID10 (Redundant Arrays of Independent Disks) is a commonly used disk array that includes a set of primary disks and a set of mirror disks. RAID10 divides one copy of the data into multiple parts and stores them on multiple primary disks to improve read and write performance; the other copy is stored on the corresponding mirror disks to improve reliability.
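  • The placement just described can be illustrated with a small sketch. It is only a toy model: the function name `raid10_place` and the round-robin striping rule are assumptions made for illustration, and in-memory lists stand in for disks.

```python
def raid10_place(chunks, n_pairs):
    """Place each chunk on one primary disk and on that disk's mirror.

    Returns (primaries, mirrors); primaries[i] and mirrors[i] list the
    chunks held by primary disk i and by its mirror disk i.
    """
    primaries = [[] for _ in range(n_pairs)]
    mirrors = [[] for _ in range(n_pairs)]
    for index, chunk in enumerate(chunks):
        pair = index % n_pairs          # assumed round-robin striping rule
        primaries[pair].append(chunk)   # one copy, spread over the primary disks
        mirrors[pair].append(chunk)     # the other copy, on the matching mirror disk
    return primaries, mirrors


primaries, mirrors = raid10_place(["a", "b", "c", "d"], n_pairs=2)
```

  • Spreading the chunks over several primary disks is what allows parallel reads and writes, while the mirror copy provides the redundancy.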
  • RoRo, a rotating log architecture, is used to reduce the energy consumption of a RAID10 disk array.
  • The rotating log architecture consolidates the free space of multiple mirror disks into a logical pool of log space resources. By mining spatial time slices for decentralized synchronization, the log space resource pool can be recycled to improve system performance and energy efficiency.
  • Multiple mirror disks serve in turn as the value date log disk, while the log disks that are not currently the value date log disk are switched to a low-power state, thereby reducing power consumption.
  • The current disk array storage system with a rotating log architecture has the following drawbacks. An existing disk array storage system virtualizes multiple physical disks into one virtual disk, and the file system is built on top of the virtual disk; that is, there is no file system on any single disk, so only one and the same data organization can be used to write data to all disks. This data organization is either write-optimized to improve write performance or read-optimized to improve read performance, but it cannot deliver both read and write performance at the same time.
  • The embodiment of the invention provides a data storage method and device, so as to solve, to a certain extent, the technical problem that a storage system using the existing data copy mode cannot balance read and write performance.
  • A first aspect of the present invention provides a data storage method for a storage system, where the storage system includes a primary storage subsystem and a standby storage subsystem; the primary storage subsystem includes N primary storage devices, and the standby storage subsystem includes N standby storage devices corresponding to the N primary storage devices; N is a positive integer greater than 1; and a file system is created on each primary storage device and each standby storage device. The method includes: writing the first copy of the data to be stored to the plurality of primary storage devices in the primary storage subsystem in a first data organization manner, and writing the second copy of the data to be stored to the value daily storage device in the standby storage subsystem in a second data organization manner, wherein the value daily storage device is the only standby storage device in the working state; and reading the data that is stored in the primary storage device corresponding to the value daily storage device and that is inconsistent with the value daily storage device, and writing the read inconsistent data into the value daily storage device in a third data organization manner; wherein the first data organization manner and the second data organization manner are write-optimized, the writing speed of the second data organization manner is faster than that of the first data organization manner, and the third data organization manner is read-optimized.
  • The method further includes: determining whether the occupancy of the value day log space of the value daily storage device reaches a preset value; and if the occupancy reaches the preset value, switching the value daily storage device to the sleep state and switching another standby storage device to the working state to serve as the value daily storage device.
  • The first data organization manner is the log-structured merge (LSM) manner; the second data organization manner is the log-structured file system (LFS) manner; and the third data organization manner is the B+ tree manner.
  • Before the first copy of the data to be stored is written into the primary storage subsystem in the first data organization manner, the method further includes: creating a key value storage system LevelDB on each primary storage device; and dividing each standby storage device into two storage areas, wherein an LFS is created in the first storage area and a key value storage system BDB is created in the second storage area.
  • Writing the first copy of the data to be stored into the plurality of primary storage devices in the primary storage subsystem in the first data organization manner includes: dividing the first copy of the data to be stored into a plurality of parts, and writing the plurality of parts to the LevelDB of the plurality of primary storage devices. Writing the second copy of the data to be stored to the value daily storage device in the standby storage subsystem in the second data organization manner includes: writing the second copy of the data to be stored into the LFS of the value daily storage device. Writing the read inconsistent data to the value daily storage device in the third data organization manner includes: writing the read inconsistent data into the BDB of the value daily storage device.
  • the storage system is a disk array or a node array.
  • A second aspect of the present invention provides a data storage device for a storage system, where the storage system includes a primary storage subsystem and a standby storage subsystem; the primary storage subsystem includes N primary storage devices, and the standby storage subsystem includes N standby storage devices corresponding to the N primary storage devices; N is a positive integer greater than 1; and a file system is created on each primary storage device and each standby storage device. The device includes: a first read/write module, configured to write the first copy of the data to be stored into the plurality of primary storage devices in the primary storage subsystem in a first data organization manner; a second read/write module, configured to write the second copy of the data to be stored to the value daily storage device in the standby storage subsystem in a second data organization manner, wherein the value daily storage device is the only standby storage device in the working state; and a third read/write module, configured to read the data that is stored in the primary storage device corresponding to the value daily storage device and that is inconsistent with the value daily storage device, and to write the read inconsistent data into the value daily storage device in a third data organization manner.
  • The device further includes: a storage device monitoring module, configured to determine whether the occupancy of the value day log space of the value daily storage device reaches a preset value, and, if the occupancy reaches the preset value, to switch the value daily storage device to the sleep state and switch another standby storage device to the working state to serve as the value daily storage device.
  • The device further includes: a creating module, configured to create a key value storage system LevelDB on each primary storage device, and to divide each standby storage device into two storage areas, wherein an LFS is created in the first storage area and a key value storage system BDB is created in the second storage area.
  • The first read/write module is specifically configured to divide the first copy of the data to be stored into multiple parts and write the multiple parts to the LevelDB of the plurality of primary storage devices; the second read/write module is specifically configured to write the second copy of the data to be stored into the LFS of the value daily storage device; and the third read/write module is specifically configured to write the read inconsistent data into the BDB of the value daily storage device.
  • A third aspect of the present invention provides a computer device, which may include: a processor, a memory, a communication interface, and a bus; the processor, the memory, and the communication interface communicate with each other through the bus; the memory includes a primary storage subsystem and a standby storage subsystem; the primary storage subsystem includes N primary storage devices, and the standby storage subsystem includes N standby storage devices corresponding to the N primary storage devices; N is a positive integer greater than 1; and a file system is created on each primary storage device and each standby storage device. The processor is configured to: write the first copy of the data to be stored into the plurality of primary storage devices in the primary storage subsystem in a first data organization manner; write the second copy of the data to be stored to the value daily storage device in the standby storage subsystem in a second data organization manner, where the value daily storage device is the only standby storage device in the working state; and read the data that is stored in the primary storage device corresponding to the value daily storage device and that is inconsistent with the value daily storage device, and write the read inconsistent data into the value daily storage device in a third data organization manner; wherein the first data organization manner and the second data organization manner are write-optimized, the writing speed of the second data organization manner is faster than that of the first data organization manner, and the third data organization manner is read-optimized.
  • By creating a file system on each primary storage device and each standby storage device, writing the first copy of the data to be stored into the primary storage subsystem in the first data organization manner, writing the second copy of the data to be stored into the value daily storage device in the standby storage subsystem in the second data organization manner, and reading the data that is stored in the primary storage device corresponding to the value daily storage device and that is inconsistent with the value daily storage device and writing the read inconsistent data into the value daily storage device in a third data organization manner, the embodiment of the present invention achieves the following technical effects:
  • A file system is created on each primary storage device and each standby storage device, and the data is written into the primary storage devices and the standby storage devices in different data organization manners, which can satisfy various applications and multiple load types;
  • The first and second data organization manners are write-optimized, which can improve the writing speed of data;
  • The third data organization manner is read-optimized, so that most of the data finally kept on the standby storage devices exists in the third data organization manner and offers high read performance to meet the needs of the data analysis phase; the entire storage system thus delivers both read and write performance.
  • The writing speed of the second data organization manner is faster than that of the first data organization manner, so a write bottleneck on the value daily storage device can be avoided.
  • FIG. 1 is a schematic diagram of a data storage method provided by the present invention.
  • FIG. 2 is a schematic diagram of another data storage method provided by the present invention.
  • FIG. 3 is a schematic diagram of a disk array RAID 10 in an embodiment of the present invention.
  • FIG. 4 is a schematic diagram of a data storage operation using the RAID 10 as an example in the method of the embodiment of the present invention
  • FIG. 5 is a schematic diagram of a data storage device provided by the present invention.
  • FIG. 6 is a schematic diagram of another data storage device provided by the present invention.
  • FIG. 7 is a schematic diagram of a computer device provided by the present invention.
  • the currently used disk array storage system with a rotating log architecture has the following drawbacks:
  • the file system is built on the virtual disk, and the same data organization is used to write data to all the disks.
  • The data organization is either write-optimized to improve write performance or read-optimized to improve read performance, but not both.
  • One copy of the data is divided into multiple parts and written to multiple primary disks, so its writing speed is fast; the other copy of the data, however, is written only to the single value date log disk, whose slower write speed fails to keep up with the write speed of the primary disks. This creates a bottleneck that affects the performance of the entire disk array.
  • The embodiment of the invention provides a data storage method and device, so as to solve the following problems of a storage system using the existing data copy mode: because it is implemented at the block level, reconstruction is slow and the characteristics of the data structure layer cannot be reflected; because the same data organization is used to write data to all disks, read and write performance cannot both be achieved; and a write bottleneck can occur on the value date log disk.
  • an embodiment of the present invention provides a data storage method.
  • The method is applied to a storage system. The storage system comprises a primary storage subsystem and a standby storage subsystem; the primary storage subsystem comprises N primary storage devices, and the standby storage subsystem comprises N standby storage devices corresponding to the N primary storage devices; N is a positive integer greater than one.
  • the storage system may be a disk array or a node array, and the primary and backup storage devices may be disks or nodes.
  • a file system is created on each primary storage device and each standby storage device; file systems created on different storage devices may be the same or different.
  • the method of the embodiment of the present invention may include:
  • Each piece of data to be stored is duplicated into two copies, wherein the first copy is written to the primary storage subsystem and the second copy is written to the standby storage subsystem.
  • The N primary storage devices of the primary storage subsystem can all be kept in operation; the first copy is divided into N parts, and each part is written into one primary storage device.
  • During any period of time, only one of the N standby storage devices of the standby storage subsystem may be kept in the working state to serve as the value daily storage device, and the second copy of the data is written into the value daily storage device; the other standby storage devices are kept in a low-energy state such as sleep or standby to reduce energy consumption.
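  • The write path described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the class name `StorageSystem`, the in-memory lists standing in for devices, and the round-robin rule used to split the first copy are all hypothetical.

```python
class StorageSystem:
    """Toy model: N primary devices plus N standby devices, one on duty."""

    def __init__(self, n):
        if n < 2:
            raise ValueError("N must be a positive integer greater than 1")
        self.primaries = [[] for _ in range(n)]  # hold the parts of the first copy
        self.standbys = [[] for _ in range(n)]   # hold second copies (log space)
        self.on_duty = 0                         # index of the value daily device

    def write(self, records):
        # First copy: split into N parts, one part per primary device.
        # Round-robin splitting is an assumption made for this sketch.
        n = len(self.primaries)
        for i, record in enumerate(records):
            self.primaries[i % n].append(record)
        # Second copy: written whole to the single standby device in the
        # working state; the other standby devices are not touched.
        self.standbys[self.on_duty].extend(records)


system = StorageSystem(3)
system.write(["r0", "r1", "r2", "r3"])
```

  • After the call, each primary device holds only its share of the records, whereas the value daily device holds all of them, which is why its write speed matters for the system as a whole.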
  • The first data organization mode and the second data organization mode are write-optimized; and, in order to improve the read performance of the data in the analysis stage, the third data organization mode is read-optimized, so that the entire storage system delivers both read and write performance.
  • The second data organization mode can write faster than the first data organization mode. The first copy of the data to be stored is written to multiple primary storage devices at the same time in the first data organization mode, which improves its writing speed; the second copy is written to the single value daily storage device, but in the second data organization mode, which is faster than the first. The two effects balance each other, so that the two copies of the data can be written to the primary and standby storage subsystems simultaneously or nearly simultaneously, improving the write performance of the entire storage system.
  • A file system is established on each storage device, so the data written to a storage device exists in a definite organization form and can reflect the characteristics of the data structure layer.
  • The embodiment of the present invention provides a data storage method that adopts the above technical features and achieves the following technical effects. A file system is created on each primary storage device and each standby storage device, and the data is written into the primary storage devices and the standby storage devices in different data organization modes, which can satisfy various applications and multiple load types. The first and second data organization modes are write-optimized, which can improve the data writing speed.
  • The third data organization mode is read-optimized, so that most of the data finally kept on the standby storage devices exists in the third data organization mode and offers higher read performance to meet the needs of the data analysis phase; the entire storage system thus delivers both read and write performance.
  • The write speed of the second data organization mode is faster than that of the first data organization mode, which can avoid a write bottleneck on the value daily storage device.
  • The method further includes: determining whether the occupancy of the value day log space of the value daily storage device reaches a preset value.
  • Each standby storage device is a mirror storage device of the corresponding primary storage device. The free storage space on all the standby storage devices is regarded as log space, and the log space provided by the value daily storage device is called the value day log space. The usage of the value day log space of the value daily storage device can be monitored in real time to determine whether its occupancy reaches the preset value.
  • If the occupancy has not reached the preset value, the process proceeds to step 110, and the two copies of the data are written into the primary storage subsystem and the value daily storage device, respectively.
  • If the occupancy reaches the preset value, the value daily storage device is switched to the sleep state, and another standby storage device is switched to the working state to serve as the value daily storage device.
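  • The occupancy check and the switch can be sketched as a single decision function. The name `next_on_duty`, the default threshold of 0.8, and the choice of the next device in round-robin order are assumptions made for illustration; the text only requires that some other standby storage device take over once the preset value is reached.

```python
def next_on_duty(on_duty, used_bytes, capacity_bytes, n_devices, threshold=0.8):
    """Return the index of the standby device that should be on duty next.

    If the occupancy of the current device's log space has reached the
    preset value, the next device in round-robin order takes over;
    otherwise the current device stays on duty.
    """
    if capacity_bytes <= 0:
        raise ValueError("capacity_bytes must be positive")
    occupancy = used_bytes / capacity_bytes
    if occupancy >= threshold:
        return (on_duty + 1) % n_devices  # assumed rotation order
    return on_duty
```

  • For example, with three standby devices, `next_on_duty(0, 50, 100, 3)` keeps device 0 on duty, while `next_on_duty(0, 90, 100, 3)` hands over to device 1.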
  • When the value daily storage device is switched, a synchronization process can be triggered; the synchronization process synchronizes the data in the corresponding primary storage device to the value daily storage device. That is, in some embodiments, step 120 may be triggered when the value daily storage device is switched: the data that is stored in the primary storage device corresponding to the value daily storage device and that is inconsistent with the value daily storage device is read, and the read inconsistent data is written into the value daily storage device in a third data organization manner.
  • The synchronization operation can be performed in the background of the computer device, using the free bandwidth and free space of the storage device to synchronize the read inconsistent data into the value daily storage device, thereby not consuming additional energy.
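  • A minimal sketch of such a synchronization pass follows. It assumes that both sides can be viewed as key-to-value mappings; the plain dictionaries stand in for the primary device's store and for the read-optimized area of the value daily storage device, and the function name `synchronize` is hypothetical. Handling of deleted keys is left out.

```python
_MISSING = object()  # sentinel so that a stored value of None is handled correctly


def synchronize(primary, standby_index):
    """Bring the standby index up to date with the primary.

    Reads every entry of `primary` whose value differs from (or is absent
    in) `standby_index`, writes it to `standby_index`, and returns the
    list of keys that were inconsistent.
    """
    stale = [key for key, value in primary.items()
             if standby_index.get(key, _MISSING) != value]
    for key in stale:
        standby_index[key] = primary[key]
    return stale


primary = {"k1": "v1", "k2": "v2-new", "k3": "v3"}
standby = {"k1": "v1", "k2": "v2-old"}
stale_keys = synchronize(primary, standby)
```

  • Only the entries that differ are read and rewritten, which is what keeps the background work proportional to the amount of inconsistent data.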
  • In order to improve write performance, the first and second data organization modes used in the above data writing may be data organization modes with higher write performance. In order to improve the performance of reading data, the third data organization mode may be a data organization mode with higher read performance, so that the standby storage devices can subsequently provide higher read performance; and since the synchronization operation is performed in the background, the write performance of the third data organization mode can be ignored.
  • The first data organization mode may be an ordered log structure, such as the Log Structured Merge (LSM) mode; the second data organization mode may be an unordered log structure, such as the Log Structured File System (LFS) mode; and the third data organization mode may be a local update index structure, such as the B+ tree mode.
  • LSM and LFS can provide better write performance, and the B+ tree can provide better read performance.
  • The LFS write speed is faster than the LSM write speed.
  • Each primary storage device and each standby storage device may be formatted into an arbitrary file system.
  • An LSM system, for example the key value storage system LevelDB, is created on each primary storage device. Each standby storage device is divided into two storage areas: an LFS is created in the first storage area, and a B+ tree system, for example the key value storage system BDB, is created in the second storage area.
  • Writing the first copy of the data to be stored to the primary storage subsystem in the first data organization manner may include: dividing the first copy of the data to be stored into multiple parts, and writing the multiple parts into the LevelDB of the respective primary storage devices of the primary storage subsystem.
  • Writing the second copy of the data to be stored to the value daily storage device in the standby storage subsystem in the second data organization manner may include: writing the second copy of the data to be stored into the LFS of the value daily storage device.
  • Writing the read inconsistent data to the value daily storage device in a third data organization manner may include: writing the read inconsistent data into the BDB of the value daily storage device.
  • An LFS is created in the first storage area, which is used for data writing; a BDB is created in the second storage area, which is used for data synchronization. Because the standby storage devices take turns serving as the value daily storage device, the copy data written in the LFS manner in the first storage area is only temporary: it is continuously written and continuously released, so the first storage area does not need to be large. The second storage area stores the data synchronized from the primary storage device; this data needs to be kept for a long time, is written continuously, and is generally not released. Therefore, the second storage area requires a large storage space and can occupy most of the space of the storage device.
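  • The roles of the two storage areas can be modeled roughly as below. The class `StandbyDevice` and its method names are hypothetical; a list stands in for the LFS area and a dictionary for the BDB area, so this shows only the life cycle of the data, not the on-disk formats.

```python
class StandbyDevice:
    """Toy standby device with a small log area and a large index area."""

    def __init__(self):
        self.log_area = []    # stands in for the LFS area: temporary copies
        self.index_area = {}  # stands in for the BDB area: long-term data

    def append_log(self, key, value):
        # While on duty, second copies are appended in arrival order.
        self.log_area.append((key, value))

    def store_synced(self, key, value):
        # Data synchronized from the primary device is kept long term.
        self.index_area[key] = value

    def release_log(self):
        # Logged copies that are no longer needed are released;
        # returns how many entries were freed.
        released = len(self.log_area)
        self.log_area.clear()
        return released


device = StandbyDevice()
device.append_log("a", 1)
device.append_log("b", 2)
device.store_synced("a", 1)
freed = device.release_log()
```

  • Because the log area is emptied repeatedly while the index area only grows, the index area is the one that needs most of the device's capacity.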
  • All the data on the primary storage devices is stored in the first data organization mode, which not only reflects the characteristics of the data structure layer but also has high write performance. The copy of the data that needs to be kept on the standby storage devices for a long time is stored in the third data organization mode, which also reflects the characteristics of the data structure layer and provides higher read performance. Thus, when subsequent applications need to read data, they can read it from the standby storage devices at a higher read speed.
  • the method of the embodiment of the invention is preferably applicable to a storage system oriented to a cloud storage environment.
  • The embodiment of the present invention discloses a data storage method in which a file system is created on each primary storage device and each standby storage device; the first copy of the data to be stored is written into the primary storage subsystem in the first data organization manner; the second copy of the data to be stored is written in the second data organization manner to the value daily storage device in the standby storage subsystem; and the data that is stored in the primary storage device corresponding to the value daily storage device and that is inconsistent with the value daily storage device is read, and the read inconsistent data is written into the value daily storage device in a third data organization manner.
  • A file system is created on each of the primary storage devices and each of the standby storage devices. The data is stored in the primary and standby storage devices in a definite data organization manner, which reflects the characteristics of the data structure layer and improves the reconstruction speed. If a primary storage device fails and its data is recovered from the corresponding standby storage device, the data organization directly indicates which data needs to be recovered, so it is not necessary to identify each data block, and reconstruction is therefore fast.
  • the data is written into the main storage device and the standby storage device in different data organization manners, which can satisfy various applications and multiple load types.
  • the first and second data organization methods are write-optimized to improve the speed of data writing;
  • The third data organization manner is read-optimized, so that most of the data finally kept on the standby storage devices exists in the third data organization manner and offers high read performance to meet the needs of the data analysis phase.
  • the second data organization mode write speed is faster than the first data organization mode write speed, and the write bottleneck on the value daily storage device can be avoided.
  • The data that is stored in the primary storage device corresponding to the value daily storage device and that is inconsistent with the value daily storage device is synchronized to the value daily storage device in the background by using the free bandwidth and free space of the storage device, so the conversion of the data organization is achieved without consuming additional energy.
  • The data is written into the primary storage devices and the standby storage devices in a definite data organization manner. Because this is implemented in the data structure layer, the implementation is flexible: it can be built on block devices or on nodes. For example, it can be used for disk arrays as well as node arrays, and it can be implemented in a single-node multi-disk environment as well as in a multi-node distributed storage environment.
  • the storage system is a disk array RAID 10 as an example.
  • RAID 10 includes a set of primary disks and a set of spare disks (or mirror disks).
  • a set of N primary disks constitutes a primary storage subsystem (or a primary disk group), and a set of N mirrored disks constitute a standby storage subsystem (or a standby disk group or a mirror disk group).
  • All mirror disks are regarded as log disks, and the free storage space on all mirror disks is regarded as the log space provided by the log disks. Only one log disk is kept active (that is, in the working state) during any period of time to respond to write operation requests; the mirror disk that remains active is called the value date log disk, and the log space provided by the value date log disk is called the value day log space.
  • the embodiment of the scenario includes the following steps:
  • Receiving a key value request: a key value request is received from an application. The key value request carries the data to be stored (that is, key value data), and may specifically be a write (PUT), a read (GET), a delete (DELETE), and so on.
  • Redirecting key value data: the received key value request is redirected to the destination disks, that is, the corresponding primary disk and the value date log disk.
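  • The receiving and redirection steps can be sketched as a small routing function. Everything beyond the PUT, GET and DELETE operation names is an assumption made for illustration: the function name `redirect`, the `(operation, key)` request shape, and the `key_to_pair` helper that maps a key to a mirrored disk pair are hypothetical.

```python
def redirect(request, key_to_pair, on_duty):
    """Return the destination disks for one key value request.

    `request` is an (operation, key) tuple; `key_to_pair` maps a key to
    the index of its mirrored disk pair; `on_duty` is the index of the
    mirror disk currently serving as the log disk.
    """
    operation, key = request
    if operation not in ("PUT", "GET", "DELETE"):
        raise ValueError(f"unsupported operation: {operation}")
    pair = key_to_pair(key)
    return {"primary": f"P{pair}", "log": f"M{on_duty}"}


destination = redirect(("PUT", 7), key_to_pair=lambda key: key % 3, on_duty=0)
```

  • The primary disk depends on the key, while the log disk depends only on which mirror disk is currently on duty.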
  • Switching the value date log disk: the current value date log disk is switched from the active working state to the hibernation or standby state, and another log disk is switched to the working state; meanwhile, the switching of the value date log disk triggers a synchronization process.
  • In the synchronization process, the data that is stored in the primary disk corresponding to the value date log disk and that is inconsistent with the value date log disk is read, and the read inconsistent data is written to the value date log disk in the B+ tree manner.
  • The disk array RAID 10 includes six disks: three primary disks, denoted P0, P1, and P2, and three mirror disks corresponding to the three primary disks, denoted M0, M1, and M2. Each pair of corresponding disks forms a mirrored disk pair.
  • the RAID 10 includes three mirror disk pairs, which are represented by (P0, M0), (P1, M1) and (P2, M2) respectively.
  • Each cylinder represents a disk; the black shaded portion of a cylinder represents the storage space already occupied on the disk, and the white portion represents the storage space that has not yet been occupied.
  • each of the three mirrored disks M0, M1, and M2 has 50% of free storage space, that is, 50% of the log space.
  • The three mirror disks M0, M1, and M2, connected by the arrowed curve, are used as log disks, and the free storage space on the three log disks, shown with scatter and twill shading, serves as the log space.
  • The scatter and twill sections connected by the arrowed curve represent the log space made up of the free storage space of all three mirror disks.
  • The mirror disk with the scatter shading is the value date log disk, and the disks with the twill shading are the log disks that are not currently the value date log disk.
  • M0, M1 and M2 serve in turn as the value date log disk: in the 0th log cycle, M0 is the value date log disk; in the first log cycle, M1 is the value date log disk; in the second log cycle, M2 is the value date log disk; in the third log cycle, M0 is again the value date log disk; and so on.
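  • The rotation above amounts to taking the log cycle number modulo the number of mirror disks. A minimal sketch, in which the function name `on_duty_disk` is hypothetical:

```python
def on_duty_disk(cycle, mirrors=("M0", "M1", "M2")):
    """Return the mirror disk that serves as the log disk in a given log cycle."""
    return mirrors[cycle % len(mirrors)]


schedule = [on_duty_disk(cycle) for cycle in range(5)]
```

  • Here `schedule` lists the log disk for cycles 0 to 4, so M0 comes back on duty in the third cycle, as described.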
  • The key space of the key value data is divided into equal-length key ranges (Key Range, KR), which are labeled KR1, KR2, KR3, KR4, ..., KRi, respectively.
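  • For illustration only, dividing a key space into equal-length ranges can be sketched as below. The text does not state the key type or the range length, so non-negative integer keys, the parameter `range_length`, and the function name `key_range_index` are assumptions.

```python
def key_range_index(key, range_length):
    """Return i such that a non-negative integer key falls in key range KRi.

    Ranges are numbered from 1: KR1 covers keys 0 to range_length - 1,
    KR2 covers the next range_length keys, and so on.
    """
    if key < 0:
        raise ValueError("key must be non-negative")
    if range_length <= 0:
        raise ValueError("range_length must be positive")
    return key // range_length + 1
```

  • With `range_length=100`, keys 0 to 99 fall in KR1, key 100 falls in KR2, and key 250 falls in KR3.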
  • In log period T0, M0 is used as the value date log disk. The first copy of the data newly written in T0 is divided into three parts, D0T0, D1T0 and D2T0, which are written to the primary disks P0, P1 and P2, respectively; the second copies of D0T0, D1T0 and D2T0 are written to the mirror disk M0.
  • In log period T1, the first copy of the newly written data is divided into three parts, D0T1, D1T1 and D2T1, which are written to the primary disks P0, P1 and P2, respectively; the second copies of D0T1, D1T1 and D2T1 are written to the mirror disk M1. Later periods follow the same pattern.
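  • The resulting data layout can be reproduced with a short sketch. It models only where each part DmTn is written, before any synchronization or release of log space; the function name `layout` and the dictionary representation are assumptions made for illustration.

```python
def layout(periods, n_pairs=3):
    """Return where each part DmTn is written over the given log periods.

    Primary disk Pm receives part DmTn in every period Tn; the mirror
    disk on duty in period Tn receives the second copies of all parts.
    """
    primaries = {f"P{m}": [] for m in range(n_pairs)}
    mirrors = {f"M{m}": [] for m in range(n_pairs)}
    for n in range(periods):
        on_duty = f"M{n % n_pairs}"  # log disks rotate period by period
        for m in range(n_pairs):
            part = f"D{m}T{n}"
            primaries[f"P{m}"].append(part)
            mirrors[on_duty].append(part)
    return primaries, mirrors


primaries, mirrors = layout(periods=2)
```

  • After two periods, P0 holds D0T0 and D0T1, M0 holds the three parts of T0, M1 holds the three parts of T1, and M2 has not yet been written.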
  • the dotted line with arrows and the data layout shown in Figure 4 show the basic principles of circular logging.
  • the switch of the value day log disk triggers a synchronization process.
  • M0 is selected as the value date log disk. Since there is no inconsistent data in the 0th mirror disk pair (P0, M0) before T0, no synchronization takes place within the mirror disk pair (P0, M0) during T0.
  • M1 is selected as the value date log disk.
  • Figures 4(b), (c) and (d) show the distribution of key-value data across the disk group at the end of log periods T0, T1 and T2, respectively.
  • DmTn denotes all key-value data written to mirror disk pair m (Pm, Mm) during log period Tn.
  • m is 0, 1 or 2.
  • n is a natural number greater than or equal to 0.
  • A blank square indicates storage space on a primary or mirror disk that is not yet occupied.
  • A diagonally hatched square indicates storage space on the disk that has been released.
  • A vertically striped square indicates that the key-value data in that logical region of the primary disk has been synchronized into the B+ tree on the mirror disk.
  • When a new log disk is selected as the duty log disk, a new synchronization process is triggered; it terminates only after all inconsistent data on the duty log disk has been updated.
  • In log period T0, M0 is selected as the duty log disk. When a key-value write request reaches mirror disk pair (P0, M0), the request data is written into the LSM data structure on primary disk P0.
  • If the occupancy of the duty log space on M0 during T0 has not exceeded the preset threshold T, the request data is also written sequentially to duty log disk M0 in log-structured file system fashion; if the occupancy exceeds T, M0 is switched to the low-power standby state.
  • M1 is then selected as the new duty log disk and switched to the high-power active state, which triggers synchronization of mirror disk pair (P1, M1).
  • Figures 4(c) and (d) show that, in T1 and T2, the key-value data not yet updated in (P1, M1) and (P2, M2) is read from the LSM on P1 or P2 respectively and written into the B+ tree on M1 or M2.
  • At the end of T1 and T2, log disks M2 and M0 respectively are selected as the new duty log disk.
  • Because most of the occupied log space on M0 is released during T1 and T2 by the synchronization of pairs (P1, M1) and (P2, M2), M0 can be selected again as the duty log disk.
  • Likewise, most of the occupied log space on M1 is released by the synchronization of pairs (P0, M0) and (P2, M2), and most of that on M2 by the synchronization of pairs (P0, M0) and (P1, M1), so M1 and M2 can also be selected again as duty log disks.
  • To support the method, the disk array must be initialized in advance, as follows:
  • Each primary disk and each standby disk is formatted with a file system of any type. An LSM system, for example the key-value store LevelDB, is created on each primary disk. Each standby disk is divided into two storage areas: an LFS is created in the first, and a B+ tree system, for example the key-value store BDB, is created in the second.
  • This makes it possible later to write the first copy of the data to be stored to the LevelDB of the primary storage subsystem, to write the second copy to the LFS of the duty standby storage device, and, during synchronization, to write the inconsistent data read out to the BDB of the duty standby storage device.
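The initialization above can be pictured as a per-disk layout plan. The sketch below only records which store is intended for which disk and area; it does not format disks or create LevelDB, LFS or BDB instances. The class names, the function `plan_layout` and the default `log_fraction` of 0.2 are illustrative assumptions, not values from the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PrimaryLayout:
    disk: str
    store: str = "LevelDB"  # LSM-organized key-value store for the first copy


@dataclass(frozen=True)
class MirrorLayout:
    disk: str
    log_fraction: float       # assumed share of the disk given to the LFS area
    log_store: str = "LFS"    # first area: receives the second copy while on duty
    index_store: str = "BDB"  # second area: B+ tree store filled by synchronization

    @property
    def index_fraction(self) -> float:
        return 1.0 - self.log_fraction


def plan_layout(n_pairs: int, log_fraction: float = 0.2):
    """Describe the stores to create on N primary and N standby disks."""
    if n_pairs < 2:
        raise ValueError("N must be greater than 1")
    if not 0.0 < log_fraction < 1.0:
        raise ValueError("log_fraction must be between 0 and 1")
    primaries = [PrimaryLayout(f"P{i}") for i in range(n_pairs)]
    mirrors = [MirrorLayout(f"M{i}", log_fraction) for i in range(n_pairs)]
    return primaries, mirrors


primaries, mirror_layouts = plan_layout(3)
```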
  • The storage system is not limited to a disk array. The disks in a disk array may be bare disks or disks that have been formatted with a particular file system. The storage system may also be a node array, and a node array is suited to a multi-node distributed environment.
  • The first, second and third data organizations described above can be chosen freely as needed. For example, the first data organization may be LSM, but may also be a B+ tree or another organization; the second may be LFS or a B+ tree; the third may be a B+ tree or LSM. Details are not repeated here.
  • The embodiment of the present invention discloses a data storage method that achieves the following technical effects:
  • A file system is created on each primary storage device and each standby storage device, and data is stored on both in a defined data organization, which reflects data-structure-layer characteristics and speeds up rebuilding. If a primary storage device fails, then when data is recovered from the corresponding standby storage device the data organization shows directly which data needs to be recovered, so individual data blocks need not be examined and rebuilding is fast.
  • Data is written to the primary and standby storage devices in different data organizations, which suits a variety of applications and load types.
  • The first and second data organizations are write-optimized, which raises write speed.
  • The third data organization is read-optimized, so that most of the data ultimately held on the standby storage devices is in the third organization and can be read with high performance, meeting the needs of the data analysis stage.
  • The second data organization writes faster than the first, which avoids a write bottleneck on the duty standby storage device.
  • Data that is stored on the primary storage device corresponding to the duty standby storage device and is inconsistent with that standby device is synchronized to it in the background, using the idle bandwidth and free space of the storage devices, so the data organization is converted without consuming additional energy.
  • Writing data to the primary and standby storage devices in a defined data organization is implemented at the data structure layer, which makes the implementation flexible: it can be realized on block devices or on nodes, for example in a disk array or a node array, and in a single-node multi-disk environment or a multi-node distributed storage environment.
  • An embodiment of the present invention provides a data storage device 500.
  • The device is for a storage system comprising a primary storage subsystem and a standby storage subsystem; the primary storage subsystem comprises N primary storage devices, and the standby storage subsystem comprises N standby storage devices corresponding to the N primary storage devices; N is a positive integer greater than 1; a file system is created on each primary storage device and each standby storage device.
  • The device 500 may include:
  • a first read/write module 510, configured to write the first copy of the data to be stored to multiple primary storage devices in the primary storage subsystem in a first data organization;
  • a second read/write module 520, configured to write the second copy of the data to be stored to the duty standby storage device in the standby storage subsystem in a second data organization, the duty standby storage device being the only standby storage device in the working state;
  • a third read/write module 530, configured to read out the data that is stored on the primary storage device corresponding to the duty standby storage device and is inconsistent with the duty standby storage device, and to write the inconsistent data read out to the duty standby storage device in a third data organization;
  • wherein the first and second data organizations are write-optimized, the second data organization writes faster than the first, and the third data organization is read-optimized.
  • The device may further include:
  • a storage device monitoring module 540, configured to determine whether the occupancy of the duty log space of the duty standby storage device has reached a preset value and, if it has, to switch the duty standby storage device to the sleep state and switch another standby storage device to the working state as the duty standby storage device.
  • The device may further include:
  • a creating module 550, configured to create the key-value store LevelDB on each primary storage device, and to divide each standby storage device into two storage areas, creating an LFS in the first storage area and the key-value store BDB in the second.
  • The first read/write module is specifically configured to divide the first copy of the data to be stored into multiple parts and write the parts to the LevelDB instances on the multiple primary storage devices.
  • The second read/write module is specifically configured to write the second copy of the data to be stored to the LFS of the duty standby storage device; the third read/write module is specifically configured to write the inconsistent data read out to the BDB of the duty standby storage device.
  • The data storage device of the embodiment may be, for example, a computer device that includes a disk array, or a network device that manages a node array.
  • As can be seen from the above, in some feasible implementations a file system is created on each primary storage device and each standby storage device; the first copy of the data to be stored is written to the primary storage subsystem in the first data organization; the second copy is written to the duty standby storage device in the standby storage subsystem in the second data organization; and the data that is stored on the primary storage device corresponding to the duty standby storage device and is inconsistent with it is read out and written to the duty standby storage device in the third data organization. This technical solution achieves the following technical effects:
  • A file system is created on each primary storage device and each standby storage device, and data is stored on both in a defined data organization, which reflects data-structure-layer characteristics and speeds up rebuilding. If a primary storage device fails, then when data is recovered from the corresponding standby storage device the data organization shows directly which data needs to be recovered, so individual data blocks need not be examined and rebuilding is fast.
  • Data is written to the primary and standby storage devices in different data organizations, which suits a variety of applications and load types.
  • The first and second data organizations are write-optimized, which raises write speed.
  • The third data organization is read-optimized, so that most of the data ultimately held on the standby storage devices is in the third organization and can be read with high performance, meeting the needs of the data analysis stage.
  • The second data organization writes faster than the first, which avoids a write bottleneck on the duty standby storage device.
  • Data that is stored on the primary storage device corresponding to the duty standby storage device and is inconsistent with that standby device is synchronized to it in the background, using the idle bandwidth and free space of the storage devices, so the data organization is converted without consuming additional energy.
  • Writing data to the primary and standby storage devices in a defined data organization is implemented at the data structure layer, which makes the implementation flexible: it can be realized on block devices or on nodes, for example in a disk array or a node array, and in a single-node multi-disk environment or a multi-node distributed storage environment.
  • An embodiment of the present invention further provides a computer storage medium that can store a program which, when executed, performs some or all of the steps of the data storage method described in the foregoing method embodiments.
  • An embodiment of the present invention further provides a computer device 700, which may include:
  • a processor 710, a memory 720, a communication interface 730 and a bus 740, which communicate with one another over the bus 740; the communication interface 730 is configured to receive and send data; the memory 720 is configured to store a program; and the processor 710 is configured to execute the program in the memory.
  • The memory 720 may include a primary storage subsystem and a standby storage subsystem; the primary storage subsystem comprises N primary storage devices, and the standby storage subsystem comprises N standby storage devices corresponding to the N primary storage devices; N is a positive integer greater than 1; a file system is created on each primary storage device and each standby storage device.
  • The primary and standby storage devices may all be disks.
  • The processor 710 is configured to write the first copy of the data to be stored to multiple primary storage devices in the primary storage subsystem in a first data organization, and to write the second copy to the duty standby storage device in the standby storage subsystem in a second data organization, the duty standby storage device being the only standby storage device in the working state; and to read out the data that is stored on the primary storage device corresponding to the duty standby storage device and is inconsistent with it, and write the inconsistent data read out to the duty standby storage device in a third data organization; wherein the first and second data organizations are write-optimized, the second data organization writes faster than the first, and the third data organization is read-optimized.
  • The processor 710 is further configured to determine whether the occupancy of the duty log space of the duty standby storage device has reached a preset value and, if it has, to switch the duty standby storage device to the sleep state and switch another standby storage device to the working state as the duty standby storage device.
  • The first data organization is the log-structured merge (LSM) approach; the second data organization is the log-structured file system (LFS) approach; and the third data organization is the B+ tree approach.
  • The processor 710 is further configured to create the key-value store LevelDB on each primary storage device, and to divide each standby storage device into two storage areas, creating the LFS in the first storage area and the key-value store BDB in the second.
  • The processor 710 is specifically configured to divide the first copy of the data to be stored into multiple parts and write the parts to the LevelDB instances on the multiple primary storage devices; to write the second copy to the LFS of the duty standby storage device; and to write the inconsistent data read out to the BDB of the duty standby storage device.
  • A file system is created on each primary storage device and each standby storage device, and data is stored on both in a defined data organization, which reflects data-structure-layer characteristics and speeds up rebuilding. If a primary storage device fails, then when data is recovered from the corresponding standby storage device the data organization shows directly which data needs to be recovered, so individual data blocks need not be examined and rebuilding is fast.
  • Data is written to the primary and standby storage devices in different data organizations, which suits a variety of applications and load types.
  • The first and second data organizations are write-optimized, which raises write speed.
  • The third data organization is read-optimized, so that most of the data ultimately held on the standby storage devices is in the third organization and can be read with high performance, meeting the needs of the data analysis stage.
  • The second data organization writes faster than the first, which avoids a write bottleneck on the duty standby storage device.
  • Data that is stored on the primary storage device corresponding to the duty standby storage device and is inconsistent with that standby device is synchronized to it in the background, using the idle bandwidth and free space of the storage devices, so the data organization is converted without consuming additional energy.
  • Writing data to the primary and standby storage devices in a defined data organization is implemented at the data structure layer, which makes the implementation flexible: it can be realized on block devices or on nodes, for example in a disk array or a node array, and in a single-node multi-disk environment or a multi-node distributed storage environment.
  • The program may be stored in a computer-readable storage medium, which may include a ROM, a RAM, a magnetic disk or an optical disc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

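A data storage method and device that address, to some extent, the technical problem that existing replica-based storage systems cannot deliver both read and write performance. In some feasible implementations, the method comprises: writing a first copy of the data to be stored to multiple primary storage devices in a primary storage subsystem in a first data organization, and writing a second copy of the data to be stored to the duty standby storage device in a standby storage subsystem in a second data organization, the duty standby storage device being the only standby storage device in the working state (110); reading out the data that is stored on the primary storage device corresponding to the duty standby storage device and is inconsistent with the duty standby storage device, and writing the inconsistent data read out to the duty standby storage device in a third data organization (120). The first and second data organizations are write-optimized, the second data organization writes faster than the first, and the third data organization is read-optimized.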

Description

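Data Storage Method and Device
Technical Field
The present invention relates to the field of computer and storage technologies, and in particular to a data storage method and device.
Background
Advances in digital technology and the development of storage technology have produced massive amounts of data, which must be stored in a storage system in some organized form. Over its whole life cycle, from generation to deletion, data has different input/output (I/O) access characteristics at different stages: in the data generation stage, data must be written to the storage system at high speed; in the data analysis stage, data must be read or scanned at high speed to take part in computation.
To improve storage reliability, storing data as replicas is increasingly accepted, and replication is the future trend in data redundancy. A typical scenario for data replicas is the disk array (Redundant Arrays of Independent Disks, RAID). RAID10 is a commonly used disk array comprising a group of primary disks and a group of mirror disks. RAID10 divides one copy of the data into multiple parts and stores them on multiple primary disks to improve read/write performance, and stores another copy on the corresponding mirror disks to improve reliability.
In the cost of a storage system, one-time investments such as hardware and software account for a limited share, while energy cost is gradually becoming the main part of the total. One prior-art approach uses the rotated logging architecture RoLo to reduce the energy consumption of a RAID10 disk array. The rotated logging architecture consolidates the free space of multiple mirror disks into one logical pool of log space. By exploiting spare space and time slices to perform decentralized synchronization, this pool can be used cyclically to improve system performance and energy efficiency. In other words, the mirror disks take turns serving as the duty log disk while the non-duty log disks are switched to a low-power state, which reduces energy consumption.
Disk array storage systems that currently use the rotated logging architecture have the following drawback: an existing disk array storage system virtualizes multiple physical disks into one virtual disk and builds the file system on top of the virtual disk; that is, there is no file system on an individual disk. Data can therefore only be written to all disks using the same data organization, which is either write-optimized, to improve write performance, or read-optimized, to improve read performance, but cannot serve both.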
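Summary of the Invention
Embodiments of the present invention provide a data storage method and device that address, to some extent, the technical problem that existing replica-based storage systems cannot deliver both read and write performance.
A first aspect of the present invention provides a data storage method for a storage system. The storage system comprises a primary storage subsystem and a standby storage subsystem; the primary storage subsystem comprises N primary storage devices, and the standby storage subsystem comprises N standby storage devices corresponding to the N primary storage devices; N is a positive integer greater than 1; a file system is created on each primary storage device and each standby storage device. The method comprises: writing a first copy of the data to be stored to multiple primary storage devices in the primary storage subsystem in a first data organization, and writing a second copy of the data to be stored to the duty standby storage device in the standby storage subsystem in a second data organization, the duty standby storage device being the only standby storage device in the working state; reading out the data that is stored on the primary storage device corresponding to the duty standby storage device and is inconsistent with the duty standby storage device, and writing the inconsistent data read out to the duty standby storage device in a third data organization; wherein the first and second data organizations are write-optimized, the second data organization writes faster than the first, and the third data organization is read-optimized.
In a first possible implementation, the method further comprises: determining whether the occupancy of the duty log space of the duty standby storage device has reached a preset value; if it has, switching the duty standby storage device to the sleep state and switching another standby storage device to the working state as the duty standby storage device.
With reference to the first aspect or its first possible implementation, in a second possible implementation, the first data organization is the log-structured merge (LSM) approach; the second data organization is the log-structured file system (LFS) approach; and the third data organization is the B+ tree approach.
With reference to the first aspect or its second possible implementation, in a third possible implementation, before the first copy of the data to be stored is written to the primary storage subsystem in the first data organization, the method further comprises: creating the key-value store LevelDB on each primary storage device; and dividing each standby storage device into two storage areas, creating an LFS in the first storage area and the key-value store BDB in the second.
With reference to the first aspect or its third possible implementation, in a fourth possible implementation, writing the first copy of the data to be stored to multiple primary storage devices in the primary storage subsystem in the first data organization comprises: dividing the first copy into multiple parts and writing the parts to the LevelDB instances on the multiple primary storage devices; writing the second copy of the data to be stored to the duty standby storage device in the second data organization comprises: writing the second copy to the LFS of the duty standby storage device; and writing the inconsistent data read out to the duty standby storage device in the third data organization comprises: writing the inconsistent data read out to the BDB of the duty standby storage device.
With reference to the first aspect or any of its first to fourth possible implementations, in a fifth possible implementation, the storage system is a disk array or a node array.
A second aspect of the present invention provides a data storage device for a storage system. The storage system comprises a primary storage subsystem and a standby storage subsystem; the primary storage subsystem comprises N primary storage devices, and the standby storage subsystem comprises N standby storage devices corresponding to the N primary storage devices; N is a positive integer greater than 1; a file system is created on each primary storage device and each standby storage device. The device comprises: a first read/write module, configured to write a first copy of the data to be stored to multiple primary storage devices in the primary storage subsystem in a first data organization; a second read/write module, configured to write a second copy of the data to be stored to the duty standby storage device in the standby storage subsystem in a second data organization, the duty standby storage device being the only standby storage device in the working state; and a third read/write module, configured to read out the data that is stored on the primary storage device corresponding to the duty standby storage device and is inconsistent with the duty standby storage device, and to write the inconsistent data read out to the duty standby storage device in a third data organization; wherein the first and second data organizations are write-optimized, the second data organization writes faster than the first, and the third data organization is read-optimized.
In a first possible implementation, the device further comprises: a storage device monitoring module, configured to determine whether the occupancy of the duty log space of the duty standby storage device has reached a preset value and, if it has, to switch the duty standby storage device to the sleep state and switch another standby storage device to the working state as the duty standby storage device.
With reference to the second aspect or its first possible implementation, in a second possible implementation, the device further comprises: a creating module, configured to create the key-value store LevelDB on each primary storage device, and to divide each standby storage device into two storage areas, creating an LFS in the first storage area and the key-value store BDB in the second.
With reference to the second aspect or its second possible implementation, in a third possible implementation, the first read/write module is specifically configured to divide the first copy of the data to be stored into multiple parts and write the parts to the LevelDB instances on the multiple primary storage devices; the second read/write module is specifically configured to write the second copy of the data to be stored to the LFS of the duty standby storage device; and the third read/write module is specifically configured to write the inconsistent data read out to the BDB of the duty standby storage device.
A third aspect of the present invention provides a computer device, which may include a processor, a memory, a communication interface and a bus; the processor, the memory and the communication interface communicate with one another over the bus. The memory comprises a primary storage subsystem and a standby storage subsystem; the primary storage subsystem comprises N primary storage devices, and the standby storage subsystem comprises N standby storage devices corresponding to the N primary storage devices; N is a positive integer greater than 1; a file system is created on each primary storage device and each standby storage device. The processor is configured to write a first copy of the data to be stored to multiple primary storage devices in the primary storage subsystem in a first data organization, and to write a second copy of the data to be stored to the duty standby storage device in the standby storage subsystem in a second data organization, the duty standby storage device being the only standby storage device in the working state; and to read out the data that is stored on the primary storage device corresponding to the duty standby storage device and is inconsistent with the duty standby storage device, and write the inconsistent data read out to the duty standby storage device in a third data organization; wherein the first and second data organizations are write-optimized, the second data organization writes faster than the first, and the third data organization is read-optimized.
As can be seen from the above, embodiments of the present invention adopt a technical solution in which a file system is created on each primary storage device and each standby storage device; the first copy of the data to be stored is written to the primary storage subsystem in the first data organization; the second copy is written to the duty standby storage device in the standby storage subsystem in the second data organization; and the data that is stored on the primary storage device corresponding to the duty standby storage device and is inconsistent with it is read out and written to the duty standby storage device in the third data organization. This solution achieves the following technical effects:
A file system is created on each primary storage device and each standby storage device, and data is written to the primary and standby storage devices in different data organizations, which suits a variety of applications and load types. The first and second data organizations are write-optimized, which raises write speed; the third is read-optimized, so that most of the data ultimately held on the standby storage devices is in the third organization and can be read with high performance, meeting the needs of the data analysis stage; the storage system as a whole therefore serves both reads and writes. In addition, the second data organization writes faster than the first, which avoids a write bottleneck on the duty standby storage device.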
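Brief Description of the Drawings
To describe the technical solutions of the embodiments of the present invention more clearly, the drawings needed for describing the embodiments and the prior art are briefly introduced below. Evidently, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art can derive other drawings from them without creative effort.
Figure 1 is a schematic diagram of a data storage method provided by the present invention;
Figure 2 is a schematic diagram of another data storage method provided by the present invention;
Figure 3 is a schematic diagram of a RAID10 disk array in a scenario embodiment of the present invention;
Figure 4 is a schematic diagram of the data storage operations of the method of the embodiments, using RAID10 as an example;
Figure 5 is a schematic diagram of a data storage device provided by the present invention;
Figure 6 is a schematic diagram of another data storage device provided by the present invention;
Figure 7 is a schematic diagram of a computer device provided by the present invention.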
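Detailed Description
Disk array storage systems that currently use the rotated logging architecture have the following drawbacks:
1. In an existing disk array storage system there is a virtualization layer above the disks that virtualizes multiple disks into one virtual disk, and the file system is built on the virtual disk; that is, there is no file system on an individual disk, each disk is a block device, and the whole disk array is implemented at the block level. As a result, data written to the disks cannot reflect data-structure-layer characteristics, and a block-level disk array is also slow to rebuild. If a disk fails, recovering data from the corresponding mirror disk requires recovering every data block on the mirror disk, which is slow.
2. In an existing storage system, the file system is built on the virtual disk and data is written to all disks using the same data organization, which is either write-optimized, to improve write performance, or read-optimized, to improve read performance, but cannot serve both. Moreover, in a rotated-logging disk array, one copy of the data is split into multiple parts and written to multiple primary disks, which is fast, but the other copy is written only to the single duty log disk, which is slower and cannot keep up with the primary disks; this creates a bottleneck that degrades the performance of the whole array.
Embodiments of the present invention provide a data storage method and device to solve the problems that existing replica-based storage systems, being implemented at the block level, are slow to rebuild and cannot reflect data-structure-layer characteristics, and that, because they write data to all disks in the same data organization, they cannot serve both reads and writes and suffer a write bottleneck on the duty log disk.
To help a person skilled in the art better understand the solutions of the present invention, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. Evidently, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the scope of protection of the present invention.
Specific embodiments are described in detail below.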
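Referring to Figure 1, an embodiment of the present invention provides a data storage method.
The method is applied to a storage system comprising a primary storage subsystem and a standby storage subsystem; the primary storage subsystem comprises N primary storage devices, and the standby storage subsystem comprises N standby storage devices corresponding to the N primary storage devices; N is a positive integer greater than 1. In this embodiment, the storage system may be a disk array or a node array, and the primary and standby storage devices may be disks, nodes or the like. A file system is created on each primary storage device and each standby storage device; the file systems on different storage devices may be the same or different.
As shown in Figure 1, the method may include:
110. Write a first copy of the data to be stored to multiple primary storage devices in the primary storage subsystem in a first data organization, and write a second copy of the data to be stored to the duty standby storage device in the standby storage subsystem in a second data organization, the duty standby storage device being the only standby storage device in the working state.
120. Read out the data that is stored on the primary storage device corresponding to the duty standby storage device and is inconsistent with the duty standby storage device, and write the inconsistent data read out to the duty standby storage device in a third data organization.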
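In this embodiment, two copies are made of each piece of data to be stored: the first copy is written to the primary storage subsystem and the second to the standby storage subsystem. To improve read/write performance, all N primary storage devices of the primary storage subsystem can be kept in the working state, and the first copy is divided into N parts, each written to one primary storage device. To improve reliability, in any time period one of the N standby storage devices can be kept in the working state as the duty standby storage device, and the second copy of the data is written to it; the other standby storage devices stay in a low-power state such as sleep or standby to reduce energy consumption.
In the embodiments of the present invention, the first and second data organizations are write-optimized to raise write speed, and the third data organization is read-optimized to improve read performance in the analysis stage, so that the storage system as a whole serves both reads and writes. To avoid a write bottleneck on the duty standby storage device, the second data organization can be made to write faster than the first. The first copy is written to multiple primary storage devices in parallel using the first data organization, which raises its write speed; the second copy is written to the single duty standby storage device, but using the second data organization, which is faster than the first. The two therefore balance, so that the two copies reach the primary and standby storage subsystems at the same or nearly the same time, which improves the write performance of the whole storage system. In addition, because a file system is established on every storage device, the data written to a storage device exists in a defined organization and can reflect data-structure-layer characteristics.
In summary, this embodiment provides a data storage method that, using the above technical features, achieves the following technical effects: a file system is created on each primary storage device and each standby storage device, and data is written to the primary and standby storage devices in different data organizations, which suits a variety of applications and load types; the first and second data organizations are write-optimized, which raises write speed; the third is read-optimized, so that most of the data ultimately held on the standby storage devices is in the third organization and can be read with high performance, meeting the needs of the data analysis stage; the storage system as a whole therefore serves both reads and writes. In addition, the second data organization writes faster than the first, which avoids a write bottleneck on the duty standby storage device.
In some embodiments of the present invention, the method further comprises: determining whether the occupancy of the duty log space of the duty standby storage device has reached a preset value.
In the embodiments of the present invention, each standby storage device is the mirror of its corresponding primary storage device. The free storage space on all standby storage devices is regarded as log space, and the log space provided by the duty standby storage device is called the duty log space. In this embodiment, the use of the duty log space of the duty standby storage device can be monitored in real time to determine whether its occupancy has reached the preset value.
If, when data to be stored is received, the occupancy of the duty log space has not reached the preset value, step 110 is performed as normal and the two copies of the data are written to the primary storage subsystem and the duty standby storage device respectively.
If the occupancy of the duty log space has reached the preset value, then in this embodiment the duty standby storage device is switched to the sleep state and another standby storage device is switched to the working state as the duty standby storage device. After the switch, a synchronization process can be triggered that synchronizes the data on the corresponding primary storage device to the new duty standby storage device. That is, in some implementations step 120 is triggered when the duty standby storage device is switched: the data that is stored on the primary storage device corresponding to the duty standby storage device and is inconsistent with it is read out and written to the duty standby storage device in the third data organization.
This synchronization can run in the background of the computer device, using the idle bandwidth and free space of the storage devices to synchronize the inconsistent data to the duty standby storage device, and therefore consumes no additional energy. To improve write performance, the first and second data organizations used when writing data can be chosen from organizations with high write performance; to improve read performance, the third data organization can be chosen from organizations with high read performance, so that the standby storage devices can later serve reads efficiently. Because the synchronization runs in the background, the write performance of the third data organization need not be considered.
In some embodiments of the present invention, the first data organization may be an ordered log structure, for example the log-structured merge (Log Structured Merge, LSM) approach; the second may be an unordered log structure, for example the log-structured file system (Log Structured File System, LFS) approach; and the third may be an update-in-place index structure, for example a B+ tree. LSM and LFS provide good write performance, and the B+ tree provides good read performance; LFS also writes faster than LSM.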
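The contrast between a write-optimized log and a read-optimized, update-in-place index can be illustrated with two toy in-memory structures. This is only a sketch under stated assumptions: `AppendLog` stands in for an append-only log area and `SortedIndex` stands in for a B+ tree by keeping keys in a sorted list; neither class is LFS, LSM, LevelDB or BDB, and both names are hypothetical.

```python
import bisect


class AppendLog:
    """Write-optimized stand-in for a log area: every put is an append."""

    def __init__(self):
        self.records = []

    def put(self, key, value):
        self.records.append((key, value))

    def get(self, key):
        # Newest record wins, so scan backwards; reads are linear in log size.
        for k, v in reversed(self.records):
            if k == key:
                return v
        return None


class SortedIndex:
    """Read-optimized stand-in for a B+ tree: sorted keys, updated in place."""

    def __init__(self):
        self.keys = []
        self.values = []

    def put(self, key, value):
        i = bisect.bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            self.values[i] = value          # update in place
        else:
            self.keys.insert(i, key)        # costlier than an append
            self.values.insert(i, value)

    def get(self, key):
        i = bisect.bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            return self.values[i]
        return None

    def scan(self, low, high):
        # Range scans are cheap because keys are already ordered.
        i = bisect.bisect_left(self.keys, low)
        j = bisect.bisect_right(self.keys, high)
        return list(zip(self.keys[i:j], self.values[i:j]))


log = AppendLog()
for key, value in [(5, "a"), (1, "b"), (5, "c")]:
    log.put(key, value)

index = SortedIndex()
for key, value in log.records:   # replay records in write order
    index.put(key, value)
```

Replaying write-ordered records into the sorted structure mirrors, in miniature, the conversion that synchronization performs when it rewrites data in the third data organization; in the method itself the inconsistent data is read from the primary storage device.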
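To realize these data organizations, in the initialization process before step 110 each primary storage device and each standby storage device can be formatted with a file system of any type; an LSM system, for example the key-value store LevelDB, is created on each primary storage device; and each standby storage device is divided into two storage areas, with an LFS created in the first and a B+ tree system, for example the key-value store BDB, created in the second.
In the above steps, writing the first copy of the data to be stored to the primary storage subsystem in the first data organization may then comprise: dividing the first copy into multiple parts and writing the parts to the LevelDB instances on the multiple primary storage devices of the primary storage subsystem. Writing the second copy to the duty standby storage device in the second data organization may comprise: writing the second copy to the LFS of the duty standby storage device. Writing the inconsistent data read out to the duty standby storage device in the third data organization may comprise: writing it to the BDB of the duty standby storage device.
Of the two storage areas on each standby storage device, the first, in which the LFS is created, is used when data is written, and the second, in which the BDB is created, is used when data is synchronized. Because the standby storage devices take turns as the duty standby storage device, the replica data written in LFS form to the first area is only temporary: it is continually written and continually released, so the first area need not be large. The second area holds the data synchronized from the primary storage device, which must be kept long term: it is continually written but generally not released, so the second area needs more space and can occupy most of the standby storage device.
In the end, all data on the primary storage devices is stored in the first data organization, which reflects data-structure-layer characteristics and gives high write performance, and the replicas that must be kept long term on the standby storage devices are all stored in the third data organization, which also reflects data-structure-layer characteristics and gives high read performance. When a later application needs to read data, it can therefore read from the standby storage devices at high speed.
The method of the embodiments is preferably suited to storage systems for cloud storage environments.
The above embodiment discloses a data storage method in which a file system is created on each primary storage device and each standby storage device; the first copy of the data to be stored is written to the primary storage subsystem in the first data organization; the second copy is written to the duty standby storage device in the standby storage subsystem in the second data organization; and the data that is stored on the primary storage device corresponding to the duty standby storage device and is inconsistent with it is read out and written to the duty standby storage device in the third data organization. This technical solution achieves the following technical effects:
1. Because only the duty standby storage device is in the working state and all other standby storage devices are in a non-working, low-power state, energy consumption is reduced.
2. A file system is created on each primary storage device and each standby storage device, and data is stored on both in a defined data organization, which reflects data-structure-layer characteristics and speeds up rebuilding. If a primary storage device fails, then when data is recovered from the corresponding standby storage device the data organization shows directly which data needs to be recovered, so individual data blocks need not be examined and rebuilding is fast.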
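One way to picture record-level rebuilding is sketched below. It is an illustration under assumptions that go beyond the source: it supposes that each log record carries a global sequence number and the index of the mirror pair it belongs to, and that a failed primary is reassembled from the synchronized index on its own mirror plus any not-yet-synchronized log records for that pair. The function `rebuild_primary` and the dictionary layout are hypothetical.

```python
def rebuild_primary(pair, mirrors):
    """Reassemble the key-value data of a failed primary from the standby side.

    Each mirror is a dict with an "index" (already synchronized records) and a
    "log" of (sequence, pair, key, value) records that are not yet synchronized.
    """
    # Start from the records already synchronized to this pair's mirror.
    recovered = dict(mirrors[pair]["index"])
    # Add only the log records that belong to this pair, oldest first, so that
    # newer values overwrite older ones. No per-block inspection is needed.
    pending = [rec for m in mirrors for rec in m["log"] if rec[1] == pair]
    for _seq, _pair, key, value in sorted(pending, key=lambda rec: rec[0]):
        recovered[key] = value
    return recovered


standby = [
    {"index": {}, "log": [(1, 0, "k0", "a")]},
    {"index": {"k1": "old"}, "log": [(2, 1, "k1", "new"), (3, 0, "k3", "c")]},
    {"index": {"k2": "z"}, "log": []},
]
rebuilt_p0 = rebuild_primary(0, standby)
rebuilt_p1 = rebuild_primary(1, standby)
```

The point of the sketch is only that the records to recover can be selected by key and pair rather than by scanning every block.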
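3. Data is written to the primary and standby storage devices in different data organizations, which suits a variety of applications and load types. For example, the first and second data organizations are write-optimized, which raises write speed; the third is read-optimized, so that most of the data ultimately held on the standby storage devices is in the third organization and can be read with high performance, meeting the needs of the data analysis stage. In addition, the second data organization writes faster than the first, which avoids a write bottleneck on the duty standby storage device.
4. Data that is stored on the primary storage device corresponding to the duty standby storage device and is inconsistent with that standby device is synchronized to it in the background, using the idle bandwidth and free space of the storage devices, so the data organization is converted without consuming additional energy.
5. Writing data to the primary and standby storage devices in a defined data organization is implemented at the data structure layer, which makes the implementation flexible: it can be realized on block devices or on nodes, for example in a disk array or a node array, and in a single-node multi-disk environment or a multi-node distributed storage environment.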
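To make the technical solutions of the embodiments easier to understand, an implementation in a specific scenario is described below as an example.
In this scenario embodiment, the storage system is a RAID10 disk array. RAID10 comprises a group of primary disks and a group of standby disks (also called mirror disks). The group of N primary disks forms the primary storage subsystem (also called the primary disk group), and the group of N mirror disks forms the standby storage subsystem (also called the standby disk group or mirror disk group).
In this embodiment, all mirror disks are regarded as log disks, and the free storage space on all mirror disks is regarded as the log space the log disks can provide. In any time period only one log disk is kept in the active state (that is, the working state) to respond to write requests; the mirror disk kept active is called the duty log disk, and the log space it provides is called the duty log space.
As shown in Figure 2, this scenario embodiment comprises the following steps:
210. Receive a key-value request: a key-value request is received from an application; it carries the data to be stored (the key-value data) and may be a write (PUT), read (GET) or delete (DELETE) request, among others.
220. Redirect the key-value data: the received key-value request is redirected to the destination disks, that is, the corresponding primary disk and the duty log disk.
230. Write the first copy of the data to be stored to the primary disk group in LSM form.
240. Determine whether the occupancy of the duty log space of the duty log disk has reached the preset value.
250. If the occupancy has reached the preset value, switch the duty log disk: the current duty log disk is switched from the active working state to the sleep or standby state, and another log disk is woken and switched to the working state. The switch also triggers a synchronization process, in which the data that is stored on the primary disk corresponding to the new duty log disk and is inconsistent with it is read out and written to the duty log disk in B+ tree form.
260. If the occupancy has not reached the preset value, write the second copy of the data to be stored to the duty log disk of the standby disk group in LFS form.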
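Steps 220 to 260 can be sketched as a small in-memory simulation. This is an illustrative model only: dictionaries stand in for the LSM stores and B+ tree areas, a list stands in for the LFS log area, occupancy is counted in records, keys are integers redirected by `key % n`, and after a switch the second copy is assumed to go to the new duty disk. All class and attribute names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Mirror:
    name: str
    log: list = field(default_factory=list)    # LFS-style append-only area
    btree: dict = field(default_factory=dict)  # stand-in for the B+ tree area
    active: bool = False


class RotatingLogArray:
    def __init__(self, n_pairs: int, threshold: int):
        self.n = n_pairs
        self.threshold = threshold  # preset log-space occupancy, in records
        self.primaries = [{} for _ in range(n_pairs)]  # stand-ins for LSM stores
        self.mirrors = [Mirror(f"M{i}") for i in range(n_pairs)]
        self.duty = 0
        self.mirrors[0].active = True

    def _switch_duty(self):
        # Step 250: put the old duty disk to sleep and wake the next one.
        self.mirrors[self.duty].active = False
        self.duty = (self.duty + 1) % self.n
        new = self.mirrors[self.duty]
        new.active = True
        # Synchronization: copy records that differ from the paired primary ...
        for key, value in self.primaries[self.duty].items():
            if new.btree.get(key) != value:
                new.btree[key] = value
        # ... then release the log space that held this pair's second copies.
        for mirror in self.mirrors:
            mirror.log = [rec for rec in mirror.log if rec[0] != self.duty]

    def put(self, key: int, value):
        pair = key % self.n                     # step 220: redirect
        self.primaries[pair][key] = value       # step 230: first copy
        if len(self.mirrors[self.duty].log) >= self.threshold:  # step 240
            self._switch_duty()                 # step 250
        self.mirrors[self.duty].log.append((pair, key, value))  # step 260


array = RotatingLogArray(n_pairs=3, threshold=3)
for k in range(7):
    array.put(k, f"v{k}")
```

In this run the duty role moves from M0 to M1 and then to M2, and each switch both fills the new duty disk's index from its primary and frees that pair's records from every log.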
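Further details follow.
As shown in Figure 3, assume the RAID10 disk array comprises six disks: three primary disks, denoted P0, P1 and P2, and three corresponding mirror disks, denoted M0, M1 and M2. Each primary disk and its mirror form a mirror disk pair, so this RAID10 array comprises three mirror disk pairs, denoted (P0, M0), (P1, M1) and (P2, M2).
In Figure 3, cylinders represent disks; the black shaded part of a cylinder represents occupied storage space and the white part represents unoccupied storage space. Assume M0, M1 and M2 each have 50% free storage space, that is, 50% log space. The three mirror disks M0, M1 and M2, joined by the arrowed curve, serve as log disks; the free storage space on them, drawn as dotted and diagonally hatched regions, serves as log space. The dotted and hatched regions joined by the arrowed curve together represent the log space formed from the free storage space of all three mirror disks. The mirror disk containing the dotted region is the duty log disk, and the disks containing the hatched regions are non-duty log disks. M0, M1 and M2 take turns as the duty log disk: in log cycle 0, M0 is the duty log disk; in log cycle 1, M1; in log cycle 2, M2; in log cycle 3, M0 again; and so on.
As shown in Figure 4(a), the key space of the key-value data is divided into equal-length key ranges (Key Range, KR), labeled KR1, KR2, KR3, KR4, ..., KRi, which are distributed in round-robin fashion across the mirror disk pairs (P0, M0), (P1, M1) and (P2, M2).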
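The round-robin placement of key ranges can be sketched as follows. The sketch assumes non-negative integer keys and assumes that KR1 is placed on pair (P0, M0); the source states only that the ranges are equal in length and distributed in rotation. The function names are hypothetical.

```python
def key_range_index(key: int, range_length: int) -> int:
    """1-based index i of the key range KRi that contains the key."""
    if key < 0 or range_length <= 0:
        raise ValueError("key must be >= 0 and range_length > 0")
    return key // range_length + 1


def mirror_pair_for_key(key: int, range_length: int, n_pairs: int) -> int:
    """Index m of the mirror disk pair (Pm, Mm) that stores the key's range."""
    # Assumed mapping: KR1 -> pair 0, KR2 -> pair 1, ..., wrapping around.
    return (key_range_index(key, range_length) - 1) % n_pairs


placement = {key: mirror_pair_for_key(key, range_length=100, n_pairs=3)
             for key in (0, 150, 250, 300)}
```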
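As shown in Figures 4(b), (c) and (d), in log period T0, M0 serves as the duty log disk. The first copy of the data newly written during T0 is divided into three parts, D0T0, D1T0 and D2T0, which are written to primary disks P0, P1 and P2 respectively; the second copy of D0T0, D1T0 and D2T0 is written entirely to mirror disk M0. Similarly, in log period T1 the first copy of the newly written data is divided into three parts, D0T1, D1T1 and D2T1, which are written to primary disks P0, P1 and P2 respectively; the second copy of D0T1, D1T1 and D2T1 is written entirely to mirror disk M1. Later periods follow the same pattern. The dashed arrows and the data layout in Figure 4 illustrate the basic principle of circular logging.
At the start of each new log period, switching the duty log disk triggers a synchronization process. As shown in Figure 4(b), in log period T0, M0 is selected as the duty log disk; because mirror disk pair 0 (P0, M0) holds no inconsistent data before T0, no synchronization takes place between P0 and M0 during T0. In log period T1, M1 is selected as the duty log disk; because mirror disk pair 1 (P1, M1) holds inconsistent data before T1, namely D1T0, synchronization of pair (P1, M1) is triggered at the start of T1: D1T0 is written to M1, the space holding D1T0 on M0 is released, and the synchronization terminates only after all inconsistent data has been synchronized. Likewise, in log period T2, M2 is selected as the duty log disk, and at the start of T2 synchronization of mirror disk pair 2 (P2, M2) is triggered: D2T0 and D2T1 are written to M2, the space holding D2T0 on M0 and D2T1 on M1 is released, and the synchronization terminates only after all inconsistent data has been synchronized.
Figures 4(b), (c) and (d) show the distribution of key-value data across the disk group at the end of log periods T0, T1 and T2, respectively. DmTn denotes all key-value data written to mirror disk pair m (Pm, Mm) during log period Tn; in this embodiment m is 0, 1 or 2 and n is a natural number greater than or equal to 0. A blank square indicates storage space on a primary or mirror disk that is not yet occupied; a diagonally hatched square indicates storage space on the disk that has been released; a vertically striped square indicates that the key-value data in that logical region of the primary disk has been synchronized into the B+ tree on the mirror disk.
When a new log disk is selected as the duty log disk, a new synchronization process is triggered; it terminates only after all inconsistent data on the duty log disk has been updated. As shown in Figure 4(b), in log period T0, M0 is selected as the duty log disk. When a key-value write request reaches mirror disk pair (P0, M0), the request data is written into the LSM data structure on primary disk P0. If the occupancy of the duty log space on M0 during T0 has not exceeded the preset threshold T, the request data is written sequentially to duty log disk M0 in log-structured file system fashion. If the occupancy exceeds T, M0 is switched to the low-power standby state, M1 is selected as the new duty log disk and switched to the high-power active state, and synchronization of mirror disk pair (P1, M1) is triggered. Likewise, Figures 4(c) and (d) show that, in T1 and T2, the key-value data not yet updated in (P1, M1) and (P2, M2) is read from the LSM on P1 or P2 respectively and written into the B+ tree on M1 or M2. At the end of T1 and T2, log disks M2 and M0 respectively are selected as the new duty log disk.
The solid arrows and the hatched rectangles in Figures 4(c) and (d) illustrate the decentralized synchronization process and the release of log space, respectively. When log disk M1 is selected as the duty log disk, synchronization between primary disk P1 and mirror disk M1 of pair (P1, M1) is triggered. When that synchronization ends, the storage space occupied by D1T0 on M0 is released. Similarly, synchronization of pair (P2, M2) is triggered when M2 is selected as the duty log disk; when it ends, the storage space occupied by D2T0 on M0 and by D2T1 on M1 is released.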
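The interplay of rotation, synchronization and space release can be traced with a few sets. In this sketch a label `(m, n)` stands for DmTn; each mirror disk has a log area and a synchronized (B+ tree) area. As a simplification, synchronization is modeled as completing instantly at the start of each period. The function `simulate` is hypothetical.

```python
def simulate(periods: int, n_pairs: int = 3):
    """Track which DmTn labels sit in each mirror's log and B+ tree areas."""
    logs = [set() for _ in range(n_pairs)]    # log area of each mirror disk
    synced = [set() for _ in range(n_pairs)]  # B+ tree area of each mirror disk
    for n in range(periods):
        duty = n % n_pairs
        # Start of Tn: synchronize pair (P_duty, M_duty) and release log space.
        stale = {label for log in logs for label in log if label[0] == duty}
        synced[duty] |= stale
        for log in logs:
            log -= stale  # in-place set difference frees the released records
        # During Tn: the second copy of every DmTn goes to the duty disk's log.
        logs[duty] |= {(m, n) for m in range(n_pairs)}
    return logs, synced


logs, synced = simulate(periods=3)
```

After three periods only D0T0 remains in the log area of M0, which is consistent with the statement that most of M0's log space has been released by the time its turn comes round again.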
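Because most of the occupied log space on log disk M0 is released during log periods T1 and T2 by the synchronization of mirror disk pairs (P1, M1) and (P2, M2), M0 can be selected again as the duty log disk. Likewise, most of the occupied log space on M1 is released by the synchronization of pairs (P0, M0) and (P2, M2), and most of that on M2 by the synchronization of pairs (P0, M0) and (P1, M1), so M1 and M2 can also be selected again as duty log disks.
In the embodiments of the present invention, to enable the disk array to support the method, the disk array must be initialized in advance, as follows:
During initialization, each primary disk and each standby disk is formatted with a file system of any type; an LSM system, for example the key-value store LevelDB, is created on each primary disk; and each standby disk is divided into two storage areas, with an LFS created in the first and a B+ tree system, for example the key-value store BDB, created in the second. This makes it possible later to write the first copy of the data to be stored to the LevelDB of the primary storage subsystem, to write the second copy to the LFS of the duty standby storage device, and, during synchronization, to write the inconsistent data read out to the BDB of the duty standby storage device.
This embodiment has been described using a disk array as an example, but it should be understood that in other embodiments the storage system is not limited to a disk array. The disks in a disk array may be bare disks or disks that have been formatted with a particular file system. The storage system may also be a node array, and a node array is suited to a multi-node distributed environment.
In addition, the first, second and third data organizations described above can be chosen freely as needed. For example, the first data organization may be LSM, but may also be a B+ tree or another organization; the second may be LFS or a B+ tree; the third may be a B+ tree or LSM. Details are not repeated here.
The above embodiment discloses a data storage method that achieves the following technical effects:
1. Because only the duty standby storage device is in the working state and all other standby storage devices are in a non-working, low-power state, energy consumption is reduced.
2. A file system is created on each primary storage device and each standby storage device, and data is stored on both in a defined data organization, which reflects data-structure-layer characteristics and speeds up rebuilding. If a primary storage device fails, then when data is recovered from the corresponding standby storage device the data organization shows directly which data needs to be recovered, so individual data blocks need not be examined and rebuilding is fast.
3. Data is written to the primary and standby storage devices in different data organizations, which suits a variety of applications and load types. For example, the first and second data organizations are write-optimized, which raises write speed; the third is read-optimized, so that most of the data ultimately held on the standby storage devices is in the third organization and can be read with high performance, meeting the needs of the data analysis stage. In addition, the second data organization writes faster than the first, which avoids a write bottleneck on the duty standby storage device.
4. Data that is stored on the primary storage device corresponding to the duty standby storage device and is inconsistent with that standby device is synchronized to it in the background, using the idle bandwidth and free space of the storage devices, so the data organization is converted without consuming additional energy.
5. Writing data to the primary and standby storage devices in a defined data organization is implemented at the data structure layer, which makes the implementation flexible: it can be realized on block devices or on nodes, for example in a disk array or a node array, and in a single-node multi-disk environment or a multi-node distributed storage environment.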
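To better implement the above solutions of the embodiments, related apparatus for carrying them out is also provided below.
Referring to Figure 5, an embodiment of the present invention provides a data storage device 500. The device is for a storage system comprising a primary storage subsystem and a standby storage subsystem; the primary storage subsystem comprises N primary storage devices, and the standby storage subsystem comprises N standby storage devices corresponding to the N primary storage devices; N is a positive integer greater than 1; a file system is created on each primary storage device and each standby storage device. The device 500 may include:
a first read/write module 510, configured to write the first copy of the data to be stored to multiple primary storage devices in the primary storage subsystem in a first data organization;
a second read/write module 520, configured to write the second copy of the data to be stored to the duty standby storage device in the standby storage subsystem in a second data organization, the duty standby storage device being the only standby storage device in the working state;
a third read/write module 530, configured to read out the data that is stored on the primary storage device corresponding to the duty standby storage device and is inconsistent with the duty standby storage device, and to write the inconsistent data read out to the duty standby storage device in a third data organization;
wherein the first and second data organizations are write-optimized, the second data organization writes faster than the first, and the third data organization is read-optimized.
As shown in Figure 6, in some embodiments of the present invention the device may further include:
a storage device monitoring module 540, configured to determine whether the occupancy of the duty log space of the duty standby storage device has reached a preset value and, if it has, to switch the duty standby storage device to the sleep state and switch another standby storage device to the working state as the duty standby storage device.
As shown in Figure 6, in other embodiments of the present invention the device may further include:
a creating module 550, configured to create the key-value store LevelDB on each primary storage device, and to divide each standby storage device into two storage areas, creating an LFS in the first storage area and the key-value store BDB in the second.
In other embodiments of the present invention, the first read/write module may be specifically configured to divide the first copy of the data to be stored into multiple parts and write the parts to the LevelDB instances on the multiple primary storage devices; the second read/write module may be specifically configured to write the second copy to the LFS of the duty standby storage device; and the third read/write module may be specifically configured to write the inconsistent data read out to the BDB of the duty standby storage device.
The data storage device of the embodiments may be, for example, a computer device that includes a disk array, or a network device that manages a node array.
It can be understood that the functions of the functional modules of the data storage device may be implemented according to the method in the foregoing method embodiments; for the specific implementation, refer to the related description there, which is not repeated here.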
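As can be seen from the above, some feasible implementations of the present invention adopt a technical solution in which a file system is created on each primary storage device and each backup storage device; the first copy of the data to be stored is written to the primary storage subsystem in the first data organization scheme; the second copy is written, in the second data organization scheme, to the on-duty backup storage device in the backup storage subsystem; and data that is stored on the primary storage device corresponding to the on-duty backup storage device and is inconsistent with the on-duty backup storage device is read out and written to the on-duty backup storage device in the third data organization scheme. This solution achieves the following technical effects:
1. Because only the on-duty backup storage device is in the working state and all other backup storage devices are in a non-working, low-power state, energy consumption can be reduced.
2. A file system is created on every primary storage device and every backup storage device, and data is stored on both in a defined data organization scheme. This reflects the characteristics of the data structure layer and increases rebuild speed. If a primary storage device fails, then when data is restored from the corresponding backup storage device, the data organization scheme directly identifies which data needs to be restored, so individual data blocks need not be examined one by one; rebuilding is therefore fast.
3. Data is written to the primary and backup storage devices in different data organization schemes, which accommodates a variety of applications and workload types. For example, the first and second schemes are write-optimized and increase write speed; the third scheme is read-optimized, so that most data on the backup storage devices eventually exists in the third scheme and offers high read performance, meeting the needs of the data analysis phase. Moreover, the second scheme writes faster than the first, which avoids a write bottleneck on the on-duty backup storage device.
4. Data that is stored on the primary storage device corresponding to the on-duty backup storage device and is inconsistent with the on-duty backup storage device is synchronized to the on-duty backup storage device in the background, using the idle bandwidth and idle space of the storage devices. The conversion between data organization schemes is thus achieved without consuming additional energy.
5. Writing data to the primary and backup storage devices in a defined data organization scheme is implemented at the data structure layer, which makes the implementation flexible: it can be built on block devices or on nodes. For example, it can be used with a disk array or a node array, and in a single-node multi-disk environment or a multi-node distributed storage environment.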
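An embodiment of the present invention further provides a computer storage medium. The computer storage medium may store a program which, when executed, performs some or all of the steps of the data storage method described in the foregoing method embodiments.
Referring to FIG. 7, an embodiment of the present invention further provides a computer device 700, which may include:
a processor 710, a memory 720, a communication interface 730, and a bus 740. The processor 710, the memory 720, and the communication interface 730 communicate with one another through the bus 740. The communication interface 730 is configured to receive and send data; the memory 720 is configured to store a program; and the processor 710 is configured to execute the program in the memory. The memory 720 may include a primary storage subsystem and a backup storage subsystem. The primary storage subsystem includes N primary storage devices, and the backup storage subsystem includes N backup storage devices corresponding to the N primary storage devices, where N is a positive integer greater than 1 and a file system is created on each primary storage device and each backup storage device. Both the primary and the backup storage devices may be disks.
The processor 710 is configured to write the first copy of the data to be stored, in the first data organization scheme, to multiple primary storage devices in the primary storage subsystem; to write the second copy of the data to be stored, in the second data organization scheme, to the on-duty backup storage device in the backup storage subsystem, the on-duty backup storage device being the only backup storage device in the working state; and to read out data that is stored on the primary storage device corresponding to the on-duty backup storage device and is inconsistent with the on-duty backup storage device, and write the inconsistent data read out to the on-duty backup storage device in the third data organization scheme. The first and second data organization schemes are write-optimized, the second data organization scheme writes faster than the first, and the third data organization scheme is read-optimized.
In some embodiments of the present invention, the processor 710 is further configured to determine whether the occupancy of the on-duty log space of the on-duty backup storage device has reached a preset value and, if so, to switch the on-duty backup storage device to the sleep state and switch another backup storage device to the working state as the on-duty backup storage device.
In some embodiments of the present invention, the first data organization scheme is the log-structured merge (LSM) scheme; the second data organization scheme is the log-structured file system (LFS) scheme; and the third data organization scheme is the B+ tree scheme.
In some embodiments of the present invention, the processor 710 is further configured to create the key-value store LevelDB on each primary storage device, and to divide each backup storage device into two storage areas, with an LFS created in the first storage area and the key-value store BDB created in the second.
In some embodiments of the present invention, the processor 710 is specifically configured to split the first copy of the data to be stored into multiple parts and write the parts into the LevelDB of multiple primary storage devices respectively; to write the second copy of the data to be stored into the LFS of the on-duty backup storage device; and to write the inconsistent data read out into the BDB of the on-duty backup storage device.
It can be understood that the functions of the functional modules of the computer device of the embodiments of the present invention may be implemented according to the methods in the foregoing method embodiments. For the specific implementation process, refer to the related descriptions in the foregoing method embodiments; details are not repeated here.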
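As can be seen from the above, some feasible implementations of the present invention achieve the following technical effects:
1. Because only the on-duty backup storage device is in the working state and all other backup storage devices are in a non-working, low-power state, energy consumption can be reduced.
2. A file system is created on every primary storage device and every backup storage device, and data is stored on both in a defined data organization scheme. This reflects the characteristics of the data structure layer and increases rebuild speed. If a primary storage device fails, then when data is restored from the corresponding backup storage device, the data organization scheme directly identifies which data needs to be restored, so individual data blocks need not be examined one by one; rebuilding is therefore fast.
3. Data is written to the primary and backup storage devices in different data organization schemes, which accommodates a variety of applications and workload types. For example, the first and second schemes are write-optimized and increase write speed; the third scheme is read-optimized, so that most data on the backup storage devices eventually exists in the third scheme and offers high read performance, meeting the needs of the data analysis phase. Moreover, the second scheme writes faster than the first, which avoids a write bottleneck on the on-duty backup storage device.
4. Data that is stored on the primary storage device corresponding to the on-duty backup storage device and is inconsistent with the on-duty backup storage device is synchronized to the on-duty backup storage device in the background, using the idle bandwidth and idle space of the storage devices. The conversion between data organization schemes is thus achieved without consuming additional energy.
5. Writing data to the primary and backup storage devices in a defined data organization scheme is implemented at the data structure layer, which makes the implementation flexible: it can be built on block devices or on nodes. For example, it can be used with a disk array or a node array, and in a single-node multi-disk environment or a multi-node distributed storage environment.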
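In the foregoing embodiments, the description of each embodiment has its own emphasis. For parts not described in detail in one embodiment, refer to the related descriptions of other embodiments.
It should be noted that, for brevity, the foregoing method embodiments are each described as a series of combined actions. Those skilled in the art should understand, however, that the present invention is not limited by the described order of actions, because according to the present invention certain steps may be performed in another order or simultaneously. Those skilled in the art should also understand that the embodiments described in this specification are all preferred embodiments, and the actions and modules involved are not necessarily required by the present invention.
Those of ordinary skill in the art will understand that all or some of the steps of the methods in the foregoing embodiments may be completed by a program instructing related hardware. The program may be stored in a computer-readable storage medium, which may include a ROM, a RAM, a magnetic disk, an optical disc, or the like.
The data storage method and device provided by the embodiments of the present invention have been described in detail above. Specific examples are used herein to explain the principles and implementations of the present invention, and the description of the above embodiments is intended only to help in understanding the method of the present invention and its core idea. Meanwhile, those of ordinary skill in the art may, based on the idea of the present invention, make changes to the specific implementations and the scope of application. In summary, the content of this specification should not be construed as limiting the present invention.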

Claims (11)

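  1. A data storage method, characterized in that the method is used in a storage system, the storage system comprising a primary storage subsystem and a backup storage subsystem; the primary storage subsystem comprises N primary storage devices, and the backup storage subsystem comprises N backup storage devices corresponding to the N primary storage devices; N is a positive integer greater than 1; and a file system is created on each primary storage device and each backup storage device;
    the method comprises:
    writing a first copy of the data to be stored, in a first data organization scheme, to multiple primary storage devices in the primary storage subsystem, and writing a second copy of the data to be stored, in a second data organization scheme, to the on-duty backup storage device in the backup storage subsystem, the on-duty backup storage device being the only backup storage device in the working state;
    reading out data that is stored on the primary storage device corresponding to the on-duty backup storage device and is inconsistent with the on-duty backup storage device, and writing the inconsistent data read out to the on-duty backup storage device in a third data organization scheme;
    wherein the first data organization scheme and the second data organization scheme are write-optimized, the second data organization scheme writes faster than the first data organization scheme, and the third data organization scheme is read-optimized.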
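  2. The method according to claim 1, characterized by further comprising:
    determining whether the occupancy of the on-duty log space of the on-duty backup storage device has reached a preset value;
    if the occupancy has reached the preset value, switching the on-duty backup storage device to the sleep state, and switching another backup storage device to the working state as the on-duty backup storage device.
  3. The method according to claim 1, characterized in that:
    the first data organization scheme is the log-structured merge (LSM) scheme;
    the second data organization scheme is the log-structured file system (LFS) scheme;
    the third data organization scheme is the B+ tree scheme.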
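  4. The method according to claim 3, characterized in that, before the writing of the first copy of the data to be stored to the primary storage subsystem in the first data organization scheme, the method further comprises:
    creating the key-value store LevelDB on each primary storage device;
    dividing each backup storage device into two storage areas, wherein an LFS is created in the first storage area and the key-value store BDB is created in the second storage area.
  5. The method according to claim 4, characterized in that:
    the writing of the first copy of the data to be stored, in the first data organization scheme, to multiple primary storage devices in the primary storage subsystem comprises: splitting the first copy of the data to be stored into multiple parts, and writing the multiple parts into the LevelDB of multiple primary storage devices respectively;
    the writing of the second copy of the data to be stored, in the second data organization scheme, to the on-duty backup storage device in the backup storage subsystem comprises: writing the second copy of the data to be stored into the LFS of the on-duty backup storage device;
    the writing of the inconsistent data read out to the on-duty backup storage device in the third data organization scheme comprises: writing the inconsistent data read out into the BDB of the on-duty backup storage device.
  6. The method according to any one of claims 1 to 5, characterized in that:
    the storage system is a disk array or a node array.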
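  7. A data storage device, characterized in that the device is used in a storage system, the storage system comprising a primary storage subsystem and a backup storage subsystem; the primary storage subsystem comprises N primary storage devices, and the backup storage subsystem comprises N backup storage devices corresponding to the N primary storage devices; N is a positive integer greater than 1; a file system is created on each primary storage device and each backup storage device; and the device comprises:
    a first read/write module, configured to write a first copy of the data to be stored, in a first data organization scheme, to multiple primary storage devices in the primary storage subsystem;
    a second read/write module, configured to write a second copy of the data to be stored, in a second data organization scheme, to the on-duty backup storage device in the backup storage subsystem, the on-duty backup storage device being the only backup storage device in the working state;
    a third read/write module, configured to read out data that is stored on the primary storage device corresponding to the on-duty backup storage device and is inconsistent with the on-duty backup storage device, and to write the inconsistent data read out to the on-duty backup storage device in a third data organization scheme;
    wherein the first data organization scheme and the second data organization scheme are write-optimized, the second data organization scheme writes faster than the first data organization scheme, and the third data organization scheme is read-optimized.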
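  8. The device according to claim 7, characterized by further comprising:
    a storage device monitoring module, configured to determine whether the occupancy of the on-duty log space of the on-duty backup storage device has reached a preset value and, if the occupancy has reached the preset value, to switch the on-duty backup storage device to the sleep state and switch another backup storage device to the working state as the on-duty backup storage device.
  9. The device according to claim 7, characterized by further comprising:
    a creation module, configured to create the key-value store LevelDB on each primary storage device, and to divide each backup storage device into two storage areas, wherein an LFS is created in the first storage area and the key-value store BDB is created in the second storage area.
  10. The device according to claim 9, characterized in that:
    the first read/write module is specifically configured to split the first copy of the data to be stored into multiple parts and write the multiple parts into the LevelDB of multiple primary storage devices respectively;
    the second read/write module is specifically configured to write the second copy of the data to be stored into the LFS of the on-duty backup storage device;
    the third read/write module is specifically configured to write the inconsistent data read out into the BDB of the on-duty backup storage device.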
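  11. A computer device, characterized by comprising:
    a processor, a memory, a communication interface, and a bus; the processor, the memory, and the communication interface communicate with one another through the bus; the memory comprises a primary storage subsystem and a backup storage subsystem; the primary storage subsystem comprises N primary storage devices, and the backup storage subsystem comprises N backup storage devices corresponding to the N primary storage devices; N is a positive integer greater than 1; and a file system is created on each primary storage device and each backup storage device;
    wherein the processor is configured to write a first copy of the data to be stored, in a first data organization scheme, to multiple primary storage devices in the primary storage subsystem, and to write a second copy of the data to be stored, in a second data organization scheme, to the on-duty backup storage device in the backup storage subsystem, the on-duty backup storage device being the only backup storage device in the working state; and to read out data that is stored on the primary storage device corresponding to the on-duty backup storage device and is inconsistent with the on-duty backup storage device, and write the inconsistent data read out to the on-duty backup storage device in a third data organization scheme; wherein the first data organization scheme and the second data organization scheme are write-optimized, the second data organization scheme writes faster than the first data organization scheme, and the third data organization scheme is read-optimized.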
PCT/CN2015/077214 2014-04-30 2015-04-22 Data storage method and device WO2015165351A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201410182608.0A CN105094761B (zh) 2014-04-30 2014-04-30 Data storage method and device
CN201410182608.0 2014-04-30

Publications (1)

Publication Number Publication Date
WO2015165351A1 true WO2015165351A1 (zh) 2015-11-05

Family

ID=54358160

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2015/077214 WO2015165351A1 (zh) 2014-04-30 2015-04-22 Data storage method and device

Country Status (2)

Country Link
CN (1) CN105094761B (zh)
WO (1) WO2015165351A1 (zh)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
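CN111198783B (zh) * 2018-11-16 2023-04-07 Alibaba Group Holding Ltd. Data access method, apparatus, system, device, and storage medium
CN111309267B (zh) * 2020-02-26 2023-10-03 Guangdong Oppo Mobile Telecommunications Corp., Ltd. Storage space allocation method, apparatus, storage device, and storage medium
CN113704261B (zh) * 2021-08-26 2024-05-24 平凯星辰(北京)科技有限公司 Key-value storage system based on cloud storage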

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090164718A1 (en) * 2007-12-21 2009-06-25 Fujitsu Limited Disk array device control method
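CN101840315A (zh) * 2010-06-17 2010-09-22 Huazhong University of Science and Technology Data organization method for a disk array
CN103645859A (zh) * 2013-11-19 2014-03-19 Huazhong University of Science and Technology Disk array caching method for heterogeneous mirroring of virtual SSD and SSD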

Also Published As

Publication number Publication date
CN105094761A (zh) 2015-11-25
CN105094761B (zh) 2018-06-15

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 15786357

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 15786357

Country of ref document: EP

Kind code of ref document: A1