CN108491290B - Data writing method and device - Google Patents

Data writing method and device

Info

Publication number
CN108491290B
Authority
CN
China
Prior art keywords
blocks
data block
original data
group
vobject
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810264205.9A
Other languages
Chinese (zh)
Other versions
CN108491290A (en)
Inventor
马海兵
陈钊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou H3C Technologies Co Ltd
Original Assignee
Hangzhou H3C Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou H3C Technologies Co Ltd
Priority to CN201810264205.9A
Publication of CN108491290A
Application granted
Publication of CN108491290B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/08 Error detection or correction by redundancy in data representation, e.g. by using checking codes
    • G06F11/10 Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's
    • G06F11/1076 Parity data used in redundant arrays of independent storages, e.g. in RAID systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0638 Organizing or formatting or addressing of data
    • G06F3/064 Management of blocks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Human Computer Interaction (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a data writing method and device. The method comprises the following steps: when a write request for a target file is received, dividing the target file into original data blocks according to a preset block size, and calculating at least one check data block from the original data blocks by using a parity check algorithm; concurrently writing the original data blocks and the check data blocks corresponding to the target file into a plurality of consecutive free blocks in an object group; and mapping each object in the object group to a different object storage device (OSD). By applying the embodiments of the invention, the write penalty problem of Ceph EC can be solved and the data writing performance of Ceph is improved.

Description

Data writing method and device
Technical Field
The present invention relates to the field of data processing technologies, and in particular, to a data writing method and apparatus.
Background
Ceph is a distributed storage system with excellent performance, high reliability and high scalability, and is widely used in large, medium and small storage environments.
Currently, Ceph mainly adopts a multi-copy policy to ensure the reliability of data, but the multi-copy policy reduces the disk utilization of Ceph, for example, the disk utilization of 2 copies is 50%, and the disk utilization of 3 copies is 33%.
In order to improve the utilization rate of the disk, Ceph proposes an Erasure Coding (EC) strategy.
The erasure code algorithm divides the written data into K parts of original data and calculates M parts of check data from the K parts of original data; all the original data can be restored from any K parts of the K + M parts of data.
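The disk utilization figures above can be reproduced with a small calculation. The sketch below is only illustrative: the function names are hypothetical, and the K = 4, M = 2 configuration is an assumed example that does not come from the text.

```python
from fractions import Fraction


def replica_utilization(copies: int) -> Fraction:
    """Usable fraction of raw capacity when every object is stored `copies` times."""
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return Fraction(1, copies)


def erasure_utilization(k: int, m: int) -> Fraction:
    """Usable fraction of raw capacity with K original parts and M check parts."""
    if k < 1 or m < 0:
        raise ValueError("k must be at least 1 and m must not be negative")
    return Fraction(k, k + m)


if __name__ == "__main__":
    for copies in (2, 3):
        print(f"{copies} copies: {float(replica_utilization(copies)):.0%}")
    # K = 4, M = 2 is an assumed configuration, used only for comparison.
    print(f"EC 4+2: {float(erasure_utilization(4, 2)):.0%}")
```

With these assumed values, an erasure-coded layout keeps two thirds of the raw capacity usable, compared with one half or one third for 2 or 3 copies.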
However, practice shows that when part of original data in a stripe is modified in the existing Ceph EC scheme, original data that is not modified in the stripe needs to be read first, check data is recalculated, and the modified data and the newly calculated check block are written into corresponding positions in the stripe, that is, there is a write penalty.
Disclosure of Invention
The invention provides a data writing method and device, which are used to solve the problems that the existing Ceph EC scheme does not support overwriting and has a write penalty.
According to a first aspect of the present invention, there is provided a data writing method applied to a storage node in a distributed object storage system, the method including:
when a write request for a target file is received, dividing the target file into original data blocks according to a preset block size, and calculating at least one check data block from the original data blocks by using a parity check algorithm;
concurrently writing the original data blocks and the check data blocks corresponding to the target file into a plurality of consecutive free blocks in an object group; the object group comprises at least three objects, each object in the object group comprises equal-size blocks, and the plurality of consecutive free blocks belong in sequence to the objects in the object group;
and mapping each object in the object group to a different object storage device OSD.
According to a second aspect of the present invention, there is provided a data writing apparatus applied to a storage node in a distributed object storage system, the apparatus comprising:
a receiving unit configured to receive a write request;
a dividing unit, configured to divide the target file into original data blocks according to a preset block size when the receiving unit receives a write request for the target file;
a check unit, configured to calculate at least one check data block from the original data blocks by using a parity check algorithm;
a writing unit, configured to concurrently write the original data blocks and the check data blocks corresponding to the target file into a plurality of consecutive free blocks in an object group; the object group comprises at least three objects, each object in the object group comprises equal-size blocks, and the plurality of consecutive free blocks belong in sequence to the objects in the object group;
and a mapping unit, configured to map each object in the object group to a different object storage device (OSD).
By applying the technical solution disclosed by the invention, when a write request for a target file is received, the target file is divided into original data blocks according to a preset block size, and at least one check data block is calculated by using a parity check algorithm; the original data blocks and the check data blocks corresponding to the target file are then concurrently written into a plurality of consecutive free blocks in the object group, and each object in the object group is mapped to a different OSD, so that the write penalty problem of Ceph EC is solved and the data writing performance of Ceph is improved.
Drawings
FIG. 1 is a schematic flow chart of a data writing method according to an embodiment of the present invention;
FIG. 2 is a diagram of an object group according to an embodiment of the present invention;
FIGS. 3-7 are schematic diagrams of data write objects according to embodiments of the present invention;
FIG. 8 is a schematic structural diagram of a data writing apparatus according to an embodiment of the present invention;
FIG. 9 is a schematic structural diagram of another data writing apparatus according to an embodiment of the present invention;
FIG. 10 is a schematic structural diagram of another data writing apparatus according to an embodiment of the present invention;
FIG. 11 is a schematic structural diagram of another data writing apparatus according to an embodiment of the present invention;
FIG. 12 is a schematic hardware structure diagram of a data writing apparatus according to an embodiment of the present invention.
Detailed Description
In order to make the technical solutions in the embodiments of the present invention better understood and make the above objects, features and advantages of the embodiments of the present invention more comprehensible, the technical solutions in the embodiments of the present invention are described in further detail below with reference to the accompanying drawings.
Referring to FIG. 1, which is a schematic flow chart of a data writing method according to an embodiment of the present invention, the data writing method may be applied to a storage node in a distributed object storage system (e.g., Ceph, which is used as the example hereinafter). As shown in FIG. 1, the data writing method may include the following steps:
Step 101, when a write request for a target file is received, dividing the target file into original data blocks according to a preset block size, and calculating at least one check data block from the original data blocks by using a parity check algorithm.
In the embodiment of the present invention, in order to improve disk utilization while ensuring data reliability, the RAIDZ (Redundant Array of Independent Disks Z) mechanism of ZFS (Zettabyte File System) may be used to replace the Ceph EC redundancy policy, with block storage provided externally for Ceph through ZFS.
The method for replacing the Ceph EC redundancy policy with the RAIDZ mechanism of ZFS may include, but is not limited to, the following. One option is to replace each vdev (Virtual Device) in ZFS with an RBD (RADOS Block Device, where RADOS is the Reliable Autonomic Distributed Object Store) in Ceph; when data needs to be written, the data to be written is divided into original data blocks according to the preset block size, check data blocks are calculated, and the original data blocks and check data blocks are written into objects in the respective RBDs. Another option is to replace each vdev in ZFS with an object in Ceph; when data needs to be written, the data to be written is divided into original data blocks according to the preset block size, check data blocks are calculated, and the original data blocks and check data blocks are written into the respective objects.
The original data block and the check data block corresponding to the same written data need to be written into different objects, the preset number of objects for writing the original data block and the check data block can form an object group, and the objects in the same object group are divided into blocks according to the preset block size.
The preset number is more than or equal to three, and each object in the object group comprises equal-size blocks.
For example, as shown in FIG. 2, object1 through object n are the numbers of the objects in an object group, and 0 through N are the LBA (Logical Block Address) addresses of the blocks of the objects in the object group. When data needs to be written to the object group, the blocks may be written in ascending order of object number and ascending order of LBA address, that is, first LBA0 of object1, then LBA0 of object2, …, and finally LBA N of object n.
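The fill order described above can be sketched as a generator that walks the objects first and the LBA second. This is a minimal illustration; the function name `write_order` and its parameters are hypothetical.

```python
from typing import Iterator, Tuple


def write_order(num_objects: int, max_lba: int) -> Iterator[Tuple[int, int]]:
    """Yield (object number, LBA) pairs in the order the blocks are filled.

    All objects are visited at one LBA before moving on to the next LBA,
    so consecutive blocks always land in different objects.
    """
    for lba in range(max_lba + 1):
        for obj in range(1, num_objects + 1):
            yield obj, lba


if __name__ == "__main__":
    for obj, lba in write_order(num_objects=3, max_lba=1):
        print(f"object{obj} LBA{lba}")
```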
It should be noted that, in the embodiment of the present invention, as shown in fig. 2, an object group formed by a preset number of objects may be equivalent to M1vdev formed by a preset number of vdev in the ZFS RAIDZ scheme, an addressing manner of each block of an object in the object group may refer to an addressing manner of each block in vdev included in M1vdev, and specific implementation details of the object group responding to the data read/write request are similar to those of the M1vdev responding to the data read/write request, which is not described herein in detail in the embodiment of the present invention.
In an embodiment of the present invention, when a storage node receives a file (hereinafter referred to as a target file) write request, the storage node may divide the target file into data blocks (hereinafter referred to as original data blocks) according to a preset block size. The target file may include newly written data or modified data corresponding to previously written data.
For example, assuming that the predetermined block size is 512K and the target file size is 2M, the target file may be divided into 4 original data blocks.
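The division step can be sketched as follows. The sketch assumes that 512K means 512 KiB and 2M means 2 MiB; the function name is hypothetical, and the handling of a final block shorter than the preset size is left open because the text does not specify it.

```python
from typing import List


def split_into_blocks(data: bytes, block_size: int) -> List[bytes]:
    """Cut `data` into consecutive pieces of at most `block_size` bytes."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


if __name__ == "__main__":
    target_file = bytes(2 * 1024 * 1024)              # a 2 MiB file of zero bytes
    blocks = split_into_blocks(target_file, 512 * 1024)
    print(len(blocks))                                # number of original data blocks
```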
In the embodiment of the invention, after the storage node divides the target file into the original data blocks, at least one check data block can be calculated by utilizing a parity check algorithm according to the original data blocks.
In the embodiment of the present invention, the number of the check data blocks may be one, two, or three.
Preferably, when the number of check data blocks is one, the number of objects in the object group is at least three; when the number of the check data blocks is two, the number of the objects in the object group is at least four; when the number of parity data blocks is three, the number of objects in the object group is at least five.
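For the case of a single check data block, the calculation can be illustrated with byte-wise XOR parity. This is an assumption made for illustration: the text only names "a parity check algorithm", and schemes with two or three check blocks need additional codes that are not shown here. All function names are hypothetical.

```python
from functools import reduce
from typing import Sequence


def xor_blocks(blocks: Sequence[bytes]) -> bytes:
    """Return the byte-wise XOR of equally sized blocks."""
    if not blocks:
        raise ValueError("at least one block is required")
    size = len(blocks[0])
    if any(len(block) != size for block in blocks):
        raise ValueError("all blocks must have the same size")
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def recover_missing(surviving: Sequence[bytes], parity: bytes) -> bytes:
    """Rebuild the one missing original block from the survivors and the parity."""
    return xor_blocks(list(surviving) + [parity])


def min_objects_in_group(check_blocks: int) -> int:
    """Minimum group size stated in the text: 1 -> 3, 2 -> 4, 3 -> 5."""
    if check_blocks not in (1, 2, 3):
        raise ValueError("the text describes one, two or three check data blocks")
    return check_blocks + 2


if __name__ == "__main__":
    d1, d2, d3 = b"\x01\x02", b"\x10\x20", b"\xff\x00"
    p4 = xor_blocks([d1, d2, d3])
    print(recover_missing([d1, d3], p4) == d2)
```

Because XOR is its own inverse, any single lost block of the stripe can be rebuilt from the remaining blocks.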
Step 102, concurrently writing the original data blocks and the check data blocks corresponding to the target file into a plurality of consecutive free blocks in an object group; wherein the plurality of consecutive free blocks belong in sequence to the objects in the object group.
In the embodiment of the present invention, after the storage node determines the original data block and the check data block corresponding to the target file, the original data block and the check data block corresponding to the target file may be written into the object group.
In order to improve data reliability, it is necessary to ensure that the original data block and the check data block corresponding to the target file are located in different objects as much as possible.
Preferably, the sum of the number of original data blocks and check data blocks written each time is less than or equal to the number of objects in the object group.
In the embodiment of the present invention, when the storage node writes the original data block and the check data block corresponding to the target file into the object group, the original data block and the check data block corresponding to the target file may be concurrently written into a plurality of consecutive free blocks in the object group, that is, the entire stripe formed by the original data block and the check data block corresponding to the target file is written at one time.
It should be noted that, in the embodiment of the present invention, because the size of the target file is not fixed, the length of the stripe formed by the original data blocks and the check data blocks corresponding to the target file is also not fixed. In addition, because the original data blocks and the check data blocks corresponding to the target file are written at one time into consecutive free blocks in the object group, the target file is considered successfully written only when every original data block and check data block is successfully written; otherwise the target file is considered completely unwritten. Therefore, when the storage node suddenly loses power, the target file is either in a successfully written state or in an unwritten state, data inconsistency cannot occur, and the consistency of the data is ensured.
The write time of the target file equals the write time of the slowest-written block among the original data blocks and check data blocks corresponding to the target file.
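The all-or-nothing, concurrent behaviour can be sketched with a thread pool and an in-memory store. This is only a model under stated assumptions: `write_stripe`, the `store` dictionary and the `write_block` callback are hypothetical, and staging the results before publishing them is one possible way to obtain the "written or unwritten" property, not necessarily the mechanism used by the described storage node.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Tuple

Slot = Tuple[int, int]  # (object number, LBA)


def write_stripe(store: Dict[Slot, bytes],
                 placements: Dict[Slot, bytes],
                 write_block: Callable[[Slot, bytes], bytes]) -> bool:
    """Write all blocks of one stripe concurrently; publish them only if all succeed."""
    if not placements:
        return True
    with ThreadPoolExecutor(max_workers=len(placements)) as pool:
        futures = {slot: pool.submit(write_block, slot, data)
                   for slot, data in placements.items()}
    # Leaving the `with` block waits for every write, so the elapsed time is
    # governed by the slowest block rather than by the sum of all blocks.
    staged: Dict[Slot, bytes] = {}
    for slot, future in futures.items():
        if future.exception() is not None:
            return False  # nothing is published: the file counts as unwritten
        staged[slot] = future.result()
    store.update(staged)
    return True


if __name__ == "__main__":
    store: Dict[Slot, bytes] = {}
    stripe = {(5, 2): b"P4", (1, 3): b"D1", (2, 3): b"D2", (3, 3): b"D3"}
    print(write_stripe(store, stripe, lambda slot, data: data), sorted(store))
```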
For example, referring to FIG. 3, assume that the object group includes 5 objects (object1 to object5), that the original data blocks corresponding to the target file are D1, D2 and D3, that the check data block is P4, and that the first free block in the object group is the block with address LBA2 in object5. The storage node can then concurrently write P4, D1, D2 and D3 to the block with address LBA2 in object5, the block with address LBA3 in object1, the block with address LBA3 in object2, and the block with address LBA3 in object3, respectively.
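The placement in the FIG. 3 example can be reproduced by walking forward from the first free block and wrapping from the last object to the first object at the next LBA. The function name is hypothetical, and rejecting stripes longer than the group is a design choice of this sketch that follows the stated preference of keeping each stripe within the number of objects.

```python
from typing import List, Tuple

Slot = Tuple[int, int]  # (object number, LBA)


def allocate_consecutive(first_free: Slot, count: int, num_objects: int) -> List[Slot]:
    """Return `count` consecutive free slots, starting at `first_free`."""
    if count > num_objects:
        raise ValueError("a stripe should not be longer than the object group")
    obj, lba = first_free
    slots: List[Slot] = []
    for _ in range(count):
        slots.append((obj, lba))
        if obj == num_objects:
            obj, lba = 1, lba + 1   # wrap to the first object at the next LBA
        else:
            obj += 1
    return slots


if __name__ == "__main__":
    slots = allocate_consecutive(first_free=(5, 2), count=4, num_objects=5)
    for name, (obj, lba) in zip(["P4", "D1", "D2", "D3"], slots):
        print(f"{name} -> object{obj} LBA{lba}")
```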
Correspondingly, in the embodiment of the invention, when data needs to be modified, the storage node only needs to divide the modified data into a plurality of original data blocks according to the preset block size, calculate at least one check data block from the original data blocks by using a parity check algorithm, and then write the resulting original data blocks and check data blocks into a plurality of consecutive free blocks in the object group, thereby avoiding the write penalty caused by reading first and then writing when data is modified.
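The difference between the two update paths can be expressed as a count of block I/O operations. The sketch below is a simplified model with hypothetical function names: it counts only block reads and writes, and it assumes the in-place path behaves as described in the Background section.

```python
from typing import Dict


def in_place_update_io(k: int, m: int, modified: int) -> Dict[str, int]:
    """Block I/O for modifying `modified` of the K original blocks in place.

    The unmodified original blocks are read back to recompute the check
    data, then the modified blocks and the M check blocks are written.
    """
    if not 1 <= modified <= k:
        raise ValueError("modified must be between 1 and k")
    return {"reads": k - modified, "writes": modified + m}


def new_stripe_update_io(modified: int, m: int) -> Dict[str, int]:
    """Block I/O when the modified data is written as a new stripe in free blocks."""
    if modified < 1:
        raise ValueError("modified must be at least 1")
    return {"reads": 0, "writes": modified + m}


if __name__ == "__main__":
    print(in_place_update_io(k=4, m=1, modified=1))
    print(new_stripe_update_io(modified=1, m=1))
```

In this model both paths write the same number of blocks; the saving comes from the reads that the new-stripe path does not need.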
Step 103, mapping each object in the object group to a different OSD.
In the embodiment of the present invention, in order to ensure the reliability of data, each Object in the Object group needs to be mapped to a different OSD (Object-based Storage Device).
Specifically, in the embodiment of the present invention, the storage node may treat the different objects in the same object group as different copies of the same object, and map the different objects in the object group to different OSDs by using the CRUSH (Controlled Replication Under Scalable Hashing) algorithm.
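The requirement that every object of a group lands on a different OSD can be illustrated with a simple deterministic placement function. Note that this is not the CRUSH algorithm: it is a stand-in based on ranking OSDs by a hash of the group identifier, and the function name, identifiers and OSD names are all hypothetical.

```python
import hashlib
from typing import Dict, List


def map_group_to_osds(group_id: str, objects: List[str], osds: List[str]) -> Dict[str, str]:
    """Assign each object of one group to a distinct OSD, deterministically."""
    candidates = sorted(set(osds))
    if len(candidates) < len(objects):
        raise ValueError("need at least as many distinct OSDs as objects")

    def score(osd: str) -> int:
        # Hash the (group, OSD) pair so that each group ranks the OSDs differently.
        digest = hashlib.sha256(f"{group_id}/{osd}".encode()).digest()
        return int.from_bytes(digest[:8], "big")

    ranked = sorted(candidates, key=lambda osd: (score(osd), osd), reverse=True)
    return dict(zip(objects, ranked))


if __name__ == "__main__":
    objects = [f"object{i}" for i in range(1, 6)]
    osds = [f"osd.{i}" for i in range(8)]
    for obj, osd in map_group_to_osds("group1", objects, osds).items():
        print(obj, "->", osd)
```

Ranking the OSDs once per group mirrors the idea of treating the objects of a group like copies of one object: each position in the ranking receives exactly one object.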
It can be seen that, in the method flow shown in FIG. 1, the RAIDZ mechanism of ZFS is used to replace the Ceph EC redundancy policy, Ceph objects are used as the vdevs of RAIDZ in ZFS, an object group is formed from a preset number of objects, and block storage is provided externally for Ceph through ZFS; on the premise of ensuring data reliability and disk utilization, RAIDZ solves the write penalty problem of Ceph EC. In addition, when the target file is written into the object group, concurrent writing improves the data writing performance of Ceph.
In an embodiment of the present invention, the writing the original data block and the check data block corresponding to the target file into the consecutive free blocks in the object group concurrently may include:
if the used capacity of the current object group reaches a first preset threshold value and the total capacity of the allocated object groups does not reach a preset maximum capacity, allocating a new object group, and concurrently writing the original data block and the check data block corresponding to the target file into a plurality of continuous free blocks in the newly allocated object group.
In this embodiment, considering that the capacity of an object supported by Ceph is limited (currently, the maximum object size supported by Ceph is 2G), the capacity of a single object group may be insufficient to meet the data storage requirement; therefore, dynamic capacity expansion needs to be supported during data writing.
Accordingly, in this embodiment, before writing the original data block and the check data block corresponding to the target file, it may be determined whether the used capacity of the current object group reaches a first preset threshold (which may be set according to an actual scene), and whether the total capacity of the allocated object groups reaches a preset maximum capacity (which may be set according to an actual scene).
The current object group may refer to an object group written in the last data writing; for the first data write, the current object group is the initially allocated object group.
In this embodiment, if the used capacity of the current object group reaches the first preset threshold and the total capacity of the allocated object groups does not reach the preset maximum capacity, a new object group is allocated, and the original data block and the check data block corresponding to the target file are concurrently written into a plurality of consecutive free blocks in the newly allocated object group, so as to achieve dynamic balance of data.
For example, still taking the scenario shown in FIG. 3 as an example, assume that when the storage node needs to write the original data blocks and the check data block (i.e., D1, D2, D3 and P4) corresponding to the target file into the object group, it detects that the used capacity of the object group reaches the first preset threshold and that the total capacity of the allocated object groups does not reach the preset maximum capacity. The storage node may then allocate a new object group (i.e., object6 to object10), as shown in FIG. 4. At this point, the storage node can compute weights from the remaining available space of all objects and distribute subsequent data according to the weights to achieve dynamic balance, so that subsequent data are preferentially written into object6 to object10. That is, the storage node can concurrently write P4, D1, D2 and D3 to the block with address LBA0 in object6, the block with address LBA0 in object7, the block with address LBA0 in object8, and the block with address LBA0 in object9, respectively.
Further, in this embodiment, if the used capacity of the current object group reaches the first preset threshold and the total capacity of the allocated object groups reaches the preset maximum capacity, the storage node may concurrently write the original data block and the check data block corresponding to the target file into a plurality of consecutive free blocks in the initially allocated object group in which free blocks exist.
For example, as shown in FIG. 5, assume that when the storage node needs to write the original data blocks and the check data block (i.e., D1, D2, D3 and P4) corresponding to the target file into the object group (object6 to object10), it finds that the used capacity of the current object group reaches the first preset threshold and that the total capacity of the allocated object groups reaches the preset maximum capacity. The storage node may then concurrently write P4, D1, D2 and D3 into a plurality of consecutive free blocks in the initially allocated object group (i.e., object1 to object5); that is, the storage node may concurrently write P4, D1, D2 and D3 to the block with address LBA2 in object5, the block with address LBA3 in object1, the block with address LBA3 in object2, and the block with address LBA3 in object3, respectively.
It should be noted that, in the embodiment of the present invention, if the used capacity of the current object group does not reach the first preset threshold, the storage node may directly and concurrently write the original data block and the check data block corresponding to the target file into a plurality of consecutive free blocks in the current object group.
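The three cases of this embodiment (keep writing to the current group, allocate a new group, or fall back to an earlier group that still has free blocks) can be sketched as a single selection function. The class and function names are hypothetical, capacities are counted in blocks as an assumption, and "initially allocated object group in which free blocks exist" is interpreted here as the earliest allocated group that still has free blocks.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ObjectGroup:
    name: str
    capacity: int      # total blocks in the group (assumed unit)
    used: int = 0      # blocks already written

    def has_free_blocks(self) -> bool:
        return self.used < self.capacity


def choose_group(groups: List[ObjectGroup], current: ObjectGroup,
                 first_threshold: int, max_total_capacity: int,
                 new_group_capacity: int) -> ObjectGroup:
    """Pick the object group that receives the next stripe."""
    if current.used < first_threshold:
        return current                                  # threshold not reached
    if sum(g.capacity for g in groups) < max_total_capacity:
        fresh = ObjectGroup(f"group{len(groups) + 1}", new_group_capacity)
        groups.append(fresh)                            # dynamic capacity expansion
        return fresh
    for group in groups:                                # maximum capacity reached
        if group.has_free_blocks():
            return group
    raise RuntimeError("no free blocks left in any allocated object group")


if __name__ == "__main__":
    groups = [ObjectGroup("group1", capacity=100, used=95)]
    target = choose_group(groups, groups[0], first_threshold=90,
                          max_total_capacity=200, new_group_capacity=100)
    print(target.name)
```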
In another embodiment of the present invention, the Object groups are Virtual Object (Vobject) groups, each Vobject in the Vobject groups includes at least one Object, and the number of objects included in each Vobject is the same;
correspondingly, the writing the original data block and the check data block corresponding to the target file into the consecutive free blocks in the object group concurrently includes:
if the used capacity of the objects with free blocks in the Vobject group reaches a second preset threshold value and the total capacity of the allocated objects does not reach a preset maximum capacity, allocating new objects for each Vobject in the Vobject group, and concurrently writing the original data blocks and the check data blocks corresponding to the target files into a plurality of continuous free blocks of the objects which have free blocks and are initially allocated in the Vobject group.
In this embodiment, to implement dynamic capacity expansion, the concept of Vobject can be introduced, and one Vobject can include one or more objects. A preset number of Vobjects may be formed into a Vobject group; the number of objects included in each Vobject in the Vobject group is the same, and the addressing mode of each block in each Vobject in the Vobject group can be referred to the addressing mode of each block in each vdev included in M1 vdev.
Wherein blocks in multiple objects in the same Vobject are consecutively addressed.
For example, assuming that Vobject1 includes object1 and object2, and that object1 and object2 each include 5 blocks, the addresses of the blocks in object1 and object2 within Vobject1 may be LBA0 to LBA4 and LBA5 to LBA9, respectively.
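Because the blocks of a Vobject are addressed consecutively across its objects, a Vobject address can be translated into an object and an address inside that object by integer division. The sketch below assumes that every object holds the same number of blocks, as in the example above; the function name is hypothetical.

```python
from typing import Sequence, Tuple


def vobject_to_object(objects: Sequence[str], blocks_per_object: int,
                      vobject_lba: int) -> Tuple[str, int]:
    """Translate an LBA inside a Vobject into (object identifier, LBA in that object)."""
    if blocks_per_object <= 0:
        raise ValueError("blocks_per_object must be positive")
    if vobject_lba < 0:
        raise IndexError("address must not be negative")
    index, lba = divmod(vobject_lba, blocks_per_object)
    if index >= len(objects):
        raise IndexError("address lies beyond the objects allocated to this Vobject")
    return objects[index], lba


if __name__ == "__main__":
    vobject1 = ["object1", "object2"]     # each object holds 5 blocks
    for address in (2, 5, 7):
        print(f"LBA{address} ->", vobject_to_object(vobject1, 5, address))
```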
When the volume expansion is needed and the total volume of the Vobject group (i.e. the total volume of the allocated objects in the Vobject group) does not reach the preset maximum volume, a new object can be allocated to each Vobject in the Vobject group.
Accordingly, in this embodiment, before writing the original data blocks and the check data blocks corresponding to the target file, it may be determined whether the used capacity of the objects with free blocks in the Vobject group reaches a second preset threshold (which may be set according to the actual scenario), and whether the total capacity of the allocated objects in the Vobject group reaches a preset maximum capacity (which may be set according to the actual scenario).
If the used capacity of the objects with free blocks in the Vobject group reaches the second preset threshold and the total capacity of the allocated objects does not reach the preset maximum capacity, the storage node may allocate a new object for each Vobject in the Vobject group, and concurrently write the original data blocks and the check data blocks corresponding to the target file into a plurality of consecutive free blocks of the initially allocated objects that have free blocks in the Vobject group.
The second preset threshold may be the same as the first preset threshold, or may be different from the first preset threshold.
For example, as shown in FIG. 6, assume that the Vobject group includes Vobject1 to Vobject5, and that each Vobject has been allocated one object (object1 to object5, belonging to Vobject1 to Vobject5 respectively). Assume further that when the storage node needs to write the original data blocks and the check data block (i.e., D1, D2, D3 and P4) corresponding to the target file into the Vobject group, it detects that the used capacity of the objects with free blocks in the Vobject group reaches the second preset threshold and that the total capacity of the allocated objects does not reach the preset maximum capacity. The storage node may then allocate new objects (object6 to object10, belonging to Vobject1 to Vobject5 respectively, as shown in FIG. 7). At this time, the storage node may concurrently write the original data blocks and the check data block corresponding to the target file into a plurality of consecutive free blocks of the initially allocated objects that have free blocks in the Vobject group; that is, it concurrently writes P4, D1, D2 and D3 to the block with address LBA2 in object5, the block with address LBA3 in object1, the block with address LBA3 in object2, and the block with address LBA3 in object3, respectively.
It should be noted that, in the embodiment of the present invention, when data needs to be written, if the used capacity of the objects with free blocks in the Vobject group does not reach the second preset threshold, or if that used capacity reaches the second preset threshold and the total capacity of the allocated objects reaches the preset maximum capacity, the storage node does not allocate new objects for the Vobjects, but directly and concurrently writes the original data blocks and the check data blocks corresponding to the target file into a plurality of consecutive free blocks of the initially allocated objects that have free blocks in the Vobject group.
Further, in this embodiment, when the storage node performs object mapping, each object in the Vobject group may be mapped to an OSD, wherein objects belonging to different Vobjects are mapped to different OSDs.
Further, in this embodiment, in consideration that the addresses of the data blocks in the Vobject are not completely the same as the addresses in the object, after the storage node concurrently writes the original data block and the check data block corresponding to the target file into a plurality of consecutive free blocks of the object in the Vobject group, where the free blocks exist and are initially allocated, the method may further include:
recording the identifiers and addresses of the Vobjects to which the blocks holding the original data blocks and the check data blocks belong, and recording the mapping between those Vobject identifiers and addresses and the identifiers and addresses of the objects to which the blocks holding the original data blocks and the check data blocks belong.
For example, taking the scenario shown in FIG. 7 as an example, when the identifier of the Vobject to which the block holding a certain data block belongs is Vobject1 and the address is LBA2, the identifier and address of the corresponding object are object1 and LBA2 (i.e., the third block in the first object of Vobject1); when the identifier of the Vobject to which the block holding a certain data block belongs is Vobject3 and the address is LBA5, the identifier and address of the corresponding object are object6 and LBA0 (i.e., the first block in the second object of Vobject3).
Correspondingly, in the subsequent flow, when data needs to be read, the storage node may first look up the identifiers and addresses of the Vobjects to which the blocks holding the original data blocks and the check data blocks of the data to be read belong. Then, according to those Vobject identifiers and addresses, it queries the recorded mapping between Vobject identifiers and addresses and object identifiers and addresses to determine the identifiers and addresses of the objects to which those blocks belong, and reads the data according to the object identifiers and addresses.
As can be seen from the above description, in the technical solution provided in the embodiment of the present invention, when a write request for a target file is received, the target file is divided into original data blocks according to a preset block size, and at least one check data block is calculated by using a parity check algorithm; the original data blocks and the check data blocks corresponding to the target file are then concurrently written into a plurality of consecutive free blocks in the object group, and each object in the object group is mapped to a different OSD, so that the write penalty problem of Ceph EC is solved and the data writing performance of Ceph is improved.
Fig. 8 is a schematic structural diagram of a data writing device according to an embodiment of the present invention. The data writing device may be applied to the storage node in the foregoing method embodiment. As shown in fig. 8, the data writing device may include:
a receiving unit 810, configured to receive a write request;
a dividing unit 820, configured to, when the receiving unit 810 receives a write request for a target file, divide the target file into original data blocks according to a preset block size;
a checking unit 830 for calculating at least one check data block using a parity check algorithm according to the original data block;
a writing unit 840, configured to concurrently write the original data blocks and check data blocks corresponding to the target file into a plurality of consecutive free blocks in an object group; the object group comprises at least three objects, each object in the object group comprises blocks of equal size, and the consecutive free blocks belong in turn to the objects in the object group;
and a mapping unit 850 for mapping each object in the object group to a different object storage device OSD.
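The placement performed by the writing unit can be sketched as below, under stated assumptions. Consecutive free blocks are taken to belong to the objects of the group in round-robin order, so that neighbouring blocks of one file land on different objects and therefore on different OSDs. The in-memory dictionary `store` merely stands in for the OSD writes, and `place_blocks` and `write_concurrently` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

Placement = Tuple[int, int]  # (object index within the group, LBA within the object)


def place_blocks(first_free: int, count: int, num_objects: int) -> List[Placement]:
    """Map consecutive free block numbers onto the objects of a group in turn."""
    if num_objects < 3:
        raise ValueError("an object group comprises at least three objects")
    return [(n % num_objects, n // num_objects)
            for n in range(first_free, first_free + count)]


def write_concurrently(blocks: List[bytes], first_free: int, num_objects: int,
                       store: Dict[Placement, bytes]) -> List[Placement]:
    """Write all blocks in parallel; the dict assignment stands in for an OSD write."""
    placements = place_blocks(first_free, len(blocks), num_objects)

    def write_one(item: Tuple[Placement, bytes]) -> None:
        placement, payload = item
        store[placement] = payload

    with ThreadPoolExecutor(max_workers=num_objects) as pool:
        # list() forces completion and re-raises any exception from a worker.
        list(pool.map(write_one, zip(placements, blocks)))
    return placements


store: Dict[Placement, bytes] = {}
# Three original data blocks followed by one check data block.
print(write_concurrently([b"d0", b"d1", b"d2", b"p0"], 0, 3, store))
```

Because each placement targets a distinct (object, LBA) pair, the parallel writes never touch the same block, which is what allows them to proceed without coordination in this sketch.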
Fig. 9 is a schematic structural diagram of another data writing device provided in the embodiment of the present application. As shown in fig. 9, on the basis of the data writing device shown in fig. 8, the data writing device shown in fig. 9 may further include:
a first allocation unit 860, configured to allocate a new object group if the used capacity of the current object group reaches a first preset threshold and the total capacity of the allocated object groups does not reach a preset maximum capacity;
the writing unit 840 is specifically configured to write the original data block and the check data block into a plurality of consecutive free blocks in the newly allocated object group concurrently.
In an optional embodiment, the writing unit 840 is further configured to, if the used capacity of the current object group reaches the first preset threshold and the total capacity of the allocated object groups reaches the preset maximum capacity, concurrently write the original data block and the check data block into a plurality of consecutive free blocks in the initially allocated object group in which free blocks exist.
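The group selection described for the first allocation unit and the writing unit can be sketched as a small decision function. This is an illustration under assumptions: the first preset threshold is treated as a fraction of a group's capacity, the most recently allocated group is treated as the current group, and "initially allocated" is read as the earliest allocated group that still has free blocks. `ObjectGroup` and `select_group` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ObjectGroup:
    capacity: int   # total number of blocks in the group
    used: int = 0   # number of blocks already written

    def has_free(self) -> bool:
        return self.used < self.capacity


def select_group(groups: List[ObjectGroup], threshold: float,
                 max_total: int, group_capacity: int) -> ObjectGroup:
    """Pick the object group that the next write should target."""
    current = groups[-1]  # assumption: the newest group is the current one
    if current.used < threshold * current.capacity:
        return current
    total = sum(group.capacity for group in groups)
    if total < max_total:
        # Threshold reached and capacity budget not exhausted: allocate a new group.
        new_group = ObjectGroup(capacity=group_capacity)
        groups.append(new_group)
        return new_group
    # Budget exhausted: fall back to the earliest group that still has free blocks.
    for group in groups:
        if group.has_free():
            return group
    raise RuntimeError("no free blocks in any allocated object group")


groups = [ObjectGroup(capacity=10, used=9)]
chosen = select_group(groups, threshold=0.9, max_total=20, group_capacity=10)
print(len(groups), chosen is groups[1])
```

Reusing free blocks in earlier groups only after the maximum capacity is reached keeps new writes in fresh, contiguous space for as long as the capacity budget allows.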
In an optional embodiment, the object groups are virtual object (Vobject) groups, each Vobject in a Vobject group includes at least one object, and every Vobject includes the same number of objects.
Accordingly, fig. 10 is a schematic structural diagram of another data writing device provided in the embodiment of the present application. As shown in fig. 10, on the basis of the data writing device shown in fig. 8, the data writing device shown in fig. 10 may further include:
a second allocating unit 870, configured to allocate a new object to each Vobject in the Vobject group if the used capacity of the objects having free blocks in the Vobject group reaches a second preset threshold and the total capacity of the allocated objects does not reach a preset maximum capacity;
the writing unit 840 is specifically configured to concurrently write the original data block and the check data block into a plurality of consecutive free blocks of the objects that have free blocks and are initially allocated in the Vobject group;
the mapping unit 850 is specifically configured to map each object in the Vobject group to a different OSD; wherein objects belonging to different Vobjects are mapped to different OSD.
Fig. 11 is a schematic structural diagram of another data writing device provided in the embodiment of the present application. As shown in fig. 11, on the basis of the data writing device shown in fig. 10, the data writing device shown in fig. 11 may further include:
a recording unit 880, configured to record the identifiers and addresses of the Vobjects to which the blocks holding the original data blocks and the check data blocks belong, and to record the mapping relationship between those Vobject identifiers and addresses and the identifiers and addresses of the objects to which the same blocks belong.
Fig. 12 is a schematic hardware structure diagram of a data writing apparatus according to an embodiment of the present invention. The data writing apparatus may include a processor 1201, a machine-readable storage medium 1202 having machine-executable instructions stored thereon. The processor 1201 and the machine-readable storage medium 1202 may communicate via a system bus 1203. Also, the processor 1201 may perform the data writing method described above by reading and executing machine executable instructions in the machine readable storage medium 1202 corresponding to the data writing logic.
The machine-readable storage medium 1202 of embodiments of the present invention may be any electronic, magnetic, optical, or other physical storage device that can contain or store information such as executable instructions, data, and the like. For example, the machine-readable storage medium may be: RAM (Random Access Memory), a volatile memory, a non-volatile memory, a flash memory, a storage drive (e.g., a hard drive), a solid state drive, any type of storage disk (e.g., an optical disk, a DVD, etc.), or a similar storage medium, or a combination thereof.
The implementation process of the functions and actions of each unit in the above device is specifically described in the implementation process of the corresponding step in the above method, and is not described herein again.
For the device embodiments, since they substantially correspond to the method embodiments, reference may be made to the partial description of the method embodiments for relevant points. The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the scheme of the invention. One of ordinary skill in the art can understand and implement it without inventive effort.
As can be seen from the above embodiments, when a write request for a target file is received, the target file is divided into original data blocks according to a preset block size, and at least one check data block is calculated from the original data blocks by using a parity check algorithm. The original data blocks and check data blocks corresponding to the target file are then concurrently written into a plurality of consecutive free blocks in the object group, and each object in the object group is mapped to a different OSD, so that the write penalty problem of Ceph EC is solved and the data writing performance of Ceph is improved.
Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims.
It will be understood that the invention is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the invention is limited only by the appended claims.

Claims (10)

1. A data writing method is applied to a storage node in a distributed object storage system, and is characterized by comprising the following steps:
when a write-in request aiming at a target file is received, dividing the target file into original data blocks according to the size of a preset block, and calculating at least one check data block by utilizing a parity check algorithm according to the original data blocks;
writing the original data block and the check data block corresponding to the target file into a plurality of consecutive free blocks in an object group concurrently; the object group comprises at least three objects, each object in the object group comprises blocks of equal size, and the consecutive free blocks belong in turn to the objects in the object group;
and mapping each object in the object group to a different object storage device OSD.
2. The method according to claim 1, wherein the writing the original data block and the check data block corresponding to the target file into a plurality of consecutive free blocks in an object group concurrently comprises:
if the used capacity of the current object group reaches a first preset threshold value and the total capacity of the allocated object groups does not reach a preset maximum capacity, allocating a new object group, and concurrently writing the original data block and the check data block into a plurality of continuous free blocks in the newly allocated object groups.
3. The method according to claim 2, wherein the writing the original data block and the check data block corresponding to the target file into a plurality of consecutive free blocks in an object group concurrently further comprises:
and if the used capacity of the current object group reaches a first preset threshold value and the total capacity of the allocated object groups reaches a preset maximum capacity, writing the original data block and the check data block into a plurality of continuous free blocks in the initially allocated object groups in a concurrent manner.
4. The method according to claim 1, wherein the object groups are virtual object Vobject groups, each Vobject in the Vobject groups comprises at least one object, and the number of objects included in each Vobject is the same;
the writing, concurrently, an original data block and a check data block corresponding to the target file into a plurality of consecutive free blocks in an object group includes:
if the used capacity of the objects with free blocks in the Vobject group reaches a second preset threshold value and the total capacity of the allocated objects does not reach a preset maximum capacity, allocating a new object for each Vobject in the Vobject group, and concurrently writing the original data block and the check data block into a plurality of consecutive free blocks of the initially allocated objects in which free blocks exist in the Vobject group;
the mapping each object in the object group to a different OSD includes:
mapping each object in the Vobject group to a different OSD; wherein objects belonging to different Vobjects are mapped to different OSD.
5. The method according to claim 4, wherein after the writing the original data blocks and the check data blocks corresponding to the target file into the consecutive free blocks of the initially allocated object, in which free blocks exist in the Vobject group, concurrently, further comprises:
recording the identifiers and the addresses of the Vobjects to which the blocks where the original data blocks and the check data blocks are located belong, and recording the mapping relationship between the identifiers and the addresses of the Vobjects to which the blocks where the original data blocks and the check data blocks are located belong and the identifiers and the addresses of the objects to which those blocks belong.
6. A data writing apparatus applied to a storage node in a distributed object storage system, the apparatus comprising:
a receiving unit configured to receive a write request;
the dividing unit is used for dividing the target file into original data blocks according to the size of a preset block when the receiving unit receives a writing request aiming at the target file;
a check unit for calculating at least one check data block using a parity check algorithm according to the original data block;
a writing unit, configured to concurrently write an original data block and a check data block corresponding to the target file into a plurality of consecutive free blocks in an object group; the object group comprises at least three objects, each object in the object group comprises blocks of equal size, and the consecutive free blocks belong in turn to the objects in the object group;
and the mapping unit is used for mapping each object in the object group to different object storage devices OSD.
7. The apparatus of claim 6, further comprising:
a first allocation unit, configured to allocate a new object group if the used capacity of the current object group reaches a first preset threshold and the total capacity of the allocated object groups does not reach a preset maximum capacity;
the writing unit is specifically configured to concurrently write the original data block and the check data block into a plurality of consecutive free blocks in the newly allocated object group.
8. The apparatus of claim 7,
the writing unit is further configured to, if the used capacity of the current object group reaches a first preset threshold and the total capacity of the allocated object groups reaches a preset maximum capacity, concurrently write the original data block and the check data block into a plurality of consecutive free blocks in the initially allocated object group in which free blocks exist.
9. The apparatus according to claim 6, wherein the object groups are virtual object Vobject groups, each Vobject in the Vobject groups comprises at least one object, and the number of objects included in each Vobject is the same;
the device further comprises:
a second allocating unit, configured to allocate a new object to each Vobject in the Vobject group if the used capacity of the objects having free blocks in the Vobject group reaches a second preset threshold and the total capacity of the allocated objects does not reach a preset maximum capacity;
the writing unit is specifically configured to concurrently write the original data block and the check data block into a plurality of consecutive free blocks of the initially allocated object, where the free blocks exist in the Vobject group;
the mapping unit is specifically configured to map each object in the Vobject group to a different OSD; wherein objects belonging to different Vobjects are mapped to different OSD.
10. The apparatus of claim 9, further comprising:
and a recording unit, configured to record the identifiers and the addresses of the Vobjects to which the blocks where the original data blocks and the check data blocks are located belong, and to record the mapping relationship between the identifiers and the addresses of the Vobjects to which the blocks where the original data blocks and the check data blocks are located belong and the identifiers and the addresses of the objects to which those blocks belong.
CN201810264205.9A 2018-03-28 2018-03-28 Data writing method and device Active CN108491290B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810264205.9A CN108491290B (en) 2018-03-28 2018-03-28 Data writing method and device

Publications (2)

Publication Number Publication Date
CN108491290A CN108491290A (en) 2018-09-04
CN108491290B true CN108491290B (en) 2021-07-23

Family

ID=63316454

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810264205.9A Active CN108491290B (en) 2018-03-28 2018-03-28 Data writing method and device

Country Status (1)

Country Link
CN (1) CN108491290B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111936960B (en) 2018-12-25 2022-08-19 华为云计算技术有限公司 Data storage method and device in distributed storage system and computer program product
CN113315801B (en) * 2020-06-08 2024-06-04 阿里巴巴集团控股有限公司 Method and system for storing block chain data
CN113806314B (en) * 2020-06-15 2024-01-26 中移(苏州)软件技术有限公司 Data storage method, device, computer storage medium and system
CN114968653B (en) * 2022-07-14 2022-11-11 麒麟软件有限公司 Method for determining RAIDZ check value of ZFS file system

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102063270A (en) * 2010-12-28 2011-05-18 成都市华为赛门铁克科技有限公司 Write operation method and device
CN103699494A (en) * 2013-12-06 2014-04-02 北京奇虎科技有限公司 Data storage method, data storage equipment and distributed storage system
CN106708439A (en) * 2016-12-23 2017-05-24 深圳市中博科创信息技术有限公司 Node selection and calculation method and system in distributed file system

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9537511B2 (en) * 2013-11-06 2017-01-03 Cypress Semiconductor Corporation Methods, circuits, systems and computer executable instruction sets for providing error correction of stored data and data storage devices utilizing same

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant