CN113176858B - Data processing method, storage system and storage device - Google Patents

Data processing method, storage system and storage device

Info

Publication number
CN113176858B
Authority
CN
China
Prior art keywords
storage
data
blocks
data blocks
stripes
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110495606.7A
Other languages
Chinese (zh)
Other versions
CN113176858A (en)
Inventor
彭飞 (Peng Fei)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ruijie Networks Co Ltd
Original Assignee
Ruijie Networks Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ruijie Networks Co Ltd
Priority to CN202110495606.7A
Publication of CN113176858A
Application granted
Publication of CN113176858B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F 3/0601 Interfaces specially adapted for storage systems
    • G06F 3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F 3/061 Improving I/O performance
    • G06F 3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F 3/0638 Organizing or formatting or addressing of data
    • G06F 3/064 Management of blocks
    • G06F 3/0644 Management of space entities, e.g. partitions, extents, pools
    • G06F 3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F 3/0671 In-line storage system
    • G06F 3/0683 Plurality of storage devices
    • G06F 3/0689 Disk arrays, e.g. RAID, JBOD
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F 16/23 Updating
    • G06F 16/2365 Ensuring data consistency and integrity
    • G06F 16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F 16/284 Relational databases

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Human Computer Interaction (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Security & Cryptography (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

An embodiment of the present application provides a data processing method, a storage system, and a storage device. The method comprises the following steps: acquiring a plurality of data blocks; when the number of data blocks reaches a set value, determining an idle storage segment, where the storage segment comprises a plurality of stripes, each stripe is determined from the division result of one storage block, different stripes correspond to different storage blocks, and the set value represents the number of data blocks required to fill the storage segment; filling part of the stripes with corresponding content based on the data blocks; determining check data from the data blocks; filling the check data into the remaining stripes; and performing distributed storage of the filled storage segment according to the correspondence between stripes and storage blocks, so that the content of each stripe in the storage segment is stored in its corresponding storage block. When the technical scheme of the application is adopted for data storage, the write penalty is effectively reduced and the update-inconsistency problem is solved.

Description

Data processing method, storage system and storage device
Technical Field
The present application relates to the field of storage technologies, and in particular, to a data processing method, a storage system, and a storage device.
Background
With the development of network technology and information processing technology, personal and enterprise data are expanding explosively, which has made distributed storage systems a common form of data storage. In a distributed storage system, however, node failure is the normal state. To keep data highly available when a node fails, existing distributed storage systems usually store data redundantly; the two main redundancy modes in current use are multi-copy storage and erasure coding. Erasure coding offers high storage efficiency and low storage-space overhead, but existing erasure-code schemes suffer from write penalties and update inconsistency.
Disclosure of Invention
In view of the above, the present application provides a data processing method, a storage system, and a storage device that solve, or at least partially solve, the above problems.
In one embodiment of the present application, a data processing method is provided. The method comprises the following steps:
acquiring a plurality of data blocks;
when the number of data blocks reaches a set value, determining an idle storage segment; the storage segment comprises a plurality of stripes, each stripe is determined from the division result of one storage block, different stripes correspond to different storage blocks, and the set value represents the number of data blocks required to fill the storage segment;
filling part of the stripes with corresponding content based on the plurality of data blocks;
determining check data from the data blocks;
filling the check data into the remaining stripes;
and performing distributed storage of the filled storage segment according to the correspondence between stripes and storage blocks, so that the content of each stripe in the storage segment is stored in its corresponding storage block.
In one embodiment of the present application, a storage system is provided. The system comprises:
a storage pool comprising a plurality of storage disks, each storage disk having a plurality of storage blocks;
a distributed storage module, configured to: acquire a plurality of data blocks; when the number of data blocks reaches a set value, determine an idle storage segment, where the storage segment comprises a plurality of stripes, each stripe is determined from the division result of one storage block, different stripes correspond to different storage blocks, and the set value represents the number of data blocks required to fill the storage segment; fill part of the stripes with corresponding content based on the data blocks; determine check data from the data blocks; fill the check data into the remaining stripes; and perform distributed storage of the filled storage segment according to the correspondence between stripes and storage blocks, so that the content of each stripe is stored in its corresponding storage block.
In one embodiment of the present application, a storage device is provided. The storage device includes a memory and a processor; the memory is configured to store one or more computer instructions which, when executed by the processor, implement the steps of the data processing method described above.
According to the technical scheme provided by the embodiments of the application, an idle storage segment is determined only after the number of acquired data blocks reaches a set value, where the storage segment comprises a plurality of stripes, each stripe is determined from the division result of one storage block, different stripes correspond to different storage blocks, and the set value represents the number of data blocks required to fill the storage segment; part of the stripes are then filled with corresponding content based on the data blocks, check data is determined from the data blocks and filled into the remaining stripes, and the filled storage segment is stored distributedly according to the correspondence between stripes and storage blocks, so that each stripe is stored in its corresponding storage block. Because the data blocks and the check data are written to their storage blocks one full storage segment at a time, with no reads of existing data blocks or check data involved along the way, the write penalty is effectively reduced; and because the stripes are filled only once the required number of data blocks has accumulated, the drop in random-write performance caused by writing small pieces of data directly to disk is avoided.
In addition, the scheme of the present application updates data by append writing, so the update-inconsistency problem can also be effectively solved; for how append writing solves the update-inconsistency problem, see the relevant description below.
Drawings
In order to illustrate the embodiments of the present application or the prior-art solutions more clearly, the drawings needed in their description are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present application; those skilled in the art can derive other drawings from them without creative effort.
FIG. 1 is a schematic structural diagram of a distributed erasure-code-based storage system according to an embodiment of the present application;
FIG. 2 is a schematic flowchart of a data processing method according to an embodiment of the present application;
FIG. 3a is a schematic diagram of the specific data format corresponding to the stripes included in a storage segment according to an embodiment of the present application;
FIG. 3b is a schematic diagram of the data format corresponding to the header area of a first-type stripe among the stripes included in a storage segment according to an embodiment of the present application;
FIG. 3c is a schematic diagram of the data format corresponding to a segment header included in one of the at least one first-type stripe of a storage segment according to an embodiment of the present application;
FIG. 4 is a schematic diagram illustrating the principle of writing data by appending according to an embodiment of the present application;
FIG. 5 is a block diagram of a storage system according to another embodiment of the present application;
FIG. 6 is a block diagram of a data processing apparatus according to an embodiment of the present application;
FIG. 7 is a block diagram of a storage device according to an embodiment of the present application.
Detailed Description
Before explaining the schemes provided by the embodiments of the present application, related terms referred to in the present application will be briefly described.
Multi-copy storage technology: a piece of original data is fully copied and stored as multiple replicas. In a distributed storage system, this means one piece of original data is stored in full on several storage server nodes (storage nodes for short); for example, N-copy storage stores one piece of user data in full on N storage nodes, so that when a storage node fails the user data can be recovered from the other storage nodes, guaranteeing data reliability. The technique tolerates at most N-1 storage node failures, and its storage space utilization is 1/N.
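As a sanity check on the figures above, the N-copy failure-tolerance and utilization arithmetic can be written out directly (a trivial sketch; the function name is illustrative, not from the patent):

```python
# N-copy replication: N full copies tolerate N - 1 node failures and
# achieve a storage space utilization of 1/N.
def replication_stats(n: int) -> tuple[int, float]:
    """Return (max tolerated node failures, storage space utilization)."""
    return n - 1, 1 / n

print(replication_stats(3))   # three copies: tolerate 2 failures, use 1/3 of raw space
```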
erasure Coding storage technology (EC): the K + M erasure code storage technology (EC (K + M) for short) is to divide original data into K original data blocks, encode the K original data blocks into M check blocks, and store the K + M data blocks (including the original data blocks and the check blocks) in different storage nodes of the storage system to form a stripe with consistency, where no more than M arbitrary data blocks are lost or damaged, the original K original data blocks can be recovered by the remaining other data blocks that are not lost or damaged, that is, the maximum number of data blocks that the storage system tolerates to be lost or damaged is M, and the storage space utilization rate is M
Figure BDA0003054286610000041
In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application.
Some of the flows described in the specification, claims, and drawings of the present application contain operations that occur in a particular order, but those operations may be executed out of the order in which they appear herein, or in parallel. The sequence numbers of the operations (e.g., 101, 102) merely distinguish the operations from one another and do not by themselves represent any execution order. The flows may also include more or fewer operations, and these operations may be executed sequentially or in parallel. It should be noted that descriptions such as "first" and "second" herein are used to distinguish different messages, devices, modules, etc.; they do not represent an order, nor do they require "first" and "second" to be of different types. The term "and/or" here merely describes an association between objects, meaning three relationships may exist: A and/or B means that A may exist alone, A and B may exist together, or B may exist alone. The character "/" generally indicates an "or" relationship between the objects before and after it. In addition, the embodiments described below are only some, not all, of the embodiments of the present application; all other embodiments obtained by those skilled in the art without creative effort fall within the protection scope of the present application.
With the development of network technology and information processing technology, personal and enterprise data are expanding explosively, and a traditional centralized storage server cannot meet the demands of large-scale data storage. A distributed storage system spreads the storage load across multiple storage servers, which improves the reliability, availability, and storage efficiency of the system and makes it easy to scale. However, most existing distributed storage systems achieve this with multi-copy storage, which wastes considerable storage space and raises storage cost. Taking three copies as an example: user data must be stored in full on three storage nodes, so the amount of data the whole system can hold is only 1/3 of its raw capacity, i.e., the disk storage utilization is 1/3 while the number of disks allowed to fail is 2.
Compared with multi-copy storage, erasure coding (EC for short) has higher storage efficiency and occupies less storage space. For example, with EC(4+2) the user data is divided into 4 original data blocks, which are encoded to produce 2 check blocks; the 4+2 blocks (original data blocks plus check blocks) are then stored on different storage nodes to form a consistency stripe. The system tolerates at most 2 damaged blocks, and the storage space utilization is 2/3. EC therefore achieves markedly higher space utilization than multi-copy storage under the same fault-redundancy conditions. However, although the redundant check blocks guarantee data reliability, they must change whenever the original data blocks change, which gives existing EC schemes a significant limitation: writing new data incurs extra overhead, known as the write penalty; that is, a single data write requires multiple reads and writes. The problem is most pronounced when data is modified: a single write operation must read out the old original data block and the old check blocks to be modified, compute the new check blocks from them together with the new data block to be written, and finally overwrite the old blocks with the new ones. For example, when EC(2+1) is used, 1 data write requires 2 reads (the old original data block and the old check block) and 2 writes (the new original data block and the new check block), so the write penalty is 4. When EC(4+2) is used there are two check blocks, so compared with EC(2+1) one more check block must be read and one more written, giving a write penalty of 6. The existence of the write penalty inevitably degrades the storage performance of the system. It should be noted that the new check block can be computed as: new check block = old original data block XOR new original data block XOR old check block, where XOR denotes the exclusive-or operation.
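The read-modify-write accounting above can be sketched as follows. The parity update uses the stated formula (new check block = old data XOR new data XOR old check block, which holds for XOR-style parity), and the function and names are hypothetical:

```python
# Read-modify-write update of one data block under XOR-style parity, counting
# the I/Os that make up the write penalty described in the text.

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def update_block(old_data: bytes, new_data: bytes, old_parities: list[bytes]):
    """Return (new check blocks, total I/O count for this one logical write)."""
    delta = xor(old_data, new_data)
    # new check block = old data XOR new data XOR old check block
    new_parities = [xor(delta, p) for p in old_parities]
    reads = 1 + len(old_parities)     # old data block + each old check block
    writes = 1 + len(old_parities)    # new data block + each new check block
    return new_parities, reads + writes

# EC(2+1): one check block -> write penalty 4
_, penalty = update_block(b"\x01", b"\x02", [b"\x03"])
print(penalty)  # 4

# EC(4+2): two check blocks -> write penalty 6
_, penalty = update_block(b"\x01", b"\x02", [b"\x03", b"\x04"])
print(penalty)  # 6
```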
In addition, existing EC technology suffers from an update-inconsistency problem. In EC(K+M), the K+M blocks (original data blocks and check blocks) distributed on different storage nodes form a stripe with consistency; when blocks in the stripe are lost, any K of the remaining original data blocks and check blocks suffice to recover the original K data blocks through the corresponding reconstruction algorithm. When one or more data blocks in a stripe must be modified, the system typically reads the corresponding check blocks, recomputes them from the new data blocks, and finally writes the new data blocks and check blocks at the same time. If a cluster fault (such as a system crash or power failure) occurs during these simultaneous writes, some original data blocks or check blocks may have been modified while others have not, or a single block may be only partially modified, leaving the data blocks and check blocks of the stripe inconsistent; this is the update-inconsistency problem. To solve the above problems, the embodiments of the present application propose a data processing method applicable to any storage system (such as the one shown in fig. 1) or storage device managed in stripes, for example a storage array composed of solid state disks (SSD), a single SSD, or a storage array composed of shingled magnetic recording (SMR) disks; no particular limitation is imposed here.
A stripe is a way of merging multiple disks (or the multiple storage media of a single disk, or the multiple block units making up one storage block of a disk) into one volume. It can be understood as a set of location-related blocks on two or more partitions of a disk array (or of a single disk, or of a storage block of a disk); each such block may also be called a stripe unit (SU), i.e., a stripe consists of multiple SUs. A stripe is the concrete result of striped management of storage space, as is familiar to those skilled in the art. In the embodiments of the present application, a stripe consists of multiple block units of a storage block (hereinafter referred to as data areas), as described in detail below.
The following describes a storage system to which the data processing method provided in the embodiments of the present application is applicable. Referring to fig. 1, a schematic structural diagram of a distributed erasure-code-based storage system according to an embodiment of the present application is shown. As shown in fig. 1, the storage system includes a plurality of storage nodes, such as storage node A, storage node B, and storage node C, which may be (but are not limited to) servers or workstations; the storage nodes can communicate with one another via InfiniBand, Ethernet, etc. Each storage node may include several storage disks (the disks shown in the figure); for example, storage node A includes storage disks A1, A2, and A3. These disks include, but are not limited to: mechanical hard disk drives (HDD), solid state drives (SSD), storage class memory (SCM), and shingled magnetic recording (SMR) disks. In practical applications, the number of storage nodes in the system and the number of disks per node can be increased according to actual requirements, which this embodiment does not limit. The storage system manages the storage resources in the system (storage nodes and storage disks) centrally and, on receiving data sent by a writer, allocates corresponding storage resources for it. In a specific implementation, the storage resources are divided into blocks and the resulting blocks are organized according to the erasure code type, thereby realizing management and allocation of storage resources, specifically through the following process:
Step 1, establishing a storage pool over the storage resources in the storage system;
in particular, a storage pool may be formed by selecting portions of storage disks from a plurality of storage disks included under each storage node. The number of storage disks under the same storage node contained in the storage pool can be flexibly set according to actual requirements, for example, the number of storage disks under the same storage node contained in the storage pool can be 3, 5, and the like; alternatively, all storage disks contained under all storage nodes in the system may be grouped into one storage pool.
For example, as shown in fig. 1, the storage system includes 3 storage nodes, i.e., storage node A, storage node B, and storage node C; each storage node includes 3 storage disks. Specifically, storage node A includes storage disks A1, A2, and A3, storage node B includes storage disks B1, B2, and B3, and storage node C includes storage disks C1, C2, and C3, so that all the storage disks under storage nodes A, B, and C can form one storage pool 100.
Step 2, segmenting each storage disk in the storage pool by a designated capacity, virtualizing it into a number of equally sized storage blocks (chunks, CK); the designated capacity can be set flexibly according to actual requirements and is not limited here;
Illustratively, assuming the total capacity of storage disk A1 in the storage pool 100 is 1T and the designated capacity is 256M, disk A1 can be virtualized into 1T/256M = 4 x 1024 = 4096 equally sized storage blocks, each with its own identifier that uniquely identifies it. The identifier may be, but is not limited to, a number or an address, allocated when the disk is virtualized into equally sized storage blocks, and it is unique. For example, if the identifier is a number, the 4096 storage blocks of disk A1 may be numbered sequentially as a11, a12, ..., a14096. Likewise, the other storage disks in the storage pool can each be virtualized into equally sized storage blocks at the designated capacity (e.g., 256M).
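The chunk arithmetic above (a 1T disk cut into 256M chunks yields 4096 blocks) can be checked with a few lines; the identifier format here is a hypothetical rendering of the a11 ... a14096 numbering:

```python
# Chunk count for a disk virtualized into fixed-size storage blocks.
# Sizes are in MiB; identifier format "a1_<n>" is illustrative only.
def chunk_count(disk_capacity_mib: int, chunk_size_mib: int) -> int:
    return disk_capacity_mib // chunk_size_mib

count = chunk_count(1 * 1024 * 1024, 256)    # 1 TiB disk, 256 MiB chunks
chunk_ids = [f"a1_{i}" for i in range(1, count + 1)]
print(count, chunk_ids[0], chunk_ids[-1])    # 4096 a1_1 a1_4096
```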
Step 3, according to the erasure code type, selecting one storage block from different storage disks under several different storage nodes to form a storage block group (chunk group, CKG);
With reference to fig. 1, suppose the EC(2+1) erasure code type is used, i.e., one storage block must be selected from disks under 3 storage nodes to form a CKG, so that data can be stored distributedly on different disks under different nodes and remains recoverable after a single-point failure. Assume that storage block a11 is selected from disk A1 under storage node A, storage block b12 from disk B1 under storage node B, and storage block c11 from disk C1 under storage node C, forming storage block group 0 (CKG0 in the figure); by analogy, further CKGs can be organized for use by upper layers. The figure shows only 2 CKGs (CKG0 and CKG1) schematically and does not represent the actual number of CKGs. A CKG is the smallest unit of storage resource allocated by the storage system. It should be noted that when selecting storage blocks from different disks under different nodes to organize a storage block group, the blocks may be chosen randomly or according to the load of each disk; this is not specifically limited here, and the selection process can follow the prior art.
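Step 3 can be sketched under a toy model of the pool: pick K+M distinct nodes, then take one free chunk from a disk on each, so that no two chunks of a CKG share a node. All structures and names below are hypothetical:

```python
# Form a chunk group (CKG) by taking one free chunk from a different disk
# on each of `width` (= K + M) distinct storage nodes.
import random

def form_ckg(nodes: dict[str, dict[str, list[str]]], width: int) -> list[str]:
    """nodes maps node -> disk -> free chunk ids; width = K + M."""
    chosen_nodes = random.sample(sorted(nodes), width)   # distinct nodes
    ckg = []
    for node in chosen_nodes:
        disk = random.choice(sorted(nodes[node]))        # any disk on the node
        ckg.append(nodes[node][disk].pop(0))             # take a free chunk
    return ckg

pool = {
    "a": {"A1": ["a11", "a12"]},
    "b": {"B1": ["b11", "b12"]},
    "c": {"C1": ["c11", "c12"]},
}
ckg0 = form_ckg(pool, 3)          # EC(2+1) needs chunks on 3 distinct nodes
print(len(ckg0), len(set(ckg0)))  # 3 3
```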
Step 4, further dividing each storage block in each CKG into multiple finer-grained data areas, where the data areas belonging to the same storage block form one stripe; that is, after the storage blocks of a CKG are each subdivided, each storage block yields a corresponding stripe, and the stripes corresponding to the storage blocks are organized together to form a storage segment. Accordingly, a storage segment comprises multiple stripes, and the stripes in a storage segment correspond one-to-one with the storage blocks in the storage block group. For example, storage segment 0 shown in fig. 1 comprises stripe 1, stripe 2, and stripe 3, which correspond one-to-one with storage blocks a11, b12, and c11 in storage block group 0: stripe 1 consists of the data areas obtained by dividing storage block a11, stripe 2 of the data areas obtained by dividing storage block b12, and stripe 3 of the data areas obtained by dividing storage block c11. Externally, a storage segment presents a logical disk (LUN, also called a logical space unit) accessed by the host; a LUN is a storage unit that can be mapped directly to the host operating system for reading and writing. When processing user read/write requests and performing data migration, the LUN applies for space, releases space, and migrates data in the storage system in units of the data areas of the stripes in the storage segment.
It should be noted that the storage blocks in one storage block group are all the same size; accordingly, the stripes in one storage segment all have the same length.
Fig. 2 shows a schematic flowchart of a data processing method according to an embodiment of the present application. The method can be applied to the storage system based on distributed erasure codes shown in fig. 1, and as shown in fig. 2, the method comprises the following steps:
101. acquiring a plurality of data blocks;
102. when the number of the data blocks reaches a set value, determining an idle storage section; the storage section comprises a plurality of stripes, one stripe is determined according to the division result of one storage block, and different stripes correspond to different storage blocks; the set value represents the number of data blocks required to fill the storage section;
103. filling corresponding contents in partial stripes of the plurality of stripes based on the plurality of data blocks;
104. determining check data according to the data blocks;
105. filling the check data into the rest stripes of the plurality of stripes;
106. according to the correspondence between the stripes and the storage blocks, performing distributed storage on the filled storage section, so that the contents of the stripes in the storage section are respectively stored into the corresponding storage blocks.
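Steps 101 to 106 can be sketched end to end as follows for an EC(2+1) segment. This is a hedged illustration: XOR parity stands in for the RS encoding a real system would use (for a single parity stripe, RS encoding does reduce to XOR), and the block-to-disk names are assumptions.

```python
# Sketch of steps 103-106 for an EC(2+1) segment: fill the two data
# stripes from the accumulated data blocks, derive the parity stripe,
# and scatter the three stripes to their corresponding storage blocks.
def fill_and_store(data_blocks):
    half = len(data_blocks) // 2
    stripe_d1 = b"".join(data_blocks[:half])   # step 103: fill stripe D1
    stripe_d2 = b"".join(data_blocks[half:])   # step 103: fill stripe D2
    # steps 104-105: single-parity check data (XOR of the data stripes)
    parity = bytes(a ^ b for a, b in zip(stripe_d1, stripe_d2))
    # step 106: each stripe goes to its corresponding storage block
    return {"a11": stripe_d1, "b12": stripe_d2, "c11": parity}

blocks = [b"\x01\x02", b"\x03\x04", b"\x05\x06", b"\x07\x08"]
placement = fill_and_store(blocks)
# If the disk holding D1 fails, D1 can be rebuilt from the other two:
rebuilt = bytes(a ^ b for a, b in zip(placement["b12"], placement["c11"]))
print(rebuilt == placement["a11"])  # True
```

The point of the scheme is visible in the code: the whole segment is assembled in memory first, so writing it out never requires reading old data or old parity back from the disks.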
In the above 101, the data blocks come from the same data, and the specific form of the data may be, but is not limited to, one or more of symbols, characters, numbers, voice, images, videos, and the like. The data may be sent by a writer through an interaction mode (such as a keyboard, a mouse, a touch screen, and the like) provided by a corresponding device, or may be acquired by the storage system of this embodiment from another server or another storage system, which is not specifically limited here; the device corresponding to the writer may be any terminal device such as a mobile phone, a tablet computer, a desktop computer, a notebook computer, or an intelligent wearable device, which is not specifically limited in this embodiment. In the process of storing the data into the storage system of this embodiment, the data is first cached in a first storage medium in the system, and the first storage medium divides the data into a plurality of data blocks of the same specified size, so that the data can be distributed to different storage disks under different storage nodes using erasure code technology, ensuring recoverability of the data when a single-point failure occurs. In a specific implementation, the size of the data block may be determined according to the actual situation; for example, 24 MB of data may be divided into 12 data blocks of 2 MB each.
After the data is divided into a plurality of data blocks of the same size, considering that the divided data blocks may be relatively small, directly combining the small data blocks into a stripe and writing them to the storage disk easily increases write randomness and thus reduces the write performance of the storage disk. Accordingly, in order to avoid the random-write performance degradation caused by writing small data blocks directly to disk, in this embodiment the data blocks corresponding to the data in the first storage medium are further issued to the second storage medium for storage, so that when the number of data blocks cached in the second storage medium reaches a set value, the data blocks are combined into a larger stripe and then written to the corresponding storage disks; for how the data blocks are stored into the corresponding storage disks, refer to the related contents below. It should be noted here that, referring to fig. 1, the first storage medium may be the storage volume 11 corresponding to the storage system, and the second storage medium may be a solid state disk (SSD) 12 used for caching data in the storage system; the first storage medium and the second storage medium may also take other forms, which is not limited here. Based on this, one possible implementation of the above step 101 "obtaining multiple data blocks" is: receiving data sent by a writer; and dividing the received data into data blocks of the same size.
In the above 102, the plurality of stripes included in the storage segment respectively correspond to a plurality of storage blocks belonging to different storage nodes and different storage disks. Specifically, the stripes included in the storage segment have a one-to-one correspondence with the storage blocks included in the storage block group; a stripe is determined based on the result of dividing one storage block, that is, a stripe can be understood as being composed of the plurality of data areas obtained by dividing one storage block. The storage block group is obtained by organizing, according to the erasure code, storage blocks in the storage pool belonging to different storage nodes and different storage disks; for how to obtain the storage block group and how to obtain the corresponding storage segment from it, refer to the related contents above. In addition, based on the content of step 101, in this embodiment, in order to avoid the random-write performance degradation caused by writing small data directly to disk, after the received data is divided into a plurality of data blocks of the same size, the data blocks are first cached in the second storage medium, and when the number of data blocks cached in the second storage medium reaches the set value, an idle storage segment is determined for storing them. Accordingly, after determining that the number of acquired data blocks has reached the set value, "determining an idle storage segment" in step 102 may be implemented by the following steps:
1021. allocating a free memory block group;
1022. and determining the storage section according to the storage block group.
In a specific implementation, when the number of the acquired data blocks reaches a set value, a random allocation manner may be adopted to allocate an idle storage block group to the data blocks, and of course, other manners may also be adopted, which is not limited herein. When determining the storage segment corresponding to the storage block group further according to the storage block group, see the above related contents.
In addition, based on the principle of erasure code storage technology, among the plurality of stripes included in the storage segment, one part of the stripes is required to store the data blocks, and another part is used to store the check data obtained by encoding the data blocks, so that when the content in some stripes fails or cannot be read, the data can be recovered from the content of the other stripes according to the corresponding EC algorithm, ensuring high availability of the data in the system. On this basis, the plurality of stripes contained in the storage segment comprises two types: a first type of stripe and a second type of stripe. The first type of stripe is used to store the plurality of data blocks, and the second type of stripe is used to store the check data, which is obtained by encoding the plurality of data blocks with a corresponding erasure code encoding mode (such as RS encoding). When the data areas of all first-type stripes included in a segment are filled, the segment is considered a full segment. It follows that the set value in step 102 represents the number of data blocks required to fill the storage segment, that is, the maximum number of data blocks required for all data areas of the first-type stripes in the storage segment to be filled; the specific description of the first-type stripes and the determination of the set value may refer to the following related contents, and are not repeated here. Illustratively, referring to the storage segment 0 shown in fig. 3a, the storage segment 0 is determined according to the storage block group 0 (i.e., CKG0) shown in fig. 1. Since the storage block group 0 is obtained by the storage system organizing storage blocks belonging to different storage nodes and different storage disks according to an EC(2+1) mode, there are 2 first-type stripes in the storage segment 0, namely stripe 1 (i.e., D1) and stripe 2 (i.e., D2), and 1 second-type stripe, namely stripe 3 (i.e., P); accordingly, the set value is the maximum number of data blocks required to fill the data areas of stripe 1 and stripe 2.
In a realizable technical solution, at least one first-type stripe is contained in the storage segment, the first-type stripe comprising a stripe header area and a data area; accordingly, the step 103 "filling corresponding content in a partial stripe of the plurality of stripes based on the plurality of data blocks" may specifically include:
1031. sequentially filling the plurality of data blocks into a corresponding data area of at least one first type stripe;
1032. and determining the strip head information filled in the strip head area of the strip of the first type based on the data blocks in the data area of the strip of the first type.
In the above 1031, in the process of filling the plurality of data blocks into the corresponding data areas of the at least one first-type stripe, based on the positional relationship among the first-type stripes, part of the data blocks are first sequentially filled into the corresponding data area of the frontmost first-type stripe, and when that data area is full, the filling operation continues in the corresponding data area of the next first-type stripe. For example, with continued reference to fig. 3a, the stripe D1 and the stripe D2 are first-type stripes, and it is assumed that the plurality of data blocks are d1, d2, d3, ..., dm, dm+1, ..., dn-1, dn. Based on the positional relationship between stripe D1 and stripe D2, the data blocks are first sequentially filled into the data area 1 corresponding to stripe D1; when the data block dm has been filled into data area 1 and data area 1 is full, the data blocks dm+1, ..., dn-1, dn located after dm are filled into the data area 2 corresponding to stripe D2. Of course, the data blocks may also be sequentially filled into the corresponding data areas of the first-type stripes in other manners, such as by a priority order among the first-type stripes, which is not specifically limited here.
In the above 1032, the data format of the stripe header of the first-type stripe is as shown in fig. 3b; accordingly, the stripe header information filled into the stripe header area includes a magic number and a checksum. The magic number is an internally defined constant value, such as 0xf981ef0d; the checksum is a value determined based on the data blocks filled into the data area of the first-type stripe, and may specifically be computed with a set check algorithm (such as a CRC check algorithm) as the data blocks are filled into the corresponding data area. For example, continuing the example in step 1031, if the data blocks filled into the data area 1 corresponding to stripe D1 are d1, d2, d3, ..., dm, the checksum in the stripe header information of stripe D1 is the checksum calculated over d1, d2, d3, ..., dm using the CRC check algorithm; how the checksum is calculated with the CRC check algorithm may refer to the prior art. In the same way, the magic number and checksum included in the stripe header information of stripe D2 can also be determined.
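Building such a stripe header can be sketched as follows. The 8-byte little-endian layout and the specific choice of CRC-32 are assumptions for illustration; the magic value 0xf981ef0d is the example constant given in the text.

```python
import struct
import zlib

# Sketch of the stripe header of fig. 3b: a magic number plus a checksum
# computed over the data blocks filled into the stripe's data area.
MAGIC = 0xF981EF0D  # internally defined constant from the text

def build_stripe_header(data_area: bytes) -> bytes:
    checksum = zlib.crc32(data_area)            # CRC over the filled blocks
    return struct.pack("<II", MAGIC, checksum)  # assumed 4+4 byte layout

data_area = b"d1d2d3d4"   # stands in for the concatenated data blocks
header = build_stripe_header(data_area)
magic, checksum = struct.unpack("<II", header)
print(magic == MAGIC, checksum == zlib.crc32(data_area))  # True True
```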
Further, one first-type stripe among the at least one first-type stripe further comprises a segment header area. For example, fig. 3a shows two first-type stripes, namely stripe D1 and stripe D2; in addition to the above-mentioned stripe header area and data area, stripe D1 further comprises a segment header area. Accordingly, the step 103 "filling corresponding contents in a part of the stripes based on the plurality of data blocks" further includes:
1033. determining segment header information based on the plurality of data blocks;
1034. and filling the segment header information into the segment header area.
In a specific implementation, the data format of the segment header area is as shown in fig. 3c; accordingly, the segment header information filled into the segment header area may specifically include: a magic number, a checksum, a version identifier, the number of data blocks, and description information of each data block. The magic number is an internally defined constant value; the checksum is calculated over the plurality of data blocks according to a set check algorithm (such as a CRC check algorithm). For example, continuing the example in step 1031, with the plurality of data blocks being d1, d2, d3, ..., dm, dm+1, ..., dn-1, dn, the checksum filled into the segment header area is the checksum calculated over these data blocks using the CRC check algorithm. The description information of each data block may include, but is not limited to, the length, filling time, offset address, etc. of the data block. The version identifier is used for version upgrade; for example, in the case that the data format of a storage segment needs to be changed, the storage segments before and after the change can be distinguished by the version identifier, where the change operation may specifically be, for example, updating data in the storage segment.
Based on the above and referring to fig. 3a to 3c, the stripes in the storage segment shown in fig. 3a have the same length, which is known in advance; here the stripe length is denoted stripe_len. Since the set value in step 102 is the maximum number of data blocks required to fill the corresponding data areas of all the first-type stripes in the storage segment — that is, for the storage segment 0 shown in fig. 3a, the maximum number of data blocks required to fill the data area 1 of stripe D1 and the data area 2 of stripe D2 — the set value T corresponding to the storage segment 0 can be determined by the following formula:

data_len*T = (stripe_len - stripe_head_len - segment_head_len - T*data_describe_info_len) + (stripe_len - stripe_head_len); (1)

Transforming formula (1) to solve for the set value T:

T = (2*(stripe_len - stripe_head_len) - segment_head_len) / (data_len + data_describe_info_len); (2)

In formula (1) and formula (2), stripe_head_len represents the stripe header area length, segment_head_len represents the segment header area length, data_describe_info_len represents the length of the description information of one data block, and data_len represents the length of one data block.
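A numeric check of the set-value formula can be sketched as follows. All lengths below are illustrative assumptions, not figures from the patent, and the result is floored since T must be a whole number of data blocks.

```python
# Numeric check of formula (2) for the set value T, with assumed lengths.
stripe_len = 4096               # stripe length
stripe_head_len = 8             # stripe header area length
segment_head_len = 64           # segment header area length
data_describe_info_len = 24     # per-block description info length
data_len = 512                  # data block length

# T = (2*(stripe_len - stripe_head_len) - segment_head_len)
#     / (data_len + data_describe_info_len), floored
T = (2 * (stripe_len - stripe_head_len) - segment_head_len) // (
    data_len + data_describe_info_len)
print(T)  # 15

# Sanity check against formula (1): T blocks (plus the T description
# entries in the segment header) fit inside the two data areas.
capacity = (stripe_len - stripe_head_len - segment_head_len
            - T * data_describe_info_len) + (stripe_len - stripe_head_len)
assert T * data_len <= capacity
```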
The "determining the check data according to the plurality of data blocks" in 104 includes:
1041. encoding the plurality of data blocks;
1042. and determining the check data according to the encoding processing result.
In a specific implementation, the plurality of data blocks may be encoded according to the encoding calculation manner of the EC type (such as RS encoding), so as to determine the check data from the encoding result; how the check data is calculated according to the EC type is the same as in the prior art and is not repeated here.
In the above 105, the determined check data may be filled into the second-type stripe in a sequential filling manner, specifically, refer to the related content of filling the data block into the first-type stripe.
In 106, after the check data is filled into the second type of stripe, distributed storage may be performed on the data content filled in each stripe in the storage segment according to a corresponding relationship between the stripe and the storage block, so as to respectively store the contents in the plurality of stripes in the storage segment into the respective corresponding storage blocks, that is, respectively store the contents in the plurality of stripes in the storage segment into the respective corresponding storage disks.
In view of the above, what needs to be added here is: the magic numbers included in the stripe header information of the respective first-type stripes in a storage segment, and the magic number included in the segment header information, may be the same or different, which is not specifically limited in this embodiment. In addition, when a filled storage segment is stored in a distributed manner, the magic number and checksum included in the stripe header information of each first-type stripe are written into the corresponding storage disk together with the data blocks filled into the data area of that stripe, so that when data is read from the storage disks, the reliability of the data can be checked through the magic number and the checksum. For example, referring to fig. 3a to 3b and fig. 4, assume that user data (i.e., data 1) is distributed to the storage disk A1, the storage disk B1, and the storage disk C1 after filling the storage segment 0, and that the defined magic number is 0xf981ef0d. After reading the storage segment 0 data (i.e., data 1) from the storage disks (i.e., the storage disk A1, the storage disk B1, and the storage disk C1), the magic number can be taken out of the stripe D1 in the storage segment 0 and compared with 0xf981ef0d; if the comparison result is inconsistent, the data in the stripe D1 is unreliable (i.e., the data stored in the storage disk A1 is unreliable). In addition, when data is read, the checksum may be recalculated according to the check algorithm used when the data was filled into the storage segment 0, and then compared with the checksum stored in the storage disk, so as to determine the reliability of the data based on the comparison result. For example, the checksum corresponding to the stripe D1 may be recalculated with the check algorithm used at fill time and compared with the checksum stored in the storage disk A1; if the comparison result is inconsistent, the data stored in the storage disk A1 is determined to be unreliable. Based on the above, when data is read, whether the data in a stripe has a problem can be quickly determined using the magic number and the checksum, so that data in a stripe that fails verification can conveniently be recovered from the data in the remaining stripes, improving the reliability of the data. Checking the magic number in particular quickly rules out bad data, reducing the number of checksum calculations and improving system performance.
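The read-path check described above can be sketched as follows, reusing the same assumed header layout (little-endian magic plus CRC-32 checksum) as in the write-path illustration; both the layout and the algorithm choice are assumptions.

```python
import struct
import zlib

# Sketch of the read-path verification: extract the magic number and
# checksum from a stripe's header, compare the magic against the defined
# constant, then compare a freshly computed checksum.
MAGIC = 0xF981EF0D

def stripe_is_reliable(header: bytes, data_area: bytes) -> bool:
    magic, checksum = struct.unpack("<II", header)
    if magic != MAGIC:          # cheap first check, no checksum calculation
        return False
    return checksum == zlib.crc32(data_area)

good = struct.pack("<II", MAGIC, zlib.crc32(b"d1d2"))
print(stripe_is_reliable(good, b"d1d2"))   # True
print(stripe_is_reliable(good, b"dXd2"))   # False (corrupted data area)
```

Checking the magic first mirrors the point made in the text: an obviously bad stripe is rejected before any checksum work is spent on it.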
In the above, when data is read from the storage disk, the magic number and the checksum in the stripe header area of a stripe in the storage segment are used to check whether the data in that stripe is reliable; similarly, the magic number and the checksum in the segment header area of the storage segment are used to verify whether the data in the whole storage segment is reliable.
In summary, in this embodiment, the data blocks and the check data are written into the corresponding storage disks together in a full-storage-segment manner: each write operation to the storage disks only needs to distribute the contents of the stripes of an entire full storage segment (i.e., data blocks or check blocks) to the corresponding storage disks together, and no operation of reading data blocks or check data back from the storage disks is involved. This effectively reduces the write penalty of the erasure code; for an EC(2+1) type erasure code, the write performance can be doubled compared with normal erasure code writing.
Further, the method provided by this embodiment further includes:
107a, acquiring logical addresses of the plurality of data blocks;
107b, determining a physical address based on the storage section filled with the plurality of data blocks;
107c, establishing a mapping relation between the logical address and the physical address, and storing the mapping relation into a database.
In a specific implementation, the logical address is generally issued and transferred by an upper operating system or application corresponding to the storage system; for example, when an upper operating system intends to perform a read/write operation on its corresponding storage system, it informs the storage system of the logical address to be read or written. Based on this, the logical addresses of the data blocks may be issued and transferred by the upper operating system or application corresponding to the storage system provided in this embodiment, where the upper operating system or application is integrated on the device corresponding to the writer. Specifically, referring to fig. 1, when a writer performs a write operation on the storage system shown in the figure through a corresponding device, the device issues the data to be written and the corresponding logical address to the storage volume 11 in the storage system; the storage volume 11 then divides the received data into a plurality of data blocks and sends them to the SSD for caching, so that the storage system performs subsequent processing based on the data blocks cached in the SSD (for the specific subsequent processing, refer to the related contents above). It should be noted that the logical address is sent to the SSD by the storage volume 11 along with the data blocks; that is, in addition to the data blocks cached in the SSD, the logical address of the data blocks is also recorded. Accordingly, the logical addresses of the data blocks may be obtained from the SSD shown in fig. 1, so that, based on the obtained logical addresses and the physical address determined from the storage segment filled with the data blocks, a mapping relationship between the logical address and the physical address can be established and maintained in the database, facilitating subsequent data reading operations according to the mapping relationship.
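Steps 107a to 107c can be sketched as follows. The in-memory dict stands in for the database mentioned in the text, and treating the segment identifier as the physical address is a simplifying assumption.

```python
# Sketch of steps 107a-107c: record the mapping between the logical
# address of a batch of data blocks and the physical address of the
# storage segment that absorbed them.
mapping_db = {}  # stands in for the database

def record_mapping(logical_address, segment_id):
    # physical address is determined by the filled storage segment
    mapping_db[logical_address] = segment_id

record_mapping("LBA1", "segment-0")
print(mapping_db["LBA1"])  # segment-0
```

A later read of LBA1 would consult this mapping to locate the segment, then reassemble the data from that segment's stripes.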
The above mainly introduces the process of storing data into the storage disks from the viewpoint of reducing the write penalty of erasure codes. For the update-inconsistency problem existing when erasure codes update data, this embodiment stores updated data to the storage disks in an append-write form: data that a user overwrites at the same logical position is kept at different positions of the storage disks. That is, when existing data is modified, a new storage segment is allocated for the modified data, so that the modified data is stored at a new position of the storage disk and does not occupy the position used for storing the pre-modification data. Therefore, if a sudden condition such as a system crash or power failure occurs while updating data stored in the storage disks, the data previously stored successfully can still be recovered, ensuring that the EC consistency of the storage disk data is always satisfied. A specific data updating process is as follows:
referring to fig. 4 in combination with fig. 1, in the storage system provided in this embodiment, assume that when a user writes data 1 to a logical address LBA1 for the first time, the storage system allocates an idle storage segment 0 to the data 1 (different storage segments correspond to different physical addresses) and, through the storage segment 0, stores the data 1 in the storage disk A1, the storage disk B1, and the storage disk C1 in a distributed manner; for how this is done, refer to the related contents above. When the user needs to update data 1 and write the updated data 1 (i.e., data 2) to the logical address LBA1, the storage system allocates a new storage segment to the data 2, for example the storage segment 1 (the stripes in the storage segment 1 have a one-to-one correspondence with the storage block group 1 shown in fig. 1). If a system crash, power failure, or the like occurs while the data 2 is being distributed to the corresponding storage disks through the storage segment 1, then, because the data in different stripes of a storage segment is distributed across different storage disks, only some of the storage disks may have been written successfully while others have not, making the data in the entire storage segment 1 unusable. Even in this case, since a complete copy of data 1 remains in the storage segment 0, the user can still read the complete data 1 previously stored on the disks after the power failure is recovered. By contrast, if the conventional overwrite manner were adopted — the same LBA always writing into the same storage segment — the data in the storage segment could be partially overwritten by new data when the system crashes or loses power, and a complete piece of data could no longer be read. In summary, with the append-write manner adopted in this embodiment, writing data to the same LBA twice corresponds to two different physical addresses; therefore, if the system goes down (e.g., a system crash or power failure) while data is being written to the disks, the data that was last written successfully is not affected in any way, which solves the update-inconsistency problem in the erasure code data updating process and improves the reliability of data in the system.
It should be noted that, with continued reference to fig. 4, if the storage system successfully completes the update of data 1 in the append-write manner — that is, the updated data 1 (i.e., data 2) is successfully stored in the storage disk A2, the storage disk B3, and the storage disk C3 in a distributed manner — then the pre-update copy of data 1 stored in a distributed manner on the storage disks becomes garbage data, that is, invalid data: the data related to data 1 stored in the storage disk A1, the storage disk B1, and the storage disk C1 is marked as garbage data. In addition, after the update of data 1 is completed, the previously established mapping relationship between the logical address and the physical address for data 1 is further updated according to the position of the storage segment 1 filled with the updated data 1 (i.e., data 2), and the old mapping relationship is cleared.
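The append-write (redirect-on-write) update just described can be sketched as follows. The class name, the segment counter, and marking old segments in a garbage set are illustrative assumptions; the essential behavior is that a rewrite of the same LBA lands in a fresh segment and the mapping only switches after the new write.

```python
# Sketch of the append-write update: rewriting an LBA allocates a new
# storage segment, the old copy stays intact until the mapping switches,
# and the pre-update segment is then marked as garbage data.
class AppendWriteStore:
    def __init__(self):
        self.segments = {}      # physical address -> stored data
        self.mapping = {}       # LBA -> physical address
        self.garbage = set()    # segments holding invalid (pre-update) data
        self._next = 0

    def write(self, lba, data):
        seg = f"segment-{self._next}"
        self._next += 1
        self.segments[seg] = data       # new data never overwrites old data
        old = self.mapping.get(lba)
        self.mapping[lba] = seg         # switch mapping only after the write
        if old is not None:
            self.garbage.add(old)       # pre-update copy becomes garbage
        return seg

store = AppendWriteStore()
store.write("LBA1", b"data 1")
store.write("LBA1", b"data 2")          # update of the same logical address
print(store.segments["segment-0"])      # b'data 1'  (old copy still intact)
print(store.mapping["LBA1"])            # segment-1
print(store.garbage)                    # {'segment-0'}
```

If the second write had failed midway, the mapping would still point at segment-0, so the complete data 1 would remain readable — exactly the crash-consistency property argued above.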
According to the technical solution provided in this embodiment, after a plurality of data blocks are obtained, an idle storage segment is determined; the storage segment comprises a plurality of stripes, and different stripes correspond to different storage blocks. Corresponding contents are filled into part of the stripes based on the data blocks; check data is determined from the data blocks and filled into the remaining stripes; then, according to the correspondence between stripes and storage blocks, the filled storage segment is stored in a distributed manner, so that the stripes in the storage segment are respectively stored into the corresponding storage blocks. This solution reduces the write penalty and avoids the random-write performance degradation caused by writing small data directly to disk; in addition, because this embodiment updates data in an append-write manner, the update-inconsistency problem in the erasure code data updating process is effectively solved.
Referring to fig. 5, a schematic structural diagram of a memory system according to another embodiment of the present application is shown. As shown in fig. 5, the storage system specifically includes:
a storage pool 20 comprising a plurality of storage disks having a plurality of storage blocks;
a distributed storage module 21, configured to obtain a plurality of data blocks; when the number of the data blocks reaches a set value, determine an idle storage section; the storage section comprises a plurality of stripes, one stripe is determined according to the division result of one storage block, and different stripes correspond to different storage blocks; the set value represents the number of data blocks required to fill the storage section; fill corresponding contents in part of the stripes based on the plurality of data blocks; determine check data according to the data blocks; fill the check data into the remaining stripes; and according to the correspondence between the stripes and the storage blocks, perform distributed storage on the filled storage section so that the stripes in the storage section are respectively stored into the corresponding storage blocks.
In specific implementation, the specific form of the storage disk may be, but is not limited to: storage media such as a Hard Disk Drive (HDD), a Solid State Drive (SSD), a Storage Class Memory (SCM), and a Shingled Magnetic Recording (SMR); the storage disk in the storage pool is divided into a plurality of storage blocks according to the designated capacity.
Further, the storage system provided in this embodiment further includes:
the storage pool management module 22 is configured to organize storage blocks belonging to different storage nodes and to different storage disks according to erasure code types to obtain storage block groups;
the distributed storage module 21 is configured to allocate an idle storage block group when the number of the acquired data blocks reaches a set value; and determining the storage section according to the storage block group.
Further, the storage system provided in this embodiment further includes:
the storage volume 23 is used for receiving data sent by a writer and a logical address corresponding to the data; dividing received data into data blocks with the same size;
the cache module 24, configured to cache the data blocks divided by the storage volume so that the distributed storage module can acquire them, and to record the logical addresses corresponding to the data blocks so that the address mapping module can obtain them; the logical address of each data block is determined from the logical address corresponding to the data.
In a specific implementation, the cache module may be a Solid State Drive (SSD), or may take other forms, which is not limited herein.
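The storage volume's splitting step can be sketched as follows. `BLOCK_SIZE` and zero-padding of the final block are illustrative assumptions; the patent only requires that the resulting blocks be the same size.

```python
# Sketch of the storage volume dividing received data into same-size blocks.
# BLOCK_SIZE and tail padding are assumptions for illustration.
BLOCK_SIZE = 4096

def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list:
    """Divide received data into blocks of the same size, padding the last."""
    return [
        data[off:off + block_size].ljust(block_size, b"\x00")
        for off in range(0, len(data), block_size)
    ]

blocks = split_into_blocks(b"x" * 10000)
```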
Further, the storage system provided in this embodiment further includes:
an address mapping module 25, configured to obtain the logical addresses corresponding to the plurality of data blocks from the cache module 24, obtain the physical addresses determined for the plurality of data blocks by the distributed storage module, and establish and store the mapping relationship between the logical addresses and the physical addresses.
It should be noted that, in addition to the above modules (the storage pool, the distributed storage module, the storage pool management module, the storage volume, the cache module, and the address mapping module), the storage system provided in this embodiment may further include a data log module contained in the cache module (not shown in the figure). The data log module receives the divided data blocks sent by the storage volume and forwards them to the cache module (for example, an SSD) for caching; that is, after the storage volume divides the data received from the writer into data blocks of the same size, it first sends the blocks to the data log module, which then hands them to the cache module for safekeeping. The data log module avoids the drop in random write performance caused by writing small data blocks directly to disk, and prevents data loss when the system goes down (for example, crashes or loses power) before data has been written to the storage disk.
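A minimal model of the data log module's role is sketched below. It is a toy in-memory stand-in: real persistence to the SSD cache, crash recovery, and the messaging between modules are omitted, and all names are assumptions.

```python
# Sketch of the data log module: journal small blocks to the cache first,
# then clear the journal once the disk write is reported successful.

class DataLog:
    def __init__(self):
        self._journal = []  # stands in for blocks held in the SSD cache

    def append(self, block: bytes):
        self._journal.append(block)  # small write absorbed by the cache

    def pending(self):
        return list(self._journal)

    def clear(self):
        # Invoked when the "data write success" message arrives.
        self._journal.clear()

log = DataLog()
log.append(b"blk0")
log.append(b"blk1")
pending_before = log.pending()
log.clear()          # disk write succeeded; cached copies no longer needed
pending_after = log.pending()
```

On restart after a crash, anything still pending in the journal could be replayed, which is what prevents loss of data not yet written to the storage disk.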
It should also be noted that, for details of the steps in the storage system provided in this embodiment that are not exhaustively described here, reference may be made to the corresponding content in the foregoing embodiments. In addition to the above steps, the storage system provided in this embodiment may further include some or all of the other steps in the foregoing embodiments; for details, reference may likewise be made to the corresponding content above.
In summary, the data processing method provided by the embodiments of the present application can be summarized as the following process:
(1) A plurality of data blocks written to a logical address LBA are sent to the cache module via the storage volume; the data blocks belong to the same piece of data.
(2) The cache module sends the data blocks to the space allocation unit. If the space allocation unit currently has no storage section pending flush to disk, it first allocates a free storage section, requests memory space for that pending storage section, and fills the data blocks into the section's in-memory data space; if a pending storage section already exists, the data blocks are filled directly into its in-memory data space. Once the pending storage section is full, the filled storage section is written to the corresponding disks.
(3) The space allocation unit derives the physical address addr of each data block from its fill position and returns the addresses to the address mapping module.
(4) The address mapping module establishes the mapping relationship between the physical address addr and the logical address LBA, stores it in the database, and returns a data-write-success message to the data log module.
(5) When the data log module receives the data-write-success message, it clears the corresponding data held in the cache module.
(6) If the user continues to write a new data block to the same logical address LBA (for example, when updating data that has already been written to disk successfully, the updated data block is written to the same LBA), steps (1) to (5) are repeated. In step (4), however, the address mapping module updates the mapping between the physical address addr and the logical address LBA, stores the updated mapping in the database, and at the same time clears the mapping between the old physical address and the LBA.
As these steps show, two writes to the same LBA correspond to two different physical addresses. If the system crashes while data is being written to the storage disk, data that was previously written successfully is unaffected, which preserves the consistency of the erasure coding (EC) and improves data reliability.
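The redirect-on-write behavior of steps (1) through (6) can be sketched end to end as follows. This is an illustrative model only: the segment counter, the tuple addresses, and the in-memory dictionary standing in for the mapping database are all assumptions.

```python
# Sketch: each write of the same LBA lands in a fresh storage section, so the
# LBA maps to a new physical address while the old data remains intact until
# the mapping is switched. All structures here are illustrative.

class AddressMap:
    def __init__(self):
        self.lba_to_addr = {}  # stands in for the mapping database

    def commit(self, lba, new_addr):
        old = self.lba_to_addr.get(lba)
        self.lba_to_addr[lba] = new_addr  # switch the mapping
        return old  # the old address can now be reclaimed

next_segment = iter(range(100))

def write(amap, lba, data):
    addr = ("segment", next(next_segment))  # always a new free section
    return addr, amap.commit(lba, addr)

amap = AddressMap()
addr1, _ = write(amap, lba=7, data=b"v1")
addr2, replaced = write(amap, lba=7, data=b"v2")  # update, same LBA
```

Until `commit` runs for the second write, a crash leaves the LBA pointing at the first, fully written section, which is the consistency property described above.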
It should be noted that the space allocation unit may correspond to the distributed storage module shown in fig. 5, and the data log module is contained in the cache module shown in fig. 5. For content not detailed in the above steps, reference may be made to the corresponding content in the foregoing embodiments.
It should further be noted that the technical solution provided by the embodiments of the present application is applicable to any suitable storage system; the embodiments do not limit the specific storage system.
Fig. 6 shows a block diagram of a data processing apparatus according to an embodiment of the present application. As shown in fig. 6, the data processing apparatus specifically includes:
a first obtaining module 501, configured to obtain a plurality of data blocks;
a first determining module 502, configured to determine a free storage section when the number of the data blocks reaches a set value; the storage section comprises a plurality of stripes, each stripe is determined from the segmentation result of one storage block, and different stripes correspond to different storage blocks; the set value represents the number of data blocks required to fill the storage section;
a first filling module 503, configured to fill corresponding contents into some of the plurality of stripes based on the plurality of data blocks;
a second determining module 504, configured to determine check data according to the multiple data blocks;
a second filling module 505, configured to fill the check data into the remaining stripes of the plurality of stripes;
a storing module 506, configured to perform distributed storage on the filled storage section according to the correspondence between the stripes and the storage blocks, so that the contents of the stripes in the storage section are respectively stored into their corresponding storage blocks.
According to the technical solution provided by this embodiment, once the number of acquired data blocks reaches a set value, a free storage section is determined; the storage section comprises a plurality of stripes, each stripe is determined from the segmentation result of one storage block, different stripes correspond to different storage blocks, and the set value represents the number of data blocks required to fill the storage section. Corresponding contents are then filled into some of the stripes based on the data blocks; check data is determined from the data blocks and filled into the remaining stripes; finally, the filled storage section is stored in a distributed manner according to the correspondence between stripes and storage blocks, so that each stripe in the storage section is stored into its corresponding storage block. The technical solution provided by this embodiment reduces write penalty, resolves the problem of inconsistent data updates, and also avoids the drop in random write performance caused by writing small data directly to disk.
Further, the apparatus provided in this embodiment further includes:
a storage pool module, comprising a plurality of storage nodes, wherein each storage node comprises a plurality of storage disks and each storage disk comprises a plurality of storage blocks;
an organizing module, configured to organize storage blocks belonging to different storage nodes and different storage disks according to the erasure code type, to obtain storage block groups;
the storage blocks are arranged in a storage block group, wherein a plurality of strips contained in the storage section correspond to a plurality of storage blocks contained in the storage block group one by one; and segmenting the storage block to obtain a plurality of data areas, wherein the plurality of data areas form a strip.
Further, when configured to determine a free storage section, the first determining module 502 is specifically configured to: allocate a free storage block group when the number of acquired data blocks reaches the set value; and determine the storage section from the storage block group.
Furthermore, the plurality of stripes in the storage section include stripes of a first type and stripes of a second type; the first-type stripes store the plurality of data blocks, and the second-type stripes store the check data.
Still further, the storage section contains at least one first-type stripe, and a first-type stripe comprises a stripe header area and a data area. Correspondingly,
the first filling module 503, when configured to fill corresponding contents into some of the plurality of stripes based on the plurality of data blocks, is specifically configured to: sequentially fill the plurality of data blocks into the data areas of the at least one first-type stripe; and determine, based on the data blocks in the data area of a first-type stripe, the stripe header information to be filled into its stripe header area.
Still further, the stripe header information includes a magic number and a checksum.
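A stripe header of this shape can be sketched as follows. The magic value, the choice of CRC-32, and the field layout are illustrative assumptions; the patent only names the two fields.

```python
# Sketch: generate and verify a stripe header holding a magic number and a
# checksum over the stripe's data area. Values and layout are assumptions.
import struct
import zlib

STRIPE_MAGIC = 0x57A15EC0  # hypothetical magic number

def make_stripe_header(data_area: bytes) -> bytes:
    return struct.pack("<II", STRIPE_MAGIC, zlib.crc32(data_area))

def verify_stripe(header: bytes, data_area: bytes) -> bool:
    magic, crc = struct.unpack("<II", header)
    return magic == STRIPE_MAGIC and crc == zlib.crc32(data_area)

data = b"payload" * 3
header = make_stripe_header(data)
ok = verify_stripe(header, data)             # intact stripe passes
corrupted = verify_stripe(header, data + b"!")  # altered data fails
```

On read, the magic number confirms the area was actually formatted as a stripe, and the checksum detects corruption of the data area.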
Still further, one of the at least one first-type stripe further comprises a segment header area. Correspondingly,
the first filling module 503, when configured to fill corresponding contents into some of the plurality of stripes based on the plurality of data blocks, is further specifically configured to: determine segment header information based on the plurality of data blocks; and fill the segment header information into the segment header area.
Still further, the segment header information includes: a magic number, a checksum, a version identifier, the number of data blocks, and description information of each data block.
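Packing such a segment header can be sketched like this. The field order, widths, and the content of each description entry (LBA plus length) are illustrative assumptions; the patent fixes no on-disk layout.

```python
# Sketch: pack the segment header (magic number, checksum, version identifier,
# number of data blocks, one description entry per block). Layout is assumed.
import struct
import zlib

SEGMENT_MAGIC = 0x5E65EC00  # hypothetical magic number

def pack_segment_header(version: int, descriptions: list) -> bytes:
    body = struct.pack("<II", version, len(descriptions))
    for d in descriptions:  # one description entry per data block
        body += struct.pack("<QI", d["lba"], d["length"])
    checksum = zlib.crc32(body)  # checksum covers everything after the magic
    return struct.pack("<II", SEGMENT_MAGIC, checksum) + body

header = pack_segment_header(1, [{"lba": 0, "length": 4096},
                                 {"lba": 8, "length": 4096}])
```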
Further, when configured to determine the check data from the plurality of data blocks, the second determining module 504 is specifically configured to: encode the plurality of data blocks; and determine the check data from the encoding result.
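The "encode, then take the check data from the result" step can be illustrated with single XOR parity. This is a stand-in: a real deployment of the erasure-coded scheme described here would typically use a code such as Reed-Solomon to produce multiple independent check blocks, which the patent does not prescribe.

```python
# Sketch: XOR parity as the simplest encoding that yields check data from
# which any one lost data block can be rebuilt.
from functools import reduce

def xor_parity(blocks):
    """Check data: XOR of all blocks, so one lost block is recoverable."""
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), blocks)

data_blocks = [b"\x01" * 4, b"\x02" * 4, b"\x04" * 4]
check = xor_parity(data_blocks)
# Recover a lost block by XOR-ing the survivors with the check data.
rebuilt = xor_parity([data_blocks[0], data_blocks[2], check])
```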
Further, the apparatus provided in this embodiment further includes:
a second obtaining module, configured to obtain logical addresses of the multiple data blocks;
a third determining module, configured to determine physical addresses based on the storage section filled with the plurality of data blocks;
an establishing module, configured to establish the mapping relationship between the logical addresses and the physical addresses and store it in a database.
Further, when configured to obtain a plurality of data blocks, the first obtaining module 501 is specifically configured to: receive data sent by a writer; and divide the received data into data blocks of the same size.
Here, it should be noted that: the data processing apparatus provided in this embodiment may implement the technical solution described in the data processing method embodiment shown in fig. 2, and the specific implementation principle of each module or unit may refer to the corresponding content in the data processing method embodiment shown in fig. 2, which is not described herein again.
Fig. 7 is a schematic structural diagram of a storage device according to an embodiment of the present application. As shown in fig. 7, the storage device includes: a memory 601 and a processor 602. The memory 601 may be configured to store various other data to support operations on the storage device; examples of such data include instructions for any application or method operating on the storage device. The memory 601 may be implemented by any type of volatile or non-volatile memory device, or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, or magnetic or optical disks.
The memory 601 for storing one or more computer instructions;
the processor 602, coupled to the memory 601, is configured to execute the one or more computer instructions stored in the memory 601 to implement the data processing method provided by the foregoing method embodiments.
Further, as shown in fig. 7, the storage device also includes: a communications component 603, a display 604, a power component 605, an audio component 606, and other components. Only some of the components are shown schematically in fig. 7; this does not mean that the storage device includes only the components shown in fig. 7.
Accordingly, embodiments of the present application further provide a computer-readable storage medium storing a computer program, where the computer program can implement the steps or functions of the data processing method provided in the foregoing embodiments when executed by a computer.
The above-described apparatus embodiments are merely illustrative. Units described as separate parts may or may not be physically separate, and parts shown as units may or may not be physical units; they may be located in one place or distributed across a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement this without inventive effort.
From the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general-purpose hardware platform, or of course by hardware. Based on this understanding, the above technical solutions, in essence or in the part contributing to the prior art, may be embodied in the form of a software product stored in a computer-readable storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and including several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods described in the various embodiments or in parts of the embodiments.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solutions of the present application, and not to limit the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions in the embodiments of the present application.

Claims (13)

1. A data processing method, comprising:
dividing the received data to obtain a plurality of data blocks;
when the number of the data blocks reaches a set value, determining a free storage section; wherein the storage section comprises a plurality of stripes, each stripe is determined from the segmentation result of one storage block, and different stripes correspond to different storage blocks; the set value represents the number of data blocks required to fill the storage section; and different storage sections correspond to different physical addresses;
filling corresponding contents in partial stripes of the plurality of stripes based on the plurality of data blocks;
determining check data according to the data blocks;
filling the check data into the rest stripes of the plurality of stripes;
according to the correspondence between the stripes and the storage blocks, performing distributed storage on the filled storage section so that the contents of the stripes in the storage section are respectively stored into the corresponding storage blocks;
responding to the updating operation of the data, and acquiring the updated data; the logic address corresponding to the data is the same as the logic address corresponding to the updated data;
determining a new free storage segment;
and performing distributed storage on the updated data by using the determined new idle storage section.
2. The method of claim 1,
the storage pool comprises a plurality of storage nodes, one storage node comprises a plurality of storage disks, and one storage disk comprises a plurality of storage blocks;
organizing storage blocks belonging to different storage nodes and different storage disks according to erasure code types to obtain storage block groups;
the storage blocks are respectively provided with a storage segment, wherein a plurality of strips contained in the storage segment correspond to a plurality of storage blocks contained in the storage block group one by one; and segmenting the storage block to obtain a plurality of data areas, wherein the plurality of data areas form a strip.
3. The method of claim 2, wherein determining a free deposit segment comprises:
allocating a free memory block group;
and determining the storage section according to the storage block group.
4. The method of claim 3, wherein the storage section includes a plurality of stripes, among which there are stripes of a first type and stripes of a second type; wherein,
the first type stripe is used for storing the plurality of data blocks;
the second type stripe is used for storing the check data.
5. The method of claim 4, wherein said storage section contains at least one of said first type of stripe, said first type of stripe comprising a header region and a data region; and
based on the plurality of data blocks, filling corresponding contents in partial stripes of the plurality of stripes, including:
sequentially filling the plurality of data blocks into a corresponding data area of at least one first type stripe;
determining, based on the data blocks in the data area of the first type stripe, the stripe header information filled into the stripe header area of the first type stripe; wherein the stripe header information includes: a magic number and a checksum.
6. The method of claim 5, wherein one of at least one of said first type of strips further comprises a segment header region; and
based on the plurality of data blocks, filling corresponding contents in partial stripes of the plurality of stripes, further comprising:
determining segment header information based on the plurality of data blocks;
filling the segment header information into the segment header area;
wherein the segment header information includes: a magic number, a checksum, a version identifier, the number of data blocks, and description information of each data block.
7. The method of any of claims 1 to 6, further comprising:
acquiring logical addresses of the plurality of data blocks;
determining a physical address based on the storage section filled with the plurality of data blocks;
and establishing a mapping relation between the logical address and the physical address, and storing the mapping relation into a database.
8. The method according to any one of claims 1 to 6, wherein the dividing the received data to obtain a plurality of data blocks comprises:
receiving data sent by a writer;
the received data is divided into data blocks of the same size.
9. A storage system, comprising:
a storage pool comprising a plurality of storage disks, a storage disk having a plurality of storage blocks;
the distributed storage module, configured to acquire a plurality of data blocks obtained by dividing received data; determine a free storage section when the number of the data blocks reaches a set value, wherein the storage section comprises a plurality of stripes, each stripe is determined from the segmentation result of one storage block, different stripes correspond to different storage blocks, the set value represents the number of data blocks required to fill the storage section, and different storage sections correspond to different physical addresses; fill corresponding contents into some of the plurality of stripes based on the plurality of data blocks; determine check data from the plurality of data blocks; fill the check data into the remaining stripes; perform, according to the correspondence between the stripes and the storage blocks, distributed storage on the filled storage section so that the contents of the stripes in the storage section are respectively stored into the corresponding storage blocks; acquire updated data in response to an update operation on the data, wherein the logical address corresponding to the data is the same as the logical address corresponding to the updated data; determine a new free storage section; and perform distributed storage on the updated data using the determined new free storage section.
10. The system of claim 9, further comprising:
the storage pool management module is used for organizing storage blocks which belong to different storage nodes and different storage disks according to the erasure code type to obtain a storage block group;
the distributed storage module is used for allocating an idle storage block group when the number of the acquired data blocks reaches a set value; and determining the storage section according to the storage block group.
11. The system of claim 9 or 10, further comprising:
the storage volume is used for receiving data sent by a writer and a logic address corresponding to the data; dividing the received data into data blocks with the same size;
the cache module is used for caching the data blocks divided by the storage volume so that the distributed storage module can acquire the data blocks; recording the logical addresses corresponding to the data blocks so that an address mapping module obtains the logical addresses corresponding to the data blocks; and determining the logical address corresponding to the data block according to the logical address corresponding to the data.
12. The system of claim 11, further comprising:
the address mapping module is used for acquiring the logic addresses corresponding to the data blocks from the cache module; acquiring the physical addresses determined by the distributed storage module for the plurality of data blocks; and establishing and storing the mapping relation between the logical address and the physical address.
13. A storage device, comprising: a memory and a processor; the memory is used for storing one or more computer instructions which, when executed by the processor, are capable of implementing the steps of the data processing method of any of the preceding claims 1-8.
CN202110495606.7A 2021-05-07 2021-05-07 Data processing method, storage system and storage device Active CN113176858B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110495606.7A CN113176858B (en) 2021-05-07 2021-05-07 Data processing method, storage system and storage device

Publications (2)

Publication Number Publication Date
CN113176858A CN113176858A (en) 2021-07-27
CN113176858B true CN113176858B (en) 2022-12-13

Family

ID=76928450

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110495606.7A Active CN113176858B (en) 2021-05-07 2021-05-07 Data processing method, storage system and storage device

Country Status (1)

Country Link
CN (1) CN113176858B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113672175A (en) * 2021-08-09 2021-11-19 浙江大华技术股份有限公司 Distributed object storage method, device and equipment and computer storage medium
CN115757192A (en) * 2021-09-03 2023-03-07 华为技术有限公司 Recovery method of storage block and related device
CN114301575B (en) * 2021-12-21 2024-03-29 阿里巴巴(中国)有限公司 Data processing method, system, equipment and medium
CN114995770B (en) * 2022-08-02 2022-12-27 苏州浪潮智能科技有限公司 Data processing method, device, equipment, system and readable storage medium
CN115391093B (en) * 2022-08-18 2024-01-02 江苏安超云软件有限公司 Data processing method and system
CN115599315B (en) * 2022-12-14 2023-04-07 阿里巴巴(中国)有限公司 Data processing method, device, system, equipment and medium
CN117149094B (en) * 2023-10-30 2024-02-09 苏州元脑智能科技有限公司 Method and device for determining data area state, disk array and storage system

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105930500A (en) * 2016-05-06 2016-09-07 华为技术有限公司 Transaction recovery method in database system, and database management system
CN110399310A (en) * 2018-04-18 2019-11-01 杭州宏杉科技股份有限公司 A kind of recovery method and device of memory space

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6742081B2 (en) * 2001-04-30 2004-05-25 Sun Microsystems, Inc. Data storage array employing block checksums and dynamic striping
US7200715B2 (en) * 2002-03-21 2007-04-03 Network Appliance, Inc. Method for writing contiguous arrays of stripes in a RAID storage system using mapped block writes
CN102722340A (en) * 2012-04-27 2012-10-10 华为技术有限公司 Data processing method, apparatus and system
CN105677249B (en) * 2016-01-04 2019-01-15 浙江宇视科技有限公司 The division methods of data block, apparatus and system
CN112019788B (en) * 2020-08-27 2023-04-11 杭州海康威视系统技术有限公司 Data storage method, device, system and storage medium

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105930500A (en) * 2016-05-06 2016-09-07 华为技术有限公司 Transaction recovery method in database system, and database management system
CN110399310A (en) * 2018-04-18 2019-11-01 杭州宏杉科技股份有限公司 A kind of recovery method and device of memory space

Also Published As

Publication number Publication date
CN113176858A (en) 2021-07-27


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant