CN115599314A - Data redundancy strategy changing method and device, storage node and storage medium - Google Patents


Info

Publication number
CN115599314A
Authority
CN
China
Prior art keywords
data block
data
target
stripe
block
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202211587787.7A
Other languages
Chinese (zh)
Other versions
CN115599314B (en)
Inventor
王辰 (Wang Chen)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Fanlian Information Technology Co ltd
Original Assignee
Shenzhen Fanlian Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Fanlian Information Technology Co ltd filed Critical Shenzhen Fanlian Information Technology Co ltd
Priority to CN202211587787.7A priority Critical patent/CN115599314B/en
Publication of CN115599314A publication Critical patent/CN115599314A/en
Application granted granted Critical
Publication of CN115599314B publication Critical patent/CN115599314B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F 3/0601 Interfaces specially adapted for storage systems
    • G06F 3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F 3/061 Improving I/O performance
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 11/00 Error detection; Error correction; Monitoring
    • G06F 11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F 11/08 Error detection or correction by redundancy in data representation, e.g. by using checking codes
    • G06F 11/10 Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's
    • G06F 11/1004 Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's to protect a block of data words, e.g. CRC or checksum
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F 3/0601 Interfaces specially adapted for storage systems
    • G06F 3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F 3/0638 Organizing or formatting or addressing of data
    • G06F 3/064 Management of blocks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F 3/0601 Interfaces specially adapted for storage systems
    • G06F 3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F 3/0671 In-line storage system
    • G06F 3/0683 Plurality of storage devices
    • G06F 3/0689 Disk arrays, e.g. RAID, JBOD

Abstract

According to the data redundancy policy changing method and device, storage node, and storage medium disclosed herein, a target stripe is determined when a redundancy policy switching condition is met; the target stripe represents a stripe meeting the redundancy policy switching requirement. At least one data block to be written is then determined from at least one data block in the target stripe; a data block to be written represents a check block corresponding to the data blocks or a copy corresponding to a data block. Finally, the data block to be written is written into the storage space and forms a logical stripe with the at least one data block. When this scheme performs a data dump, no new space is allocated for the data blocks: the data blocks of the target stripe are used directly, only the newly generated check block or copy is written into the storage space, and the data blocks of the target stripe together with the newly generated check block or copy form a logical stripe, completing the data dump. The IO overhead of rewriting data blocks is thereby avoided, reducing the disk IO load and improving the performance of the storage system.

Description

Data redundancy strategy changing method and device, storage node and storage medium
Technical Field
The present invention relates to the field of storage systems, and in particular, to a method and an apparatus for changing a data redundancy policy, a storage node, and a storage medium.
Background
Storage systems rely mainly on data redundancy policies to provide data security and reliability. Common data redundancy policies include replication, RAID, and erasure coding, each subdivided into redundancy levels: replication has two-copy, three-copy, and so on; RAID has RAID 5, RAID 6, and so on; erasure coding has erasure code 4+2, erasure code 8+2, and so on. Data is stored in a storage system under a specific data redundancy policy when it is first generated, but its reliability requirements, read/write performance requirements, and storage cost requirements change continuously over time. These changes require corresponding changes to the data redundancy policy to implement a data dump.
In the prior art, when a data dump is implemented through a data redundancy policy change, the data blocks of the original stripe are re-sliced and then written into a new stripe. Such a dump mode generates more write IO and thereby increases the disk IO load.
Disclosure of Invention
In view of this, an object of the present invention is to provide a method, an apparatus, a storage node, and a storage medium for changing a data redundancy policy, which directly use the data blocks of the original stripe, write only the newly generated check block or copy into the storage space, and form a logical stripe from the data blocks of the original stripe and the newly generated check block or copy, thereby completing the data dump. The IO overhead of rewriting data blocks is effectively avoided, reducing the disk IO load and improving the performance of the storage system.
In order to achieve the above purpose, the embodiment of the present invention adopts the following technical solutions:
in a first aspect, the present invention provides a method for changing a data redundancy policy, which is applied to a storage node, and the method includes:
when the redundancy policy switching condition is met, determining a target stripe; the target stripe represents a stripe meeting the redundancy policy switching requirement;
determining at least one data block to be written according to at least one data block in the target stripe; the data block to be written represents a check block corresponding to the data block or a copy corresponding to the data block;
and writing the data block to be written into a storage space, and forming a logical stripe with the at least one data block.
In an optional embodiment, the step of determining a target stripe when the redundancy policy switching condition is satisfied includes:
when the load utilization rate of a disk is greater than the disk load high threshold, determining the disk as a first-type target disk; the first-type target disk represents a disk with a high service load;
traversing all the stripes in the system, and determining a first target stripe, wherein the first-type data blocks of the first target stripe are stored on first-type target disks, and the first target stripe is one of the target stripes.
In an optional embodiment, the step of determining at least one data block to be written according to at least one data block in the target stripe includes:
reading each first type data block respectively;
dividing the first-type data blocks into at least two first-type data groups;
calculating a corresponding check block for each first-type data group, and taking the check block as the data block to be written corresponding to that first-type data group;
the step of writing the data block to be written into a storage space and forming a logical stripe with the at least one data block includes:
and writing the check block into a storage space, and forming the logical stripe together with the corresponding first-type data group.
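The split-and-recompute flow of this embodiment can be sketched as follows. This is a minimal illustration, not the patent's implementation: bytewise XOR stands in for a real erasure code (which would produce distinct check blocks, e.g. via Reed-Solomon), and all function and variable names are assumptions.

```python
from functools import reduce

def xor_parity(blocks):
    """Bytewise XOR of equal-sized blocks (placeholder for a real erasure code)."""
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), blocks)

def split_stripe(data_blocks, group_size=4, parity_count=2):
    """Split e.g. the 8 data blocks of an 8+2 stripe into two 4+2 logical
    stripes. The data blocks are only referenced, never rewritten; the
    sole new writes are the check blocks computed per group."""
    logical_stripes = []
    for i in range(0, len(data_blocks), group_size):
        group = data_blocks[i:i + group_size]
        # Placeholder: a real code derives parity_count *distinct* check blocks.
        checks = [xor_parity(group)] * parity_count
        logical_stripes.append({"data_refs": group, "new_writes": checks})
    return logical_stripes

stripe8 = [bytes([i]) * 4 for i in range(8)]        # data blocks of an 8+2 stripe
stripes4 = split_stripe(stripe8)                    # two 4+2 logical stripes
new_write_count = sum(len(s["new_writes"]) for s in stripes4)  # 4 check-block writes, 0 data rewrites
```

The point of the sketch is the accounting: the eight data blocks appear in `data_refs` without being rewritten, so only four check blocks ever hit the disk.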
In an optional implementation manner, the step of determining at least one data block to be written according to at least one data block in the target stripe includes:
reading each first type data block respectively;
respectively generating a data copy corresponding to each first-type data block, and taking the data copy as the data block to be written corresponding to that first-type data block;
the step of writing the data block to be written into a storage space and forming a logical stripe with the at least one data block includes:
and writing the data copy into a storage space, and forming the logical stripe with the corresponding first type data block.
In an optional embodiment, the step of determining a target stripe when the redundancy policy switching condition is satisfied includes:
when the load utilization rate of a disk is smaller than the disk load low threshold, determining the disk as a second-type target disk; the second-type target disk represents a disk with a low service load;
traversing all the stripes in the system, and determining a second target stripe, wherein the second-type data blocks of the second target stripe are stored on second-type target disks, and the second target stripe is one of the target stripes.
In an optional embodiment, the step of determining at least one data block to be written according to at least one data block in the target stripe includes:
reading the second type data blocks of at least two second target stripes respectively;
combining all the second-type data blocks into at least one second-type data group;
calculating a corresponding check block for each second-type data group, and taking the check block as the data block to be written corresponding to that second-type data group;
the step of writing the data block to be written into a storage space and forming a logical stripe with the at least one data block includes:
and writing the check block into a storage space, and forming the logical stripe together with the corresponding second-type data group.
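Conversely to the hot-data case, this embodiment merges the data blocks of several narrow stripes into one wider logical stripe, again writing only the new check blocks. A minimal sketch under illustrative assumptions (bytewise XOR as a placeholder for a real erasure code; names are not from the patent):

```python
from functools import reduce

def xor_parity(blocks):
    """Bytewise XOR of equal-sized blocks (placeholder for a real erasure code)."""
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), blocks)

def merge_stripes(stripes, parity_count=2):
    """Merge the data blocks of e.g. two 4+2 stripes into one 8+2 logical
    stripe; the 8 data blocks stay in place and only new check blocks are written."""
    merged = [blk for s in stripes for blk in s]
    # Placeholder: a real code derives parity_count *distinct* check blocks.
    checks = [xor_parity(merged)] * parity_count
    return {"data_refs": merged, "new_writes": checks}

cold_a = [bytes([i]) * 4 for i in range(4)]       # data blocks of one 4+2 stripe
cold_b = [bytes([i + 4]) * 4 for i in range(4)]   # data blocks of another 4+2 stripe
wide = merge_stripes([cold_a, cold_b])            # one 8+2 logical stripe, 2 new writes
```

Merging two 4+2 stripes into one 8+2 stripe this way drops the check-block overhead from 4 blocks to 2 while rewriting no data.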
In an optional embodiment, the step of determining at least one data block to be written according to at least one data block in the target stripe includes:
reading the second type data blocks of at least two second target stripes respectively;
respectively generating a data copy corresponding to each second-type data block, and taking the data copy as the data block to be written corresponding to that second-type data block;
the step of writing the data block to be written into a storage space and forming a logical stripe with the at least one data block includes:
and writing the data copy into a storage space, and forming the logical stripe with the corresponding second-type data block.
In a second aspect, the present invention provides a data redundancy policy changing apparatus, applied to a storage node, the apparatus including:
the decision module is used for determining a target stripe when the redundancy policy switching condition is met; the target stripe represents a stripe meeting the redundancy policy switching requirement;
the generating module is used for determining at least one data block to be written according to at least one data block in the target stripe; the data block to be written represents a check block corresponding to the data block or a copy corresponding to the data block;
and the writing module is used for writing the data block to be written into a storage space and forming a logical stripe with the at least one data block.
In a third aspect, the present invention provides a storage node, comprising a memory for storing a computer program and a processor for executing the data redundancy policy changing method according to any of the preceding embodiments when the computer program is called.
In a fourth aspect, the present invention provides a computer-readable storage medium, on which a computer program is stored, wherein the computer program, when executed by a processor, implements the data redundancy policy changing method according to any one of the foregoing embodiments.
Compared with the prior art, the data redundancy policy changing method, device, storage node, and storage medium provided by the embodiments of the present invention determine a target stripe when the redundancy policy switching condition is met; the target stripe represents a stripe meeting the redundancy policy switching requirement. At least one data block to be written is determined according to at least one data block in the target stripe; the data block to be written represents a check block corresponding to the data blocks or a copy corresponding to a data block. The data block to be written is written into a storage space and forms a logical stripe with the at least one data block. When this scheme implements a data dump, no new space is applied for the data blocks; the data blocks of the target stripe are used directly, only the newly generated check block or copy is written into the storage space, and the data blocks of the target stripe together with the newly generated check block or copy form a logical stripe to complete the data dump. The IO overhead of rewriting data blocks is effectively avoided, reducing the disk IO load and improving the performance of the storage system.
In order to make the aforementioned and other objects, features and advantages of the present invention comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained according to the drawings without inventive efforts.
Fig. 1 shows a schematic diagram of a prior art redundancy policy change.
Fig. 2 is a flowchart illustrating a data redundancy policy changing method according to an embodiment of the present invention.
Fig. 3 shows a schematic diagram of stripe metadata.
FIG. 4 is a diagram illustrating a comparison of file mapping metadata changes before and after a data redundancy policy change.
Fig. 5 shows a sub-step flow diagram of step S101 in fig. 2.
FIG. 6 shows a schematic diagram of the redundancy strategy changing from erasure code 8+2 to erasure code 4+2.
FIG. 7 shows a schematic diagram of the redundancy strategy changing from erasure code 4+2 to three copies.
FIG. 8 shows a schematic diagram of the redundancy strategy changing from three copies to erasure code 4+2.
FIG. 9 shows a schematic diagram of the redundancy strategy changing from erasure code 4+2 to erasure code 8+2.
Fig. 10 is a block diagram illustrating a data redundancy policy changing apparatus according to an embodiment of the present invention.
Fig. 11 is a block diagram illustrating a storage node according to an embodiment of the present invention.
Icon: 100-a storage node; 110-a memory; 120-a processor; 130-a communication module; 200-data redundancy policy changing means; 201-decision module; 202-a generation module; 203-write module.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations.
Thus, the following detailed description of the embodiments of the present invention, as presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without making any creative effort, shall fall within the protection scope of the present invention.
It is noted that relational terms such as "first" and "second," and the like, may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" or "comprising" does not exclude the presence of additional like elements in a process, method, article, or apparatus that comprises the element.
Enterprise-level storage systems have developed over many years. Early storage modes were mostly Direct Attached Storage (DAS), Storage Area Network (SAN), and the like; in recent years distributed storage has developed rapidly and become the main form of enterprise-level storage, characterized mainly by providing very high data reliability and by offering block, file, and object storage services to users. For reliability, enterprise-level storage typically relies on one or more data redundancy policies.
With continuously changing business requirements, data dumps need to be performed to better suit practical application scenarios; an important role of data dump technology is to balance performance against cost. At present, the main data dump method reads data from the original stripe, slices it according to the switched redundancy policy, and writes it into a new stripe. Such a dump mode generates more write IO and thereby increases the disk IO load.
In order to clearly understand the prior-art data dump flow, the following description takes fig. 1 as an example. Assume file 1 uses a three-copy mechanism to ensure the reliability of its data, that is, each data block of file 1 is copied three times and the copies are scattered across different disks; the storage space utilization of three copies is therefore 33%. To improve space utilization and save cost, file 1 is dumped using, for example, erasure code 4+2. The erasure code 4+2 mechanism divides the data into 4 slices of the same size and generates 2 check blocks of the same size through a parity algorithm, so 6 slice spaces are needed to store 4 slices of data, giving a storage space utilization of 67%. In the prior art, if file 1 has only one stripe, dumping it generally means first reading any one of the three copies of the data in the original stripe, then slicing the read data into 4 data blocks of the same size, generating 2 check blocks from those data blocks, and finally writing the 4 data blocks and 2 check blocks to disk. Dumping file 1 thus requires 7 IO scheduling operations: 1 read IO and 6 write IOs.
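The IO accounting in the example above can be written as a small back-of-the-envelope model. The function below and its accounting for the reuse case are an illustrative extrapolation of the patent's idea, not text from the patent itself:

```python
def dump_write_io(n_data, n_parity, reuse_data):
    """Write IOs needed to dump one stripe: check (or copy) blocks are
    always newly written; data blocks are rewritten only when the dump
    re-slices them, as in the prior-art flow."""
    return n_parity + (0 if reuse_data else n_data)

# Prior art (fig. 1): read one full copy, then write 4 data + 2 check blocks.
prior_total = 1 + dump_write_io(4, 2, reuse_data=False)   # 1 read + 6 writes
# Reusing the data blocks in place leaves only the 2 check-block writes.
reused_writes = dump_write_io(4, 2, reuse_data=True)
```

Under this accounting the prior art costs 7 IOs per stripe while reuse cuts the write side from 6 IOs to 2.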
Obviously, in order to implement data dump in the prior art, read data blocks are re-sliced and then written into a storage space, and such a dump manner may generate more write IO, thereby increasing the load of disk IO and affecting the performance of the storage system.
Based on this, the embodiments of the present invention provide a data redundancy policy changing method, a data redundancy policy changing device, a storage node, and a storage medium. The data blocks of the original stripe are used directly: only the newly generated check block or copy is written into the storage space, and the data blocks of the original stripe together with the newly generated check block or copy form a logical stripe to complete the data dump. The IO overhead of rewriting data blocks is effectively avoided, reducing the disk IO load and improving the performance of the storage system.
Embodiments of the present invention will be described in detail below with reference to the accompanying drawings.
The data redundancy strategy changing method and device provided by the embodiment of the invention are applied to the storage node, and the storage node executes the data redundancy strategy changing method provided by the embodiment of the invention. In the embodiment of the present invention, the storage node may be a single storage node, or may also be a distributed storage node or a centralized storage node, which is not limited in the present invention.
Referring to fig. 2, fig. 2 is a flowchart illustrating a data redundancy policy changing method according to an embodiment of the present invention, where the method includes the following steps:
step S101, when a redundancy policy switching condition is met, determining a target stripe;
the target stripe represents a stripe meeting the switching requirement of the redundancy strategy;
in the embodiment of the present invention, the storage system mostly adopts a stripe technology to realize IO load balancing of the system, that is, a continuous block of data is divided into data blocks with the same size, and each data block is stored on a different disk. When the redundancy policy needs to be changed for data dump, the stripe is changed accordingly, so that a target stripe which needs to be subjected to data dump is found according to the redundancy policy switching condition, and the target stripe is the above-mentioned original stripe. The redundancy policy switching condition may be set according to an actual application scenario, which is not limited in the present invention.
In the embodiment of the present invention, the redundancy policy switching condition may include a redundancy policy pair, a disk load threshold, and the load utilization rate of each disk. The redundancy policy pair records the two redundancy policies before and after a data dump; the disk load threshold is a boundary value of disk read/write pressure, used to judge whether the data on a disk needs a redundancy policy switch; and the load utilization rate of a disk is its actual read/write pressure.
In the embodiment of the invention, the redundancy policy pair and the disk load threshold can be preset by an administrator through an interactive interface or a third-party server, and the disk load utilization rate is obtained by a background program of the storage system that regularly samples the load utilization rate of each disk.
In the embodiment of the invention, an administrator can set multiple redundancy policy pairs to provide flexible and varied data dump modes. Each redundancy policy pair includes a first redundancy policy and a second redundancy policy, corresponding respectively to the redundancy policies before and after the data dump. The redundancy policy of the target stripe to be dumped is matched against the first or second redundancy policy of a pair; if it matches the second redundancy policy, the redundancy policy of the new stripe after dumping is the first redundancy policy. For example, suppose the configured redundancy policy pair is erasure code 4+2 and erasure code 8+2. If the disk load is detected to be too high, the erasure code 8+2 data needs to be dumped to improve storage system performance: the corresponding erasure code 4+2 is first found in the redundancy policy pair via erasure code 8+2, and the data dump is then performed using erasure code 4+2. Conversely, if the disk load is detected to be low and storage space needs to be saved, the corresponding erasure code 8+2 is found and the erasure code 4+2 data is dumped to it.
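A redundancy policy pair lookup of the kind just described can be sketched as follows; the pair list, policy names, and function are illustrative assumptions, not the patent's data model:

```python
# Hypothetical policy pairs; each pair maps a policy to its dump counterpart.
POLICY_PAIRS = [("EC4+2", "EC8+2"), ("3-replica", "EC4+2")]

def counterpart(policy):
    """Return the other policy of the first pair containing `policy`,
    or None if no configured pair mentions it."""
    for first, second in POLICY_PAIRS:
        if policy == first:
            return second
        if policy == second:
            return first
    return None
```

For instance, a hot stripe stored as "EC8+2" resolves to "EC4+2", while a policy absent from every pair yields None, meaning no dump is configured for it.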
In the embodiment of the present invention, the disk load threshold may include a disk load high threshold and a disk load low threshold, which can be used to define "hot data" and "cold data". "Hot data" refers to data that is frequently accessed, where the read/write pressure of the disk holding it is relatively high and the service load is high; "cold data" refers to data that is rarely accessed, where the read/write pressure of the disk holding it is relatively low and the service load is low. Through the disk load thresholds, the "hot data" disks and "cold data" disks that need a data dump can be identified, the "hot data" disks being first-type target disks and the "cold data" disks being second-type target disks. Furthermore, when data is dumped to balance system performance and cost, a targeted redundancy policy switch is performed for the different kinds of data.
It should be noted that the disk load threshold supports administrator setting and modification, and system defaults may also be used, for example, the system default disk load high threshold is 80%, and the disk load low threshold is 60%, and the present invention is not limited thereto.
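Classifying disks by the two thresholds can be sketched as follows, using the default values quoted above; the function and data shapes are assumptions for illustration:

```python
DISK_LOAD_HIGH = 0.80   # system default high threshold (80%)
DISK_LOAD_LOW = 0.60    # system default low threshold (60%)

def classify_disks(load_by_disk):
    """Split disks into first-type (hot) and second-type (cold) targets
    by their sampled load utilization; disks in between are left alone."""
    hot = [d for d, u in load_by_disk.items() if u > DISK_LOAD_HIGH]
    cold = [d for d, u in load_by_disk.items() if u < DISK_LOAD_LOW]
    return hot, cold
```

A disk at 70% utilization falls between the thresholds and triggers no dump in either direction, which keeps the system from oscillating between policies.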
It is worth mentioning that, in order to avoid the impact of the IO load generated by the dump on user services, the data dump can be performed outside service peak periods. For example, an administrator selects a service idle period as the optimal dump time period according to the system's service distribution and presets it on the storage node; the storage node then performs the data dump within the specified optimal dump time period, minimizing the influence on service performance.
Step S102, determining at least one data block to be written according to at least one data block in a target stripe;
the data block to be written represents a check block corresponding to the data block or a copy corresponding to the data block;
in the embodiment of the present invention, to implement dumping of target stripe data, a data block of a target stripe is first read, a redundancy policy pair is then searched according to a redundancy policy of the target stripe, a redundancy policy of a new stripe corresponding to the redundancy policy pair is found, and a check block or a copy of the new stripe is finally generated according to the redundancy policy of the new stripe, where the check block or the copy of the new stripe is a data block to be written.
Step S103, writing the data block to be written into the storage space, and forming a logical stripe with at least one data block.
The logical stripe refers to the new stripe formed by the data blocks of the target stripe and the newly generated check blocks or corresponding copies, that is, the stripe after the data dump.
In summary, the prior-art dump described above must re-slice the data blocks in the target stripe and then write the sliced data blocks together with the newly generated check blocks or copies into the storage space, which generates more write IO and increases the disk IO load. To reduce the disk load generated by dump IO, the present scheme uses the data blocks in the target stripe directly: only the newly generated check block or copy is written into the storage space, and the data blocks of the target stripe together with the newly generated check block or copy form the logical stripe, completing the data dump. The IO overhead of rewriting data blocks is effectively avoided, reducing the disk IO load and improving the performance of the storage system.
In an embodiment of the invention, in order to manage logical stripes efficiently and locate the data blocks and check blocks or copies within them quickly, metadata may be used to establish a mapping relationship between a logical stripe and its data blocks and check blocks or copies. Through the metadata, the data blocks and check blocks or copies of a logical stripe can be determined, together with the disk each belongs to and its position offset within that disk; data is then read and written via the disk and the position offset within the disk.
Specifically, in one possible implementation, the metadata of the logical stripe may include a stripe ID, a redundancy policy used by the stripe, and a stripe distribution array.
Taking fig. 3 as an example, the stripe distribution array element is related information of each unit in a stripe, and the related information includes a disk ID, an offset in the disk, a redundant computation sequence number in the stripe, and a file ID.
The redundancy calculation sequence number in the stripe records the relative position of each stripe distribution array element within the stripe, preventing errors in user service data caused by position disorder. Through the metadata of the logical stripe, the physical position of each stripe unit on its disk, the logical relation of each unit in the redundancy calculation, and the file to which the data stored in each unit belongs can all be determined.
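The stripe metadata described above might be modeled roughly as follows; the field names are assumptions derived from the description, not the patent's actual schema:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class StripeUnit:
    """One element of the stripe distribution array."""
    disk_id: int
    disk_offset: int        # position offset within the disk
    redundancy_seq: int     # relative position in the redundancy calculation
    file_id: int            # file whose data this unit stores

@dataclass
class StripeMetadata:
    stripe_id: int
    redundancy_policy: str                    # e.g. "EC4+2" or "3-replica"
    units: List[StripeUnit] = field(default_factory=list)

meta = StripeMetadata(stripe_id=1, redundancy_policy="EC4+2")
meta.units.append(StripeUnit(disk_id=3, disk_offset=4096, redundancy_seq=0, file_id=1))
```

Each unit carries enough information to locate the block physically (disk and offset), order it logically (sequence number), and tie it back to its file.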
In order to establish the mapping relationship between the dumped file and its data blocks, the logical stripe ID needs to be reverse-updated into the file mapping metadata. The file mapping metadata records the mapping relationship between a file and the stripes to which the file's data belong; each file corresponds to one metadata record comprising a file name, a file ID, and a stripe ID array, so the stripes corresponding to a file can be queried by file name.
When the stripe ID in the file mapping metadata is reverse-updated, the file ID to which each stripe distribution array element belongs is found in the stripe metadata, the corresponding record is found in the file mapping metadata according to that file ID, and the newly generated logical stripe ID is written into the record. For example, if the stripe ID of file 1 before data block a is dumped is 1, the stripe ID recorded for file 1's data block in the file mapping metadata is 1; if the stripe ID of file 1 after data block a is dumped is 2, the stripe ID for file 1's data block in the file mapping metadata is updated to 2. The change of the stripe ID in the file mapping metadata before and after the update is shown in fig. 4. In this example, the file metadata and the stripe metadata are both stored in a KV database, to which the present invention is not limited.
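The reverse update can be sketched with plain dicts standing in for the KV database. The record layout and function name below are hypothetical illustrations of the procedure described above:

```python
def reverse_update_stripe_id(stripe_units, file_meta_kv, old_id, new_id):
    """For each stripe distribution array element, look up the owning file's
    record in the file mapping metadata and replace the old stripe ID with
    the newly generated logical stripe ID."""
    for unit in stripe_units:
        record = file_meta_kv[unit["file_id"]]
        record["stripe_ids"] = [new_id if s == old_id else s
                                for s in record["stripe_ids"]]

# Example mirroring fig. 4: file 1's stripe ID changes from 1 to 2 after dump.
file_meta_kv = {1: {"file_name": "file1", "file_id": 1, "stripe_ids": [1]}}
stripe_units = [{"file_id": 1, "disk_id": 7, "disk_offset": 0}]
reverse_update_stripe_id(stripe_units, file_meta_kv, old_id=1, new_id=2)
```

In a real system the dict lookups would be get/put operations against the KV database.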
Therefore, the data redundancy policy changing method provided by the embodiment of the invention is applied to a storage node: when the redundancy policy switching condition is satisfied, a target stripe is determined, the target stripe representing a stripe meeting the redundancy policy switching requirement; at least one data block to be written is determined according to at least one data block in the target stripe, the data block to be written representing a check block corresponding to the data block or a copy corresponding to the data block; and the data block to be written is written into the storage space and forms a logical stripe with the at least one data block. In this scheme, when data is dumped, no new space is applied for the data blocks; the data blocks of the target stripe are used directly, only the check block or copy newly generated after switching is written into the storage space, and the data blocks of the target stripe together with the newly generated check block or copy form a logical stripe, completing the data dump. The IO overhead of rewriting the data blocks is effectively avoided, so the disk IO load is reduced and the performance of the storage system is improved.
Optionally, in practical applications, to address a large system IO load, a "hot data" disk is selected for data dump to improve performance. When the load utilization of a disk is greater than the disk-load high threshold, referring to fig. 5 on the basis of fig. 2, the sub-steps of step S101 may include:
step S10111, when the load utilization rate of the disk is greater than the disk load high threshold, determining the disk as a first type target disk;
the first type of target disks represent disks with high service load;
in the embodiment of the invention, the sudden high load utilization rate is prevented in order to select the 'hot data' disk more reasonably. For example, before using the load utilization rate of the disk, five thousandths of the highest load utilization rate in the latest day sampling results of each disk are removed, and the highest value in the remaining sampling results is taken as the load utilization rate of the disk. And when the load utilization rate of the disk is greater than or equal to the low threshold of the disk load and less than or equal to the high threshold of the disk load, the disk does not dump data. And when the utilization rate of the disk is greater than the high threshold of the disk load, the service load of the disk is high, and the disk belongs to a 'hot data' disk, namely the first-class target disk.
Step S10112, traversing all the strips in the system, and determining a first target strip;
the first type data block of the first target stripe is stored on a first type target disk, and the first target stripe is one type of target stripe;
in the embodiment of the invention, the stripes storing all data in the first type of target disk can be found by traversing the stripes, and the obtained stripes are used as the first target stripes for data dump.
It should be noted that, in order to better fit the actual operating condition of the current system, the number of accesses to each stripe may be monitored in the background, and the stripes with high access frequency in the system may be recorded. For example, in this example, a hot stripe region is used to save the stripe information accessed in the last week, the stripe information recording the stripe ID and the number of accesses of the stripe. The stripes in the hot stripe region are sorted from high to low by access count, and the entries with the fewest accesses are evicted (last-place elimination), so that the stripes retained in the hot stripe region are all stripes with high recent access frequency.
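The hot stripe region with last-place elimination can be sketched as below; the class name and capacity parameter are illustrative assumptions:

```python
class HotStripeRegion:
    """Records stripe IDs and access counts for the recent window and, once
    full, evicts the least-accessed entry (last-place elimination), so the
    retained stripes are those with high recent access frequency."""
    def __init__(self, capacity=1000):
        self.capacity = capacity
        self.counts = {}            # stripe_id -> access count

    def record_access(self, stripe_id):
        self.counts[stripe_id] = self.counts.get(stripe_id, 0) + 1
        if len(self.counts) > self.capacity:
            coldest = min(self.counts, key=self.counts.get)
            del self.counts[coldest]

    def is_hot(self, stripe_id):
        """Membership test used when deciding whether a stripe is dumped."""
        return stripe_id in self.counts
```

A production version would also age out entries older than the one-week window; that bookkeeping is omitted here.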
As a specific implementation, if a stripe obtained from the first-class target disk is not in the hot stripe region, the stripe is not dumped in this round of data dump; the emphasis is on improving, through dumping, the performance of frequently accessed data in the system.
It should be noted that, in the prior art, the redundancy policy switching condition depends on administrator configuration; for example, the administrator needs to preset a threshold related to the access frequency or access probability of data over a period of time, and data whose measured access frequency or probability meets the threshold is dumped. The key problem the storage administrator faces during configuration is that the appropriate threshold value is unclear or cannot be accurately evaluated, and the optimal threshold changes continuously as the service changes, so the final dump effect is not ideal.
In summary, the embodiment of the present application provides flexible and varied redundancy policy change configurations and determines how data is stored by monitoring disk utilization, avoiding the need for an administrator to observe and evaluate a hotness threshold and reducing storage cost as much as possible while guaranteeing service performance.
Optionally, in practical applications, when system performance needs to be improved by dumping a "hot data" disk, the reliability of the dumped data may be ensured with check blocks. Based on the check-block redundancy manner, step S102 in fig. 2 may include:
respectively reading each first-class data block; dividing the first-class data blocks into at least two first-class data groups;
calculating a corresponding check block according to each first-class data group, and taking the check block as a data block to be written corresponding to the first-class data group;
for the manner of checking the chunk dump data, step S103 in fig. 2 may include:
and writing the check block into a storage space, and forming a logic strip with the corresponding first-class data group.
In the embodiment of the present invention, erasure codes are taken as an example. Erasure code 4+2 indicates that a stripe spans 6 disks, of which 4 are data disks and 2 are parity disks; the space utilization is 67%, and reading the stripe data requires reading the data blocks on 4 disks. Erasure code 8+2 indicates that a stripe spans 10 disks, of which 8 are data disks and 2 are parity disks; the space utilization is 80%, and reading the stripe data requires reading the data blocks on 8 disks. Therefore, the performance of erasure code 4+2 is higher than that of erasure code 8+2, and so is its cost.
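The utilization and read-fanout figures quoted above follow directly from the k+m layout:

```python
def ec_space_utilization(k, m):
    """Usable fraction of raw capacity for a k-data / m-parity erasure code."""
    return k / (k + m)

def ec_read_fanout(k, m):
    """Number of disks a full-stripe read must touch (the k data disks)."""
    return k
```

Hence 4+2 trades lower space utilization (67% vs 80%) for a smaller read fanout (4 disks vs 8), which is why it performs better but costs more.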
Taking fig. 6 as an example, the data stored under erasure code 8+2 is dumped to erasure code 4+2 to improve the access performance of the "hot data". The erasure code 8+2 stripe is then the first target stripe, the data blocks in the erasure code 8+2 stripe are the first-class data blocks, and the erasure code 4+2 stripes after dumping are the logical stripes. The 8 first-class data blocks of the first target stripe are divided into 2 first-class data groups, each containing 4 first-class data blocks. A check value is calculated for each first-class data group to obtain 2 check blocks; the 2 check blocks generated from each first-class data group are written into the storage space and form a logical stripe with the 4 first-class data blocks of the corresponding group. One erasure code 8+2 stripe can thus be split into 2 erasure code 4+2 stripes; the 2 newly generated logical stripes produce only the write IO of 4 check blocks, while the data blocks in the logical stripes reuse the first-class data blocks of the first target stripe and produce no write IO, greatly reducing the IO overhead generated during data dump.
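The split can be sketched as below. XOR stands in for the erasure-code math (a real 4+2 code would derive its two independent check blocks with Reed-Solomon coding); the point of the sketch is the write accounting, and the second "check block" here is illustrative only:

```python
from functools import reduce

def xor_parity(blocks):
    """Byte-wise XOR of equal-length blocks (single-parity stand-in)."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

def dump_8p2_to_4p2(data_blocks, write_block):
    """Split the 8 data blocks of one EC 8+2 stripe into two EC 4+2 logical
    stripes.  Data blocks are reused in place; only the 4 new check blocks
    are written via write_block."""
    assert len(data_blocks) == 8
    stripes = []
    for group in (data_blocks[:4], data_blocks[4:]):
        p = xor_parity(group)
        # Second check block: XOR of rotated blocks -- illustrative only,
        # not a real second Reed-Solomon parity.
        q = xor_parity([blk[i:] + blk[:i] for i, blk in enumerate(group)])
        write_block(p)   # the only write IOs generated by the dump
        write_block(q)
        stripes.append({"data": list(group), "check": [p, q]})
    return stripes
```

Counting writes through `write_block` shows the claimed cost: 4 check-block writes in total, zero rewrites of the 8 data blocks.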
Optionally, in practical applications, when system performance needs to be improved by dumping a "hot data" disk, the dumped data may use copies to ensure data reliability. Based on the copy redundancy manner, step S102 in fig. 2 may include:
respectively reading each first type data block; respectively generating a data copy corresponding to each first-type data block;
the data copy is used as a data block to be written corresponding to each first-class data block;
For the manner of dumping data with copies, step S103 in fig. 2 may include:
and writing the data copy into the storage space, and forming a logical stripe with the corresponding first type data block.
Taking fig. 7 as an example, the data stored under erasure code 4+2 is dumped into three-copy stripes to improve the access performance of the "hot data". The erasure code 4+2 stripe is the first target stripe, the data blocks in the erasure code 4+2 stripe are the first-class data blocks, and the three-copy stripes after dumping are the logical stripes. Since the data blocks in the copies are identical redundant backups, the 4 first-class data blocks in the first target stripe are each dumped into a three-copy stripe, so each three-copy stripe needs to generate 2 redundant backups from its corresponding first-class data block and write them into the storage space. One erasure code 4+2 stripe can therefore be split into 4 three-copy stripes, and the 4 three-copy stripes produce only 8 write IOs for the redundant backups. The 4 original data blocks reuse the first-class data blocks of the first target stripe and produce no write IO, greatly reducing the IO overhead generated during data dump.
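The copy-based dump can be sketched the same way; the function name is illustrative:

```python
def dump_to_three_copies(data_blocks, write_block):
    """Dump each first-class data block into a three-copy logical stripe.
    The original block is reused in place, so only the 2 redundant backups
    per block are written via write_block."""
    stripes = []
    for blk in data_blocks:
        backups = [bytes(blk), bytes(blk)]   # 2 new redundant backups
        for b in backups:
            write_block(b)                   # 2 write IOs per logical stripe
        stripes.append([blk] + backups)      # three-copy logical stripe
    return stripes
```

For the 4 data blocks of one EC 4+2 stripe this produces 4 three-copy logical stripes and exactly 8 backup writes, matching the count given above.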
Optionally, in practical applications, in order to save disk space and reduce data storage cost, a "cold data" disk is selected for dumping. When the load utilization of a disk is less than the disk-load low threshold, referring to fig. 5 on the basis of fig. 2, the sub-steps of step S101 may include:
step S10121, when the load utilization rate of the disk is smaller than the disk load low threshold, determining the disk as a second type target disk;
the second type target disk represents a disk with low service load;
step S10122, traversing all the strips in the system, and determining a second target strip;
and storing the second type data blocks of the second target stripe on a second type target disk, wherein the second target stripe is one of the target stripes.
In the embodiment of the present invention, the manner of processing the sampled disk load utilization data and determining the target stripe is similar to that set forth for step S10111, except that the conditions for selecting the second-class target disk are that the load utilization of the disk is smaller than the disk-load low threshold and that the stripe does not belong to the hot stripe region, which is not repeated here.
Optionally, in practical applications, when storage space needs to be reduced by dumping a "cold data" disk so as to reduce storage cost, the dumped data may use check blocks to ensure data reliability. Based on the check-block redundancy manner, step S102 in fig. 2 may include:
respectively reading the second-class data blocks of at least two second target stripes; combining all the second-class data blocks into at least one second-class data group;
calculating a corresponding check block according to each second-class data group, and taking the check block as a data block to be written corresponding to the second-class data group;
For the manner of dumping data with check blocks, step S103 in fig. 2 may include:
and writing the check block into the storage space, and forming a logic strip with the corresponding second-class data group.
In order to more clearly illustrate the data redundancy strategy changing method provided by the embodiment of the present application, an exemplary description is given by combining with the comparison of the prior art.
As an embodiment, taking FIG. 8 as an example, suppose files 1-4 use three copies to ensure the reliability of the file data; the storage space utilization of three copies is 33%. In order to improve space utilization and save cost, the four files are dumped using erasure code 4+2. The three-copy stripes are then the second target stripes, any data block in the three copies is a second-class data block, and the erasure code 4+2 stripe after dumping is the logical stripe. Based on the erasure code 4+2 algorithm, one second-class data block of each file is read from the four files as the data blocks of erasure code 4+2, and 2 check blocks are generated from those data blocks. The 4 data blocks therefore need 6 IOs in the dump process: the 4 read IOs of the data blocks plus the 2 write IOs of the check blocks. In the prior art, 28 IOs are required to implement the same data redundancy policy change, since dumping each file needs 1 read IO and 6 write IOs, as shown in FIG. 1. It can be seen that, when dumping data, the embodiment of the present invention directly uses the data blocks in the original three-copy stripes, does not apply for space to rewrite the file data blocks, and only needs to newly write 2 check blocks; the 4 data blocks of the second target stripes and the newly generated 2 check blocks form a logical stripe. Re-slicing and re-storing the data blocks is avoided, greatly reducing IO overhead and improving data storage performance.
As still another embodiment, taking FIG. 9 as an example, suppose files 1 and 2 use erasure code 4+2 to ensure the reliability of the file data, and to save cost both files are dumped using erasure code 8+2. The erasure code 4+2 stripes are then the second target stripes, the data blocks in the erasure code 4+2 stripes are the second-class data blocks, and the erasure code 8+2 stripe after dumping is the logical stripe. Based on the erasure code 8+2 algorithm, the 8 data blocks of the 2 files are read from the 2 second target stripes and used as the data blocks of erasure code 8+2, 2 check blocks are generated from the 8 data blocks, and the newly generated 2 check blocks are written. The 8 data blocks require 10 IOs in the dump process: 8 read IOs and 2 write IOs. In the prior art, 28 IOs are required to implement the same data redundancy policy change: 8 read IOs and 20 write IOs. It can be seen that, when dumping data, the embodiment of the present invention directly uses the data blocks of the erasure code 4+2 stripes, does not apply for space to rewrite the file data blocks, and only needs to newly write 2 check blocks; the 8 data blocks of the second target stripes and the newly generated 2 check blocks form a logical stripe. Re-slicing and re-storing the data blocks is avoided, greatly reducing IO overhead.
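The IO savings in both examples reduce to the same accounting: the proposed dump pays only for reading the reused data blocks and writing the new check blocks, while the prior art re-slices and rewrites every file separately. The per-file read/write counts below are taken from the two figures as described:

```python
def proposed_io(reads, new_check_writes):
    """IO cost of the proposed dump: read the reused data blocks,
    write only the newly generated check blocks."""
    return reads + new_check_writes

def prior_art_io(reads_per_file, writes_per_file, files):
    """IO cost of the prior art: each file is read, re-sliced,
    and fully rewritten (data blocks plus check blocks)."""
    return files * (reads_per_file + writes_per_file)
```

Plugging in the figures: fig. 8 gives 6 IOs vs 28 (4 files, 1 read and 6 writes each), and fig. 9 gives 10 IOs vs 28 (2 files, 4 reads and 10 writes each).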
Optionally, in practical applications, when storage space needs to be reduced by dumping a "cold data" disk so as to reduce storage cost, the dumped data may use copies to ensure data reliability, and step S102 in fig. 2 may include:
respectively reading second type data blocks of at least two second target stripes; respectively generating a data copy corresponding to each second-class data block;
the data copy is used as a data block to be written corresponding to each second type data block;
for the method of using copy dump data, step S103 in fig. 2 may include:
and writing the data copy into the storage space, and forming a logical stripe with the corresponding second-class data block.
In the embodiment of the present invention, in order to save space, dump between copy policies is also possible; for example, four-copy stripes are dumped into two-copy stripes. Similar to the example in fig. 7, one data block is read from the four-copy stripe as the second-class data block, one data copy is generated for the second-class data block, the data copy is written into the storage space, and the data copy and the corresponding second-class data block form a two-copy logical stripe, which is not repeated here.
It should be noted that, in the embodiment of the present invention, the data redundancy policy change is not limited to the change between several redundancy policies in the embodiment, and may be used for the mutual change between various copies, various RAIDs, and various erasure codes, which is not limited in this embodiment of the present invention.
Based on the same inventive concept, the present embodiment further provides a data redundancy policy changing apparatus, please refer to fig. 10, and fig. 10 shows a block schematic diagram of a data redundancy policy changing apparatus 200 according to the present embodiment. The data redundancy policy changing apparatus 200 is applied to a storage node, and the data redundancy policy changing apparatus 200 includes a decision module 201, a generation module 202, and a write module 203.
A decision module 201, configured to determine a target stripe when a redundancy policy switching condition is satisfied; the target strip represents a strip meeting the switching requirement of the redundancy strategy;
a generating module 202, configured to determine at least one data block to be written according to at least one data block in the target stripe; the data block to be written represents a check block corresponding to the data block or a copy corresponding to the data block;
the writing module 203 is configured to write a data block to be written into the storage space, and form a logical stripe with at least one data block.
Optionally, the decision module 201 is specifically configured to determine the disk as a first-class target disk when the load utilization of the disk is greater than a disk load high threshold; the first type of target disk represents a disk with high service load; and traversing all the strips in the system, and determining a first target strip, wherein a first type data block of the first target strip is stored on a first type target disk, and the first target strip is one of the target strips.
Optionally, the decision module 201 is further configured to determine the disk as a second-class target disk when the load utilization of the disk is less than the disk-load low threshold; the second-class target disk represents a disk with low service load; and to traverse all the stripes in the system and determine a second target stripe, wherein a second-class data block of the second target stripe is stored on a second-class target disk, and the second target stripe is one of the target stripes.
Optionally, the generating module 202 is specifically configured to read each first-type data block; dividing the first type data block into at least two first type data groups; and respectively calculating a corresponding check block according to each first-class data group, and taking the check block as a data block to be written corresponding to the first-class data group. The writing module 203 is specifically configured to write the check block into the storage space, and form a logical stripe with the corresponding first class data group.
Optionally, the generating module 202 is specifically configured to read each first-type data block; respectively generating a data copy corresponding to each first-type data block; and taking the data copy as a data block to be written corresponding to each first-class data block. The writing module 203 is specifically configured to write the data copy into the storage space, and form a logical stripe with the corresponding first type data block.
Optionally, the generating module 202 is specifically configured to read second class data blocks of at least two second target stripes, respectively; combining all the second type data blocks into at least one second type data block; and respectively calculating a corresponding check block according to each second type data group, and taking the check block as a data block to be written corresponding to the second type data group. The writing module 203 is specifically configured to write the check block into the storage space, and form a logical stripe with the corresponding second-class data group.
Optionally, the generating module 202 is specifically configured to read second class data blocks of at least two second target stripes, respectively; respectively generating a data copy corresponding to each second type data block; and taking the data copy as a data block to be written corresponding to each second-class data block. The writing module 203 is specifically configured to write the data copy into the storage space, and the data copy and the corresponding second type data block form a logical stripe.
Fig. 11 is a block diagram of a storage node 100 according to an embodiment of the present invention. The storage node 100 may be a single storage node, a distributed storage node, a centralized storage node, etc. The storage node 100 includes a memory 110, a processor 120, and a communication module 130. The memory 110, the processor 120, and the communication module 130 are electrically connected to each other directly or indirectly to enable data transmission or interaction. For example, the components may be electrically connected to each other via one or more communication buses or signal lines.
The memory 110 is used to store programs or data. The Memory 110 may be, but is not limited to, a Random Access Memory (RAM), a Read Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Read-Only Memory (EPROM), an electrically Erasable Read-Only Memory (EEPROM), and the like.
The processor 120 is used to read/write data or programs stored in the memory 110 and perform corresponding functions. For example, when the computer program stored in the memory 110 is executed by the processor 120, the data redundancy policy changing method disclosed in the above embodiments can be implemented.
The communication module 130 is used for establishing a communication connection between the storage node 100 and another communication terminal through a network, and for transceiving data through the network.
It should be understood that the structure shown in fig. 11 is merely a schematic diagram of the structure of the storage node 100, and the storage node 100 may also include more or less components than those shown in fig. 11, or have a different configuration than that shown in fig. 11. The components shown in fig. 11 may be implemented in hardware, software, or a combination thereof.
An embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by the processor 120, implements the data redundancy policy changing method disclosed in the foregoing embodiments.
In summary, embodiments of the present invention provide a data redundancy policy changing method and apparatus, a storage node, and a storage medium. When the redundancy policy switching condition is satisfied, a target stripe is determined, the target stripe representing a stripe meeting the redundancy policy switching requirement; at least one data block to be written is determined according to at least one data block in the target stripe, the data block to be written representing a check block corresponding to the data block or a copy corresponding to the data block; and the data block to be written is written into the storage space and forms a logical stripe with the at least one data block. When this scheme dumps data, no new space is applied for the data blocks; the data blocks of the target stripe are used directly, a check block or copy is generated according to the switched redundancy policy, and the data blocks of the target stripe together with the newly generated check block or copy form a logical stripe, completing the data dump. The IO overhead of rewriting the data blocks is effectively avoided, so the disk IO load is significantly reduced and the performance of the storage system is improved.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method can be implemented in other ways. The apparatus embodiments described above are merely illustrative, and for example, the flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In addition, the functional modules in the embodiments of the present invention may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.
The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. A data redundancy strategy changing method is applied to a storage node and comprises the following steps:
when the switching condition of the redundancy strategy is met, determining a target strip; the target stripe represents a stripe meeting the switching requirement of the redundancy strategy;
determining at least one data block to be written according to at least one data block in the target stripe; the data block to be written represents a check block corresponding to the data block or a copy corresponding to the data block;
and writing the data block to be written into a storage space, and forming a logic strip with the at least one data block.
2. The method according to claim 1, wherein the step of determining a target stripe when the redundancy policy switching condition is satisfied comprises:
when the load utilization rate of the magnetic disk is greater than a disk load high threshold, determining the magnetic disk as a first type target disk; the first type of target disk represents a disk with high service load;
and traversing all the strips in the system, and determining a first target strip, wherein a first type data block of the first target strip is stored on the first type target disk, and the first target strip is one of the target strips.
3. The method according to claim 2, wherein the step of determining at least one data block to be written according to at least one data block in the target stripe comprises:
reading each first type data block respectively;
dividing the first type data block into at least two groups of first type data groups;
calculating a corresponding check block according to each first-class data group respectively, and taking the check block as a data block to be written corresponding to the first-class data group;
the step of writing the data block to be written into a storage space and forming a logical stripe with the at least one data block includes:
and writing the check block into a storage space, and forming the logic stripe together with the corresponding first class data group.
4. The method according to claim 2, wherein the step of determining at least one data block to be written according to at least one data block in the target stripe comprises:
reading each first type data block respectively;
respectively generating a data copy corresponding to each first-type data block; taking the data copy as a data block to be written corresponding to each first-class data block;
the step of writing the data block to be written into a storage space and forming a logical stripe with the at least one data block includes:
and writing the data copy into a storage space, and forming the logical stripe with the corresponding first class data block.
5. The method according to claim 1, wherein the step of determining a target stripe when the redundancy policy switching condition is satisfied comprises:
when the load utilization rate of the disk is smaller than the disk load low threshold, determining the disk as a second type target disk; the second type target disc represents a disc with low service load;
traversing all the strips in the system, and determining a second target strip, wherein a second type data block of the second target strip is stored on a second type target disk, and the second target strip is one of the target strips.
6. The method according to claim 5, wherein the step of determining at least one data block to be written according to at least one data block in the target stripe comprises:
reading the second-type data blocks of at least two second target stripes;
combining all the second-type data blocks into at least one second-type data group;
calculating a corresponding check block for each second-type data group, and taking the check block as the data block to be written for that second-type data group;
the step of writing the data block to be written into a storage space and forming a logical stripe with the at least one data block comprises:
writing the check block into a storage space and forming the logical stripe together with the corresponding second-type data group.
7. The method according to claim 5, wherein the step of determining at least one data block to be written according to at least one data block in the target stripe comprises:
reading the second-type data blocks of at least two second target stripes;
generating a data copy corresponding to each second-type data block, and taking the data copy as the data block to be written for that second-type data block;
the step of writing the data block to be written into a storage space and forming a logical stripe with the at least one data block comprises:
writing the data copy into a storage space and forming the logical stripe with the corresponding second-type data block.
8. A data redundancy policy changing apparatus applied to a storage node, the apparatus comprising:
a decision module, configured to determine a target stripe when the redundancy policy switching condition is satisfied; the target stripe represents a stripe meeting the redundancy policy switching requirement;
a generating module, configured to determine at least one data block to be written according to at least one data block in the target stripe; the data block to be written represents a check block corresponding to the data block or a copy corresponding to the data block;
a writing module, configured to write the data block to be written into a storage space and form a logical stripe with the at least one data block.
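The three-module apparatus of claim 8 can be sketched as a thin coordinator over pluggable decision, generating, and writing callables. Class and parameter names are illustrative, not taken from the patent.

```python
class DataRedundancyPolicyChanger:
    """Coordinates the three modules of the apparatus; internals are assumptions."""

    def __init__(self, decision, generating, writing):
        self.decision = decision      # yields target stripes when the switch condition holds
        self.generating = generating  # derives the check block or data copy to be written
        self.writing = writing        # writes the block and forms the new logical stripe

    def change_policy(self):
        for stripe in self.decision():
            block_to_write = self.generating(stripe)
            self.writing(stripe, block_to_write)
```

Keeping the modules as injected callables mirrors the claim's decomposition and lets the same coordinator drive either the parity path (claims 3 and 6) or the copy path (claims 4 and 7).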
9. A storage node, characterized in that the storage node comprises a memory for storing a computer program and a processor for executing the data redundancy policy changing method according to any one of claims 1-7 when the computer program is invoked.
10. A computer-readable storage medium, on which a computer program is stored which, when executed by a processor, implements the data redundancy policy changing method according to any one of claims 1 to 7.
CN202211587787.7A 2022-12-12 2022-12-12 Data redundancy strategy changing method and device, storage node and storage medium Active CN115599314B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211587787.7A CN115599314B (en) 2022-12-12 2022-12-12 Data redundancy strategy changing method and device, storage node and storage medium
Publications (2)

Publication Number Publication Date
CN115599314A true CN115599314A (en) 2023-01-13
CN115599314B CN115599314B (en) 2023-03-31

Family

ID=84852218

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211587787.7A Active CN115599314B (en) 2022-12-12 2022-12-12 Data redundancy strategy changing method and device, storage node and storage medium

Country Status (1)

Country Link
CN (1) CN115599314B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101976177A (en) * 2010-08-19 2011-02-16 北京同有飞骥科技有限公司 Method for constructing vertical grouped disk array capable of being subject to parallel centralized check
US8255739B1 (en) * 2008-06-30 2012-08-28 American Megatrends, Inc. Achieving data consistency in a node failover with a degraded RAID array
US20140304469A1 (en) * 2013-04-05 2014-10-09 Hewlett-Packard Development Company, L. P. Data storage
CN112115001A (en) * 2020-09-18 2020-12-22 深圳市欢太科技有限公司 Data backup method and device, computer storage medium and electronic equipment
CN113238924A (en) * 2021-04-09 2021-08-10 杭州欧若数网科技有限公司 Chaos engineering implementation method and system in distributed graph database system
Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant