CN116501262A - Data storage method and device, electronic equipment and storage medium - Google Patents

Data storage method and device, electronic equipment and storage medium

Info

Publication number
CN116501262A
Authority
CN
China
Prior art keywords
data
check
fragment
erasure
fragments
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310730545.7A
Other languages
Chinese (zh)
Other versions
CN116501262B (en)
Inventor
魏东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
New H3C Information Technologies Co Ltd
Original Assignee
New H3C Information Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by New H3C Information Technologies Co Ltd
Priority to CN202310730545.7A
Publication of CN116501262A
Application granted
Publication of CN116501262B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0608 Saving storage space on storage systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/08 Error detection or correction by redundancy in data representation, e.g. by using checking codes
    • G06F11/10 Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's
    • G06F11/1004 Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's to protect a block of data words, e.g. CRC or checksum
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0638 Organizing or formatting or addressing of data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/0671 In-line storage system
    • G06F3/0673 Single storage device
    • G06F3/0674 Disk device
    • G06F3/0676 Magnetic disk device

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Computer Security & Cryptography (AREA)
  • Quality & Reliability (AREA)
  • Signal Processing For Digital Recording And Reproducing (AREA)
  • Detection And Correction Of Errors (AREA)

Abstract

The embodiment of the invention provides a data storage method, a device, electronic equipment and a storage medium, relating to the technical field of data processing. Each check fragment of an erasure code stripe contains the same number of data blocks as there are data fragments in the stripe; each data block in a check fragment corresponds to a different data fragment; and each data block has the same storage space as each data fragment. The method comprises: determining the target data fragment to which target data is to be written; writing the target data in the target data fragment and in the target data block of each check fragment; when no data fragment in the erasure code stripe has free space, calculating check data based on the data stored in a check fragment of the stripe; storing each check data segment of the check data in a different check fragment; and clearing the data other than the check data segment in each check fragment. Applying the scheme provided by the embodiment of the invention can improve storage space utilization.

Description

Data storage method and device, electronic equipment and storage medium
Technical Field
The embodiment of the invention relates to the technical field of data processing, in particular to a data storage method, a data storage device, electronic equipment and a storage medium.
Background
In the field of data storage, erasure coding technology calculates check data from target data, divides the target data into m target data segments and the check data into n check data segments, and stores the m target data segments and n check data segments on m+n disks, one segment per disk. When no more than n disks fail, the target data can be recovered from any m of the surviving segments.
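As a minimal illustration of the recovery property described above, the following Python sketch uses single XOR parity, that is, m data segments and n = 1 check segment. This is a deliberate simplification for brevity: practical erasure codes with n > 1 typically use Reed-Solomon style arithmetic, and the function names here are hypothetical.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(segments: list[bytes]) -> bytes:
    """Return the single check segment: the XOR of all data segments."""
    return reduce(xor_bytes, segments)


def recover(survivors: list[bytes]) -> bytes:
    """Rebuild the one missing segment from the m surviving segments."""
    return reduce(xor_bytes, survivors)


data = [b"abcd", b"efgh", b"ijkl"]  # m = 3 target data segments
check = encode(data)                # n = 1 check data segment

# Suppose the disk holding data[1] fails: XOR the m surviving segments.
rebuilt = recover([data[0], data[2], check])
```

With n = 1 only a single failure can be tolerated; tolerating n failures requires n independent check segments.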
The related erasure coding technique writes the data segments into erasure code stripes. One erasure code stripe is composed of a plurality of slices of the same size for storing data, for example m data slices and n check slices; the data slices store target data, the check slices store check data, and each slice is located on a different disk.
The related erasure coding technique can only calculate check data from target data of a fixed length, for example the length of data that the m data slices can hold. When the target data does not reach the fixed length, the existing technique pads the target data with zeros until it does. Zero padding, however, increases the length of the target data and occupies space in the erasure code stripe that could otherwise store target data. Data storage based on the current erasure coding technique therefore suffers from low storage space utilization.
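The cost of zero padding can be made concrete with a little arithmetic. The sketch below is only illustrative: the function names and the example sizes (4 data slices of 1 MiB each, a 4 KiB write) are assumptions and do not come from the source.

```python
def padding_bytes(payload_len: int, m: int, slice_size: int) -> int:
    """Zeros needed to bring the payload up to the fixed stripe length."""
    capacity = m * slice_size
    if not 0 < payload_len <= capacity:
        raise ValueError("payload must fit in the m data slices")
    return capacity - payload_len


def padded_utilization(payload_len: int, m: int, slice_size: int) -> float:
    """Fraction of the data-slice capacity that holds real target data."""
    return payload_len / (payload_len + padding_bytes(payload_len, m, slice_size))


# Hypothetical example: a 4 KiB write sealed into a stripe of 4 x 1 MiB data slices.
wasted = padding_bytes(4096, m=4, slice_size=1 << 20)
ratio = padded_utilization(4096, m=4, slice_size=1 << 20)
```

Under these assumed sizes almost the entire data capacity of the stripe is consumed by padding, which is the small-write case the description is concerned with.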
Disclosure of Invention
The embodiment of the invention aims to provide a data storage method, a data storage device, electronic equipment and a storage medium, so as to improve the utilization rate of storage space. The specific technical scheme is as follows:
In a first aspect, an embodiment of the present invention provides a data storage method. An erasure code stripe includes data fragments and check fragments; each check fragment includes the same number of data blocks as there are data fragments; each data block in a check fragment corresponds to a different data fragment; and each data block has the same storage space as each data fragment. The method includes:
determining the target data fragment to which target data is to be written;
writing the target data in the target data fragment and in the target data block of each check fragment, wherein the target data block corresponds to the target data fragment, and the target data is written at the same position in the target data fragment as in each target data block;
when no data fragment in the erasure code stripe has free space, calculating check data based on the data stored in a check fragment of the erasure code stripe;
storing each check data segment of the check data in a different check fragment;
and clearing the data other than the check data segment in each check fragment.
In one embodiment of the present invention, the method further comprises:
when an instruction to overwrite existing data in the erasure code stripe with overlay data is received, determining the overlay data fragment to which the overlay data is to be written and the position to be written within the free space of the overlay data fragment;
writing the overlay data at the position to be written in the overlay data fragment, and writing the overlay data in the overlay data block of each check fragment, wherein the overlay data block corresponds to the overlay data fragment, and the overlay data is written at the same position in the overlay data fragment as in each overlay data block;
and marking the existing data as deactivated data, and updating the address recorded in the metadata of the existing data to the address of the overlay data.
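The bookkeeping side of this overwrite embodiment can be sketched as follows. The sketch covers only the metadata update; writing the overlay data itself would follow the ordinary write path. The `Extent` and `Metadata` types and the `overwrite` function are hypothetical names introduced for illustration, not structures named by the source.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Extent:
    """Location of a piece of data: stripe, data fragment index, offset, length."""
    stripe: int
    fragment: int
    offset: int
    length: int


@dataclass
class Metadata:
    address: dict[str, Extent] = field(default_factory=dict)  # key -> current location
    deactivated: list[Extent] = field(default_factory=list)   # superseded locations


def overwrite(meta: Metadata, key: str, new_extent: Extent) -> None:
    """Point the key at the overlay data and mark the old location as deactivated."""
    old = meta.address.get(key)
    if old is not None:
        meta.deactivated.append(old)  # existing data is kept in place, only marked
    meta.address[key] = new_extent


meta = Metadata()
overwrite(meta, "obj", Extent(stripe=0, fragment=1, offset=0, length=2))  # first write
overwrite(meta, "obj", Extent(stripe=0, fragment=1, offset=2, length=2))  # overwrite
```

The existing data is never modified in place, so data blocks already mirrored into the check fragments stay consistent with the data fragments.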
In one embodiment of the present invention, after clearing the data other than the check data segment in each check fragment, the method further includes:
determining a deactivated erasure code stripe, namely a stripe in which the proportion of deactivated data among the data it contains reaches a preset proportion;
writing the data other than the deactivated data in the deactivated erasure code stripe into other erasure code stripes;
and clearing the data in the deactivated erasure code stripe.
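A minimal sketch of this reclamation step is given below, assuming that per-stripe byte counts are available and that each stripe's contents can be modeled as a dictionary of keyed payloads. Both assumptions, and all names, are hypothetical.

```python
def select_deactivated_stripes(stats: dict[int, tuple[int, int]],
                               preset_ratio: float) -> list[int]:
    """Return the ids of stripes whose deactivated share reaches the preset ratio.

    stats maps stripe id -> (deactivated bytes, total bytes).
    """
    return [sid for sid, (dead, total) in stats.items()
            if total > 0 and dead / total >= preset_ratio]


def reclaim(stripe: dict[str, bytes], deactivated_keys: set[str]) -> dict[str, bytes]:
    """Return the live data to rewrite elsewhere, then clear the stripe."""
    live = {k: v for k, v in stripe.items() if k not in deactivated_keys}
    stripe.clear()
    return live


stats = {1: (3, 4), 2: (1, 4)}
victims = select_deactivated_stripes(stats, preset_ratio=0.5)

stripe_1 = {"a": b"old", "b": b"live", "c": b"old2"}
moved = reclaim(stripe_1, deactivated_keys={"a", "c"})
```

The returned live data would then be written into other erasure code stripes through the normal write path.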
In one embodiment of the present invention, the calculating, based on the data stored in the check fragment in the erasure coding stripe, to obtain the check data in the case that no free space exists in each data fragment in the erasure coding stripe includes:
reading data in check fragments in a plurality of erasure code stripes at one time under the condition that no free space exists in each data fragment of the plurality of erasure code stripes;
and respectively based on the read data corresponding to each erasure code strip, calculating to obtain the check data corresponding to the erasure code strip.
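One way to picture the batched read is sketched below. It assumes that the check fragments of consecutive stripes sit back to back in the same linear space, so that a single contiguous read returns the data blocks of several stripes; this adjacency and the function name are assumptions made for illustration.

```python
def split_batched_read(buffer: bytes, stripe_count: int, m: int,
                       block_size: int) -> list[list[bytes]]:
    """Split one contiguous read into, per stripe, its m data blocks."""
    per_stripe = m * block_size
    if len(buffer) != stripe_count * per_stripe:
        raise ValueError("buffer length does not match the stripe layout")
    return [
        [buffer[s * per_stripe + i * block_size:
                s * per_stripe + (i + 1) * block_size]
         for i in range(m)]
        for s in range(stripe_count)
    ]


# One read covering the check fragments of two full stripes (m = 2, 2-byte blocks).
blocks = split_batched_read(b"aabbccdd", stripe_count=2, m=2, block_size=2)
```

Each inner list can then be passed to the erasure code calculation for its own stripe.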
In one embodiment of the present invention, calculating the check data based on the data stored in the check fragment of the erasure code stripe includes:
for each check fragment in the erasure code stripe, controlling the storage device in which the check fragment is located to perform the following operations:
reading the data stored in the check fragment;
calculating check data based on the read data;
and storing each check data segment of the check data in a different check fragment includes:
storing, in the check fragment, the check data segment that is intended to be stored in that check fragment.
In a second aspect, an embodiment of the present invention provides a data storage device, where an erasure coding stripe includes data slices and check slices, each check slice includes data blocks having the same number as the data slices, each data block in each check slice corresponds to a different data slice, and storage spaces of each data block and each data slice are the same, where the device includes:
the first determining module is used for determining target data fragments to be written in by target data;
the first writing module is used for writing the target data in the target data fragments and target data blocks in each check fragment, wherein the target data blocks correspond to the target data fragments, and the positions of the target data fragments and the target data blocks where the target data is written are the same;
the calculation module is used for calculating and obtaining check data based on the data stored in the check fragments in the erasure coding strip under the condition that no free space exists in each data fragment in the erasure coding strip;
The storage module is used for storing each check data segment in the check data into different check fragments;
and the first emptying module is used for emptying the data except the check data segments in each check fragment.
In one embodiment of the present invention, the apparatus further comprises:
the second determining module is used for determining an overlay data fragment to which the overlay data is to be written and a position to be written in an idle space in the overlay data fragment under the condition that an instruction of using the overlay data to overlay the existing data in the erasure code strip is received;
a second writing module, configured to write the overlay data in the to-be-written position in the overlay data slice, and write the overlay data in an overlay data block in each check slice, where the overlay data block corresponds to the overlay data slice, and the overlay data slice is the same as the position in the overlay data block where the overlay data is written;
and a deactivation module, configured to mark the existing data as deactivated data and update the address recorded in the metadata of the existing data to the address of the overlay data.
In one embodiment of the present invention, the apparatus further comprises:
a third determining module, configured to determine a deactivated erasure code stripe, namely a stripe in which the proportion of deactivated data among the data it contains reaches a preset proportion;
a third writing module, configured to write the data other than the deactivated data in the deactivated erasure code stripe into other erasure code stripes;
and a second emptying module, configured to clear the data in the deactivated erasure code stripe.
In one embodiment of the present invention, the computing module is specifically configured to:
reading data in check fragments in a plurality of erasure code stripes at one time under the condition that no free space exists in each data fragment of the plurality of erasure code stripes;
and respectively based on the read data corresponding to each erasure code strip, calculating to obtain the check data corresponding to the erasure code strip.
In one embodiment of the present invention, the computing module is specifically configured to:
for each check fragment in the erasure code stripe, controlling a storage device in which the check fragment is located to execute the following operations:
reading data stored in the check fragment;
calculating to obtain check data based on the read data;
the storage module is specifically configured to:
store, in the check fragment, the check data segment that is intended to be stored in that check fragment.
In a third aspect, an embodiment of the present invention provides an electronic device, including a processor, a communication interface, a memory, and a communication bus, where the processor, the communication interface, and the memory complete communication with each other through the communication bus;
a memory for storing a computer program;
a processor for implementing the method steps of any one of the first aspects when executing a program stored on a memory.
In a fourth aspect, embodiments of the present invention provide a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the method steps of any of the first aspects.
The embodiment of the invention has the beneficial effects that:
the embodiment of the invention provides a data storage method, wherein an erasure code stripe comprises data fragments and check fragments, each check fragment comprises data blocks with the same number as the data fragments, each data block in each check fragment corresponds to different data fragments, and the storage space of each data block is the same as that of each data fragment, and the method comprises the following steps: determining target data fragments to be written in by target data; writing the target data into the target data fragments and target data blocks in each check fragment, wherein the target data blocks correspond to the target data fragments, and the positions of the target data fragments and the target data blocks where the target data is written are the same; under the condition that no free space exists in each data slice in the erasure code strip, calculating to obtain verification data based on the data stored in the verification slice in the erasure code strip; storing each check data segment in the check data into different check fragments; and clearing data except the check data segments in each check fragment.
As can be seen from the above, in the scheme provided by the embodiment of the invention, data is written into the data fragments of the erasure code stripe, and the check data of the stripe is calculated only once every data fragment of the stripe is full. No zero padding is therefore needed in the data fragments, which improves the storage space utilization of the erasure code stripe. In particular, when the amount of data written is small or small files are written, the scheme avoids the large amount of zero padding that would otherwise be required, and so markedly reduces the waste of erasure code stripe storage space in such scenarios.
If a data fragment fails before the check data of the erasure code stripe has been calculated, the target data is not lost, because it has been written both into the target data fragment of the stripe and into the target data block of each check fragment. If the erasure code stripe includes n check fragments, n+1 copies of the target data are stored, so the target data can be recovered as long as no more than n of the data fragments and check fragments, in total, fail.
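The n+1 copy argument can be illustrated with a short sketch. It models the copies of one piece of target data as a mapping from a device label to the stored bytes; the labels and the function name are hypothetical.

```python
def read_before_seal(copies: dict[str, bytes], failed: set[str]) -> bytes:
    """Return the target data from any surviving copy, before check data exists."""
    for device, payload in copies.items():
        if device not in failed:
            return payload
    raise IOError("all n + 1 copies are lost")


# n = 2 check fragments, so the target data D1 exists in n + 1 = 3 places.
copies = {"data-1-1": b"D1", "check-1-1": b"D1", "check-1-2": b"D1"}

# Any n = 2 failures still leave one readable copy.
recovered = read_before_seal(copies, failed={"data-1-1", "check-1-1"})
```

Only when all n+1 locations fail together is the data unrecoverable, which matches the tolerance of the sealed stripe.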
In addition, the embodiment of the invention empties the data except the check data segment in each check fragment, so that the final erasure code calculation result is consistent with the erasure code calculation result in the related technology.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly introduced below. It is obvious that the drawings in the following description show only some embodiments of the invention, and that those skilled in the art may obtain other embodiments from these drawings.
FIG. 1 is a schematic diagram of an erasure code stripe in a related erasure code technique;
FIG. 2 is a schematic diagram of a first structure of erasure code stripes after writing target data in a related erasure code technique;
FIG. 3 is a schematic diagram of a second structure of erasure code stripes after writing target data in a related erasure code technique;
FIG. 4 is a flowchart illustrating a first data storage method according to an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of an erasure code stripe according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of a first structure of an erasure code stripe after writing target data according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of a second structure of an erasure code stripe after writing target data according to an embodiment of the present invention;
FIG. 8 is a schematic diagram of a third structure of an erasure code stripe after writing target data according to an embodiment of the present invention;
FIG. 9 is a schematic diagram of a fourth structure of an erasure code stripe after writing target data according to an embodiment of the present invention;
FIG. 10 is a schematic diagram of a fifth structure of an erasure code stripe after writing target data according to an embodiment of the present invention;
FIG. 11 is a flowchart illustrating a second data storage method according to an embodiment of the present invention;
FIG. 12 is a schematic diagram of an erasure code stripe after writing overlay data according to an embodiment of the present invention;
FIG. 13 is a flowchart illustrating a third data storage method according to an embodiment of the present invention;
FIG. 14 is a flowchart of a fourth data storage method according to an embodiment of the present invention;
FIG. 15 is a flowchart illustrating a fifth data storage method according to an embodiment of the present invention;
FIG. 16 is a flowchart illustrating a sixth data storage method according to an embodiment of the present invention;
FIG. 17 is a schematic diagram of a first data storage device according to an embodiment of the present invention;
FIG. 18 is a schematic diagram of a second data storage device according to an embodiment of the present invention;
FIG. 19 is a schematic diagram of a third data storage device according to an embodiment of the present invention;
FIG. 20 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by the person skilled in the art based on the present invention are included in the scope of protection of the present invention.
A schematic structural diagram of erasure code stripes in the related erasure coding technique is shown in FIG. 1, which shows two erasure code stripes: erasure code stripe 1 and erasure code stripe 2. As can be seen from FIG. 1, each erasure code stripe contains m data slices and n check slices, with each small rectangular box representing one slice. Illustratively, data slice 1-1 is the first data slice of erasure code stripe 1, data slice 1-2 is the second data slice of erasure code stripe 1, data slice 1-m is the m-th data slice of erasure code stripe 1, check slice 1-1 is the first check slice of erasure code stripe 1, check slice 1-n is the n-th check slice of erasure code stripe 1, and so on. In the related erasure coding technique, each slice of a stripe is located in a different linear space; the different linear spaces may be located on different disks of the same electronic device, or in the storage space of different electronic devices (such as servers). For example, as can be seen from FIG. 1, data slice 1-m and data slice 2-m are both located in linear space m, check slice 1-1 and check slice 2-1 are both located in linear space m+1, check slice 1-n and check slice 2-n are both located in linear space m+n, and so on.
In the related erasure coding technique, when the target data exactly fills all the data slices of an erasure code stripe, no zero padding is needed in the data slices. For example, FIG. 2 is a first schematic structural diagram of an erasure code stripe after target data is written in the related erasure coding technique, where gray in a small rectangular box indicates that data has been written in that area and blank indicates that no data has been written. As can be seen from FIG. 2, the length of the target data is exactly the length of data that all the data slices of erasure code stripe 1 can hold, so no zero padding is needed in the data slices of erasure code stripe 1: the check data can be calculated directly from the target data written in erasure code stripe 1, divided into n check data segments, and written into the n check slices respectively.
In the related erasure coding technique, if target data has been written into an erasure code stripe but the data slices are not full, the data slices that are not full must be zero-padded so that every data slice in the stripe is full, after which the check data of the stripe is calculated. FIG. 3 is a second schematic structural diagram of an erasure code stripe after target data is written in the related erasure coding technique. As can be seen from FIG. 3, the target data is not enough to fill all the data slices of erasure code stripe 1, so zeros are padded into the portion of each data slice of erasure code stripe 1 where no target data has been written; every data slice of erasure code stripe 1 is then full, and the check data of erasure code stripe 1 is calculated. In this case the zero padding occupies space in the erasure code stripe that could otherwise store target data, and the related erasure coding technique therefore suffers from low storage space utilization.
In order to solve the above problems, embodiments of the present invention provide a data storage method, apparatus, electronic device, and storage medium, and the following details are respectively described.
First, a data storage method provided by the embodiment of the invention is described.
Referring to fig. 4, a flowchart of a first data storage method according to an embodiment of the present invention is shown, where the method includes the following steps S401 to S405.
In the embodiment of the invention, the erasure coding stripe comprises data fragments and check fragments, each check fragment comprises data blocks with the same number as the data fragments, each data block in each check fragment corresponds to different data fragments, and the size of the storage space of each data block is the same as that of each data fragment.
Fig. 5 is a schematic structural diagram of an erasure coding stripe provided by an embodiment of the present invention, and as can be seen from fig. 5, compared with the structure of the erasure coding stripe in the related art shown in fig. 1, in the erasure coding stripe provided by the embodiment of the present invention, one erasure coding stripe includes m data slices, and each check slice of the erasure coding stripe includes m data blocks, each data block in each check slice corresponds to a different data slice, and the size of each data block is the same as the size of the storage space of each data slice.
Illustratively, check fragment 1-1 in FIG. 5 includes m data blocks (data block 1, data block 2, and so on), where data block 1 of check fragment 1-1 corresponds to data fragment 1-1, data block 2 of check fragment 1-1 corresponds to data fragment 1-2, and data block m of check fragment 1-1 corresponds to data fragment 1-m. Likewise, check fragment 1-n in FIG. 5 includes m data blocks, where data block 1 of check fragment 1-n corresponds to data fragment 1-1, data block 2 of check fragment 1-n corresponds to data fragment 1-2, data block m of check fragment 1-n corresponds to data fragment 1-m, and so on. Each linear space shown in FIG. 5 may be a disk, a server, or a file in a local file system. The addresses of a linear space increase monotonically in the logical sense, and the mapping from actual physical space to logical space is performed by the storage space management module.
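The stripe structure of FIG. 5 can be modeled with a small in-memory data structure. This is only a sketch under the assumption that fragments are plain byte buffers; the class and field names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Stripe:
    m: int              # number of data fragments
    n: int              # number of check fragments
    fragment_size: int  # bytes per data fragment (and per data block)
    data_fragments: list = field(init=False)
    check_fragments: list = field(init=False)

    def __post_init__(self) -> None:
        self.data_fragments = [bytearray(self.fragment_size) for _ in range(self.m)]
        # check_fragments[j][i] is data block i+1 of check fragment j+1;
        # it corresponds to data fragment i+1 and has the same size.
        self.check_fragments = [
            [bytearray(self.fragment_size) for _ in range(self.m)]
            for _ in range(self.n)
        ]


stripe = Stripe(m=4, n=2, fragment_size=8)
```

Before check data is calculated, each check fragment in this model therefore has room for a complete copy of everything stored in the m data fragments.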
Step S401: and determining the target data fragment to be written in by the target data.
In the embodiment of the invention, a plurality of erasure code strips are constructed in advance, before the target data is written into the erasure code strips, the erasure code strips and the linear space to be written into by the target data can be determined according to the metadata information of the target data, and after the erasure code strips and the linear space to be written into by the target data are determined, the target data fragments to be written into by the target data are determined.
For example, according to the erasure code stripe structure shown in FIG. 5, if the erasure code stripe to which the target data is to be written is determined to be erasure code stripe 1 and the linear space to which the target data is to be written is determined to be linear space 1, then the target data fragment to which the target data is to be written is data fragment 1-1. The target data may be I/O (Input/Output) data. For the specific manner of determining the target data fragment, reference may be made to the related art, and details are not repeated here.
Step S402: writing the target data in the target data fragment and in the target data block of each check fragment.
Wherein the target data block corresponds to the target data slice, and the target data slice is the same as the position in the target data block where the target data is written. Specifically, the target data is written in the free space of the target data fragment and the free space of the target data block.
For example, fig. 6 is a first structural schematic diagram of an erasure code stripe after target data has been written, according to an embodiment of the present invention. As shown in fig. 6, the target data D1 is written into data fragment 1-1; that is, the target data fragment is data fragment 1-1. Because the target data block corresponding to data fragment 1-1 in each check fragment of erasure code stripe 1 is data block 1 of that check fragment, the target data D1 is also written into data block 1 of each check fragment in erasure code stripe 1. As fig. 6 shows, D1 is written at the same position in the data fragment as in each of these data blocks.
Subsequent target data can be written into the erasure code stripe in the same manner as the target data D1. For example, fig. 7 is a second structural schematic diagram of an erasure code stripe after target data has been written, according to an embodiment of the present invention. As shown in fig. 7, the subsequently written target data D2 and D4 are written into data fragment 1-2, and the target data D3 is written into data fragment 1-m. The target data block corresponding to data fragment 1-2 in each check fragment of erasure code stripe 1 is data block 2 of that check fragment, and the target data block corresponding to data fragment 1-m is data block m of that check fragment. Therefore, D2 and D4 are also written into data block 2 of each check fragment in erasure code stripe 1, and D3 is also written into data block m of each check fragment in erasure code stripe 1. As fig. 7 shows, each of D2, D4, and D3 is written at the same position in its data fragment as in the corresponding data blocks.
In addition, in fig. 7, after the target data D2 and D4 are written into data fragment 1-2, no free space remains in data fragment 1-2. In this case, if it is determined that the target data D5 is to be written into linear space 2, D5 cannot be written into data fragment 1-2 and is instead written at the next position in sequence in linear space 2, that is, into data fragment 2-2 of erasure code stripe 2. Because the target data block corresponding to data fragment 2-2 in each check fragment of erasure code stripe 2 is data block 2 of that check fragment, D5 is also written into data block 2 of each check fragment in erasure code stripe 2, at the same position as in the data fragment.
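The mirrored write illustrated in figs. 6 and 7 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: the `Stripe` class, the `FRAGMENT_SIZE` constant, and the byte-string payloads are hypothetical names introduced here, and a full data fragment is signalled with an exception so that a caller could move on to the next stripe of the same linear space.

```python
from dataclasses import dataclass, field

FRAGMENT_SIZE = 8  # bytes per data fragment and per data block (illustrative value)


@dataclass
class Stripe:
    m: int  # number of data fragments
    n: int  # number of check fragments
    data: list = field(init=False)
    check: list = field(init=False)
    used: list = field(init=False)

    def __post_init__(self):
        self.data = [bytearray(FRAGMENT_SIZE) for _ in range(self.m)]
        # check[c][i] is data block i of check fragment c; it mirrors data fragment i.
        self.check = [[bytearray(FRAGMENT_SIZE) for _ in range(self.m)]
                      for _ in range(self.n)]
        self.used = [0] * self.m  # bytes already written per data fragment

    def write(self, frag: int, payload: bytes) -> int:
        """Append payload to data fragment `frag`, mirror it, and return the offset."""
        off = self.used[frag]
        end = off + len(payload)
        if end > FRAGMENT_SIZE:
            raise ValueError("no free space in this data fragment")
        self.data[frag][off:end] = payload
        for c in range(self.n):
            # Same offset in the corresponding data block of every check fragment.
            self.check[c][frag][off:end] = payload
        self.used[frag] = end
        return off
```

With `Stripe(m=3, n=2)`, writing two 4-byte payloads into fragment index 1 fills it, and a third write to that fragment raises `ValueError`, which corresponds to the situation in which D5 has to go to the next erasure code stripe.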
Step S403: when no free space remains in any data fragment of the erasure code stripe, calculate check data based on the data stored in the check fragments of the erasure code stripe.
Specifically, data is written continuously into the data fragments and check fragments of the erasure code stripe in the manner described in steps S401 and S402. When every data fragment in the erasure code stripe is full, that is, when no free space remains, every check fragment in the erasure code stripe is also full. For example, fig. 8 is a third structural schematic diagram of an erasure code stripe after target data has been written, according to an embodiment of the present invention. In fig. 8, every data fragment and check fragment in erasure code stripe 1 is full, and no free space remains.
When no free space remains in any data fragment of the erasure code stripe, the data stored in the check fragments of the erasure code stripe is read, and the check data is calculated based on the read data.
In one embodiment of the present invention, when no free space remains in any data fragment of the erasure code stripe, the data stored in each check fragment of the stripe is identical to the totality of the data stored in all data fragments of the stripe, so the check data can be calculated based on the data stored in any one check fragment. For example, the check data may be calculated based on the data stored in check fragment 1-1 of erasure code stripe 1 in fig. 8.
In another embodiment of the present invention, the check data may also be obtained in the manner shown in the embodiment of fig. 15 below, which is not described in detail here.
The specific manner of calculating the check data from the totality of the data stored in all data fragments of the erasure code stripe may refer to the related art and is not described here.
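The observation that any single full check fragment suffices as input can be illustrated as follows. The text leaves the encoding itself to the related art, so this sketch substitutes byte-wise XOR (a single-parity code) as a stand-in; `xor_parity` and the sample fragments are hypothetical and are not taken from the patent.

```python
from functools import reduce


def xor_parity(blocks):
    """Byte-wise XOR of equally sized blocks (stand-in for a real erasure code)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


data_fragments = [b"AAAA", b"BBBB", b"CCCC"]

# Once the stripe is full, the m data blocks of a check fragment mirror the m data fragments.
check_fragment_1 = [bytes(fragment) for fragment in data_fragments]

parity_from_check = xor_parity(check_fragment_1)   # computed from one check fragment only
parity_from_data = xor_parity(data_fragments)      # what a conventional encoder would read

# With XOR parity, one lost data fragment can be rebuilt from the parity and the others.
rebuilt_first = xor_parity([parity_from_check, data_fragments[1], data_fragments[2]])
```

Both parities are equal because the inputs are byte-for-byte identical, which is the property that lets the calculation read a check fragment instead of every data fragment.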
Step S404: store each check data segment of the check data in a different check fragment.
Specifically, each check fragment of the erasure code stripe ultimately stores only its corresponding check data segment, and the check data segments of all check fragments together form the check data of the erasure code stripe.
After the check data is calculated, the check data segment corresponding to each check fragment is determined, and each check fragment stores its corresponding check data segment.
In one embodiment of the invention, after the check data is calculated based on the data stored in a check fragment of the erasure code stripe, the check data is divided into different check data segments. Each check data segment is then sent to the storage device on which the corresponding check fragment of the erasure code stripe is located, one data block in each check fragment is emptied, and the check data segment corresponding to that check fragment is stored in the emptied data block.
After the corresponding check data segment is stored in each check fragment, the data blocks of each check fragment that do not store the check data segment are marked as data blocks to be cleared. Fig. 9 is a fourth structural schematic diagram of an erasure code stripe after target data has been written, according to an embodiment of the present invention; in fig. 9, a black data block indicates a data block marked as to be cleared.
The detailed manner of dividing the check data into check data segments and of determining which check data segment corresponds to which check fragment may refer to the related art and is not described here.
Step S405: clear the data other than the check data segment in each check fragment.
The data in all data blocks to be cleared in each check fragment is cleared, and the storage space corresponding to the cleared data blocks is added to a free space list for subsequent use. After the data other than the check data segment in each check fragment is cleared, the metadata of each check fragment is updated to indicate that the check fragment stores its corresponding check data segment.
For example, fig. 10 is a fifth structural schematic diagram of an erasure code stripe after target data has been written, according to an embodiment of the present invention. As shown in fig. 10, after the data other than the check data segment in each check fragment of erasure code stripe 1 is cleared, the storage space corresponding to the cleared data blocks is added to the free space list; that is, each check fragment of erasure code stripe 1 releases storage space equal in size to m-1 data blocks. At this point, every data fragment in erasure code stripe 1 is full, every check fragment stores only its corresponding check data segment, and the resulting erasure code stripe 1 is consistent with the erasure code stripe obtained by the related erasure code technology.
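Steps S404 and S405 amount to simple bookkeeping, sketched here in Python. The function name `finalize_check_fragment`, the choice of block index 0 as the emptied block, and the tuple layout of the free space list are assumptions made for illustration; the check data segments are placeholder byte strings rather than real encoder output.

```python
def finalize_check_fragment(frag_id, blocks, segment, free_list):
    """Store `segment` in one emptied block and release the remaining blocks.

    blocks: list of the m data blocks of one check fragment.
    Returns the number of data blocks released, which is m - 1.
    """
    blocks[0] = segment                    # S404: one block is emptied and reused
    released = 0
    for i in range(1, len(blocks)):        # S405: the blocks marked "to be cleared"
        blocks[i] = None
        free_list.append((frag_id, i))     # space becomes available for later writes
        released += 1
    return released


m, n = 4, 2
free_list = []
check_fragments = {c: [b"mirror"] * m for c in range(n)}
released = [finalize_check_fragment(c, blocks, b"seg%d" % c, free_list)
            for c, blocks in check_fragments.items()]
```

For m = 4 and n = 2 each check fragment releases 3 data blocks, so 6 blocks in total return to the free space list, in line with the m-1 figure stated above.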
As can be seen from the above, in the scheme provided by the embodiment of the invention, data is written into the data fragments of the erasure code stripe, and the check data of the stripe is calculated only after every data fragment of the stripe is full. Zero padding of the data fragments is therefore unnecessary, which improves the storage space utilization of the erasure code stripe. In particular, when the amount of data written is small or small files are written, the scheme does not need to pad the data fragments with large numbers of zeros, which significantly reduces the waste of erasure code stripe storage space in these scenarios.
If a data fragment fails before the check data of the erasure code stripe has been calculated, the target data can still be recovered, because it was written both into the target data fragment of the erasure code stripe and into the target data block of each check fragment. If the erasure code stripe includes n check fragments, n+1 copies of the target data are stored, and the target data can be recovered as long as the total number of failed data fragments and check fragments does not exceed n.
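The n+1 copy argument can be checked exhaustively for a small case. The following Python sketch assumes n = 2 check fragments and uses hypothetical holder names; it models only the state before the check data is calculated, when every holder still keeps a full copy of the target data.

```python
from itertools import combinations


def recover(copies, failed):
    """Return a surviving copy of the target data, or None if every holder failed."""
    for holder, value in copies.items():
        if holder not in failed:
            return value
    return None


n = 2  # number of check fragments in the stripe
copies = {"data-fragment": b"D1", "check-fragment-1": b"D1", "check-fragment-2": b"D1"}

# Every possible set of n failed holders still leaves one readable copy.
survives_any_n_failures = all(
    recover(copies, set(failed)) == b"D1" for failed in combinations(copies, n)
)
# Losing all n + 1 holders is the only unrecoverable case.
lost_when_all_fail = recover(copies, set(copies)) is None
```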
In addition, the embodiment of the invention empties the data except the check data segment in each check fragment, so that the final erasure code calculation result is consistent with the erasure code calculation result in the related technology.
When the related erasure code technology is used to overwrite existing data in an erasure code stripe, all data in all data fragments of the stripe must be read out, the existing data to be overwritten is cleared, the overlay data is written at the cleared position, the check data is recalculated, and finally the corresponding data is written back into the data fragments and check fragments of the stripe. This process suffers from write amplification and wastes bandwidth and computing resources. Moreover, when solid state disks are used, it increases the erase and write workload of the solid state disk, which degrades its performance and shortens its service life. To solve these problems, an embodiment of the present invention provides the embodiment shown in fig. 11 below.
Referring to fig. 11, a flowchart of a second data storage method according to an embodiment of the present invention is shown, and compared with the embodiment shown in fig. 4, the method further includes the following steps S406 to S408.
Step S406: when an instruction to overwrite existing data in the erasure code stripe with overlay data is received, determine the overlay data fragment into which the overlay data is to be written and the position to be written in the free space of that overlay data fragment.
Specifically, after the instruction to overwrite existing data in the erasure code stripe with overlay data is received, the overlay data fragment into which the overlay data is to be written is determined in the same manner in which the target data fragment is determined in the embodiment shown in fig. 4, and the writing position of the overlay data in the free space of the overlay data fragment is then determined.
Step S407: write the overlay data at the position to be written in the overlay data fragment, and write the overlay data into the overlay data block of each check fragment.
The overlay data block corresponds to the overlay data fragment, and the overlay data is written at the same position in the overlay data fragment as in the overlay data block.
Specifically, the overlay data is written into the erasure code stripe in the same manner in which the target data is written into the erasure code stripe in the embodiment shown in fig. 4.
As can be seen from the above description of steps S406 and S407, in the embodiment of the present invention the overlay data is written into the erasure code stripe as new target data. That is, the overlay data corresponds to the target data in the embodiment shown in fig. 4, the overlay data fragment corresponds to the target data fragment, and the overlay data block corresponds to the target data block.
For example, fig. 12 is a schematic diagram of an erasure code stripe after overlay data has been written, according to an embodiment of the present invention. In fig. 12, the existing data D2 is overwritten with the overlay data D2'. The overlay data fragment determined for D2' is data fragment 1-2, and D2' is written into data fragment 1-2 and into the data block corresponding to data fragment 1-2 in each check fragment.
Step S408: determine the existing data to be deactivated data, and update the address of the existing data contained in the metadata of the existing data to the address of the overlay data.
Specifically, after the overlay data is written into the erasure code stripe, the existing data that it overwrites is determined to be deactivated data. To achieve the effect of overwriting, the address of the existing data contained in the metadata of the existing data is updated to the address of the overlay data. After this update, an access directed at the existing data actually reaches the storage space in which the overlay data is located, so the overlay data is what is accessed.
As can be seen from the above, in the scheme provided by the embodiment of the invention, the overlay data is written into the erasure code stripe as new target data, that is, in the same manner in which target data is written. This avoids the series of operations required by the related erasure code technology: reading all data from all data fragments of the erasure code stripe, clearing the existing data to be overwritten, writing the overlay data at the cleared position, recalculating the check data, and finally writing the corresponding data back into the data fragments and check fragments of the stripe. The scheme thereby avoids the write amplification and the waste of bandwidth and computing resources that arise when the related erasure code technology is used to overwrite existing data and, when solid state disks are used, reduces their erase and write workload, improves their performance, and prolongs their service life. Moreover, because the overwritten existing data still remains in the erasure code stripe, the scheme also provides underlying support for lossless snapshots.
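The overwrite path of steps S406 to S408 behaves like an append followed by an address redirect. The Python sketch below models only that behaviour; the `OverlayLog` class, its append-only list standing in for the free space of a data fragment, and the key-to-address dictionary standing in for metadata are all assumptions introduced for illustration, and mirroring into check fragments is omitted.

```python
class OverlayLog:
    """Toy model: overwriting appends new data and redirects the metadata address."""

    def __init__(self):
        self.log = []            # stand-in for the free space of a data fragment
        self.address_of = {}     # metadata: logical key -> address (index into log)
        self.deactivated = set() # addresses holding deactivated data

    def write(self, key, payload):
        self.log.append(payload)             # S406/S407: write into free space
        new_addr = len(self.log) - 1
        old_addr = self.address_of.get(key)
        if old_addr is not None:
            self.deactivated.add(old_addr)   # S408: existing data becomes deactivated
        self.address_of[key] = new_addr      # S408: metadata now points at overlay data
        return new_addr

    def read(self, key):
        return self.log[self.address_of[key]]


volume = OverlayLog()
volume.write("block-2", b"D2")
volume.write("block-2", b"D2'")
```

After the second write, reading `"block-2"` returns `b"D2'"`, while `b"D2"` still sits at address 0 and is merely flagged as deactivated, which is the property that makes snapshot support possible.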
In the embodiment shown in fig. 11, after existing data in the erasure code stripe is overwritten with overlay data, the existing data is determined to be deactivated data. If deactivated data remains in the erasure code stripe for a long time, storage space is wasted. The embodiment shown in fig. 13 below is proposed to solve this problem.
Referring to fig. 13, a flowchart of a third data storage method according to an embodiment of the present invention is shown, and compared with the embodiment shown in fig. 11, the method further includes the following steps S409 to S4011.
Step S409: determine a deactivated erasure code stripe, that is, an erasure code stripe in which the proportion of deactivated data in the contained data reaches a preset proportion.
Specifically, when the ratio of the amount of deactivated data in the data fragments of an erasure code stripe to the total amount of data that all data fragments of the stripe can store reaches the preset proportion, the erasure code stripe is determined to be a deactivated erasure code stripe.
In one embodiment of the invention, deactivated erasure code stripes may be determined periodically; in another embodiment, they may be determined upon receipt of an instruction to determine deactivated erasure code stripes.
Step S4010: write the data other than the deactivated data in the deactivated erasure code stripe into other erasure code stripes.
After a deactivated erasure code stripe is determined, the data other than the deactivated data in it may be written into the free space of the data fragments of other erasure code stripes.
Specifically, this data may be read from the deactivated erasure code stripe and written into the free space of the data fragments of other erasure code stripes in the manner of the embodiment shown in fig. 4.
Step S4011: clear the data in the deactivated erasure code stripe.
After the data other than the deactivated data has been written into other erasure code stripes, the data in the deactivated erasure code stripe can be cleared. The stripe then becomes an erasure code stripe with no data written, and can subsequently be used for writing new data.
As can be seen from the above, in the scheme provided by the embodiment of the present invention, once the deactivated data in an erasure code stripe reaches the preset proportion, the stripe is determined to be a deactivated erasure code stripe. The data other than the deactivated data is written into other erasure code stripes, and the data in the deactivated stripe is cleared so that new data can be written later. The storage space occupied by the deactivated data is thereby released, and storage space utilization is improved.
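Steps S409 to S4011 resemble a threshold-driven space reclamation pass. The sketch below is a simplified Python model: the dictionary layout of a stripe, the `reclaim` function, and the 0.5 threshold are hypothetical, and the ratio is computed against the capacity of the data fragments as described for step S409.

```python
def deactivated_ratio(stripe):
    """Deactivated bytes divided by the total capacity of the stripe's data fragments."""
    dead = sum(len(payload) for payload, is_deactivated in stripe["items"] if is_deactivated)
    return dead / stripe["capacity"]


def reclaim(stripe, target, threshold=0.5):
    """Move live data to `target` and clear `stripe` if it qualifies as deactivated."""
    if deactivated_ratio(stripe) < threshold:        # S409: preset proportion not reached
        return False
    live = [(p, d) for p, d in stripe["items"] if not d]
    target["items"].extend(live)                     # S4010: rewrite the remaining data
    stripe["items"].clear()                          # S4011: stripe is empty and reusable
    return True


mostly_dead = {"capacity": 8,
               "items": [(b"AA", True), (b"BB", True), (b"CC", False), (b"DD", True)]}
mostly_live = {"capacity": 8,
               "items": [(b"EE", True), (b"FF", False), (b"GG", False)]}
spare = {"capacity": 8, "items": []}

reclaimed_first = reclaim(mostly_dead, spare)
reclaimed_second = reclaim(mostly_live, spare)
```

Here the first stripe holds 6 deactivated bytes out of a capacity of 8 (ratio 0.75) and is reclaimed, while the second (ratio 0.25) is left untouched.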
If the check data of an erasure code stripe is calculated immediately after every data fragment of that stripe is full, data must be read for each calculation. That is, calculating the check data of a given number of erasure code stripes requires as many data reads as there are stripes, and each read occupies bandwidth resources. To reduce the number of data reads caused by check data calculation, an embodiment of the present invention provides the embodiment shown in fig. 14 below.
Referring to fig. 14, a flowchart of a fourth data storage method according to an embodiment of the present invention is shown, and compared with the embodiment shown in fig. 4, the above step S403 may be implemented by the following steps S403A and S403B.
Step S403A: when no free space remains in any data fragment of a plurality of erasure code stripes, read the data in the check fragments of the plurality of erasure code stripes at one time.
In the embodiment of the invention, the check data of an erasure code stripe is not calculated immediately after every data fragment of that one stripe is full. Instead, once every data fragment of a plurality of erasure code stripes is full, the data in the check fragments of the plurality of stripes is read at one time. The specific number of erasure code stripes in the plurality may be set as needed and is not limited by the embodiments of the present invention.
Step S403B: for each erasure code stripe, calculate the check data corresponding to that stripe based on the read data corresponding to it.
After the data in the check fragments of the plurality of erasure code stripes is read at one time, the check data corresponding to each erasure code stripe can be calculated based on the data corresponding to that stripe.
As can be seen from the above, in the scheme provided by the embodiment of the present invention, when every data fragment of a plurality of erasure code stripes is full, the data in the check fragments of the plurality of stripes is read at one time and the check data corresponding to each stripe is then calculated, which effectively reduces the number of data reads caused by check data calculation.
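The effect of batching on the number of reads can be made concrete with a small Python model. `CheckFragmentStore`, `read_many`, and `compute_batch` are hypothetical names, and byte-wise XOR again stands in for the real encoder, which the text leaves to the related art.

```python
from functools import reduce


def xor_parity(blocks):
    """Stand-in encoder: byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


class CheckFragmentStore:
    """Hypothetical store that counts read requests."""

    def __init__(self, stripes):
        self.stripes = stripes      # stripe id -> mirrored data blocks of a check fragment
        self.read_requests = 0

    def read_many(self, stripe_ids):
        self.read_requests += 1     # one request, however many stripes it covers
        return {sid: self.stripes[sid] for sid in stripe_ids}


def compute_batch(store, full_stripe_ids):
    data = store.read_many(full_stripe_ids)                            # S403A
    return {sid: xor_parity(blocks) for sid, blocks in data.items()}   # S403B


store = CheckFragmentStore({
    1: [b"\x01\x02", b"\x03\x04"],
    2: [b"\x10\x20", b"\x01\x02"],
})
check_data = compute_batch(store, [1, 2])
```

Two full stripes are served by a single read request, whereas calculating each stripe as soon as it fills would have issued two.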
When the check data of an erasure code stripe is calculated, if only one check fragment of the stripe is selected and the check data is calculated based on the data in that check fragment, the resulting check data must then be divided into a plurality of check data segments, and the corresponding check data segments must be sent from the linear space in which that check fragment is located to the check fragments in other linear spaces, which consumes bandwidth resources.
Referring to fig. 15, a flowchart of a fifth data storage method according to an embodiment of the present invention is shown, and compared with the embodiment shown in fig. 4, the above step S403 may be implemented by the following steps S403C and S403D, and the above step S404 may be implemented by the following step S404A.
Step S403C: for each check fragment in the erasure code stripe, control the storage device on which the check fragment is located to read the data stored in the check fragment.
Specifically, the data stored in every check fragment of the erasure code stripe is the same, and it is identical to the totality of the data stored in the data fragments of the stripe.
Step S403D: calculate the check data based on the read data.
Because the data stored in each check fragment of the erasure code stripe is identical to the totality of the data stored in the data fragments of the stripe, the check data can be calculated, for each check fragment, based on the data stored in that check fragment alone.
Step S404A: store, in each check fragment, the check data segment of the check data that the check fragment is expected to store.
After the check data is calculated for each check fragment based on the data stored in that check fragment, the check data segment that the check fragment is expected to store is selected from the check data so obtained and stored in the check fragment. The specific manner of selecting this check data segment may refer to the related art and is not described in detail here.
As can be seen from the above, in the scheme provided by the embodiment of the present invention, for each check fragment, the check data can be calculated based on the data stored in that check fragment. The calculation is completed locally and consumes no network resources, and each data block of the check fragment participates in the calculation only once, so no computing resources are wasted. In addition, for each check fragment in the erasure code stripe, the check data segment that the check fragment is expected to store is selected from the check data obtained from that check fragment and stored in it, so check data segments need not be sent from one linear space to the check fragments of other linear spaces, which reduces bandwidth consumption.
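The local variant of steps S403C, S403D, and S404A can be sketched as follows. The `encode` function here is only a placeholder that returns n deterministic segments (weighted byte sums modulo 256); it is not a real erasure code and is not taken from the patent, and `local_finalize` is a hypothetical name. What the sketch shows is the data flow: each check fragment computes from its own mirrored blocks and keeps only its own segment, so nothing is exchanged between storage devices.

```python
def encode(blocks, n):
    """Placeholder encoder returning n check data segments (NOT a real erasure code)."""
    columns = list(zip(*blocks))
    return [
        bytes(sum((i + 1) ** j * b for i, b in enumerate(col)) % 256 for col in columns)
        for j in range(n)
    ]


def local_finalize(index, mirrored_blocks, n):
    """Run on the storage device of check fragment `index`."""
    segments = encode(mirrored_blocks, n)   # S403C/S403D: read and calculate locally
    return segments[index]                  # S404A: keep only this fragment's segment


n = 2
mirrored = [b"\x01\x02", b"\x03\x04"]       # identical in every check fragment
kept = [local_finalize(j, [bytes(b) for b in mirrored], n) for j in range(n)]
```

The segments kept by the individual check fragments together equal the output of a single central `encode` call, without any segment having been sent over the network.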
Referring to fig. 16, a flowchart of a sixth data storage method according to an embodiment of the present invention is shown, and the method includes the following steps S1601 to S1606.
Step S1601: target data is acquired.
Step S1602: an erasure code stripe is constructed.
Step S1603: determine, according to the metadata information of the target data, the linear space and the target data fragment into which the target data is to be written.
Step S1604: write the target data into the target data fragment of the erasure code stripe and into the target data block of each check fragment.
Step S1605: record the storage mode and the spatial index of the target data.
Specifically, after the target data is written into the erasure code stripe, the storage mode and the spatial index of the target data are recorded so as to update the metadata information of the target data.
Step S1606: after the target data has been written into the erasure code stripe, return the write result to the calling layer.
Specifically, the write result of the target data is returned to the calling layer so that the target data can be retrieved quickly when needed.
For the specific implementation of the embodiment shown in fig. 16, refer to the content described in the foregoing embodiments, which is not repeated here.
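The flow of steps S1603 to S1606 can be condensed into one function for illustration. This Python sketch assumes an already constructed stripe (steps S1601 and S1602) held in a plain dictionary; the rule that maps metadata to a data fragment (CRC32 of a key), the index record layout, and the shape of the returned result are hypothetical, and capacity checks are omitted for brevity.

```python
import zlib


def store(key, payload, stripe, index):
    # S1603: choose the data fragment from metadata (hypothetical rule: CRC32 of the key).
    frag = zlib.crc32(key.encode()) % stripe["m"]
    offset = len(stripe["data"][frag])
    stripe["data"][frag] += payload          # S1604: write into the target data fragment
    for check in stripe["check"]:
        check[frag] += payload               # S1604: mirror into each check fragment's block
    # S1605: record storage mode and spatial index so the metadata can be updated.
    index[key] = {"mode": "mirrored", "fragment": frag,
                  "offset": offset, "length": len(payload)}
    return {"ok": True, "key": key}          # S1606: result returned to the calling layer


m, n = 3, 2
stripe = {
    "m": m,
    "data": [bytearray() for _ in range(m)],
    "check": [[bytearray() for _ in range(m)] for _ in range(n)],
}
index = {}
result = store("object-a", b"payload", stripe, index)
```

The recorded index entry is enough to read the payload back from either the data fragment or any check fragment while the stripe has not yet been encoded.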
Corresponding to the data storage method, the embodiment of the invention also provides a data storage device.
Fig. 17 is a schematic structural diagram of a first data storage device according to an embodiment of the present invention. An erasure code stripe includes data fragments and check fragments; each check fragment includes as many data blocks as there are data fragments, each data block in a check fragment corresponds to a different data fragment, and each data block has the same storage space size as each data fragment. The device includes:
A first determining module 1701, configured to determine a target data fragment into which target data is to be written.
A first writing module 1702, configured to write the target data into the target data fragment and into the target data block of each check fragment, where the target data block corresponds to the target data fragment, and the target data is written at the same position in the target data fragment as in the target data block.
A calculation module 1703, configured to calculate check data based on the data stored in the check fragments of the erasure code stripe when no free space remains in any data fragment of the erasure code stripe.
A storage module 1704, configured to store each check data segment of the check data in a different check fragment.
A first flushing module 1705, configured to clear the data other than the check data segment in each check fragment.
As can be seen from the above, in the scheme provided by the embodiment of the invention, data is written into the data fragments of the erasure code stripe, and the check data of the stripe is calculated only after every data fragment of the stripe is full. Zero padding of the data fragments is therefore unnecessary, which improves the storage space utilization of the erasure code stripe. In particular, when the amount of data written is small or small files are written, the scheme does not need to pad the data fragments with large numbers of zeros, which significantly reduces the waste of erasure code stripe storage space in these scenarios.
If a data fragment fails before the check data of the erasure code stripe has been calculated, the target data can still be recovered, because it was written both into the target data fragment of the erasure code stripe and into the target data block of each check fragment. If the erasure code stripe includes n check fragments, n+1 copies of the target data are stored, and the target data can be recovered as long as the total number of failed data fragments and check fragments does not exceed n.
In addition, the embodiment of the invention empties the data except the check data segment in each check fragment, so that the final erasure code calculation result is consistent with the erasure code calculation result in the related technology.
Referring to fig. 18, a schematic structural diagram of a second data storage device according to an embodiment of the present invention is provided, and compared with the embodiment shown in fig. 17, the device further includes:
A second determining module 1706, configured to determine, when an instruction to overwrite existing data in the erasure code stripe with overlay data is received, the overlay data fragment into which the overlay data is to be written and the position to be written in the free space of the overlay data fragment.
A second writing module 1707, configured to write the overlay data at the position to be written in the overlay data fragment and to write the overlay data into the overlay data block of each check fragment, where the overlay data block corresponds to the overlay data fragment, and the overlay data is written at the same position in the overlay data fragment as in the overlay data block.
A disabling module 1708, configured to determine the existing data to be deactivated data and to update the address of the existing data contained in the metadata of the existing data to the address of the overlay data.
As can be seen from the above, in the scheme provided by the embodiment of the invention, the overlay data is written into the erasure code stripe as new target data, that is, in the same manner in which target data is written. This avoids the series of operations required by the related erasure code technology: reading all data from all data fragments of the erasure code stripe, clearing the existing data to be overwritten, writing the overlay data at the cleared position, recalculating the check data, and finally writing the corresponding data back into the data fragments and check fragments of the stripe. The scheme thereby avoids the write amplification and the waste of bandwidth and computing resources that arise when the related erasure code technology is used to overwrite existing data and, when solid state disks are used, reduces their erase and write workload, improves their performance, and prolongs their service life. Moreover, because the overwritten existing data still remains in the erasure code stripe, the scheme also provides underlying support for lossless snapshots.
Referring to fig. 19, a schematic structural diagram of a third data storage device according to an embodiment of the present invention is provided, and compared with the embodiment shown in fig. 18, the device further includes:
A third determining module 1709, configured to determine a deactivated erasure code stripe, in which the proportion of deactivated data in the contained data reaches a preset proportion.
A third writing module 1710, configured to write the data other than the deactivated data in the deactivated erasure code stripe into other erasure code stripes.
A second flushing module 1711, configured to clear the data in the deactivated erasure code stripe.
As can be seen from the above, in the scheme provided by the embodiment of the present invention, once the deactivated data in an erasure code stripe reaches the preset proportion, the stripe is determined to be a deactivated erasure code stripe. The data other than the deactivated data is written into other erasure code stripes, and the data in the deactivated stripe is cleared so that new data can be written later. The storage space occupied by the deactivated data is thereby released, and storage space utilization is improved.
In one embodiment of the present invention, the calculating module 1703 is specifically configured to:
reading the data in the check fragments of a plurality of erasure code stripes at one time when no free space remains in any data fragment of the plurality of erasure code stripes;
and, for each erasure code stripe, calculating the check data corresponding to that stripe based on the read data corresponding to it.
As can be seen from the above, in the scheme provided by the embodiment of the present invention, when every data fragment of a plurality of erasure code stripes is full, the data in the check fragments of the plurality of stripes is read at one time and the check data corresponding to each stripe is then calculated, which effectively reduces the number of data reads caused by check data calculation.
In another embodiment of the present invention, the calculating module 1703 is specifically configured to:
for each check fragment in the erasure code stripe, controlling the storage device on which the check fragment is located to perform the following operations:
reading the data stored in the check fragment;
and calculating the check data based on the read data.
The storage module 1704 is specifically configured to:
store, in each check fragment, the check data segment of the check data that the check fragment is expected to store.
As can be seen from the above, in the scheme provided by the embodiment of the present invention, for each check fragment, the check data can be calculated based on the data stored in that check fragment. The calculation is completed locally and consumes no network resources, and each data block of the check fragment participates in the calculation only once, so no computing resources are wasted. In addition, for each check fragment in the erasure code stripe, the check data segment that the check fragment is expected to store is selected from the check data obtained from that check fragment and stored in it, so check data segments need not be sent from one linear space to the check fragments of other linear spaces, which reduces bandwidth consumption.
Referring to Fig. 20, which is a schematic structural diagram of an electronic device according to an embodiment of the present invention, the electronic device includes a processor 2001, a communication interface 2002, a memory 2003 and a communication bus 2004, where the processor 2001, the communication interface 2002 and the memory 2003 communicate with each other through the communication bus 2004;
a memory 2003 for storing a computer program;
the processor 2001 is configured to execute the program stored in the memory 2003, thereby implementing the steps of any of the data storage methods described above.
As can be seen from the above, in the scheme provided by the embodiment of the present invention, data is written into the data fragments of the erasure code stripe, and the check data of the stripe is calculated only after every data fragment of the stripe is full. No zero padding is therefore needed in the data fragments, which improves the storage space utilization of the erasure code stripe. In particular, when the amount of data written is small or small files are written, the scheme does not need to pad the data fragments with large numbers of zeros, which substantially reduces the waste of stripe storage space in such scenarios.
If a data fragment fails before the check data of the erasure code stripe has been calculated, the target data can still be recovered, because it was written both into the target data fragment of the stripe and into the target data block of each check fragment. If the erasure code stripe includes n check fragments, n+1 copies of the target data are stored, so the target data can be restored as long as no more than n data fragments and check fragments in total fail.
In addition, the embodiment of the present invention clears all data other than the check data segment in each check fragment, so that the final erasure code calculation result is consistent with that of the related art.
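The redundancy available before the check data is calculated can be illustrated with the following non-limiting Python sketch. The class `Stripe` and the function `read_before_parity` are hypothetical names, and fragment failure is modelled simply as sets of failed fragment indices.

```python
class Stripe:
    """Hypothetical stripe with k data fragments and n check fragments.

    Each check fragment holds k data blocks, one per data fragment, and each
    block is the same size as a data fragment.
    """

    def __init__(self, k, n, fragment_size):
        self.data = [bytearray(fragment_size) for _ in range(k)]
        self.check = [
            [bytearray(fragment_size) for _ in range(k)] for _ in range(n)
        ]

    def write(self, fragment, offset, payload):
        """Write the payload to the data fragment and mirror it, at the same
        offset, into the corresponding block of every check fragment."""
        end = offset + len(payload)
        self.data[fragment][offset:end] = payload
        for check_fragment in self.check:
            check_fragment[fragment][offset:end] = payload


def read_before_parity(stripe, fragment, offset, length, failed_data, failed_check):
    """Return the payload from any surviving copy among the n+1 copies."""
    end = offset + length
    if fragment not in failed_data:
        return bytes(stripe.data[fragment][offset:end])
    for index, check_fragment in enumerate(stripe.check):
        if index not in failed_check:
            return bytes(check_fragment[fragment][offset:end])
    raise RuntimeError("all n+1 copies of the target data are lost")


stripe = Stripe(k=3, n=2, fragment_size=8)
stripe.write(fragment=1, offset=2, payload=b"abc")
# Data fragment 1 and check fragment 0 have failed; check fragment 1 survives.
print(read_before_parity(stripe, 1, 2, 3, failed_data={1}, failed_check={0}))
```

With n = 2 in this sketch, three copies of the target data exist, so the read fails only when the data fragment and both check fragments are lost.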
The communication bus of the above electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, the bus is represented by a single bold line in the figure, but this does not mean that there is only one bus or only one type of bus.
The communication interface is used for communication between the electronic device and other devices.
The memory may include Random Access Memory (RAM) or Non-Volatile Memory (NVM), such as at least one disk memory. Optionally, the memory may also be at least one storage device located remotely from the aforementioned processor.
The processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
In yet another embodiment of the present invention, a computer-readable storage medium is also provided, in which a computer program is stored; when executed by a processor, the computer program implements the steps of any of the data storage methods described above.
When the computer program stored in the computer-readable storage medium provided by the embodiment of the present invention is used for data storage, data is written into the data fragments of the erasure code stripe, and the check data of the stripe is calculated only after every data fragment of the stripe is full. No zero padding is therefore needed in the data fragments, which improves the storage space utilization of the erasure code stripe. In particular, when the amount of data written is small or small files are written, the scheme does not need to pad the data fragments with large numbers of zeros, which substantially reduces the waste of stripe storage space in such scenarios.
If a data fragment fails before the check data of the erasure code stripe has been calculated, the target data can still be recovered, because it was written both into the target data fragment of the stripe and into the target data block of each check fragment. If the erasure code stripe includes n check fragments, n+1 copies of the target data are stored, so the target data can be restored as long as no more than n data fragments and check fragments in total fail.
In addition, the embodiment of the present invention clears all data other than the check data segment in each check fragment, so that the final erasure code calculation result is consistent with that of the related art.
In yet another embodiment of the present invention, a computer program product containing instructions is also provided; when run on a computer, the instructions cause the computer to perform any of the data storage methods of the above embodiments.
When the computer program product provided by the embodiment of the present invention is used for data storage, data is written into the data fragments of the erasure code stripe, and the check data of the stripe is calculated only after every data fragment of the stripe is full. No zero padding is therefore needed in the data fragments, which improves the storage space utilization of the erasure code stripe. In particular, when the amount of data written is small or small files are written, the scheme does not need to pad the data fragments with large numbers of zeros, which substantially reduces the waste of stripe storage space in such scenarios.
If a data fragment fails before the check data of the erasure code stripe has been calculated, the target data can still be recovered, because it was written both into the target data fragment of the stripe and into the target data block of each check fragment. If the erasure code stripe includes n check fragments, n+1 copies of the target data are stored, so the target data can be restored as long as no more than n data fragments and check fragments in total fail.
In addition, the embodiment of the present invention clears all data other than the check data segment in each check fragment, so that the final erasure code calculation result is consistent with that of the related art.
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the flows or functions according to embodiments of the present invention are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another, for example by wired means (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wireless means (e.g., infrared, radio, microwave). The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), etc.
It should be noted that relational terms such as first and second are used herein solely to distinguish one entity or action from another, without necessarily requiring or implying any actual relationship or order between them. Moreover, the terms "comprises", "comprising" or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed, or elements inherent to such process, method, article or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article or apparatus that comprises the element.
The embodiments in this specification are described in a related manner; for identical or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, the apparatus, electronic device, computer-readable storage medium and computer program product embodiments are described relatively briefly because they are substantially similar to the method embodiments; for relevant details, see the corresponding parts of the description of the method embodiments.
The foregoing describes only preferred embodiments of the present invention and is not intended to limit the scope of protection of the present invention. Any modification, equivalent replacement or improvement made within the spirit and principle of the present invention shall fall within the scope of protection of the present invention.

Claims (12)

1. A data storage method, characterized in that an erasure code stripe comprises data fragments and check fragments, each check fragment comprises the same number of data blocks as there are data fragments, each data block in a check fragment corresponds to a different data fragment, and the storage space of each data block is the same size as that of each data fragment, the method comprising:
determining a target data fragment into which target data is to be written;
writing the target data into the target data fragment and into a target data block in each check fragment, wherein the target data block corresponds to the target data fragment, and the target data is written at the same position in the target data fragment and in the target data block;
when no data fragment in the erasure code stripe has free space, calculating check data based on the data stored in the check fragments of the erasure code stripe;
storing each check data segment of the check data into a different check fragment;
and clearing the data other than the check data segment in each check fragment.
2. The method according to claim 1, wherein the method further comprises:
when an instruction to overwrite existing data in the erasure code stripe with overlay data is received, determining an overlay data fragment into which the overlay data is to be written and a to-be-written position within the free space of the overlay data fragment;
writing the overlay data at the to-be-written position in the overlay data fragment, and writing the overlay data into an overlay data block in each check fragment, wherein the overlay data block corresponds to the overlay data fragment, and the overlay data is written at the same position in the overlay data fragment and in the overlay data block;
and determining the existing data as deactivated data, and updating the address of the existing data contained in the metadata of the existing data to the address of the overlay data.
3. The method of claim 2, further comprising, after said clearing the data other than the check data segment in each check fragment:
determining a deactivated erasure code stripe, in which the proportion of deactivated data among the data it contains reaches a preset proportion;
writing the data other than the deactivated data in the deactivated erasure code stripe into other erasure code stripes;
and clearing the data in the deactivated erasure code stripe.
4. The method according to any one of claims 1-3, wherein said calculating check data based on the data stored in the check fragments of the erasure code stripe when no data fragment in the erasure code stripe has free space comprises:
when no data fragment of a plurality of erasure code stripes has free space, reading the data in the check fragments of the plurality of erasure code stripes at one time;
and calculating, for each erasure code stripe, the check data corresponding to that stripe based on the data read for it.
5. The method according to any one of claims 1-3, wherein said calculating check data based on the data stored in the check fragments of the erasure code stripe comprises:
for each check fragment in the erasure code stripe, controlling the storage device in which the check fragment is located to perform the following operations:
reading the data stored in the check fragment;
calculating check data based on the read data;
and said storing each check data segment of the check data into a different check fragment comprises:
storing, in the check fragment, the check data segment of the check data that the check fragment is expected to store.
6. A data storage apparatus, characterized in that an erasure code stripe comprises data fragments and check fragments, each check fragment comprises the same number of data blocks as there are data fragments, each data block in a check fragment corresponds to a different data fragment, and the storage space of each data block is the same size as that of each data fragment, the apparatus comprising:
a first determining module, configured to determine a target data fragment into which target data is to be written;
a first writing module, configured to write the target data into the target data fragment and into a target data block in each check fragment, wherein the target data block corresponds to the target data fragment, and the target data is written at the same position in the target data fragment and in the target data block;
a calculation module, configured to calculate check data based on the data stored in the check fragments of the erasure code stripe when no data fragment in the erasure code stripe has free space;
a storage module, configured to store each check data segment of the check data into a different check fragment;
and a first clearing module, configured to clear the data other than the check data segment in each check fragment.
7. The apparatus of claim 6, wherein the apparatus further comprises:
a second determining module, configured to determine, when an instruction to overwrite existing data in the erasure code stripe with overlay data is received, an overlay data fragment into which the overlay data is to be written and a to-be-written position within the free space of the overlay data fragment;
a second writing module, configured to write the overlay data at the to-be-written position in the overlay data fragment, and to write the overlay data into an overlay data block in each check fragment, wherein the overlay data block corresponds to the overlay data fragment, and the overlay data is written at the same position in the overlay data fragment and in the overlay data block;
and a deactivation module, configured to determine the existing data as deactivated data and to update the address of the existing data contained in the metadata of the existing data to the address of the overlay data.
8. The apparatus of claim 7, wherein the apparatus further comprises:
a third determining module, configured to determine a deactivated erasure code stripe, in which the proportion of deactivated data among the data it contains reaches a preset proportion;
a third writing module, configured to write the data other than the deactivated data in the deactivated erasure code stripe into other erasure code stripes;
and a second clearing module, configured to clear the data in the deactivated erasure code stripe.
9. The apparatus according to any one of claims 6-8, wherein the calculation module is specifically configured to:
when no data fragment of a plurality of erasure code stripes has free space, read the data in the check fragments of the plurality of erasure code stripes at one time;
and calculate, for each erasure code stripe, the check data corresponding to that stripe based on the data read for it.
10. The apparatus according to any one of claims 6-8, wherein the calculation module is specifically configured to:
for each check fragment in the erasure code stripe, control the storage device in which the check fragment is located to perform the following operations:
reading the data stored in the check fragment;
calculating check data based on the read data;
the storage module is specifically configured to:
store, in the check fragment, the check data segment of the check data that the check fragment is expected to store.
11. An electronic device, characterized by comprising a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with each other through the communication bus;
the memory is configured to store a computer program;
the processor is configured to implement the method steps of any one of claims 1-5 when executing the program stored in the memory.
12. A computer-readable storage medium, characterized in that the computer-readable storage medium has stored therein a computer program which, when executed by a processor, implements the method steps of any of claims 1-5.
CN202310730545.7A 2023-06-19 2023-06-19 Data storage method and device, electronic equipment and storage medium Active CN116501262B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310730545.7A CN116501262B (en) 2023-06-19 2023-06-19 Data storage method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN116501262A true CN116501262A (en) 2023-07-28
CN116501262B CN116501262B (en) 2023-09-19

Family

ID=87328679

Country Status (1)

Country Link
CN (1) CN116501262B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant