CN113821179A - Data storage method and device, computing equipment and storage medium - Google Patents

Info

Publication number
CN113821179A
CN113821179A
Authority
CN
China
Prior art keywords
slices
data
stored
data block
physical page
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111392168.8A
Other languages
Chinese (zh)
Other versions
CN113821179B (en)
Inventor
Li Shu (李舒)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba China Co Ltd
Alibaba Cloud Computing Ltd
Original Assignee
Alibaba China Co Ltd
Alibaba Cloud Computing Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba China Co Ltd, Alibaba Cloud Computing Ltd filed Critical Alibaba China Co Ltd
Priority to CN202111392168.8A
Publication of CN113821179A
Application granted
Publication of CN113821179B
Legal status: Active
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0628Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0638Organizing or formatting or addressing of data
    • G06F3/064Management of blocks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/08Error detection or correction by redundancy in data representation, e.g. by using checking codes
    • G06F11/10Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's
    • G06F11/1008Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's in individual solid state devices
    • G06F11/1044Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's in individual solid state devices with specific ECC/EDC distribution
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0602Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0602Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0608Saving storage space on storage systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Databases & Information Systems (AREA)
  • Quality & Reliability (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A data storage method and apparatus, a computing device, and a storage medium for a distributed storage system are disclosed. A data block to be stored is divided to obtain a plurality of slices. In an embodiment, the data block is compressed, and the compressed data block serves as the data block to be stored. Padding information generated by the distributed storage system is added to each slice. The plurality of slices and their corresponding padding information are stored, with an integer number of slices and their corresponding padding information stored in a single physical page. In an embodiment, the slices may be sized so that an integer number of slices and their corresponding padding information fit in a single physical page. Further, in the process of dividing the data block to be stored into a plurality of slices, erasure coding may be performed to obtain a plurality of data slices and a plurality of erasure check code slices. In this way, the padding information and the corresponding user data can conveniently be stored on the same physical page, improving the utilization efficiency of the physical page.

Description

Data storage method and device, computing equipment and storage medium
Technical Field
The present disclosure relates to distributed storage systems, and more particularly, to a data storage method and apparatus for a distributed storage system.
Background
With the development of internet technology, distributed storage systems are being applied more and more widely. A distributed storage system is a cluster of servers connected by a network that provides sufficient data reliability while meeting scale-out requirements.
To ensure data integrity, the distributed storage system additionally inserts padding verification information (or "padding information") for each logical mapping unit (or "logical block," e.g., 4096 bytes). This padding information typically includes a check signature, metadata information, and the like. The insertion facilitates an immediate check of the quality of the data held by a large-scale distributed storage cluster and provides an efficient tool for identifying potential problems by breaking up long data paths.
On the other hand, NAND flash memory has evolved into high-density 3D NAND, and the underlying raw error rate of NAND flash memory keeps increasing. To maintain the reliability and long lifetime required of SSDs, strong ECC (error correction coding) has been applied in SSD controller designs, which requires considerable parity information. However, 3D NAND flash pages can only provide limited space to accommodate this check information.
Storage systems generally require IO alignment, i.e., aligning physical hard disk partitions with host logical partitions, to ensure read/write efficiency of the hard disks.
However, when padding information is additionally inserted into the logical mapping unit as described above, it is difficult for the physical NAND flash memory to fit all of the data (both user data and check data) into the same physical page. Misalignment thus arises between the physical granularity and the logical granularity, resulting in sub-optimal utilization of the NAND flash memory, and SSD performance and lifetime are inevitably affected as well.
The existing distributed storage system solution stores the extended (padding-inserted) user data together with the corresponding check information in the same physical page, in order to ensure good performance, in particular read latency and throughput. The read latency of physical NAND flash is not negligible: if the check information is not in the same physical page as the user data, the NAND flash memory must be read twice to retrieve error-free user data, which significantly affects performance and quality of service (QoS).
The existing solution is briefly described below with reference to fig. 1 and 2.
Fig. 1 schematically shows a prior art arrangement of data on a physical page.
As shown in fig. 1, for example, the total size of the 4 logical blocks LBA1, LBA2, LBA3, LBA4 may be the same as the size of the data area of the physical page. In this way, the data area and the 4 logical blocks of the physical page are aligned without the need for additional padding information to be inserted.
In addition, the ECC check information P1, P2, P3, and P4 corresponding to the 4 logical blocks LBA1, LBA2, LBA3, and LBA4 may be stored in an attachment area (for example, may also be referred to as an "ECC check area") of the same physical page. In this way, the 4 logical blocks and their respective corresponding ECC check information can all be stored in the same physical page. For example, the size of the logical block may be 4 kbytes, and the size of the data area of the physical page may be 16 kbytes.
However, in a distributed storage system scenario, once the padding information T1, T2, and T3 generated by the distributed storage system is inserted for the logical blocks LBA1, LBA2, and LBA3, respectively, the data area of the physical page can no longer hold the logical block LBA4 and its corresponding padding information.
Meanwhile, after the physical page stores fewer logical blocks than before, for example the 3 blocks shown in fig. 1 (the logical blocks LBA1, LBA2, LBA3 and their corresponding padding information T1, T2, and T3), the data area of the physical page is often left with a sizeable unused storage space W1. This storage space W1 is wasted.
Similarly, the ECC check information P1', P2', and P3', obtained by performing ECC error correction coding on LBA1, LBA2, LBA3 together with their corresponding padding information T1, T2, and T3, may each be larger than the previous P1, P2, and P3, yet still not fill the storage space originally occupied by P1, P2, P3, and P4. Thus, part of the storage space W2 in the attachment area is also wasted.
The waste of such fragmented spaces, for example W1 and W2, leads to underutilization of the physical pages and also results in uneven wear across NAND physical pages.
Fig. 2 schematically shows a prior art storage space watermark management scheme.
As shown in fig. 2, in online deployment practice, drive capacity is managed through a watermark management scheme. In such a scheme, a certain amount of storage space, i.e., OP (over-provisioning), is reserved before the drive becomes read-only due to being full. With the further help of an SSD's OP, the nominal capacity of the drive can still be guaranteed in modern drives by performing more frequent trim operations.
However, the GC (garbage collection) frequently triggered by trimming increases write amplification, which reduces the performance and lifetime of the SSD.
Also, in this case, the drive capacity actually used will exceed the preset usable capacity, as shown by the hatched portion on the right side of fig. 2. The reserved space OP is accordingly effectively shrunk, resulting in performance degradation.
Further, mismatches between logical units (logical blocks) and physical units (physical pages) complicate the mapping because of the misaligned boundaries.
Accordingly, there remains a need for an improved data storage scheme that conveniently accommodates padding information from a distributed storage system and improves efficiency.
Disclosure of Invention
One technical problem to be solved by the present disclosure is to provide a data processing method and apparatus that can conveniently store padding information from a distributed storage system to improve efficiency.
According to a first aspect of the present disclosure, there is provided a data storage method for a distributed storage system, the method comprising: dividing a data block to be stored to obtain a plurality of slices; adding, to each slice, padding information generated for that slice by the distributed storage system; and storing the plurality of slices and their corresponding padding information, wherein an integer number of slices and their corresponding padding information are stored in a single physical page.
Optionally, the size setting of the slices is adjustable; and/or the slices are sized to store an integer number of slices and their corresponding padding information in a single physical page; and/or the size of the slice is set such that after storing an integer number of slices and their corresponding padding information in a single physical page, the size of the remaining space on the physical page is less than a predetermined threshold.
Optionally, the method may further include: compressing the data block to obtain a compressed data block as the data block to be stored.
Optionally, the method may further include: receiving a data packet and removing a header and a trailer of the received data packet to obtain data to be stored; and/or accumulating the data to be stored to form a data block of a set size.
Optionally, the step of compressing the data block to obtain a compressed data block includes: compressing the data block into a compressed data chunk; and accumulating at least one compressed data chunk to obtain the compressed data block.
Optionally, the step of segmenting the compressed data block into a plurality of slices includes: dividing the compressed data block into n data slices; and performing erasure coding on the n data slices obtained by the division to obtain k erasure check code slices, wherein the plurality of slices includes the n data slices and the k erasure check code slices, and n and k are both positive integers.
Optionally, the step of segmenting the data block to be stored into a plurality of slices may further include: when the size of the data block to be stored is smaller than the total size of the n slices required for erasure coding, padding the data block to be stored with predetermined data so that its size reaches the total size of the n slices, for erasure coding to be performed; and deleting the padded predetermined data after erasure coding.
Optionally, the method may further include: performing error correction coding on the slice and its corresponding padding information to obtain an error correction coding check code, and storing the error correction coding check code together with the slice and its corresponding padding information in the same physical page.
According to a second aspect of the present disclosure, there is provided a data storage apparatus for a distributed storage system, the apparatus comprising: a slicing device for dividing a data block to be stored to obtain a plurality of slices; a filling device for adding, to each slice, padding information generated by the distributed storage system; and a landing device for storing the plurality of slices and their corresponding padding information, wherein an integer number of slices and their corresponding padding information are stored in a single physical page.
Optionally, the apparatus may further include: a compression device for compressing the data block to obtain a compressed data block as the data block to be stored.
Optionally, the slicing device may include: a data dividing device for dividing the data block to be stored into n data slices; and an erasure coding device for performing erasure coding based on the n data slices obtained by the division to obtain k erasure check code slices, wherein the plurality of slices includes the n data slices and the k erasure check code slices, and n and k are both positive integers.
According to a third aspect of the present disclosure, there is provided a computing device comprising: a processor; and a memory having executable code stored thereon, which when executed by the processor, causes the processor to perform the method as described in the first aspect above.
According to a fourth aspect of the present disclosure, there is provided a non-transitory machine-readable storage medium having stored thereon executable code which, when executed by a processor of an electronic device, causes the processor to perform the method as described in the first aspect above.
Therefore, according to the data storage scheme of the present disclosure, the padding information generated by the distributed storage system and the corresponding user data can conveniently be stored on the same physical page, improving the utilization efficiency of the physical page.
Drawings
The above and other objects, features and advantages of the present disclosure will become more apparent by describing in greater detail exemplary embodiments thereof with reference to the attached drawings, in which like reference numerals generally represent like parts throughout.
Fig. 1 schematically shows a prior art arrangement of data on a physical page.
Fig. 2 schematically shows a prior art storage space watermark management scheme.
Fig. 3 is a schematic flow chart diagram of a data storage method for a distributed storage system according to the present disclosure.
FIG. 4 is a schematic block diagram of a data storage device that may be used for a distributed storage system according to the present disclosure.
Fig. 5 is a schematic flow diagram of a process for segmenting a compressed data block into multiple slices.
Fig. 6 is a schematic block diagram of a slicing apparatus for obtaining a plurality of slices.
Fig. 7 is a schematic diagram of a support scheme for variable length compressed data blocks.
Fig. 8 is a data processing schematic of a data storage method according to the present disclosure.
Fig. 9 is a schematic diagram of an example of processing data using the data storage method of the present disclosure.
FIG. 10 is a schematic block diagram of a storage server that may be used to implement a data storage method according to the present disclosure.
Fig. 11 is a schematic structural diagram of a computing device that can be used to implement the data storage method according to an embodiment of the present invention.
Detailed Description
Preferred embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While the preferred embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
Fig. 3 is a schematic flow chart diagram of a data storage method for a distributed storage system according to the present disclosure.
FIG. 4 is a schematic block diagram of a data storage device that may be used for a distributed storage system according to the present disclosure.
As shown in fig. 4, the data storage device 300 may include a slicing device 320, a filling device 330, and a landing device 340. In addition, in the case of using the compressed data block as the data block to be stored, the data storage device 300 may further include a compression device 310. It should be understood that the data storage device 300 may not include the compression device 310 in the event that compression is not required.
As shown in fig. 3, first, in step S210, a data block to be stored is prepared.
The data block to be stored may be organized in advance, or may be received via a network, or may be obtained by data processing, for example.
In some embodiments, the data to be stored may also be compressed to reduce the amount of data stored. In this case, for example, the compression device 310 may compress the data block to be stored to obtain a compressed data block, and the compressed data block is then used as the data block to be stored.
The data block to be stored may have various origins. For example, the data to be stored may be obtained by receiving a data packet over the network (e.g., from a user) and removing the header and trailer of the received packet. Further, the data to be stored may be accumulated until the amount of accumulated data reaches a set size (the compression window), thereby forming a data block of that size.
In the process of compressing to obtain a compressed data block, one data block may be compressed into one compressed data block, or a plurality of data blocks may be compressed into one compressed data block. Alternatively, each data block may be compressed into a compressed data chunk, and at least one compressed data chunk is then accumulated into a compressed data block, which is used as the data block to be stored.
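As an illustration of the buffering and compression flow described above, the following sketch accumulates stripped packet payloads into fixed-size compression windows and packs the resulting compressed chunks into a compressed data block. The window size, the use of zlib, and all names are assumptions for the example, not part of the disclosure.

```python
import zlib

COMPRESSION_WINDOW = 16 * 1024  # assumed fixed-length input block size (bytes)

def strip_packet(packet: bytes, header_len: int, trailer_len: int) -> bytes:
    """Remove the header and trailer of a received packet, keeping only user data."""
    return packet[header_len:len(packet) - trailer_len]

class CompressionBuffer:
    """Accumulates data to be stored until one compression window is full,
    then compresses that window into a compressed data chunk."""

    def __init__(self) -> None:
        self._buffer = bytearray()

    def append(self, data: bytes) -> list[bytes]:
        """Add user data; return any compressed data chunks produced."""
        self._buffer.extend(data)
        chunks = []
        while len(self._buffer) >= COMPRESSION_WINDOW:
            window = bytes(self._buffer[:COMPRESSION_WINDOW])
            del self._buffer[:COMPRESSION_WINDOW]
            chunks.append(zlib.compress(window))  # one compressed data chunk
        return chunks

def pack_compressed_block(chunks: list[bytes]) -> bytes:
    """Accumulate one or more compressed data chunks into a compressed data
    block (the EC group) that becomes the data block to be stored."""
    return b"".join(chunks)
```

For instance, feeding successive stripped payloads into CompressionBuffer.append() yields zero or more chunks per call, which can then be handed to pack_compressed_block() once enough chunks have accumulated for one EC group.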
Then, in step S220, the data block to be stored (e.g., compressed data block) may be divided, for example, by the slicing device 320, so as to obtain a plurality of slices.
The slices may be sized to accommodate storing an integer number of slices and their corresponding padding information in a single physical page. For example, the size of the slice may be set such that after an integer number of slices and their corresponding padding information are stored in a single physical page, the size of the remaining space on the physical page is less than a predetermined threshold. The predetermined threshold may be a set number of bytes or bits. Alternatively, the predetermined threshold may be a set ratio of the slice size, such as 1/3 or 1/4 of the slice size.
The size setting of the slices is adjustable. In this way, the slice size can be controlled according to the actual application scenario so that it better matches the physical page size.
In step S230, the padding information generated for each slice by the distributed storage system may be added to that slice, for example, by the filling device 330.
As described above, the padding information generated by the distributed storage system for the data typically includes a check signature, metadata information, and the like. The content of the padding information itself and the generating manner of the padding information are well known in the art and will not be described herein.
In step S240, a plurality of slices and their corresponding padding information may be stored in a physical page of the storage device, for example, by the landing device 340.
Here, an integer number of slices and their corresponding padding information may be stored in a single physical page.
For a slice of a set size, the padding information may, for example, have a corresponding fixed size. In this way, once the slice size is set, the total size of a slice and its corresponding padding information is a fixed value.
If the total size of an integer number M of slices and their corresponding padding information is not larger than, but close to, the size of the data area of one physical page, then when these M slices and their corresponding padding information are stored in a physical page, the storage space utilization of that page will be high and little space will be wasted. Therefore, by setting a proper slice size, the utilization of the physical page's storage space can be improved and waste reduced.
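A minimal sketch of how such a slice size might be chosen, assuming an illustrative 16 KB page data area and a fixed per-slice padding size (both numbers are assumptions, not values fixed by the disclosure):

```python
def choose_slice_size(page_data_area: int,
                      padding_per_slice: int,
                      min_slice: int,
                      max_slice: int) -> int:
    """Return a slice size in [min_slice, max_slice] such that an integer
    number of (slice + padding) units fills the page data area with the
    least leftover space."""
    best_size, best_waste = min_slice, page_data_area
    for size in range(min_slice, max_slice + 1):
        unit = size + padding_per_slice
        slices_per_page = page_data_area // unit
        if slices_per_page == 0:
            continue
        waste = page_data_area - slices_per_page * unit
        if waste < best_waste:
            best_size, best_waste = size, waste
    return best_size

# With an assumed 16 KB data area and 64-byte padding per slice, a 4032-byte
# slice gives 4 x (4032 + 64) = 16384 bytes, filling the data area exactly.
print(choose_slice_size(16 * 1024, 64, 3900, 4096))  # -> 4032
```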
Besides the slice size, the size of the compression window (i.e., the set size of the accumulated data block) can also be adjusted. This handles the extreme case in which compression alone cannot free enough space for the padding information; in that case, the compression window may be reduced accordingly to make room for the padding information in advance. This can be achieved by data merging in a data buffer before compression.
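A sketch of that fallback, again with zlib, the window-halving strategy, and the size parameters as illustrative assumptions:

```python
import zlib

def compress_with_fallback(buffer: bytes,
                           window: int,
                           padding_size: int,
                           target: int,
                           min_window: int) -> tuple[bytes, int]:
    """Compress one window of buffered data; if the compressed output plus
    its padding information still does not fit the target size, shrink the
    compression window and retry, making room for the padding in advance."""
    while window >= min_window:
        compressed = zlib.compress(buffer[:window])
        if len(compressed) + padding_size <= target:
            return compressed, window   # enough room left for the padding
        window //= 2                    # reduce the compression window
    raise ValueError("cannot make room for the padding information")
```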
Further, in the process of dividing the data block to be stored into a plurality of slices, erasure coding (EC) may also be performed. The plurality of slices obtained from the data block to be stored then include data slices obtained by dividing the data block and erasure check code slices obtained by erasure coding.
Fig. 5 is a schematic flow diagram of a process for segmenting a compressed data block into multiple slices.
Fig. 6 is a schematic block diagram of a slicing apparatus for obtaining a plurality of slices.
As shown in fig. 6, the slicing device 320 may include a data dividing device 321 and an erasure coding device 322.
As shown in fig. 5, in step S221, the data block to be stored may be divided into n data slices, for example by the data dividing device 321.
Then, in step S222, the erasure coding device 322 may, for example, perform erasure coding on the basis of the n divided data slices to obtain k erasure check code slices, where n and k are both positive integers.
In this way, the plurality of slices obtained based on the data block to be stored may include the above-mentioned n data slices and k erasure check code slices. The erasure check code slices and the data slices may have the same size and are treated identically during subsequent landing on storage.
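The split-and-encode step of steps S221 and S222 can be sketched as follows; a single byte-wise XOR parity slice stands in for the real erasure code (a practical system would use, e.g., Reed-Solomon), and n and the slice size are assumed inputs:

```python
def split_into_slices(block: bytes, n: int, slice_size: int) -> list[bytes]:
    """Divide the data block to be stored into n data slices of equal size;
    the block is assumed to be exactly n * slice_size bytes long."""
    assert len(block) == n * slice_size
    return [block[i * slice_size:(i + 1) * slice_size] for i in range(n)]

def xor_parity(slices: list[bytes]) -> bytes:
    """Compute one check slice as the byte-wise XOR of the data slices
    (a stand-in for a real erasure code such as Reed-Solomon)."""
    parity = bytearray(len(slices[0]))
    for s in slices:
        for i, b in enumerate(s):
            parity[i] ^= b
    return bytes(parity)

def erasure_encode(block: bytes, n: int, slice_size: int) -> list[bytes]:
    """Return n data slices plus one check slice of the same size; all of
    them are treated identically when landed on physical pages."""
    data_slices = split_into_slices(block, n, slice_size)
    return data_slices + [xor_parity(data_slices)]
```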
In addition, while the size of the data block obtained by accumulation can be set when accumulating the data to be stored, the sizes of the compressed data blocks or compressed data chunks obtained when a data block is compressed usually differ. Compressed data blocks used as the data blocks to be stored may therefore have irregular lengths, so erasure coding (EC) needs to support a variable compressed data block size.
Fig. 7 is a schematic diagram of a support scheme for variable length compressed data blocks.
The upper part of fig. 7 schematically shows the size of the complete data required or set for EC encoding and the EC check code size of the complete data.
The middle of fig. 7 shows the sizes of compressed data chunks A, B, and C obtained by compressing data blocks (the data is not presented in slice form in fig. 7). The sizes of chunks A, B, and C differ from one another, and their total size is not fixed and does not reach the full data length set for EC coding.
In this case, when the size of the compressed data block is smaller than the total size of the n slices required for erasure coding, the compressed data block is padded with predetermined data, for example all zeros, until its size equals the total size of the n slices, so that erasure coding can be performed to obtain the EC check code.
Then, as shown in the lower part of fig. 7, when the EC codeword is to be stored on the drive after erasure coding, the padded predetermined data is deleted.
In this way, variable-size compressed data blocks can be supported by erasure coding. Meanwhile, when the data is finally landed on disk, the total size of the complete compressed data and the EC check code shrinks back to the length of the actually valid data, reducing the storage space occupied.
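A sketch of this variable-length handling, with the zero padding, the XOR stand-in parity, and the size parameters all assumed for illustration:

```python
def encode_variable_length(compressed: bytes, n: int, slice_size: int):
    """Zero-pad a short compressed block to the full EC input size
    (n * slice_size), compute check data over the padded block, and keep
    only the valid compressed bytes plus the check slice for landing."""
    full_size = n * slice_size
    assert len(compressed) <= full_size
    padded = compressed + b"\x00" * (full_size - len(compressed))

    # Byte-wise XOR parity over the n equal-size slices stands in for the
    # real erasure code used by the system.
    parity = bytearray(slice_size)
    for i in range(n):
        chunk = padded[i * slice_size:(i + 1) * slice_size]
        for j, b in enumerate(chunk):
            parity[j] ^= b

    # The zero padding is discarded before writing to the drive: only the
    # valid compressed bytes and the check slice occupy storage space.
    return compressed, bytes(parity)
```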
In addition, Error Correction Coding (ECC) may be performed on the slice and the corresponding padding information to obtain error correction coding check information, and the error correction coding check information and the slice and the corresponding padding information are stored in the same physical page.
As described above, the physical page may include a data area and an attachment area. The data area may be, for example, 16 KB and may be used to store the data awaiting storage (the data slices of a compressed data block) and the corresponding EC check code slices. The attachment area may be used, for example, to store ECC check information. In this way, the user data and its corresponding ECC check information can be stored on the same physical page.
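The page-filling behaviour described here might be pictured with the following sketch; the 16 KB data area, the use of CRC32 in place of the controller's ECC, and the simple first-fit packing are assumptions for illustration only:

```python
import zlib
from dataclasses import dataclass, field

PAGE_DATA_AREA = 16 * 1024   # assumed size of one physical page's data area

@dataclass
class PhysicalPage:
    data_area: bytearray = field(default_factory=bytearray)
    attachment_area: list[int] = field(default_factory=list)  # check words

    def try_add(self, data_slice: bytes, padding: bytes) -> bool:
        """Place one slice plus its padding information in the data area and
        its check word in the attachment area, if the whole unit still fits."""
        unit = data_slice + padding
        if len(self.data_area) + len(unit) > PAGE_DATA_AREA:
            return False
        self.data_area.extend(unit)
        # CRC32 stands in for the controller's ECC over slice + padding.
        self.attachment_area.append(zlib.crc32(unit))
        return True

def land(slices_with_padding: list[tuple[bytes, bytes]]) -> list[PhysicalPage]:
    """Store slices and their padding so that an integer number of units,
    together with their check information, share a single physical page."""
    pages: list[PhysicalPage] = []
    page = PhysicalPage()
    for data_slice, padding in slices_with_padding:
        if not page.try_add(data_slice, padding):
            pages.append(page)      # current page is full; start a new one
            page = PhysicalPage()
            page.try_add(data_slice, padding)  # assumes one unit always fits
    pages.append(page)
    return pages
```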
Fig. 8 is a data processing diagram of a data storage method according to an embodiment of the present disclosure, showing the data processing steps of converting an Ethernet packet into a storage IO unit on, for example, a hard disk drive.
First, the header H and trailer T of the received Ethernet packet are removed, and the user data in the packet is placed into the data buffer.
User data is accumulated in the data buffer to form a fixed-length data block of a set size. The accumulated fixed-length data block serves as the input block for the subsequent compression operation, and its size may be bounded by the compression window.
The data accumulated in a single compression window is compressed to obtain a compressed data chunk. The size of the compressed data chunk generally depends on the compression ratio.
An integer number of compressed data chunks are put together to form a compressed data block, which may also be referred to as an "erasure coding (EC) group," as input to the subsequent EC coding process.
In generating the EC check code, the compressed data block is divided into a plurality of data slices, and a plurality of EC check code slices are generated.
The slices are assigned flexible lengths to match the physical geometry of the storage medium, taking the padding information insertion into account. In other words, the length of a single slice is configured to eliminate IO alignment issues and thereby use the drive efficiently.
The present invention proposes variable slice lengths that can be selected within a range rather than being a predetermined constant. The number of slices still meets the basic EC reliability requirement of spreading the entire EC codeword over a sufficient number of (hard disk) drives, while each slice size remains variable. In general, the EC window size is much larger than the padding information size, so slice-length flexibility allows padded slices to be aligned at both the physical and logical granularity.
Fig. 9 is a schematic diagram of an example of processing data using the data storage method of the present disclosure, showing an example in which the data of 4 LBAs is landed (stored) using the method of the present invention.
The 4 LBAs packed together are compressed into a compressed page.
The compressed pages form an erasure coding group and are erasure coded together.
The compressed page may be viewed as a whole, and the distributed storage system generates the total padding information for this compressed page.
The total padding information is stored in the same physical page as the compressed page, using the space freed by the compression process.
There may still be a certain amount of free storage capacity (the shaded area in fig. 9). This free capacity can either be filled with dummy data or used to store other slices.
The complete data is additionally encoded with error correction coding (ECC). The ECC codeword and check bits may be stored in the same physical page as the compressed page, e.g., in the attachment area of the physical page dedicated to storing ECC check information.
FIG. 10 is a schematic block diagram of a storage server that may be used to implement a data storage method according to the present disclosure.
A network interface controller (NIC) performs packet retrieval, reassembly, and compression on data packets from the network. The compressed data is then transferred to system memory via the CPU interface to form EC groups (compressed data blocks) and to determine the adjusted data slices. Padding information is generated based on the slices and concatenated with them for storage in the landing drive and the log drive, in their respective formats.
The present disclosure innovatively proposes flexible compression and Erasure Coding (EC) slices to align logical units and physical granularity when padding information is additionally inserted for data integrity purposes.
By allowing freedom in data organization, the lifetime, capacity, and performance requirements imposed on various series of hard disk drives are relaxed, effectively improving storage performance and storage medium utilization.
Fig. 11 is a schematic structural diagram of a computing device that can be used to implement the data storage method according to an embodiment of the present invention.
Referring to fig. 11, the computing device 1000 includes a memory 1010 and a processor 1020.
The processor 1020 may be a multi-core processor or may include multiple processors. In some embodiments, processor 1020 may include a general-purpose host processor and one or more special purpose coprocessors such as a Graphics Processor (GPU), Digital Signal Processor (DSP), or the like. In some embodiments, processor 1020 may be implemented using custom circuits, such as an Application Specific Integrated Circuit (ASIC) or a Field Programmable Gate Array (FPGA).
The memory 1010 may include various types of storage units, such as system memory, read-only memory (ROM), and permanent storage. The ROM may store static data or instructions needed by the processor 1020 or other modules of the computer. The permanent storage may be a readable and writable, non-volatile storage device that does not lose stored instructions and data even after the computer is powered off. In some embodiments, a mass storage device (e.g., a magnetic or optical disk, or flash memory) is used as the permanent storage. In other embodiments, the permanent storage may be a removable storage device (e.g., a floppy disk or optical drive). The system memory may be a readable and writable, volatile memory device, such as dynamic random access memory, and may store instructions and data that some or all of the processors require at runtime. Further, the memory 1010 may include any combination of computer-readable storage media, including various types of semiconductor memory chips (DRAM, SRAM, SDRAM, flash memory, programmable read-only memory), magnetic disks, and/or optical disks. In some embodiments, the memory 1010 may include a readable and/or writable removable storage device, such as a compact disc (CD), a read-only digital versatile disc (e.g., DVD-ROM, dual-layer DVD-ROM), a read-only Blu-ray disc, an ultra-density optical disc, a flash memory card (e.g., SD card, mini SD card, Micro-SD card, etc.), or a magnetic floppy disk. Computer-readable storage media do not include carrier waves or transitory electronic signals transmitted wirelessly or over wires.
The memory 1010 has stored thereon executable code which, when executed by the processor 1020, causes the processor 1020 to perform the data storage method described above.
The data storage scheme for the distributed storage system according to the present invention has been described in detail above with reference to the accompanying drawings.
Furthermore, the method according to the invention may also be implemented as a computer program or computer program product comprising computer program code instructions for carrying out the above-mentioned steps defined in the above-mentioned method of the invention.
Alternatively, the invention may also be embodied as a non-transitory machine-readable storage medium (or computer-readable storage medium, or machine-readable storage medium) having stored thereon executable code (or a computer program, or computer instruction code) which, when executed by a processor of an electronic device (or computing device, server, etc.), causes the processor to perform the steps of the above-described method according to the invention.
Those of skill would further appreciate that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the disclosure herein may be implemented as electronic hardware, computer software, or combinations of both.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems and methods according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
Having described embodiments of the present invention, the foregoing description is intended to be exemplary, not exhaustive, and not limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein is chosen in order to best explain the principles of the embodiments, the practical application, or improvements made to the technology in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (13)

1. A data storage method for a distributed storage system, the method comprising:
dividing a data block to be stored to obtain a plurality of slices;
adding, to each slice, padding information generated for that slice by the distributed storage system; and
storing the plurality of slices and their corresponding padding information, wherein an integer number of slices and their corresponding padding information are stored in a single physical page.
2. The method of claim 1, wherein,
the slice size setting is adjustable; and/or
The slices are sized to store an integer number of slices and their corresponding padding information in a single physical page; and/or
The size of the slice is set such that after an integer number of slices and their corresponding padding information are stored in a single physical page, the size of the remaining space on the physical page is less than a predetermined threshold.
3. The method of claim 1, further comprising:
compressing the data block to obtain a compressed data block as the data block to be stored.
4. The method of claim 3, further comprising:
receiving a data packet, and removing a header and a trailer of the received data packet to obtain data to be stored; and
accumulating the data to be stored to form the data block of the set size.
5. The method of claim 3, wherein the step of compressing the data block to obtain a compressed data block comprises:
compressing the data block into a compressed data chunk; and
accumulating at least one compressed data chunk to obtain the compressed data block.
6. The method of claim 1, wherein the step of segmenting the block of data to be stored into a plurality of slices comprises:
dividing the data block to be stored into n data slices;
performing erasure coding on the n data slices obtained by the division to obtain k erasure check code slices,
wherein the plurality of slices includes the n data slices and the k erasure check code slices, and n and k are both positive integers.
7. The method of claim 6, wherein the step of segmenting the data block to be stored into a plurality of slices further comprises:
when the size of the data block to be stored is smaller than the total size of the n slices required for erasure coding, padding the data block to be stored with predetermined data so that its size reaches the total size of the n slices, for erasure coding to be performed; and
deleting the padded predetermined data after erasure coding is performed.
8. The method of claim 1, further comprising:
performing error correction coding on the slice and its corresponding padding information to obtain an error correction coding check code, and storing the error correction coding check code together with the slice and its corresponding padding information in the same physical page.
9. A data storage apparatus for a distributed storage system, the apparatus comprising:
a slicing device for dividing a data block to be stored to obtain a plurality of slices;
a filling device for adding, to each slice, padding information generated by the distributed storage system; and
a landing device for storing the plurality of slices and their corresponding padding information, wherein an integer number of slices and their corresponding padding information are stored in a single physical page.
10. The apparatus of claim 9, further comprising:
a compression device for compressing the data block to obtain a compressed data block as the data block to be stored.
11. The apparatus of claim 9, wherein the slicing device comprises:
a data dividing device for dividing the compressed data block into n data slices; and
an erasure coding device for performing erasure coding based on the n data slices obtained by the division to obtain k erasure check code slices,
wherein the plurality of slices includes the n data slices and the k erasure check code slices, and n and k are both positive integers.
12. A computing device, comprising:
a processor; and
a memory having executable code stored thereon, which when executed by the processor, causes the processor to perform the method of any of claims 1 to 8.
13. A non-transitory machine-readable storage medium having stored thereon executable code, which when executed by a processor of an electronic device, causes the processor to perform the method of any of claims 1-8.
CN202111392168.8A 2021-11-23 2021-11-23 Data storage method and device, computing equipment and storage medium Active CN113821179B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111392168.8A CN113821179B (en) 2021-11-23 2021-11-23 Data storage method and device, computing equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111392168.8A CN113821179B (en) 2021-11-23 2021-11-23 Data storage method and device, computing equipment and storage medium

Publications (2)

Publication Number Publication Date
CN113821179A true CN113821179A (en) 2021-12-21
CN113821179B CN113821179B (en) 2022-02-22

Family

ID=78919709

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111392168.8A Active CN113821179B (en) 2021-11-23 2021-11-23 Data storage method and device, computing equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113821179B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090094250A1 (en) * 2007-10-09 2009-04-09 Greg Dhuse Ensuring data integrity on a dispersed storage grid
CN104052611A (en) * 2013-03-12 2014-09-17 华中科技大学 Cloud storage system data availability maintenance method and device thereof
CN110427156A (en) * 2019-07-16 2019-11-08 华中科技大学 A kind of parallel reading method of the MBR based on fragment
CN113590051A (en) * 2021-09-29 2021-11-02 阿里云计算有限公司 Data storage and reading method and device, electronic equipment and medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Liu Gang (刘刚): "Data Integrity Verification and Modification in Distributed Storage Networks" (分布式存储网络中的数据完整性校验与修改), China Master's Theses Full-text Database (中国优秀硕士学位论文全文库) *

Also Published As

Publication number Publication date
CN113821179B (en) 2022-02-22

Similar Documents

Publication Publication Date Title
US20210263795A1 (en) Efficient compaction on log-structured distributed file system using erasure coding for resource consumption reduction
US9208018B1 (en) Systems and methods for reclaiming memory for solid-state memory
KR101905280B1 (en) High performance system providing selective merging of dataframe segments in hardware
TWI514139B (en) Physical page, logical page, and codeword correspondence
US9009565B1 (en) Systems and methods for mapping for solid-state memory
CN110795272B (en) Method and system for atomic and latency guarantees facilitated on variable-size I/O
US7941696B2 (en) Flash-based memory system with static or variable length page stripes including data protection information and auxiliary protection stripes
US8176284B2 (en) FLASH-based memory system with variable length page stripes including data protection information
US9026867B1 (en) Systems and methods for adapting to changing characteristics of multi-level cells in solid-state memory
US9710199B2 (en) Non-volatile memory data storage with low read amplification
US9053012B1 (en) Systems and methods for storing data for solid-state memory
KR101643273B1 (en) Method of storing data in storage media, data storage device using the same, and system including the same
US9838045B1 (en) Apparatus and method for accessing compressed data
US20150349805A1 (en) Method of Handling Error Correcting Code in Non-volatile Memory and Non-volatile Storage Device Using the Same
CN112486725B (en) Method and device for carrying out error correction coding on compressed data
US11494115B2 (en) System method for facilitating memory media as file storage device based on real-time hashing by performing integrity check with a cyclical redundancy check (CRC)
US11347653B2 (en) Persistent storage device management
CN112596674B (en) Method and system for double protection of main control cache data of solid state disk
US11169873B2 (en) Method and system for extending lifespan and enhancing throughput in a high-density solid state drive
KR20110113421A (en) Method of storing data in storage media, data storage device using the same, and system including the same
US10268538B2 (en) Efficient and enhanced distributed storage clusters
JP6491482B2 (en) Method and / or apparatus for interleaving code words across multiple flash surfaces
CN113821179B (en) Data storage method and device, computing equipment and storage medium
WO2020028801A1 (en) Error correction with scatter-gather list data management
CN115114054B (en) Managing memory space reduction and reuse of failed multi-level memory cells

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
REG Reference to a national code (country code: HK; legal event code: DE; document number: 40064985)