CN116166197A - Data storage method, system, storage node and computer readable storage medium - Google Patents

Publication number
CN116166197A
Authority
CN
China
Prior art keywords
data
data block
compression
storage
area
Prior art date
Legal status
Pending
Application number
CN202310182167.3A
Other languages
Chinese (zh)
Inventor
廖武钧
Current Assignee
Alibaba China Co Ltd
Original Assignee
Alibaba China Co Ltd
Application filed by Alibaba China Co Ltd
Priority to CN202310182167.3A
Publication of CN116166197A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0604 Improving or facilitating administration, e.g. storage management
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0638 Organizing or formatting or addressing of data
    • G06F3/064 Management of blocks
    • G06F3/0655 Vertical data movement, i.e. input-output transfer; data movement between one or more hosts and one or more storage devices
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/067 Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Storage Device Security (AREA)

Abstract

Embodiments of the present application provide a data storage method, a data storage system, a storage node, and a computer-readable storage medium. The method comprises: acquiring at least one compressed and encrypted first data block from a log area of a storage disk; determining a compression rate of the at least one first data block; and storing the at least one first data block into a data area of the storage disk according to the compression rate. When data is compressed, encrypted, and stored using this scheme, low-cost storage of the data can be ensured while the computational overhead required for storing it is reduced.

Description

Data storage method, system, storage node and computer readable storage medium
Technical Field
The present disclosure relates to the field of cloud storage technologies, and in particular, to a data storage method and system, a storage node, and a computer readable storage medium.
Background
Cloud storage is an emerging network storage technology developed on the basis of cloud computing. In a cloud storage system built on this technology, data is usually stored in compressed and encrypted form on a cloud disk deployed in the system, in order to save storage space and ensure data security. At present, when data is stored this way, it is usually compressed and encrypted first and stored in a log area of the cloud disk; after a certain amount of data has accumulated, it is transferred to a data area of the cloud disk. How to improve the compression rate of the data area and reduce storage cost during this transfer, while spending relatively little computing power, is a problem to be solved.
Disclosure of Invention
The present application provides a data storage method, a data storage system, a storage node, and a computer-readable storage medium that solve, or at least partially solve, the above problem.
Thus, in one embodiment of the present application, a data storage method is provided. The method comprises the following steps:
acquiring at least one first data block after compression and encryption from a log area of a storage disk;
determining a compression rate of the at least one first data block;
and storing the at least one first data block into a data area of the storage disk according to the compression rate.
In another embodiment of the present application, a storage system is also provided. The system comprises:
a storage disk including a log area and a data area;
and a processing module configured to implement the steps of the data storage method provided in the embodiments of the present application.
In yet another embodiment of the present application, a storage node is also provided. The storage node includes a storage disk, a processing module, and a memory, wherein the memory is used for storing a computer program, and the processing module is coupled to the memory and configured to execute the computer program stored in the memory so as to implement the steps of the data storage method provided in the embodiments of the present application.
In yet another embodiment of the present application, there is also provided a computer-readable storage medium having stored thereon a computer program/instructions which, when executed, implement the steps of the data storage method provided in the embodiments of the present application.
According to the technical scheme provided by the embodiments of the present application, after at least one compressed and encrypted first data block is obtained from the log area of a storage disk, the compression rate of the at least one first data block is determined, and the at least one first data block is stored into the data area of the storage disk according to the compression rate. When data is compressed, encrypted, and stored using this scheme, low-cost storage of the data can be ensured while the computing-power overhead of storing it is reduced.
Drawings
To illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below show some embodiments of the present application; a person skilled in the art may obtain other drawings from them without inventive effort.
FIG. 1a is a schematic diagram of data encryption and decryption performed at a storage node according to the present application;
FIG. 1b is a schematic diagram of an existing scheme for storing data in compressed and encrypted form on a cloud disk;
FIG. 2 is a flow chart of a data storage method according to an embodiment of the present application;
FIG. 3 is a schematic diagram of storing data in compressed and encrypted form on a storage disk according to the present application;
FIG. 4 is a schematic diagram of a data storage device according to an embodiment of the present application;
FIG. 5 is a schematic structural diagram of a computer program product according to an embodiment of the present application.
Detailed Description
Before the technical scheme provided by the embodiments of the present application is described, some terms used in the present application are explained.
Cloud storage technology combines a large number of storage devices of different types in a network through application software, using functions such as cluster applications, network technology, or distributed storage systems, so that they work cooperatively and jointly provide data storage and access to the outside.
A cloud storage system is a storage system built on cloud storage technology and can be deployed on a corresponding cloud server. The cloud server is provided by a cloud host vendor and is based on cloud computing technology; a user operates and manages it through remote login, and from the user's point of view it is used in the same way as an ordinary remote physical server.
A cloud disk is a disk instance built on a cloud storage system; it can be read, written, and used like a computer disk (physical disk). To balance response latency and stability during data writes, a log storage mode is often adopted, specifically: a fixed proportion of the cloud disk is set aside as a log area for temporarily storing data, and the remaining area serves as a data area (the main storage area) for permanently storing data; when a piece of data needs to be written to (i.e. stored on, in this context) the cloud disk, it is first written into the log area, and after a certain amount of data has accumulated, the data in the log area is transferred into the data area.
Encrypted cloud disk: a cloud disk that stores data in encrypted form. A conventional cloud storage system generally provides a transparent encryption service for the cloud disk: a user can maintain the corresponding key in the cloud storage system, either personally or through a public-cloud key management service, and the underlying layer uses that key when encrypting and decrypting data, which ensures that the data finally stored on the cloud disk is encrypted. When a user reads and writes data through the interface of the cloud disk, both the data read and the data carried by a write request are plaintext; that is, the encryption is imperceptible to the user, which is why it is called transparent encryption. For example, referring to fig. 1a, when a user reads from and writes to a cloud disk through a terminal node, the storage node uses the corresponding key to encrypt the data carried by a write request and to decrypt the data requested by a read request, so that ciphertext is stored on the cloud disk and plaintext is returned to the user.
It should be noted that the encryption and decryption calculation of data described in the above example may be performed on a terminal node (specifically, a processing module disposed on the terminal node, such as the data block storage terminal module shown in fig. 1 a), in addition to the storage node (more specifically, the processing module disposed on the storage node). The present application is directed to a scheme for performing encryption and decryption computation on a storage node, and therefore, the technical schemes provided in the embodiments of the present application described below are all implemented under the precondition that encryption and decryption are performed by the storage node.
The compression rate is the ratio of the size of the data after compression to the size of the original data before compression. For example, if 4KB of data is compressed to 2KB, the compression rate is (2/4) × 100% = 50%. Generally, the smaller the compression rate, the less space the data occupies when stored.
In order to enable those skilled in the art to better understand the present application, the following description will make clear and complete descriptions of the technical solutions in the embodiments of the present application with reference to the accompanying drawings in the embodiments of the present application.
Some of the flows described in the specification, the claims, and the above drawings include a plurality of operations that appear in a particular order; these operations may be performed out of the order in which they appear or in parallel. Sequence numbers such as 101 and 102 merely distinguish the operations from one another and do not themselves represent any order of execution. In addition, the flows may include more or fewer operations, and the operations may be performed sequentially or in parallel. It should be noted that "first" and "second" herein are used to distinguish different messages, devices, modules, and so on; they do not represent a sequence, nor do they require that the "first" and the "second" be of different types. The term "and/or" in this application merely describes an association between associated objects and indicates that three relationships may exist; for example, "A and/or B" covers three cases: A alone, both A and B, and B alone. The character "/" in this application generally indicates an "or" relationship between the associated objects. It should also be noted that the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a product or system that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such product or system. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in the product or system comprising that element. Furthermore, the embodiments described below are only some, but not all, of the embodiments of the present application.
All other embodiments obtained by those skilled in the art on the basis of the embodiments herein without inventive effort fall within the scope of protection of the present application.
To aid understanding of the solution provided in the present application, an existing implementation of storing data on an encrypted cloud disk is briefly described below:
In general, to save storage space, the cloud storage system compresses data before writing it to the cloud disk. Combined with the log storage mode, this means that both the data written into the log area and the data written into the data area are compressed data. In addition, if the cloud disk is an encrypted cloud disk, the cloud storage system encrypts data before writing it to the cloud disk. Because encrypted data appears random, compressing it changes its size very little and usually yields no benefit; therefore, when a piece of data needs to be both compressed and encrypted, it is compressed first and then encrypted.
Fig. 1b shows an exemplary diagram of the principle of writing data to a cloud disk (here an encrypted cloud disk) in compressed and encrypted form. As shown in fig. 1b, when a write request A reaches a storage node, a processing module in the storage node responds by compressing and then encrypting the data block m1 carried by write request A, obtaining a compressed and encrypted data block m1. The compression granularity of this compression is determined by the size of the data block; for example, if data block m1 is 4KB, the compression granularity is 4KB. Then, following the log storage mode, the compressed and encrypted data block m1 is temporarily stored in the log area of the cloud disk that, under the storage node, corresponds to the disk identifier carried by write request A. Later, once the processing module detects that the data in the log area of the cloud disk has accumulated to a certain amount, it reads the data from the log area in batches and decrypts and decompresses the compressed and encrypted data blocks it has read; it then recompresses the decrypted and decompressed data at a larger compression granularity (such as 16KB or 32KB) and encrypts the result, so as to transfer it into the data area of the cloud disk.
For example, suppose the data blocks m corresponding to write requests A, B, and so on are read in batches from the cloud disk. After the compressed and encrypted data blocks m are decrypted and decompressed, they can be spliced, according to the address offsets of the corresponding write requests, into data segments with contiguous addresses, and each data segment is then divided into 16KB data blocks, yielding a plurality of 16KB data blocks. These 16KB data blocks are then each compressed, encrypted, and stored into the data area of the cloud disk. As can be seen from the above, when data is stored in compressed and encrypted form, every write by a storage node to its cloud disk goes through two writing stages, namely a log-area writing stage and a data-area writing stage, and the computing operations required in the two stages include:
1) In both the log-area writing stage and the data-area writing stage, the data must be compressed and encrypted once;
2) After the data is read from the log area, one decryption and one decompression are needed to recover the original data, so that it can be re-divided into larger blocks and compressed at a larger granularity to reduce storage cost.
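The re-blocking step of this existing scheme (splicing decrypted, decompressed blocks by address offset and cutting each contiguous segment into 16KB blocks) can be illustrated with a minimal sketch. The function names `merge_segments` and `split_segment` are hypothetical and do not appear in the application; the sketch also assumes that the blocks do not overlap and that decryption and decompression have already happened.

```python
from typing import Dict, List, Tuple

CHUNK = 16 * 1024  # larger compression granularity used in the data-area stage


def merge_segments(blocks: Dict[int, bytes]) -> List[Tuple[int, bytes]]:
    """Splice plaintext blocks, keyed by address offset, into address-contiguous segments.

    Assumption: blocks do not overlap (overwrites are not handled here).
    """
    segments: List[Tuple[int, bytearray]] = []
    for offset in sorted(blocks):
        data = blocks[offset]
        if segments and segments[-1][0] + len(segments[-1][1]) == offset:
            segments[-1][1].extend(data)  # adjacent to the previous segment
        else:
            segments.append((offset, bytearray(data)))  # start a new segment
    return [(start, bytes(buf)) for start, buf in segments]


def split_segment(start: int, data: bytes, chunk: int = CHUNK) -> List[Tuple[int, bytes]]:
    """Cut one contiguous segment into chunk-sized blocks (the last one may be shorter)."""
    return [(start + i, data[i:i + chunk]) for i in range(0, len(data), chunk)]
```

Each block returned by `split_segment` would then be compressed and encrypted before being written to the data area; those two steps are omitted here.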
Note: in theory, the larger the data block, the better the compression (the lower the compression rate), so the less cloud disk space is occupied and the lower the storage cost. Because write latency must be guaranteed in the log-area writing stage, the data carried by a write request has to be compressed and stored immediately; there is no time to accumulate more data before compressing. That is, the compression granularity in the log-area writing stage is determined by the size of the data, which is limited by how much data a single write request can carry (in most cases 4KB or 8KB), so the corresponding compression granularity is 4KB or 8KB. The data-area writing stage, by contrast, can compress the data in larger blocks such as 16KB or 32KB; for the same amount of data, the compression rate achieved in the log-area writing stage typically lags that of the data-area writing stage by about 20%-30%.
In addition, writing one piece of data to the cloud disk in compressed and encrypted form involves, in total, 2 compressions, 2 encryptions, 1 decryption, and 1 decompression. Compared with an ordinary write (in which the data is stored in compressed form only), this adds the computing overhead of 2 encryptions and 1 decryption. In practice, encryption and decryption cost more per unit of data than compression, so the write computing overhead of storing data in compressed and encrypted form is usually more than 2 times that of storing it in compressed form only (depending on the encryption algorithm, it may reach 5 to 10 times). As a result, the CPU (Central Processing Unit) of the storage node easily becomes a bottleneck, which limits the throughput bandwidth of data writes.
Alternatively, when the compressed and encrypted data stored in the log area is transferred to the data area, it could be moved directly into the data area for storage without changing the compression granularity. This direct transfer requires no decryption, decompression, recompression, or re-encryption. However, because the second compression at a larger granularity is skipped, the compression rate of the data stored in the data area is essentially the same as that of the data stored in the log area, so the data occupies relatively more disk space in the data area.
As can be seen from the above, the data read from the log area is recompressed during the data-area writing stage in order to achieve a lower compression rate and thereby reduce storage cost. With this requirement in mind, the principle of the scheme provided in the present application is: on the premise that the compression rate of the data area is kept under control, avoid compressing the data a second time wherever possible, so that the decryption and re-encryption overhead caused by the second compression is avoided. In other words, the scheme targets encrypted cloud disks and is an optimization that balances storage cost against encryption and decryption computing resources; it aims to mitigate scenarios in which cloud disk performance is severely degraded because hardware encryption and decryption computing resources are insufficient.
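The trade-off can be made concrete with a deliberately simplified cost model. It counts encryption and decryption passes per log-area block and ignores the fact that recompressed data is re-divided into larger blocks; the function names are hypothetical, and the model is an illustration rather than a figure taken from the application.

```python
def crypto_passes_legacy(num_blocks: int) -> int:
    """Existing scheme: every block is encrypted for the log area,
    decrypted on transfer, and encrypted again for the data area."""
    return num_blocks * 3


def crypto_passes_selective(num_blocks: int, num_recompressed: int) -> int:
    """Selective scheme: every block is encrypted once for the log area;
    only recompressed blocks pay the extra decryption and re-encryption."""
    if not 0 <= num_recompressed <= num_blocks:
        raise ValueError("num_recompressed must be between 0 and num_blocks")
    return num_blocks + 2 * num_recompressed
```

Under this model, if 30 of 100 blocks need recompression, the selective path performs 160 passes instead of 300; when every block needs recompression the two paths cost the same.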
Fig. 2 is a schematic flow chart of a data storage method according to an embodiment of the present application. The method is executed by the storage node shown in fig. 3, specifically by a processing module disposed in the storage node, where the storage node is a cloud server and the processing module is a functional module with data processing and computing capabilities, such as a CPU or a microprocessor. As shown in fig. 2, the data storage method provided in this embodiment includes the following steps:
101. Acquiring at least one first data block after compression and encryption from a log area of a storage disk;
102. determining a compression rate of the at least one first data block;
103. and storing the at least one first data block into a data area of the storage disk according to the compression rate.
In 101, the storage disk is a cloud disk deployed under a storage node, more specifically an encrypted cloud disk. The at least one compressed and encrypted first data block stored in the log area is obtained as follows: after receiving a write request, the executing entity compresses and encrypts the first data block carried in the write request, with the compression granularity being the size of the first data block. For why the compression granularity equals the size of the first data block, refer to the description above in connection with fig. 1b; details are not repeated here. Referring to fig. 1a, the write request may be sent by the user through a client module deployed on the terminal node. The client module is the data access port that the storage node provides to users, and the data entering and leaving the client module is all plaintext. In a specific implementation, the client module may be, but is not limited to, application software such as an office application or a video application that provides data read and write functions; accordingly, the data type of the data block carried in the write request may be, but is not limited to, text data, audio-video data, picture data, and the like. The terminal node is a computing access node, which may be, but is not limited to, a smart phone, a tablet computer, a desktop computer, a smart wearable device (e.g., a smart bracelet), and the like. Based on the foregoing, the method provided in this embodiment may further include the following steps:
100a1, receiving a writing request sent by a terminal node, wherein the writing request carries a disk identifier and a first data block to be stored;
100a2, performing compression encryption on the first data block to obtain a compressed and encrypted first data block;
100a3, storing the compressed and encrypted first data block into a log area in a storage disk corresponding to the disk identifier.
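Steps 100a1 to 100a3 can be sketched as follows. This is a minimal illustration under stated assumptions, not the application's implementation: `zlib` stands in for whatever compressor is used, `toy_cipher` is a deliberately insecure placeholder for a real cipher such as AES, and `LogEntry`, `StorageDisk`, and `handle_write` are hypothetical names. The sketch records the pre-compression and post-compression sizes alongside each block so that the compression rate can be determined later without decrypting.

```python
import hashlib
import time
import zlib
from dataclasses import dataclass, field
from typing import Dict, List


def toy_cipher(key: bytes, data: bytes) -> bytes:
    """XOR with a SHA-256 counter keystream. NOT secure: a placeholder for a real
    cipher. Applying it twice with the same key restores the input."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream.extend(hashlib.sha256(key + counter.to_bytes(8, "big")).digest())
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


@dataclass
class LogEntry:
    offset: int      # address offset of the write request
    payload: bytes   # compressed first, then encrypted
    pre_size: int    # size before compression
    post_size: int   # size after compression
    stored_at: float = field(default_factory=time.monotonic)


@dataclass
class StorageDisk:
    key: bytes
    log_area: List[LogEntry] = field(default_factory=list)
    data_area: Dict[int, bytes] = field(default_factory=dict)


def handle_write(disks: Dict[str, StorageDisk], disk_id: str,
                 offset: int, block: bytes) -> LogEntry:
    disk = disks[disk_id]                       # 100a1: locate the disk by its identifier
    compressed = zlib.compress(block)           # 100a2: compress first ...
    payload = toy_cipher(disk.key, compressed)  # ... then encrypt
    entry = LogEntry(offset, payload, len(block), len(compressed))
    disk.log_area.append(entry)                 # 100a3: store in the log area
    return entry
```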
For a specific description of the terminal nodes and encryption implementation in steps 100a 1-100 a3, reference is made to the following description of the embodiments.
In this embodiment, while write requests are processed according to steps 100a1 to 100a3, the data in the log area is also monitored, so that when the data stored in the log area is found to meet the transfer condition, it is read and transferred. That is, in a specific implementation, step 101, "acquiring at least one compressed and encrypted first data block from a log area of a storage disk", may specifically include:
1011. Detecting whether at least one first data block stored in the log area after compression and encryption meets a transfer condition or not;
1012. and when the transfer condition is met, reading the at least one first data block from the log area.
In a specific implementation, the at least one first data block meets the transfer condition when at least one of the following holds: the number of first data blocks reaches a preset number threshold; the storage duration of the at least one first data block reaches a preset duration. The number threshold and the preset duration can be set flexibly according to the actual situation and are not specifically limited in this embodiment.
In 102, the compression rate of each first data block that has been read is determined so that it can be used in the subsequent transfer, reducing the amount of computation the transfer requires. The compression rate may be calculated from the pre-compression size and the post-compression size of the first data block. These sizes may be obtained from the data information of the first data block, which is recorded when the compressed and encrypted first data block is stored in the log area. That is, in one implementation, step 102, "determining a compression rate of the at least one first data block", may specifically include:
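The condition checked in step 1011 can be sketched as a small predicate. The default values (64 blocks, 5 seconds) and the name `meets_transfer_condition` are illustrative assumptions; the application leaves both thresholds open. The sketch also assumes that the storage duration is measured from the oldest block in the log area.

```python
from typing import Sequence


def meets_transfer_condition(stored_at: Sequence[float], now: float,
                             max_blocks: int = 64, max_age_s: float = 5.0) -> bool:
    """Return True when the block count or the age of the oldest block reaches its threshold.

    stored_at holds the time at which each block entered the log area.
    """
    if not stored_at:
        return False  # nothing to transfer
    too_many = len(stored_at) >= max_blocks
    too_old = now - min(stored_at) >= max_age_s
    return too_many or too_old
```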
1021. acquiring data information of each of the at least one first data block, wherein the data information comprises the pre-compression size and the post-compression size of the corresponding first data block;
1022. And determining the compression rate of the at least one first data block according to the respective pre-compression size and post-compression size of the at least one first data block.
In practice, the compression rate of a first data block may be obtained by, but is not limited to, calculating the ratio of its post-compression size to its pre-compression size. Thus, in a specific implementation, step 1022, "determining the compression rate of the at least one first data block according to the respective pre-compression size and post-compression size of the at least one first data block", may be implemented by the following steps:
10221. calculating the ratio of the post-compression size to the pre-compression size of each of the at least one first data block;
10222. and determining the ratio corresponding to each of the at least one first data block as the compression rate of each of the at least one first data block.
For example, assuming that a pre-compression size of one of the at least one first data block is 4KB and a post-compression size is 2KB, a compression rate of the first data block is (2/4) ×100% =50%.
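Steps 10221 and 10222 amount to a single division per block, using only the recorded sizes, so no decryption is needed. A minimal sketch follows; the function names are hypothetical, and the data information is represented here as a `(pre_size, post_size)` tuple in bytes.

```python
from typing import Iterable, List, Tuple


def compression_rate(pre_size: int, post_size: int) -> float:
    """Ratio of post-compression size to pre-compression size (lower is better)."""
    if pre_size <= 0:
        raise ValueError("pre_size must be positive")
    return post_size / pre_size


def compression_rates(infos: Iterable[Tuple[int, int]]) -> List[float]:
    """Compute the rate of every block from its (pre_size, post_size) record."""
    return [compression_rate(pre, post) for pre, post in infos]
```

For the 4KB block compressed to 2KB in the example above, `compression_rate(4096, 2048)` returns 0.5, that is, 50%.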
After the compression rate is determined, the compression rate of each first data block can be compared with a preset threshold. If the compression rate of a first data block after its first compression is better than (i.e. no greater than) the preset threshold, its storage cost has reached the statistical average and is acceptable, so the first data block need not undergo secondary compression; in other words, the block is already small after the first compression, there is little room for improvement, and the expected gain from compressing it a second time is small, so it can be transferred directly to the data area. If the compression rate of the first data block does not reach the preset threshold, the first data block can be decrypted and decompressed, then compressed and encrypted again at a larger compression granularity and stored in the data area. Based on this, in one implementation, step 103, "storing the at least one first data block into a data area of the storage disk according to the compression rate", may specifically include:
1031. Comparing the compression rate of the at least one first data block with a preset threshold;
1032. and storing the at least one first data block into a data area of the storage disk according to the comparison result.
In 1031, the preset threshold may be set to the average compression rate observed in practical statistics for secondary compression, such as 50% or 45%.
In addition, so that first data blocks whose compression rate does not reach the preset threshold can be re-compressed at a larger compression granularity, in this embodiment a plurality of (two or more) first data blocks are acquired; correspondingly, in step 1032, the plurality of first data blocks are stored in the data area of the storage disk according to the comparison result. In a specific implementation, step 1032 may be implemented by the following steps:
10321. processing the first data blocks, among the plurality of first data blocks, whose compression rate is greater than the preset threshold, to generate at least one compressed and encrypted second data block whose compression rate is less than or equal to the preset threshold; storing the compressed and encrypted at least one second data block in the data area;
10322. and storing the first data blocks, among the plurality of first data blocks, whose compression rate is less than or equal to the preset threshold directly in the data area without processing.
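The split between steps 10321 and 10322 can be sketched as follows. This is only an illustration: the `FirstBlock` class, the `partition` function, the 50% threshold and the sample sizes are all assumptions, not values taken from the embodiment.

```python
from dataclasses import dataclass


@dataclass
class FirstBlock:
    name: str
    pre_size: int   # bytes before compression
    post_size: int  # bytes after compression

    @property
    def rate(self) -> float:
        # Compression rate as defined in step 10221.
        return self.post_size / self.pre_size


def partition(blocks, threshold=0.5):
    """Split blocks into (store_directly, reprocess) by compression rate."""
    direct = [b for b in blocks if b.rate <= threshold]    # step 10322
    reprocess = [b for b in blocks if b.rate > threshold]  # step 10321
    return direct, reprocess


blocks = [
    FirstBlock("m1", 4096, 1800),  # rate about 0.44
    FirstBlock("m2", 4096, 3000),  # rate about 0.73
    FirstBlock("m3", 4096, 3500),  # rate about 0.85
]
direct, reprocess = partition(blocks)
print([b.name for b in direct], [b.name for b in reprocess])
```

With these illustrative sizes, m1 goes straight to the data area while m2 and m3 are queued for decryption, decompression and re-compression.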
In 10321, the first data blocks whose compression rate is greater than the preset threshold may be decrypted, decompressed and integrated in sequence to obtain the corresponding second data blocks, which are then compressed and encrypted. That is, step 10321 may be implemented as follows:
103211. decrypting and decompressing each first data block with the compression rate larger than the preset threshold value to obtain each first data block after decryption and decompression;
103212. obtaining the compression granularity corresponding to the data area;
103213. integrating each first data block after decryption and decompression into at least one second data block based on the compression granularity;
103214. performing compression encryption on the at least one second data block to obtain at least one compressed and encrypted second data block.
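Steps 103211 to 103214 can be sketched end to end in Python. This is a hedged illustration only: `zlib` stands in for whatever compression algorithm the storage node uses, `xor_cipher` is a toy placeholder for real encryption (it is not secure), and the names `seal`, `unseal`, `recompress` and the 16 KB `GRANULARITY` are assumptions introduced here.

```python
import zlib

GRANULARITY = 16 * 1024  # assumed compression granularity of the data area


def xor_cipher(data: bytes, key: bytes) -> bytes:
    """Toy symmetric cipher, a placeholder for real encryption (not secure)."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


def seal(plain: bytes, key: bytes) -> bytes:
    """Compress, then encrypt."""
    return xor_cipher(zlib.compress(plain), key)


def unseal(sealed: bytes, key: bytes) -> bytes:
    """Decrypt, then decompress."""
    return zlib.decompress(xor_cipher(sealed, key))


def recompress(first_blocks, key, granularity=GRANULARITY):
    # 103211: decrypt and decompress each selected first data block.
    plain = b"".join(unseal(b, key) for b in first_blocks)
    # 103212 / 103213: integrate into second data blocks of the granularity.
    chunks = [plain[i:i + granularity] for i in range(0, len(plain), granularity)]
    # 103214: compress and encrypt each second data block.
    return [seal(c, key) for c in chunks]


key = b"demo-key"
originals = [bytes([65 + i]) * 4096 for i in range(4)]  # four 4 KB blocks
first_blocks = [seal(b, key) for b in originals]
second_blocks = recompress(first_blocks, key)
print(len(second_blocks))  # 1
```

Here the four 4 KB inputs are simply concatenated in list order; a real implementation would order them by logical address before integrating.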
Specifically, the compression granularity corresponding to the data area is relatively large, such as 16KB or 32KB. When the decrypted and decompressed first data blocks are integrated, those with contiguous addresses can be merged into second data blocks of the compression granularity according to the address offset (logical address) of the write request corresponding to each first data block. Alternatively, according to the address offset of the write request corresponding to each first data block, all the decrypted and decompressed first data blocks can be integrated into a single piece of data, which is then divided into at least one second data block of the compression granularity. Finally, the at least one second data block is compressed and then encrypted with the corresponding key, and the resulting compressed and encrypted at least one second data block is stored in the data area.
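The address-based integration described above can be sketched as follows. The representation of a decrypted and decompressed block as an `(offset, data)` pair, and the function names `merge_contiguous` and `split_by_granularity`, are assumptions made for illustration.

```python
KB = 1024


def merge_contiguous(blocks):
    """Merge (offset, data) pairs whose logical addresses are adjacent."""
    runs = []
    for offset, data in sorted(blocks, key=lambda b: b[0]):
        if runs and runs[-1][0] + len(runs[-1][1]) == offset:
            start, buf = runs[-1]
            runs[-1] = (start, buf + data)  # extend the current run
        else:
            runs.append((offset, data))     # start a new run
    return runs


def split_by_granularity(runs, granularity):
    """Cut each run into second data blocks of at most `granularity` bytes."""
    out = []
    for start, buf in runs:
        for i in range(0, len(buf), granularity):
            out.append((start + i, buf[i:i + granularity]))
    return out


plain_blocks = [
    (8 * KB, b"b" * (8 * KB)),   # contiguous with the block at offset 0
    (0, b"a" * (8 * KB)),
    (64 * KB, b"c" * (4 * KB)),  # isolated block
]
runs = merge_contiguous(plain_blocks)
second = split_by_granularity(runs, 16 * KB)
print([(off, len(data)) for off, data in second])
```

In this sketch the two adjacent 8 KB blocks form one 16 KB second data block, while the isolated 4 KB block stays short and would be a candidate for deferral to a later batch.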
For example, referring to FIG. 3, assume that a batch of compressed and encrypted first data blocks read from the log area includes data block m1, data block m2 and data block m3, and that the compression rates of the compressed and encrypted data block m2 and data block m3 are each greater than the preset threshold. Data block m2 and data block m3 are then decrypted and decompressed. Next, based on the compression granularity corresponding to the data area, such as 16KB, the decrypted and decompressed data block m2 and data block m3 can be integrated into a data block n1 with a size of 16KB, and data block n1 is compressed and encrypted and then stored in the data area.
It should be noted that if, during integration, a decrypted and decompressed first data block is left unpaired and cannot be integrated, it may be held back temporarily and merged with a suitable first data block from the next acquired batch of compressed and encrypted first data blocks, so that it can then be integrated and transferred. The next batch of compressed and encrypted first data blocks is obtained from the log area of the same storage disk as the current batch. For example, in the example given above in connection with fig. 3, assume that the batch of compressed and encrypted first data blocks read includes, in addition to data block m1, data block m2 and data block m3, a data block m4 (not shown in fig. 3) whose compression rate is greater than the preset threshold. After data block m4 is decrypted and decompressed, because the decrypted and decompressed data block m2 and data block m3 have already been integrated into a data block n1 with a size of 16KB, data block m4 is left unpaired, with no other decrypted and decompressed data block to integrate with. In this case, the decrypted and decompressed data block m4 may be held back so that it can be integrated with first data blocks from the next batch of compressed and encrypted first data blocks read from the log area, and then transferred.
For example, if the next batch of compressed and encrypted first data blocks read includes data block m5 and data block m6, the compression rate of data block m5 is greater than the preset threshold, the compression rate of data block m6 is less than or equal to the preset threshold, and the decrypted and decompressed data block m4 and data block m5 can be integrated into a data block n2 with a size of 16KB, then data block m4 and data block m5 are integrated into data block n2 for transfer to the data area.
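The deferral of an unpaired block to the next batch can be sketched with a small buffer. The `Integrator` class, its `feed` method and the assumption that each of m2 to m5 is 8 KB after decompression are hypothetical; the sketch also ignores address contiguity for brevity.

```python
KB = 1024


class Integrator:
    """Accumulates decompressed data across batches and emits full blocks."""

    def __init__(self, granularity):
        self.granularity = granularity
        self.pending = b""  # leftover data waiting for a partner

    def feed(self, plain_blocks):
        """Add one batch; return the second data blocks that are now complete."""
        self.pending += b"".join(plain_blocks)
        full = []
        while len(self.pending) >= self.granularity:
            full.append(self.pending[:self.granularity])
            self.pending = self.pending[self.granularity:]
        return full


# Decrypted and decompressed blocks, assumed to be 8 KB each.
m2, m3, m4, m5 = (bytes([c]) * (8 * KB) for c in b"2345")

integ = Integrator(16 * KB)
first = integ.feed([m2, m3, m4])  # emits n1; m4 stays pending
second = integ.feed([m5])         # m4 + m5 now form n2
print(len(first), len(second), len(integ.pending))
```

After the first batch, m4 remains in `pending`; once m5 arrives in the next batch, the two form the 16 KB block n2 and the buffer is empty again.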
In 10322, a first data block whose compression rate is less than or equal to the preset threshold is transferred directly into the data area without processing, which saves one round of decryption and re-encryption.
For example, in the example illustrated in connection with fig. 3 in step 10321, if the compression rate of the compressed and encrypted data block m1 read from the log area is less than or equal to the preset threshold, data block m1 may be transferred directly into the data area without any processing.
It should be noted that, in this embodiment, the key used for the encryption and decryption operations may be obtained according to a preset encryption algorithm; alternatively, it may be provided by the Key Management Service (KMS) function of a public cloud Data Encryption Workshop (DEW) service, which is not limited in this embodiment.
As can be seen from the content related to step 103, this scheme sets a compression rate threshold so that, during transfer, only those compressed and encrypted first data blocks in the log area whose compression rate does not reach the threshold are decrypted, decompressed, recombined, compressed and encrypted again, and then stored in the data area. The remaining compressed and encrypted first data blocks in the log area, whose compression rate reaches the threshold, are transferred to the data area directly, without being decrypted and decompressed for another round of compression and encryption. The scheme ensures that the overall compression rate of the data stored in the data area does not deteriorate noticeably (it is no worse than the threshold), while sparing part of the data the computing cost of decryption and secondary encryption, thereby freeing up more computing power for encryption and decryption.
In summary, in the technical solution provided in this embodiment, after at least one compressed and encrypted first data block is obtained from the log area of a storage disk, the compression rate of the at least one first data block is determined, and the at least one first data block is stored in the data area of the storage disk according to the compression rate. When this scheme is used for compressed and encrypted data storage, low-cost storage of the data is ensured and the computing cost of data storage is reduced.
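The saving described above can be made concrete with a rough estimate of how much log-area data bypasses decryption and re-encryption. The function `bypass_fraction` and the sample sizes are made up for illustration; they are not measurements from the embodiment.

```python
def bypass_fraction(blocks, threshold):
    """Fraction of log-area bytes that skip decrypt/decompress/re-encrypt.

    blocks: list of (pre_size, post_size) pairs in bytes.
    """
    total = sum(post for _, post in blocks)
    direct = sum(post for pre, post in blocks if post / pre <= threshold)
    return direct / total if total else 0.0


# Made-up sizes for three 4 KB blocks; only the first meets a 50% threshold.
sample = [(4096, 1800), (4096, 3000), (4096, 3500)]
print(round(bypass_fraction(sample, 0.5), 3))  # 0.217
```

The better the first-pass compression of a workload, the larger this fraction becomes and the less cryptographic work the transfer requires.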
Another embodiment of the present application further provides a storage system corresponding to the above method embodiment, and the architecture of the storage system may refer to fig. 3. As seen in fig. 3, the storage system comprises: a storage disk 12 and a processing module 11; wherein,
a storage disk 12 including a log area and a data area;
a processing module 11, configured to obtain at least one compressed and encrypted first data block from the log area of the storage disk; determine a compression rate of the at least one first data block; and store the at least one first data block into the data area of the storage disk according to the compression rate.
The storage disk 12 and the processing module 11 are deployed under the corresponding storage node 1. The storage system provided in this embodiment is a cloud storage system; accordingly, the storage node 1 may be a cloud server, and the storage disk 12 may be a cloud disk, more specifically an encrypted cloud disk. The processing module 11 is a functional module with data processing and computing capabilities, such as a CPU or a microprocessor. One or more (two or more) storage disks 12 may be deployed under one storage node 1; fig. 3 shows only one as an example.
Further, the storage system provided in this embodiment may further include: a terminal node (not shown in fig. 3, see fig. 1a);
the terminal node is configured to send a write request to the processing module, where the write request carries a disk identifier and a first data block to be stored;
the processing module 11 is further configured to receive the write request; compressing and encrypting the first data block carried in the writing request to obtain a compressed and encrypted first data block; and storing the compressed and encrypted first data block into a log area in a storage disk corresponding to the disk identifier.
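The write path handled by the processing module can be sketched as follows. This is an assumption-laden illustration: `StorageNode`, `handle_write` and the record layout are hypothetical, `zlib` stands in for the node's compression algorithm, and `xor_cipher` is a toy placeholder for real encryption. The pre-compression and post-compression sizes are recorded with each log entry so that the compression rate can later be determined without decrypting the payload.

```python
import zlib
from collections import defaultdict


def xor_cipher(data: bytes, key: bytes) -> bytes:
    """Toy symmetric cipher, a placeholder for real encryption (not secure)."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


class StorageNode:
    def __init__(self, key: bytes):
        self.key = key
        # disk identifier -> list of records in that disk's log area
        self.log_areas = defaultdict(list)

    def handle_write(self, disk_id: str, offset: int, data: bytes) -> dict:
        """Compress and encrypt one first data block, then append it to the log."""
        compressed = zlib.compress(data)
        record = {
            "offset": offset,                # logical address of the write
            "pre_size": len(data),           # size before compression
            "post_size": len(compressed),    # size after compression
            "payload": xor_cipher(compressed, self.key),
        }
        self.log_areas[disk_id].append(record)
        return record


node = StorageNode(b"demo-key")
rec = node.handle_write("disk-0", 0, b"x" * 4096)
print(rec["pre_size"], rec["post_size"] < rec["pre_size"])
```

Keeping the sizes as metadata is one possible design; the original text only states that the data information of a first data block includes its pre-compression and post-compression sizes.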
In a specific implementation, the terminal node is a compute access node, which may be, but is not limited to, a smart phone, a tablet computer, a desktop computer, an intelligent wearable device (such as a smart bracelet), and the like. Referring to fig. 1a, a client module may be deployed on the terminal node. The client module is the data access port that the storage node provides to users, and all data input to and output from the client module is plaintext data. In a specific implementation, the client module may be, but is not limited to, application software such as an office application or a video application that provides data read and write functions. Accordingly, the write request may be sent by the user through the client module deployed on the terminal node, and the data type of the data block carried in the write request may be, but is not limited to, text data, audio/video data, picture data, and the like. In addition, an operating system, virtualization software (KVM, Kernel-based Virtual Machine), a data block storage terminal module, and the like may also be deployed on the terminal node, which is not limited herein. For specific descriptions of the operating system, the virtualization software (KVM), the data block storage terminal module, and the like in the virtual machine described above, reference may be made to existing related content or to related content elsewhere in the present application.
It should be noted here that, for details of each part of the storage system provided in this embodiment, reference may be made to the corresponding content in the other embodiments, which is not repeated here. In addition, the storage system provided in this embodiment may further implement some or all of the other steps in the foregoing embodiments; for details, refer to the corresponding content of the foregoing embodiments.
Yet another embodiment of the present application provides a data storage device corresponding to the above method embodiment, which may be deployed in the processing module 11 shown in fig. 3. Fig. 4 shows a schematic structure of the data storage device. Referring to fig. 4, the data storage device provided in this embodiment includes: an obtaining module 21, a determining module 22 and a storing module 23; wherein,
an obtaining module 21, configured to obtain at least one first data block after being compressed and encrypted from a log area of a storage disk;
a determining module 22 for determining a compression rate of the at least one first data block;
a storing module 23, configured to store the at least one first data block in a data area of the storage disk according to the compression rate.
Further, the storing module 23 is specifically configured to, when storing the at least one first data block into the data area of the storage disk according to the compression ratio: comparing the compression rate of the at least one first data block with a preset threshold; and storing the at least one first data block into a data area of the storage disk according to the comparison result.
Further, the number of the first data blocks is a plurality of; and correspondingly, the above-mentioned storing module 23, when used for storing the multiple first data blocks into the data area of the storage disk according to the comparison result, specifically is used for: processing a first data block of the plurality of first data blocks, the compression rate of which is greater than the preset threshold value, to generate at least one second data block which is compressed and encrypted and enables the compression rate to be smaller than or equal to the preset threshold value; storing said compressed and encrypted at least one second data block in said data area; and the first data blocks with the compression rate smaller than or equal to the preset threshold value in the plurality of first data blocks are not processed and directly stored in the data area.
Further, the above-mentioned storing module 23, when configured to process a first data block with a compression rate greater than the preset threshold value among the plurality of first data blocks to generate at least one compressed and encrypted second data block with a compression rate smaller than or equal to the preset threshold value, is specifically configured to: decrypting and decompressing each first data block with the compression rate larger than the preset threshold value to obtain each first data block after decryption and decompression; obtaining the compression granularity corresponding to the data area; integrating each first data block after decryption and decompression into at least one second data block based on the compression granularity; and carrying out compression encryption on the at least one second data block to obtain at least one compressed and encrypted second data block.
Further, the determining module 22 is specifically configured to, when configured to determine the compression rate of the at least one first data block: acquiring respective data information of the at least one first data block; wherein, the data information comprises the pre-compression size and the post-compression size of the corresponding first data block; and determining the compression rate of the at least one first data block according to the respective pre-compression size and post-compression size of the at least one first data block.
Further, the determining module 22 is specifically configured to, when determining the compression rate of the at least one first data block according to the respective pre-compression size and post-compression size of the at least one first data block: calculating the ratio of the post-compression size to the pre-compression size of each of the at least one first data block; and determining the ratio corresponding to each of the at least one first data block as the compression rate of each of the at least one first data block.
Further, the obtaining module 21 is specifically configured to, when obtaining the compressed and encrypted at least one first data block from a log storage area of a storage disk: detecting whether at least one first data block stored in the log area after compression and encryption meets a transfer condition or not; and when the transfer condition is met, reading the at least one first data block from the log area.
Further, the apparatus provided in this embodiment may further include: the receiving module is used for receiving a writing request sent by the terminal node, wherein the writing request carries a disk identifier and a first data block to be stored; the compression encryption module is used for carrying out compression encryption on the first data block to obtain a compressed and encrypted first data block; the storing module 23 is further configured to store the compressed and encrypted first data block in a log area in a storage disk corresponding to the disk identifier.
What needs to be explained here is: the data storage device provided in this embodiment may implement the technical solution described in the data storage method embodiment shown in fig. 2, and the specific implementation principle of each module or unit may refer to the corresponding content in the data storage method embodiment shown in fig. 2, which is not described herein.
Yet another embodiment of the present application further provides a storage node, and the structure of the storage node may be shown in fig. 3. As shown in fig. 3, the storage node 1 provided in this embodiment includes: a storage disk 12, a processing module 11 and a memory (not shown). Wherein the memory is for storing one or more computer programs/instructions; the processing module is coupled to the memory, and is configured to execute one or more computer programs/instructions stored in the memory, so as to implement the steps or functions in the data storage method provided in the embodiment of the present application.
The memory may be implemented by any type or combination of volatile or nonvolatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disk.
Further, the storage node may be a cloud server. For a specific description of the storage node, the processing module, and the storage disk, reference may be made to the relevant content of the other embodiments.
Accordingly, the present embodiments also provide a computer readable storage medium having stored thereon a computer program/instruction which, when executed by a computer (more specifically, a processing module 11 as shown in fig. 3 described above), is capable of implementing the data storage method steps or functions provided by the embodiments of the present application.
The methods in this application may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, the methods may be implemented in whole or in part in the form of a computer program product. Fig. 5 schematically shows a block diagram of a computer program product provided by the present application. The computer program product comprises a computer program/instructions 31 which, when executed by a processor such as the processing module 11 shown in fig. 3, may fully or partially perform the processes or functions of the data storage methods provided by the embodiments of the present application. The computer may be a general purpose computer, a special purpose computer, a computer network, a network device, a user device, a core network device, an OAM, or other programmable apparatus.
The computer program or instructions may be stored in a computer readable storage medium or transmitted from one computer readable storage medium to another computer readable storage medium; for example, the computer program or instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired or wireless means. The computer readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or data center that integrates one or more available media. The usable medium may be a magnetic medium, e.g., a floppy disk, hard disk, or tape; an optical medium, such as a digital video disc; or a semiconductor medium, such as a solid state disk. The computer readable storage medium may be a volatile or nonvolatile storage medium, or may include both volatile and nonvolatile types of storage medium.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus necessary general hardware platforms, or of course may be implemented by means of hardware. Based on this understanding, the foregoing technical solution may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a computer readable storage medium, such as ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method described in the respective embodiments or some parts of the embodiments.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present application, and are not limiting thereof; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the corresponding technical solutions.

Claims (11)

1. A data storage method, comprising:
acquiring at least one first data block after compression and encryption from a log area of a storage disk;
determining a compression rate of the at least one first data block;
and storing the at least one first data block into a data area of the storage disk according to the compression rate.
2. The method of claim 1, storing the at least one first data block in a data area of the storage disk according to the compression ratio, comprising:
comparing the compression rate of the at least one first data block with a preset threshold;
and storing the at least one first data block into a data area of the storage disk according to the comparison result.
3. The method of claim 2, the number of first data blocks being a plurality; and
storing the plurality of first data blocks into a data area of the storage disk according to the comparison result, wherein the method comprises the following steps:
processing a first data block of the plurality of first data blocks, the compression rate of which is greater than the preset threshold value, to generate at least one second data block which is compressed and encrypted and enables the compression rate to be smaller than or equal to the preset threshold value; storing said compressed and encrypted at least one second data block in said data area;
and the first data blocks with the compression rate smaller than or equal to the preset threshold value in the plurality of first data blocks are not processed and directly stored in the data area.
4. A method according to claim 3, processing a first data block of the plurality of first data blocks having a compression rate greater than the preset threshold to generate at least one second data block of compressed encryption having a compression rate less than or equal to the preset threshold, comprising:
decrypting and decompressing each first data block with the compression rate larger than the preset threshold value to obtain each first data block after decryption and decompression;
obtaining the compression granularity corresponding to the data area;
Integrating each first data block after decryption and decompression into at least one second data block based on the compression granularity;
and carrying out compression encryption on the at least one second data block to obtain at least one compressed and encrypted second data block.
5. The method of any of claims 1 to 4, determining a compression rate of the at least one first data block, comprising:
acquiring respective data information of the at least one first data block; wherein, the data information comprises the pre-compression size and the post-compression size of the corresponding first data block;
and determining the compression rate of the at least one first data block according to the respective pre-compression size and post-compression size of the at least one first data block.
6. The method of claim 5, determining the compression rate of the at least one first data block based on the respective pre-compression and post-compression sizes of the at least one first data block, comprising:
calculating the ratio of the compressed size to the pre-compressed size of each of the at least one first data block;
and determining the ratio corresponding to each of the at least one first data block as the compression rate of each of the at least one first data block.
7. The method according to any one of claims 1 to 4, wherein the obtaining the compressed and encrypted at least one first data block from a log storage area of a storage disk comprises:
detecting whether at least one first data block stored in the log area after compression and encryption meets a transfer condition or not;
and when the transfer condition is met, reading the at least one first data block from the log area.
8. The method of claim 7, further comprising:
receiving a writing request sent by a terminal node, wherein the writing request carries a disk identifier and a first data block to be stored;
compressing and encrypting the first data block to obtain a compressed and encrypted first data block;
and storing the compressed and encrypted first data block into a log area in a storage disk corresponding to the disk identifier.
9. A storage system, comprising:
a storage disk including a log area and a data area;
a processing module for implementing the steps in the data storage method according to any one of claims 1 to 8.
10. A storage node, comprising: a storage disk, a processing module and a memory, wherein,
the memory is used for storing a computing program;
The processing module, coupled to the memory, for executing the computing program stored in the memory for implementing the steps in the data storage method according to any one of claims 1 to 8.
11. A computer readable storage medium having stored thereon a computer program/instruction which when executed is capable of implementing the steps in the data storage method of any of claims 1 to 8.
CN202310182167.3A 2023-02-16 2023-02-16 Data storage method, system, storage node and computer readable storage medium Pending CN116166197A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310182167.3A CN116166197A (en) 2023-02-16 2023-02-16 Data storage method, system, storage node and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310182167.3A CN116166197A (en) 2023-02-16 2023-02-16 Data storage method, system, storage node and computer readable storage medium

Publications (1)

Publication Number Publication Date
CN116166197A true CN116166197A (en) 2023-05-26

Family

ID=86419839

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310182167.3A Pending CN116166197A (en) 2023-02-16 2023-02-16 Data storage method, system, storage node and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN116166197A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117194355A (en) * 2023-11-08 2023-12-08 本原数据(北京)信息技术有限公司 Data processing method and device based on database and electronic equipment
CN117201501A (en) * 2023-09-15 2023-12-08 武汉鲸禾科技有限公司 Intelligent engineering sharing management system and operation method

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117201501A (en) * 2023-09-15 2023-12-08 武汉鲸禾科技有限公司 Intelligent engineering sharing management system and operation method
CN117201501B (en) * 2023-09-15 2024-03-26 武汉鲸禾科技有限公司 Intelligent engineering sharing management system and operation method
CN117194355A (en) * 2023-11-08 2023-12-08 本原数据(北京)信息技术有限公司 Data processing method and device based on database and electronic equipment
CN117194355B (en) * 2023-11-08 2024-02-13 本原数据(北京)信息技术有限公司 Data processing method and device based on database and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination