CN111399768A

CN111399768A - Data storage method, system, equipment and computer readable storage medium

Info

Publication number: CN111399768A
Application number: CN202010110907.9A
Authority: CN
Inventors: Yue Bin (岳斌)
Original assignee: Suzhou Inspur Intelligent Technology Co Ltd
Current assignee: Suzhou Inspur Intelligent Technology Co Ltd
Priority date: 2020-02-21
Filing date: 2020-02-21
Publication date: 2020-07-10

Abstract

The application discloses a data storage method, which comprises the following steps: after data to be written is received, splitting the data to be written according to a preset granularity value, starting from the initial position of the data to be written; deduplicating the split data to be written; and storing the deduplicated data. By applying the scheme of the application, the deduplication rate can be effectively improved, which in turn improves data storage performance. The application also provides a data storage system, a data storage device and a computer-readable storage medium, which have corresponding technical effects.

Description

Data storage method, system, equipment and computer readable storage medium

Technical Field

The present invention relates to the field of storage technologies, and in particular, to a method, a system, a device, and a computer-readable storage medium for storing data.

Background

In the field of storage, querying and storing massive amounts of data consumes huge resources, and the data contains a large amount of duplicate data. Therefore, to reduce the resources occupied by stored data and improve storage performance, duplicate data is stored only once on the storage medium. This reduces the amount of data stored on disk without affecting data consistency, and is known as deduplication, i.e., the deletion of duplicate data.

In the current scheme, deduplication is performed after the data is split into fixed-length grains aligned to logical addresses. Taking a 256k grain as an example, splitting in logical-address-aligned mode means that every logical address that is an integer multiple of 256k is the start of a grain, and the 256k of data following that address forms one grain. However, this splitting approach only guarantees that grain boundaries are aligned in the logical address space. In the situation shown in FIG. 1, two identical pieces of data are written into the storage system one after the other, but the logical addresses to which they are written are not necessarily aligned to integer multiples of 256k. In this case, deduplication cannot be achieved even though the two pieces of data are identical. Specifically, during the first write, the start of the logical address range is an integer multiple of the grain, so the io is processed as a single io within the storage volume. During the second write, the start address is not an integer multiple of the grain, so when the storage volume splits the io in grain-aligned mode, a complete 256k io is split into two ios. The resulting pieces do not match the grain produced by the first write, so deduplication cannot be achieved, which lowers the deduplication rate of the system and prevents any improvement in data storage performance.
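For illustration only, the conventional logical-address-aligned splitting described above can be sketched in Python. The sketch assumes a 256k grain, a second write misaligned by 4k, and SHA-256 digests as chunk fingerprints; the names `split_aligned` and `fingerprints` are hypothetical and do not come from the original scheme. It shows that the same 256k of content yields one grain when written at an aligned address but two partial grains when written at a misaligned one, so no fingerprint matches.

```python
import hashlib

GRAIN = 256 * 1024  # 256k grain, as in the example above (assumed unit: bytes)


def split_aligned(logical_offset: int, data: bytes, grain: int = GRAIN) -> list[bytes]:
    """Conventional split: boundaries fall on integer multiples of `grain`
    in the logical address space, wherever the write happens to start."""
    chunks = []
    pos = 0
    while pos < len(data):
        addr = logical_offset + pos
        room = grain - (addr % grain)  # bytes left until the next aligned boundary
        chunks.append(data[pos:pos + room])
        pos += room
    return chunks


def fingerprints(chunks: list[bytes]) -> set[str]:
    """Hypothetical fingerprint: SHA-256 hex digest of each chunk."""
    return {hashlib.sha256(c).hexdigest() for c in chunks}


payload = bytes(range(256)) * 1024           # 256 KiB, identical for both writes
first = split_aligned(0, payload)             # aligned start: one 256k grain
second = split_aligned(4 * 1024, payload)     # misaligned start: split into two ios

print(len(first), len(second))                        # 1 2
print(fingerprints(first) & fingerprints(second))     # set(): nothing to deduplicate
```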

In summary, how to effectively improve the deduplication rate to improve the performance of data storage is a technical problem that needs to be solved urgently by those skilled in the art.

Disclosure of Invention

The invention aims to provide a data storage method, a system, equipment and a computer readable storage medium, which can effectively improve the deduplication rate to improve the performance of data storage.

In order to solve the technical problems, the invention provides the following technical scheme:

a method of storing data, comprising:

after receiving data to be written, splitting the data to be written according to a preset granularity value from the initial position of the data to be written;

deduplicating the split data to be written; and

storing the deduplicated data.

Preferably, the method further comprises the following steps:

receiving a granularity value modification instruction; and

adjusting the preset granularity value to the value carried in the granularity value modification instruction.

Preferably, the value carried in the granularity value modification instruction is a value determined by:

computing the average size of the data to be written received within a first time period, and determining, based on the average size, a corresponding value as the value carried in the granularity value modification instruction.

Preferably, the average size is positively correlated to the determined value.

Preferably, after storing the deduplicated data, the method further includes:

outputting prompt information indicating that the storage is complete.

A system for storing data, comprising:

a data splitting module, configured to, after data to be written is received, split the data to be written according to a preset granularity value starting from the initial position of the data to be written;

a data deduplication module, configured to deduplicate the split data to be written; and

a data storage module, configured to store the deduplicated data.

Preferably, the system further comprises:

a granularity value modification instruction receiving module, configured to receive a granularity value modification instruction; and

a granularity value adjusting module, configured to adjust the preset granularity value to the value carried in the granularity value modification instruction.

Preferably, the system further comprises:

a prompt information output module, configured to output, after the data storage module stores the deduplicated data, prompt information indicating that the storage is complete.

A storage device for data, comprising:

a memory for storing a computer program;

a processor for executing the computer program to implement the steps of the method of storing data as claimed in any one of the above.

A computer-readable storage medium, having stored thereon a computer program which, when executed by a processor, implements the steps of a method of storing data as set forth in any one of the above.

By applying the technical solution provided by the embodiments of the invention, data to be written is not split in logical-address-aligned mode; instead, it is split according to the preset granularity value starting from its own initial position. This avoids the reduction in the deduplication rate that the conventional scheme suffers when written logical addresses are not aligned: if two writes carry the same data content, they are split into identical pieces regardless of their logical addresses, so the case in which the conventional scheme fails to deduplicate them does not occur. The scheme of the application can therefore effectively improve the deduplication rate, which in turn helps improve data storage performance.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.

FIG. 1 is a diagram illustrating data splitting according to logical address alignment in a conventional scheme;

FIG. 2 is a flow chart of an embodiment of a data storage method according to the present invention;

FIG. 3 is a schematic diagram of a data storage system according to the present invention;

FIG. 4 is a schematic structural diagram of a data storage device according to the present invention.

Detailed Description

The core of the invention is to provide a data storage method, which can effectively improve the deduplication rate and is further beneficial to improving the performance of data storage.

In order that those skilled in the art will better understand the disclosure, the invention will be described in further detail with reference to the accompanying drawings and specific embodiments. It is to be understood that the described embodiments are merely exemplary of the invention, and not restrictive of the full scope of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

Referring to fig. 2, fig. 2 is a flowchart illustrating an implementation of a data storage method according to the present invention, where the data storage method may include the following steps:

Step S101: after the data to be written is received, split it according to a preset granularity value, starting from the initial position of the data to be written.

Specifically, the data to be written issued by the host may be received. When the application splits the received data, it splits the data according to the preset granularity value starting from the initial position of the data to be written. For example, with a preset granularity value of 8k, suppose the data to be written is represented as XXXXYYYYZZZZ, where each of X, Y and Z represents 8k of data. Regardless of the logical address of this data, splitting according to the principle of the application gives a uniquely determined result: the data is split into 8k pieces X, Y and Z, i.e., four X pieces, four Y pieces and four Z pieces are obtained.
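A minimal Python sketch of step S101, assuming the data to be written is available as a byte string and that a final piece shorter than the granularity value is kept as-is (the text does not say how a remainder is handled). The function name `split_from_start` is hypothetical. Because splitting starts at offset 0 of the data itself and no logical address is involved, the result for XXXXYYYYZZZZ is the same wherever the data is written.

```python
GRANULARITY = 8 * 1024  # preset granularity value of 8k, as in the example


def split_from_start(data: bytes, granularity: int = GRANULARITY) -> list[bytes]:
    """Split from the initial position of the data; the logical address is not used.
    Assumption: a trailing remainder shorter than `granularity` becomes the last piece."""
    if granularity <= 0:
        raise ValueError("granularity must be positive")
    return [data[i:i + granularity] for i in range(0, len(data), granularity)]


X = b"X" * GRANULARITY
Y = b"Y" * GRANULARITY
Z = b"Z" * GRANULARITY
data = X * 4 + Y * 4 + Z * 4   # XXXXYYYYZZZZ, 96k in total

chunks = split_from_start(data)
print(len(chunks))                                         # 12
print(chunks.count(X), chunks.count(Y), chunks.count(Z))   # 4 4 4
```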

Step S102: deduplicate the split data to be written.

After splitting, the duplicate data may be deleted. In the above example, four X, four Y and four Z pieces are obtained, and only one X, one Y and one Z are retained when step S102 is performed, which effectively reduces the resources occupied by the stored data. Since only one X, one Y and one Z are stored while the data before deduplication is XXXXYYYYZZZZ, mapping information generally needs to be stored in a corresponding database so that the original data can be reconstructed.
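Step S102 and the mapping information can be sketched as follows. The choice of mechanism is an illustrative assumption, not one specified by the scheme: pieces are identified by SHA-256 fingerprints (a production system might additionally compare bytes to guard against collisions), and a Python list of fingerprints stands in for the mapping information kept in the database. The names `deduplicate` and `restore` are hypothetical.

```python
import hashlib


def deduplicate(chunks: list[bytes]) -> tuple[dict[str, bytes], list[str]]:
    """Keep one copy of each distinct piece and record the original order."""
    unique: dict[str, bytes] = {}   # fingerprint -> piece, retained once
    mapping: list[str] = []         # i-th entry: fingerprint of the i-th original piece
    for chunk in chunks:
        fp = hashlib.sha256(chunk).hexdigest()
        unique.setdefault(fp, chunk)   # only the first occurrence is kept
        mapping.append(fp)
    return unique, mapping


def restore(unique: dict[str, bytes], mapping: list[str]) -> bytes:
    """Rebuild the data before deduplication from the mapping information."""
    return b"".join(unique[fp] for fp in mapping)


G = 8 * 1024
chunks = [b"X" * G] * 4 + [b"Y" * G] * 4 + [b"Z" * G] * 4   # XXXXYYYYZZZZ
unique, mapping = deduplicate(chunks)
print(len(unique), len(mapping))   # 3 12
```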

Step S103: store the deduplicated data.

Because the deduplicated data, rather than the original data to be written, is persisted to disk, the storage space occupied by the stored data is reduced.
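Putting steps S101 to S103 together, the toy store below (a hypothetical `ChunkStore` class, with Python dictionaries standing in for the disk and the mapping database) writes the same XXXXYYYYZZZZ data twice, the second time at a misaligned logical address. The sketch assumes 8k granularity and keys mappings by the write's logical address, a simplification that ignores overwrites, reference counting and space reclamation. Only three 8k pieces end up stored.

```python
import hashlib


class ChunkStore:
    """Toy stand-in for the disk and the mapping database (hypothetical)."""

    def __init__(self, granularity: int = 8 * 1024) -> None:
        self.granularity = granularity
        self.chunks: dict[str, bytes] = {}         # "disk": fingerprint -> piece
        self.mappings: dict[int, list[str]] = {}   # "database": logical address -> fingerprints

    def write(self, logical_addr: int, data: bytes) -> None:
        g = self.granularity
        # S101: split from the start of the data; logical_addr plays no part.
        pieces = [data[i:i + g] for i in range(0, len(data), g)]
        fps = []
        for piece in pieces:
            fp = hashlib.sha256(piece).hexdigest()
            # S102 + S103: a piece is written to "disk" only if not already stored.
            if fp not in self.chunks:
                self.chunks[fp] = piece
            fps.append(fp)
        self.mappings[logical_addr] = fps

    def read(self, logical_addr: int) -> bytes:
        return b"".join(self.chunks[fp] for fp in self.mappings[logical_addr])

    def stored_bytes(self) -> int:
        return sum(len(c) for c in self.chunks.values())


payload = b"X" * 32768 + b"Y" * 32768 + b"Z" * 32768   # XXXXYYYYZZZZ at 8k
store = ChunkStore()
store.write(0, payload)
store.write(4096, payload)      # same content at a misaligned logical address
print(store.stored_bytes())     # 24576: one 8k copy each of X, Y and Z
```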

Further, in an embodiment of the present invention, the granularity of splitting may be adjusted manually by a user, and specifically, the method may further include the following two steps:

Step one: receiving a granularity value modification instruction;

Step two: adjusting the preset granularity value to the value carried in the granularity value modification instruction.

In practical applications, a default value is usually preset as the granularity value used for splitting. The scheme of the application further takes into account that service conditions change constantly, so the originally configured granularity value may no longer meet current service requirements. A user can therefore send a granularity value modification instruction through an input device based on actual needs and accumulated experience. After the granularity value modification instruction sent by the user is received, the preset granularity value is adjusted to the value carried in the instruction. In this way, the scheme supports manual adjustment of the splitting granularity by the user, which helps further improve the deduplication rate under current conditions.
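The two manual-adjustment steps can be sketched as a small handler. The instruction format (`GranularityModifyInstruction`), the `Splitter` class, the 8k default and the check that the value is a positive multiple of 4k are all assumptions made for illustration; the text only requires that the preset value be replaced by the value carried in the instruction.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GranularityModifyInstruction:
    """Hypothetical form of the granularity value modification instruction."""
    value: int  # new granularity value in bytes


class Splitter:
    DEFAULT_GRANULARITY = 8 * 1024  # assumed preset default

    def __init__(self) -> None:
        self.granularity = self.DEFAULT_GRANULARITY

    def on_modify_instruction(self, instr: GranularityModifyInstruction) -> None:
        # Step one: the instruction has been received.
        # Assumed sanity check: positive and a multiple of 4k.
        if instr.value <= 0 or instr.value % 4096 != 0:
            raise ValueError(f"unsupported granularity: {instr.value}")
        # Step two: adjust the preset granularity value to the carried value.
        self.granularity = instr.value

    def split(self, data: bytes) -> list[bytes]:
        g = self.granularity
        return [data[i:i + g] for i in range(0, len(data), g)]


s = Splitter()
s.on_modify_instruction(GranularityModifyInstruction(value=16 * 1024))
print(s.granularity)                     # 16384
print(len(s.split(b"\0" * 64 * 1024)))   # 4
```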

Of course, instead of adjusting the splitting granularity manually, a corresponding policy may be set to adjust the granularity value automatically. For example, the granularity value modification instruction may be generated automatically at regular intervals, and the value carried in it may be determined as follows:

counting the average size of each piece of data to be written received within a first time period, and determining, based on the average size, a corresponding value as the value carried in the granularity value modification instruction.

In this embodiment, because the sizes of individual writes can differ greatly, the average size of the data to be written received each time within the first time period is counted. This average reflects the data being written, and therefore the service conditions, during the current period. The specific length of the first time period can be set and adjusted according to actual needs without affecting the implementation of the invention.

When the value carried in the granularity value modification instruction is determined from the average size, the granularity value should be reduced appropriately when the data to be written is small, to avoid the situation in which an overly large granularity yields little duplicate data and a low deduplication rate. Correspondingly, when the data to be written is large, the granularity value can be increased appropriately. That is, the average size may be positively correlated with the determined value.
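One possible automatic policy is sketched below under stated assumptions: the first time period is modeled as a sliding window of `window_seconds`, candidate granularity values are the powers of two from 4k to 256k, and the chosen value is the largest candidate not exceeding the average write size. The class `GranularityAutoAdjuster` and this mapping are hypothetical; any non-decreasing mapping would satisfy the positive correlation described above. Timestamps are passed in explicitly so the example is deterministic.

```python
from collections import deque

CANDIDATES = [4096 * 2**k for k in range(7)]   # 4k, 8k, ..., 256k (assumed range)


class GranularityAutoAdjuster:
    """Hypothetical sketch: average write size over a sliding window,
    mapped monotonically to a granularity value."""

    def __init__(self, window_seconds: float = 60.0) -> None:
        self.window = window_seconds
        self.samples: deque[tuple[float, int]] = deque()   # (timestamp, write size)

    def record_write(self, now: float, size: int) -> None:
        self.samples.append((now, size))
        self._evict(now)

    def _evict(self, now: float) -> None:
        # Drop writes that fall outside the first time period (the window).
        while self.samples and now - self.samples[0][0] > self.window:
            self.samples.popleft()

    def average_size(self, now: float) -> float:
        self._evict(now)
        if not self.samples:
            return 0.0
        return sum(size for _, size in self.samples) / len(self.samples)

    def propose_granularity(self, now: float) -> int:
        avg = self.average_size(now)
        # Largest candidate not exceeding the average, never below the smallest:
        # non-decreasing in avg, i.e. positively correlated with it.
        chosen = CANDIDATES[0]
        for c in CANDIDATES:
            if c <= avg:
                chosen = c
        return chosen


adj = GranularityAutoAdjuster(window_seconds=60)
for t, size in [(0, 4096), (10, 12288), (20, 8192)]:
    adj.record_write(t, size)
g1 = adj.propose_granularity(now=30)   # average 8192 bytes -> 8192
adj.record_write(40, 1 << 20)          # one 1 MiB write raises the average
g2 = adj.propose_granularity(now=45)   # average 268288 bytes -> 262144
print(g1, g2)                          # 8192 262144
```

In a full system, the proposed value would be carried in a periodically generated granularity value modification instruction rather than applied directly.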

In an embodiment of the present invention, after step S103, the method may further include: outputting prompt information indicating that the storage is complete, so that the user receives feedback.

Corresponding to the above method embodiments, the embodiments of the present invention further provide a data storage system, which may be cross-referenced with the method described above.

Referring to FIG. 3, which is a schematic structural diagram of a data storage system according to the present invention, the system may include:

a data splitting module 301, configured to, after data to be written is received, split the data to be written according to a preset granularity value starting from the initial position of the data to be written;

a data deduplication module 302, configured to deduplicate the split data to be written; and

a data storage module 303, configured to store the deduplicated data.

In one embodiment of the present invention, the system further comprises:

a granularity value modification instruction receiving module, configured to receive a granularity value modification instruction; and

a granularity value adjusting module, configured to adjust the preset granularity value to the value carried in the granularity value modification instruction.

In a specific embodiment of the present invention, the value carried in the granularity value modification instruction is determined by a granularity value automatic adjustment module:

the granularity value automatic adjustment module is configured to count the average size of each piece of data to be written received within the first time period, and to determine, based on the average size, a corresponding value as the value carried in the granularity value modification instruction.

In one embodiment of the invention, the average size is positively correlated with the determined value.

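In one embodiment of the present invention, the system further comprises:

a prompt information output module, configured to output, after the data storage module stores the deduplicated data, prompt information indicating that the storage is complete.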

Corresponding to the above method and system embodiments, the present invention also provides a data storage device and a computer-readable storage medium, which may be cross-referenced with the above. The computer-readable storage medium has stored thereon a computer program which, when executed by a processor, implements the steps of the method of storing data as in any of the embodiments described above. A computer-readable storage medium as referred to herein may include Random Access Memory (RAM), memory, Read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.

Referring to fig. 4, the data storage device may include:

a memory 401 for storing a computer program;

a processor 402 for executing a computer program to implement the steps of the method of storing data as in any of the embodiments described above.

It is further noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.

Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention. The principle and the implementation of the present invention are explained in the present application by using specific examples, and the above description of the embodiments is only used to help understanding the technical solution and the core idea of the present invention. It should be noted that, for those skilled in the art, it is possible to make various improvements and modifications to the present invention without departing from the principle of the present invention, and those improvements and modifications also fall within the scope of the claims of the present invention.

Claims

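1. A method for storing data, comprising:

after receiving data to be written, splitting the data to be written according to a preset granularity value, starting from the initial position of the data to be written;

deduplicating the split data to be written; and

storing the deduplicated data.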

2. The data storage method according to claim 1, further comprising:

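receiving a granularity value modification instruction; and

adjusting the preset granularity value to the value carried in the granularity value modification instruction.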

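3. The method according to claim 2, wherein the value carried in the granularity value modification instruction is a value determined by:

counting the average size of the data to be written received within a first time period, and determining, based on the average size, a corresponding value as the value carried in the granularity value modification instruction.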

4. The method of claim 3, wherein the average size is positively correlated to the determined value.

5. The method for storing data according to claim 1, further comprising, after storing the deduplicated data:

outputting prompt information indicating that the storage is complete.

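6. A system for storing data, comprising:

a data splitting module, configured to, after data to be written is received, split the data to be written according to a preset granularity value starting from the initial position of the data to be written;

a data deduplication module, configured to deduplicate the split data to be written; and

a data storage module, configured to store the deduplicated data.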

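7. The data storage system of claim 6, further comprising:

a granularity value modification instruction receiving module, configured to receive a granularity value modification instruction; and

a granularity value adjusting module, configured to adjust the preset granularity value to the value carried in the granularity value modification instruction.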

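8. The data storage system of claim 6, further comprising:

a prompt information output module, configured to output, after the data storage module stores the deduplicated data, prompt information indicating that the storage is complete.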

9. A device for storing data, comprising:

a memory for storing a computer program;

a processor for executing the computer program to implement the steps of the method of storing data according to any one of claims 1 to 5.

10. A computer-readable storage medium, characterized in that a computer program is stored thereon, which computer program, when being executed by a processor, carries out the steps of the method of storing data according to any one of claims 1 to 5.