CN111399768A - Data storage method, system, equipment and computer readable storage medium - Google Patents

Data storage method, system, equipment and computer readable storage medium

Info

Publication number
CN111399768A
Authority
CN
China
Prior art keywords
data
written
value
deduplication
storing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010110907.9A
Other languages
Chinese (zh)
Inventor
岳斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Inspur Intelligent Technology Co Ltd
Original Assignee
Suzhou Inspur Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Inspur Intelligent Technology Co Ltd

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0628Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0638Organizing or formatting or addressing of data
    • G06F3/064Management of blocks
    • G06F3/0641De-duplication techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0602Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0608Saving storage space on storage systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a data storage method comprising the following steps: after receiving data to be written, splitting the data according to a preset granularity value, starting from the initial position of the data to be written; deduplicating the split data; and storing the deduplicated data. Applying the scheme of the application can effectively improve the deduplication rate and thereby the data storage performance. The application also provides a data storage system, a data storage device and a computer-readable storage medium, which have corresponding technical effects.

Description

Data storage method, system, equipment and computer readable storage medium
Technical Field
The present invention relates to the field of storage technologies, and in particular, to a method, a system, a device, and a computer-readable storage medium for storing data.
Background
In the storage field, querying and storing massive data consumes substantial resources, and such data contains a large amount of duplicate data. Therefore, to reduce the resources occupied by stored data and improve data storage performance, data deduplication is used: only one copy of duplicate data is kept on the storage medium, which reduces the amount of data stored on disk without affecting data consistency.
In the current scheme, deduplication is implemented by splitting data in a logical-address-aligned manner, using fixed-length data as the granularity (grain), and then deduplicating the result. Taking a grain of 256k as an example, splitting by logical address alignment means that every address that is an integer multiple of 256k is the start position of a grain, and the 256k of data that follows it forms one grain, so that the data is split into 256k pieces. However, this splitting approach only guarantees that the pieces are aligned to logical addresses. Consider the situation shown in FIG. 1: two identical pieces of data are written into the storage system one after the other, but the logical addresses to which they are written are not necessarily aligned to integer multiples of 256k. In this case, deduplication cannot be achieved even though the two pieces of data are identical. Specifically, in the first write the start logical address of the data is an integer multiple of the grain, so the storage volume processes the I/O as a single I/O. In the second write the start logical address is not an integer multiple of the grain, so when the storage volume splits the I/O in the grain-aligned manner, a complete 256k I/O is split into two I/Os. The split data therefore cannot be deduplicated, which lowers the deduplication rate of the system and does not help data storage performance.
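The failure described above can be reproduced with a minimal sketch. It is not taken from the patent: the function names, the use of SHA-256 fingerprints to detect duplicates, and the 4096-byte misalignment are all assumptions made for illustration, and the sketch ignores whatever other data would share a grain with the write in a real volume.

```python
import hashlib

GRAIN = 256 * 1024  # 256k grain, as in the example above


def split_aligned(lba: int, data: bytes, grain: int = GRAIN) -> list[bytes]:
    """Split a write at every logical address that is a multiple of `grain`."""
    chunks = []
    offset = 0
    while offset < len(data):
        # Bytes left before the next grain boundary in the logical address space.
        room = grain - ((lba + offset) % grain)
        chunks.append(data[offset:offset + room])
        offset += room
    return chunks


def fingerprints(chunks: list[bytes]) -> set[str]:
    # Assumption: duplicates are detected by comparing content hashes.
    return {hashlib.sha256(c).hexdigest() for c in chunks}


payload = bytes(range(256)) * 1024  # exactly 256k of data, written twice

first = split_aligned(lba=0, data=payload)      # start address is grain-aligned
second = split_aligned(lba=4096, data=payload)  # start address is not aligned

print(len(first), len(second))                      # 1 2
print(fingerprints(first) & fingerprints(second))   # set()
```

The same payload yields one chunk on the aligned write and two chunks on the unaligned write, and no fingerprint is shared between them, so nothing is deduplicated.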
In summary, how to effectively improve the deduplication rate to improve the performance of data storage is a technical problem that needs to be solved urgently by those skilled in the art.
Disclosure of Invention
The invention aims to provide a data storage method, a system, equipment and a computer readable storage medium, which can effectively improve the deduplication rate to improve the performance of data storage.
In order to solve the technical problems, the invention provides the following technical scheme:
a method of storing data, comprising:
after receiving data to be written, splitting the data to be written according to a preset granularity value from the initial position of the data to be written;
deduplicating the split data to be written;
and storing the data after the deduplication.
Preferably, the method further comprises the following steps:
receiving a granularity value modification instruction;
and adjusting the preset granularity value to the value carried in the granularity value modification instruction.
Preferably, the value carried in the granularity value modification instruction is a value determined by:
calculating the average size of the data to be written received within a first time period, and determining, based on the average size, the value to be carried in the granularity value modification instruction.
Preferably, the average size is positively correlated to the determined value.
Preferably, after storing the data after the deduplication, the method further includes:
and outputting prompt information indicating that the storage is finished.
A system for storing data, comprising:
the data splitting module is used for splitting the data to be written according to a preset granularity value from the initial position of the data to be written after the data to be written is received;
the data deduplication module is used for deduplication of the data to be written after the data to be written is split;
and the data storage module is used for storing the data after the deduplication.
Preferably, the system further comprises:
the granularity value modification instruction receiving module is used for receiving granularity value modification instructions;
and the granularity value adjusting module is used for adjusting the preset granularity value to the numerical value carried in the granularity value modification instruction.
Preferably, the system further comprises:
and the prompt information output module is used for outputting prompt information which shows that the storage is finished after the data storage module stores the data subjected to the deduplication.
A storage device for data, comprising:
a memory for storing a computer program;
a processor for executing the computer program to implement the steps of the method of storing data as claimed in any one of the above.
A computer-readable storage medium, having stored thereon a computer program which, when executed by a processor, implements the steps of a method of storing data as set forth in any one of the above.
By applying the technical scheme provided by the embodiments of the invention, data is not split in a logical-address-aligned manner; instead, the data to be written is split according to the preset granularity value starting from its initial position. This avoids the reduction in deduplication rate that occurs in the traditional scheme when the written logical addresses are not aligned. If the content of two writes is the same, the failure to deduplicate seen in the traditional scheme does not occur. The scheme of the application can therefore effectively improve the deduplication rate, which in turn helps improve data storage performance.
Drawings
To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. The drawings described below are only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a diagram illustrating data splitting according to logical address alignment in a conventional scheme;
FIG. 2 is a flow chart of an embodiment of a data storage method according to the present invention;
FIG. 3 is a schematic diagram of a data storage system according to the present invention;
FIG. 4 is a schematic structural diagram of a data storage device according to the present invention.
Detailed Description
The core of the invention is to provide a data storage method, which can effectively improve the deduplication rate and is further beneficial to improving the performance of data storage.
In order that those skilled in the art will better understand the disclosure, the invention will be described in further detail with reference to the accompanying drawings and specific embodiments. It is to be understood that the described embodiments are merely exemplary of the invention, and not restrictive of the full scope of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 2, fig. 2 is a flowchart illustrating an implementation of a data storage method according to the present invention, where the data storage method may include the following steps:
Step S101: after receiving the data to be written, splitting the data according to a preset granularity value, starting from the initial position of the data to be written.
Specifically, the data to be written issued by a host may be received. After it is received, the application splits it according to the preset granularity value, starting from the initial position of the data to be written. For example, with a preset granularity value of 8k, suppose the data to be written is represented as XXXXYYYYZZZZ, where X, Y and Z each represent 8k of data. Regardless of the logical address of the data, the result of splitting according to the principle of the present application is uniquely determined: in this example the data is split into X, Y and Z pieces, i.e. 4 X, 4 Y and 4 Z are obtained.
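Step S101 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: `split_from_start` is a hypothetical name, and the payload is built from three filler bytes standing in for the 8k blocks X, Y and Z of the example.

```python
def split_from_start(data: bytes, granularity: int) -> list[bytes]:
    """Split `data` into pieces of `granularity` bytes, counting from its first byte.

    The logical address of the write is deliberately not a parameter:
    the result depends only on the content and the granularity value.
    """
    if granularity <= 0:
        raise ValueError("granularity must be positive")
    return [data[i:i + granularity] for i in range(0, len(data), granularity)]


G = 8 * 1024  # preset granularity value of 8k, as in the example
X, Y, Z = b"x" * G, b"y" * G, b"z" * G
payload = X * 4 + Y * 4 + Z * 4

chunks = split_from_start(payload, G)
print(len(chunks))                                            # 12
print(chunks.count(X), chunks.count(Y), chunks.count(Z))      # 4 4 4
```

Because the split starts at the beginning of the data, writing the same payload to any logical address produces the same twelve pieces.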
Step S102: deduplicating the split data to be written.
After the splitting, duplicate data can be removed. In the above example, 4 X, 4 Y and 4 Z are obtained, and only 1 X, 1 Y and 1 Z are retained when step S102 is performed, which effectively reduces the resources occupied by the stored data. Of course, since only 1 X, 1 Y and 1 Z are stored while the data before deduplication is XXXXYYYYZZZZ, the corresponding mapping information generally needs to be stored in a database.
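The deduplication step and its mapping information can be sketched as below. The patent does not say how duplicates are identified or how the mapping is laid out, so the SHA-256 fingerprints, the in-memory dictionary standing in for the database, and the function names are all assumptions made for illustration.

```python
import hashlib


def deduplicate(chunks: list[bytes]) -> tuple[dict[str, bytes], list[str]]:
    """Keep one copy of each distinct chunk plus an ordered mapping to rebuild the data."""
    store: dict[str, bytes] = {}   # fingerprint -> the single retained copy
    mapping: list[str] = []        # fingerprints in original write order
    for chunk in chunks:
        key = hashlib.sha256(chunk).hexdigest()
        store.setdefault(key, chunk)  # only the first occurrence is stored
        mapping.append(key)
    return store, mapping


def restore(store: dict[str, bytes], mapping: list[str]) -> bytes:
    """Rebuild the original data from the retained chunks and the mapping."""
    return b"".join(store[key] for key in mapping)


G = 8 * 1024
X, Y, Z = b"x" * G, b"y" * G, b"z" * G
chunks = [X] * 4 + [Y] * 4 + [Z] * 4  # result of the split in the example

store, mapping = deduplicate(chunks)
print(len(store), len(mapping))                  # 3 12
print(restore(store, mapping) == b"".join(chunks))  # True
```

Three chunks are retained instead of twelve, and the twelve-entry mapping is what allows the original data to be read back.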
Step S103: storing the deduplicated data.
Because the deduplicated data, rather than the data to be written as received, is written to disk, the storage space occupied by the stored data is reduced.
Further, in an embodiment of the present invention, the granularity of splitting may be adjusted manually by a user, and specifically, the method may further include the following two steps:
Step one: receiving a granularity value modification instruction;
Step two: adjusting the preset granularity value to the value carried in the granularity value modification instruction.
In practical applications, a default value is usually preset as the granularity value used for splitting. The scheme of the application further takes into account that service conditions change constantly and that the originally set granularity value may no longer suit current service requirements, so a user can send a granularity value modification instruction through an input device, based on actual needs and experience. After the instruction is received, the preset granularity value is adjusted to the value carried in the instruction. The scheme thus supports manual adjustment of the splitting granularity, which helps further improve the deduplication rate under current conditions.
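The two steps can be sketched as a splitter whose granularity value is replaced when an instruction arrives. The class names, the instruction format and the 8k default are hypothetical; the patent only states that the instruction carries the new value.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GranularityModification:
    """Hypothetical instruction carrying the new granularity value in bytes."""
    value: int


class Splitter:
    def __init__(self, granularity: int = 8 * 1024) -> None:
        self.granularity = granularity  # preset default (assumed to be 8k here)

    def apply(self, instruction: GranularityModification) -> None:
        # Step two: adopt the value carried in the instruction.
        if instruction.value <= 0:
            raise ValueError("granularity value must be positive")
        self.granularity = instruction.value

    def split(self, data: bytes) -> list[bytes]:
        g = self.granularity
        return [data[i:i + g] for i in range(0, len(data), g)]


splitter = Splitter()
before = splitter.split(b"a" * 16 * 1024)
splitter.apply(GranularityModification(value=4 * 1024))  # step one: instruction received
after = splitter.split(b"a" * 16 * 1024)
print(len(before), len(after))  # 2 4
```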
Of course, instead of manually adjusting the granularity during splitting, a corresponding policy may be set to automatically adjust the granularity value. For example, the granularity value modification instruction may be automatically generated periodically, and the numerical value carried in the granularity value modification instruction may be a numerical value determined by:
calculating the average size of the data to be written received within a first time period, and determining, based on the average size, the value to be carried in the granularity value modification instruction.
In this embodiment, considering that the size of individual writes may vary widely, the average size of the writes received within the first time period is calculated to reflect the data being written in the current period, and thus the current service conditions. The specific length of the first time period may be set and adjusted according to actual needs without affecting the implementation of the present invention.
When determining the value based on the average size, the granularity value should be reduced appropriately when the data to be written is small, so as to avoid an overly large granularity that yields little duplicate data and a low deduplication rate. Correspondingly, when the data to be written is large, the granularity value can be increased appropriately. That is, the average size may be positively correlated with the determined value.
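One possible policy with this property is sketched below. The patent only requires that the value grow with the average size; the power-of-two rounding, the 4k and 256k bounds, and the target of roughly eight chunks per write are assumptions chosen for illustration, as is the function name. The list of sizes stands in for the writes observed during the first time period.

```python
def granularity_from_average(
    write_sizes: list[int],
    min_granularity: int = 4 * 1024,
    max_granularity: int = 256 * 1024,
    chunks_per_write: int = 8,
) -> int:
    """Pick a granularity value that does not decrease as the average write size grows."""
    if not write_sizes:
        raise ValueError("no writes were observed in the period")
    average = sum(write_sizes) / len(write_sizes)
    target = average / chunks_per_write  # aim for about `chunks_per_write` pieces per write
    granularity = min_granularity
    # Double the value while it stays within both the target and the upper bound.
    while granularity * 2 <= target and granularity * 2 <= max_granularity:
        granularity *= 2
    return granularity


small = granularity_from_average([32 * 1024] * 3)       # average write of 32k
medium = granularity_from_average([1024 * 1024])        # average write of 1M
large = granularity_from_average([64 * 1024 * 1024])    # average write of 64M
print(small, medium, large)  # 4096 131072 262144
```

The returned value would then be placed in an automatically generated granularity value modification instruction.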
In an embodiment of the present invention, after step S103, the method may further include: and outputting prompt information indicating that the storage is finished so that the user can receive feedback.
By applying the technical scheme provided by the embodiments of the invention, data is not split in a logical-address-aligned manner; instead, the data to be written is split according to the preset granularity value starting from its initial position. This avoids the reduction in deduplication rate that occurs in the traditional scheme when the written logical addresses are not aligned. If the content of two writes is the same, the failure to deduplicate seen in the traditional scheme does not occur. The scheme of the application can therefore effectively improve the deduplication rate, which in turn helps improve data storage performance.
Corresponding to the above method embodiments, the embodiments of the present invention further provide a data storage system, which can be referred to in correspondence with the above.
Referring to FIG. 3, which is a schematic structural diagram of a data storage system according to the present invention, the system may include:
a data splitting module 301, configured to split the data to be written according to a preset granularity value from an initial position of the data to be written after receiving the data to be written;
a data deduplication module 302, configured to deduplicate the split data to be written;
a data storage module 303, configured to store the deduplicated data.
In one embodiment of the present invention, the system further comprises:
the granularity value modification instruction receiving module is used for receiving granularity value modification instructions;
and the granularity value adjusting module is used for adjusting the value of the preset granularity value into a numerical value carried in the granularity value modification instruction.
In a specific embodiment of the present invention, the value carried in the granularity value modification instruction is a value determined by the granularity value automatic adjustment module:
and the granularity value automatic adjusting module is used for counting the average size of each time of data to be written received in the first time period, and determining a corresponding numerical value as a numerical value carried in the granularity value modifying instruction based on the average size.
In one embodiment of the invention, the average size is positively correlated with the determined value.
In one embodiment of the present invention, the system further comprises:
and the prompt information output module is used for outputting prompt information which shows that the storage is finished after the data storage module stores the data subjected to the deduplication.
Corresponding to the above method and system embodiments, the present invention also provides a data storage device and a computer readable storage medium, which can be referred to in correspondence with the above. The computer-readable storage medium has stored thereon a computer program which, when executed by a processor, implements the steps of the method of storing data as in any of the embodiments described above. A computer-readable storage medium as referred to herein may include Random Access Memory (RAM), memory, Read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
Referring to fig. 4, the data storage device may include:
a memory 401 for storing a computer program;
a processor 402 for executing a computer program to implement the steps of the method of storing data as in any of the embodiments described above.
It is further noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention. The principle and the implementation of the present invention are explained in the present application by using specific examples, and the above description of the embodiments is only used to help understanding the technical solution and the core idea of the present invention. It should be noted that, for those skilled in the art, it is possible to make various improvements and modifications to the present invention without departing from the principle of the present invention, and those improvements and modifications also fall within the scope of the claims of the present invention.

Claims (10)

1. A method for storing data, comprising:
after receiving data to be written, splitting the data to be written according to a preset granularity value from the initial position of the data to be written;
deduplicating the split data to be written;
and storing the data after the deduplication.
2. The data storage method according to claim 1, further comprising:
receiving a granularity value modification instruction;
and adjusting the preset value of the granularity value to the numerical value carried in the granularity value modification instruction.
3. The method according to claim 2, wherein the value carried in the granularity value modification instruction is a value determined by:
and counting the average size of the data to be written received in the first time period, and determining a corresponding numerical value as a numerical value carried in the granularity value modification instruction based on the average size.
4. The method of claim 3, wherein the average size is positively correlated to the determined value.
5. The method for storing data according to claim 1, further comprising, after storing the data after the deduplication, the steps of:
and outputting prompt information indicating that the storage is finished.
6. A system for storing data, comprising:
the data splitting module is used for splitting the data to be written according to a preset granularity value from the initial position of the data to be written after the data to be written is received;
the data deduplication module is used for deduplication of the data to be written after the data to be written is split;
and the data storage module is used for storing the data after the deduplication.
7. The data storage system of claim 6, further comprising:
the granularity value modification instruction receiving module is used for receiving granularity value modification instructions;
and the granularity value adjusting module is used for adjusting the preset granularity value to the numerical value carried in the granularity value modification instruction.
8. The data storage system of claim 6, further comprising:
and the prompt information output module is used for outputting prompt information which shows that the storage is finished after the data storage module stores the data subjected to the deduplication.
9. A device for storing data, comprising:
a memory for storing a computer program;
a processor for executing the computer program to implement the steps of the method of storing data according to any one of claims 1 to 5.
10. A computer-readable storage medium, characterized in that a computer program is stored thereon, which computer program, when being executed by a processor, carries out the steps of the method of storing data according to any one of claims 1 to 5.
CN202010110907.9A 2020-02-21 2020-02-21 Data storage method, system, equipment and computer readable storage medium Pending CN111399768A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010110907.9A CN111399768A (en) 2020-02-21 2020-02-21 Data storage method, system, equipment and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010110907.9A CN111399768A (en) 2020-02-21 2020-02-21 Data storage method, system, equipment and computer readable storage medium

Publications (1)

Publication Number Publication Date
CN111399768A true CN111399768A (en) 2020-07-10

Family

ID=71434020

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010110907.9A Pending CN111399768A (en) 2020-02-21 2020-02-21 Data storage method, system, equipment and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN111399768A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150142755A1 (en) * 2012-08-24 2015-05-21 Hitachi, Ltd. Storage apparatus and data management method
CN105493080A (en) * 2013-12-23 2016-04-13 华为技术有限公司 Method and apparatus for context aware based data de-duplication
US20160291877A1 (en) * 2013-12-24 2016-10-06 Hitachi, Ltd. Storage system and deduplication control method
CN106610794A (en) * 2016-11-21 2017-05-03 深圳市深信服电子科技有限公司 Convergence blocking method and device for data deduplication
WO2018156503A1 (en) * 2017-02-24 2018-08-30 Netapp Inc. Methods for performing data deduplication on data blocks at granularity level and devices thereof
CN109800218A (en) * 2019-01-04 2019-05-24 平安科技(深圳)有限公司 Distributed memory system, memory node equipment and data duplicate removal method
CN110427347A (en) * 2019-07-08 2019-11-08 新华三技术有限公司成都分公司 Method, apparatus, memory node and the storage medium of data de-duplication

Similar Documents

Publication Publication Date Title
CN107436725A (en) A kind of data are write, read method, apparatus and distributed objects storage cluster
CN110737388A (en) Data pre-reading method, client, server and file system
CN109240607B (en) File reading method and device
CN108874324B (en) Access request processing method, device, equipment and readable storage medium
CN110399096B (en) Method, device and equipment for deleting metadata cache of distributed file system again
CN113094183B (en) Training task creating method, device, system and medium of AI (Artificial Intelligence) training platform
CN111061752A (en) Data processing method and device and electronic equipment
CN111880734A (en) Data processing method, system, electronic equipment and storage medium
CN109471843A (en) A kind of metadata cache method, system and relevant apparatus
CN113794764A (en) Request processing method and medium for server cluster and electronic device
CN109508150B (en) Method and device for allocating storage space
CN111506254B (en) Distributed storage system and management method and device thereof
CN111124307B (en) Data downloading and brushing method, device, equipment and readable storage medium
CN111176570B (en) Thick backup roll creating method, device, equipment and medium
CN111399768A (en) Data storage method, system, equipment and computer readable storage medium
CN116700606A (en) Data storage method, device, equipment and storage medium
CN112003944B (en) Method, system, equipment and storage medium for uploading object file
CN115599299A (en) Storage bucket management method and device, electronic equipment and storage medium
CN110866066B (en) Service processing method and device
CN110362769B (en) Data processing method and device
CN110874268B (en) Data processing method, device and equipment
CN112084123B (en) Data processing method and device and data processing system
CN111090633A (en) Small file aggregation method, device and equipment of distributed file system
CN111143418A (en) Data reading method, device and equipment for database and storage medium
CN107862095B (en) Data processing method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200710