CN107391761B

CN107391761B - Data management method and device based on repeated data deletion technology

Info

Publication number: CN107391761B
Application number: CN201710750609.4A
Authority: CN
Inventors: 胡永刚; 王利朋
Original assignee: Suzhou Wave Intelligent Technology Co Ltd
Current assignee: Suzhou Wave Intelligent Technology Co Ltd
Priority date: 2017-08-28
Filing date: 2017-08-28
Publication date: 2020-03-06
Anticipated expiration: 2037-08-28
Also published as: CN107391761A

Abstract

The invention discloses a data management method and device based on deduplication technology. The method calculates a fingerprint value of target data through a HASH algorithm and determines the storage position corresponding to that fingerprint value through CRUSH mapping. The target data is then taken as the data to be stored, and the method judges whether data already exists at the corresponding storage position. If so, the reference count of the data to be stored is incremented by one; if not, the data to be stored is stored and its reference count is set to one. Finally, first metadata information of the data to be stored is stored. Repeated storage of data is thereby avoided during data storage and working efficiency is improved; at the same time, data management based on deduplication technology is realized, cost is saved, and the service life of the storage system is prolonged. The data management device based on deduplication technology provided by the embodiment of the invention has the same technical effects.

Description

Data management method and device based on repeated data deletion technology

Technical Field

The invention relates to the technical field of cloud computing data centers, in particular to a data management method and device based on a data de-duplication technology.

Background

With the rapid development of computer technology and the internet industry, the volume of data grows day by day, and distributed storage systems have been developed in order to save storage space and share resources. A distributed storage system stores data dispersed across a number of independent devices. It adopts a scalable architecture, uses multiple storage servers to share the storage load, and uses a location server to locate stored information, which improves the reliability, availability and management efficiency of the system and makes it easy to expand.

However, because many terminals can access the storage servers, a large amount of duplicate data inevitably accumulates there and occupies storage space. Deduplication technology, which optimizes storage capacity, addresses this problem. By eliminating duplicate data in a storage system, it reduces the data actually stored in the system or transmitted over a network, and it is widely used in backup, long-term archiving, and disaster recovery. In the field of distributed storage, online processing of duplicate data is urgently needed in order to reduce the cost per unit of storage capacity.

Therefore, how to implement deduplication technology in the field of distributed storage, that is, how to store, read, and delete data in distributed storage using deduplication technology, is a problem to be solved by those skilled in the art.

Disclosure of Invention

The invention aims to provide a data management method and device based on deduplication technology, so as to realize the storage, reading and deleting operations of data based on deduplication technology in the field of distributed storage.

In order to achieve the above purpose, the embodiment of the present invention provides the following technical solutions:

A data management method based on deduplication technology comprises the following steps:

S11, calculating a fingerprint value of the target data through a HASH algorithm;

S12, determining a storage position corresponding to the fingerprint value of the target data through CRUSH mapping; taking the target data as data to be stored, and executing S13;

S13, judging whether data exist in the storage position corresponding to the data to be stored; if yes, executing S14; if not, executing S15;

S14, adding one to the reference count corresponding to the data to be stored, and executing S16;

S15, storing the data to be stored to the storage position corresponding to the data to be stored, setting the reference count corresponding to the data to be stored to be one, and executing S16;

S16, storing first metadata information of the data to be stored, wherein the first metadata information comprises the fingerprint value of the data to be stored.

Before executing S11, the method further includes:

S21, judging whether second metadata information corresponding to the target data exists; if yes, executing S22; if not, executing S11;

S22, acquiring the second metadata information;

S23, judging whether the second metadata information has a fingerprint value; if yes, executing S24; if not, executing S11;

S24, comparing the length of the target data with a preset data length; if the length of the target data is equal to the preset data length, executing S11; if the length of the target data is smaller than the preset data length, executing S25;

S25, splicing the target data and the data corresponding to the second metadata information to obtain spliced data, calculating a fingerprint value of the spliced data, and executing S26;

S26, determining a storage position corresponding to the fingerprint value of the spliced data through CRUSH mapping; taking the spliced data as data to be stored, and executing S13.

Wherein, if the length of the target data is equal to the preset data length, the method includes:

if the length of the target data is equal to the preset data length, subtracting one from the reference count of the data corresponding to the second metadata information;

judging whether the reference count of the data corresponding to the second metadata information is zero or not;

and if so, deleting the data corresponding to the second metadata information.

Splicing the target data and the data corresponding to the second metadata information to obtain spliced data, and calculating the fingerprint value of the spliced data, includes:

acquiring data content corresponding to the second metadata information;

splicing the target data and the data corresponding to the second metadata information according to the preset data length and the preset data offset to obtain spliced data;

calculating a fingerprint value of the spliced data;

subtracting one from the reference count of the data corresponding to the second metadata information;

judging whether the reference count of the data corresponding to the second metadata information is zero;

and if so, deleting the data corresponding to the second metadata information.

The method further includes:

receiving a deletion request sent by a client;

determining data to be deleted according to the deletion request, and acquiring third metadata information of the data to be deleted and the fingerprint value of the data to be deleted in the third metadata information;

determining a storage position corresponding to the fingerprint value of the data to be deleted through CRUSH mapping, and subtracting one from the reference count corresponding to the data to be deleted;

judging whether the reference count corresponding to the data to be deleted is zero;

and if so, deleting the data to be deleted and the third metadata information.

A data management apparatus based on deduplication technology, comprising:

the first calculation module is used for calculating a fingerprint value of the target data through a HASH algorithm;

the first determining module is used for determining a storage position corresponding to the fingerprint value of the target data through CRUSH mapping, and taking the target data as data to be stored;

the first judgment module is used for judging whether data exist in the storage position corresponding to the data to be stored;

the first execution module is used for adding one to the reference count corresponding to the data to be stored when the data is stored in the storage position corresponding to the data to be stored;

the first storage module is used for storing the data to be stored to the storage position corresponding to the data to be stored and setting the reference count corresponding to the data to be stored to be one when the data is not stored in the storage position corresponding to the data to be stored;

the second storage module is used for storing first metadata information of the data to be stored, and the first metadata information comprises: fingerprint value of the data to be stored.

The apparatus further includes:

the second judgment module is used for judging whether second metadata information corresponding to the target data exists or not; if not, triggering the first computing module;

the first acquisition module is used for acquiring second metadata information corresponding to the target data when the second metadata information exists;

the third judging module is used for judging whether the second metadata information has a fingerprint value; if not, triggering the first computing module;

the comparison module is used for comparing the length of the target data with a preset data length when the fingerprint value exists in the second metadata information; if the length of the target data is equal to the preset data length, triggering the first calculation module;

the splicing module is used for splicing the target data and the data corresponding to the second metadata information to obtain spliced data when the length of the target data is smaller than the preset data length, and calculating a fingerprint value of the spliced data;

and the second determining module is used for determining the storage position corresponding to the fingerprint value of the splicing data through CRUSH mapping.

Wherein the comparison module comprises:

the first execution unit is used for subtracting one from the reference count of the data corresponding to the second metadata information when the length of the target data is equal to the preset data length;

a first judging unit, configured to judge whether a reference count of data corresponding to the second metadata information is zero;

and the first deleting unit is used for deleting the data corresponding to the second metadata information when the reference count of the data corresponding to the second metadata information is zero.

Wherein the splicing module includes:

an acquisition unit configured to acquire data content corresponding to the second metadata information;

the splicing unit is used for splicing the target data and the data corresponding to the second metadata information according to the preset data length and the preset data offset to obtain spliced data;

the calculation unit is used for calculating the fingerprint value of the splicing data;

the second execution unit is used for subtracting one from the reference count of the data corresponding to the second metadata information;

a second judging unit, configured to judge whether a reference count of data corresponding to the second metadata information is zero;

and the second deleting unit is used for deleting the data corresponding to the second metadata information when the reference count of the data corresponding to the second metadata information is zero.

The apparatus further includes:

the receiving module is used for receiving a deleting request sent by a client;

the second obtaining module is used for determining data to be deleted according to the deletion request and obtaining third metadata information of the data to be deleted and fingerprint values of the data to be deleted in the third metadata information;

a third determining module, configured to determine, through CRUSH mapping, a storage location corresponding to a fingerprint value of the data to be deleted, and subtract one from a reference count corresponding to the data to be deleted;

the fourth judging module is used for judging whether the reference count corresponding to the data to be deleted is zero or not;

and the deleting module is used for deleting the data to be deleted and the third metadata information when the reference count corresponding to the data to be deleted is zero.

As can be seen from the above scheme, the data management method based on deduplication technology provided by the embodiment of the invention works as follows:

The fingerprint value of the target data is calculated through the HASH algorithm, and the storage position corresponding to that fingerprint value is determined through CRUSH mapping. The fingerprint value determines the uniqueness of the target data and, in turn, the uniqueness of its storage position. The target data is then taken as the data to be stored, and the method judges whether data exists at the corresponding storage position. Because the data to be stored has a unique storage position, data already present at that position indicates that the data to be stored has already been stored; it is not stored again, and its reference count is incremented by one. If no data is present at that position, the data to be stored has not yet been stored; it is stored at the corresponding storage position and its reference count is set to one. Finally, first metadata information of the data to be stored, which includes its fingerprint value, is stored. In this way, repeated storage of data is avoided during data storage, working efficiency is improved, and system storage space is saved; at the same time, data management based on deduplication technology is realized in the field of distributed storage, cost is saved, and the service life of the storage system is prolonged.

Accordingly, the data management device based on the data de-duplication technology provided by the embodiment of the invention also has the technical effects.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.

FIG. 1 is a flowchart of a data management method based on a deduplication technology according to an embodiment of the present invention;

FIG. 2 is a flowchart of another data management method based on deduplication technology according to an embodiment of the present disclosure;

FIG. 3 is a flowchart of a data deletion method in a data management method based on a deduplication technology according to an embodiment of the present invention;

FIG. 4 is a schematic diagram of a data management apparatus based on a deduplication technology according to an embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

The embodiment of the invention discloses a data management method and device based on a repeated data deleting technology, which aim to realize the operations of storing, reading and deleting data based on the repeated data technology in the field of distributed storage.

Referring to fig. 1, a data management method based on a deduplication technology provided in an embodiment of the present invention includes:

S11, calculating a fingerprint value of the target data through a HASH algorithm;

Specifically, in this embodiment, the target data is the data to be stored in the current operation, and the target data must first be divided into blocks before its fingerprint value is calculated.

In the field of distributed storage, data to be stored is generally divided into blocks of the size of the underlying storage object, so that the data held in the underlying storage is regular. For example, if the underlying storage object size is 4M and the target data is 10M, the target data is divided in units of 4M into three blocks of 4M, 4M and 2M. That is, the data to be stored is cut into blocks smaller than or equal to 4M.
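The block division described above can be sketched in a few lines of Python. This is only an illustration: the function name `split_into_blocks` and the reading of 4M as 4 MiB are assumptions, not part of the patent text.

```python
BLOCK_SIZE = 4 * 1024 * 1024  # assumed: "4M" read as 4 MiB


def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list:
    """Cut data into consecutive blocks no larger than block_size."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


# A 10M input yields blocks of 4M, 4M and 2M, as in the example in the text.
sizes = [len(b) for b in split_into_blocks(bytes(10 * 1024 * 1024))]
```

Only the last block may be shorter than the block size; every other block is exactly 4M.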

Specifically, the fingerprint value is calculated by the HASH algorithm from the data content of each block. Fingerprint values correspond one to one with block contents, that is, with the data to be stored, so each block's content and its fingerprint value form a key-value pair. If the target data is divided into several blocks, each block has its own fingerprint value and the subsequent operations are performed on the data corresponding to each fingerprint value; if the target data forms a single block, it has one fingerprint value and the subsequent operations are performed on that fingerprint value. In this embodiment, the target data is divided into one block having a unique fingerprint value.

S12, determining a storage position corresponding to the fingerprint value of the target data through CRUSH mapping; taking the target data as data to be stored, and executing S13;

Specifically, once the fingerprint value has been calculated by the HASH algorithm, the fingerprint value of the target data is used in place of the target data in the CRUSH mapping process at the RADOS layer and is passed to the object storage device, so that the object storage device looks up the storage position corresponding to the target data in its own storage system and thereby determines the storage position.
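The property this step relies on is that the mapping from fingerprint to storage position is deterministic: the same fingerprint always lands in the same place. The sketch below illustrates that property only. It is not the CRUSH algorithm; it swaps in a simple highest-score (rendezvous-style) hash, and the function name `place` and the device names are hypothetical.

```python
import hashlib


def place(fingerprint: str, devices: list) -> str:
    """Pick one device for a fingerprint, deterministically.

    Stand-in for CRUSH mapping: each device gets a score derived from
    hashing (fingerprint, device), and the highest score wins.
    """
    def score(device: str) -> bytes:
        return hashlib.sha1(f"{fingerprint}:{device}".encode()).digest()

    return max(devices, key=score)


devices = ["osd.0", "osd.1", "osd.2"]          # hypothetical device names
fp = hashlib.sha1(b"example block").hexdigest()
target = place(fp, devices)
```

Because the result depends only on the fingerprint and the device set, two writers holding identical data independently arrive at the same storage position, which is what makes the existence check in the next step meaningful.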

S13, judging whether data exist in the storage position corresponding to the data to be stored; if yes, executing S14; if not, executing S15;

Specifically, the target data is taken as the data to be stored. After the object storage device determines the storage position of the data to be stored, it first judges whether data is stored at that position. If data is stored there, the data to be stored has already been stored; if not, the data to be stored has not yet been stored.

S14, adding one to the reference count corresponding to the data to be stored, and executing S16;

Specifically, if it is determined in step S13 that the data to be stored has already been stored, the data is not stored again; instead, the reference count corresponding to the data to be stored is incremented by one.

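S15, storing the data to be stored to the storage position corresponding to the data to be stored, setting the reference count corresponding to the data to be stored to be one, and executing S16;

Specifically, if it is determined in step S13 that the data to be stored has not yet been stored, the data to be stored is stored at the corresponding storage position, and the reference count corresponding to the data to be stored is set to one.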

S16, storing first metadata information of the data to be stored, wherein the first metadata information comprises the fingerprint value of the data to be stored.

Specifically, after the data to be stored has been stored, the reference count corresponding to the data to be stored is also stored, in a dedicated storage location prepared by the object storage device. The metadata information of the data to be stored is stored as well; it includes attributes such as the fingerprint value of the data to be stored.

Specifically, when fingerprint values are stored, 8K of metadata is stored first and the fingerprint values corresponding to the file are stored after it. The metadata information in the metadata storage is stored in the cluster environment file by file, with 8K as one object. For a 4MB data block, 4088KB therefore remains for fingerprint data. With SHA-1 as the fingerprint HASH algorithm, one fingerprint is 20 bytes, so one such block holds 209305 fingerprint values, which corresponds to about 817GB of data when each fingerprint identifies one 4MB object.
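The figures above can be checked with a short calculation. The sketch assumes binary units (4MB = 4096KB, 1GB = 1024MB) and that each fingerprint stands for one 4MB object; the variable names are illustrative.

```python
import hashlib

OBJECT_SIZE = 4 * 1024 * 1024                       # one 4MB object, in bytes
METADATA_SIZE = 8 * 1024                            # 8K of metadata at the front
FINGERPRINT_SIZE = hashlib.sha1().digest_size       # SHA-1 digest: 20 bytes

usable_bytes = OBJECT_SIZE - METADATA_SIZE          # space left for fingerprints
fingerprint_count = usable_bytes // FINGERPRINT_SIZE
covered_gb = fingerprint_count * OBJECT_SIZE / 1024 ** 3

print(usable_bytes // 1024, fingerprint_count, int(covered_gb))
```

Under these assumptions the three printed values are 4088 (KB), 209305 (fingerprints) and 817 (GB, rounded down), matching the numbers in the text.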

It can be seen that, in the data management method based on deduplication technology provided by this embodiment, the fingerprint value of the target data is calculated by the HASH algorithm, and the storage position corresponding to that fingerprint value is determined through CRUSH mapping. The fingerprint value determines the uniqueness of the target data and, in turn, the uniqueness of its storage position. The target data is then taken as the data to be stored, and the method judges whether data exists at the corresponding storage position. Because the data to be stored has a unique storage position, data already present at that position indicates that the data to be stored has already been stored; it is not stored again, and its reference count is incremented by one. If no data is present at that position, the data to be stored is stored there and its reference count is set to one. Finally, first metadata information of the data to be stored, which includes its fingerprint value, is stored. In this way, repeated storage of data is avoided during data storage, working efficiency is improved, and system storage space is saved; at the same time, data management based on deduplication technology is realized in the field of distributed storage, cost is saved, and the service life of the storage system is prolonged.
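Steps S11 to S16 can be sketched as a small in-memory model. This is a minimal illustration under stated assumptions, not the patented implementation: the class `DedupStore` and its fields are hypothetical, SHA-1 is used because the embodiment names it, and the fingerprint itself serves as the storage key in place of a real CRUSH lookup.

```python
import hashlib


class DedupStore:
    """Toy stand-in for the object storage device plus metadata storage."""

    def __init__(self):
        self.blocks = {}     # fingerprint -> stored block content
        self.refcount = {}   # fingerprint -> reference count
        self.metadata = {}   # file name -> fingerprint (single-block files)

    def put(self, name: str, data: bytes) -> str:
        fp = hashlib.sha1(data).hexdigest()   # S11: fingerprint of the data
        # S12: the fingerprint is the storage key (assumed stand-in for CRUSH)
        if fp in self.blocks:                 # S13: is data already there?
            self.refcount[fp] += 1            # S14: count one more reference
        else:
            self.blocks[fp] = data            # S15: store the block once
            self.refcount[fp] = 1
        self.metadata[name] = fp              # S16: record the fingerprint
        return fp


store = DedupStore()
first = store.put("a.txt", b"same content")
second = store.put("b.txt", b"same content")
```

After the two calls, both file names point at one stored block whose reference count is two, which is the space saving the method is after.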

Referring to fig. 2, another data management method based on a deduplication technology provided in an embodiment of the present invention includes:

S21, judging whether second metadata information corresponding to the target data exists; if yes, executing S22; if not, executing S11;

Specifically, in this embodiment, before the target data is stored, it is first judged whether second metadata information of the target data exists, that is, whether the target data is being stored for the first time or stored again, so as to determine whether the current operation is a create write or a modify write. If the second metadata information exists, the target data is not being stored for the first time, the current operation is a modify write, and step S22 is executed; if the second metadata information does not exist, the target data is being stored for the first time, and step S11 is executed.

S22, acquiring the second metadata information;

Specifically, the second metadata information is acquired as follows: the file system client obtains the index information of the target data and requests the second metadata information from the metadata storage; the metadata storage then retrieves the second metadata information according to the index information of the target data. The second metadata information includes the fingerprint value of the target data, stored as a key-value pair.

It should be noted that only the second metadata information is obtained here. To obtain the data content corresponding to a piece of metadata, that is, to read the data, the client obtains the index information of the data to be read and requests the metadata information from the metadata storage. The metadata storage retrieves the metadata information according to the index information; this metadata information includes the fingerprint values of all objects that make up the file, stored as key-value pairs. RADOS then reads the data directly from the object storage device according to the data offset, data length, and fingerprint value. This completes the data reading process.
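The read path just described (metadata lookup, then a read by offset, length and fingerprint) might look like the following sketch. The function `read`, the dictionary layout and the tiny block size in the example are assumptions made for illustration; a real deployment would go through the RADOS client rather than Python dictionaries.

```python
def read(blocks, metadata, name, offset, length, block_size=4 * 1024 * 1024):
    """Read `length` bytes at `offset` from a file stored as fingerprinted blocks.

    blocks:   fingerprint -> block content
    metadata: file name -> list of fingerprints, in file order
    """
    fingerprints = metadata[name]
    out = bytearray()
    pos, end = offset, offset + length
    while pos < end:
        index, inner = divmod(pos, block_size)    # which block, where inside it
        if index >= len(fingerprints):
            break                                  # past the end of the file
        block = blocks[fingerprints[index]]
        chunk = block[inner:inner + min(end - pos, block_size - inner)]
        if not chunk:
            break                                  # short final block exhausted
        out += chunk
        pos += len(chunk)
    return bytes(out)


# Example with a 4-byte block size so the blocks stay readable.
blocks = {"fpA": b"abcd", "fpB": b"ef"}
metadata = {"file": ["fpA", "fpB"]}
middle = read(blocks, metadata, "file", 2, 3, block_size=4)
```

Note that only the metadata and the blocks that overlap the requested range are touched; blocks outside the range are never fetched.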

S23, judging whether the second metadata information has a fingerprint value; if yes, executing S24; if not, executing S11;

Specifically, after the second metadata information of the target data is obtained in step S22, it is necessary to judge whether the second metadata information is complete, that is, whether it contains a fingerprint value. If a fingerprint value exists, step S24 is executed; if not, step S11 is executed.

S24, comparing the length of the target data with a preset data length; if the length of the target data is equal to the preset data length, executing S11; if the length of the target data is smaller than the preset data length, executing S25;

Specifically, after step S23 determines that a fingerprint value exists in the second metadata information, the length of the target data is compared with the preset data length. Before the comparison, the target data is generally divided into blocks; the division process is similar to that in the above embodiment and is not repeated here.

Specifically, after the target data has been divided into blocks, the block length is compared with the preset data length. In this embodiment, the target data is assumed to form a single data block, so the length of the data block equals the length of the target data, and the length of the target data is compared with the preset data length. The preset data length is the system default, which is 4M. If the length of the target data is equal to the preset data length, S11 is executed; if the length of the target data is smaller than the preset data length, S25 is executed.

S25, splicing the target data and the data corresponding to the second metadata information to obtain spliced data, calculating a fingerprint value of the spliced data, and executing S26;

Specifically, if the length of the target data is smaller than the preset data length, the target data and the data corresponding to the second metadata information are spliced according to the data offset and the data length. In this embodiment, the preset data length is 4M. For example, suppose the data corresponding to the second metadata information is a data object covering 0-4M, the target data is 1M long, and the range 2-3M within 0-4M is to be modified. The whole 0-4M of data corresponding to the second metadata information is read first and then spliced with the 1M of target data: 0-4M is divided into three sections, 0-2M, 2-3M and 3-4M; the original 1M of content at 2-3M is replaced with the 1M of target data; and the three sections 0-2M, the new 2-3M, and 3-4M are joined to form new 4M data, which is the spliced data.
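The splicing in this example amounts to replacing one byte range inside the old block. A minimal sketch follows; the function name `splice` and the bounds check are assumptions added for illustration.

```python
MB = 1024 * 1024


def splice(old_block: bytes, new_data: bytes, offset: int) -> bytes:
    """Return old_block with new_data written over [offset, offset + len(new_data))."""
    end = offset + len(new_data)
    if offset < 0 or end > len(old_block):
        raise ValueError("modified range must lie inside the existing block")
    # head (0..offset) + replacement + tail (end..block end)
    return old_block[:offset] + new_data + old_block[end:]


old = bytes(4 * MB)              # existing 0-4M object, all zero bytes here
new = b"\x01" * MB               # 1M of target data
spliced = splice(old, new, 2 * MB)   # overwrite the 2-3M range
```

The result is again exactly 4M long, so it can be fingerprinted and stored like any other full block.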

S26, determining a storage position corresponding to the fingerprint value of the spliced data through CRUSH mapping; taking the spliced data as data to be stored, and executing S13.

Specifically, in this embodiment, the process of determining the storage position corresponding to the fingerprint value of the spliced data is similar to that in the above embodiment and is not repeated here. After the storage position is determined, the spliced data is taken as the data to be stored, and step S13 is executed.

It can be seen that, in the data management method based on the deduplication technology provided in this embodiment, the method first determines whether second metadata information corresponding to the target data exists; when second metadata information exists in the target data, the second metadata information is obtained; when the second metadata information does not exist in the target data, S11 is performed; after second metadata information is obtained, judging whether a fingerprint value exists in the second metadata information or not; if yes, comparing the length of the target data with a preset data length; if not, go to S11; after comparing the length of the target data with a preset data length, if the length of the target data is equal to the preset data length, performing S11; if the length of the target data is smaller than the preset data length, splicing the target data and the data corresponding to the second metadata information to obtain spliced data, calculating a fingerprint value of the spliced data, and determining a storage position corresponding to the fingerprint value of the spliced data through CRUSH mapping; and taking the spliced data as data to be stored, and executing S13. By the method, in the process of data storage, not only is the repeated storage of data avoided, but also the working efficiency is improved, and the storage space of the system is saved; meanwhile, based on the repeated data technology, the data management is realized in the field of distributed storage, the cost is saved, and the service life of a storage system is prolonged.

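Based on any of the above embodiments, it should be noted that, if the length of the target data is equal to the preset data length, the method includes:

subtracting one from the reference count of the data corresponding to the second metadata information;

judging whether the reference count of the data corresponding to the second metadata information is zero;

and if so, deleting the data corresponding to the second metadata information.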

Specifically, during a modify write, when the length of the target data is equal to the preset data length, the reference count of the data corresponding to the second metadata information is decremented by one. If no other reference to that data exists, the reference count is zero after the decrement, and the data corresponding to the second metadata information is deleted.

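Based on any of the above embodiments, it should be noted that splicing the target data and the data corresponding to the second metadata information to obtain spliced data, and calculating a fingerprint value of the spliced data, includes:

acquiring data content corresponding to the second metadata information;

splicing the target data and the data corresponding to the second metadata information according to the preset data length and the preset data offset to obtain spliced data;

calculating a fingerprint value of the spliced data;

subtracting one from the reference count of the data corresponding to the second metadata information;

judging whether the reference count of the data corresponding to the second metadata information is zero;

and if so, deleting the data corresponding to the second metadata information.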

Specifically, during a modify write in which the length of the target data is smaller than the preset data length, the target data and the data corresponding to the second metadata information are spliced according to the preset data length and the data offset to obtain spliced data, and a fingerprint value of the spliced data is calculated. It is then judged whether other references to the data corresponding to the second metadata information exist, as follows: the reference count of that data is decremented by one. If the reference count is zero after the decrement, no other references exist, and the data corresponding to the second metadata information is deleted; if the reference count is not zero after the decrement, other references exist, and the data corresponding to the second metadata information is kept.
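A partial modify write combines the splice, the fingerprinting of the new block, and the release of the old one. The sketch below is one possible arrangement under stated assumptions: the function names and dictionary layout are hypothetical, and the new block is stored before the old reference is released, an ordering chosen here so that a write which leaves the content unchanged does not briefly drop the count to zero.

```python
import hashlib


def release(blocks, refcount, fp):
    """Drop one reference; delete the block when no references remain."""
    refcount[fp] -= 1
    if refcount[fp] == 0:
        del blocks[fp]
        del refcount[fp]


def modify_write(blocks, refcount, metadata, name, new_data, offset):
    old_fp = metadata[name]                      # fingerprint from second metadata
    old = blocks[old_fp]                         # read the existing block content
    spliced = old[:offset] + new_data + old[offset + len(new_data):]
    new_fp = hashlib.sha1(spliced).hexdigest()   # fingerprint of spliced data
    if new_fp in blocks:                         # S13/S14: already stored
        refcount[new_fp] += 1
    else:                                        # S15: store it once
        blocks[new_fp] = spliced
        refcount[new_fp] = 1
    release(blocks, refcount, old_fp)            # decrement, delete if zero
    metadata[name] = new_fp                      # S16: update the metadata
    return new_fp


data = b"abcdefgh"
fp0 = hashlib.sha1(data).hexdigest()
blocks, refcount = {fp0: data}, {fp0: 2}
metadata = {"f1": fp0, "f2": fp0}                # two files share one block
fp1 = modify_write(blocks, refcount, metadata, "f1", b"XY", 2)
```

In the example the old block survives the first modification because `f2` still references it; it would be deleted only once the last reference is released.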

Based on any of the above embodiments, it should be noted that the data management method based on deduplication technology provided in the embodiments of the present invention further includes a data deletion method. With reference to FIG. 3, the specific process includes:

S31, receiving a deletion request sent by the client;

S32, determining data to be deleted according to the deletion request, and acquiring third metadata information of the data to be deleted and a fingerprint value of the data to be deleted in the third metadata information;

S33, determining a storage position corresponding to the fingerprint value of the data to be deleted through CRUSH mapping, and subtracting one from the reference count corresponding to the data to be deleted;

S34, judging whether the reference count corresponding to the data to be deleted is zero;

S35, if yes, deleting the data to be deleted and the third metadata information;

S36, if not, not executing the deleting operation.

Specifically, the data deletion method includes a read step: the data to be deleted is determined according to the deletion request, and the third metadata information of the data to be deleted, together with the fingerprint value recorded in it, is obtained. Only the third metadata information and its fingerprint value are read; the content of the data to be deleted is not read. The storage position corresponding to the fingerprint value of the data to be deleted is then determined through CRUSH mapping, the reference count corresponding to the data to be deleted is decremented by one, the metadata information of the data to be deleted is deleted, and the client is informed that the deletion succeeded. If the reference count is zero after the decrement, the data to be deleted has no other references, and the data itself is deleted.
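The deletion flow S31 to S36 can be sketched as follows. This is an illustrative model only: `delete_object`, the dictionaries, and the node names are hypothetical, SHA-256 stands in for the HASH algorithm, and `locate` is a plain modulo placeholder, not the real CRUSH algorithm. The sketch removes the object's metadata on every delete and removes the stored data only when the reference count reaches zero, which is one reading of the paragraph above.

```python
import hashlib

NODES = ["node-0", "node-1", "node-2"]  # hypothetical storage nodes


def locate(fp: str) -> str:
    # Placeholder for CRUSH mapping (assumption): modulo over the fingerprint.
    return NODES[int(fp, 16) % len(NODES)]


def delete_object(name, metadata, blocks, refcount):
    """Return True if the underlying data was deleted, False if it was kept."""
    fp = metadata.pop(name)      # S32: read metadata and fingerprint only
    node = locate(fp)            # S33: find the storage position
    refcount[fp] -= 1            # S33: subtract one from the reference count
    if refcount[fp] == 0:        # S34: is the count zero?
        del blocks[node][fp]     # S35: no other references, delete the data
        del refcount[fp]
        return True
    return False                 # S36: other references remain, keep the data


# Two object names share one stored block.
data = b"shared block"
fp = hashlib.sha256(data).hexdigest()
metadata = {"a.txt": fp, "b.txt": fp}
blocks = {n: {} for n in NODES}
blocks[locate(fp)][fp] = data
refcount = {fp: 2}

first = delete_object("a.txt", metadata, blocks, refcount)
second = delete_object("b.txt", metadata, blocks, refcount)
```

Note that the data content is never read during deletion; only the fingerprint taken from the metadata is needed to find and release the block.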

In the following, a data management apparatus based on a deduplication technology provided by an embodiment of the present invention is introduced. The apparatus described below and the data management method based on a deduplication technology described above may be read with reference to each other.

Referring to fig. 4, an embodiment of the present invention provides a data management apparatus based on a deduplication technology, including:

a first calculating module 401, configured to calculate a fingerprint value of target data through a HASH algorithm;

a first determining module 402, configured to determine, through CRUSH mapping, a storage location corresponding to a fingerprint value of the target data, and use the target data as data to be stored;

a first judging module 403, configured to judge whether data exists in a storage location corresponding to the data to be stored;

a first execution module 404, configured to, when data is stored in a storage location corresponding to the data to be stored, increment a reference count corresponding to the data to be stored by one;

a first storage module 405, configured to, when there is no data stored in the storage location corresponding to the data to be stored, store the data to be stored to the storage location corresponding to the data to be stored, and set a reference count corresponding to the data to be stored to one;

a second storage module 406, configured to store first metadata information of data to be stored, where the first metadata information includes: fingerprint value of the data to be stored.
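As a rough illustration of how modules 401 to 406 cooperate on the store path, the sketch below maps each module to one step of a single method. The class name `DedupWriter`, its attributes, the SHA-256 hash, and the modulo placement that stands in for CRUSH mapping are all assumptions, not details given in the source.

```python
import hashlib


class DedupWriter:
    """Toy in-memory model of the store path (hypothetical names throughout)."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.blocks = {n: {} for n in self.nodes}  # node -> {fingerprint: data}
        self.refcount = {}                         # fingerprint -> count
        self.metadata = {}                         # object name -> metadata

    def calculate(self, data: bytes) -> str:
        # Module 401: fingerprint via a HASH algorithm (SHA-256 assumed).
        return hashlib.sha256(data).hexdigest()

    def determine(self, fp: str) -> str:
        # Module 402: placeholder for CRUSH mapping (simple modulo, assumed).
        return self.nodes[int(fp, 16) % len(self.nodes)]

    def store(self, name: str, data: bytes) -> str:
        fp = self.calculate(data)
        node = self.determine(fp)
        if fp in self.blocks[node]:        # module 403: data already stored?
            self.refcount[fp] += 1         # module 404: add one to the count
        else:
            self.blocks[node][fp] = data   # module 405: store the data
            self.refcount[fp] = 1          #             and set the count to one
        self.metadata[name] = {"fingerprint": fp}  # module 406: store metadata
        return fp


writer = DedupWriter(["n0", "n1"])
fp1 = writer.store("a", b"hello")
fp2 = writer.store("b", b"hello")   # duplicate content, no second copy stored
```

Because the storage position is derived from the fingerprint alone, identical content always lands at the same position, so the duplicate check is a single lookup there.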

The apparatus further includes:

Specifically, when the second judging module judges that the second metadata information corresponding to the target data does not exist, the first calculating module is triggered. The first calculating module calculates the fingerprint value of the target data through the HASH algorithm; the first determining module determines the storage position corresponding to the fingerprint value of the target data through CRUSH mapping and takes the target data as the data to be stored; the first judging module judges whether data is stored in the storage position corresponding to the data to be stored; when data is stored there, the first execution module increments the reference count corresponding to the data to be stored by one; when no data is stored there, the first storage module stores the data to be stored to that storage position and sets the reference count corresponding to the data to be stored to one; finally, the second storage module stores first metadata information of the data to be stored, where the first metadata information includes the fingerprint value of the data to be stored.

Specifically, when the third judging module judges that no fingerprint value exists in the second metadata information, the first calculating module is triggered. The first calculating module calculates the fingerprint value of the target data through the HASH algorithm; the first determining module determines the storage position corresponding to the fingerprint value of the target data through CRUSH mapping and takes the target data as the data to be stored; the first judging module judges whether data is stored in the storage position corresponding to the data to be stored; when data is stored there, the first execution module increments the reference count corresponding to the data to be stored by one; when no data is stored there, the first storage module stores the data to be stored to that storage position and sets the reference count corresponding to the data to be stored to one; finally, the second storage module stores first metadata information of the data to be stored, where the first metadata information includes the fingerprint value of the data to be stored.

Specifically, when a fingerprint value exists in the second metadata information, the length of the target data is compared with the preset data length; if the length of the target data is equal to the preset data length, the first calculating module is triggered. The first calculating module calculates the fingerprint value of the target data through the HASH algorithm; the first determining module determines the storage position corresponding to the fingerprint value of the target data through CRUSH mapping and takes the target data as the data to be stored; the first judging module judges whether data is stored in the storage position corresponding to the data to be stored; when data is stored there, the first execution module increments the reference count corresponding to the data to be stored by one; when no data is stored there, the first storage module stores the data to be stored to that storage position and sets the reference count corresponding to the data to be stored to one; finally, the second storage module stores first metadata information of the data to be stored, where the first metadata information includes the fingerprint value of the data to be stored.

Wherein the comparison module comprises:

Wherein, the concatenation module includes:

The apparatus further includes:

a receiving module, configured to receive a deletion request sent by the client;

It can be seen that, in the data management apparatus based on the deduplication technology provided in this embodiment, the first calculating module first calculates the fingerprint value of the target data through the HASH algorithm; the first determining module determines the storage position corresponding to the fingerprint value of the target data through CRUSH mapping and takes the target data as the data to be stored; the first judging module judges whether data is stored in the storage position corresponding to the data to be stored; when data is stored there, the first execution module increments the reference count corresponding to the data to be stored by one; when no data is stored there, the first storage module stores the data to be stored to that storage position and sets the reference count corresponding to the data to be stored to one; finally, the second storage module stores first metadata information of the data to be stored, where the first metadata information includes the fingerprint value of the data to be stored. This completes the storage of the data and of its metadata information.

The embodiments in the present description are described in a progressive manner. Each embodiment focuses on its differences from the other embodiments, and for the parts that are the same or similar, the embodiments may be referred to one another.

The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. A data management method based on data de-duplication technology is characterized by comprising the following steps:

S16, storing first metadata information of the data to be stored, wherein the first metadata information comprises: fingerprint value of the data to be stored;

before executing S11, the method further includes:

S22, acquiring the second metadata information;

2. The data management method based on the deduplication technology as claimed in claim 1, wherein the determining, if the length of the target data is equal to the predetermined data length, comprises:

and if so, deleting the data corresponding to the second metadata information.

3. The data management method based on the deduplication technology according to claim 1, wherein the splicing the target data and the data corresponding to the second metadata information to obtain spliced data, and calculating a fingerprint value of the spliced data includes:

acquiring data content corresponding to the second metadata information;

calculating a fingerprint value of the spliced data;

and if so, deleting the data corresponding to the second metadata information.

4. The data management method based on data deduplication technology according to any one of claims 1-3, further comprising:

receiving a deletion request sent by a client;

5. A data management apparatus based on a data deduplication technology, comprising:

the second storage module is used for storing first metadata information of the data to be stored, and the first metadata information comprises: fingerprint value of the data to be stored;

wherein the data management apparatus further comprises:

6. The data management apparatus based on data deduplication technology as claimed in claim 5, wherein the comparing module comprises:

7. The data management device based on data deduplication technology of claim 5, wherein the concatenation module comprises:

8. The data management apparatus based on data deduplication technology according to any one of claims 5-7, further comprising:

the receiving module is used for receiving a deleting request sent by a client;