Method and system for storage encryption deduplication
Technical Field
The invention belongs to the technical field of encryption, and in particular relates to a method and a system for storage encryption deduplication.
Background
Data encryption generally uses a symmetric encryption algorithm: the original plaintext data is encrypted with an encryption key to produce a ciphertext for transmission. If the same original data is encrypted with different keys, the resulting ciphertexts differ even when the same encryption algorithm is used.
Data deduplication, also called duplicate data elimination, stores only one copy of identical data. Deduplication technology generally uses a blocking algorithm (fixed-length or variable-length) to divide data into small data blocks, calculates a characteristic value for each block with a hash algorithm, and compares that value with the characteristic values of data already stored. If the values match, the block is a duplicate and is not stored again.
When the same data is encrypted with different keys, the ciphertexts differ, so the characteristic values that the hash algorithm calculates from them also differ, and duplicate data cannot be removed by comparing those values.
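The obstacle described above can be reproduced in a few lines. The following is a minimal Python sketch and not part of the original disclosure: `toy_encrypt` is a hypothetical stand-in cipher (an XOR keystream built from SHA-256, not secure), and the key strings and sample block are invented for illustration. It shows that the plaintext characteristic values of one block match, while the characteristic values of its ciphertexts under two different keys do not.

```python
import hashlib


def toy_encrypt(key: bytes, data: bytes) -> bytes:
    """Hypothetical stand-in cipher: XOR with a SHA-256 counter keystream.

    For illustration only; it is NOT a secure encryption algorithm.
    Applying it twice with the same key returns the original data.
    """
    out = bytearray()
    for offset in range(0, len(data), 32):
        pad = hashlib.sha256(key + offset.to_bytes(8, "big")).digest()
        out.extend(b ^ p for b, p in zip(data[offset:offset + 32], pad))
    return bytes(out)


block = b"the same source data block"

# Characteristic values of the plaintext: identical for identical data.
plain_value_a = hashlib.sha256(block).hexdigest()
plain_value_b = hashlib.sha256(block).hexdigest()

# The same block encrypted by two users with different keys.
cipher_a = toy_encrypt(b"key-of-user-A", block)
cipher_b = toy_encrypt(b"key-of-user-B", block)

plain_values_match = plain_value_a == plain_value_b
cipher_values_match = (
    hashlib.sha256(cipher_a).hexdigest() == hashlib.sha256(cipher_b).hexdigest()
)
```

Because `cipher_values_match` is false, an index keyed on ciphertext characteristic values would treat the two copies as unrelated blocks.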
Disclosure of Invention
In view of the above problems, the invention provides a method and a system for storage encryption deduplication, which generate a unique and secure encryption key from the characteristics of the source data, thereby ensuring both the security and the accessibility of the data while allowing the encrypted data to be deduplicated.
To achieve the above technical purpose and technical effects, the invention is realized by the following technical solutions:
In a first aspect, the present invention provides a method for storage encryption deduplication, including:
carrying out variable-length blocking on the source data plaintext by using a blocking algorithm to obtain a plurality of data block plaintexts;
calculating the hash characteristic value of each data block plaintext;
querying the characteristic data index in the deduplication storage for each hash characteristic value;
and if a hash characteristic value does not exist in the deduplication storage, encrypting the data block plaintext corresponding to that hash characteristic value and transmitting the resulting ciphertext to the deduplication storage, thereby completing storage encryption and deduplication.
Optionally, if a hash characteristic value does not exist in the deduplication storage, encrypting the data block plaintext corresponding to that hash characteristic value and transmitting it to the deduplication storage specifically includes the following steps:
user A generates an asymmetric key pair Kp and Ki for the data set S by using an asymmetric encryption algorithm, and stores the key pair in the user key index of the deduplication storage, wherein Kp is the public key and Ki is the private key;
acquiring the data block plaintext corresponding to the hash characteristic value;
adding an obfuscation value Sa to the hash characteristic value of the data block plaintext to generate a data encryption key Kd;
encrypting the data block plaintext with the data encryption key Kd through an encryption algorithm to generate a data block ciphertext De, and storing De in the deduplication data set of the deduplication storage;
encrypting the data encryption key Kd with the private key Ki of user A, and storing the result as metadata of the data set S in the data set metadata index of the deduplication storage.
Optionally, after the step of encrypting the data encryption key Kd with the private key Ki and storing it as metadata of the data set S in the data set metadata index of the deduplication storage, the method further includes:
when a user needs to obtain an original data block plaintext, the deduplication storage obtains the corresponding public key Kp from the user key index according to the user authentication information, obtains the metadata of the data set S from the data set metadata index, decrypts that metadata with the user's public key Kp to obtain the data encryption key Kd of the data set S, and decrypts the data block ciphertext De with Kd to obtain the original data block plaintext.
Optionally, after the step of querying the characteristic data index in the deduplication storage for each hash characteristic value, the method further includes:
if a hash characteristic value already exists in the deduplication storage, increasing the reference count of the corresponding data block plaintext by 1 in the characteristic data index;
when data is deleted, decreasing the reference count of the corresponding data block plaintext; when the reference count reaches 0, the corresponding data block can be discarded.
Optionally, the hash characteristic value is calculated by performing a hash calculation on the data block plaintext with a hash algorithm.
In a second aspect, the present invention provides a system for storage encryption deduplication, which includes a production server, a network switch and a deduplication storage connected in sequence;
the production server performs variable-length blocking on the source data plaintext by using a blocking algorithm to obtain a plurality of data block plaintexts;
the production server calculates the hash characteristic value of each data block plaintext;
the production server queries the characteristic data index in the deduplication storage for each hash characteristic value;
and if a hash characteristic value does not exist in the deduplication storage, the production server encrypts the data block plaintext corresponding to that hash characteristic value and transmits the resulting ciphertext to the deduplication storage through the network switch, thereby completing storage encryption and deduplication.
Optionally, if a hash characteristic value does not exist in the deduplication storage, encrypting the data block plaintext corresponding to that hash characteristic value and transmitting it to the deduplication storage specifically includes the following steps:
user A generates an asymmetric key pair Kp and Ki for the data set S by using an asymmetric encryption algorithm, and stores the key pair in the user key index of the deduplication storage, wherein Kp is the public key and Ki is the private key;
acquiring the data block plaintext corresponding to the hash characteristic value;
adding an obfuscation value Sa to the hash characteristic value of the data block plaintext to generate a data encryption key Kd;
encrypting the data block plaintext with the data encryption key Kd through an encryption algorithm to generate a data block ciphertext De, and storing De in the deduplication data set of the deduplication storage;
encrypting the data encryption key Kd with the private key Ki of user A, and storing the result as metadata of the data set S in the data set metadata index of the deduplication storage.
Optionally, after the step of encrypting the data encryption key Kd with the private key Ki and storing it as metadata of the data set S in the data set metadata index of the deduplication storage, the following is further included:
when a user needs to obtain an original data block plaintext, the deduplication storage obtains the corresponding public key Kp from the user key index according to the user authentication information, obtains the metadata of the data set S from the data set metadata index, decrypts that metadata with the user's public key Kp to obtain the data encryption key Kd of the data set S, and decrypts the data block ciphertext De with Kd to obtain the original data block plaintext.
Optionally, after the step of querying the characteristic data index in the deduplication storage for each hash characteristic value, the following is further included:
if a hash characteristic value already exists in the deduplication storage, increasing the reference count of the corresponding data block plaintext by 1 in the characteristic data index;
when data is deleted, decreasing the reference count of the corresponding data block plaintext; when the reference count reaches 0, the corresponding data block can be discarded.
Optionally, the hash characteristic value is calculated by performing a hash calculation on the data block plaintext with a hash algorithm.
Compared with the prior art, the invention has the following beneficial effects:
For the sake of security, different encryption keys are used for different data sets of different users, and the same source data encrypted with different keys yields different ciphertexts, so a traditional deduplication algorithm cannot deduplicate identical source data once it is encrypted. The invention generates a unique and secure encryption key from the characteristics of the source data, which ensures both the security and the accessibility of the data and allows the encrypted data to be deduplicated.
Drawings
In order that the present disclosure may be more readily and clearly understood, it is described in further detail below with reference to the accompanying drawings, in which:
FIG. 1 is a first schematic diagram of a method for storage encryption deduplication according to an embodiment of the present invention;
FIG. 2 is a second schematic diagram of a method for storage encryption deduplication according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a system for storage encryption deduplication according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the following embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and do not limit the scope of the invention.
The principles of the invention are described in detail below with reference to the accompanying drawings.
Example 1
The embodiment of the invention provides a method for storage encryption deduplication which, as shown in FIG. 1, comprises the following steps:
(1) carrying out variable-length blocking on the source data plaintext by using a blocking algorithm to obtain a plurality of data block plaintexts;
(2) calculating the hash characteristic value of each data block plaintext, wherein data block plaintexts with the same hash characteristic value can be regarded as having the same data content;
(3) querying the characteristic data index in the deduplication storage for each hash characteristic value;
(4) and if a hash characteristic value does not exist in the deduplication storage, encrypting the data block plaintext corresponding to that hash characteristic value and transmitting the resulting ciphertext to the deduplication storage, thereby completing storage encryption and deduplication.
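Steps (1) to (4) can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the implementation of the embodiment: the rolling-sum boundary rule, its parameters (`window`, `mask`, `min_size`, `max_size`), the choice of SHA-256, the in-memory dictionary standing in for the characteristic data index, and `placeholder_encrypt` are all hypothetical, since the text does not name a specific blocking, hash or encryption algorithm.

```python
import hashlib
from typing import Callable, Dict, Iterator, List


def variable_length_blocks(data: bytes, window: int = 16, mask: int = 0x3F,
                           min_size: int = 32, max_size: int = 256) -> Iterator[bytes]:
    """Step (1): cut a block where the sum of the last `window` bytes has
    its low bits all zero, so boundaries depend on content, not on offsets."""
    start = 0
    rolling = 0
    for i, byte in enumerate(data):
        rolling += byte
        if i - start >= window:
            rolling -= data[i - window]  # drop the byte that left the window
        size = i - start + 1
        if (size >= min_size and (rolling & mask) == 0) or size >= max_size:
            yield data[start:i + 1]
            start = i + 1
            rolling = 0
    if start < len(data):
        yield data[start:]  # trailing block, possibly shorter than min_size


def placeholder_encrypt(block: bytes) -> bytes:
    """Hypothetical placeholder for the encryption step; not a real cipher."""
    return bytes(b ^ 0x5A for b in block)


def store_deduplicated(data: bytes, index: Dict[str, bytes],
                       encrypt: Callable[[bytes], bytes]) -> List[str]:
    """Steps (2) to (4): hash each block, query the index, and encrypt and
    store only blocks whose characteristic value is not yet present."""
    recipe = []
    for block in variable_length_blocks(data):
        value = hashlib.sha256(block).hexdigest()   # step (2)
        if value not in index:                      # step (3)
            index[value] = encrypt(block)           # step (4)
        recipe.append(value)
    return recipe


index: Dict[str, bytes] = {}
sample = bytes((i * 37 + (i >> 3)) % 251 for i in range(4096))
first = store_deduplicated(sample, index, placeholder_encrypt)
stored_after_first = len(index)
second = store_deduplicated(sample, index, placeholder_encrypt)
```

Storing `sample` a second time adds nothing to `index`, because every block's characteristic value is already present.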
In a specific implementation manner of the embodiment of the present invention, as shown in FIG. 2, if a hash characteristic value does not exist in the deduplication storage, encrypting the data block plaintext corresponding to that hash characteristic value and transmitting it to the deduplication storage specifically includes the following steps:
user A generates an asymmetric key pair Kp and Ki for a data set S (namely the data set that needs to be encrypted and deduplicated) by using an asymmetric encryption algorithm, and stores the key pair in the user key index of the deduplication storage, wherein Kp is the public key and Ki is the private key;
acquiring the data block plaintext Dt corresponding to the hash characteristic value;
adding an obfuscation value Sa (which may be randomly generated) to the hash characteristic value of the data block plaintext Dt to generate a data encryption key Kd;
encrypting the data block plaintext Dt with the data encryption key Kd through an encryption algorithm to generate a data block ciphertext De, and storing De in the deduplication data set of the deduplication storage;
encrypting the data encryption key Kd with the private key Ki of user A, and storing the result in the data set metadata index of the deduplication storage as metadata of the data set S of user A. This protects the data encryption key Kd so that other users cannot obtain it; each user encrypts Kd with that user's own key Ki.
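The derivation of Kd and the generation of De can be sketched as follows. This is a hedged illustration rather than the embodiment's implementation. It assumes that "adding" Sa means concatenating Sa to the hash characteristic value and hashing the result with SHA-256, which is only one possible reading. It also assumes that Sa is generated once and then reused; the text does not say how Sa is managed. `toy_cipher` is a hypothetical, insecure stand-in for the unnamed encryption algorithm.

```python
import hashlib
import secrets


def derive_kd(block_hash: bytes, sa: bytes) -> bytes:
    """Assumed derivation: Kd = SHA-256(hash characteristic value || Sa)."""
    return hashlib.sha256(block_hash + sa).digest()


def toy_cipher(key: bytes, data: bytes) -> bytes:
    """Hypothetical stand-in cipher (XOR with a SHA-256 counter keystream).

    NOT secure; encrypting twice with the same key restores the input.
    """
    out = bytearray()
    for offset in range(0, len(data), 32):
        pad = hashlib.sha256(key + offset.to_bytes(8, "big")).digest()
        out.extend(b ^ p for b, p in zip(data[offset:offset + 32], pad))
    return bytes(out)


sa = secrets.token_bytes(16)                  # obfuscation value Sa (random)
dt = b"example data block plaintext Dt"       # data block plaintext Dt
block_hash = hashlib.sha256(dt).digest()      # hash characteristic value
kd = derive_kd(block_hash, sa)                # data encryption key Kd
de = toy_cipher(kd, dt)                       # data block ciphertext De
recovered = toy_cipher(kd, de)                # decrypting De with Kd
```

Under these assumptions, identical plaintext blocks always yield the same Kd for a given Sa, while different blocks yield different keys.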
In a specific implementation manner of the embodiment of the present invention, after the step of encrypting the data encryption key Kd with the private key Ki and storing it as metadata of the data set S in the data set metadata index of the deduplication storage, the method further includes:
when a user needs to obtain an original data block plaintext, the deduplication storage obtains the corresponding public key Kp from the user key index according to the user authentication information, obtains the metadata of the data set S from the data set metadata index, decrypts that metadata with the user's public key Kp to obtain the data encryption key Kd of the data set S, and decrypts the data block ciphertext De with Kd to obtain the original data block plaintext.
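The key path described above (Kd encrypted with Ki, later recovered with Kp) can be illustrated with a toy example. This sketch is an assumption-laden illustration and not the embodiment's implementation. The text does not name the asymmetric algorithm, so unpadded textbook RSA is used here purely for demonstration. The modulus is built from two publicly known Mersenne primes and therefore offers no security. The index layouts, the user name `"user-A"`, and the sample Kd are hypothetical. The code requires Python 3.8 or later for the modular inverse via `pow`.

```python
import hashlib

# Toy RSA parameters from two publicly known Mersenne primes (insecure).
P = 2**127 - 1
Q = 2**521 - 1
N = P * Q
KP = 65537                                # public exponent, in the role of Kp
KI = pow(KP, -1, (P - 1) * (Q - 1))       # private exponent, in the role of Ki

# User key index: the key pair of user A, as stored in the deduplication storage.
user_key_index = {"user-A": {"Kp": KP, "Ki": KI, "n": N}}

# A sample 32-byte data encryption key Kd (hypothetical value).
kd = hashlib.sha256(b"hash characteristic value" + b"Sa").digest()


def wrap_kd(user: str, key: bytes) -> int:
    """Encrypt Kd with the user's private key Ki (textbook RSA, no padding)."""
    entry = user_key_index[user]
    return pow(int.from_bytes(key, "big"), entry["Ki"], entry["n"])


# Data set metadata index: the wrapped Kd stored as metadata of data set S.
dataset_metadata_index = {("user-A", "S"): wrap_kd("user-A", kd)}


def recover_kd(user: str, dataset: str, length: int = 32) -> bytes:
    """Retrieval path: look up Kp, fetch the metadata, and recover Kd."""
    entry = user_key_index[user]
    wrapped = dataset_metadata_index[(user, dataset)]
    return pow(wrapped, entry["Kp"], entry["n"]).to_bytes(length, "big")


recovered_kd = recover_kd("user-A", "S")
```

The recovered Kd would then be used to decrypt the data block ciphertext De, which completes the retrieval described in the text.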
In a specific implementation manner of the embodiment of the present invention, after the step of querying the characteristic data index in the deduplication storage for each hash characteristic value, the method further includes:
if a hash characteristic value already exists in the deduplication storage, increasing the reference count of the corresponding data block plaintext by 1 in the characteristic data index, which records that the data block plaintext is used in more than one place;
when data is deleted, decreasing the reference count of the corresponding data block plaintext; when the reference count reaches 0, the corresponding data block can be discarded.
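The reference counting described above can be sketched as a small index class. The class name `CharacteristicIndex`, its method names, and the choice to start a new block's count at 1 are assumptions made for illustration; the text only specifies the increment on a hit and the decrement on deletion.

```python
from typing import Dict


class CharacteristicIndex:
    """Hypothetical in-memory characteristic data index with reference counts."""

    def __init__(self) -> None:
        self.ref_counts: Dict[str, int] = {}

    def add_reference(self, value: str) -> bool:
        """Record one more use of a block.

        Returns True if the characteristic value was not present before,
        meaning the block is new and still has to be encrypted and stored.
        """
        is_new = value not in self.ref_counts
        self.ref_counts[value] = self.ref_counts.get(value, 0) + 1
        return is_new

    def release(self, value: str) -> bool:
        """Drop one use of a block when data referencing it is deleted.

        Returns True when the count reaches 0, meaning the block can be
        discarded from the deduplication storage.
        """
        remaining = self.ref_counts[value] - 1
        if remaining == 0:
            del self.ref_counts[value]
            return True
        self.ref_counts[value] = remaining
        return False


idx = CharacteristicIndex()
first_seen = idx.add_reference("h1")    # new block: must be stored
seen_again = idx.add_reference("h1")    # duplicate: only the count grows
```

A block shared by two data sets therefore survives the deletion of the first data set and becomes discardable only after the second is deleted.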
In a specific implementation manner of the embodiment of the present invention, the hash characteristic value is calculated by performing a hash calculation on the data block plaintext with a hash algorithm.
In summary:
For the sake of security, different encryption keys are used for different data sets of different users, and the same source data encrypted with different keys yields different ciphertexts, so a traditional deduplication algorithm cannot deduplicate identical source data once it is encrypted. The invention generates a unique and secure encryption key from the characteristics of the source data, which ensures both the security and the accessibility of the data and allows the encrypted data to be deduplicated.
Example 2
The embodiment of the invention provides a system for storage encryption deduplication which, as shown in FIG. 3, comprises a production server, a network switch and a deduplication storage connected in sequence;
the production server performs variable-length blocking on the source data plaintext by using a blocking algorithm to obtain a plurality of data block plaintexts;
the production server calculates the hash characteristic value of each data block plaintext;
the production server queries the characteristic data index in the deduplication storage for each hash characteristic value;
and if a hash characteristic value does not exist in the deduplication storage, the production server encrypts the data block plaintext corresponding to that hash characteristic value and transmits the resulting ciphertext to the deduplication storage through the network switch, thereby completing storage encryption and deduplication.
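The division of work between the production server and the deduplication storage can be sketched as two cooperating classes. This is a simplified illustration under stated assumptions and not the embodiment's implementation. The network switch is modeled as plain method calls. Blocking is reduced to fixed-size blocks for brevity, whereas the embodiment uses variable-length blocking. `_encrypt` is a hypothetical placeholder, not a real cipher, and all class and method names are invented.

```python
import hashlib
from typing import Dict, List


class DeduplicationStorage:
    """Holds the characteristic data index and the stored ciphertext blocks."""

    def __init__(self) -> None:
        self.characteristic_index: Dict[str, bytes] = {}
        self.blocks_received = 0  # ciphertext blocks that crossed the network

    def has_block(self, value: str) -> bool:
        return value in self.characteristic_index

    def put_block(self, value: str, ciphertext: bytes) -> None:
        self.characteristic_index[value] = ciphertext
        self.blocks_received += 1


class ProductionServer:
    """Blocks, hashes, queries the storage, and sends only missing blocks."""

    def __init__(self, storage: DeduplicationStorage, block_size: int = 64) -> None:
        self.storage = storage
        self.block_size = block_size  # fixed size here only to keep the sketch short

    @staticmethod
    def _encrypt(block: bytes) -> bytes:
        """Hypothetical placeholder for the encryption step."""
        return bytes(b ^ 0xFF for b in block)

    def backup(self, data: bytes) -> List[str]:
        recipe = []
        for start in range(0, len(data), self.block_size):
            block = data[start:start + self.block_size]
            value = hashlib.sha256(block).hexdigest()
            if not self.storage.has_block(value):
                # Only blocks absent from the index are encrypted and sent.
                self.storage.put_block(value, self._encrypt(block))
            recipe.append(value)
        return recipe


storage = DeduplicationStorage()
server = ProductionServer(storage)
data = b"A" * 64 + b"B" * 64 + b"A" * 64
recipe = server.backup(data)
```

In this sketch the three-block input causes only two blocks to be transmitted, and backing up the same data again transmits none.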
In a specific implementation manner of the embodiment of the present invention, as shown in FIG. 2, if a hash characteristic value does not exist in the deduplication storage, encrypting the data block plaintext corresponding to that hash characteristic value and transmitting it to the deduplication storage specifically includes the following steps:
user A generates an asymmetric key pair Kp and Ki for the data set S by using an asymmetric encryption algorithm, and stores the key pair in the user key index of the deduplication storage, wherein Kp is the public key and Ki is the private key;
acquiring the data block plaintext corresponding to the hash characteristic value;
adding an obfuscation value Sa to the hash characteristic value of the data block plaintext to generate a data encryption key Kd;
encrypting the data block plaintext with the data encryption key Kd through an encryption algorithm to generate a data block ciphertext De, and storing De in the deduplication data set of the deduplication storage;
encrypting the data encryption key Kd with the private key Ki of user A, and storing the result as metadata of the data set S in the data set metadata index of the deduplication storage.
In a specific implementation manner of the embodiment of the present invention, after the step of encrypting the data encryption key Kd with the private key Ki and storing it as metadata of the data set S in the data set metadata index of the deduplication storage, the following is further included:
when a user needs to obtain an original data block plaintext, the deduplication storage obtains the corresponding public key Kp from the user key index according to the user authentication information, obtains the metadata of the data set S from the data set metadata index, decrypts that metadata with the user's public key Kp to obtain the data encryption key Kd of the data set S, and decrypts the data block ciphertext De with Kd to obtain the original data block plaintext.
In a specific implementation manner of the embodiment of the present invention, after the step of querying the characteristic data index in the deduplication storage for each hash characteristic value, the following is further included:
if a hash characteristic value already exists in the deduplication storage, increasing the reference count of the corresponding data block plaintext by 1 in the characteristic data index;
when data is deleted, decreasing the reference count of the corresponding data block plaintext; when the reference count reaches 0, the corresponding data block can be discarded.
In a specific implementation manner of the embodiment of the present invention, the hash characteristic value is calculated by performing a hash calculation on the data block plaintext with a hash algorithm.
The foregoing shows and describes the basic principles, main features and advantages of the present invention. Those skilled in the art will understand that the present invention is not limited to the embodiments described above. The embodiments and the description merely illustrate the principles of the invention, and various changes and modifications may be made without departing from its spirit and scope, all of which fall within the scope of the claimed invention. The scope of the invention is defined by the appended claims and their equivalents.