Method and system for encrypted deduplication storage
Technical Field
The invention belongs to the technical field of encryption, and particularly relates to a method and system for encrypted deduplication storage.
Background
Data encryption generally uses a symmetric encryption algorithm: the plaintext original data and an encryption key are processed together by the algorithm to produce ciphertext for transmission. If different keys are used to encrypt the same original data, the resulting ciphertexts differ even though the same encryption algorithm is used.
Data deduplication (also called de-duplication) stores only one copy of identical data. Deduplication technology generally uses a chunking algorithm (fixed-length or variable-length) to divide data into small data blocks, calculates a characteristic value for each block with a hash algorithm, and compares it with the characteristic values of the data already stored; if the values are the same, the block is duplicate data and is not stored again.
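As a minimal illustration of the plaintext deduplication described above, the Python sketch below uses fixed-length chunking with a deliberately tiny block size and SHA-256 as the hash algorithm. Both choices, and the in-memory dictionary standing in for the characteristic data index, are assumptions made for readability rather than details taken from the text.

```python
import hashlib


def fixed_chunks(data: bytes, size: int = 4) -> list:
    """Split data into fixed-length blocks; the last block may be shorter."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def dedup_store(data: bytes, index: dict, size: int = 4) -> int:
    """Store only blocks whose hash is not yet in `index`; return the number of new blocks."""
    new_blocks = 0
    for block in fixed_chunks(data, size):
        fingerprint = hashlib.sha256(block).hexdigest()  # characteristic value
        if fingerprint not in index:
            index[fingerprint] = block  # first occurrence: keep one copy
            new_blocks += 1
        # otherwise the block is duplicate data and is not stored again
    return new_blocks


index = {}
print(dedup_store(b"ABCDABCDEFGH", index))  # 2 (ABCD, ABCD, EFGH -> two distinct blocks)
print(dedup_store(b"EFGHABCD", index))      # 0 (both blocks are already stored)
```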
Because the same data encrypted with different keys yields different ciphertexts, the characteristic values calculated from those ciphertexts by the hash algorithm also differ, so the duplicate data cannot be identified and removed by comparing them.
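The sketch below makes this problem concrete under stated assumptions: `toy_encrypt` is a hypothetical stand-in for a real symmetric cipher such as AES (it XORs the data with a SHA-256-derived keystream and is not secure), and the two keys are arbitrary example strings.

```python
import hashlib


def toy_encrypt(key: bytes, data: bytes) -> bytes:
    """Toy stream cipher (SHA-256 keystream XOR), a stand-in for e.g. AES; NOT secure."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        pad = hashlib.sha256(key + offset.to_bytes(8, "big")).digest()
        out.extend(b ^ p for b, p in zip(data[offset:offset + 32], pad))
    return bytes(out)


block = b"identical source block"
ct_a = toy_encrypt(b"key of user A", block)
ct_b = toy_encrypt(b"key of user B", block)

# The deduplication index only sees hashes of what it is asked to store.
fp_a = hashlib.sha256(ct_a).hexdigest()
fp_b = hashlib.sha256(ct_b).hexdigest()
print(fp_a == fp_b)  # False: the same plaintext looks like two different blocks
```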
Disclosure of Invention
To address these problems, the invention provides a method and system for encrypted deduplication storage, which generate a unique and secure encryption key from the characteristics of the source data itself, thereby keeping the data secure and accessible while allowing the encrypted data to be deduplicated.
To achieve the above technical objectives and effects, the invention adopts the following technical solution:
In a first aspect, the present invention provides a method for encrypted deduplication storage, comprising:
dividing the source data plaintext into variable-length blocks using a chunking algorithm to obtain a plurality of data block plaintexts;
calculating a hash characteristic value for each data block plaintext;
comparing each hash characteristic value against the characteristic data index on the deduplication storage by means of an index query;
if a hash characteristic value does not exist on the deduplication storage, encrypting the corresponding data block plaintext and transmitting the resulting ciphertext to the deduplication storage, thereby completing encrypted deduplication storage.
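The text does not name a particular chunking algorithm. One common way to obtain variable-length blocks is content-defined chunking, so the sketch below assumes a simplified Gear rolling hash (in the spirit of FastCDC, without its normalization) for the first step and SHA-256 for the hash characteristic value. The `GEAR` table, `cdc_chunks` and all size parameters are hypothetical illustrations, not values from the source.

```python
import hashlib
import random

# Hypothetical "gear" table: 256 pseudo-random 32-bit values derived from SHA-256.
GEAR = [int.from_bytes(hashlib.sha256(bytes([i])).digest()[:4], "big") for i in range(256)]


def cdc_chunks(data: bytes, min_size: int = 64, avg_bits: int = 7, max_size: int = 512) -> list:
    """Variable-length (content-defined) chunking with a Gear rolling hash.

    A boundary is declared when the top `avg_bits` bits of the 32-bit rolling hash
    are all zero (roughly one boundary per 2**avg_bits bytes once min_size is reached),
    or when a chunk reaches max_size.
    """
    mask = ((1 << avg_bits) - 1) << (32 - avg_bits)
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFF  # older bytes shift out after 32 steps
        length = i - start + 1
        if (length >= min_size and (h & mask) == 0) or length >= max_size:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks


# Chunk the plaintext, then compute a hash characteristic value for each block.
rng = random.Random(0)
data = bytes(rng.getrandbits(8) for _ in range(5000))
chunks = cdc_chunks(data)
fingerprints = [hashlib.sha256(c).hexdigest() for c in chunks]
print(len(chunks), b"".join(chunks) == data)
```

Because boundaries depend on content rather than on byte offsets, inserting bytes near the start of a file tends to disturb only the chunks around the edit, which is why variable-length chunking usually finds more duplicates than fixed-length chunking.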
Optionally, encrypting the data block plaintext corresponding to a hash characteristic value that does not exist on the deduplication storage and transmitting it to the deduplication storage specifically comprises the following steps:
user A generates an asymmetric key pair Kp and Ki for the data set S using an asymmetric encryption algorithm, where Kp is the public key and Ki is the private key, and stores the key pair in the user key index on the deduplication storage;
acquiring the data block plaintext corresponding to the hash characteristic value;
adding the hash characteristic value of the data block plaintext to the obfuscation value Sa to generate a data encryption key Kd;
encrypting the data block plaintext with the data encryption key Kd using an encryption algorithm to generate a data block ciphertext De, and storing De in the deduplication data set on the deduplication storage;
encrypting the data encryption key Kd with user A's private key Ki, and storing the result as metadata of the data set S in the data set metadata index on the deduplication storage.
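The key handling above can be sketched as follows, under several loudly stated assumptions. "Adding" the hash characteristic value to Sa is interpreted here as SHA-256 over their concatenation; Sa is assumed to be 16 random bytes generated once and shared by the storage system; `toy_encrypt` is a hypothetical, insecure stand-in for a real block cipher; and the asymmetric algorithm is textbook RSA built from two well-known Mersenne primes, which is for illustration only. With textbook RSA the private-key operation is reversed by the public key, which matches the text; it also means anyone holding Kp can unwrap Kd, so access control rests on the storage releasing Kp only to authenticated users.

```python
import hashlib
import os


def toy_encrypt(key: bytes, data: bytes) -> bytes:
    """Toy stream cipher (SHA-256 keystream XOR), a stand-in for e.g. AES; NOT secure."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        pad = hashlib.sha256(key + offset.to_bytes(8, "big")).digest()
        out.extend(b ^ p for b, p in zip(data[offset:offset + 32], pad))
    return bytes(out)


# Toy textbook-RSA key pair for user A, built from two well-known Mersenne primes.
# Illustration only: these primes are public knowledge, so the key offers no security.
P, Q, E = 2**521 - 1, 2**607 - 1, 65537
N = P * Q
D = pow(E, -1, (P - 1) * (Q - 1))  # modular inverse, Python 3.8+
Kp = (E, N)  # public key
Ki = (D, N)  # private key

Sa = os.urandom(16)  # obfuscation value: assumed generated once and shared system-wide


def derive_kd(fingerprint: bytes, salt: bytes) -> bytes:
    """Kd from the block's hash characteristic value and Sa (assumed: SHA-256 of the concatenation)."""
    return hashlib.sha256(fingerprint + salt).digest()


def wrap_key(kd: bytes, private_key: tuple) -> int:
    """Encrypt Kd with the private key Ki (textbook RSA private-key operation)."""
    d, n = private_key
    return pow(int.from_bytes(kd, "big"), d, n)


def unwrap_key(wrapped: int, public_key: tuple) -> bytes:
    """Recover Kd with the public key Kp."""
    e, n = public_key
    return pow(wrapped, e, n).to_bytes(32, "big")


block = b"data block plaintext"
fingerprint = hashlib.sha256(block).digest()
Kd = derive_kd(fingerprint, Sa)
De = toy_encrypt(Kd, block)  # goes to the deduplication data set
# Assumed metadata layout: (data set, user) -> {fingerprint: Kd wrapped with Ki}
metadata_index = {("S", "A"): {fingerprint.hex(): wrap_key(Kd, Ki)}}
```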
Optionally, after the data encryption key Kd has been encrypted with the private key Ki and stored as metadata of the data set S in the data set metadata index on the deduplication storage, the method further comprises:
when a user needs to obtain the original data block plaintext, the deduplication storage retrieves the user's public key Kp from the user key index according to the user's authentication information and retrieves the metadata of the data set S from the data set metadata index; the metadata is decrypted with the retrieved public key Kp to obtain the encryption key Kd of the data set S, and the data block ciphertext De is decrypted with Kd to obtain the original data block plaintext.
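The retrieval path can be sketched as below. Everything here is a hypothetical in-memory model: the `credentials` dictionary stands in for whatever user authentication the storage actually performs, the metadata layout is assumed, `toy_encrypt` is an insecure stand-in cipher (XOR, so it also decrypts), and the RSA key is a toy built from known Mersenne primes.

```python
import hashlib
import os


def toy_encrypt(key: bytes, data: bytes) -> bytes:
    """Toy SHA-256 keystream XOR cipher (stand-in for e.g. AES; NOT secure). Also decrypts."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        pad = hashlib.sha256(key + offset.to_bytes(8, "big")).digest()
        out.extend(b ^ p for b, p in zip(data[offset:offset + 32], pad))
    return bytes(out)


# Toy textbook-RSA key pair for user A (known Mersenne primes: no real security).
P, Q, E = 2**521 - 1, 2**607 - 1, 65537
N = P * Q
D = pow(E, -1, (P - 1) * (Q - 1))

# Hypothetical state of the deduplication storage after user A stored one block of data set S.
Sa = os.urandom(16)
block = b"original data block"
fp = hashlib.sha256(block).hexdigest()
kd = hashlib.sha256(bytes.fromhex(fp) + Sa).digest()
credentials = {"A": "token-A"}                 # hypothetical authentication information
user_key_index = {"A": (E, N)}                 # user -> public key Kp
metadata_index = {("S", "A"): {fp: pow(int.from_bytes(kd, "big"), D, N)}}  # Kd under Ki
dedup_dataset = {fp: toy_encrypt(kd, block)}   # fingerprint -> ciphertext De


def restore_block(user: str, token: str, dataset: str, fingerprint: str) -> bytes:
    """Return the original data block plaintext, following the retrieval steps in the text."""
    if credentials.get(user) != token:
        raise PermissionError("authentication failed")
    e, n = user_key_index[user]                                   # Kp from the user key index
    wrapped = metadata_index[(dataset, user)][fingerprint]        # metadata of data set S
    kd_recovered = pow(wrapped, e, n).to_bytes(32, "big")         # decrypt with Kp to get Kd
    return toy_encrypt(kd_recovered, dedup_dataset[fingerprint])  # decrypt De with Kd


print(restore_block("A", "token-A", "S", fp))  # b'original data block'
```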
Optionally, the step of comparing each hash characteristic value against the characteristic data index on the deduplication storage further comprises:
if a hash characteristic value already exists on the deduplication storage, incrementing the reference count of the corresponding data block plaintext by 1 in the characteristic data index;
when data is deleted, decrementing the reference count of the corresponding data block plaintext; when the reference count reaches 0, the corresponding data block is marked as disposable.
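A minimal sketch of the reference-counting rule, assuming an in-memory index. The class and method names (`CharacteristicIndex`, `add_reference`, `release`, `collect`) are hypothetical, and since the text does not say when disposable blocks are actually reclaimed, the separate `collect` step is an assumption.

```python
class CharacteristicIndex:
    """Hypothetical characteristic data index with per-block reference counts."""

    def __init__(self) -> None:
        self.refcount = {}  # fingerprint -> number of references
        self.blocks = {}    # fingerprint -> stored ciphertext De

    def add_reference(self, fingerprint: str, ciphertext: bytes = b"") -> bool:
        """Return True if the block is new and must be stored, False if it is a duplicate."""
        if fingerprint in self.refcount:
            self.refcount[fingerprint] += 1  # existing block: count one more use
            return False
        self.refcount[fingerprint] = 1
        self.blocks[fingerprint] = ciphertext
        return True

    def release(self, fingerprint: str) -> None:
        """Called when data referencing the block is deleted."""
        if self.refcount.get(fingerprint, 0) <= 0:
            raise ValueError("block is not referenced")
        self.refcount[fingerprint] -= 1

    def disposable(self) -> list:
        """Blocks whose reference count has reached 0."""
        return [fp for fp, count in self.refcount.items() if count == 0]

    def collect(self) -> int:
        """Reclaim disposable blocks; return how many were removed."""
        victims = self.disposable()
        for fp in victims:
            del self.refcount[fp]
            del self.blocks[fp]
        return len(victims)


index = CharacteristicIndex()
index.add_reference("h1", b"De-1")  # True: first copy is stored
index.add_reference("h1")           # False: duplicate, count is now 2
index.release("h1")
index.release("h1")
print(index.disposable())           # ['h1']
print(index.collect())              # 1
```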
Optionally, the hash characteristic value is calculated by applying a hash algorithm to the data block plaintext.
In a second aspect, the invention provides a system for encrypted deduplication storage, comprising a production server, a network switch and a deduplication storage connected in sequence;
the production server divides the source data plaintext into variable-length blocks using a chunking algorithm to obtain a plurality of data block plaintexts;
the production server calculates a hash characteristic value for each data block plaintext;
the production server compares each hash characteristic value against the characteristic data index on the deduplication storage by means of an index query;
if a hash characteristic value does not exist on the deduplication storage, the production server encrypts the corresponding data block plaintext and transmits the resulting ciphertext through the network switch to the deduplication storage, thereby completing encrypted deduplication storage.
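The interaction between the three components can be sketched as an in-process simulation. The classes below are hypothetical models, not a real network API: the blocks are assumed to be already chunked, Sa is a fixed example value, `toy_encrypt` is an insecure stand-in cipher, and the switch simply forwards calls while counting payload bytes. The sketch shows that only ciphertexts of blocks missing from the index cross the switch.

```python
import hashlib

SA = b"hypothetical shared Sa"  # assumed system-wide obfuscation value


def toy_encrypt(key: bytes, data: bytes) -> bytes:
    """Toy SHA-256 keystream XOR cipher (stand-in for e.g. AES; NOT secure)."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        pad = hashlib.sha256(key + offset.to_bytes(8, "big")).digest()
        out.extend(b ^ p for b, p in zip(data[offset:offset + 32], pad))
    return bytes(out)


class DedupStorage:
    """Deduplication storage holding the characteristic data index."""

    def __init__(self) -> None:
        self.index = {}  # fingerprint -> ciphertext De

    def missing(self, fingerprints: list) -> list:
        return [fp for fp in fingerprints if fp not in self.index]

    def put(self, fingerprint: str, ciphertext: bytes) -> None:
        self.index[fingerprint] = ciphertext


class NetworkSwitch:
    """Forwards requests to the storage and counts the payload bytes sent through it."""

    def __init__(self, storage: DedupStorage) -> None:
        self.storage = storage
        self.bytes_forwarded = 0

    def query(self, fingerprints: list) -> list:
        return self.storage.missing(fingerprints)

    def send(self, fingerprint: str, ciphertext: bytes) -> None:
        self.bytes_forwarded += len(ciphertext)
        self.storage.put(fingerprint, ciphertext)


class ProductionServer:
    def __init__(self, switch: NetworkSwitch) -> None:
        self.switch = switch

    def backup(self, blocks: list) -> int:
        """Fingerprint blocks, query the index, then encrypt and send only missing ones."""
        by_fp = {hashlib.sha256(b).hexdigest(): b for b in blocks}  # also drops in-batch repeats
        missing = self.switch.query(list(by_fp))
        for fp in missing:
            kd = hashlib.sha256(bytes.fromhex(fp) + SA).digest()
            self.switch.send(fp, toy_encrypt(kd, by_fp[fp]))
        return len(missing)


storage = DedupStorage()
switch = NetworkSwitch(storage)
server = ProductionServer(switch)
print(server.backup([b"alpha", b"beta", b"alpha"]))  # 2
print(server.backup([b"beta", b"gamma"]))            # 1
print(switch.bytes_forwarded)                        # 14 (5 + 4 + 5 bytes)
```

Querying the index with the whole batch of fingerprints before sending anything is a design choice of this sketch; it keeps the round trips through the switch to one query per backup.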
Optionally, encrypting the data block plaintext corresponding to a hash characteristic value that does not exist on the deduplication storage and transmitting it to the deduplication storage specifically comprises the following steps:
user A generates an asymmetric key pair Kp and Ki for the data set S using an asymmetric encryption algorithm, where Kp is the public key and Ki is the private key, and stores the key pair in the user key index on the deduplication storage;
acquiring the data block plaintext corresponding to the hash characteristic value;
adding the hash characteristic value of the data block plaintext to the obfuscation value Sa to generate a data encryption key Kd;
encrypting the data block plaintext with the data encryption key Kd using an encryption algorithm to generate a data block ciphertext De, and storing De in the deduplication data set on the deduplication storage;
encrypting the data encryption key Kd with user A's private key Ki, and storing the result as metadata of the data set S in the data set metadata index on the deduplication storage.
Optionally, after the data encryption key Kd has been encrypted with the private key Ki and stored as metadata of the data set S in the data set metadata index on the deduplication storage, the system further performs:
when a user needs to obtain the original data block plaintext, the deduplication storage retrieves the user's public key Kp from the user key index according to the user's authentication information and retrieves the metadata of the data set S from the data set metadata index; the metadata is decrypted with the retrieved public key Kp to obtain the encryption key Kd of the data set S, and the data block ciphertext De is decrypted with Kd to obtain the original data block plaintext.
Optionally, the step of comparing each hash characteristic value against the characteristic data index on the deduplication storage further comprises:
if a hash characteristic value already exists on the deduplication storage, incrementing the reference count of the corresponding data block plaintext by 1 in the characteristic data index;
when data is deleted, decrementing the reference count of the corresponding data block plaintext; when the reference count reaches 0, the corresponding data block is marked as disposable.
Optionally, the hash characteristic value is calculated by applying a hash algorithm to the data block plaintext.
Compared with the prior art, the invention has the following beneficial effects:
To keep encrypted data secure, different encryption keys are used for different users and different data sets; because the same source data encrypted with different keys yields different ciphertexts, traditional deduplication algorithms cannot deduplicate such data. The invention generates a unique and secure encryption key from the characteristics of the source data itself, thereby keeping the data secure and accessible while allowing the encrypted data to be deduplicated.
Drawings
To make the invention easier to understand, it is described in more detail below with reference to specific embodiments illustrated in the accompanying drawings, in which:
FIG. 1 is a schematic diagram of a method for encrypted deduplication storage according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a method for encrypted deduplication storage according to a second embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a system for encrypted deduplication storage according to an embodiment of the present invention.
Detailed Description
To make the objects, technical solutions and advantages of the present invention clearer, the invention is described in further detail below with reference to the following examples. It should be understood that the detailed description and specific examples are intended for purposes of illustration only and are not intended to limit the scope of the invention.
The application principle of the invention is described in detail below with reference to the accompanying drawings.
Example 1
An embodiment of the invention provides a method for encrypted deduplication storage which, as shown in FIG. 1, comprises the following steps:
(1) Divide the source data plaintext into variable-length blocks using a chunking algorithm to obtain a plurality of data block plaintexts;
(2) Calculate a hash characteristic value for each data block plaintext; data block plaintexts with the same hash characteristic value can be considered to have the same data content;
(3) Compare each hash characteristic value against the characteristic data index on the deduplication storage by means of an index query;
(4) If a hash characteristic value does not exist on the deduplication storage, encrypt the corresponding data block plaintext and transmit the resulting ciphertext to the deduplication storage, thereby completing encrypted deduplication storage.
In a specific implementation of this embodiment, as shown in FIG. 2, encrypting the data block plaintext corresponding to a hash characteristic value that does not exist on the deduplication storage and transmitting it to the deduplication storage specifically comprises the following steps:
user A generates an asymmetric key pair Kp and Ki for a data set S (that is, the data set to be encrypted and deduplicated) using an asymmetric encryption algorithm, where Kp is the public key and Ki is the private key, and stores the key pair in the user key index on the deduplication storage;
acquiring the data block plaintext Dt corresponding to the hash characteristic value;
adding the hash characteristic value of the data block plaintext Dt to the obfuscation value Sa (which may be randomly generated) to generate a data encryption key Kd;
encrypting the data block plaintext Dt with the data encryption key Kd using an encryption algorithm to generate a data block ciphertext De, and storing De in the deduplication data set on the deduplication storage;
encrypting the data encryption key Kd with user A's private key Ki, and storing the result in the data set metadata index on the deduplication storage as user A's metadata for the data set S. This protects Kd from being obtained by other users: each user encrypts Kd with that user's own private key Ki.
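The per-user protection of Kd can be sketched for two users who store the same block. Assumptions, all hypothetical: Sa is generated once and shared, so both users derive the same Kd from the same content; the metadata layout is invented for illustration; `toy_encrypt` is an insecure stand-in cipher; and each user's key pair is toy textbook RSA from known Mersenne primes. The sketch shows one stored ciphertext De with two differently wrapped copies of Kd, each recoverable only with the matching public key.

```python
import hashlib
import os


def toy_encrypt(key: bytes, data: bytes) -> bytes:
    """Toy SHA-256 keystream XOR cipher (stand-in for e.g. AES; NOT secure). Also decrypts."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        pad = hashlib.sha256(key + offset.to_bytes(8, "big")).digest()
        out.extend(b ^ p for b, p in zip(data[offset:offset + 32], pad))
    return bytes(out)


def make_toy_keypair(p: int, q: int, e: int = 65537) -> tuple:
    """Textbook RSA from two known primes: toy only, offers no security."""
    n = p * q
    d = pow(e, -1, (p - 1) * (q - 1))
    return (e, n), (d, n)  # (Kp, Ki)


M521, M607, M1279 = 2**521 - 1, 2**607 - 1, 2**1279 - 1
user_key_index, private_keys = {}, {}
for user, (p, q) in {"A": (M521, M607), "B": (M607, M1279)}.items():
    user_key_index[user], private_keys[user] = make_toy_keypair(p, q)

Sa = os.urandom(16)  # assumed shared obfuscation value
dedup_dataset = {}   # fingerprint -> De (a single copy)
metadata_index = {}  # (data set, user) -> {fingerprint: Kd wrapped with that user's Ki}


def store(user: str, dataset: str, block: bytes) -> str:
    fp = hashlib.sha256(block).hexdigest()
    kd = hashlib.sha256(bytes.fromhex(fp) + Sa).digest()  # same content -> same Kd
    if fp not in dedup_dataset:  # only the first writer uploads De
        dedup_dataset[fp] = toy_encrypt(kd, block)
    d, n = private_keys[user]
    metadata_index.setdefault((dataset, user), {})[fp] = pow(int.from_bytes(kd, "big"), d, n)
    return fp


def read(user: str, dataset: str, fp: str) -> bytes:
    e, n = user_key_index[user]
    kd = pow(metadata_index[(dataset, user)][fp], e, n).to_bytes(32, "big")
    return toy_encrypt(kd, dedup_dataset[fp])


fp_a = store("A", "S", b"shared block")
fp_b = store("B", "T", b"shared block")
print(len(dedup_dataset), read("B", "T", fp_b))  # 1 b'shared block'
```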
In a specific implementation of this embodiment, after the data encryption key Kd has been stored as metadata of the data set S in the data set metadata index on the deduplication storage, the method further comprises:
when a user needs to obtain the original data block plaintext, the deduplication storage retrieves the user's public key Kp from the user key index according to the user's authentication information and retrieves the metadata of the data set S from the data set metadata index; the metadata is decrypted with the retrieved public key Kp to obtain the encryption key Kd of the data set S, and the data block ciphertext De is decrypted with Kd to obtain the original data block plaintext.
In a specific implementation of this embodiment, after the step of comparing each hash characteristic value against the characteristic data index on the deduplication storage, the method further comprises:
if a hash characteristic value already exists on the deduplication storage, incrementing the reference count of the corresponding data block plaintext by 1 in the characteristic data index, which records that the data block is used in several places;
when data is deleted, decrementing the reference count of the corresponding data block plaintext; when the reference count reaches 0, the corresponding data block is marked as disposable.
In a specific implementation of this embodiment, the hash characteristic value is calculated by applying a hash algorithm to the data block plaintext.
To sum up:
To keep encrypted data secure, different encryption keys are used for different users and different data sets; because the same source data encrypted with different keys yields different ciphertexts, traditional deduplication algorithms cannot deduplicate such data. The invention generates a unique and secure encryption key from the characteristics of the source data itself, thereby keeping the data secure and accessible while allowing the encrypted data to be deduplicated.
Example 2
An embodiment of the invention provides a system for encrypted deduplication storage which, as shown in FIG. 3, comprises a production server, a network switch and a deduplication storage connected in sequence;
the production server divides the source data plaintext into variable-length blocks using a chunking algorithm to obtain a plurality of data block plaintexts;
the production server calculates a hash characteristic value for each data block plaintext;
the production server compares each hash characteristic value against the characteristic data index on the deduplication storage by means of an index query;
if a hash characteristic value does not exist on the deduplication storage, the production server encrypts the corresponding data block plaintext and transmits the resulting ciphertext through the network switch to the deduplication storage, thereby completing encrypted deduplication storage.
In a specific implementation of this embodiment, as shown in FIG. 2, encrypting the data block plaintext corresponding to a hash characteristic value that does not exist on the deduplication storage and transmitting it to the deduplication storage specifically comprises the following steps:
user A generates an asymmetric key pair Kp and Ki for the data set S using an asymmetric encryption algorithm, where Kp is the public key and Ki is the private key, and stores the key pair in the user key index on the deduplication storage;
acquiring the data block plaintext corresponding to the hash characteristic value;
adding the hash characteristic value of the data block plaintext to the obfuscation value Sa to generate a data encryption key Kd;
encrypting the data block plaintext with the data encryption key Kd using an encryption algorithm to generate a data block ciphertext De, and storing De in the deduplication data set on the deduplication storage;
encrypting the data encryption key Kd with user A's private key Ki, and storing the result as metadata of the data set S in the data set metadata index on the deduplication storage.
In a specific implementation of this embodiment, after the data encryption key Kd has been stored as metadata of the data set S in the data set metadata index on the deduplication storage, the system further performs:
when a user needs to obtain the original data block plaintext, the deduplication storage retrieves the user's public key Kp from the user key index according to the user's authentication information and retrieves the metadata of the data set S from the data set metadata index; the metadata is decrypted with the retrieved public key Kp to obtain the encryption key Kd of the data set S, and the data block ciphertext De is decrypted with Kd to obtain the original data block plaintext.
In a specific implementation of this embodiment, after the step of comparing each hash characteristic value against the characteristic data index on the deduplication storage, the system further performs:
if a hash characteristic value already exists on the deduplication storage, incrementing the reference count of the corresponding data block plaintext by 1 in the characteristic data index;
when data is deleted, decrementing the reference count of the corresponding data block plaintext; when the reference count reaches 0, the corresponding data block is marked as disposable.
In a specific implementation of this embodiment, the hash characteristic value is calculated by applying a hash algorithm to the data block plaintext.
The foregoing has shown and described the basic principles, main features and advantages of the present invention. It will be understood by those skilled in the art that the present invention is not limited to the embodiments described above, that the above embodiments and descriptions merely illustrate the principles of the present invention, and that various changes and modifications may be made without departing from the spirit and scope of the invention, all of which fall within the scope of the claimed invention. The scope of the invention is defined by the appended claims and their equivalents.